This is a summary of a talk I gave during Gophercon Vietnam & Gophercon India. If you have seen this talk already, you might find the concepts crystallized after reading the blog post. You can find the slides here.
I have split this blog post, similar to the way I had structured the talk:
In this walk-through, I will be talking about a simple program to scrape from Amazon's product listing page. All the code is available here.
Here, we are scraping data into a struct
which looks like:
type Product struct {
Name string
Link string
Image string
Price string
Reviews []Review
}
In this program, I am using a simple scraping & parsing library called soup
.
The algorithm can be summarized as:
Open up the product listing page
Parse the products
link
In the product's page,
Parse the reviews
As a result, the snippet of code below,
produces the following output:
{
"Name": "Sanyo 108.2 cm (43 inches) Full HD IPS LED TV XT-43S7100F (Black)",
"Link": "https://www.amazon.in/Sanyo-108-2-inches-XT-43S7100F-Black/dp/B01ICVLK4S/ref=lp_1389396031_1_5/260-5276449-2035623?s=electronics&ie=UTF8&qid=1540385124&sr=1-5",
"Image": "https://images-eu.ssl-images-amazon.com/images/I/51F0yKjVX1L._AC_US218_.jpg",
"Price": "17,990",
"Reviews": [
{
"Name": "Cooper Vrf",
"Rating": "5.0 out of 5 stars",
"Content": "Much has been said about the clarity and performance of the TV. Yes its a great value for money product, but here I am going to say about the customer service part. After using my 49 inch TV for about 21 months one day my TV went black. I could hear only the sound but no picture. I emailed the Sanyo customer care and within four hours a service person called me and fixed the appointment for very next day. The next day he inspected the TV and told that the panel has gone dead and they will have to change the panel. As my TV was under 2 year warranty he assured me that the company will possibly replace the panel and it could be done within 10 days. I was very much skeptical about the panel replacement as the next day I received an email from the company saying that they will be taking all possible actions to solve my query but no mention of panel replacement. After about a week I received a call from their service center in Bilimora that my tv has arrived and the service person is coming to install it. Even though it was raining heavily on that day he came right on time and to my amazement he brought out from his car a whole new TV box. Believe me the company has replaced a whole new tv in place of the old one, only the two boards from my old tv were fixed on the new one. Really hats off to the Company's customer service and special mention to their service centre in Bilimora which got the job done on time even in heavy rainfall and even though my hometown is about 25 kms away. Within the next two hours I received three calls from the company regarding my query solved. Thanks Sanyo..... Really satisfied on buying your company's TV."
},
{
"Name": "Debasish Sen",
"Rating": "4.0 out of 5 stars",
"Content": "Great clarity. Decent sound. A cracking deal at this price."
},
{
"Name": "Naziruddin",
"Rating": "4.0 out of 5 stars",
"Content": "A good tv for the exchange price.Good reception n colours.Gurantee card does not say 10 years tv panel warranty but mentioned on the web site of the Tv,hopefully it's true. Worth buying."
}
]
}
The above code has room for out-of-order execution, this means we could use some of Go's concurrency paradigms like channels and goroutines to make the code run faster. This is what the code looks like now:
Go has two channel implementations:
ch := make(chan Product, 3)
ch := make(chan Product)
They both make use of the same hchan
struct under the hood.
hchan
struct?The information in a hchan struct can be broadly grouped as:
Let's look at each of them in detail:
elemsize
Represents the size of a single element in memory in bytes. For eg, our Product
struct takes 88 bytes (string
takes 16 bytes, Review
takes 24 bytes) in the stack. Type int
takes 8 bytes.
elemtype
Used when messages are copied over from one goroutine to the other. It has a bunch of fields which provide type and size information for the type of values the channel can hold. There is a bit of overlap with elemsize
.
buf
This is a circular queue. With a buffered channel of size 3, ch := make(chan Product, 3)
, buf will look as follows:
Essentially make(chan Product, 1) != make(chan Product)
, because when channel size is defined a circular queue implementation is used internally to hold the elements of a channel.
recvq & sendq
Before we deep dive into these two fields, let's talk about sudog
struct:
sudog
represents a goroutine. recvq
& sendq
hold the information on the goroutines which are currently blocking on the channel while receiving and sending respectively. They are pointers to sudog
.
Taking recvq
as an example:
sudog
is designed as a doubly linked list. It has pointers to the prev
and next
goroutines which are blocking on the channel. It also contains an unsafe pointer to elem
, the element which the goroutine is currently blocking on.
As and when the channel is ready, the next goroutine to run is picked from the recvq
and sendq
by the Go's scheduler.
dataqsiz
This represents the entire buffer size of the channel, as specified by the user. Essentially, the queue capacity.
qcount
This represents the number of slots in the circular queue currently filled up.
sendx & recvx
These are indices on the circular queue, buf
to figure out where to next dequeue from (recvx) or enqueue to (sendx).
Let's look at an alternative representation of the buffered channel and let's go through a few scenarios to understand the behavior of sendx
and recvx
.
When the channels is empty:
After an enqueue happens:
After the channel becomes full, note the dataqsiz
& qcount
become equal. All sends to this channel will block at this point.
Let's dequeue one element:
closed
This is a flag which shows whether the channel is currently closed or not. A channel can be closed by any of the goroutines which have access to the channel.
lock
This is a mutex which controls access to the channels. Channels are goroutine safe due to the use of this lock. It controls receives and sends to the channel, ensuring no race condition happens.
This was all you need to know to under the following demo.
Please watch the following video to catch channels live in action.
FIFO (first-in, first-out) - recvq/sendq
Do not communicate by sharing memory; instead, share memory by communicating.1
Thread Goroutine safe
First-class citizens!