Summary
In the previous chapter, the data structure section introduced a lot of basic fields, and you may have wondered what they are all for. Let's go over the underlying concepts first, then move on to the key points of this article; I believe that will make it much easier to follow.
Original article: In-depth understanding of Go map: assignment, expansion, and migration
Hash function
A hash function, also known as a hash algorithm, maps input data to a fixed-size hash value according to a specific algorithm.
In a hash table, the generated hash value is used to determine which bucket a key maps to. A good hash function should produce as few collisions as possible, to keep the time complexity of hash table operations low. (Collisions are unavoidable in practice, so we still need a way to resolve them.)
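To make this concrete, here is a minimal sketch of hashing a key to a bucket index. This is not the runtime's actual hash (Go uses AES-based hashing or memhash with a per-map random seed); FNV-1a is used here only for illustration, while the masking mirrors the runtime's `hash & bucketMask(h.B)`:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketFor hashes a string key and keeps only the low-order B bits
// to select one of 2^B buckets.
func bucketFor(key string, B uint8) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	mask := uint64(1)<<B - 1 // analogous to bucketMask(B)
	return h.Sum64() & mask
}

func main() {
	for _, k := range []string{"a", "b", "go", "map"} {
		fmt.Printf("key %q -> bucket %d of %d\n", k, bucketFor(k, 3), 1<<3)
	}
}
```

The same key always lands in the same bucket, and the result is always within range, regardless of how large the hash value itself is.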
Separate chaining
One of the core issues in hashing is resolving hash collisions. Go map uses separate chaining (also called the "zipper method"): buckets are laid out as an array, and overflow nodes are dynamically allocated and linked to them, which makes storage relatively flexible. Conceptually, each bucket is the head of a linked list.
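As an illustration only, here is a toy separate-chaining table. Note that the runtime's real layout is different: it uses fixed 8-slot buckets plus overflow buckets, not one linked node per entry. The names below are all hypothetical:

```go
package main

import "fmt"

// entry is one node in a bucket's chain.
type entry struct {
	key, val string
	next     *entry
}

// chainTable resolves collisions by chaining entries per bucket.
type chainTable struct {
	buckets []*entry
}

func newChainTable(n int) *chainTable {
	return &chainTable{buckets: make([]*entry, n)}
}

func (t *chainTable) hash(key string) int {
	h := 0
	for _, c := range key {
		h = h*31 + int(c)
	}
	if h < 0 {
		h = -h
	}
	return h % len(t.buckets)
}

// Set walks the chain, updating in place or prepending a new node.
func (t *chainTable) Set(key, val string) {
	i := t.hash(key)
	for e := t.buckets[i]; e != nil; e = e.next {
		if e.key == key {
			e.val = val
			return
		}
	}
	t.buckets[i] = &entry{key: key, val: val, next: t.buckets[i]}
}

// Get walks the chain of the key's bucket looking for a match.
func (t *chainTable) Get(key string) (string, bool) {
	for e := t.buckets[t.hash(key)]; e != nil; e = e.next {
		if e.key == key {
			return e.val, true
		}
	}
	return "", false
}

func main() {
	t := newChainTable(4)
	t.Set("lang", "Go")
	t.Set("lang", "Go1.11") // update in place
	v, ok := t.Get("lang")
	fmt.Println(v, ok) // Go1.11 true
}
```

Even with many collisions this structure keeps working; it just degrades toward a linear scan of the chain, which is exactly why collision counts matter for performance.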
Buckets / overflow buckets
type hmap struct {
...
buckets unsafe.Pointer
...
extra *mapextra
}
type mapextra struct {
overflow *[]*bmap
oldoverflow *[]*bmap
nextOverflow *bmap
}
In the previous chapter, we introduced the concepts of buckets and overflow buckets in Go map: each bucket can store only 8 key-value pairs. When that number is exceeded, overflow buckets are used, or the map is expanded.
You may wonder: what happens when more than 8 keys hash into one bucket? The answer is straightforward: as chains of overflow buckets grow, the effective time complexity gets worse, so execution efficiency suffers.
Preface
After that brief review, let's discuss three more core behaviors of Go map: assignment, expansion, and migration. Let's officially begin :)
Assignment
m := make(map[int32]string)
m[0] = "EDDYCJY"
The function prototypes

Map assignment has different specializations for 32-bit, 64-bit, string, and pointer key types. The full set of function prototypes is as follows:
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
func mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
func mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool)
func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
func mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
func mapaccess2_fast64(t *maptype, h *hmap, key uint64) (unsafe.Pointer, bool)
func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
func mapassign_fast64ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
func mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer
func mapaccess2_faststr(t *maptype, h *hmap, ky string) (unsafe.Pointer, bool)
func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer
...
Next, we will go through it in several phases to see what the runtime does during assignment.
Source code
Phase 1: checks and initialization
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
if h == nil {
panic(plainError("assignment to entry in nil map"))
}
...
if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
alg := t.key.alg
hash := alg.hash(key, uintptr(h.hash0))
h.flags |= hashWriting
if h.buckets == nil {
h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}
...
}
- Check whether the hmap has been initialized; assigning to an entry in a nil map panics
- Check for concurrent map writes; if detected, throw a fatal error
- Compute the hash value by calling the hash function for the key's type
- Set the hashWriting flag to mark that a goroutine is writing. The flag is set after calling `alg.hash`, because `alg.hash` may panic, in which case no write has actually happened
- If buckets is nil, call `newobject` to allocate the initial bucket array (for example, as mentioned in the previous section, `makemap_small` does not allocate buckets at initialization, so they are allocated on the first assignment)
Phase 2: finding an insertable slot and updating existing values
...
again:
bucket := hash & bucketMask(h.B)
if h.growing() {
growWork(t, h, bucket)
}
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
top := tophash(hash)
var inserti *uint8
var insertk unsafe.Pointer
var val unsafe.Pointer
for {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
if b.tophash[i] == empty && inserti == nil {
inserti = &b.tophash[i]
insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey {
k = *((*unsafe.Pointer)(k))
}
if !alg.equal(key, k) {
continue
}
// already have a mapping for key. Update it.
if t.needkeyupdate {
typedmemmove(t.key, k, key)
}
val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
goto done
}
ovf := b.overflow(t)
if ovf == nil {
break
}
b = ovf
}
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
hashGrow(t, h)
goto again // Growing the table invalidates everything, so try again
}
...
- Compute the target bucket from the low-order B bits of the hash (`hash & bucketMask(h.B)`), and check whether an expansion is in progress; if so, migrate that bucket first (`growWork`) before continuing
- Compute the bmap pointer address of the bucket, and take the top eight bits of the key's hash (`tophash`) for fast key lookup
- Iterate over the bucket's 8 slots, comparing each `b.tophash[i]` against `top`
- If they differ, check whether the slot is empty. A slot can be empty in two cases: nothing was ever inserted, or an entry was inserted and later deleted. The first empty slot found is recorded as the candidate insertion position
- If the tophash matches but the stored key does not equal the current key, skip it. If the key matches (it already exists), update the entry, then jump out and return the memory address of the value
- When a bucket's slots are exhausted, move on to its overflow bucket; stop when there are no more overflow buckets
- If three conditions hold together: the maximum load factor would be exceeded or there are too many overflow buckets, and no expansion is currently in progress, then an expansion is triggered (to keep subsequent operations efficient) and the whole search restarts from `again`
In general, this logic does two important things. First, it finds an empty slot and records its position for the upcoming insertion. Second, it checks whether the key already exists in the hash table, updating it if so. In the update case, the function finishes here; in the insert case, execution continues with the code below.
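The two-level lookup described above can be sketched as follows. This is a simplification of the runtime's bucketMask/tophash logic; the real `tophash` additionally reserves small values for slot states (such as empty and evacuated) by adding `minTopHash`:

```go
package main

import "fmt"

const B = 5 // 2^5 = 32 buckets

// bucketIndex selects a bucket from the low-order B bits of the hash,
// like hash & bucketMask(h.B) in the runtime.
func bucketIndex(hash uint64) uint64 {
	return hash & (1<<B - 1)
}

// tophash8 takes the top 8 bits of the hash, used for fast slot
// comparison before doing a full key comparison.
func tophash8(hash uint64) uint8 {
	return uint8(hash >> 56)
}

func main() {
	hash := uint64(0xA1B2C3D4E5F60789)
	// Low bits pick the bucket; top byte is compared against each
	// slot's stored tophash before the full key equality check.
	fmt.Printf("bucket: %d, tophash: 0x%X\n", bucketIndex(hash), tophash8(hash))
	// prints: bucket: 9, tophash: 0xA1
}
```

Because the two values come from different ends of the same 64-bit hash, they are nearly independent: keys in the same bucket still usually have distinct tophash bytes, which is what makes the quick trial comparison effective.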
Phase 3: allocating a new slot and inserting the new value
...
if inserti == nil {
newb := h.newoverflow(t, b)
inserti = &newb.tophash[0]
insertk = add(unsafe.Pointer(newb), dataOffset)
val = add(insertk, bucketCnt*uintptr(t.keysize))
}
if t.indirectkey {
kmem := newobject(t.key)
*(*unsafe.Pointer)(insertk) = kmem
insertk = kmem
}
if t.indirectvalue {
vmem := newobject(t.elem)
*(*unsafe.Pointer)(val) = vmem
}
typedmemmove(t.key, insertk, key)
*inserti = top
h.count++
done:
...
return val
After the search in the previous phase, if no insertable position was found, it means the bucket and all its overflow buckets are full, so a new overflow bucket is allocated for the insertion. Finally, the key-value pair is stored at the chosen insertion position, the count is incremented, and the memory address of the value is returned.
Phase 4: writing the value
You may wonder why the function returns a memory address rather than writing the value itself. The reason is that the final, hidden write operation (copying the value into the returned memory) is completed by compiler-generated assembly in cooperation with the runtime; the runtime only performs most of the work, not the final store.
func main() {
m := make(map[int32]int32)
m[0] = 6666666
}
The corresponding assembly part:
...
0x0099 00153 (test.go:6) CALL runtime.mapassign_fast32(SB)
0x009e 00158 (test.go:6) PCDATA $2, $2
0x009e 00158 (test.go:6) MOVQ 24(SP), AX
0x00a3 00163 (test.go:6) PCDATA $2, $0
0x00a3 00163 (test.go:6) MOVL $6666666, (AX)
This breaks down into a few steps: call the `mapassign_fast32` function to obtain the memory address where the value should be stored, then store the value 6666666 into that address. We also see `PCDATA` instructions, which are generated by the compiler and mainly carry garbage collection metadata.
Summary
Through the analysis of the previous stages, we can sort out some key points. For example:
- Different key types use different hash functions
- The low-order bits of the hash are used to locate the bucket
- The top eight bits of the hash (tophash) are used for fast trial comparison of keys; a full key comparison is performed only after a tophash match
- Buckets and their overflow buckets are traversed in order
- The first insertable slot found is recorded for the insertion
- The final write is performed by compiler-generated assembly in cooperation with the runtime
Expansion
Of all the behaviors, the expansion rules get the most attention, and they are also a crucial part of assignment. So let's pull this topic out and discuss the details.
When does expansion happen
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
hashGrow(t, h)
goto again
}
Expansion is triggered during assignment and deletion only when no expansion is already in progress (judged by whether `hmap.oldbuckets != nil`) and one of the following conditions holds:

- The load factor has reached its current limit (`overLoadFactor`)
- There are too many overflow buckets (`tooManyOverflowBuckets`)
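The two conditions can be sketched in simplified form. The shapes below mirror the runtime's `overLoadFactor` and `tooManyOverflowBuckets` (as of Go 1.11), with the constants hard-coded here for illustration:

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // slots per bucket
	loadFactorNum = 13 // 6.5 expressed as 13/2
	loadFactorDen = 2
)

// overLoadFactor reports whether count items in 2^B buckets exceed an
// average of 6.5 entries per bucket (count > 6.5 * 2^B).
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt &&
		uint64(count) > loadFactorNum*(uint64(1)<<B)/loadFactorDen
}

// tooManyOverflowBuckets reports whether the overflow-bucket count is
// roughly as large as the number of regular buckets (B capped at 15).
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

func main() {
	// With B=4 there are 16 buckets, so the limit is 6.5*16 = 104 keys.
	fmt.Println(overLoadFactor(100, 4))        // false: 100 <= 104
	fmt.Println(overLoadFactor(105, 4))        // true:  105 > 104
	fmt.Println(tooManyOverflowBuckets(16, 4)) // true:  16 >= 16
}
```

The first predicate decides a doubling grow; the second, when it fires alone, decides a same-size grow, as the hashGrow analysis below shows.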
What affects these conditions

Under what circumstances are these two "values" affected? As follows:

- Load factor: its purpose is to evaluate the current fill level of the hash table; it relates the number of keys to the number of buckets. A larger load factor means higher space utilization but a higher probability of hash collisions; a smaller one means lower space utilization and a lower probability of collisions.
- Overflow buckets: the check is based on the total number of overflow buckets relative to the total number of regular buckets.
Load factor trade-offs
| loadFactor | %overflow | bytes/entry | hitprobe | missprobe |
|---|---|---|---|---|
| 4.00 | 2.13 | 20.77 | 3.00 | 4.00 |
| 4.50 | 4.05 | 17.30 | 3.25 | 4.50 |
| 5.00 | 6.85 | 14.77 | 3.50 | 5.00 |
| 5.50 | 10.55 | 12.94 | 3.75 | 5.50 |
| 6.00 | 15.27 | 11.67 | 4.00 | 6.00 |
| 6.50 | 20.90 | 10.79 | 4.25 | 6.50 |
| 7.00 | 27.14 | 10.15 | 4.50 | 7.00 |
- loadFactor: the load factor
- %overflow: the percentage of buckets that have an overflow bucket
- bytes/entry: bytes of overhead per key-value pair
- hitprobe: the average number of entries examined when looking up a present key
- missprobe: the average number of entries examined when looking up an absent key
This data shows how different load factors affect hash table behavior. As mentioned in the previous chapter, the default load factor is 6.5 (loadFactorNum/loadFactorDen), a value chosen after testing as a reasonable trade-off that times the hash table's expansion well.
Source code analysis
func hashGrow(t *maptype, h *hmap) {
bigger := uint8(1)
if !overLoadFactor(h.count+1, h.B) {
bigger = 0
h.flags |= sameSizeGrow
}
oldbuckets := h.buckets
newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
...
h.oldbuckets = oldbuckets
h.buckets = newbuckets
h.nevacuate = 0
h.noverflow = 0
if h.extra != nil && h.extra.overflow != nil {
if h.extra.oldoverflow != nil {
throw("oldoverflow is not nil")
}
h.extra.oldoverflow = h.extra.overflow
h.extra.overflow = nil
}
if nextOverflow != nil {
if h.extra == nil {
h.extra = new(mapextra)
}
h.extra.nextOverflow = nextOverflow
}
// the actual copying of the hash table data is done incrementally
// by growWork() and evacuate().
}
Phase 1: determining the expansion rule

The two expansion triggers mentioned in the previous section are distinguished right at the start of `hashGrow`, as follows:
if !overLoadFactor(h.count+1, h.B) {
bigger = 0
h.flags |= sameSizeGrow
}
If the load factor has not exceeded its current limit, the trigger must have been too many overflow buckets. In that case, the expansion rule is `sameSizeGrow`, that is, an expansion that does not change the size. What about the other case?
bigger := uint8(1)
...
newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
From the code, if the load factor has reached its current limit, the map grows dynamically: the new capacity is double the current size (`h.B+bigger` with `bigger = 1`).
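The capacity rule is easy to illustrate: since B is the log2 of the bucket count, incrementing B doubles capacity, while a same-size grow keeps B unchanged. A minimal sketch (the function name is hypothetical):

```go
package main

import "fmt"

// newBucketCount computes the bucket count after a grow: doubling
// when the load factor was exceeded, unchanged when the trigger was
// too many overflow buckets (sameSizeGrow).
func newBucketCount(B uint8, doubling bool) int {
	bigger := uint8(0)
	if doubling {
		bigger = 1 // mirrors bigger := uint8(1) in hashGrow
	}
	return 1 << (B + bigger)
}

func main() {
	fmt.Println(newBucketCount(4, true))  // 32: doubled from 16
	fmt.Println(newBucketCount(4, false)) // 16: same-size grow
}
```

A same-size grow may look pointless, but repacking the same number of buckets lets entries scattered across sparse overflow chains be laid out densely again, shortening lookups.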
Phase 2: initializing and swapping the new and old buckets/overflow buckets

This phase mainly preprocesses the data involved in expansion, touching the storage-related fields buckets/oldbuckets and overflow/oldoverflow:
...
oldbuckets := h.buckets
newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
flags := h.flags &^ (iterator | oldIterator)
if h.flags&iterator != 0 {
flags |= oldIterator
}
h.B += bigger
...
h.noverflow = 0
if h.extra != nil && h.extra.overflow != nil {
...
h.extra.oldoverflow = h.extra.overflow
h.extra.overflow = nil
}
if nextOverflow != nil {
...
h.extra.nextOverflow = nextOverflow
}
Note this line: `newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)`. Your first reaction might be: does expansion allocate and initialize all the new memory immediately? If a large amount of memory were involved, that would be quite expensive. In fact it does not: internally the memory is only pre-allocated, and buckets are initialized lazily when they are actually used.
Phase 3: incremental migration

No explicit copy loop for this phase appears in the source above, because, as the trailing comment notes, the actual copying is driven elsewhere. Analyzing the code and comments shows that it is done incrementally by the `growWork` and `evacuate` functions. As follows:
func growWork(t *maptype, h *hmap, bucket uintptr) {
evacuate(t, h, bucket&h.oldbucketmask())
if h.growing() {
evacuate(t, h, h.nevacuate)
}
}
This method makes two calls to the `evacuate` function. What is the difference between them? As follows:

- `evacuate(t, h, bucket&h.oldbucketmask())`: migrates (rehashes) the old bucket that the current operation is about to touch into the grown table
- `evacuate(t, h, h.nevacuate)`: if an expansion is still in progress, migrates one more bucket to push the overall progress forward
Also note that migration operates on a single bucket/oldbucket at a time, rather than on all buckets/oldbuckets at once. Combined with the code, this tells us that expansion in Go map is incremental; it is not done in one step.
Why incremental expansion?
With a full, one-shot expansion there is an obvious problem: if the hmap is large, expanding it all at once would consume a lot of time and memory and stall the program. The most visible symptom would be slowness. Clearly that is unacceptable.
Incremental expansion solves this by amortizing the one-time cost across individual map operations. With the buckets/oldbuckets design, migration is completed step by step, and the old buckets are not released until the expansion has finished.
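A toy illustration of the amortized idea (this is not the runtime's implementation, and for brevity it omits the part where lookups must also check the old array until migration finishes): keep old and new bucket arrays, and move one old bucket per operation until the old array is drained:

```go
package main

import "fmt"

// growable is a toy table that migrates one old bucket per operation.
type growable struct {
	old, cur  [][]int // old and current bucket arrays of integer keys
	nevacuate int     // index of the next old bucket to migrate
}

func (g *growable) growing() bool { return g.old != nil }

// workOne migrates a single old bucket, spreading the total copy
// cost across many operations instead of doing it all at once.
func (g *growable) workOne() {
	if !g.growing() {
		return
	}
	for _, k := range g.old[g.nevacuate] {
		i := k % len(g.cur) // rehash into the larger array
		g.cur[i] = append(g.cur[i], k)
	}
	g.nevacuate++
	if g.nevacuate == len(g.old) {
		g.old = nil // migration finished: release old buckets
	}
}

func (g *growable) insert(k int) {
	g.workOne() // every operation pushes migration forward
	g.cur[k%len(g.cur)] = append(g.cur[k%len(g.cur)], k)
}

func main() {
	g := &growable{
		old: [][]int{{0, 2}, {1, 3}}, // 2 old buckets awaiting migration
		cur: make([][]int, 4),        // grown to 4 buckets
	}
	g.insert(5)
	fmt.Println(g.growing()) // true: one old bucket still remains
	g.insert(6)
	fmt.Println(g.growing()) // false: migration complete
}
```

This is the same shape as the runtime's growWork/nevacuate machinery: each assignment or deletion does a small, bounded amount of migration work.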
Summary
Through the analysis of these three phases, we now know the general flow of expansion. Summarizing by phase, the main points are:

- Depending on the trigger (overLoadFactor / tooManyOverflowBuckets), capacity grows in one of two directions: same-size expansion (keeping the original size) or doubling
- newbuckets/newoverflow are pre-allocated and are not initialized until they are actually used
- After hashGrow completes (the pre-allocation), migration does not start immediately. Instead it is incremental: whenever a specific bucket is accessed, its data is gradually migrated (oldbucket to bucket)
At this point a question came to mind: since migration proceeds step by step, what happens if another expansion is needed partway through?
again:
bucket := hash & bucketMask(h.B)
...
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
hashGrow(t, h)
goto again
}
Note the `goto again` statement here. Combined with the `!h.growing()` condition, we can see that while an expansion is in progress, operations keep migrating buckets; the next expansion cannot start until the current migration is complete.
Migration
The full cycle of expansion includes a migration step, also known as "evacuation" or "relocation". So we will continue our deep dive with the `evacuate` function. Let's open the door to the world of migration. As follows:
type evacDst struct {
b *bmap
i int
k unsafe.Pointer
v unsafe.Pointer
}
`evacDst` is the data structure describing a migration destination, with the following fields:

- b: the current destination bucket
- i: the number of key-value pairs already stored in the destination bucket
- k: the memory address for the next key
- v: the memory address for the next value
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
newbit := h.noldbuckets()
if !evacuated(b) {
var xy [2]evacDst
x := &xy[0]
x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
x.k = add(unsafe.Pointer(x.b), dataOffset)
x.v = add(x.k, bucketCnt*uintptr(t.keysize))
if !h.sameSizeGrow() {
y := &xy[1]
y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
y.k = add(unsafe.Pointer(y.b), dataOffset)
y.v = add(y.k, bucketCnt*uintptr(t.keysize))
}
for ; b != nil; b = b.overflow(t) {
...
}
if h.flags&oldIterator == 0 && t.bucket.kind&kindNoPointers == 0 {
b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
ptr := add(b, dataOffset)
n := uintptr(t.bucketsize) - dataOffset
memclrHasPointers(ptr, n)
}
}
if oldbucket == h.nevacuate {
advanceEvacuationMark(h, t, newbit)
}
}
- Compute the bmap pointer address of the old bucket
- Compute the number of buckets the hmap had before growing (`newbit := h.noldbuckets()`)
- Check whether the bucket has already been evacuated (`!evacuated(b)`). If not, set up destinations according to the expansion rule: a same-size grow (`sameSizeGrow`) uses only one `evacDst` destination, while a doubling grow uses two `evacDst` destinations (x and y) to split the entries between the lower and upper halves of the new table
- Once the destination is chosen, each entry to migrate is copied into the target bucket via the `typedmemmove` function
- If no iterator is currently using the old buckets (the oldIterator flag is unset) and the bucket type contains pointers, the key/value area of the old bucket is cleared (the tophash states and overflow pointer are preserved) so the data can be garbage collected
- Finally, if this bucket was the next one pending, `advanceEvacuationMark` advances the migration counter `hmap.nevacuate`, using `bucketEvacuated` to skip past old buckets that are already done. When every old bucket has been evacuated, the expansion is complete, and `hmap.oldbuckets` and `h.extra.oldoverflow` are set to nil
In general, evacuate computes the locations involved, then splits and migrates the data according to the current migration state and expansion rule, and cleans up afterwards so the GC can reclaim the old memory.
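For a doubling grow, whether an entry goes to destination x (same index) or y (index + newbit) is decided by a single hash bit: the bit that newly became part of the bucket mask. A sketch of that decision (the function name is hypothetical; the runtime embeds this logic inside evacuate):

```go
package main

import "fmt"

// splitDest decides the destination bucket for an entry during a
// doubling grow: newbit is the old bucket count, and the hash bit at
// that position selects x (same index) or y (oldbucket + newbit).
func splitDest(hash, oldbucket, newbit uint64) uint64 {
	if hash&newbit == 0 {
		return oldbucket // x: lower half of the new table
	}
	return oldbucket + newbit // y: upper half of the new table
}

func main() {
	const newbit = 8 // old table had 8 buckets, new one has 16
	// Two keys from old bucket 3 split across new buckets 3 and 11,
	// depending on the hash bit at position log2(newbit).
	fmt.Println(splitDest(0b0011, 3, newbit)) // 3  (that hash bit is 0)
	fmt.Println(splitDest(0b1011, 3, newbit)) // 11 (that hash bit is 1)
}
```

This works because the new bucket mask has exactly one more bit than the old one, so every key in old bucket i can only land in new bucket i or i+newbit; no cross-bucket shuffling is ever needed.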
Summary
In this chapter, we discussed several core behaviors of Go map: assignment, expansion, and migration. Along the way we touched on a number of key questions, such as:

- Can expansion be triggered during assignment?
- What is the load factor? What problems does too high a value cause? How do changes in it affect hash table operations?
- What problems do too many overflow buckets cause?
- What are the conditions for triggering an expansion?
- What are the capacity rules for expansion?
- What are the steps of expansion? Which data structures are involved?
- Is expansion one-shot or incremental?
- What happens if another expansion is needed while one is in progress?
- How is data split between destinations during migration?
- What role does compiler-generated assembly play in assignment? What exactly does it do?
- When searching buckets/overflow buckets, how is a value located "quickly"? What are the low-order bits and the top eight bits of the hash used for?
- Can empty slots appear at any position in a bucket? If there are no empty slots but a new value must be inserted, what does the runtime do?
Finally, I hope that after reading this article you have a clearer picture of how Go map works :)