Learn more about Go map: initializing and accessing elements

  Back end, golang, php

Starting from this article, let’s explore the secret of Go map together, and see how it is built inside and what is worth noticing respectively.

The first article will discussInitialize and access elementsRelevant sections, we take questions to learn, for example:

  • Will memory be allocated immediately during initialization?
  • How is the underlying data stored?
  • How does the bottom layer use key to find data?
  • How does the bottom layer resolve hash conflicts?
  • With so many data types, how is the bottom layer handled?

Original address:Learn more about Go map: initializing and accessing elements

Data structure

First of all, let’s look at the basic data structure of Go map and have a general impression.

image

hmap

type hmap struct {
    count     int 
    flags     uint8
    B         uint8
    noverflow uint16
    hash0     uint32
    buckets    unsafe.Pointer
    oldbuckets unsafe.Pointer
    nevacuate  uintptr 
    extra *mapextra
}

type mapextra struct {
    overflow    *[]*bmap
    oldoverflow *[]*bmap
    nextOverflow *bmap
}
  • Count: the size of the map, which is the value of len (). Represents the number of key-value pairs in map
  • Flags: status identification, mainly related to status control of the goroutine write and expansion mechanism. This value is one of the judging conditions for concurrent reading and writing.
  • B: Barrel, the maximum number of elements that can be accommodated, with a value ofLoad factor (default 6.5) * 2 bIs an index of 2
  • Noverflow: number of overflow barrels
  • Hash0: hash factor
  • Buckets: the pointer address that holds the current bucket data (points to a continuous memory address, mainly storing key-value pair data)
  • Oldbuckets, Keep Pointer Address of Old Buckets
  • Nevacuate: migration progress
  • Extra: After the original buckets are fully loaded, expansion will occur. Incremental expansion is used in the Go mechanism, as follows:

    • overflowForhmap.buckets(current) pointer address of overflow bucket
    • oldoverflowForhmap.oldbuckets(Old) Pointer Address of Spill Bucket
    • nextOverflowPointer address for free overflow bucket

Here we should pay attention to several points, as follows:

  1. If keys and values do not contain pointers and inlining is allowed. Bucket will be identified as containing no pointer. Using extra to store overflow bucket can avoid GC scanning the entire map and save unnecessary overhead
  2. As mentioned earlier, Go used incremental expansion. AndbucketsAndoldbucketsIt is also a carrier related to capacity expansion, which is usually only used.buckets,oldbucketsIt is empty. But if expansion is under way,oldbucketsIt is not empty,bucketsThe size of the will also change
  3. WhenhintWhen it is greater than 8, it will be used*mapextraMake an overflow bucket. If it is less than 8, it is stored in buckets bucket

bmap

image

bucketCntBits = 3
bucketCnt     = 1 << bucketCntBits
...
type bmap struct {
    tophash [bucketCnt]uint8
}
  • Top hash: the hash value of key is 8 bits higher.
  • Keys: 8 keys
  • Values: 8 values
  • Overflow: pointer address of next overflow bucket (when hash collision occurs)

The actual bmap is a bucket in buckets. A bucket stores up to 8 key-value pairs

tophash

Tophash is an array with a length of 8, and the maximum number of key-value pairs that a bucket can hold is 8.

Stores the upper 8 bits of each element hash value, iftophash [0] <minTopHash, thentophash [0]Indicates the migration progress

Keys and values

Here we notice that the carriers storing k and v are not usedk/v/k/v/k/v/k/vThe model, butk/k/k/k/v/v/v/vTo store in the form of. Why is this?

map[int64]int8

In this example, if according tok/v/k/v/k/v/k/vAlthough the value of each key-value pair occupies only 1 byte. However, it requires 7 padding bytes to make up the memory space. Eventually, a large amount of memory will be “wasted.”

image

But if you takek/k/k/k/v/v/v/vIf stored in the form of, can solve the “waste” of memory space due to alignment

Therefore, the split of this part mainly takes into account the memory alignment problem. Although it is relatively complicated, it is still worth such design.

image

overflow

Some students may wonder why there is such a thing as overflowing the bucket. In fact, in the case of no hash conflict, the overflow bucket is removed, that is, only the bucket, hash factor and hash algorithm are needed. A simple hash table can also be implemented. But hash collision is inevitable …

In Go maphmap.bucketsWhen it is full, the overflow bucket will be used for subsequent storage. We can confirm that Go uses array+chain address method to solve hash conflicts through analysis.

image

Initialization

usage

m := make(map[int32]int32)

the function prototype

According to the source code, there are several initialization methods. The prototype of the function is as follows:

func makemap_small() *hmap
func makemap64(t *maptype, hint int64, h *hmap) *hmap
func makemap(t *maptype, hint int, h *hmap) *hmap
  • Makemap_small: whenhintCalled when is less than 8makemap_smallTo initialize hmap. The main difference is whether hash table will be initialized immediately.
  • Makemap64: whenhintSpecial conversion and verification processing when the type is int64, and subsequent substantive callmakemap
  • Makemap: Implements standard map initialization actions

Source code

func makemap(t *maptype, hint int, h *hmap) *hmap {
    if hint < 0 || hint > int(maxSliceCap(t.bucket.size)) {
        hint = 0
    }

    if h == nil {
        h = new(hmap)
    }
    h.hash0 = fastrand()

    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}
  • According to the incomingbucketType to obtain the maximum capacity size that its type can apply for. And for its lengthmake(map[k]v, hint)Carry out boundary value inspection
  • Initialize hmap
  • Initialize hash factor
  • According to the incominghint, calculate a can put downhintA bucket of elementsBThe minimum value of
  • Assign and initialize hash table. IfBA value of 0 will allocate the bucket in the subsequent laziness, and a value greater than 0 will be allocated immediately.
  • Returns the initialized hmap

It can be noted here that (whenhintGreater than or equal to 8) the first time map is initializedmakeBucketArrayAllocate buckets. Therefore, we often say that an appropriate size of capacity is specified during initialization. Can improve performance.

If the capacity is too small and there are many new key-value pairs. It will lead to frequent allocation of buckets and rehash actions such as expansion and migration. The end result is a direct drop in performance (knocking on the blackboard)

And whenhintLess than 8, this kind of problemRelativeIt will not be too obvious, as follows:

func makemap_small() *hmap {
    h := new(hmap)
    h.hash0 = fastrand()
    return h
}

Icon

image

Access

usage

v := m[i]
v, ok := m[i]

the function prototype

There are several ways to implement map element access, mainly including special processing for 32/64-bit and string types. the general function prototype is as follows:

mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)

mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer)

mapaccess1_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) unsafe.Pointer
mapaccess2_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) (unsafe.Pointer, bool)

mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool)
mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer

mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
...

mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer
...
  • Mapaccess1: returnh[key]If the key is not presentmap, a zero value of the corresponding type will be returned
  • Mapaccess2: returnh[key]If the key is not presentmapIn, zero and boolean values are returned for judgment

Source code

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ...
    if h == nil || h.count == 0 {
        return unsafe.Pointer(&zeroVal[0])
    }
    if h.flags&hashWriting != 0 {
        throw("concurrent map read and map write")
    }
    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    if c := h.oldbuckets; c != nil {
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {
            b = oldb
        }
    }
    top := tophash(hash)
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey {
                k = *((*unsafe.Pointer)(k))
            }
            if alg.equal(key, k) {
                v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
                if t.indirectvalue {
                    v = *((*unsafe.Pointer)(v))
                }
                return v
            }
        }
    }
    return unsafe.Pointer(&zeroVal[0])
}
  • Judge whether map is nil and length is 0. If yes, zero value is returned
  • Judging whether the map is currently read and written concurrently, and if so, throwing an exception
  • According to different types of key, calling different hash methods to calculate hash values
  • Determine which bucket key is in and get its position.
  • Judge whether expansion is taking place (h.oldbuckets is nil). if expansion is taking place, look in the old buckets (because there may not be any value in the buckets, the relocation is not completed). if the bucket has been relocated. Then continue searching in buckets
  • Calculates the tophash value of the hash (the upper eight bits)
  • According to the calculated tophash, the tophash values of buckets are compared cyclically in turn (fast trial and error)
  • If the tophash match is successful, the location of the key is calculated, and whether the two keys are consistent is formally and completely compared.
  • If the search is successful and returns, if it does not exist, returns a value of zero

In the above-mentioned step 3, it is mentioned that hash values are calculated according to different types, and the upper eight bits and lower eight bits of hash values are also calculated. The lower eight bits are used as bucket index to find the bucket where key is located. The upper eight bits are stored in the bmap tophash.

Its main function is to perform iterative fast positioning in step 7 above. This can improve performance instead of directly comparing consistency with key at the beginning.

Icon

image

Summary

In this chapter, we introduced the following knowledge points of map type:

  • The basic data structure of map
  • Initialize map
  • Access map

From reading the source code, we know that Go itselfFor some properties of different sizes and types, including hash methods, there are specific methods for writing them.To run. Generally speaking, the design of this piece contains many ideas, and there are many points worth tasting carefully:)

Note: This article is based on Go 1.11.5