In-depth understanding of gosice

  Back end, golang, php, slice

image

Original address:In-depth understanding of gosice

What is it?

In Go, Slice is a special type abstracted from Array. In order to better understand Slice, the first step is to understand Array. After a deep understanding of the difference between Slice and Array, we can better explore the bottom layer

usage

Array

func main() {
    nums := [3]int{}
    nums[0] = 1

    n := nums[0]
    n = 2

    fmt.Printf("nums: %v\n", nums)
    fmt.Printf("n: %d\n", n)
}

We can know that in Go, the array type needs to specify the length and element type. In the above code, it can be known that[3]int{}Represents an array of 3 integers and is initialized. The underlying data is stored as a continuous memory space and retrieved by a fixed index value (subscript)

image

After the array is declared, the initial value of its elements (that is, zero value) is 0. And the variable can be directly used without special operations

At the same time, the length of the array is fixed, its length is part of the type, therefore[3]intAnd[4]intIt is different in type and cannot be called “one thing”

Output results

nums: [1 0 0] 
n: 2 

Slice

func main() {
    nums := [3]int{}
    nums[0] = 1

    dnums := nums[:]

    fmt.Printf("dnums: %v", dnums)
}

Slice is an abstraction of Array of type[]T. In the above code,dnumsVariable passnums[:]Make an assignment. Note that Slice is different from Array in that it does not need to specify a length. It is also more flexible and can automatically expand the capacity.

Data structure

image

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

Slice’s underlying data structure is divided into three parts, as follows:

  • Array: a pointer to the referenced array (unsafe.PointerA pointer that can represent any addressable value)
  • Len: length, number of elements currently referencing slice
  • Cap: Capacity, the capacity of the current reference slice (the total number of elements in the underlying array)

In actual use, cap must be greater than or equal to len. Otherwise, panic will be caused.

Example

For better understanding, we review the code in the previous section for easy demonstration, as follows:

func main() {
    nums := [3]int{}
    nums[0] = 1

    dnums := nums[:]

    fmt.Printf("dnums: %v", dnums)
}

image

In the code, it can be observed thatdnums := nums[:]This code determines that Slice’s Pointer points to an array, and len and cap are the basic attributes of the array. Consistent with the graphic expression

Len and cap are different.

func main() {
    nums := [3]int{}
    nums[0] = 1

    dnums := nums[0:2]

    fmt.Printf("dnums: %v, len: %d, cap: %d", dnums, len(dnums), cap(dnums))
}

image

Output results

dnums: [1 0], len: 2, cap: 3

Obviously, it is specified hereSlice[0:2]So len is the number of referenced elements and cap is the total number of referenced array elements. Consistent with expectations

Create

Slice can be created in two ways, as follows:

  • var []TOr[]T{}
  • func make([] T,len,cap)[] T

Note the make function, we all know Slice needs to point to an Array. How does make do it?

When it calls make, it assigns an array and returns Slice that references the array

func makeslice(et *_type, len, cap int) slice {
    maxElements := maxSliceCap(et.size)
    if len < 0 || uintptr(len) > maxElements {
        panic(errorString("makeslice: len out of range"))
    }

    if cap < len || uintptr(cap) > maxElements {
        panic(errorString("makeslice: cap out of range"))
    }

    p := mallocgc(et.size*uintptr(cap), et, true)
    return slice{p, len, cap}
}
  • According to the incoming Slice type, obtain the maximum capacity that the type can apply for
  • Judge whether len is compliant and check whether it is within the range of 0 < x < maxElements.
  • Judge whether cap is in compliance and check whether it is within len < x < maxElements
  • The memory space object required to request Slice. For large objects (larger than 32 KB), they are allocated directly from the heap
  • Returns the Slice memory address and related attributes of successful application (default returns the memory starting address applied to)

Expansion

When Slice is used, if the stored elements keep growing (e.g. through append). When the conditions meet the expansion strategy, automatic expansion will be triggered.

So what are the rules respectively? Let’s look at what the source code says together.

zerobase

func growslice(et *_type, old slice, cap int) slice {
    ...
    if et.size == 0 {
        if cap < old.cap {
            panic(errorString("growslice: cap out of range"))
        }
        
        return slice{unsafe.Pointer(&zerobase), old.len, cap}
    }
    ...
}

When Slice size is 0, if the capacity to be expanded is smaller than the original capacity, an exception will be thrown (i.e. shrinkage operation is not supported). Otherwise, a new Slice return will be regenerated with its Pointer pointing to a 0 byte address (the old Array pointer will not be retained)

Expansion-Calculation Strategy

func growslice(et *_type, old slice, cap int) slice {
    ...
    newcap := old.cap
    doublecap := newcap + newcap
    if cap > doublecap {
        newcap = cap
    } else {
        if old.len < 1024 {
            newcap = doublecap
        } else {
            for 0 < newcap && newcap < cap {
                newcap += newcap / 4
            }
            ...
        }
    }
    ...
}
  • If Slice cap is larger than doublecap, the expanded capacity is the capacity of the new Slice (beyond the reference value, I will only give you the capacity you need)
  • If slice length is less than 1024, the growth factor is 1 (i.e. 3 to 6) during expansion.
  • If the slice length is greater than 1024, the growth factor is 0.25 (1/4 of the original capacity) during expansion.

Note: when the number is less than 1024, it will double. More than 1024, an increase of 1.25 times

Expansion-Memory Policy

func growslice(et *_type, old slice, cap int) slice {
    ...
    var overflow bool
    var lenmem, newlenmem, capmem uintptr
    const ptrSize = unsafe.Sizeof((*byte)(nil))
    switch et.size {
    case 1:
        lenmem = uintptr(old.len)
        newlenmem = uintptr(cap)
        capmem = roundupsize(uintptr(newcap))
        overflow = uintptr(newcap) > _MaxMem
        newcap = int(capmem)
        ...
    }

    if cap < old.cap || overflow || capmem > _MaxMem {
        panic(errorString("growslice: cap out of range"))
    }

    var p unsafe.Pointer
    if et.kind&kindNoPointers != 0 {
        p = mallocgc(capmem, nil, false)
        memmove(p, old.array, lenmem)
        memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
    } else {
        p = mallocgc(capmem, et, true)
        if !writeBarrier.enabled {
            memmove(p, old.array, lenmem)
        } else {
            for i := uintptr(0); i < lenmem; i += et.size {
                typedmemmove(et, add(p, i), add(old.array, i))
            }
        }
    }
    ...
}

1. Obtain the length of the old Slice and calculate the assumed expanded length of the new Slice element, the size of the capacity and the pointer address (for a series of operations in the subsequent operation memory)

2. Confirm that the new Slice capacity is greater than the old Sice, and the new capacity memory is less than the specified maximum memory without overflow. Otherwise, throw an exception

3. If the element type iskindNoPointers, that is to sayNon pointerType. The expansion will continue after the old Slice

  • Step 1: According to the previous calculationcapmem, after the old Slice cap continue to apply for memory space, then used for expansion
  • Step 2: copy n bytes (according to lenmem) on old.array to the new memory space
  • Step 3: add the capacity address of the new Slice cap to the new memory space (p). Finally, a complete new sliceap memory address is obtained.add(p, newlenmem)(ptr)
  • Step 4: Reinitialize N bytes(capmem-newlenmem) from ptr

Note: Then the problem arises. Why do you want to reinitialize this memory? This is because ptr is uninitialized memory (e.g. reusable memory, generally used for new memory allocation), which may contain “garbage”. Therefore, “clean-up” should be carried out here. It is convenient for later practical use (expansion)

4. If 3 is not met, reapply and initialize a piece of memory for the new Slice to store Array.

5. check whether GC is currently being executed, that is, whether Write Barrier is currently enabled. ifEnablePasstypedmemmoveMethod, using pointer operationCircular copy. Otherwise passmemmoveMethods adoptedWhole copyIn order to achieve higher efficiency, lenmem bytes are copied from old.array to ptr

Note: Write Barrier is normally enabled during GC marking phase, and Write Barrier is only enabled for pointers. Then in point 5, it is not difficult for you to understand why there are two completely different ways to deal with it.

Summary

It should be noted here that the options for memory management during expansion are as follows:

  • Renovation extension: current element iskindNoPointers, will continue to apply for space for expansion after the address of the old sliceap.
  • Family Relocation: Re-apply for a memory address, migrate and expand the whole

Two little “traps”

I. the same root

func main() {
    nums := [3]int{}
    nums[0] = 1

    fmt.Printf("nums: %v , len: %d, cap: %d\n", nums, len(nums), cap(nums))

    dnums := nums[0:2]
    dnums[0] = 5

    fmt.Printf("nums: %v ,len: %d, cap: %d\n", nums, len(nums), cap(nums))
    fmt.Printf("dnums: %v, len: %d, cap: %d\n", dnums, len(dnums), cap(dnums))
}

Output results:

nums: [1 0 0] , len: 3, cap: 3
nums: [5 0 0] ,len: 3, cap: 3
dnums: [5 0], len: 2, cap: 3

InBefore expansion, Slice Array points to the referenced array. So the change on Slice. Will be directly modified to the original Array (both refer to the same)

image

II. Times have changed

As Slice continues to append, there are more and more internal elements, which finally triggered the expansion. The following code:

func main() {
    nums := [3]int{}
    nums[0] = 1

    fmt.Printf("nums: %v , len: %d, cap: %d\n", nums, len(nums), cap(nums))

    dnums := nums[0:2]
    dnums = append(dnums, []int{2, 3}...)
    dnums[1] = 1

    fmt.Printf("nums: %v ,len: %d, cap: %d\n", nums, len(nums), cap(nums))
    fmt.Printf("dnums: %v, len: %d, cap: %d\n", dnums, len(dnums), cap(dnums))
}

Output results:

nums: [1 0 0] , len: 3, cap: 3
nums: [1 0 0] ,len: 3, cap: 3
dnums: [1 1 2 3], len: 4, cap: 6

When adding the Slice append element, if the expansion strategy is satisfied, that is to say, the original array will exceed the maximum capacity after insertion.

At this time, the internal will re-apply for a memory space and transfer the original elementsCopyOne copy to the new memory space. At this time, it has no relation with the original array.The modified value will not change to the original array.. This requires attention

image

Copy

Prototype

func copy(dst,src [] T)int

Copy function transfers data fromSource Slicecopy toTarget Slice. It returns the number of elements copied.

Example

func main() {
    dst := []int{1, 2, 3}
    src := []int{4, 5, 6, 7, 8}
    n := copy(dst, src)

    fmt.Printf("dst: %v, n: %d", dst, n)
}

The copy function supports copying between Slices of different lengths. if the lengths are inconsistent, the copy will be made according to the minimum number of slice elements.

So how does this behavior of copying complete in the source code? Let’s look at the implementation of the source code together, as follows:

func slicecopy(to, fm slice, width uintptr) int {
    if fm.len == 0 || to.len == 0 {
        return 0
    }

    n := fm.len
    if to.len < n {
        n = to.len
    }

    if width == 0 {
        return n
    }

    ...

    size := uintptr(n) * width
    if size == 1 {
        *(*byte)(to.array) = *(*byte)(fm.array) // known to be a byte pointer
    } else {
        memmove(to.array, fm.array, size)
    }
    return n
}
  • If the source Slice or the target Slice has a length of 0, it will directly return 0 (because there is no need to perform replication at all)
  • The minimum Slice length is obtained by comparing two slices. Which is convenient for subsequent operation.
  • If Slice has only one element, the pointer’s characteristics are directly used for conversion.
  • If Slice is greater than one element, then fromfm.arrayCopysizeBytes toto.arrayAt the address of (overwrites the original value)

Initialization of “Strange”

There are two legends circulating in Slice, namely Empty and Nil Slice. next let’s look at their small differences.

Empty

func main() {
    nums := []int{}
    renums := make([]int, 0)
    
    fmt.Printf("nums: %v, len: %d, cap: %d\n", nums, len(nums), cap(nums))
    fmt.Printf("renums: %v, len: %d, cap: %d\n", renums, len(renums), cap(renums))
}

Output results:

nums: [], len: 0, cap: 0
renums: [], len: 0, cap: 0

Nil

func main() {
    var nums []int
}

Output results:

nums: [], len: 0, cap: 0

Think about it.

At first glance, emptysice and Nil Slice look exactly the same? Both len and cap are 0. Doesn’t seem to make any difference? Let’s look at the following code:

func main() {
    var nums []int
    renums := make([]int, 0)
    if nums == nil {
        fmt.Println("nums is nil.")
    }
    if renums == nil {
        fmt.Println("renums is nil.")
    }
}

What do you think is the output? As you may have thought, the final output:

nums is nil.

Why?

Empty

image

Nil

image

As can be seen from the diagram, there is an essential difference between the two. The pointers to the underlying arrays are different. nil Slice refers to Nil, and Empty Slice refers to the actual empty array address

You can think that Nil Slice refers to a nonexistent Slice and Empty Slice refers to an empty set. The meaning they represent is completely different.

Summary

According to this article, gosice is quite flexible. There is no need for you to manually expand the capacity, nor do you need to pay attention to how much you add and how much you subtract. Array is a dynamic reference and a great supplement to Go type, so it is more and more convenient to use in applications.

Although there are some “pits” to be noticed, they are actually reasonable. What do you think?