Proper Memory Alignment in Go

Problem

type Part1 struct {
    a bool
    b int32
    c int8
    d int64
    e byte
}

Before we start, try to work out the total size of Part1 yourself.

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    // Print the size in bytes of each basic type.
    fmt.Printf("bool size: %d\n", unsafe.Sizeof(bool(true)))
    fmt.Printf("int32 size: %d\n", unsafe.Sizeof(int32(0)))
    fmt.Printf("int8 size: %d\n", unsafe.Sizeof(int8(0)))
    fmt.Printf("int64 size: %d\n", unsafe.Sizeof(int64(0)))
    fmt.Printf("byte size: %d\n", unsafe.Sizeof(byte(0)))
    fmt.Printf("string size: %d\n", unsafe.Sizeof("EDDYCJY"))
}

Output results:

bool size: 1
int32 size: 4
int8 size: 1
int64 size: 8
byte size: 1
string size: 16

Adding these up, the Part1 structure would occupy 1+4+1+8+1 = 15 bytes of memory. Many readers probably calculate it this way, and nothing seems wrong with it.

What actually happens? Let's check with a real call:

package main

import (
    "fmt"
    "unsafe"
)

type Part1 struct {
    a bool
    b int32
    c int8
    d int64
    e byte
}

func main() {
    part1 := Part1{}

    // Print the real size and alignment of the whole struct.
    fmt.Printf("part1 size: %d, align: %d\n", unsafe.Sizeof(part1), unsafe.Alignof(part1))
}

Output results:

part1 size: 32, align: 8

The actual result is 32 bytes, which is completely different from the expected 15. The earlier calculation is clearly wrong. Why?

To calculate the size correctly, we first need the concept of “memory alignment”. The following sections explain it in detail.

Memory alignment

Some readers may picture memory access as a simple walk over an array of bytes:

[Figure: memory read one byte at a time]

The figure above shows this “one slot per byte” model. In practice, however, the CPU does not read and write memory byte by byte. It reads memory in blocks, and the block size can be 2, 4, 8, 16 bytes and so on. This block size is called the memory access granularity, as in the following figure:

[Figure: memory read in 4-byte blocks]

In this example the access granularity is assumed to be 4, so the CPU reads and writes memory in 4-byte blocks. This is the correct picture.

Why care about alignment

  • The code you are writing has performance requirements (CPU, memory)
  • You are working with vector instructions
  • Some hardware platforms (for example, some ARM architectures) do not support misaligned memory access

Beyond that, this is knowledge every engineer should have.

Why alignment is done

  • Platform (portability) reason: not every hardware platform can access arbitrary data at an arbitrary address. Some platforms only allow certain types of data to be read at certain addresses, and raise an exception otherwise.
  • Performance reason: accessing misaligned memory makes the CPU perform two memory accesses and spend extra clock cycles realigning and combining the data. Aligned memory needs only a single access to complete the read.

[Figure: a 4-byte read that starts at index 1 and spans two 4-byte blocks]

In the figure above, suppose the read starts at index 1. This is awkward, because the access does not start on a memory access boundary, so the CPU has to do extra work:

  1. The CPU first reads the first memory block that the misaligned address falls in, bytes 0-3, and discards the unneeded byte 0
  2. The CPU then reads the second memory block, bytes 4-7, and discards the unneeded bytes 5, 6 and 7
  3. It merges the remaining bytes 1-4
  4. It places the merged result in a register
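
The steps above can be modeled in a few lines of Go. This is only a sketch of the idea, not how any real CPU is implemented: the `granularity` constant, the `blocksTouched` and `readUnaligned` helpers, and the 8-byte `mem` slice are all made up for illustration, and the code assumes the slice length is a multiple of the granularity.

```go
package main

import "fmt"

// granularity is the assumed memory access granularity in bytes,
// matching the 4-byte blocks used in the text.
const granularity = 4

// blocksTouched returns how many granularity-sized blocks a read of
// size bytes starting at addr has to fetch.
func blocksTouched(addr, size int) int {
	first := addr / granularity
	last := (addr + size - 1) / granularity
	return last - first + 1
}

// readUnaligned models the listed steps: fetch every block the read
// overlaps, drop the bytes outside [addr, addr+size), and merge the rest.
func readUnaligned(mem []byte, addr, size int) []byte {
	out := make([]byte, 0, size)
	start := addr / granularity * granularity
	for block := start; block < addr+size; block += granularity {
		chunk := mem[block : block+granularity] // one block-sized access
		for i, b := range chunk {
			if pos := block + i; pos >= addr && pos < addr+size {
				out = append(out, b) // keep only the requested bytes
			}
		}
	}
	return out
}

func main() {
	mem := []byte{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(blocksTouched(0, 4), readUnaligned(mem, 0, 4)) // 1 [0 1 2 3]
	fmt.Println(blocksTouched(1, 4), readUnaligned(mem, 1, 4)) // 2 [1 2 3 4]
}
```

The aligned read touches one block, while the read starting at index 1 touches two and has to discard and merge bytes.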

This process shows that skipping “memory alignment” is costly, because it adds several time-consuming steps.

With aligned memory, on the other hand, a read that starts at index 0 fetches the 4 bytes in a single access with no extra work. This is clearly more efficient, and it is a typical case of trading space for time.

Default factor

Compilers on different platforms each have their own default “alignment factor”. In C and C++ it can be changed with the preprocessor directive #pragma pack(n), where n is the alignment factor. On commonly used platforms the factors are generally:

  • 32-bit: 4
  • 64-bit: 8

Note also that sizes and alignment values can differ between hardware platforms. The values in this article are therefore not universal, and you should check them on your own machine.
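
A small program like the following can be used to check the values on your own machine. It is a minimal sketch: `wordSize` is a helper name invented here, and the output depends on the platform the program is compiled for (on some 32-bit platforms, for instance, int64 may be aligned to 4 rather than 8).

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// wordSize returns the size in bytes of a pointer-sized integer on the
// platform the program was compiled for.
func wordSize() uintptr {
	return unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Printf("GOOS=%s GOARCH=%s\n", runtime.GOOS, runtime.GOARCH)
	fmt.Printf("uintptr size: %d, align: %d\n", wordSize(), unsafe.Alignof(uintptr(0)))
	fmt.Printf("int size: %d, align: %d\n", unsafe.Sizeof(int(0)), unsafe.Alignof(int(0)))
	fmt.Printf("int64 size: %d, align: %d\n", unsafe.Sizeof(int64(0)), unsafe.Alignof(int64(0)))
}
```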

Member alignment

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    // Print the alignment factor of each type.
    fmt.Printf("bool align: %d\n", unsafe.Alignof(bool(true)))
    fmt.Printf("int32 align: %d\n", unsafe.Alignof(int32(0)))
    fmt.Printf("int8 align: %d\n", unsafe.Alignof(int8(0)))
    fmt.Printf("int64 align: %d\n", unsafe.Alignof(int64(0)))
    fmt.Printf("byte align: %d\n", unsafe.Alignof(byte(0)))
    fmt.Printf("string align: %d\n", unsafe.Alignof("EDDYCJY"))
    fmt.Printf("map align: %d\n", unsafe.Alignof(map[string]string{}))
}

Output results:

bool align: 1
int32 align: 4
int8 align: 1
int64 align: 8
byte align: 1
string align: 8
map align: 8

In Go, you can call unsafe.Alignof to get the alignment factor of a type. The output shows that the values are all powers of two (2^n) and never exceed 8. This is because the default alignment factor of the compiler on my (64-bit) machine is 8, so no value can be larger than that.

Overall alignment

The previous section showed that the member variables of a structure must be aligned. It follows that the structure as a whole, as the final result, must be aligned as well.

Alignment rule

  • Member variables: the offset of the first member variable is 0. For each subsequent member variable, the alignment value is the smaller of the compiler's default alignment length (#pragma pack(n)) and the length of the member's type (unsafe.Sizeof). The member's offset must be an integer multiple of that alignment value.
  • The structure itself: the alignment value is the smaller of the compiler's default alignment length (#pragma pack(n)) and the largest length among all member variable types. The total size of the structure is rounded up to the smallest integer multiple of that alignment value.
  • Combining the two points above: if the compiler's default alignment length (#pragma pack(n)) exceeds the largest member type length in the structure, the default alignment length has no effect.
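
As a rough illustration, the rules can be written as a small layout calculator. This is a sketch under the assumption that each field's size and alignment are already known; the `field` type and the `alignUp` and `layout` helpers are hypothetical names, not part of any Go package.

```go
package main

import "fmt"

// field describes one struct member by name, size and alignment in bytes.
// (Hypothetical helper type used only for this illustration.)
type field struct {
	name        string
	size, align int
}

// alignUp rounds n up to the next multiple of align.
func alignUp(n, align int) int {
	return (n + align - 1) / align * align
}

// layout places each member at the next offset that is a multiple of its
// alignment (rule 1), then rounds the total size up to a multiple of the
// largest member alignment (rule 2).
func layout(fields []field) (offsets []int, size int) {
	offset, maxAlign := 0, 1
	for _, f := range fields {
		offset = alignUp(offset, f.align)
		offsets = append(offsets, offset)
		offset += f.size
		if f.align > maxAlign {
			maxAlign = f.align
		}
	}
	return offsets, alignUp(offset, maxAlign)
}

func main() {
	// Sizes and alignments of the Part1 fields, taken from the outputs above.
	part1 := []field{
		{"a", 1, 1}, {"b", 4, 4}, {"c", 1, 1}, {"d", 8, 8}, {"e", 1, 1},
	}
	offsets, size := layout(part1)
	fmt.Println(offsets, size) // [0 4 8 16 24] 32
}
```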

Analysis process

Next, let's walk through what Part1 goes through and why the result differs from the “expected” one.

Member variable   Type    Offset   Size
a                 bool    0        1
(padding)         none    1        3
b                 int32   4        4
c                 int8    8        1
(padding)         none    9        7
d                 int64   16       8
e                 byte    24       1
(padding)         none    25       7
Total size                         32
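
The offsets in the table can be checked with unsafe.Offsetof. The sketch below assumes a typical 64-bit platform; `part1Offsets` is a helper name introduced here for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Part1 struct {
	a bool
	b int32
	c int8
	d int64
	e byte
}

// part1Offsets returns the byte offset of every Part1 field,
// in declaration order.
func part1Offsets() [5]uintptr {
	var p Part1
	return [5]uintptr{
		unsafe.Offsetof(p.a),
		unsafe.Offsetof(p.b),
		unsafe.Offsetof(p.c),
		unsafe.Offsetof(p.d),
		unsafe.Offsetof(p.e),
	}
}

func main() {
	fmt.Println("offsets:", part1Offsets())
	fmt.Println("size:", unsafe.Sizeof(Part1{}))
}
```

On a 64-bit machine like the one used in this article, this should report the offsets 0, 4, 8, 16 and 24 and a size of 32. Other platforms may give different values for d and e.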

Member alignment

  • First member a

    • Type is bool
    • The size/alignment value is 1 byte
    • It sits at the start address, offset 0, and occupies the 1st byte
  • Second member b

    • Type is int32
    • The size/alignment value is 4 bytes
    • By rule 1, the offset must be an integer multiple of 4, so the offset is 4 and bytes 2-4 are padding. The value itself occupies bytes 5 through 8. Layout so far: axxx|bbbb
  • Third member c

    • Type is int8
    • The size/alignment value is 1 byte
    • By rule 1, the offset must be an integer multiple of 1. The current offset is 8, so no extra alignment is needed and the value occupies the 9th byte. Layout so far: axxx|bbbb|c…
  • Fourth member d

    • Type is int64
    • The size/alignment value is 8 bytes
    • By rule 1, the offset must be an integer multiple of 8, so the offset is 16 and bytes 10-16 are padding. The value occupies bytes 17 through 24. Layout so far: axxx|bbbb|cxxx|xxxx|dddd|dddd
  • Fifth member e

    • Type is byte
    • The size/alignment value is 1 byte
    • By rule 1, the offset must be an integer multiple of 1. The current offset is 24, so no extra alignment is needed and the value occupies the 25th byte. Layout so far: axxx|bbbb|cxxx|xxxx|dddd|dddd|e…

Overall alignment

Once every member variable is aligned, rule 2 requires the structure as a whole to be aligned as well, because its size at this point may not be a power of two (2^n), or even a multiple of the alignment value, which would violate the alignment rules.

By rule 2, the alignment value is 8. The current offset is 25, which is not an integer multiple of 8, so it is rounded up to 32. The structure is now aligned.

Result

Part1 memory layout: axxx|bbbb|cxxx|xxxx|dddd|dddd|exxx|xxxx

Summary

The analysis in this section explains why the earlier “calculation” was wrong.

Real memory access does not follow the “one slot per byte” idea. Memory is read and written block by block, trading space for time (efficiency). On top of that, the memory behavior of different platforms has to be taken into account.

Clever structure

The previous section showed that a structure's memory is aligned and padded according to the types of its member variables. Does the result change if the fields are declared in a different order? Let's try it :-)

package main

import (
    "fmt"
    "unsafe"
)

type Part1 struct {
    a bool
    b int32
    c int8
    d int64
    e byte
}

// Part2 has the same fields as Part1, declared in a different order.
type Part2 struct {
    e byte
    c int8
    a bool
    b int32
    d int64
}

func main() {
    part1 := Part1{}
    part2 := Part2{}

    fmt.Printf("part1 size: %d, align: %d\n", unsafe.Sizeof(part1), unsafe.Alignof(part1))
    fmt.Printf("part2 size: %d, align: %d\n", unsafe.Sizeof(part2), unsafe.Alignof(part2))
}

Output results:

part1 size: 32, align: 8
part2 size: 16, align: 8

The results show that “simply” changing the order of the member variables changes the size of the structure.

Next, let's analyze Part2 and see what is different inside it that leads to this result.

Analysis process

Member variable   Type    Offset   Size
e                 byte    0        1
c                 int8    1        1
a                 bool    2        1
(padding)         none    3        1
b                 int32   4        4
d                 int64   8        8
Total size                         16

Member alignment

  • First member e

    • Type is byte
    • The size/alignment value is 1 byte
    • It sits at the start address, offset 0, and occupies the 1st byte
  • Second member c

    • Type is int8
    • The size/alignment value is 1 byte
    • By rule 1, the offset must be an integer multiple of 1. The current offset is 1, so no extra alignment is needed
  • Third member a

    • Type is bool
    • The size/alignment value is 1 byte
    • By rule 1, the offset must be an integer multiple of 1. The current offset is 2, so no extra alignment is needed
  • Fourth member b

    • Type is int32
    • The size/alignment value is 4 bytes
    • By rule 1, the offset must be an integer multiple of 4. The current offset is 3, so the offset becomes 4 and the 4th byte is padding. The value itself occupies bytes 5 through 8. Layout so far: ecax|bbbb
  • Fifth member d

    • Type is int64
    • The size/alignment value is 8 bytes
    • By rule 1, the offset must be an integer multiple of 8. The current offset is 8, so no extra alignment is needed and the value occupies bytes 9 through 16. Layout so far: ecax|bbbb|dddd|dddd

Overall alignment

The size of 16 is already an integer multiple of 8, so rule 2 is satisfied and no additional alignment is required.

Result

Part2 memory layout: ecax|bbbb|dddd|dddd

Summary

Comparing Part1 and Part2 shows a big difference between the two:

  • Part1:axxx|bbbb|cxxx|xxxx|dddd|dddd|exxx|xxxx
  • Part2:ecax|bbbb|dddd|dddd

On closer inspection, Part1 contains a lot of padding, which clearly takes up extra space. So where does the padding come from?

As this article has shown, the different field types require alignment so that every access falls on a memory access boundary.

It is then easy to see why adjusting the field order of a structure's member variables can reduce its size: the amount of padding is cleverly reduced, which makes the fields more “compact”. This helps build a clearer picture of Go's memory layout and is useful when optimizing large objects.
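
One common heuristic for picking such an order is to sort the fields from largest to smallest alignment. The sketch below is not the method used for Part2 in this article, only an illustration of the idea: the `field` type and the `structSize` and `byAlignDesc` helpers are hypothetical, and the sizes and alignments are the 64-bit values printed earlier.

```go
package main

import (
	"fmt"
	"sort"
)

// field describes one struct member by name, size and alignment in bytes.
// (Hypothetical helper type used only for this illustration.)
type field struct {
	name        string
	size, align int
}

// structSize computes the padded size of a struct whose fields are laid
// out in the given order, following the alignment rules from the text.
func structSize(fields []field) int {
	offset, maxAlign := 0, 1
	for _, f := range fields {
		offset = (offset + f.align - 1) / f.align * f.align // align the member
		offset += f.size
		if f.align > maxAlign {
			maxAlign = f.align
		}
	}
	return (offset + maxAlign - 1) / maxAlign * maxAlign // align the struct
}

// byAlignDesc returns a copy of fields ordered from largest to smallest
// alignment, keeping the original order among equal alignments.
func byAlignDesc(fields []field) []field {
	sorted := append([]field(nil), fields...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].align > sorted[j].align
	})
	return sorted
}

func main() {
	part1 := []field{
		{"a", 1, 1}, {"b", 4, 4}, {"c", 1, 1}, {"d", 8, 8}, {"e", 1, 1},
	}
	fmt.Println(structSize(part1))              // 32
	fmt.Println(structSize(byAlignDesc(part1))) // 16
}
```

The sorted order (d, b, a, c, e) is different from Part2's order, but it reaches the same 16 bytes, since only the trailing byte of padding remains.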

Of course, if you are not running into a specific problem, you do not need to pay much attention to this. But it is knowledge worth having.
