In-depth understanding of Go panic and recover


As a gopher, you are surely no stranger to panic and recover. But have you ever thought about what happens under the hood when these two statements execute? A few days ago I was discussing related topics with my colleagues and found that everyone's understanding was rather vague. I hope this article can explain, from a deeper angle, what they do and why.

Original article: In-depth understanding of Go panic and recover

Questions to think about

First, why does the program stop running?

package main

func main() {
    panic("EDDYCJY.")
}

Output results:

$ go run main.go
panic: EDDYCJY.

goroutine 1 [running]:
main.main()
    /Users/eddycjy/go/src/github.com/EDDYCJY/awesomeProject/main.go:4 +0x39
exit status 2

Please think about it: why does executing panic cause the application to stop running? (Rather than simply saying that executing panic ends the program.)

Second, why does the program not stop running?

package main

import "log"

func main() {
    defer func() {
        if err := recover(); err != nil {
            log.Printf("recover: %v", err)
        }
    }()

    panic("EDDYCJY.")
}

Output results:

$ go run main.go 
2019/05/11 23:39:47 recover: EDDYCJY.

Please think about it: why can the defer + recover combination protect the application?

Third, what if defer is left out?

The second question above used the defer + recover combination. Can I drop the defer? As follows:

package main

import "log"

func main() {
    if err := recover(); err != nil {
        log.Printf("recover: %v", err)
    }

    panic("EDDYCJY.")
}

Output results:

$ go run main.go
panic: EDDYCJY.

goroutine 1 [running]:
main.main()
    /Users/eddycjy/go/src/github.com/EDDYCJY/awesomeProject/main.go:10 +0xa1
exit status 2

Actually, no. After all, every introductory tutorial says that the defer + recover combination is the "universal" way to capture a panic. But why? Why can the panic no longer be captured once defer is removed?

Please think about it: why does recover only work when it is called inside a defer?

At the same time, think carefully: once the defer + recover combination is in place, can we sit back and relax? Does it let us write all kinds of sloppy code?
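The difference between the second and third questions can be observed directly. The following is a minimal sketch, not taken from the original article: recoverWithoutDefer and recoverInDefer are hypothetical helper names. It shows that recover returns nil when no panic is in progress, and returns the panic value when called from a deferred function during a panic.

```go
package main

import "fmt"

// recoverWithoutDefer calls recover on the normal execution path.
// No panic is in progress at that point, so recover returns nil.
func recoverWithoutDefer() interface{} {
	return recover()
}

// recoverInDefer panics and recovers inside a deferred function,
// returning the value that was passed to panic.
func recoverInDefer() (got interface{}) {
	defer func() {
		got = recover()
	}()
	panic("EDDYCJY.")
}

func main() {
	fmt.Println(recoverWithoutDefer()) // <nil>
	fmt.Println(recoverInDefer())      // EDDYCJY.
}
```

In the third question, recover has already run and returned nil by the time panic executes, so nothing is left to intercept the panic.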

Fourth, why can't another goroutine recover it?

package main

import "log"

func main() {
    go func() {
        defer func() {
            if err := recover(); err != nil {
                log.Printf("recover: %v", err)
            }
        }()
    }()

    panic("EDDYCJY.")
}

Output results:

$ go run main.go 
panic: EDDYCJY.

goroutine 1 [running]:
main.main()
    /Users/eddycjy/go/src/github.com/EDDYCJY/awesomeProject/main.go:14 +0x51
exit status 2

Please think about it: why can the panic not be caught when the defer + recover lives in a newly started goroutine? What on earth happened?
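For contrast, here is a sketch of the arrangement that does work: the deferred recover sits in the same goroutine as the code that may panic. The helper name runInGoroutine and the use of a channel to report the result are assumptions made for illustration, not part of the original article.

```go
package main

import "fmt"

// runInGoroutine runs task in a new goroutine and recovers inside that
// same goroutine. The result is reported over a channel: nil if task
// returned normally, otherwise an error describing the panic.
func runInGoroutine(task func()) error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered: %v", r)
				return
			}
			done <- nil
		}()
		task()
	}()
	return <-done
}

func main() {
	err := runInGoroutine(func() { panic("EDDYCJY.") })
	fmt.Println(err) // recovered: EDDYCJY.
}
```

A deferred recover only sees a panic raised in its own goroutine, which is why the version above with the panic in main and the recover elsewhere still crashes.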

Source code

Next, we take the 4+1 questions above into the source code, trying to find the answers, and more of the "why", by reading it.

Data structure

type _panic struct {
    argp      unsafe.Pointer
    arg       interface{} 
    link      *_panic 
    recovered bool
    aborted   bool 
}

A panic uses _panic as its basic unit: each time a panic statement executes, a _panic is created. It holds the basic state of the current panic call in the following fields:

  • argp: pointer to the arguments of the deferred call
  • arg: the reason for the panic, that is, the argument passed in when panic was called
  • link: pointer to the previous _panic
  • recovered: whether the panic has been handled, that is, whether it has been recovered
  • aborted: whether the panic has been aborted

In addition, the link field shows that _panic forms a linked list:

[Figure: the _panic linked list]
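A goroutine can have more than one panic in flight when a deferred function panics while an earlier panic is still being handled. The sketch below is an illustration only, and nestedPanic is a hypothetical name. It shows the visible effect: recover returns the most recent panic value.

```go
package main

import "fmt"

// nestedPanic raises "first", then a deferred function raises "second"
// while "first" is still being handled. The outermost deferred function
// recovers and reports which value it saw.
func nestedPanic() (got interface{}) {
	defer func() {
		got = recover()
	}()
	defer func() {
		panic("second")
	}()
	panic("first")
}

func main() {
	fmt.Println(nestedPanic()) // second
}
```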

Panic

package main

func main() {
    panic("EDDYCJY.")
}

Output results:

$ go run main.go
panic: EDDYCJY.

goroutine 1 [running]:
main.main()
    /Users/eddycjy/go/src/github.com/EDDYCJY/awesomeProject/main.go:4 +0x39
exit status 2

Let's find out where the logic of panic is actually handled:

$ go tool compile -S main.go
"".main STEXT size=66 args=0x0 locals=0x18
    0x0000 00000 (main.go:23)    TEXT    "".main(SB), ABIInternal, $24-0
    0x0000 00000 (main.go:23)    MOVQ    (TLS), CX
    0x0009 00009 (main.go:23)    CMPQ    SP, 16(CX)
    ...
    0x002f 00047 (main.go:24)    PCDATA    $2, $0
    0x002f 00047 (main.go:24)    MOVQ    AX, 8(SP)
    0x0034 00052 (main.go:24)    CALL    runtime.gopanic(SB)

Clearly, the assembly calls straight into the runtime's internal implementation, runtime.gopanic. Let's look at what this function does (abridged):

func gopanic(e interface{}) {
    gp := getg()
    ...
    var p _panic
    p.arg = e
    p.link = gp._panic
    gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
    
    for {
        d := gp._defer
        if d == nil {
            break
        }

        // defer...
        ...
        d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

        p.argp = unsafe.Pointer(getargp(0))
        reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
        p.argp = nil

        // recover...
        if p.recovered {
            ...
            mcall(recovery)
            throw("recovery failed") // mcall should not return
        }
    }

    preprintpanics(gp._panic)

    fatalpanic(gp._panic) // should not return
    *(*int)(nil) = 0      // not reached
}

  • Get a pointer to the current goroutine
  • Initialize a _panic, the basic unit of a panic, for use in the following steps
  • Get the _defer list attached to the current goroutine (this data structure is also a linked list)
  • If a deferred call exists, call reflectcall to execute the deferred code; if that code calls recover, gorecover is invoked
  • Before ending, use preprintpanics to prepare the messages of the panics involved for printing
  • Finally, call fatalpanic to abort the application, which in effect executes exit(2) as the final exit

From the analysis of the code above, we can see that panic works on the _panic linked list attached to the current goroutine (g): it walks the defer linked list of that goroutine, checks whether a recover took place, and finally calls an exit routine to stop the application.
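The loop over the defer list can be observed from user code. The following is a small sketch, with deferOrder as a hypothetical helper name, that records the order in which deferred functions run while a panic is being handled. The most recently registered defer runs first.

```go
package main

import "fmt"

// deferOrder panics after registering three deferred functions and
// returns the order in which they ran.
func deferOrder() (calls []string) {
	defer func() {
		recover() // registered first, so it runs last and stops the panic
		calls = append(calls, "A")
	}()
	defer func() { calls = append(calls, "B") }()
	defer func() { calls = append(calls, "C") }()
	panic("EDDYCJY.")
}

func main() {
	fmt.Println(deferOrder()) // [C B A]
}
```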

Unrecoverable panic: fatalpanic

func fatalpanic(msgs *_panic) {
    pc := getcallerpc()
    sp := getcallersp()
    gp := getg()
    var docrash bool

    systemstack(func() {
        if startpanic_m() && msgs != nil {
            ...
            printpanics(msgs)
        }

        docrash = dopanic_m(gp, pc, sp)
    })

    systemstack(func() {
        exit(2)
    })

    *(*int)(nil) = 0
}

We saw that this function runs at the very end of panic handling, as if it took care of all the finishing work. In fact, it ends the program by calling exit, but before that it uses printpanics to recursively print all the panic messages and their arguments. The code is as follows:

func printpanics(p *_panic) {
    if p.link != nil {
        printpanics(p.link)
        print("\t")
    }
    print("panic: ")
    printany(p.arg)
    if p.recovered {
        print(" [recovered]")
    }
    print("\n")
}
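To make the recursion easier to follow, here is a user-space sketch of the same idea. The panicRecord type and the formatPanics and render functions are hypothetical stand-ins written for this illustration; they are not the runtime's own types, and the output is built in a string rather than printed by the runtime.

```go
package main

import (
	"fmt"
	"strings"
)

// panicRecord is a hypothetical, simplified stand-in for the runtime's
// _panic: only the fields that printpanics reads are kept.
type panicRecord struct {
	arg       interface{}
	link      *panicRecord // the earlier panic, if any
	recovered bool
}

// formatPanics mirrors the recursion in printpanics, but writes into a
// strings.Builder instead of printing. The oldest panic comes first.
func formatPanics(p *panicRecord, sb *strings.Builder) {
	if p.link != nil {
		formatPanics(p.link, sb)
		sb.WriteString("\t")
	}
	sb.WriteString("panic: ")
	fmt.Fprint(sb, p.arg)
	if p.recovered {
		sb.WriteString(" [recovered]")
	}
	sb.WriteString("\n")
}

// render returns the formatted chain starting at the newest panic.
func render(p *panicRecord) string {
	var sb strings.Builder
	formatPanics(p, &sb)
	return sb.String()
}

func main() {
	first := &panicRecord{arg: "first", recovered: true}
	second := &panicRecord{arg: "second", link: first}
	fmt.Print(render(second))
}
```

Because the function recurses into link before printing, the chain is printed from the oldest panic to the newest, with each later panic indented by a tab.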

So don't assume that every exception can be recovered. In fact, fatal errors and runtime.throw cannot be recovered. Even running out of memory aborts the program directly, and the runtime finishes it off with an exit(2). Therefore, when writing code, pay close attention to panics, because there are unrecoverable scenarios.
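By contrast, ordinary run-time panics such as integer division by zero are recoverable, and their value implements runtime.Error. The sketch below uses a hypothetical safeDivide helper to turn such a panic into an error while re-raising anything else. It does not, and cannot, intercept the fatal errors described above.

```go
package main

import (
	"fmt"
	"runtime"
)

// safeDivide converts a run-time panic (such as integer division by
// zero) into an ordinary error. Panics of any other kind are re-raised.
func safeDivide(a, b int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			re, ok := r.(runtime.Error)
			if !ok {
				panic(r)
			}
			err = re
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(safeDivide(6, 3)) // 2 <nil>
	_, err := safeDivide(1, 0)
	fmt.Println(err) // runtime error: integer divide by zero
}
```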

Recover

package main

import "log"

func main() {
    defer func() {
        if err := recover(); err != nil {
            log.Printf("recover: %v", err)
        }
    }()

    panic("EDDYCJY.")
}

Output results:

$ go run main.go 
2019/05/11 23:39:47 recover: EDDYCJY.

As expected, the panic was captured successfully. But how does recover bring the program back from a panic? Look at the assembly code again:

$ go tool compile -S main.go
"".main STEXT size=110 args=0x0 locals=0x18
    0x0000 00000 (main.go:5)    TEXT    "".main(SB), ABIInternal, $24-0
    ...
    0x0024 00036 (main.go:6)    LEAQ    "".main.func1·f(SB), AX
    0x002b 00043 (main.go:6)    PCDATA    $2, $0
    0x002b 00043 (main.go:6)    MOVQ    AX, 8(SP)
    0x0030 00048 (main.go:6)    CALL    runtime.deferproc(SB)
    ...
    0x0050 00080 (main.go:12)    CALL    runtime.gopanic(SB)
    0x0055 00085 (main.go:12)    UNDEF
    0x0057 00087 (main.go:6)    XCHGL    AX, AX
    0x0058 00088 (main.go:6)    CALL    runtime.deferreturn(SB)
    ...
    0x0022 00034 (main.go:7)    MOVQ    AX, (SP)
    0x0026 00038 (main.go:7)    CALL    runtime.gorecover(SB)
    0x002b 00043 (main.go:7)    PCDATA    $2, $1
    0x002b 00043 (main.go:7)    MOVQ    16(SP), AX
    0x0030 00048 (main.go:7)    MOVQ    8(SP), CX
    ...
    0x0056 00086 (main.go:8)    LEAQ    go.string."recover: %v"(SB), AX
    ...
    0x0086 00134 (main.go:8)    CALL    log.Printf(SB)
    ...

By analyzing the underlying calls, we can see that the following methods are the main ones:

  • runtime.deferproc
  • runtime.gopanic
  • runtime.deferreturn
  • runtime.gorecover

In the previous section we briefly described the flow: gopanic walks the defer linked list of the current goroutine, and if a recover is encountered while reflectcall is executing, gorecover is called. Its code is as follows:

func gorecover(argp uintptr) interface{} {
    gp := getg()
    p := gp._panic
    if p != nil && !p.recovered && argp == uintptr(p.argp) {
        p.recovered = true
        return p.arg
    }
    return nil
}
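The argp comparison in gorecover is what restricts recover to being called directly by a deferred function, a rule the language specification also states. The sketch below illustrates the observable behaviour; tryRecover, indirect, and direct are hypothetical names chosen for this example.

```go
package main

import "fmt"

// tryRecover calls recover and returns its result.
func tryRecover() interface{} {
	return recover()
}

// indirect calls tryRecover from inside a deferred closure. recover is
// then one call frame too deep, so tryRecover sees nil. The closure
// records that and then recovers directly so the program survives.
func indirect() (seen interface{}) {
	defer func() {
		seen = tryRecover()
		recover()
	}()
	panic("EDDYCJY.")
}

// direct defers tryRecover itself, so recover is called directly by
// the deferred function and the panic is stopped.
func direct() {
	defer tryRecover()
	panic("EDDYCJY.")
}

func main() {
	fmt.Println(indirect()) // <nil>
	direct()
	fmt.Println("still running")
}
```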

The gorecover code looks quite simple: its core is setting the recovered field, which marks whether the current panic has been handled by recover. But this is not quite what we imagined. How does the program actually get back from the panic flow? Is that handled in the core function? Let's look at the code of gopanic again:

func gopanic(e interface{}) {
    ...
    for {
        // defer...
        ...
        pc := d.pc
        sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
        freedefer(d)
        
        // recover...
        if p.recovered {
            atomic.Xadd(&runningPanicDefers, -1)

            gp._panic = p.link
            for gp._panic != nil && gp._panic.aborted {
                gp._panic = gp._panic.link
            }
            if gp._panic == nil { 
                gp.sig = 0
            }

            gp.sigcode0 = uintptr(sp)
            gp.sigcode1 = pc
            mcall(recovery)
            throw("recovery failed") 
        }
    }
    ...
}

Going back to gopanic and looking more closely, we find that it does contain the code for the recover flow. The recovery process is as follows:

  • Check whether the current _panic has been marked as handled by recover
  • Remove the recovered panic from the _panic linked list, along with any panics marked as aborted
  • Pass the stack frame information to be restored (sp and pc) to the recovery function through fields of gp (each stack frame corresponds to an unfinished function; the function's return address and local variables are saved in the stack frame)
  • Call recovery to perform the recovery

From the flow, the core is the recovery function, which takes on the responsibility of controlling the exceptional flow. The code is as follows:

func recovery(gp *g) {
    sp := gp.sigcode0
    pc := gp.sigcode1

    if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
        print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
        throw("bad recovery")
    }

    gp.sched.sp = sp
    gp.sched.pc = pc
    gp.sched.lr = 0
    gp.sched.ret = 1
    gogo(&gp.sched)
}

At first glance it just sets a few values. What is actually being set, however, are saved register values (the stack pointer sp and the program counter pc, among others), which are what the runtime uses to maintain execution context. We need to read recovery together with gopanic. The sp and pc it uses were recorded by deferproc when the current defer was registered, so the final gogo call jumps back to the point where deferproc returns. In addition, we note that:

gp.sched.ret = 1

Under the hood, the program sets gp.sched.ret to 1. That is, it modifies the return value of deferproc directly, without actually calling it. A return value of 1 means the panic has been handled, so execution transfers straight to the instruction following the deferproc call. At this point the flow control for the exceptional state is finished, and what follows is the remaining defer flow.
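Seen from user code, the effect of this jump is that the function whose deferred call recovered simply returns to its caller, and the statements after the panicking call never run. A small sketch, with guarded and mayPanic as hypothetical names:

```go
package main

import "fmt"

// mayPanic always panics; it stands in for any call that can fail.
func mayPanic() {
	panic("EDDYCJY.")
}

// guarded panics halfway through. The deferred function recovers, so
// guarded returns to its caller as if it had returned normally. The
// statement after the panicking call never runs.
func guarded() (trace []string) {
	defer func() {
		if r := recover(); r != nil {
			trace = append(trace, "recovered")
		}
	}()
	trace = append(trace, "before panic")
	mayPanic()
	trace = append(trace, "after panic") // skipped
	return trace
}

func main() {
	trace := guarded()
	trace = append(trace, "caller continues")
	fmt.Println(trace) // [before panic recovered caller continues]
}
```

Note that the deferred function can still modify the named result, which is how the recovered state reaches the caller.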

To verify this idea, we can look at the core jump function, gogo. The code is as follows:

// void gogo(Gobuf*)
// restore state from Gobuf; longjmp
TEXT runtime·gogo(SB),NOSPLIT,$8-4
    MOVW    buf+0(FP), R1
    MOVW    gobuf_g(R1), R0
    BL    setg<>(SB)

    MOVW    gobuf_sp(R1), R13    // restore SP==R13
    MOVW    gobuf_lr(R1), LR
    MOVW    gobuf_ret(R1), R0
    MOVW    gobuf_ctxt(R1), R7
    MOVW    $0, R11
    MOVW    R11, gobuf_sp(R1)    // clear to help garbage collector
    MOVW    R11, gobuf_ret(R1)
    MOVW    R11, gobuf_lr(R1)
    MOVW    R11, gobuf_ctxt(R1)
    MOVW    gobuf_pc(R1), R11
    CMP    R11, R11 // set condition codes for == test, needed by stack split
    B    (R11)

From the code we can see that its main job is to restore state from a Gobuf. Simply put, it sets the registers to the values saved for the corresponding goroutine (g). The Gobuf that has come up several times in this article looks like this:

type gobuf struct {
    sp   uintptr
    pc   uintptr
    g    guintptr
    ctxt unsafe.Pointer
    ret  sys.Uintreg
    lr   uintptr
    bp   uintptr
}

In short, it stores what a goroutine needs in order to switch contexts.

Expand

const(
    OPANIC       // panic(Left)
    ORECOVER     // recover()
    ...
)
...
func walkexpr(n *Node, init *Nodes) *Node {
    ...
    switch n.Op {
    default:
        Dump("walk", n)
        Fatalf("walkexpr: switch 1 unknown op %+S", n)

    case ONONAME, OINDREGSP, OEMPTY, OGETG:
    case OTYPE, ONAME, OLITERAL:
        ...
    case OPANIC:
        n = mkcall("gopanic", nil, init, n.Left)

    case ORECOVER:
        n = mkcall("gorecover", n.Type, init, nod(OADDR, nodfp, nil))
    ...
}

In fact, calls to the built-in functions panic and recover are converted into the corresponding opcodes (OPANIC and ORECOVER) at compile time, and the compiler then turns them into calls to the corresponding runtime functions. It is not the single step you might have assumed. Interested readers can study it further.

Summary

This article analyzed the source code behind panic and recover in depth. The 4+1 questions at the beginning are there in the hope that you read with questions in mind and get twice the result with half the effort.

In addition, this article is closely related to defer, so some background knowledge is required. If you did not understand some parts on the first read, study defer and then read it again to reinforce your understanding.

Finally, can you answer those questions now? If you can explain them, you truly understand :)