    op-conductor: Fix hang in testing (#13266) · 9548d53a
    Matthew Slipper authored
    I've found a [deadlock](https://app.circleci.com/pipelines/github/ethereum-optimism/optimism/73846/workflows/19369ca9-9eaa-4021-9eb8-589a06e7bd34/jobs/3018041) in the op-conductor tests. The conductor's inner loop is stuck waiting on a queue signal that never arrives, which in turn hangs the test because the coordinating waitgroup is never decremented. I believe this happens because the conductor's `queueAction` method is non-blocking, so nothing ever triggers the conductor's inner loop when the conductor starts up. I've updated the code to use a blocking channel write when the conductor starts to avoid this issue.
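
    To illustrate the suspected failure mode in isolation, here is a minimal sketch. It is not the conductor's code: `trySignal` and `receivedWithin` are hypothetical helpers, and the unbuffered channel is an assumption made for the demo. It shows that a non-blocking send is silently dropped when no receiver is ready yet, while a blocking send waits until the loop picks it up.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // trySignal is a hypothetical stand-in for a non-blocking, queueAction-style
    // send. It reports whether the signal was delivered or dropped.
    func trySignal(ch chan struct{}) bool {
    	select {
    	case ch <- struct{}{}:
    		return true
    	default:
    		// No receiver is ready and there is no free buffer slot,
    		// so the signal is lost.
    		return false
    	}
    }

    // receivedWithin plays the role of the waiting loop: it reports whether a
    // signal arrives on ch before the timeout d expires.
    func receivedWithin(ch chan struct{}, d time.Duration) bool {
    	select {
    	case <-ch:
    		return true
    	case <-time.After(d):
    		return false
    	}
    }

    func main() {
    	// Assumption for this demo: the signal channel is unbuffered.
    	ch := make(chan struct{})

    	// The signal is sent before any receiver is waiting, so it is dropped.
    	fmt.Println("non-blocking send delivered:", trySignal(ch))
    	// The loop then waits for a signal that never arrives.
    	fmt.Println("loop woke up:", receivedWithin(ch, 50*time.Millisecond))

    	// A blocking send (here in its own goroutine) waits for the receiver.
    	go func() { ch <- struct{}{} }()
    	fmt.Println("loop woke up after blocking send:", receivedWithin(ch, time.Second))
    }
    ```

    In the sketch the first two lines print `false` and the last prints `true`: the dropped signal is what leaves the loop parked in `select`, and the blocking write is what guarantees the first wake-up.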
    
    The traces of the deadlock look like this, for reference:
    
    ```
    goroutine 227 [semacquire, 9 minutes]:
    sync.runtime_Semacquire(0xc0004df577?)
    	/usr/local/go/src/runtime/sema.go:62 +0x25
    sync.(*WaitGroup).Wait(0x0?)
    	/usr/local/go/src/sync/waitgroup.go:116 +0x48
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).execute(0xc0003e4008, 0x0)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:177 +0x65
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).executeAction(...)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:197
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization(0xc0003e4008)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:163 +0x93
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).TestScenario4(0xc0003e4008)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:420 +0x27
    reflect.Value.call({0xc000131900?, 0xc0001b3740?, 0x1fed5b8?}, {0x191d63c, 0x4}, {0xc0004dff28, 0x1, 0x16f12e0?})
    	/usr/local/go/src/reflect/value.go:596 +0xca6
    reflect.Value.Call({0xc000131900?, 0xc0001b3740?, 0x28307b8?}, {0xc0004dff28?, 0xf?, 0x0?})
    	/usr/local/go/src/reflect/value.go:380 +0xb9
    github.com/stretchr/testify/suite.Run.func1(0xc0001f9ba0)
    	/home/circleci/go/pkg/mod/github.com/stretchr/testify@v1.10.0/suite/suite.go:202 +0x4a5
    testing.tRunner(0xc0001f9ba0, 0xc0001f4e10)
    	/usr/local/go/src/testing/testing.go:1689 +0xfb
    created by testing.(*T).Run in goroutine 166
    	/usr/local/go/src/testing/testing.go:1742 +0x390

    goroutine 229 [select, 9 minutes]:
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loopAction(0xc00056ab40)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:577 +0x14f
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization.func1()
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:159 +0x33
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loop(0xc00056ab40)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:570 +0x99
    created by github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).Start in goroutine 227
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:376 +0x1f6

    FAIL	github.com/ethereum-optimism/optimism/op-conductor/conductor	600.103s
    ```