    op-conductor: Fix hang in testing (#13266) · 9548d53a
    Matthew Slipper authored
    I've found a [deadlock](https://app.circleci.com/pipelines/github/ethereum-optimism/optimism/73846/workflows/19369ca9-9eaa-4021-9eb8-589a06e7bd34/jobs/3018041) in the op-conductor tests. The conductor's inner loop is stuck waiting on a queue signal that never arrives, so the test hangs because the coordinating wait group is never decremented. I believe this happens because the conductor's `queueAction` method is non-blocking, so nothing ever triggers the conductor's inner loop when the conductor starts up. I've updated the code to use a blocking channel write when the conductor starts to avoid this issue.
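
To illustrate the difference between the two kinds of channel write described above, here is a minimal sketch. It is not the op-conductor code: `trySignal`, `signal`, and the unbuffered channel are hypothetical stand-ins chosen for illustration, and the real `queueAction` and its channel may be set up differently.

```go
package main

import (
	"fmt"
	"time"
)

// trySignal is a hypothetical non-blocking send. If no receiver is ready
// (and the channel has no free buffer space), the default branch runs and
// the signal is silently dropped.
func trySignal(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// signal is a hypothetical blocking send. It waits until a receiver takes
// the value, so the signal cannot be lost.
func signal(ch chan struct{}) {
	ch <- struct{}{}
}

func main() {
	// Assumption for this sketch: an unbuffered channel.
	ch := make(chan struct{})

	// Nothing is receiving yet, so the non-blocking send is dropped.
	fmt.Println("non-blocking send delivered:", trySignal(ch))

	// A receiver that only becomes ready a little later, like a loop
	// goroutine that has not reached its select yet.
	received := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond)
		<-ch
		close(received)
	}()

	// The blocking send waits for the late receiver instead of giving up.
	signal(ch)
	<-received
	fmt.Println("blocking send delivered: true")
}
```

The trade-off in this sketch is that the blocking send stalls the caller until the receiver is ready, which is acceptable for a one-time startup signal but may not be for signals sent from hot paths.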
    
    The traces of the deadlock look like this, for reference:
    
    ```
    goroutine 227 [semacquire, 9 minutes]:
    sync.runtime_Semacquire(0xc0004df577?)
    	/usr/local/go/src/runtime/sema.go:62 +0x25
    sync.(*WaitGroup).Wait(0x0?)
    	/usr/local/go/src/sync/waitgroup.go:116 +0x48
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).execute(0xc0003e4008, 0x0)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:177 +0x65
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).executeAction(...)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:197
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization(0xc0003e4008)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:163 +0x93
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).TestScenario4(0xc0003e4008)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:420 +0x27
    reflect.Value.call({0xc000131900?, 0xc0001b3740?, 0x1fed5b8?}, {0x191d63c, 0x4}, {0xc0004dff28, 0x1, 0x16f12e0?})
    	/usr/local/go/src/reflect/value.go:596 +0xca6
    reflect.Value.Call({0xc000131900?, 0xc0001b3740?, 0x28307b8?}, {0xc0004dff28?, 0xf?, 0x0?})
    	/usr/local/go/src/reflect/value.go:380 +0xb9
    github.com/stretchr/testify/suite.Run.func1(0xc0001f9ba0)
    	/home/circleci/go/pkg/mod/github.com/stretchr/testify@v1.10.0/suite/suite.go:202 +0x4a5
    testing.tRunner(0xc0001f9ba0, 0xc0001f4e10)
    	/usr/local/go/src/testing/testing.go:1689 +0xfb
    created by testing.(*T).Run in goroutine 166
    	/usr/local/go/src/testing/testing.go:1742 +0x390

    goroutine 229 [select, 9 minutes]:
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loopAction(0xc00056ab40)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:577 +0x14f
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization.func1()
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:159 +0x33
    github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loop(0xc00056ab40)
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:570 +0x99
    created by github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).Start in goroutine 227
    	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:376 +0x1f6

    FAIL	github.com/ethereum-optimism/optimism/op-conductor/conductor	600.103s
    ```
Name
assets
client
cmd
conductor
consensus
flags
health
metrics
rpc
.gitignore
Makefile
README.md
RUNBOOK.md
justfile
run_test_100times.sh