op-conductor: Fix hang in testing (#13266)
I've found a [deadlock](https://app.circleci.com/pipelines/github/ethereum-optimism/optimism/73846/workflows/19369ca9-9eaa-4021-9eb8-589a06e7bd34/jobs/3018041) in the op-conductor tests. The inner conductor loop is stuck waiting on a queue signal that never arrives, which in turn causes the test to hang since the coordinating waitgroup is never decremented. I believe this happens because the conductor's `queueAction` method is non-blocking, so nothing ever triggers conductor's inner loop when conductor start up. I've updated the code to use a blocking channel write when conductor starts to avoid this issue. The traces of the deadlock look like this, for reference: ``` goroutine 227 [semacquire, 9 minutes]: sync.runtime_Semacquire(0xc0004df577?) /usr/local/go/src/runtime/sema.go:62 +0x25 sync.(*WaitGroup).Wait(0x0?) /usr/local/go/src/sync/waitgroup.go:116 +0x48 github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).execute(0xc0003e4008, 0x0) /var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:177 +0x65 github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).executeAction(...) /var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:197 github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization(0xc0003e4008) /var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:163 +0x93 github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).TestScenario4(0xc0003e4008) /var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:420 +0x27 reflect.Value.call({0xc000131900?, 0xc0001b3740?, 0x1fed5b8?}, {0x191d63c, 0x4}, {0xc0004dff28, 0x1, 0x16f12e0?}) /usr/local/go/src/reflect/value.go:596 +0xca6 reflect.Value.Call({0xc000131900?, 0xc0001b3740?, 0x28307b8?}, {0xc0004dff28?, 0xf?, 0x0?}) /usr/local/go/src/reflect/value.go:380 +0xb9 github.com/stretchr/testify/suite.Run.func1(0xc0001f9ba0) /home/circleci/go/pkg/mod/github.com/stretchr/testify@v1.10.0/suite/suite.go:202 +0x4a5 testing.tRunner(0xc0001f9ba0, 0xc0001f4e10) /usr/local/go/src/testing/testing.go:1689 +0xfb created by testing.(*T).Run in goroutine 166 /usr/local/go/src/testing/testing.go:1742 +0x390 goroutine 229 [select, 9 minutes]: github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loopAction(0xc00056ab40) /var/opt/circleci/data/workdir/op-conductor/conductor/service.go:577 +0x14f github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization.func1() /var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:159 +0x33 github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loop(0xc00056ab40) /var/opt/circleci/data/workdir/op-conductor/conductor/service.go:570 +0x99 created by github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).Start in goroutine 227 /var/opt/circleci/data/workdir/op-conductor/conductor/service.go:376 +0x1f6 FAIL github.com/ethereum-optimism/optimism/op-conductor/conductor 600.103s ```
Showing
Please register or sign in to comment