Commit 9548d53a authored by Matthew Slipper's avatar Matthew Slipper Committed by GitHub

op-conductor: Fix hang in testing (#13266)

I've found a [deadlock](https://app.circleci.com/pipelines/github/ethereum-optimism/optimism/73846/workflows/19369ca9-9eaa-4021-9eb8-589a06e7bd34/jobs/3018041) in the op-conductor tests. The inner conductor loop is stuck waiting on a queue signal that never arrives, which in turn causes the test to hang since the coordinating waitgroup is never decremented. I believe this happens because the conductor's `queueAction` method is non-blocking, so nothing ever triggers conductor's inner loop when conductor start up. I've updated the code to use a blocking channel write when conductor starts to avoid this issue.

The traces of the deadlock look like this, for reference:

```
goroutine 227 [semacquire, 9 minutes]:

sync.runtime_Semacquire(0xc0004df577?)

	/usr/local/go/src/runtime/sema.go:62 +0x25

sync.(*WaitGroup).Wait(0x0?)

	/usr/local/go/src/sync/waitgroup.go:116 +0x48

github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).execute(0xc0003e4008, 0x0)

	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:177 +0x65

github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).executeAction(...)

	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:197

github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization(0xc0003e4008)

	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:163 +0x93

github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).TestScenario4(0xc0003e4008)

	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:420 +0x27

reflect.Value.call({0xc000131900?, 0xc0001b3740?, 0x1fed5b8?}, {0x191d63c, 0x4}, {0xc0004dff28, 0x1, 0x16f12e0?})

	/usr/local/go/src/reflect/value.go:596 +0xca6

reflect.Value.Call({0xc000131900?, 0xc0001b3740?, 0x28307b8?}, {0xc0004dff28?, 0xf?, 0x0?})

	/usr/local/go/src/reflect/value.go:380 +0xb9

github.com/stretchr/testify/suite.Run.func1(0xc0001f9ba0)

	/home/circleci/go/pkg/mod/github.com/stretchr/testify@v1.10.0/suite/suite.go:202 +0x4a5

testing.tRunner(0xc0001f9ba0, 0xc0001f4e10)

	/usr/local/go/src/testing/testing.go:1689 +0xfb

created by testing.(*T).Run in goroutine 166

	/usr/local/go/src/testing/testing.go:1742 +0x390

goroutine 229 [select, 9 minutes]:

github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loopAction(0xc00056ab40)

	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:577 +0x14f

github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductorTestSuite).enableSynchronization.func1()

	/var/opt/circleci/data/workdir/op-conductor/conductor/service_test.go:159 +0x33

github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).loop(0xc00056ab40)

	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:570 +0x99

created by github.com/ethereum-optimism/optimism/op-conductor/conductor.(*OpConductor).Start in goroutine 227

	/var/opt/circleci/data/workdir/op-conductor/conductor/service.go:376 +0x1f6

FAIL	github.com/ethereum-optimism/optimism/op-conductor/conductor	600.103s
```
parent 1eb223d8
...@@ -381,7 +381,9 @@ func (oc *OpConductor) Start(ctx context.Context) error { ...@@ -381,7 +381,9 @@ func (oc *OpConductor) Start(ctx context.Context) error {
oc.log.Info("OpConductor started") oc.log.Info("OpConductor started")
// queue an action in case sequencer is not in the desired state. // queue an action in case sequencer is not in the desired state.
oc.prevState = NewState(oc.leader.Load(), oc.healthy.Load(), oc.seqActive.Load()) oc.prevState = NewState(oc.leader.Load(), oc.healthy.Load(), oc.seqActive.Load())
oc.queueAction() // Immediately queue an action. This is made blocking to ensure that start is not
// considered complete until the first action is executed.
oc.actionCh <- struct{}{}
return nil return nil
} }
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment