Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential deadlock from GraphListenerAdapterToPipelineListener #700

Open
Dohbedoh opened this issue Sep 13, 2023 · 6 comments
Open

Potential deadlock from GraphListenerAdapterToPipelineListener #700

Dohbedoh opened this issue Sep 13, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@Dohbedoh
Copy link

Dohbedoh commented Sep 13, 2023

Jenkins and plugins versions report

Environment
* core: 2.387.3
* opentelemetry:2.11.0
* workflow-api:1208.v0cc7c6e0da_9e
* workflow-cps:3653.v07ea_433c90b_4
* workflow-job:1289.1291.vb_7c188e7e7df
* workflow-support:839.v35e2736cfd5c

Issue

Jenkins shows a deadlock between open-telemetry GraphListenerAdapterToPipelineListener and the Jenkins initialization thread:

==============
Deadlock Found
==============
"Jenkins initialization thread" id=50 (0x32) state=BLOCKED cpu=95%
    - waiting to lock <0x23e558fe> (a org.jenkinsci.plugins.workflow.cps.CpsFlowExecution)
      owned by "Running CpsFlowExecution[Owner[myfolder/myjob/17:myfolder/myjob #17]]" id=1246 (0x4de)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.isComplete(CpsFlowExecution.java:1277)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentExecutions(CpsFlowExecution.java:1019)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener.onResumed(FlowExecutionList.java:267)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionListener.fireResumed(FlowExecutionListener.java:85)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:589)
    at hudson.model.RunMap.retrieve(RunMap.java:233)
    at hudson.model.RunMap.retrieve(RunMap.java:61)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:650)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:632)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:530)
    at hudson.model.RunMap.getById(RunMap.java:213)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:965)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:977)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:76)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:197)
    at jenkins.model.Jenkins.<init>(Jenkins.java:1040)
    at hudson.model.Hudson.<init>(Hudson.java:86)
    at hudson.model.Hudson.<init>(Hudson.java:82)
    at hudson.WebAppMain$3.run(WebAppMain.java:247)

"Running CpsFlowExecution[Owner[myfolder/myjob/17:myfolder/myjob #17]]" id=1246 (0x4de) state=BLOCKED cpu=57%
    - waiting to lock <0x76460394> (a hudson.model.RunMap)
      owned by "Jenkins initialization thread" id=50 (0x32)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:519)
    at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:237)
    at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:234)
    at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:105)
    at hudson.model.Run.fromExternalizableId(Run.java:2498)
    at org.jenkinsci.plugins.workflow.support.steps.build.DownstreamFailureCause.getDownstreamBuild(DownstreamFailureCause.java:47)
    at org.jenkinsci.plugins.workflow.support.steps.build.DownstreamFailureCause.getShortDescription(DownstreamFailureCause.java:63)
    at io.jenkins.plugins.opentelemetry.job.MonitoringPipelineListener.lambda$endCurrentSpan$11(MonitoringPipelineListener.java:308)
    at io.jenkins.plugins.opentelemetry.job.MonitoringPipelineListener$$Lambda$1490/0x00000008425bd440.apply(Unknown Source)
    at [email protected]/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at [email protected]/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at [email protected]/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at [email protected]/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at [email protected]/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    at [email protected]/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at [email protected]/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at io.jenkins.plugins.opentelemetry.job.MonitoringPipelineListener.endCurrentSpan(MonitoringPipelineListener.java:308)
    at io.jenkins.plugins.opentelemetry.job.MonitoringPipelineListener.onEndParallelStepBranch(MonitoringPipelineListener.java:291)
    at io.jenkins.plugins.opentelemetry.job.jenkins.GraphListenerAdapterToPipelineListener.fireOnAfterEndParallelStepBranch(GraphListenerAdapterToPipelineListener.java:129)
    at io.jenkins.plugins.opentelemetry.job.jenkins.GraphListenerAdapterToPipelineListener.onNewHead(GraphListenerAdapterToPipelineListener.java:64)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1588)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.notifyNewHead(CpsThreadGroup.java:492)
    at org.jenkinsci.plugins.workflow.cps.FlowHead.setNewHead(FlowHead.java:158)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onProgramEnd(CpsFlowExecution.java:1297)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.croak(CpsFlowExecution.java:882)
    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.reportProblem(CpsVmExecutorService.java:57)
    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:69)
    at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
    at jenkins.util.ErrorLoggingExecutorService$$Lambda$1086/0x0000000841d64840.run(Unknown Source)
    at [email protected]/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at [email protected]/java.lang.Thread.run(Thread.java:829)

Could be due to the usage of a synchronous GraphListener:

What Operating System are you using (both controller, and any agents involved in the problem)?

  • JVM Implementation

    • Name: OpenJDK 64-Bit Server VM
    • Vendor: Red Hat, Inc.
    • Version: 11.0.19+7-LTS
  • Operating system

    • Name: Linux
    • Architecture: amd64
    • Version: 3.10.0-1160.92.1.el7.x86_64

Reproduction steps

No consistently reproducible. But basically:

  • launch many many resumable pipelines
  • restart Jenkins

Expected Results

No deadlocks

Actual Results

deadlock

Anything else?

No response

@Dohbedoh Dohbedoh added the bug Something isn't working label Sep 13, 2023
@jglick

This comment was marked as outdated.

@dwnusbaum
Copy link
Member

@jglick it's in the "Issue" <details> in the description.

@Dohbedoh
Copy link
Author

Sorry about that. I tried to follow the issue template of the project. I can make it more visible.

@cyrille-leclerc
Copy link
Contributor

@Dohbedoh @dwnusbaum do you have recommendations on how we can fix this? I looked at the logic of the code and it sounded reasonable to me.

@dwnusbaum
Copy link
Member

@cyrille-leclerc GraphListenerAdapterToPipelineListener should probably not extend GraphListener.Synchronous, just GraphListener, see https://github.com/jenkinsci/workflow-api-plugin/blob/ca5fddb3fcebcc6acf823c80600ec7ab9deb0d05/src/main/java/org/jenkinsci/plugins/workflow/flow/GraphListener.java#L48. Or perhaps you could modify it to only perform a small subset of the current logic synchronously if necessary and then offload all of the complex behavior that acquires locks to a separate thread.

@cyrille-leclerc
Copy link
Contributor

Thanks @dwnusbaum .
Some of it definitively needs to be synchronous so that we can associate a span with the FlowNode. I'll brainstorm to see if something can be made async and how we would orchestrate the sync and async parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants