Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueueWorker ObjectDisposedException: The CancellationTokenSource has been disposed. #172

Open
nulltoken opened this issue Jan 21, 2025 · 21 comments · Fixed by #175
Open

Comments

@nulltoken
Copy link
Collaborator

Describe the bug

While onboarding a new project at $DAYJOB, the CI started to complain

2025-01-21T14:44:30.8930169Z    [Test Collection Cleanup Failure (FakeStandardAuthenticatedServerSetupCollection)]:
System.ObjectDisposedException : The CancellationTokenSource has been disposed.
2025-01-21T14:44:30.8930752Z Object name: 'System.Threading.CancellationTokenSource+Linked1CancellationTokenSource'.
2025-01-21T14:44:30.8931223Z   Stack Trace:
2025-01-21T14:44:30.8931762Z      at NCronJob.QueueWorker.StopAsync(CancellationToken cancellationToken)
2025-01-21T14:44:30.8932320Z    at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
2025-01-21T14:44:30.8932908Z    at Microsoft.Extensions.Hosting.Internal.Host.StopAsync(CancellationToken cancellationToken)
2025-01-21T14:44:30.8933376Z    at Microsoft.AspNetCore.Mvc.Testing.WebApplicationFactory`1.DisposeAsync()
2025-01-21T14:44:30.8933829Z    at Microsoft.AspNetCore.Mvc.Testing.WebApplicationFactory`1.Dispose(Boolean disposing)
2025-01-21T14:44:30.8934363Z    at Microsoft.AspNetCore.Mvc.Testing.WebApplicationFactory`1.Dispose()
2025-01-21T14:44:30.8934829Z    at Alba.AlbaHost.Dispose() in /_/src/Alba/AlbaHost.cs:line 125
2025-01-21T14:44:30.8939828Z    at Tests.Helpers.RawServer.Dispose() in /home/vsts/work/1/s/backend/test/Helpers/TestServerSetup.cs:line 237

My rough understanding of the call stack is that while xUnit disposes a Collection Fixture, the said collection fixture disposes a Alba hosted test intance of the web app under test, which eventually cascades to NCronJob.

@linkdotnet I have nothing much more than that for now. I'll try to dig a little deeper and report back.
Or otherwise will try to bisect this thing into a minimal repro case (but this may take some time)

Meanwhile, if you've got mind-debugging abilities, I'd be a taker 😉

@linkdotnet
Copy link
Member

linkdotnet commented Jan 22, 2025

Ohhh damn - I did have this some time ago.

The reason was something here (or better: this code is the fix - as some tasks where null):
var tasks = workerTasks.Values.WhereNotNull();

While cancelling (aka in dispose) a lot of things are happening async - so it is not as trivial to reproduce (especially not with an attached debugger).
Originally I saw that some tasks in our collection are null probably because they can't cancelled while they were still in the queue.

I think this is similar. There might be a still running "something" that waits for something (while in dispose) and tries to access a disposed resource after. Basically, we need a few more safeguards after we awaited some process. I try to have a look on Friday.

@nulltoken
Copy link
Collaborator Author

nulltoken commented Jan 22, 2025

Sadly I can't seem to repro it locally. Fails the build server around once every ten runs.

@linkdotnet
Copy link
Member

Well - your hardware is just too good :D

@kikisaeba
Copy link

kikisaeba commented Jan 22, 2025

Hello,

I'm evaluating NCronJob and suffer from this behavior as well. Same kind of issue. Transient and CI only

@linkdotnet
Copy link
Member

By any chance - is there some kind of setup you can share? For example:

  • NCronJob runs via an WebApplicationFactory
  • There is a job that runs way longer and doesn't get passed in the CancellationToken
  • ...

@nulltoken
Copy link
Collaborator Author

nulltoken commented Jan 22, 2025

There is a job that runs way longer and doesn't get passed in the CancellationToken

@linkdotnet Hmmm. Could you please elaborate on that scenario?

  • App is launched
  • Long cron job is started (but doesn't properly deal with the cancellation token)
  • App is disposed before the job is finished

Could that trigger a double dispose call at the queue worker level?

@linkdotnet
Copy link
Member

Yeah something like that - I tried to reproduce this with a small test - but no luck so far.

Could that trigger a double dispose call at the queue worker level?

Well, we do have disposables that call other disposables. There might be room for chaos.

@linkdotnet
Copy link
Member

Given the source of your code @nulltoken: QueueWorker:

public override async Task StopAsync(CancellationToken cancellationToken)
{
    if (shutdown != null)
    {
        await shutdown.CancelAsync();
    }

    ...

If shutdown is already disposed, we will run into that issue. The question is why is it already disposed?
The "easy" fix is:

public override async Task StopAsync(CancellationToken cancellationToken)
{
    if (shutdown is not null && !isDisposed)
    {
        await shutdown.CancelAsync();
    }
    ...

@linkdotnet
Copy link
Member

Maybe we could push a preview with the given fix above and you (@nulltoken) and @kikisaeba can check?

@linkdotnet
Copy link
Member

Also JobWorker seems to suffer from such an issue:

Image

@linkdotnet
Copy link
Member

This can happen if the Queue manager has been disposed and we waited too long and go into the finally block

@nulltoken
Copy link
Collaborator Author

Maybe we could push a preview with the given fix above and you (@nulltoken) and @kikisaeba can check?

I'd be all in for this

@nulltoken
Copy link
Collaborator Author

Also JobWorker seems to suffer from such an issue:

Oh! Interesting.

@nulltoken
Copy link
Collaborator Author

  • There is a job that runs way longer and doesn't get passed in the CancellationToken

On my side, I've double checked. All tokens are propagated to an eventual HttpClient SendAsync() method.

@linkdotnet
Copy link
Member

Hupps - that should have not been closed. In any case, there will be a preview version out in a second.
Maybe you can have a look at this and report back if it still occurs.

@nulltoken
Copy link
Collaborator Author

Maybe you can have a look at this and report back if it still occurs.

Will do

@linkdotnet
Copy link
Member

@nulltoken
Copy link
Collaborator Author

https://www.nuget.org/packages/NCronJob/4.3.0-preview

Woot! I've run four of five builds against a branch that had a higher ratio of failure (1 every 3 builds on average) and no issue.
That looks promising!

However, I suggest that we wait until tomorrow before releasing a new stable version

  • I'll run a more intensive test session
  • Maybe @kikisaeba will give it a test and report back by then

@nulltoken
Copy link
Collaborator Author

@linkdotnet Great job!

BTW, you forgot to attribute the fix to its rightful author in the changelog.

You deserve all the possible kudos <3

@linkdotnet
Copy link
Member

https://www.nuget.org/packages/NCronJob/4.3.0-preview

Woot! I've run four of five builds against a branch that had a higher ratio of failure (1 every 3 builds on average) and no issue. That looks promising!

However, I suggest that we wait until tomorrow before releasing a new stable version

  • I'll run a more intensive test session
  • Maybe @kikisaeba will give it a test and report back by then

Yeah - I also will wait a bit longer. And will also check tomorrow morning for some more spots that might be impacted.

BTW, you forgot to attribute the fix to its rightful author in the changelog.
You deserve all the possible kudos <3

Thanks <3 - will add this once we are certain it fixes the issue. In any case, a new release might be nice given all your hard work and features should be shipped to the public.

@nulltoken
Copy link
Collaborator Author

I'll ran a four additional builds, seems stable on my side.
Will provide more feedback tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants