This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

[BUG] Mars supervisor process and the worker process lose each other after the process has been started locally for a while #22

Open
ChengjieLi28 opened this issue Sep 8, 2022 · 0 comments


ChengjieLi28 commented Sep 8, 2022

Describe the bug
When starting the supervisor and worker processes locally:

python -m mars.supervisor -H 127.0.0.1 -p 8092 -w 8091 --log-conf xxx
python -m mars.worker -H 127.0.0.1 -p 8093 -s 127.0.0.1:8092 --log-conf xxx

After a while (a whole night), the web UI can no longer find the supervisor or the worker (two screenshots were attached to the original issue).
The console shows no errors.
When this happens and a task is submitted, the task cannot be executed because the supervisor cannot find the worker.
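
Since the console shows nothing, a first diagnostic step could be to check whether the processes are still accepting TCP connections once the problem appears. The following is only a minimal sketch using the Python standard library. The helper names `port_open` and `check_endpoints` are hypothetical, the ports are copied from the commands above, and it assumes that `-w 8091` is the web UI port. A successful connection only shows that the process is still listening, not that the supervisor and worker can still see each other.

```python
import socket

# Endpoints copied from the commands in this report; adjust if yours differ.
# Assumption: "-w 8091" on the supervisor is the web UI port.
ENDPOINTS = {
    "supervisor": ("127.0.0.1", 8092),
    "web ui": ("127.0.0.1", 8091),
    "worker": ("127.0.0.1", 8093),
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints=ENDPOINTS):
    """Map each endpoint name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_endpoints().items():
        print(f"{name}: {'listening' if ok else 'unreachable'}")
```

If all three ports still accept connections while the web UI shows no supervisor or worker, the processes are probably alive and the problem is more likely in how they track each other than in a crashed process.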

To Reproduce
To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.9.12
  2. The version of Mars you use: newest
  3. Versions of crucial packages, such as numpy, scipy and pandas: follow mars
  4. Full stack of the error.
    Cannot see any error.
  5. Minimized code to reproduce the error.
    See description.