[Bug]: puma example: Fail to listen 127.0.0.1:61930 #441

Closed
hkz103 opened this issue Dec 18, 2023 · 2 comments · Fixed by #442

Comments

@hkz103

hkz103 commented Dec 18, 2023

Issue Type

Usability

Modules Involved

Documentation/Tutorial/Example

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

spu 0.7.0, secretflow 1.3.0

OS Platform and Distribution

Linux, Ubuntu 20.04

Python Version

3.8

Compiler Version

gcc 10.5

Current Behavior?

I want to run the puma example from /examples/python/ml/flax_llama7b, but when I run it, it fails with "Fail to listen 127.0.0.1:61930". How can I fix this?

Standalone code to reproduce the issue

INFO: Running command line: bazel-bin/examples/python/utils/nodectl --config /opt/spu/examples/python/ml/flax_llama7b/3pc.json up
[2023-12-18 08:55:20,700] [Process-2] Starting grpc server at 127.0.0.1:61921
[2023-12-18 08:55:20,701] [Process-3] Starting grpc server at 127.0.0.1:61922
[2023-12-18 08:55:20,701] [Process-1] Starting grpc server at 127.0.0.1:61920
[2023-12-18 08:55:20,704] [Process-5] Starting grpc server at 127.0.0.1:61924
[2023-12-18 08:55:20,706] [Process-4] Starting grpc server at 127.0.0.1:61923
[2023-12-18 08:55:26,606] [Process-2] Run : builtin_spu_init at node:1
[2023-12-18 08:55:26,606] [Process-1] Run : builtin_spu_init at node:0
[2023-12-18 08:55:26,606] [Process-3] Run : builtin_spu_init at node:2
I1218 08:55:26.614755 48778 external/com_github_brpc_brpc/src/brpc/server.cpp:1158] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=61930.
E1218 08:55:26.614857 48779 external/com_github_brpc_brpc/src/brpc/server.cpp:1054] Fail to listen 127.0.0.1:61930
I1218 08:55:26.615022 48778 external/com_github_brpc_brpc/src/brpc/server.cpp:1161] Check out http://8dbc7fd76721:61930 in web browser.
E1218 08:55:26.615261 48780 external/com_github_brpc_brpc/src/brpc/server.cpp:1054] Fail to listen 127.0.0.1:61930
[2023-12-18 08:55:26.628] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
[2023-12-18 08:55:26.628] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
[2023-12-18 08:55:26.629] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
[2023-12-18 08:55:26.629] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
[2023-12-18 08:55:26,629] [Process-3] Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/9de4a6aef89daf40df580afb13ff9f9d/execroot/spulib/bazel-out/k8-opt/bin/examples/python/utils/nodectl.runfiles/spulib/spu/utils/distributed.py", line 324, in Run
    ret_objs = fn(self, *args, **kwargs)
  File "/root/.cache/bazel/_bazel_root/9de4a6aef89daf40df580afb13ff9f9d/execroot/spulib/bazel-out/k8-opt/bin/examples/python/utils/nodectl.runfiles/spulib/spu/utils/distributed.py", line 543, in builtin_spu_init
    link = libspu.link.create_brpc(desc, my_rank)
RuntimeError: what:
        [external/yacl/yacl/link/transport/brpc_link.cc:100] brpc server failed start
stacktrace:
#0 yacl::link::FactoryBrpc::CreateContext()+0x7fed8b43cbfb
#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7fed8a0341ce
#2 pybind11::cpp_function::dispatcher()+0x7fed8a01234b
#3 PyCFunction_Call+0x5d5499

Relevant log output

No response
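
One plausible reading of the log above: it contains one "is serving on port=61930" line and two "Fail to listen 127.0.0.1:61930" lines, which is what one would expect if three processes on the same host all tried to listen on the same address. The sketch below is not brpc or SPU code; it uses plain Python sockets as a stand-in, and the helper name `second_bind_fails` is made up for illustration. It only shows that a second listener on an address that is already bound is rejected by the operating system.

```python
import socket


def second_bind_fails(host="127.0.0.1"):
    """Bind one listener to a free port, then try to bind a second one to it.

    Returns True if the second bind is rejected (address already in use).
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the OS pick any free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same host:port as the first listener
            second.listen()
            return False
        except OSError:
            # The OS refuses the duplicate listener on this address.
            return True
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    print("second bind rejected:", second_bind_fails())
```

If that reading is right, only one of the three SPU parties can own a given port on a single machine, so each party needs its own address.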

@tpppppub
Collaborator

Please update the config in 3pc.json from

"spu_internal_addrs": [
                    "127.0.0.1:61930",
                    "127.0.0.1:61930",
                    "127.0.0.1:61930"
                ],

to

"spu_internal_addrs": [
                    "127.0.0.1:61930",
                    "127.0.0.1:61931",
                    "127.0.0.1:61932"
                ],

and try again.
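
The change above gives each party a distinct port. A minimal sketch of a check for this kind of mistake follows. It assumes only that the config is JSON and that the addresses live in lists under a key named `spu_internal_addrs`, as in the fragments quoted above. The function names and the surrounding `"config"` wrapper in the sample data are hypothetical, and the search is recursive so that it does not depend on where the key sits in the file.

```python
import json
from collections import Counter


def find_addr_lists(node, key="spu_internal_addrs"):
    """Yield every list stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, list):
                yield v
            else:
                yield from find_addr_lists(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_addr_lists(item, key)


def duplicate_addrs(config):
    """Return addresses that occur more than once within an address list."""
    dups = []
    for addrs in find_addr_lists(config):
        counts = Counter(addrs)
        dups.extend(addr for addr, n in counts.items() if n > 1)
    return dups


# Hypothetical fragments mirroring the "from" and "to" configs quoted above.
broken = json.loads(
    '{"config": {"spu_internal_addrs": '
    '["127.0.0.1:61930", "127.0.0.1:61930", "127.0.0.1:61930"]}}'
)
fixed = json.loads(
    '{"config": {"spu_internal_addrs": '
    '["127.0.0.1:61930", "127.0.0.1:61931", "127.0.0.1:61932"]}}'
)

if __name__ == "__main__":
    print("broken:", duplicate_addrs(broken))
    print("fixed:", duplicate_addrs(fixed))
```

To run the same check on the real file, load it with `json.load` and pass the result to `duplicate_addrs`; an empty list means no address is repeated.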

@hkz103
Author

hkz103 commented Dec 18, 2023 via email

tpppppub added a commit that referenced this issue Dec 18, 2023
# Pull Request

## What problem does this PR solve?

Issue Number: Fixed #441 

## Possible side effects?

- Performance:

- Backward compatibility: