Recipe Name
max-serve-openai-embeddings

Operating System
Linux

What happened?
In the examples, everything is shown using a global environment. Your underlying pixi environment manager (aka magic) can handle multiple project directories. It would be good to show examples with custom pixi/magic environments, so that running services can be overridden and don't all try to run on the same port 8000.
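To make the request concrete, here is roughly the kind of per-project setup I mean. This is only a sketch: it assumes the recipe's processes honor the MAX_SERVE_PORT / MAX_SERVE_HOST variables used in the update further down (the stock Procfile does not do this cleanly today), and the directories and ports are placeholders.

# Project A, in its own checkout and magic/pixi environment:
cd ~/work/recipe-a
export MAX_SERVE_HOST=127.0.0.1
export MAX_SERVE_PORT=8001   # placeholder port
magic run app

# Project B, in a second terminal, gets a different port so the two
# services never collide on 8000:
cd ~/work/recipe-b
export MAX_SERVE_PORT=8002   # placeholder port
magic run app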
Relevant log output
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ magic run app
21:23:29 system | llm.1 started (pid=2584200)
21:23:29 system | main.1 started (pid=2584202)
21:23:29 llm.1 | Global environments as specified in '/home/rfellows/.modular/manifests/pixi-global.toml'
21:23:29 llm.1 | └── max-pipelines: 25.2.0.dev2025022005 (already installed)
21:23:29 llm.1 | └─ exposes: max-serve, max-pipelines
21:23:30 main.1 | 2025-02-20 21:23:30,161 - __main__ - INFO - Waiting for server at http://0.0.0.0:8001/v1 to start (attempt 1/20)...
21:23:36 llm.1 | ✔ Environment max-pipelines was already up-to-date.
21:23:36 llm.1 | cat: .env: No such file or directory
21:23:38 llm.1 | /home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
21:23:38 llm.1 | warnings.warn(
21:23:39 llm.1 | Traceback (most recent call last):
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/bin/max-pipelines", line 6, in<module>
21:23:39 llm.1 | from max.entrypoints.pipelines import main
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/max/entrypoints/__init__.py", line 17, in<module>
21:23:39 llm.1 | from .llm import LLM
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/max/entrypoints/llm.py", line 34, in<module>
21:23:39 llm.1 | from max.serve.pipelines.model_worker import start_model_worker
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/max/serve/pipelines/model_worker.py", line 9, in<module>
21:23:39 llm.1 |configure_metrics(Settings())
21:23:39 llm.1 | ^^^^^^^^^^
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/pydantic_settings/main.py", line 171, in __init__
21:23:39 llm.1 |super().__init__(
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/pydantic/main.py", line 214, in __init__
21:23:39 llm.1 | validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
21:23:39 llm.1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21:23:39 llm.1 | pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
21:23:39 llm.1 | port
21:23:39 llm.1 | Value error, port 8000 is already in use [type=value_error, input_value=8000, input_type=int]
21:23:39 llm.1 | For further information visit https://errors.pydantic.dev/2.10/v/value_error
21:23:40 llm.1 | Attempt 1 failed, retrying...
^C21:23:41 system | SIGINT received
21:23:41 system | sending SIGTERM to llm.1 (pid 2584200)
21:23:41 system | sending SIGTERM to main.1 (pid 2584202)
21:23:41 system | llm.1 stopped (rc=-15)
21:23:41 system | main.1 stopped (rc=-15)
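For reference, the ValidationError at the end of the log just means something was already listening on port 8000 on this machine. Standard Linux tooling (nothing recipe-specific) shows what is holding it:

# Use whichever of these is installed:
ss -ltnp | grep ':8000'
lsof -iTCP:8000 -sTCP:LISTEN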
Environment
Notice that I specifically stated to run on port 8001 in the "main.py" file; however, the underlying code still tries to run on port 8000.
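As far as I can tell, main.py only controls where the client looks (http://0.0.0.0:8001/v1 in the log above); the serve side takes its port from the MAX_SERVE_PORT / MAX_SERVE_HOST environment variables instead (this is what the Settings values in the update below show). A minimal sketch of keeping the two sides on the same port, assuming the same model as later in this report:

# Server side: the port comes from the environment, not from main.py.
export MAX_SERVE_HOST=127.0.0.1
export MAX_SERVE_PORT=8001
max-pipelines serve --model-path sentence-transformers/all-mpnet-base-v2
# Client side: main.py should then target http://127.0.0.1:8001/v1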
Code of Conduct
I agree to follow this project's Code of Conduct
Update:
Note: The problem seems to arise in part because the process tries to create yet another magic environment. Given that I already had a project and was running inside the magic shell, I didn't need the further invocation, which only created problems.
Perhaps update the example and code to enable running just the 'max-pipelines' server with specific environment variables.
I was able to get this working by pulling apart the overly ambitious "Procfile". See the following example of how I got it to work properly:
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ env | grep TOKEN
HUGGING_FACE_HUB_TOKEN=hf_UEAT****************rEKMenJZH
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ export MAX_SERVE_PORT=8001 ; export MAX_SERVE_HOST=127.0.0.1
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ cat Procfile
llm: for i in $(seq 1 3); do MAX_SERVE_PORT=8001 MAX_SERVE_HOST=127.0.0.1 HUGGING_FACE_HUB_TOKEN=$(cat .env | grep HUGGING_FACE_HUB_TOKEN | cut -d '=' -f2) && max-pipelines serve --huggingface-repo-id sentence-transformers/all-mpnet-base-v2 && break || (echo "Attempt $i failed, retrying..." && sleep 5); done
main: magic run python main.py && kill -2 $(pgrep -f "max-pipelines serve")
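For what it's worth, the llm line above has a quirk: the && after the variable assignments means they are plain shell assignments rather than a prefix on the max-pipelines command, so the serve process only sees those values because they were exported in the shell beforehand. A tidier sketch of the same retry loop as a single shell command (a hypothetical rework, not the recipe's official Procfile; it assumes .env contains HUGGING_FACE_HUB_TOKEN=..., and the port is just the one I picked):

for i in 1 2 3; do
  # Prefix the variables onto the serve command itself so no prior export is needed.
  MAX_SERVE_HOST=127.0.0.1 MAX_SERVE_PORT=8001 \
  HUGGING_FACE_HUB_TOKEN=$(grep HUGGING_FACE_HUB_TOKEN .env | cut -d= -f2) \
  max-pipelines serve --model-path sentence-transformers/all-mpnet-base-v2 && break \
  || { echo "Attempt $i failed, retrying..."; sleep 5; }
done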
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ max-pipelines serve --huggingface-repo-id sentence-transformers/all-mpnet-base-v2
/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
21:51:19.454 INFO: 2597952 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
21:51:19.455 WARNING: 2597952 MainThread: opentelemetry.metrics._internal: Overriding of current MeterProvider is not allowed
21:51:20.146 WARNING: 2597952 MainThread: max.pipelines: --huggingface-repo-id is deprecated, use --model-path instead. This setting will stop working in a future release.
21:51:20.434 INFO: 2597952 MainThread: max.entrypoints.cli.serve: Starting server using sentence-transformers/all-mpnet-base-v2
config.json: 100%|████████████████| 571/571 [00:00<00:00, 6.77MB/s]
21:51:22.209 WARNING: 2597952 MainThread: max.pipelines: torch_dtype not available, cant infer encoding from config.json
21:51:22.924 INFO: 2597952 MainThread: max.pipelines:
Estimated memory consumption:
Weights: 418 MiB
KVCache allocation: 0 MiB
Total estimated: 418 MiB used / 5575 MiB free
Auto-inferred max sequence length: 514
Auto-inferred max batch size: 1
tokenizer_config.json: 100%|████████████████| 363/363 [00:00<00:00, 4.06MB/s]
vocab.txt: 100%|████████████████| 232k/232k [00:00<00:00, 5.02MB/s]
tokenizer.json: 100%|████████████████| 466k/466k [00:00<00:00, 7.36MB/s]
special_tokens_map.json: 100%|████████████████| 239/239 [00:00<00:00, 4.14MB/s]
21:51:24.157 INFO: 2597952 MainThread: max.serve: Server configured with no cache and batch size 1
21:51:24.157 INFO: 2597952 MainThread: max.serve: Settings: api_types=[<APIType.OPENAI: 'openai'>] host='127.0.0.1' port=8001 logs_console_level='INFO' logs_otlp_level=None logs_file_level=None logs_file_path=None disable_telemetry=False use_heartbeat=False mw_timeout_s=1200.0 mw_health_fail_s=60.0 telemetry_worker_spawn_timeout=60.0 runner_type=<RunnerType.PYTORCH: 'pytorch'>
21:51:24.168 INFO: 2597952 MainThread: max.serve: Launching server on http://127.0.0.1:8001
/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
21:51:26.973 INFO: 2597974 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
21:51:26.973 WARNING: 2597974 MainThread: opentelemetry.metrics._internal: Overriding of current MeterProvider is not allowed
/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
21:51:30.471 INFO: 2597987 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
21:51:30.471 WARNING: 2597987 MainThread: opentelemetry.metrics._internal: Overriding of current MeterProvider is not allowed
21:51:33.564 INFO: 2597987 MainThread: max.pipelines: Starting download of model: sentence-transformers/all-mpnet-base-v2
model.safetensors: 100%|███████████████▉| 438M/438M [00:04<00:00, 87.7MB/s]
100%|████████████████| 1/1 [00:05<00:00, 5.36s/it]
21:51:38.925 INFO: 2597987 MainThread: max.pipelines: Finished download of model: sentence-transformers/all-mpnet-base-v2 in 5.360789 seconds.
21:51:38.925 INFO: 2597987 MainThread: max.pipelines: Building and compiling model...
21:51:54.057 INFO: 2597987 MainThread: max.pipelines: Building and compiling model took 15.131542 seconds
21:51:54.072 INFO: 2597952 MainThread: max.serve: Server ready on http://127.0.0.1:8001 (Press CTRL+C to quit)

Thanks! Please make sure to run magic run clean when you're done, which cleans up the services running on the previously occupied ports. I'll be changing the ports to mostly unused ones in a PR.