Recipe Name
max-serve-openai-embeddings

Operating System
Linux

What happened?
In the examples, everything is shown using a global environment. Your underlying pixi environment manager (aka magic) can handle multiple project directories. It would be good to show examples with custom pixi/magic environments, so that running services can be overridden and don't all try to run on the same port 8000.
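To make the request concrete, here is roughly the kind of per-project setup I mean. This is only a sketch: it assumes the recipe's processes honor the MAX_SERVE_PORT / MAX_SERVE_HOST variables used in the update further down (the stock Procfile does not do this cleanly today), and the directories and ports are placeholders.

# Project A, in its own checkout and magic/pixi environment:
cd ~/work/recipe-a
export MAX_SERVE_HOST=127.0.0.1
export MAX_SERVE_PORT=8001   # placeholder port
magic run app

# Project B, in a second terminal, gets a different port so the two
# services never collide on 8000:
cd ~/work/recipe-b
export MAX_SERVE_PORT=8002   # placeholder port
magic run app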
Relevant log output
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ magic run app
21:23:29 system | llm.1 started (pid=2584200)
21:23:29 system | main.1 started (pid=2584202)
21:23:29 llm.1 | Global environments as specified in '/home/rfellows/.modular/manifests/pixi-global.toml'
21:23:29 llm.1 | └── max-pipelines: 25.2.0.dev2025022005 (already installed)
21:23:29 llm.1 | └─ exposes: max-serve, max-pipelines
21:23:30 main.1 | 2025-02-20 21:23:30,161 - __main__ - INFO - Waiting for server at http://0.0.0.0:8001/v1 to start (attempt 1/20)...
21:23:36 llm.1 | ✔ Environment max-pipelines was already up-to-date.
21:23:36 llm.1 | cat: .env: No such file or directory
21:23:38 llm.1 | /home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
21:23:38 llm.1 | warnings.warn(
21:23:39 llm.1 | Traceback (most recent call last):
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/bin/max-pipelines", line 6, in<module>
21:23:39 llm.1 | from max.entrypoints.pipelines import main
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/max/entrypoints/__init__.py", line 17, in<module>
21:23:39 llm.1 | from .llm import LLM
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/max/entrypoints/llm.py", line 34, in<module>
21:23:39 llm.1 | from max.serve.pipelines.model_worker import start_model_worker
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/max/serve/pipelines/model_worker.py", line 9, in<module>
21:23:39 llm.1 |configure_metrics(Settings())
21:23:39 llm.1 | ^^^^^^^^^^
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/pydantic_settings/main.py", line 171, in __init__
21:23:39 llm.1 |super().__init__(
21:23:39 llm.1 | File "/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/pydantic/main.py", line 214, in __init__
21:23:39 llm.1 | validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
21:23:39 llm.1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21:23:39 llm.1 | pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
21:23:39 llm.1 | port
21:23:39 llm.1 | Value error, port 8000 is already in use [type=value_error, input_value=8000, input_type=int]
21:23:39 llm.1 | For further information visit https://errors.pydantic.dev/2.10/v/value_error
21:23:40 llm.1 | Attempt 1 failed, retrying...
^C21:23:41 system | SIGINT received
21:23:41 system | sending SIGTERM to llm.1 (pid 2584200)
21:23:41 system | sending SIGTERM to main.1 (pid 2584202)
21:23:41 system | llm.1 stopped (rc=-15)
21:23:41 system | main.1 stopped (rc=-15)
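For reference, the ValidationError at the end of the log just means something was already listening on port 8000 on this machine. Standard Linux tooling (nothing recipe-specific) shows what is holding it:

# Use whichever of these is installed:
ss -ltnp | grep ':8000'
lsof -iTCP:8000 -sTCP:LISTEN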
Environment
Notice that I specifically stated to run on port 8001 in the "main.py" file; however, the underlying code still tries to run on port 8000.
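As far as I can tell, main.py only controls where the client looks (http://0.0.0.0:8001/v1 in the log above); the serve side takes its port from the MAX_SERVE_PORT / MAX_SERVE_HOST environment variables instead (this is what the Settings values in the update below show). A minimal sketch of keeping the two sides on the same port, assuming the same model as later in this report:

# Server side: the port comes from the environment, not from main.py.
export MAX_SERVE_HOST=127.0.0.1
export MAX_SERVE_PORT=8001
max-pipelines serve --model-path sentence-transformers/all-mpnet-base-v2
# Client side: main.py should then target http://127.0.0.1:8001/v1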
Code of Conduct
I agree to follow this project's Code of Conduct
Update:
Note: The problem seems to arise in part because the process tries to create yet another magic environment. Given that I already had a project and was running inside the magic shell, I didn't need the further invocation, which only created problems.
Perhaps update the example and code to enable running just the 'max-pipelines' server with specific environment variables.
I was able to get this working by pulling apart the overly ambitious "Procfile". See the following example of how I got it to work properly:
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ env | grep TOKEN
HUGGING_FACE_HUB_TOKEN=hf_UEAT****************rEKMenJZH
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ export MAX_SERVE_PORT=8001 ; export MAX_SERVE_HOST=127.0.0.1
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ cat Procfile
llm: for i in $(seq 1 3); do MAX_SERVE_PORT=8001 MAX_SERVE_HOST=127.0.0.1 HUGGING_FACE_HUB_TOKEN=$(cat .env | grep HUGGING_FACE_HUB_TOKEN | cut -d '=' -f2) && max-pipelines serve --huggingface-repo-id sentence-transformers/all-mpnet-base-v2 && break || (echo "Attempt $i failed, retrying..." && sleep 5); done
main: magic run python main.py && kill -2 $(pgrep -f "max-pipelines serve")
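For what it's worth, the llm line above has a quirk: the && after the variable assignments means they are plain shell assignments rather than a prefix on the max-pipelines command, so the serve process only sees those values because they were exported in the shell beforehand. A tidier sketch of the same retry loop as a single shell command (a hypothetical rework, not the recipe's official Procfile; it assumes .env contains HUGGING_FACE_HUB_TOKEN=..., and the port is just the one I picked):

for i in 1 2 3; do
  # Prefix the variables onto the serve command itself so no prior export is needed.
  MAX_SERVE_HOST=127.0.0.1 MAX_SERVE_PORT=8001 \
  HUGGING_FACE_HUB_TOKEN=$(grep HUGGING_FACE_HUB_TOKEN .env | cut -d= -f2) \
  max-pipelines serve --model-path sentence-transformers/all-mpnet-base-v2 && break \
  || { echo "Attempt $i failed, retrying..."; sleep 5; }
done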
(max-embeddings) rfellows@tag-965:~/Documents/Modular/max-recipes/max-serve-openai-embeddings$ max-pipelines serve --huggingface-repo-id sentence-transformers/all-mpnet-base-v2
/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
21:51:19.454 INFO: 2597952 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
21:51:19.455 WARNING: 2597952 MainThread: opentelemetry.metrics._internal: Overriding of current MeterProvider is not allowed
21:51:20.146 WARNING: 2597952 MainThread: max.pipelines: --huggingface-repo-id is deprecated, use --model-path instead. This setting will stop working in a future release.
21:51:20.434 INFO: 2597952 MainThread: max.entrypoints.cli.serve: Starting server using sentence-transformers/all-mpnet-base-v2
config.json: 100%|████████████████| 571/571 [00:00<00:00, 6.77MB/s]
21:51:22.209 WARNING: 2597952 MainThread: max.pipelines: torch_dtype not available, cant infer encoding from config.json
21:51:22.924 INFO: 2597952 MainThread: max.pipelines:
Estimated memory consumption:
Weights: 418 MiB
KVCache allocation: 0 MiB
Total estimated: 418 MiB used / 5575 MiB free
Auto-inferred max sequence length: 514
Auto-inferred max batch size: 1
tokenizer_config.json: 100%|████████████████| 363/363 [00:00<00:00, 4.06MB/s]
vocab.txt: 100%|████████████████| 232k/232k [00:00<00:00, 5.02MB/s]
tokenizer.json: 100%|████████████████| 466k/466k [00:00<00:00, 7.36MB/s]
special_tokens_map.json: 100%|████████████████| 239/239 [00:00<00:00, 4.14MB/s]
21:51:24.157 INFO: 2597952 MainThread: max.serve: Server configured with no cache and batch size 1
21:51:24.157 INFO: 2597952 MainThread: max.serve: Settings: api_types=[<APIType.OPENAI: 'openai'>] host='127.0.0.1' port=8001 logs_console_level='INFO' logs_otlp_level=None logs_file_level=None logs_file_path=None disable_telemetry=False use_heartbeat=False mw_timeout_s=1200.0 mw_health_fail_s=60.0 telemetry_worker_spawn_timeout=60.0 runner_type=<RunnerType.PYTORCH: 'pytorch'>
21:51:24.168 INFO: 2597952 MainThread: max.serve: Launching server on http://127.0.0.1:8001
/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
21:51:26.973 INFO: 2597974 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
21:51:26.973 WARNING: 2597974 MainThread: opentelemetry.metrics._internal: Overriding of current MeterProvider is not allowed
/home/rfellows/.modular/envs/max-pipelines/lib/python3.12/site-packages/transformers/utils/hub.py:106: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
21:51:30.471 INFO: 2597987 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
21:51:30.471 WARNING: 2597987 MainThread: opentelemetry.metrics._internal: Overriding of current MeterProvider is not allowed
21:51:33.564 INFO: 2597987 MainThread: max.pipelines: Starting download of model: sentence-transformers/all-mpnet-base-v2
model.safetensors: 100%|███████████████▉| 438M/438M [00:04<00:00, 87.7MB/s]
100%|████████████████| 1/1 [00:05<00:00, 5.36s/it]
21:51:38.925 INFO: 2597987 MainThread: max.pipelines: Finished download of model: sentence-transformers/all-mpnet-base-v2 in 5.360789 seconds.
21:51:38.925 INFO: 2597987 MainThread: max.pipelines: Building and compiling model...
21:51:54.057 INFO: 2597987 MainThread: max.pipelines: Building and compiling model took 15.131542 seconds
21:51:54.072 INFO: 2597952 MainThread: max.serve: Server ready on http://127.0.0.1:8001 (Press CTRL+C to quit)

Thanks! Please make sure to run magic run clean when you're done, which cleans up the services running on the previously occupied ports. I'll be changing the ports to mostly unused ones in a PR.