bug: PyTorch MLflow example's container returns 404 exception for API calls #4039

Open
majeranr opened this issue Jul 12, 2023 · 0 comments

Labels
bug Something isn't working

Describe the bug

I was following the official example, but I encountered a dependency error during containerization with the default conda configuration. I then found this issue and changed the bentofile.yaml to:

service: "service:svc"
include:
  - "service.py"
python:
  packages:
    - torch
    - torchvision
    - mlflow
    - protobuf
    - bentoml
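
If the unpinned list above resolves to incompatible releases, one option (a sketch, using the versions listed in the Environment section below; adjust to whatever combination you have tested) is to pin the packages:

```yaml
service: "service:svc"
include:
  - "service.py"
python:
  packages:
    - torch==1.8.1
    - torchvision==0.9.1
    - mlflow==1.30.1
    - protobuf
    - bentoml==1.0.23
```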

While bentoml serve service.py:svc works fine, and bentoml containerize completes successfully, the resulting container returns an error:

2023-07-12T11:15:06+0000 [INFO] [runner:mlflow_pytorch_mnist:1] _ (scheme=http,method=POST,path=http://127.0.0.1:8000/predict,type=application/octet-stream,length=9408) (status=404,type=text/plain; charset=utf-8,length=9) 2.084ms (trace=476453e61a33a7d0e6009adb8e691436,span=09664493bbd60a56,sampled=0,service.name=mlflow_pytorch_mnist)
2023-07-12T11:15:06+0000 [ERROR] [api_server:15] Exception on /predict [POST] (trace=476453e61a33a7d0e6009adb8e691436,span=5cd753eedaa0957d,sampled=0,service.name=mlflow_pytorch_mnist_demo)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
    output = await api.func(*args)
  File "/home/bentoml/bento/src/service.py", line 16, in predict
    return await mnist_runner.predict.async_run(input_arr)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
    return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 244, in async_run_method
    ) from None
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner mlflow_pytorch_mnist: [404] Not Found
2023-07-12T11:15:06+0000 [INFO] [api_server:15] 172.17.0.1:35336 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=500,type=application/json,length=2) 66.715ms (trace=476453e61a33a7d0e6009adb8e691436,span=5cd753eedaa0957d,sampled=0,service.name=mlflow_pytorch_mnist_demo)

The only thing I changed in mnist.py is the addition of an MLflow tracking server URI & experiment:

os.environ['MLFLOW_TRACKING_TOKEN']="<token>"
os.environ['MLFLOW_TRACKING_SERVER_CERT_PATH']="<path to cert>"

mlflow.set_tracking_uri("<mlflow instance's url>")
mlflow.set_experiment("<experiment's name>")

However, changing the framework in service.py from bentoml.mlflow to bentoml.pytorch (and adjusting the model's name) produces the same error.

I also tried changing service.py from:

import bentoml

mnist_runner = bentoml.mlflow.get("mlflow_pytorch_mnist:latest").to_runner()

svc = bentoml.Service("mlflow_pytorch_mnist_demo", runners=[mnist_runner])

input_spec = bentoml.io.NumpyNdarray(
    dtype="float32",
    shape=[-1, 1, 28, 28],
    enforce_dtype=True,
)


@svc.api(input=input_spec, output=bentoml.io.NumpyNdarray())
async def predict(input_arr):
    return await mnist_runner.predict.async_run(input_arr)

to:

import bentoml

mnist_runner = bentoml.mlflow.get("mlflow_pytorch_mnist:latest").to_runner()

svc = bentoml.Service("mlflow_pytorch_mnist_demo", runners=[mnist_runner])

input_spec = bentoml.io.NumpyNdarray(
    dtype="float32",
    shape=[-1, 1, 28, 28],
    enforce_dtype=True,
)


@svc.api(input=input_spec, output=bentoml.io.NumpyNdarray())
def predict(input_arr):
    return mnist_runner.predict.run(input_arr)

But it also produced the same error.

According to the log line:

2023-07-14T07:14:11+0000 [DEBUG] [api_server:10] Default runner method set to 'predict', it can be accessed both via 'runner.run' and 'runner.predict.async_run'.

I also tried changing service.py to:

import bentoml

mnist_runner = bentoml.mlflow.get("mlflow_pytorch_mnist:latest").to_runner()

svc = bentoml.Service("mlflow_pytorch_mnist_demo", runners=[mnist_runner])

input_spec = bentoml.io.NumpyNdarray(
    dtype="float32",
    shape=[-1, 1, 28, 28],
    enforce_dtype=True,
)


@svc.api(input=input_spec, output=bentoml.io.NumpyNdarray())
async def predict(input_arr):
    return await mnist_runner.run(input_arr)

But that ended with a different error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
    output = await api.func(*args)
  File "/home/bentoml/bento/src/service.py", line 20, in predict
    return await mnist_runner.run(input_arr)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 52, in run
    return self.runner._runner_handle.run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 290, in run_method
    *args,
  File "/usr/local/lib/python3.7/site-packages/anyio/from_thread.py", line 45, in run
    raise RuntimeError("This function can only be run from an AnyIO worker thread")
RuntimeError: This function can only be run from an AnyIO worker thread
2023-07-14T07:14:59+0000 [INFO] [api_server:9] 172.17.0.1:33646 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=500,type=application/json,length=110) 4.461ms (trace=bf16c819f82aadfe0a0292c52d7064ac,span=701c33aad979d04c,sampled=0,service.name=mlflow_pytorch_mnist_demo)

anyio 3.7.1 & aiohttp 3.8.4 are installed.
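
The RuntimeError above comes from anyio.from_thread.run, which can only bridge back into the event loop from a worker thread that AnyIO itself spawned; the synchronous runner.run path is meant for sync API functions, not for use inside an async def endpoint. The constraint is loosely analogous to the stdlib asyncio pattern below (a sketch, not BentoML code): blocking sync work is dispatched to a worker thread from a coroutine, rather than called directly.

```python
import asyncio
import time

def blocking_predict(x):
    # Stand-in for a synchronous, blocking runner call.
    time.sleep(0.01)
    return x * 2

async def endpoint(x):
    # Inside a coroutine, hand blocking sync work to a worker thread
    # instead of calling it directly (which would stall the event loop).
    return await asyncio.to_thread(blocking_predict, x)

result = asyncio.run(endpoint(21))
print(result)  # -> 42
```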

I've run multiple tests with different dependency versions (including pinning the same versions as in conda.yaml), a different model, etc., and the result is still the same.
I even opened a bash session in the container to verify that all required files are there, and they were.
However, running bentoml models list or bentoml list directly in the container returns no results; is that expected behaviour?

Example from BentoML Tutorial works fine.

Unfortunately, after deeper research and support from another person, I still have no idea what was not found.

To reproduce

Steps to reproduce:

  1. Follow the example until the bentoml containerize step.
  2. According to the issue, change the bentofile.yaml to:

service: "service:svc"
include:
  - "service.py"
python:
  packages:
    - torch
    - torchvision
    - mlflow
    - protobuf
    - bentoml

  3. Continue with the example from GitHub.
  4. Serve the model with docker run -it --rm -p 3000:3000 mlflow_pytorch_mnist_demo:latest serve
  5. Send the request with:

curl -X POST -H "Content-Type:application/json" -d @test_input.json http://localhost:3000/predict

  6. Result:
2023-07-12T11:22:41+0000 [INFO] [runner:mlflow_pytorch_mnist:1] _ (scheme=http,method=POST,path=http://127.0.0.1:8000/predict,type=application/octet-stream,length=9408) (status=404,type=text/plain; charset=utf-8,length=9) 1.489ms (trace=d6013656ce992cceb04176ef0dcc29b9,span=a86dcb8b3fd26f1b,sampled=0,service.name=mlflow_pytorch_mnist)
2023-07-12T11:22:41+0000 [ERROR] [api_server:14] Exception on /predict [POST] (trace=d6013656ce992cceb04176ef0dcc29b9,span=c773d1402b639afe,sampled=0,service.name=mlflow_pytorch_mnist_demo)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
    output = await api.func(*args)
  File "/home/bentoml/bento/src/service.py", line 20, in predict
    return await mnist_runner.predict.async_run(input_arr)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
    return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 244, in async_run_method
    ) from None
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner mlflow_pytorch_mnist: [404] Not Found
2023-07-12T11:22:41+0000 [INFO] [api_server:14] 172.17.0.1:35380 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=500,type=application/json,length=2) 59.387ms (trace=d6013656ce992cceb04176ef0dcc29b9,span=c773d1402b639afe,sampled=0,service.name=mlflow_pytorch_mnist_demo)
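
For anyone reproducing without the original test_input.json, a payload of the shape the service expects can be generated with a short script (a hedged sketch; the real file from the example contains actual MNIST pixel data, this just matches the (N, 1, 28, 28) shape enforced by the NumpyNdarray input spec):

```python
import json
import random

# Build a batch of 3 fake "images": shape (3, 1, 28, 28) as nested lists,
# which json-serializes into the body curl sends to /predict.
batch = [
    [[[random.random() for _ in range(28)] for _ in range(28)]]
    for _ in range(3)
]

with open("test_input.json", "w") as f:
    json.dump(batch, f)
```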

Expected behavior

The model should behave the same as with bentoml serve, i.e. return a 200 status and the prediction results:

2023-07-12T13:27:16+0200 [INFO] [runner:mlflow_pytorch_mnist:1] _ (scheme=http,method=POST,path=/predict,type=application/octet-stream,length=9408) (status=200,type=application/vnd.bentoml.NdarrayContainer,length=120) 5.956ms (trace=dd39b7dc12026999433920a234459293,span=e25a26f49ceca2f5,sampled=0,service.name=mlflow_pytorch_mnist)
2023-07-12T13:27:16+0200 [INFO] [api_server:14] 127.0.0.1:36430 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=200,type=application/json,length=634) 56.324ms (trace=dd39b7dc12026999433920a234459293,span=5a9bab64b0880d24,sampled=0,service.name=mlflow_pytorch_mnist_demo)

and

[[-3.6152830123901367, -5.332465171813965, -3.0992157459259033, -0.8537688255310059, -3.2960684299468994, -3.9919497966766357, -7.9404096603393555, -0.033282458782196045, -3.728358268737793, -0.5474755167961121], [-0.08536237478256226, -2.9697699546813965, -0.0837031900882721, -1.4068245887756348, -4.206423282623291, -1.7627538442611694, -0.270295113325119, -7.176737308502197, -0.5251529216766357, -4.596581935882568], [-2.902038097381592, -0.05778511241078377, -3.3463895320892334, -1.1108677387237549, -0.0533248595893383, -0.21076475083827972, -1.4418506622314453, -3.4429407119750977, -0.9558041095733643, -0.8879327178001404]]
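
The rows above look like log-softmax scores over the 10 digit classes, one row per input image. Assuming so, the predicted digit for each image is the index of the largest value in its row (a sketch, not part of the example):

```python
# Log-probability rows as returned by the service (one per input image).
log_probs = [
    [-3.6152830123901367, -5.332465171813965, -3.0992157459259033,
     -0.8537688255310059, -3.2960684299468994, -3.9919497966766357,
     -7.9404096603393555, -0.033282458782196045, -3.728358268737793,
     -0.5474755167961121],
    [-0.08536237478256226, -2.9697699546813965, -0.0837031900882721,
     -1.4068245887756348, -4.206423282623291, -1.7627538442611694,
     -0.270295113325119, -7.176737308502197, -0.5251529216766357,
     -4.596581935882568],
    [-2.902038097381592, -0.05778511241078377, -3.3463895320892334,
     -1.1108677387237549, -0.0533248595893383, -0.21076475083827972,
     -1.4418506622314453, -3.4429407119750977, -0.9558041095733643,
     -0.8879327178001404],
]

# Predicted class = argmax of each row.
predictions = [max(range(len(row)), key=row.__getitem__) for row in log_probs]
print(predictions)  # -> [7, 2, 4]
```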

Environment

bentoml: 1.0.23, tried also with 1.0.1 & 1.0.8
python: 3.7.16, tried also with 3.8
platform: Linux
mlflow: 1.30.1, tried also with 2.4, 2.4.2
torch: 1.8.1, tried also with 2.0.1
torchvision: 0.9.1, tried also with 0.15.2
