bug: PyTorch MLflow example's container returns 404 exception for API calls #4039

Open
majeranr opened this issue Jul 12, 2023 · 0 comments

Labels
bug Something isn't working

Describe the bug

I was following the official example, but I encountered a dependency error during containerization with the default conda configuration. I then found this issue and changed the bentofile.yaml to:

service: "service:svc"
include:
  - "service.py"
python:
  packages:
    - torch
    - torchvision
    - mlflow
    - protobuf
    - bentoml
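
If the unpinned list above resolves to incompatible releases, one option (a sketch, using the versions listed in the Environment section below; adjust to whatever combination you have tested) is to pin the packages:

```yaml
service: "service:svc"
include:
  - "service.py"
python:
  packages:
    - torch==1.8.1
    - torchvision==0.9.1
    - mlflow==1.30.1
    - protobuf
    - bentoml==1.0.23
```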

While bentoml serve service.py:svc works fine, and bentoml containerize completes successfully, the resulting container returns an error:

2023-07-12T11:15:06+0000 [INFO] [runner:mlflow_pytorch_mnist:1] _ (scheme=http,method=POST,path=http://127.0.0.1:8000/predict,type=application/octet-stream,length=9408) (status=404,type=text/plain; charset=utf-8,length=9) 2.084ms (trace=476453e61a33a7d0e6009adb8e691436,span=09664493bbd60a56,sampled=0,service.name=mlflow_pytorch_mnist)
2023-07-12T11:15:06+0000 [ERROR] [api_server:15] Exception on /predict [POST] (trace=476453e61a33a7d0e6009adb8e691436,span=5cd753eedaa0957d,sampled=0,service.name=mlflow_pytorch_mnist_demo)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
    output = await api.func(*args)
  File "/home/bentoml/bento/src/service.py", line 16, in predict
    return await mnist_runner.predict.async_run(input_arr)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
    return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 244, in async_run_method
    ) from None
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner mlflow_pytorch_mnist: [404] Not Found
2023-07-12T11:15:06+0000 [INFO] [api_server:15] 172.17.0.1:35336 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=500,type=application/json,length=2) 66.715ms (trace=476453e61a33a7d0e6009adb8e691436,span=5cd753eedaa0957d,sampled=0,service.name=mlflow_pytorch_mnist_demo)

The only thing I changed in mnist.py is the addition of an MLflow tracking server URI & experiment:

os.environ['MLFLOW_TRACKING_TOKEN']="<token>"
os.environ['MLFLOW_TRACKING_SERVER_CERT_PATH']="<path to cert>"

mlflow.set_tracking_uri("<mlflow instance's url>")
mlflow.set_experiment("<experiment's name>")

However, changing the framework in service.py from bentoml.mlflow to bentoml.pytorch (and adjusting the model's name) produces the same error.

I also tried changing service.py from:

import bentoml

mnist_runner = bentoml.mlflow.get("mlflow_pytorch_mnist:latest").to_runner()

svc = bentoml.Service("mlflow_pytorch_mnist_demo", runners=[mnist_runner])

input_spec = bentoml.io.NumpyNdarray(
    dtype="float32",
    shape=[-1, 1, 28, 28],
    enforce_dtype=True,
)


@svc.api(input=input_spec, output=bentoml.io.NumpyNdarray())
async def predict(input_arr):
    return await mnist_runner.predict.async_run(input_arr)

to:

import bentoml

mnist_runner = bentoml.mlflow.get("mlflow_pytorch_mnist:latest").to_runner()

svc = bentoml.Service("mlflow_pytorch_mnist_demo", runners=[mnist_runner])

input_spec = bentoml.io.NumpyNdarray(
    dtype="float32",
    shape=[-1, 1, 28, 28],
    enforce_dtype=True,
)


@svc.api(input=input_spec, output=bentoml.io.NumpyNdarray())
def predict(input_arr):
    return mnist_runner.predict.run(input_arr)

But it also produced the same error.

According to the log line:

2023-07-14T07:14:11+0000 [DEBUG] [api_server:10] Default runner method set to 'predict', it can be accessed both via 'runner.run' and 'runner.predict.async_run'.

I also tried changing service.py to:

import bentoml

mnist_runner = bentoml.mlflow.get("mlflow_pytorch_mnist:latest").to_runner()

svc = bentoml.Service("mlflow_pytorch_mnist_demo", runners=[mnist_runner])

input_spec = bentoml.io.NumpyNdarray(
    dtype="float32",
    shape=[-1, 1, 28, 28],
    enforce_dtype=True,
)


@svc.api(input=input_spec, output=bentoml.io.NumpyNdarray())
async def predict(input_arr):
    return await mnist_runner.run(input_arr)

But that ended with a different error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
    output = await api.func(*args)
  File "/home/bentoml/bento/src/service.py", line 20, in predict
    return await mnist_runner.run(input_arr)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 52, in run
    return self.runner._runner_handle.run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 290, in run_method
    *args,
  File "/usr/local/lib/python3.7/site-packages/anyio/from_thread.py", line 45, in run
    raise RuntimeError("This function can only be run from an AnyIO worker thread")
RuntimeError: This function can only be run from an AnyIO worker thread
2023-07-14T07:14:59+0000 [INFO] [api_server:9] 172.17.0.1:33646 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=500,type=application/json,length=110) 4.461ms (trace=bf16c819f82aadfe0a0292c52d7064ac,span=701c33aad979d04c,sampled=0,service.name=mlflow_pytorch_mnist_demo)

anyio 3.7.1 & aiohttp 3.8.4 are installed.
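
The RuntimeError above comes from anyio.from_thread.run, which can only bridge back into the event loop from a worker thread that AnyIO itself spawned; the synchronous runner.run path is meant for sync API functions, not for use inside an async def endpoint. The constraint is loosely analogous to the stdlib asyncio pattern below (a sketch, not BentoML code): blocking sync work is dispatched to a worker thread from a coroutine, rather than called directly.

```python
import asyncio
import time

def blocking_predict(x):
    # Stand-in for a synchronous, blocking runner call.
    time.sleep(0.01)
    return x * 2

async def endpoint(x):
    # Inside a coroutine, hand blocking sync work to a worker thread
    # instead of calling it directly (which would stall the event loop).
    return await asyncio.to_thread(blocking_predict, x)

result = asyncio.run(endpoint(21))
print(result)  # -> 42
```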

I've run multiple tests with different dependency versions (including pinning the same versions as in conda.yaml), a different model, etc., and the result is still the same.
I even opened a bash session in the container to verify that all required files are there, and they were.
However, running bentoml models list or bentoml list directly in the container returns no results; is that expected behaviour?

Example from BentoML Tutorial works fine.

Unfortunately, after deeper research and support from another person, I still have no idea what was not found.

To reproduce

Steps to reproduce:

  1. Follow the example until the bentoml containerize step.
  2. According to the issue, change the bentofile.yaml to:

service: "service:svc"
include:
  - "service.py"
python:
  packages:
    - torch
    - torchvision
    - mlflow
    - protobuf
    - bentoml

  3. Continue with the example from GitHub.
  4. Serve the model with docker run -it --rm -p 3000:3000 mlflow_pytorch_mnist_demo:latest serve
  5. Send the request with:

curl -X POST -H "Content-Type:application/json" -d @test_input.json http://localhost:3000/predict

  6. Result:
2023-07-12T11:22:41+0000 [INFO] [runner:mlflow_pytorch_mnist:1] _ (scheme=http,method=POST,path=http://127.0.0.1:8000/predict,type=application/octet-stream,length=9408) (status=404,type=text/plain; charset=utf-8,length=9) 1.489ms (trace=d6013656ce992cceb04176ef0dcc29b9,span=a86dcb8b3fd26f1b,sampled=0,service.name=mlflow_pytorch_mnist)
2023-07-12T11:22:41+0000 [ERROR] [api_server:14] Exception on /predict [POST] (trace=d6013656ce992cceb04176ef0dcc29b9,span=c773d1402b639afe,sampled=0,service.name=mlflow_pytorch_mnist_demo)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
    output = await api.func(*args)
  File "/home/bentoml/bento/src/service.py", line 20, in predict
    return await mnist_runner.predict.async_run(input_arr)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
    return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 244, in async_run_method
    ) from None
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner mlflow_pytorch_mnist: [404] Not Found
2023-07-12T11:22:41+0000 [INFO] [api_server:14] 172.17.0.1:35380 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=500,type=application/json,length=2) 59.387ms (trace=d6013656ce992cceb04176ef0dcc29b9,span=c773d1402b639afe,sampled=0,service.name=mlflow_pytorch_mnist_demo)
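
For anyone reproducing without the original test_input.json, a payload of the shape the service expects can be generated with a short script (a hedged sketch; the real file from the example contains actual MNIST pixel data, this just matches the (N, 1, 28, 28) shape enforced by the NumpyNdarray input spec):

```python
import json
import random

# Build a batch of 3 fake "images": shape (3, 1, 28, 28) as nested lists,
# which json-serializes into the body curl sends to /predict.
batch = [
    [[[random.random() for _ in range(28)] for _ in range(28)]]
    for _ in range(3)
]

with open("test_input.json", "w") as f:
    json.dump(batch, f)
```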

Expected behavior

The model should behave the same as with bentoml serve, i.e. return a 200 status and the prediction results:

2023-07-12T13:27:16+0200 [INFO] [runner:mlflow_pytorch_mnist:1] _ (scheme=http,method=POST,path=/predict,type=application/octet-stream,length=9408) (status=200,type=application/vnd.bentoml.NdarrayContainer,length=120) 5.956ms (trace=dd39b7dc12026999433920a234459293,span=e25a26f49ceca2f5,sampled=0,service.name=mlflow_pytorch_mnist)
2023-07-12T13:27:16+0200 [INFO] [api_server:14] 127.0.0.1:36430 (scheme=http,method=POST,path=/predict,type=application/json,length=24007) (status=200,type=application/json,length=634) 56.324ms (trace=dd39b7dc12026999433920a234459293,span=5a9bab64b0880d24,sampled=0,service.name=mlflow_pytorch_mnist_demo)

and

[[-3.6152830123901367, -5.332465171813965, -3.0992157459259033, -0.8537688255310059, -3.2960684299468994, -3.9919497966766357, -7.9404096603393555, -0.033282458782196045, -3.728358268737793, -0.5474755167961121], [-0.08536237478256226, -2.9697699546813965, -0.0837031900882721, -1.4068245887756348, -4.206423282623291, -1.7627538442611694, -0.270295113325119, -7.176737308502197, -0.5251529216766357, -4.596581935882568], [-2.902038097381592, -0.05778511241078377, -3.3463895320892334, -1.1108677387237549, -0.0533248595893383, -0.21076475083827972, -1.4418506622314453, -3.4429407119750977, -0.9558041095733643, -0.8879327178001404]]
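
The rows above look like log-softmax scores over the 10 digit classes, one row per input image. Assuming so, the predicted digit for each image is the index of the largest value in its row (a sketch, not part of the example):

```python
# Log-probability rows as returned by the service (one per input image).
log_probs = [
    [-3.6152830123901367, -5.332465171813965, -3.0992157459259033,
     -0.8537688255310059, -3.2960684299468994, -3.9919497966766357,
     -7.9404096603393555, -0.033282458782196045, -3.728358268737793,
     -0.5474755167961121],
    [-0.08536237478256226, -2.9697699546813965, -0.0837031900882721,
     -1.4068245887756348, -4.206423282623291, -1.7627538442611694,
     -0.270295113325119, -7.176737308502197, -0.5251529216766357,
     -4.596581935882568],
    [-2.902038097381592, -0.05778511241078377, -3.3463895320892334,
     -1.1108677387237549, -0.0533248595893383, -0.21076475083827972,
     -1.4418506622314453, -3.4429407119750977, -0.9558041095733643,
     -0.8879327178001404],
]

# Predicted class = argmax of each row.
predictions = [max(range(len(row)), key=row.__getitem__) for row in log_probs]
print(predictions)  # -> [7, 2, 4]
```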

Environment

bentoml: 1.0.23, tried also with 1.0.1 & 1.0.8
python: 3.7.16, tried also with 3.8
platform: Linux
mlflow: 1.30.1, tried also with 2.4, 2.4.2
torch: 1.8.1, tried also with 2.0.1
torchvision: 0.9.1, tried also with 0.15.2
