Problem loading PyTorch model #126
Replies: 59 comments 3 replies
-
Thanks for reporting the issue.
-
Here you can add 0.6.0 to replace the djl.version. It looks like you had dependencies missing in your environment; to proceed further, could you please provide the following information:
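For instance, pinning the version in the pom.xml `properties` section might look like the following sketch (0.6.0 is the version discussed in this thread; adapt to your own pom):

```xml
<properties>
    <!-- Pin the DJL version explicitly instead of relying on an inherited djl.version -->
    <djl.version>0.6.0</djl.version>
</properties>
```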
-
@eusebioaguilera
-
No, the same error appears using this property.
-
Hi, I have used version 0.6.0 instead of the djl.version defined, and there is no difference.
-
The full stacktrace is:
-
Will take a look and get back to you tomorrow.
-
@eusebioaguilera With debug logging enabled, we can see more engine initialization logs. Also, can you check out our DJL master branch (use the latest code) and run the following command:
-
Hi, I have enabled the debug log, and the problem seems to be that the native PyTorch library is not loaded:
[main] DEBUG ai.djl.repository.zoo.ModelZoo - Searching model in zoo provider: ai.djl.repository.zoo.DefaultZooProvider
However, the native PyTorch component is supplied by the dependency defined in the pom.xml file (artifactId pytorch-native-auto), right? Do I have to supply the path to the PyTorch library, or add some other extra configuration, to solve this issue? Thank you in advance.
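As a side note on how debug logging can be enabled: if your project happens to use slf4j-simple as the logging backend (an assumption; adapt the equivalent setting for logback or log4j2), a properties file on the classpath is enough:

```properties
# src/main/resources/simplelogger.properties
# Turns on DEBUG-level output, which makes DJL's engine-initialization logs visible.
org.slf4j.simpleLogger.defaultLogLevel=debug
```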
-
Currently PyTorch supports CUDA 9.2, 10.1, and 10.2 only. CUDA 10.0 will not work because there is no matching binary. You can try installing a different CUDA version to solve this issue.
-
@lanking520 @eusebioaguilera This seems to be a bug. We will look into it.
-
Yes, if I install CUDA 10.2, the library starts downloading a set of DLLs and then fails for another reason. The question is: do I need a suitable version of CUDA installed on my system to use the library? In my case I want to use the model for inference using the pytorch-native-cpu artifact.
-
Currently it seems to be a bug that it didn't fall back to using the CPU. As an alternative, try to replace auto by using this:
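Such a replacement might look like the pom.xml fragment below. This is a sketch, not the exact snippet from the comment: the classifier and native-library version are assumptions and must match your platform and DJL release.

```xml
<!-- Hypothetical illustration: swap pytorch-native-auto for the CPU-only artifact -->
<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-native-cpu</artifactId>
    <!-- classifier and version are illustrative; pick the ones matching your OS and DJL release -->
    <classifier>win-x86_64</classifier>
    <version>1.5.0</version>
    <scope>runtime</scope>
</dependency>
```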
-
@eusebioaguilera However, it seems something is wrong in your system. The downloaded .jar file seems corrupted. Would you please clear your Gradle cache folder and rerun the example? What's the error message when you use CUDA 10.2?
-
I have uninstalled CUDA 10.2 and tried to use the artifact for the CPU version (pytorch-native-cpu), and I get the same error: "No deep learning engine found". This error disappears when CUDA 10.2 is installed.
-
With that command I get the same output.
Isn't the JNA provided by the dependencies?
-
@carlosuc3m java.lang.NoSuchMethodError indicates that another JNA jar file is included in the classpath. I guess it came from another library you included in your project.
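One stdlib-only way to track down which jar actually supplies a class (useful for finding the duplicate JNA) is to print the class's code source. The sketch below probes java.lang.String as a stand-in, since it is always present; in your project you would probe com.sun.jna.Native instead:

```java
public class WhichJar {
    public static void main(String[] args) {
        // Replace String.class with com.sun.jna.Native.class in your project
        // to see which jar on the classpath actually supplies JNA.
        Class<?> probe = String.class;
        java.security.CodeSource src = probe.getProtectionDomain().getCodeSource();
        // Bootstrap classes like java.lang.String have no code source, so this prints "null";
        // a class loaded from a jar prints that jar's location instead.
        System.out.println(src == null ? "null" : src.getLocation().toString());
    }
}
```

Running this with the JNA class substituted in shows the offending jar's path, which tells you which dependency to exclude.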
-
I finally was able to run Gradle and this is what I got:
Yes, my project uses another library that requires loading native libraries, but with the installation of the Visual Studio 2019 redistributables there was no problem running both libraries.
-
@carlosuc3m A separate issue I noticed in your log is that you are using CUDA 10.0; we don't support CUDA 10.0, only 10.1 and 10.2. The PyTorch downloaded on your Windows machine is the CPU-only version.
-
So in the new version we still need to install the VC 2019 redistributables? I understood that with the new version they were no longer needed.
-
@carlosuc3m
-
Hello again,
or
I always get the same error. I then installed CUDA 10.2 on my computer to see if it worked, but again an error:
I am using a Windows Server 2016 with the following pom.xml:
Regards and thanks for your time,
-
Using
still produces the same error.
-
However, if I uninstall CUDA 10.0 and delete C:\Program Files\NVIDIA GPU Computing Toolkit and C:\Program Files\NVIDIA Corporation, it works well either with CUDA 10.2 or without any CUDA, using just the CPU.
-
I have also realised that removing all the environment variables related to CUDA from the PATH allows falling back to the CPU. If any of those variables is present and an incompatible CUDA version is installed, the fallback does not happen in version 1.6.0.
However, I have not found a way to have two CUDA versions installed and make the program run well.
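As a stdlib-only sketch of that diagnosis, the snippet below scans a hard-coded, illustrative Windows-style PATH value for CUDA entries; on a real machine you would pass System.getenv("PATH") instead of the sample string:

```java
public class CudaPathScan {
    public static void main(String[] args) {
        // Illustrative sample value; use System.getenv("PATH") on a real machine.
        String path = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin;"
                + "C:\\Windows\\system32";
        // Windows PATH entries are separated by ';'.
        for (String entry : path.split(";")) {
            if (entry.toLowerCase().contains("cuda")) {
                // Entries like this are what the CPU fallback appears to trip over.
                System.out.println("CUDA on PATH: " + entry);
            }
        }
    }
}
```

If this prints any entries for an unsupported CUDA version (such as 10.0), removing those directories from PATH is what allowed the CPU fallback in this thread.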
-
@carlosuc3m We don't support CUDA 10.0; we only support CUDA 10.1 and 10.2. See: https://github.com/awslabs/djl/blob/master/pytorch/pytorch-engine/README.md#windows
-
@carlosuc3m
-
Hello again,
Without this, djl was unable to download the correct native library. |
-
@carlosuc3m @lopezmt JNA is a transitive dependency of ai.djl:api. If you have api as a dependency, JNA will be added automatically.
See our example pom.xml file; we don't have JNA added explicitly: https://github.com/awslabs/djl/blob/master/examples/pom.xml
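For reference, a minimal dependency block relying on that transitive JNA might look like this sketch (the version is illustrative):

```xml
<!-- ai.djl:api pulls in JNA transitively; no explicit JNA entry is needed -->
<dependency>
    <groupId>ai.djl</groupId>
    <artifactId>api</artifactId>
    <version>0.6.0</version>
</dependency>
```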
-
This fixed it for me:
-
Question
Hi everyone. I am trying to use this library to serve deep neural network models for a web service. I'm trying to load a custom PyTorch model for landmark detection. I have followed the method to convert the model (https://github.com/awslabs/djl/blob/master/docs/pytorch/how_to_convert_your_model_to_torchscript.md), then I used the example code to load a model from a local file (https://github.com/awslabs/djl/blob/master/docs/load_model.md). I have encountered the error "No deep learning engine found." However, I have checked my dependencies in the pom.xml and have not found the problem (djl.version is also defined in the pom.xml file):
Moreover, I have debugged the code: it fails in ModelZoo.java, in the loadModel method, on the line:
Set<String> supportedEngine = zoo.getSupportedEngines();
The variable zoo is of type DefaultModelZoo and its modelLoaders array is empty. Can anyone help me with this issue?
Thank you in advance.