segmentation fault at converting llama model #470

Open
mhyeonsoo opened this issue Jan 17, 2025 · 2 comments
mhyeonsoo commented Jan 17, 2025

Description of the bug:

I downloaded the llama3.2-1b-instruct model in order to convert it to TFLite. From the ai_edge_torch/generative/examples/llama directory, I ran:

python convert_to_tflite.py --checkpoint_path=model.safetensors
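
For context, convert_to_tflite.py is essentially a wrapper around the core ai_edge_torch conversion API. A minimal sketch of that flow on a toy module (SmallNet and its input shape are hypothetical; the real script re-authors the llama model from the checkpoint and exports several prefill/decode signatures):

import ai_edge_torch
import torch
import torch.nn as nn

# Hypothetical stand-in for the re-authored llama module.
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc(x)

model = SmallNet().eval()  # eval() avoids the training-mode warnings seen below
sample_inputs = (torch.randn(1, 8),)

# Trace the module with torch.export and lower it to a TFLite flatbuffer.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("small_net.tflite")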

Running the script produced the output below:

2025-01-17 11:23:48.510034: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-17 11:23:49.101093: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/gea-ai/.local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:335: UserWarning: Device capability of jax unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
  warnings.warn(
WARNING:2025-01-17 11:23:50,990:jax._src.xla_bridge:969: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
2025-01-17 11:23:51.533181: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-01-17 11:23:51.534009: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-01-17 11:23:51.534179: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
W0117 11:23:59.496164 140590083385152 conversion.py:77] Your model "prefill_8" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
W0117 11:23:59.496322 140590083385152 conversion.py:77] Your model "prefill_64" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
W0117 11:23:59.496353 140590083385152 conversion.py:77] Your model "prefill_128" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
W0117 11:23:59.496378 140590083385152 conversion.py:77] Your model "prefill_256" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
W0117 11:23:59.496402 140590083385152 conversion.py:77] Your model "prefill_512" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
W0117 11:23:59.496423 140590083385152 conversion.py:77] Your model "prefill_1024" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
W0117 11:23:59.496453 140590083385152 conversion.py:77] Your model "decode" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
Fatal Python error: Segmentation fault

Current thread 0x00007fddadfaa740 (most recent call first):
  File "/home/user/.local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 319 in _lazy_init
  File "/home/user/.local/lib/python3.11/site-packages/torch/cuda/random.py", line 33 in get_rng_state
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 208 in _fn
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322 in transform_code_object
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 699 in _compile_inner
  File "/home/user/.local/lib/python3.11/site-packages/torch/_utils_internal.py", line 87 in wrapper_function
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 666 in compile_inner
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 924 in _compile
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 526 in __call__
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1269 in __call__
  File "/home/user/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/home/user/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 465 in _fn
  File "/home/user/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/home/user/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/home/user/.local/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 1432 in inner
  File "/home/user/.local/lib/python3.11/site-packages/torch/export/_trace.py", line 560 in _export_to_torch_ir
  File "/home/user/.local/lib/python3.11/site-packages/torch/export/_trace.py", line 1252 in _strict_export_lower_to_aten_ir
  File "/home/user.local/lib/python3.11/site-packages/torch/export/_trace.py", line 1224 in _strict_export
  File "/home/user/.local/lib/python3.11/site-packages/torch/export/_trace.py", line 1880 in _export
  File "/home/user/.local/lib/python3.11/site-packages/torch/export/exported_program.py", line 114 in wrapper
  File "/home/user/.local/lib/python3.11/site-packages/torch/export/_trace.py", line 990 in wrapper
  File "/home/user/.local/lib/python3.11/site-packages/torch/export/__init__.py", line 270 in export
  File "/home/user/.local/lib/python3.11/site-packages/ai_edge_torch/_convert/conversion.py", line 126 in export
  File "/home/user/.local/lib/python3.11/site-packages/ai_edge_torch/_convert/conversion.py", line 139 in <listcomp>
  File "/home/user/.local/lib/python3.11/site-packages/ai_edge_torch/_convert/conversion.py", line 138 in convert_signatures
  File "/home/user/.local/lib/python3.11/site-packages/ai_edge_torch/_convert/converter.py", line 172 in convert
  File "/home/user/.local/lib/python3.11/site-packages/ai_edge_torch/generative/utilities/converter.py", line 214 in _export_helper
  File "/home/user/.local/lib/python3.11/site-packages/ai_edge_torch/generative/utilities/converter.py", line 119 in convert_to_tflite
  File "/mount/workspace/09.LLM/ai-edge-torch/ai_edge_torch/generative/examples/llama/convert_to_tflite.py", line 79 in main
  File "/home/user/.local/lib/python3.11/site-packages/absl/app.py", line 254 in _run_main
  File "/home/user/.local/lib/python3.11/site-packages/absl/app.py", line 308 in run
  File "/mount/workspace/09.LLM/ai-edge-torch/ai_edge_torch/generative/examples/llama/convert_to_tflite.py", line 91 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, jaxlib.cpu_feature_guard, PIL._imaging, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 78)
Segmentation fault (core dumped)

Actual vs expected behavior:

Expected to get a converted TFLite model, but the conversion crashed with a segmentation fault instead.
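
As an aside, the conversion log above warns that every signature is exported in training mode. A minimal sketch of the fix that warning suggests, shown on a generic module (the toy model here is hypothetical; the llama example may already handle this internally):

import torch.nn as nn

# Toy stand-in for the module being converted (hypothetical).
model = nn.Sequential(nn.Linear(16, 16), nn.Dropout(0.1))

model.eval()           # disables dropout etc. before export, as the warning suggests
print(model.training)  # -> False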

Any other information you'd like to share?

This is my environment:

  • python: 3.11.11
  • tensorflow: 2.13.0
  • torch: 2.5.0+cu118
  • cuda: 11.8
  • cudnn: 8.7.0
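
A quick way to double-check these version numbers (a minimal sketch; both packages expose standard version attributes):

import torch
import tensorflow as tf

print("torch:", torch.__version__)                       # e.g. 2.5.0+cu118
print("tensorflow:", tf.__version__)                     # e.g. 2.13.0
print("cuda used by torch build:", torch.version.cuda)   # e.g. 11.8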

gaikwadrahul8 commented Jan 22, 2025

Hi, @mhyeonsoo
I apologize for the delay in my response. I have been able to replicate similar behavior on my end while using a GPU; please refer to this gist-file. We'll have to dig more into this issue and will update you.

I also attempted to use the CPU instead of the GPU within Google Colab; however, I encountered a different error: RuntimeError: Cannot set version_counter for inference tensor. This error is also documented in this gist-file.
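
For what it's worth, since the original traceback dies inside torch.cuda._lazy_init, one way to rule the GPU in or out is to hide all CUDA devices before torch initializes (this uses the standard CUDA_VISIBLE_DEVICES mechanism; it is a diagnostic sketch, not a fix):

import os

# Must run before torch (or anything that imports it) touches CUDA,
# so torch.cuda._lazy_init never talks to the driver.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
print(torch.cuda.is_available())  # -> False when devices are hidden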

EDIT: I would appreciate it if you could attempt using commit 5a93316; another user reported success with that commit, please refer to this comment: #447 (comment)

Thank you for your understanding and patience.

mhyeonsoo (Author) commented

Hi @gaikwadrahul8 ,
Thanks for the response.
I have reviewed 5a93316 and confirmed that I am already using those code changes.

I look forward to hearing an update from you :)
Thanks,

pkgoogle assigned gaikwadrahul8 and unassigned pkgoogle on Jan 24, 2025