ipex.llm provides dedicated optimizations for running Large Language Models (LLMs) faster, including techniques such as paged attention and ROPE fusion.
A set of data types is supported for various scenarios, including FP32, BF16, Smooth Quantization INT8, and Weight Only Quantization INT8/INT4 (prototype).
Note: The instructions in this section set up an environment with a recent PyTorch* nightly build and the latest source build of IPEX.
If you would like to use stable PyTorch* and IPEX release versions, please refer to the instructions in the release branch, in which IPEX is installed from prebuilt wheels using pip install rather than built from source.
Note: To enable the latest optimizations for MoE models (DeepSeek, Mixtral, etc.) in DeepSpeed, env_setup.sh in IPEX v2.6.0+cpu is given a different argument than in previous versions, so that DeepSpeed is built from source code at a recent commit.
# Get the Intel® Extension for PyTorch\* source code
git clone https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
git submodule sync
git submodule update --init --recursive
# Build an image with the provided Dockerfile by compiling Intel® Extension for PyTorch\* from source
# For a custom SSH server port in multi-node runs, add --build-arg PORT_SSH=<CUSTOM_PORT> (e.g. 2345); otherwise the default SSH port 22 is used
docker build -f examples/cpu/llm/Dockerfile --build-arg COMPILE=ON --build-arg PORT_SSH=2345 -t ipex-llm:main .
# Run the container with command below
docker run --rm -it --net host --privileged -v /dev/shm:/dev/shm ipex-llm:main bash
# When the command prompt shows inside the docker container, enter llm examples directory
cd llm
# Activate environment variables
# set bash script argument to "inference" or "fine-tuning" for different usages
source ./tools/env_activate.sh [inference|fine-tuning]
# Get the Intel® Extension for PyTorch\* source code
git clone https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
git submodule sync
git submodule update --init --recursive
# GCC 12.3 is required. Installation can be taken care of by the environment configuration script.
# Create a conda environment
conda create -n llm python=3.10 -y
conda activate llm
# Setup the environment with the provided script
cd examples/cpu/llm
bash ./tools/env_setup.sh 8
# Activate environment variables
# set bash script argument to "inference" or "fine-tuning" for different usages
source ./tools/env_activate.sh [inference|fine-tuning]
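As a quick sanity check on the GCC requirement noted in the conda steps above, the compiler on the PATH can be compared against 12.3. This is a minimal sketch and not part of the IPEX tooling: the helper name `version_ge` is hypothetical, it relies on GNU `sort -V`, and it assumes "required" means 12.3 or newer, which the text above does not state explicitly.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is greater than or equal to version B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="12.3"   # taken from the note in the setup steps; ">=" is an assumption

if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion is available in newer GCC releases; fall back to -dumpversion.
    found="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
    if version_ge "$found" "$required"; then
        echo "gcc $found satisfies >= $required"
    else
        echo "gcc $found is older than $required"
    fi
else
    echo "gcc not found on PATH"
fi
```

If the check reports an older or missing compiler, the environment configuration script is described above as able to take care of the installation.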
After setting up your Docker or conda environment, you may follow these additional steps to set up and run Jupyter Notebooks. The port number can be changed.
# Install dependencies
pip install notebook matplotlib
# Launch Jupyter Notebook
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
- Open up a web browser with the given URL and token.
- Open the notebook.
- Run all cells.
# Install dependencies
pip install notebook ipykernel matplotlib
# Register ipykernel with Conda
python -m ipykernel install --user --name=IPEX-LLM
# Launch Jupyter Notebook
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
- Open up a web browser with the given URL and token.
- Open the notebook.
- Change your Jupyter Notebook kernel to IPEX-LLM.
- Run all cells.
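Since the port number can be changed, one way to avoid editing the launch line by hand is to generate it from a parameter. The sketch below is illustrative only: `jupyter_launch_cmd` is a hypothetical helper, and restricting ports to 1024-65535 is a choice made in this sketch, not a requirement of Jupyter or IPEX. It only prints the command, so nothing is launched until the printed line is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IPEX): print the Jupyter launch command
# for a chosen port. Defaults to 8888, the port used in the steps above.
jupyter_launch_cmd() {
    local port="${1:-8888}"
    # Reject anything that is not a plain non-negative integer.
    case "$port" in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    # Range limit is an assumption of this sketch (unprivileged TCP ports).
    if [ "$port" -lt 1024 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range (1024-65535): $port" >&2
        return 1
    fi
    printf 'jupyter notebook --ip 0.0.0.0 --port %s --allow-root\n' "$port"
}

jupyter_launch_cmd 8890
# prints: jupyter notebook --ip 0.0.0.0 --port 8890 --allow-root
```

The printed line can be copied into the terminal, or run directly with `eval "$(jupyter_launch_cmd 8890)"`.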
Note: The env_setup.sh script downloads a prompt.json file, which provides prompt samples with pre-defined input token lengths for benchmarking.
To benchmark Llama-3 models, users need to download a specific prompt.json file, overwriting the original one.
wget -O prompt.json https://intel-extension-for-pytorch.s3.amazonaws.com/miscellaneous/llm/prompt-3.json
The original prompt.json file can be restored from the repository if needed.
wget -O prompt.json https://intel-extension-for-pytorch.s3.amazonaws.com/miscellaneous/llm/prompt.json
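Switching between the two prompt files can also be done without downloading the original again, by keeping a local backup before overwriting. The following is a minimal sketch under that assumption; `swap_prompt` and the `.orig` suffix are hypothetical names introduced here and are not part of env_setup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of env_setup.sh): install a new prompt file
# while keeping a one-time backup of the file it replaces.
swap_prompt() {
    local new_file="$1"
    local target="${2:-prompt.json}"
    if [ ! -f "$new_file" ]; then
        echo "no such file: $new_file" >&2
        return 1
    fi
    # Back up only once, so the backup always holds the original file.
    if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
        cp "$target" "$target.orig"
    fi
    cp "$new_file" "$target"
}

# Example usage (commented out because it needs network access):
#   wget -O prompt-3.json https://intel-extension-for-pytorch.s3.amazonaws.com/miscellaneous/llm/prompt-3.json
#   swap_prompt prompt-3.json          # prompt.json now holds the Llama-3 prompts
#   cp prompt.json.orig prompt.json    # restore the original later
```

Backing up only on the first swap is a deliberate choice in this sketch: repeated swaps never overwrite the saved original.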
Inference and fine-tuning are supported in individual directories.
For inference example scripts, visit the inference directory.
For fine-tuning example scripts, visit the fine-tuning directory.