diff --git a/setup.sh b/setup.sh
index 6a9d771..b40809d 100644
--- a/setup.sh
+++ b/setup.sh
@@ -1,8 +1,8 @@
-conda create -n r1-v python=3.11
-conda activate r1-v
+# conda create -n r1-v python=3.11
+# conda activate r1-v

-# Install the packages in open-r1-multimodal .
-cd src/open-r1-multimodal # We edit the grpo.py and grpo_trainer.py in open-r1 repo.
+# Install the packages in r1-v.
+cd src/r1-v
pip install -e ".[dev]"

# Additional modules
@@ -11,7 +11,7 @@ pip install tensorboardx
pip install qwen_vl_utils torchvision
pip install flash-attn --no-build-isolation

-# vLLM support
+# vLLM support
pip install vllm==0.7.2

# fix transformers version
diff --git a/src/open-r1-multimodal/README.md b/src/open-r1-multimodal/README.md
deleted file mode 100644
index a513c62..0000000
--- a/src/open-r1-multimodal/README.md
+++ /dev/null
@@ -1,104 +0,0 @@
-# Multimodal Open R1
-
-We conducted a speed-run to investigate R1's paradigm in multimodal models after observing growing interest in R1 and studying the elegant implementation of the GRPO algorithm in `open-r1` and `trl`.
-
-[🤗 Models](https://huggingface.co/lmms-lab/Qwen2-VL-2B-GRPO-8k) | [🤗 Datasets](https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified) | [Wandb Logs](https://api.wandb.ai/links/libo0013/lz60ml8h)
-
-> [!NOTE]
-> Although our insights are not guaranteed to be correct, we commit to sharing them truthfully and honestly. We welcome community feedback and discussion to improve our understanding of multimodal reasoning models. We will open a PR to `open-r1` later to better support community study of multimodal RL.
-
-![alt text](assets/lmm_r1.png)
-
-**What We Did**
-- Implemented Multimodal R1 based on [huggingface/open-r1](https://github.com/huggingface/open-r1) and [deepseek-ai/DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1).
-  - Integrated the Qwen2-VL series, Aria-MoE, and other VLMs available in `transformers`.
-- Open-sourced the first batch of `8k` multimodal RL training examples focused on math reasoning. The data was created by GPT4o with reasoning paths and verifiable answers, based on `Math360K` and `Geo170K`. We provide a [script](local_scripts/create_vision_cot_data.py) for users to inspect and create their own data.
-  - The dataset is available at [lmms-lab/multimodal-open-r1-8k-verified](https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified).
-- Open-sourced models trained with GRPO.
-  - The models are available at [lmms-lab/Qwen2-VL-2B-GRPO-8k](https://huggingface.co/lmms-lab/Qwen2-VL-2B-GRPO-8k) | [lmms-lab/Qwen2-VL-7B-GRPO-8k](https://huggingface.co/lmms-lab/Qwen2-VL-7B-GRPO-8k).
-
-**Insights and Future Plans**
-- Multiple-choice option verification is necessary, since many multimodal math problems are MCQs. This is discussed in [issue#56](https://github.com/huggingface/open-r1/issues/56), and we customize the verification logic in [src/open_r1/grpo.py](src/open_r1/grpo.py).
-- RL data needs to be curated to be verifiable; how to effectively convert existing data into RL data, and how reliable GPT4o's curation is, both require further exploration.
-- The current framework is not efficient for large-scale training: the Qwen2-VL-2B model takes `10 hours` to train `1 epoch` on `8 H100 GPUs` with `8k samples`, so it is necessary to investigate how to scale up training efficiently.
-- Our initial models (Qwen2-VL-2/7B-Instruct) do not show good reasoning ability in our experiments, and during training the model quickly gathers reward from `format` but not from `accuracy`, which is not a good sign for the RL training as a whole. We release our [wandb logs](https://api.wandb.ai/links/libo0013/lz60ml8h) for reference.
-
-  ![image](https://github.com/user-attachments/assets/e0cfca59-3403-4776-97e9-090f2972b903)
-
-- The community may need to curate better multimodal datasets for RL training. The current dataset is limited to math scenarios because they have verifiable answers, and it is unclear how to expand the RL dataset to other general domains with open-ended answers. We welcome community feedback on our current strategy and plan to release a larger dataset if we gain clear scaling insights through community discussion.
-
-
-## Training Models
-
-> [!NOTE]
-> The training commands below are configured for a node of 8 x H100s (80GB). For different hardware and topologies, you may need to tune the batch size and number of gradient accumulation steps.
-
-### GRPO on Qwen2-VL-2/7B
-
-To run GRPO on Qwen2-VL-2B:
-
-```
-cd /home/tiger/multimodal-open-r1
-# pip3 install vllm==0.6.6.post1
-pip3 install -e ".[dev]"
-
-pip3 install wandb==0.18.3
-
-# Example values: ARNOLD_WORKER_GPU=8, ARNOLD_WORKER_NUM=1, ARNOLD_ID=0,
-# METIS_WORKER_0_HOST=127.0.0.1, port_in_cmd=12345
-torchrun --nproc_per_node="${ARNOLD_WORKER_GPU}" \
-    --nnodes="${ARNOLD_WORKER_NUM}" \
-    --node_rank="${ARNOLD_ID}" \
-    --master_addr="${METIS_WORKER_0_HOST}" \
-    --master_port="${port_in_cmd}" \
-    src/open_r1/grpo.py \
-    --deepspeed scripts/zero3.json \
-    --output_dir checkpoints/Qwen2-VL-2B-GRPO-8k \
-    --model_name_or_path Qwen/Qwen2-VL-2B-Instruct \
-    --dataset_name lmms-lab/multimodal-open-r1-8k-verified \
-    --max_prompt_length 8192 \
-    --per_device_train_batch_size 1 \
-    --gradient_accumulation_steps 1 \
-    --logging_steps 1 \
-    --bf16 \
-    --report_to wandb \
-    --gradient_checkpointing true \
-    --attn_implementation flash_attention_2 \
-    --max_pixels 2359296 \
-    --save_total_limit 8 \
-    --num_train_epochs 1 \
-    --run_name Qwen2-VL-2B-GRPO-8k
-```
-
-Please refer to [local_scripts/train_qwen2_vl.sh](local_scripts/train_qwen2_vl.sh) for more details.
-
-The scripts above natively support `multi-gpu/multi-node` training.
-
-### Reasoning matters for evaluation
-
-Many benchmarks, such as MMMU and AI2D, require the model to directly output an answer without providing reasoning steps. This raises a critical issue for evaluation: does the model truly understand how to derive the answer, or is it just guessing or relying on memorization? To address this, we require the model to first generate its reasoning steps before providing the final answer. We then use GPT-4o to extract and score the responses.
-
-We tested the original Qwen2-VL-2B-Instruct and Qwen2-VL-7B-Instruct models and observed that their scores decreased on certain benchmarks when reasoning steps were included. We then compared the scores of our model under the same evaluation method, and our model performed better in the reasoning-based chain-of-thought (CoT) setting. We attribute this improvement to GRPO training, which appears to enhance the model's ability to handle reasoning formats and consequently achieve higher scores.
-
-| Benchmarks | Qwen2-VL-2B-Instruct (w/o reasoning) | Qwen2-VL-2B-Instruct (w/ reasoning) | Qwen2-VL-2B-GRPO-8k (w/ reasoning) | Qwen2-VL-7B-Instruct (w/o reasoning) | Qwen2-VL-7B-Instruct (w/ reasoning) | Qwen2-VL-7B-GRPO-8k (w/ reasoning) |
-|----------------|--------------------------------------|-------------------------------------|------------------------------------|--------------------------------------|-------------------------------------|------------------------------------|
-| MMMU | 39.7 | 31.2 | 35.22 | 50.8 | 41.9 | 49.4 |
-| Mathvista-mini | 51.6 | 48.6 | 49.4 | 57.1 | 60.9 | 60.6 |
-
-In our logs, we sometimes find that the model still just outputs the answer without the reasoning steps (even for our trained models). We believe this could be because the model is not yet familiar with the reasoning format and cannot decide how to generate it.
-
-### Evaluating models
-
-We use [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) to evaluate models; please run:
-
-```shell
-bash local_scripts/lmms_eval_qwen2vl.sh
-```
-
-To reproduce our results on the above benchmarks, please check out the `dev/qwen_cot` branch.
-
-Visual reasoning evaluation is currently limited to direct-answer formats and simple parsing logic. Tasks like `mmmu_val`, `mathvista_testmini`, and `mmmu_pro` expect direct answers rather than reasoning traces, and the current parsing logic cannot process step-by-step reasoning. We are actively working on this limitation and welcome community contributions toward a more comprehensive evaluation framework for visual reasoning models.
-
-### RL Data Generation
-
-We provide the first batch of `8k` multimodal RL training examples focused on math reasoning. The data is generated by GPT4o. We provide a [script](local_scripts/create_vision_cot_data.py) for users to inspect and create their own data.
-
-Users can view the data at [lmms-lab/multimodal-open-r1-8k-verified](https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified). The problem/solution pairs are generated by GPT4o with reasoning paths and verifiable answers; the `original question`/`original answer` fields come from the original dataset.
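The README removed above splits the GRPO reward into a `format` reward (did the model wrap its reasoning and answer in the expected layout?) and an `accuracy` reward against a verifiable gold answer, with special handling for multiple-choice options. The actual implementation lives in `src/open_r1/grpo.py`; the following is only a minimal sketch of that split, assuming a `<think>...</think><answer>...</answer>` layout; the tag names, regexes, and function names here are illustrative assumptions, not the repository's code:

```python
# Illustrative sketch of the format/accuracy reward split described in the
# README above. The layout, regexes, and names are assumptions for
# demonstration; the repo's real logic lives in src/open_r1/grpo.py.
import re


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>...</think><answer>...</answer> layout."""
    layout = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(layout, completion.strip(), re.DOTALL) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted answer matches the verifiable gold answer.

    For multiple-choice questions, only the option letter is compared, so a
    bare "B" is not rejected when the gold string is "B. 42".
    """
    extracted = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if extracted is None:
        return 0.0
    pred = extracted.group(1).strip()
    # Treat the gold answer as multiple-choice if it starts with a bare
    # option letter such as "B", "(C)", or "B. 42".
    mcq = re.fullmatch(r"\(?([A-E])[).:]?(?:\s.*)?", gold.strip())
    if mcq:  # multiple-choice gold: compare option letters only
        pred_letter = re.match(r"\(?([A-E])\b", pred)
        return 1.0 if pred_letter and pred_letter.group(1) == mcq.group(1) else 0.0
    return 1.0 if pred == gold.strip() else 0.0


if __name__ == "__main__":
    out = "<think>2 + 2 = 4, so option B.</think> <answer>B</answer>"
    print(format_reward(out), accuracy_reward(out, "B. 4"))  # 1.0 1.0
```

Matching option letters rather than full strings is the kind of MCQ customization the README points to in issue#56; exact string equality alone would mark correct letter-only answers as wrong.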
diff --git a/src/open-r1-multimodal/assets/lmm_r1.png b/src/open-r1-multimodal/assets/lmm_r1.png deleted file mode 100644 index 24e2b66..0000000 Binary files a/src/open-r1-multimodal/assets/lmm_r1.png and /dev/null differ diff --git a/src/open-r1-multimodal/assets/plan-of-attack.png b/src/open-r1-multimodal/assets/plan-of-attack.png deleted file mode 100644 index ac44326..0000000 Binary files a/src/open-r1-multimodal/assets/plan-of-attack.png and /dev/null differ diff --git a/src/open-r1-multimodal/slurm/evaluate.slurm b/src/open-r1-multimodal/slurm/evaluate.slurm deleted file mode 100644 index 421a96c..0000000 --- a/src/open-r1-multimodal/slurm/evaluate.slurm +++ /dev/null @@ -1,49 +0,0 @@ -#!/bin/bash -#SBATCH --job-name=open-r1-evaluate -#SBATCH --nodes=1 -#SBATCH --ntasks-per-node=1 -#SBATCH --exclusive -#SBATCH --gres=gpu:8 -#SBATCH --partition=hopper-prod -#SBATCH --time=01:59:00 -#SBATCH --output=./logs/evaluate/%x-%j.out -#SBATCH --err=./logs/evaluate/%x-%j.err - -# Usage: sbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B aime24 - -set -x -e - -source ~/.bashrc -conda activate openr1 -module load cuda/12.1 -echo "START TIME: $(date)" -echo "PYTHON ENV: $(which python)" - - -NUM_GPUS=8 -MODEL=$1 -TASK=$2 -MODEL_ARGS="pretrained=$MODEL,dtype=float16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilisation=0.8" -OUTPUT_DIR=data/evals/$MODEL - - -# force crashing on nccl issues like hanging broadcast -export NCCL_ASYNC_ERROR_HANDLING=1 -# export NCCL_DEBUG=INFO -# export NCCL_DEBUG_SUBSYS=COLL -# export NCCL_SOCKET_NTHREADS=1 -# export NCCL_NSOCKS_PERTHREAD=1 -# export CUDA_LAUNCH_BLOCKING=1 - -# Specific configuration optimized for the Hugging Face Compute Cluster -# Be ye warned this may not work on other clusters! -module load cuda/12.1 - -lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \ - --custom-tasks src/open_r1/evaluate.py \ - --use-chat-template \ - --system-prompt="Please reason step by step, and put your final answer within \boxed{}." 
\ - --output-dir $OUTPUT_DIR - - -echo "END TIME: $(date)" diff --git a/src/open-r1-multimodal/slurm/generate.slurm b/src/open-r1-multimodal/slurm/generate.slurm deleted file mode 100644 index a7d4ab5..0000000 --- a/src/open-r1-multimodal/slurm/generate.slurm +++ /dev/null @@ -1,201 +0,0 @@ -#!/bin/bash -#SBATCH --job-name=deepseek-r1-generation -#SBATCH --partition=hopper-prod -#SBATCH --qos=normal -#SBATCH --nodes=4 -#SBATCH --exclusive -#SBATCH --gpus-per-node=8 -#SBATCH --output=./logs/%x-%j.out -#SBATCH --err=./logs/%x-%j.err -#SBATCH --time=08:00:00 - -# Parse command line arguments -while [[ $# -gt 0 ]]; do - case $1 in - --hf-dataset) - HF_DATASET="$2" - shift 2 - ;; - --hf-dataset-config) - HF_DATASET_CONFIG="$2" - shift 2 - ;; - --hf-dataset-split) - HF_DATASET_SPLIT="$2" - shift 2 - ;; - --prompt-column) - PROMPT_COLUMN="$2" - shift 2 - ;; - --model) - MODEL="$2" - shift 2 - ;; - --temperature) - TEMPERATURE="$2" - shift 2 - ;; - --top-p) - TOP_P="$2" - shift 2 - ;; - --max-new-tokens) - MAX_NEW_TOKENS="$2" - shift 2 - ;; - --num-generations) - NUM_GENERATIONS="$2" - shift 2 - ;; - --hf-output-dataset) - HF_OUTPUT_DATASET="$2" - shift 2 - ;; - --private) - PRIVATE="true" - shift - ;; - *) - echo "Unknown parameter: $1" - exit 1 - ;; - esac -done - -if [ -z "$MODEL" ] || [ -z "$HF_DATASET" ]; then - echo "Error: --model and --hf-dataset are required parameters" - exit 1 -fi - -# Set default values for optional parameters -HF_DATASET_SPLIT=${HF_DATASET_SPLIT:-"train"} -PROMPT_COLUMN=${PROMPT_COLUMN:-"prompt"} -MAX_NEW_TOKENS=${MAX_NEW_TOKENS:-8192} -NUM_GENERATIONS=${NUM_GENERATIONS:-1} -PRIVATE=${PRIVATE:-"false"} - -# Print all input arguments -echo "Input arguments:" -echo "MODEL: $MODEL" -echo "HF_DATASET: $HF_DATASET" -echo "HF_DATASET_CONFIG: $HF_DATASET_CONFIG" -echo "HF_DATASET_SPLIT: $HF_DATASET_SPLIT" -echo "PROMPT_COLUMN: $PROMPT_COLUMN" -echo "TEMPERATURE: $TEMPERATURE" -echo "TOP_P: $TOP_P" -echo "MAX_NEW_TOKENS: $MAX_NEW_TOKENS" -echo "NUM_GENERATIONS: $NUM_GENERATIONS" -echo "HF_OUTPUT_DATASET: $HF_OUTPUT_DATASET" -echo "PRIVATE: $PRIVATE" -echo "-------------------" - -set -ex - -module load cuda/12.1 - -export LD_LIBRARY_PATH=.venv/lib/python3.11/site-packages/nvidia/nvjitlink/lib - -echo "SLURM_JOB_ID: $SLURM_JOB_ID" -echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST" - -source .venv/bin/activate - -# Getting the node names -nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST") -nodes_array=($nodes) - -# Get the IP address of the head node -head_node=${nodes_array[0]} -head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address) - -# Start Ray head node -port=6379 -ip_head=$head_node_ip:$port -export ip_head -echo "IP Head: $ip_head" - -echo "Starting HEAD at $head_node" -srun --nodes=1 --ntasks=1 -w "$head_node" \ - ray start --head --node-ip-address="$head_node_ip" --port=$port \ - --dashboard-host=0.0.0.0 \ - --dashboard-port=8265 \ - --block & - -# Give some time to head node to start... -sleep 10 - -# Start Ray worker nodes -worker_num=$((SLURM_JOB_NUM_NODES - 1)) - -# Start from 1 (0 is head node) -for ((i = 1; i <= worker_num; i++)); do - node_i=${nodes_array[$i]} - echo "Starting WORKER $i at $node_i" - srun --nodes=1 --ntasks=1 -w "$node_i" \ - ray start --address "$ip_head" \ - --block & - sleep 5 -done - -# Give some time to the Ray cluster to gather info -echo "Waiting a bit for Ray cluster to gather node info..." 
-sleep 60 - -# Run vllm -RAY_ADDRESS="http://$head_node_ip:8265" ray job submit \ - --working-dir src/open_r1 \ - --no-wait \ - -- vllm serve $MODEL \ - --tensor-parallel-size 8 \ - --pipeline-parallel-size 4 \ - --gpu-memory-utilization=0.85 \ - --max-model-len 16384 \ - --enable-chunked-prefill \ - --trust-remote-code \ - --distributed-executor-backend ray - -# wait for vllm to load the model -echo "Waiting for vLLM (http://$head_node_ip:8000) server to be up..." - -# wait for vllm to load and serve the model -while true; do - if curl -s -o /dev/null -w "%{http_code}" http://$head_node_ip:8000 >/dev/null 2>&1; then - echo "Received response from http://$head_node_ip:8000" - break - else - echo "Still waiting... (Press Ctrl+C to cancel)" - sleep 60 - fi -done - -echo "Checking available models..." -curl http://$head_node_ip:8000/v1/models - -echo "Executing sanity check..." -curl http://$head_node_ip:8000/v1/completions \ - -H "Content-Type: application/json" \ - -d "{ - \"model\": \"$MODEL\", - \"prompt\": \"<|begin▁of▁sentence|><|User|>hi, how are you?<|Assistant|>\", - \"max_tokens\": 2048, - \"temperature\": 0.6 - }" - -# Finally submit the job to the cluster -echo "Submitting job to ray cluster..." -RAY_ADDRESS="http://$head_node_ip:8265" ray job submit \ - --working-dir src/open_r1 \ - -- python -u generate.py \ - --model "$MODEL" \ - --hf-dataset "$HF_DATASET" \ - ${HF_DATASET_CONFIG:+--hf-dataset-config "$HF_DATASET_CONFIG"} \ - --hf-dataset-split "$HF_DATASET_SPLIT" \ - --prompt-column "$PROMPT_COLUMN" \ - ${TEMPERATURE:+--temperature "$TEMPERATURE"} \ - ${TOP_P:+--top-p "$TOP_P"} \ - --max-new-tokens "$MAX_NEW_TOKENS" \ - --num-generations "$NUM_GENERATIONS" \ - ${HF_OUTPUT_DATASET:+--hf-output-dataset "$HF_OUTPUT_DATASET"} \ - ${PRIVATE:+--private} \ - --vllm-server-url "http://$head_node_ip:8000/v1" \ No newline at end of file diff --git a/src/open-r1-multimodal/slurm/sft.slurm b/src/open-r1-multimodal/slurm/sft.slurm deleted file mode 100644 index e2c61e9..0000000 --- a/src/open-r1-multimodal/slurm/sft.slurm +++ /dev/null @@ -1,88 +0,0 @@ -#!/bin/bash -#SBATCH --job-name=open-r1-sft -#SBATCH --nodes=1 -#SBATCH --ntasks-per-node=1 -#SBATCH --exclusive -#SBATCH --gres=gpu:8 -#SBATCH --partition=hopper-prod -#SBATCH --output=./logs/%x-%j.out -#SBATCH --err=./logs/%x-%j.err - -set -x -e - -source ~/.bashrc -conda activate openr1 -module load cuda/12.1 -echo "START TIME: $(date)" -echo "PYTHON ENV: $(which python)" - -MODEL_PATH=$1 -DATASET_PATH=$2 -ACCELERATOR=$3 - -# Training setup -NUM_NODES=$SLURM_NNODES -GPUS_PER_NODE=8 -WORLD_SIZE=$(($NUM_NODES*$GPUS_PER_NODE)) - -# so processes know who to talk to -MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1) -MASTER_PORT=6000 - -export CMD=" \ - src/open_r1/sft.py \ - --model_name_or_path $MODEL_PATH \ - --dataset_name $DATASET_PATH \ - --use_liger_kernel true \ - --learning_rate 2.0e-5 \ - --num_train_epochs 1 \ - --packing \ - --max_seq_length 4096 \ - --per_device_train_batch_size 4 \ - --per_device_eval_batch_size 4 \ - --gradient_accumulation_steps 4 \ - --gradient_checkpointing \ - --bf16 \ - --logging_steps 5 \ - --eval_strategy steps \ - --eval_steps 100 \ - --output_dir data/Qwen2.5-1.5B-Open-R1-Distill - " - -export LAUNCHER="HF_HUB_ENABLE_HF_TRANSFER=1 ACCELERATE_LOG_LEVEL=info TRANSFORMERS_VERBOSITY=info accelerate launch \ - --config_file configs/$ACCELERATOR.yaml \ - --gradient_accumulation_steps 4 \ - --num_machines $NUM_NODES \ - --num_processes $WORLD_SIZE \ - --main_process_ip $MASTER_ADDR \ - 
--main_process_port $MASTER_PORT \ - --machine_rank \$SLURM_PROCID \ - --rdzv_conf "rdzv_backend=c10d,rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT" \ - --max_restarts 1 \ - --role \$(hostname -s): \ - --tee 3 \ - " - -# force crashing on nccl issues like hanging broadcast -export NCCL_ASYNC_ERROR_HANDLING=1 -# export NCCL_DEBUG=INFO -# export NCCL_DEBUG_SUBSYS=COLL -# export NCCL_SOCKET_NTHREADS=1 -# export NCCL_NSOCKS_PERTHREAD=1 -# export CUDA_LAUNCH_BLOCKING=1 - -# Specific configuration optimized for the Hugging Face Compute Cluster -# Be ye warned this may not work on other clusters! -module load cuda/12.1 - -# srun error handling: -# --wait=60: wait 60 sec after the first task terminates before terminating all remaining tasks -# --kill-on-bad-exit=1: terminate a step if any task exits with a non-zero exit code -SRUN_ARGS=" \ - --wait=60 \ - --kill-on-bad-exit=1 \ - " - -clear; srun $SRUN_ARGS --jobid $SLURM_JOB_ID bash -c "$LAUNCHER --role \$SLURMD_NODENAME: $CMD" 2>&1 - -echo "END TIME: $(date)" diff --git a/src/open-r1-multimodal/.gitignore b/src/r1-v/.gitignore similarity index 100% rename from src/open-r1-multimodal/.gitignore rename to src/r1-v/.gitignore diff --git a/src/open-r1-multimodal/LICENSE b/src/r1-v/LICENSE similarity index 100% rename from src/open-r1-multimodal/LICENSE rename to src/r1-v/LICENSE diff --git a/src/open-r1-multimodal/Makefile b/src/r1-v/Makefile similarity index 100% rename from src/open-r1-multimodal/Makefile rename to src/r1-v/Makefile diff --git a/src/open-r1-multimodal/configs/ddp.yaml b/src/r1-v/configs/ddp.yaml similarity index 100% rename from src/open-r1-multimodal/configs/ddp.yaml rename to src/r1-v/configs/ddp.yaml diff --git a/src/open-r1-multimodal/configs/qwen2vl_sft_config.yaml b/src/r1-v/configs/qwen2vl_sft_config.yaml similarity index 100% rename from src/open-r1-multimodal/configs/qwen2vl_sft_config.yaml rename to src/r1-v/configs/qwen2vl_sft_config.yaml diff --git a/src/open-r1-multimodal/configs/zero2.yaml b/src/r1-v/configs/zero2.yaml similarity index 100% rename from src/open-r1-multimodal/configs/zero2.yaml rename to src/r1-v/configs/zero2.yaml diff --git a/src/open-r1-multimodal/configs/zero3.yaml b/src/r1-v/configs/zero3.yaml similarity index 100% rename from src/open-r1-multimodal/configs/zero3.yaml rename to src/r1-v/configs/zero3.yaml diff --git a/src/open-r1-multimodal/local_scripts/create_vision_cot_data.py b/src/r1-v/local_scripts/create_vision_cot_data.py similarity index 100% rename from src/open-r1-multimodal/local_scripts/create_vision_cot_data.py rename to src/r1-v/local_scripts/create_vision_cot_data.py diff --git a/src/open-r1-multimodal/local_scripts/lmms_eval_qwen2vl.sh b/src/r1-v/local_scripts/lmms_eval_qwen2vl.sh similarity index 100% rename from src/open-r1-multimodal/local_scripts/lmms_eval_qwen2vl.sh rename to src/r1-v/local_scripts/lmms_eval_qwen2vl.sh diff --git a/src/open-r1-multimodal/local_scripts/prepare_hf_data.py b/src/r1-v/local_scripts/prepare_hf_data.py similarity index 100% rename from src/open-r1-multimodal/local_scripts/prepare_hf_data.py rename to src/r1-v/local_scripts/prepare_hf_data.py diff --git a/src/open-r1-multimodal/local_scripts/train_aria_moe.sh b/src/r1-v/local_scripts/train_aria_moe.sh similarity index 100% rename from src/open-r1-multimodal/local_scripts/train_aria_moe.sh rename to src/r1-v/local_scripts/train_aria_moe.sh diff --git a/src/open-r1-multimodal/local_scripts/train_qwen2_vl.sh b/src/r1-v/local_scripts/train_qwen2_vl.sh similarity index 100% rename from 
src/open-r1-multimodal/local_scripts/train_qwen2_vl.sh rename to src/r1-v/local_scripts/train_qwen2_vl.sh diff --git a/src/open-r1-multimodal/local_scripts/zero2.json b/src/r1-v/local_scripts/zero2.json similarity index 100% rename from src/open-r1-multimodal/local_scripts/zero2.json rename to src/r1-v/local_scripts/zero2.json diff --git a/src/open-r1-multimodal/local_scripts/zero3.json b/src/r1-v/local_scripts/zero3.json similarity index 100% rename from src/open-r1-multimodal/local_scripts/zero3.json rename to src/r1-v/local_scripts/zero3.json diff --git a/src/open-r1-multimodal/local_scripts/zero3.yaml b/src/r1-v/local_scripts/zero3.yaml similarity index 100% rename from src/open-r1-multimodal/local_scripts/zero3.yaml rename to src/r1-v/local_scripts/zero3.yaml diff --git a/src/open-r1-multimodal/local_scripts/zero3_offload.json b/src/r1-v/local_scripts/zero3_offload.json similarity index 100% rename from src/open-r1-multimodal/local_scripts/zero3_offload.json rename to src/r1-v/local_scripts/zero3_offload.json diff --git a/src/open-r1-multimodal/run_grpo.sh b/src/r1-v/run_grpo.sh similarity index 96% rename from src/open-r1-multimodal/run_grpo.sh rename to src/r1-v/run_grpo.sh index d2c5ea0..4c5b21e 100644 --- a/src/open-r1-multimodal/run_grpo.sh +++ b/src/r1-v/run_grpo.sh @@ -1,4 +1,4 @@ -cd src/open-r1-multimodal +cd src/r1-v export DEBUG_MODE="true" export LOG_PATH="./debug_log_2b.txt" diff --git a/src/open-r1-multimodal/setup.cfg b/src/r1-v/setup.cfg similarity index 100% rename from src/open-r1-multimodal/setup.cfg rename to src/r1-v/setup.cfg diff --git a/src/open-r1-multimodal/setup.py b/src/r1-v/setup.py similarity index 89% rename from src/open-r1-multimodal/setup.py rename to src/r1-v/setup.py index 0395965..a847d9e 100644 --- a/src/open-r1-multimodal/setup.py +++ b/src/r1-v/setup.py @@ -61,7 +61,7 @@ "safetensors>=0.3.3", "sentencepiece>=0.1.99", "torch>=2.5.1", - "transformers @ git+https://github.com/huggingface/transformers.git@main", + "transformers @ git+https://github.com/huggingface/transformers.git@336dc69d63d56f232a183a3e7f52790429b871ef", "trl==0.14.0", "vllm==0.6.6.post1", "wandb>=0.19.1", @@ -106,16 +106,12 @@ def deps_list(*pkgs): ] setup( - name="open-r1", - version="0.1.0.dev0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots) - author="The Hugging Face team (past and future)", - author_email="lewis@huggingface.co", - description="Open R1", - long_description=open("README.md", "r", encoding="utf-8").read(), - long_description_content_type="text/markdown", - keywords="llm inference-time compute reasoning", + name="r1-v", + version="0.1.0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots) + author="The r1-v team and the Hugging Face team (past and future)", + description="R1-V", license="Apache", - url="https://github.com/huggingface/open-r1", + url="https://github.com/Deep-Agent/R1-V", package_dir={"": "src"}, packages=find_packages("src"), zip_safe=False, diff --git a/src/open-r1-multimodal/src/open_r1/__init__.py b/src/r1-v/src/open_r1/__init__.py similarity index 100% rename from src/open-r1-multimodal/src/open_r1/__init__.py rename to src/r1-v/src/open_r1/__init__.py diff --git a/src/open-r1-multimodal/src/open_r1/evaluate.py b/src/r1-v/src/open_r1/evaluate.py similarity index 100% rename from src/open-r1-multimodal/src/open_r1/evaluate.py rename to src/r1-v/src/open_r1/evaluate.py diff --git a/src/open-r1-multimodal/src/open_r1/generate.py b/src/r1-v/src/open_r1/generate.py 
similarity index 100% rename from src/open-r1-multimodal/src/open_r1/generate.py rename to src/r1-v/src/open_r1/generate.py diff --git a/src/open-r1-multimodal/src/open_r1/grpo.py b/src/r1-v/src/open_r1/grpo.py similarity index 100% rename from src/open-r1-multimodal/src/open_r1/grpo.py rename to src/r1-v/src/open_r1/grpo.py diff --git a/src/open-r1-multimodal/src/open_r1/sft.py b/src/r1-v/src/open_r1/sft.py similarity index 100% rename from src/open-r1-multimodal/src/open_r1/sft.py rename to src/r1-v/src/open_r1/sft.py diff --git a/src/open-r1-multimodal/src/open_r1/trainer/__init__.py b/src/r1-v/src/open_r1/trainer/__init__.py similarity index 100% rename from src/open-r1-multimodal/src/open_r1/trainer/__init__.py rename to src/r1-v/src/open_r1/trainer/__init__.py diff --git a/src/open-r1-multimodal/src/open_r1/trainer/grpo_trainer.py b/src/r1-v/src/open_r1/trainer/grpo_trainer.py similarity index 100% rename from src/open-r1-multimodal/src/open_r1/trainer/grpo_trainer.py rename to src/r1-v/src/open_r1/trainer/grpo_trainer.py diff --git a/src/open-r1-multimodal/src/open_r1/trainer/vllm_grpo_trainer.py b/src/r1-v/src/open_r1/trainer/vllm_grpo_trainer.py similarity index 100% rename from src/open-r1-multimodal/src/open_r1/trainer/vllm_grpo_trainer.py rename to src/r1-v/src/open_r1/trainer/vllm_grpo_trainer.py diff --git a/src/open-r1-multimodal/temp_image.png b/src/r1-v/temp_image.png similarity index 100% rename from src/open-r1-multimodal/temp_image.png rename to src/r1-v/temp_image.png