
Stage 2 : RuntimeError: The expanded size of the tensor #41

Open
cksthf3211 opened this issue Nov 1, 2024 · 0 comments

I'm encountering a RuntimeError during training caused by a tensor size mismatch. The full traceback is below.

PyTorch version: 2.1.0+cu118
CUDA version: 11.8
Python version: 3.11.10
OS: Ubuntu 18.04

```
Traceback (most recent call last):
  File "/media/path/SmartEdit-main/train/DS_MLLMSD11_train.py", line 712, in <module>
    train()
  File "/media/path/SmartEdit-main/train/DS_MLLMSD11_train.py", line 501, in train
    model_.load_pretrain_MLLM_alignment(SD_QFormer_conversation_33tokens=SD_QFormer_conversation_33tokens, LLaVA_00002=LLaVA_00002)
  File "/media/path/SmartEdit-main/model/DS_MLLMSD11_model.py", line 221, in load_pretrain_MLLM_alignment
    self.lm_head.weight.data[-self.config.num_new_tokens:] = LLaMA_lm_haed
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The expanded size of the tensor (35) must match the existing size (33) at non-singleton dimension 0. Target sizes: [35, 4096]. Tensor sizes: [33, 4096]
[2024-11-01 11:52:23,929] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 149061
```

The error occurs when assigning the `LLaMA_lm_haed` tensor to `self.lm_head.weight.data[-self.config.num_new_tokens:]`. `num_new_tokens` is set to 35, but `LLaMA_lm_haed` has only 33 rows, so the assignment fails with the dimension-mismatch RuntimeError above.
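The failing assignment can be mimicked with a small guard sketch (pure-Python lists standing in for tensor rows; the function name and structure are my assumption for illustration, not code from the SmartEdit repo):

```python
def load_new_token_rows(lm_head_rows, ckpt_rows, num_new_tokens):
    """Copy the checkpoint's new-token rows into the tail of lm_head.

    Raises ValueError when the sizes disagree, which mirrors the
    33-vs-35 mismatch reported in the traceback above.
    """
    if len(ckpt_rows) != num_new_tokens:
        raise ValueError(
            f"checkpoint provides {len(ckpt_rows)} new-token rows, "
            f"but config.num_new_tokens is {num_new_tokens}"
        )
    lm_head_rows[-num_new_tokens:] = ckpt_rows
    return lm_head_rows
```

With matching sizes the copy succeeds; with a 33-row checkpoint and `num_new_tokens = 35` it fails up front with a clear message instead of a mid-load RuntimeError.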

For the assignment to succeed, the checkpoint tensor must have exactly `num_new_tokens` rows.
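One possible workaround (my assumption, not a confirmed fix from the maintainers) is to copy only the rows both sides share, leaving the remaining new-token rows at their initialization; setting `num_new_tokens` back to 33, or regenerating the stage-1 checkpoint with 35 tokens, is likely the cleaner fix. A minimal sketch with lists standing in for tensor rows:

```python
def partial_load(lm_head_rows, ckpt_rows, num_new_tokens):
    # Copy only the rows both sides actually have, so a 33-row checkpoint
    # no longer crashes a 35-token head; rows beyond the overlap keep
    # their initial values.
    n = min(num_new_tokens, len(ckpt_rows))
    lm_head_rows[-n:] = ckpt_rows[-n:]
    return lm_head_rows
```

Note that the two extra tokens would then start untrained, which may or may not be acceptable depending on what they are used for.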

To reproduce, I run the training script with the following configuration:

```shell
bash scripts/MLLMSD_7b.sh

# wandb disabled
export WANDB_DISABLED=true

# renamed checkpoint-150000_embeddings_qformer.bin -> checkpoint-50000.bin

deepspeed --include localhost:0 --master_addr 127.0.0.1 --master_port 28457 train/DS_MLLMSD11_train.py \
    --max_steps 5000 \
    --model_name_or_path ./checkpoints/vicuna-7b-v1-1 \
    --LLaVA_00001 "./checkpoints/LLaVA-7B-v1/pytorch_model-00001-of-00002.bin" \
    --LLaVA_00002 "./checkpoints/LLaVA-7B-v1/pytorch_model-00002-of-00002.bin" \
    --LLaVA_model_path "./checkpoints/LLaVA-7B-v1" \
    --sd_qformer_version "v1.1-7b" \
    --unet_ckpt "./checkpoints/InstructDiffusion_diffusers/unet/diffusion_pytorch_model.bin" \
    --bf16 True \
    --tf32 True \
    --output_dir ./checkpoints/stage2_MLLMSD_7b \
    --num_train_epochs 20 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy 'no' \
    --save_strategy 'steps' \
    --save_steps 5000 \
    --save_total_limit 3 \
    --learning_rate 1e-5 \
    --lr_scheduler_type 'cosine' \
    --weight_decay 0. \
    --warmup_ratio 0.001 \
    --logging_steps 1 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 16 \
    --ddp_find_unused_parameters True \
    --SD_QFormer_conversation_33tokens "./checkpoints/stage1_CC12M_alignment_7b/embeddings_qformer/checkpoint-50000.bin" \
    --InstructPix2PixDataset_path "./dataset/InstructPix2PixCLIPFiltered_HF" \
    --MagicBrushDataset_path "./dataset/MagicBrush_HF" \
    --LLaVADataset_data_path "./dataset/LLaVA/llava_instruct_150k.json" \
    --LLaVADataset_image_folder "./dataset/coco/train2017" \
    --refcoco_path "./dataset/refcoco" \
    --grefcoco_path "./dataset/grefcoco" \
    --coco_image_path "./dataset/coco" \
    --COCOStuff_mask_path "./dataset/cocostuff" \
    --ReasoningEditingDataset_path "./dataset/SyntheticData/SyntheticData_info_new.json" \
    --ReasoningSegmentationDataset_json_path "./dataset/reason_seg/train" \
    --ReasoningSegmentationDataset_image_path "./dataset/reason_seg/train" \
    --ReasoningSegmentationDataset_binary_mask_path "./dataset/reason_seg/train_binary_mask" \
    --deepspeed scripts/zero2_mixed.json
```

How do you solve this problem?
