
there are lots of bugs in TrainStage1 #33

Open
Bilibilee opened this issue Sep 27, 2024 · 3 comments

@Bilibilee

Thank you for your excellent work, but the open-source code indeed has many minor issues, which makes others hesitant to follow your work.
During the TrainStage1 phase, the issues are as follows:

  1. In the command `torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/TrainStage1.py`, the `fastchat` directory does not seem to exist; the path should be `train/TrainStage1.py`.
  2. `load_LLaVA_ckpt_v1_1` should be `load_LLaVA_ckpt_v1_1_7b`.
  3. The `SD_QFormer_conversation_33tokens` checkpoint has no `mm_projector` module, which is not used in training stage 1 anyway (a possible workaround is sketched below).
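For what it's worth, the third issue can be sidestepped with a non-strict load, since stage 1 never touches `mm_projector`. A minimal sketch, assuming the loading code calls `load_state_dict` directly (the function name and path handling here are mine, not the repo's):

```python
import torch
from torch import nn

def load_stage1_checkpoint(model: nn.Module, ckpt_path: str) -> None:
    """Load the released checkpoint while tolerating the absent mm_projector.

    Stage 1 never uses mm_projector, so its missing keys can be ignored;
    the function name and call pattern here are mine, not the repo's.
    """
    state_dict = torch.load(ckpt_path, map_location="cpu")
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    # Sanity check: the only weights the checkpoint lacks should belong to
    # the unused mm_projector module.
    assert all(k.startswith("mm_projector") for k in missing), missing
```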

Could you provide the TrainStage1 result checkpoint?

@yuzhou914
Collaborator

Thanks for your interest in our work. There may be some small typos that crept in when we pushed the code to GitHub; you can simply fix them for further use.

@Bilibilee
Author

Hello, I am confused about the inconsistencies between the first training stage and the MLLMSD training stage:

  • In the first training stage, the LLaMA checkpoint is loaded and 33 new tokens are added (`<img>`, `<img_0>`, ..., `<img_31>`); only the `lm_head` and `embed_tokens` weights corresponding to the new tokens are trained (see the sketch after this list).
  • In the MLLMSD training stage, the LLaVA checkpoint is loaded and 35 new tokens are added (`<img>`, `<img_start>`, `<img_end>`, `<img_0>`, ..., `<img_31>`).
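For reference, here is a minimal sketch of what I understand the first stage to be doing; the paths are placeholders and the gradient-masking trick is my paraphrase of "train only the new rows", not the repo's exact code:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama")  # placeholder path
model = LlamaForCausalLM.from_pretrained("path/to/llama")

new_tokens = ["<img>"] + [f"<img_{i}>" for i in range(32)]   # the 33 stage-1 tokens
num_added = tokenizer.add_tokens(new_tokens, special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

# Freeze everything, then re-enable gradients only on the embedding and
# lm_head matrices; a gradient hook zeroes out the rows of the original
# vocabulary so that effectively only the 33 new rows get updated.
for p in model.parameters():
    p.requires_grad = False
embed_w = model.get_input_embeddings().weight
head_w = model.get_output_embeddings().weight
embed_w.requires_grad = True
head_w.requires_grad = True

old_vocab_size = len(tokenizer) - num_added

def keep_new_rows(grad: torch.Tensor) -> torch.Tensor:
    grad = grad.clone()
    grad[:old_vocab_size] = 0  # do not update the original vocabulary rows
    return grad

embed_w.register_hook(keep_new_rows)
head_w.register_hook(keep_new_rows)
```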

This discrepancy in the number of new tokens causes the MLLMSD model's `load_pretrain_MLLM_alignment` function to fail.
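The only repair I can see is to load the stage-1 checkpoint manually and copy over just the rows of the 33 tokens that both vocabularies share, matching by token string rather than by index. A sketch under those assumptions (the state-dict key names are guesses based on the usual HF LLaMA layout):

```python
import torch
from torch import nn
from transformers import PreTrainedTokenizer

SHARED_TOKENS = ["<img>"] + [f"<img_{i}>" for i in range(32)]  # the 33 stage-1 tokens

@torch.no_grad()
def copy_shared_token_rows(ckpt_path: str,
                           stage1_tokenizer: PreTrainedTokenizer,
                           mllmsd_tokenizer: PreTrainedTokenizer,
                           model: nn.Module) -> None:
    """Copy the stage-1 weights of the 33 shared tokens into the 35-token model.

    Rows are matched by token string, because MLLMSD inserts
    <img_start>/<img_end> into the middle of the new-token block, so the
    indices of <img_0>..<img_31> differ between the two vocabularies.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    emb = ckpt["model.embed_tokens.weight"]  # key names assumed, not verified
    head = ckpt["lm_head.weight"]
    for tok in SHARED_TOKENS:
        src = stage1_tokenizer.convert_tokens_to_ids(tok)
        dst = mllmsd_tokenizer.convert_tokens_to_ids(tok)
        model.get_input_embeddings().weight[dst].copy_(emb[src])
        model.get_output_embeddings().weight[dst].copy_(head[src])
    # <img_start> and <img_end> keep whatever fresh initialization they got.
```

With something like this, `load_pretrain_MLLM_alignment` could skip the embedding and `lm_head` tensors entirely and leave them to the row copy.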

It is also puzzling that the first training stage loads the LLaMA checkpoint while the MLLMSD training stage loads the LLaVA checkpoint. Why not directly align LLaVA with CLIP?

@XuwuChen443

@Bilibilee Hi, I'm encountering the same issue regarding the token inconsistency between training stages. Could you share how you resolved this, specifically:

  1. How did you handle the token mismatch (33 vs 35 tokens) when loading the checkpoint?
  2. What modifications were needed in the `load_pretrain_MLLM_alignment` function?

Any insights would be greatly appreciated.
