InternVideo2.5 [Paper]

This repo provides the code and models of "InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling". InternVideo2.5 is a video multimodal large language model (MLLM, built upon InternVL2.5) enhanced with long and rich context (LRC) modeling. It significantly improves on existing MLLMs by strengthening their ability to perceive fine-grained details and to capture long-form temporal structure. This is achieved through dense vision task annotations using task preference optimization (TPO) and compact spatiotemporal representations via adaptive hierarchical token compression (HiCo).

Our experiments demonstrate substantial performance gains on mainstream short- and long-video understanding benchmarks. InternVideo2.5 can memorize video inputs at least 6x longer than the original model and exhibits specialized vision capabilities such as object tracking and segmentation. This work highlights the importance of rich multimodal context (in both length and detail) for an MLLM's focus and memory, offering valuable insights for future video MLLM research.
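To give a feel for what per-frame token compression buys in context length, here is a rough, generic illustration (spatial average pooling of a ViT patch grid, not the paper's HiCo algorithm; the function name and tensor shapes are illustrative assumptions). Reducing each frame to 16 tokens, as in the Model Zoo below, lets far more frames fit in the LLM context:

```python
# Generic illustration of per-frame visual token compression (NOT the HiCo
# implementation): adaptive average pooling of a ViT patch grid down to a
# 4x4 grid gives 16 tokens per frame, so 512 frames cost 8,192 visual tokens
# instead of ~131,072 with the raw 16x16 patch grid.
import torch
import torch.nn.functional as F

def compress_frame_tokens(frame_tokens, out_grid=4):
    """frame_tokens: (num_frames, grid*grid, dim) ViT patch embeddings."""
    n, p, d = frame_tokens.shape
    g = int(p ** 0.5)                                        # original grid side, e.g. 16
    x = frame_tokens.view(n, g, g, d).permute(0, 3, 1, 2)    # (n, d, g, g)
    x = F.adaptive_avg_pool2d(x, out_grid)                   # (n, d, out_grid, out_grid)
    return x.flatten(2).transpose(1, 2)                      # (n, out_grid*out_grid, d)

frames = torch.randn(512, 16 * 16, 1024)       # 512 frames, 256 patch tokens each
print(compress_frame_tokens(frames).shape)     # torch.Size([512, 16, 1024])
```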

Updates

[Demo videos: yoga-iv2.2.mp4, car-iv2.5.mp4, teach-install.mp4]

Model Zoo

| MLLM | Link | MVBench | Perception Test | LongVideoBench | MLVU | VideoMME | LVBench | #Tokens per frame | #Params |
|---|---|---|---|---|---|---|---|---|---|
| InternVideo2.5 | huggingface | 75.7 | 74.9 | 60.6 | 72.8 | 65.1 | 46.4 | 16 | 8B |
| InternVL2.5 + HiCo | huggingface | 74.0 | 71.4 | 59.6 | 71.5 | 64.9 | - | 16 | 8B |
| InternVL2.5 + HiCo | huggingface | 74.4 | 71.9 | 62.7 | 72.6 | 66.4 | - | 64 | 8B |
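As a minimal loading sketch with transformers: the repository id and the chat-style interface below are assumptions based on the usual OpenGVLab release pattern (the model card linked in the table is the authoritative reference for the exact id, video preprocessing, and inference API):

```python
# Hedged sketch: the repo id and the remote-code chat helper are assumptions;
# consult the Hugging Face model card linked above for the official usage.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVideo2_5_Chat_8B"  # assumed repo id, verify on the HF page
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

# Video frames are sampled and preprocessed as described on the model card,
# then passed together with a question to the model's chat-style interface
# exposed via trust_remote_code.
```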

Citation

If this work is helpful for your research, please consider citing InternVideo2.5:

@article{wang2025internvideo,
  title={InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling},
  author={Wang, Yi and Li, Xinhao and Yan, Ziang and He, Yinan and Yu, Jiashuo and Zeng, Xiangyu and Wang, Chenting and Ma, Changlian and Huang, Haian and Gao, Jianfei and Dou, Min and Chen, Kai and Wang, Wenhai and Qiao, Yu and Wang, Yali and Wang, Limin},
  journal={arXiv preprint arXiv:2501.12386},
  year={2025}
}