Paper | Base Language Model | Code | Publication | Preprint | Affiliation |
---|---|---|---|---|---|
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | GPT2-medium | mesh-gpt | | 2311.15475 | Technical University of Munich |
Uni3D: Exploring Unified 3D Representation at Scale | CLIP | Uni3D | | 2310.06773 | BAAI |
Point-Bind & Point-LLM: Aligning 3D with Multi-modality | LLaMA | Point-Bind & Point-LLM | | 2309.00615 | Shanghai AI Lab |
PointLLM: Empowering Large Language Models to Understand Point Clouds | Vicuna | PointLLM | | 2308.16911 | Shanghai AI Lab |
3D-LLM: Injecting the 3D World into Large Language Models | BLIP2 | 3D-LLM | | 2307.12981 | UMASS |
- SceneVerse: presented by its authors as the first million-scale 3D vision-language dataset, with 68K 3D indoor scenes and 2.5M vision-language pairs.
- GPTEval3D: an implementation of the paper "GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation". It provides an evaluation metric for text-to-3D generative models.