This repository automatically fetches new or updated arXiv papers in the [cs.CV] category every day, checks if they are relevant to "3D reconstruction" or "3D generation" via ChatGPT, and lists them below.
- A GitHub Actions workflow runs daily at 09:00 UTC.
- It uses the script fetch_cv_3d_papers.py (sketched below) to:
  - Retrieve the latest arXiv papers in cs.CV.
  - Use ChatGPT to keep only those related to 3D reconstruction/generation.
  - Update this README.md with the new findings.
  - Send an email via 163 Mail if any relevant papers are found.
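The following is a minimal, illustrative Python sketch of that pipeline, not the actual contents of fetch_cv_3d_papers.py: the helper names, the prompt wording, and the `gpt-4o-mini` model choice are assumptions, and it leans on the `feedparser` and `openai` packages (the real script also scores relevance and renders the tables below).

```python
import smtplib
from email.mime.text import MIMEText

import feedparser          # parses the Atom feed served by arXiv's export API
from openai import OpenAI  # assumes the official OpenAI Python client

# arXiv export API query: newest cs.CV submissions first.
ARXIV_API = (
    "http://export.arxiv.org/api/query"
    "?search_query=cat:cs.CV&sortBy=submittedDate&sortOrder=descending&max_results=200"
)

def fetch_latest_cs_cv():
    """Retrieve (id, title, abstract) tuples for the newest cs.CV papers."""
    feed = feedparser.parse(ARXIV_API)
    return [(e.id.split("/abs/")[-1], e.title, e.summary) for e in feed.entries]

def is_relevant(client: OpenAI, title: str, abstract: str) -> bool:
    """Ask ChatGPT whether a paper concerns 3D reconstruction or 3D generation."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                "Answer YES or NO: is this paper about 3D reconstruction "
                f"or 3D generation?\nTitle: {title}\nAbstract: {abstract}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

def send_mail_163(user: str, auth_code: str, to_addr: str, body: str) -> None:
    """Send the digest through 163 Mail's SMTP endpoint (SSL on port 465)."""
    msg = MIMEText(body)
    msg["Subject"] = "New 3D reconstruction/generation papers on arXiv"
    msg["From"], msg["To"] = user, to_addr
    with smtplib.SMTP_SSL("smtp.163.com", 465) as server:
        server.login(user, auth_code)  # 163 Mail expects an SMTP authorization code
        server.send_message(msg)

if __name__ == "__main__":
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    hits = [(pid, title) for pid, title, abstract in fetch_latest_cs_cv()
            if is_relevant(client, title, abstract)]
    if hits:  # update README.md here, then notify
        digest = "\n".join(f"{pid} {title}" for pid, title in hits)
        send_mail_163("you@163.com", "YOUR_AUTH_CODE", "you@example.com", digest)
```

In the real workflow, the OpenAI key and the 163 Mail credentials would come from repository secrets, and the 09:00 UTC schedule would be a `cron: "0 9 * * *"` trigger in the Actions workflow file.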
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | 2502.16419 DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion (Jianbin Jiao, Xina Cheng, Kailun Yang, Xiangrong Zhang, Licheng Jiao) | 3D Reconstruction and Modeling | 3D human pose estimation, multi-view perception, deficiency-aware estimation | Input: Multi-view images → Step 1: Simplification of network architecture → Step 2: Feature extraction from images → Step 3: Adaptive multi-view feature fusion → Output: 3D human pose estimations |
9.5 | 2502.16475 Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control (Jinbo Yan, Alan Zhao, Yixin Hu) | 3D Generation | 3D Generation, Geometric Consistency, User Control | Input: Single image and point cloud → Step 1: Generate sparse seed points → Step 2: Map seed points to anchor latents → Step 3: Generate 3D Gaussian representations → Output: Multi-view consistent 3D models |
9.5 | 2502.16488 Geometry-Aware 3D Salient Object Detection Network (Chen Wang, Liyuan Zhang, Le Hui, Qi Liu, Yuchao Dai) | 3D Salient Object Detection | 3D salient object detection, point cloud, geometry-aware | Input: 3D point clouds → Step 1: Superpoint partitioning → Step 2: Point feature learning → Step 3: Geometry enhancement → Output: Salient object map |
9.5 | 2502.16575 Efficient 4D Gaussian Stream with Low Rank Adaptation (Zhenhuan Liu, Shuai Liu, Yidong Lu, Yirui Chen, Jie Yang, Wei Liu) | 3D Reconstruction and Modeling | Dynamic novel view synthesis, 3D Gaussian Splatting | Input: Video frames → Step 1: Scene representation using 3D Gaussians → Step 2: Low-rank adaptation for bandwidth reduction → Step 3: Continuous dynamic reconstruction → Output: Scalable dynamic novel views |
9.5 | 2502.16652 Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration (Kim Jun-Seong, GeonU Kim, Kim Yu-Ji, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh) | 3D Scene Understanding | 3D Gaussian Splatting, open-vocabulary scene understanding, language embedding, 3D reconstruction, 3D perception | Input: 3D Gaussians → Step 1: Feature registration → Step 2: Direct language embedding association → Step 3: Evaluation on 3D perception tasks → Output: Enhanced scene understanding |
9.5 | 2502.16826 Noise2Score3D: Unsupervised Tweedie's Approach for Point Cloud Denoising (Xiangbin Wei) | Point Cloud Processing | point cloud denoising, unsupervised learning, Tweedie's formula | Input: Noisy point cloud data → Step 1: Gradient learning from noisy data → Step 2: Single-step denoising using Tweedie's formula → Output: Denoised point cloud |
9.5 | 2502.17053 PointSea: Point Cloud Completion via Self-structure Augmentation (Zhe Zhu, Honghua Chen, Xing He, Mingqiang Wei) | 3D Reconstruction and Modeling | Point Cloud Completion, Self-structure Augmentation | Input: Incomplete point cloud data → Step 1: Data augmentation → Step 2: Self-view fusion network → Step 3: Feature fusion → Step 4: Point generation → Output: Completed point cloud |
9.5 | 2502.17288 GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow (Simon Boeder, Fabian Gigengack, Benjamin Risse) | 3D Reconstruction and Modeling | occupancy estimation, 3D Gaussian representation, autonomous driving, Gaussian Splatting | Input: Multi-view images → Step 1: Construct a sparse 3D Gaussian representation → Step 2: Integrate temporal flow estimation → Step 3: Train with Gaussian Splatting → Output: Efficient occupancy estimation |
9.5 | 2502.17377 Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting (Chong Cheng, Gaochao Song, Yiyang Yao, Qinzheng Zhou, Gangjian Zhang, Hao Wang) | 3D Reconstruction and Modeling | 3D reconstruction, Gaussian Splatting, scene reconstruction, autonomous driving | Input: Images captured by RGB cameras → Step 1: Spatial prior-based scene structure estimation → Step 2: Camera graph creation → Step 3: Graph-guided optimization with multi-view consistency → Output: High-fidelity 3D reconstruction of scenes |
9.5 | 2502.17429 CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation (Vishal Thengane, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Lu Yin, Xiatian Zhu, Salman Khan) | 3D Instance Segmentation | 3D instance segmentation, continual learning, class imbalance | Input: RGB-D images with 3D instance annotations → Step 1: Implement a unified framework → Step 2: Integrate exemplar replay, knowledge distillation, and imbalance correction → Step 3: Create benchmark scenarios for evaluation → Output: Improved 3D instance segmentation performance |
9.0 | 2502.16779 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model (Yaxuan Huang, Xili Dai, Jianan Wang, Xianbiao Qi, Yixing Yuan, Xiangyu Yue) | 3D Reconstruction and Modeling | 3D reconstruction, room layout estimation, multi-view geometry, DUSt3R, autonomous systems | Input: Multi-view images → Step 1: 2D plane detection → Step 2: Dense 3D point representation → Step 3: Plane correspondence establishment → Output: Estimated room layout |
9.0 | 2502.16907 MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation (Jiehao Luo, Jintao Cheng, Xiaoyu Tang, Qingwen Zhang, Bohuan Xue, Rui Fan) | Scene Flow Estimation | Scene Flow Estimation, State Space Model, 3D motion | Input: Consecutive point cloud frames → Step 1: Model design → Step 2: Feature extraction → Step 3: Scene flow estimation → Output: Motion vectors |
9.0 | 2502.17237 MegaLoc: One Retrieval to Place Them All (Gabriele Berton, Carlo Masone) | Visual Place Recognition | 3D reconstruction, Visual Place Recognition, Image Retrieval, SLAM | Input: Diverse image datasets → Step 1: Data integration → Step 2: Model training combining multiple techniques → Step 3: Evaluation across multiple tasks → Output: Robust image retrieval model |
8.5 | 2502.15888 Understanding and Evaluating Hallucinations in 3D Visual Language Models (Ruiying Peng, Kaiyuan Li, Weichen Zhang, Chen Gao, Xinlei Chen, Yong Li) | 3D Reconstruction and Modeling | 3D-LLMs, hallucinations, evaluation metrics, point cloud | Input: 3D point-cloud data → Step 1: Definition of 3D hallucinations → Step 2: Evaluation of hallucinations in 3D-LLMs → Step 3: Analysis of underlying causes → Output: New evaluation metrics for hallucinations |
8.5 | 2502.16012 Cross-Model Transferability of Adversarial Patches in Real-time Segmentation for Autonomous Driving (Prashant Shekhar, Bidur Devkota, Dumindu Samaraweera, Laxima Niure Kandel, Manoj Babu) | Autonomous Driving | autonomous driving, adversarial attacks, semantic segmentation | Input: Adversarial patch training → Step 1: Attack formulation → Step 2: Model performance analysis → Step 3: Cross-model transferability evaluation → Output: Insights on attack susceptibility |
8.5 | 2502.16164 A Deep Learning Framework with Geographic Information Adaptive Loss for Remote Sensing Images based UAV Self-Positioning (Mingkun Li, Ziming Wang, Guang Huo, Wei Chen, Xiaoning Zhao) | Autonomous Systems and Robotics | UAV self-positioning, remote sensing, deep learning | Input: Remote sensing images and UAV images → Step 1: Data alignment → Step 2: Adaptive loss integration → Step 3: Model evaluation → Output: Precise UAV positioning |
8.5 | 2502.16214 SalM$^2$: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention (Chunyu Zhao, Wentao Mu, Xian Zhou, Wenbo Liu, Fei Yan, Tao Deng) | Autonomous Driving | driver attention recognition, real-time model, semantic information | Input: Driving scene data → Step 1: Extract bottom-up image features → Step 2: Extract top-down semantic information → Step 3: Integrate extracted features and map driver attention → Output: Driver attention map |
8.5 | 2502.16302 DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation (Yuxuan Xiong, Yue Shi, Yishun Dou, Bingbing Ni) | 3D Scene Editing | 3D scene editing, Neural Radiance Fields, text-driven generation | Input: Text instructions and 3D scene representation → Step 1: Introduce dual-field representation → Step 2: Implement simulated annealing strategy → Step 3: Apply CLIP-based consistency indicator → Output: Edited 3D scenes with preserved backgrounds |
8.5 | 2502.16303 Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field (Wenhao Hu, Wenhao Chai, Shengyu Hao, Xiaotong Cui, Xuexiang Wen, Jenq-Neng Hwang, Gaoang Wang) | 3D Segmentation | 3D segmentation, Gaussian segmentation, autonomous driving | Input: Multi-view images → Step 1: Establish pixel correspondence → Step 2: Optimize mask association using the Hungarian algorithm → Step 3: Apply piecewise-plane constraints → Output: Consistent and compact 3D segmentation field |
8.5 | 2502.16351 AquaNeRF: Neural Radiance Fields in Underwater Media with Distractor Removal (Luca Gough, Adrian Azzarelli, Fan Zhang, Nantheera Anantrasirichai) | 3D Reconstruction and Modeling | Neural Radiance Fields, 3D reconstruction, underwater imaging | Input: Underwater scenes focusing on static objects → Step 1: Model the cumulative density of volumes along a ray → Step 2: Apply a Gaussian distribution for transmittance modeling → Step 3: Optimize the Gaussian distribution for stable rendering → Output: Enhanced 3D representation of underwater scenes |
8.5 | 2502.16389 An Expert Ensemble for Detecting Anomalous Scenes, Interactions, and Behaviors in Autonomous Driving (Tianchen Ji, Neeloy Chakraborty, Andre Schreiber, Katherine Driggs-Campbell) | Autonomous Systems and Robotics | anomaly detection, autonomous driving | Input: Egocentric videos → Step 1: Scene analysis → Step 2: Expert model development → Step 3: Anomaly score fusion → Output: Anomaly detection scores |
8.5 | 2502.16915 Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model (Kang Fu, Huiyu Duan, Zicheng Zhang, Xiaohong Liu, Xiongkuo Min, Jia Wang, Guangtao Zhai) | 3D Reconstruction | text-to-3D generation, quality assessment, 3D modeling | Input: 3D assets generated from text prompts → Step 1: Database creation → Step 2: Quality feature extraction → Step 3: Model evaluation and benchmarking → Output: Quality assessment scores |
8.5 | 2502.16941 Gaussian Difference: Find Any Change Instance in 3D Scenes (Binbin Jiang, Rui Huang, Qingyi Zhao, Yuxiang Zhang) | 3D Change Detection | 3D change detection, Gaussian distributions, instance segmentation | Input: Multi-view images → Step 1: Embed images into 4D Gaussians → Step 2: Segment images and assign IDs → Step 3: Compare IDs for change detection → Output: Change maps from any viewpoint |
8.5 | 2502.16992 Semantic Neural Radiance Fields for Multi-Date Satellite Data (Valentin Wagner, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens) | Neural Rendering | Neural Radiance Fields, multi-date satellite images, 3D reconstruction | Input: Multi-date satellite images with semantic labels → Step 1: Model adaptation for satellite images → Step 2: Semantic and color fusion → Step 3: Robustness evaluation and improvement → Output: 3D semantic representations |
8.5 | 2502.17039 LCV2I: Communication-Efficient and High-Performance Collaborative Perception Framework with Low-Resolution LiDAR (Xinxin Feng, Haoran Sun, Haifeng Zheng, Huacong Chen, Wenqiang Chen) | Autonomous Driving | 3D object detection, LiDAR, collaborative perception | Input: Data collected from low-resolution LiDAR and cameras → Step 1: Feature extraction → Step 2: Voxel-wise fusion → Step 3: Feature offset correction → Step 4: Regional feature enhancement → Output: Enhanced 3D object detection |
8.0 | 2502.15956 Human Motion Prediction, Reconstruction, and Generation (Canxuan Gang, Yiran Wang) | 3D Reconstruction and Modeling | human motion prediction, 3D reconstruction, motion generation | Input: Historical motion data → Step 1: Pose forecasting → Step 2: 3D motion reconstruction → Step 3: Motion generation → Output: Realistic human motion sequences |
7.5 | 2502.16427 Fine-Grained Video Captioning through Scene Graph Consolidation (Sanghyeok Chu, Seonguk Seo, Bohyung Han) | VLM & VLA | video captioning, visual-language models, scene graphs | Input: Video frames → Step 1: Generate frame-level captions using an image VLM → Step 2: Convert captions into scene graphs → Step 3: Consolidate frame-level scene graphs into a video-level scene graph → Output: Comprehensive video captions |
7.5 | 2502.16493 Trunk-branch Contrastive Network with Multi-view Deformable Aggregation for Multi-view Action Recognition (Yingyuan Yang, Guoyuan Liang, Can Wang, Xiaojun Wu) | Multi-view Stereo | Multi-view action recognition, Contrastive learning, Feature fusion | Input: Multi-view RGB images → Step 1: Feature aggregation → Step 2: Contrastive learning against trunk features → Step 3: Model evaluation on datasets → Output: Enhanced action representations |
7.5 | 2502.16618 Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI? (Qipan Xu, Zhenting Wang, Xiaoxiao He, Ligong Han, Ruixiang Tang) | Vision-Language Models (VLMs) | Vision-Language Models, copyright detection, Generative AI | Input: Image samples → Step 1: Dataset creation → Step 2: Model evaluation → Step 3: Analysis of failure cases → Output: Proposed solutions |
6.5 | 2502.16368 Concept Corrector: Erase concepts on the fly for text-to-image diffusion models (Zheling Meng, Bo Peng, Xiaochuan Jin, Yueming Lyu, Wei Wang, Jing Dong) | Image Generation | concept erasure, text-to-image generation | Input: Intermediate generated images → Step 1: Concept presence checking → Step 2: Concept removal correction → Output: Corrected images |
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | 2502.14891 CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection (Zhe Huang, Shuo Wang, Yongcai Wang, Lei Wang) | 3D Object Detection | 3D object detection, autonomous driving, diffusion models | Input: Point clouds from multiple agents → Step 1: Feature extraction from point clouds → Step 2: Information sharing between agents → Step 3: Noise reduction using diffusion models → Output: Accurate collaborative 3D object detection |
9.5 | 2502.14938 GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models (Miao Tao, Yuanzhen Zhou, Haoran Xu, Zeyu He, Zhenyu Yang, Yuchang Zhang, Zhongling Su, Linning Xu, Zhenxiang Ma, Rong Fu, Hengjie Li, Xingcheng Zhang, Jidong Zhai) | Neural Rendering | 3D Gaussian Splatting, neural rendering, real-time rendering, virtual reality | Input: Large-scale 3D Gaussian Splatting models → Step 1: Design a cache-centric rendering pipeline → Step 2: Implement multi-GPU scheduling → Step 3: Optimize CUDA kernels to enhance performance → Output: Real-time rendered 3D scenes |
9.5 | 2502.14940 FacaDiffy: Inpainting Unseen Facade Parts Using Diffusion Models (Thomas Froech, Olaf Wysocki, Yan Xia, Junyu Xie, Benedikt Schwab, Daniel Cremers, Thomas H. Kolbe) | 3D Reconstruction and Modeling | 3D reconstruction, image inpainting, Stable Diffusion, conflict maps, diffusion models | Input: 3D building models and laser scanning point clouds → Step 1: Derive 2D conflict maps by deterministic ray analysis → Step 2: Personalize a Stable Diffusion model for inpainting → Step 3: Generate synthetic conflict maps for training → Output: Completed conflict maps for 3D semantic reconstruction |
9.5 | 2502.15011 CrossOver: 3D Scene Cross-Modal Alignment (Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Daniel Barath, Iro Armeni) | 3D Scene Understanding | 3D scene understanding, cross-modal alignment, point clouds | Input: Multi-modal 3D data → Step 1: Flexible modality alignment → Step 2: Unified embedding space learning → Step 3: Scene retrieval and object localization → Output: Enhanced scene understanding |
9.5 | 2502.15076 Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios (Richard Marcus, Christian Vogel, Inga Jatzkowski, Niklas Knoop, Marc Stamminger) | 3D Reconstruction and Modeling | 3D object detection, LiDAR, synthetic data, domain randomization, autonomous driving | Input: LiDAR point clouds and synthetic data → Step 1: Sensor modeling → Step 2: Domain randomization → Step 3: Object detection training → Step 4: Performance evaluation → Output: Enhanced object detection |
9.5 | 2502.15438 LEAP: Enhancing Vision-Based Occupancy Networks with Lightweight Spatio-Temporal Correlation (Fengcheng Yu, Haoran Xu, Canming Xia, Guang Tan) | 3D Scene Reconstruction | 3D occupancy networks, autonomous driving, spatio-temporal correlation | Input: Multi-view images → Step 1: Tokenization of baseline and motion features → Step 2: Tri-stream fusion architecture for correlation establishment → Step 3: Generation of occupancy results → Output: Enhanced occupancy predictions |
9.5 | 2502.15633 RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes (Sicheng Yu, Chong Cheng, Yifan Zhou, Xiaojun Yang, Hao Wang) | Simultaneous Localization and Mapping (SLAM) | RGB-only SLAM, 3D Gaussian Splatting, outdoor scenes, pose estimation | Input: RGB images → Step 1: Pointmap regression to generate spatial relationships → Step 2: Pose estimation based on pointmaps → Step 3: 3D Gaussian Splatting for rendering → Output: High-fidelity novel views |
9.5 | 2502.15635 Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis (Ziqian Ni, Sicong Du, Zhenghua Hou, Chenming Wu, Sheng Yang) | 3D Reconstruction and Modeling | novel view synthesis, multi-lane dataset, autonomous driving, 3D reconstruction, LiDAR | Input: Multi-lane dataset containing LiDAR and camera data → Step 1: Data collection via multiple scans → Step 2: Multi-sensor pose optimization → Step 3: Dataset registration → Output: Evaluated novel view synthesis capabilities |
9.2 | 2502.15488 Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection (Jiangyong Yu, Changyong Shu, Dawei Yang, Zichen Yu, Xing Hu, Yan Chen) | 3D Object Detection | 3D object detection, quantization, autonomous driving | Input: Multi-view images → Step 1: Identify quantization issues → Step 2: Propose the Q-PETR model → Step 3: Evaluate performance → Output: Enhanced detection accuracy |
9.2 | 2502.15516 Depth-aware Fusion Method based on Image and 4D Radar Spectrum for 3D Object Detection (Yue Sun, Yeqiang Qian, Chunxiang Wang, Ming Yang) | 3D Object Detection | 3D object detection, 4D millimeter-wave radar | Input: 4D radar spectra and depth-aware camera images → Step 1: Feature extraction from RGB and depth images → Step 2: Feature fusion in BEV feature space → Step 3: 3D object detection using fused features → Output: Enhanced 3D object detection results |
8.8 | 2502.15448 MVIP -- A Dataset and Methods for Application Oriented Multi-View and Multi-Modal Industrial Part Recognition (Paul Koch, Marian Schlüter, Jörg Krüger) | Multi-view and Stereo Vision | Multi-View, Multi-Modal, Industrial Part Recognition | Input: Multi-view RGBD dataset → Step 1: Data acquisition → Step 2: Modality integration → Step 3: Model training and evaluation → Output: Robust industrial classifiers |
8.5 | 2502.14908 KOALA: Knowledge Conflict Augmentations for Robustness in Vision Language Models (Peter Carragher, Nikitha Rao, Abhinand Jha, R Raghav, Kathleen M. Carley) | Vision-Language Models (VLMs) | Vision-Language Models, knowledge conflicts, robustness | Input: Visual Question Answering (VQA) with multimodal sources → Step 1: Introduce targeted perturbations → Step 2: Evaluate model robustness → Step 3: Fine-tune models to improve reasoning → Output: Enhanced understanding of knowledge conflicts |
8.5 | 2502.14917 Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning (Rui Zhao, Qirui Yuan, Jinyu Li, Haofeng Hu, Yun Li, Chengyuan Zheng, Fei Gao) | Autonomous Driving | autonomous driving, 3D spatial understanding, multimodal learning | Input: Local scene videos and global BEV maps → Step 1: Modal encoders align visual representations → Step 2: Generate natural language responses → Step 3: Enhance model performance through extensive training → Output: Improved perception and reasoning for autonomous driving |
8.5 | 2502.15180 OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework (Junliang Chen, Huaiyuan Xu, Yi Wang, Lap-Pui Chau) | Autonomous Driving | occupancy forecasting, autonomous driving, 3D perception | Input: Multi-camera video → Step 1: Feature extraction → Step 2: Future occupancy forecasting → Step 3: Refinement of predictions → Output: Future occupancy map |
8.5 | 2502.15307 Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder (Zhenghao Xi, Yuchao Shao, Yang Zheng, Xiang Liu, Yaqi Liu, Yitong Cai) | Autonomous Driving | Traffic Sign Recognition, Siamese Network, Efficient-CNN | Input: Traffic sign images → Step 1: Feature extraction using Efficient-CNN based encoders → Step 2: Distance computation using a Siamese network → Step 3: Classification using a fully-connected layer → Output: Recognized traffic sign categories |
8.5 | 2502.15342 PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments (Yueting Liu, Hanshi Wang, Yunfei Lei, Zhengjun Zha, Weiming Hu, Jin Gao) | Autonomous Driving | autonomous driving, 3D detection dataset, pedestrian detection, semi-structured environments | Input: Multi-modal data → Step 1: Dataset annotation → Step 2: Hybrid Multi-Scale Fusion Network framework development → Step 3: Performance evaluation → Output: Improved pedestrian detection results |
8.5 | 2502.15398 Enhancing Vehicle Make and Model Recognition with 3D Attention Modules (Narges Semiromizadeh, Omid Nejati Manzari, Shahriar B. Shokouhi, Sattar Mirzakuchaki) | Autonomous Driving | Vehicle Make and Model Recognition, Attention Module, Deep Learning | Input: Vehicle images from various makes and models → Step 1: Integrate attention module into convolutional model → Step 2: Enhance focus on distinguishing vehicle features → Step 3: Evaluate performance on the Stanford Cars dataset → Output: Improved VMMR accuracy |
8.5 | 2502.15601 WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents (Xinhang Liu, Chi-Keung Tang, Yu-Wing Tai) | 3D Reconstruction and Modeling | 3D world creation, LLM agents, procedural generation | Input: User natural language commands → Step 1: Interaction with LLM agents → Step 2: Object customization and control → Step 3: Scene layout optimization → Output: Photorealistic 3D scenes |
8.5 | 2502.15672 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling (Florent Bartoccioni, Elias Ramzi, Victor Besnier, Shashanka Venkataramanan, Tuan-Hung Vu, Yihong Xu, Loick Chambon, Spyros Gidaris, Serkan Odabas, David Hurych, Renaud Marlet, Alexandre Boulch, Mickael Chen, Éloi Zablocki, Andrei Bursuc, Eduardo Valle, Matthieu Cord) | Autonomous Driving | autonomous driving, video generative models | Input: Driving video sequences → Step 1: Frame prediction → Step 2: Representation learning → Step 3: Action generation → Output: Driving trajectories |
8.0 | 2502.15079 Can Hallucination Correction Improve Video-Language Alignment? (Lingjun Zhao, Mingyang Xie, Paola Cascante-Bonilla, Hal Daumé III, Kwonjoon Lee) | Vision-Language Models (VLMs) | video-language alignment, hallucination correction | Input: Video and textual descriptions → Step 1: Identify hallucinations → Step 2: Correct inconsistencies → Step 3: Enhance alignment → Output: Improved video-language alignment |
7.5 | 2502.14888 The Multi-Faceted Monosemanticity in Multimodal Representations (Hanqi Yan, Xiangxiang Cui, Lu Yin, Paul Pu Liang, Yulan He, Yifei Wang) | VLM & VLA | multimodal models, interpretability, CLIP | Input: CLIP features from image-text pairs → Step 1: Feature extraction → Step 2: Classification into vision, language, and visual-language categories → Step 3: Evaluation of Modality Dominance Score (MDS) → Output: Categorized and interpretable multimodal features |
7.5 | 2502.15389 The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting (Masayo Tomita, Katsuhiko Hayashi, Tomoyuki Kaneko) | Vision-Language Models (VLMs) | Vision-Language Models, object hallucination, background context | Input: Vision-Language Models (VLMs) → Step 1: Analyze object hallucination in outputs → Step 2: Examine effectiveness of background context → Step 3: Evaluate visual prompting techniques → Output: Recommendations for reducing hallucination |
7.5 | 2502.15563 Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation (Tim Rädsch, Leon Mayer, Simon Pavicic, A. Emre Kavur, Marcel Knopp, Barış Öztürk, Klaus Maier-Hein, Paul F. Jaeger, Fabian Isensee, Annika Reinke, Lena Maier-Hein) | Vision-Language Models (VLMs) | Vision-Language Models (VLMs), benchmark generation, task augmentation | Input: Existing VLM tasks → Step 1: Task augmentation for diverse tasks → Step 2: Benchmark creation for multiple domains → Step 3: Performance evaluation across 22 VLMs → Output: Resource-efficient VLM benchmarks |
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | 2502.14129 GlossGau: Efficient Inverse Rendering for Glossy Surface with Anisotropic Spherical Gaussian (Bang Du, Runfa Blark Li, Chen Du, Truong Nguyen) | 3D Reconstruction | 3D reconstruction, inverse rendering, glossy surfaces, NeRF, Gaussian Splatting | Input: Multi-view images → Step 1: Model surface normals and BRDF parameters → Step 2: Use anisotropic spherical Gaussians to approximate reflections → Step 3: Apply regularization for better normal estimation → Output: Efficiently rendered glossy 3D surfaces |
9.5 | 2502.14142 Token Adaptation via Side Graph Convolution for Temporally and Spatially Efficient Fine-tuning of 3D Point Cloud Transformers (Takahiko Furuya) | 3D Reconstruction and Modeling | 3D point cloud, Transformer, fine-tuning | Input: 3D point cloud data → Step 1: Define graph convolutional network → Step 2: Implement Side Token Adaptation → Step 3: Evaluate performance on benchmarks → Output: Efficiently fine-tuned models |
9.5 | 2502.14235 OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving (Yedong Shen, Xinran Zhang, Yifan Duan, Shiqi Zhang, Heng Li, Yilong Wu, Jianmin Ji, Yanyong Zhang) | 3D Reconstruction and Modeling | 3D reconstruction, autonomous driving | Input: Surround-view camera images → Step 1: Generate occupancy grids → Step 2: Separate dynamic and static objects → Step 3: Convert occupancy grids to point clouds → Step 4: Estimate poses and trajectories → Output: 3D reconstructed scene |
9.5 | 2502.14520 Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance (Meng Wang, Fan Wu, Ruihui Li, Yunchuan Qin, Zhuo Tang, Kenli Li) | 3D Scene Completion | 3D Semantic Scene Completion, optical flow, autonomous driving, temporal modeling | Input: Temporal RGB images → Step 1: Optical flow estimation → Step 2: Flow-guided temporal aggregation → Step 3: Occlusion-guided voxel refinement → Output: 3D semantic scene completion |
9.5 | 2502.14789 Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing (Yoel Levy, David Shavin, Itai Lang, Sagie Benaim) | 3D Understanding and Editing | 3D Understanding, 3D Editing, Feature Distillation | Input: 2D feature maps obtained from large pre-trained models → Step 1: Distill 2D features into 3D structurally disentangled feature fields → Step 2: Control individual structural components for semantic understanding → Step 3: Apply segmentation and editing capabilities → Output: Enhanced 3D understanding and editing capabilities |
8.5 | 2502.14061 EfficientPose 6D: Scalable and Efficient 6D Object Pose Estimation (Zixuan Fang, Thomas Pöllabauer, Tristan Wirth, Sarah Berkei, Volker Knauthe, Arjan Kuijper) | Pose Estimation | 6D pose estimation, autonomous navigation, real-time feedback, robotics | Input: Monocular RGB-D images → Step 1: Architecture adaptation → Step 2: AMIS algorithm implementation → Step 3: Model testing across datasets → Output: Optimized 6D pose estimation |
8.5 | 2502.14068 A Racing Dataset and Baseline Model for Track Detection in Autonomous Racing (Shreya Ghosh, Yi-Huan Chen, Ching-Hsiang Huang, Abu Shafin Mohammad Mahdee Jameel, Chien Chou Ho, Aly El Gamal, Samuel Labi) | Autonomous Driving | 3D reconstruction, autonomous driving | Input: Multi-camera image data → Step 1: Data collection and annotation → Step 2: Algorithm development using a GAN → Step 3: Model evaluation and benchmarking → Output: Track detection results |
8.5 | 2502.14099 Point Cloud Geometry Scalable Coding Using a Resolution and Quality-conditioned Latents Probability Estimator (Daniele Mari, André F. R. Guarda, Nuno M. M. Rodrigues, Simone Milani, Fernando Pereira) | Point Cloud Processing | Point Cloud Coding, scalable coding, deep learning | Input: Point cloud geometry → Step 1: Develop the Scalable Resolution and Quality Hyperprior (SRQH) → Step 2: Integrate SRQH into JPEG PCC → Step 3: Experimental validation → Output: Scalable coding for point clouds |
8.5 | 2502.14113 Object-centric Binding in Contrastive Language-Image Pretraining (Rim Assouel, Pietro Astolfi, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano) | Vision-Language Models (VLMs) | Vision-Language Models, object-centric, compositional understanding | Input: CLIP-like models → Step 1: Integrate scene graphs with image representations → Step 2: Develop a binding module → Step 3: Enhance spatial relationship understanding → Output: Improved compositional understanding |
8.5 | 2502.14156 Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration (Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu, Mark Campbell, Wei-Lun Chao, Kilian Q. Weinberger, Ezio Malis, Vincent Fremont, Bharath Hariharan, Mao Shan, Stewart Worrall, Julie Stephany Berrio Perez) | 3D Reconstruction and Modeling | 3D reconstruction, V2X, point cloud, LiDAR sensors | Input: LiDAR sensor data from vehicles → Step 1: Data collection → Step 2: Data alignment → Step 3: Statistical analysis → Output: Comprehensive V2X dataset |
8.5 | 2502.14190 Stereo Image Coding for Machines with Joint Visual Feature Compression (Dengchao Jin, Jianjun Lei, Bo Peng, Zhaoqing Pan, Nam Ling, Qingming Huang) | Multi-view Stereo | stereo image compression, 3D visual tasks | Input: Stereo images → Step 1: Feature extraction → Step 2: Feature compression → Step 3: Data transmission → Output: Efficiently compressed stereo visual features |
8.5 | 2502.14191 Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models (Michihiro Yasunaga, Luke Zettlemoyer, Marjan Ghazvininejad) | Vision-Language Models (VLMs) | reward models, vision-language models, benchmark | Input: Vision-language models (VLMs) → Step 1: Benchmark creation → Step 2: Expert annotation → Step 3: Model evaluation → Output: Reward model evaluation |
8.5 | 2502.14195 Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition (Tianyi Shang, Zhenyu Li, Pengjie Xu, Jinwei Qiao, Gang Chen, Zihan Ruan, Weijun Hu) | Visual Place Recognition | text-vision registration, place recognition, cross-modal localization | Input: Multi-view images → Step 1: Text embedding extraction → Step 2: Clustering of visual descriptors → Step 3: Cross-modal alignment → Output: Place recognition based on text-image pairs |
8.5 | 2502.14279 OrchardDepth: Precise Metric Depth Estimation of Orchard Scene from Monocular Camera Images (Zhichao Zheng, Henry Williams, Bruce A MacDonald) | Depth Estimation | depth estimation, monocular camera, autonomous driving | Input: Monocular camera images → Step 1: Data collection → Step 2: Depth estimation model training → Step 3: Consistency monitoring → Output: Enhanced depth maps |
8.5 | 2502.14316 Textured 3D Regenerative Morphing with 3D Diffusion Prior (Songlin Yang, Yushi Lan, Honghua Chen, Xingang Pan) | 3D Reconstruction and Modeling | 3D morphing, 3D diffusion models, textured 3D representations | Input: Textured 3D objects → Step 1: Source-target information integration → Step 2: 3D diffusion model application → Step 3: Attention Fusion strategy implementation → Output: Morphing sequence |
8.5 | 2502.14412 Evaluating Precise Geolocation Inference Capabilities of Vision Language Models (Neel Jay, Hieu Minh Nguyen, Trung Dung Hoang, Jacob Haimes) | Vision-Language Models (VLMs) | Vision-Language Models, geolocation, privacy, dataset | Input: Images from Google Street View → Step 1: Dataset collection → Step 2: Model evaluation → Step 3: Geolocation inference → Output: Geolocation accuracy results |
8.5 | 2502.14454 Exploiting Deblurring Networks for Radiance Fields (Haeyun Choi, Heemin Yang, Janghyeok Han, Sunghyun Cho) | Neural Rendering | radiance fields, deblurring, 3D Gaussian, novel view synthesis | Input: Blurred multi-view images → Step 1: RF-guided deblurring → Step 2: Radiance field construction → Step 3: Iterative enhancement → Output: High-quality novel views |
8.5 | 2502.14503 LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera (Weiyi Xiong, Zean Zou, Qiuchi Zhao, Fengchun He, Bing Zhu) | 3D Object Detection | 3D object detection, 4D radar, camera | Input: 4D radar and camera data → Step 1: Depth supervision strategy via radar points → Step 2: Attention-based multi-modal fusion module → Step 3: Model evaluation on standard datasets → Output: Enhanced detection accuracy |
8.5 | 2502.14573 Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining (Wonhyeok Choi, Kyumin Hwang, Wei Peng, Minwoo Choi, Sunghoon Im) | Depth Estimation | monocular depth estimation, triplet mining, reflective surfaces, autonomous driving | Input: Monocular images → Step 1: Triplet mining to identify reflective regions → Step 2: Apply reflection-aware triplet mining loss → Step 3: Knowledge distillation for depth estimation → Output: Enhanced depth map |
8.5 | 2502.14616 Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion (Jiangyuan Liu, Hongxuan Ma, Yuxin Guo, Yuhao Zhao, Chi Zhang, Wei Sui, Wei Zou) | Depth Estimation | monocular depth estimation, segmentation, transparent objects | Input: Single RGB image → Step 1: Feature extraction → Step 2: Semantic and geometric fusion → Step 3: Iterative feature refinement → Output: Segmentation mask and depth map |
8.5 | 2502.14676 BP-SGCN: Behavioral Pseudo-Label Informed Sparse Graph Convolution Network for Pedestrian and Heterogeneous Trajectory Prediction (Ruochen Li, Stamos Katsigiannis, Tae-Kyun Kim, Hubert P. H. Shum) | Autonomous Systems and Robotics | trajectory prediction, behavioral pseudo-labels, autonomous vehicles | Input: Observed agent trajectories → Step 1: Unsupervised behavior clustering module → Step 2: Goal-guided trajectory prediction module → Step 3: Cascaded training scheme → Output: Enhanced trajectory predictions |
8.5 | 2502.14721 Multi-dataset synergistic in supervised learning to pre-label structural components in point clouds from shell construction scenes (Lukas Rauch, Thomas Braml) | Point Cloud Processing | Point Cloud Semantic Segmentation, Transformer Models, Construction Industry | Input: Point cloud data from shell construction sites → Step 1: Supervised training using a custom validation dataset → Step 2: Cross-domain inference with existing datasets → Step 3: Transfer learning to enhance performance → Output: Improved semantic segmentation for construction components |
8.5 | 2502.14792 RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation (Henrique Piñeiro Monteagudo, Leonardo Taccari, Aurel Pjetri, Francesco Sambo, Samuele Salti) | Image and Video Generation | Bird's Eye View segmentation, self-supervised training | Input: Video sequences → Step 1: Monocular semantic segmentation → Step 2: Rendering of perspective views → Step 3: Self-supervised training → Output: BEV segmentation results |
8.5 | 2502.14801 AVD2: Accident Video Diffusion for Accident Video Description (Cheng Li, Keyuan Zhou, Tong Liu, Yu Wang, Mingqiao Zhuang, Huan-ang Gao, Bu Jin, Hao Zhao) | Autonomous Driving | Accident Video Diffusion, Autonomous Driving, Video Understanding | Input: Accident videos → Step 1: Video generation → Step 2: Detailed description alignment → Step 3: Actionable prevention strategies → Output: Enhanced understanding of accident scenarios |
7.5 | 2502.14221 H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging (Zhen Huang, Ronghao Xu, Xiaoqian Zhou, Yangbo Wei, Suhua Wang, Xiaoxin Sun, Han Li, Qingsong Yao) | 3D Reconstruction and Modeling | 3D landmark detection, medical image analysis, deep learning | Input: 3D volumetric data → Step 1: Local feature extraction → Step 2: Global dependency modeling → Step 3: Multi-scale feature fusion → Output: Accurate 3D landmark detection |
7.5 | 2502.14493 CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond (Yukai Shi, Cidan Shi, Zhipeng Weng, Yin Tian, Xiaoyu Xian, Liang Lin) | Image Fusion | Infrared-visible fusion, autonomous driving | Input: Infrared and visible images → Step 1: External data augmentation by Top-k Selective Vision Alignment → Step 2: Internal data augmentation with self-supervised learning → Step 3: Fusion process → Output: Enhanced fused images |
6.0 | 2502.14070 DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models (Daewon Chae, June Suk Choi, Jinkyu Kim, Kimin Lee) | Image Generation | text-to-image generation, reward fine-tuning, diffusion models | Input: Text prompts → Step 1: Dynamic scaling of classifier-free guidance → Step 2: Randomly weight prompt phrases → Step 3: Sample generation and evaluation → Output: Improved sampling efficiency |
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | 2502.13335 Geometry-Aware Diffusion Models for Multiview Scene Inpainting (Ahmad Salimi, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Konstantinos G. Derpanis) | 3D Scene Inpainting | 3D inpainting, multi-view consistency, geometry-aware models | Input: Multi-view images → Step 1: Image masking → Step 2: Geometry-aware fusion → Step 3: Generative inpainting → Output: Multi-view consistent images |
9.5 | 2502.13803 3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments (Vincent Ress, Jonas Meyer, Wei Zhang, David Skuddis, Uwe Soergel, Norbert Haala) | 3D Reconstruction and Modeling | 3D Gaussian Splatting, visual localization, SLAM, indoor environments | Input: Multi-view images → Step 1: Use visual SLAM to generate a 3D Gaussian Splatting (3DGS) based map → Step 2: Render images from the 3DGS map to create reference data → Step 3: Evaluate the performance impact of additional rendered views → Output: Improved localization accuracy |
9.5 | 2502.13968 Betsu-Betsu: Multi-View Separable 3D Reconstruction of Two Interacting Objects (Suhas Gopal, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt) | 3D Reconstruction and Modeling | 3D reconstruction, neuro-implicit methods, multi-view, human-object interactions | Input: Multi-view RGB images → Step 1: Data integration → Step 2: Algorithm development → Step 3: Alpha-blending regularization implementation → Step 4: Joint optimization of Signed Distance Fields (SDFs) → Output: Separable 3D geometries |
8.5 | 2502.13524 MobileViM: A Light-weight and Dimension-independent Vision Mamba for 3D Medical Image Analysis (Wei Dai, Steven Wang, Jun Liu) | 3D Reconstruction and Modeling | 3D medical imaging, segmentation, deep learning | Input: 3D medical images → Step 1: Data transformation → Step 2: Model enhancement → Step 3: Evaluation on datasets → Output: Efficient segmentation results |
8.5 | 2502.13883 Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition (Idris Hamoud, Vinkle Srivastav, Muhammad Abdullah Jamal, Didier Mutter, Omid Mohareri, Nicolas Padoy) | Multi-view and Stereo Vision | Surgical Activity Recognition, Multi-view Pose Estimation, Computer Vision | Input: Multi-view camera recordings → Step 1: Align 2D pose and vision embeddings → Step 2: Dual-encoder architecture implementation → Step 3: Pretraining with geometric constraints → Output: Enhanced surgical activity recognition model |
Relavance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.12456 Not-So-Optimal Transport Flows for 3D Point Cloud Generation [{'name': 'Ka-Hei Hui, Chao Liu, Xiaohui Zeng, Chi-Wing Fu, Arash Vahdat'}] |
3D Generation 三维生成 | v2 3D point cloud generation 3D 点云生成 Optimal transport 最优传输 Shape completion 形状补全 |
Input: 3D point clouds 3D 点云 Step1: Analyze existing models 分析现有模型 Step2: Propose not-so-optimal transport flow models 提出不那么最优的传输流模型 Step3: Empirical study 实证研究 Output: Enhanced generation techniques 改进的生成技术 |
9.5 | [9.5] 2502.12534 NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization [{'name': 'Zhen Li, Weiwei Sun, Shrisudhan Govindarajan, Shaobo Xia, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction point cloud signed distance field autonomous driving |
Input: Irregular point cloud 不规则点云 Step1: Convert to signed distance field (SDF) 转换为有符号距离场 Step2: Serialize point cloud into tokens 将点云序列化为标记 Step3: Predict SDF by aggregating features 通过聚合特征预测SDF值 Output: Reconstructed surface 重建表面 |
9.5 | [9.5] 2502.12545 IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras [{'name': 'Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction 三维重建 Omnidirectional cameras 全向摄像头 Texture optimization 纹理优化 |
Input: Omnidirectional images 全向图像 Step1: Feature detection 特征检测 Step2: Sparse matching with spherical model 使用球形模型进行稀疏匹配 Step3: Neural implicit surface reconstruction 神经隐式表面重建 Step4: Texture mapping and optimization 纹理映射和优化 Output: Textured meshes with improved rendering quality 改进的三维纹理网格 |
9.5 | [9.5] 2502.12673 ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition [{'name': "Quoc-Anh Bui, Gilles Rougeron, G\'eraldine Morin, Simone Gasparini"}] |
3D Reconstruction 三维重建 | v2 3D reconstruction 3D重建 Neural Radiance Fields 神经辐射场 visualization 可视化 level of detail 细节级别 |
Input: Multi-view images 多视角图像 Step1: Decompose the scene into Scene NeRF and ROI NeRFs 将场景分解为场景NeRF和感兴趣区域NeRF Step2: Camera selection module chooses relevant cameras 相机选择模块选择相关相机 Step3: Ray-level compositional rendering combines NeRFs 使用光线级组合渲染结合NeRF Output: High-fidelity rendered images outputs 高保真渲染图像 |
9.5 | [9.5] 2502.12894 CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image [{'name': 'Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, Jingyi Yu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction robotics scene recovery |
Input: Single RGB image 单张RGB图像 Step1: Extract object-level 2D segmentation 提取物体级2D分割 Step2: Analyze inter-object spatial relationships 分析物体间空间关系 Step3: Generate object geometries 生成物体几何 Step4: Align and integrate meshes with point cloud 对齐并集成网格与点云 Step5: Optimize object poses using physics-aware methods 利用物理感知方法优化物体姿态 Output: High-quality 3D scene reconstruction 高质量3D场景重建 |
9.5 | [9.5] 2502.12985 PartSDF: Part-Based Implicit Neural Representation for Composite 3D Shape Parametrization and Optimization [{'name': 'Nicolas Talabot, Olivier Clerc, Arda Cinar Demirtas, Doruk Oner, Pascal Fua'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D shape representation implicit neural representation part-based modeling |
Input: Composite 3D shapes 复合三维形状 Step 1: Supervised part-aware representation 监督的部件感知表示 Step 2: Modeling independent parts 模型独立部件 Step 3: Shape optimization 形状优化 Output: Controllable 3D models 可控的三维模型 |
9.5 | [9.5] 2502.13071 RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [{'name': 'Jingtong Yue, Zhiwei Lin, Xin Lin, Xiaoyu Zhou, Xiangtai Li, Lu Qi, Yongtao Wang, Ming-Hsuan Yang'}] |
3D Object Detection 3D目标检测 | v2 3D object detection 3D目标检测 radar-camera fusion 雷达-相机融合 autonomous driving 自动驾驶 |
Input: Multi-modal data from radar and camera 传感器与相机的多模态数据 Step1: Systematic analysis of noise patterns 噪音模式的系统分析 Step2: Development of 3D Gaussian Expansion (3DGE) module 开发3D高斯扩展模块 Step3: Implementation of weather-adaptive fusion module 实现天气自适应融合模块 Output: Robust 3D object detection results 稳健的3D目标检测结果 |
9.5 | [9.5] 2502.13144 RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning [{'name': 'Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang'}] |
Autonomous Driving 自动驾驶 | v2 autonomous driving 3DGS reinforcement learning |
Input: Photorealistic digital replica of the real world 逼真的数字复制环境 Step1: Establish closed-loop reinforcement learning paradigm 建立闭环强化学习范式 Step2: Incorporate imitation learning for alignment 融入模仿学习以进行对齐 Step3: Design specialized reward functions 设计专门的奖励函数 Output: Optimized end-to-end driving policy 优化的端到端驾驶策略 |
9.0 | [9.0] 2502.12231 PUGS: Zero-shot Physical Understanding with Gaussian Splatting [{'name': 'Yinghao Shuai, Ran Yu, Yuantao Chen, Zijian Jiang, Xiaowei Song, Nan Wang, Jv Zheng, Jianzhu Ma, Meng Yang, Zhicheng Wang, Wenbo Ding, Hao Zhao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian Splatting physical properties robotics |
Input: Multi-view images 多视角图像 Step1: Shape-aware 3D Gaussian Splatting reconstruction 形状感知的3D高斯点云重建 Step2: Geometry-aware regularization loss geometry-aware regularization loss functions Step3: Region-aware feature contrastive loss region-aware feature contrastive loss functions Step4: Physical property prediction with VLMs 使用视觉语言模型进行物理属性预测 Output: 3D models with physical properties and enhanced quality 具有物理属性和增强质量的3D模型 |
9.0 | [9.0] 2502.12546 Spatiotemporal Multi-Camera Calibration using Freely Moving People [{'name': 'Sang-Eun Lee, Ko Nishino, Shohei Nobuhara'}] |
3D Reconstruction and Modeling 三维重建 | v2 multi-camera calibration 3D reconstruction freely moving people |
Input: Multi-view videos with freely moving people 多视角视频与自由移动的人 Step1: 3D pose estimation from videos 从视频中进行3D姿态估计 Step2: Solve rotation and translation with 3D points 求解与三维点的旋转和平移 Step3: Optimize camera poses and temporal offsets 优化相机姿态和时间偏移 Output: Accurate camera calibration and scene reconstruction 输出:准确的相机标定和场景重建 |
9.0 | [9.0] 2502.12752 High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion [{'name': 'Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers'}] |
Novel View Synthesis 新视图合成 | v2 Novel View Synthesis Splatting Diffusion Model |
Input: Single image 单张图像 Step1: Splatting for pixel alignment 像素对齐的点云处理 Step2: Diffusion model training 扩散模型训练 Step3: Texture generation texture generation通过自适应特征融合 Output: High-fidelity novel views 高保真新视图 |
8.5 | [8.5] 2502.12303 From Gaming to Research: GTA V for Synthetic Data Generation for Robotics and Navigations [{'name': 'Matteo Scucchia, Matteo Ferrara, Davide Maltoni'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 synthetic data GTA V SLAM Visual Place Recognition robotics |
Input: Synthetic environment data from GTA V 以GTA V的合成环境数据为输入 Step1: Data generation 数据生成 Step2: Algorithm for VPR dataset creation VPR数据集创建算法 Step3: Experimentation for SLAM and VPR applications 针对SLAM和VPR应用的实验 Output: Usable synthetic datasets for robotics 提供可用的机器人合成数据集 |
8.5 | [8.5] 2502.12360 Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions [{'name': 'Sujan Sai Gannamaneni, Rohil Prakash Rao, Michael Mock, Maram Akila, Stefan Wrobel'}] |
Vision Models and Safety Analysis 视觉模型与安全分析 | v2 systematic weaknesses autonomous driving computer vision |
Input: Image dataset 图像数据集 Step1: Metadata generation 元数据生成 Step2: Slice discovery 模块切片发现 Step3: Systematic weakness identification 系统弱点识别 Output: Identified weaknesses identified weaknesses |
8.5 | [8.5] 2502.12640 RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation [{'name': 'Chenxi Zheng, Yihong Lin, Bangzhen Liu, Xuemiao Xu, Yongwei Nie, Shengfeng He'}] |
3D Generation 三维生成 | v2 3D generation text-to-3D generation score distillation |
Input: Text-based descriptions 基于文本的描述 Step1: Data distribution rectification 数据分布整治 Step2: Pose consistency enhancement 姿态一致性增强 Step3: Integration with score distillation algorithms 与得分蒸馏算法集成 Output: Consistent 3D asset generation 一致的3D资产生成 |
8.5 | [8.5] 2502.12742 3D Shape-to-Image Brownian Bridge Diffusion for Brain MRI Synthesis from Cortical Surfaces [{'name': 'Fabian Bongratz, Yitong Li, Sama Elbaroudy, Christian Wachinger'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction brain MRI diffusion model |
Input: Continuous cortical shape priors 连续皮层形状先验 Step1: Leverage Brownian bridge process 利用布朗桥过程 Step2: Map shape contours to synthetic MRIs 将形状轮廓映射到合成MRI Step3: Improve geometric accuracy 改进几何精度 Output: Anatomically plausible brain MRIs 解剖学上合理的脑MRI |
8.5 | [8.5] 2502.12819 Carotid Artery Plaque Analysis in 3D Based on Distance Encoding in Mesh Representations [{'name': 'Hinrich Rahlfs, Markus H\"ullebrand, Sebastian Schmitter, Christoph Strecker, Andreas Harloff, Anja Hennemuth'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction plaque analysis carotid artery |
Input: MRI scans of carotid arteries 磁共振扫描的颈动脉 Step1: 3D vessel wall segmentation 3D血管壁分割 Step2: Distance encoding to extract plaque mesh 使用距离编码提取斑块网格 Step3: Quantification and visualization of plaque parameters 斑块参数的量化和可视化 Output: Detailed 3D plaque models 详细的3D斑块模型 |
8.5 | [8.5] 2502.12860 An Experimental Study of SOTA LiDAR Segmentation Models [{'name': 'Bike Chen, Antti Tikanm\"aki, Juha R\"oning'}] |
Point Cloud Processing 点云处理 | v2 Point Cloud Segmentation LiDAR autonomous driving |
Input: LiDAR data LiDAR数据 Step 1: Data acquisition 数据采集 Step 2: Model training and evaluation 模型训练与评估 Step 3: Performance comparison 性能比较 Output: Selection of optimal PCS models 最优PCS模型选择 |
8.5 | [8.5] 2502.12994 SHADeS: Self-supervised Monocular Depth Estimation Through Non-Lambertian Image Decomposition [{'name': 'Rema Daher, Francisco Vasconcelos, Danail Stoyanov'}] |
Depth Estimation 深度估计 | v2 monocular depth estimation specular reflection self-supervised learning |
Input: Single images 单幅图像 Step1: Image decomposition 图像分解 Step2: Depth and light component estimation 深度和光成分估计 Step3: Model validation against real data 模型验证与真实数据 Output: Depth maps and light components 深度图和光成分 |
8.5 | [8.5] 2502.13037 Enhancing Power Grid Inspections with Machine Learning [{'name': 'Diogo Lavado, Ricardo Santos, Andre Coelho, Joao Santos, Alessandra Micheletti, Claudia Soares'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D computer vision 3D semantic segmentation power grid inspections |
Input: 3D LiDAR point clouds 3D LiDAR 点云 Step1: Data preprocessing 数据预处理 Step2: 3D semantic segmentation 3D 语义分割 Step3: Performance evaluation 性能评估 Output: Enhanced detection results 改进的检测结果 |
8.5 | [8.5] 2502.13130 Magma: A Foundation Model for Multimodal AI Agents [{'name': 'Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 multimodal AI robotic manipulation vision-language models |
Input: Heterogeneous multimodal data 异构多模态数据 Step1: Data labeling for action grounding and planning 动作定位与规划的数据标注 Step2: Model training with SoM and ToM techniques 使用SoM和ToM技术进行模型训练 Step3: Evaluation on various tasks 在各种任务上进行评估 Output: A multimodal AI agent capable of understanding and acting on inputs 输出:能够理解输入并据此执行操作的多模态AI代理 |
7.5 | [7.5] 2502.12801 Learning Wall Segmentation in 3D Vessel Trees using Sparse Annotations [{'name': 'Hinrich Rahlfs, Markus Hüllebrand, Sebastian Schmitter, Christoph Strecker, Andreas Harloff, Anja Hennemuth'}] |
3D Segmentation 3D分割 | v2 3D segmentation 3D分割 clinical annotations 临床标注 carotid artery 颈动脉 |
Input: Sparse annotations from clinical studies 临床研究中的稀疏标注 Step1: Sample perpendicular cross-sections of the carotid artery 采样颈动脉的垂直横截面 Step2: Segment using an adversarial 2D network 使用对抗性2D网络进行分割 Step3: Transform annotations into 3D pseudo-labels 将标注转换为3D伪标签 Output: Train a 3D convolutional neural network 训练3D卷积神经网络 |
7.5 | [7.5] 2502.13146 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization [{'name': 'Shuo Xing, Yuping Wang, Peiran Li, Ruizheng Bai, Yueqi Wang, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision Language Models cross-modal applications direct preference optimization visual question answering |
Input: Vision Language Models data 视觉语言模型数据 Step1: Construct dual-preference dataset 构建双重偏好数据集 Step2: Fine-tune with rDPO using visual preference signals 使用视觉偏好信号进行rDPO微调 Output: Improved VLM alignment 改进的VLM对齐 |
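Several comparison-style entries above (e.g., 2502.12860, which benchmarks LiDAR segmentation models) rank methods by mean intersection-over-union. A minimal sketch of that metric, assuming integer-labeled NumPy arrays; `mean_iou` is an illustrative helper, not code from any listed paper:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union over classes present in either label map."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return float(np.mean(ious))

# toy usage: 5 points, 3 classes
print(mean_iou(np.array([0, 1, 1, 2, 2]), np.array([0, 1, 2, 2, 2]), 3))  # ~0.72
```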
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.10674 Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition [{'name': 'Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian'}] |
3D Object Recognition 3D物体识别 | v2 3D object recognition 3D物体识别 point clouds 点云 occlusion-aware 遮挡感知 |
Input: Synthetic 3D models from ShapeNetCore 来自ShapeNetCore的合成3D模型 Step1: Generate partial point clouds from 3D models 从3D模型生成部分点云 Step2: Implement occlusion-aware pretraining 进行遮挡感知预训练 Step3: Evaluate recognition performance 评估识别性能 Output: Improved recognition accuracy 提高识别准确性 |
9.5 | [9.5] 2502.10704 Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy [{'name': 'Mingyang Zhao, Gaofeng Meng, Dong-Ming Yan'}] |
Point Cloud Processing 点云处理 | v2 non-rigid registration point cloud alignment occlusion handling |
Input: Point cloud data 点云数据 Step1: Identify occluded regions 确定遮挡区域 Step2: Apply maximum correntropy criterion 采用最大相关熵准则 Step3: Optimize deformation field 优化变形场 Output: Accurately aligned point clouds 准确对齐的点云 |
9.5 | [9.5] 2502.10827 E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting [{'name': 'Sohaib Zahid, Viktor Rudnev, Eddy Ilg, Vladislav Golyanik'}] |
3D Reconstruction and Modeling 三维重建 | v2 novel view synthesis event cameras 3D rendering Gaussian splatting |
Input: Event camera data 事件相机数据 Step1: Data processing 数据处理 Step2: 3D Gaussian representation construction 3D高斯表示构建 Step3: Novel view synthesis 新视图合成 Output: High-quality rendered scenes 高质量渲染场景 |
9.5 | [9.5] 2502.10842 Mobile Robotic Multi-View Photometric Stereo [{'name': 'Suryansh Kumar'}] |
3D Reconstruction and Modeling 三维重建 | v2 Multi-View Photometric Stereo 3D acquisition Mobile Robotics |
Input: Multi-view images 多视角图像 Step1: Supervised learning setup for predicting surface normals, object depth, and uncertainty 监督学习设置以预测表面法线、物体深度和不确定性 Step2: Solve MVPS-driven optimization problem to refine depth maps 解决基于MVPS的优化问题以细化深度图 Step3: Fuse refined depth maps while tracking camera pose 融合精细化深度图并跟踪相机位姿 Output: Globally consistent 3D geometry 具有全局一致性的3D几何体 |
9.5 | [9.5] 2502.10982 TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction [{'name': 'Yunfei Liu, Lei Zhu, Lijian Lin, Ye Zhu, Ailing Zhang, Yu Li'}] |
3D Reconstruction 三维重建 | v2 3D facial reconstruction expression capture neural renderer |
Input: A single in-the-wild image 一张单一的野外图像 Step1: Extract hybrid facial parameters 提取混合面部参数 Step2: Design multi-scale tokenizer 设计多尺度标记器 Step3: Implement token-guided neural renderer 实现标记引导的神经渲染器 Step4: Train with token cycle loss 采用标记周期损失进行训练 Output: High-fidelity facial expressions output 高保真的面部表情输出 |
9.5 | [9.5] 2502.10988 OMG: Opacity Matters in Material Modeling with Gaussian Splatting [{'name': 'Silong Yong, Venkata Nagarjun Pudureddiyur Manivannan, Bernhard Kerbl, Zifu Wan, Simon Stepputtis, Katia Sycara, Yaqi Xie'}] |
Neural Rendering 神经渲染 | v2 neural rendering 3D Gaussian Splatting material modeling opacity |
Input: Images 图像 Step1: Inverse rendering process 逆向渲染过程 Step2: Opacity modeling 透明度建模 Step3: Algorithm integration 集成算法 Output: Improved material properties 改进的材料属性 |
9.5 | [9.5] 2502.11390 MARS: Mesh AutoRegressive Model for 3D Shape Detailization [{'name': 'Jingnan Gao, Weizhe Liu, Weixuan Sun, Senbo Wang, Xibin Song, Taizhang Shang, Shenzhou Chen, Hongdong Li, Xiaokang Yang, Yichao Yan, Pan Ji'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D shape detailization Generative Adversarial Networks (GANs) geometry-consistency MARS autoregressive model |
Input: Coarse mesh shapes 粗糙网格形状 Step1: Tokenization of meshes 网格的标记化 Step2: Geometry-consistency supervision 几何一致性监督 Step3: Autoregressive detailization 自回归细节化 Output: Detailed meshes 细化的网格 |
9.5 | [9.5] 2502.11618 Real-time Neural Rendering of LiDAR Point Clouds [{'name': 'Joni Vanherck, Brent Zoomers, Tom Mertens, Lode Jorissen, Nick Michiels'}] |
Neural Rendering 神经渲染 | v2 Neural Rendering LiDAR Point Clouds Real-time Rendering |
Input: LiDAR point clouds LiDAR点云 Step1: Point cloud projection 点云投影 Step2: Depth-based filtering based on heuristics 基于启发式的深度过滤 Step3: Final image reconstruction using U-Net 使用U-Net进行最终图像重建 Output: Photorealistic images of LiDAR scans LiDAR扫描的照片真实图像 |
9.5 | [9.5] 2502.11777 Deep Neural Networks for Accurate Depth Estimation with Latent Space Features [{'name': 'Siddiqui Muhammad Yasir, Hyunsik Ahn'}] |
Depth Estimation 深度估计 | v2 depth estimation 3D scene reconstruction |
Input: RGB image to depth image mapping Step1: Feature extraction using latent space Step2: Dual encoder-decoder architecture Step3: Introduce a novel loss function Output: Enhanced depth maps with improved boundaries |
9.5 | [9.5] 2502.11801 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency [{'name': 'Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang'}] |
3D Inpainting 3D修复 | v2 3D Gaussian Inpainting Neural Radiance Field multi-view consistency 3D reconstruction computer vision |
Input: Multi-view images 多视角图像 Step1: Infer Depth-Guided Inpainting Masks 深度引导的修复掩码推断 Step2: Update inpainting mask based on background pixels 更新修复掩码基于背景像素 Step3: Perform 3D inpainting with cross-view consistency 在视图间一致性下进行3D修复 Output: High-fidelity 3D inpainting results 高保真3D修复结果 |
9.5 | [9.5] 2502.12135 MagicArticulate: Make Your 3D Models Articulation-Ready [{'name': 'Chaoyue Song, Jianfeng Zhang, Xiu Li, Fan Yang, Yiwen Chen, Zhongcong Xu, Jun Hao Liew, Xiaoyang Guo, Fayao Liu, Jiashi Feng, Guosheng Lin'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D models articulation skeleton generation skinning weights |
Input: Static 3D models 静态3D模型 Step1: Dataset creation 数据集创建 Step2: Skeleton generation 骨架生成 Step3: Skinning weight prediction 蒙皮权重预测 Output: Articulation-ready 3D models 可驱动关节的3D模型 |
9.5 | [9.5] 2502.12138 FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [{'name': 'Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, Gordon Wetzstein'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction camera pose estimation novel view synthesis |
Input: Uncalibrated sparse-view images 未标定稀疏视图 Step1: Camera pose estimation 摄像机姿态估计 Step2: Geometry and appearance estimation 几何体和外观估计 Step3: Novel-view synthesis 新视图合成 Output: High-quality 3D geometry 高质量三维几何体 |
9.2 | [9.2] 2502.10492 Multi-view 3D surface reconstruction from SAR images by inverse rendering [{'name': 'Emile Barbier-Renard (IDS, IMAGES), Florence Tupin (IMAGES, IDS), Nicolas Trouvé (LabHC), Loïc Denis (LabHC)'}] |
3D Reconstruction 三维重建 | v2 3D Reconstruction SAR Imaging Inverse Rendering Deep Learning |
Input: SAR images from radar sensors 合成孔径雷达图像 Step1: Develop a differentiable rendering model 开发可微分的渲染模型 Step2: Implement a coarse-to-fine MLP strategy 实施由粗到细的多层感知器策略 Step3: Train the model on synthetic datasets 在合成数据集上训练模型 Output: 3D surface reconstruction results 3D表面重建结果 |
9.2 | [9.2] 2502.10606 HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation [{'name': 'Yibo Liu, Zhaodong Jiang, Binbin Xu, Guile Wu, Yuan Ren, Tongtong Cao, Bingbing Liu, Rui Heng Yang, Amir Rasouli, Jinjun Shan'}] |
3D Reconstruction and Modeling 三维重建 | v2 6D pose estimation image-to-3D Diffusion Models |
Input: Images and scenes from robotics applications Step1: Utilize image-to-3D priors to generate initial meshes Step2: Estimate the 6D pose of observed objects Step3: Continuously refine the mesh and pose estimation based on new observations Output: Enhanced 3D mesh and accurate 6D pose estimation |
8.7 | [8.7] 2502.11663 MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction [{'name': 'Jingcheng Ni, Yuxin Guo, Yichen Liu, Rui Chen, Lewei Lu, Zehuan Wu'}] |
Autonomous Driving 自动驾驶 | v2 autonomous driving video generation mask reconstruction |
Input: Video sequences 视频序列 Step1: Video mask reconstruction 视频掩码重建 Step2: Diffusion Transformer training 扩散变换器训练 Step3: Model evaluation 模型评估 Output: Generalizable driving world model 通用驾驶世界模型 |
8.7 | [8.7] 2502.12080 HumanGif: Single-View Human Diffusion with Generative Prior [{'name': 'Shoukang Hu, Takuya Narihira, Kazumi Fukuda, Ryosuke Sawata, Takashi Shibuya, Yuki Mitsufuji'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D human reconstruction novel view synthesis human avatars |
Input: Single-view image 单视图图像 Step1: Integrate generative priors from diffusion models 从扩散模型中集成生成先验 Step2: Implement Human NeRF module 引入Human NeRF模块 Step3: Optimize with image-level loss 使用图像级损失进行优化 Output: Novel view and pose consistent human avatars 输出: 新视图和姿态一致的人类头像 |
8.5 | [8.5] 2502.10498 The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey [{'name': 'Sifan Tu, Xin Zhou, Dingkang Liang, Xingyu Jiang, Yumeng Zhang, Xiaofan Li, Xiang Bai'}] |
Autonomous Driving 自动驾驶 | v2 Driving World Model autonomous driving scene prediction 3D perception |
Step1: Literature review and categorization of DWM approaches 进行文献回顾并对DWM方法进行分类 Step2: Analysis of existing methodologies and datasets 对现有方法和数据集进行分析 Step3: Discussion on limitations and future directions 讨论局限性和未来方向 |
8.5 | [8.5] 2502.10603 Adaptive Neural Networks for Intelligent Data-Driven Development [{'name': 'Youssef Shoeb, Azarm Nowzad, Hanno Gottschalk'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 adaptive neural networks autonomous driving out-of-distribution learning |
Input: Autonomous driving environments 自动驾驶环境 Step1: Data collection 数据收集 Step2: Dynamic integration of new object classes 新对象类别的动态集成 Step3: Continuous learning 模型的持续学习 Output: Adaptive perception system 自适应感知系统 |
8.5 | [8.5] 2502.10720 NPSim: Nighttime Photorealistic Simulation From Daytime Images With Monocular Inverse Rendering and Ray Tracing [{'name': 'Shutong Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 mesh reconstruction autonomous driving nighttime simulation |
Input: Daytime images and semantic labels 白天图像和语义标签 Step1: Mesh reconstruction 网格重建 Step2: Relighting 重光照 Step3: Nighttime image simulation 夜间图像仿真 Output: Realistic nighttime images 真实的夜间图像 |
8.5 | [8.5] 2502.10724 Semantics-aware Test-time Adaptation for 3D Human Pose Estimation [{'name': 'Qiuxia Lin, Rongyu Chen, Kerui Gu, Angela Yao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Human Pose Estimation Test-time adaptation Semantics-aware motion prior |
Input: Video sequences containing human poses 包含人体姿态的视频序列 Step1: Identify semantics from video using language models 使用语言模型识别视频中的语义 Step2: Integrate motion prior with semantic information 将运动先验与语义信息整合 Step3: Adapt 3D pose predictions during test-time adaptation (TTA) 在测试时适应(TTA)中调整3D姿态预测 Output: Refined 3D pose estimations 优化后的3D姿态估计 |
8.5 | [8.5] 2502.11287 MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring [{'name': 'Arpitsinh Vaghela, Duo Lu, Aayush Atul Verma, Bharatesh Chakravarthi, Hua Wei, Yezhou Yang'}] |
3D Perception 3D感知 | v2 3D perception 3D感知 traffic monitoring 交通监测 multi-camera 多摄像头 occupancy detection 占用检测 |
Input: Multi-camera images 多摄像头图像 Step1: Data acquisition 数据收集 Step2: Background integration 背景集成 Step3: Late and early fusion 早期与后期融合 Output: BEV occupancy map BEV占用图 |
8.5 | [8.5] 2502.11307 Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection [{'name': 'Jiaxiang Wang, Haote Xu, Xiaolu Chen, Haodi Xu, Yue Huang, Xinghao Ding, Xiaotong Tu'}] |
3D Anomaly Detection 3D异常检测 | v2 anomaly detection 3D point cloud Point-Language model |
Input: 3D point clouds 3D点云 Step1: Dual-prompt learning 双提示学习 Step2: Dynamic prompt creation 动态提示创建 Step3: Anomaly detection 异常检测 Output: Enhanced anomaly detection performance 改进的异常检测性能 |
8.5 | [8.5] 2502.11586 Syllables to Scenes: Literary-Guided Free-Viewpoint 3D Scene Synthesis from Japanese Haiku [{'name': 'Chunan Yu, Yidong Han, Chaotao Ding, Ying Zang, Lanyun Zhu, Xinhao Chen, Zejian Li, Renjun Xu, Tianrun Chen'}] |
3D Scene Generation 三维场景生成 | v2 3D scene synthesis Japanese Haiku |
Input: Japanese Haiku 日本俳句 Step1: Literary analysis 文学分析 Step2: Spatial representation 空间表现 Step3: 3D scene synthesis 三维场景合成 Output: Navigable 3D scenes 可导航三维场景 |
8.5 | [8.5] 2502.11642 GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text [{'name': 'Gyumin Shim, Sangmin Lee, Jaegul Choo'}] |
Image Generation 图像生成 | v2 3D human models Gaussian Splatting text-to-3D generation animation |
Input: Textual descriptions 文本描述 Step1: Data integration 数据集成 Step2: Model optimization 模型优化 Step3: Animation generation 动画生成 Output: Animatable 3D avatars 可动画的三维头像 |
8.5 | [8.5] 2502.11697 MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow [{'name': 'Hanzhuo Huang, Yuan Liu, Ge Zheng, Jiepeng Wang, Zhiyang Dou, Sibei Yang'}] |
Image and Video Generation 图像生成 | v2 4D generation multiview diffusion models autonomous systems |
Input: Monocular videos 单目视频 Step1: Generate multiview images using multiview diffusion models 利用多视角扩散模型生成多视角图像 Step2: Associate pixels using token flow technique 使用令牌流技术关联像素 Step3: Refine the coarse 4D field 细化粗糙的4D场 Output: High-quality 4D field 高质量4D场 |
8.5 | [8.5] 2502.11710 The Worse The Better: Content-Aware Viewpoint Generation Network for Projection-related Point Cloud Quality Assessment [{'name': 'Zhiyong Su, Bingxu Xie, Zheng Li, Jincan Wu, Weiqing Li'}] |
Point Cloud Processing 点云处理 | v2 Point Cloud Quality Assessment 点云质量评估 Content-Aware Viewpoint Generation 内容感知视点生成 Geometric Features 几何特征 |
Input: Degraded point clouds 退化点云 Step1: Extract multi-scale geometric and texture features 提取多尺度几何和纹理特征 Step2: Refine features per viewpoint 针对每个视点进行特征优化 Step3: Generate optimized viewpoints 生成优化视角 Output: Optimized viewpoints for projection-related PCQA 用于投影相关PCQA的优化视角 |
8.5 | [8.5] 2502.11726 No-reference geometry quality assessment for colorless point clouds via list-wise rank learning [{'name': 'Zheng Li, Bingxu Xie, Chao Chu, Weiqing Li, Zhiyong Su'}] |
Geometry Quality Assessment 几何质量评估 | v2 geometry quality assessment point clouds 3D reconstruction |
Input: Colorless point clouds 无色点云 Step1: Construct LRL dataset 构建 LRL 数据集 Step2: Design GQANet to extract geometric features 设计 GQANet 提取几何特征 Step3: Use LRLNet for ranking the quality of point clouds 使用 LRLNet 对点云质量进行排序 Output: Predicted geometry quality index 预测的几何质量指数 |
8.5 | [8.5] 2502.11742 Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition [{'name': 'Jianyi Peng, Fan Lu, Bin Li, Yuan Huang, Sanqing Qu, Guang Chen'}] |
Visual Place Recognition 视觉地点识别 | v2 Visual Place Recognition Cross-modal RGB images LiDAR Bird's Eye View |
Input: RGB images and LiDAR point clouds Step1: Initial retrieval using global descriptor similarity Step2: Re-ranking based on Bird's Eye View (BEV) images Output: Improved Visual Place Recognition results |
8.5 | [8.5] 2502.11864 Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving? [{'name': 'Natalie Grabowsky, Annika Mütze, Joshua Wendland, Nils Jansen, Matthias Rottmann'}] |
Autonomous Driving 自动驾驶 | v2 Perceptual Uncertainty Reinforcement Learning Automated Driving |
Input: Perturbed observation space 扰动的观察空间 Step1: Introduce uncertainty 引入不确定性 Step2: Inform agent of uncertainty 通知代理不确定性 Step3: Reward agent for navigating safely 奖励代理安全导航 Output: Adjusted behavior with uncertainty 根据不确定性调整行为 |
8.5 | [8.5] 2502.11971 Robust 6DoF Pose Tracking Considering Contour and Interior Correspondence Uncertainty for AR Assembly Guidance [{'name': 'Jixiang Chen, Jing Chen, Kai Liu, Haochen Chang, Shanfeng Fu, Jian Yang'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 6DoF pose tracking augmented reality contour-based methods object tracking intelligent manufacturing |
Input: 6DoF object poses 6DoF 物体姿态 Step1: Robust contour-based tracking 鲁棒的基于轮廓的跟踪 Step2: CPU-only strategy for symmetric objects 针对对称物体的仅CPU策略 Step3: Unified energy function formulation 统一能量函数表述 Output: Accurate tracking and assembly guidance 精确的跟踪和装配指导 |
8.5 | [8.5] 2502.12151 VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution [{'name': 'Chendong Wang, Anlan Zhang, Yifan Yang, Lili Qiu, Yuqing Yang, Xinyang Jiang, Feng Qian, Suman Banerjee'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D volumetric video super-resolution bandwidth reduction lookup tables (LUTs) |
Input: Low-resolution volumetric data 低分辨率体积数据 Step1: Downsampling data to reduce bandwidth 数据下采样以减少带宽 Step2: Applying super-resolution algorithm to upscale data 应用超分辨率算法对数据进行上采样 Step3: Utilizing lookup tables (LUTs) for efficient processing 使用查找表 (LUTs) 进行高效处理 Output: Enhanced volumetric video for streaming 改进的体积视频用于流传输 |
7.8 | [7.8] 2502.10444 A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision [{'name': 'Hao Ai, Zidong Cao, Lin Wang'}] |
3D Geometry and Motion Estimation 3D几何与运动估计 | v2 Omnidirectional vision Deep learning 3D geometry Autonomous driving |
Input: Omnidirectional images 全景图像 Step 1: Literature review 文献综述 Step 2: Challenges and complexities analysis 挑战与复杂性分析 Step 3: Taxonomy development 分类法开发 Objective: Summarize DL methods for omnidirectional vision 总结全景视觉的深度学习方法 |
7.5 | [7.5] 2502.12095 Descriminative-Generative Custom Tokens for Vision-Language Models [{'name': 'Pramuditha Perera, Matthew Trager, Luca Zancato, Alessandro Achille, Stefano Soatto'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models image retrieval custom tokens |
Input: Concept images and text 描述概念的图像和文本 Step1: Learn custom tokens 学习自定义token Step2: Align text and image features 对齐文本和图像特征 Step3: Use in VLMs 应用于视觉语言模型 Output: Improved query performance 改进的查询性能 |
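Entry 2502.10704 above relies on the maximum correntropy criterion to make non-rigid registration robust to occlusion. A minimal sketch of the correntropy objective, assuming already-matched point arrays; `correntropy_score` is a hypothetical helper name, and the actual method optimizes a deformation field rather than a fixed pairing:

```python
import numpy as np

def correntropy_score(src: np.ndarray, tgt: np.ndarray, sigma: float = 0.1) -> float:
    """Average Gaussian-kernel similarity between matched points.

    Residuals much larger than sigma contribute almost nothing, so occluded
    (unmatchable) points barely influence the objective, unlike an L2 loss.
    """
    sq_res = np.sum((src - tgt) ** 2, axis=-1)
    return float(np.mean(np.exp(-sq_res / (2.0 * sigma ** 2))))

# maximizing this score over the deformation parameters is the
# maximum correntropy criterion applied to the current correspondences
```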
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.2 | [9.2] 2502.09672 IMM-MOT: A Novel 3D Multi-object Tracking Framework with Interacting Multiple Model Filter [{'name': 'Xiaohong Liu, Xulong Zhao, Gang Liu, Zili Wu, Tao Wang, Lei Meng, Yuhan Wang'}] |
3D Multi-Object Tracking 3D多目标跟踪 | v2 3D Multi-Object Tracking Interacting Multiple Model filter 3D point clouds |
Input: 3D point clouds and images 3D点云和图像 Step1: Damping Window mechanism for trajectory management 轨迹管理的阻尼窗口机制 Step2: Interacting Multiple Model filter for dynamic tracking 用于动态跟踪的交互多模型滤波器 Step3: Distance-Based Score Enhancement for detection scores 基于距离的检测分数增强 Output: Enhanced 3D multi-object tracking system 改进的3D多目标跟踪系统 |
9.0 | [9.0] 2502.09980 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models [{'name': 'Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen'}] |
Autonomous Driving 自动驾驶 | v2 Autonomous Driving Cooperative Perception Large Language Models |
Input: Perception information from multiple CAVs 从多个CAV获取感知信息 Step1: Data integration 数据集成 Step2: LLM-based fusion 基于LLM的融合 Step3: Question answering 问题回答 Output: Driving-related answers 驾驶相关答案 |
8.5 | [8.5] 2502.09652 GraphCompNet: A Position-Aware Model for Predicting and Compensating Shape Deviations in 3D Printing [{'name': 'Lei (Rachel) Chen, Juheon Lee, Juan Carlos Catana, Tsegai Yhdego, Nathan Moroney, Mohammad Amin Nabian, Hui Wang, Jun Zeng'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D printing 3D 打印 shape deviation 形状偏差 additive manufacturing 增材制造 |
Input: Point cloud data 点云数据 Step1: Integrate positional factors 集成位置因素 Step2: Develop compensation algorithms 开发补偿算法 Step3: Validate and refine with experimental data 验证和完善实验数据 Output: Enhanced shape accuracy 改进的形状精度 |
8.5 | [8.5] 2502.09669 Meta-INR: Efficient Encoding of Volumetric Data via Meta-Learning Implicit Neural Representation [{'name': 'Maizhe Yang, Kaiyuan Tang, Chaoli Wang'}] |
Volumetric Reconstruction 体积重建 | v2 implicit neural representation volumetric data meta-learning 3D reconstruction volume rendering |
Input: Volumetric dataset 体积数据集 Step1: Meta-pretraining on subsampled data 亚采样数据上的元预训练 Step2: Volume-specific finetuning on complete data 在完整数据上进行体数据特定微调 Output: Adapted implicit neural representations (INRs) 调整后的隐式神经表征 |
8.5 | [8.5] 2502.09795 Vision-based Geo-Localization of Future Mars Rotorcraft in Challenging Illumination Conditions [{'name': 'Dario Pisanti, Robert Hewitt, Roland Brockers, Georgios Georgakis'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 Map-based Localization Mars image registration deep learning |
Input: Onboard images and reference map Step1: Development of Geo-LoFTR model Step2: Incorporation of geometric context Step3: Simulation of Martian terrain Output: Enhanced localization accuracy |
8.5 | [8.5] 2502.10028 ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation [{'name': 'Yuxin He, Qiang Nie'}] |
3D Flow and Action Prediction 3D流和动作预测 | v2 3D flow action prediction robotic manipulation |
Input: Language instructions and video data 语言指令和视频数据 Step 1: 3D flow prediction 3D流预测 Step 2: Model training using causal transformer 使用因果变换器训练模型 Output: Fine-grained action predictions and future image generation 输出: 精细的动作预测和未来图像生成 |
8.5 | [8.5] 2502.10059 RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control [{'name': 'Teng Li, Guangcong Zheng, Rui Jiang, Shuigenzhan, Tao Wu, Yehao Lu, Yining Lin, Xi Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 image-to-video generation 3D scene reconstruction camera control depth estimation |
Input: Monocular images 单目图像 Step1: Depth estimation 深度估计 Step2: 3D scene reconstruction 3D场景重建 Step3: Camera trajectory scaling 相机轨迹缩放 Output: Interactive video generation 交互式视频生成 |
8.5 | [8.5] 2502.10127 Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation [{'name': 'Gamal Elghazaly, Raphael Frank'}] |
Autonomous Driving 自动驾驶 | v2 Collaboration HD maps V2X Scene Graph Generation |
Input: Front-facing camera images 前视相机图像 Step1: Extract lane centerlines from images 从图像中提取车道中心线 Step2: Represent lane centerlines as directed graphs 将车道中心线表示为有向图 Step3: Transmit data to the cloud via V2X 通过V2X将数据传输到云端 Output: Generated localized HD map 生成的局部高清地图 |
8.5 | [8.5] 2502.10377 ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences [{'name': 'Liyuan Zhu, Shengqu Cai, Shengyu Huang, Gordon Wetzstein, Naji Khosravan, Iro Armeni'}] |
3D Generation 三维生成 | v2 3D reconstruction style transfer multi-view consistency |
Input: Multi-view images 多视角图像 Step1: Style transfer to a single view using semantic attention mechanism 在单视图上使用语义注意机制进行风格转移 Step2: Lift stylization to additional views using warp-and-refine network 通过变换和细化网络将风格提升到其他视图 Output: Consistent stylized results across multiple views 在多个视图中获得一致的风格化结果 |
8.5 | [8.5] 2502.10392 Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding [{'name': 'Wenxuan Guo, Xiuwei Xu, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu'}] |
3D Visual Grounding 3D视觉定位 | v2 3D visual grounding 3D视觉定位 sparse convolution 稀疏卷积 text features 文本特征 |
Input: 3D scene representation and text features 3D场景表示和文本特征 Step1: Text-guided pruning to sparsify the 3D voxel features 文本引导的修剪以减少3D体素特征 Step2: Completion-based addition to address over-pruned areas 基于补全的添加以解决过度修剪区域 Output: Efficiently fused features for 3D visual grounding 高效融合的特征用于3D视觉定位 |
8.0 | [8.0] 2502.10273 Probing Perceptual Constancy in Large Vision Language Models [{'name': 'Haoran Sun, Suyang Yu, Yijiang Li, Qingying Gao, Haiyun Lyu, Hokin Deng, Dezhi Luo'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 perceptual constancy vision-language models VLMs cognitive tasks |
Input: Vision-Language Models (VLMs) 视觉语言模型 Step1: Evaluation using cognitive experiments 使用认知实验进行评估 Step2: Testing across dimensions of perceptual constancy 在感知恒常性的各个维度进行测试 Step3: Analysis of model variability in performance 对模型性能的变异性进行分析 Output: Insights into perceptual constancy capabilities of VLMs 输出: 对VLMs感知恒常性能力的洞察 |
7.5 | [7.5] 2502.09818 On the robustness of multimodal language model towards distractions [{'name': 'Ming Liu, Hao Chen, Jindong Wang, Wensheng Zhang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models (VLMs) 视觉语言模型 Robustness of Models 模型鲁棒性 |
Input: Vision-language models (VLMs) 视觉语言模型 Step1: Develop a benchmark 开发基准 Step2: Introduce distractions in visual and textual inputs 在视觉和文本输入中引入干扰 Step3: Evaluate model robustness 评估模型鲁棒性 Output: Insights on VLM performance 视觉语言模型性能洞察 |
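Pipelines such as 2502.10059 above chain monocular depth estimation into 3D scene reconstruction. The standard lifting step is back-projecting a depth map through the camera intrinsics; a minimal sketch, assuming a pinhole model (this is textbook geometry, not that paper's specific implementation):

```python
import numpy as np

def backproject_depth(depth: np.ndarray, fx: float, fy: float,
                      cx: float, cy: float) -> np.ndarray:
    """Lift an H x W depth map to an (H*W, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel column/row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# toy usage: a flat plane 2 m in front of a 4x4 pinhole camera
pts = backproject_depth(np.full((4, 4), 2.0), fx=4.0, fy=4.0, cx=2.0, cy=2.0)
```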
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.08902 CoL3D: Collaborative Learning of Single-view Depth and Camera Intrinsics for Metric 3D Shape Recovery [{'name': 'Chenghao Zhang, Lubin Fan, Shen Cao, Bojian Wu, Jieping Ye'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D shape recovery depth estimation camera calibration |
Input: Single image 单幅图像 Step1: Depth estimation 深度估计 Step2: Camera intrinsics estimation 相机内参估计 Step3: Collaborative optimization 协同优化 Output: Metric 3D shape 度量3D形状 |
9.5 | [9.5] 2502.09111 DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior [{'name': 'Mingrui Li, Shuhong Liu, Tianchen Deng, Hongyu Wang'}] |
SLAM 同时定位与地图构建 | v2 SLAM Neural Radiance Fields 3D Reconstruction Gaussian Splatting |
Input: RGB-D stream of frames RGB-D帧流 Step1: Camera pose and neural radiance fields optimization 相机位姿和神经辐射场优化 Step2: Initialize Gaussian primitives using implicit radiance fields based on sampled points 使用样本点的隐式辐射场初始化高斯原语 Step3: Implement local loop closure detection and bundle optimization 进行局部闭环检测和捆绑优化 Output: Enhanced Gaussian maps with improved tracking and mapping performance 输出:具有改进跟踪和映射性能的增强高斯地图 |
9.5 | [9.5] 2502.09274 FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation [{'name': 'Bin Yang, Alexandru Paul Condurache'}] |
3D Scene Understanding 3D场景理解 | v2 3D scene understanding LiDAR semantic segmentation autonomous driving |
Input: LiDAR point clouds LiDAR点云 Step1: Redesign data representation 重新设计数据表示 Step2: Implement data augmentation 实施数据增强 Step3: Apply post-processing methods 应用后处理方法 Output: Enhanced semantic segmentation performance 提升的语义分割性能 |
9.5 | [9.5] 2502.09278 ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization [{'name': 'Onat Şahin, Mohammad Altillawi, George Eskandar, Carlos Carbone, Ziyuan Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction image-to-3D mesh generation |
Input: Multi-view images 多视角图像 Step1: Generate multi-view prior images 生成多视角先验图像 Step2: Use score distillation sampling (SDS) to guide view generation 使用得分蒸馏采样引导视图生成 Step3: Optimize rough shape and fine details 优化粗形状和细节 Output: View-consistent 3D mesh 视图一致的三维网格 |
9.5 | [9.5] 2502.09425 A 3D Facial Reconstruction Evaluation Methodology: Comparing Smartphone Scans with Deep Learning Based Methods Using Geometry and Morphometry Criteria [{'name': "Álvaro Heredia-Lidón, Alejandro Moñux-Bernal, Alejandro González, Luis M. Echeverry-Quiceno, Max Rubert, Aroa Casado, María Esther Esteban, Mireia Andreu-Montoriol, Susanna Gallardo, Cristina Ruffo, Neus Martínez-Abadías, Xavier Sevillano"}] |
3D Reconstruction and Modeling 三维重建 | v2 3D facial reconstruction morphometric analysis deep learning |
Input: Smartphone-based 3D scans and deep learning models 智能手机3D扫描与深度学习模型 Step1: Data acquisition 数据采集 Step2: Morphometric shape analysis 形态计量学形状分析 Step3: Comparison with ground truth 与真实数据比较 Output: Evaluation of global and local shape differences 输出:全局和局部形状差异的评估 |
9.5 | [9.5] 2502.09563 Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction [{'name': 'Youming Deng, Wenqi Xian, Guandao Yang, Leonidas Guibas, Gordon Wetzstein, Steve Marschner, Paul Debevec'}] |
3D Reconstruction 三维重建 | v2 3D Reconstruction 三维重建 Camera Calibration 相机校准 Gaussian Splatting 高斯点云 |
Input: Wide-angle images 广角图像 Step1: Optimize camera parameters 优化相机参数 Step2: Model lens distortion 建模镜头畸变 Step3: Use Gaussian representations 使用高斯表示 Step4: Resample with cubemap strategy 使用立方映射策略 Output: Accurate 3D scene reconstruction 准确的三维场景重建 |
9.5 | [9.5] 2502.09613 Latent Radiance Fields with 3D-aware 2D Representations [{'name': 'Chaoyi Zhou, Xi Liu, Feng Luo, Siyu Huang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction latent representations photorealistic rendering |
Input: 2D latent representations 2D 潜在表示 Step1: Enhance 3D consistency with correspondence-aware autoencoding 使用对应感知自编码增强3D一致性 Step2: Lift 3D-aware representations into 3D space 将3D感知表示提升至3D空间 Step3: Align VAE-Radiance Fields for image decoding 对齐VAE-辐射场以进行图像解码 Output: Photorealistic 3D reconstruction output 照片级真实感的3D重建输出 |
9.5 | [9.5] 2502.09615 RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets [{'name': 'Isabella Liu, Zhan Xu, Wang Yifan, Hao Tan, Zexiang Xu, Xiaolong Wang, Hao Su, Zifan Shi'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D assets autoregressive modeling automatic rigging |
Input: 3D asset shapes 3D资产形状 Step1: Joint probabilistic generation 关节概率生成 Step2: Skeleton topology prediction 骨架拓扑预测 Step3: Skinning weights assignment 绑定权重分配 Output: Rigged 3D asset 装配好的3D资产 |
9.5 | [9.5] 2502.09623 Embed Any NeRF: Graph Meta-Networks for Neural Tasks on Arbitrary NeRF Architectures [{'name': 'Francesco Ballerini, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields 3D representation Graph Meta-Networks |
Input: Neural Radiance Fields (NeRFs) 神经辐射场 Step1: Train a Graph Meta-Network 训练图元网络 Step2: Apply contrastive learning 施加对比学习 Step3: Perform classification and retrieval tasks 执行分类和检索任务 Output: Architecture-agnostic representations 架构无关表示 |
8.8 | [8.8] 2502.09620 Exploring the Potential of Encoder-free Architectures in 3D LMMs [{'name': 'Yiwen Tang, Zoey Guo, Zhuhao Wang, Ray Zhang, Qizhi Chen, Junli Liu, Delin Qu, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D LMMs Encoder-free architectures 3D LMMs 3D understanding 3D 理解 |
Input: 3D point clouds 3D 点云 Step1: Semantic Encoding in pre-training 预训练阶段的语义编码 Step2: Hierarchical Geometry Aggregation in tuning 调优中的层次几何聚合 Output: Encoder-free 3D LMM 无编码器的 3D LMM |
8.5 | [8.5] 2502.08884 ShapeLib: designing a library of procedural 3D shape abstractions with Large Language Models [{'name': 'R. Kenny Jones, Paul Guerrero, Niloy J. Mitra, Daniel Ritchie'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D shape representation procedural modeling Large Language Models |
Input: Design intent (text descriptions, seed shapes) 设计意图(文本描述,种子形状) Step1: Library interface design 库接口设计 Step2: Function application proposal 函数应用提议 Step3: Function implementation formulation 函数实现制定 Step4: Geometric validation of functions 函数的几何验证 Output: Library of procedural shape functions 程序化形状函数库 |
8.5 | [8.5] 2502.08974 Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning [{'name': 'Yiming Yang, Yueru Luo, Bingkun He, Erlong Li, Zhipeng Cao, Chao Zheng, Shuqi Mei, Zhen Li'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 lane topology autonomous driving topology reasoning |
Input: Perspective views (PV) from cameras Step1: Extract lane topology sequences from PV Step2: Implement dual-decoder architecture for segment and topology decoding Step3: Utilize randomized order prompt-to-sequence learning Output: Enhanced lane topology sequences for autonomous driving |
8.5 | [8.5] 2502.08977 Text-driven 3D Human Generation via Contrastive Preference Optimization [{'name': 'Pengfei Zhou, Xukun Shen, Yong Hu'}] |
3D Generation 三维生成 | v2 3D human generation text-driven contrastive preferences |
Input: Textual descriptions 文本描述 Step1: Preference optimization module 偏好优化模块 Step2: Integration of multiple preference models 多个偏好模型的集成 Step3: Negation preference module 引入否定偏好模块 Output: Enhanced 3D human models 改进的三维人类模型 |
8.5 | [8.5] 2502.09039 Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting [{'name': 'Lingting Zhu, Guying Lin, Jinnan Chen, Xinjie Zhang, Zhenchao Jin, Zhao Wang, Lequan Yu'}] |
3D Reconstruction and Modeling 三维重建 | Gaussian Splatting 3D reconstruction image representation |
Input: Large images 大图像 Step1: Gaussian point fitting 高斯点拟合 Step2: Optimization strategy 优化策略 Step3: Level-of-Gaussian reconstruction 高斯层次重建 Output: High-quality image representations 高质量图像表示 |
8.5 | [8.5] 2502.09057 Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model [{'name': 'Shiryu Ueno, Yoshikazu Hayashi, Shunsuke Nakatsuka, Yusei Yamada, Hiroaki Aizawa, Kunihito Kato'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Visual Inspection Vision-Language Model In-Context Learning |
Input: Few-shot images of products 产品的少量图像 Step1: Construct dataset 创建数据集 Step2: Fine-tune VLM for inspection 对VLM进行微调以进行检查 Step3: Perform visual inspection using In-Context Learning 使用In-Context Learning进行视觉检查 Output: Inspection results and defective location detection 检查结果及缺陷位置检测 |
8.5 | [8.5] 2502.09080 BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization [{'name': 'Qiwei Wang, Shaoxun Wu, Yujiao Shi'}] |
Cross-View Localization 跨视角定位 | v2 3D Gaussian primitives cross-view localization autonomous driving |
Input: Ground image and satellite image 地面图像与卫星图像 Step1: Generate 3D Gaussian primitives 生成三维高斯原语 Step2: Synthesize BEV feature map 合成鸟瞰视图特征图 Step3: Conduct pose estimation 进行姿态估计 Output: Location probability map of the query image 查询图像的位置概率图 |
8.5 | [8.5] 2502.09528 SteROI-D: System Design and Mapping for Stereo Depth Inference on Regions of Interest [{'name': 'Jack Erhardt, Ziang Li, Reid Pinkham, Andrew Berkovich, Zhengya Zhang'}] |
Multi-view Stereo 多视角立体 | v2 Stereo Depth Region of Interest Energy Efficiency AR/VR Dynamic ROIs |
Input: Stereo images 立体图像 Step1: ROI identification ROI识别 Step2: Depth estimation 深度估计 Step3: Energy optimization 能耗优化 Output: Efficient depth maps 高效深度图 |
8.5 | [8.5] 2502.09617 LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh [{'name': 'Jing Wen, Alexander G. Schwing, Shenlong Wang'}] |
Neural Rendering 神经渲染 | v2 3D reconstruction human rendering computational efficiency |
Input: Sparse source images 稀疏源图像 Step1: Iterative feedback update 迭代反馈更新 Step2: Coupled multi-resolution Gaussians-on-Mesh representation 耦合多分辨率网格高斯表示 Output: Animatable human representation 可动画的人体表示 |
7.5 | [7.5] 2502.09075 PTZ-Calib: Robust Pan-Tilt-Zoom Camera Calibration [{'name': 'Jinhui Guo, Lubin Fan, Bojian Wu, Jiaqi Gu, Shen Cao, Jieping Ye'}] |
Camera Calibration 相机校准 | v2 PTZ calibration camera parameters 3D information |
Input: Reference images 参考图像 Step1: Image selection 图像选择 Step2: Apply PTZ-IBA algorithm 应用PTZ增量束调整算法 Step3: Parameter optimization 参数优化 Output: Calibrated camera parameters 校准的相机参数 |
7.5 | [7.5] 2502.09088 Unsupervised Anomaly Detection on Implicit Shape representations for Sarcopenia Detection [{'name': 'Louise Piecuch (MD), Jeremie Huet (MD), Antoine Frouin (PT), Antoine Nordez (MD), Anne-Sophie Boureau (MD), Diana Mateus'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction anomaly detection implicit neural representation sarcopenia |
Input: Muscle shape data 肌肉形状数据 Step1: Model normal muscle shapes using implicit neural representation (INR) 使用隐式神经表征建模正常肌肉形状 Step2: Employ unsupervised anomaly detection based on reconstruction error 使用基于重建误差的无监督异常检测 Step3: Classify and separate normal and sarcopenic muscles from learned representations 基于学习到的表示对正常与肌少症肌肉进行分类和区分 Output: Anomaly detection results for sarcopenic and non-sarcopenic muscles 输出:肌少症与非肌少症肌肉的异常检测结果 |
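Entry 2502.09088 directly above scores anomalies by the reconstruction error of an implicit shape representation. A minimal PyTorch sketch of that idea, with a toy coordinate-to-SDF MLP; `ShapeINR` and `anomaly_score` are illustrative names, and the layer sizes are arbitrary assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class ShapeINR(nn.Module):
    """Tiny implicit shape model: 3D coordinate -> signed distance value."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz)

def anomaly_score(model: nn.Module, surface_points: torch.Tensor) -> float:
    """Mean |SDF| on surface samples: near zero for shapes resembling the
    (healthy) training population, larger for anomalous shapes."""
    with torch.no_grad():
        return model(surface_points).abs().mean().item()

# toy usage: score 100 random surface samples with an untrained model
print(anomaly_score(ShapeINR(), torch.randn(100, 3)))
```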
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.07822 PDM-SSD: Single-Stage Three-Dimensional Object Detector With Point Dilation [{'name': 'Ao Liang, Haiyang Hua, Jian Fang, Wenyu Chen, Huaici Zhao'}] |
3D Object Detection 三维物体检测 | v2 3D object detection Point Dilation Mechanism autonomous driving |
Input: Point cloud data 点云数据 Step1: Efficient feature encoding using PointNet-style backbone 使用PointNet风格的骨干网进行高效特征编码 Step2: Point Dilation Mechanism (PDM) to expand feature space 使用点膨胀机制(PDM)扩展特征空间 Step3: Hybrid detection head for joint learning 设计混合检测头进行联合学习 Output: Enhanced 3D object detection results 改进的三维物体检测结果 |
9.5 | [9.5] 2502.07840 TranSplat: Surface Embedding-guided 3D Gaussian Splatting for Transparent Object Manipulation [{'name': 'Jeongyun Kim, Jeongho Noh, Dong-Guw Lee, Ayoung Kim'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting transparent object manipulation depth completion latent diffusion model robotics |
Input: RGB images and surface embeddings RGB图像和表面嵌入 Step1: Generate surface embeddings using a latent diffusion model 使用潜在扩散模型生成表面嵌入 Step2: Jointly optimize Gaussian splatting with RGB images and surface embeddings 与RGB图像和表面嵌入共同优化高斯点云 Step3: Render depth for object manipulation 渲染深度以进行物体操作 Output: Accurate depth completion for transparent objects 为透明物体提供准确的深度补全 |
9.5 | [9.5] 2502.07869 EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera [{'name': 'Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Alain Pagani, Didier Stricker, Christian Theobalt, Vladislav Golyanik'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D human motion capture event cameras egocentric vision |
Input: Monocular event camera with fisheye lens 带鱼眼镜头的单目事件相机 Step1: Data acquisition from event camera 从事件相机采集数据 Step2: Integration of RGB and event data RGB与事件数据集成 Step3: Algorithm development for pose estimation 开发姿态估计算法 Step4: Real-time processing and 3D reconstruction 实时处理与三维重建 Output: Accurate 3D human motion capture 精确的三维人体运动捕捉 |
9.5 | [9.5] 2502.08169 CoDynTrust: Robust Asynchronous Collaborative Perception via Dynamic Feature Trust Modulus [{'name': 'Yunjiang Xu, Lingzhi Li, Jin Wang, Benyuan Yang, Zhiwen Wu, Xinhong Chen, Jianping Wang'}] |
3D Object Detection 三维物体检测 | v2 3D detection 三维检测 autonomous driving 自动驾驶 collaborative perception 协同感知 |
Input: Sensor data from LiDAR and cameras 传感器数据来自LiDAR和相机 Step1: Evaluate dynamic feature trust modulus (DFTM) 评估动态特征信任模数 (DFTM) Step2: Implement multi-scale fusion method 实现多尺度融合方法 Step3: Validate performance through extensive experiments 通过广泛实验验证性能 Output: Enhanced robustness in 3D object detection 提高三维物体检测的鲁棒性 |
9.5 | [9.5] 2502.08285 Fully-Geometric Cross-Attention for Point Cloud Registration [{'name': 'Weijie Wang, Guofeng Mei, Jian Zhang, Nicu Sebe, Bruno Lepri, Fabio Poiesi'}] |
3D Reconstruction 三维重建 | v2 Point Cloud Registration 点云配准 Geometric Attention 几何注意力 Transformer Network 变换网络 |
Input: Point clouds 输入: 点云 Step1: Cross-attention mechanism development 步骤1: 交叉注意力机制开发 Step2: Integration of Gromov-Wasserstein distance into attention 步骤2: 将Gromov-Wasserstein距离集成到注意力机制中 Step3: Point feature aggregation through self-attention 步骤3: 通过自注意力聚合点特征 Output: Enhanced point cloud registration results 输出: 改进的点云配准结果 |
9.5 | [9.5] 2502.08352 Sat-DN: Implicit Surface Reconstruction from Multi-View Satellite Images with Depth and Normal Supervision [{'name': 'Tianle Liu, Shuangming Zhao, Wanshou Jiang, Bingxuan Guo'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction satellite imagery neural networks |
Input: Multi-view satellite images 多视角卫星图像 Step1: Incorporate explicit depth guidance 引入显式深度指导 Step2: Apply surface normal consistency constraints 应用表面法线一致性约束 Step3: Utilize a multi-resolution hash grid for efficient reconstruction 使用多分辨率哈希网格进行高效重建 Output: Accurate 3D models from satellite images 从卫星图像获得精准的三维模型 |
8.5 | [8.5] 2502.07829 Preference Alignment on Diffusion Model: A Comprehensive Survey for Image Generation and Editing [{'name': 'Sihao Wu, Xiaonan Si, Chi Xing, Jianhong Wang, Gaojie Jin, Guangliang Cheng, Lijun Zhang, Xiaowei Huang'}] |
Image Generation 图像生成 | v2 diffusion models image generation preference alignment autonomous driving |
Input: Integration of preference alignment with diffusion models 偏好对齐与扩散模型的结合 Step1: Systematic review of optimization techniques 对优化技术进行系统回顾 Step2: Exploration of applications across various fields 在多个领域探索应用 Step3: Discussion of challenges in preference alignment 讨论偏好对齐中的挑战 Output: Insights for future innovation 未来创新的洞察 |
8.5 | [8.5] 2502.08377 Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features [{'name': 'Liying Yang, Chen Liu, Zhenwei Zhu, Ajian Liu, Hui Ma, Jian Nong, Yanyan Liang'}] |
3D Generation 三维生成 | v2 4D generation dynamic-static features computer vision |
Input: Video frames 视频帧 Step1: Feature extraction 特征提取 Step2: Dynamic-static feature decoupling 动态静态特征解耦 Step3: Temporal-spatial similarity fusion 时空相似性融合 Output: 4D content generation 4D内容生成 |
8.5 | [8.5] 2502.08639 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation [{'name': 'Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, Tianfan Xue, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai'}] |
Image and Video Generation 图像生成 | v2 3D-aware text-to-video generation depth maps camera trajectories |
Input: User-defined scene parameters 用户定义的场景参数 Step1: Interactive workflow for 3D control 3D控制的交互工作流程 Step2: Condition signal construction 条件信号构建 Step3: Text-to-video generation from control signals 基于控制信号的文本生成视频 Output: Generated controllable video 输出: 生成的可控视频 |
8.0 | [8.0] 2502.08374 AdvSwap: Covert Adversarial Perturbation with High Frequency Info-swapping for Autonomous Driving Perception [{'name': 'Yuanhao Huang, Qinfan Zhang, Jiandong Xing, Mengyue Cheng, Haiyang Yu, Yilong Ren, Xiao Xiong'}] |
Autonomous Driving 自动驾驶 | v2 adversarial attack autonomous driving information swapping |
Input: Autonomous vehicle images 自动驾驶车辆图像 Step1: Information swapping 信息交换 Step2: Adversarial sample generation 对抗样本生成 Step3: Evaluation on datasets 在数据集上评估 Output: Robust adversarial samples 稳健的对抗样本 |
7.5 | [7.5] 2502.08646 Poly-Autoregressive Prediction for Modeling Interactions [{'name': 'Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran, Shiry Ginosar, Jitendra Malik'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 autonomous vehicles trajectory prediction multi-agent interactions behavior forecasting |
Input: Ego agent's state history and states of other interacting agents 自我代理的状态历史和其他交互代理的状态 Step1: Model behavior as a sequence of tokens 将行为建模为token序列 Step2: Use a transformer for prediction 使用Transformer进行预测 Step3: Apply to different prediction tasks 应用到不同的预测任务 Output: Predicted future behavior of the ego agent 输出自我代理的未来行为预测 |
6.5 | [6.5] 2502.07838 NanoVLMs: How small can we go and still make coherent Vision Language Models? [{'name': 'Mukund Agarwalla, Himanshu Kumar, Raj Dandekar, Rajat Dandekar, Sreedath Panat'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models lightweight models |
Input: Image-text pairs 图像-文本对 Step1: Dataset creation 数据集创建 Step2: Model training 模型训练 Step3: Evaluation using creative scoring 通过创意评分进行评估 Output: Lightweight vision-language models 轻量级视觉语言模型 |
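Entry 2502.08285 above builds point cloud registration on cross-attention between the two clouds' features. A minimal single-head sketch of that primitive, assuming per-point descriptors are already computed (the paper additionally injects Gromov-Wasserstein geometry into the attention, which is omitted here):

```python
import torch

def cross_attention(src_feat: torch.Tensor, tgt_feat: torch.Tensor) -> torch.Tensor:
    """Update (N, d) source point features with context from (M, d) target features."""
    d = src_feat.shape[-1]
    attn = torch.softmax(src_feat @ tgt_feat.T / d ** 0.5, dim=-1)  # (N, M) soft matches
    return attn @ tgt_feat  # each source point becomes a weighted mix of target features

# toy usage: 5 source points attending over 8 target points, 16-dim descriptors
out = cross_attention(torch.randn(5, 16), torch.randn(8, 16))
```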
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.07140 Few-Shot Multi-Human Neural Rendering Using Geometry Constraints [{'name': 'Qian Li, Victoria Fernàndez Abrevaya, Franck Multon, Adnane Boukhayma'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction multi-human scenes neural rendering |
Input: Sparse multi-view images 稀疏多视角图像 Step1: Geometry constraints using SMPL meshes 使用SMPL网格的几何约束 Step2: Regularize signed distances for optimization 通过正则化符号距离进行优化 Step3: Apply ray and saturation regularization 应用射线和饱和度正则化 Output: Accurate multi-human 3D reconstructions and renderings 准确的多人三维重建和渲染 |
9.5 | [9.5] 2502.07278 Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization [{'name': 'Aditya Vora, Sauradip Nag, Hao Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D articulation motion personalization video diffusion |
Input: Segmented mesh and text prompt 输入:分割网格和文本提示 Step1: Few-shot finetuning for category-specific motion generation 第一步:针对特定类别的运动生成进行少量样本微调 Step2: Multi-view rendering to generate personalized motion video 第二步:多视角渲染生成个性化运动视频 Step3: Differentiable rendering for transferring motion to the 3D object 第三步:可微渲染将运动转移到三维对象 Output: Articulated 3D object with realistic motion 输出:具有真实运动的关节三维对象 |
9.5 | [9.5] 2502.07289 Learning Inverse Laplacian Pyramid for Progressive Depth Completion [{'name': 'Kun Wang, Zhiqiang Yan, Junkai Fan, Jun Li, Jian Yang'}] |
Depth Estimation 深度估计 | v2 depth completion 3D reconstruction state-of-the-art |
Input: Sparse depth measurements and corresponding color image 稀疏深度测量和相应的彩色图像 Step 1: Initial low-resolution depth prediction 初步低分辨率深度预测 Step 2: Multi-path feature extraction via MFP module 通过MFP模块进行多路径特征提取 Step 3: Depth map refinement through upsampling and selective filtering 通过上采样和选择性过滤进行深度图优化 Output: Dense depth map 稠密深度图 |
9.5 | [9.5] 2502.07309 Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving [{'name': 'Xiang Li, Pengfei Li, Yupeng Zheng, Wei Sun, Yan Wang, Yilun Chen'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D occupancy modeling 3D占用建模 autonomous driving 自动驾驶 scene understanding 场景理解 |
Input: Multi-view images 多视角图像 Step1: Self-supervised pre-training with 2D labels 使用2D标签进行自监督预训练 Step2: Fully-supervised fine-tuning with 3D occupancy labels 使用3D占用标签进行全监督微调 Step3: State-conditioned forecasting module for future occupancy 用于未来占用的状态条件预测模块 Output: 3D occupancy predictions 3D占用预测 |
9.5 | [9.5] 2502.07403 Extended monocular 3D imaging [{'name': 'Zicheng Shen, Feng Zhao, Yibo Ni, Yuanmu Yang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D imaging 3D成像 monocular vision 单目视觉 depth estimation 深度估计 material identification 材料识别 |
Input: Monocular camera with diffractive-refractive hybrid lens 使用具备衍射-折射混合透镜的单目相机 Step1: Multi-stage fusion of depth cues 深度线索的多级融合 Step2: Snapshot acquisition of 3D point cloud 3D点云的快照获取 Step3: Accurate 3D reconstruction 精确的3D重建 Output: Enhanced 3D imaging capabilities 改进的3D成像能力 |
9.5 | [9.5] 2502.07505 Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds [{'name': 'Lisa Weijler, Pedro Hermosilla'}] |
Point Cloud Processing 点云处理 | v2 3D point clouds 3D点云 equivariance 等变性 |
Input: 3D point clouds 3D点云 Step1: Define Local Reference Frame (LRF) 定义局部参考系 Step2: Implement continuous SE(3) equivariant convolution 实现连续SE(3)等变卷积 Step3: Train the model with stochastically sampled frames 用随机采样的参考系训练模型 Output: Local rotation equivariant features 输出局部旋转等变特征 |
9.5 | [9.5] 2502.07615 Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors [{'name': 'Lin-Zhuo Chen, Kangjie Liu, Youtian Lin, Siyu Zhu, Zhihao Li, Xun Cao, Yao Yao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting mesh reconstruction geometry reconstruction |
Input: 3D Gaussian Splatting images 3D高斯点云图像 Step1: Incorporate pre-trained matching prior 引入预训练匹配先验 Step2: Implement Flow Distillation Sampling 实现流蒸馏采样 Step3: Target unobserved views 针对未观测视图 Output: Enhanced geometric reconstruction 改进的几何重建 |
9.5 | [9.5] 2502.07685 Matrix3D: Large Photogrammetry Model All-in-One [{'name': 'Yuanxun Lu, Jingyang Zhang, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao, Shiwei Li'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction photogrammetry depth estimation pose estimation novel view synthesis |
Input: Multi-modal data (images, camera parameters, depth maps) 图像、相机参数和深度图的多模态数据 Step 1: Masked input learning 掩码输入学习 Step 2: Pose estimation 姿态估计 Step 3: Depth prediction 深度预测 Step 4: Novel view synthesis 新视图合成 Output: Comprehensive 3D model 综合三维模型 |
9.0 | [9.0] 2502.07030 PrismAvatar: Real-time animated 3D neural head avatars on edge devices [{'name': 'Prashant Raina, Felix Taubner, Mathieu Tuli, Eu Wern Teh, Kevin Ferreira'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D avatar neural rendering real-time animation head modeling |
Input: Series of matted images of a head 头部图像序列 Step1: Data acquisition and tracking 数据采集与跟踪 Step2: Train hybrid mesh-volumetric model 训练混合网格-体积模型 Step3: Distillation into rigged mesh and neural textures 蒸馏成具有骨架的网格和神经纹理 Output: Real-time animated 3D head avatar 实时动画3D头像 |
8.5 | [8.5] 2502.06843 Vision-Integrated LLMs for Autonomous Driving Assistance : Human Performance Comparison and Trust Evaluation [{'name': 'Namhee Kim, Woojin Park'}] |
Autonomous Driving 自动驾驶 | v2 autonomous driving large language models computer vision |
Input: Visual inputs and scenarios 视觉输入与场景 Step1: Feature extraction using YOLOv4 and ViT 使用YOLOv4和ViT进行特征提取 Step2: Integration with LLM for reasoning 与LLM结合进行推理 Step3: Generation of situation descriptions and responses 生成情境描述和适当反应 Output: Improved autonomous driving assistance system 改进的自动驾驶辅助系统 |
8.5 | [8.5] 2502.06957 GAS: Generative Avatar Synthesis from a Single Image [{'name': 'Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, Fernando De la Torre'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 avatar generation 3D reconstruction diffusion models |
Input: A single image 单幅图像 Step1: 3D human reconstruction 人体三维重建 Step2: Dense driving signal generation 生成密集驱动信号 Step3: Video diffusion model application 应用视频扩散模型 Output: View-consistent and temporally coherent avatars 输出:视图一致且时间连贯的头像 |
8.5 | [8.5] 2502.07001 From Image to Video: An Empirical Study of Diffusion Representations [{'name': "Pedro Vélez, Luisa F. Polanía, Yi Yang, Chuhan Zhang, Rishab Kabra, Anurag Arnab, Mehdi S. M. Sajjadi"}] |
Image and Video Generation 图像生成与视频生成 | v2 diffusion models video synthesis image generation depth estimation |
Input: Video and image diffusion models 视频与图像扩散模型 Step1: Model architecture comparison 模型架构比较 Step2: Performance analysis of latent representations 潜在表示性能分析 Step3: Feature extraction and qualitative analysis 特征提取与定性分析 Output: Insights into representations and performance 表示与性能的见解 |
8.5 | [8.5] 2502.07007 Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC [{'name': 'Siwei Meng, Yawei Luo, Ping Liu'}] |
Image and Video Generation 图像生成与视频生成 | v2 3D generation physics priors AI-generated content physical realism |
Input: Generative models 生成模型 Step1: Review of physics-aware methods 物理感知方法的回顾 Step2: Categorization of generation techniques 生成技术的分类 Step3: Comparative analysis 比较分析 Output: Insights for future research 未来研究的洞见 |
8.5 | [8.5] 2502.07120 Is Long Range Sequential Modeling Necessary For Colorectal Tumor Segmentation? [{'name': 'Abhishek Srivastava, Koushik Biswas, Gorkem Durak, Gulsah Ozden, Mustafa Adli, Ulas Bagci'}] |
3D Segmentation and Reconstruction 3D分割与重建 | v2 3D segmentation tumor segmentation colorectal cancer |
Input: 3D medical images 3D医学影像 Step 1: Evaluate long-range and local token modeling mechanisms 评估长范围和局部标记建模机制 Step 2: Propose MambaOutUNet for tumor segmentation 提出MambaOutUNet用于肿瘤分割 Step 3: Analyze performance on the CTS-204 dataset 在CTS-204数据集上分析性能 Output: Comparative results on tumor segmentation techniques 输出:肿瘤分割技术的比较结果 |
8.5 | [8.5] 2502.07145 Mesh2SSM++: A Probabilistic Framework for Unsupervised Learning of Statistical Shape Model of Anatomies from Surface Meshes [{'name': 'Krithika Iyer, Mokshagna Sai Teja Karanam, Shireen Elhabian'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 Statistical Shape Modeling Surface Meshes Unsupervised Learning |
Input: Surface meshes 表面网格 Step1: Estimate correspondences from meshes 估计来自网格的对应关系 Step2: Develop probabilistic shape model 开发概率形状模型 Step3: Evaluate model performance 评估模型性能 Output: Statistical shape model 统计形状模型 |
8.5 | [8.5] 2502.07194 Dense Object Detection Based on De-homogenized Queries [{'name': 'Yueming Huang, Chenrui Ma, Hao Zhou, Hao Wu, Guowu Yuan'}] |
Autonomous Driving 自动驾驶 | v2 dense object detection autonomous driving DETR deep learning computer vision |
Input: Dense object detection scenario 密集目标检测场景 Step1: Identify issues with existing NMS methods 识别现有NMS方法的问题 Step2: Propose differentiated encoding for queries 提出差异化编码以应对查询 Step3: Implement joint loss for better query initialization 实施联合损失以更好地初始化查询 Output: Enhanced dense object detection framework 改进的密集目标检测框架 |
8.5 | [8.5] 2502.07372 USRNet: Unified Scene Recovery Network for Enhancing Traffic Imaging under Multiple Adverse Weather Conditions [{'name': 'Yuxu Lu, Ai Chen, Dong Yang, Ryan Wen Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction autonomous driving image restoration |
Input: Degraded images 退化图像 Step1: Feature extraction 特征提取 Step2: Scene restoration 场景恢复 Step3: Edge feature extraction 边缘特征提取 Output: Enhanced image quality 改进的图像质量 |
8.5 | [8.5] 2502.07417 Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving [{'name': 'Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Wen-Kai Kuo, Jun-Wei Hsieh'}] |
Autonomous Driving 自动驾驶 | v2 Object Detection 目标检测 Autonomous Driving 自动驾驶 Vision Transformer 视觉变换器 |
Input: Driving scene images 驾驶场景图像 Step1: Analyze backbone architectures 分析主干架构 Step2: Develop reparameterized attention vision transformer 开发重参数化注意力视觉变换器 Step3: Integrate multi-scale feature extraction 集成多尺度特征提取 Step4: Model evaluation 模型评估 Output: High-performance object detection model 高性能目标检测模型 |
8.5 | [8.5] 2502.07486 Automated Road Extraction and Centreline Fitting in LiDAR Point Clouds [{'name': 'Xinyu Wang, Muhammad Ibrahim, Atif Mansoor, Hasnein Tareque, Ajmal Mian'}] |
3D Reconstruction and Modeling 三维重建 | v2 road extraction 3D point clouds LiDAR |
Input: 3D LiDAR point clouds 3D LiDAR点云 Step 1: Statistical outlier removal 统计离群值去除 Step 2: Density-based clustering 基于密度的聚类 Step 3: Ground point filtering using grid-based segmentation 使用基于网格的分割进行地面点过滤 Step 4: 2D projection and skeletonization 2D投影和骨架化 Step 5: Back-projection onto 3D point cloud 反投影到3D点云 Output: Refined road points and centreline 提炼的道路点和中心线 |
8.5 | [8.5] 2502.07631 Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving [{'name': 'Yinzhe Shen, Ömer Şahin Taş, Kaiwen Wang, Royden Wagner, Christoph Stiller'}] |
Autonomous Driving 自动驾驶 | v2 autonomous driving motion learning semantic learning |
Input: Camera data 摄像头数据 Step1: Motion and semantic task separation 运动与语义任务分离 Step2: Neural-Bayes motion decoder 神经贝叶斯运动解码器 Step3: Interactive semantic decoder 交互式语义解码器 Output: Improved detection and tracking 改进的检测与跟踪 |
8.5 | [8.5] 2502.07680 Multiview Point Cloud Registration Based on Minimum Potential Energy for Free-Form Blade Measurement [{'name': 'Zijie Wu, Yaonan Wang, Yang Mo, Qing Zhu, He Xie, Haotian Wu, Mingtao Feng, Ajmal Mian'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction point cloud registration noise resistance industrial measurement |
Input: Point cloud data 点云数据 Step1: Definition of objective function 目标函数定义 Step2: Global optimization procedure 全局优化过程 Step3: Fine registration using trimmed ICP 使用修剪ICP进行精细配准 Output: Registered point clouds 配准后的点云 |
8.5 | [8.5] 2502.07785 Pippo: High-Resolution Multi-View Humans from a Single Image [{'name': 'Yash Kant, Ethan Weber, Jin Kyu Kim, Rawal Khirodkar, Su Zhaoen, Julieta Martinez, Igor Gilitschenski, Shunsuke Saito, Timur Bagautdinov'}] |
3D Generation 三维生成 | v2 3D consistency multi-view generation video generation |
Input: Single image of a person 一个人的单张图像 Step1: Pre-training on human images 人体图像的预训练 Step2: Multi-view mid-training 多视角中期训练 Step3: Post-training with pixel-aligned controls 像素对齐控制的后期训练 Output: 1K resolution multi-view consistent images 1K分辨率的多视角一致图像 |
8.0 | [8.0] 2502.07508 Enhance-A-Video: Better Generated Video for Free [{'name': 'Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation temporal consistency DiT-based models |
Input: DiT-based video generation models 基于DiT的视频生成模型 Step1: Analyze temporal attention 时序注意力分析 Step2: Introduce cross-frame intensity parameters 引入跨帧强度参数 Step3: Enhance video quality through adjusted dependencies 调整依赖关系以增强视频质量 Output: Enhanced video generation quality 提升的视频生成质量 |
8.0 | [8.0] 2502.07564 An Elliptic Curve Based Solution to the Perspective-Three-Point Problem [{'name': 'Michael Q. Rieck'}] |
Computer Vision and Pose Estimation 计算机视觉与位姿估计 | v2 P3P camera pose elliptic curves |
Input: Control points 控制点 Step1: Determine directions of lines 计算直线方向 Step2: Develop P3P solver 开发P3P求解器 Step3: Compare with linear solvers 与线性求解器比较 Output: Accurate camera poses 准确的相机位姿 |
7.5 | [7.5] 2502.07306 TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation [{'name': 'Navid Rajabi, Jana Kosecka'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Navigation modular approach navigation instruction |
Input: Navigation instruction and environment map 导航指令和环境地图 Step1: Extract landmarks using LLM 提取地标 Step2: Retrieve top-k locations using shortest path algorithm 检索前k个位置,使用最短路径算法 Step3: Compute alignment score with dynamic programming 使用动态规划计算对齐评分 Output: Evaluate path fidelity using nDTW metric 输出:使用nDTW指标评估路径可信度 |
7.5 | [7.5] 2502.07617 Scaling Pre-training to One Hundred Billion Data for Vision Language Models [{'name': 'Xiao Wang, Ibrahim Alabdulmohsin, Daniel Salz, Zhe Li, Keran Rong, Xiaohua Zhai'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models cultural diversity multilinguality |
Input: 100 billion image-text pairs 1000亿图像-文本对 Step1: Empirical investigation 实证研究 Step2: Performance analysis 性能分析 Step3: Cultural diversity assessment 文化多样性评估 Output: Insights on VLM performance 视觉语言模型性能见解 |
7.5 | [7.5] 2502.07701 Magic 1-For-1: Generating One Minute Video Clips within One Minute [{'name': 'Hongwei Yi, Shitong Shao, Tian Ye, Jiantong Zhao, Qingyu Yin, Michael Lingelbach, Li Yuan, Yonghong Tian, Enze Xie, Daquan Zhou'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation diffusion models text-to-image image-to-video |
Input: Text and video data 文本和视频数据 Step1: Task factorization 任务分解 Step2: Generative prior injection 生成先验注入 Step3: Model optimization 模型优化 Output: Efficiently generated one-minute video clips 高效生成的一分钟视频片段 |
7.5 | [7.5] 2502.07737 Next Block Prediction: Video Generation via Semi-Autoregressive Modeling [{'name': 'Shuhuai Ren, Shuming Ma, Xu Sun, Furu Wei'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation semi-autoregressive modeling |
Input: Video data 视频数据 Step1: Block decomposition 块分解 Step2: Semi-autoregressive generation 半自回归生成 Step3: Bidirectional attention application 双向注意力应用 Output: Generated video frames 生成的视频帧 |
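Several pipelines above start from classical point-cloud pre-processing; for instance, the road-extraction entry (2502.07486) opens with statistical outlier removal followed by density-based clustering. Below is a minimal, illustrative sketch of those two steps with NumPy and scikit-learn; the function names and every threshold (`k`, `std_ratio`, `eps`, `min_samples`) are our own placeholders, not values from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def remove_statistical_outliers(points, k=16, std_ratio=2.0):
    """Drop points whose mean k-NN distance is far above the global average."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nn.kneighbors(points)           # column 0 is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

def cluster_points(points, eps=0.5, min_samples=10):
    """Density-based clustering; label -1 marks residual noise."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)

if __name__ == "__main__":
    pts = np.random.rand(1000, 3) * 10         # stand-in for a LiDAR tile
    pts = remove_statistical_outliers(pts)
    labels = cluster_points(pts)
    print(f"{len(pts)} points, {labels.max() + 1} clusters")
```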
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.05222 VistaFlow: Photorealistic Volumetric Reconstruction with Dynamic Resolution Management via Q-Learning [{'name': 'Jayram Palamadai, William Yu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D volumetric reconstruction 3D体积重建 dynamic resolution management 动态分辨率管理 photorealistic rendering 照相真实渲染 |
Input: 2D photographs 二维照片 Step1: Image conversion to PlenOctree data structure 图像转换为PlenOctree数据结构 Step2: Dynamic resolution management using QuiQ 动态分辨率管理使用QuiQ Step3: Synthesizing novel viewpoints using differentiable rendering 合成新视角使用可微渲染 Output: Interactive 3D volumetric images 互动三维体积图像 |
9.5 | [9.5] 2502.05378 NextBestPath: Efficient 3D Mapping of Unseen Environments [{'name': "Shiyao Li, Antoine Guédon, Clémentin Boittiaux, Shizhe Chen, Vincent Lepetit"}] |
3D Mapping and Reconstruction 3D映射与重建 | v2 3D mapping active mapping robotics |
Input: Unseen indoor environments 未知室内环境 Step1: Create and benchmark a new dataset (AiMDoom) 创建并基准新的数据集 (AiMDoom) Step2: Develop the next-best-path method (NBP) 开发下一最佳路径方法 (NBP) Step3: Plan and optimize trajectory for active mapping 规划和优化主动映射的轨迹 Output: Efficiently reconstructed 3D models 有效重建的三维模型 |
9.5 | [9.5] 2502.05859 SphereFusion: Efficient Panorama Depth Estimation via Gated Fusion [{'name': 'Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng'}] |
Depth Estimation 深度估计 | v2 panorama depth estimation 3D reconstruction autonomous driving |
Input: Panorama images 全景图像 Step1: Feature extraction 特征提取 Step2: Feature fusion 特征融合 Step3: Depth estimation 深度估计 Output: Depth map and point cloud 深度图和点云 |
9.5 | [9.5] 2502.05874 MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation [{'name': 'Zhifei Yang, Keyang Lu, Chao Zhang, Jiaxing Qi, Hanqi Jiang, Ruifei Ma, Shenglin Yin, Yifan Xu, Mingzhe Xing, Zhen Xiao, Jieyi Long, Xiangde Liu, Guangyao Zhai'}] |
3D Generation 三维生成 | v2 3D scene generation geometry control mixed-modality graph |
Input: Mixed-Modality Graph combining textual and visual modalities Step1: Process user inputs involving text, image, or both Step2: Visual enhancement module constructs visual representations Step3: Relation predictor infers relationships between nodes Output: Generated 3D indoor scenes with controllable geometry |
9.5 | [9.5] 2502.06336 DefTransNet: A Transformer-based Method for Non-Rigid Point Cloud Registration in the Simulation of Soft Tissue Deformation [{'name': 'Sara Monji-Azad, Marvin Kinz, Siddharth Kothari, Robin Khanna, Amrei Carla Mihan, David Maennel, Claudia Scherl, Juergen Hesser'}] |
Point Cloud Processing 点云处理 | v2 3D reconstruction point cloud registration Transformers |
Input: Source and target point clouds 源点云和目标点云 Step1: Feature descriptor design 特征描述符设计 Step2: Learning displacement vector fields 学习位移向量场 Output: Enhanced point cloud registration 改进的点云配准 |
9.5 | [9.5] 2502.06338 Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior [{'name': 'Lee Hyoseok, Kyeong Seon Kim, Kwon Byung-Ki, Tae-Hyun Oh'}] |
Depth Estimation 深度估计 | v2 depth completion 3D reconstruction zero-shot learning |
Input: Sparse depth measurements and RGB images 输入:稀疏深度测量与RGB图像 Step1: Alignment of depth prior with sparse measurements 步骤1:将深度先验与稀疏测量对齐 Step2: Optimization loop at test-time to enforce constraints 步骤2:在测试时进行优化循环以强制约束 Step3: Depth map completion based on aligned prior 步骤3:基于对齐的先验完成深度图 Output: Complete dense depth map 输出:完整的密集深度图 (a least-squares alignment sketch follows this table) |
9.5 | [9.5] 2502.06367 FOCUS - Multi-View Foot Reconstruction From Synthetically Trained Dense Correspondences [{'name': 'Oliver Boyne, Roberto Cipolla'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction multi-view reconstruction foot model structure-from-motion dense correspondences |
Input: Multi-view RGB images 多视角RGB图像 Step1: Dataset extension 数据集扩展 Step2: Dense correspondence prediction 密集对应关系预测 Step3: 3D surface reconstruction via SfM and optimization 通过SfM和优化进行3D表面重建 Output: 3D mesh model 输出: 3D网格模型 |
9.5 | [9.5] 2502.06608 TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models [{'name': 'Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, Yan-Pei Cao'}] |
3D Generation 三维生成 | v2 3D Generation Shape Diffusion High-Fidelity 3D Models |
Input: Images 输入: 图像 Step1: Data processing 数据处理 Step2: Shape generation 形状生成 Step3: Model evaluation 模型评估 Output: High-fidelity 3D meshes 输出: 高保真3D网格 |
9.5 | [9.5] 2502.06682 Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene [{'name': 'Tai-Yu Pan, Sooyoung Jeon, Mengdi Fan, Jinsu Yoo, Zhenyang Feng, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao'}] |
3D Generation 三维生成 | v2 3D generation collaborative perception autonomous driving point cloud generation |
Input: Ego-car sensory data 车载传感器数据 Step 1: Data integration 数据集成 Step 2: Conditioned diffusion model training 条件扩散模型训练 Step 3: Generate realistic point clouds 生成真实的点云 Output: Collaborative perception data 协同感知数据 |
9.2 | [9.2] 2502.05769 Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform [{'name': 'Kyle Gao, Dening Lu, Liangzhi Li, Nan Chen, Hongjie He, Linlin Xu, Jonathan Li'}] |
3D Modeling 三维建模 | v2 3D modeling Gaussian Splatting urban digital twin GIS integration Large Language Models |
Input: Building's address, postal code, or geographic coordinates Step1: Integrate with Google Maps Platform APIs Step2: Perform Gaussian Splatting-based mesh extraction Step3: Retrieve 3D models and visual descriptions Output: Digital twin of the building with 3D models and layers of data |
8.5 | [8.5] 2502.05409 Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment [{'name': 'Maneesha Wickramasuriya, Beomyeol Yu, Taeyoung Lee, Murray Snyder'}] |
3D Simulation and Modeling 三维仿真与建模 | v2 3D simulation pose estimation UAV Gaussian splatting |
Input: Monocular images from UAV 无人机采集的单目图像 Step1: Data integration and simulation 数据集成与仿真 Step2: Deep pose estimation algorithm development 深度姿态估计算法开发 Step3: Indoor testing and validation 室内测试与验证 Output: Accurate pose estimation for UAV relative to the vessel 输出:无人机相对于船只的准确姿态估计 |
8.5 | [8.5] 2502.05779 A 3D Multimodal Feature for Infrastructure Anomaly Detection [{'name': 'Yixiong Jing, Wei Lin, Brian Sheil, Sinan Acikgoz'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction anomaly detection point clouds crack detection |
Input: Point clouds and multimodal features 点云和多模态特征 Step1: Feature extraction 特征提取 Step2: Integration with PatchCore algorithm 集成至PatchCore算法 Step3: Evaluation with statistical methods 使用统计方法进行评估 Output: Enhanced defect detection results 改进的缺陷检测结果 |
8.5 | [8.5] 2502.05964 Revisiting Gradient-based Uncertainty for Monocular Depth Estimation [{'name': 'Julia Hornauer, Amir El-Ghoussani, Vasileios Belagiannis'}] |
Depth Estimation 深度估计 | v2 Monocular Depth Estimation 单目深度估计 Uncertainty Estimation 不确定性估计 |
Input: Monocular images 单目图像 Step1: Gradient extraction using auxiliary loss 梯度提取与辅助损失 Step2: Uncertainty score calculation 不确定性评分计算 Output: Depth predictions and uncertainty scores 深度预测与不确定性评分 |
8.5 | [8.5] 2502.06019 Noise is an Efficient Learner for Zero-Shot Vision-Language Models [{'name': 'Raza Imam, Asif Hanif, Jian Zhang, Khaled Waleed Dawoud, Yova Kementchedjhieva, Mohammad Yaqub'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 vision-language models noise adaptation test-time adaptation |
Input: Visual representations 视觉表征 Step1: Test-time adaptation 测试时适应 Step2: Learnable noise optimization 可学习噪声优化 Step3: Inter-view representation alignment 视图间表征对齐 Output: Enhanced VLM performance 改进的视觉语言模型性能 |
8.5 | [8.5] 2502.06219 Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing [{'name': 'Sicen Guo, Tianyou Wen, Chuang-Wei Liu, Qijun Chen, Rui Fan'}] |
3D Reconstruction and Modeling 三维重建 | v2 RGB-D driving scene parsing Heterogeneous Feature Integration Transformer Vision Foundation Models |
Input: RGB and depth data RGB和深度数据 Step1: Relative depth estimation 进行相对深度估计 Step2: Heterogeneous Feature Integration Transformer (HFIT) development 开发异构特征集成变换器 (HFIT) Step3: Feature integration and evaluation 特征集成与评估 Output: Enhanced driving scene parsing model 改进的驾驶场景解析模型 |
8.5 | [8.5] 2502.06337 Accelerating Outlier-robust Rotation Estimation by Stereographic Projection [{'name': 'Taosi Xu, Yinlong Liu, Xianbo Wang, Zhi-Xin Yang'}] |
3D Reconstruction and Modeling 三维重建 | v2 Rotation Estimation Outlier Robustness Stereographic Projection Point Cloud Registration |
Input: 3D point sets from different views 来自不同视角的3D点集 Step1: Investigate geometric constraints 调查几何约束 Step2: Use stereographic projection for rotation axis estimation 使用立体投影进行旋转轴估计 Step3: Implement spatial voting for axis identification 实施空间投票以识别轴 Output: Optimal rotation estimation 最优旋转估计 |
8.5 | [8.5] 2502.06392 TANGLED: Generating 3D Hair Strands from Images with Arbitrary Styles and Viewpoints [{'name': 'Pengyu Long, Zijun Zhao, Min Ouyang, Qingcheng Zhao, Qixuan Zhang, Wei Yang, Lan Xu, Jingyi Yu'}] |
3D Generation 三维生成 | v2 3D hair generation diffusion models multi-view input |
Input: Multi-view linearts and images 多视角线稿和图像 Step 1: Collecting and annotating diverse hairstyle dataset 收集和标注多样的发型数据集 Step 2: Implementing a latent diffusion model with cross-attention 采用具有跨注意力的潜在扩散模型 Step 3: Applying parametric post-processing to enforce structural constraints 应用参数后处理以强制执行结构约束 Output: High-quality 3D hair strands 高质量三维发丝 |
8.5 | [8.5] 2502.06543 Unsupervised Learning for Feature Extraction and Temporal Alignment of 3D+t Point Clouds of Zebrafish Embryos [{'name': 'Zhu Chen, Ina Laube, Johannes Stegmaier'}] |
3D Reconstruction 三维重建 | v2 3D+t point clouds temporal alignment unsupervised learning |
Input: 3D+t point clouds of zebrafish embryos 3D+t 点云 Step1: Feature extraction using autoencoder 特征提取通过自编码器 Step2: Temporal alignment using regression network 时间对齐通过回归网络 Output: Aligned time frames of 3D+t point clouds 对齐的3D+t点云时间帧 |
8.5 | [8.5] 2502.06782 Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT [{'name': 'Dongyang Liu, Shicheng Li, Yutong Liu, Zhen Li, Kai Wang, Xinyue Li, Qi Qin, Yufei Liu, Yi Xin, Zhongyu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li, Peng Gao'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation Diffusion Transformers |
Input: Video generation task 视频生成任务 Step1: Implement Multi-scale Next-DiT architecture 实现多尺度Next-DiT架构 Step2: Incorporate motion conditioning 引入运动条件 Step3: Progressive and multi-source training for efficiency 进行渐进和多源训练以提高效率 Output: High-quality generated videos 高质量生成视频 |
8.5 | [8.5] 2502.06787 Visual Agentic AI for Spatial Reasoning with a Dynamic API [{'name': 'Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari'}] |
Spatial Reasoning 空间推理 | v2 3D spatial reasoning Visual reasoning Dynamic API |
Input: Queries for 3D understanding 3D理解的查询 Step1: Dynamic API generation 动态API生成 Step2: Program synthesis 程序合成 Step3: Evaluation with benchmarks 使用基准评估 Output: Enhanced 3D spatial reasoning capabilities 改进的3D空间推理能力 |
8.0 | [8.0] 2502.06023 Dual Caption Preference Optimization for Diffusion Models [{'name': 'Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral'}] |
Image Generation 图像生成 | v2 image generation text-to-image diffusion models |
Input: Text-to-image diffusion model 文本到图像扩散模型 Step1: Mitigate irrelevant prompts 减少无关提示 Step2: Optimize dual caption preferences 优化双重标题偏好 Step3: Experiment with different caption strategies 采用不同的标题策略 Output: Improved image generation 改进的图像生成 |
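The zero-shot depth-completion entry (2502.06338) hinges on aligning an affine-invariant depth prior with sparse metric measurements. A common closed-form way to do this is a least-squares fit of a scale and shift; the sketch below shows that baseline only (the paper itself runs a test-time optimization loop, which this does not reproduce, and `align_depth_prior` is a hypothetical helper name).

```python
import numpy as np

def align_depth_prior(rel_depth, sparse_depth, mask):
    """Least-squares scale s and shift t so that s * rel + t ≈ sparse
    at the pixels where mask is True."""
    x = rel_depth[mask].ravel()
    y = sparse_depth[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)   # [x, 1] design matrix
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * rel_depth + t

# toy example: ground-truth metric depth = 2 * relative + 0.5
rel = np.random.rand(4, 4)
gt = 2.0 * rel + 0.5
mask = np.zeros_like(rel, dtype=bool)
mask[::2, ::2] = True                            # sparse measurements
dense = align_depth_prior(rel, gt, mask)
assert np.allclose(dense, gt)
```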
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.04630 High-Speed Dynamic 3D Imaging with Sensor Fusion Splatting [{'name': 'Zihao Zou, Ziyuan Qu, Xi Peng, Vivek Boominathan, Adithya Pediredla, Praneeth Chakravarthula'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction sensor fusion Gaussian splatting high-speed imaging |
Input: RGB, depth, and event camera data 输入: RGB、深度和事件相机数据 Step1: Data integration 数据集成 Step2: Scene representation using deformable 3D Gaussians 场景表示使用可变形3D高斯 Step3: Joint optimization of Gaussian parameters 联合优化高斯参数 Output: High-quality 3D scene reconstruction 输出: 高质量3D场景重建 |
9.5 | [9.5] 2502.04734 SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting [{'name': 'Huajian Huang, Yingshu Chen, Longwei Li, Hui Cheng, Tristan Braud, Yajie Zhao, Sai-Kit Yeung'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction omnidirectional images |
Input: 360-degree images 360度图像 Step1: Direct pose calibration 直接姿态标定 Step2: 3D Gaussians optimization 3D高斯优化 Step3: Joint optimization of parameters 参数的联合优化 Output: Enhanced omnidirectional radiance fields 改进的全方位辐射场 |
9.5 | [9.5] 2502.04804 DetVPCC: RoI-based Point Cloud Sequence Compression for 3D Object Detection [{'name': 'Mingxuan Yan, Ruijie Zhang, Xuedou Xiao, Wei Wang'}] |
3D Object Detection 3D 物体检测 | v2 3D reconstruction point cloud compression object detection |
Input: 3D point cloud sequences 3D 点云序列 Step1: Identify regions of interest (RoIs) 识别兴趣区域 (RoIs) Step2: Apply RoI-based encoding 应用基于RoI的编码 Step3: Compress using VPCC and evaluate compression performance 基于 VPCC 压缩并评估压缩性能 Output: Compressed point cloud data with improved detection accuracy 输出: 经过压缩的点云数据,具有改进的检测准确性 |
9.5 | [9.5] 2502.04843 PoI: Pixel of Interest for Novel View Synthesis Assisted Scene Coordinate Regression [{'name': 'Feifei Li, Qi Song, Chi Zhang, Hui Shuai, Rui Huang'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction scene coordinate regression novel view synthesis |
Input: Rendered images and sparse inputs 渲染图像和稀疏输入 Step1: Pixel filtering to retain well-rendered pixels 像素过滤以保留渲染良好的像素 Step2: Scene Coordinate Regression (SCR) model training based on filtered data 基于过滤数据的场景坐标回归模型训练 Step3: Evaluation of pose estimation performance 性能评估 |
9.5 | [9.5] 2502.04981 OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting [{'name': 'Xiaoyu Zhou, Jingqi Wang, Yongtao Wang, Yufei Wei, Nan Dong, Ming-Hsuan Yang'}] |
3D Reconstruction 三维重建 | v2 3D occupancy reconstruction semantic reconstruction Gaussian Splatting |
Input: Raw sensor data 原始传感器数据 Step1: Extract semantic information from vision-language models 提取语言模型中的语义信息 Step2: Construct Semantic and Geometric-Aware Gaussians 构建语义和几何意识高斯 Step3: Implement cumulative Gaussian-to-3D voxel splatting 实现累积高斯到3D体素的溅射 Output: Semantic 3D occupancy reconstruction 语义3D占用重建 |
9.5 | [9.5] 2502.05040 GaussRender: Learning 3D Occupancy with Gaussian Rendering [{'name': 'Loick Chambon, Eloi Zablocki, Alexandre Boulch, Mickael Chen, Matthieu Cord'}] |
3D Reconstruction 三维重建 | v2 3D occupancy Gaussian rendering autonomous driving semantic understanding voxel-based supervision |
Input: 3D voxel representations 3D体素表示 Step1: Projection to 2D perspectives 投影到2D视图 Step2: Introduction of Gaussian splatting 高斯点云引入 Step3: Loss integration for training 损失函数集成 Output: Enhanced 3D occupancy models 改进的3D占用模型 |
9.5 | [9.5] 2502.05175 Fillerbuster: Multi-View Scene Completion for Casual Captures [{'name': 'Ethan Weber, Norman Müller, Yash Kant, Vasu Agrawal, Michael Zollhöfer, Angjoo Kanazawa, Christian Richardt'}] |
3D Reconstruction 三维重建 | v2 3D scene completion multi-view synthesis novel view generation |
Input: Multi-view casual captures 多视角随意捕捉 Step1: Unobserved content recovery 未观察到的内容恢复 Step2: Generative model training 生成模型训练 Step3: Scene completion and pose prediction 场景补全与姿势预测 Output: Complete 3D scene with novel views 输出: 完整的三维场景与新视角 |
9.5 | [9.5] 2502.05176 AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting [{'name': 'Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen, Jie-Ying Lee, Bo-Hsu Ke, Chun-Wei Tuan Mu, Yi-Chuan Huang, Chin-Yang Lin, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D scene inpainting Gaussian Splatting depth-aware methods multi-view coherence unbounded scenes |
Input: Multi-view images, camera parameters, object masks, and reference images 输入: 多视角图像、相机参数、对象掩膜和参考图像 Step1: Generate depth-aware unseen masks for occlusion identification 步骤1: 生成深度感知的看不见掩膜以识别遮挡 Step2: Apply Adaptive Guided Depth Diffusion for point placement 步骤2: 应用自适应引导深度扩散进行点放置 Step3: Employ SDEdit for detail enhancement and coherence 步骤3: 使用SDEdit进行细节增强和一致性 Output: High-quality inpainted 3D scenes 输出: 高质量的3D场景修复 |
8.5 | [8.5] 2502.04361 Predicting 3D Motion from 2D Video for Behavior-Based VR Biometrics [{'name': 'Mingjun Li, Natasha Kholgade Banerjee, Sean Banerjee'}] |
3D Motion Prediction 三维运动预测 | v2 3D motion prediction biometric authentication virtual reality 2D video |
Input: 2D body joint data from video 输入: 来自视频的2D身体关节数据 Step1: External video tracking 外部视频追踪 Step2: 2D to 3D motion prediction 从2D到3D的运动预测 Step3: Authentication model evaluation 认证模型评估 Output: Enhanced biometric authentication system 输出: 增强的生物识别认证系统 |
8.5 | [8.5] 2502.04377 MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction [{'name': 'Xiaoshuai Hao, Yunfeng Diao, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin, Hui Zhang, Weiming Li, Shu Zhao, Yu Liu'}] |
Map Construction 地图构建 | v2 BEV Feature Fusion Autonomous Driving Map Construction Cross-modal Interaction |
Input: Multi-modal data from camera and LiDAR sensors Step1: Cross-modal Interaction Transform (CIT) for semantic alignment Step2: Dual Dynamic Fusion (DDF) for selective information integration Step3: Map construction tasks evaluation Output: Enhanced HD and BEV maps |
8.5 | [8.5] 2502.04378 DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation [{'name': "Luciano Baresi, Davide Yi Xian Hu, Muhammad Irfan Mas'udi, Giovanni Quattrocchi"}] |
Multi-modal Testing and Image Generation 多模态测试与图像生成 | v2 autonomous driving deep learning testing diffusion models |
Input: Existing images from datasets 现有数据集中的图像 Step1: Image captioning 进行图像描述 Step2: Keyword identification 关键词识别 Step3: Counterfactual caption generation 生成反事实描述 Step4: Image generation using diffusion model 利用扩散模型生成图像 Output: Augmented test images 增强的测试图像 |
8.5 | [8.5] 2502.04478 OneTrack-M: A multitask approach to transformer-based MOT models [{'name': 'Luiz C. S. de Araujo, Carlos M. S. Figueiredo'}] |
Autonomous Systems and Robotics 自主系统与机器人技术 | v2 Multi-Object Tracking transformers autonomous vehicles |
Input: Video sequences from cameras 视频序列 Step1: Data pre-processing 数据预处理 Step2: Model architecture design 模型架构设计 Step3: Multitask training techniques 多任务训练技术 Output: Enhanced tracking and detection performance 改进的跟踪与检测性能 |
8.5 | [8.5] 2502.04483 Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation [{'name': 'Nathan Louis, Mahzad Khoshlessan, Jason J. Corso'}] |
3D Reconstruction 三维重建 | v2 3D human pose estimation physical plausibility physics simulation 3D reconstruction |
Input: 3D human poses from estimation models 来自估计模型的3D人体姿态 Step1: Physics simulation setup 物理仿真设置 Step2: Metric introduction (CoM distance, Pose Stability Duration) 指标引入(质心距离,姿态稳定时间) Step3: Evaluation against state-of-the-art methods 与现有最佳方法进行比较评估 Output: Metrics for physical plausibility and stability 物理合理性和稳定性的指标 |
8.5 | [8.5] 2502.04566 An Optimized YOLOv5 Based Approach For Real-time Vehicle Detection At Road Intersections Using Fisheye Cameras [{'name': 'Md. Jahin Alam, Muhammad Zubair Hasan, Md Maisoon Rahman, Md Awsafur Rahman, Najibul Haque Sarker, Shariar Azad, Tasnim Nishat Islam, Bishmoy Paul, Tanvir Anjum, Barproda Halder, Shaikh Anowarul Fattah'}] |
Autonomous Systems and Robotics 自主系统与机器人技术 | v2 vehicle detection YOLOv5 fisheye camera autonomous systems |
Input: Fisheye camera images 鱼眼摄像头图像 Step1: Data acquisition 数据采集 Step2: Image preprocessing 图像预处理 Step3: Vehicle detection using modified YOLOv5 基于改进的YOLOv5进行车辆检测 Step4: Model training and ensemble 模型训练与集成 Output: Real-time vehicle detection results 实时车辆检测结果 |
8.5 | [8.5] 2502.04615 Neural Clustering for Prefractured Mesh Generation in Real-time Object Destruction [{'name': 'Seunghwan Kim, Sunha Park, Seungkyu Lee'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction point cloud segmentation real-time object destruction |
Input: Point cloud data 点云数据 Step1: Clustering point cloud with a neural network 使用神经网络进行点云聚类 Step2: Predicting structural weaknesses 预测结构弱点 Step3: Generating prefractured meshes 生成预裂网格 Output: Ready-to-use prefractured meshes 准备使用的预裂网格 |
8.5 | [8.5] 2502.05055 Differentiable Mobile Display Photometric Stereo [{'name': 'Gawoon Ban, Hyeongjun Kim, Seokjun Choi, Seungwoo Yoon, Seung-Hwan Baek'}] |
3D Reconstruction 三维重建 | v2 Photometric stereo 3D reconstruction Mobile devices Surface normals |
Input: Mobile phone display and camera 移动电话显示器和相机 Step1: Developing a mobile app 开发移动应用 Step2: Capturing HDR images and display patterns 捕获HDR图像和显示模式 Step3: Learning display patterns differentiably 可微地学习显示模式 Output: 3D surface normals and albedos 3D表面法线和反射率 (a classical least-squares baseline is sketched after this table) |
8.5 | [8.5] 2502.05091 DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions [{'name': 'Gorkem Can Ates, Kuang Gong, Wei Shao'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 3D vision-language models medical imaging zero-shot classification efficient computation |
Input: 3D medical images 3D医学图像 Step1: Decomposed convolution design 设计分解卷积 Step2: Integration into CLIP framework 集成到 CLIP 框架中 Step3: Evaluation on CT-RATE dataset 在 CT-RATE 数据集上评估 Output: Efficient 3D vision-language model 高效的 3D 视觉-语言模型 |
8.5 | [8.5] 2502.05153 Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment [{'name': 'Minh-Quan Le, Gaurav Mittal, Tianjian Meng, A S M Iftekhar, Vishwas Suryanarayanan, Barun Patra, Dimitris Samaras, Mei Chen'}] |
Image Generation 图像生成 | v2 Image Generation Visual Question Answering Multimodal learning |
Input: Multimodal context (reference image + text guidance) 多模态上下文(参考图像 + 文本指导) Step1: Context description generation 上下文描述生成 Step2: Fine-tuning of the diffusion model 调整扩散模型 Step3: Image generation 生成图像 Output: High-fidelity, diverse images 高保真、多样化图像 |
8.5 | [8.5] 2502.05178 QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation [{'name': 'Yue Zhao, Fuzhao Xue, Scott Reed, Linxi Fan, Yuke Zhu, Jan Kautz, Zhiding Yu, Philipp Krähenbühl, De-An Huang'}] |
Neural Rendering 神经渲染 | v2 visual tokenization multimodal understanding image generation reconstruction |
Input: Image data 影像数据 Step1: Train binary-spherical-quantization-based autoencoder 训练基于二元球面量化的自编码器 Step2: Dynamically balance reconstruction and alignment objectives 动态平衡重建与对齐目标 Step3: Validate performance on multimodal understanding and image generation 验证在多模态理解与图像生成中的表现 Output: Unified model for multimodal tasks 输出:多模态任务的统一模型 |
7.5 | [7.5] 2502.04475 Augmented Conditioning Is Enough For Effective Training Image Generation [{'name': 'Jiahui Chen, Amy Zhang, Adriana Romero-Soriano'}] |
Image Generation 图像生成 | v2 image generation data augmentation classification |
Input: Real images and text prompts 真实图像和文本提示 Step1: Apply data augmentations 应用数据增强 Step2: Condition image generation on augmented data 基于增强数据进行图像生成 Step3: Generate synthetic training images 生成合成训练图像 Output: Enhanced training datasets 改进的训练数据集 |
7.5 | [7.5] 2502.04896 Goku: Flow Based Video Generative Foundation Models [{'name': 'Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu'}] |
Image and Video Generation 图像生成和视频生成 | v2 image generation video generation text-to-video tasks |
Input: Image and video datasets 图像和视频数据集 Step1: Data processing pipeline 数据处理管道 Step2: Model architecture optimization 模型架构优化 Step3: Training and evaluation 训练与评估 Output: High-quality image and video generation 高质量的图像和视频生成 |
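For the photometric-stereo entry (2502.05055), the classical Lambertian baseline helps explain what the learned display patterns are for: with known lighting directions, per-pixel normals and albedo fall out of a least-squares solve. The sketch below implements only that textbook baseline, not the paper's differentiable display-pattern learning; `photometric_stereo` is an illustrative name.

```python
import numpy as np

def photometric_stereo(images, lights):
    """images: (M, H, W) intensities under M known lights.
    lights: (M, 3) unit lighting directions.
    Solves I = L @ (albedo * normal) per pixel (Lambertian model)."""
    M, H, W = images.shape
    I = images.reshape(M, -1)                       # (M, H*W)
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)  # (3, H*W)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-8)
    return normals.reshape(3, H, W), albedo.reshape(H, W)

# toy check: a flat surface facing the camera, albedo 0.7
lights = np.array([[0, 0, 1.0], [0.5, 0, 0.866], [0, 0.5, 0.866]])
n_true = np.array([0.0, 0.0, 1.0])
imgs = np.stack([0.7 * max(l @ n_true, 0) * np.ones((8, 8)) for l in lights])
n, a = photometric_stereo(imgs, lights)
assert np.allclose(n[2], 1.0) and np.allclose(a, 0.7)
```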
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.03901 LeAP: Consistent multi-domain 3D labeling using Foundation Models [{'name': 'Simon Gebraad, Andras Palffy, Holger Caesar'}] |
3D Semantic Understanding 3D语义理解 | v2 3D semantic labeling Bayesian update Vision Foundation Models |
Input: Unlabeled image-pointcloud pairs 输入: 未标记的图像-点云对 Step1: Generate soft 2D labels using Vision Foundation Models 步骤1: 使用视觉基础模型生成软2D标签 Step2: Apply Bayesian updating to obtain 3D pseudo-labels 步骤2: 应用贝叶斯更新以获得3D伪标签 Step3: Use 3D Consistency Network to improve label quality 步骤3: 使用3D一致性网络提高标签质量 Output: High-quality 3D semantic labels 输出: 高质量的3D语义标签 (a minimal Bayesian fusion sketch follows this table) |
9.5 | [9.5] 2502.04318 sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views [{'name': 'Eyvaz Najafli, Marius Kästingschäfer, Sebastian Bernhard, Thomas Brox, Andreas Geiger'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction sparse views latent features |
Input: Sparse view images 稀疏视图图像 Step1: Generate intermediate virtual views 生成中间虚拟视图 Step2: Decode Gaussian primitives 解码高斯原语 Step3: Render novel views 渲染新视图 Output: 360-degree reconstructed scene 360度重建场景 |
9.0 | [9.0] 2502.04139 Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation [{'name': 'Jiahao Lu, Jiacheng Deng, Tianzhu Zhang'}] |
3D Instance Segmentation 3D实例分割 | v2 3D instance segmentation transformer-based methods |
Input: Scene point cloud input 场景点云输入 Step1: Query initialization 查询初始化 Step2: Hierarchical query fusion 层次查询融合 Step3: Instance segmentation 实例分割 Output: Binary foreground masks with semantic labels 输出:带语义标签的二元前景掩码 |
8.5 | [8.5] 2502.03510 Mapping and Localization Using LiDAR Fiducial Markers [{'name': 'Yibo Liu'}] |
Mapping and Localization 映射与定位 | v2 LiDAR fiducial markers mapping localization |
Input: LiDAR sensors and fiducial markers Step1: Development of Intensity Image-based LiDAR Fiducial Marker system Step2: Detection of 3D fiducials from intensity images Step3: Algorithm enhancement for 3D map merging and localization Output: Optimized mapping and localization using LFMs |
8.5 | [8.5] 2502.03628 The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering [{'name': 'Zhuowei Li, Haizhou Shi, Yunhe Gao, Di Liu, Zhenting Wang, Yuxiao Chen, Ting Liu, Long Zhao, Hao Wang, Dimitris N. Metaxas'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models hallucination VISTA multimodal learning |
Input: Visual tokens from large Vision-Language Models (LVLMs) 来自大型视觉语言模型的视觉令牌 Step1: Analyze token logits ranking 分析令牌logits排名 Step2: Identify visual information loss 识别视觉信息损失 Step3: Propose VISTA framework 提出VISTA框架 Output: Enhanced decoding with reduced hallucination 输出:减少幻觉的增强解码 |
8.5 | [8.5] 2502.03639 Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach [{'name': 'Yunuo Chen, Junli Cao, Anil Kag, Vidit Goel, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren'}] |
Image and Video Generation 图像生成与视频生成 | v2 Video Generation 视频生成 3D Point Regularization 3D点正则化 Diffusion Models 扩散模型 |
Input: 2D videos with 3D point trajectories 2D视频与3D点轨迹 Step1: Data augmentation 数据增强 Step2: Model fine-tuning 模型微调 Step3: Regularization of shape and motion 形状与运动的正则化 Output: Enhanced video quality 改进的视频质量 |
8.5 | [8.5] 2502.03836 Adapting Human Mesh Recovery with Vision-Language Feedback [{'name': 'Chongyang Xu, Buzhen Huang, Chengfang Zhang, Ziliang Feng, Yangang Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 human mesh recovery vision-language models 3D reconstruction diffusion-based framework |
Input: Monocular images 单目图像 Step1: Initial pose prediction using a regression model 初始姿态预测 Step2: 2D keypoints extraction from images 从图像中提取2D关键点 Step3: Integration of vision-language descriptions 结合视觉语言描述 Step4: Refinement of 3D mesh using diffusion modeling 使用扩散模型优化3D网格 Output: Enhanced 3D human mesh 改进的3D人类网格 |
8.5 | [8.5] 2502.03877 Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks [{'name': 'Yuhui Jin, Yaqiong Zhang, Zheyuan Xu, Wenqing Zhang, Jingyu Xu'}] |
6D Object Detection and Pose Estimation 6D对象检测与姿态估计 | v2 6D object detection pose estimation Hybrid Task Cascade High-Resolution Network |
Input: 6D object detection data 6D对象检测数据 Step1: Hybrid Task Cascade integration 集成混合任务级联 Step2: High-Resolution Network backbone usage 使用高分辨率网络骨干 Step3: Advanced post-processing techniques 先进的后处理技术 Output: Improved object detection and pose estimation models 改进的对象检测和姿态估计模型 |
8.5 | [8.5] 2502.04111 Adaptive Margin Contrastive Learning for Ambiguity-aware 3D Semantic Segmentation [{'name': 'Yang Chen, Yueqi Duan, Runzhong Zhang, Yap-Peng Tan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Semantic Segmentation Point Cloud Processing Contrastive Learning |
Input: 3D point clouds 3D点云 Step1: Ambiguity estimation based on position embeddings 基于位置嵌入的模糊性估计 Step2: Development of adaptive margin contrastive learning algorithm 自适应边际对比学习算法开发 Step3: Evaluation on large-scale datasets 在大规模数据集上进行评估 Output: Improved semantic segmentation results 改进的语义分割结果 |
8.5 | [8.5] 2502.04293 GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation [{'name': 'Weihang Li, Hongli Xu, Junwen Huang, Hyunjun Jung, Peter KT Yu, Nassir Navab, Benjamin Busam'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction semantic shape pose estimation |
Input: Partial RGB-D observations 具有部分可见性的RGB-D观测 Step1: Semantic Shape Reconstruction (SSR) 语义形状重建 Step2: Global Context Enhanced (GCE) feature fusion module 全局上下文增强特征融合模块 Output: Enhanced object poses 改进的物体姿态 |
8.5 | [8.5] 2502.04329 SMART: Advancing Scalable Map Priors for Driving Topology Reasoning [{'name': 'Junjie Ye, David Paz, Hengyuan Zhang, Yuliang Guo, Xinyu Huang, Henrik I. Christensen, Yue Wang, Liu Ren'}] |
Autonomous Systems and Robotics 自主系统与机器人技术 | v2 autonomous driving lane topology reasoning |
Input: Standard-definition (SD) and satellite maps 标准清晰度和卫星地图 Step 1: Train map prior model to infer lane graphs 训练地图先验模型以推断车道图 Step 2: Integrate model with online topology reasoning models 将模型与在线拓扑推理模型集成 Output: Enhanced lane topology understanding 改进的车道拓扑理解 |
7.5 | [7.5] 2502.03813 Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation [{'name': 'Xuan Li, Quanchao Lu, Yankaiqi Li, Muqing Li, Yijiashun Qi'}] |
Image Generation 图像生成 | v2 semantic segmentation attention mechanism autonomous driving |
Input: Multi-scale images 多尺度图像 Step1: Implement attention mechanism 实施注意力机制 Step2: Optimize Unet architecture 优化Unet架构 Step3: Evaluate on Cityscapes dataset 在Cityscapes数据集上评估 Output: Improved segmentation results 改进的分割结果 |
7.5 | [7.5] 2502.04244 An object detection approach for lane change and overtake detection from motion profiles [{'name': 'Andrea Benericetti, Niccolò Bellaccini, Henrique Piñeiro Monteagudo, Matteo Simoncini, Francesco Sambo'}] |
Autonomous Driving 自动驾驶 | v2 object detection lane change ADAS motion profiles autonomous driving |
Input: Motion profile images 运动轮廓图像 Step1: Dataset creation 数据集创建 Step2: Object detection model development 目标检测模型开发 Step3: Performance evaluation 性能评估 Output: Detection of lane change and overtake maneuvers 车道变换和超车动作检测 |
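The LeAP entry (2502.03901) fuses soft 2D labels from several views into 3D pseudo-labels with a Bayesian update. Below is a minimal sketch of that fusion step, assuming every point is observed in every view and skipping the projection and 3D Consistency Network stages; `bayesian_label_fusion` is our placeholder name, not the paper's API.

```python
import numpy as np

def bayesian_label_fusion(per_view_probs, prior=None):
    """per_view_probs: (V, N, C) soft class probabilities for N points
    observed in V views. Returns (N, C) posterior after a naive
    Bayesian update: posterior ∝ prior * Π_v p_v(class | view v)."""
    log_post = np.log(per_view_probs + 1e-12).sum(axis=0)    # (N, C)
    if prior is not None:
        log_post += np.log(prior + 1e-12)
    log_post -= log_post.max(axis=1, keepdims=True)          # stabilize
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

probs = np.random.dirichlet(np.ones(5), size=(3, 100))       # 3 views, 100 pts
posterior = bayesian_label_fusion(probs)
pseudo_labels = posterior.argmax(axis=1)                     # hard 3D labels
```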
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.02936 Every Angle Is Worth A Second Glance: Mining Kinematic Skeletal Structures from Multi-view Joint Cloud [{'name': 'Junkun Jiang, Jie Chen, Ho Yin Au, Mingyuan Chen, Wei Xue, Yike Guo'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Joint Cloud multi-view motion capture |
Input: Multi-view images 多视角图像 Step1: Triangulate 2D joints into Joint Cloud 将2D关节三角化为关节云 Step2: Process using JCSAT to explore correlations 使用JCSAT处理以探索相关性 Step3: Utilize OTAP for feature selection 使用OTAP进行特征选择 Output: 3D motion estimation 3D运动估计 (a DLT triangulation sketch follows this table) |
9.5 | [9.5] 2502.03449 Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics [{'name': 'Xuan Li, Chang Yu, Wenxin Du, Ying Jiang, Tianyi Xie, Yunuo Chen, Yin Yang, Chenfanfu Jiang'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction garment generation multi-view images simulation-ready |
Input: In-the-wild image 单张图像 Step1: Pre-trained image-to-sewing pattern generation model 预训练的图像到缝制模式生成模型 Step2: Multi-view diffusion model for producing images 多视角扩散模型用于生成图像 Step3: Refinement using a differentiable garment simulator 使用可微服装模拟器进行细化 Output: Simulation-ready 3D garment 适合模拟的三维服装 |
8.5 | [8.5] 2502.02907 PoleStack: Robust Pole Estimation of Irregular Objects from Silhouette Stacking [{'name': 'Jacopo Villa, Jay W. McMahon, Issa A. D. Nesnas'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D pole estimation silhouette stacking |
Input: Silhouette images from multiple camera poses 多个相机视角的轮廓图像 Step1: Create a silhouette-stack image 创建轮廓堆叠图像 Step2: Apply Discrete Fourier Transform to enhance robustness 应用离散傅里叶变换以增强鲁棒性 Step3: Estimate 3D pole orientation using projected-pole measurements 使用投影极轴测量来估计3D极轴方向 Output: Accurate pole orientation estimation 准确的极轴方向估计 |
8.5 | [8.5] 2502.02977 Disentangling CLIP Features for Enhanced Localized Understanding [{'name': 'Samyak Rawelekar, Yujun Cai, Yiwei Wang, Ming-Hsuan Yang, Narendra Ahuja'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 mutual feature information (MFI) vision-language models (VLM) multi-label recognition (MLR) |
Input: CLIP features from vision-language models 视觉语言模型中的CLIP特征 Step1: Analyze feature correlation 分析特征相关性 Step2: Implement MFI loss 施加MFI损失 Step3: Align text and image features 对齐文本和图像特征 Output: Improved localized understanding 改进的局部理解 |
8.5 | [8.5] 2502.03005 Driver Assistance System Based on Multimodal Data Hazard Detection [{'name': 'Long Zhouxiang, Ovanes Petrosian'}] |
Autonomous Driving 自动驾驶 | v2 multimodal data hazard detection autonomous driving incident recognition |
Input: Multimodal data (video, audio) 输入:多模态数据(视频、音频) Step1: Data integration 数据集成 Step2: Attention-based fusion strategy 基于注意力的融合策略 Step3: Incident recognition 事件识别 Output: Enhanced detection accuracy 改进的检测精度 |
8.5 | [8.5] 2502.03465 Seeing World Dynamics in a Nutshell [{'name': 'Qiuhong Shen, Xuanyu Yi, Mingbao Lin, Hanwang Zhang, Shuicheng Yan, Xinchao Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D representation Monocular video Dynamic Gaussian Splatting |
Input: Monocular videos 单目视频 Step1: Transform videos to dynamic Gaussian representations 将视频转换为动态高斯表示 Step2: Introduce STAG representation 引入结构化时空对齐高斯表示 Step3: Optimizing for spatial and temporal coherence 进行空间和时间一致性的优化 Output: High-fidelity video reconstruction and spatial-temporal modeling 高保真视频重建和时空建模 |
7.5 | [7.5] 2502.02951 VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA [{'name': 'Madhuri Latha Madaka, Chakravarthy Bhagvati'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Visual Question Answering VQA dataset Hierarchical questions |
Input: Visual content and questions 视觉内容和问题 Step1: Dataset development 数据集开发 Step2: Classification of questions 问题分类 Step3: Initial testing on VQA systems 在VQA系统上的初步测试 Output: VQA-Levels dataset VQA-Levels数据集 |
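The Joint Cloud entry (2502.02936) begins by triangulating 2D joint detections from calibrated views into 3D candidates. Below is a standard direct linear transform (DLT) triangulation sketch, the textbook version of that step; the paper's JCSAT and OTAP stages are not reproduced here.

```python
import numpy as np

def triangulate_dlt(points_2d, proj_mats):
    """points_2d: (V, 2) pixel coordinates of one joint in V views.
    proj_mats: (V, 3, 4) camera projection matrices.
    Returns the 3D point minimizing the algebraic DLT error."""
    A = []
    for (u, v), P in zip(points_2d, proj_mats):
        A.append(u * P[2] - P[0])          # each view contributes 2 rows
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                             # null-space direction
    return X[:3] / X[3]                    # dehomogenize

# toy check: two axis-aligned cameras observing the point (1, 2, 5)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])            # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1], [0], [0]])])  # shifted along x
X_true = np.array([1.0, 2.0, 5.0, 1.0])
pts = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
X_hat = triangulate_dlt(np.array(pts), np.stack([P1, P2]))
assert np.allclose(X_hat, X_true[:3])
```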
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.01666 Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding [{'name': 'Jingming Xia, Guanqun Cao, Guang Ma, Yiben Luo, Qinzhao Li, John Oyekan'}] |
Depth Estimation 深度估计 | v2 monocular depth estimation 3D reconstruction generative models autonomous driving |
Input: RGB image Step1: Extract latent features using Image Encoder Step2: Extract semantic vector through Image Semantic Encoder Step3: Integrate features within a denoising UNet Step4: Generate final metric depth map Output: Enhanced depth prediction |
9.5 | [9.5] 2502.01846 UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping [{'name': 'Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Arthur Chen, Srinath Sridhar, Aayush Prakash'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting diffusion models 3D generation structured representation |
Input: 3D Gaussian Splatting data 3D高斯点云数据 Step1: Spherical mapping to transform data into structured 2D representation 使用球面映射将数据转换为结构化2D表示 Step2: Multi-branch network for feature compression 使用多分支网络进行特征压缩 Step3: Integration with existing 2D models with zero-shot learning 将其与现有的2D模型进行无缝整合 Output: Structured 3D representation ready for generative tasks 输出:准备好用于生成任务的结构化3D表示 |
9.5 | [9.5] 2502.01855 Learning Fine-to-Coarse Cuboid Shape Abstraction [{'name': 'Gregor Kobsik, Morten Henkel, Yanjiang He, Victor Czech, Tim Elsner, Isaak Lim, Leif Kobbelt'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction shape abstraction cuboids unsupervised learning structural analysis |
Input: Collections of 3D shapes 3D形状集 Step1: Initialize with fine reconstruction to capture details 细致重建以捕获细节 Step2: Gradually reduce primitives while optimizing loss 渐进减少原始体并优化损失 Step3: Evaluate performance on shape benchmarks 在形状基准上评估性能 Output: Compact cuboid-based representations 紧凑的立方体表示 |
9.5 | [9.5] 2502.01856 Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection [{'name': 'Reza Sadeghian, Niloofar Hooshyaripour, Chris Joslin, WonSook Lee'}] |
3D Object Detection 三维物体检测 | v2 LiDAR-camera fusion 3D object detection autonomous driving |
Input: LiDAR and camera data LiDAR和摄像头数据 Step1: Spatio-Temporal Feature Aggregation (STFA) module processes input 提取时空特征 Step2: Reliability module assigns confidence scores 可靠性模块分配置信度评分 Step3: Confidence-Weighted Mutual Cross-Attention (CW-MCA) module balances information with confidence 用置信度动态平衡信息 Output: Enhanced 3D object detection 改进的三维物体检测 |
9.5 | [9.5] 2502.01896 INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy [{'name': 'Nastaran Darabi, Divake Kumar, Sina Tayebati, Amit Ranjan Trivedi'}] |
3D Perception and Modeling 3D 感知与建模 | v2 LiDAR 3D perception object detection |
Input: Noisy LiDAR data 噪声激光雷达数据 Step 1: Meta-learning phase 元学习阶段 Step 2: Generate robust saliency maps 生成健壮的显著性图 Step 3: Adversarial curriculum training 对抗性课程训练 Output: Enhanced noise resilience 提升噪声鲁棒性 |
9.5 | [9.5] 2502.02163 Progressive Correspondence Regenerator for Robust 3D Registration [{'name': 'Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo'}] |
3D Registration 3D配准 | v2 3D registration point cloud outlier removal reconstruction robustness |
Input: Point cloud data 点云数据 Step1: Prior-guided local grouping using generalized mutual matching 先验引导的局部分组与互匹配 Step2: Local correspondence correction using center-aware three-point consistency 局部对应关系修正 Step3: Global correspondence refinement using extensive iterations 全局对应关系的细化 Output: High-quality point correspondences 高质量的点对应关系 |
9.5 | [9.5] 2502.02187 ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion [{'name': 'Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun'}] |
3D Generation 三维生成 | v2 3D Generation 3D生成 Shape Variations 形状变体 |
Input: Reference 3D model 参考3D模型 Step1: Sparse voxel grid and point sampling 稀疏体素网格和点采样 Step2: Multiscale neural architecture training 多尺度神经架构训练 Step3: Generate shape variations 生成形状变体 Output: High-quality 3D shapes 高质量3D形状 |
9.5 | [9.5] 2502.02247 Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning [{'name': 'Bangzhen Liu, Chenxi Zheng, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Shengfeng He'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D point cloud analysis 3D点云分析 domain generalization 域泛化 rotation robustness 旋转鲁棒性 |
Input: 3D point clouds 3D点云 Step 1: Identify challenging rotations 识别具有挑战性的旋转 Step 2: Construct intricate orientation set 构建复杂方向集 Step 3: Utilize contrastive learning against orientations 使用对比学习进行方向建模 Output: Generalizable features with rotation consistency 输出: 具有旋转一致性的可泛化特征 |
9.5 | [9.5] 2502.02283 GP-GS: Gaussian Processes for Enhanced Gaussian Splatting [{'name': 'Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Liangxiu Han, Peng Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting Structure-from-Motion point clouds novel view synthesis |
Input: Sparse SfM point clouds 稀疏SfM点云 Step1: Dynamic sampling 动态采样 Step2: Gaussian Process modeling 高斯过程建模 Step3: Densification of point clouds 点云稠密化 Output: Enhanced 3D Gaussian representation 改进的3D高斯表示 |
9.5 | [9.5] 2502.02334 Event-aided Semantic Scene Completion [{'name': 'Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 Semantic Scene Completion 3D Reconstruction |
Input: Multi-view images 多视角图像 Step1: Data integration 数据集成 Step2: Algorithm development 算法开发 Step3: Model evaluation 模型评估 Output: Enhanced 3D models 改进的三维模型 |
9.5 | [9.5] 2502.02338 Geometric Neural Process Fields [{'name': 'Wenzhe Yin, Zehao Xiao, Jiayi Shen, Yunlu Chen, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields 3D scenes probabilistic modeling |
Input: Limited context images 限制的上下文图像 Step1: Probabilistic modeling 概率建模 Step2: Integrate geometric bases 集成几何基底 Step3: Hierarchical latent variable design 分层潜变量设计 Output: Improved generalization 改进的泛化能力 |
9.5 | [9.5] 2502.02372 MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning [{'name': 'Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields avatar generation continual learning |
Input: Image data of avatars 头像图像数据 Step1: Implement continual learning strategy 进行持续学习策略 Step2: Develop Global-Local Joint Storage Module 开发全局-局部联合存储模块 Step3: Develop Pose Distillation Module 开发姿态提炼模块 Output: Maintainable virtual avatar 可维护虚拟头像 |
9.5 | [9.5] 2502.02548 Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation [{'name': 'Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy'}] |
3D Segmentation 三维分割 | v2 3D segmentation open-vocabulary Vision-Language Models |
Input: Multi-view images 多视角图像 Step1: Data generation 数据生成 Step2: Data annotation 数据注释 Step3: Training model 训练模型 Output: Open-vocabulary segmentation model 开放词汇分割模型 |
9.5 | [9.5] 2502.02590 Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling [{'name': 'Xiaowen Qiu, Jincheng Yang, Yian Wang, Zhehuan Chen, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D articulated objects Vision-Language Models 3D modeling |
Input: 3D meshes 3D 网格 Step1: Movable Part Segmentation 可动部分分割 Step2: Articulation Estimation 关节估计 Step3: Refinement 精化 Output: Articulated 3D objects 铰接式三维物体 |
9.2 | [9.2] 2502.01940 Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach [{'name': 'Mohammed Alsakabi, Aidan Erickson, John M. Dolan, Ozan K. Tonguz'}] |
Autonomous Driving 自动驾驶 | v2 3D reconstruction autonomous driving depth maps |
Input: Images from 4D radar detectors and RGB cameras 4D 雷达探测器和 RGB 摄像头的图像 Step1: Integrate radar depth maps and RGB images 集成雷达深度图和 RGB 图像 Step2: Apply pixel positional encoding algorithm 应用像素位置信息编码算法 Step3: Develop spectrum estimation algorithms 研发光谱估计算法 Step4: Train depth map generative models 训练深度图生成模型 Output: Enhanced depth maps 改进的深度图 |
9.2 | [9.2] 2502.02144 DOC-Depth: A novel approach for dense depth ground truth generation [{'name': 'Simon de Moreau, Mathias Corsia, Hassan Bouchiba, Yasser Almehio, Andrei Bursuc, Hafid El-Idrissi, Fabien Moutarde'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D Reconstruction 三维重建 Dense Depth Generation 密集深度生成 LiDAR 激光雷达 |
Input: LiDAR sensor data 利用激光雷达传感器数据 Step1: 3D environment reconstruction 3D环境重建 Step2: Dynamic object classification 动态对象分类 Step3: Dense depth generation 密集深度生成 Output: Dense depth annotation output 输出:密集深度标注 |
8.5 | [8.5] 2502.01814 PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph [{'name': 'Dazhou Yu, Genpei Zhang, Liang Zhao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction polyhedral representation surface-attributed graph |
Input: Polyhedral data 多面体数据 Step1: Decompose into local rigid representations 将其分解为局部刚性表示 Step2: Hierarchical aggregation of representations 层次聚合表示 Output: Global representation of polyhedra 多面体的全局表示 |
8.5 | [8.5] 2502.01894 SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset [{'name': 'Goodarz Mehr, Azim Eskandarian'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 Synthetic Data Generation 合成数据生成 Autonomous Driving 自动驾驶 BEV Representation 鸟瞰视图表示 |
Input: Multi-sensor data collection 多传感器数据收集 Step1: Configuration of synthetic data generation 生成合成数据的配置 Step2: Data generation for BEV representation 生成鸟瞰视图表示的数据 Step3: Annotation of perception data 感知数据的标注 Output: SimBEV dataset with annotated driving scenarios 输出: 包含标注的驾驶场景的SimBEV数据集 |
8.5 | [8.5] 2502.01949 LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation [{'name': 'Yang Zhou, Zongjin He, Qixuan Li, Chao Wang'}] |
3D Generation 三维生成 | 3D scene generation physically consistent layouts text-guided generation |
Input: Text prompt 文本提示 Step1: Convert text to scene graph 将文本转换为场景图 Step2: Adjust Gaussian densities and layouts 调整高斯密度和布局 Step3: Make dynamic camera adjustments 进行动态相机调整 Output: 3D compositional scene generation 3D 组合场景生成 |
8.5 | [8.5] 2502.01961 Hierarchical Consensus Network for Multiview Feature Learning [{'name': 'Chengwei Xia, Chaoxi Niu, Kun Zhan'}] |
Multi-view and Stereo Vision 多视角立体 | v2 multiview feature learning hierarchical consensus 3D reconstruction |
Input: Multi-view images 多视角图像 Step1: Learning view-consistency features 学习视图一致性特征 Step2: Hierarchical consensus derivation 层次共识推导 Step3: Comprehensive feature extraction 综合特征提取 Output: Discriminative features 具有区分性的特征 |
8.5 | [8.5] 2502.02091 Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation [{'name': 'JooHyun Kwon, Hanbyel Cho, Junmo Kim'}] |
Image and Video Generation 图像生成 | v2 4D Gaussian Splatting dynamic scene editing computer vision motion artifacts |
Input: 4D dynamic scene data 4D动态场景数据 Step1: Model static 3D Gaussians 模型静态三维高斯 Step2: Implement Hexplane-based deformation field 实现基于Hexplane的变形场 Step3: Perform editing on static 3D Gaussians 在静态三维高斯上执行编辑 Step4: Apply score distillation for refinement 应用得分蒸馏进行细化 Output: Enhanced edited dynamic scenes 改进的编辑动态场景 |
8.5 | [8.5] 2502.02322 Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features [{'name': 'Hsin-Cheng Lu, Chung-Yi Lin, Winston H. Hsu'}] |
3D Object Detection 3D物体检测 | v2 3D object detection 3D物体检测 autonomous driving 自动驾驶 generalization 泛化 |
Input: Source domain 3D point clouds 源域3D点云 Step1: Downsample the point cloud based on confidence scores 根据置信度得分下采样点云 Step2: Teacher-student framework to align BEV features 使用师生框架对齐鸟瞰视图特征 Step3: Apply FCA and GERA to maintain consistency 使用FCA和GERA保持一致性 Output: Domain-agnostic 3D object detector 域无关的3D物体检测器 |
8.5 | [8.5] 2502.02468 High-Fidelity Human Avatars from Laptop Webcams using Edge Compute [{'name': 'Akash Haridas, Imran N. Junejo'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D Morphable Models 3D可变形模型 Photo-realistic Rendering 照相真实渲染 Avatar Generation 头像生成 |
Input: Images from consumer-grade laptop webcams 笔记本电脑网络摄像头拍摄的图像 Step1: Shape generation by fitting 3DMM shape parameters 通过拟合3D形状模型参数生成形状 Step2: Texture map generation 纹理图生成 Step3: Rendering using pre-defined parameters 使用预定义参数进行渲染 Output: High-fidelity animatable avatars 高保真可动画化头像 |
8.5 | [8.5] 2502.02537 Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks [{'name': 'Huiqun Huang, Cong Chen, Jean-Philippe Monteuuis, Jonathan Petit, Fei Miao'}] |
Autonomous Systems and Robotics 自主系统与机器人技术 | v2 Collaborative Object Detection Uncertainty Quantification Adversarial Attacks Autonomous Driving |
Input: Collaborative Object Detection (COD) models 协作目标检测模型 Step1: Apply adversarial training during collaboration 在协作中施加对抗性训练 Step2: Provide output uncertainty estimation through learning-based module 通过基于学习的模块提供输出不确定性估计 Step3: Calibrate uncertainty using conformal prediction 使用保形预测校准不确定性 Output: Enhanced object detection accuracy 提高的目标检测准确性 (a split-conformal calibration sketch follows this table) |
7.5 | [7.5] 2502.01906 Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models [{'name': 'Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 vision-language models Decomposed Attention cross-modal learning |
Input: Visual and textual embeddings 视觉和文本嵌入 Step1: Decompose the self-attention mechanism 解构自注意力机制 Step2: Optimize visual-to-visual self-attention 视觉-视觉自注意力优化 Step3: Merge visual and textual information 视觉与文本信息合并 Output: Improved efficiency and performance of LVLMs 提高LVLM效率与性能 |
7.5 | [7.5] 2502.01969 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration [{'name': 'Younan Zhu, Linwei Tao, Minjing Dong, Chang Xu'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models object hallucination attention calibration |
Input: Large Vision-Language Models (LVLMs) 大型视觉语言模型 Step1: Bias estimation from input image 输入图像的偏差估计 Step2: Uniform Attention Calibration (UAC) application 应用统一注意力校准 Step3: Dynamic Attention Calibration (DAC) implementation 实现动态注意力校准 Output: Reduced object hallucination 减少物体幻觉 |
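The collaborative-detection entry (2502.02537) calibrates its uncertainty estimates with conformal prediction. The sketch below shows split conformal prediction for a scalar regression output (say, one box coordinate): residuals on a held-out calibration set yield a quantile that turns point predictions into finite-sample coverage intervals. It illustrates the calibration idea only, not the paper's detection pipeline; `conformal_quantile` is an illustrative name.

```python
import numpy as np

def conformal_quantile(cal_preds, cal_targets, alpha=0.1):
    """Split conformal prediction: the (1 - alpha) quantile of absolute
    residuals on a calibration set gives a half-width q such that
    [pred - q, pred + q] covers the target with ~(1 - alpha) probability."""
    scores = np.abs(cal_preds - cal_targets)          # nonconformity scores
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n        # finite-sample correction
    return np.quantile(scores, min(level, 1.0))

rng = np.random.default_rng(0)
cal_preds = rng.normal(size=1000)
cal_targets = cal_preds + rng.normal(scale=0.3, size=1000)
q = conformal_quantile(cal_preds, cal_targets, alpha=0.1)

test_pred = 1.5
print(f"90% prediction interval: [{test_pred - q:.2f}, {test_pred + q:.2f}]")
```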
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.01814 PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph [{'name': 'Dazhou Yu, Genpei Zhang, Liang Zhao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction polyhedral representation |
Input: 3D polyhedral objects 3D 多面体对象 Step1: Surface-attributed graph construction 表面属性图构建 Step2: Local rigid representation learning 局部刚性表示学习 Step3: Hierarchical aggregation of representations 表示的分层聚合 Output: Global representation of polyhedra 多面体的全局表示 |
9.5 | [9.5] 2502.01846 UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping [{'name': 'Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Arthur Chen, Srinath Sridhar, Aayush Prakash'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting UV Mapping image-based generation 3D reconstruction 3D重建 |
Input: 3D Gaussian Splatting (3DGS) data 3D高斯点云数据 Step1: Spherical mapping to create a structured 2D representation 使用球面映射创建结构化的2D表示 Step2: Compression of heterogeneous features into a shared feature space 将异构特征压缩到共享特征空间 Step3: Integration with pre-trained 2D generative models 与预训练的2D生成模型集成 Output: Structured 2D UV Gaussian Splatting representation 结构化的2D UV高斯点云表示 |
9.5 | [9.5] 2502.01856 Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection [{'name': 'Reza Sadeghian, Niloofar Hooshyaripour, Chris Joslin, WonSook Lee'}] |
3D Object Detection 3D目标检测 | v2 3D object detection LiDAR-camera fusion autonomous driving |
Input: Sensor data from LiDAR and camera LiDAR和摄像头的传感器数据 Step1: Integration of spatial and semantic information 空间和语义信息的集成 Step2: Implementation of Reliability module to assess confidence 实现可靠性模块以评估置信度 Step3: Use of CW-MCA for dynamic weighting of modalities 使用CW-MCA对模态进行动态加权 Output: Robust 3D object detection results 稳健的3D目标检测结果 |
9.5 | [9.5] 2502.01940 Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach [{'name': 'Mohammed Alsakabi, Aidan Erickson, John M. Dolan, Ozan K. Tonguz'}] |
Depth Estimation 深度估计 | v2 Depth Estimation 深度估计 Autonomous Vehicles 自动驾驶 Radar-RGB Integration 雷达-RGB集成 |
Input: Radar depth maps and RGB images 雷达深度图和RGB图像 Step1: Pixel positional encoding 像素位置编码 Step2: Transformation to Spatial Spectrum 转换为空间谱 Step3: Generating denser depth maps 生成更密集的深度图 Output: Enhanced depth maps 改进的深度图 |
9.5 | [9.5] 2502.02144 DOC-Depth: A novel approach for dense depth ground truth generation [{'name': 'Simon de Moreau, Mathias Corsia, Hassan Bouchiba, Yasser Almehio, Andrei Bursuc, Hafid El-Idrissi, Fabien Moutarde'}] |
Depth Estimation 深度估计 | v2 depth estimation 深度估计 LiDAR 3D reconstruction 三维重建 |
Input: LiDAR measurements LiDAR测量 Step1: Data aggregation 数据聚合 Step2: Dynamic object classification 动态物体分类 Step3: Dense depth generation 密集深度生成 Output: Fully-dense depth annotations 完全密集的深度注解 |
9.5 | [9.5] 2502.02163 Progressive Correspondence Regenerator for Robust 3D Registration [{'name': 'Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo'}] |
3D Registration 3D配准 | v2 3D registration point cloud registration |
Input: Point clouds from different perspectives 从不同视角获得点云 Step1: Prior-guided local grouping 先验引导的局部分组 Step2: Generalized mutual matching 广义互匹配 Step3: Center-aware three-point consistency 中心感知的三点一致性 Step4: Global correspondence refinement 全局对应关系精炼 Output: High-quality correspondences 高质量对应关系 |
9.5 | [9.5] 2502.02187 ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion [{'name': 'Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun'}] |
3D Generation 三维生成 | v2 3D Generation shape variations multiscale neural architecture interactive generation |
Input: A single reference 3D model 单一参考3D模型 Step1: Shape variations generation 形状变体生成 Step2: Multiscale diffusion sampling 多尺度扩散采样 Step3: Interactive editing 交互式编辑 Output: High-quality 3D shape variants 高质量3D形状变体 |
9.5 | [9.5] 2502.02247 Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning [{'name': 'Bangzhen Liu, Chenxi Zheng, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Shengfeng He'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D point cloud domain generalization rotation robustness |
Input: Point clouds with variable orientations 方向多变的点云 Step1: Identify challenging rotations 识别具有挑战性的旋转 Step2: Construct intricate orientation set 构建复杂方向集 Step3: Apply contrastive learning using intricate samples 使用复杂样本进行对比学习 Output: Enhanced orientation-aware 3D representations 改进的方向感知3D表示 |
9.5 | [9.5] 2502.02283 GP-GS: Gaussian Processes for Enhanced Gaussian Splatting [{'name': 'Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Liangxiu Han, Peng Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian Processes novel view synthesis |
Input: Sparse SfM point clouds 稀疏SfM点云 Step1: Develop MOGP model 开发多输出高斯过程模型 Step2: Adaptive sampling and filtering strategy 自适应采样和过滤策略 Step3: Densify the point clouds 使点云密集化 Output: High-quality 3D Gaussians 高质量的3D高斯 |
9.5 | [9.5] 2502.02322 Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features [{'name': 'Hsin-Cheng Lu, Chung-Yi Lin, Winston H. Hsu'}] |
3D Object Detection 3D物体检测 | v2 3D object detection autonomous driving domain generalization |
Input: LiDAR point clouds from various domains 各种域的LiDAR点云 Step1: Data subsampling based on confidence scores 根据置信度评分进行数据子采样 Step2: Teacher-student framework implementation 教师-学生框架实施 Step3: Feature alignment between domains 域间特征对齐 Output: Generalized 3D object detector 具备良好泛化能力的3D物体检测器 |
9.5 | [9.5] 2502.02334 Event-aided Semantic Scene Completion [{'name': 'Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction semantic scene completion autonomous driving event cameras |
Input: Event and RGB images 输入:事件图像和RGB图像 Step1: Data integration 数据集成 Step2: Event-aided Lifting Module (ELM) 事件辅助提升模块开发 Step3: 3D scene reconstruction 三维场景重建 Output: Enhanced 3D semantic occupancy models 输出:改进的3D语义占用模型 |
9.5 | [9.5] 2502.02338 Geometric Neural Process Fields [{'name': 'Wenzhe Yin, Zehao Xiao, Jiayi Shen, Yunlu Chen, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields Geometric Neural Process Fields 3D reconstruction |
Input: Limited context observations 有限上下文观察 Step 1: Formulate NeF generalization as a probabilistic problem 将NeF泛化表述为一个概率问题 Step 2: Design geometric bases to encode structural information 设计几何基以编码结构信息 Step 3: Develop a hierarchical latent variable model for parameterization 建立分层潜变量模型以进行参数化 Output: Improved generalization for novel scenes and signals 改进的新场景和信号的泛化能力 |
9.5 | [9.5] 2502.02548 Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation [{'name': 'Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy'}] |
3D Segmentation 三维分割 | v2 3D segmentation 3D分割 open-vocabulary 开放词汇 |
Input: 3D scene datasets 3D场景数据集 Step1: Data generation 数据生成 Step2: Model training 模型训练 Step3: Segmentation validation 分割验证 Output: Open-vocabulary 3D segmentation results 开放词汇3D分割结果 |
9.5 | [9.5] 2502.02590 Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling [{'name': 'Xiaowen Qiu, Jincheng Yang, Yian Wang, Zhehuan Chen, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D modeling articulated objects 3D建模 可动物体 |
Input: 3D mesh 输入: 3D网格 Step1: Movable Part Segmentation 可移动部分分割 Step2: Articulation Estimation and Refinement 动作估计与精细化 Output: Articulated 3D object 输出: 可动的3D物体 |
9.0 | [9.0] 2502.01666 Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding [{'name': 'Jingming Xia, Guanqun Cao, Guang Ma, Yiben Luo, Qinzhao Li, John Oyekan'}] |
Depth Estimation 深度估计 | v2 Monocular Depth Estimation 单目深度估计 Autonomous Driving 自动驾驶 3D Reconstruction 三维重建 |
Input: Single RGB image 单个RGB图像 Step1: Image-based semantic embedding using SeeCoder 使用SeeCoder的图像语义嵌入 Step2: Integration of features via denoising UNet 通过去噪UNet进行特征集成 Step3: Depth map generation 深度图生成 Output: Enhanced depth map 改进的深度图 |
9.0 | [9.0] 2502.01855 Learning Fine-to-Coarse Cuboid Shape Abstraction [{'name': 'Gregor Kobsik, Morten Henkel, Yanjiang He, Victor Czech, Tim Elsner, Isaak Lim, Leif Kobbelt'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D shape abstraction unsupervised learning cuboids |
Input: Collections of 3D shapes 三维形状集合 Step1: Initial fine reconstruction 初始精细重建 Step2: Apply fine-to-coarse abstraction 应用由细到粗的抽象 Step3: Optimize reconstruction and volume preservation 优化重建与体积保持 Output: Cuboid-based structural abstraction 基于长方体的结构抽象 |
8.5 | [8.5] 2502.01894 SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset [{'name': 'Goodarz Mehr, Azim Eskandarian'}] |
Autonomous Driving 自动驾驶 | v2 BEV perception synthetic data generation autonomous driving |
Input: Multi-sensor data 多传感器数据 Step1: Data generation 生成数据 Step2: Ground truth capture 捕获真实数据 Step3: Dataset creation 创建数据集 Output: Comprehensive BEV dataset 完整的鸟瞰图数据集 |
8.5 | [8.5] 2502.01896 INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy [{'name': 'Nastaran Darabi, Divake Kumar, Sina Tayebati, Amit Ranjan Trivedi'}] |
3D Point Cloud Processing 点云处理 | v2 LiDAR adversarial training 3D perception |
Input: Noisy LiDAR data 噪声LiDAR数据 Step1: Prepare saliency maps 准备显著性图 Step2: Apply adversarial curriculum training 应用对抗课程训练 Step3: Train student network 训练学生网络 Output: Robust deep learning model 稳健的深度学习模型 |
8.5 | [8.5] 2502.01949 LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation [{'name': 'Yang Zhou, Zongjin He, Qixuan Li, Chao Wang'}] |
3D Generation 三维生成 | 3D scene generation 3D Gaussian Splatting physics-guided generation |
Input: Text prompt 文本提示 Step1: Convert text to scene graph 将文本转换为场景图 Step2: Adjust density and layout 调整密度和布局 Step3: Dynamic camera adjustments 动态相机调整 Output: Compositional 3D scenes 组合三维场景 |
8.5 | [8.5] 2502.01961 Hierarchical Consensus Network for Multiview Feature Learning [{'name': 'Chengwei Xia, Chaoxi Niu, Kun Zhan'}] |
Multi-view and Stereo Vision 多视角与立体视觉 | v2 Multiview Learning 多视角学习 Consensus Learning 共识学习 Feature Integration 特征整合 |
Input: Multi-view data 多视角数据 Step1: Learn distinct and common information 学习独特和共同信息 Step2: Derive consensus indices 生成共识指标 Step3: Perform hierarchical consensus learning 进行分层共识学习 Output: Comprehensive and discriminative features 详尽和有辨识度的特征 |
8.5 | [8.5] 2502.01969 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration [{'name': 'Younan Zhu, Linwei Tao, Minjing Dong, Chang Xu'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models object hallucination |
Input: LVLMs with visual tokens 视觉语言模型与视觉标记 Step1: Analyze attention biases 分析注意力偏差 Step2: Implement UAC for calibration 实施均匀注意力校准 Step3: Develop DAC for dynamic adjustment 开发动态注意力校准模块 Output: Improved alignment and reduced hallucination 输出: 改进的对齐和减少的幻觉 |
8.5 | [8.5] 2502.02171 DeepForest: Sensing Into Self-Occluding Volumes of Vegetation With Aerial Imaging [{'name': 'Mohamed Youssef, Jian Peng, Oliver Bimber'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction remote sensing vegetation analysis |
Input: Aerial images from drones 通过无人机获取航空图像 Step1: Synthetic-aperture imaging 合成孔径成像 Step2: Use 3D convolutional neural networks to reduce out-of-focus signals 使用3D卷积神经网络减少模糊信号 Step3: Combine multiple reflectance stacks from various spectral channels 结合来自不同光谱通道的多重反射堆栈 Output: Volumetric representations of vegetation 体积植被表示 |
8.5 | [8.5] 2502.02372 MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning [{'name': 'Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields 3D rendering continual learning |
Input: Limited training data 有限的训练数据 Step1: Employ NeRF for 3D rendering 使用NeRF进行3D渲染 Step2: Implement a Global-Local Joint Storage Module 实现全局-局部联合存储模块 Step3: Utilize a Pose Distillation Module 使用姿态蒸馏模块 Output: Maintainable virtual avatars 可维护的虚拟形象 |
8.5 | [8.5] 2502.02468 High-Fidelity Human Avatars from Laptop Webcams using Edge Compute [{'name': 'Akash Haridas, Imran N. Junejo'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction avatar generation differentiable rendering |
Input: Consumer-grade laptop webcam images 普通笔记本电脑网络摄像头图像 Step1: Shape generation using 3D morphable models 使用3D可变形模型生成形状 Step2: Landmark detection using optimization 使用优化进行关键点检测 Step3: Texture generation with GANs 使用GAN生成纹理 Step4: Differentiable rendering to create avatars 使用可微渲染创建虚拟形象 Output: High-fidelity human avatars 高保真度人类虚拟形象 |
8.5 | [8.5] 2502.02525 Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation [{'name': 'Jian Liu, Wei Sun, Hui Yang, Pengchao Deng, Chongpei Liu, Nicu Sebe, Hossein Rahmani, Ajmal Mian'}] |
Object Pose Estimation 物体姿态估计 | v2 9-DoF object pose estimation domain generalization robotic grasping |
Input: Rendered synthetic data 渲染合成数据 Step1: Model training 模型训练 Step2: Pose estimation 估计姿态 Step3: Real-time performance optimization 实时性能优化 Output: Estimated 9-DoF object poses 估计的9自由度物体姿态 |
8.5 | [8.5] 2502.02537 Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks [{'name': 'Huiqun Huang, Cong Chen, Jean-Philippe Monteuuis, Jonathan Petit, Fei Miao'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 Collaborative Object Detection Uncertainty Quantification Adversarial Robustness Autonomous Vehicles |
Input: Collaborative object detection models 协作目标检测模型 Step1: Adversarial training for robustness 对抗训练以增强鲁棒性 Step2: Uncertainty quantification estimation 不确定性量化估计 Step3: Calibration of uncertainty using conformal prediction 使用保形预测进行不确定性校准 Output: Enhanced object detection accuracy 改进的目标检测准确性 |
8.0 | [8.0] 2502.01890 Geometric Framework for 3D Cell Segmentation Correction [{'name': 'Peter Chen, Bryan Chang, Olivia Annette Creasey, Julie Beth Sneddon, Yining Liu'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D Segmentation 3D分割 Geometric Framework 几何框架 |
Input: 2D cell segmentation results 2D细胞分割结果 Step1: Extract geometric features 提取几何特征 Step2: Train binary classifier 训练二元分类器 Step3: Correct segmentation errors 修正分割错误 Output: Accurate 3D cell body reconstruction 精确的3D细胞体重建 |
8.0 | [8.0] 2502.01906 Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models [{'name': 'Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Decomposed Attention Computational Efficiency |
Input: Visual and textual embeddings 视觉和文本嵌入 Step1: Decompose the attention mechanism 分解注意力机制 Step2: Optimize visual-to-visual self-attention 优化视觉间自注意力 Step3: Debias positional encodings 去偏差位置编码 Output: Enhanced processing of visual and textual embeddings 改进的视觉和文本嵌入处理 |
7.5 | [7.5] 2502.02225 Exploring the latent space of diffusion models directly through singular value decomposition [{'name': 'Li Wang, Boyan Gao, Yanran Li, Zhao Wang, Xiaosong Yang, David A. Clifton, Jun Xiao'}] |
Image Generation 图像生成 | v2 diffusion models image editing latent space Singular Value Decomposition image generation |
Input: Latent space of diffusion models 扩散模型的潜在空间 Step1: Investigate latent space using Singular Value Decomposition (SVD) 通过奇异值分解(SVD)研究潜在空间 Step2: Discover properties of latent space 发现潜在空间的属性 Step3: Propose image editing framework based on properties 提出基于属性的图像编辑框架 Output: Enhanced image editing capabilities 改进的图像编辑能力 (see the Python sketch below) |
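
The last row above describes probing a diffusion model's latent space directly through Singular Value Decomposition. Below is a minimal Python sketch of that idea under simplifying assumptions: a latent tensor is flattened to a matrix, its top singular values are rescaled, and the result is folded back. The latent shape, `k`, and the rescaling rule are illustrative, not the paper's editing framework.

```python
import numpy as np

def svd_edit(latent, scale=1.5, k=4):
    """Flatten a (C, H, W) latent to a matrix, rescale its top-k singular
    values, and reshape back -- an edit along the dominant latent directions."""
    c, h, w = latent.shape
    u, s, vt = np.linalg.svd(latent.reshape(c, h * w), full_matrices=False)
    s[:k] *= scale
    return (u @ np.diag(s) @ vt).reshape(c, h, w)

latent = np.random.default_rng(1).normal(size=(4, 16, 16))  # stand-in latent
edited = svd_edit(latent)
print(np.linalg.norm(edited - latent))  # the edit changed the latent
```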
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.00173 Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation [{'name': 'Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D instance segmentation 3D实例分割 Gaussian Splatted Radiance Fields 高斯点云辐射场 novel view synthesis 新视图合成 |
Input: Posed 2D image data 2D图像数据 Step1: Extract per-image 2D segmentation masks 提取每帧的2D分割掩码 Step2: 2D-to-3D lifting to assign unique object IDs 在3D中分配唯一对象ID的2D到3D提升流程 Step3: Incremental merging of object fragments into coherent objects 将对象片段合并成一致的对象 Output: High-quality 3D object segments 高质量的3D对象片段 |
9.5 | [9.5] 2502.00360 Shape from Semantics: 3D Shape Generation from Multi-View Semantics [{'name': 'Liangchen Li, Caoliwen Wang, Yuqi Zhou, Bailin Deng, Juyong Zhang'}] |
3D Shape Generation 3D形状生成 | v2 3D reconstruction shape generation semantic input |
Input: Semantic descriptions 语义描述 Step1: Distill 3D geometry from 2D diffusion models 从2D扩散模型提取3D几何 Step2: Refine textures using image and video generation models 使用图像和视频生成模型细化纹理 Step3: Represent the refined 3D model with neural implicit representations 使用神经隐式表示来表示细化的3D模型 Output: Fabricable high-quality meshes 可制造的高质量网格 |
9.5 | [9.5] 2502.00801 Environment-Driven Online LiDAR-Camera Extrinsic Calibration [{'name': 'Zhiwei Huang, Jiaqi Li, Ping Zhong, Rui Fan'}] |
3D Reconstruction and Modeling 三维重建 | v2 LiDAR-camera calibration 3D reconstruction autonomous driving |
Input: LiDAR and camera data 激光雷达和相机数据 Step1: Environment interpretation 环境解读 Step2: Data fusion 数据融合 Step3: Dual-path correspondence matching 双路径对应匹配 Step4: Spatial-temporal optimization 空间-时间优化 Output: Accurate extrinsic calibration 精准的外部标定 |
9.5 | [9.5] 2502.01045 WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [{'name': 'Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction generative models dynamic avatars |
Input: Monocular video 单目视频 Step1: Generative prior usage 生成先验的使用 Step2: Dual-Space Optimization 双空间优化 Step3: View selection strategy 视图选择策略 Step4: Pose feature injection 姿态特征注入 Output: High-fidelity dynamic human avatars 高保真动态人体形象 |
9.5 | [9.5] 2502.01405 FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control [{'name': 'Diego Gomez, Bingchen Gong, Maks Ovsjanikov'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 Few-Shot NeRF 3D Reconstruction Neural Rendering |
Input: Limited input views 有限的输入视角 Step1: Frequency control 频率控制 Step2: Curriculum training 课程式训练 Step3: Scene reconstruction 场景重建 Output: Accurate 3D representations 准确的三维表示 |
9.2 | [9.2] 2502.00262 Your submission contained main.bib and main.tex file, but no main.bbl file (include main.bbl, or submit without main.bib; and remember to verify references) [{'name': 'Dianwei Chen, Zifan Zhang, Yuchen Liu, Xianfeng Terry Yang'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 hazard detection vision-language model autonomous driving |
Input: Multimodal data fusion 多模态数据融合 Step1: Semantic and visual inputs integration 语义和视觉输入集成 Step2: Supervised fine-tuning of vision-language models 有监督微调视觉语言模型 Step3: Hazard detection and edge case evaluation 危险检测和边缘案例评估 Output: Enhanced situational awareness 改进的情境意识 |
9.2 | [9.2] 2502.00315 MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model [{'name': 'Jihyeok Kim, Seongwoo Moon, Sungwon Nah, David Hyunchul Shim'}] |
3D Object Detection 3D对象检测 | v2 3D object detection 3D对象检测 monocular vision 单目视觉 depth estimation 深度估计 |
Input: Monocular images 单目图像 Step1: Depth estimation using Vision Transformer 步骤1:使用视觉Transformer进行深度估计 Step2: Feature extraction with Hierarchical Feature Fusion 步骤2:利用层次特征融合提取特征 Step3: Object detection using DETR architecture 步骤3:使用DETR架构进行对象检测 Output: 3D bounding boxes for detected objects 输出:检测到对象的3D边界框 |
8.5 | [8.5] 2502.00074 SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection [{'name': 'Dong-Hee Paek, Seung-Hyun Kong'}] |
3D Object Detection 目标检测 | v2 4D Radar 3D object detection energy efficiency autonomous driving |
Input: 4D Radar point clouds 4D雷达点云 Step1: Convert RTNH to SNN architecture 将RTNH转换为SNN架构 Step2: Implement biological top-down inference (BTI) 实现生物学自上而下推理(BTI) Step3: Model evaluation and comparison 模型评估与比较 Output: Energy-efficient 3D object detection model 能源高效的3D目标检测模型 |
8.5 | [8.5] 2502.00342 Embodied Intelligence for 3D Understanding: A Survey on 3D Scene Question Answering [{'name': 'Zechuan Li, Hongshan Yu, Yihao Ding, Yan Li, Yong He, Naveed Akhtar'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 3D Scene Question Answering multimodal models |
Input: 3D scene representation and query 3D场景表示和查询 Step1: Systematic literature review 系统文献综述 Step2: Dataset analysis 数据集分析 Step3: Methodology evaluation 方法评估 Output: Comprehensive insights and challenges on 3D SQA 对3D SQA的综合见解和挑战 |
8.5 | [8.5] 2502.00500 Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation [{'name': 'Yang Cao, Zhao Song, Chiwun Yang'}] |
Video Generation 视频生成 | v2 video generation interpolation extrapolation latent flow matching |
Input: Video frames 视频帧 Step1: Model latent flow 模型潜在流 Step2: Polynomial projection 多项式投影 Step3: Generate time-dependent frames 生成时间相关帧 Output: Video with interpolation and extrapolation 带插值和外推的视频 |
8.5 | [8.5] 2502.00708 PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation [{'name': 'Qixuan Li, Chao Wang, Zongjin He, Yan Peng'}] |
3D Generation 三维生成 | v2 3D generation compositional scenes large language models |
Input: Complex scene descriptions 复杂场景描述 Step1: Semantic parsing and relationship extraction 语义解析和关系提取 Step2: Scene graph generation 场景图生成 Step3: 2D and 3D asset generation 2D和3D资产生成 Step4: Layout prediction and planning 布局预测与规划 Output: High-quality 3D compositional scenes 高质量三维组合场景 |
8.5 | [8.5] 2502.00843 VLM-Assisted Continual learning for Visual Question Answering in Self-Driving [{'name': 'Yuxin Lin, Mengshi Qi, Liang Liu, Huadong Ma'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Visual Question Answering Vision-Language Models Autonomous Driving |
Input: Visual Question Answering task in autonomous driving 自动驾驶中的视觉问答任务 Step1: Integrate Vision-Language Models with continual learning 结合视觉语言模型与持续学习 Step2: Implement selective memory replay and knowledge distillation 实施选择性记忆重放与知识蒸馏 Step3: Apply task-specific projection layer regularization 应用特定任务的投影层正则化 Output: Enhanced VQA performance in autonomous driving environments 改进的自动驾驶环境中的视觉问答性能 |
8.5 | [8.5] 2502.00954 Hypo3D: Exploring Hypothetical Reasoning in 3D [{'name': 'Ye Mao, Weixun Luo, Junpeng Jing, Anlan Qiu, Krystian Mikolajczyk'}] |
3D Reasoning in Scenes 三维场景推理 | v2 3D reasoning visual question answering hypothetical reasoning |
Input: Context change descriptions 上下文变化描述 Step1: Dataset construction 数据集构建 Step2: Model evaluation 模型评估 Output: Performance analysis 性能分析 |
8.5 | [8.5] 2502.00960 SAM-guided Pseudo Label Enhancement for Multi-modal 3D Semantic Segmentation [{'name': 'Mingyu Yang, Jitong Lu, Hun-Seok Kim'}] |
3D Semantic Segmentation 三维语义分割 | v2 3D semantic segmentation domain adaptation pseudo labels autonomous driving |
Input: 3D point cloud and SAM masks 3D点云和SAM掩码 Step1: Class label determination using majority voting 使用多数投票确定类别标签 Step2: Application of filtering constraints to unreliable labels 对不可靠标签应用过滤约束 Step3: Geometry-Aware Progressive Propagation (GAPP) for label propagation 使用几何感知渐进传播(GAPP)进行标签传播 Output: Enhanced pseudo-labels and improved segmentation performance 输出:改进的伪标签和增强的分割性能 (see the Python sketch after this table) |
8.5 | [8.5] 2502.00972 Pushing the Boundaries of State Space Models for Image and Video Generation [{'name': 'Yicong Hong, Long Mai, Yuan Yao, Feng Liu'}] |
Image and Video Generation 图像生成和视频生成 | v2 image generation video generation state-space models transformer models |
Input: Images and video sequences 图像和视频序列 Step1: Develop SSM-Transformer hybrid model 开发SSM-Transformer混合模型 Step2: Efficient processing of visual sequences 高效处理视觉序列 Step3: Generate images and videos 生成图像和视频 Output: High-quality images and dynamic videos 高质量图像和动态视频 |
8.5 | [8.5] 2502.01004 ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking [{'name': 'Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He'}] |
Autonomous Systems and Robotics 自动驾驶与机器人技术 | v2 6D pose estimation bin-picking zero-shot learning robotic manipulation |
Input: RGB-D image and CAD model 输入: RGB-D图像和CAD模型 Step1: Object detection 物体检测 Step2: Point cloud extraction 点云提取 Step3: Position-Aware Correspondence learning 位置感知对应学习 Step4: Pose estimation 姿态估计 Output: 6D pose predictions 输出: 6D姿态预测 |
8.5 | [8.5] 2502.01157 Radiant Foam: Real-Time Differentiable Ray Tracing [{'name': 'Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi'}] |
Neural Rendering 神经渲染 | v2 differentiable rendering volumetric meshes real-time rendering |
Input: Volumetric mesh representations 体积网格表示 Step1: Mesh parameterization 网格参数化 Step2: Differentiable ray tracing 可微光线追踪 Step3: Rendering and evaluation 渲染与评估 Output: Real-time rendering results 实时渲染结果 |
8.5 | [8.5] 2502.01281 Label Correction for Road Segmentation Using Road-side Cameras [{'name': 'Henrik Toikka, Eerik Alamikkotervo, Risto Ojala'}] |
Autonomous Systems and Robotics 自动驾驶机器人系统 | v2 road segmentation autonomous vehicles image registration deep learning |
Input: Roadside camera images 道路监控摄像头图像 Step1: Automatic data collection 自动数据收集 Step2: Semi-automatic annotation method 开发半自动注释方法 Step3: Image registration to correct labels 图像配准以修正标签 Output: Enhanced road segmentation models 改进的道路分割模型 |
8.5 | [8.5] 2502.01297 XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications [{'name': 'Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, Hujun Bao, Guofeng Zhang'}] |
Autonomous Systems and Robotics 自动驾驶与机器人技术 | v2 Visual Inertial Odometry Initialization Feature Matching AR VR |
Input: Visual Inertial Odometry (VIO) data 视觉惯性里程计数据 Step1: Initialization using gyroscope and visual measurements 使用陀螺仪和视觉测量进行初始化 Step2: Hybrid feature matching using optical flow and descriptor methods 结合光流和描述子方法的混合特征匹配 Step3: Evaluation on benchmarks and practical applications 在基准和实际应用中验证 Output: Enhanced VIO performance 改进的VIO性能 |
8.5 | [8.5] 2502.01357 Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar [{'name': 'Dong-In Kim, Dong-Hee Paek, Seung-Hyun Song, Seung-Hyun Kong'}] |
Autonomous Driving 自动驾驶 | v2 3D multi-object tracking 4D Radar |
Input: 4D Radar data 4D雷达数据 Step1: Object detection using Bayesian approximation 基于贝叶斯近似进行目标检测 Step2: Motion prediction with transformer network 使用变换器网络进行运动预测 Step3: Two-stage data association integrating Doppler measurements 两阶段数据关联,整合多普勒测量 Output: Accurate 3D MOT results 准确的3D多目标跟踪结果 |
8.5 | [8.5] 2502.01401 Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection [{'name': 'Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang'}] |
3D Visual Grounding 3D视觉定位 | v2 3D visual grounding Large Language Model 3D reconstruction vision-language model |
Input: Referring utterances and 3D scene scans 参考话语和三维场景扫描 Step1: Parse utterance into symbolic expression 将话语解析为符号表达式 Step2: Generate spatial relation features 生成空间关系特征 Step3: Use VLM to process visual information 使用视觉语言模型处理视觉信息 Output: Identified target object 确定目标对象 |
8.0 | [8.0] 2502.00800 Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data [{'name': 'Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du'}] |
Image Generation 图像生成 | v2 Generative Adversarial Networks Data Augmentation Image Generation |
Input: Limited training data 有限训练数据 Step 1: Estimate covariance matrices 估计协方差矩阵 Step 2: Identify semantic transformation directions 确定语义转换方向 Step 3: Apply adversarial semantic augmentation 应用对抗性语义增强 Output: Improved generation quality 改进的生成质量 |
7.5 | [7.5] 2502.00618 DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models [{'name': 'Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 vision-language models knowledge forgetting general attributes |
Input: Pretrained Vision-Language Models (VLMs) 预训练视觉语言模型 Step1: Generating General Attribute Descriptions 生成通用属性描述 Step2: Establishing Vision-GA-Class Associations 建立视觉-通用属性-类关联 Step3: Tuning Visual Encoder 调整视觉编码器 Output: Enhanced Adaptation with Reduced Knowledge Forgetting 改进的适应性,减少知识遗忘 |
7.5 | [7.5] 2502.00639 Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer [{'name': 'Tao Ren, Zishi Zhang, Zehao Li, Jingyang Jiang, Shentao Qin, Guanghao Li, Yan Li, Yi Zheng, Xinping Li, Min Zhan, Yijie Peng'}] |
Image Generation 图像生成 | v2 Diffusion Model Image Generation Video Generation |
Input: Diffusion Model (DM) 扩散模型 Step1: Analyze variance and bias 分析方差和偏差 Step2: Develop Recursive Likelihood Ratio optimizer 开发递归似然比优化器 Step3: Validate on image and video tasks 在图像和视频任务上验证 Output: Fine-tuned model 微调后的模型 |
7.0 | [7.0] 2502.01530 The in-context inductive biases of vision-language models differ across modalities [{'name': 'Kelsey Allen, Ishita Dasgupta, Eliza Kosoy, Andrew K. Lampinen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 vision-language models inductive biases generalization |
Input: Visual and textual stimuli 视觉和文本刺激 Step1: Inductive bias analysis 归纳偏置分析 Step2: Experimental paradigm application 实验范式应用 Step3: Data collection and evaluation 数据收集与评估 Output: Insights on model generalization 关于模型泛化的见解 |
6.5 | [6.5] 2502.01524 Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective [{'name': 'Xiaorui Ma, Haoran Xie, S. Joe Qin'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 multimodal learning Large Language Models parameter-efficient learning Vision-Language Models |
Input: Vision-language models 视觉-语言模型 Step1: Categorize and review VLLMs 对VLLMs进行分类和审查 Step2: Discuss training paradigms 讨论训练范式 Step3: Summarize benchmarks 总结基准测试 Output: Comprehensive survey report 综合调查报告 |
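
The SAM-guided pseudo-label row above reduces, at its core, to a simple voting rule. Below is a minimal Python sketch, assuming per-point pseudo-labels and one boolean mask lifted onto the point cloud; the `min_agreement` filter stands in for the paper's reliability constraints, and GAPP itself is not reproduced.

```python
import numpy as np

def mask_majority_label(point_labels, mask, min_agreement=0.6):
    """Assign one class to all points covered by a SAM mask via majority vote;
    return None when agreement is too low (an unreliable mask)."""
    votes = point_labels[mask]
    classes, counts = np.unique(votes, return_counts=True)
    best = counts.argmax()
    if counts[best] / votes.size < min_agreement:
        return None  # filtered out, leave per-point labels untouched
    return classes[best]

labels = np.array([0, 0, 1, 0, 2, 0, 1, 1])      # noisy per-point pseudo-labels
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], bool)  # one 2D mask lifted onto points
print(mask_majority_label(labels, mask))         # -> 0 (agreement 0.75)
```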
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2502.00173 Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation [{'name': 'Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee'}] |
3D Reconstruction and Modeling 三维重建 | 3D instance segmentation 3D实例分割 Gaussian Splatted Radiance Fields 高斯喷溅辐射场 |
Input: 2D segmentation masks 2D分割掩码 Step1: Feature integration 特征集成 Step2: 3D Gaussian lifting 3D高斯提升 Step3: Segmentation application 分割应用 Output: 3D segmented assets 3D分割资产 |
9.5 | [9.5] 2502.00360 Shape from Semantics: 3D Shape Generation from Multi-View Semantics [{'name': 'Liangchen Li, Caoliwen Wang, Yuqi Zhou, Bailin Deng, Juyong Zhang'}] |
3D Generation 三维生成 | 3D reconstruction shape generation semantics |
Input: Multi-view semantics 多视角语义 Step1: Semantic input analysis 语义输入分析 Step2: Geometry and appearance distillation from 2D models 从2D模型提取几何与外观 Step3: Image restoration and detail enhancement 图像修复与细节增强 Step4: Shape reconstruction using neural SDF representation 使用神经符号距离场(SDF)重建形状 Output: Complex detailed 3D meshes 复杂细节的三维网格 |
9.5 | [9.5] 2502.00801 Environment-Driven Online LiDAR-Camera Extrinsic Calibration [{'name': 'Zhiwei Huang, Jiaqi Li, Ping Zhong, Rui Fan'}] |
3D Reconstruction and Modeling 三维重建 | LiDAR-camera calibration 3D reconstruction data fusion |
Input: LiDAR and camera data LiDAR和相机数据 Step1: Environmental interpretation 环境解释 Step2: Dual-path correspondence matching 双路径对应匹配 Step3: Spatial-temporal optimization 空间时间优化 Output: Precise extrinsic calibration 精确的外部标定 |
8.5 | [8.5] 2502.00074 SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection [{'name': 'Dong-Hee Paek, Seung-Hyun Kong'}] |
3D Object Detection 三维物体检测 | 3D object detection neural networks autonomous driving |
Input: 4D Radar data 4D 雷达数据 Step1: Process high-density point clouds 处理高密度点云 Step2: Implement spiking neural network architecture 实现脉冲神经网络架构 Step3: Apply biological top-down inference (BTI) 应用生物学的自上而下推理法 Output: Efficient 3D object detection results 高效的三维物体检测结果 |
8.5 | [8.5] 2502.00262 Your submission contained main.bib and main.tex file, but no main.bbl file (include main.bbl, or submit without main.bib; and remember to verify references) [{'name': 'Dianwei Chen, Zifan Zhang, Yuchen Liu, Xianfeng Terry Yang'}] |
Autonomous Driving 自动驾驶 | hazard detection autonomous driving multimodal data fusion |
Input: Multimodal data 输入: 多模态数据 Step1: Data integration 数据集成 Step2: Hazard detection 危险检测 Step3: Spatial localization 空间定位 Output: Enhanced hazard prediction 改进的危险预测 |
8.5 | [8.5] 2502.00315 MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model [{'name': 'Jihyeok Kim, Seongwoo Moon, Sungwon Nah, David Hyunchul Shim'}] |
3D Reconstruction 三维重建 | 3D object detection depth estimation |
Input: Monocular images 单目图像 Step1: Feature extraction using Vision Transformer 基于视觉变换器的特征提取 Step2: Depth estimation using a relative depth model 使用相对深度模型进行深度估计 Step3: Object detection using DETR architecture 使用DETR架构进行物体检测 Output: Enhanced 3D object detection capabilities 改进的3D物体检测能力 |
8.5 | [8.5] 2502.00528 Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings [{'name': 'Zachary Huemann, Samuel Church, Joshua D. Warner, Daniel Tran, Xin Tie, Alan B McMillan, Junjie Hu, Steve Y. Cho, Meghan Lubner, Tyler J. Bradshaw'}] |
VLM & VLA 视觉语言模型 | 3D vision-language model PET/CT visual grounding |
Input: PET/CT reports and images PET/CT 报告和图像 Step1: Automation of weak labeling pipeline 弱标记生成管道自动化 Step2: Data extraction from reports 报告中数据提取 Step3: Training of ConTEXTual Net 3D 训练 ConTEXTual Net 3D Output: 3D visual grounding model 3D 视觉定位模型 |
8.5 | [8.5] 2502.00708 PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation [{'name': 'Qixuan Li, Chao Wang, Zongjin He, Yan Peng'}] |
3D Generation 三维生成 | text-to-3D generation compositional scenes physics-guided generation |
Input: Complex scene descriptions 复杂场景描述 Step1: Scene graph generation 场景图生成 Step2: Asset creation using multimodal agents 使用多模态代理进行资产创建 Step3: Layout prediction with physical model 使用物理模型进行布局预测 Output: Compositional scenes with physical rationality 具有物理合理性的组合场景 |
8.5 | [8.5] 2502.00843 VLM-Assisted Continual learning for Visual Question Answering in Self-Driving [{'name': 'Yuxin Lin, Mengshi Qi, Liang Liu, Huadong Ma'}] |
VLM & VLA 视觉语言模型与视觉语言对齐 | Vision-Language Models Visual Question Answering autonomous driving continual learning |
Input: Visual Question Answering tasks in autonomous driving 在自动驾驶中的视觉问答任务 Step1: Integrate Vision-Language Models with continual learning 整合视觉语言模型与持续学习 Step2: Implement selective memory replay and knowledge distillation 实施选择性记忆重放和知识蒸馏 Step3: Apply task-specific projection layer regularization 应用任务特定投影层正则化 Output: Improved VQA system performance 改进的视觉问答系统性能 |
8.5 | [8.5] 2502.00954 Hypo3D: Exploring Hypothetical Reasoning in 3D [{'name': 'Ye Mao, Weixun Luo, Junpeng Jing, Anlan Qiu, Krystian Mikolajczyk'}] |
3D Reasoning 3D推理 | 3D reasoning Visual Question Answering scene understanding |
Input: Context changes and indoor scene descriptions 上下文变化和室内场景描述 Step1: Benchmark formulation 基准测试制定 Step2: Model performance evaluation 模型性能评估 Output: Hypothetical reasoning capabilities 假设推理能力 |
8.5 | [8.5] 2502.00960 SAM-guided Pseudo Label Enhancement for Multi-modal 3D Semantic Segmentation [{'name': 'Mingyu Yang, Jitong Lu, Hun-Seok Kim'}] |
3D Reconstruction and Modeling 三维重建 | 3D semantic segmentation domain adaptation pseudo-labels autonomous driving |
Input: 3D point cloud and SAM masks 输入: 3D点云和SAM掩码 Step1: Class label determination using majority voting 步骤1: 使用投票法确定类别标签 Step2: Unreliable mask label filtering using constraints 步骤2: 使用约束过滤不可靠的掩码标签 Step3: Geometry-Aware Progressive Propagation (GAPP) to propagate mask labels 步骤3: 使用几何感知逐步传播来传递掩码标签 Output: Enhanced pseudo-labels with improved quality 输出: 质量提升的增强伪标签 |
8.5 | [8.5] 2502.01004 ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking [{'name': 'Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He'}] |
Autonomous Systems and Robotics 自主系统与机器人 | 6D pose estimation bin-picking robotic manipulation zero-shot learning |
Input: Scene instances and CAD models 场景实例与CAD模型 Step1: Feature extraction 特征提取 Step2: Position-aware correspondence learning 基于位置的对应学习 Step3: Pose estimation 姿态估计 Output: Accurate 6D poses 准确的6D姿态 |
8.5 | [8.5] 2502.01045 WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [{'name': 'Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo'}] |
3D Reconstruction 三维重建 | 3D human reconstruction photorealistic rendering |
Input: Monocular video 单目视频 Step1: Dual-Space Optimization 双空间优化 Step2: Score Distillation Sampling (SDS) 评分蒸馏采样 Step3: View selection strategy 视图选择策略 Step4: Pose Feature Injection 姿态特征注入 Output: High-fidelity dynamic human avatars 高保真动态人类虚拟形象 |
8.5 | [8.5] 2502.01157 Radiant Foam: Real-Time Differentiable Ray Tracing [{'name': 'Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi'}] |
Neural Rendering 神经渲染 | differentiable rendering ray tracing computer vision |
Input: Scene representations 场景表示 Step1: Implement volumetric mesh ray tracing 实现体积网格光线追踪 Step2: Develop a novel scene representation 发展新场景表示 Step3: Evaluate rendering speed and quality 评估渲染速度和质量 Output: Real-time rendering model 实时渲染模型 |
8.5 | [8.5] 2502.01281 Label Correction for Road Segmentation Using Road-side Cameras [{'name': 'Henrik Toikka, Eerik Alamikkotervo, Risto Ojala'}] |
Autonomous Driving 自动驾驶 | road segmentation deep learning autonomous vehicles data annotation |
Input: Roadside camera feeds 路边摄像头视频 Step1: Manual labeling of one frame 手动标注一帧 Step2: Transfer labels to other frames 转移标签到其他帧 Step3: Compensate for camera movements 使用频域图像配准补偿相机位移 Output: Semi-automatically labeled road data 半自动标注的道路数据 |
8.5 | [8.5] 2502.01297 XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications [{'name': 'Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, Hujun Bao, Guofeng Zhang'}] |
Visual Odometry 视觉里程计 | Visual Inertial Odometry Structure from Motion Augmented Reality Virtual Reality |
Input: Visual inertial measurements 视觉惯性测量 Step1: Robust initialization 稳健初始化 Step2: Feature matching 特征匹配 Step3: State estimation 状态估计 Output: Accurate visual inertial odometry result 精确的视觉惯性里程计结果 |
8.5 | [8.5] 2502.01356 Quasi-Conformal Convolution : A Learnable Convolution for Deep Learning on Riemann Surfaces [{'name': 'Han Zhang, Tsz Lok Ip, Lok Ming Lui'}] |
3D Reconstruction and Modeling 3D重建 | 3D facial analysis Riemann surfaces |
Input: Geometric data and Riemann surfaces 几何数据和黎曼曲面 Step1: Define quasi-conformal mappings 定义准保形映射 Step2: Develop Quasi-Conformal Convolution operators 开发准保形卷积算子 Step3: Implement Quasi-Conformal Convolutional Neural Network (QCCNN) 实现准保形卷积神经网络 Output: Adaptive convolution for geometric data 自适应卷积用于几何数据 |
8.5 | [8.5] 2502.01357 Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar [{'name': 'Dong-In Kim, Dong-Hee Paek, Seung-Hyun Song, Seung-Hyun Kong'}] |
Robotic Perception 机器人感知 | 3D multi-object tracking Bayesian approximation autonomous driving |
Input: 4D Radar data 4D 雷达数据 Step1: Motion prediction using transformer-based network 使用基于变换器的网络进行运动预测 Step2: Bayesian approximation for detection and prediction 步骤 2: 检测和预测中的贝叶斯近似 Step3: Two-stage data association leveraging Doppler measurements 基于多普勒测量的两阶段数据关联 Output: Enhanced multi-object tracking performance 提升的多目标跟踪性能 |
8.5 | [8.5] 2502.01401 Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection [{'name': 'Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang'}] |
3D Visual Grounding 3D视觉定位 | 3D visual grounding weakly supervised learning |
Input: 3D visual information and language 3D视觉信息与语言 Step1: Code generation using LLM 通过LLM生成代码 Step2: Spatial relationship computation 空间关系计算 Step3: Quality evaluation and optimization 质量评估和优化 Output: Efficient grounding results 高效的定位结果 |
8.5 | [8.5] 2502.01405 FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control [{'name': 'Diego Gomez, Bingchen Gong, Maks Ovsjanikov'}] |
3D Reconstruction 三维重建 | Few-Shot NeRFs 少样本神经辐射场 3D Reconstruction 三维重建 |
Input: Scene images 场景图像 Step1: Curriculum training 课程训练 Step2: Feature parameterization 特征参数化 Step3: Scene complexity increment 增加场景复杂性 Output: High-quality reconstruction 高质量重建 (see the Python sketch after this table) |
8.0 | [8.0] 2502.00342 Embodied Intelligence for 3D Understanding: A Survey on 3D Scene Question Answering [{'name': 'Zechuan Li, Hongshan Yu, Yihao Ding, Yan Li, Yong He, Naveed Akhtar'}] |
3D Reconstruction and Modeling 3D重建与建模 | 3D scene question answering multimodal modelling datasets |
Input: 3D scene data 3D场景数据 Step1: Systematic review of datasets 数据集的系统评审 Step2: Analysis of methodologies 方法论分析 Step3: Evaluation of metrics 评估指标 Output: Comprehensive understanding of 3D SQA 3D场景问答的综合理解 |
8.0 | [8.0] 2502.00800 Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data [{'name': 'Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du'}] |
Image Generation 图像生成 | Generative Adversarial Networks data augmentation image synthesis semantic features |
Input: Limited image datasets 有限图像数据集 Step1: Estimate covariance matrices 估计协方差矩阵 Step2: Identify meaningful transformation directions 识别有意义的转化方向 Step3: Apply transformations to semantic features 对语义特征应用转化 Output: Enhanced synthetic images 增强合成图像 |
7.5 | [7.5] 2502.00333 BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution [{'name': 'Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang'}] |
Image Generation 图像生成 | super-resolution diffusion model binarization model compression |
Input: Diffusion model for super-resolution 超分辨率扩散模型 Step1: Binarization of the model 模型的二值化 Step2: One-step distillation into extreme compression 一步蒸馏以实现极端压缩 Step3: Integration of sparse and low rank matrix branches 结合稀疏和低秩矩阵分支 Output: Compressed and accelerated super-resolution model 压缩和加速的超分辨率模型 |
7.5 | [7.5] 2502.00500 Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation [{'name': 'Yang Cao, Zhao Song, Chiwun Yang'}] |
Image and Video Generation 图像生成 | video generation interpolation extrapolation |
Input: Video frames 视频帧 Step1: Hypothesis generation 假设生成 Step2: Optimal projection approximation 最优投影近似 Step3: Interpolation and extrapolation 插值和外推 Output: Time-dependent video frames 时间依赖视频帧 |
7.5 | [7.5] 2502.00639 Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer [{'name': 'Tao Ren, Zishi Zhang, Zehao Li, Jingyang Jiang, Shentao Qin, Guanghao Li, Yan Li, Yi Zheng, Xinping Li, Min Zhan, Yijie Peng'}] |
Image Generation 图像生成 | Diffusion Model image generation video generation |
Input: Probabilistic diffusion model 概率扩散模型 Step1: Pre-training on unlabeled data 在无标签数据上进行预训练 Step2: Recursive Likelihood Ratio optimizer proposal 提出递归似然比优化器 Step3: Implementation of zero-order gradient estimation 零阶梯度估计的实施 Output: Aligned diffusion models 对齐的扩散模型 |
7.5 | [7.5] 2502.00662 Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation [{'name': 'Yimu Wang, Evelien Riddell, Adrian Chow, Sean Sedwards, Krzysztof Czarnecki'}] |
VLM & VLA 视觉语言模型与对齐 | vision-language models out-of-distribution detection few-shot learning |
Input: ID image and text prototypes 输入: ID图像和文本原型 Step1: Theoretical analysis 理论分析 Step2: Incorporation of image prototypes 图像原型的整合 Step3: Development of biased prompts generation (BPG) module 偏差提示生成(BPG)模块的开发 Step4: Implementation of image-text consistency (ITC) module 图像文本一致性(ITC)模块的实施 Output: Enhanced VLM-based OOD detection performance 输出: 改进的基于VLM的OOD检测性能 |
7.5 | [7.5] 2502.00711 VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework [{'name': 'Chunbai Zhang, Chao Wang, Yang Zhou, Yan Peng'}] |
Vision-Language Models (VLMs) 视觉语言模型 | visual reasoning evidence-based reasoning VLM |
Input: Visual information (images/videos) 输入: 视觉信息(图像/视频) Step1: Extract fine-grained visual knowledge from visual relationships 第一步: 从视觉关系中提取细粒度视觉知识 Step2: Paraphrase questions with underspecification using extracted knowledge 第二步: 利用提取的知识对欠规范的问题进行改写 Step3: Employ Chain-of-Evidence prompting for interpretable reasoning 第三步: 使用证据链提示进行可解释推理 Output: Enhanced visual reasoning capabilities 输出: 改进的视觉推理能力 |
7.5 | [7.5] 2502.00719 Vision and Language Reference Prompt into SAM for Few-shot Segmentation [{'name': 'Kosuke Sakurai, Ryotaro Shimizu, Masayuki Goto'}] |
VLM & VLA 视觉语言模型与对齐 | few-shot segmentation vision-language model |
Input: Annotated reference images and text labels 带注释的参考图像和文本标签 Step1: Input visual and semantic reference information 输入视觉和语义参考信息 Step2: Integrate prompt embeddings into SAM 将提示嵌入集成到SAM Step3: Few-shot segmentation via VLP-SAM 通过VLP-SAM进行少样本分割 Output: High-performance segmentation results 高性能的分割结果 |
7.5 | [7.5] 2502.00972 Pushing the Boundaries of State Space Models for Image and Video Generation [{'name': 'Yicong Hong, Long Mai, Yuan Yao, Feng Liu'}] |
Image Generation 图像生成 | image generation video generation |
Input: Visual sequences 视觉序列 Step1: Model development 模型开发 Step2: Integration of SSM and Transformers SSM与变换器的整合 Step3: Evaluation of generated outputs 生成结果的评估 Output: Generated images and videos 生成的图像和视频 |
7.5 | [7.5] 2502.01524 Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective [{'name': 'Xiaorui Ma, Haoran Xie, S. Joe Qin'}] |
VLM & VLA 视觉语言模型与对齐 | Vision-Language Large Language Models parameter efficiency |
Step1: Introduce architecture of LLMs 介绍LLM架构 Step2: Discuss parameter-efficient learning methods 讨论参数效率学习方法 Step3: Present taxonomy of modality integrators 提出模态集成器分类 Step4: Review training paradigms and efficiency considerations 回顾训练范式及效率考虑 Step5: Compare experimental results of representative models 比较代表模型的实验结果 |
7.5 | [7.5] 2502.01530 The in-context inductive biases of vision-language models differ across modalities [{'name': 'Kelsey Allen, Ishita Dasgupta, Eliza Kosoy, Andrew K. Lampinen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | vision-language models inductive biases generalization |
Input: Stimuli presented in vision and text 视觉和文本中呈现的刺激 Step1: Conduct experiments 进行实验 Step2: Analyze generalization across models 分析模型间的概括性 Output: Insights on inductive biases regarding shape and color 对形状和颜色的归纳偏见的见解 |
5.0 | [5.0] 2502.00618 DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models [{'name': 'Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li'}] |
Vision-Language Models (VLMs) 视觉语言模型 | vision-language models continual adaptation attribute descriptions |
Input: Visual features and class texts 视觉特征和类别文本 Step1: Generate general attribute descriptions 生成一般属性描述 Step2: Design anchor-based embedding filter 设计基于锚点的嵌入过滤器 Step3: Tune visual encoder 调整视觉编码器 Output: Robust vision-GA-class associations 稳健的视觉-一般属性-类别关联 |
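
The FourieRF row above mentions progressive Fourier frequency control. Below is a minimal Python sketch of the curriculum idea, assuming a standard NeRF-style positional encoding: a mask lets the lowest frequency bands through early in training and anneals higher bands in as training progresses. The linear annealing schedule is an illustrative assumption.

```python
import numpy as np

def fourier_features(x, n_freqs):
    """NeRF-style positional encoding of scalar coordinates in [0, 1]."""
    freqs = 2.0 ** np.arange(n_freqs) * np.pi
    ang = x[..., None] * freqs                       # (..., n_freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

def frequency_mask(n_freqs, progress):
    """Curriculum mask: roughly the lowest progress * n_freqs bands pass."""
    return np.clip(progress * n_freqs - np.arange(n_freqs), 0.0, 1.0)

x = np.linspace(0, 1, 5)
feats = fourier_features(x, n_freqs=6)               # (5, 12): [sin block | cos block]
mask = np.tile(frequency_mask(6, progress=0.3), 2)   # same mask for sin and cos blocks
print(feats * mask)  # early in training, high-frequency bands are suppressed
```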
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2501.17978v2 VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting | 3D generation 3D生成 | 3D Gaussian Splatting view-dependent representation 3D高斯渲染 视角依赖表示 |
input: images 图片 extend the 3D Gaussian Splatting model 扩展3D高斯渲染模型 introduce an additional symmetric matrix 引入额外的对称矩阵 achieve view-dependent opacity representation 实现视角依赖的透明度表示 output: improved 3D scene reconstruction 输出:改进的3D场景重建 (see the Python sketch after this table) |
8.5 | [8.5] 2501.19319v1 Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping | 3D reconstruction 三维重建 | 3D reconstruction 3D Gaussian Splatting endoscopic SLAM depth reconstruction 三维重建 3D高斯斑点 内窥镜SLAM 深度重建 |
input: endoscopic image sequences 内窥镜图像序列 Step 1: tracking using Gaussian Splatting 使用高斯斑点的跟踪 Step 2: mapping and bundle adjustment 映射与束调整 Step 3: surface normal-aware reconstruction 结合表面法向量进行重构 output: accurate 3D reconstruction and real-time tracking 输出: 精确的3D重建与实时跟踪 |
8.5 | [8.5] 2501.19270v1 Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way | 3D reconstruction 三维重建 | Point Cloud Completion 3D Shape Completion Knowledge Distillation Points Completion 点云补全 3D形状补全 知识蒸馏 点补全 |
input: incomplete point cloud 有缺失的点云 step1: apply autoencoder to encode the point cloud 应用自编码器对点云进行编码 step2: use knowledge distillation for completion 使用知识蒸馏进行补全 step3: output: completed 3D shape 输出:完整的3D形状 |
8.5 | [8.5] 2501.19196v1 RaySplats: Ray Tracing based Gaussian Splatting | 3D generation 3D生成 | 3D Gaussian Splatting Gaussian Splatting 3D高斯喷溅 高斯喷溅 |
Input: 2D images 2D图像 Ray-tracing mechanism 射线追踪机制 Intersection computation 交点计算 Ray-tracing algorithms construction 射线追踪算法构建 Final 3D object with lighting and shadows 最终带有光影效果的三维物体 |
8.5 | [8.5] 2501.19088v1 JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting | 3D generation 3D生成 | 3D Gaussian Splatting 3D reconstruction real-time rendering 3D高斯喷溅 三维重建 实时渲染 |
input: 3D key points (输入:3D关键点) Step 1: Create a joint-driven 3D Gaussian representation (步骤1:创建联合驱动的3D高斯表示) Step 2: Implement differentiable spatial transformations (步骤2:实现可微分的空间变换) Step 3: Apply real-time shadow simulation method (步骤3:应用实时阴影模拟方法) output: High-fidelity hand images (输出:高保真的手部图像) |
8.5 | [8.5] 2501.18982v1 OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation | 3D generation 3D生成 | 3D generation 3D gaussian 物体生成 3D高斯 |
input: 3D assets 3D资产 extract: physical properties 提取物理属性 generate: physics-based dynamics 生成基于物理的动态 output: dynamic scene 输出动态场景 |
7.5 | [7.5] 2501.19382v1 LiDAR Loop Closure Detection using Semantic Graphs with Graph Attention Networks | Autonomous Driving 自动驾驶 | LiDAR loop closure detection graph attention networks place recognition semantic registration 激光雷达 回环闭合检测 图注意力网络 地点识别 语义注册 |
input: semantic graphs 语义图 step1: encode semantic graphs using graph attention networks 使用图注意力网络编码语义图 step2: compare graph vectors to identify loop closure 比较图向量以识别回环闭合 step3: estimate 6 DoF pose constraint using semantic registration 使用语义注册估计6自由度位姿约束 output: loop closure detection results 回环闭合检测结果 |
7.5 | [7.5] 2501.19259v1 Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge | Autonomous Driving 自主驾驶 | Autonomous Driving Neuromorphic Vision Real-time Navigation Autonomous Systems 自驾驶 神经形态视觉 实时导航 自主系统 |
Input: Human speech commands 人类语音指令 Step 1: Translate speech into planning commands 将语音翻译成规划指令 Step 2: Execute commands using neuromorphic vision 使用神经形态视觉执行命令 Step 3: Navigate and avoid obstacles in real-time 实时导航并避开障碍 Output: Autonomous drone navigation output 自主无人机导航输出 |
7.5 | [7.5] 2501.19252v1 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Video Generation 视频生成 | video generation text-to-video models 视频生成 文本到视频模型 |
input: diffusion model inputs 输入:扩散模型输入 step1: align video frames with text prompts 步骤1:将视频帧与文本提示对齐 step2: utilize a beam search strategy to optimize output 使用束搜索策略优化输出 step3: compute metrics for perceptual quality evaluation 计算感知质量评估的指标 output: high-quality, aligned video generation 输出:高质量、对齐的视频生成 |
7.5 | [7.5] 2501.19035v1 SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging | Autonomous Driving 自动驾驶 | Semantic Segmentation LiDAR Imaging Autonomous Driving 语义分割 LiDAR成像 自动驾驶 |
input: LiDAR data 输入: LiDAR 数据 step1: generate synthetic dataset 生成合成数据集 step2: utilize CARLA simulator 使用 CARLA 模拟器 step3: train segmentation algorithms 训练分割算法 output: improved segmentation performance 输出: 改进的分割性能 |
7.5 | [7.5] 2501.17159v2 IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait | Image Generation 图像生成 | personalized portrait generation identity preservation view-consistent reconstruction 个性化肖像生成 身份保留 视角一致重建 |
input: reference images 参考图像 step1: Lighting-Aware Stitching 光照感知拼接 step2: View-Consistent Adaptation 视角一致自适应 step3: ControlNet-like supervision 控制网络样监督 output: personalized portraits 个性化肖像 |
6.5 | [6.5] 2501.18994v1 VKFPos: A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration | Autonomous Driving (自动驾驶) | Monocular Positioning Extended Kalman Filter Deep Learning Single-shot 单目定位 扩展卡尔曼滤波 深度学习 单次 |
input: monocular images 单目图像 step1: Absolute Pose Regression (APR) 绝对姿态回归 step2: Relative Pose Regression (RPR) 相对姿态回归 step3: Integrate APR and RPR using EKF 通过扩展卡尔曼滤波整合APR和RPR output: accurate positioning results 精确定位结果 |
6.0 | [6.0] 2501.19331v1 Consistent Video Colorization via Palette Guidance | Video Generation 视频生成 | Video Colorization Stable Video Diffusion Palette Guidance 视频上色 稳定视频扩散 调色板引导 |
input: video sequences 视频序列 step 1: design palette-based color guider 设计调色板引导器 step 2: utilize Stable Video Diffusion as base model 利用稳定视频扩散作为基础模型 step 3: generate vivid colors using color context 根据颜色上下文生成生动的颜色 output: colorized video sequences 上色的视频序列 |
5.5 | [5.5] 2501.18865v1 REG: Rectified Gradient Guidance for Conditional Diffusion Models | Image Generation 图像生成 | conditional generation diffusion models 条件生成 扩散模型 |
input: guidance techniques 指导技术 step1: replace the scaled marginal distribution target 替换缩放的边际分布目标 step2: implement rectified gradient guidance 实施修正梯度引导 step3: conduct experiments on image generation tasks 进行图像生成任务的实验 output: improved image generation results 改进的图像生成结果 |
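
The VoD-3DGS rows describe attaching a symmetric matrix to each Gaussian so that opacity can depend on the view direction. Below is a minimal Python sketch of one plausible reading: the quadratic form d^T M d in the normalized view direction modulates a base opacity through a sigmoid. How M enters the opacity is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def view_dependent_opacity(M, base_opacity, view_dir):
    """Modulate a Gaussian's opacity by a quadratic form in the view direction."""
    d = view_dir / np.linalg.norm(view_dir)
    return base_opacity / (1.0 + np.exp(-(d @ M @ d)))  # base * sigmoid(d^T M d)

A = np.random.default_rng(2).normal(size=(3, 3))
M = 0.5 * (A + A.T)  # the additional symmetric per-Gaussian matrix
for d in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])):
    print(view_dependent_opacity(M, base_opacity=0.9, view_dir=d))
```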
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
9.5 | [9.5] 2501.19196v1 RaySplats: Ray Tracing based Gaussian Splatting | 3D generation 三维生成 | 3D Gaussian Splatting Ray Tracing 3D高斯点云 光线追踪 |
input: 2D images 2D图像 process: Gaussian Splatting 高斯点云渲染 process: ray tracing based on Gaussian primitives 基于高斯原始体的光线追踪 output: 3D objects with light and shadow effects 输出具有光影效果的3D物体 |
9.0 | [9.0] 2501.17978v2 VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting | 3D generation 3D生成 | 3D Gaussian Splatting view-dependent rendering 3D高斯点云 视角依赖的渲染 |
input: images for 3D scene reconstruction 用于3D场景重建的图像 step 1: extend 3D Gaussian Splatting model 扩展3D高斯点云模型 step 2: introduce symmetric matrix to enhance opacity representation 引入对称矩阵以增强不透明性表示 step 3: optimize suppression of Gaussians based on viewer perspective 根据观察者视角优化高斯的抑制 output: improved representation of view-dependent reflections and specular highlights 输出:改进视角依赖的反射和镜面高光的表示 |
8.5 | [8.5] 2501.19319v1 Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping | 3D reconstruction 三维重建 | 3D Gaussian Splatting SLAM endoscopic reconstruction depth reconstruction 3D 高斯点 SLAM 内窥镜重建 深度重建 |
input: endoscopic images 内窥镜图像 step1: surface normal-aware tracking 表面法线感知跟踪 step2: accurate mapping 精确地图构建 step3: bundle adjustment 捆绑调整 output: geometrically accurate 3D reconstruction 准确的三维重建 |
8.5 | [8.5] 2501.19252v1 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Video Generation 视频生成 | Text-to-video Diffusion models Video generation 评分调整 文本转视频 扩散模型 视频生成 奖励校准 |
input: video generation prompts 视频生成提示 step1: employ diffusion latent beam search 使用扩散潜在束搜索 step2: maximize alignment reward 最大化对齐奖励 step3: improve perceptual quality 提升感知质量 output: high-quality video optimized for natural movement 输出:高质量视频,优化自然运动 (see the Python sketch after this table) |
8.5 | [8.5] 2501.19088v1 JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting | 3D generation 3D生成 | 3D Gaussian Splatting animatable hand avatar 3D高斯喷涂 可动画手部化身 |
input: 3D key points 3D关键点 joint-driven 3D Gaussian Splatting (3DGS) representation 关节驱动的3D高斯喷涂(3DGS)表示 apply spatial transformations based on 3D key points 基于3D关键点应用空间变换 real-time rendering and shadow simulation 实时渲染和阴影模拟 output: animatable high-fidelity hand images 输出:可动画的高保真手部图像 |
8.5 | [8.5] 2501.18982v1 OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation | 3D generation 3D生成 | 3D generation 3D gaussian 3D生成 3D高斯 |
input: user-specified prompts 用户指定的提示 step1: define a scene according to user prompts 根据用户提示定义场景 step2: estimate material weighting factors using a pretrained video diffusion model 使用预训练的视频扩散模型估计材料权重因子 step3: represent each 3D asset as a collection of constitutive 3D Gaussians 将每个3D资产表示为一组组成的3D高斯分布 output: a physics-based 3D dynamic scene 输出:基于物理的3D动态场景 |
8.0 | [8.0] 2501.19270v1 Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way | 3D reconstruction 三维重建 | Point Cloud Completion Multi-view Distillation 3D Shape Recovery 点云补全 多视图蒸馏 3D形状恢复 |
input: incomplete point cloud 输入: 不完整的点云 step1: apply autoencoder architecture 应用自编码器架构 step2: use knowledge distillation strategy to enhance completion 使用知识蒸馏策略以增强完成度 step3: output: completed point cloud 输出: 完整的点云 |
7.5 | [7.5] 2501.19382v1 LiDAR Loop Closure Detection using Semantic Graphs with Graph Attention Networks | Autonomous Driving 自主驾驶 | Loop Closure Detection Semantic Graphs Graph Attention Networks 闭环检测 语义图 图注意力网络 |
input: point cloud 输入: 点云 step1: encode semantic graphs using graph attention networks 步骤1: 使用图注意力网络编码语义图 step2: generate graph vectors through self-attention mechanisms 步骤2: 通过自注意力机制生成图向量 step3: compare graph vectors to detect loop closure 步骤3: 比较图向量以检测闭环 output: loop closure candidates 输出: 闭环候选 |
7.5 | [7.5] 2501.19035v1 SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging | Autonomous Driving 自主驾驶 | Semantic segmentation LiDAR imaging autonomous driving 语义分割 LiDAR成像 自主驾驶 |
input: LiDAR images (输入: LiDAR图像) modify CARLA simulator (修改CARLA模拟器) generate SynthmanticLiDAR dataset (生成SynthmanticLiDAR数据集) evaluate with transfer learning (使用迁移学习进行评估) output: improved semantic segmentation performance (输出: 改进的语义分割性能) |
7.5 | [7.5] 2501.17159v2 IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait | Image Generation 图像生成 | Personalized Portrait Generation 3D-aware relighting 个性化肖像生成 具3D感知的重光照 |
Input: reference portrait images 参考肖像图像 Step 1: Lighting-Aware Stitching 具光照感知的拼接 Step 2: View-Consistent Adaptation 具视图一致的适配 Output: personalized portraits with identity preservation 具有身份保留的个性化肖像 |
7.0 | [7.0] 2501.19243v1 Accelerating Diffusion Transformer via Error-Optimized Cache | Image Generation 图像生成 | Image Generation Diffusion Transformer ImageNet Dataset 图像生成 扩散变换器 ImageNet数据集 |
input: Diffusion Transformer features (扩散变换器特征) extract caching differences (提取缓存差异) optimize cache based on errors (基于错误优化缓存) output: improved generated images (输出: 改进的生成图像) |
6.5 | [6.5] 2501.19259v1 Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge | Autonomous Driving 自主驾驶 | autonomous driving natural language processing neuroscience autonomous navigation 自主驾驶 自然语言处理 神经科学 自主导航 |
input: human speech and dynamic environment 输入:人类语言和动态环境 step1: translate human speech into planning commands 步骤1:将人类语言翻译为规划命令 step2: navigate and avoid obstacles using neuromorphic vision 步骤2:利用神经形态视觉导航并避免障碍物 output: real-time autonomous navigation output 实时自主导航结果 |
6.5 | [6.5] 2501.18994v1 VKFPos: A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration | Autonomous Driving 自主驾驶 | monocular positioning extended kalman filter variational bayesian inference 单目定位 扩展卡尔曼滤波 变分贝叶斯推理 |
input: monocular images 单目图像 step1: Absolute Pose Regression (APR) 绝对姿态回归 step2: Relative Pose Regression (RPR) 相对姿态回归 step3: Integration with Extended Kalman Filter (EKF) 通过扩展卡尔曼滤波整合 output: accurate positional predictions 准确的位置信息预测 |
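
The inference-time alignment row above relies on beam search over diffusion latents. Below is a minimal Python sketch of the loop: each surviving latent is expanded with a few stochastic denoising continuations, and a reward prunes the pool back to the beam width. The stand-in `denoise` and `reward` callables and the noise scale are hypothetical placeholders, not the paper's model or metric.

```python
import numpy as np

def latent_beam_search(x0, denoise, reward, steps=4, beam=3, expand=4, rng=None):
    """Greedy beam search over stochastic denoising trajectories."""
    if rng is None:
        rng = np.random.default_rng(0)
    beams = [x0]
    for _ in range(steps):
        # Expand each beam with `expand` noisy continuations, then prune by reward.
        candidates = [denoise(x) + 0.1 * rng.normal(size=x.shape)
                      for x in beams for _ in range(expand)]
        candidates.sort(key=reward, reverse=True)
        beams = candidates[:beam]
    return beams[0]

denoise = lambda x: 0.9 * x                     # stand-in denoiser
reward = lambda x: -abs(float(x.mean()) - 0.5)  # stand-in alignment reward
best = latent_beam_search(np.ones((2, 2)), denoise, reward)
print(best, reward(best))
```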
Relevance | Title | Research Topic | Keywords | Pipeline |
---|---|---|---|---|
8.5 | [8.5] 2501.18594v1 Foundational Models for 3D Point Clouds: A Survey and Outlook | 3D Reconstruction | 3D Point Clouds, Foundational Models, 3D Visual Understanding |
input: 3D point clouds; step1: review foundational models (FMs); step2: categorize the use of FMs in 3D tasks; step3: summarize state-of-the-art methods; output: comprehensive overview of FMs for 3D understanding |
8.5 | [8.5] 2501.18162v1 IROAM: Improving Roadside Monocular 3D Object Detection Learning from Autonomous Vehicle Data Domain | Autonomous Driving | 3D Object Detection, Autonomous Driving |
input: roadside data and vehicle-side data; step1: an In-Domain Query Interaction module learns content and depth information; step2: Cross-Domain Query Enhancement decouples queries into semantic and geometry parts; output: enhanced object queries |
8.5 | [8.5] 2501.18110v1 Lifelong 3D Mapping Framework for Hand-held & Robot-mounted LiDAR Mapping Systems | 3D Reconstruction | 3D Mapping, 3D Reconstruction, Lifelong Mapping, LiDAR |
Input: hand-held and robot-mounted LiDAR maps; Step 1: dynamic point removal; Step 2: multi-session map alignment using feature-descriptor matching and fine registration; Step 3: map change detection to identify changes between aligned maps; Step 4: map version control for maintaining the current environmental state and querying changes |
8.0 | [8.0] 2501.18595v1 ROSA: Reconstructing Object Shape and Appearance Textures by Adaptive Detail Transfer | Mesh Reconstruction | Mesh Reconstruction, 3D Reconstruction |
input: a limited set of images; step1: optimize mesh geometry; step2: refine the mesh with spatially adaptive resolution; step3: reconstruct high-resolution textures; output: textured mesh with detailed appearance |
7.5 | [7.5] 2501.18590v1 DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models | Rendering Techniques | Inverse Rendering, Forward Rendering, Video Diffusion Models |
input: real-world videos; step1: estimate G-buffers using an inverse rendering model; step2: generate photorealistic images from the G-buffers; output: relit images, material-edited images, realistic object insertions |
7.5 | [7.5] 2501.18315v1 Surface Defect Identification using Bayesian Filtering on a 3D Mesh | Mesh Reconstruction | 3D Mesh, Mesh Reconstruction |
input: CAD model and point cloud data; step1: transform the CAD model into a polygonal mesh; step2: apply a weighted least-squares algorithm; step3: estimate the state from point cloud measurements; output: high-precision defect identification |
7.5 | [7.5] 2501.17636v2 Efficient Interactive 3D Multi-Object Removal | 3D Reconstruction | 3D Scene Understanding, Multi-Object Removal |
input: selected areas and objects for removal; step1: mask matching and refinement; step2: homography-based warping; step3: inpainting; output: modified 3D scene |
7.0 | [7.0] 2501.18246v1 Ground Awareness in Deep Learning for Large Outdoor Point Cloud Segmentation | 3D Reconstruction | Point Cloud Segmentation, Outdoor Point Clouds, Semantic Segmentation |
input: outdoor point clouds; step1: compute Digital Terrain Models (DTMs); step2: integrate relative elevation features (see the relative-elevation sketch after this table); step3: employ RandLA-Net for segmentation; step4: evaluate performance on datasets |
6.5 | [6.5] 2501.18494v1 Runway vs. Taxiway: Challenges in Automated Line Identification and Notation Approaches | Autonomous Driving | Automated Line Identification, Convolutional Neural Networks, Runway Markings, Autonomous Systems, Labeling Algorithms |
input: runway and taxiway images; Step 1: color threshold adjustment (see the thresholding sketch after this table); Step 2: refine region-of-interest selection; Step 3: integrate CNN classification; output: improved marking identification |
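Step 1 of the runway/taxiway row above starts from a color threshold. The sketch below shows one plausible form of that pass, assuming OpenCV is available; the HSV bounds are illustrative starting values that the paper's "threshold adjustment" step would tune per image, not the authors' settings.

```python
# Minimal sketch (assumed, not the paper's method): HSV color thresholding
# that keeps pixels resembling painted white or yellow markings.
import cv2
import numpy as np

def marking_mask(bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask of pixels that look like painted markings."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # White paint: any hue, low saturation, high value.
    white = cv2.inRange(hsv, (0, 0, 190), (180, 40, 255))
    # Yellow paint: hue about 20-35 on OpenCV's 0-180 scale, vivid and bright.
    yellow = cv2.inRange(hsv, (20, 80, 120), (35, 255, 255))
    return cv2.bitwise_or(white, yellow)

# Usage on a synthetic frame: grey tarmac with one white stripe.
frame = np.full((120, 160, 3), 90, dtype=np.uint8)  # dark grey tarmac
frame[:, 75:85] = (235, 235, 235)                   # white centre line
mask = marking_mask(frame)
print(mask[:, 75:85].mean(), mask[:, :70].mean())   # ~255 vs ~0
```

The resulting mask would then feed the region-of-interest refinement and CNN classification of Steps 2 and 3.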
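The Ground Awareness row above integrates a relative-elevation feature: each point's height above the DTM ground surface beneath it. Below is a minimal sketch of that feature under the assumption of a regular raster DTM; the grid origin, cell size, and nearest-cell lookup are illustrative choices, not the paper's.

```python
# Minimal sketch (assumed, not the paper's code): per-point relative elevation
# as point height minus the Digital Terrain Model (DTM) ground height.
import numpy as np

def relative_elevation(points: np.ndarray, dtm: np.ndarray,
                       origin: tuple[float, float], cell: float) -> np.ndarray:
    """points: (N, 3) array of x, y, z. dtm: (H, W) raster of ground
    elevations; origin is the (x, y) of cell (0, 0); cell is the raster
    resolution in metres. Returns (N,) heights above ground, clipping
    indices at the raster border."""
    col = np.clip(((points[:, 0] - origin[0]) / cell).astype(int), 0, dtm.shape[1] - 1)
    row = np.clip(((points[:, 1] - origin[1]) / cell).astype(int), 0, dtm.shape[0] - 1)
    return points[:, 2] - dtm[row, col]

# Usage: flat 100 m x 100 m terrain at 50 m elevation, 1 m cells.
dtm = np.full((100, 100), 50.0)
pts = np.array([[10.0, 20.0, 52.5],   # 2.5 m above ground (e.g. vegetation)
                [40.0, 40.0, 50.0]])  # on the ground
print(relative_elevation(pts, dtm, origin=(0.0, 0.0), cell=1.0))  # [2.5 0.]
```

The computed heights would be appended to each point's input features before segmentation with a network such as RandLA-Net.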
(Older entries get replaced automatically when the script runs again.)