Daily Updates on 3D-Related Papers

This repository automatically fetches new or updated arXiv papers in the [cs.CV] category every day, checks if they are relevant to "3D reconstruction" or "3D generation" via ChatGPT, and lists them below.

How It Works

  1. A GitHub Actions workflow runs daily at 09:00 UTC.
  2. It uses the script fetch_cv_3d_papers.py to:
    • Retrieve the latest arXiv papers in cs.CV.
    • Use ChatGPT to select the papers related to 3D reconstruction/generation.
    • Update this README.md with the new findings.
    • Send an email via 163 Mail if any relevant papers are found.
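
For orientation, here is a minimal sketch of what that daily job can look like. It is an illustration only, not the actual fetch_cv_3d_papers.py: the arXiv query string, the ChatGPT model name, the prompt, and the environment variable names below are assumptions made for this example.

```python
# Illustrative sketch only -- not the actual fetch_cv_3d_papers.py.
# The query string, model name, prompt, and env-var names are assumptions.
import os
import smtplib
import urllib.request
import xml.etree.ElementTree as ET
from email.mime.text import MIMEText

from openai import OpenAI

ARXIV_QUERY = ("http://export.arxiv.org/api/query?search_query=cat:cs.CV"
               "&sortBy=submittedDate&sortOrder=descending&max_results=100")
ATOM = "{http://www.w3.org/2005/Atom}"


def fetch_cs_cv_papers():
    """Pull the newest cs.CV entries from the arXiv Atom feed."""
    with urllib.request.urlopen(ARXIV_QUERY) as resp:
        feed = ET.fromstring(resp.read())
    for entry in feed.findall(ATOM + "entry"):
        yield {
            "id": entry.find(ATOM + "id").text.strip(),
            "title": " ".join(entry.find(ATOM + "title").text.split()),
            "abstract": " ".join(entry.find(ATOM + "summary").text.split()),
        }


def is_relevant(client: OpenAI, paper: dict) -> bool:
    """Ask ChatGPT whether a paper concerns 3D reconstruction or 3D generation."""
    prompt = ("Answer YES or NO only. Is this paper about 3D reconstruction "
              f"or 3D generation?\nTitle: {paper['title']}\n"
              f"Abstract: {paper['abstract']}")
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the real script may use another
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")


def send_163_mail(subject: str, body: str) -> None:
    """Send the daily digest through 163 Mail over SMTP with SSL."""
    user = os.environ["MAIL_163_USER"]            # e.g. somebody@163.com
    auth_code = os.environ["MAIL_163_AUTH_CODE"]  # 163 SMTP authorization code
    msg = MIMEText(body)
    msg["Subject"], msg["From"], msg["To"] = subject, user, user
    with smtplib.SMTP_SSL("smtp.163.com", 465) as server:
        server.login(user, auth_code)
        server.send_message(msg)


if __name__ == "__main__":
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    hits = [p for p in fetch_cs_cv_papers() if is_relevant(client, p)]
    if hits:
        digest = "\n".join(f"{p['id']} {p['title']}" for p in hits)
        send_163_mail("Daily 3D paper digest", digest)
        # The real workflow also rewrites the Paper List section of this README.
```

On the GitHub Actions side, the daily 09:00 UTC trigger corresponds to a scheduled workflow (cron `0 9 * * *`) that runs the script and commits the regenerated README.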

Paper List

arXiv 2025-02-25

Relevance | Title | Authors | Research Topic | Keywords | Pipeline

[9.5] 2502.16419 DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion
Authors: Jianbin Jiao, Xina Cheng, Kailun Yang, Xiangrong Zhang, Licheng Jiao
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D human pose estimation; multi-view perception; deficiency-aware estimation
Input: Multi-view images
Step 1: Network architecture simplification
Step 2: Feature extraction from images
Step 3: Adaptive multi-view feature fusion
Output: 3D human pose estimations

[9.5] 2502.16475 Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control
Authors: Jinbo Yan, Alan Zhao, Yixin Hu
Topic: 3D Generation (v2)
Keywords: 3D generation; geometric consistency; user control
Input: Single image and point cloud
Step 1: Generate sparse seed points
Step 2: Map seed points to anchor latents
Step 3: Generate 3D Gaussian representations
Output: Multi-view consistent 3D models

[9.5] 2502.16488 Geometry-Aware 3D Salient Object Detection Network
Authors: Chen Wang, Liyuan Zhang, Le Hui, Qi Liu, Yuchao Dai
Topic: 3D Salient Object Detection (v2)
Keywords: 3D salient object detection; point cloud; geometry-aware
Input: 3D point clouds
Step 1: Superpoint partitioning
Step 2: Point feature learning
Step 3: Geometry enhancement
Output: Salient object map

[9.5] 2502.16575 Efficient 4D Gaussian Stream with Low Rank Adaptation
Authors: Zhenhuan Liu, Shuai Liu, Yidong Lu, Yirui Chen, Jie Yang, Wei Liu
Topic: 3D Reconstruction and Modeling (v2)
Keywords: dynamic novel view synthesis; 3D Gaussian Splatting
Input: Video frames
Step 1: Scene representation using 3D Gaussians
Step 2: Low-rank adaptation for bandwidth reduction
Step 3: Continuous dynamic reconstruction
Output: Scalable dynamic novel views

[9.5] 2502.16652 Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
Authors: Kim Jun-Seong, GeonU Kim, Kim Yu-Ji, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh
Topic: 3D Scene Understanding (v2)
Keywords: 3D Gaussian Splatting; open-vocabulary scene understanding; language embedding; 3D reconstruction; 3D perception
Input: 3D Gaussians
Step 1: Feature registration
Step 2: Direct language embedding association
Step 3: Evaluation on 3D perception tasks
Output: Enhanced scene understanding

[9.5] 2502.16826 Noise2Score3D: Unsupervised Tweedie's Approach for Point Cloud Denoising
Authors: Xiangbin Wei
Topic: Point Cloud Processing (v2)
Keywords: point cloud denoising; unsupervised learning; Tweedie's formula
Input: Noisy point cloud data
Step 1: Gradient learning from noisy data
Step 2: Single-step denoising using Tweedie's formula
Output: Denoised point cloud

[9.5] 2502.17053 PointSea: Point Cloud Completion via Self-structure Augmentation
Authors: Zhe Zhu, Honghua Chen, Xing He, Mingqiang Wei
Topic: 3D Reconstruction and Modeling (v2)
Keywords: point cloud completion; self-structure augmentation
Input: Incomplete point cloud data
Step 1: Data augmentation
Step 2: Self-view fusion network
Step 3: Feature fusion
Step 4: Point generation
Output: Completed point cloud

[9.5] 2502.17288 GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow
Authors: Simon Boeder, Fabian Gigengack, Benjamin Risse
Topic: 3D Reconstruction and Modeling (v2)
Keywords: occupancy estimation; 3D Gaussian representation; autonomous driving; Gaussian Splatting
Input: Multi-view images
Step 1: Construct a sparse 3D Gaussian representation
Step 2: Integrate temporal flow estimation
Step 3: Train with Gaussian Splatting
Output: Efficient occupancy estimation

[9.5] 2502.17377 Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting
Authors: Chong Cheng, Gaochao Song, Yiyang Yao, Qinzheng Zhou, Gangjian Zhang, Hao Wang
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D reconstruction; Gaussian Splatting; scene reconstruction; autonomous driving
Input: Images captured by RGB cameras
Step 1: Spatial prior-based scene structure estimation
Step 2: Camera graph creation
Step 3: Graph-guided optimization driven by multi-view consistency
Output: High-fidelity 3D reconstruction of scenes

[9.5] 2502.17429 CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation
Authors: Vishal Thengane, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Lu Yin, Xiatian Zhu, Salman Khan
Topic: 3D Instance Segmentation (v2)
Keywords: 3D instance segmentation; continual learning; class imbalance
Input: RGB-D images with 3D instance annotations
Step 1: Implement a unified framework
Step 2: Integrate exemplar replay, knowledge distillation, and imbalance correction
Step 3: Create benchmark scenarios for evaluation
Output: Improved 3D instance segmentation performance

[9.0] 2502.16779 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
Authors: Yaxuan Huang, Xili Dai, Jianan Wang, Xianbiao Qi, Yixing Yuan, Xiangyu Yue
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D reconstruction; room layout estimation; multi-view geometry; DUSt3R; autonomous systems
Input: Multi-view images
Step 1: 2D plane detection
Step 2: Dense 3D point representation
Step 3: Plane correspondence establishment
Output: Estimated room layout

[9.0] 2502.16907 MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation
Authors: Jiehao Luo, Jintao Cheng, Xiaoyu Tang, Qingwen Zhang, Bohuan Xue, Rui Fan
Topic: Scene Flow Estimation (v2)
Keywords: scene flow estimation; state space model; 3D motion
Input: Consecutive point cloud frames
Step 1: Model design
Step 2: Feature extraction
Step 3: Scene flow estimation
Output: Motion vectors

[9.0] 2502.17237 MegaLoc: One Retrieval to Place Them All
Authors: Gabriele Berton, Carlo Masone
Topic: Visual Place Recognition (v2)
Keywords: 3D reconstruction; visual place recognition; image retrieval; SLAM
Input: Diverse image datasets
Step 1: Data integration
Step 2: Model training combining multiple techniques
Step 3: Evaluation across multiple tasks
Output: Robust image retrieval model

[8.5] 2502.15888 Understanding and Evaluating Hallucinations in 3D Visual Language Models
Authors: Ruiying Peng, Kaiyuan Li, Weichen Zhang, Chen Gao, Xinlei Chen, Yong Li
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D-LLMs; hallucinations; evaluation metrics; point cloud
Input: 3D point-cloud data
Step 1: Definition of 3D hallucinations
Step 2: Evaluation of hallucinations in 3D-LLMs
Step 3: Analysis of underlying causes
Output: New evaluation metrics for hallucinations

[8.5] 2502.16012 Cross-Model Transferability of Adversarial Patches in Real-time Segmentation for Autonomous Driving
Authors: Prashant Shekhar, Bidur Devkota, Dumindu Samaraweera, Laxima Niure Kandel, Manoj Babu
Topic: Autonomous Driving (v2)
Keywords: autonomous driving; adversarial attacks; semantic segmentation
Input: Adversarial patch training
Step 1: Attack formulation
Step 2: Model performance analysis
Step 3: Cross-model transferability evaluation
Output: Insights on attack susceptibility

[8.5] 2502.16164 A Deep Learning Framework with Geographic Information Adaptive Loss for Remote Sensing Images based UAV Self-Positioning
Authors: Mingkun Li, Ziming Wang, Guang Huo, Wei Chen, Xiaoning Zhao
Topic: Autonomous Systems and Robotics (v2)
Keywords: UAV self-positioning; remote sensing; deep learning
Input: Remote sensing images and UAV images
Step 1: Data alignment
Step 2: Adaptive loss integration
Step 3: Model evaluation
Output: Precise UAV positioning

[8.5] 2502.16214 SalM²: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention
Authors: Chunyu Zhao, Wentao Mu, Xian Zhou, Wenbo Liu, Fei Yan, Tao Deng
Topic: Autonomous Driving (v2)
Keywords: driver attention recognition; real-time model; semantic information
Input: Driving scene data
Step 1: Extract bottom-up image features
Step 2: Extract top-down semantic information
Step 3: Integrate extracted features and map driver attention
Output: Driver attention map

[8.5] 2502.16302 DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation
Authors: Yuxuan Xiong, Yue Shi, Yishun Dou, Bingbing Ni
Topic: 3D Scene Editing (v2)
Keywords: 3D scene editing; Neural Radiance Fields; text-driven generation
Input: Text instructions and 3D scene representation
Step 1: Introduce a dual-field representation
Step 2: Implement a simulated annealing strategy
Step 3: Apply a CLIP-based consistency indicator
Output: Edited 3D scenes with preserved backgrounds

[8.5] 2502.16303 Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field
Authors: Wenhao Hu, Wenhao Chai, Shengyu Hao, Xiaotong Cui, Xuexiang Wen, Jenq-Neng Hwang, Gaoang Wang
Topic: 3D Segmentation (v2)
Keywords: 3D segmentation; Gaussian segmentation; autonomous driving
Input: Multi-view images
Step 1: Establish pixel correspondence
Step 2: Optimize mask association using the Hungarian algorithm
Step 3: Apply piecewise-plane constraints
Output: Consistent and compact 3D segmentation field

[8.5] 2502.16351 AquaNeRF: Neural Radiance Fields in Underwater Media with Distractor Removal
Authors: Luca Gough, Adrian Azzarelli, Fan Zhang, Nantheera Anantrasirichai
Topic: 3D Reconstruction and Modeling (v2)
Keywords: Neural Radiance Fields; 3D reconstruction; underwater imaging
Input: Underwater scenes focusing on static objects
Step 1: Model the cumulative density of volumes along a ray
Step 2: Apply a Gaussian distribution for transmittance modeling
Step 3: Optimize the Gaussian distribution for stable rendering
Output: Enhanced 3D representation of underwater scenes

[8.5] 2502.16389 An Expert Ensemble for Detecting Anomalous Scenes, Interactions, and Behaviors in Autonomous Driving
Authors: Tianchen Ji, Neeloy Chakraborty, Andre Schreiber, Katherine Driggs-Campbell
Topic: Autonomous Systems and Robotics (v2)
Keywords: anomaly detection; autonomous driving
Input: Egocentric videos
Step 1: Scene analysis
Step 2: Expert model development
Step 3: Anomaly score fusion
Output: Anomaly detection scores

[8.5] 2502.16915 Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model
Authors: Kang Fu, Huiyu Duan, Zicheng Zhang, Xiaohong Liu, Xiongkuo Min, Jia Wang, Guangtao Zhai
Topic: 3D Reconstruction (v2)
Keywords: text-to-3D generation; quality assessment; 3D modeling
Input: 3D assets generated from text prompts
Step 1: Database creation
Step 2: Quality feature extraction
Step 3: Model evaluation and benchmarking
Output: Quality assessment scores

[8.5] 2502.16941 Gaussian Difference: Find Any Change Instance in 3D Scenes
Authors: Binbin Jiang, Rui Huang, Qingyi Zhao, Yuxiang Zhang
Topic: 3D Change Detection (v2)
Keywords: 3D change detection; Gaussian distributions; instance segmentation
Input: Multi-view images
Step 1: Embed images into 4D Gaussians
Step 2: Segment images and assign IDs
Step 3: Compare IDs for change detection
Output: Change maps from any viewpoint

[8.5] 2502.16992 Semantic Neural Radiance Fields for Multi-Date Satellite Data
Authors: Valentin Wagner, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens
Topic: Neural Rendering (v2)
Keywords: Neural Radiance Fields; multi-date satellite images; 3D reconstruction
Input: Multi-date satellite images with semantic labels
Step 1: Model adaptation for satellite images
Step 2: Semantic and color fusion
Step 3: Robustness evaluation and improvement
Output: 3D semantic representations

[8.5] 2502.17039 LCV2I: Communication-Efficient and High-Performance Collaborative Perception Framework with Low-Resolution LiDAR
Authors: Xinxin Feng, Haoran Sun, Haifeng Zheng, Huacong Chen, Wenqiang Chen
Topic: Autonomous Driving (v2)
Keywords: 3D object detection; LiDAR; collaborative perception
Input: Data collected from low-resolution LiDAR and cameras
Step 1: Feature extraction
Step 2: Voxel-wise fusion
Step 3: Feature offset correction
Step 4: Regional feature enhancement
Output: Enhanced 3D object detection

[8.0] 2502.15956 Human Motion Prediction, Reconstruction, and Generation
Authors: Canxuan Gang, Yiran Wang
Topic: 3D Reconstruction and Modeling (v2)
Keywords: human motion prediction; 3D reconstruction; motion generation
Input: Historical motion data
Step 1: Pose forecasting
Step 2: 3D motion reconstruction
Step 3: Motion generation
Output: Realistic human motion sequences

[7.5] 2502.16427 Fine-Grained Video Captioning through Scene Graph Consolidation
Authors: Sanghyeok Chu, Seonguk Seo, Bohyung Han
Topic: VLM & VLA (v2)
Keywords: video captioning; visual-language models; scene graphs
Input: Video frames
Step 1: Generate frame-level captions using an image VLM
Step 2: Convert captions into scene graphs
Step 3: Consolidate frame-level scene graphs into a video-level scene graph
Output: Comprehensive video captions

[7.5] 2502.16493 Trunk-branch Contrastive Network with Multi-view Deformable Aggregation for Multi-view Action Recognition
Authors: Yingyuan Yang, Guoyuan Liang, Can Wang, Xiaojun Wu
Topic: Multi-view Stereo (v2)
Keywords: multi-view action recognition; contrastive learning; feature fusion
Input: Multi-view RGB images
Step 1: Feature aggregation
Step 2: Contrastive learning against trunk features
Step 3: Model evaluation on datasets
Output: Enhanced action representations

[7.5] 2502.16618 Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
Authors: Qipan Xu, Zhenting Wang, Xiaoxiao He, Ligong Han, Ruixiang Tang
Topic: Vision-Language Models (VLMs) (v2)
Keywords: vision-language models; copyright detection; generative AI
Input: Image samples
Step 1: Dataset creation
Step 2: Model evaluation
Step 3: Analysis of failure cases
Output: Proposed solutions

[6.5] 2502.16368 Concept Corrector: Erase concepts on the fly for text-to-image diffusion models
Authors: Zheling Meng, Bo Peng, Xiaochuan Jin, Yueming Lyu, Wei Wang, Jing Dong
Topic: Image Generation (v2)
Keywords: concept erasure; text-to-image generation
Input: Intermediate generated images
Step 1: Concept presence checking
Step 2: Concept removal correction
Output: Corrected images

arXiv 2025-02-24

Relevance | Title | Authors | Research Topic | Keywords | Pipeline

[9.5] 2502.14891 CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection
Authors: Zhe Huang, Shuo Wang, Yongcai Wang, Lei Wang
Topic: 3D Object Detection (v2)
Keywords: 3D object detection; autonomous driving; diffusion models
Input: Point clouds from multiple agents
Step 1: Feature extraction from point clouds
Step 2: Information sharing between agents
Step 3: Noise reduction using diffusion models
Output: Accurate collaborative 3D object detection

[9.5] 2502.14938 GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models
Authors: Miao Tao, Yuanzhen Zhou, Haoran Xu, Zeyu He, Zhenyu Yang, Yuchang Zhang, Zhongling Su, Linning Xu, Zhenxiang Ma, Rong Fu, Hengjie Li, Xingcheng Zhang, Jidong Zhai
Topic: Neural Rendering (v2)
Keywords: 3D Gaussian Splatting; neural rendering; real-time rendering; virtual reality
Input: Large-scale 3D Gaussian Splatting models
Step 1: Design a cache-centric rendering pipeline
Step 2: Implement multi-GPU scheduling
Step 3: Optimize CUDA kernels to enhance performance
Output: Real-time rendered 3D scenes

[9.5] 2502.14940 FacaDiffy: Inpainting Unseen Facade Parts Using Diffusion Models
Authors: Thomas Froech, Olaf Wysocki, Yan Xia, Junyu Xie, Benedikt Schwab, Daniel Cremers, Thomas H. Kolbe
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D reconstruction; image inpainting; Stable Diffusion; conflict maps; diffusion models
Input: 3D building models and laser scanning point clouds
Step 1: Derive 2D conflict maps by deterministic ray analysis
Step 2: Personalize a Stable Diffusion model for inpainting
Step 3: Generate synthetic conflict maps for training
Output: Completed conflict maps for 3D semantic reconstruction

[9.5] 2502.15011 CrossOver: 3D Scene Cross-Modal Alignment
Authors: Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Daniel Barath, Iro Armeni
Topic: 3D Scene Understanding (v2)
Keywords: 3D scene understanding; cross-modal alignment; point clouds
Input: Multi-modal 3D data
Step 1: Flexible modality alignment
Step 2: Unified embedding space learning
Step 3: Scene retrieval and object localization
Output: Enhanced scene understanding

[9.5] 2502.15076 Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios
Authors: Richard Marcus, Christian Vogel, Inga Jatzkowski, Niklas Knoop, Marc Stamminger
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D object detection; LiDAR; synthetic data; domain randomization; autonomous driving
Input: LiDAR point clouds and synthetic data
Step 1: Sensor modeling
Step 2: Domain randomization
Step 3: Object detection training
Step 4: Performance evaluation
Output: Enhanced object detection model

[9.5] 2502.15438 LEAP: Enhancing Vision-Based Occupancy Networks with Lightweight Spatio-Temporal Correlation
Authors: Fengcheng Yu, Haoran Xu, Canming Xia, Guang Tan
Topic: 3D Scene Reconstruction (v2)
Keywords: 3D occupancy networks; autonomous driving; spatio-temporal correlation
Input: Multi-view images
Step 1: Tokenization of baseline and motion features
Step 2: Tri-stream fusion architecture for correlation establishment
Step 3: Generation of occupancy results
Output: Enhanced occupancy predictions

[9.5] 2502.15633 RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes
Authors: Sicheng Yu, Chong Cheng, Yifan Zhou, Xiaojun Yang, Hao Wang
Topic: Simultaneous Localization and Mapping (SLAM) (v2)
Keywords: RGB-only SLAM; 3D Gaussian Splatting; outdoor scenes; pose estimation
Input: RGB images
Step 1: Pointmap regression to generate spatial relationships
Step 2: Pose estimation based on pointmaps
Step 3: 3D Gaussian Splatting for rendering
Output: High-fidelity novel views

[9.5] 2502.15635 Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis
Authors: Ziqian Ni, Sicong Du, Zhenghua Hou, Chenming Wu, Sheng Yang
Topic: 3D Reconstruction and Modeling (v2)
Keywords: novel view synthesis; multi-lane dataset; autonomous driving; 3D reconstruction; LiDAR
Input: Multi-lane dataset containing LiDAR and camera data
Step 1: Data collection across repeated scans
Step 2: Multi-sensor pose optimization
Step 3: Dataset registration
Output: Evaluated novel view synthesis capabilities

[9.2] 2502.15488 Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection
Authors: Jiangyong Yu, Changyong Shu, Dawei Yang, Zichen Yu, Xing Hu, Yan Chen
Topic: 3D Object Detection (v2)
Keywords: 3D object detection; quantization; autonomous driving
Input: Multi-view images
Step 1: Identify quantization issues
Step 2: Propose the Q-PETR model
Step 3: Performance evaluation
Output: Enhanced detection accuracy

[9.2] 2502.15516 Depth-aware Fusion Method based on Image and 4D Radar Spectrum for 3D Object Detection
Authors: Yue Sun, Yeqiang Qian, Chunxiang Wang, Ming Yang
Topic: 3D Object Detection (v2)
Keywords: 3D object detection; 4D millimeter-wave radar
Input: 4D radar spectra and depth-aware camera images
Step 1: Feature extraction from RGB and depth images
Step 2: Feature fusion in BEV feature space
Step 3: 3D object detection using fused features
Output: Enhanced 3D object detection results

[8.8] 2502.15448 MVIP -- A Dataset and Methods for Application Oriented Multi-View and Multi-Modal Industrial Part Recognition
Authors: Paul Koch, Marian Schlüter, Jörg Krüger
Topic: Multi-view and Stereo Vision (v2)
Keywords: multi-view; multi-modal; industrial part recognition
Input: Multi-view RGBD dataset
Step 1: Data acquisition
Step 2: Modality integration
Step 3: Model training and evaluation
Output: Robust industrial classifiers

[8.5] 2502.14908 KOALA: Knowledge Conflict Augmentations for Robustness in Vision Language Models
Authors: Peter Carragher, Nikitha Rao, Abhinand Jha, R Raghav, Kathleen M. Carley
Topic: Vision-Language Models (VLMs) (v2)
Keywords: vision-language models; knowledge conflicts; robustness
Input: Visual question answering (VQA) with multimodal sources
Step 1: Introduce targeted perturbations
Step 2: Evaluate model robustness
Step 3: Fine-tune models to improve reasoning
Output: Enhanced understanding of knowledge conflicts

[8.5] 2502.14917 Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
Authors: Rui Zhao, Qirui Yuan, Jinyu Li, Haofeng Hu, Yun Li, Chengyuan Zheng, Fei Gao
Topic: Autonomous Driving (v2)
Keywords: autonomous driving; 3D spatial understanding; multimodal learning
Input: Local scene videos and global BEV maps
Step 1: Modal encoders align visual representations
Step 2: Generate natural language responses
Step 3: Enhance model performance through extensive training
Output: Improved perception and reasoning for autonomous driving

[8.5] 2502.15180 OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework
Authors: Junliang Chen, Huaiyuan Xu, Yi Wang, Lap-Pui Chau
Topic: Autonomous Driving (v2)
Keywords: occupancy forecasting; autonomous driving; 3D perception
Input: Multi-camera video
Step 1: Feature extraction
Step 2: Future occupancy forecasting
Step 3: Refinement of predictions
Output: Future occupancy map

[8.5] 2502.15307 Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder
Authors: Zhenghao Xi, Yuchao Shao, Yang Zheng, Xiang Liu, Yaqi Liu, Yitong Cai
Topic: Autonomous Driving (v2)
Keywords: traffic sign recognition; Siamese network; Efficient-CNN
Input: Traffic sign images
Step 1: Feature extraction using Efficient-CNN based encoders
Step 2: Distance computation using the Siamese network
Step 3: Classification using a fully-connected layer
Output: Recognized traffic sign categories

[8.5] 2502.15342 PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments
Authors: Yueting Liu, Hanshi Wang, Yunfei Lei, Zhengjun Zha, Weiming Hu, Jin Gao
Topic: Autonomous Driving (v2)
Keywords: autonomous driving; 3D detection; dataset; pedestrian detection; semi-structured environments
Input: Multi-modal data
Step 1: Dataset annotation
Step 2: Hybrid Multi-Scale Fusion Network framework development
Step 3: Performance evaluation
Output: Improved pedestrian detection results

[8.5] 2502.15398 Enhancing Vehicle Make and Model Recognition with 3D Attention Modules
Authors: Narges Semiromizadeh, Omid Nejati Manzari, Shahriar B. Shokouhi, Sattar Mirzakuchaki
Topic: Autonomous Driving (v2)
Keywords: vehicle make and model recognition; attention module; deep learning
Input: Vehicle images from various makes and models
Step 1: Integrate an attention module into the convolutional model
Step 2: Enhance focus on distinguishing vehicle features
Step 3: Evaluate performance on the Stanford Cars dataset
Output: Improved VMMR accuracy

[8.5] 2502.15601 WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents
Authors: Xinhang Liu, Chi-Keung Tang, Yu-Wing Tai
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D world creation; LLM agents; procedural generation
Input: User natural language commands
Step 1: Interaction with LLM agents
Step 2: Object customization and control
Step 3: Scene layout optimization
Output: Photorealistic 3D scenes

[8.5] 2502.15672 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Authors: Florent Bartoccioni, Elias Ramzi, Victor Besnier, Shashanka Venkataramanan, Tuan-Hung Vu, Yihong Xu, Loick Chambon, Spyros Gidaris, Serkan Odabas, David Hurych, Renaud Marlet, Alexandre Boulch, Mickael Chen, Éloi Zablocki, Andrei Bursuc, Eduardo Valle, Matthieu Cord
Topic: Autonomous Driving (v2)
Keywords: autonomous driving; video generative models
Input: Driving video sequences
Step 1: Frame prediction
Step 2: Representation learning
Step 3: Action generation
Output: Driving trajectories

[8.0] 2502.15079 Can Hallucination Correction Improve Video-Language Alignment?
Authors: Lingjun Zhao, Mingyang Xie, Paola Cascante-Bonilla, Hal Daumé III, Kwonjoon Lee
Topic: Vision-Language Models (VLMs) (v2)
Keywords: video-language alignment; hallucination correction
Input: Video and textual descriptions
Step 1: Identify hallucinations
Step 2: Correct inconsistencies
Step 3: Enhance alignment
Output: Improved video-language alignment

[7.5] 2502.14888 The Multi-Faceted Monosemanticity in Multimodal Representations
Authors: Hanqi Yan, Xiangxiang Cui, Lu Yin, Paul Pu Liang, Yulan He, Yifei Wang
Topic: VLM & VLA (v2)
Keywords: multimodal models; interpretability; CLIP
Input: CLIP features from image-text pairs
Step 1: Feature extraction
Step 2: Classification into vision, language, and visual-language categories
Step 3: Evaluation of the Modality Dominance Score (MDS)
Output: Categorized and interpretable multimodal features

[7.5] 2502.15389 The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting
Authors: Masayo Tomita, Katsuhiko Hayashi, Tomoyuki Kaneko
Topic: Vision-Language Models (VLMs) (v2)
Keywords: vision-language models; object hallucination; background context
Input: Vision-language models (VLMs)
Step 1: Analyze object hallucination in outputs
Step 2: Examine the effectiveness of background context
Step 3: Evaluate visual prompting techniques
Output: Recommendations for reducing hallucination

[7.5] 2502.15563 Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation
Authors: Tim Rädsch, Leon Mayer, Simon Pavicic, A. Emre Kavur, Marcel Knopp, Barış Öztürk, Klaus Maier-Hein, Paul F. Jaeger, Fabian Isensee, Annika Reinke, Lena Maier-Hein
Topic: Vision-Language Models (VLMs) (v2)
Keywords: vision-language models (VLMs); benchmark generation; task augmentation
Input: Existing VLM tasks
Step 1: Task augmentation for diverse tasks
Step 2: Benchmark creation for multiple domains
Step 3: Performance evaluation of 22 VLMs
Output: Resource-efficient VLM benchmarks

arXiv 2025-02-21

Relevance | Title | Authors | Research Topic | Keywords | Pipeline

[9.5] 2502.14129 GlossGau: Efficient Inverse Rendering for Glossy Surface with Anisotropic Spherical Gaussian
Authors: Bang Du, Runfa Blark Li, Chen Du, Truong Nguyen
Topic: 3D Reconstruction (v2)
Keywords: 3D reconstruction; inverse rendering; glossy surfaces; NeRF; Gaussian Splatting
Input: Multi-view images
Step 1: Model surface normals and BRDF parameters
Step 2: Use anisotropic spherical Gaussians to approximate reflections
Step 3: Apply regularization for better normal estimation
Output: Efficiently rendered glossy 3D surfaces

[9.5] 2502.14142 Token Adaptation via Side Graph Convolution for Temporally and Spatially Efficient Fine-tuning of 3D Point Cloud Transformers
Authors: Takahiko Furuya
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D point cloud; Transformer; fine-tuning
Input: 3D point cloud data
Step 1: Define a graph convolutional network
Step 2: Implement Side Token Adaptation
Step 3: Evaluate performance on benchmarks
Output: Efficiently fine-tuned models

[9.5] 2502.14235 OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving
Authors: Yedong Shen, Xinran Zhang, Yifan Duan, Shiqi Zhang, Heng Li, Yilong Wu, Jianmin Ji, Yanyong Zhang
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D reconstruction; autonomous driving
Input: Surround-view camera images
Step 1: Generate occupancy grids
Step 2: Separate dynamic and static objects
Step 3: Convert occupancy grids to point clouds
Step 4: Estimate poses and trajectories
Output: 3D reconstructed scene

[9.5] 2502.14520 Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance
Authors: Meng Wang, Fan Wu, Ruihui Li, Yunchuan Qin, Zhuo Tang, Kenli Li
Topic: 3D Scene Completion (v2)
Keywords: 3D semantic scene completion; optical flow; autonomous driving; temporal modeling
Input: Temporal RGB images
Step 1: Optical flow estimation
Step 2: Flow-guided temporal aggregation module
Step 3: Occlusion-guided voxel refinement module
Output: 3D semantic scene completion

[9.5] 2502.14789 Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing
Authors: Yoel Levy, David Shavin, Itai Lang, Sagie Benaim
Topic: 3D Understanding and Editing (v2)
Keywords: 3D understanding; 3D editing; feature distillation
Input: 2D feature maps obtained from large pre-trained models
Step 1: Distill 2D features into 3D structurally disentangled feature fields
Step 2: Control individual structural components for semantic understanding
Step 3: Apply segmentation and editing capabilities
Output: Enhanced 3D understanding and editing capabilities

[8.5] 2502.14061 EfficientPose 6D: Scalable and Efficient 6D Object Pose Estimation
Authors: Zixuan Fang, Thomas Pöllabauer, Tristan Wirth, Sarah Berkei, Volker Knauthe, Arjan Kuijper
Topic: Pose Estimation (v2)
Keywords: 6D pose estimation; autonomous navigation; real-time feedback; robotics
Input: Monocular RGB-D images
Step 1: Architecture adaptation
Step 2: AMIS algorithm implementation
Step 3: Model testing across datasets
Output: Optimized 6D pose estimation

[8.5] 2502.14068 A Racing Dataset and Baseline Model for Track Detection in Autonomous Racing
Authors: Shreya Ghosh, Yi-Huan Chen, Ching-Hsiang Huang, Abu Shafin Mohammad Mahdee Jameel, Chien Chou Ho, Aly El Gamal, Samuel Labi
Topic: Autonomous Driving (v2)
Keywords: 3D reconstruction; autonomous driving
Input: Multi-camera image data
Step 1: Data collection and annotation
Step 2: Algorithm development using a GAN
Step 3: Model evaluation and benchmarking
Output: Track detection results

[8.5] 2502.14099 Point Cloud Geometry Scalable Coding Using a Resolution and Quality-conditioned Latents Probability Estimator
Authors: Daniele Mari, André F. R. Guarda, Nuno M. M. Rodrigues, Simone Milani, Fernando Pereira
Topic: Point Cloud Processing (v2)
Keywords: point cloud coding; scalable coding; deep learning
Input: Point cloud geometry points
Step 1: Development of the Scalable Resolution and Quality Hyperprior (SRQH) scheme
Step 2: Integration of SRQH into JPEG PCC
Step 3: Experimental validation
Output: Scalable coding for point clouds

[8.5] 2502.14113 Object-centric Binding in Contrastive Language-Image Pretraining
Authors: Rim Assouel, Pietro Astolfi, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano
Topic: Vision-Language Models (VLMs) (v2)
Keywords: vision-language models; object-centric; compositional understanding
Input: CLIP-like models
Step 1: Integrate scene graphs with image representations
Step 2: Develop a binding module
Step 3: Enhance spatial relationship understanding
Output: Improved compositional understanding

[8.5] 2502.14156 Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Authors: Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu, Mark Campbell, Wei-Lun Chao, Kilian Q. Weinberger, Ezio Malis, Vincent Fremont, Bharath Hariharan, Mao Shan, Stewart Worrall, Julie Stephany Berrio Perez
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D reconstruction; V2X; point cloud; LiDAR sensors
Input: LiDAR sensor data from vehicles
Step 1: Data collection
Step 2: Data alignment
Step 3: Statistical analysis
Output: Comprehensive V2X dataset

[8.5] 2502.14190 Stereo Image Coding for Machines with Joint Visual Feature Compression
Authors: Dengchao Jin, Jianjun Lei, Bo Peng, Zhaoqing Pan, Nam Ling, Qingming Huang
Topic: Multi-view Stereo (v2)
Keywords: stereo image compression; 3D visual tasks
Input: Stereo images
Step 1: Feature extraction
Step 2: Feature compression
Step 3: Data transmission
Output: Efficiently compressed stereo visual features

[8.5] 2502.14191 Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
Authors: Michihiro Yasunaga, Luke Zettlemoyer, Marjan Ghazvininejad
Topic: Vision-Language Models (VLMs) (v2)
Keywords: reward models; vision-language models; benchmark
Input: Vision-language models (VLMs)
Step 1: Benchmark creation
Step 2: Expert annotation
Step 3: Model evaluation
Output: Reward model evaluation

[8.5] 2502.14195 Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
Authors: Tianyi Shang, Zhenyu Li, Pengjie Xu, Jinwei Qiao, Gang Chen, Zihan Ruan, Weijun Hu
Topic: Visual Place Recognition (v2)
Keywords: text-vision registration; place recognition; cross-modal localization
Input: Multi-view images
Step 1: Text embedding extraction
Step 2: Clustering of visual descriptors
Step 3: Cross-modal alignment
Output: Place recognition based on text-image pairs

[8.5] 2502.14279 OrchardDepth: Precise Metric Depth Estimation of Orchard Scene from Monocular Camera Images
Authors: Zhichao Zheng, Henry Williams, Bruce A MacDonald
Topic: Depth Estimation (v2)
Keywords: depth estimation; monocular camera; autonomous driving
Input: Monocular camera images
Step 1: Data collection
Step 2: Depth estimation model training
Step 3: Consistency monitoring
Output: Enhanced depth maps

[8.5] 2502.14316 Textured 3D Regenerative Morphing with 3D Diffusion Prior
Authors: Songlin Yang, Yushi Lan, Honghua Chen, Xingang Pan
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D morphing; 3D diffusion models; textured 3D representations
Input: Textured 3D objects
Step 1: Source-target information integration
Step 2: 3D diffusion model application
Step 3: Attention Fusion strategy implementation
Output: Morphing sequence

[8.5] 2502.14412 Evaluating Precise Geolocation Inference Capabilities of Vision Language Models
Authors: Neel Jay, Hieu Minh Nguyen, Trung Dung Hoang, Jacob Haimes
Topic: Vision-Language Models (VLMs) (v2)
Keywords: vision-language models; geolocation; privacy; dataset
Input: Images from Google Street View
Step 1: Dataset collection
Step 2: Model evaluation
Step 3: Geolocation inference
Output: Geolocation accuracy results

[8.5] 2502.14454 Exploiting Deblurring Networks for Radiance Fields
Authors: Haeyun Choi, Heemin Yang, Janghyeok Han, Sunghyun Cho
Topic: Neural Rendering (v2)
Keywords: radiance fields; deblurring; 3D Gaussian; novel view synthesis
Input: Blurred multi-view images
Step 1: RF-guided deblurring
Step 2: Radiance field construction
Step 3: Iterative enhancement
Output: High-quality novel views

[8.5] 2502.14503 LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera
Authors: Weiyi Xiong, Zean Zou, Qiuchi Zhao, Fengchun He, Bing Zhu
Topic: 3D Object Detection (v2)
Keywords: 3D object detection; 4D radar; camera
Input: 4D radar and camera data
Step 1: Depth supervision strategy via radar points
Step 2: Attention-based multi-modal fusion module
Step 3: Model evaluation on standard datasets
Output: Enhanced detection accuracy

[8.5] 2502.14573 Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
Authors: Wonhyeok Choi, Kyumin Hwang, Wei Peng, Minwoo Choi, Sunghoon Im
Topic: Depth Estimation (v2)
Keywords: monocular depth estimation; triplet mining; reflective surfaces; autonomous driving
Input: Monocular images
Step 1: Triplet mining to identify reflective regions
Step 2: Apply a reflection-aware triplet mining loss
Step 3: Knowledge distillation for depth estimation
Output: Enhanced depth map

[8.5] 2502.14616 Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion
Authors: Jiangyuan Liu, Hongxuan Ma, Yuxin Guo, Yuhao Zhao, Chi Zhang, Wei Sui, Wei Zou
Topic: Depth Estimation (v2)
Keywords: monocular depth estimation; segmentation; transparent objects
Input: Single RGB image
Step 1: Feature extraction
Step 2: Semantic and geometric fusion
Step 3: Iterative feature refinement
Output: Segmentation mask and depth map

[8.5] 2502.14676 BP-SGCN: Behavioral Pseudo-Label Informed Sparse Graph Convolution Network for Pedestrian and Heterogeneous Trajectory Prediction
Authors: Ruochen Li, Stamos Katsigiannis, Tae-Kyun Kim, Hubert P. H. Shum
Topic: Autonomous Systems and Robotics (v2)
Keywords: trajectory prediction; behavioral pseudo-labels; autonomous vehicles
Input: Observed agent trajectories
Step 1: Unsupervised behavior clustering module
Step 2: Goal-guided trajectory prediction module
Step 3: Cascaded training scheme
Output: Enhanced trajectory predictions

[8.5] 2502.14721 Multi-dataset synergistic in supervised learning to pre-label structural components in point clouds from shell construction scenes
Authors: Lukas Rauch, Thomas Braml
Topic: Point Cloud Processing (v2)
Keywords: point cloud; semantic segmentation; Transformer models; construction industry
Input: Point cloud data from shell construction sites
Step 1: Supervised training using a custom validation dataset
Step 2: Cross-domain inference with existing datasets
Step 3: Transfer learning to enhance performance
Output: Improved semantic segmentation for construction components

[8.5] 2502.14792 RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation
Authors: Henrique Piñeiro Monteagudo, Leonardo Taccari, Aurel Pjetri, Francesco Sambo, Samuele Salti
Topic: Image and Video Generation (v2)
Keywords: bird's eye view segmentation; self-supervised training
Input: Video sequences
Step 1: Monocular semantic segmentation
Step 2: Rendering of perspective views
Step 3: Self-supervised training
Output: BEV segmentation results

[8.5] 2502.14801 AVD2: Accident Video Diffusion for Accident Video Description
Authors: Cheng Li, Keyuan Zhou, Tong Liu, Yu Wang, Mingqiao Zhuang, Huan-ang Gao, Bu Jin, Hao Zhao
Topic: Autonomous Driving (v2)
Keywords: accident video diffusion; autonomous driving; video understanding
Input: Accident videos
Step 1: Video generation
Step 2: Detailed description alignment
Step 3: Actionable prevention strategies
Output: Enhanced understanding of accident scenarios

[7.5] 2502.14221 H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging
Authors: Zhen Huang, Ronghao Xu, Xiaoqian Zhou, Yangbo Wei, Suhua Wang, Xiaoxin Sun, Han Li, Qingsong Yao
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D landmark detection; medical image analysis; deep learning
Input: 3D volumetric data
Step 1: Local feature extraction
Step 2: Global dependency modeling
Step 3: Multi-scale feature fusion
Output: Accurate 3D landmark detection

[7.5] 2502.14493 CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond
Authors: Yukai Shi, Cidan Shi, Zhipeng Weng, Yin Tian, Xiaoyu Xian, Liang Lin
Topic: Image Fusion (v2)
Keywords: infrared-visible fusion; autonomous driving
Input: Infrared and visible images
Step 1: External data augmentation by Top-k Selective Vision Alignment
Step 2: Internal data augmentation with self-supervised learning
Step 3: Fusion process
Output: Enhanced fused images

[6.0] 2502.14070 DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models
Authors: Daewon Chae, June Suk Choi, Jinkyu Kim, Kimin Lee
Topic: Image Generation (v2)
Keywords: text-to-image generation; reward fine-tuning; diffusion models
Input: Text prompts
Step 1: Dynamic scaling of classifier-free guidance
Step 2: Randomly weight prompt phrases
Step 3: Sample generation and evaluation
Output: Improved sampling efficiency

arXiv 2025-02-20

Relevance | Title | Authors | Research Topic | Keywords | Pipeline

[9.5] 2502.13335 Geometry-Aware Diffusion Models for Multiview Scene Inpainting
Authors: Ahmad Salimi, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Konstantinos G. Derpanis
Topic: 3D Scene Inpainting (v2)
Keywords: 3D inpainting; multi-view consistency; geometry-aware models
Input: Multi-view images
Step 1: Image masking
Step 2: Geometry-aware fusion
Step 3: Generative inpainting
Output: Multi-view consistent images

[9.5] 2502.13803 3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments
Authors: Vincent Ress, Jonas Meyer, Wei Zhang, David Skuddis, Uwe Soergel, Norbert Haala
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D Gaussian Splatting; visual localization; SLAM; indoor environments
Input: Multi-view images
Step 1: Use visual SLAM to generate a 3D Gaussian Splatting (3DGS) based map
Step 2: Render images from the 3DGS map to create reference data
Step 3: Evaluate the performance impact of additional rendered views
Output: Improved localization accuracy

[9.5] 2502.13968 Betsu-Betsu: Multi-View Separable 3D Reconstruction of Two Interacting Objects
Authors: Suhas Gopal, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D reconstruction; neuro-implicit methods; multi-view; human-object interactions
Input: Multi-view RGB images
Step 1: Data integration
Step 2: Algorithm development
Step 3: Alpha-blending regularization implementation
Step 4: Joint optimization of signed distance fields (SDFs)
Output: Separable 3D geometries

[8.5] 2502.13524 MobileViM: A Light-weight and Dimension-independent Vision Mamba for 3D Medical Image Analysis
Authors: Wei Dai, Steven Wang, Jun Liu
Topic: 3D Reconstruction and Modeling (v2)
Keywords: 3D medical imaging; segmentation; deep learning
Input: 3D medical images
Step 1: Data transformation
Step 2: Model enhancement
Step 3: Evaluation on datasets
Output: Efficient segmentation results

[8.5] 2502.13883 Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition
Authors: Idris Hamoud, Vinkle Srivastav, Muhammad Abdullah Jamal, Didier Mutter, Omid Mohareri, Nicolas Padoy
Topic: Multi-view and Stereo Vision (v2)
Keywords: surgical activity recognition; multi-view; pose estimation; computer vision
Input: Multi-view camera recordings
Step 1: Align 2D pose and vision embeddings
Step 2: Dual-encoder architecture implementation
Step 3: Pretraining with geometric constraints
Output: Enhanced surgical activity recognition model

Arxiv 2025-02-19

Relavance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.12456 Not-So-Optimal Transport Flows for 3D Point Cloud Generation
[{'name': 'Ka-Hei Hui, Chao Liu, Xiaohui Zeng, Chi-Wing Fu, Arash Vahdat'}]
3D Generation 三维生成 v2
3D point cloud generation 3D 点云生成
Optimal transport 最优传输
Shape completion 形状补全
Input: 3D point clouds 3D 点云
Step1: Analyze existing models 分析现有模型
Step2: Propose not-so-optimal transport flow models 提出不那么最优的传输流模型
Step3: Empirical study 实证研究
Output: Enhanced generation techniques 改进的生成技术
9.5 [9.5] 2502.12534 NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization
[{'name': 'Zhen Li, Weiwei Sun, Shrisudhan Govindarajan, Shaobo Xia, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi'}]
3D Reconstruction 三维重建 v2
3D reconstruction
point cloud
signed distance field
autonomous driving
Input: Irregular point cloud 不规则点云
Step1: Convert to signed distance field (SDF) 转换为有符号距离场
Step2: Serialize point cloud into tokens 将点云序列化为标记
Step3: Predict SDF by aggregating features 通过聚合特征预测SDF值
Output: Reconstructed surface 重建表面
9.5 [9.5] 2502.12545 IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras
[{'name': 'Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction 三维重建
Omnidirectional cameras 全向摄像头
Texture optimization 纹理优化
Input: Omnidirectional images 全向图像
Step1: Feature detection 特征检测
Step2: Sparse matching with spherical model 使用球形模型进行稀疏匹配
Step3: Neural implicit surface reconstruction 神经隐式表面重建
Step4: Texture mapping and optimization 纹理映射和优化
Output: Textured meshes with improved rendering quality 改进的三维纹理网格
9.5 [9.5] 2502.12673 ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition
[{'name': "Quoc-Anh Bui, Gilles Rougeron, G\'eraldine Morin, Simone Gasparini"}]
3D Reconstruction 三维重建 v2
3D reconstruction 3D重建
Neural Radiance Fields 神经辐射场
visualization 可视化
level of detail 细节级别
Input: Multi-view images 多视角图像
Step1: Decompose the scene into Scene NeRF and ROI NeRFs 将场景分解为场景NeRF和感兴趣区域NeRF
Step2: Camera selection module chooses relevant cameras 相机选择模块选择相关相机
Step3: Ray-level compositional rendering combines NeRFs 使用光线级组合渲染结合NeRF
Output: High-fidelity rendered images outputs 高保真渲染图像
9.5 [9.5] 2502.12894 CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
[{'name': 'Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, Jingyi Yu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
robotics
scene recovery
Input: Single RGB image 单张RGB图像
Step1: Extract object-level 2D segmentation 提取物体级2D分割
Step2: Analyze inter-object spatial relationships 分析物体间空间关系
Step3: Generate object geometries 生成物体几何
Step4: Align and integrate meshes with point cloud 对齐并集成网格与点云
Step5: Optimize object poses using physics-aware methods 利用物理感知方法优化物体姿态
Output: High-quality 3D scene reconstruction 高质量3D场景重建
9.5 [9.5] 2502.12985 PartSDF: Part-Based Implicit Neural Representation for Composite 3D Shape Parametrization and Optimization
[{'name': 'Nicolas Talabot, Olivier Clerc, Arda Cinar Demirtas, Doruk Oner, Pascal Fua'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D shape representation
implicit neural representation
part-based modeling
Input: Composite 3D shapes 复合三维形状
Step 1: Supervised part-aware representation 监督的部件感知表示
Step 2: Modeling independent parts 模型独立部件
Step 3: Shape optimization 形状优化
Output: Controllable 3D models 可控的三维模型
9.5 [9.5] 2502.13071 RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
[{'name': 'Jingtong Yue, Zhiwei Lin, Xin Lin, Xiaoyu Zhou, Xiangtai Li, Lu Qi, Yongtao Wang, Ming-Hsuan Yang'}]
3D Object Detection 3D目标检测 v2
3D object detection 3D目标检测
radar-camera fusion 雷达-相机融合
autonomous driving 自动驾驶
Input: Multi-modal data from radar and camera 传感器与相机的多模态数据
Step1: Systematic analysis of noise patterns 噪音模式的系统分析
Step2: Development of 3D Gaussian Expansion (3DGE) module 开发3D高斯扩展模块
Step3: Implementation of weather-adaptive fusion module 实现天气自适应融合模块
Output: Robust 3D object detection results 稳健的3D目标检测结果
9.5 [9.5] 2502.13144 RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
[{'name': 'Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang'}]
Autonomous Driving 自动驾驶 v2
autonomous driving
3DGS
reinforcement learning
Input: Photorealistic digital replica of the real world 逼真的数字复制环境
Step1: Establish closed-loop reinforcement learning paradigm 建立闭环强化学习范式
Step2: Incorporate imitation learning for alignment 融入模仿学习以进行对齐
Step3: Design specialized reward functions 设计专门的奖励函数
Output: Optimized end-to-end driving policy 优化的端到端驾驶策略
9.0 [9.0] 2502.12231 PUGS: Zero-shot Physical Understanding with Gaussian Splatting
[{'name': 'Yinghao Shuai, Ran Yu, Yuantao Chen, Zijian Jiang, Xiaowei Song, Nan Wang, Jv Zheng, Jianzhu Ma, Meng Yang, Zhicheng Wang, Wenbo Ding, Hao Zhao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
physical properties
robotics
Input: Multi-view images 多视角图像
Step1: Shape-aware 3D Gaussian Splatting reconstruction 形状感知的3D高斯点云重建
Step2: Geometry-aware regularization loss geometry-aware regularization loss functions
Step3: Region-aware feature contrastive loss region-aware feature contrastive loss functions
Step4: Physical property prediction with VLMs 使用视觉语言模型进行物理属性预测
Output: 3D models with physical properties and enhanced quality 具有物理属性和增强质量的3D模型
9.0 [9.0] 2502.12546 Spatiotemporal Multi-Camera Calibration using Freely Moving People
[{'name': 'Sang-Eun Lee, Ko Nishino, Shohei Nobuhara'}]
3D Reconstruction and Modeling 三维重建 v2
multi-camera calibration
3D reconstruction
freely moving people
Input: Multi-view videos with freely moving people 多视角视频与自由移动的人
Step1: 3D pose estimation from videos 从视频中进行3D姿态估计
Step2: Solve rotation and translation with 3D points 求解与三维点的旋转和平移
Step3: Optimize camera poses and temporal offsets 优化相机姿态和时间偏移
Output: Accurate camera calibration and scene reconstruction 输出:准确的相机标定和场景重建
9.0 [9.0] 2502.12752 High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion
[{'name': 'Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers'}]
Novel View Synthesis 新视图合成 v2
Novel View Synthesis
Splatting
Diffusion Model
Input: Single image 单张图像
Step1: Splatting for pixel alignment 像素对齐的点云处理
Step2: Diffusion model training 扩散模型训练
Step3: Texture generation texture generation通过自适应特征融合
Output: High-fidelity novel views 高保真新视图
8.5 [8.5] 2502.12303 From Gaming to Research: GTA V for Synthetic Data Generation for Robotics and Navigations
[{'name': 'Matteo Scucchia, Matteo Ferrara, Davide Maltoni'}]
Autonomous Systems and Robotics 自主系统与机器人 v2
synthetic data
GTA V
SLAM
Visual Place Recognition
robotics
Input: Synthetic environment data from GTA V 以GTA V的合成环境数据为输入
Step1: Data generation 数据生成
Step2: Algorithm for VPR dataset creation VPR数据集创建算法
Step3: Experimentation for SLAM and VPR applications 针对SLAM和VPR应用的实验
Output: Usable synthetic datasets for robotics 提供可用的机器人合成数据集
8.5 [8.5] 2502.12360 Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions
[{'name': 'Sujan Sai Gannamaneni, Rohil Prakash Rao, Michael Mock, Maram Akila, Stefan Wrobel'}]
Vision Models and Safety Analysis 视觉模型与安全分析 v2
systematic weaknesses
autonomous driving
computer vision
Input: Image dataset 图像数据集
Step1: Metadata generation 元数据生成
Step2: Slice discovery 模块切片发现
Step3: Systematic weakness identification 系统弱点识别
Output: Identified weaknesses identified weaknesses
8.5 [8.5] 2502.12640 RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation
[{'name': 'Chenxi Zheng, Yihong Lin, Bangzhen Liu, Xuemiao Xu, Yongwei Nie, Shengfeng He'}]
3D Generation 三维生成 v2
3D generation
text-to-3D generation
score distillation
Input: Text-based descriptions 基于文本的描述
Step1: Data distribution rectification 数据分布整治
Step2: Pose consistency enhancement 姿态一致性增强
Step3: Integration with score distillation algorithms 与得分蒸馏算法集成
Output: Consistent 3D asset generation 一致的3D资产生成
8.5 [8.5] 2502.12742 3D Shape-to-Image Brownian Bridge Diffusion for Brain MRI Synthesis from Cortical Surfaces
[{'name': 'Fabian Bongratz, Yitong Li, Sama Elbaroudy, Christian Wachinger'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
brain MRI
diffusion model
Input: Continuous cortical shape priors 连续皮层形状先验
Step1: Leverage Brownian bridge process 利用布朗桥过程
Step2: Map shape contours to synthetic MRIs 将形状轮廓映射到合成MRI
Step3: Improve geometric accuracy 改进几何精度
Output: Anatomically plausible brain MRIs 解剖学上合理的脑MRI
8.5 [8.5] 2502.12819 Carotid Artery Plaque Analysis in 3D Based on Distance Encoding in Mesh Representations
[{'name': 'Hinrich Rahlfs, Markus H\"ullebrand, Sebastian Schmitter, Christoph Strecker, Andreas Harloff, Anja Hennemuth'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
plaque analysis
carotid artery
Input: MRI scans of carotid arteries 磁共振扫描的颈动脉
Step1: 3D vessel wall segmentation 3D血管壁分割
Step2: Distance encoding to extract plaque mesh 使用距离编码提取斑块网格
Step3: Quantification and visualization of plaque parameters 斑块参数的量化和可视化
Output: Detailed 3D plaque models 详细的3D斑块模型
8.5 [8.5] 2502.12860 An Experimental Study of SOTA LiDAR Segmentation Models
[{'name': 'Bike Chen, Antti Tikanm\"aki, Juha R\"oning'}]
Point Cloud Processing 点云处理 v2
Point Cloud Segmentation
LiDAR
autonomous driving
Input: LiDAR data LiDAR数据
Step 1: Data acquisition 数据采集
Step 2: Model training and evaluation 模型训练与评估
Step 3: Performance comparison 性能比较
Output: Selection of optimal PCS models 最优PCS模型选择
8.5 [8.5] 2502.12994 SHADeS: Self-supervised Monocular Depth Estimation Through Non-Lambertian Image Decomposition
[{'name': 'Rema Daher, Francisco Vasconcelos, Danail Stoyanov'}]
Depth Estimation 深度估计 v2
monocular depth estimation
specular reflection
self-supervised learning
Input: Single images 单幅图像
Step1: Image decomposition 图像分解
Step2: Depth and light component estimation 深度和光成分估计
Step3: Model validation against real data 模型验证与真实数据
Output: Depth maps and light components 深度图和光成分
8.5 [8.5] 2502.13037 Enhancing Power Grid Inspections with Machine Learning
[{'name': 'Diogo Lavado, Ricardo Santos, Andre Coelho, Joao Santos, Alessandra Micheletti, Claudia Soares'}]
3D Reconstruction and Modeling 三维重建 v2
3D computer vision
3D semantic segmentation
power grid inspections
Input: 3D LiDAR point clouds 3D LiDAR 点云
Step1: Data preprocessing 数据预处理
Step2: 3D semantic segmentation 3D 语义分割
Step3: Performance evaluation 性能评估
Output: Enhanced detection results 改进的检测结果
8.5 [8.5] 2502.13130 Magma: A Foundation Model for Multimodal AI Agents
[{'name': 'Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
multimodal AI
robotic manipulation
vision-language models
Input: Heterogeneous multimodal data 异构多模态数据
Step1: Data labeling for action grounding and planning 动作基础和规划数据标记
Step2: Model training with SoM and ToM techniques 使用SoM和ToM技术进行模型训练
Step3: Evaluation on various tasks 在各种任务上进行评估
Output: A multimodal AI agent capable of understanding and acting on inputs 输出:能够理解和根据输入执行操作的多模态AI代理
7.5 [7.5] 2502.12801 Learning Wall Segmentation in 3D Vessel Trees using Sparse Annotations
[{'name': 'Hinrich Rahlfs, Markus H\"ullebrand, Sebastian Schmitter, Christoph Strecker, Andreas Harloff, Anja Hennemuth'}]
3D Segmentation 3D分割 v2
3D segmentation 3D分割
clinical annotations 临床标注
carotid artery 颈动脉
Input: Sparse annotations from clinical studies 临床研究中的稀疏标注
Step1: Sample perpendicular cross-sections of the carotid artery 采样颈动脉的垂直横截面
Step2: Segment using an adversarial 2D network 使用对抗性2D网络进行分割
Step3: Transform annotations into 3D pseudo-labels 将标注转换为3D伪标签
Output: Train a 3D convolutional neural network 训练3D卷积神经网络
7.5 [7.5] 2502.13146 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
[{'name': 'Shuo Xing, Yuping Wang, Peiran Li, Ruizheng Bai, Yueqi Wang, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision Language Models
cross-modal applications
direct preference optimization
visual question answering
Input: Vision Language Models data 视觉语言模型数据
Step1: Construct dual-preference dataset 构建双重偏好数据集
Step2: Fine-tune with rDPO using visual preference signals 使用视觉偏好信号进行rDPO微调
Output: Improved VLM alignment 改进的VLM对齐

Arxiv 2025-02-18

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.10674 Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
[{'name': 'Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian'}]
3D Object Recognition 3D物体识别 v2
3D object recognition 3D物体识别
point clouds 点云
occlusion-aware 遮挡感知
Input: Synthetic 3D models from ShapeNetCore 来自ShapeNetCore的合成3D模型
Step1: Generate partial point clouds from 3D models 从3D模型生成部分点云
Step2: Implement occlusion-aware pretraining 进行遮挡感知预训练
Step3: Evaluate recognition performance 评估识别性能
Output: Improved recognition accuracy 提高识别准确性
9.5 [9.5] 2502.10704 Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy
[{'name': 'Mingyang Zhao, Gaofeng Meng, Dong-Ming Yan'}]
Point Cloud Processing 点云处理 v2
non-rigid registration
point cloud alignment
occlusion handling
Input: Point cloud data 点云数据
Step1: Identify occluded regions 确定遮挡区域
Step2: Apply maximum correntropy criterion 采用最大相关熵准则
Step3: Optimize deformation field 优化变形场
Output: Accurately aligned point clouds 准确对齐的点云
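
The robustness in Step2 comes from the maximum correntropy criterion, which down-weights correspondences with large residuals (e.g., occluded points). Below is a minimal NumPy sketch of that idea, using Gaussian-kernel weights inside an iteratively reweighted rigid fit; the toy data, the sigma value, and the rigid (rather than non-rigid) deformation model are our simplifying assumptions, not the paper's method.

```python
import numpy as np

def correntropy_weights(src, dst, sigma=0.5):
    """Gaussian-kernel weight per correspondence; large residuals -> ~0."""
    resid = np.linalg.norm(src - dst, axis=1)
    return np.exp(-resid**2 / (2.0 * sigma**2))

def weighted_rigid_fit(src, dst, w):
    """One weighted Kabsch step under the current correntropy weights."""
    w = w / w.sum()
    mu_s, mu_d = w @ src, w @ dst
    H = (src - mu_s).T @ ((dst - mu_d) * w[:, None])
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # enforce a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.5, 0.0, -0.2])
dst[:20] += 5.0                          # corrupt 10% of matches ("occlusion")

cur = src.copy()
for _ in range(10):                      # IRLS: reweight, refit, apply
    w = correntropy_weights(cur, dst)
    R, t = weighted_rigid_fit(cur, dst, w)
    cur = cur @ R.T + t
print("mean inlier error:", np.linalg.norm(cur[20:] - dst[20:], axis=1).mean())
```
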
9.5 [9.5] 2502.10827 E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting
[{'name': 'Sohaib Zahid, Viktor Rudnev, Eddy Ilg, Vladislav Golyanik'}]
3D Reconstruction and Modeling 三维重建 v2
novel view synthesis
event cameras
3D rendering
Gaussian splatting
Input: Event camera data 事件相机数据
Step1: Data processing 数据处理
Step2: 3D Gaussian representation construction 3D高斯表示构建
Step3: Novel view synthesis 新视角合成
Output: High-quality rendered scenes 高质量渲染场景
9.5 [9.5] 2502.10842 Mobile Robotic Multi-View Photometric Stereo
[{'name': 'Suryansh Kumar'}]
3D Reconstruction and Modeling 三维重建 v2
Multi-View Photometric Stereo
3D acquisition
Mobile Robotics
Input: Multi-view images 多视角图像
Step1: Supervised learning setup for predicting surface normals, object depth, and uncertainty 监督学习设置以预测表面法线、物体深度和不确定性
Step2: Solve MVPS-driven optimization problem to refine depth maps 解决基于MVPS的优化问题以细化深度图
Step3: Fuse refined depth maps while tracking camera pose 融合精细化深度图并跟踪相机位姿
Output: Globally consistent 3D geometry 具有全局一致性的3D几何体
9.5 [9.5] 2502.10982 TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
[{'name': 'Yunfei Liu, Lei Zhu, Lijian Lin, Ye Zhu, Ailing Zhang, Yu Li'}]
3D Reconstruction 三维重建 v2
3D facial reconstruction
expression capture
neural renderer
Input: A single in-the-wild image 一张单一的野外图像
Step1: Extract hybrid facial parameters 提取混合面部参数
Step2: Design multi-scale tokenizer 设计多尺度标记器
Step3: Implement token-guided neural renderer 实现标记引导的神经渲染器
Step4: Train with token cycle loss 采用标记周期损失进行训练
Output: High-fidelity facial expressions 高保真的面部表情输出
9.5 [9.5] 2502.10988 OMG: Opacity Matters in Material Modeling with Gaussian Splatting
[{'name': 'Silong Yong, Venkata Nagarjun Pudureddiyur Manivannan, Bernhard Kerbl, Zifu Wan, Simon Stepputtis, Katia Sycara, Yaqi Xie'}]
Neural Rendering 神经渲染 v2
neural rendering
3D Gaussian Splatting
material modeling
opacity
Input: Images 图像
Step1: Inverse rendering process 逆向渲染过程
Step2: Opacity modeling 透明度建模
Step3: Algorithm integration 集成算法
Output: Improved material properties 改进的材料属性
9.5 [9.5] 2502.11390 MARS: Mesh AutoRegressive Model for 3D Shape Detailization
[{'name': 'Jingnan Gao, Weizhe Liu, Weixuan Sun, Senbo Wang, Xibin Song, Taizhang Shang, Shenzhou Chen, Hongdong Li, Xiaokang Yang, Yichao Yan, Pan Ji'}]
3D Reconstruction and Modeling 三维重建 v2
3D shape detailization
Generative Adversarial Networks (GANs)
geometry-consistency
MARS
autoregressive model
Input: Coarse mesh shapes 低质量网格形状
Step1: Tokenization of meshes 网格的标记化
Step2: Geometry-consistency supervision 几何一致性监督
Step3: Autoregressive detailization 自回归细节化
Output: Detailed meshes 细化的网格
9.5 [9.5] 2502.11618 Real-time Neural Rendering of LiDAR Point Clouds
[{'name': 'Joni Vanherck, Brent Zoomers, Tom Mertens, Lode Jorissen, Nick Michiels'}]
Neural Rendering 神经渲染 v2
Neural Rendering
LiDAR Point Clouds
Real-time Rendering
Input: LiDAR point clouds LiDAR点云
Step1: Point cloud projection 点云投影
Step2: Depth-based filtering based on heuristics 基于启发式的深度过滤
Step3: Final image reconstruction using U-Net 使用U-Net进行最终图像重建
Output: Photorealistic images of LiDAR scans LiDAR扫描的照片真实图像
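
Steps 1-2 above amount to splatting the point cloud into a sparse depth image before a network fills the holes. A hedged sketch of that projection with a per-pixel z-buffer follows; the intrinsics and random cloud are illustrative assumptions, and the paper's heuristic filtering and U-Net refinement are not shown.

```python
import numpy as np

def project_zbuffer(points, K, hw=(480, 640)):
    """Project Nx3 camera-frame points; keep the nearest point per pixel."""
    h, w = hw
    p = points[points[:, 2] > 0.1]        # drop points behind / too close
    uv = (K @ p.T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, depth = uv[inside], p[inside, 2]
    zbuf = np.full((h, w), np.inf)
    order = np.argsort(-depth)            # write far points first...
    zbuf[uv[order, 1], uv[order, 0]] = depth[order]  # ...near ones overwrite
    zbuf[np.isinf(zbuf)] = 0.0            # holes left for the network to fill
    return zbuf

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
cloud = np.random.default_rng(1).uniform([-5, -5, 1], [5, 5, 30], (100_000, 3))
depth_map = project_zbuffer(cloud, K)
print("covered pixels:", (depth_map > 0).sum())
```
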
9.5 [9.5] 2502.11777 Deep Neural Networks for Accurate Depth Estimation with Latent Space Features
[{'name': 'Siddiqui Muhammad Yasir, Hyunsik Ahn'}]
Depth Estimation 深度估计 v2
depth estimation
3D scene reconstruction
Input: RGB image to depth image mapping
Step1: Feature extraction using latent space
Step2: Dual encoder-decoder architecture
Step3: Introduce a novel loss function
Output: Enhanced depth maps with improved boundaries
9.5 [9.5] 2502.11801 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency
[{'name': 'Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang'}]
3D Inpainting 3D修复 v2
3D Gaussian Inpainting
Neural Radiance Field
multi-view consistency
3D reconstruction
computer vision
Input: Multi-view images 多视角图像
Step1: Infer Depth-Guided Inpainting Masks 深度引导的修复掩码推断
Step2: Update inpainting mask based on background pixels 更新修复掩码基于背景像素
Step3: Perform 3D inpainting with cross-view consistency 在视图间一致性下进行3D修复
Output: High-fidelity 3D inpainting results 高保真3D修复结果
9.5 [9.5] 2502.12135 MagicArticulate: Make Your 3D Models Articulation-Ready
[{'name': 'Chaoyue Song, Jianfeng Zhang, Xiu Li, Fan Yang, Yiwen Chen, Zhongcong Xu, Jun Hao Liew, Xiaoyang Guo, Fayao Liu, Jiashi Feng, Guosheng Lin'}]
3D Reconstruction and Modeling 三维重建 v2
3D models
articulation
skeleton generation
skinning weights
Input: Static 3D models 静态3D模型
Step1: Dataset creation 数据集合成
Step2: Skeleton generation 骨架生成
Step3: Skinning weight prediction 皮肤权重预测
Output: Articulation-ready 3D models 准备好的关节动作3D模型
9.5 [9.5] 2502.12138 FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views
[{'name': 'Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, Gordon Wetzstein'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
camera pose estimation
novel view synthesis
Input: Uncalibrated sparse-view images 未标定稀疏视图
Step1: Camera pose estimation 摄像机姿态估计
Step2: Geometry and appearance estimation 几何体和外观估计
Step3: Novel-view synthesis 新视图合成
Output: High-quality 3D geometry 高质量三维几何体
9.2 [9.2] 2502.10492 Multi-view 3D surface reconstruction from SAR images by inverse rendering
[{'name': 'Emile Barbier--Renard (IDS, IMAGES), Florence Tupin (IMAGES, IDS), Nicolas Trouv\'e (LabHC), Lo\"ic Denis (LabHC)'}]
3D Reconstruction 三维重建 v2
3D Reconstruction
SAR Imaging
Inverse Rendering
Deep Learning
Input: SAR images from radar sensors 合成孔径雷达图像
Step1: Develop a differentiable rendering model 开发可微分的渲染模型
Step2: Implement a coarse-to-fine MLP strategy 实施由粗到细的多层感知器策略
Step3: Train the model on synthetic datasets 在合成数据集上训练模型
Output: 3D surface reconstruction results 3D表面重建结果
9.2 [9.2] 2502.10606 HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation
[{'name': 'Yibo Liu, Zhaodong Jiang, Binbin Xu, Guile Wu, Yuan Ren, Tongtong Cao, Bingbing Liu, Rui Heng Yang, Amir Rasouli, Jinjun Shan'}]
3D Reconstruction and Modeling 三维重建 v2
6D pose estimation
image-to-3D
Diffusion Models
Input: Images and scenes from robotics applications
Step1: Utilize image-to-3D priors to generate initial meshes
Step2: Estimate the 6D pose of observed objects
Step3: Continuously refine the mesh and pose estimation based on new observations
Output: Enhanced 3D mesh and accurate 6D pose estimation
8.7 [8.7] 2502.11663 MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
[{'name': 'Jingcheng Ni, Yuxin Guo, Yichen Liu, Rui Chen, Lewei Lu, Zehuan Wu'}]
Autonomous Driving 自动驾驶 v2
autonomous driving
video generation
mask reconstruction
Input: Video sequences 视频序列
Step1: Video mask reconstruction 视频掩码重建
Step2: Diffusion Transformer training 扩散变换器训练
Step3: Model evaluation 模型评估
Output: Generalizable driving world model 通用驾驶世界模型
8.7 [8.7] 2502.12080 HumanGif: Single-View Human Diffusion with Generative Prior
[{'name': 'Shoukang Hu, Takuya Narihira, Kazumi Fukuda, Ryosuke Sawata, Takashi Shibuya, Yuki Mitsufuji'}]
3D Reconstruction and Modeling 三维重建 v2
3D human reconstruction
novel view synthesis
human avatars
Input: Single-view image 单视图图像
Step1: Integrate generative priors from diffusion models 从扩散模型中集成生成先验
Step2: Implement Human NeRF module 引入Human NeRF模块
Step3: Optimize with image-level loss 使用图像级损失进行优化
Output: Novel view and pose consistent human avatars 输出: 新视图和姿态一致的人类头像
8.5 [8.5] 2502.10498 The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey
[{'name': 'Sifan Tu, Xin Zhou, Dingkang Liang, Xingyu Jiang, Yumeng Zhang, Xiaofan Li, Xiang Bai'}]
Autonomous Driving 自动驾驶 v2
Driving World Model
autonomous driving
scene prediction
3D perception
Step1: Literature review and categorization of DWM approaches 进行文献回顾并对DWM方法进行分类
Step2: Analysis of existing methodologies and datasets 对现有方法和数据集进行分析
Step3: Discussion on limitations and future directions 讨论局限性和未来方向
8.5 [8.5] 2502.10603 Adaptive Neural Networks for Intelligent Data-Driven Development
[{'name': 'Youssef Shoeb, Azarm Nowzad, Hanno Gottschalk'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
adaptive neural networks
autonomous driving
out-of-distribution learning
Input: Autonomous driving environments 自动驾驶环境
Step1: Data collection 数据收集
Step2: Dynamic integration of new object classes 新对象类别的动态集成
Step3: Continuous learning 模型的持续学习
Output: Adaptive perception system 自适应感知系统
8.5 [8.5] 2502.10720 NPSim: Nighttime Photorealistic Simulation From Daytime Images With Monocular Inverse Rendering and Ray Tracing
[{'name': 'Shutong Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
mesh reconstruction
autonomous driving
nighttime simulation
Input: Daytime images and semantic labels 白天图像和语义标签
Step1: Mesh reconstruction 网格重建
Step2: Relighting 重光照
Step3: Nighttime image simulation 夜间图像仿真
Output: Realistic nighttime images 真实的夜间图像
8.5 [8.5] 2502.10724 Semantics-aware Test-time Adaptation for 3D Human Pose Estimation
[{'name': 'Qiuxia Lin, Rongyu Chen, Kerui Gu, Angela Yao'}]
3D Reconstruction and Modeling 三维重建 v2
3D Human Pose Estimation
Test-time adaptation
Semantics-aware motion prior
Input: Video sequences containing human poses 包含人体姿态的视频序列
Step1: Identify semantics from video using language models 使用语言模型识别视频中的语义
Step2: Integrate motion prior with semantic information 将运动先验与语义信息整合
Step3: Adapt 3D pose predictions during test-time adaptation (TTA) 在测试时间适应中调整3D姿势预测
Output: Refined 3D pose estimations 提炼的3D姿势估计
8.5 [8.5] 2502.11287 MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring
[{'name': 'Arpitsinh Vaghela, Duo Lu, Aayush Atul Verma, Bharatesh Chakravarthi, Hua Wei, Yezhou Yang'}]
3D Perception 3D感知 v2
3D perception 3D感知
traffic monitoring 交通监测
multi-camera 多摄像头
occupancy detection 占用检测
Input: Multi-camera images 多摄像头图像
Step1: Data acquisition 数据收集
Step2: Background integration 背景集成
Step3: Late and early fusion 后期与早期融合
Output: BEV occupancy map BEV占用图
8.5 [8.5] 2502.11307 Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection
[{'name': 'Jiaxiang Wang, Haote Xu, Xiaolu Chen, Haodi Xu, Yue Huang, Xinghao Ding, Xiaotong Tu'}]
3D Anomaly Detection 3D异常检测 v2
anomaly detection
3D point cloud
Point-Language model
Input: 3D point clouds 3D点云
Step1: Dual-prompt learning 双提示学习
Step2: Dynamic prompt creation 动态提示创建
Step3: Anomaly detection 异常检测
Output: Enhanced anomaly detection performance 改进的异常检测性能
8.5 [8.5] 2502.11586 Syllables to Scenes: Literary-Guided Free-Viewpoint 3D Scene Synthesis from Japanese Haiku
[{'name': 'Chunan Yu, Yidong Han, Chaotao Ding, Ying Zang, Lanyun Zhu, Xinhao Chen, Zejian Li, Renjun Xu, Tianrun Chen'}]
3D Scene Generation 三维场景生成 v2
3D scene synthesis
Japanese Haiku
Input: Japanese Haiku 日本俳句
Step1: Literary analysis 文学分析
Step2: Spatial representation 空间表现
Step3: 3D scene synthesis 三维场景合成
Output: Navigable 3D scenes 可导航三维场景
8.5 [8.5] 2502.11642 GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text
[{'name': 'Gyumin Shim, Sangmin Lee, Jaegul Choo'}]
Image Generation 图像生成 v2
3D human models
Gaussian Splatting
text-to-3D generation
animation
Input: Textual descriptions 文本描述
Step1: Data integration 数据集成
Step2: Model optimization 模型优化
Step3: Animation generation 动画生成
Output: Animatable 3D avatars 可动画的三维头像
8.5 [8.5] 2502.11697 MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
[{'name': 'Hanzhuo Huang, Yuan Liu, Ge Zheng, Jiepeng Wang, Zhiyang Dou, Sibei Yang'}]
Image and Video Generation 图像生成 v2
4D generation
multiview diffusion models
autonomous systems
Input: Monocular videos 单目视频
Step1: Generate multiview images using multiview diffusion models 利用多视角扩散模型生成多视角图像
Step2: Associate pixels using token flow technique 使用令牌流技术关联像素
Step3: Refine the coarse 4D field 细化粗糙的4D场
Output: High-quality 4D field 高质量4D场
8.5 [8.5] 2502.11710 The Worse The Better: Content-Aware Viewpoint Generation Network for Projection-related Point Cloud Quality Assessment
[{'name': 'Zhiyong Su, Bingxu Xie, Zheng Li, Jincan Wu, Weiqing Li'}]
Point Cloud Processing 点云处理 v2
Point Cloud Quality Assessment 点云质量评估
Content-Aware Viewpoint Generation 内容感知视点生成
Geometric Features 几何特征
Input: Degraded point clouds 退化点云
Step1: Extract multi-scale geometric and texture features 提取多尺度几何和纹理特征
Step2: Refine features per viewpoint 针对每个视点进行特征优化
Step3: Generate optimized viewpoints 生成优化视角
Output: Optimized viewpoints for projection-related PCQA 用于投影相关PCQA的优化视角
8.5 [8.5] 2502.11726 No-reference geometry quality assessment for colorless point clouds via list-wise rank learning
[{'name': 'Zheng Li, Bingxu Xie, Chao Chu, Weiqing Li, Zhiyong Su'}]
Geometry Quality Assessment 几何质量评估 v2
geometry quality assessment
point clouds
3D reconstruction
Input: Colorless point clouds 无色点云
Step1: Construct LRL dataset 生成 LRL 数据集
Step2: Design GQANet to extract geometric features 设计 GQANet 提取几何特征
Step3: Use LRLNet for ranking the quality of point clouds 使用 LRLNet 对点云品质进行排序
Output: Predicted geometry quality index 预测的几何质量指数
8.5 [8.5] 2502.11742 Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition
[{'name': 'Jianyi Peng, Fan Lu, Bin Li, Yuan Huang, Sanqing Qu, Guang Chen'}]
Visual Place Recognition 视觉地点识别 v2
Visual Place Recognition
Cross-modal
RGB images
LiDAR
Bird's Eye View
Input: RGB images and LiDAR point clouds
Step1: Initial retrieval using global descriptor similarity
Step2: Re-ranking based on Bird's Eye View (BEV) images
Output: Improved Visual Place Recognition results
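
The pipeline above is a standard two-stage retrieval: coarse ranking by global-descriptor similarity, then re-ranking the shortlist with a finer BEV-level score. A minimal sketch with random placeholder descriptors and an L2 stand-in for the paper's BEV comparison:

```python
import numpy as np

def topk_then_rerank(query, db_globals, db_bevs, query_bev, k=5):
    # Stage 1: coarse retrieval with normalized global descriptors.
    qn = query / np.linalg.norm(query)
    dbn = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    coarse = np.argsort(-(dbn @ qn))[:k]
    # Stage 2: re-rank candidates with a BEV-level score (placeholder: -L2).
    scores = np.array([-np.linalg.norm(db_bevs[i] - query_bev) for i in coarse])
    return coarse[np.argsort(-scores)]

rng = np.random.default_rng(2)
db_g, db_b = rng.normal(size=(1000, 256)), rng.normal(size=(1000, 64))
print(topk_then_rerank(rng.normal(size=256), db_g, db_b, rng.normal(size=64)))
```
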
8.5 [8.5] 2502.11864 Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving?
[{'name': 'Natalie Grabowsky, Annika M\"utze, Joshua Wendland, Nils Jansen, Matthias Rottmann'}]
Autonomous Driving 自动驾驶 v2
Perceptual Uncertainty
Reinforcement Learning
Automated Driving
Input: Perturbed observation space 扰动的观察空间
Step1: Introduce uncertainty 引入不确定性
Step2: Inform agent of uncertainty 通知代理不确定性
Step3: Reward agent for navigating safely 奖励代理安全导航
Output: Adjusted behavior with uncertainty 根据不确定性调整行为
8.5 [8.5] 2502.11971 Robust 6DoF Pose Tracking Considering Contour and Interior Correspondence Uncertainty for AR Assembly Guidance
[{'name': 'Jixiang Chen, Jing Chen, Kai Liu, Haochen Chang, Shanfeng Fu, Jian Yang'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
6DoF pose tracking
augmented reality
contour-based methods
object tracking
intelligent manufacturing
Input: 6DoF object poses 6DoF 物体姿态
Step1: Robust contour-based tracking 鲁棒的基于轮廓的跟踪
Step2: CPU-only strategy for symmetric objects 针对对称物体的仅CPU策略
Step3: Unified energy function formulation 统一能量函数的表述
Output: Accurate tracking and assembly guidance 精确的跟踪和装配指导
8.5 [8.5] 2502.12151 VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution
[{'name': 'Chendong Wang, Anlan Zhang, Yifan Yang, Lili Qiu, Yuqing Yang, Xinyang Jiang, Feng Qian, Suman Banerjee'}]
3D Reconstruction and Modeling 三维重建 v2
3D volumetric video
super-resolution
bandwidth reduction
lookup tables (LUTs)
Input: Low-resolution volumetric data 低分辨率体积数据
Step1: Downsampling data to reduce bandwidth 数据下采样以减少带宽
Step2: Applying super-resolution algorithm to upscale data 应用超分辨率算法对数据进行上采样
Step3: Utilizing lookup tables (LUTs) for efficient processing 使用查找表 (LUTs) 进行高效处理
Output: Enhanced volumetric video for streaming 改进的体积视频用于流传输
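
The core trick in Step3 is classic: tabulate an expensive per-sample mapping once offline, then answer each online query with an index lookup. A hedged sketch with a stand-in function in place of the paper's learned super-resolution mapping:

```python
import numpy as np

def expensive_sr(x):                     # stand-in for a learned SR mapping
    return np.tanh(3.0 * x) * 0.5 + x

# Offline: tabulate the mapping on a quantized grid (256 bins over [0, 1]).
bins = 256
grid = np.linspace(0.0, 1.0, bins)
lut = expensive_sr(grid)

def sr_via_lut(samples):
    """Online: quantize each sample and read the precomputed result."""
    idx = np.clip((samples * (bins - 1)).astype(int), 0, bins - 1)
    return lut[idx]

x = np.random.default_rng(3).uniform(0, 1, 1_000_000)
err = np.abs(sr_via_lut(x) - expensive_sr(x)).max()
print(f"max LUT quantization error: {err:.4f}")  # small, and lookup is cheap
```
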
7.8 [7.8] 2502.10444 A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision
[{'name': 'Hao Ai, Zidong Cao, Lin Wang'}]
3D Geometry and Motion Estimation 3D几何与运动估计 v2
Omnidirectional vision
Deep learning
3D geometry
Autonomous driving
Input: Omnidirectional images 全景图像
Step 1: Literature review 文献综述
Step 2: Challenges and complexities analysis 挑战与复杂性分析
Step 3: Taxonomy development 分类法开发
Objective: Summarize DL methods for omnidirectional vision 总结全景视觉的深度学习方法
7.5 [7.5] 2502.12095 Descriminative-Generative Custom Tokens for Vision-Language Models
[{'name': 'Pramuditha Perera, Matthew Trager, Luca Zancato, Alessandro Achille, Stefano Soatto'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
image retrieval
custom tokens
Input: Concept images and text 描述概念的图像和文本
Step1: Learn custom tokens 学习自定义token
Step2: Align text and image features 对齐文本和图像特征
Step3: Use in VLMs 应用于视觉语言模型
Output: Improved query performance 改进的查询性能

Arxiv 2025-02-17

Relevance Title Research Topic Keywords Pipeline
9.2 [9.2] 2502.09672 IMM-MOT: A Novel 3D Multi-object Tracking Framework with Interacting Multiple Model Filter
[{'name': 'Xiaohong Liu, Xulong Zhao, Gang Liu, Zili Wu, Tao Wang, Lei Meng, Yuhan Wang'}]
3D Multi-Object Tracking 3D多目标跟踪 v2
3D Multi-Object Tracking
Interacting Multiple Model filter
3D point clouds
Input: 3D point clouds and images 3D点云和图像
Step1: Damping Window mechanism for trajectory management 轨迹管理的阻尼窗口机制
Step2: Interacting Multiple Model filter for dynamic tracking 动态跟踪的交互多个模型滤波器
Step3: Distance-Based Score Enhancement for detection scores 检测分数的基于距离的增强
Output: Enhanced 3D multi-object tracking system 改进的3D多目标跟踪系统
9.0 [9.0] 2502.09980 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
[{'name': 'Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen'}]
Autonomous Driving 自动驾驶 v2
Autonomous Driving
Cooperative Perception
Large Language Models
Input: Perception information from multiple CAVs 从多个CAV获取感知信息
Step1: Data integration 数据集成
Step2: LLM-based fusion 基于LLM的融合
Step3: Question answering 问题回答
Output: Driving-related answers 驾驶相关答案
8.5 [8.5] 2502.09652 GraphCompNet: A Position-Aware Model for Predicting and Compensating Shape Deviations in 3D Printing
[{'name': 'Lei (Rachel) Chen, Juheon Lee, Juan Carlos Catana, Tsegai Yhdego, Nathan Moroney, Mohammad Amin Nabian, Hui Wang, Jun Zeng'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D printing 3D 打印
shape deviation 形状偏差
additive manufacturing 增材制造
Input: Point cloud data 点云数据
Step1: Integrate positional factors 集成位置因素
Step2: Develop compensation algorithms 开发补偿算法
Step3: Validate and refine with experimental data 验证和完善实验数据
Output: Enhanced shape accuracy 改进的形状精度
8.5 [8.5] 2502.09669 Meta-INR: Efficient Encoding of Volumetric Data via Meta-Learning Implicit Neural Representation
[{'name': 'Maizhe Yang, Kaiyuan Tang, Chaoli Wang'}]
Volumetric Reconstruction 体积重建 v2
implicit neural representation
volumetric data
meta-learning
3D reconstruction
volume rendering
Input: Volumetric dataset 体积数据集
Step1: Meta-pretraining on subsampled data 亚采样数据上的元预训练
Step2: Volume-specific finetuning on complete data 在完整数据上进行体积特定微调
Output: Adapted implicit neural representations (INRs) 调整后的隐式神经表征
8.5 [8.5] 2502.09795 Vision-based Geo-Localization of Future Mars Rotorcraft in Challenging Illumination Conditions
[{'name': 'Dario Pisanti, Robert Hewitt, Roland Brockers, Georgios Georgakis'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
Map-based Localization
Mars
image registration
deep learning
Input: Onboard images and reference map
Step1: Development of Geo-LoFTR model
Step2: Incorporation of geometric context
Step3: Simulation of Martian terrain
Output: Enhanced localization accuracy
8.5 [8.5] 2502.10028 ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation
[{'name': 'Yuxin He, Qiang Nie'}]
3D Flow and Action Prediction 3D流和动作预测 v2
3D flow
action prediction
robotic manipulation
Input: Language instructions and video data 语言指令和视频数据
Step 1: 3D flow prediction 3D流预测
Step 2: Model training using causal transformer 使用因果变换器训练模型
Output: Fine-grained action predictions and future image generation 输出: 精细的动作预测和未来图像生成
8.5 [8.5] 2502.10059 RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
[{'name': 'Teng Li, Guangcong Zheng, Rui Jiang, Shuigenzhan, Tao Wu, Yehao Lu, Yining Lin, Xi Li'}]
3D Reconstruction and Modeling 三维重建 v2
image-to-video generation
3D scene reconstruction
camera control
depth estimation
Input: Monocular images 单目图像
Step1: Depth estimation 深度估计
Step2: 3D scene reconstruction 3D场景重建
Step3: Camera trajectory scaling 相机轨迹缩放
Output: Interactive video generation 交互式视频生成
8.5 [8.5] 2502.10127 Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation
[{'name': 'Gamal Elghazaly, Raphael Frank'}]
Autonomous Driving 自动驾驶 v2
Collaboration
HD maps
V2X
Scene Graph Generation
Input: Front-facing camera images 前视相机图像
Step1: Extract lane centerlines from images 从图像中提取车道中心线
Step2: Represent lane centerlines as directed graphs 将车道中心线表示为有向图
Step3: Transmit data to the cloud via V2X 通过V2X将数据传输到云端
Output: Generated localized HD map 生成的局部高清地图
8.5 [8.5] 2502.10377 ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences
[{'name': 'Liyuan Zhu, Shengqu Cai, Shengyu Huang, Gordon Wetzstein, Naji Khosravan, Iro Armeni'}]
3D Generation 三维生成 v2
3D reconstruction
style transfer
multi-view consistency
Input: Multi-view images 多视角图像
Step1: Style transfer to a single view using semantic attention mechanism 在单视图上使用语义注意机制进行风格转移
Step2: Lift stylization to additional views using warp-and-refine network 通过变换和细化网络将风格提升到其他视图
Output: Consistent stylized results across multiple views 在多个视图中获得一致的风格化结果
8.5 [8.5] 2502.10392 Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
[{'name': 'Wenxuan Guo, Xiuwei Xu, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu'}]
3D Visual Grounding 3D视觉定位 v2
3D visual grounding 3D视觉定位
sparse convolution 稀疏卷积
text features 文本特征
Input: 3D scene representation and text features 3D场景表示和文本特征
Step1: Text-guided pruning to sparsify the 3D voxel features 文本引导的修剪以减少3D体素特征
Step2: Completion-based addition to address over-pruned areas 基于补全的添加以解决过度修剪区域
Output: Efficiently fused features for 3D visual grounding 高效融合的特征用于3D视觉定位
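
Step1's text-guided pruning can be pictured as scoring each occupied voxel's feature against the text embedding and keeping only the most relevant fraction. A sketch with random placeholder features; the paper's sparse-convolution backbone and completion-based addition are omitted.

```python
import numpy as np

def text_guided_prune(voxel_feats, text_feat, keep_ratio=0.25):
    """Keep the top `keep_ratio` voxels by cosine similarity to the text."""
    v = voxel_feats / np.linalg.norm(voxel_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sim = v @ t
    k = max(1, int(keep_ratio * len(sim)))
    keep = np.argsort(-sim)[:k]
    return keep, sim[keep]

rng = np.random.default_rng(4)
feats = rng.normal(size=(10_000, 128))   # one feature per occupied voxel
kept, scores = text_guided_prune(feats, rng.normal(size=128))
print(f"kept {len(kept)} of {len(feats)} voxels; min kept sim {scores.min():.3f}")
```
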
8.0 [8.0] 2502.10273 Probing Perceptual Constancy in Large Vision Language Models
[{'name': 'Haoran Sun, Suyang Yu, Yijiang Li, Qingying Gao, Haiyun Lyu, Hokin Deng, Dezhi Luo'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
perceptual constancy
vision-language models
VLMs
cognitive tasks
Input: Vision-Language Models (VLMs) 视觉语言模型
Step1: Evaluation using cognitive experiments 使用认知实验进行评估
Step2: Testing across dimensions of perceptual constancy 在感知恒常性的各个维度进行测试
Step3: Analysis of model variability in performance 对模型性能的变异性进行分析
Output: Insights into perceptual constancy capabilities of VLMs 输出: 对VLMs感知恒常性能力的洞察
7.5 [7.5] 2502.09818 On the robustness of multimodal language model towards distractions
[{'name': 'Ming Liu, Hao Chen, Jindong Wang, Wensheng Zhang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models (VLMs) 视觉语言模型
Robustness of Models 模型鲁棒性
Input: Vision-language models (VLMs) 视觉语言模型
Step1: Develop a benchmark 开发基准测试
Step2: Introduce distractions in visual and textual inputs 在视觉和文本输入中引入干扰
Step3: Evaluate model robustness 评估模型鲁棒性
Output: Insights on VLM performance 视觉语言模型性能洞察

Arxiv 2025-02-14

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.08902 CoL3D: Collaborative Learning of Single-view Depth and Camera Intrinsics for Metric 3D Shape Recovery
[{'name': 'Chenghao Zhang, Lubin Fan, Shen Cao, Bojian Wu, Jieping Ye'}]
3D Reconstruction and Modeling 三维重建 v2
3D shape recovery
depth estimation
camera calibration
Input: Single image 单幅图像
Step1: Depth estimation 深度估计
Step2: Camera intrinsics estimation 相机内参估计
Step3: Collaborative optimization 协同优化
Output: Metric 3D shape 度量3D形状
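
The reason depth and intrinsics must be estimated jointly is visible in the back-projection equations: x = (u - cx)·z/fx and y = (v - cy)·z/fy, so an error in focal length warps the metric shape even when depth is perfect. A minimal sketch of that lifting step; the toy depth map and K are assumptions.

```python
import numpy as np

def backproject(depth, K):
    """Lift an HxW metric depth map to an Nx3 point cloud in camera frame."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                # a wrong fx rescales x,y but not z,
    y = (v - cy) * z / fy                # which is exactly the shape distortion
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)  # CoL3D jointly corrects

K = np.array([[400.0, 0, 160], [0, 400.0, 120], [0, 0, 1]])
depth = np.full((240, 320), 2.0)         # flat wall 2 m away
pts = backproject(depth, K)
print("point cloud extent (m):", pts.min(0).round(2), pts.max(0).round(2))
```
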
9.5 [9.5] 2502.09111 DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior
[{'name': 'Mingrui Li, Shuhong Liu, Tianchen Deng, Hongyu Wang'}]
SLAM 同时定位与地图构建 v2
SLAM
Neural Radiance Fields
3D Reconstruction
Gaussian Splatting
Input: RGB-D stream of frames RGB-D帧流
Step1: Camera pose and neural radiance fields optimization 相机位姿和神经辐射场优化
Step2: Initialize Gaussian primitives using implicit radiance fields based on sampled points 使用样本点的隐式辐射场初始化高斯原语
Step3: Implement local loop closure detection and bundle optimization 进行局部闭环检测和捆绑优化
Output: Enhanced Gaussian maps with improved tracking and mapping performance 输出:具有改进跟踪和映射性能的增强高斯地图
9.5 [9.5] 2502.09274 FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation
[{'name': 'Bin Yang, Alexandru Paul Condurache'}]
3D Scene Understanding 3D场景理解 v2
3D scene understanding
LiDAR
semantic segmentation
autonomous driving
Input: LiDAR point clouds LiDAR点云
Step1: Redesign data representation 重新设计数据表示
Step2: Implement data augmentation 实施数据增强
Step3: Apply post-processing methods 应用后处理方法
Output: Enhanced semantic segmentation performance 提升的语义分割性能
9.5 [9.5] 2502.09278 ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization
[{'name': 'Onat \c{S}ahin, Mohammad Altillawi, George Eskandar, Carlos Carbone, Ziyuan Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
image-to-3D
mesh generation
Input: Multi-view images 多视角图像
Step1: Generate multi-view prior images 生成多视角先验图像
Step2: Use score distillation sampling (SDS) to guide view generation 使用得分蒸馏采样引导视图生成
Step3: Optimize rough shape and fine details 优化粗形状和细节
Output: View-consistent 3D mesh 视图一致的三维网格
9.5 [9.5] 2502.09425 A 3D Facial Reconstruction Evaluation Methodology: Comparing Smartphone Scans with Deep Learning Based Methods Using Geometry and Morphometry Criteria
[{'name': "\'Alvaro Heredia-Lid\'on, Alejandro Mo\~nux-Bernal, Alejandro Gonz\'alez, Luis M. Echeverry-Quiceno, Max Rubert, Aroa Casado, Mar\'ia Esther Esteban, Mireia Andreu-Montoriol, Susanna Gallardo, Cristina Ruffo, Neus Mart\'inez-Abad\'ias, Xavier Sevillano"}]
3D Reconstruction and Modeling 三维重建 v2
3D facial reconstruction
morphometric analysis
deep learning
Input: Smartphone-based 3D scans and deep learning models 智能手机3D扫描与深度学习模型
Step1: Data acquisition 数据采集
Step2: Morphometric shape analysis 形态计量形状分析
Step3: Comparison with ground truth 与真实模型进行比较
Output: Evaluation of global and local shape differences 输出:全局和局部形状差异的评估
9.5 [9.5] 2502.09563 Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction
[{'name': 'Youming Deng, Wenqi Xian, Guandao Yang, Leonidas Guibas, Gordon Wetzstein, Steve Marschner, Paul Debevec'}]
3D Reconstruction 三维重建 v2
3D Reconstruction 三维重建
Camera Calibration 相机校准
Gaussian Splatting 高斯点云
Input: Wide-angle images 广角图像
Step1: Optimize camera parameters 优化相机参数
Step2: Model lens distortion 建模镜头畸变
Step3: Use Gaussian representations 使用高斯表示
Step4: Resample with cubemap strategy 使用立方映射策略
Output: Accurate 3D scene reconstruction 准确的三维场景重建
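
Step2 typically means fitting a parametric distortion model alongside the scene. A hedged sketch of a two-coefficient polynomial radial model and the residual a self-calibrating optimizer would minimize; the paper's actual camera model and cubemap resampling strategy are richer than this.

```python
import numpy as np

def distort(xy_norm, k1, k2):
    """Apply polynomial radial distortion to normalized image coordinates."""
    r2 = (xy_norm ** 2).sum(axis=1, keepdims=True)
    return xy_norm * (1.0 + k1 * r2 + k2 * r2 ** 2)

def calibration_residual(params, xy_ideal, xy_observed):
    """What a self-calibrating optimizer would minimize w.r.t. (k1, k2)."""
    k1, k2 = params
    return ((distort(xy_ideal, k1, k2) - xy_observed) ** 2).sum()

rng = np.random.default_rng(5)
xy = rng.uniform(-1, 1, (500, 2))
obs = distort(xy, k1=-0.15, k2=0.02)     # synthetic wide-angle observations
print("residual at truth:", calibration_residual((-0.15, 0.02), xy, obs))
print("residual if ignored:", calibration_residual((0.0, 0.0), xy, obs))
```
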
9.5 [9.5] 2502.09613 Latent Radiance Fields with 3D-aware 2D Representations
[{'name': 'Chaoyi Zhou, Xi Liu, Feng Luo, Siyu Huang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
latent representations
photorealistic rendering
Input: 2D latent representations 2D 潜在表示
Step1: Enhance 3D consistency with correspondence-aware autoencoding 使用对应感知自编码增强3D一致性
Step2: Lift 3D-aware representations into 3D space 将3D感知表示提升至3D空间
Step3: Align VAE-Radiance Fields for image decoding 对齐VAE辐射场以进行图像解码
Output: Photorealistic 3D reconstruction output 照片真实的3D重建输出
9.5 [9.5] 2502.09615 RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets
[{'name': 'Isabella Liu, Zhan Xu, Wang Yifan, Hao Tan, Zexiang Xu, Xiaolong Wang, Hao Su, Zifan Shi'}]
3D Reconstruction and Modeling 三维重建 v2
3D assets
autoregressive modeling
automatic rigging
Input: 3D asset shapes 3D资产形状
Step1: Joint probabilistic generation 关节概率生成
Step2: Skeleton topology prediction 骨架拓扑预测
Step3: Skinning weights assignment 绑定权重分配
Output: Rigged 3D asset 装配好的3D资产
9.5 [9.5] 2502.09623 Embed Any NeRF: Graph Meta-Networks for Neural Tasks on Arbitrary NeRF Architectures
[{'name': 'Francesco Ballerini, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
3D representation
Graph Meta-Networks
Input: Neural Radiance Fields (NeRFs) 神经辐射场
Step1: Train a Graph Meta-Network 训练图元网络
Step2: Apply contrastive learning 施加对比学习
Step3: Perform classification and retrieval tasks 执行分类和检索任务
Output: Architecture-agnostic representations 架构无关表示
8.8 [8.8] 2502.09620 Exploring the Potential of Encoder-free Architectures in 3D LMMs
[{'name': 'Yiwen Tang, Zoey Guo, Zhuhao Wang, Ray Zhang, Qizhi Chen, Junli Liu, Delin Qu, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao'}]
3D Reconstruction and Modeling 三维重建 v2
3D LMMs
Encoder-free architectures 无编码器架构
3D understanding 3D 理解
Input: 3D point clouds 3D 点云
Step1: Semantic Encoding in pre-training 预训练阶段的语义编码
Step2: Hierarchical Geometry Aggregation in tuning 调优中的层次几何聚合
Output: Encoder-free 3D LMM 无编码器3D LMM
8.5 [8.5] 2502.08884 ShapeLib: designing a library of procedural 3D shape abstractions with Large Language Models
[{'name': 'R. Kenny Jones, Paul Guerrero, Niloy J. Mitra, Daniel Ritchie'}]
3D Reconstruction and Modeling 三维重建 v2
3D shape representation
procedural modeling
Large Language Models
Input: Design intent (text descriptions, seed shapes) 设计意图(文本描述,种子形状)
Step1: Library interface design 库接口设计
Step2: Function application proposing 函数应用提出
Step3: Function implementation formulation 函数实现制定
Step4: Geometric validation of functions 几何验证函数
Output: Library of procedural shape functions 程序化形状函数库
8.5 [8.5] 2502.08974 Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning
[{'name': 'Yiming Yang, Yueru Luo, Bingkun He, Erlong Li, Zhipeng Cao, Chao Zheng, Shuqi Mei, Zhen Li'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
lane topology
autonomous driving
topology reasoning
Input: Perspective views (PV) from cameras
Step1: Extract lane topology sequences from PV
Step2: Implement dual-decoder architecture for segment and topology decoding
Step3: Utilize randomized order prompt-to-sequence learning
Output: Enhanced lane topology sequences for autonomous driving
8.5 [8.5] 2502.08977 Text-driven 3D Human Generation via Contrastive Preference Optimization
[{'name': 'Pengfei Zhou, Xukun Shen, Yong Hu'}]
3D Generation 三维生成 v2
3D human generation
text-driven
contrastive preferences
Input: Textual descriptions 文本描述
Step1: Preference optimization module 偏好优化模块
Step2: Integration of multiple preference models 多个偏好模型的集成
Step3: Negation preference module 引入否定偏好模块
Output: Enhanced 3D human models 改进的三维人类模型
8.5 [8.5] 2502.09039 Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting
[{'name': 'Lingting Zhu, Guying Lin, Jinnan Chen, Xinjie Zhang, Zhenchao Jin, Zhao Wang, Lequan Yu'}]
3D Reconstruction and Modeling 三维重建 v2
Gaussian Splatting
3D reconstruction
image representation
Input: Large images 大图像
Step1: Gaussian point fitting 高斯点拟合
Step2: Optimization strategy 优化策略
Step3: Level-of-Gaussian reconstruction 高斯层次重建
Output: High-quality image representations 高质量图像表示
8.5 [8.5] 2502.09057 Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
[{'name': 'Shiryu Ueno, Yoshikazu Hayashi, Shunsuke Nakatsuka, Yusei Yamada, Hiroaki Aizawa, Kunihito Kato'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Visual Inspection
Vision-Language Model
In-Context Learning
Input: Few-shot images of products 产品的少量图像
Step1: Construct dataset 创建数据集
Step2: Fine-tune VLM for inspection 对VLM进行微调以进行检查
Step3: Perform visual inspection using In-Context Learning 使用In-Context Learning进行视觉检查
Output: Inspection results and defective location detection 检查结果及缺陷位置检测
8.5 [8.5] 2502.09080 BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization
[{'name': 'Qiwei Wang, Shaoxun Wu, Yujiao Shi'}]
Cross-View Localization 跨视角定位 v2
3D Gaussian primitives
cross-view localization
autonomous driving
Input: Ground image and satellite image 地面图像与卫星图像
Step1: Generate 3D Gaussian primitives 生成三维高斯原语
Step2: Synthesize BEV feature map 合成鸟瞰视图特征图
Step3: Conduct pose estimation 进行姿态估计
Output: Location probability map of the query image 查询图像的位置概率图
8.5 [8.5] 2502.09528 SteROI-D: System Design and Mapping for Stereo Depth Inference on Regions of Interest
[{'name': 'Jack Erhardt, Ziang Li, Reid Pinkham, Andrew Berkovich, Zhengya Zhang'}]
Multi-view Stereo 多视角立体 v2
Stereo Depth
Region of Interest
Energy Efficiency
AR/VR
Dynamic ROIs
Input: Stereo images 立体图像
Step1: ROI identification ROI识别
Step2: Depth estimation 深度估计
Step3: Energy optimization 能耗优化
Output: Efficient depth maps 高效深度图
8.5 [8.5] 2502.09617 LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh
[{'name': 'Jing Wen, Alexander G. Schwing, Shenlong Wang'}]
Neural Rendering 神经渲染 v2
3D reconstruction
human rendering
computational efficiency
Input: Sparse source images 稀疏源图像
Step1: Iterative feedback update 迭代反馈更新
Step2: Coupled multi-resolution Gaussians-on-Mesh representation 耦合多分辨率高斯-网格表示
Output: Animatable human representation 可动画的人体表示
7.5 [7.5] 2502.09075 PTZ-Calib: Robust Pan-Tilt-Zoom Camera Calibration
[{'name': 'Jinhui Guo, Lubin Fan, Bojian Wu, Jiaqi Gu, Shen Cao, Jieping Ye'}]
Camera Calibration 相机校准 v2
PTZ calibration
camera parameters
3D information
Input: Reference images 参考图像
Step1: Image selection 图像选择
Step2: Apply PTZ-IBA algorithm 应用PTZ增量束调整算法
Step3: Parameter optimization 参数优化
Output: Calibrated camera parameters 校准的相机参数
7.5 [7.5] 2502.09088 Unsupervised Anomaly Detection on Implicit Shape representations for Sarcopenia Detection
[{'name': 'Louise Piecuch (MD), Jeremie Huet (MD), Antoine Frouin (PT), Antoine Nordez (MD), Anne-Sophie Boureau (MD), Diana Mateus'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
anomaly detection
implicit neural representation
sarcopenia
Input: Muscle shape data 肌肉形状数据
Step1: Model normal muscle shapes using implicit neural representation (INR) 使用隐式神经表征建模正常肌肉形状
Step2: Employ unsupervised anomaly detection based on reconstruction error 使用基于重建误差的无监督异常检测
Step3: Classify and separate normal and sarcopenic muscles from learned representations 对学习的表示进行分类和分离正常与肌肉萎缩肌肉
Output: Anomaly detection results for sarcopenic and non-sarcopenic muscles 输出:肌肉萎缩及非肌肉萎缩的异常检测结果

Arxiv 2025-02-13

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.07822 PDM-SSD: Single-Stage Three-Dimensional Object Detector With Point Dilation
[{'name': 'Ao Liang, Haiyang Hua, Jian Fang, Wenyu Chen, Huaici Zhao'}]
3D Object Detection 三维物体检测 v2
3D object detection
Point Dilation Mechanism
autonomous driving
Input: Point cloud data 点云数据
Step1: Efficient feature encoding using PointNet-style backbone 使用PointNet风格的骨干网进行高效特征编码
Step2: Point Dilation Mechanism (PDM) to expand feature space 使用点膨胀机制(PDM)扩展特征空间
Step3: Hybrid detection head for joint learning 设计混合检测头进行联合学习
Output: Enhanced 3D object detection results 改进的三维物体检测结果
9.5 [9.5] 2502.07840 TranSplat: Surface Embedding-guided 3D Gaussian Splatting for Transparent Object Manipulation
[{'name': 'Jeongyun Kim, Jeongho Noh, Dong-Guw Lee, Ayoung Kim'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
transparent object manipulation
depth completion
latent diffusion model
robotics
Input: RGB images and surface embeddings RGB图像和表面嵌入
Step1: Generate surface embeddings using a latent diffusion model 使用潜在扩散模型生成表面嵌入
Step2: Jointly optimize Gaussian splatting with RGB images and surface embeddings 与RGB图像和表面嵌入共同优化高斯点云
Step3: Render depth for object manipulation 渲染深度以进行物体操作
Output: Accurate depth completion for transparent objects 为透明物体提供准确的深度完成
9.5 [9.5] 2502.07869 EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera
[{'name': 'Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Alain Pagani, Didier Stricker, Christian Theobalt, Vladislav Golyanik'}]
3D Reconstruction and Modeling 三维重建 v2
3D human motion capture
event cameras
egocentric vision
Input: Monocular event camera with fisheye lens 单眼事件相机与鱼眼镜头
Step1: Data acquisition from event camera 数据采集
Step2: Integration of RGB and event data RGB与事件数据集成
Step3: Algorithm development for pose estimation 算法开发以估计姿势
Step4: Real-time processing and 3D reconstruction 实时处理与三维重建
Output: Accurate 3D human motion capture 精确的三维人类运动捕捉
9.5 [9.5] 2502.08169 CoDynTrust: Robust Asynchronous Collaborative Perception via Dynamic Feature Trust Modulus
[{'name': 'Yunjiang Xu, Lingzhi Li, Jin Wang, Benyuan Yang, Zhiwen Wu, Xinhong Chen, Jianping Wang'}]
3D Object Detection 三维物体检测 v2
3D detection 三维检测
autonomous driving 自动驾驶
collaborative perception 协同感知
Input: Sensor data from LiDAR and cameras 传感器数据来自LiDAR和相机
Step1: Evaluate dynamic feature trust modulus (DFTM) 评估动态特征信任模数 (DFTM)
Step2: Implement multi-scale fusion method 实现多尺度融合方法
Step3: Validate performance through extensive experiments 通过广泛实验验证性能
Output: Enhanced robustness in 3D object detection 提高三维物体检测的鲁棒性
9.5 [9.5] 2502.08285 Fully-Geometric Cross-Attention for Point Cloud Registration
[{'name': 'Weijie Wang, Guofeng Mei, Jian Zhang, Nicu Sebe, Bruno Lepri, Fabio Poiesi'}]
3D Reconstruction 三维重建 v2
Point Cloud Registration 点云配准
Geometric Attention 几何注意力
Transformer Network 变换网络
Input: Point clouds 输入: 点云
Step1: Cross-attention mechanism development 步骤1: 交叉注意力机制开发
Step2: Integration of Gromov-Wasserstein distance into attention 步骤2: 将Gromov-Wasserstein距离集成到注意力机制中
Step3: Point feature aggregation through self-attention 步骤3: 通过自注意力聚合点特征
Output: Enhanced point cloud registration results 输出: 改进的点云配准结果
9.5 [9.5] 2502.08352 Sat-DN: Implicit Surface Reconstruction from Multi-View Satellite Images with Depth and Normal Supervision
[{'name': 'Tianle Liu, Shuangming Zhao, Wanshou Jiang, Bingxuan Guo'}]
3D Reconstruction 三维重建 v2
3D reconstruction
satellite imagery
neural networks
Input: Multi-view satellite images 多视角卫星图像
Step1: Incorporate explicit depth guidance 引入显式深度指导
Step2: Apply surface normal consistency constraints 应用表面法线一致性约束
Step3: Utilize a multi-resolution hash grid for efficient reconstruction 使用多分辨率哈希网格进行高效重建
Output: Accurate 3D models from satellite images 从卫星图像获得精准的三维模型
8.5 [8.5] 2502.07829 Preference Alignment on Diffusion Model: A Comprehensive Survey for Image Generation and Editing
[{'name': 'Sihao Wu, Xiaonan Si, Chi Xing, Jianhong Wang, Gaojie Jin, Guangliang Cheng, Lijun Zhang, Xiaowei Huang'}]
Image Generation 图像生成 v2
diffusion models
image generation
preference alignment
autonomous driving
Input: Integration of preference alignment with diffusion models 偏好对齐与扩散模型的结合
Step1: Systematic review of optimization techniques 对优化技术进行系统回顾
Step2: Exploration of applications across various fields 在多个领域探索应用
Step3: Discussion of challenges in preference alignment 讨论偏好对齐中的挑战
Output: Insights for future innovation 未来创新的洞察
8.5 [8.5] 2502.08377 Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
[{'name': 'Liying Yang, Chen Liu, Zhenwei Zhu, Ajian Liu, Hui Ma, Jian Nong, Yanyan Liang'}]
3D Generation 三维生成 v2
4D generation
dynamic-static features
computer vision
Input: Video frames 视频帧
Step1: Feature extraction 特征提取
Step2: Dynamic-static feature decoupling 动态静态特征解耦
Step3: Temporal-spatial similarity fusion 时空相似性融合
Output: 4D content generation 4D内容生成
8.5 [8.5] 2502.08639 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
[{'name': 'Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, Tianfan Xue, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai'}]
Image and Video Generation 图像生成 v2
3D-aware
text-to-video generation
depth maps
camera trajectories
Input: User-defined scene parameters 用户定义的场景参数
Step1: Interactive workflow for 3D control 3D控制的交互工作流程
Step2: Condition signal construction 条件信号构建
Step3: Text-to-video generation from control signals 基于控制信号的文本生成视频
Output: Generated controllable video 输出: 生成的可控视频
8.0 [8.0] 2502.08374 AdvSwap: Covert Adversarial Perturbation with High Frequency Info-swapping for Autonomous Driving Perception
[{'name': 'Yuanhao Huang, Qinfan Zhang, Jiandong Xing, Mengyue Cheng, Haiyang Yu, Yilong Ren, Xiao Xiong'}]
Autonomous Driving 自动驾驶 v2
adversarial attack
autonomous driving
information swapping
Input: Autonomous vehicle images 自动驾驶车辆图像
Step1: Information swapping 信息交换
Step2: Adversarial sample generation 对抗样本生成
Step3: Evaluation on datasets 在数据集上评估
Output: Robust adversarial samples 稳健的对抗样本
7.5 [7.5] 2502.08646 Poly-Autoregressive Prediction for Modeling Interactions
[{'name': 'Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran, Shiry Ginosar, Jitendra Malik'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
autonomous vehicles
trajectory prediction
multi-agent interactions
behavior forecasting
Input: Ego agent's state history and states of other interacting agents 自我代理的状态历史和其他交互代理的状态
Step1: Model behavior as a sequence of tokens 将行为建模为标记序列
Step2: Use a transformer for prediction 使用变压器进行预测
Step3: Apply to different prediction tasks 应用到不同的预测任务
Output: Predicted future behavior of the ego agent 输出自我代理的未来行为预测
6.5 [6.5] 2502.07838 NanoVLMs: How small can we go and still make coherent Vision Language Models?
[{'name': 'Mukund Agarwalla, Himanshu Kumar, Raj Dandekar, Rajat Dandekar, Sreedath Panat'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
lightweight models
Input: Image-text pairs 图像-文本对
Step1: Dataset creation 数据集创建
Step2: Model training 模型训练
Step3: Evaluation using creative scoring 通过创意评分进行评估
Output: Lightweight vision-language models 轻量级视觉语言模型

Arxiv 2025-02-12

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.07140 Few-Shot Multi-Human Neural Rendering Using Geometry Constraints
[{'name': 'Qian li, Victoria Fern\`andez Abrevaya, Franck Multon, Adnane Boukhayma'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
multi-human scenes
neural rendering
Input: Sparse multi-view images 稀疏多视角图像
Step1: Geometry constraints using SMPL meshes 使用SMPL网格的几何约束
Step2: Regularize signed distances for optimization 通过正则化签名距离进行优化
Step3: Apply ray and saturation regularization 应用射线和饱和度正则化
Output: Accurate multi-human 3D reconstructions and renderings 准确的多人的三维重建和渲染
9.5 [9.5] 2502.07278 Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization
[{'name': 'Aditya Vora, Sauradip Nag, Hao Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D articulation
motion personalization
video diffusion
Input: Segmented mesh and text prompt 输入:分割网格和文本提示
Step1: Few-shot finetuning for category-specific motion generation 第一步:针对特定类别的运动生成进行少量样本微调
Step2: Multi-view rendering to generate personalized motion video 第二步:多视角渲染生成个性化运动视频
Step3: Differentiable rendering for transferring motion to the 3D object 第三步:可微渲染将运动转移到三维对象
Output: Articulated 3D object with realistic motion 输出:具有真实运动的关节三维对象
9.5 [9.5] 2502.07289 Learning Inverse Laplacian Pyramid for Progressive Depth Completion
[{'name': 'Kun Wang, Zhiqiang Yan, Junkai Fan, Jun Li, Jian Yang'}]
Depth Estimation 深度估计 v2
depth completion
3D reconstruction
state-of-the-art
Input: Sparse depth measurements and corresponding color image 稀疏深度测量和相应的彩色图像
Step 1: Initial low-resolution depth prediction 初步低分辨率深度预测
Step 2: Multi-path feature extraction via MFP module 通过MFP模块进行多路径特征提取
Step 3: Depth map refinement through upsampling and selective filtering 通过上采样和选择性过滤进行深度图优化
Output: Dense depth map 稠密深度图
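
The inverse-Laplacian-pyramid idea in this pipeline: predict a coarse depth map, then repeatedly upsample and add a per-level detail residual. In the sketch below the residuals are taken from the ground truth so the reconstruction is exact; the paper instead predicts them from multi-path image features.

```python
import numpy as np

def upsample2x(d):
    """Nearest-neighbor 2x upsampling (a stand-in for learned upsampling)."""
    return d.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(depth, levels=3):
    """Downsample by 2x2 averaging; keep the detail lost at each level."""
    pyr, cur = [], depth
    for _ in range(levels):
        low = cur.reshape(cur.shape[0] // 2, 2, cur.shape[1] // 2, 2).mean((1, 3))
        pyr.append(cur - upsample2x(low))  # Laplacian detail at this level
        cur = low
    return cur, pyr[::-1]                  # coarse base + details, coarse-first

rng = np.random.default_rng(6)
gt = rng.uniform(1, 10, (64, 64))
base, details = build_pyramid(gt)
pred = base                                # Step 1: low-resolution prediction
for det in details:                        # Steps 2-3: refine level by level
    pred = upsample2x(pred) + det
print("reconstruction error:", np.abs(pred - gt).max())
```
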
9.5 [9.5] 2502.07309 Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
[{'name': 'Xiang Li, Pengfei Li, Yupeng Zheng, Wei Sun, Yan Wang, Yilun Chen'}]
3D Reconstruction and Modeling 三维重建 v2
3D occupancy modeling 3D占用建模
autonomous driving 自动驾驶
scene understanding 场景理解
Input: Multi-view images 多视角图像
Step1: Self-supervised pre-training with 2D labels 使用2D标签进行自监督预训练
Step2: Fully-supervised fine-tuning with 3D occupancy labels 使用3D占用标签进行全监督微调
Step3: State-conditioned forecasting module for future occupancy 未来占用状态条件预测模块
Output: 3D occupancy predictions 3D占用预测
9.5 [9.5] 2502.07403 Extended monocular 3D imaging
[{'name': 'Zicheng Shen, Feng Zhao, Yibo Ni, Yuanmu Yang'}]
3D Reconstruction and Modeling 三维重建 v2
3D imaging 3D成像
monocular vision 单目视觉
depth estimation 深度估计
material identification 材料识别
Input: Monocular camera with diffractive-refractive hybrid lens 使用具备衍射-折射混合透镜的单目相机
Step1: Multi-stage fusion of depth cues 深度线索的多级融合
Step2: Snapshot acquisition of 3D point cloud 3D点云的快照获取
Step3: Accurate 3D reconstruction 精确的3D重建
Output: Enhanced 3D imaging capabilities 改进的3D成像能力
9.5 [9.5] 2502.07505 Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds
[{'name': 'Lisa Weijler, Pedro Hermosilla'}]
Point Cloud Processing 点云处理 v2
3D point clouds 3D点云
equivariance 等变性
Input: 3D point clouds 3D点云
Step1: Define Local Reference Frame (LRF) 定义局部参考框架
Step2: Implement continuous SE(3) equivariant convolution 实现连续SE(3)等变卷积
Step3: Train the model with stochastically sampled frames 用随机采样的框架训练模型
Output: Local rotation equivariant features 输出局部旋转等变特征
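
Step1's Local Reference Frame can be built from the principal axes of each point's neighborhood, so a global rotation rotates the frames together with the data. A NumPy sketch via neighborhood PCA; sign disambiguation and the paper's stochastic frame sampling are left out.

```python
import numpy as np

def local_reference_frames(points, k=16):
    """Return one 3x3 orthonormal frame per point via neighborhood PCA."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, 1:k + 1]          # k nearest neighbors
    frames = np.empty((len(points), 3, 3))
    for i, idx in enumerate(knn):
        nb = points[idx] - points[idx].mean(0)
        _, _, vt = np.linalg.svd(nb, full_matrices=False)
        frames[i] = vt                                # rows: principal axes
    return frames

rng = np.random.default_rng(7)
pts = rng.normal(size=(128, 3))
lrf = local_reference_frames(pts)
# Orthonormality is what lets later convolutions work in local coordinates.
print("orthonormal:", np.allclose(lrf[0] @ lrf[0].T, np.eye(3), atol=1e-6))
```
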
9.5 [9.5] 2502.07615 Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
[{'name': 'Lin-Zhuo Chen, Kangjie Liu, Youtian Lin, Siyu Zhu, Zhihao Li, Xun Cao, Yao Yao'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
mesh reconstruction
geometry reconstruction
Input: 3D Gaussian Splatting images 3D高斯点云图像
Step1: Incorporate pre-trained matching prior 引入预训练匹配先验
Step2: Implement Flow Distillation Sampling 实现流蒸馏采样
Step3: Target unobserved views 针对未观测的视图
Output: Enhanced geometric reconstruction 改进的几何重建
9.5 [9.5] 2502.07685 Matrix3D: Large Photogrammetry Model All-in-One
[{'name': 'Yuanxun Lu, Jingyang Zhang, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao, Shiwei Li'}]
3D Reconstruction 三维重建 v2
3D reconstruction
photogrammetry
depth estimation
pose estimation
novel view synthesis
Input: Multi-modal data (images, camera parameters, depth maps) 图像、相机参数和深度图的多模态数据
Step 1: Masked input learning 掩码输入学习
Step 2: Pose estimation 姿态估计
Step 3: Depth prediction 深度预测
Step 4: Novel view synthesis 新视图合成
Output: Comprehensive 3D model 综合三维模型
9.0 [9.0] 2502.07030 PrismAvatar: Real-time animated 3D neural head avatars on edge devices
[{'name': 'Prashant Raina, Felix Taubner, Mathieu Tuli, Eu Wern Teh, Kevin Ferreira'}]
3D Reconstruction and Modeling 三维重建 v2
3D avatar
neural rendering
real-time animation
head modeling
Input: Series of matted images of a head 头部图像序列
Step1: Data acquisition and tracking 数据采集与跟踪
Step2: Train hybrid mesh-volumetric model 训练混合网格-体积模型
Step3: Distillation into rigged mesh and neural textures 蒸馏成具有骨架的网格和神经纹理
Output: Real-time animated 3D head avatar 实时动画3D头像
8.5 [8.5] 2502.06843 Vision-Integrated LLMs for Autonomous Driving Assistance : Human Performance Comparison and Trust Evaluation
[{'name': 'Namhee Kim, Woojin Park'}]
Autonomous Driving 自动驾驶 v2
autonomous driving
large language models
computer vision
Input: Visual inputs and scenarios 视觉输入与场景
Step1: Feature extraction using YOLOv4 and ViT 使用YOLOv4和ViT进行特征提取
Step2: Integration with LLM for reasoning 与LLM结合进行推理
Step3: Generation of situation descriptions and responses 生成情境描述和适当反应
Output: Improved autonomous driving assistance system 改进的自动驾驶辅助系统
8.5 [8.5] 2502.06957 GAS: Generative Avatar Synthesis from a Single Image
[{'name': 'Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, Fernando De la Torre'}]
3D Reconstruction and Modeling 三维重建与建模 v2
avatar generation
3D reconstruction
diffusion models
Input: A single image 单幅图像
Step1: 3D human reconstruction 人体三维重建
Step2: Dense driving signal generation 生成密集驱动信号
Step3: Video diffusion model application 应用视频扩散模型
Output: View-consistent and temporally coherent avatars 输出:视图一致且时间连贯的头像
8.5 [8.5] 2502.07001 From Image to Video: An Empirical Study of Diffusion Representations
[{'name': "Pedro V\'elez, Luisa F. Polan\'ia, Yi Yang, Chuhan Zhang, Rishab Kabra, Anurag Arnab, Mehdi S. M. Sajjadi"}]
Image and Video Generation 图像生成与视频生成 v2
diffusion models
video synthesis
image generation
depth estimation
Input: Video and image diffusion models 视频与图像扩散模型
Step1: Model architecture comparison 模型架构比较
Step2: Performance analysis of latent representations 潜在表示性能分析
Step3: Feature extraction and qualitative analysis 特征提取与定性分析
Output: Insights into representations and performance 表示与性能的见解
8.5 [8.5] 2502.07007 Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC
[{'name': 'Siwei Meng, Yawei Luo, Ping Liu'}]
Image and Video Generation 图像生成与视频生成 v2
3D generation
physics priors
AI-generated content
physical realism
Input: Generative models 生成模型
Step1: Review of physics-aware methods 物理感知方法的回顾
Step2: Categorization of generation techniques 生成技术的分类
Step3: Comparative analysis 比较分析
Output: Insights for future research 未来研究的洞见
8.5 [8.5] 2502.07120 Is Long Range Sequential Modeling Necessary For Colorectal Tumor Segmentation?
[{'name': 'Abhishek Srivastava, Koushik Biswas, Gorkem Durak, Gulsah Ozden, Mustafa Adli, Ulas Bagci'}]
3D Segmentation and Reconstruction 3D分割与重建 v2
3D segmentation
tumor segmentation
colorectal cancer
Input: 3D medical images 3D医学影像
Step 1: Evaluate long-range and local token modeling mechanisms 评估长范围和局部标记建模机制
Step 2: Propose MambaOutUNet for tumor segmentation 提出MambaOutUNet用于肿瘤分割
Step 3: Analyze performance on the CTS-204 dataset 在CTS-204数据集上分析性能
Output: Comparative results on tumor segmentation techniques 输出:肿瘤分割技术的比较结果
8.5 [8.5] 2502.07145 Mesh2SSM++: A Probabilistic Framework for Unsupervised Learning of Statistical Shape Model of Anatomies from Surface Meshes
[{'name': 'Krithika Iyer, Mokshagna Sai Teja Karanam, Shireen Elhabian'}]
3D Reconstruction and Modeling 三维重建与建模 v2
Statistical Shape Modeling
Surface Meshes
Unsupervised Learning
Input: Surface meshes 表面网格
Step1: Estimate correspondences from meshes 估计来自网格的对应关系
Step2: Develop probabilistic shape model 开发概率形状模型
Step3: Evaluate model performance 评估模型性能
Output: Statistical shape model 统计形状模型
8.5 [8.5] 2502.07194 Dense Object Detection Based on De-homogenized Queries
[{'name': 'Yueming Huang, Chenrui Ma, Hao Zhou, Hao Wu, Guowu Yuan'}]
Autonomous Driving 自动驾驶 v2
dense object detection
autonomous driving
DETR
deep learning
computer vision
Input: Dense object detection scenario 密集目标检测场景
Step1: Identify issues with existing NMS methods 识别现有NMS方法的问题
Step2: Propose differentiated encoding for queries 提出差异化编码以应对查询
Step3: Implement joint loss for better query initialization 实施联合损失以更好地初始化查询
Output: Enhanced dense object detection framework 改进的密集目标检测框架
8.5 [8.5] 2502.07372 USRNet: Unified Scene Recovery Network for Enhancing Traffic Imaging under Multiple Adverse Weather Conditions
[{'name': 'Yuxu Lu, Ai Chen, Dong Yang, Ryan Wen Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
autonomous driving
image restoration
Input: Degraded images 退化图像
Step1: Feature extraction 特征提取
Step2: Scene restoration 场景恢复
Step3: Edge feature extraction 边缘特征提取
Output: Enhanced image quality 改进的图像质量
8.5 [8.5] 2502.07417 Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving
[{'name': 'Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Wen-Kai Kuo, Jun-Wei Hsieh'}]
Autonomous Driving 自动驾驶 v2
Object Detection 目标检测
Autonomous Driving 自动驾驶
Vision Transformer 视觉变换器
Input: Driving scene images 驾驶场景图像
Step1: Analyze backbone architectures 分析主干架构
Step2: Develop reparameterized attention vision transformer 开发重参数化注意力视觉变换器
Step3: Integrate multi-scale feature extraction 集成多尺度特征提取
Step4: Model evaluation 模型评估
Output: High-performance object detection model 高性能目标检测模型
8.5 [8.5] 2502.07486 Automated Road Extraction and Centreline Fitting in LiDAR Point Clouds
[{'name': 'Xinyu Wang, Muhammad Ibrahim, Atif Mansoor, Hasnein Tareque, Ajmal Mian'}]
3D Reconstruction and Modeling 三维重建 v2
road extraction
3D point clouds
LiDAR
Input: 3D LiDAR point clouds 3D LiDAR点云
Step 1: Statistical outlier removal 统计离群值去除
Step 2: Density-based clustering 基于密度的聚类
Step 3: Ground point filtering using grid-based segmentation 使用基于网格的分割进行地面点过滤
Step 4: 2D projection and skeletonization 2D投影和骨架化
Step 5: Back-projection onto 3D point cloud 反投影到3D点云
Output: Refined road points and centreline 提炼的道路点和中心线
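
Step 3's grid-based ground filtering has a compact classical form: bin points into an XY grid and keep those near their cell's minimum height. A sketch with assumed cell size and height threshold:

```python
import numpy as np

def ground_filter(points, cell=1.0, height_thresh=0.2):
    """Return a boolean mask of likely ground points in an Nx3 cloud."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                           # shift to non-negative ids
    flat = ij[:, 0] * (ij[:, 1].max() + 1) + ij[:, 1]
    z_min = np.full(flat.max() + 1, np.inf)
    np.minimum.at(z_min, flat, points[:, 2])       # per-cell minimum height
    return points[:, 2] - z_min[flat] < height_thresh

rng = np.random.default_rng(8)
ground = np.column_stack([rng.uniform(0, 50, (5000, 2)), rng.normal(0, 0.05, 5000)])
poles = np.column_stack([rng.uniform(0, 50, (500, 2)), rng.uniform(1, 5, 500)])
cloud = np.vstack([ground, poles])
mask = ground_filter(cloud)
print(f"ground kept: {mask[:5000].mean():.2%}, poles kept: {mask[5000:].mean():.2%}")
```
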
8.5 [8.5] 2502.07631 Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving
[{'name': 'Yinzhe Shen, \"Omer \c{S}ahin Ta\c{s}, Kaiwen Wang, Royden Wagner, Christoph Stiller'}]
Autonomous Driving 自动驾驶 v2
autonomous driving
motion learning
semantic learning
Input: Camera data 摄像头数据
Step1: Motion and semantic task separation 运动与语义任务分离
Step2: Neural-Bayes motion decoder 神经贝叶斯运动解码器
Step3: Interactive semantic decoder 交互式语义解码器
Output: Improved detection and tracking 改进的检测与跟踪
8.5 [8.5] 2502.07680 Multiview Point Cloud Registration Based on Minimum Potential Energy for Free-Form Blade Measurement
[{'name': 'Zijie Wu, Yaonan Wang, Yang Mo, Qing Zhu, He Xie, Haotian Wu, Mingtao Feng, Ajmal Mian'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
point cloud registration
noise resistance
industrial measurement
Input: Point cloud data 点云数据
Step1: Definition of objective function 目标函数定义
Step2: Global optimization procedure 全局优化过程
Step3: Fine registration using trimmed ICP 精细配准,使用修剪的ICP算法
Output: Registered point clouds 注册后的点云
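
The trimmed ICP in Step3 gains its noise resistance by discarding the worst-matching fraction of correspondences before each rigid fit. A minimal single-pair sketch; the paper's multiview, minimum-potential-energy formulation is not reproduced here.

```python
import numpy as np

def trimmed_icp_step(src, dst, trim=0.8):
    """One ICP iteration using only the best `trim` fraction of matches."""
    d = np.linalg.norm(src[:, None] - dst[None, :], axis=-1)
    nn = d.argmin(axis=1)                        # nearest neighbor in dst
    resid = d[np.arange(len(src)), nn]
    keep = np.argsort(resid)[: int(trim * len(src))]  # drop worst matches
    p, q = src[keep], dst[nn[keep]]
    mu_p, mu_q = p.mean(0), q.mean(0)
    U, _, Vt = np.linalg.svd((p - mu_p).T @ (q - mu_q))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return src @ R.T + (mu_q - R @ mu_p)

rng = np.random.default_rng(9)
dst = rng.normal(size=(300, 3))
src = dst + np.array([0.15, -0.05, 0.1])         # misaligned copy
src[:30] += 3.0                                  # noise the trimming rejects
for _ in range(20):
    src = trimmed_icp_step(src, dst)
print("inlier error:", np.linalg.norm(src[30:] - dst[30:], axis=1).mean())
```
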
8.5 [8.5] 2502.07785 Pippo: High-Resolution Multi-View Humans from a Single Image
[{'name': 'Yash Kant, Ethan Weber, Jin Kyu Kim, Rawal Khirodkar, Su Zhaoen, Julieta Martinez, Igor Gilitschenski, Shunsuke Saito, Timur Bagautdinov'}]
3D Generation 三维生成 v2
3D consistency
multi-view generation
video generation
Input: Single image of a person 一个人的单张图像
Step1: Pre-training on human images 人体图像的预训练
Step2: Multi-view mid-training 多视角中期训练
Step3: Post-training with pixel-aligned controls 像素对齐控制的后期训练
Output: 1K resolution multi-view consistent images 1K分辨率的多视角一致图像
8.0 [8.0] 2502.07508 Enhance-A-Video: Better Generated Video for Free
[{'name': 'Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
temporal consistency
DiT-based models
Input: DiT-based video generation models 基于DiT的视频生成模型
Step1: Analyze temporal attention analysis 时序注意力分析
Step2: Introduce cross-frame intensity parameters 引入跨帧强度参数
Step3: Enhance video quality through adjusted dependencies 调整依赖关系以增强视频质量
Output: Enhanced video generation quality 提升的视频生成质量
8.0 [8.0] 2502.07564 An Elliptic Curve Based Solution to the Perspective-Three-Point Problem
[{'name': 'Michael Q. Rieck'}]
Computer Vision and Pose Estimation 计算机视觉与位姿估计 v2
P3P
camera pose
elliptic curves
Input: Control points 控制点
Step1: Determine directions of lines 计算直线方向
Step2: Develop P3P solver 开发P3P求解器
Step3: Compare with linear solvers 与线性求解器比较
Output: Accurate camera poses 准确的相机位姿
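For orientation, this is the contract any P3P solver fills: three 3D-2D correspondences in, up to four candidate poses out. A hedged example using OpenCV's existing solveP3P as a stand-in (the paper's elliptic-curve solver is not shown in this summary); all numeric values are illustrative:

```python
# P3P via OpenCV: three control points, their projections, and intrinsics
# yield up to four (rvec, tvec) candidates.
import numpy as np
import cv2

obj = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=np.float32)  # control points
img = np.array([[320, 240], [400, 240], [320, 160]], dtype=np.float32)
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float32)

n, rvecs, tvecs = cv2.solveP3P(obj, img, K, np.zeros(4), flags=cv2.SOLVEPNP_P3P)
# n candidate (rvec, tvec) pairs; a fourth point disambiguates in practice.
```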
7.5 [7.5] 2502.07306 TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation
[{'name': 'Navid Rajabi, Jana Kosecka'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Navigation
modular approach
navigation instruction
Input: Navigation instruction and environment map 导航指令和环境地图
Step1: Extract landmarks using LLM 使用LLM提取地标
Step2: Retrieve top-k locations using shortest path algorithm 检索前k个位置,使用最短路径算法
Step3: Compute alignment score with dynamic programming 使用动态规划计算对齐评分
Output: Path fidelity evaluated with the nDTW metric 输出:使用nDTW指标评估的路径保真度
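The nDTW metric named in the output row is the standard VLN path-fidelity score: a dynamic-programming DTW distance between predicted and reference paths, exponentially normalized by reference length and a success threshold. A minimal sketch, assuming the commonly used 3 m threshold:

```python
# nDTW = exp(-DTW(pred, ref) / (|ref| * d_th)), DTW via classic DP.
import numpy as np

def ndtw(pred, ref, d_th=3.0):
    """pred: (N,2or3) predicted path, ref: (M,2or3) reference path."""
    n, m = len(pred), len(ref)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(pred[i - 1] - ref[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.exp(-D[n, m] / (m * d_th)))
```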
7.5 [7.5] 2502.07617 Scaling Pre-training to One Hundred Billion Data for Vision Language Models
[{'name': 'Xiao Wang, Ibrahim Alabdulmohsin, Daniel Salz, Zhe Li, Keran Rong, Xiaohua Zhai'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
cultural diversity
multilinguality
Input: 100 billion image-text pairs 1000亿图像-文本对
Step1: Empirical investigation 实证研究
Step2: Performance analysis 性能分析
Step3: Cultural diversity assessment 文化多样性评估
Output: Insights on VLM performance 视觉语言模型性能见解
7.5 [7.5] 2502.07701 Magic 1-For-1: Generating One Minute Video Clips within One Minute
[{'name': 'Hongwei Yi, Shitong Shao, Tian Ye, Jiantong Zhao, Qingyu Yin, Michael Lingelbach, Li Yuan, Yonghong Tian, Enze Xie, Daquan Zhou'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
diffusion models
text-to-image
image-to-video
Input: Text and video data 文本和视频数据
Step1: Task factorization 任务分解
Step2: Generative prior injection 生成先验注入
Step3: Model optimization 模型优化
Output: Efficient video clips 生成高效视频片段
7.5 [7.5] 2502.07737 Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
[{'name': 'Shuhuai Ren, Shuming Ma, Xu Sun, Furu Wei'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
semi-autoregressive modeling
Input: Video data 视频数据
Step1: Block decomposition 块分解
Step2: Semi-autoregressive generation 半自回归生成
Step3: Bidirectional attention application 双向注意力应用
Output: Generated video frames 生成的视频帧

Arxiv 2025-02-11

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.05222 VistaFlow: Photorealistic Volumetric Reconstruction with Dynamic Resolution Management via Q-Learning
[{'name': 'Jayram Palamadai, William Yu'}]
3D Reconstruction and Modeling 三维重建 v2
3D volumetric reconstruction 3D体积重建
dynamic resolution management 动态分辨率管理
photorealistic rendering 照相真实渲染
Input: 2D photographs 二维照片
Step1: Image conversion to PlenOctree data structure 图像转换为PlenOctree数据结构
Step2: Dynamic resolution management using QuiQ 动态分辨率管理使用QuiQ
Step3: Synthesizing novel viewpoints using differentiable rendering 合成新视角使用可微渲染
Output: Interactive 3D volumetric images 互动三维体积图像
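QuiQ's Q-learning is summarized only at a high level; the underlying update is the classic tabular temporal-difference rule. A toy sketch with hypothetical states (framerate buckets), actions (resolution levels), reward, and a stand-in `simulate` hook:

```python
# Tabular Q-learning for resolution control (all wiring hypothetical).
import numpy as np

n_states, n_actions = 10, 4            # e.g. framerate buckets / resolution levels
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1      # learning rate, discount, exploration
rng = np.random.default_rng(0)

def simulate(state, action):
    """Hypothetical environment hook: a real system would render a frame
    at the chosen resolution and measure framerate/quality here."""
    return (state + 1) % n_states, -abs(action - 2) * 0.1

state = 0
for _ in range(1000):
    # epsilon-greedy choice of resolution level
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = simulate(state, action)
    # classic temporal-difference update
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```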
9.5 [9.5] 2502.05378 NextBestPath: Efficient 3D Mapping of Unseen Environments
[{'name': "Shiyao Li, Antoine Gu\'edon, Cl\'ementin Boittiaux, Shizhe Chen, Vincent Lepetit"}]
3D Mapping and Reconstruction 3D映射与重建 v2
3D mapping
active mapping
robotics
Input: Unseen indoor environments 未知室内环境
Step1: Create and benchmark on a new dataset (AiMDoom) 创建新数据集 (AiMDoom) 并进行基准测试
Step2: Develop the next-best-path method (NBP) 开发下一最佳路径方法 (NBP)
Step3: Plan and optimize trajectory for active mapping 规划和优化主动映射的轨迹
Output: Efficiently reconstructed 3D models 有效重建的三维模型
9.5 [9.5] 2502.05859 SphereFusion: Efficient Panorama Depth Estimation via Gated Fusion
[{'name': 'Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng'}]
Depth Estimation 深度估计 v2
panorama depth estimation
3D reconstruction
autonomous driving
Input: Panorama images 全景图像
Step1: Feature extraction 特征提取
Step2: Feature fusion 特征融合
Step3: Depth estimation 深度估计
Output: Depth map and point cloud 深度图和点云
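The output row pairs a depth map with a point cloud; for an equirectangular panorama (assumed layout) the lift from one to the other is a fixed trigonometric mapping:

```python
# Lift an equirectangular panorama depth map to a 3D point cloud.
import numpy as np

def panorama_to_points(depth):
    """depth: (H, W) per-pixel range for an equirectangular panorama."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    lon = (u / W - 0.5) * 2 * np.pi          # longitude in [-pi, pi)
    lat = (0.5 - v / H) * np.pi              # latitude in [-pi/2, pi/2]
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    return (dirs * depth[..., None]).reshape(-1, 3)
```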
9.5 [9.5] 2502.05874 MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
[{'name': 'Zhifei Yang, Keyang Lu, Chao Zhang, Jiaxing Qi, Hanqi Jiang, Ruifei Ma, Shenglin Yin, Yifan Xu, Mingzhe Xing, Zhen Xiao, Jieyi Long, Xiangde Liu, Guangyao Zhai'}]
3D Generation 三维生成 v2
3D scene generation
geometry control
mixed-modality graph
Input: Mixed-Modality Graph combining textual and visual modalities
Step1: Process user inputs involving text, image, or both
Step2: Visual enhancement module constructs visual representations
Step3: Relation predictor infers relationships between nodes
Output: Generated 3D indoor scenes with controllable geometry
9.5 [9.5] 2502.06336 DefTransNet: A Transformer-based Method for Non-Rigid Point Cloud Registration in the Simulation of Soft Tissue Deformation
[{'name': 'Sara Monji-Azad, Marvin Kinz, Siddharth Kothari, Robin Khanna, Amrei Carla Mihan, David Maennel, Claudia Scherl, Juergen Hesser'}]
Point Cloud Processing 点云处理 v2
3D reconstruction
point cloud registration
Transformers
Input: Source and target point clouds 源点云和目标点云
Step1: Feature descriptor design 特征描述符设计
Step2: Learning displacement vector fields 学习位移向量场
Output: Enhanced point cloud registration 改进的点云配准
9.5 [9.5] 2502.06338 Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior
[{'name': 'Lee Hyoseok, Kyeong Seon Kim, Kwon Byung-Ki, Tae-Hyun Oh'}]
Depth Estimation 深度估计 v2
depth completion
3D reconstruction
zero-shot learning
Input: Sparse depth measurements and RGB images 输入:稀疏深度测量与RGB图像
Step1: Alignment of depth prior with sparse measurements 步骤1:将深度先验与稀疏测量对齐
Step2: Optimization loop at test-time to enforce constraints 步骤2:在测试时进行优化循环以强制约束
Step3: Depth map completion based on aligned prior 步骤3:基于对齐的先验完成深度图
Output: Complete dense depth map 输出:完整的密集深度图
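Step 1's alignment of an affine-invariant prior to sparse metric depth reduces, in its simplest form, to fitting a global scale and shift by least squares; the paper's test-time optimization loop presumably generalizes this. A minimal sketch:

```python
# Closed-form scale/shift alignment of a relative depth prior to sparse
# metric measurements.
import numpy as np

def align_depth(prior, sparse, mask):
    """prior: (H,W) affine-invariant depth; sparse: (H,W) metric depth,
    valid where mask is True. Returns a metrically aligned dense depth."""
    x, y = prior[mask].ravel(), sparse[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return scale * prior + shift
```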
9.5 [9.5] 2502.06367 FOCUS - Multi-View Foot Reconstruction From Synthetically Trained Dense Correspondences
[{'name': 'Oliver Boyne, Roberto Cipolla'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
multi-view reconstruction
foot model
structure-from-motion
dense correspondences
Input: Multi-view RGB images 多视角RGB图像
Step1: Dataset extension 数据集扩展
Step2: Dense correspondence prediction 密集对应关系预测
Step3: 3D surface reconstruction via SfM and optimization 通过SfM和优化进行3D表面重建
Output: 3D mesh model 输出: 3D网格模型
9.5 [9.5] 2502.06608 TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
[{'name': 'Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, Yan-Pei Cao'}]
3D Generation 三维生成 v2
3D Generation
Shape Diffusion
High-Fidelity 3D Models
Input: Images 输入: 图像
Step1: Data processing 数据处理
Step2: Shape generation 形状生成
Step3: Model evaluation 模型评估
Output: High-fidelity 3D meshes 输出: 高保真3D网格
9.5 [9.5] 2502.06682 Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene
[{'name': 'Tai-Yu Pan, Sooyoung Jeon, Mengdi Fan, Jinsu Yoo, Zhenyang Feng, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao'}]
3D Generation 三维生成 v2
3D generation
collaborative perception
autonomous driving
point cloud generation
Input: Ego-car sensory data 车载传感器数据
Step 1: Data integration 数据集成
Step 2: Conditioned diffusion model training 条件扩散模型训练
Step 3: Generate realistic point clouds 生成真实的点云
Output: Collaborative perception data 协同感知数据
9.2 [9.2] 2502.05769 Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
[{'name': 'Kyle Gao, Dening Lu, Liangzhi Li, Nan Chen, Hongjie He, Linlin Xu, Jonathan Li'}]
3D Modeling 三维建模 v2
3D modeling
Gaussian Splatting
urban digital twin
GIS integration
Large Language Models
Input: Building's address, postal code, or geographic coordinates
Step1: Integrate with Google Maps Platform APIs
Step2: Perform Gaussian Splatting-based mesh extraction
Step3: Retrieve 3D models and visual descriptions
Output: Digital twin of the building with 3D models and layers of data
8.5 [8.5] 2502.05409 Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment
[{'name': 'Maneesha Wickramasuriya, Beomyeol Yu, Taeyoung Lee, Murray Snyder'}]
3D Simulation and Modeling 三维仿真与建模 v2
3D simulation
pose estimation
UAV
Gaussian splatting
Input: Monocular images from UAV 无人机采集的单目图像
Step1: Data integration and simulation 数据集成与仿真
Step2: Deep pose estimation algorithm development 深度姿态估计算法开发
Step3: Indoor testing and validation 室内测试与验证
Output: Accurate pose estimation for UAV relative to the vessel 输出:无人机相对于船只的准确姿态估计
8.5 [8.5] 2502.05779 A 3D Multimodal Feature for Infrastructure Anomaly Detection
[{'name': 'Yixiong Jing, Wei Lin, Brian Sheil, Sinan Acikgoz'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
anomaly detection
point clouds
crack detection
Input: Point clouds and multimodal features 点云和多模态特征
Step1: Feature extraction 特征提取
Step2: Integration with PatchCore algorithm 集成至PatchCore算法
Step3: Evaluation with statistical methods 使用统计方法进行评估
Output: Enhanced defect detection results 改进的缺陷检测结果
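PatchCore, named in Step 2, scores anomalies by nearest-neighbour distance to a memory bank of features from normal data. A minimal sketch with stand-in random features (the paper's 3D multimodal descriptor would be computed upstream):

```python
# PatchCore-style scoring: distance to the nearest nominal feature.
import numpy as np
from sklearn.neighbors import NearestNeighbors

bank = np.random.rand(5000, 128)   # stand-in memory bank of normal features
test = np.random.rand(200, 128)    # stand-in features from a test scan

nn = NearestNeighbors(n_neighbors=1).fit(bank)
scores, _ = nn.kneighbors(test)    # (200, 1) nearest-neighbour distances
anomalies = scores.ravel() > np.percentile(scores, 99)  # illustrative threshold
```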
8.5 [8.5] 2502.05964 Revisiting Gradient-based Uncertainty for Monocular Depth Estimation
[{'name': 'Julia Hornauer, Amir El-Ghoussani, Vasileios Belagiannis'}]
Depth Estimation 深度估计 v2
Monocular Depth Estimation 单目深度估计
Uncertainty Estimation 不确定性估计
Input: Monocular images 单目图像
Step1: Gradient extraction using auxiliary loss 梯度提取与辅助损失
Step2: Uncertainty score calculation 不确定性评分计算
Output: Depth predictions and uncertainty scores 深度预测与不确定性评分
8.5 [8.5] 2502.06019 Noise is an Efficient Learner for Zero-Shot Vision-Language Models
[{'name': 'Raza Imam, Asif Hanif, Jian Zhang, Khaled Waleed Dawoud, Yova Kementchedjhieva, Mohammad Yaqub'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language models
noise adaptation
test-time adaptation
Input: Visual representations 视觉表征
Step1: Test-time adaptation 测试时适应
Step2: Learnable noise optimization 可学习噪声优化
Step3: Inter-view representation alignment 视图间表征对齐
Output: Enhanced VLM performance 改进的视觉语言模型性能
8.5 [8.5] 2502.06219 Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing
[{'name': 'Sicen Guo, Tianyou Wen, Chuang-Wei Liu, Qijun Chen, Rui Fan'}]
3D Reconstruction and Modeling 三维重建 v2
RGB-D driving scene parsing
Heterogeneous Feature Integration Transformer
Vision Foundation Models
Input: RGB and depth data RGB和深度数据
Step1: Relative depth estimation 进行相对深度估计
Step2: Heterogeneous Feature Integration Transformer (HFIT) development 开发异构特征集成变换器 (HFIT)
Step3: Feature integration and evaluation 特征集成与评估
Output: Enhanced driving scene parsing model 改进的驾驶场景解析模型
8.5 [8.5] 2502.06337 Accelerating Outlier-robust Rotation Estimation by Stereographic Projection
[{'name': 'Taosi Xu, Yinlong Liu, Xianbo Wang, Zhi-Xin Yang'}]
3D Reconstruction and Modeling 三维重建 v2
Rotation Estimation
Outlier Robustness
Stereographic Projection
Point Cloud Registration
Input: 3D point sets from different views 3D 点集来自不同视角
Step1: Investigate geometric constraints 调查几何约束
Step2: Use stereographic projection for rotation axis estimation 使用立体投影进行旋转轴估计
Step3: Implement spatial voting for axis identification 实施空间投票以识别轴
Output: Optimal rotation estimation 最优旋转估计
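Step 2's stereographic projection maps candidate rotation axes (unit vectors) to a plane where voting and clustering become cheap. The generic construction, projecting from the south pole onto the plane z = 0 (the paper's full solver builds on this but is not reproduced here):

```python
# Stereographic projection of unit vectors from the south pole (0,0,-1).
import numpy as np

def stereographic(axes):
    """axes: (N,3) unit vectors; degenerate near z = -1."""
    x, y, z = axes[:, 0], axes[:, 1], axes[:, 2]
    return np.stack([x / (1 + z), y / (1 + z)], axis=1)
```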
8.5 [8.5] 2502.06392 TANGLED: Generating 3D Hair Strands from Images with Arbitrary Styles and Viewpoints
[{'name': 'Pengyu Long, Zijun Zhao, Min Ouyang, Qingcheng Zhao, Qixuan Zhang, Wei Yang, Lan Xu, Jingyi Yu'}]
3D Generation 三维生成 v2
3D hair generation
diffusion models
multi-view input
Input: Multi-view linearts and images 多视角线稿和图像
Step 1: Collecting and annotating diverse hairstyle dataset 收集和标注多样的发型数据集
Step 2: Implementing a latent diffusion model with cross-attention 采用具有跨注意力的潜在扩散模型
Step 3: Applying parametric post-processing to enforce structural constraints 应用参数后处理以强制执行结构约束
Output: High-quality 3D hair strands 高质量三维发丝
8.5 [8.5] 2502.06543 Unsupervised Learning for Feature Extraction and Temporal Alignment of 3D+t Point Clouds of Zebrafish Embryos
[{'name': 'Zhu Chen, Ina Laube, Johannes Stegmaier'}]
3D Reconstruction 三维重建 v2
3D+t point clouds
temporal alignment
unsupervised learning
Input: 3D+t point clouds of zebrafish embryos 3D+t 点云
Step1: Feature extraction using autoencoder 特征提取通过自编码器
Step2: Temporal alignment using regression network 时间对齐通过回归网络
Output: Aligned time frames of 3D+t point clouds 对齐的3D+t点云时间帧
8.5 [8.5] 2502.06782 Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
[{'name': 'Dongyang Liu, Shicheng Li, Yutong Liu, Zhen Li, Kai Wang, Xinyue Li, Qi Qin, Yufei Liu, Yi Xin, Zhongyu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li, Peng Gao'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
Diffusion Transformers
Input: Video generation task 视频生成任务
Step1: Implement Multi-scale Next-DiT architecture 实现多尺度Next-DiT架构
Step2: Incorporate motion conditioning 引入运动条件
Step3: Progressive and multi-source training for efficiency 进行渐进和多源训练以提高效率
Output: High-quality generated videos 高质量生成视频
8.5 [8.5] 2502.06787 Visual Agentic AI for Spatial Reasoning with a Dynamic API
[{'name': 'Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari'}]
Spatial Reasoning 空间推理 v2
3D spatial reasoning
Visual reasoning
Dynamic API
Input: Queries for 3D understanding 3D理解的查询
Step1: Dynamic API generation 动态API生成
Step2: Program synthesis 程序合成
Step3: Evaluation with benchmarks 使用基准评估
Output: Enhanced 3D spatial reasoning capabilities 改进的3D空间推理能力
8.0 [8.0] 2502.06023 Dual Caption Preference Optimization for Diffusion Models
[{'name': 'Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral'}]
Image Generation 图像生成 v2
image generation
text-to-image
diffusion models
Input: Text-to-image diffusion model 文本到图像扩散模型
Step1: Mitigate irrelevant prompts 减少无关提示
Step2: Optimize dual caption preferences 优化双重标题偏好
Step3: Experiment with different caption strategies 采用不同的标题策略
Output: Improved image generation 改进的图像生成

Arxiv 2025-02-10

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.04630 High-Speed Dynamic 3D Imaging with Sensor Fusion Splatting
[{'name': 'Zihao Zou, Ziyuan Qu, Xi Peng, Vivek Boominathan, Adithya Pediredla, Praneeth Chakravarthula'}]
3D Reconstruction 三维重建 v2
3D reconstruction
sensor fusion
Gaussian splatting
high-speed imaging
Input: RGB, depth, and event camera data 输入: RGB、深度和事件相机数据
Step1: Data integration 数据集成
Step2: Scene representation using deformable 3D Gaussians 场景表示使用可变形3D高斯
Step3: Joint optimization of Gaussian parameters 联合优化高斯参数
Output: High-quality 3D scene reconstruction 输出: 高质量3D场景重建
9.5 [9.5] 2502.04734 SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting
[{'name': 'Huajian Huang, Yingshu Chen, Longwei Li, Hui Cheng, Tristan Braud, Yajie Zhao, Sai-Kit Yeung'}]
3D Reconstruction 三维重建 v2
3D reconstruction
omnidirectional images
Input: 360-degree images 360度图像
Step1: Direct pose calibration 直接姿态标定
Step2: 3D Gaussians optimization 3D高斯优化
Step3: Joint optimization of parameters 参数的联合优化
Output: Enhanced omnidirectional radiance fields 改进的全方位辐射场
9.5 [9.5] 2502.04804 DetVPCC: RoI-based Point Cloud Sequence Compression for 3D Object Detection
[{'name': 'Mingxuan Yan, Ruijie Zhang, Xuedou Xiao, Wei Wang'}]
3D Object Detection 3D 物体检测 v2
3D reconstruction
point cloud compression
object detection
Input: 3D point cloud sequences 3D 点云序列
Step1: Identify regions of interest (RoIs) 识别兴趣区域 (RoIs)
Step2: Apply RoI-based encoding 应用基于RoI的编码
Step3: Compress using VPCC and evaluate compression performance 使用VPCC压缩并评估压缩性能
Output: Compressed point cloud data with improved detection accuracy 输出: 经过压缩的点云数据,具有改进的检测准确性
9.5 [9.5] 2502.04843 PoI: Pixel of Interest for Novel View Synthesis Assisted Scene Coordinate Regression
[{'name': 'Feifei Li, Qi Song, Chi Zhang, Hui Shuai, Rui Huang'}]
3D Reconstruction 三维重建 v2
3D reconstruction
scene coordinate regression
novel view synthesis
Input: Rendered images and sparse inputs 渲染图像和稀疏输入
Step1: Pixel filtering to retain well-rendered pixels 像素过滤以保留渲染良好的像素
Step2: Scene Coordinate Regression (SCR) model training based on filtered data 基于过滤数据的场景坐标回归模型训练
Step3: Evaluation of pose estimation performance 性能评估
9.5 [9.5] 2502.04981 OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting
[{'name': 'Xiaoyu Zhou, Jingqi Wang, Yongtao Wang, Yufei Wei, Nan Dong, Ming-Hsuan Yang'}]
3D Reconstruction 三维重建 v2
3D occupancy reconstruction
semantic reconstruction
Gaussian Splatting
Input: Raw sensor data 原始传感器数据
Step1: Extract semantic information from vision-language models 从视觉语言模型中提取语义信息
Step2: Construct Semantic and Geometric-Aware Gaussians 构建语义和几何意识高斯
Step3: Implement cumulative Gaussian-to-3D voxel splatting 实现累积高斯到3D体素的溅射
Output: Semantic 3D occupancy reconstruction 语义3D占用重建
9.5 [9.5] 2502.05040 GaussRender: Learning 3D Occupancy with Gaussian Rendering
[{'name': 'Loick Chambon, Eloi Zablocki, Alexandre Boulch, Mickael Chen, Matthieu Cord'}]
3D Reconstruction 三维重建 v2
3D occupancy
Gaussian rendering
autonomous driving
semantic understanding
voxel-based supervision
Input: 3D voxel representations 3D体素表示
Step1: Projection to 2D perspectives 投影到2D视图
Step2: Introduction of Gaussian splatting 高斯点云引入
Step3: Loss integration for training 损失函数集成
Output: Enhanced 3D occupancy models 改进的3D占用模型
9.5 [9.5] 2502.05175 Fillerbuster: Multi-View Scene Completion for Casual Captures
[{'name': 'Ethan Weber, Norman Müller, Yash Kant, Vasu Agrawal, Michael Zollhöfer, Angjoo Kanazawa, Christian Richardt'}]
3D Reconstruction 三维重建 v2
3D scene completion
multi-view synthesis
novel view generation
Input: Multi-view casual captures 多视角随意捕捉
Step1: Unobserved content recovery 未观察到的内容恢复
Step2: Generative model training 生成模型训练
Step3: Scene completion and pose prediction 场景补全与姿势预测
Output: Complete 3D scene with novel views 输出: 完整的三维场景与新视角
9.5 [9.5] 2502.05176 AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360{\deg} Unbounded Scene Inpainting
[{'name': 'Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen, Jie-Ying Lee, Bo-Hsu Ke, Chun-Wei Tuan Mu, Yi-Chuan Huang, Chin-Yang Lin, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D scene inpainting
Gaussian Splatting
depth-aware methods
multi-view coherence
unbounded scenes
Input: Multi-view images, camera parameters, object masks, and reference images 输入: 多视角图像、相机参数、对象掩膜和参考图像
Step1: Generate depth-aware unseen masks for occlusion identification 步骤1: 生成深度感知的看不见掩膜以识别遮挡
Step2: Apply Adaptive Guided Depth Diffusion for point placement 步骤2: 应用自适应引导深度扩散进行点放置
Step3: Employ SDEdit for detail enhancement and coherence 步骤3: 使用SDEdit进行细节增强和一致性
Output: High-quality inpainted 3D scenes 输出: 高质量的3D场景修复
8.5 [8.5] 2502.04361 Predicting 3D Motion from 2D Video for Behavior-Based VR Biometrics
[{'name': 'Mingjun Li, Natasha Kholgade Banerjee, Sean Banerjee'}]
3D Motion Prediction 三维运动预测 v2
3D motion prediction
biometric authentication
virtual reality
2D video
Input: 2D body joint data from video 输入: 来自视频的2D身体关节数据
Step1: External video tracking 外部视频追踪
Step2: 2D to 3D motion prediction 从2D到3D的运动预测
Step3: Authentication model evaluation 认证模型评估
Output: Enhanced biometric authentication system 输出: 增强的生物识别认证系统
8.5 [8.5] 2502.04377 MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction
[{'name': 'Xiaoshuai Hao, Yunfeng Diao, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin, Hui Zhang, Weiming Li, Shu Zhao, Yu Liu'}]
Map Construction 地图构建 v2
BEV Feature Fusion
Autonomous Driving
Map Construction
Cross-modal Interaction
Input: Multi-modal data from camera and LiDAR sensors
Step1: Cross-modal Interaction Transform (CIT) for semantic alignment
Step2: Dual Dynamic Fusion (DDF) for selective information integration
Step3: Map construction tasks evaluation
Output: Enhanced HD and BEV maps
8.5 [8.5] 2502.04378 DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation
[{'name': "Luciano Baresi, Davide Yi Xian Hu, Muhammad Irfan Mas'udi, Giovanni Quattrocchi"}]
Multi-modal Testing and Image Generation 多模态测试与图像生成 v2
autonomous driving
deep learning testing
diffusion models
Input: Existing images from datasets 现有数据集中的图像
Step1: Image captioning 进行图像描述
Step2: Keyword identification 关键词识别
Step3: Counterfactual caption generation 生成反事实描述
Step4: Image generation using diffusion model 利用扩散模型生成图像
Output: Augmented test images 增强的测试图像
8.5 [8.5] 2502.04478 OneTrack-M: A multitask approach to transformer-based MOT models
[{'name': 'Luiz C. S. de Araujo, Carlos M. S. Figueiredo'}]
Autonomous Systems and Robotics 自主系统与机器人技术 v2
Multi-Object Tracking
transformers
autonomous vehicles
Input: Video sequences from cameras 视频序列
Step1: Data pre-processing 数据预处理
Step2: Model architecture design 模型架构设计
Step3: Multitask training techniques 多任务训练技术
Output: Enhanced tracking and detection performance 改进的跟踪与检测性能
8.5 [8.5] 2502.04483 Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
[{'name': 'Nathan Louis, Mahzad Khoshlessan, Jason J. Corso'}]
3D Reconstruction 三维重建 v2
3D human pose estimation
physical plausibility
physics simulation
3D reconstruction
Input: 3D human poses from estimation models 3D 人类姿势估计模型
Step1: Physics simulation setup 物理仿真设置
Step2: Metric introduction (CoM distance, Pose Stability Duration) 指标引入(质心距离,姿态稳定时间)
Step3: Evaluation against state-of-the-art methods 评估与现有最佳方法的比较
Output: Metrics for physical plausibility and stability 物理合理性和稳定性的指标
8.5 [8.5] 2502.04566 An Optimized YOLOv5 Based Approach For Real-time Vehicle Detection At Road Intersections Using Fisheye Cameras
[{'name': 'Md. Jahin Alam, Muhammad Zubair Hasan, Md Maisoon Rahman, Md Awsafur Rahman, Najibul Haque Sarker, Shariar Azad, Tasnim Nishat Islam, Bishmoy Paul, Tanvir Anjum, Barproda Halder, Shaikh Anowarul Fattah'}]
Autonomous Systems and Robotics 自主系统与机器人技术 v2
vehicle detection
YOLOv5
fisheye camera
autonomous systems
Input: Fisheye camera images 鱼眼摄像头图像
Step1: Data acquisition 数据采集
Step2: Image preprocessing 图像预处理
Step3: Vehicle detection using modified YOLOv5 基于改进的YOLOv5进行车辆检测
Step4: Model training and ensemble 模型训练与集成
Output: Real-time vehicle detection results 实时车辆检测结果
8.5 [8.5] 2502.04615 Neural Clustering for Prefractured Mesh Generation in Real-time Object Destruction
[{'name': 'Seunghwan Kim, Sunha Park, Seungkyu Lee'}]
3D Reconstruction 三维重建 v2
3D reconstruction
point cloud segmentation
real-time object destruction
Input: Point cloud data 点云数据
Step1: Clustering point cloud with a neural network 使用神经网络进行点云聚类
Step2: Predicting structural weaknesses 预测结构弱点
Step3: Generating prefractured meshes 生成预裂网格
Output: Ready-to-use prefractured meshes 准备使用的预裂网格
8.5 [8.5] 2502.05055 Differentiable Mobile Display Photometric Stereo
[{'name': 'Gawoon Ban, Hyeongjun Kim, Seokjun Choi, Seungwoo Yoon, Seung-Hwan Baek'}]
3D Reconstruction 三维重建 v2
Photometric stereo
3D reconstruction
Mobile devices
Surface normals
Input: Mobile phone display and camera 移动电话显示器和相机
Step1: Developing a mobile app 开发移动应用
Step2: Capturing HDR images and display patterns 捕获HDR图像和显示模式
Step3: Learning display patterns differentiably 以可微方式学习显示模式
Output: 3D surface normals and albedos 3D表面法线和反射率
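Behind the display-pattern learning sits the classic Lambertian photometric-stereo solve: with known light directions, per-pixel normals and albedo fall out of least squares. A minimal sketch, assuming calibrated directional lights (the display calibration and HDR capture, the paper's contribution, happen upstream):

```python
# Lambertian photometric stereo: I_k = albedo * (L_k . n) per pixel,
# solved jointly for all pixels by least squares.
import numpy as np

def photometric_stereo(I, L):
    """I: (K, H, W) images under K known lights; L: (K, 3) light directions.
    Returns unit normals (H, W, 3) and albedo (H, W)."""
    K, H, W = I.shape
    G, *_ = np.linalg.lstsq(L, I.reshape(K, -1), rcond=None)  # (3, H*W)
    G = G.T.reshape(H, W, 3)                                  # albedo * normal
    albedo = np.linalg.norm(G, axis=-1)
    normals = G / np.clip(albedo[..., None], 1e-8, None)
    return normals, albedo
```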
8.5 [8.5] 2502.05091 DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions
[{'name': 'Gorkem Can Ates, Kuang Gong, Wei Shao'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
3D vision-language models
medical imaging
zero-shot classification
efficient computation
Input: 3D medical images 3D医学图像
Step1: Decomposed convolution design 设计分解卷积
Step2: Integration into CLIP framework 集成到 CLIP 框架中
Step3: Evaluation on CT-RATE dataset 在 CT-RATE 数据集上评估
Output: Efficient 3D vision-language model 高效的 3D 视觉-语言模型
8.5 [8.5] 2502.05153 Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment
[{'name': 'Minh-Quan Le, Gaurav Mittal, Tianjian Meng, A S M Iftekhar, Vishwas Suryanarayanan, Barun Patra, Dimitris Samaras, Mei Chen'}]
Image Generation 图像生成 v2
Image Generation
Visual Question Answering
Multimodal learning
Input: Multimodal context (reference image + text guidance) 多模态上下文(参考图像 + 文本指导)
Step1: Context description generation 上下文描述生成
Step2: Fine-tuning of the diffusion model 调整扩散模型
Step3: Image generation 生成图像
Output: High-fidelity, diverse images 高保真、多样化图像
8.5 [8.5] 2502.05178 QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
[{'name': 'Yue Zhao, Fuzhao Xue, Scott Reed, Linxi Fan, Yuke Zhu, Jan Kautz, Zhiding Yu, Philipp Krähenbühl, De-An Huang'}]
Neural Rendering 神经渲染 v2
visual tokenization
multimodal understanding
image generation
reconstruction
Input: Image data 影像数据
Step1: Train binary-spherical-quantization-based autoencoder 训练基于二元球面量化的自编码器
Step2: Dynamically balance reconstruction and alignment objectives 动态平衡重建与对齐目标
Step3: Validate performance on multimodal understanding and image generation 验证在多模态理解与图像生成中的表现
Output: Unified model for multimodal tasks 输出:多模态任务的统一模型
7.5 [7.5] 2502.04475 Augmented Conditioning Is Enough For Effective Training Image Generation
[{'name': 'Jiahui Chen, Amy Zhang, Adriana Romero-Soriano'}]
Image Generation 图像生成 v2
image generation
data augmentation
classification
Input: Real images and text prompts 真实图像和文本提示
Step1: Apply data augmentations 应用数据增强
Step2: Condition image generation on augmented data 基于增强数据进行图像生成
Step3: Generate synthetic training images 生成合成训练图像
Output: Enhanced training datasets 改进的训练数据集
7.5 [7.5] 2502.04896 Goku: Flow Based Video Generative Foundation Models
[{'name': 'Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu'}]
Image and Video Generation 图像生成和视频生成 v2
image generation
video generation
text-to-video tasks
Input: Image and video datasets 图像和视频数据集
Step1: Data processing pipeline 数据处理管道
Step2: Model architecture optimization 模型架构优化
Step3: Training and evaluation 训练与评估
Output: High-quality image and video generation 高质量的图像和视频生成

Arxiv 2025-02-07

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.03901 LeAP: Consistent multi-domain 3D labeling using Foundation Models
[{'name': 'Simon Gebraad, Andras Palffy, Holger Caesar'}]
3D Semantic Understanding 3D语义理解 v2
3D semantic labeling
Bayesian update
Vision Foundation Models
Input: Unlabeled image-pointcloud pairs 输入: 未标记的图像-点云对
Step1: Generate soft 2D labels using Vision Foundation Models 步骤1: 使用视觉基础模型生成软2D标签
Step2: Apply Bayesian updating to obtain 3D pseudo-labels 步骤2: 应用贝叶斯更新以获得3D伪标签
Step3: Use 3D Consistency Network to improve label quality 步骤3: 使用3D一致性网络提高标签质量
Output: High-quality 3D semantic labels 输出: 高质量的3D语义标签
9.5 [9.5] 2502.04318 sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views
[{'name': 'Eyvaz Najafli, Marius Kästingschäfer, Sebastian Bernhard, Thomas Brox, Andreas Geiger'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
sparse views
latent features
Input: Sparse view images 稀疏视图图像
Step1: Generate intermediate virtual views 生成中间虚拟视图
Step2: Decode Gaussian primitives 解码高斯原语
Step3: Render novel views 渲染新视图
Output: 360-degree reconstructed scene 360度重建场景
9.0 [9.0] 2502.04139 Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation
[{'name': 'Jiahao Lu, Jiacheng Deng, Tianzhu Zhang'}]
3D Instance Segmentation 3D实例分割 v2
3D instance segmentation
transformer-based methods
Input: Scene point cloud input 场景点云输入
Step1: Query initialization 查询初始化
Step2: Hierarchical query fusion 层次查询融合
Step3: Instance segmentation 实例分割
Output: Binary foreground masks with semantic labels 输出:带语义标签的二元前景掩码
8.5 [8.5] 2502.03510 Mapping and Localization Using LiDAR Fiducial Markers
[{'name': 'Yibo Liu'}]
Mapping and Localization 映射与定位 v2
LiDAR
fiducial markers
mapping
localization
Input: LiDAR sensors and fiducial markers
Step1: Development of Intensity Image-based LiDAR Fiducial Marker system
Step2: Detection of 3D fiducials from intensity images
Step3: Algorithm enhancement for 3D map merging and localization
Output: Optimized mapping and localization using LFMs
8.5 [8.5] 2502.03628 The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
[{'name': 'Zhuowei Li, Haizhou Shi, Yunhe Gao, Di Liu, Zhenting Wang, Yuxiao Chen, Ting Liu, Long Zhao, Hao Wang, Dimitris N. Metaxas'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
hallucination
VISTA
multimodal learning
Input: Visual tokens from large Vision-Language Models (LVLMs) 视觉令牌来自大型视觉-语言模型
Step1: Analyze token logits ranking 分析令牌的对数排名
Step2: Identify visual information loss 识别视觉信息损失
Step3: Propose VISTA framework 提出VISTA框架
Output: Enhanced decoding with reduced hallucination 输出:减少幻觉的增强解码
8.5 [8.5] 2502.03639 Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
[{'name': 'Yunuo Chen, Junli Cao, Anil Kag, Vidit Goel, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren'}]
Image and Video Generation 图像生成与视频生成 v2
Video Generation 视频生成
3D Point Regularization 3D点正则化
Diffusion Models 扩散模型
Input: 2D videos with 3D point trajectories 2D视频与3D点轨迹
Step1: Data augmentation 数据增强
Step2: Model fine-tuning 模型微调
Step3: Regularization of shape and motion 形状与运动的正则化
Output: Enhanced video quality 改进的视频质量
8.5 [8.5] 2502.03836 Adapting Human Mesh Recovery with Vision-Language Feedback
[{'name': 'Chongyang Xu, Buzhen Huang, Chengfang Zhang, Ziliang Feng, Yangang Wang'}]
3D Reconstruction and Modeling 三维重建 v2
human mesh recovery
vision-language models
3D reconstruction
diffusion-based framework
Input: Monocular images 单目图像
Step1: Initial pose prediction using a regression model 初始姿态预测
Step2: 2D keypoints extraction from images 从图像中提取2D关键点
Step3: Integration of vision-language descriptions 结合视觉语言描述
Step4: Refinement of 3D mesh using diffusion modeling 使用扩散模型优化3D网格
Output: Enhanced 3D human mesh 改进的3D人类网格
8.5 [8.5] 2502.03877 Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks
[{'name': 'Yuhui Jin, Yaqiong Zhang, Zheyuan Xu, Wenqing Zhang, Jingyu Xu'}]
6D Object Detection and Pose Estimation 6D对象检测与姿态估计 v2
6D object detection
pose estimation
Hybrid Task Cascade
High-Resolution Network
Input: 6D object detection data 6D对象检测数据
Step1: Hybrid Task Cascade integration 集成混合任务级联
Step2: High-Resolution Network backbone usage 使用高分辨率网络骨干
Step3: Advanced post-processing techniques 先进的后处理技术
Output: Improved object detection and pose estimation models 改进的对象检测和姿态估计模型
8.5 [8.5] 2502.04111 Adaptive Margin Contrastive Learning for Ambiguity-aware 3D Semantic Segmentation
[{'name': 'Yang Chen, Yueqi Duan, Runzhong Zhang, Yap-Peng Tan'}]
3D Reconstruction and Modeling 三维重建 v2
3D Semantic Segmentation
Point Cloud Processing
Contrastive Learning
Input: 3D point cloud 3D点云
Step1: Ambiguity estimation based on position embeddings 基于位置嵌入的模糊性估计
Step2: Development of adaptive margin contrastive learning algorithm 自适应边际对比学习算法开发
Step3: Evaluation on large-scale datasets 在大规模数据集上进行评估
Output: Improved semantic segmentation results 改进的语义分割结果
8.5 [8.5] 2502.04293 GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
[{'name': 'Weihang Li, Hongli Xu, Junwen Huang, Hyunjun Jung, Peter KT Yu, Nassir Navab, Benjamin Busam'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
semantic shape
pose estimation
Input: Partial RGB-D observations 具有部分可见性的RGB-D观测
Step1: Semantic Shape Reconstruction (SSR) 语义形状重建
Step2: Global Context Enhanced (GCE) feature fusion module 全局上下文增强特征融合模块
Output: Enhanced object poses 改进的物体姿态
8.5 [8.5] 2502.04329 SMART: Advancing Scalable Map Priors for Driving Topology Reasoning
[{'name': 'Junjie Ye, David Paz, Hengyuan Zhang, Yuliang Guo, Xinyu Huang, Henrik I. Christensen, Yue Wang, Liu Ren'}]
Autonomous Systems and Robotics 自主系统与机器人技术 v2
autonomous driving
lane topology reasoning
Input: Standard-definition (SD) and satellite maps 标准清晰度和卫星地图
Step 1: Train map prior model to infer lane graphs 训练地图先验模型以推断车道图
Step 2: Integrate model with online topology reasoning models 将模型与在线拓扑推理模型集成
Output: Enhanced lane topology understanding 改进的车道拓扑理解
7.5 [7.5] 2502.03813 Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation
[{'name': 'Xuan Li, Quanchao Lu, Yankaiqi Li, Muqing Li, Yijiashun Qi'}]
Image Generation 图像生成 v2
semantic segmentation
attention mechanism
autonomous driving
Input: Multi-scale images 多尺度图像
Step1: Implement attention mechanism 实施注意力机制
Step2: Optimize Unet architecture 优化Unet架构
Step3: Evaluate on Cityscapes dataset 在Cityscapes数据集上评估
Output: Improved segmentation results 改进的分割结果
7.5 [7.5] 2502.04244 An object detection approach for lane change and overtake detection from motion profiles
[{'name': 'Andrea Benericetti, Niccolò Bellaccini, Henrique Piñeiro Monteagudo, Matteo Simoncini, Francesco Sambo'}]
Autonomous Driving 自动驾驶 v2
object detection
lane change
ADAS
motion profiles
autonomous driving
Input: Motion profile images 运动轮廓图像
Step1: Dataset creation 数据集创建
Step2: Object detection model development 目标检测模型开发
Step3: Performance evaluation 性能评估
Output: Detection of lane change and overtake maneuvers 车道变换和超车动作检测

Arxiv 2025-02-06

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.02936 Every Angle Is Worth A Second Glance: Mining Kinematic Skeletal Structures from Multi-view Joint Cloud
[{'name': 'Junkun Jiang, Jie Chen, Ho Yin Au, Mingyuan Chen, Wei Xue, Yike Guo'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Joint Cloud
multi-view motion capture
Input: Multi-view images 多视角图像
Step1: Triangulate 2D joints into Joint Cloud 将2D关节三角测量为联合云
Step2: Process using JCSAT to explore correlations 使用JCSAT处理以探索相关性
Step3: Utilize OTAP for feature selection 使用OTAP进行特征选择
Output: 3D motion estimation 3D运动估计
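Step 1's triangulation of 2D joints into a "Joint Cloud" starts from the standard DLT construction; a minimal sketch for one joint seen in V calibrated views (JCSAT and OTAP, the paper's contributions, operate downstream of this):

```python
# Direct Linear Transform triangulation: each view contributes two rows
# to a homogeneous system solved by SVD.
import numpy as np

def triangulate(points2d, projections):
    """points2d: (V, 2) one joint in V views; projections: (V, 3, 4)."""
    A = []
    for (u, v), P in zip(points2d, projections):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```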
9.5 [9.5] 2502.03449 Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics
[{'name': 'Xuan Li, Chang Yu, Wenxin Du, Ying Jiang, Tianyi Xie, Yunuo Chen, Yin Yang, Chenfanfu Jiang'}]
3D Reconstruction 三维重建 v2
3D reconstruction
garment generation
multi-view images
simulation-ready
Input: In-the-wild image 单张图像
Step1: Pre-trained image-to-sewing pattern generation model 预训练的图像到缝制模式生成模型
Step2: Multi-view diffusion model for producing images 多视角扩散模型用于生成图像
Step3: Refinement using a differentiable garment simulator 使用可微服装模拟器进行细化
Output: Simulation-ready 3D garment 适合模拟的三维服装
8.5 [8.5] 2502.02907 PoleStack: Robust Pole Estimation of Irregular Objects from Silhouette Stacking
[{'name': 'Jacopo Villa, Jay W. McMahon, Issa A. D. Nesnas'}]
3D Reconstruction and Modeling 三维重建 v2
3D pole estimation
silhouette stacking
Input: Silhouette images from multiple camera poses 多个相机视角的轮廓图像
Step1: Create a silhouette-stack image 创建轮廓堆叠图像
Step2: Apply Discrete Fourier Transform to enhance robustness 应用离散傅里叶变换以增强鲁棒性
Step3: Estimate 3D pole orientation using projected-pole measurements 使用投影极轴测量估计3D极轴方向
Output: Accurate pole orientation estimation 准确的极轴方向估计
8.5 [8.5] 2502.02977 Disentangling CLIP Features for Enhanced Localized Understanding
[{'name': 'Samyak Rawelekar, Yujun Cai, Yiwei Wang, Ming-Hsuan Yang, Narendra Ahuja'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
mutual feature information (MFI)
vision-language models (VLM)
multi-label recognition (MLR)
Input: CLIP features from vision-language models 视觉语言模型中的CLIP特征
Step1: Analyze feature correlation 分析特征相关性
Step2: Implement MFI loss 施加MFI损失
Step3: Align text and image features 对齐文本和图像特征
Output: Improved localized understanding 改进的局部理解
8.5 [8.5] 2502.03005 Driver Assistance System Based on Multimodal Data Hazard Detection
[{'name': 'Long Zhouxiang, Ovanes Petrosian'}]
Autonomous Driving 自动驾驶 v2
multimodal data
hazard detection
autonomous driving
incident recognition
Input: Multimodal data (video, audio) 输入:多模态数据(视频、音频)
Step1: Data integration 数据集成
Step2: Attention-based fusion strategy 基于注意力的融合策略
Step3: Incident recognition 事件识别
Output: Enhanced detection accuracy 改进的检测精度
8.5 [8.5] 2502.03465 Seeing World Dynamics in a Nutshell
[{'name': 'Qiuhong Shen, Xuanyu Yi, Mingbao Lin, Hanwang Zhang, Shuicheng Yan, Xinchao Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D representation
Monocular video
Dynamic Gaussian Splatting
Input: Monocular videos 单目视频
Step1: Transform videos to dynamic Gaussian representations 将视频转换为动态高斯表示
Step2: Introduce STAG representation 引入结构化时空对齐高斯表示
Step3: Optimizing for spatial and temporal coherence 进行空间和时间一致性的优化
Output: High-fidelity video reconstruction and spatial-temporal modeling 高保真视频重建和时空建模
7.5 [7.5] 2502.02951 VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA
[{'name': 'Madhuri Latha Madaka, Chakravarthy Bhagvati'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Visual Question Answering
VQA dataset
Hierarchical questions
Input: Visual content and questions 视觉内容和问题
Step1: Dataset development 数据集开发
Step2: Classification of questions 问题分类
Step3: Initial testing on VQA systems 在VQA系统上的初步测试
Output: VQA-Levels dataset VQA-Levels数据集

Arxiv 2025-02-05

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.01666 Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
[{'name': 'Jingming Xia, Guanqun Cao, Guang Ma, Yiben Luo, Qinzhao Li, John Oyekan'}]
Depth Estimation 深度估计 v2
monocular depth estimation
3D reconstruction
generative models
autonomous driving
Input: RGB image
Step1: Extract latent features using Image Encoder
Step2: Extract semantic vector through Image Semantic Encoder
Step3: Integrate features within a denoising UNet
Step4: Generate final metric depth map
Output: Enhanced depth prediction
9.5 [9.5] 2502.01846 UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
[{'name': 'Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Arthur Chen, Srinath Sridhar, Aayush Prakash'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
diffusion models
3D generation
structured representation
Input: 3D Gaussian Splatting data 3D高斯点云数据
Step1: Spherical mapping to transform data into structured 2D representation 使用球面映射将数据转换为结构化2D表示
Step2: Multi-branch network for feature compression 使用多分支网络进行特征压缩
Step3: Zero-shot integration with existing 2D models 与现有2D模型进行零样本整合
Output: Structured 3D representation ready for generative tasks 输出:准备好用于生成任务的结构化3D表示
9.5 [9.5] 2502.01855 Learning Fine-to-Coarse Cuboid Shape Abstraction
[{'name': 'Gregor Kobsik, Morten Henkel, Yanjiang He, Victor Czech, Tim Elsner, Isaak Lim, Leif Kobbelt'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
shape abstraction
cuboids
unsupervised learning
structural analysis
Input: Collections of 3D shapes 3D形状集
Step1: Initialize with fine reconstruction to capture details 细致重建以捕获细节
Step2: Gradually reduce primitives while optimizing loss 渐进减少原始体并优化损失
Step3: Evaluate performance on shape benchmarks 在形状基准上评估性能
Output: Compact cuboid-based representations 紧凑的立方体表示
9.5 [9.5] 2502.01856 Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection
[{'name': 'Reza Sadeghian, Niloofar Hooshyaripour, Chris Joslin, WonSook Lee'}]
3D Object Detection 三维物体检测 v2
LiDAR-camera fusion
3D object detection
autonomous driving
Input: LiDAR and camera data LiDAR和相机数据
Step1: Spatio-Temporal Feature Aggregation (STFA) module processes input 提取时空特征
Step2: Reliability module assigns confidence scores 可靠性模块分配置信度评分
Step3: Confidence-Weighted Mutual Cross-Attention (CW-MCA) module balances information with confidence 用置信度动态平衡信息
Output: Enhanced 3D object detection 改进的三维物体检测
9.5 [9.5] 2502.01896 INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy
[{'name': 'Nastaran Darabi, Divake Kumar, Sina Tayebati, Amit Ranjan Trivedi'}]
3D Perception and Modeling 3D 感知与建模 v2
LiDAR
3D perception
object detection
Input: Noisy LiDAR data 噪声激光雷达数据
Step 1: Meta-learning phase 元学习阶段
Step 2: Generate robust saliency maps 生成健壮的显著性图
Step 3: Adversarial curriculum training 对抗性课程训练
Output: Enhanced noise resilience 提升噪声鲁棒性
9.5 [9.5] 2502.02163 Progressive Correspondence Regenerator for Robust 3D Registration
[{'name': 'Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo'}]
3D Registration 3D配准 v2
3D registration
point cloud
outlier removal
reconstruction
robustness
Input: Point cloud data 点云数据
Step1: Prior-guided local grouping using generalized mutual matching 先验引导的局部分组与互匹配
Step2: Local correspondence correction using center-aware three-point consistency 局部对应关系修正
Step3: Global correspondence refinement using extensive iterations 全局对应关系的细化
Output: High-quality point correspondences 高质量的点对应关系
9.5 [9.5] 2502.02187 ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion
[{'name': 'Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun'}]
3D Generation 三维生成 v2
3D Generation 3D生成
Shape Variations 形状变体
Input: Reference 3D model 参考3D模型
Step1: Sparse voxel grid and point sampling 稀疏体素网格和点采样
Step2: Multiscale neural architecture training 多尺度神经架构训练
Step3: Generate shape variations 生成形状变体
Output: High-quality 3D shapes 高质量3D形状
9.5 [9.5] 2502.02247 Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning
[{'name': 'Bangzhen Liu, Chenxi Zheng, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Shengfeng He'}]
3D Reconstruction and Modeling 三维重建 v2
3D point cloud analysis 3D点云分析
domain generalization 域泛化
rotation robustness 旋转鲁棒性
Input: 3D point clouds 3D点云
Step 1: Identify challenging rotations 识别具有挑战性的旋转
Step 2: Construct intricate orientation set 构建复杂方向集
Step 3: Utilize contrastive learning against orientations 使用对比学习进行方向建模
Output: Generalizable features with rotation consistency 输出: 具有旋转一致性的可泛化特征
9.5 [9.5] 2502.02283 GP-GS: Gaussian Processes for Enhanced Gaussian Splatting
[{'name': 'Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Liangxiu Han, Peng Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
Structure-from-Motion
point clouds
novel view synthesis
Input: Sparse SfM point clouds 稀疏SfM点云
Step1: Dynamic sampling 动态采样
Step2: Gaussian Process modeling 高斯过程建模
Step3: Densification of point clouds 点云稠密化
Output: Enhanced 3D Gaussian representation 改进的3D高斯表示
9.5 [9.5] 2502.02334 Event-aided Semantic Scene Completion
[{'name': 'Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang'}]
3D Reconstruction and Modeling 三维重建 v2
Semantic Scene Completion
3D Reconstruction
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: Algorithm development 算法开发
Step3: Model evaluation 模型评估
Output: Enhanced 3D models 改进的三维模型
9.5 [9.5] 2502.02338 Geometric Neural Process Fields
[{'name': 'Wenzhe Yin, Zehao Xiao, Jiayi Shen, Yunlu Chen, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
3D scenes
probabilistic modeling
Input: Limited context images 限制的上下文图像
Step1: Probabilistic modeling 概率建模
Step2: Integrate geometric bases 集成几何基底
Step3: Hierarchical latent variable design 分层潜变量设计
Output: Improved generalization 改进的泛化能力
9.5 [9.5] 2502.02372 MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning
[{'name': 'Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
avatar generation
continual learning
Input: Image data of avatars 头像图像数据
Step1: Implement continual learning strategy 进行持续学习策略
Step2: Develop Global-Local Joint Storage Module 开发全局-局部联合存储模块
Step3: Develop Pose Distillation Module 开发姿态提炼模块
Output: Maintainable virtual avatar 可维护虚拟头像
9.5 [9.5] 2502.02548 Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
[{'name': 'Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy'}]
3D Segmentation 三维分割 v2
3D segmentation
open-vocabulary
Vision-Language Models
Input: Multi-view images 多视角图像
Step1: Data generation 数据生成
Step2: Data annotation 数据注释
Step3: Training model 训练模型
Output: Open-vocabulary segmentation model 开放词汇分割模型
9.5 [9.5] 2502.02590 Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
[{'name': 'Xiaowen Qiu, Jincheng Yang, Yian Wang, Zhehuan Chen, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan'}]
3D Reconstruction and Modeling 三维重建 v2
3D articulated objects
Vision-Language Models
3D modeling
Input: 3D meshes 3D 网格
Step1: Movable Part Segmentation 可动部分分割
Step2: Articulation Estimation 关节估计
Step3: Refinement 精化
Output: Articulated 3D objects 装配式三维物体
9.2 [9.2] 2502.01940 Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach
[{'name': 'Mohammed Alsakabi, Aidan Erickson, John M. Dolan, Ozan K. Tonguz'}]
Autonomous Driving 自动驾驶 v2
3D reconstruction
autonomous driving
depth maps
Input: Images from 4D radar detectors and RGB cameras 4D 雷达探测器和 RGB 摄像头的图像
Step1: Integrate radar depth maps and RGB images 集成雷达深度图和 RGB 图像
Step2: Apply pixel positional encoding algorithm 应用像素位置信息编码算法
Step3: Develop spectrum estimation algorithms 研发空间谱估计算法
Step4: Train depth map generative models 训练深度图生成模型
Output: Enhanced depth maps 改进的深度图
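Step 2's pixel positional encoding is not specified in the summary; for orientation, a common transformer-style sinusoidal encoding of pixel coordinates looks like the sketch below (purely illustrative, not the paper's exact construction):

```python
# Sinusoidal 2D positional encoding: sin/cos of scaled x and y coordinates.
import numpy as np

def encode_pixels(h, w, n_freqs=4):
    """Returns (h, w, 4*n_freqs) features for an h x w pixel grid."""
    y, x = np.meshgrid(np.arange(h) / h, np.arange(w) / w, indexing="ij")
    feats = []
    for k in range(n_freqs):
        for coord in (x, y):
            feats += [np.sin(2**k * np.pi * coord), np.cos(2**k * np.pi * coord)]
    return np.stack(feats, axis=-1)
```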
9.2 [9.2] 2502.02144 DOC-Depth: A novel approach for dense depth ground truth generation
[{'name': 'Simon de Moreau, Mathias Corsia, Hassan Bouchiba, Yasser Almehio, Andrei Bursuc, Hafid El-Idrissi, Fabien Moutarde'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D Reconstruction 三维重建
Dense Depth Generation 密集深度生成
LiDAR 激光雷达
Input: LiDAR sensor data 利用激光雷达传感器数据
Step1: 3D environment reconstruction 3D环境重建
Step2: Dynamic object classification 动态对象分类
Step3: Dense depth generation 密集深度生成
Output: Dense depth annotation output 输出:密集深度标注
8.5 [8.5] 2502.01814 PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
[{'name': 'Dazhou Yu, Genpei Zhang, Liang Zhao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
polyhedral representation
surface-attributed graph
Input: Polyhedral data 多面体数据
Step1: Decompose into local rigid representations 将其分解为局部刚性表示
Step2: Hierarchical aggregation of representations 层次聚合表示
Output: Global representation of polyhedra 多面体的全局表示
8.5 [8.5] 2502.01894 SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
[{'name': 'Goodarz Mehr, Azim Eskandarian'}]
Autonomous Systems and Robotics 自主系统与机器人 v2
Synthetic Data Generation 合成数据生成
Autonomous Driving 自动驾驶
BEV Representation 鸟瞰视图表示
Input: Multi-sensor data collection 多传感器数据收集
Step1: Configuration of synthetic data generation 生成合成数据的配置
Step2: Data generation for BEV representation 生成鸟瞰视图表示的数据
Step3: Annotation of perception data 性能数据的标注
Output: SimBEV dataset with annotated driving scenarios 输出: 包含标注的驾驶场景的SimBEV数据集
8.5 [8.5] 2502.01949 LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation
[{'name': 'Yang Zhou, Zongjin He, Qixuan Li, Chao Wang'}]
3D Generation 三维生成 v2
3D scene generation
physically consistent layouts
text-guided generation
Input: Text prompt 文本提示
Step1: Convert text to scene graph 将文本转换为场景图
Step2: Adjust Gaussian densities and layouts 调整高斯密度和布局
Step3: Make dynamic camera adjustments 进行动态相机调整
Output: 3D compositional scene generation 3D 组合场景生成
8.5 [8.5] 2502.01961 Hierarchical Consensus Network for Multiview Feature Learning
[{'name': 'Chengwei Xia, Chaoxi Niu, Kun Zhan'}]
Multi-view and Stereo Vision 多视角立体 v2
multiview feature learning
hierarchical consensus
3D reconstruction
Input: Multi-view images 多视角图像
Step1: Learning view-consistency features 学习视图一致性特征
Step2: Hierarchical consensus derivation 层次共识推导
Step3: Comprehensive feature extraction 综合特征提取
Output: Discriminative features 具有区分性的特征
8.5 [8.5] 2502.02091 Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation
[{'name': 'JooHyun Kwon, Hanbyel Cho, Junmo Kim'}]
Image and Video Generation 图像生成 v2
4D Gaussian Splatting
dynamic scene editing
computer vision
motion artifacts
Input: 4D dynamic scene data 4D动态场景数据
Step1: Model static 3D Gaussians 建模静态三维高斯
Step2: Implement Hexplane-based deformation field 实现基于Hexplane的变形场
Step3: Perform editing on static 3D Gaussians 在静态三维高斯上执行编辑
Step4: Apply score distillation for refinement 应用得分蒸馏进行细化
Output: Enhanced edited dynamic scenes 改进的编辑动态场景
8.5 [8.5] 2502.02322 Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features
[{'name': 'Hsin-Cheng Lu, Chung-Yi Lin, Winston H. Hsu'}]
3D Object Detection 3D物体检测 v2
3D object detection 3D物体检测
autonomous driving 自动驾驶
generalization 泛化
Input: Source domain 3D point clouds 源域3D点云
Step1: Downsample the point cloud based on confidence scores 根据置信度得分下采样点云
Step2: Teacher-student framework to align BEV features 使用师生框架对齐鸟瞰视图特征
Step3: Apply FCA and GERA to maintain consistency 使用FCA和GERA保持一致性
Output: Domain-agnostic 3D object detector 域无关的3D物体检测器
8.5 [8.5] 2502.02468 High-Fidelity Human Avatars from Laptop Webcams using Edge Compute
[{'name': 'Akash Haridas, Imran N. Junejo'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D Morphable Models 3D可变形模型
Photo-realistic Rendering 照相真实渲染
Avatar Generation 头像生成
Input: Images from consumer-grade laptop webcams 笔记本电脑网络摄像头拍摄的图像
Step1: Shape generation by fitting 3DMM shape parameters 通过拟合3D形状模型参数生成形状
Step2: Texture map generation 纹理图生成
Step3: Rendering using pre-defined parameters 使用预定义参数进行渲染
Output: High-fidelity animatable avatars 高保真可动画化头像
8.5 [8.5] 2502.02537 Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks
[{'name': 'Huiqun Huang, Cong Chen, Jean-Philippe Monteuuis, Jonathan Petit, Fei Miao'}]
Autonomous Systems and Robotics 自主系统与机器人技术 v2
Collaborative Object Detection
Uncertainty Quantification
Adversarial Attacks
Autonomous Driving
Input: Collaborative Object Detection (COD) models 协作目标检测模型
Step1: Apply adversarial training during collaboration 在协作中施加对抗性训练
Step2: Provide output uncertainty estimation through learning-based module 提供基于学习的模块输出的不确定性估计
Step3: Calibrate uncertainty using conformal prediction 对不确定性进行校准
Output: Enhanced object detection accuracy 提高的目标检测准确性
7.5 [7.5] 2502.01906 Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models
[{'name': 'Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language models
Decomposed Attention
cross-modal learning
Input: Visual and textual embeddings 视觉和文本嵌入
Step1: Decompose the self-attention mechanism 解构自注意力机制
Step2: Optimize visual-to-visual self-attention 视觉-视觉自注意力优化
Step3: Merge visual and textual information 视觉与文本信息合并
Output: Improved efficiency and performance of LVLMs 提高LVLM效率与性能
7.5 [7.5] 2502.01969 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
[{'name': 'Younan Zhu, Linwei Tao, Minjing Dong, Chang Xu'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
object hallucination
attention calibration
Input: Large Vision-Language Models (LVLMs) 大型视觉语言模型
Step1: Bias estimation from input image 输入图像的偏差估计
Step2: Uniform Attention Calibration (UAC) application 应用统一注意力校准
Step3: Dynamic Attention Calibration (DAC) implementation 实现动态注意力校准
Output: Reduced object hallucination 减少物体幻觉

Arxiv 2025-02-05

Relavance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.01814 PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
[{'name': 'Dazhou Yu, Genpei Zhang, Liang Zhao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
polyhedral representation
Input: 3D polyhedral objects 3D 多面体对象
Step1: Surface-attributed graph construction 表面属性图构建
Step2: Local rigid representation learning 局部刚性表示学习
Step3: Hierarchical aggregation of representations 表示的分层聚合
Output: Global representation of polyhedra 全球多面体表示
9.5 [9.5] 2502.01846 UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
[{'name': 'Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Arthur Chen, Srinath Sridhar, Aayush Prakash'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
UV Mapping
image-based generation
3D reconstruction 3D重建
Input: 3D Gaussian Splatting (3DGS) data 3D高斯点云数据
Step1: Spherical mapping to create a structured 2D representation 使用球面映射创建结构化的2D表示
Step2: Compression of heterogeneous features into a shared feature space 将异构特征压缩到共享特征空间
Step3: Integration with pre-trained 2D generative models 与预训练的2D生成模型集成
Output: Structured 2D UV Gaussian Splatting representation 结构化的2D UV高斯点云表示
9.5 [9.5] 2502.01856 Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection
[{'name': 'Reza Sadeghian, Niloofar Hooshyaripour, Chris Joslin, WonSook Lee'}]
3D Object Detection 3D目标检测 v2
3D object detection
LiDAR-camera fusion
autonomous driving
Input: Sensor data from LiDAR and camera LiDAR和摄像头的传感器数据
Step1: Integration of spatial and semantic information 空间和语义信息的集成
Step2: Implementation of Reliability module to assess confidence 实现可靠性模块以评估置信度
Step3: Use of CW-MCA for dynamic weighting of modalities 使用CW-MCA对模态进行动态加权
Output: Robust 3D object detection results 稳健的3D目标检测结果
9.5 [9.5] 2502.01940 Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach
[{'name': 'Mohammed Alsakabi, Aidan Erickson, John M. Dolan, Ozan K. Tonguz'}]
Depth Estimation 深度估计 v2
Depth Estimation 深度估计
Autonomous Vehicles 自动驾驶
Radar-RGB Integration 雷达- RGB集成
Input: Radar depth maps and RGB images 雷达深度图和RGB图像
Step1: Pixel positional encoding 像素位置编码
Step2: Transformation to Spatial Spectrum 转换为空间谱
Step3: Generating denser depth maps 生成更密集的深度图
Output: Enhanced depth maps 改进的深度图
9.5 [9.5] 2502.02144 DOC-Depth: A novel approach for dense depth ground truth generation
[{'name': 'Simon de Moreau, Mathias Corsia, Hassan Bouchiba, Yasser Almehio, Andrei Bursuc, Hafid El-Idrissi, Fabien Moutarde'}]
Depth Estimation 深度估计 v2
depth estimation 深度估计
LiDAR
3D reconstruction 三维重建
Input: LiDAR measurements LiDAR测量
Step1: Data aggregation 数据聚合
Step2: Dynamic object classification 动态物体分类
Step3: Dense depth generation 密集深度生成
Output: Fully-dense depth annotations 完全密集的深度注解
9.5 [9.5] 2502.02163 Progressive Correspondence Regenerator for Robust 3D Registration
[{'name': 'Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu Yulan Guo'}]
3D Registration 3D 注册 v2
3D registration
point cloud registration
Input: Point clouds from different perspectives 从不同视角获得点云
Step1: Prior-guided local grouping prior引导局部分组
Step2: Generalized mutual matching 广义互匹配
Step3: Center-aware three-point consistency center-aware三点一致性
Step4: Global correspondence refinement 全局对应关系精炼
Output: High-quality correspondences 高质量对应关系
9.5 [9.5] 2502.02187 ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion
[{'name': 'Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun'}]
3D Generation 三维生成 v2
3D Generation
shape variations
multiscale neural architecture
interactive generation
Input: A single reference 3D model 单一参考3D模型
Step1: Shape variations generation 形状变体生成
Step2: Multiscale diffusion sampling 多尺度扩散采样
Step3: Interactive editing 交互式编辑
Output: High-quality 3D shape variants 高质量3D形状变体
9.5 [9.5] 2502.02247 Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning
[{'name': 'Bangzhen Liu, Chenxi Zheng, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Shengfeng He'}]
3D Reconstruction and Modeling 三维重建 v2
3D point cloud
domain generalization
rotation robustness
Input: Point clouds with variable orientations 变量方向的点云
Step1: Identify challenging rotations 识别具有挑战性的旋转
Step2: Construct intricate orientation set 构建复杂方向集
Step3: Apply contrastive learning using intricate samples 使用复杂样本进行对比学习
Output: Enhanced orientation-aware 3D representations 改进的方向感知3D表示
9.5 [9.5] 2502.02283 GP-GS: Gaussian Processes for Enhanced Gaussian Splatting
[{'name': 'Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Liangxiu Han, Peng Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Processes
novel view synthesis
Input: Sparse SfM point clouds 稀疏的结构光点云
Step1: Develop MOGP model 开发多输出高斯过程模型
Step2: Adaptive sampling and filtering strategy 自适应采样和过滤策略
Step3: Densify the point clouds 使点云密集化
Output: High-quality 3D Gaussians 高质量的3D高斯
9.5 [9.5] 2502.02322 Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features
[{'name': 'Hsin-Cheng Lu, Chung-Yi Lin, Winston H. Hsu'}]
3D Object Detection 3D物体检测 v2
3D object detection
autonomous driving
domain generalization
Input: LiDAR point clouds from various domains 各种域的LiDAR点云
Step1: Data subsampling based on confidence scores 根据置信度评分进行数据子采样
Step2: Teacher-student framework implementation 教师-学生框架实施
Step3: Feature alignment between domains 域间特征对齐
Output: Generalized 3D object detector 具备良好泛化能力的3D物体检测器
9.5 [9.5] 2502.02334 Event-aided Semantic Scene Completion
[{'name': 'Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
semantic scene completion
autonomous driving
event cameras
Input: Event and RGB images 输入:事件图像和RGB图像
Step1: Data integration 数据集成
Step2: Event-aided Lifting Module (ELM) 事件辅助提升模块(ELM)
Step3: 3D scene reconstruction 三维场景重建
Output: Enhanced 3D semantic occupancy models 输出:改进的3D语义占用模型
9.5 [9.5] 2502.02338 Geometric Neural Process Fields
[{'name': 'Wenzhe Yin, Zehao Xiao, Jiayi Shen, Yunlu Chen, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
Geometric Neural Process Fields
3D reconstruction
Input: Limited context observations 有限上下文观察
Step 1: Formulate NeF generalization as a probabilistic problem 将NeF泛化表述为一个概率问题
Step 2: Design geometric bases to encode structural information 设计几何基以编码结构信息
Step 3: Develop a hierarchical latent variable model for parameterization 建立分层潜变量模型以进行参数化
Output: Improved generalization for novel scenes and signals 改进的新场景和信号的泛化能力
9.5 [9.5] 2502.02548 Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
[{'name': 'Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy'}]
3D Segmentation 三维分割 v2
3D segmentation 3D分割
open-vocabulary 开放词汇
Input: 3D scene datasets 3D场景数据集
Step1: Data generation 数据生成
Step2: Model training 模型训练
Step3: Segmentation validation 分割验证
Output: Open-vocabulary 3D segmentation results 开放词汇3D分割结果
9.5 [9.5] 2502.02590 Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
[{'name': 'Xiaowen Qiu, Jincheng Yang, Yian Wang, Zhehuan Chen, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D modeling 3D建模
articulated objects 可动物体
Input: 3D mesh 输入: 3D网格
Step1: Movable Part Segmentation 可移动部分分割
Step2: Articulation Estimation and Refinement 动作估计与精细化
Output: Articulated 3D object 输出: 可动的3D物体
9.0 [9.0] 2502.01666 Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
[{'name': 'Jingming Xia, Guanqun Cao, Guang Ma, Yiben Luo, Qinzhao Li, John Oyekan'}]
Depth Estimation 深度估计 v2
Monocular Depth Estimation 单目深度估计
Autonomous Driving 自动驾驶
3D Reconstruction 三维重建
Input: Single RGB image 单个RGB图像
Step1: Image-based semantic embedding using SeeCoder 使用SeeCoder的图像语义嵌入
Step2: Integration of features via denoising UNet 通过去噪UNet进行特征集成
Step3: Depth map generation 深度图生成
Output: Enhanced depth map 改进的深度图
9.0 [9.0] 2502.01855 Learning Fine-to-Coarse Cuboid Shape Abstraction
[{'name': 'Gregor Kobsik, Morten Henkel, Yanjiang He, Victor Czech, Tim Elsner, Isaak Lim, Leif Kobbelt'}]
3D Reconstruction and Modeling 三维重建 v2
3D shape abstraction
unsupervised learning
cuboids
Input: Collections of 3D shapes 三维形状集合
Step1: Initial fine reconstruction 初始化细致重建
Step2: Apply fine-to-coarse abstraction 应用由细到粗的抽象
Step3: Optimize reconstruction and volume preservation 优化重建与体积保持
Output: Cuboid-based structural abstraction 基于长方体的结构抽象
8.5 [8.5] 2502.01894 SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
[{'name': 'Goodarz Mehr, Azim Eskandarian'}]
Autonomous Driving 自动驾驶 v2
BEV perception
synthetic data generation
autonomous driving
Input: Multi-sensor data 多传感器数据
Step1: Data generation 生成数据
Step2: Ground truth capture 捕获真实数据
Step3: Dataset creation 创建数据集
Output: Comprehensive BEV dataset 完整的鸟瞰图数据集
8.5 [8.5] 2502.01896 INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy
[{'name': 'Nastaran Darabi, Divake Kumar, Sina Tayebati, Amit Ranjan Trivedi'}]
3D Point Cloud Processing 点云处理 v2
LiDAR
adversarial training
3D perception
Input: Noisy LiDAR data 噪声LiDAR数据
Step1: Prepare saliency maps 准备显著性图
Step2: Apply adversarial curriculum training 应用对抗课程训练
Step3: Train student network 训练学生网络
Output: Robust deep learning model 稳健的深度学习模型
8.5 [8.5] 2502.01949 LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation
[{'name': 'Yang Zhou, Zongjin He, Qixuan Li, Chao Wang'}]
3D Generation 三维生成
3D scene generation 3D场景生成
3D Gaussian Splatting
physics-guided generation
Input: Text prompt 文本提示
Step1: Convert text to scene graph 将文本转换为场景图
Step2: Adjust density and layout 调整密度和布局
Step3: Dynamic camera adjustments 动态相机调整
Output: Compositional 3D scenes 组合三维场景
8.5 [8.5] 2502.01961 Hierarchical Consensus Network for Multiview Feature Learning
[{'name': 'Chengwei Xia, Chaoxi Niu, Kun Zhan'}]
Multi-view and Stereo Vision 多视角与立体视觉 v2
Multiview Learning 多视角学习
Consensus Learning 共识学习
Feature Integration 特征整合
Input: Multi-view data 多视角数据
Step1: Learn distinct and common information 学习独特和共同信息
Step2: Derive consensus indices 生成共识指标
Step3: Perform hierarchical consensus learning 进行分层共识学习
Output: Comprehensive and discriminative features 详尽和有辨识度的特征
8.5 [8.5] 2502.01969 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
[{'name': 'Younan Zhu, Linwei Tao, Minjing Dong, Chang Xu'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
object hallucination
Input: LVLMs with visual tokens 视觉语言模型与视觉标记
Step1: Analyze attention biases 分析注意力偏差
Step2: Implement UAC for calibration 实施均匀注意力校准
Step3: Develop DAC for dynamic adjustment 开发动态注意力校准模块
Output: Improved alignment and reduced hallucination 输出: 改进的对齐和减少的幻觉
8.5 [8.5] 2502.02171 DeepForest: Sensing Into Self-Occluding Volumes of Vegetation With Aerial Imaging
[{'name': 'Mohamed Youssef, Jian Peng, Oliver Bimber'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
remote sensing
vegetation analysis
Input: Aerial images from drones 通过无人机获取航空图像
Step1: Synthetic-aperture imaging 合成孔径成像
Step2: Use 3D convolutional neural networks to reduce out-of-focus signals 使用3D卷积神经网络减少模糊信号
Step3: Combine multiple reflectance stacks from various spectral channels 结合来自不同光谱通道的多重反射堆栈
Output: Volumetric representations of vegetation 体积植被表示
8.5 [8.5] 2502.02372 MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning
[{'name': 'Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
3D rendering
continual learning
Input: Limited training data 有限的训练数据
Step1: Employ NeRF for 3D rendering 使用NeRF进行3D渲染
Step2: Implement a Global-Local Joint Storage Module 实现全局-局部联合存储模块
Step3: Utilize a Pose Distillation Module 使用姿态蒸馏模块
Output: Maintainable virtual avatars 可维护的虚拟形象
8.5 [8.5] 2502.02468 High-Fidelity Human Avatars from Laptop Webcams using Edge Compute
[{'name': 'Akash Haridas, Imran N. Junejo'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
avatar generation
differentiable rendering
Input: Consumer-grade laptop webcam images 使用普通笔记本电脑网络摄像头的图像
Step1: Shape generation using 3D morphable models 使用3D可变形模型生成形状
Step2: Landmark detection using optimization 标记检测使用优化
Step3: Texture generation with GANs 使用GAN生成纹理
Step4: Differentiable rendering to create avatars 使用可微渲染创建虚拟形象
Output: High-fidelity human avatars 高保真度人类虚拟形象
8.5 [8.5] 2502.02525 Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation
[{'name': 'Jian Liu, Wei Sun, Hui Yang, Pengchao Deng, Chongpei Liu, Nicu Sebe, Hossein Rahmani, Ajmal Mian'}]
Object Pose Estimation 物体姿态估计 v2
9-DoF object pose estimation
domain generalization
robotic grasping
Input: Rendered synthetic data 渲染合成数据
Step1: Model training 模型训练
Step2: Pose estimation 估计姿态
Step3: Real-time performance optimization 实时性能优化
Output: Estimated 9-DoF object poses 估计的9自由度物体姿态
8.5 [8.5] 2502.02537 Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks
[{'name': 'Huiqun Huang, Cong Chen, Jean-Philippe Monteuuis, Jonathan Petit, Fei Miao'}]
Autonomous Systems and Robotics 自主系统与机器人技术 v2
Collaborative Object Detection
Uncertainty Quantification
Adversarial Robustness
Autonomous Vehicles
Input: Collaborative object detection models 协作目标检测模型
Step1: Adversarial training for robustness 对抗训练以增强鲁棒性
Step2: Uncertainty quantification estimation 不确定性量化估计
Step3: Calibration of uncertainty using conformal prediction 使用保形预测进行不确定性校准
Output: Enhanced object detection accuracy 改进的目标检测准确性
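Step3 above names conformal prediction for calibrating uncertainty; the split-conformal calibration step itself is standard and can be sketched in a few lines. This is the generic version, not the paper's exact collaborative-detection setup, and the toy scores are random stand-ins.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: given held-out nonconformity scores
    (e.g., 1 - confidence of the true class on a calibration set),
    return a threshold giving ~(1 - alpha) coverage on new samples."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n    # finite-sample correction
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

cal = np.random.rand(200)                     # toy nonconformity scores
tau = conformal_threshold(cal, alpha=0.1)
new_scores = np.random.rand(1000)
print("covered:", (new_scores <= tau).mean())  # about 0.9
```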
8.0 [8.0] 2502.01890 Geometric Framework for 3D Cell Segmentation Correction
[{'name': 'Peter Chen, Bryan Chang, Olivia Annette Creasey, Julie Beth Sneddon, Yining Liu'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D Segmentation 3D分割
Geometric Framework 几何框架
Input: 2D cell segmentation results 2D细胞分割结果
Step1: Extract geometric features 提取几何特征
Step2: Train binary classifier 训练二元分类器
Step3: Correct segmentation errors 修正分割错误
Output: Accurate 3D cell body reconstruction 精确的3D细胞体重建
8.0 [8.0] 2502.01906 Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models
[{'name': 'Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Decomposed Attention
Computational Efficiency
Input: Visual and textual embeddings 视觉和文本嵌入
Step1: Decompose the attention mechanism 分解注意力机制
Step2: Optimize visual-to-visual self-attention 优化视觉间自注意力
Step3: Debias positional encodings 去偏差位置编码
Output: Enhanced processing of visual and textual embeddings 改进的视觉和文本嵌入处理
7.5 [7.5] 2502.02225 Exploring the latent space of diffusion models directly through singular value decomposition
[{'name': 'Li Wang, Boyan Gao, Yanran Li, Zhao Wang, Xiaosong Yang, David A. Clifton, Jun Xiao'}]
Image Generation 图像生成 v2
diffusion models
image editing
latent space
Singular Value Decomposition
image generation
Input: Latent space of diffusion models 扩散模型的潜在空间
Step1: Investigate latent space using Singular Value Decomposition (SVD) 通过奇异值分解(SVD)研究潜在空间
Step2: Discover properties of latent space 发现潜在空间的属性
Step3: Propose image editing framework based on properties 提出基于属性的图像编辑框架
Output: Enhanced image editing capabilities 改进的图像编辑能力
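The core operation in the entry above, manipulating a diffusion latent through its singular values, can be sketched directly with numpy. The `scale` and `top_k` knobs below are illustrative, not the paper's derived editing rules.

```python
import numpy as np

def svd_edit(latent, scale, top_k=4):
    """Rescale the leading singular values of a 2D latent slice."""
    U, S, Vt = np.linalg.svd(latent, full_matrices=False)
    S_edit = S.copy()
    S_edit[:top_k] *= scale              # boost or damp dominant modes
    return U @ np.diag(S_edit) @ Vt

latent = np.random.randn(64, 64)          # stand-in for one latent channel
edited = svd_edit(latent, scale=1.5, top_k=4)
print(np.allclose(latent, svd_edit(latent, scale=1.0)))  # identity check -> True
```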

Arxiv 2025-02-04

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.00173 Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation
[{'name': 'Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D instance segmentation 3D实例分割
Gaussian Splatted Radiance Fields 高斯点云辐射场
novel view synthesis 新视图合成
Input: Posed 2D image data 2D图像数据
Step1: Extract per-image 2D segmentation masks 提取每帧的2D分割掩码
Step2: 2D-to-3D lifting to assign unique object IDs 在3D中分配唯一对象ID的2D到3D提升流程
Step3: Incremental merging of object fragments into coherent objects 将对象片段合并成一致的对象
Output: High-quality 3D object segments 高质量的3D对象片段
9.5 [9.5] 2502.00360 Shape from Semantics: 3D Shape Generation from Multi-View Semantics
[{'name': 'Liangchen Li, Caoliwen Wang, Yuqi Zhou, Bailin Deng, Juyong Zhang'}]
3D Shape Generation 3D形状生成 v2
3D reconstruction
shape generation
semantic input
Input: Semantic descriptions 语义描述
Step1: Distill 3D geometry from 2D diffusion models 从2D扩散模型提取3D几何
Step2: Refine textures using image and video generation models 使用图像和视频生成模型细化纹理
Step3: Represent the refined 3D model with neural implicit representations 使用神经隐式表示来表示细化的3D模型
Output: Fabricable high-quality meshes 可制造的高质量网格
9.5 [9.5] 2502.00801 Environment-Driven Online LiDAR-Camera Extrinsic Calibration
[{'name': 'Zhiwei Huang, Jiaqi Li, Ping Zhong, Rui Fan'}]
3D Reconstruction and Modeling 三维重建 v2
LiDAR-camera calibration
3D reconstruction
autonomous driving
Input: LiDAR and camera data 激光雷达和相机数据
Step1: Environment interpretation 环境解读
Step2: Data fusion 数据融合
Step3: Dual-path correspondence matching 双路径对应匹配
Step4: Spatial-temporal optimization 空间-时间优化
Output: Accurate extrinsic calibration 精准的外部标定
9.5 [9.5] 2502.01045 WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction
[{'name': 'Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
generative models
dynamic avatars
Input: Monocular video 单目视频
Step1: Generative prior usage 生成先验的使用
Step2: Dual-Space Optimization 双空间优化
Step3: View selection strategy 视图选择策略
Step4: Pose feature injection 姿势特征注入
Output: High-fidelity dynamic human avatars 高保真动态人形象
9.5 [9.5] 2502.01405 FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control
[{'name': 'Diego Gomez, Bingchen Gong, Maks Ovsjanikov'}]
3D Reconstruction and Modeling 三维重建与建模 v2
Few-Shot NeRF
3D Reconstruction
Neural Rendering
Input: Limited input views 有限的输入视角
Step1: Frequency control 频率控制
Step2: Curriculum training 课程训练
Step3: Scene reconstruction 场景重建
Output: Accurate 3D representations 准确的三维表示
9.2 [9.2] 2502.00262 Your submission contained main.bib and main.tex file, but no main.bbl file (include main.bbl, or submit without main.bib; and remember to verify references)
[{'name': 'Dianwei Chen, Zifan Zhang, Yuchen Liu, Xianfeng Terry Yang'}]
Autonomous Systems and Robotics 自动驾驶 v2
hazard detection
vision-language model
autonomous driving
Input: Multimodal data fusion 多模态数据融合
Step1: Semantic and visual inputs integration 语义和视觉输入集成
Step2: Supervised fine-tuning of vision-language models 有监督微调视觉语言模型
Step3: Hazard detection and edge case evaluation 危险检测和边缘案例评估
Output: Enhanced situational awareness 改进的情境意识
9.2 [9.2] 2502.00315 MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model
[{'name': 'Jihyeok Kim, Seongwoo Moon, Sungwon Nah, David Hyunchul Shim'}]
3D Object Detection 3D对象检测 v2
3D object detection 3D对象检测
monocular vision 单目视觉
depth estimation 深度估计
Input: Monocular images 单目图像
Step1: Depth estimation using Vision Transformer 步骤1:使用视觉Transformer进行深度估计
Step2: Feature extraction with Hierarchical Feature Fusion 步骤2:利用层次特征融合提取特征
Step3: Object detection using DETR architecture 步骤3:使用DETR架构进行对象检测
Output: 3D bounding boxes for detected objects 输出:检测到对象的3D边界框
8.5 [8.5] 2502.00074 SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection
[{'name': 'Dong-Hee Paek, Seung-Hyun Kong'}]
3D Object Detection 目标检测 v2
4D Radar
3D object detection
energy efficiency
autonomous driving
Input: 4D Radar point clouds 4D雷达点云
Step1: Convert RTNH to SNN architecture 将RTNH转换为SNN架构
Step2: Implement biological top-down inference (BTI) 实现生物学自上而下推理(BTI)
Step3: Model evaluation and comparison 模型评估与比较
Output: Energy-efficient 3D object detection model 能源高效的3D目标检测模型
8.5 [8.5] 2502.00342 Embodied Intelligence for 3D Understanding: A Survey on 3D Scene Question Answering
[{'name': 'Zechuan Li, Hongshan Yu, Yihao Ding, Yan Li, Yong He, Naveed Akhtar'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
3D Scene Question Answering
multimodal models
Input: 3D scene representation and query 3D场景表示和查询
Step1: Systematic literature review 系统文献综述
Step2: Dataset analysis 数据集分析
Step3: Methodology evaluation 方法评估
Output: Comprehensive insights and challenges on 3D SQA 对3D SQA的综合见解和挑战
8.5 [8.5] 2502.00500 Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
[{'name': 'Yang Cao, Zhao Song, Chiwun Yang'}]
Video Generation 视频生成 v2
video generation
interpolation
extrapolation
latent flow matching
Input: Video frames 视频帧
Step1: Model latent flow 模型潜在流
Step2: Polynomial projection 多项式投影
Step3: Generate time-dependent frames 生成时间相关帧
Output: Video with interpolation and extrapolation 带插值和外推的视频
8.5 [8.5] 2502.00708 PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation
[{'name': 'Qixuan Li, Chao Wang, Zongjin He, Yan Peng'}]
3D Generation 三维生成 v2
3D generation
compositional scenes
large language models
Input: Complex scene descriptions 复杂场景描述
Step1: Semantic parsing and relationship extraction 语义解析和关系提取
Step2: Scene graph generation 场景图生成
Step3: 2D and 3D asset generation 2D和3D资产生成
Step4: Layout prediction and planning 布局预测与规划
Output: High-quality 3D compositional scenes 高质量三维组合场景
8.5 [8.5] 2502.00843 VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
[{'name': 'Yuxin Lin, Mengshi Qi, Liang Liu, Huadong Ma'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Visual Question Answering
Vision-Language Models
Autonomous Driving
Input: Visual Question Answering task in autonomous driving 视觉问答任务之于自动驾驶
Step1: Integrate Vision-Language Models with continual learning 结合视觉语言模型与持续学习
Step2: Implement selective memory replay and knowledge distillation 实施选择性记忆重放与知识蒸馏
Step3: Apply task-specific projection layer regularization 应用特定任务的投影层正则化
Output: Enhanced VQA performance in autonomous driving environments 改进的自动驾驶环境中的视觉问答性能
8.5 [8.5] 2502.00954 Hypo3D: Exploring Hypothetical Reasoning in 3D
[{'name': 'Ye Mao, Weixun Luo, Junpeng Jing, Anlan Qiu, Krystian Mikolajczyk'}]
3D Reasoning in Scenes 三维场景推理 v2
3D reasoning
visual question answering
hypothetical reasoning
Input: Context change descriptions 上下文变化描述
Step1: Dataset construction 数据集构建
Step2: Model evaluation 模型评估
Output: Performance analysis 性能分析
8.5 [8.5] 2502.00960 SAM-guided Pseudo Label Enhancement for Multi-modal 3D Semantic Segmentation
[{'name': 'Mingyu Yang, Jitong Lu, Hun-Seok Kim'}]
3D Semantic Segmentation 三维语义分割 v2
3D semantic segmentation
domain adaptation
pseudo labels
autonomous driving
Input: 3D point cloud and SAM masks 3D点云和SAM掩码
Step1: Class label determination using majority voting 类别标签确定(使用投票法)
Step2: Application of filtering constraints to unreliable labels 对不可靠标签应用过滤约束
Step3: Geometry-Aware Progressive Propagation (GAPP) for label propagation 几何感知渐进传播(GAPP)进行标签传播
Output: Enhanced pseudo-labels and improved segmentation performance 输出:改进的伪标签和增强的分割性能
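Step1 above (majority voting of point pseudo-labels within each SAM mask, plus a reliability filter) is easy to sketch; the GAPP propagation is not reproduced here, and `min_votes` is an assumed filtering constraint rather than the paper's exact rule.

```python
import numpy as np

def mask_majority_labels(point_labels, mask_ids, min_votes=5):
    """Assign one class per SAM mask by majority vote over the labels of
    the points it covers; masks with too few votes stay unlabeled (-1)."""
    out = {}
    for m in np.unique(mask_ids):
        votes = point_labels[(mask_ids == m) & (point_labels >= 0)]
        if len(votes) < min_votes:
            out[m] = -1                  # unreliable mask, filtered out
        else:
            out[m] = np.bincount(votes).argmax()
    return out

labels = np.random.randint(-1, 3, size=500)   # -1 = unlabeled point
masks = np.random.randint(0, 10, size=500)
print(mask_majority_labels(labels, masks))
```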
8.5 [8.5] 2502.00972 Pushing the Boundaries of State Space Models for Image and Video Generation
[{'name': 'Yicong Hong, Long Mai, Yuan Yao, Feng Liu'}]
Image and Video Generation 图像生成和视频生成 v2
image generation
video generation
state-space models
transformer models
Input: Images and video sequences 图像和视频序列
Step1: Develop SSM-Transformer hybrid model 开发SSM-Transformer混合模型
Step2: Efficient processing of visual sequences 高效处理视觉序列
Step3: Generate images and videos 生成图像和视频
Output: High-quality images and dynamic videos 高质量图像和动态视频
8.5 [8.5] 2502.01004 ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking
[{'name': 'Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He'}]
Autonomous Systems and Robotics 自动驾驶与机器人技术 v2
6D pose estimation
bin-picking
zero-shot learning
robotic manipulation
Input: RGB-D image and CAD model 输入: RGB-D图像和CAD模型
Step1: Object detection 物体检测
Step2: Point cloud extraction 点云提取
Step3: Position-Aware Correspondence learning 位置感知对应学习
Step4: Pose estimation 位置估计
Output: 6D pose predictions 输出: 6D姿态预测
8.5 [8.5] 2502.01157 Radiant Foam: Real-Time Differentiable Ray Tracing
[{'name': 'Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi'}]
Neural Rendering 神经渲染 v2
differentiable rendering
volumetric meshes
real-time rendering
Input: Volumetric mesh representations 体积网格表示
Step1: Mesh parameterization 网格参数化
Step2: Differentiable ray tracing 可微光线追踪
Step3: Rendering and evaluation 渲染与评估
Output: Real-time rendering results 实时渲染结果
8.5 [8.5] 2502.01281 Label Correction for Road Segmentation Using Road-side Cameras
[{'name': 'Henrik Toikka, Eerik Alamikkotervo, Risto Ojala'}]
Autonomous Systems and Robotics 自主系统与机器人技术 v2
road segmentation
autonomous vehicles
image registration
deep learning
Input: Roadside camera images 道路监控摄像头图像
Step1: Automatic data collection 自动数据收集
Step2: Semi-automatic annotation method 开发半自动注释方法
Step3: Image registration to correct labels 图像配准以修正标签
Output: Enhanced road segmentation models 改进的道路分割模型
8.5 [8.5] 2502.01297 XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications
[{'name': 'Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, Hujun Bao, Guofeng Zhang'}]
Autonomous Systems and Robotics 自动驾驶与机器人技术 v2
Visual Inertial Odometry
Initialization
Feature Matching
AR
VR
Input: Visual Inertial Odometry (VIO) data 视觉惯性里程计数据
Step1: Initialization using gyroscope and visual measurements 使用陀螺仪和视觉测量进行初始化
Step2: Hybrid feature matching using optical flow and descriptor methods 结合光流与描述子方法的混合特征匹配
Step3: Evaluation on benchmarks and practical applications 基准测试与实际应用评估
Output: Enhanced VIO performance 改进的VIO性能
8.5 [8.5] 2502.01357 Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar
[{'name': 'Dong-In Kim, Dong-Hee Paek, Seung-Hyun Song, Seung-Hyun Kong'}]
Autonomous Driving 自动驾驶 v2
3D multi-object tracking
4D Radar
Input: 4D Radar data 4D雷达数据
Step1: Object detection using Bayesian approximation 基于贝叶斯近似进行目标检测
Step2: Motion prediction with transformer network 使用变换器网络进行运动预测
Step3: Two-stage data association integrating Doppler measurements 两阶段数据关联,整合多普勒测量
Output: Accurate 3D MOT results 准确的3D多目标跟踪结果
8.5 [8.5] 2502.01401 Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
[{'name': 'Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang'}]
3D Visual Grounding 3D视觉基础 v2
3D visual grounding
Large Language Model
3D reconstruction
vision-language model
Input: Referring utterances and 3D scene scans 参考话语和三维场景扫描
Step1: Parse utterance into symbolic expression 将话语解析为符号表达式
Step2: Generate spatial relation features 生成空间关系特征
Step3: Use VLM to process visual information 使用视觉语言模型处理视觉信息
Output: Identified target object 确定目标对象
8.0 [8.0] 2502.00800 Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data
[{'name': 'Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du'}]
Image Generation 图像生成 v2
Generative Adversarial Networks
Data Augmentation
Image Generation
Input: Limited training data 有限训练数据
Step 1: Estimate covariance matrices 估计协方差矩阵
Step 2: Identify semantic transformation directions 确定语义转换方向
Step 3: Apply adversarial semantic augmentation 应用对抗性语义增强
Output: Improved generation quality 改进的生成质量
7.5 [7.5] 2502.00618 DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models
[{'name': 'Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language models
knowledge forgetting
general attributes
Input: Pretrained Vision-Language Models (VLMs) 预训练视觉语言模型
Step1: Generating General Attribute Descriptions 生成通用属性描述
Step2: Establishing Vision-GA-Class Associations 建立视觉-通用属性-类关联
Step3: Tuning Visual Encoder 调整视觉编码器
Output: Enhanced Adaptation with Reduced Knowledge Forgetting 改进的适应性,减少知识遗忘
7.5 [7.5] 2502.00639 Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
[{'name': 'Tao Ren, Zishi Zhang, Zehao Li, Jingyang Jiang, Shentao Qin, Guanghao Li, Yan Li, Yi Zheng, Xinping Li, Min Zhan, Yijie Peng'}]
Image Generation 图像生成 v2
Diffusion Model
Image Generation
Video Generation
Input: Diffusion Model (DM) 扩散模型
Step1: Analyze variance and bias 分析方差和偏差
Step2: Develop Recursive Likelihood Ratio optimizer 开发递归似然比优化器
Step3: Validate on image and video tasks 在图像和视频任务上验证
Output: Fine-tuned model 改进的模型
7.0 [7.0] 2502.01530 The in-context inductive biases of vision-language models differ across modalities
[{'name': 'Kelsey Allen, Ishita Dasgupta, Eliza Kosoy, Andrew K. Lampinen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language models
inductive biases
generalization
Input: Visual and textual stimuli 视觉和文本刺激
Step1: Inductive bias analysis 偏置分析
Step2: Experimental paradigm application 实验范式应用
Step3: Data collection and evaluation 数据收集与评估
Output: Insights on model generalization 关于模型泛化的见解
6.5 [6.5] 2502.01524 Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective
[{'name': 'Xiaorui Ma, Haoran Xie, S. Joe Qin'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
multimodal learning
Large Language Models
parameter-efficient learning
Vision-Language Models
Input: Vision-language models 视觉-语言模型
Step1: Categorize and review VLLMs 对VLLMs进行分类和审查
Step2: Discuss training paradigms 讨论训练范式
Step3: Summarize benchmarks 总结基准测试
Output: Comprehensive survey report 综合调查报告

Arxiv 2025-02-04

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.00173 Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation
[{'name': 'Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee'}]
3D Reconstruction and Modeling 三维重建 3D instance segmentation 3D实例分割
Gaussian Splatted Radiance Fields 高斯喷溅辐射场
Input: 2D segmentation masks 2D分割掩码
Step1: Feature integration 特征集成
Step2: 3D Gaussian lifting 3D高斯提升
Step3: Segmentation application 分割应用
Output: 3D segmented assets 3D分割资产
9.5 [9.5] 2502.00360 Shape from Semantics: 3D Shape Generation from Multi-View Semantics
[{'name': 'Liangchen Li, Caoliwen Wang, Yuqi Zhou, Bailin Deng, Juyong Zhang'}]
3D Generation 三维生成 3D reconstruction
shape generation
semantics
Input: Multi-view semantics 多视角语义
Step1: Semantic input analysis 语义输入分析
Step2: Geometry and appearance distillation from 2D models 从2D模型提取几何与外观
Step3: Image restoration and detail enhancement 图像修复与细节增强
Step4: Shape reconstruction using neural SDF representation 使用神经符号距离场(SDF)重建形状
Output: Complex detailed 3D meshes 复杂细节的三维网格
9.5 [9.5] 2502.00801 Environment-Driven Online LiDAR-Camera Extrinsic Calibration
[{'name': 'Zhiwei Huang, Jiaqi Li, Ping Zhong, Rui Fan'}]
3D Reconstruction and Modeling 三维重建 LiDAR-camera calibration
3D reconstruction
data fusion
Input: LiDAR and camera data LiDAR和相机数据
Step1: Environmental interpretation 环境解释
Step2: Dual-path correspondence matching 双路径对应匹配
Step3: Spatial-temporal optimization 空间时间优化
Output: Precise extrinsic calibration 精确的外部标定
8.5 [8.5] 2502.00074 SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection
[{'name': 'Dong-Hee Paek, Seung-Hyun Kong'}]
3D Object Detection 三维物体检测 3D object detection
neural networks
autonomous driving
Input: 4D Radar data 4D 雷达数据
Step1: Process high-density point clouds 处理高密度点云
Step2: Implement spiking neural network architecture 实现脉冲神经网络架构
Step3: Apply biological top-down inference (BTI) 应用生物学的自上而下推理法
Output: Efficient 3D object detection results 高效的三维物体检测结果
8.5 [8.5] 2502.00262 Your submission contained main.bib and main.tex file, but no main.bbl file (include main.bbl, or submit without main.bib; and remember to verify references)
[{'name': 'Dianwei Chen, Zifan Zhang, Yuchen Liu, Xianfeng Terry Yang'}]
Autonomous Driving 自动驾驶 hazard detection
autonomous driving
multimodal data fusion
Input: Multimodal data 输入: 多模态数据
Step1: Data integration 数据集成
Step2: Hazard detection 危险检测
Step3: Spatial localization 空间定位
Output: Enhanced hazard prediction 改进的危险预测
8.5 [8.5] 2502.00315 MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model
[{'name': 'Jihyeok Kim, Seongwoo Moon, Sungwon Nah, David Hyunchul Shim'}]
3D Reconstruction 三维重建 3D object detection
depth estimation
Input: Monocular images 单目图像
Step1: Feature extraction using Vision Transformer 基于视觉变换器的特征提取
Step2: Depth estimation using a relative depth model 使用相对深度模型进行深度估计
Step3: Object detection using DETR architecture 使用DETR架构进行物体检测
Output: Enhanced 3D object detection capabilities 改进的3D物体检测能力
8.5 [8.5] 2502.00528 Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings
[{'name': 'Zachary Huemann, Samuel Church, Joshua D. Warner, Daniel Tran, Xin Tie, Alan B McMillan, Junjie Hu, Steve Y. Cho, Meghan Lubner, Tyler J. Bradshaw'}]
VLM & VLA 视觉语言模型 3D vision-language model
PET/CT
visual grounding
Input: PET/CT reports and images PET/CT 报告和图像
Step1: Automation of weak labeling pipeline 弱标记生成管道自动化
Step2: Data extraction from reports 报告中数据提取
Step3: Training of ConTEXTual Net 3D 训练 ConTEXTual Net 3D
Output: 3D visual grounding model 3D 视觉定位模型
8.5 [8.5] 2502.00708 PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation
[{'name': 'Qixuan Li, Chao Wang, Zongjin He, Yan Peng'}]
3D Generation 三维生成 text-to-3D generation
compositional scenes
physics-guided generation
Input: Complex scene descriptions 复杂场景描述
Step1: Scene graph generation 场景图生成
Step2: Asset creation using multimodal agents 使用多模态代理进行资产创建
Step3: Layout prediction with physical model 使用物理模型进行布局预测
Output: Compositional scenes with physical rationality 具有物理合理性的组合场景
8.5 [8.5] 2502.00843 VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
[{'name': 'Yuxin Lin, Mengshi Qi, Liang Liu, Huadong Ma'}]
VLM & VLA 视觉语言模型与视觉语言对齐 Vision-Language Models
Visual Question Answering
autonomous driving
continual learning
Input: Visual Question Answering tasks in autonomous driving 在自动驾驶中的视觉问答任务
Step1: Integrate Vision-Language Models with continual learning 整合视觉语言模型与持续学习
Step2: Implement selective memory replay and knowledge distillation 实施选择性记忆重放和知识蒸馏
Step3: Apply task-specific projection layer regularization 应用任务特定投影层正则化
Output: Improved VQA system performance 改进的视觉问答系统性能
8.5 [8.5] 2502.00954 Hypo3D: Exploring Hypothetical Reasoning in 3D
[{'name': 'Ye Mao, Weixun Luo, Junpeng Jing, Anlan Qiu, Krystian Mikolajczyk'}]
3D Reasoning 3D推理 3D reasoning
Visual Question Answering
scene understanding
Input: Context changes and indoor scene descriptions 上下文变化和室内场景描述
Step1: Benchmark formulation 基准测试制定
Step2: Model performance evaluation 模型性能评估
Output: Hypothetical reasoning capabilities 设想推理能力
8.5 [8.5] 2502.00960 SAM-guided Pseudo Label Enhancement for Multi-modal 3D Semantic Segmentation
[{'name': 'Mingyu Yang, Jitong Lu, Hun-Seok Kim'}]
3D Reconstruction and Modeling 三维重建 3D semantic segmentation
domain adaptation
pseudo-labels
autonomous driving
Input: 3D point cloud and SAM masks 输入: 3D点云和SAM掩码
Step1: Class label determination using majority voting 步骤1: 使用投票法确定类别标签
Step2: Unreliable mask label filtering using constraints 步骤2: 使用约束过滤不可靠的掩码标签
Step3: Geometry-Aware Progressive Propagation (GAPP) to propagate mask labels 步骤3: 使用几何感知逐步传播来传递掩码标签
Output: Enhanced pseudo-labels with improved quality 输出: 质量提升的增强伪标签
8.5 [8.5] 2502.01004 ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking
[{'name': 'Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He'}]
Autonomous Systems and Robotics 自主系统与机器人技术 6D pose estimation
bin-picking
robotic manipulation
zero-shot learning
Input: Scene instances and CAD models 场景实例与CAD模型
Step1: Feature extraction 特征提取
Step2: Position-aware correspondence learning 基于位置的对应学习
Step3: Pose estimation 位置估计
Output: Accurate 6D poses 准确的6D姿势
8.5 [8.5] 2502.01045 WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction
[{'name': 'Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo'}]
3D Reconstruction 三维重建 3D human reconstruction
photorealistic rendering
Input: Monocular video 单目视频
Step1: Dual-Space Optimization 双空间优化
Step2: Score Distillation Sampling (SDS) 评分蒸馏采样
Step3: View selection strategy 视图选择策略
Step4: Pose Feature Injection 姿态特征注入
Output: High-fidelity dynamic human avatars 高保真动态人类虚拟形象
8.5 [8.5] 2502.01157 Radiant Foam: Real-Time Differentiable Ray Tracing
[{'name': 'Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi'}]
Neural Rendering 神经渲染 differentiable rendering
ray tracing
computer vision
Input: Scene representations 场景表示
Step1: Implement volumetric mesh ray tracing 实现体积网格光线追踪
Step2: Develop a novel scene representation 发展新场景表示
Step3: Evaluate rendering speed and quality 评估渲染速度和质量
Output: Real-time rendering model 实时渲染模型
8.5 [8.5] 2502.01281 Label Correction for Road Segmentation Using Road-side Cameras
[{'name': 'Henrik Toikka, Eerik Alamikkotervo, Risto Ojala'}]
Autonomous Driving 自动驾驶 road segmentation
deep learning
autonomous vehicles
data annotation
Input: Roadside camera feeds 路边摄像头视频
Step1: Manual labeling of one frame 手动标注一帧
Step2: Transfer labels to other frames 转移标签到其他帧
Step3: Compensate for camera movements 使用频域图像配准补偿相机位移
Output: Semi-automatically labeled road data 半自动标注的道路数据
8.5 [8.5] 2502.01297 XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications
[{'name': 'Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, Hujun Bao, Guofeng Zhang'}]
Visual Odometry 视觉里程计 Visual Inertial Odometry
Structure from Motion
Augmented Reality
Virtual Reality
Input: Visual inertial measurements 视觉惯性测量
Step1: Robust initialization 稳健初始化
Step2: Feature matching 特征匹配
Step3: State estimation 状态估计
Output: Accurate visual inertial odometry result 精确的视觉惯性里程计结果
8.5 [8.5] 2502.01356 Quasi-Conformal Convolution : A Learnable Convolution for Deep Learning on Riemann Surfaces
[{'name': 'Han Zhang, Tsz Lok Ip, Lok Ming Lui'}]
3D Reconstruction and Modeling 3D重建 3D facial analysis
Riemann surfaces
Input: Geometric data and Riemann surfaces 几何数据和黎曼曲面
Step1: Define quasi-conformal mappings 定义准保形映射
Step2: Develop Quasi-Conformal Convolution operators 开发准保形卷积算子
Step3: Implement Quasi-Conformal Convolutional Neural Network (QCCNN) 实现准保形卷积神经网络
Output: Adaptive convolution for geometric data 自适应卷积用于几何数据
8.5 [8.5] 2502.01357 Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar
[{'name': 'Dong-In Kim, Dong-Hee Paek, Seung-Hyun Song, Seung-Hyun Kong'}]
Robotic Perception 机器人感知 3D multi-object tracking
Bayesian approximation
autonomous driving
Input: 4D Radar data 4D 雷达数据
Step1: Motion prediction using transformer-based network 使用基于变换器的网络进行运动预测
Step2: Bayesian approximation for detection and prediction 检测和预测中的贝叶斯近似
Step3: Two-stage data association leveraging Doppler measurements 基于多普勒测量的两阶段数据关联
Output: Enhanced multi-object tracking performance 提升的多目标跟踪性能
8.5 [8.5] 2502.01401 Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
[{'name': 'Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang'}]
3D Visual Grounding 3D视觉定位 3D visual grounding
weakly supervised learning
Input: 3D visual information and language 3D视觉信息与语言
Step1: Code generation using LLM 通过LLM生成代码
Step2: Spatial relationship computation 空间关系计算
Step3: Quality evaluation and optimization 质量评估和优化
Output: Efficient grounding results 高效的定位结果
8.5 [8.5] 2502.01405 FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control
[{'name': 'Diego Gomez, Bingchen Gong, Maks Ovsjanikov'}]
3D Reconstruction 三维重建 Few-Shot NeRFs 少样本神经辐射场
3D Reconstruction 三维重建
Input: Scene images 场景图像
Step1: Curriculum training 课程训练
Step2: Feature parameterization 特征参数化
Step3: Scene complexity increment 增加场景复杂性
Output: High-quality reconstruction 高质量重建
8.0 [8.0] 2502.00342 Embodied Intelligence for 3D Understanding: A Survey on 3D Scene Question Answering
[{'name': 'Zechuan Li, Hongshan Yu, Yihao Ding, Yan Li, Yong He, Naveed Akhtar'}]
3D Reconstruction and Modeling 3D重建与建模 3D scene question answering
multimodal modelling
datasets
Input: 3D scene data 3D场景数据
Step1: Systematic review of datasets 数据集的系统评审
Step2: Analysis of methodologies 方法论分析
Step3: Evaluation of metrics 评估指标
Output: Comprehensive understanding of 3D SQA 3D场景问答的综合理解
8.0 [8.0] 2502.00800 Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data
[{'name': 'Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du'}]
Image Generation 图像生成 Generative Adversarial Networks
data augmentation
image synthesis
semantic features
Input: Limited image datasets 有限图像数据集
Step1: Estimate covariance matrices 估计协方差矩阵
Step2: Identify meaningful transformation directions 识别有意义的转化方向
Step3: Apply transformations to semantic features 对语义特征应用转化
Output: Enhanced synthetic images 增强合成图像
7.5 [7.5] 2502.00333 BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution
[{'name': 'Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang'}]
Image Generation 图像生成 super-resolution
diffusion model
binarization
model compression
Input: Diffusion model for super-resolution 超分辨率扩散模型
Step1: Binarization of the model 模型的二值化
Step2: One-step distillation into extreme compression 一步蒸馏以实现极端压缩
Step3: Integration of sparse and low rank matrix branches 结合稀疏和低秩矩阵分支
Output: Compressed and accelerated super-resolution model 压缩和加速的超分辨率模型
7.5 [7.5] 2502.00500 Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
[{'name': 'Yang Cao, Zhao Song, Chiwun Yang'}]
Image and Video Generation 图像生成 video generation
interpolation
extrapolation
Input: Video frames 视频帧
Step1: Hypothesis generation 假设生成
Step2: Optimal projection approximation 最优投影近似
Step3: Interpolation and extrapolation 插值和外推
Output: Time-dependent video frames 时间依赖视频帧
7.5 [7.5] 2502.00639 Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
[{'name': 'Tao Ren, Zishi Zhang, Zehao Li, Jingyang Jiang, Shentao Qin, Guanghao Li, Yan Li, Yi Zheng, Xinping Li, Min Zhan, Yijie Peng'}]
Image Generation 图像生成 Diffusion Model
image generation
video generation
Input: Probabilistic diffusion model 概率扩散模型
Step1: Pre-training on unlabeled data 在无标签数据上进行预训练
Step2: Recursive Likelihood Ratio optimizer proposal 提出递归似然比优化器
Step3: Implementation of zero-order gradient estimation 零阶梯度估计的实施
Output: Aligned diffusion models 对齐的扩散模型
7.5 [7.5] 2502.00662 Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation
[{'name': 'Yimu Wang, Evelien Riddell, Adrian Chow, Sean Sedwards, Krzysztof Czarnecki'}]
VLM & VLA 视觉语言模型与对齐 vision-language models
out-of-distribution detection
few-shot learning
Input: ID image and text prototypes 输入: ID图像和文本原型
Step1: Theoretical analysis 理论分析
Step2: Incorporation of image prototypes 图像原型的整合
Step3: Development of biased prompts generation (BPG) module 偏差提示生成(BPG)模块的开发
Step4: Implementation of image-text consistency (ITC) module 图像文本一致性(ITC)模块的实施
Output: Enhanced VLM-based OOD detection performance 输出: 改进的基于VLM的OOD检测性能
7.5 [7.5] 2502.00711 VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
[{'name': 'Chunbai Zhang, Chao Wang, Yang Zhou, Yan Peng'}]
Vision-Language Models (VLMs) 视觉语言模型 visual reasoning
evidence-based reasoning
VLM
Input: Visual information (images/videos) 输入: 视觉信息(图像/视频)
Step1: Extract fine-grained visual knowledge from visual relationships 第一步: 从视觉关系中提取细粒度视觉知识
Step2: Paraphrase questions with underspecification using extracted knowledge 第二步: 利用提取的知识对欠规范的问题进行改写
Step3: Employ Chain-of-Evidence prompting for interpretable reasoning 第三步: 使用证据链提示进行可解释推理
Output: Enhanced visual reasoning capabilities 输出: 改进的视觉推理能力
7.5 [7.5] 2502.00719 Vision and Language Reference Prompt into SAM for Few-shot Segmentation
[{'name': 'Kosuke Sakurai, Ryotaro Shimizu, Masayuki Goto'}]
VLM & VLA 视觉语言模型与对齐 few-shot segmentation
vision-language model
Input: Annotated reference images and text labels 参考图像和文本标签
Step1: Input visual and semantic reference information 输入视觉和语义参考信息
Step2: Integrate prompt embeddings into SAM 将提示嵌入集成到SAM
Step3: Few-shot segmentation via VLP-SAM 通过VLP-SAM进行少样本分割
Output: High-performance segmentation results 高性能的分割结果
7.5 [7.5] 2502.00972 Pushing the Boundaries of State Space Models for Image and Video Generation
[{'name': 'Yicong Hong, Long Mai, Yuan Yao, Feng Liu'}]
Image Generation 图像生成 image generation
video generation
Input: Visual sequences 视觉序列
Step1: Model development 模型开发
Step2: Integration of SSM and Transformers SSM与变换器的整合
Step3: Evaluation of generated outputs 生成结果的评估
Output: Generated images and videos 生成的图像和视频
7.5 [7.5] 2502.01524 Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective
[{'name': 'Xiaorui Ma, Haoran Xie, S. Joe Qin'}]
VLM & VLA 视觉语言模型与对齐 Vision-Language
Large Language Models
parameter efficiency
Step1: Introduce architecture of LLMs 介绍LLM架构
Step2: Discuss parameter-efficient learning methods 讨论参数效率学习方法
Step3: Present taxonomy of modality integrators 提出模态集成器分类
Step4: Review training paradigms and efficiency considerations 回顾训练范式及效率考虑
Step5: Compare experimental results of representative models 比较代表模型的实验结果
7.5 [7.5] 2502.01530 The in-context inductive biases of vision-language models differ across modalities
[{'name': 'Kelsey Allen, Ishita Dasgupta, Eliza Kosoy, Andrew K. Lampinen'}]
Vision-Language Models (VLMs) 视觉语言模型 vision-language models
inductive biases
generalization
Input: Stimuli presented in vision and text 视觉和文本中呈现的刺激
Step1: Conduct experiments 进行实验
Step2: Analyze generalization across models 分析模型间的概括性
Output: Insights on inductive biases regarding shape and color 对形状和颜色的归纳偏见的见解
5.0 [5.0] 2502.00618 DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models
[{'name': 'Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li'}]
Vision-Language Models (VLMs) 视觉语言模型 vision-language models
continual adaptation
attribute descriptions
Input: Visual features and class text 视觉特征和类别文本
Step1: Generate general attribute descriptions 生成一般属性描述
Step2: Design anchor-based embedding filter 设计基于锚点的嵌入过滤器
Step3: Tune visual encoder 调整视觉编码器
Output: Robust vision-GA-class associations 稳健的视觉-一般属性-类别关联

Arxiv 2025-01-31

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2501.17978v2 VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
view-dependent representation
3D高斯渲染
视角依赖表示
input: images 图片
extend the 3D Gaussian Splatting model 扩展3D高斯渲染模型
introduce an additional symmetric matrix 引入额外的对称矩阵
achieve view-dependent opacity representation 实现视角依赖的透明度表示
output: improved 3D scene reconstruction 输出:改进的3D场景重建
8.5 [8.5] 2501.19319v1 Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping 3D reconstruction 三维重建 3D reconstruction
3D Gaussian Splatting
endoscopic SLAM
depth reconstruction
三维重建
3D高斯斑点
内窥镜SLAM
深度重建
input: endoscopic image sequences 内窥镜图像序列
Step 1: tracking using Gaussian Splatting 使用高斯斑点的跟踪
Step 2: mapping and bundle adjustment 映射与束调整
Step 3: surface normal-aware reconstruction 结合表面法向量进行重构
output: accurate 3D reconstruction and real-time tracking 输出: 精确的3D重建与实时跟踪
8.5 [8.5] 2501.19270v1 Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way 3D reconstruction 三维重建 Point Cloud Completion
3D Shape Completion
Knowledge Distillation
Points Completion
点云补全
3D形状补全
知识蒸馏
点补全
input: incomplete point cloud 有缺失的点云
step1: apply autoencoder to encode the point cloud 应用自编码器对点云进行编码
step2: use knowledge distillation for completion 使用知识蒸馏进行补全
step3: output: completed 3D shape 输出:完整的3D形状
8.5 [8.5] 2501.19196v1 RaySplats: Ray Tracing based Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
Gaussian Splatting
3D高斯喷溅
高斯喷溅
Input: 2D images 2D图像
Ray-tracing mechanism 射线追踪机制
Intersection computation 交点计算
Ray-tracing algorithms construction 射线追踪算法构建
Final 3D object with lighting and shadows 最终带有光影效果的三维物体
8.5 [8.5] 2501.19088v1 JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
3D reconstruction
实时渲染
3D高斯喷溅
三维重建
input: 3D key points (输入:3D关键点)
Step 1: Create a joint-driven 3D Gaussian representation (步骤1:创建联合驱动的3D高斯表示)
Step 2: Implement differentiable spatial transformations (步骤2:实现可微分的空间变换)
Step 3: Apply real-time shadow simulation method (步骤3:应用实时阴影模拟方法)
output: High-fidelity hand images (输出:高保真的手部图像)
8.5 [8.5] 2501.18982v1 OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation 3D generation 3D生成 3D generation
3D gaussian
物体生成
3D高斯
input: 3D assets 3D资产
extract: physical properties 提取物理属性
generate: physics-based dynamics 生成基于物理的动态
output: dynamic scene 输出动态场景
7.5 [7.5] 2501.19382v1 LiDAR Loop Closure Detection using Semantic Graphs with Graph Attention Networks Autonomous Driving 自动驾驶 LiDAR
loop closure detection
graph attention networks
place recognition
semantic registration
激光雷达
回环闭合检测
图注意力网络
地点识别
语义注册
input: semantic graphs 语义图
step1: encode semantic graphs using graph attention networks 使用图注意力网络编码语义图
step2: compare graph vectors to identify loop closure 比较图向量以识别回环闭合
step3: estimate 6 DoF pose constraint using semantic registration 使用语义注册估计6自由度位姿约束
output: loop closure detection results 回环闭合检测结果
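Step2 above compares graph vectors to identify loop closures; the comparison itself reduces to a similarity search over keyframe embeddings. Below is a cosine-similarity sketch of just that step (the GAT encoding and the 6-DoF semantic registration are out of scope, and `sim_thresh` is illustrative).

```python
import numpy as np

def loop_closure_candidates(query_vec, keyframe_vecs, sim_thresh=0.9):
    """Flag keyframes whose graph embedding is close to the query's."""
    kf = keyframe_vecs / np.linalg.norm(keyframe_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = kf @ q                        # cosine similarity per keyframe
    return np.where(sims > sim_thresh)[0], sims

vecs = np.random.randn(100, 64)
idx, sims = loop_closure_candidates(vecs[42] + 0.05 * np.random.randn(64), vecs)
print(idx)                               # should include keyframe 42
```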
7.5 [7.5] 2501.19259v1 Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge Autonomous Driving 自主驾驶 Autonomous Driving
Neuromorphic Vision
Real-time Navigation
Autonomous Systems
自驾驶
神经形态视觉
实时导航
自主系统
Input: Human speech commands 人类语音指令
Step 1: Translate speech into planning commands 将语音翻译成规划指令
Step 2: Execute commands using neuromorphic vision 执行命令使用神经形态视觉
Step 3: Navigate and avoid obstacles in real-time 实时导航和避免障碍
Output: Autonomous drone navigation output 自主无人机导航输出
7.5 [7.5] 2501.19252v1 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search Video Generation 视频生成 video generation
text-to-video models
视频生成
文本到视频模型
input: diffusion model inputs 输入:扩散模型输入
step1: align video frames with text prompts 步骤1:将视频帧与文本提示对齐
step2: utilize a beam search strategy to optimize output 使用束搜索策略优化输出
step3: compute metrics for perceptual quality evaluation 计算感知质量评估的指标
output: high-quality, aligned video generation 输出:高质量、对齐的视频生成
7.5 [7.5] 2501.19035v1 SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging Autonomous Driving 自动驾驶 Semantic Segmentation
LiDAR Imaging
Autonomous Driving
语义分割
LiDAR成像
自动驾驶
input: LiDAR data 输入: LiDAR 数据
step1: generate synthetic dataset 生成合成数据集
step2: utilize CARLA simulator 使用 CARLA 模拟器
step3: train segmentation algorithms 训练分割算法
output: improved segmentation performance 输出: 改进的分割性能
7.5 [7.5] 2501.17159v2 IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait Image Generation 图像生成 personalized portrait generation
identity preservation
view-consistent reconstruction
个性化肖像生成
身份保留
视角一致重建
input: reference images 参考图像
step1: Lighting-Aware Stitching 光照感知拼接
step2: View-Consistent Adaptation 视角一致自适应
step3: ControlNet-like supervision 类似ControlNet的监督
output: personalized portraits 个性化肖像
6.5 [6.5] 2501.18994v1 VKFPos: A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration Autonomous Driving (自动驾驶) Monocular Positioning
Extended Kalman Filter
Deep Learning
Single-shot
单目定位
扩展卡尔曼滤波
深度学习
单次
input: monocular images 单目图像
step1: Absolute Pose Regression (APR) 绝对姿态回归
step2: Relative Pose Regression (RPR) 相对姿态回归
step3: Integrate APR and RPR using EKF 通过扩展卡尔曼滤波整合APR和RPR
output: accurate positioning results 精确定位结果
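A 1-D toy of step3 above: a Kalman filter where the relative-pose regressor (RPR) drives the prediction step and the absolute-pose regressor (APR) provides the measurement update. VKFPos learns the variances variationally with an extended filter; the fixed `var_apr`/`var_rpr` constants here are assumptions for illustration.

```python
import numpy as np

def kf_fuse(apr, rpr, var_apr=0.5, var_rpr=0.1):
    """Fuse absolute (apr) and relative (rpr) pose estimates with a
    scalar Kalman filter."""
    x, p = apr[0], var_apr               # initialize from first APR fix
    track = [x]
    for k in range(1, len(apr)):
        x, p = x + rpr[k], p + var_rpr   # predict with relative motion
        gain = p / (p + var_apr)         # Kalman gain
        x = x + gain * (apr[k] - x)      # correct with absolute estimate
        p = (1 - gain) * p
        track.append(x)
    return np.array(track)

true = np.cumsum(np.ones(50) * 0.1)
apr = true + 0.7 * np.random.randn(50)   # noisy absolute poses
rpr = np.r_[0, np.diff(true)] + 0.05 * np.random.randn(50)
print(np.abs(kf_fuse(apr, rpr) - true).mean(), "<", np.abs(apr - true).mean())
```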
6.0 [6.0] 2501.19331v1 Consistent Video Colorization via Palette Guidance Video Generation 视频生成 Video Colorization
Stable Video Diffusion
Palette Guidance
视频上色
稳定视频扩散
调色板引导
input: video sequences 视频序列
step 1: design palette-based color guider 设计调色板引导器
step 2: utilize Stable Video Diffusion as base model 利用稳定视频扩散作为基础模型
step 3: generate vivid colors using color context 根据颜色上下文生成生动的颜色
output: colorized video sequences 上色的视频序列
5.5 [5.5] 2501.18865v1 REG: Rectified Gradient Guidance for Conditional Diffusion Models Image Generation 图像生成 conditional generation
diffusion models
conditional generation 条件生成
扩散模型
input: guidance techniques 指导技术
step1: replace the scaled marginal distribution target 替换缩放的边际分布目标
step2: implement rectified gradient guidance 实施修正梯度引导
step3: conduct experiments on image generation tasks 进行图像生成任务的实验
output: improved image generation results 改进的图像生成结果

Arxiv 2025-01-31

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2501.19196v1 RaySplats: Ray Tracing based Gaussian Splatting 3D generation 三维生成 3D Gaussian Splatting
Ray Tracing
3D高斯点云
光线追踪
input: 2D images 2D图像
process: Gaussian Splatting 高斯点云渲染
process: ray tracing based on Gaussian primitives 基于高斯原始体的光线追踪
output: 3D objects with light and shadow effects 输出具有光影效果的3D物体
9.0 [9.0] 2501.17978v2 VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
view-dependent rendering
3D高斯点云
视角依赖的渲染
input: images for 3D scene reconstruction 用于3D场景重建的图像
step 1: extend 3D Gaussian Splatting model 扩展3D高斯点云模型
step 2: introduce symmetric matrix to enhance opacity representation 引入对称矩阵以增强不透明性表示
step 3: optimize suppression of Gaussians based on viewer perspective 根据观察者视角优化高斯的抑制
output: improved representation of view-dependent reflections and specular highlights 输出:改进视角依赖的反射和镜面高光的表示
8.5 [8.5] 2501.19319v1 Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping 3D reconstruction 三维重建 3D Gaussian Splatting
SLAM
endoscopic reconstruction
depth reconstruction
3D 高斯点
SLAM
内窥镜重建
深度重建
input: endoscopic images 内窥镜图像
step1: surface normal-aware tracking 表面法线感知跟踪
step2: accurate mapping 精确地图构建
step3: bundle adjustment 捆绑调整
output: geometrically accurate 3D reconstruction 准确的三维重建
8.5 [8.5] 2501.19252v1 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search Video Generation 视频生成 Text-to-video
Diffusion models
Video generation
评分调整
文本转视频
扩散模型
视频生成
奖励校准
input: video generation prompts 视频生成提示
step1: employ diffusion latent beam search 使用扩散潜在光束搜索
step2: maximize alignment reward 最大化对齐奖励
step3: improve perceptual quality 提升感知质量
output: high-quality video optimized for natural movement 输出:高质量视频,优化自然运动
8.5 [8.5] 2501.19088v1 JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
animatable hand avatar
3D高斯喷涂
可动画手部化身
input: 3D key points 3D关键点
Joint-driven 3D Gaussian Splatting (3DGS) representation 联合驱动的3D高斯喷涂(3DGS)表示
apply spatial transformations based on 3D key points 基于3D关键点应用空间变换
real-time rendering and shadow simulation 实时渲染和阴影模拟
output: animatable high-fidelity hand images 输出:可动画的高保真手部图像
8.5 [8.5] 2501.18982v1 OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation 3D generation 3D生成 3D generation
3D gaussian
3D生成
3D高斯
input: user-specified prompts 用户指定的提示
step1: define a scene according to user prompts 根据用户提示定义场景
step2: estimate material weighting factors using a pretrained video diffusion model 使用预训练的视频扩散模型估计材料权重因子
step3: represent each 3D asset as a collection of constitutive 3D Gaussians 将每个3D资产表示为一组组成的3D高斯分布
output: a physics-based 3D dynamic scene 输出:基于物理的3D动态场景
8.0 [8.0] 2501.19270v1 Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way 3D reconstruction 三维重建 Point Cloud Completion
Multi-view Distillation
3D Shape Recovery
点云补全
多视图蒸馏
3D形状恢复
input: incomplete point cloud 输入: 不完整的点云
step1: apply autoencoder architecture 应用自编码器架构
step2: use knowledge distillation strategy to enhance completion 使用知识蒸馏策略以增强完成度
step3: output: completed point cloud 输出: 完整的点云
7.5 [7.5] 2501.19382v1 LiDAR Loop Closure Detection using Semantic Graphs with Graph Attention Networks Autonomous Driving 自主驾驶 Loop Closure Detection
Semantic Graphs
Graph Attention Networks
闭环检测
语义图
图注意力网络
input: point cloud 输入: 点云
step1: encode semantic graphs using graph attention networks 步骤1: 使用图注意力网络编码语义图
step2: generate graph vectors through self-attention mechanisms 步骤2: 通过自注意力机制生成图向量
step3: compare graph vectors to detect loop closure 步骤3: 比较图向量以检测闭环
output: loop closure candidates 输出: 闭环候选
7.5 [7.5] 2501.19035v1 SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging Autonomous Driving 自主驾驶 Semantic segmentation
LiDAR imaging
autonomous driving
语义分割
LiDAR成像
自主驾驶
input: LiDAR images (输入: LiDAR图像)
modify CARLA simulator (修改CARLA模拟器)
generate SynthmanticLiDAR dataset (生成SynthmanticLiDAR数据集)
evaluate with transfer learning (使用迁移学习进行评估)
output: improved semantic segmentation performance (输出: 改进的语义分割性能)
7.5 [7.5] 2501.17159v2 IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait Image Generation 图像生成 Personalized Portrait Generation
3D-aware relighting
个性化肖像生成
具3D感知的重光照
Input: reference portrait images 参考肖像图像
Step 1: Lighting-Aware Stitching 具光照感知的拼接
Step 2: View-Consistent Adaptation 具视图一致的适配
Output: personalized portraits with identity preservation 具有身份保留的个性化肖像
7.0 [7.0] 2501.19243v1 Accelerating Diffusion Transformer via Error-Optimized Cache Image Generation 图像生成 Image Generation
Diffusion Transformer
ImageNet Dataset
图像生成
扩散变换器
ImageNet数据集
input: Diffusion Transformer features (扩散变换器特征)
extract caching differences (提取缓存差异)
optimize cache based on errors (基于错误优化缓存)
output: improved generated images (输出: 改进的生成图像)
6.5 [6.5] 2501.19259v1 Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge Autonomous Driving 自主驾驶 autonomous driving
natural language processing
neuroscience
autonomous navigation
自主驾驶
自然语言处理
神经科学
自主导航
input: human speech and dynamic environment 输入:人类语言和动态环境
step1: translate human speech into planning commands 步骤1:将人类语言翻译为规划命令
step2: navigate and avoid obstacles using neuromorphic vision 步骤2:利用神经形态视觉导航并避免障碍物
output: real-time autonomous navigation output 实时自主导航结果
6.5 [6.5] 2501.18994v1 VKFPos: A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration Autonomous Driving 自主驾驶 monocular positioning
extended kalman filter
variational bayesian inference
单目定位
扩展卡尔曼滤波
变分贝叶斯推理
input: monocular images 单目图像
step1: Absolute Pose Regression (APR) 绝对姿态回归
step2: Relative Pose Regression (RPR) 相对姿态回归
step3: Integration with Extended Kalman Filter (EKF) 通过扩展卡尔曼滤波整合
output: accurate positional predictions 准确的位置信息预测

Arxiv 2025-01-30

Relevance Title Research Topic Keywords Pipeline
8.5 [8.5] 2501.18594v1 Foundational Models for 3D Point Clouds: A Survey and Outlook 3D reconstruction 3D重建 3D point clouds
foundational models
3D视觉理解
基础模型
3D点云
input: 3D point clouds 3D点云
step1: review of foundational models FMs 基础模型的回顾
step2: categorize use of FMs in 3D tasks 分类基础模型在3D任务中的应用
step3: summarize state-of-the-art methods 总结最新的方法
output: comprehensive overview of FMs for 3D understanding 输出:基础模型在3D理解中的综合概述
8.5 [8.5] 2501.18162v1 IROAM: Improving Roadside Monocular 3D Object Detection Learning from Autonomous Vehicle Data Domain Autonomous Driving 自动驾驶 3D object detection
autonomous driving
3D对象检测
自动驾驶
input: roadside data and vehicle-side data
In-Domain Query Interaction module learns content and depth information
Cross-Domain Query Enhancement decouples queries into semantic and geometry parts
outputs enhanced object queries
8.5 [8.5] 2501.18110v1 Lifelong 3D Mapping Framework for Hand-held & Robot-mounted LiDAR Mapping Systems 3D reconstruction 三维重建 3D Mapping
3D Reconstruction
Lifelong Mapping
激光雷达
三维映射
三维重建
终身映射
Input: Hand-held and robot-mounted LiDAR maps 输入:手持和机器人安装的激光雷达地图
Dynamic point removal algorithm 动态点去除算法
Multi-session map alignment using feature descriptor matching and fine registration 多会话地图对齐,使用特征描述符匹配和精细配准
Map change detection to identify changes between aligned maps 地图变化检测以识别对齐地图之间的变化
Map version control for maintaining current environmental state and querying changes 地图版本控制,用于维护当前环境状态和查询变化
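The map change detection step above can be approximated with nearest-neighbor tests between two aligned sessions: new-session points far from the old map have appeared, and vice versa. A minimal scipy sketch; the `dist_thresh` value and the assumption that alignment is already done are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def map_changes(map_old, map_new, dist_thresh=0.2):
    """Return (appeared, disappeared) points between two aligned maps."""
    d_new, _ = cKDTree(map_old).query(map_new)   # dist to old map per new point
    d_old, _ = cKDTree(map_new).query(map_old)   # dist to new map per old point
    return map_new[d_new > dist_thresh], map_old[d_old > dist_thresh]

old = np.random.rand(2000, 3) * 10
new = np.vstack([old + 0.01 * np.random.randn(*old.shape),
                 np.array([[50.0, 50.0, 0.0]])])   # one genuinely new point
appeared, disappeared = map_changes(old, new)
print(len(appeared), len(disappeared))             # ~1, ~0
```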
8.0 [8.0] 2501.18595v1 ROSA: Reconstructing Object Shape and Appearance Textures by Adaptive Detail Transfer Mesh Reconstruction 网格重建 Mesh Reconstruction
3D reconstruction
网格重建
三维重建
input: limited set of images 限制的图像集
step1: optimize mesh geometry 优化网格几何形状
step2: refine mesh with spatially adaptive resolution 使用空间自适应分辨率细化网格
step3: reconstruct high-resolution textures 重新构建高分辨率纹理
output: textured mesh with detailed appearance 带有详细外观的纹理网格
7.5 [7.5] 2501.18590v1 DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models Rendering Techniques 渲染技术 Inverse Rendering
Forward Rendering
Video Diffusion Models
逆向渲染
正向渲染
视频扩散模型
input: real-world videos, 真实世界视频
step1: estimate G-buffers using inverse rendering model, 使用逆向渲染模型估计G-buffer
step2: generate photorealistic images from G-buffers, 从G-buffer生成照片级真实图像
output: relit images, material edited images, realistic object insertions, 重新照明图像,材料编辑图像,逼真的物体插入
7.5 [7.5] 2501.18315v1 Surface Defect Identification using Bayesian Filtering on a 3D Mesh Mesh Reconstruction 网格重建 3D Mesh
Mesh Reconstruction
3D网格
网格重建
input: CAD model and point cloud data 输入:CAD模型和点云数据
transform CAD model into polygonal mesh 将CAD模型转换为多边形网格
apply weighted least squares algorithm 应用加权最小二乘算法
estimate state based on point cloud measurements 根据点云测量估计状态
output: high-precision defect identification 输出:高精度缺陷识别
7.5 [7.5] 2501.17636v2 Efficient Interactive 3D Multi-Object Removal 3D reconstruction 三维重建 3D scene understanding
multi-object removal
3D场景理解
多对象移除
input: selected areas and objects for removal 选定的移除区域和对象
step1: mask matching and refinement 掩码匹配和细化
step2: homography-based warping 基于单应性的变形
step3: inpainting process 修复过程
output: modified 3D scene 修改后的3D场景
7.0 [7.0] 2501.18246v1 Ground Awareness in Deep Learning for Large Outdoor Point Cloud Segmentation 3D reconstruction 三维重建 point cloud segmentation
outdoor point clouds
semantic segmentation
point cloud
点云分割
户外点云
语义分割
点云
input: outdoor point clouds 户外点云
compute Digital Terrain Models (DTMs) 计算数字地形模型
employ RandLA-Net for segmentation 使用 RandLA-Net 进行分割
evaluate performance on datasets 评估在数据集上的表现
integrate relative elevation features 集成相对高程特征
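The relative elevation feature in the last step above is just height over a local ground model. Below is a crude grid-minimum DTM sketch; taking the per-cell minimum z as ground and the `cell` size are simplifying assumptions, whereas real DTM pipelines filter vegetation and interpolate properly.

```python
import numpy as np

def relative_elevation(points, cell=1.0):
    """Height of each point above the minimum z of its (x, y) grid cell."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                               # shift indices to >= 0
    flat = ij[:, 0] * (ij[:, 1].max() + 1) + ij[:, 1]  # 1-D cell index
    ground = np.full(flat.max() + 1, np.inf)
    np.minimum.at(ground, flat, points[:, 2])          # per-cell min z = ground
    return points[:, 2] - ground[flat]

pts = np.random.rand(5000, 3) * [50, 50, 8]
rel = relative_elevation(pts, cell=2.0)
print(rel.min(), rel.max())                            # min is 0 by construction
```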
6.5 [6.5] 2501.18494v1 Runway vs. Taxiway: Challenges in Automated Line Identification and Notation Approaches Autonomous Driving 自动驾驶 Automated line identification 自动化线识别
Convolutional Neural Network 卷积神经网络
runway markings 跑道标记
autonomous systems 自动化系统
labeling algorithms 标记算法
input: runway and taxiway images 跑道和滑行道图像
Step 1: color threshold adjustment 颜色阈值调整
Step 2: refine region of interest selection 精细化感兴趣区域选择
Step 3: integrate CNN classification 集成CNN分类
output: improved marking identification 改进的标记识别

Newly Found Papers on ...

(Older entries get replaced automatically when the script runs again.)
