CVPR2023 latest information and paper downloads (Papers/Codes/Project/PaperReading/Demos/live-stream talks/paper-sharing sessions, etc.)

Official website: https://cvpr.thecvf.com/Conferences/2023
Acceptance results announced: February 28, 2023

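Related question: How would you evaluate the CVPR 2023 paper acceptance results?
Related coverage: CVPR 2023 acceptance results are out! 2360 papers accepted, a 12% increase in the number of accepted papers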

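Update log (updates include bundled download links):
2023/2/28: 13 papers added
2023/3/02: 54 papers added
2023/3/09: 35 papers added
2023/3/15: 29 papers added
2023/3/16: 8 papers added
2023/3/17: 19 papers added
2023/3/20: 37 papers added
2023/3/22: 61 papers added
2023/3/23: 55 papers added
2023/3/24: 70 papers added
2023/3/25: 99 papers added
2023/3/26: 23 papers added
2023/3/29: 101 papers added
2023/3/31: 89 papers added
2023/4/11: 127 papers added
2023/4/12: 48 papers added
2023/4/13: 51 papers added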



Contents

1. CVPR2023 accepted papers/code by topic (in progress)
2. CVPR2023 spotlight (in progress)
3. CVPR2023 paper interpretation roundup (in progress)
4. CVPR2023 极市 paper sharing
5. To do list


Topic index:

[43. Autonomous Driving](#automatic driving)




[14]DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
paper

[13]Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection
paper

[12]Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision
paper | code

[11]Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
paper

[10]Continual Detection Transformer for Incremental Object Detection
paper

[9]Object Discovery from Motion-Guided Tokens
paper | code

[8]What Can Human Sketches Do for Object Detection?
paper

[7]NeRF-RPN: A general framework for object detection in NeRFs
paper

[6]Detecting Everything in the Open World: Towards Universal Object Detection
paper

[5]Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
paper

[4]CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
paper

[3]Enhanced Training of Query-Based Object Detection via Selective Query Recollection
paper | code

[2]DETRs with Hybrid Matching
paper | code

[1]YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
paper | code



[4]Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video
paper | code

[3]Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
paper | code

[2]3D Video Object Detection with Learnable Object-Centric Global Optimization
paper | code

[1]SCOTCH and SODA: A Transformer Video Shadow Detection Framework
paper



[28]Curricular Object Manipulation in LiDAR-based Object Detection
paper | code

[27]Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
paper | code

[26]Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
paper

[25]Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
paper

[24]Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
paper | code

[23]Viewpoint Equivariance for Multi-View 3D Object Detection
paper | code

[22]Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans
paper

[21]itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection
paper

[20]Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
paper | code

[19]FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
paper | code

[18]NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
paper

[17]Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
paper

[16]VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
paper | code

[15]OcTr: Octree-based Transformer for 3D Object Detection
paper

[14]MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
paper

[13]CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
paper | code

[12]Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
paper

[11]AeDet: Azimuth-invariant Multi-view 3D Object Detection
paper | code

[10]Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
paper

[9]PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
paper | code

[8]MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
paper

[7]Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
paper

[6]X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
paper

[5]Virtual Sparse Convolution for Multimodal 3D Object Detection
paper | code

[4]MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
paper | code

[3]Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection
paper | code

[2]LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
paper | code

[1]ConQueR: Query Contrast Voxel-DETR for 3D Object Detection
paper | code



[4]Relational Context Learning for Human-Object Interaction Detection
paper

[3]Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
paper

[2]Category Query Learning for Human-Object Interaction Classification
paper

[1]Detecting Human-Object Contact in Images
paper



[1]Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
paper | code




[2]Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
paper

[1]Texture-guided Saliency Distilling for Unsupervised Salient Object Detection
paper | code



[2]Few-shot Geometry-Aware Keypoint Localization
paper

[1]Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
paper



[1]BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline
paper



[2]The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector
paper | code

[1]Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections
paper | code





[14]Video Event Restoration Based on Keyframes for Video Anomaly Detection
paper

[13]Robust Outlier Rejection for 3D Registration with Variational Bayes
paper | code

[12]OpenMix: Exploring Outlier Samples for Misclassification Detection
paper | code

[11]WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
paper

[10]SimpleNet: A Simple Network for Image Anomaly Detection and Localization
paper | code

[9]Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
paper

[8]SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection
paper

[7]Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection
paper

[6]Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
paper

[5]DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
paper

[4]Diversity-Measurable Anomaly Detection
paper

[3]Block Selection Method for Using Feature Norm in Out-of-distribution Detection
paper

[2]Lossy Compression for Robust Unsupervised Time-Series Anomaly Detection
paper

[1]Multimodal Industrial Anomaly Detection via Hybrid Fusion
paper | code




[7]FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
paper

[6]Zero-shot Referring Image Segmentation with Global-Local Context Features
paper | code

[5]Parameter Efficient Local Implicit Image Function Network for Face Segmentation
paper

[4]EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
paper

[3]Focused and Collaborative Feedback Integration for Interactive Image Segmentation
paper | code

[2]MP-Former: Mask-Piloted Transformer for Image Segmentation
paper | code

[1]Interactive Segmentation as Gaussian Process Classification
paper



[3]You Only Segment Once: Towards Real-Time Panoptic Segmentation
paper | code

[2]UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration
paper

[1]Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
paper



[28]Federated Incremental Semantic Segmentation
paper | code

[27]Continual Semantic Segmentation with Automatic Memory Sample Selection
paper

[26]DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation
paper | code

[25]Exploiting the Complementarity of 2D and 3D Networks to Address Domain-Shift in 3D Semantic Segmentation
paper | code

[24]3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds
paper | code

[23]Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
paper

[22]Instant Domain Augmentation for LiDAR Semantic Segmentation
paper

[21]Leveraging Hidden Positives for Unsupervised Semantic Segmentation
paper | code

[20]LaserMix for Semi-Supervised LiDAR Semantic Segmentation
paper | code

[19]Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
paper | code

[18]Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
paper | code

[17]Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
paper | code

[16]Reliability in Semantic Segmentation: Are We on the Right Track?
paper | code

[15]Generative Semantic Segmentation
paper | code

[14]Novel Class Discovery for 3D Point Cloud Semantic Segmentation
paper | code

[13]MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
paper | code

[12]Side Adapter Network for Open-Vocabulary Semantic Segmentation
paper | code

[11]Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
paper

[10]Token Contrast for Weakly-Supervised Semantic Segmentation
paper | code

[9]Delivering Arbitrary-Modal Semantic Segmentation
paper | code

[8]Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
paper

[7]Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
paper | code

[6]Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
paper | code

[5]SCPNet: Semantic Scene Completion on Point Cloud
paper

[4]On Calibrating Semantic Segmentation Models: Analyses and An Algorithm
paper

[3]Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
paper

[2]Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
paper | code

[1]Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
paper



[11]Mask-Free Video Instance Segmentation
paper | code

[10]Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
paper

[9]DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
paper

[8]The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation
paper

[7]A Generalized Framework for Video Instance Segmentation
paper | code

[6]FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
paper

[5]SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
paper | code

[4]DynaMask: Dynamic Mask Selection for Instance Segmentation
paper | code

[3]Beyond mAP: Towards better evaluation of instance segmentation
paper

[2]ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution
paper

[1]PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
paper




[5]Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
paper | code

[4]Two-shot Video Object Segmentation
paper

[3]Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
paper

[2]MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
paper

[1]InstMove: Instance Motion for Object-centric Video Segmentation
paper | code




[5]Probabilistic Prompt Learning for Dense Prediction
paper

[4]Ensemble-based Blackbox Attacks on Dense Prediction
paper

[3]Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
paper | code

[2]One-to-Few Label Assignment for End-to-End Dense Detection
paper | code

[1]DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
paper



[12]BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
paper | code

[11]VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
paper

[10]Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
paper

[9]Affordance Grounding from Demonstration Video to Target Image
paper | code

[8]Frame Flexible Network
paper | code

[7]Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
paper | code

[6]A Unified Pyramid Recurrent Network for Video Frame Interpolation
paper

[5]Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
paper | code

[4]Blind Video Deflickering by Neural Filtering with a Flawed Atlas
paper | code

[3]Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
paper | code

[2]UV Volumes for Real-time Rendering of Editable Free-view Human Performance
paper | code

[1]Exploring Discontinuity for Video Frame Interpolation
paper (arXiv:2202.07291)



[4]VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs
paper

[3]Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
paper

[2]Text-Visual Prompting for Efficient 2D Temporal Video Grounding
paper

[1]Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
paper | code



[7]Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
paper | code

[6]Conditional Image-to-Video Generation with Latent Flow Diffusion Models
paper

[5]3D Cinemagraphy from a Single Image
paper

[4]VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
paper | code

[3]MOSO: Decomposing MOtion, Scene and Object for Video Prediction
paper | code

[2]SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
paper | code

[1]Video Probabilistic Diffusion Models in Projected Latent Space
paper | project



[2]Structured Sparsity Learning for Efficient Video Super-Resolution
paper

[1]Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
paper



[1]Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
paper


[4]AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation
paper

[3]Semi-Weakly Supervised Object Kinematic Motion Prediction
paper

[2]DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
paper

[1]Rethinking Optical Flow from Geometric Matching Consistent Perspective
paper | code




[8]EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation
paper

[7]DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
paper | code

[6]Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
paper

[5]SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
paper

[4]PlaneDepth: Self-supervised Depth Estimation via Orthogonal Planes
paper | code

[3]HRDFuse: Monocular 360°Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions
paper

[2]Fully Self-Supervised Depth Estimation from Defocus Clue
paper | code

[1]Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
paper | code



[18]A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
paper | code

[17]Monocular 3D Human Pose Estimation for Sports Broadcasts using Partial Sports Field Registration
paper | code

[16]DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback
paper

[15]TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation
paper

[14]PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
paper | code

[13]ScarceNet: Animal Pose Estimation with Scarce Annotations
paper | code

[12]Human Pose Estimation in Extremely Low-Light Conditions
paper

[11]Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
paper

[10]3D Human Mesh Estimation from Virtual Markers
paper

[9]Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation
paper

[8]Rigidity-Aware Detection for 6D Object Pose Estimation
paper

[7]Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
paper

[6]Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer
paper

[5]TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation
paper

[4]Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
paper

[3]PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
paper

[2]DistilPose: Tokenized Pose Regression with Heatmap Distillation
paper

[1]Relightable Neural Human Assets from Multi-view Gradient Illuminations
paper



[6]CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis
paper

[5]Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild
paper

[4]Natural Language-Assisted Sign Language Recognition
paper | code

[3]CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment
paper | code

[2]Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement
paper

[1]Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos
paper | code



[3]Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR
paper

[2]PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
paper

[1]DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
paper | code


[11]Better "CMOS" Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution
paper

[10]Implicit Diffusion Models for Continuous Super-Resolution
paper | code

[9]SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
paper

[8]Learning Generative Structure Prior for Blind Text Image Super-resolution
paper | code

[7]Learning to Zoom and Unzoom
paper

[6]Activating More Pixels in Image Super-Resolution Transformer
paper | code

[5]Super-Resolution Neural Operator
paper | code

[4]Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
paper

[3]Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation
paper | code

[2]N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
paper | code

[1]Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild
paper | project



[20]CherryPicker: Semantic Skeletonization and Topological Reconstruction of Cherry Trees
paper

[19]Generative Diffusion Prior for Unified Image Restoration and Enhancement
paper

[18]CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
paper

[17]HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization
paper | code

[16]Burstormer: Burst Image Restoration and Enhancement Transformer
paper

[15]Visual-Tactile Sensing for In-Hand Object Reconstruction
paper

[14]3D-Aware Multi-Class Image-to-Image Translation with NeRFs
paper | code

[13]CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not
paper

[12]Instant Volumetric Head Avatars
paper

[11]Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank
paper | code

[10]ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
paper | code

[9]Masked Image Modeling with Local Multi-Scale Reconstruction
paper | code

[8]Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
paper | code

[7]DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
paper

[6]Robust Unsupervised StyleGAN Image Restoration
paper

[5]Raw Image Reconstruction with Learned Compact Metadata
paper

[4]Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
paper | code

[3]Imagic: Text-Based Real Image Editing with Diffusion Models
paper | project

[2]High-resolution image reconstruction with latent diffusion models from human brain activity
paper | project

[1]Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
paper



[2]Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior
paper | code

[1]LightPainter: Interactive Portrait Relighting with Freehand Scribble
paper


[12]RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
paper | code

[11]HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
paper

[10]Real-time Controllable Denoising for Image and Video
paper

[9]LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising
paper | code

[8]Curricular Contrastive Regularization for Physics-aware Single Image Dehazing
paper | code

[7]Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
paper | code

[6]Masked Image Training for Generalizable Deep Image Denoising
paper | code

[5]Learning A Sparse Transformer Network for Effective Image Deraining
paper | code

[4]Uncertainty-Aware Unsupervised Image Deblurring with Deep Residual Prior
paper

[3]Polarized Color Image Denoising using Pocoformer
paper

[2]Blur Interpolation Transformer for Real-World Motion from Blur
paper | code

[1]Structured Kernel Estimation for Photon-Limited Deconvolution
paper | code



[6]SIEDOB: Semantic Image Editing by Disentangling Object and Background
paper | code

[5]CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
paper

[4]SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
paper

[3]Interactive Cartoonization with Controllable Perceptual Factors
paper

[2]Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
paper | code

[1]LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data
paper | code



[1]Masked and Adaptive Transformer for Exemplar Based Image Translation
paper | code



[3]Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
paper

[2]CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability
paper

[1]Quality-aware Pre-trained Models for Blind Image Quality Assessment
paper


[4]CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
paper

[3]Neural Preset for Color Style Transfer
paper | code

[2]StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
paper

[1]Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN
paper | code



[1]Indescribable Multi-modal Spatial Evaluator
paper | code





[6]Gradient Attention Balance Network: Mitigating Face Recognition Racial Bias via Gradient Attention
paper

[5]Micron-BERT: BERT-based Facial Micro-Expression Recognition
paper | code

[4]Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
paper

[3]Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
paper

[2]Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection
paper

[1]Multi Modal Facial Expression Recognition with Transformer-Based Fusion Networks and Dynamic Sampling
paper



[13]GANHead: Towards Generative Animatable Neural Head Avatars
paper

[12]Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
paper

[11]StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
paper

[10]OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering
paper

[9]High-fidelity 3D Human Digitization from Single 2K Resolution Images
paper

[8]FaceLit: Neural 3D Relightable Faces
paper

[7]SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage
paper

[6]MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
paper | code

[5]NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images
paper

[4]Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
paper

[3]Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation
paper | code

[2]A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
paper

[1]MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
paper | code



[4]Hierarchical Fine-Grained Image Forgery Detection and Localization
paper

[3]Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
paper

[2]Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
paper | code

[1]Physical-World Optical Adversarial Attacks on 3D Face Recognition
paper



[11]Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
paper | code

[10]Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
paper

[9]Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction
paper | code

[8]Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
paper

[7]DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks
paper | code

[6]MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
paper

[5]Visual Prompt Multi-Modal Tracking
paper | code

[4]Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking
paper | code

[3]Focus On Details: Online Multi-object Tracking with Diverse Fine-grained Representation
paper

[2]Referring Multi-Object Tracking
paper

[1]Simple Cues Lead to a Strong Multi-Object Tracker
paper



[17]Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
paper

[16]System-status-aware Adaptive Network for Online Streaming Video Understanding
paper

[15]Hierarchical Video-Moment Retrieval and Step-Captioning
paper | code

[14]Procedure-Aware Pretraining for Instructional Video Understanding
paper | code

[13]Use Your Head: Improving Long-Tail Video Recognition
paper

[12]Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
paper | code

[11]Selective Structured State-Spaces for Long-Form Video Understanding
paper

[10]Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
paper | code

[9]NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
paper

[8]Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
paper

[7]Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
paper

[6]Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
paper | code

[5]Dual-path Adaptation from Image to Video Transformers
paper | code

[4]Data-Free Sketch-Based Image Retrieval
paper

[3]DAA: A Delta Age AdaIN operation for age estimation via binary code transformer
paper

[2]VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval
paper | code

[1]Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
paper


[16]Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition
paper

[15]STMixer: A One-Stage Sparse Action Detector
paper

[14]TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
paper

[13]Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
paper

[12]STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition
paper

[11]MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition
paper

[10]On the Benefits of 3D Pose and Tracking for Human Action Recognition
paper

[9]3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
paper

[8]Box-Level Active Detection
paper

[7]Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition
paper

[6]Open Set Action Recognition via Multi-Label Evidential Learning
paper

[5]Video Test-Time Adaptation for Action Recognition
paper

[4]Post-Processing Temporal Action Detection
paper

[3]TriDet: Temporal Action Detection with Relative Boundary Modeling
paper | code

[2]Learning Discriminative Representations for Skeleton Based Action Recognition
paper

[1]Continuous Sign Language Recognition with Correlation Network
paper | code


[7]Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
paper | code

[6]PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification
paper

[5]Large-scale Training Data Search for Object Re-identification
paper

[4]Adaptive Sparse Pairwise Loss for Object Re-Identification
paper | code

[3]Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
paper | code

[2]TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
paper | code

[1]MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
paper | code


[8]Model-Agnostic Gender Debiased Image Captioning
paper

[7]Cross-Domain Image Captioning with Discriminative Finetuning
paper | code

[6]AutoAD: Movie Description in Context
paper | code

[5]Text with Knowledge Graph Augmented Transformer for Video Captioning
paper

[4]Dual-Stream Transformer for Generic Event Boundary Captioning
paper | code

[3]ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
paper | code

[2]Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
paper

[1]Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
paper | code


[15]Deep Prototypical-Parts Ease Morphological Kidney Stone Identification and are Competitively Robust to Photometric Perturbations
paper | code

[14]Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis
paper | code

[13]Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
paper

[12]Fair Federated Medical Image Segmentation via Client Contribution Estimation
paper

[11]Directional Connectivity-based Segmentation of Medical Images
paper | code

[10]Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
paper

[9]Label-Free Liver Tumor Segmentation
paper | code

[8]Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
paper

[7]RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction
paper | code

[6]Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
paper | code

[5]Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
paper

[4]Neuron Structure Modeling for Generalizable Remote Physiological Measurement
paper | code

[3]Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
paper | code

[2]Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images
paper | code



[7]Towards Unified Scene Text Spotting based on Sequence Generation
paper

[6]Images Speak in Images: A Generalist Painter for In-Context Visual Learning
paper | code

[5]Context De-confounded Emotion Recognition
paper

[4]Joint Visual Grounding and Tracking with Natural Language Specification
paper

[3]Unifying Vision, Text, and Layout for Universal Document Processing
paper

[2]Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
paper

[1]DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
paper | code



[7]Fine-Grained Face Swapping via Regional GAN Inversion
paper

[6]Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
paper

[5]Graph Transformer GANs for Graph-Constrained House Generation
paper

[4]Improving GAN Training via Feature Space Shrinkage
paper | code

[3]Adversarial Attack with Raindrops
paper

[2]T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
paper | project

[1]Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
paper | project


[30]Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
paper

[29]Few-shot Semantic Image Synthesis with Class Affinity Transfer
paper

[28]Variational Distribution Learning for Unsupervised Text-to-Image Generation
paper

[27]HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
paper

[26]LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
paper | code

[25]Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
paper | code

[24]Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation
paper

[23]Freestyle Layout-to-Image Synthesis
paper | code

[22]All are Worth Words: A ViT Backbone for Diffusion Models
paper | code

[21]Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
paper | code

[20]Shifted Diffusion for Text-to-image Generation
paper | code

[19]Towards Practical Plug-and-Play Diffusion Models
paper

[18]Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
paper

[17]Wavelet Diffusion Models are fast and scalable Image Generators
paper | code

[16]Learning 3D-aware Image Synthesis with Unknown Pose Distribution
paper

[15]Picture that Sketch: Photorealistic Image Generation from Abstract Sketches
paper

[14]3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
paper | code

[13]A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
paper | code

[12]Regularized Vector Quantization for Tokenized Image Synthesis
paper

[11]SpaText: Spatio-Textual Representation for Controllable Image Generation
paper

[10]Unifying Layout Generation with a Decoupled Diffusion Model
paper

[9]Scaling up GANs for Text-to-Image Synthesis
paper

[8]Inversion-Based Style Transfer with Diffusion Models
paper | code

[7]Perspective Fields for Single Image Camera Calibration
paper

[6]VGFlow: Visibility guided Flow Network for Human Reposing
paper

[5]DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
paper | code

[4]Progressive Open Space Expansion for Open-Set Model Attribution
paper | code

[3]Person Image Synthesis via Denoising Diffusion Model
paper

[2]Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
paper

[1]Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
paper


[3]LinK: Linear Kernel for LiDAR-based 3D Perception
paper

[2]Learning a 3D Morphable Face Reflectance Model from Low-cost Data
paper | code

[1]Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
paper | code


[26]MEnsA: Mix-up Ensemble Average for Unsupervised Multi Target Domain Adaptation on 3D Point Clouds
paper | code

[25]Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
paper

[24]Self-positioning Point-based Transformer for Point Cloud Understanding
paper | code

[23]NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud
paper

[22]PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
paper

[21]Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
paper | code

[20]Learning Human-to-Robot Handovers from Point Clouds
paper

[19]Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting
paper | code

[18]Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
paper

[17]NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
paper | code

[16]Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
paper

[15]CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
paper

[14]Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
paper | code

[13]Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
paper | code

[12]Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
paper

[11]Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
paper | code

[10]Rotation-Invariant Transformer for Point Cloud Matching
paper

[9]GraVoS: Voxel Selection for 3D Point-Cloud Detection
paper

[8]DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
paper | code

[7]PointCert: Point Cloud Classification with Deterministic Certified Robustness Guarantees
paper

[6]ACL-SPC: Adaptive Closed-Loop system for Self-Supervised Point Cloud Completion
paper | code

[5]DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
paper

[4]Frequency-Modulated Point Cloud Rendering with Easy Editing
paper

[3]Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
paper

[2]ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer
paper | code

[1]Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
paper | code


[28]Multi-View Azimuth Stereo via Tangent Space Consistency
paper | code

[27]3D Line Mapping Revisited
paper | code

[26]PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
paper | code

[25]HexPlane: A Fast Representation for Dynamic Scenes
paper

[24]Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container
paper

[23]BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
paper

[22]Structured 3D Features for Reconstructing Controllable Avatars
paper

[21]PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
paper

[20]Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
paper

[19]TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
paper | code

[18]MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
paper | code

[17]PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
paper

[16]SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
paper | code

[15]Masked Wavelet Representation for Compact Neural Radiance Fields
paper

[14]Decoupling Human and Camera Motion from Videos in the Wild
paper

[13]Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
paper

[12]NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
paper

[11]Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion
paper | code

[10]MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices
paper | code

[9]Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
paper

[8]NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
paper

[7]HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
paper

[6]MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
paper

[4]Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
paper | code

[3]Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes
paper | code

[2]ECON: Explicit Clothed humans Obtained from Normals
paper | code

[1]Structured 3D Features for Reconstructing Relightable and Animatable Avatars
paper | project


[51]Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field
paper

[50]POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
paper | code

[49]Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
paper

[48]Neural Lens Modeling
paper

[47]One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
paper

[46]MonoHuman: Animatable Human Neural Field from Monocular Video
paper

[45]GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
paper

[44]Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes
paper

[43]F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
paper

[42]NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
paper

[41]Enhanced Stable View Synthesis
paper

[40]Consistent View Synthesis with Pose-Guided Diffusion Models
paper

[39]NeRF-Supervised Deep Stereo
paper | code

[38]Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
paper

[37]DyLiN: Making Light Field Networks Dynamic
paper

[36]FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
paper

[35]NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
paper | code

[34]SUDS: Scalable Urban Dynamic Scenes
paper

[33]JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
paper

[32]Magic3D: High-Resolution Text-to-3D Content Creation
paper

[31]DiffRF: Rendering-Guided 3D Radiance Field Diffusion
paper

[30]Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization
paper | code

[29]Interactive Segmentation of Radiance Fields
paper

[28]MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation
paper

[27]GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images
paper

[26]Progressively Optimized Local Radiance Fields for Robust View Synthesis
paper

[25]ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
paper

[24]HandNeRF: Neural Radiance Fields for Animatable Interacting Hands
paper

[23]Grid-guided Neural Radiance Fields for Large Urban Scenes
paper

[22]EventNeRF: Neural Radiance Fields from a Single Colour Event Camera
paper

[21]SPARF: Neural Radiance Fields from Sparse and Noisy Poses
paper

[20]RUST: Latent Neural Scene Representations from Unposed Imagery
paper

[19]SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
paper

[18]ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision
paper | code

[17]Balanced Spherical Grid for Egocentric View Synthesis
paper | code

[16]Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
paper

[15]MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
paper | code

[14]Robust Dynamic Radiance Fields
paper

[13]I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs
paper

[12]Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image
paper

[11]Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
paper

[10]Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
paper

[9]DP-NeRF: Deblurred Neural Radiance Field with Physical Scene Priors
paper | code

[8]SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields
paper

[7]3D Video Loops from Asynchronous Input
paper | code

[6]NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
paper | code

[5]NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation
paper

[4]Renderable Neural Radiance Map for Visual Navigation
paper

[3]Real-Time Neural Light Field on Mobile Devices
paper | project

[2]Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
paper | code

[1]NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
paper | project


[1]Neural Video Compression with Diverse Contexts
paper | code


[7]Supervised Masked Knowledge Distillation for Few-Shot Transformers
paper | code

[6]DisWOT: Student Architecture Search for Distillation WithOut Training
paper

[5]KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
paper

[4]Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
paper

[3]Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
paper

[2]Generic-to-Specific Distillation of Masked Autoencoders
paper | code

[1]CLIPPING: Distilling CLIP-based Models for Video-Language Understanding
paper


[2]CP3: Channel Pruning Plug-in for Point-based Networks
paper

[1]DepGraph: Towards Any Structural Pruning
paper | code


[4]Hard Sample Matters a Lot in Zero-Shot Quantization
paper

[3]Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
paper

[2]Post-training Quantization on Diffusion Models
paper | code

[1]Adaptive Data-Free Quantization
paper | code


[9]SMPConv: Self-moving Point Representations for Continuous Convolution
paper | code

[8]Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
paper | code

[7]Compacting Binary Neural Networks by Sparse Kernel Selection
paper

[6]LINe: Out-of-Distribution Detection by Leveraging Important Neurons
paper

[5]Towards Scalable Neural Representation for Diverse Videos
paper

[4]Boundary Unlearning
paper

[3]Equiangular Basis Vectors
paper | code

[2]LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
paper | code

[1]Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
paper | code


[6]VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
paper | code

[5]Randomized Adversarial Training via Taylor Expansion
paper | code

[4]Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
paper | code

[3]DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
paper | code

[2]Demystify Transformers & Convolutions in Modern Image Deep Networks
paper | code

[1]InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
paper | code


[24]Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
paper | code

[23]METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
paper

[22]MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection
paper

[21]Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
paper | code

[20]Learning Expressive Prompting With Residuals for Vision Transformers
paper

[19]Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization
paper

[18]One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer
paper | code

[17]Generalized Relation Modeling for Transformer Tracking
paper | code

[16]Learning Anchor Transformations for 3D Garment Animation
paper

[15]CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection
paper | code

[14]Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
paper

[13]POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
paper

[12]FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER
paper

[11]Spherical Transformer for LiDAR-based 3D Recognition
paper | code

[10]MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
paper | code

[9]Top-Down Visual Attention from Analysis by Synthesis
paper

[8]BiFormer: Vision Transformer with Bi-Level Routing Attention
paper | code

[7]Making Vision Transformers Efficient from A Token Sparsification View
paper

[6]Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
paper

[5]Learning Imbalanced Data with Vision Transformers
paper | code

[4]SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency
paper

[3]Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
paper | code

[2]Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
paper | code

[1]Integrally Pre-Trained Transformer Pyramid Networks
paper | code


[4]Adversarially Robust Neural Architecture Search for Graph Neural Networks
paper

[3]Mind the Label Shift of Augmentation-based Graph OOD Generalization
paper

[2]Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks
paper

[1]From Node Interaction to Hop Interaction: New Effective and Scalable Graph Learning Paradigm
paper


[3]Polynomial Implicit Neural Representations For Large Diverse Datasets
paper | code

[2]PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
paper | code

[1]Stitchable Neural Networks
paper | code


[1]ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization
paper | code


[1]Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
paper | code


[1]TINC: Tree-structured Implicit Neural Compression
paper | code



[2]Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
paper

[1]Masked Images Are Counterfactual Samples for Robust Fine-tuning
paper


[2]DivClust: Controlling Diversity in Deep Clustering
paper

[1]On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
paper | code


[2]Learned Image Compression with Mixed Transformer-CNN Architectures
paper | code

[1]Context-Based Trit-Plane Coding for Progressive Image Compression
paper | code


[25]Improved Test-Time Adaptation for Domain Generalization
paper

[24]Re-thinking Model Inversion Attacks Against Deep Neural Networks
paper

[23]Regularize implicit neural representation by itself
paper

[22]Improving the Transferability of Adversarial Samples by Path-Augmented Method
paper

[21]Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency
paper | code

[20]Progressive Random Convolutions for Single Domain Generalization
paper

[19]Tunable Convolutions with Parametric Multi-Loss Optimization
paper

[18]Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
paper | code

[17]CFA: Class-wise Calibrated Fair Adversarial Training
paper | code

[16]Generalist: Decoupling Natural and Robust Generalization
paper

[15]Feature Separation and Recalibration for Adversarial Robustness
paper

[14]Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
paper

[13]FlexiViT: One Model for All Patch Sizes
paper | code

[12]Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
paper | code

[11]Improving Generalization with Domain Convex Game
paper

[10]TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization
paper | code

[9]An Extended Study of Human-like Behavior under Adversarial Training
paper

[8]Sharpness-Aware Gradient Matching for Domain Generalization
paper | code

[7]HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
paper

[6]Universal Instance Perception as Object Discovery and Retrieval
paper | code

[5]Practical Network Acceleration with Tiny Sets
paper | code

[4]Towards Bridging the Performance Gaps of Joint Energy-based Models
paper | code

[3]DropKey
paper

[2]Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
paper

[1]DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
paper


[2]Fine-Grained Classification with Noisy Labels
paper

[1]Combating noisy labels in object detection datasets
paper


[3]Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
paper | code

[2]SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail
paper

[1]Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification
paper


[7]CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes
paper

[6]Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
paper

[5]PMatch: Paired Masked Image Modeling for Dense Geometric Matching
paper

[4]Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints
paper

[3]Referring Image Matting
paper | code

[2]Iterative Geometry Encoding Volume for Stereo Matching
paper | code

[1]Modality-Agnostic Debiasing for Single Domain Generalization
paper


[17]HNeRV: A Hybrid Neural Representation for Videos
paper | code

[16]Learning Rotation-Equivariant Features for Visual Correspondence
paper

[15]Mixed Autoencoder for Self-supervised Visual Representation Learning
paper

[14]Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
paper

[13]Multi-Modal Representation Learning with Text-Driven Soft Masks
paper

[12]Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
paper

[11]CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
paper | code

[10]Masked Motion Encoding for Self-Supervised Video Representation Learning
paper | code

[9]Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
paper | code

[8]MARLIN: Masked Autoencoder for facial video Representation LearnINg
paper | code

[7]Hierarchical discriminative learning improves visual representations of biomedical microscopy
paper

[6]Fine-tuned CLIP Models are Efficient Video Learners
paper | code

[5]Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
paper | code

[4]Open-Set Representation Learning through Combinatorial Embedding
paper

[3]NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction
paper

[2]Stare at What You See: Masked Image Modeling without Reconstruction
paper | code

[1]Switchable Representation Learning Framework with Self-compatibility
paper


[3]ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
paper | code

[2]Physically Adversarial Infrared Patches with Learnable Shapes and Locations
paper

[1]TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
paper | code


[18]Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
paper | code

[17]Detecting and Grounding Multi-Modal Media Manipulation
paper | code

[16]Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce
paper

[15]Quantum Multi-Model Fitting
paper | code

[14]Towards Flexible Multi-modal Document Models
paper

[13]CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP
paper | code

[12]MaPLe: Multi-modal Prompt Learning
paper | code

[11]Decoupled Multimodal Distilling for Emotion Recognition
paper

[10]MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
paper | code

[9]BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
paper | code

[8]Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos
paper | code

[7]Emotional Reaction Intensity Estimation Based on Multimodal Data
paper

[6]Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
paper

[5]Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
paper

[4]Multimodal Prompting with Missing Modalities for Visual Recognition
paper | code

[3]Align and Attend: Multimodal Summarization with Dual Contrastive Losses
paper | code

[2]Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
paper | code

[1]Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
paper | code


[11]Fine-grained Audible Video Description
paper

[10]Language-Guided Audio-Visual Source Separation via Trimodal Consistency
paper

[9]Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
paper

[8]Audio-Visual Grouping Network for Sound Localization from Mixtures
paper | code

[7]Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
paper

[6]Egocentric Audio-Visual Object Localization
paper | code

[5]Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
paper

[4]Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
paper

[3]Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
paper | code

[2]CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
paper | code

[1]A Light Weight Model for Active Speaker Detection
paper | code


[30]CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
paper | code

[29]Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
paper

[28]Learning to Name Classes for Vision and Language Models
paper

[27]VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision
paper | code

[26]HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
paper | code

[25]KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
paper | code

[24]PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout
paper | code

[23]SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
paper

[22]VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
paper

[21]Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
paper | code

[20]IFSeg: Image-free Semantic Segmentation via Vision-Language Model
paper | code

[19]Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
paper | code

[18]MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
paper | code

[17]Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
paper

[16]Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
paper | code

[15]Test of Time: Instilling Video-Language Models with a Sense of Time
paper | code

[14]Accelerating Vision-Language Pretraining with Free Language Modeling
paper

[13]Task Residual for Tuning Vision-Language Models
paper | code

[12]MAGVLT: Masked Generative Vision-and-Language Transformer
paper

[11]Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
paper | code

[10]Lana: A Language-Capable Navigator for Instruction Following and Generation
paper | code

[9]FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
paper | code

[8]Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
paper

[7]Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
paper

[6]Connecting Vision and Language with Video Localized Narratives
paper | code

[5]Policy Adaptation from Foundation Model Feedback
paper

[4]Open-vocabulary Attribute Detection
paper

[3]Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
paper

[2]Turning a CLIP Model into a Scene Text Detector
paper | code

[1]GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
paper


[4]TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
paper

[3]Intention-Conditioned Long-Term Human Egocentric Action Forecasting
paper | code

[2]Computational Choreography using Human Motion Synthesis
paper

[1]IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
paper


[21]Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
paper | code

[20]CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions
paper

[19]CelebV-Text: A Large-Scale Facial Text-Video Dataset
paper | code

[18]On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
paper | code

[17]Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
paper | code

[16]Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding
paper | code

[15]GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
paper

[14]ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data
paper

[13]Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts
paper

[12]A Bag-of-Prototypes Representation for Dataset-Level Applications
paper

[11]Music-Driven Group Choreography
paper

[10]RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset
paper

[9]Backdoor Defense via Adaptively Splitting Poisoned Dataset
paper | code

[8]Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models
paper | code

[7]SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
paper | code

[6]A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
paper | code

[5]MVImgNet: A Large-scale Dataset of Multi-view Images
paper

[4]Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo
paper

[3]CUDA: Convolution-based Unlearnable Datasets
paper

[2]V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception
paper

[1]Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
paper



[15]Zero-shot Generative Model Adaptation via Image-specific Prompt Learning
paper

[14]Zero-shot Model Diagnosis
paper

[13]AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
paper

[12]Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
paper

[11]ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
paper | code

[10]Learning Attention as Disentangler for Compositional Zero-shot Learning
paper | code

[9]Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
paper | code

[8]CF-Font: Content Fusion for Few-shot Font Generation
paper

[7]DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
paper | code

[6]Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings
paper | code

[5]Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
paper | code

[4]Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
paper

[3]Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
paper | code

[2]NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
paper

[1]FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization
paper | code


[7]Asynchronous Federated Continual Learning
paper | code

[6]Exploring Data Geometry for Continual Learning
paper

[5]Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning
paper | code

[4]Online Distillation with Continual Learning for Cyclic Domain Shifts
paper

[3]Preserving Linear Separability in Continual Learning by Backward Feature Projection
paper

[2]Computationally Budgeted Continual Learning: What Does Matter?
paper | code

[1]Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
paper | code


[2]Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
paper

[1]Probabilistic Debiasing of Scene Graphs
paper | code


[1]Prototype-based Embedding Network for Scene Graph Generation
paper


[1]VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
paper | code


[2]SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
paper

[1]PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
paper | code


[7]OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
paper

[6]Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
paper

[5]Human Pose as Compositional Tokens
paper

[4]Data-efficient Large Scale Place Recognition with Graded Similarity Supervision
paper | code

[3]PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
paper

[2]StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition
paper

[1]PyramidFlow: High-Resolution Defect Contrastive Localization using Pyramid Normalizing Flow
paper


[9]Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
paper

[8]MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
paper | code

[7]3D Concept Learning and Reasoning from Multi-View Images
paper

[6]Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
paper | code

[5]Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
paper | code

[4]Generative Bias for Robust Visual Question Answering
paper

[3]MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
paper | code

[2]Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering
paper | code

[1]From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
paper | code


[4]Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments
paper

[3]Semantic Prompt for Few-Shot Image Recognition
paper

[2]Boosting Verified Training for Robust Image Classifications via Abstraction
paper | code

[1]I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
paper


[17]DATE: Domain Adaptive Product Seeker for E-commerce
paper

[16]Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer
paper

[15]GeoNet: Benchmarking Unsupervised Adaptation across Geographies
paper

[14]C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
paper

[13]AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
paper | code

[12]BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
paper | code

[11]Deep Frequency Filtering for Domain Generalization
paper

[10]Semi-Supervised Domain Adaptation with Source Label Adaptation
paper | code

[9]Unsupervised Continual Semantic Adaptation through Neural Rendering
paper

[8]MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
paper | code

[7]Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
paper

[6]Manipulating Transfer Learning for Property Inference
paper | code

[5]Trainable Projected Gradient Method for Robust Fine-tuning
paper

[4]DA-DETR: Domain Adaptive Detection Transformer with Information Fusion
paper

[3]Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
paper | code

[2]Guiding Pseudo-labels with Uncertainty Estimation for Source-free Unsupervised Domain Adaptation
paper | code

[1]Adaptive Assignment for Geometry Aware Local Feature Matching
paper



[11]FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
paper

[10]Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
paper | code

[9]Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
paper

[8]PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery
paper | code

[7]Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data
paper

[6]Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
paper

[5]Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
paper | code

[4]MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset
paper | code

[3]CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
paper | code

[2]Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
paper | code

[1]Twin Contrastive Learning with Noisy Labels
paper | code


[5]PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
paper

[4]On the Stability-Plasticity Dilemma of Class-Incremental Learning
paper

[3]Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning
paper | code

[2]Class-Incremental Exemplar Compression for Class-Incremental Learning
paper

[1]Dense Network Expansion for Class Incremental Learning
paper


[4]Reinforcement Learning-Based Black-Box Model Inversion Attacks
paper

[3]Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
paper | code

[2]ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
paper

[1]EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
paper | code


[4]Meta-causal Learning for Single Domain Generalization
paper

[3]Meta Compositional Referring Expression Segmentation
paper

[2]Meta-Learning with a Geometry-Adaptive Preconditioner
paper | code

[1]A Meta-Learning Approach to Predicting Performance and Data Requirements
paper


[2]Efficient Map Sparsification Based on 2D and 3D Discretized Grids
paper

[1]PyPose: A Library for Robot Learning with Physics-based Optimization
paper | code


[29]Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
paper

[28]Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
paper

[27]SOOD: Towards Semi-Supervised Oriented Object Detection
paper | code

[26]Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
paper | code

[25]Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
paper | code

[24]Siamese DETR
paper

[23]HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions
paper

[22]Detecting Backdoors in Pre-trained Encoders
paper | code

[21]Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
paper

[20]Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
paper | code

[19]ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning
paper

[18]Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
paper

[17]Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
paper | code

[16]Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data
paper

[15]Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
paper

[14]Correlational Image Modeling for Self-Supervised Visual Pre-Training
paper

[13]Extracting Class Activation Maps from Non-Discriminative Features as well
paper | code

[12]TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
paper | code

[11]LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
paper

[10]MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
paper | code

[9]Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
paper

[8]Non-Contrastive Unsupervised Learning of Physiological Signals from Video
paper

[7]Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems
paper | code

[6]Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
paper

[5]The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
paper | code

[4]Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning
paper | code

[3]Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
paper

[2]Siamese Image Modeling for Self-Supervised Vision Representation Learning
paper | code

[1]Cut and Learn for Unsupervised Object Detection and Instance Segmentation
paper | project


[6]Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning
paper

[5]Are Data-driven Explanations Robust against Out-of-distribution Data?
paper

[4]IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
paper | code

[3]OCTET: Object-aware Counterfactual Explanations
paper | code

[2]Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis
paper

[1]SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
paper | code


[2]Density Map Distillation for Incremental Object Counting
paper

[1]Zero-shot Object Counting
paper


[4]The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
paper

[3]Make Landscape Flatter in Differentially Private Federated Learning
paper

[2]STDLens: Model Hijacking-resilient Federated Learning for Object Detection
paper | code

[1]Re-thinking Federated Active Learning based on Inter-class Diversity
paper | code


[1]BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision
paper


[74]Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
paper | code

[73]Knowledge Combination to Learn Rotated Detection Without Rotated Annotation
paper

[72]CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition
paper

[71]DC2: Dual-Camera Defocus Control by Learning to Refocus
paper

[70]Scalable, Detailed and Mask-Free Universal Photometric Stereo
paper | code

[69]DiffCollage: Parallel Generation of Large Content with Diffusion Models
paper

[68]Why is the winner the best?
paper

[67]UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
paper

[66]HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion
paper | code

[65]Neural Volumetric Memory for Visual Locomotion Control
paper

[64]DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
paper | code

[63]PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
paper

[62]Disentangling Writer and Character Styles for Handwriting Generation
paper | code

[61]Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
paper

[60]DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering
paper | code

[59]Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
paper | code

[58]Handwritten Text Generation from Visual Archetypes
paper | code

[57]Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces
paper

[56]FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network
paper

[55]ARO-Net: Learning Implicit Fields from Anchored Radial Observations
paper | code

[54]Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects
paper

[53]Robust Test-Time Adaptation in Dynamic Scenarios
paper

[52]LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction
paper

[51]Doubly Right Object Recognition: A Why Prompt for Visual Rationales
paper

[50]CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
paper

[49]Marching-Primitives: Shape Abstraction from Signed Distance Function
paper

[48]Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery
paper | code

[47]ActMAD: Activation Matching to Align Distributions for Test-Time-Training
paper | code

[46]Robust Mean Teacher for Continual and Gradual Test-Time Adaptation
paper | code

[45]Planning-oriented Autonomous Driving
paper | code

[44]Explicit Visual Prompting for Low-Level Structure Segmentations
paper | code

[43]Leapfrog Diffusion Model for Stochastic Trajectory Prediction
paper | code

[42]Feature Alignment and Uniformity for Test Time Adaptation
paper

[41]Attribute-preserving Face Dataset Anonymization via Latent Code Optimization
paper | code

[40]Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
paper | code

[39]Effective Ambiguity Attack Against Passport-based DNN Intellectual Property Protection Schemes through Fully Connected Layer Substitution
paper

[38]Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark
paper | code

[37]Learning a Depth Covariance Function
paper

[36]VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions
paper

[35]Dense Distinct Query for End-to-End Object Detection
paper | code

[34]Facial Affective Analysis based on MAE and Multi-modal Information for 5th ABAW Competition
paper

[33]Partial Network Cloning
paper | code

[32]Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
paper | code

[31]Adversarial Counterfactual Visual Explanations
paper | code

[30]A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
paper | code

[29]Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
paper | code

[28]Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
paper | code

[27]Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
paper | code

[26]Backdoor Defense via Deconfounded Representation Learning
paper | code

[25]Label Information Bottleneck for Label Enhancement
paper

[24]LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
paper | code

[23]Diversity-Aware Meta Visual Prompting
paper | code

[22]ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges
paper

[21]Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
paper

[20]UniHCP: A Unified Model for Human-Centric Perceptions
paper | code

[19]Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
paper

[18]Revisiting Rotation Averaging: Uncertainties and Robust Losses
paper | code

[17]3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
paper

[16]Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection
paper | code

[15]Understanding and Improving Visual Prompting: A Label-Mapping Perspective
paper | code

[14]vMAP: Vectorised Object Mapping for Neural Field SLAM
paper | code

[13]EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization
paper

[12]Upcycling Models under Domain and Category Shift
paper | code

[11]Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images
paper | code

[10]Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies
paper

[9]Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
paper | code

[8]Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
paper

[6]Physical-World Optical Adversarial Attacks on 3D Face Recognition
paper

[5]Improving Cross-Modal Retrieval with Set of Diverse Embeddings
paper

[4]Neural Video Compression with Diverse Contexts
paper | code

[3]Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
paper

[2]Single Image Backdoor Inversion via Robust Smoothed Classifiers
paper | code

[1]Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
paper | code



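1.CVPR2023 | Dispelling the misconception about the data-scaling ability of MIM (masked image modeling)!

2.CVPR 2023 | A new CLIP-based fine-tuning paradigm! New highs in both training speed and performance!

3.CVPR 2023 | Zhejiang University proposes PyramidFlow, a fully normalizing-flow model: a new paradigm for high-resolution defect anomaly localization

4.CVPR 2023 | Stable Diffusion reconstructs images from visual signals in the brain! "Human schemes and lies no longer exist"

5.CVPR 2023 | HKUST's DA-BEV: new SOTA in 3D object detection, a powerful method for mining depth information

6.CVPR 23 | Representation learning surpasses MAE: Google and others propose MAGE, with unsupervised image generation surpassing Latent Diffusion

7.CVPR2023 | Sorry, I'm speeding up! FasterNet: higher FLOPS is the real foundation for being faster and stronger

8.CVPR 2023 | Amid the popularity of large models, SN-Net offers a distinctive answer

9.CVPR 2023 | iTPNs: self-supervised learning combined with a feature pyramid structure

10.CVPR 2023 | SQR: explorations and reflections on training DETR-family object detectors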

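11.CVPR 2023 | New COCO record of 65.4 mAP! InternImage: injecting new mechanisms, extending DCNv3, exploring large vision models

12.CVPR 2023 | YOLOv7 accepted! After 6 years, the YOLOv series returns to CVPR!

13.CVPR 2023 | Google proposes Imagic: diffusion models can now edit photos using text alone!

14.CVPR 2023 | Lite DETR: 60% less computation! An efficient interleaved multi-scale encoder

15.CVPR 2023 | New work from Xiang Bai's team: scene text detection with the help of CLIP

16.CVPR'23 | Plug-and-play series! A lightweight, efficient self-attention mechanism helps image restoration networks reach SOTA

17.CVPR 2023 | NVIDIA proposes VoxFormer: new SOTA in monocular 3D semantic scene completion

18.CVPR 2023 | EMA-VFI: efficient video frame interpolation that extracts motion and appearance information via inter-frame attention

19.CVPR 2023 | Point-NN: the first 3D point cloud analysis with zero parameters and zero training

20.CVPR 2023 | Prophet: using a small model to guide a large language model in knowledge-based visual question answering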

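21.CVPR23 | CUHK and IDEA jointly release Human-Art, the first large-scale all-scenario human dataset

22.CVPR 2023 Highlight | PDPP: diffusion-model-based procedure planning in instructional videos

23.CVPR'23 | The first billion-parameter self-supervised video model! VideoMAE V2: a scalable pre-training paradigm for video foundation models

24.CVPR2023 | A multi-modal single-object tracking algorithm based on visual prompt tuning

25.CVPR 2023 | A new, diverse RGB-D object tracking dataset captured with consumer mobile devices

26.CVPR2023 deployment trick | Solving the quantization-error oscillation problem, letting MobileNetv2 outperform the ResNet family

27.CVPR 2023 | IGEV-Stereo & IGEV-MVS: new SOTA for stereo matching networks!

28.CVPR 2023 | UniMatch: revisiting weak-to-strong consistency in semi-supervised semantic segmentation

29.CVPR'23 | DepGraph: structured pruning for any architecture, applicable to CNNs, Transformers, GNNs, and more!

30.CVPR 2023 | Breaking the limitations of CAM! ToCo: further unlocking the potential of ViT in weakly supervised semantic segmentation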

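31.CVPR 2023 | Removing video flicker with one click: this study proposes a general framework

32.CVPR2023 | TriDet: an efficient temporal action detection network that sets new SOTA on three datasets!

33.CVPR'23 | A new method for 3D model segmentation! No manual annotation, only a single training run, and unlabeled categories can also be recognized

34.CVPR 2023 | Annotate 500 classes, detect 7,000! Tsinghua University and others propose UniDetector, a universal object detection algorithm

35.CVPR 2023 | A knowledge distillation method for semi-supervised object detection

36.CVPR 2023 | Semi-supervised underwater image restoration based on image quality assessment

37.CVPR 2023 | An improved MIM algorithm based on multi-level, multi-scale reconstruction tasks

38.CVPR 2023 | A new member of the neural network family! FasterNet: small, quick steps in pursuit of high speed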



极市 (Jishi) Livestream Replay No. 108 | Zizheng Pan: A New Paradigm for Model Deployment, Stitchable Neural Networks (CVPR 2023)



  • CVPR2023 Workshop