Teaching DINOv3 About Partial 3D Geometry: A Self-Supervised Geometry-Aware Approach

CVPR, 2026

Viktoria Ehm¹,², Dongliang Cao³,⁴, Riccardo Marin¹,², Daniel Scholz¹,², Weikang Wang³,⁴, Florian Bernard³,⁴, Daniel Cremers¹,²

¹TUM ²Munich Center for Machine Learning ³University of Bonn ⁴Lamarr Institute


GeoLoRA refines DINOv3 features via LoRA-based self-supervised learning to make them robust to partiality. The resulting descriptors substantially improve partial shape matching over both traditional descriptors and vision foundation model features and, when plugged into existing matching pipelines, achieve state-of-the-art results on partial shape matching and left–right prediction benchmarks.

Problem

Establishing correspondences between 3D shapes is a long-standing problem in computer vision and graphics. In practice, shapes are rarely observed in their entirety: 3D scans are typically incomplete, leading to partial-to-full and partial-to-partial matching settings. While 2D foundation features (e.g. DINOv2/DINOv3) provide strong priors, they are unaware of 3D geometry and struggle on partial 3D inputs, especially under geometric ambiguities such as the left–right symmetry of articulated shapes.

Partial shape matching. Given a partial observation of a 3D shape (left of each pair), the goal is to find dense correspondences to a full shape (Partial-to-Full) or to another partial observation (Partial-to-Partial).

Method

GeoLoRA pipeline. We refine DINOv3 with a lightweight LoRA module through self-supervised learning, producing feature descriptors that are robust to partiality.
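As a reminder of what a LoRA module does, here is a minimal NumPy sketch of a single linear layer with a low-rank update. It illustrates the general LoRA idea only: the page does not state which DINOv3 layers are adapted, nor the rank or scaling, so `LoRALinear`, `rank`, and `alpha` are hypothetical names and values chosen for illustration.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen, (out_dim, in_dim)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                    # trainable, zero at init
        self.scale = alpha / rank                             # illustrative scaling

    def __call__(self, x):
        # x: (n, in_dim) -> (n, out_dim); only A and B would receive gradients.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the number of trainable parameters is `rank * (in_dim + out_dim)` rather than `in_dim * out_dim`.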

Given a full 3D shape $X$, we generate a partial observation $Y$ and render both shapes from multiple randomly sampled viewpoints $c_i$ into images $I^i_X$ and $I^i_Y$. The rendered images are processed by a shared DINOv3 feature extractor augmented with a LoRA module, producing dense pixel-wise features $Q^i_X$ and $Q^i_Y$. These per-view features are backprojected under the known camera $c_i$ to obtain vertex-wise features $\mathbf{F}_X$ and $\mathbf{F}_Y$ on the underlying meshes. Training is driven by a contrastive objective that pulls features of corresponding vertices together and pushes non-corresponding ones apart.
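The backprojection step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a pinhole camera given as intrinsics `K` and extrinsics `R`, `t`, a nearest-pixel lookup, a precomputed per-view visibility mask, and plain averaging across views. The function names and the `views` dictionary layout are hypothetical.

```python
import numpy as np

def project(vertices, K, R, t):
    """Pinhole projection of (n, 3) world-space vertices to (n, 2) pixel coordinates."""
    cam = vertices @ R.T + t                    # world -> camera coordinates
    front = cam[:, 2] > 1e-8                    # vertices in front of the camera
    uv = np.full((len(vertices), 2), -1.0)      # sentinel that lies outside the image
    uvw = cam[front] @ K.T                      # homogeneous pixel coordinates
    uv[front] = uvw[:, :2] / uvw[:, 2:3]
    return uv

def backproject_features(vertices, views):
    """Average per-view pixel features onto mesh vertices.

    views: list of dicts with 'K', 'R', 't', 'feat' of shape (H, W, C),
    and 'visible', an (n,) boolean mask (assumed given, e.g. from a depth test).
    Returns (n, C) vertex features and an (n,) mask of vertices seen at least once.
    """
    n = len(vertices)
    acc = np.zeros((n, views[0]["feat"].shape[-1]))
    count = np.zeros(n)
    for v in views:
        H, W, _ = v["feat"].shape
        px = np.rint(project(vertices, v["K"], v["R"], v["t"])).astype(int)
        inside = (px[:, 0] >= 0) & (px[:, 0] < W) & (px[:, 1] >= 0) & (px[:, 1] < H)
        ok = inside & v["visible"]
        acc[ok] += v["feat"][px[ok, 1], px[ok, 0]]   # image indexed as [row, column]
        count[ok] += 1
    seen = count > 0
    acc[seen] /= count[seen, None]
    return acc, seen
```

Vertices that are never visible keep a zero feature and are flagged in the returned mask, so downstream code can ignore them.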

Geodesic-Aware Contrastive Loss

Geodesic-aware contrastive loss. Feature similarity is weighted by geodesic distance on the full shape, so that nearby points on the surface are encouraged to have similar features and distant points are pushed apart.

For a vertex on the partial shape $Y$ (the anchor) and its corresponding vertex on the full shape $X$, we pull the two feature vectors together. The features of all other vertices on $X$ are pushed away with a strength that depends on their geodesic distance to the corresponding vertex: vertices that are geodesically close to it are penalized less, while vertices that are geodesically far are penalized more. This geodesic weighting respects the intrinsic geometry of the shape and helps disambiguate geometrically symmetric regions (e.g. left vs. right limbs) that cannot be distinguished from local image evidence alone.
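One way to write such a loss is sketched below. The exact formulation is not given on this page, so this is an assumption: an InfoNCE-style objective in which each negative term is weighted by its normalized geodesic distance to the corresponding vertex. The function name, the temperature `tau`, and the linear weighting are illustrative choices.

```python
import numpy as np

def geodesic_contrastive_loss(F_Y, F_X, corr, geo_X, tau=0.07):
    """Geodesically weighted contrastive loss (illustrative form).

    F_Y: (m, C) features on the partial shape, F_X: (n, C) features on the full shape.
    corr: (m,) index of the corresponding vertex on X for each vertex of Y.
    geo_X: (n, n) geodesic distances on X (n >= 2 assumed).
    """
    F_Y = F_Y / np.linalg.norm(F_Y, axis=1, keepdims=True)
    F_X = F_X / np.linalg.norm(F_X, axis=1, keepdims=True)
    logits = F_Y @ F_X.T / tau                  # (m, n) scaled cosine similarities
    w = geo_X[corr] / geo_X.max()               # (m, n) weights in [0, 1], zero at the positive
    pos = logits[np.arange(len(corr)), corr]    # similarity to the corresponding vertex
    shift = logits.max(axis=1)                  # subtract the row maximum for stability
    neg = (w * np.exp(logits - shift[:, None])).sum(axis=1)
    denom = np.exp(pos - shift) + neg
    return float(np.mean(-(pos - shift) + np.log(denom)))
```

Under this weighting, a lookalike feature on a geodesically distant vertex (such as the opposite limb) costs far more than the same lookalike on a neighboring vertex, which matches the behavior described above.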

Results

Feature Quality

We visualize the quality of GeoLoRA features by transferring a color-coded signal from a full source shape (left) to a partial target shape (right) via our predicted correspondences.
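A color transfer of this kind can be sketched as below. The page does not say how correspondences are extracted from the features for this visualization, so the sketch assumes a simple nearest-neighbor search in feature space with cosine similarity. The coordinate-based coloring and both function names are illustrative.

```python
import numpy as np

def xyz_colors(vertices):
    """Color each vertex by its normalized (x, y, z) position, giving RGB in [0, 1]."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    return (vertices - lo) / np.maximum(hi - lo, 1e-12)

def transfer_colors(F_src, F_tgt, colors_src):
    """Give each target vertex the color of its most similar source vertex."""
    F_src = F_src / np.linalg.norm(F_src, axis=1, keepdims=True)
    F_tgt = F_tgt / np.linalg.norm(F_tgt, axis=1, keepdims=True)
    match = np.argmax(F_tgt @ F_src.T, axis=1)   # best source index per target vertex
    return colors_src[match], match
```

With good features, the transferred colors vary smoothly over the target and agree with the source coloring. Poor features show up as patchy or collapsed colors.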

Left-Right Prediction

Articulated shapes are often symmetric, so 2D foundation features alone cannot distinguish left from right. For each shape below, we show the predicted left/right segmentation of frozen DINOv3 (left) and GeoLoRA (right). Frozen DINOv3 frequently collapses both sides into one class; GeoLoRA recovers a clean left/right split.

Figure panels: DINOv3 (left), Ours (right).

Partial-to-Full Matching

GeoLoRA features can be plugged into existing partial-to-full shape matching pipelines. For each pair below, we show the full source shape on the left and the predicted correspondences on the partial target shape for two state-of-the-art methods, DPFM and ULRSSM, each evaluated with either frozen DINOv3 features or our GeoLoRA features. Red circles highlight regions where DINOv3 features lead to noticeable matching errors that GeoLoRA fixes.

Figure: partial-to-full matching results for two example pairs. Panel labels: Source, DINOv3, GeoLoRA (Ours). Methods: DPFM, ULRSSM.

Partial-to-Partial Matching

GeoLoRA also improves partial-to-partial shape matching, where both shapes are only partially observed. For each pair below, we show the source partial shape on the left and the predicted correspondences on the target partial shape for two state-of-the-art methods, DPFM and EchoMatch, each evaluated with either frozen DINOv3 features or our GeoLoRA features. Vertices shown in red mark the non-overlapping region (vertices of the target that have no correspondence in the source). Black circles highlight regions where DINOv3 features lead to matching errors that GeoLoRA fixes.

Figure: partial-to-partial matching results for two example pairs. Panel labels: Source, DINOv3, GeoLoRA (Ours). Methods: DPFM, EchoMatch.

Real-World Scans

We also test GeoLoRA on noisy real-world 3D scans. Each scan (right) is matched against a clean full-shape template (left) using either frozen DINOv3 features or our GeoLoRA features. DINOv3 features collapse to a single dominant color across most of the scan, which indicates poor correspondences, whereas GeoLoRA recovers detailed, semantically meaningful matches.