Sketch2Arti: Sketch-based Articulation Modeling of CAD Objects

Yi Yang1,2, Hao Pan2, Yijing Cui2, Alla Sheffer3, Changjian Li1
1University of Edinburgh, 2Tsinghua University, 3University of British Columbia
ACM Transactions on Graphics (Proc. SIGGRAPH 2026) · Conditionally Accepted
Figure 1 teaser for Sketch2Arti.

Figure 1: Sketch-based articulation modeling. We present Sketch2Arti, the first sketch-based system for articulation modeling of CAD objects. Sketch2Arti is versatile. Top: through iterative sketch-based editing, Sketch2Arti progressively discovers multiple movable parts and recovers their motion parameters on a complex car model. Middle-left: Sketch2Arti offers high controllability---e.g., a car door can be opened in different user-specified ways (standard, backward-hinged, or butterfly) simply by changing the sketches. Middle-right: Sketch2Arti generalizes to diverse objects beyond public datasets, enabling articulation modeling for unseen categories. Bottom: for shell models lacking internal structures, users can sketch the missing components, and Sketch2Arti generates plausible internal mechanisms that respect both the existing geometry and the predicted articulation parameters.

Abstract

Articulation modeling aims to infer the movable parts of a 3D object and their motion parameters, enabling interactive animation, simulation, and shape editing. Prior work studies the problem from several angles---including articulation perception, articulated object reconstruction, and generative modeling. These methods typically take images, videos, or point clouds as input and estimate articulation by leveraging priors learned from the public PartNet-Mobility dataset or its variants. Despite promising progress, existing methods often provide limited controllability, generalize poorly beyond the training distribution, and rely heavily on dataset-specific priors, making them difficult to deploy in real design workflows.

In this paper, we present Sketch2Arti, the first sketch-based articulation modeling system for CAD objects. Our key observation is that designers naturally communicate articulation intent through lightweight sketches (e.g., arrows and strokes) that indicate how parts should move, yet translating such sketches into articulated 3D models remains largely manual. Sketch2Arti bridges this gap by enabling users to specify articulation through simple 2D sketches drawn from a chosen viewpoint. Given a CAD model and user sketches, our approach automatically discovers the corresponding movable parts and predicts their motion parameters, allowing iterative modeling of multiple articulations on complex objects with fine-grained control. Importantly, Sketch2Arti is trained in a category-agnostic manner without requiring object category information, leading to strong generalization to diverse objects beyond existing articulation datasets. Moreover, for shell models lacking interior structures, Sketch2Arti supports controllable internal completion guided by user sketches, generating plausible internal components consistent with the existing geometry and predicted motion constraints. Comprehensive experiments and user evaluations demonstrate the effectiveness, controllability, and generalization of Sketch2Arti.

Overview

Figure 5 overview figure.

Figure 5: Overview. (a) Given an input 3D shape and the user sketches, our method Sketch2Arti addresses the where and how challenges by (b) identifying movable parts (i.e., the two doors) and inferring their articulation parameters. (c) The predicted motion reveals missing internal structure (e.g., an empty drawer), which users can further specify via sketches. Sketch2Arti then tackles the what challenge by (d) generating the full drawer geometry while adhering to both the existing shape and the inferred articulation.

Method

Figure 6 articulation prediction pipeline.

Figure 6: Articulation prediction. Given a static 3D object, we apply category-agnostic articulation recognition on a localized region surrounding the sketch, with local context captured by depth and normal maps. A trained U-Net module predicts the articulation parameters as 2D maps in local camera coordinates, as well as the motion type. The 2D part mask is then back-projected onto the object surface and used to filter a hierarchy of segments produced by the PartField foundation model, selecting the best-matching segment---at a granularity level not fixed beforehand---as the movable 3D component.
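The segment-selection step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the back-projected 2D mask and each candidate PartField segment are represented as boolean indicators over the same set of surface points, and it scores candidates by intersection-over-union (the exact matching criterion is an assumption).

```python
import numpy as np

def select_movable_part(surface_mask, segment_hierarchy):
    """Pick the candidate segment that best matches the back-projected
    sketch-region mask, scored by IoU over surface points.

    surface_mask: bool array over N surface points (True = covered by
                  the back-projected 2D part mask).
    segment_hierarchy: candidate segments from all granularity levels,
                  each a bool array over the same N points.
    Returns the best segment and its IoU score.
    """
    best_seg, best_iou = None, -1.0
    for seg in segment_hierarchy:
        inter = np.logical_and(surface_mask, seg).sum()
        union = np.logical_or(surface_mask, seg).sum()
        iou = inter / union if union > 0 else 0.0
        if iou > best_iou:
            best_seg, best_iou = seg, iou
    return best_seg, best_iou
```

Because the hierarchy is searched flat across all levels, the granularity of the chosen part falls out of the matching score rather than being fixed in advance, mirroring the behavior described in the caption.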

Figure 7 interior completion pipeline.

Figure 7: Interior shape completion. Our approach leverages 2D and 3D generative models to complete the interior structures exposed by articulated parts. Given a 3D object with a recognized articulated part and its motion parameters, the top branch applies a 2D generative model (e.g., Nano Banana) to obtain a high-quality reference image, which guides a 3D generative model (e.g., Trellis) in creating the interior structure. Crucially, to obtain structure-preserving interiors, we build loose and strict masks that respectively control the flow-based generation process of the 3D model and adjust the completed part interior. Finally, the completed part is refined for kinematic validity and split into separate meshes that are readily usable as URDF models.
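The final export step can be illustrated with a small sketch that turns a predicted articulation into a minimal URDF: one static base link, one movable part, and a single joint carrying the inferred type, axis, and pivot. The function name, link naming, and default joint limits are hypothetical illustrations, not the system's actual exporter.

```python
import xml.etree.ElementTree as ET

def articulation_to_urdf(part_name, motion_type, axis, origin,
                         lower=0.0, upper=1.57):
    """Build a minimal URDF string for one articulated part.

    motion_type: 'revolute' (hinge) or 'prismatic' (slider).
    axis, origin: 3-vectors for the joint axis and pivot position,
                  expressed in the object frame.
    """
    robot = ET.Element("robot", name="articulated_object")
    ET.SubElement(robot, "link", name="base")        # static geometry
    ET.SubElement(robot, "link", name=part_name)     # movable part mesh
    joint = ET.SubElement(robot, "joint",
                          name=part_name + "_joint", type=motion_type)
    ET.SubElement(joint, "parent", link="base")
    ET.SubElement(joint, "child", link=part_name)
    ET.SubElement(joint, "origin", xyz=" ".join(map(str, origin)))
    ET.SubElement(joint, "axis", xyz=" ".join(map(str, axis)))
    ET.SubElement(joint, "limit", lower=str(lower), upper=str(upper),
                  effort="1.0", velocity="1.0")
    return ET.tostring(robot, encoding="unicode")
```

In practice each link would also reference the separated part meshes produced by the completion stage; the sketch keeps only the kinematic skeleton to show how the predicted parameters map onto URDF joint fields.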

Dataset

Figure 8 dataset gallery. Figure 8 dataset statistics.

Figure 8: Dataset gallery and statistics. Left: Representative samples from SketchMobility. Note the presence of uncommon articulated objects (e.g., helicopters and motorbikes), which are rarely considered in existing articulation modeling benchmarks. Right: Category distribution of SketchMobility. We report major categories (≥1.5%) individually, while merging minor categories into Others (17.9%).

Results Gallery

Figure 10 results gallery.

Figure 10: Results gallery. We show representative articulation modeling sessions using Sketch2Arti. For each example, user sketches are overlaid on the rendered shape under the chosen viewpoint, and the inferred movable parts are color-coded. The black arrow indicates the iterative modeling order across views/parts.