Publications

There are many highly respectable motives which may lead men to prosecute research, but three which are much more important than the rest: intellectual curiosity, professional pride, and finally, ambition, desire for reputation, and the position, even the power or the money, which it brings ... if (anyone) were to tell me that the driving force in his work had been the desire to benefit humanity, then I should not believe him (nor should I think the better of him if I did). - G. H. Hardy (A Mathematician's Apology)

Selected publications by year (by category)

2026 and arXiv | 2025 | 2024 | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 and before

H-index: 72 | Google citation | DBLP | CS Rankings

2026 and arXiv

	3. Aditya Vora, Sauradip Nag, Kai Wang, Hao Zhang, "Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization", PROVISIONALLY accepted to ACM Trans. on Graphics (TOG) , 2026. [arXiv \| bibtex] We present ATOP (Articulate That Object Part), a novel few-shot method based on motion personalization to articulate a static 3D object with respect to a part and its motion as prescribed in a text prompt. In our work, the text input allows us to tap into the power of modern-day diffusion models to generate plausible motion samples for the right object category and part. In turn, the input 3D object provides image prompting to personalize the generated video to that very object we wish to articulate. Our method starts with a few-shot finetuning for category-specific motion generation, a key first step to compensate for the lack of articulation awareness by current diffusion models. This is followed by motion video personalization that is realized by multi-view rendered images of the target 3D object. At last, we transfer the personalized video motion to the target 3D object via differentiable rendering to optimize part motion parameters by an SDS loss.
	2. Yizhi Wang, Mingrui Zhao (co-first author), and Hao Zhang, "ACT-R: Adaptive Camera Trajectories for Single View 3D Reconstruction", 3D Vision (3DV), 2026. [Project page \| arXiv \| Code \| bibtex] We introduce adaptive view planning to multi-view synthesis, aiming to improve both occlusion revelation and 3D consistency for single-view 3D reconstruction. Instead of generating an unordered set of views independently or simultaneously, we generate a sequence of views, leveraging temporal consistency to enhance 3D coherence. Most importantly, our view sequence is not determined by a pre-determined camera setup. Instead, we compute an adaptive camera trajectory (ACT), specifically, an orbit of camera views, which maximizes the visibility of occluded regions of the 3D object to be reconstructed. Once the best orbit is found, we feed it to a video diffusion model to generate novel views around the orbit ...
	1. Aditya Vora, Lily Goli, Andrea Tagliasacchi, and Hao Zhang, "HiT: Hierarchical Transformers for Unsupervised 3D Shape Abstraction", 3D Vision (3DV), 2026. [Project page \| arXiv \| bibtex] We introduce HIT, a novel hierarchical neural field representation for 3D shapes that learns general hierarchies in a coarse-to-fine manner across different shape categories in an unsupervised setting. Our key contribution is a hierarchical transformer (HIT), where each level learns parent–child relationships of the tree hierarchy using a compressed code-book. This codebook enables the network to automatically identify common substructures across potentially diverse shape categories. Unlike previous works that constrain the task to a fixed hierarchical structure (e.g., binary), we impose no such restriction ...

2025

	9. Maham Tanveer, Yang Zhou, Simon Nicklaus, Ali Mahdavi-Amiri, Hao Zhang, Krishna Kumara Singh, and Nanxuan Zhao, "MultiCOIN: Multi-Modal COntrollable Video INbetweening", preprint, 2025. [Project page \| arXiv \| bibtex] We introduce MultiCOIN, a video inbetweening framework that allows multi-modal controls, including depth transition and layering, motion trajectories, text prompts, and target regions for movement localization, while achieving a balance between flexibility, ease of use, and precision for fine-grained video interpolation. To achieve this, we adopt the Diffusion Transformer (DiT) architecture as our video generative model, due to its proven capability to generate high-quality long videos. To ensure compatibility between DiT and our multi-modal controls, we map all motion controls into a common sparse and user-friendly point-based representation as the video/noise input. Further, we separate content controls and motion controls into two branches to encode the required features ...
	8. Sauradip Nag, Daniel Cohen-Or, Hao Zhang, and Ali Mahdavi-Amiri, "In-2-4D: Inbetweening from Two Single-View Images to 4D Generation", ACM SIGGRAPH Asia, 2025. [Project page \| arXiv \| Youtube video \| bibtex] We pose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening to interpolate two single-view images. In contrast to video/4D generation from only text or a single image, our interpolative task can leverage more precise motion control to better constrain the generation. Given two monocular RGB images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D, without making assumptions on the object category, motion type, length, or complexity. To handle such arbitrary and diverse motions, we utilize a foundational video interpolation model for motion prediction and employ a hierarchical approach through keyframes to address large frame-to-frame motion gaps can lead to ambiguous interpretations ...
	7. Qimin Chen, Yuezhi Yang, Yifan Wang, Vladimir Kim, Siddhartha Chaudhuri, Hao Zhang, and Zhiqin Chen, "ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction", ACM SIGGRAPH Asia, 2025. [Project page \| arXiv \| bibtex] We introduce a 3D detailizer, a neural model which can instantaneously (in <1s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes, with varied structures, whose local details all share a consistent style and appearance.
	6. Sai Raj Kishore Perla, Aditya Vora, Sauradip Nag, Ali Mahdavi-Amiri, and Hao Zhang, "ASIA: Adaptive 3D Segmentation using Few Image Annotations", ACM SIGGRAPH Asia, 2025. [Project page \| arXiv \| bibtex] We introduce ASIA (Adaptive 3D Segmentation using few Image Annotations), a novel framework that enables segmentation of possibly non-semantic and non-text describable "parts" in 3D. Our segmentation is controllable through a few user-annotated in-the-wild images, which are easier to collect than multi-view images, less demanding to annotate than 3D models, and more precise than potentially ambiguous text descriptions. Our method leverages the rich priors of text-to-image diffusion models, such as Stable Diffusion, to transfer segmentations from image space to 3D, even when the annotated and target objects differ significantly in geometry or structure. During training, we optimize a text token for each segment and fine-tune our model with a novel cross-view part correspondence loss.
	5. Ruiqi Wang and Hao Zhang, "RESAnything: Attribute Prompting for Arbitrary Referring Segmentation", NeurIPS, 2025. [arXiv \| Project page \| bibtex] We present an open-vocabulary and zero-shot method for arbitrary referring expression segmentation (RES), targeting more general input expressions than those handled by prior works. Specifically, our inputs encompass both object- and part-level labels as well as implicit references pointing to properties or qualities of object/part function, design, style, material, etc. Our model, coined RESAnything, leverages Chain-of-Thoughts (CoT) reasoning, where the key idea is attribute prompting. We generate detailed descriptions of object/part attributes including shape, color, and location for potential segment proposals through systematic prompting of a large language model (LLM), where the proposals are produced by a foundational image segmentation model.
	4. Yilin Liu, Duoteng Xu, Xingyao Yu, Xiang Xu, Daniel Cohen-Or, Hao Zhang, and Hui Huang, "HoLa: B-Rep Generation using a Holistic Latent Representation", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2025. [arXiv \| Project page \| bibtex] We introduce a novel representation for learning and generating Computer-Aided Design (CAD) models in the form of boundary representations (B-Reps). Our representation unifies the continuous geometric properties of B-Rep primitives in different orders (e.g., surfaces and curves) and their discrete topological relations in a holistic latent (HoLa) space. This is based on the simple observation that the topological connection between two surfaces is intrinsically tied to the geometry of their intersecting curve. Such a prior allows us to reformulate topology learning in B-Reps as a geometric reconstruction problem in Euclidean space. Specifically, we eliminate the presence of curves, vertices, and all the topological connections in the latent space by learning to distinguish and derive curve geometries from a pair of surface primitives via a neural intersection network ...
	3. Changhao Li, Yu Xin, Xiaowei Zhou, Ariel Shamir, Hao Zhang, Ligang Liu, and Ruizhen Hu, "MASH: Masked Anchored SpHerical Distances for 3D Shape Representation and Generation", ACM SIGGRAPH, 2025. [Project page \| arXiv \| bibtex] We introduce Masked Anchored SpHerical Distances (MASH), a novel multi-view and parametrized representation of 3D shapes. Inspired by multi-view geometry and motivated by the importance of perceptual shape understanding for learning 3D shapes, MASH represents a 3D shape as a collection of observable local surface patches, each defined by a spherical distance function emanating from an anchor point. We further leverage the compactness of spherical harmonics to encode the MASH functions, combined with a generalized view cone with a parameterized base that masks the spatial extent of the spherical function to attain locality. We develop a differentiable optimization algorithm capable of converting any point cloud into a MASH representation accurately approximating ground-truth surfaces with arbitrary geometry and topology ...
	2. Qirui Huang, Runze Zhang, Kangjun Liu, Minglun Gong, Hao Zhang, and Hui Huang, "ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points", CVPR (highlight), 2025. [arXiv \| Project \| bibtex] We introduce ArcPro, a novel learning framework built on architectural programs to recover structured 3D abstractions from highly sparse and low-quality point clouds. Specifically, we design a domain-specific language (DSL) to hierarchically represent building structures as a program, which can be efficiently converted into a mesh. We bridge feedforward and inverse procedural modeling by using a feedforward process for training data synthesis, allowing the network to make reverse predictions. We train an encoder-decoder on the points-program pairs to establish a mapping from unstructured point clouds to architectural programs, where a 3D convolutional encoder extracts point cloud features and a transformer decoder autoregressively predicts the programs in a tokenized form ...
	1. Dingdong Yang, Yizhi Wang, Konrad Schindler, Ali Mahdavi-Amiri, and Hao Zhang, "GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation", ICLR, 2025. [Project page \| arXiv \| bibtex] We propose GALA, a novel representation of 3D shapes that (i) excels at capturing and reproducing complex geometry and surface details, (ii) is computationally efficient, and (iii) lends itself to 3D generative modelling with modern, diffusion-based schemes. The key idea of GALA is to exploit both the global sparsity of surfaces within a 3D volume and their local surface properties ...

2024

	12. Liqiang Lin, Wenpeng Wu, Chi-Wing Fu, Hao Zhang, and Hui Huang, "CRAYM: Neural Field Optimization via Camera RAY Matching", NeurIPS, 2024. [Project page \| arXiv \| bibtex] We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesis (NVS) and 3D geometry reconstruction. One key reason for matching camera rays, instead of pixels as in prior works, is that the camera rays can be parameterized by the feature volume to carry both geometric and photometric information. Multi-view consistencies involving the camera rays and scene rendering can be naturally integrated into the joint optimization and network training, to impose physically meaningful constraints to improve the final quality of both the geometric reconstruction and photorealistic rendering. We demonstrate the effectiveness of CRAYM for both NVS and geometry reconstruction, over dense- or sparse-view settings, with qualitative and quantitative comparisons to state-of-the-art alternatives.
	11. Fenggen Yu, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, and Hao Zhang, "DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly", ECCV, 2024. [arXiv \| bibtex] We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object. By leveraging differentiable volume rendering, our method does not require 3D supervision. Architecturally, our network follows the general pipeline of an image-conditioned neural radiance field (NeRF) exemplified by pixelNeRF for color prediction. As our core contribution, we introduce differen- tial primitive assembly (DPA) into NeRF to output a 3D occupancy field in place of density prediction, where the predicted occupancies serve as opacity values for volume rendering. Our network, coined DPA-Net, produces a union of convexes ...
	10. Ruiqi Wang, Akshay Gadi Patil, Fenggen Yu, and Hao Zhang, "Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images", ECCV, 2024. [Project page \| arXiv \| bibtex] We introduce the first active learning (AL) model for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes. Specifically, our goal is to obtain fully validated segmentation results by humans while minimizing manual effort. To this end, we employ a transformer that utilizes a masked-attention mechanism to supervise the active segmentation. To enhance the network tailored to moveable parts, we introduce a coarse-to-fine AL approach which first uses an object-aware masked attention and then a pose-aware one, leveraging the hierarchical nature of the problem and a correlation between moveable parts and object poses and interaction directions.
	9. Qimin Chen, Zhiqin Chen, Vladimir Kim, Noam Aigerman, Hao Zhang, and Siddhartha Chaudhuri, "DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement", ECCV, 2024. [arXiv \| bibtex] We present a 3D modeling method which enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box extrusion tool or via generative modeling), a user can directly "paint" desired target styles representing compelling geometric details, from input exemplar shapes, over different regions of the coarse shape. These regions are then up-sampled into high-resolution geometries which adhere with the painted styles.
	8. Sai Raj Kishore Perla, Yizhi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "EASI-Tex: Edge-Aware Mesh Texturing from Single Image", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2024. [arXiv \| bibtex] We introduce a novel approach for single-image mesh texturing, which employs diffusion models with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to rectify the discrepancies by respecting both shape semantics and edge features in the inputs to produce clean and sharp mesh texturization. Leveraging a pre-trained Stable Diffusion generator, our method is capable of transferring textures in the absence of a direct guide from the single-view image.
	7. Zhiqin Chen, Qimin Chen, Hang Zhou, and Hao Zhang, "DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation", ACM SIGGRAPH, 2024. [arXiv \| bibtex] We present an unsupervised 3D shape co-segmentation method which learns a set of deformable part templates from a shape collection. To accommodate structural variations in the collection, our network composes each shape by a selected subset of template parts which are affine-transformed. To maximize the expressive power of the part templates, we introduce a per-part deformation network to enable the modeling of diverse parts with substantial geometry variations, while imposing constraints on the deformation capacity to ensure fidelity to the originally represented parts.
	6. Yilin Liu, Jialei Chen, Shanshan Pan, Daniel Cohen-Or, Hao Zhang, and Hui Huang, "Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2024. [arXiv \| bibtex] We introduce a novel method for acquiring boundary representations (B-Reps) of 3D CAD models which involves a two-step process: it first applies a spatial partitioning, referred to as the "split", followed by a "fit" operation to derive a single primitive within each partition. Specifically, our partitioning aims to produce the classical Voronoi diagram of the set of ground-truth (GT) B-Rep primitives.
	5. Jingyu Hu, Kai-Hei Hui, Zhengzhe liu, Hao Zhang, and Chi-Wing Fu, "CNS-Edit: 3D Shape Editing via Coupled Neural Shape Optimization", ACM SIGGRAPH, 2024. [arXiv \| bibtex] We introduce a new approach based on a coupled representation and a neural volume optimization to implicitly perform 3D shape editing in latent space. This work has three innovations. First, we design the coupled neural shape (CNS) representation for supporting 3D shape editing. This representation includes a latent code, which captures high-level global semantics of the shape, and a 3D neural feature volume, which provides a spatial context to associate with the local shape changes given by the editing. Second, ...
	4. Yizhi Wang, Wallace Lira, Wenqi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction", CVPR, 2024. [Project page \| arXiv \| bibtex] We introduce multi-slice reasoning, a new notion for single-view 3D reconstruction which challenges the current and prevailing belief that multi-view synthesis is the most natural conduit between single-view and 3D. Our key observation is that object slicing is more advantageous than altering views to reveal occluded structures. Specifically, slicing is more occlusion-revealing since it can peel through any occluders without obstruction. In the limit, i.e., with infinitely many slices, it is guaranteed to unveil all hidden object parts.
	3. Dingdong Yang, Yizhi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "BRICS: Bi-level feature Representation of Image CollectionS", preprint, 2023. [Project page \| arXiv \| bibtex] We present BRICS, a bi-level feature representation for image collections, which consists of a key code space on top of a feature grid space. Specifically, our representation is learned by an autoencoder to encode images into continuous key codes, which are used to retrieve features from groups of multi-resolution feature grids. Our key codes and feature grids are jointly trained continuously with well-defined gradient flows, leading to high usage rates of the feature grids and improved generative modeling compared to discrete Vector Quantization (VQ). Differently from existing continuous representations such as KL-regularized latent codes, our key codes are strictly bounded in scale and variance.
	2. Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, and Hao Zhang, "RoSI: Recovering 3D Shape Interiors from Few Articulation Images", preprint, 2023. [arXiv \| bibtex] The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only few views, our method infers the interior planes that are observable in the input images.
	1. Zeyu Huang, Sisi Dai, Kai Xu, Hao Zhang, Hui Huang, and Ruizhen Hu, "DINA: Deformable INteraction Analogy", Graphical Models, selected paper from Computational Visual Media (CVM), 2024. [arXiv \| bibtex] We introduce deformable interaction analogy (DINA) as a means to generate close interactions between two 3D objects. Given a single demo interaction between an anchor object (e.g., a hand) and a source object (e.g., a mug grasped by the hand), our goal is to generate many analogous 3D interactions between the same anchor object and various new target objects (e.g. a toy airplane), where the anchor object is allowed to be rigid or deformable.

2023

	15. Aditya Vora, Akshay Gadi Patil, and Hao Zhang, "DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization", NeurIPS, 2023. [arXiv \| bibtex] We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed and leaving significant gaps between the sparse views, by learning a set of neural templates that act as surface priors. Our method, coined DiViNet, operates in two stages. The first stage learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help “stitch” the surfaces over sparse regions.
	14. Fenggen Yu, Qimin Chen, Maham Tanveer, Ali Mahdavi-Amiri, and Hao Zhang, "D²CSG: Unsupervised Learning of Compact CSG Trees with Dual Complements and Dropouts", NeurIPS, 2023. [arXiv \| bibtex] We present D²CSG, a neural model composed of two dual and complementary network branches, with dropouts, for unsupervised learning of compact constructive solid geometry (CSG) representations of 3D CAD shapes. Our network is trained to reconstruct a 3D shape by a fixed-order assembly of quadric primitives, with both branches producing a union of primitive intersections or inverses. A key difference between D2CSG and all prior neural CSG models is its dedicated residual branch to assemble the potentially complex shape complement, which is subtracted from an overall shape modeled by the cover branch. With the shape complements, our network is provably general, while the weight dropout further improves compactness of the CSG tree by removing redundant primitives.
	13. Qimin Chen, Zhiqin Chen, Hang Zhou, and Hao Zhang, "ShaDDR: Real-Time Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering", ACM SIGGRAPH Asia, 2023. [arXiv \| bibtex] We present ShaDDR, an example-based deep generative neural network which produces a high-resolution textured 3D shape through geometry detailization and conditional texture generation applied to an input coarse voxel shape. Trained on a small set of detailed and textured exemplar shapes, our method learns to detailize the geometry via multi-resolution voxel upsampling and generate textures on voxel surfaces via differentiable rendering against exemplar texture images from a few views. The generation is realtime, taking less than 1 second to produce a 3D model with voxel resolutions up to 512^3. The generated shape preserves the overall structure of the input coarse voxel model, while the style of the generated geometric details and textures can be manipulated through learned latent codes.
	12. Jingyu Hu, Kai-Hei Hui, Zhengzhe liu, Hao Zhang, and Chi-Wing Fu, "CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration", ACM SIGGRAPH Asia, 2023. [arXiv \| bibtex] This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space. Many recent methods have been developed to encode 3D shapes into a learned latent shape space to enable generative design and modeling. Yet, existing methods lack effective exploration mechanisms, despite the rich information. To this end, we propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration. Our idea is threefold. First, we couple the CLIP and shape spaces by generating paired CLIP and shape codes through sketch images and training a mapper network to connect the two spaces. Second, to explore the space around a given shape, we formulate a co-optimization strategy to search for the CLIP code that better matches the geometry of the shape. Third, we design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape.
	11. Zihao Yan, Fubao Su, Mingyang Wang, Ruizhen Hu, Hao Zhang, and Hui Huang, "Interaction-Driven Active 3D Reconstruction with Object Interiors", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), 2023. [Project page \| bibtex] We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and interior geometries of a target 3D object. Unlike other works in active vision which focus on optimizing camera viewpoints to better investigate the environment, the primary feature of our reconstruction is an analysis of the interactability of various parts of the target object and the ensuing part manipulation by a robot to enable scanning of occluded regions. As a result, an understanding of part articulations of the target object is obtained on top of complete geometry acquisition. Our method operates fully automatically by a Fetch robot with built-in RGBD sensors. It iterates between interaction analysis and interaction-driven reconstruction, scanning and reconstructing detected moveable parts one at a time, where both the articulated part detection and mesh reconstruction are carried out by neural networks.
	10. Juzhan Xu, Minglun Gong, Hao Zhang, Hui Huang, and Ruizhen Hu, "Neural Packing: from Visual Sensing to Reinforcement Learning", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 42, No, 6, 2023. [Project page \| bibtex] We present a complete learning framework to solve the real-world transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline from partial observations of input objects via RGBD sensing and recognition to final box placement, via robotic motion planning, to arrive at a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforcement learning (RL), to solve the NP-hard combinatorial optimization problem. Our network simultaneously selects an object to pack and determines the final packing location, based on a judicious encoding of the continuously evolving states of partially observed source objects and available spaces in the target container, using separate encoders both enabled with attention mechanisms.
	9. Fenggen Yu, Yiming Qian, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, and Hao Zhang, "HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling", ICCV, 2023. [arXiv \| bibtex] We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the small and intricate parts. For the same reason, the necessary data annotation effort is tremendous, motivating approaches to minimize human involvement. Our labeling tool iteratively verifies or modifies part labels predicted by a deep neural network, with human feedback continually improving the network prediction. To effectively reduce human efforts, we develop two novel features in our tool, hierarchical and symmetry-aware active labeling. Our human-in-the-loop approach, coined HAL3D, achieves 100% accuracy (barring human errors) on any test set with pre-defined hierarchical part labels, with 80% time-saving over manual effort.
	8. Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion", ICCV, 2023. [Project page \| arXiv \| bibtex] We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable. To address an assortment of challenges with our task at hand including conflicting goals (artistic stylization vs. legibility), lack of ground truth, and immense search space, our approach utilizes large language models to bridge texts and visual images for stylization and build an unsupervised generative model with a diffusion model backbone. Specifically, we employ the denoising generator in Latent Diffusion Model (LDM), with the key addition of a CNN-based discriminator to adapt the input style onto the input text. The discriminator uses rasterized images of a given letter/word font as real samples and output of the denoising generator as fake samples ...
	7. Yizhi Wang, Zeyu Huang, Ariel Shamir, Hui Huang, Hao Zhang, and Ruizhen Hu, "ARO-Net: Learning Implicit Fields from Anchored Radial Observations", CVPR, 2023. [arXiv \| bibtex] We introduce anchored radial observations (ARO), a novel shape encoding for learning neural field representation of shapes that is category-agnostic and generalizable amid significant shape variations. The main idea behind our work is to reason about shapes through partial observations from a set of viewpoints, called anchors. We develop a general and unified shape representation by employing a fixed set of anchors, via Fibonacci sampling, and designing a coordinate-based deep neural network to predict the occupancy value of a query point in space. Differently from prior neural implicit models, that use global shape feature, our shape encoder operates on contextual, query-specific features ...
	6. Akshay Gadi Patil, Supriya Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, and Hao Zhang, "Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes", Computer Graphics Forum (State-of-the-Art Report), 2023. [arXiv \| bibtex] This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, various indoor scene datasets available for research in the aforementioned areas, and discuss notable works employing machine learning models for such scene modeling tasks based on these representations. Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With respect to analysis, we focus on four basic scene understanding tasks – 3D object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene similarity. And for synthesis, we mainly discuss neural scene synthesis works, though also highlighting model-driven methods that allow for human-centric, progressive scene synthesis ...
	5. Zeyu Huang, Juzhan Xu, Sisi Dai, Kai Xu, Hao Zhang, Hui Huang, and Ruizhen Hu, "NIFT: Neural Interaction Field and Template for Object Manipulation", International Conference on Robotics and Automation (ICRA), 2023. [arXiv \| bibtex] We introduce NIFT, Neural Interaction Field and Template, a descriptive and robust interaction representation of object manipulations to facilitate imitation learning. Given a few object manipulation demos, NIFT guides the generation of the interaction imitation for a new object instance by matching the Neural Interaction Template (NIT) extracted from the demos to the Neural Interaction Field (NIF) defined for the new object. Specifically, the NIF is a neural field which encodes the relationship between each spatial point and a given object, where the relative position is defined by a spherical distance function rather than occupancies or signed distances, which are commonly adopted by conventional neural fields but less informative ...
	4. Hang Zhou, Rui Ma, Lingxiao Zhang, Lin Gao, Ali Mahdavi-Amiri, and Hao Zhang, "SAC-GAN: Structure-Aware Image Composition", IEEE Trans. on Visualization and Computer Graphics (TVCG), 2023. [arXiv \| bibtex] We introduce an end-to-end learning framework for image-to-image composition, aiming to seamlessly compose an object represented as a cropped patch from an object image into a background scene image. As our approach emphasizes more on semantic and structural coherence of the composed images, rather than their pixel-level RGB accuracies, we tailor the input and output of our network with structure-aware features and design our network losses accordingly, with ground truth established in a self-supervised setting through the object cropping. Specifically, our network takes the semantic layout features from the input scene image, features encoded from the edges and silhouette in the input object patch, as well as a latent code as inputs, and generates a 2D spatial affine transform defining the translation and scaling of the object patch.
	3. Liqiang Lin, Pengdi Huang, Chi-Wing Fu, Kai Xu, Hao Zhang, and Hui Huang, "One Point is All You Need: Directional Attention Point for Feature Learning", Science China Information Sciences (SCIS), Vol. 66, No. 1, 2023. [PDF \| arXiv \| bibtex] We present a novel attention-based mechanism for learning enhanced point features for tasks such as point cloud classification and segmentation. Our key message is that if the right attention point is selected, then “one point is all you need” — not a sequence as in a recurrent model and not a pre-selected set as in all prior works. Also, where the attention point is should be learned, from data and specific to the task at hand. Our mechanism is characterized by a new and simple convolution, which combines the feature at an input point with the feature at its associated attention point. We call such a point adirectional attention point (DAP) ...
	2. Zhiqin Chen, Andrea Tagliasacchi, and Hao Zhang, "Learning Mesh Representations via Binary Space Partitioning Tree Networks", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) (invited and extended article from CVPR 2020 as Best Student Paper Award winner), Vol. 45, No. 4, pp. 4870-4881, 2023. [arXiv \| Project page (code+video) \| bibtex] Polygonal meshes are ubiquitous, but have only played a relatively minor role in the deep learning revolution. State-of-the-art neural generative models for 3D shapes learn implicit functions and generate meshes via expensive iso-surfacing. We overcome these challenges by employing a classical spatial data structure from graphics, Binary Space Partitioning (BSP), to facilitate 3D learning. The core operation of BSP involves recursive subdivision of 3D space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition without supervision. The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built over a set of planes, where the planes and convexes are both defined by learned network weights.
	1. Tong Wu, Lin Gao, Lingxiao Zhang, Yu-Kun Lai, and Hao Zhang, "STAR-TM: STructure Aware Reconstruction of Textured Mesh from Single Image", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2023. [arXiv \| bibtex] We present a novel method for single-view 3D reconstruction of textured meshes, with a focus to address the primary challenge surrounding texture inference and transfer. Our key observation is that learning textured reconstruction in a structure-aware and globally consistent manner is effective in handling the severe ill-posedness of the texturing problem and significant variations in object pose and texture details. Specifically, we perform structured mesh reconstruction, via a retrieval-and-assembly approach, to produce a set of genus-zero parts parameterized by deformable boxes and endowed with semantic information. For texturing, we first transfer visible colors from the input image onto the unified UV texture space of the deformable boxes. Then we combine a learned transformer model for per-part texture completion with a global consistency loss to optimize inter-part texture consistency. Our texture completion model operates in a VQ-VAE embedding space and is trained end-to-end, with the transformer training enhanced with retrieved texture instances to improve texture completion performance amid significant occlusion.

2022

	6. Yilin Liu, Liqiang Lin, Ke Xie, Chi-Wing Fu, Hao Zhang, and Hui Huang, "Learning Reconstructability for Drone Aerial Path Planning", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), 2022. [Project \| arXiv \| bibtex] We introduce the first learning-based reconstructability predictor to improve view and path planning for large-scale 3D urban scene acquisition using unmanned drones. In contrast to previous heuristic approaches, our method learns a model that explicitly predicts how well a 3D urban scene will be reconstructed from a set of viewpoints. To make such a model trainable and simultaneously applicable to drone path planning, we simulate the proxy-based 3D scene reconstruction during training to set up the prediction. Specifically, the neural network we design is trained to predict the scene reconstructability as a function of the proxy geometry, a set of viewpoints, and optionally a series of scene images acquired in flight ...
	5. Zhiqin Chen, Andrea Tagliasacchi, Thomas Funkhouser, and Hao Zhang, "Neural Dual Contouring", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2022. [arXiv \| bibtex] We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC). Like traditional DC, it produces exactly one vertex per grid cell and one quad for each grid edge intersection, a natural and efficient structure for reproducing sharp features. However, rather than computing vertex locations and edge crossings with hand-crafted functions that depend directly on difficult-to-obtain surface gradients, NDC uses a neural network to predict them. As a result, NDC can be trained to produce meshes from signed or unsigned distance fields, binary voxel grids, or point clouds (with or without normals); and it can produce open surfaces in cases where the input represents a sheet or partial surface.
	4. Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang, "CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly", CVPR, 2022. [arXiv \| bibtex] We introduce CAPRI-Net, a neural network for learning compact and interpretable implicit representations of 3D computer-aided design (CAD) models, in the form of adaptive primitive assemblies. Our network takes an input 3D shape that can be provided as a point cloud or voxel grids, and reconstructs it by a compact assembly of quadric surface primitives via constructive solid geometry (CSG) operations. The network is self-supervised with a reconstruction loss, leading to faithful 3D reconstructions with sharp edges and plausible CSG trees, without any ground-truth shape assemblies.
	3. Qimin Chen, Johannes Merz, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang, "UNIST: Unpaired Neural Implicit Shape Translation Network", CVPR, 2022. [arXiv \| bibtex] We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. Our model is built on autoencoding implicit fields, rather than point clouds which represents the state of the art. Furthermore, our translation network is trained to perform the task over a latent grid representation which combines the merits of both latent-space processing and position awareness, to not only enable drastic shape transforms but also well preserve spatial features and fine local details for natural shape translations.
	2. Chengjie Niu, Manyi Li, Kai Xu, and Hao Zhang, "RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures", CVPR, 2022. [arXiv \| bibtex] We introduce RIM-Net, a neural network which learns recursive implicit fields for unsupervised inference of hierarchical shape structures. Our network recursively decomposes an input 3D shape into two parts, resulting in a binary tree hierarchy. Each level of the tree corresponds to an assembly of shape parts, represented as implicit functions, to reconstruct the input shape. At each node of the tree, simultaneous feature decoding and shape decomposition are carried out by their respective feature and part decoders, with weight sharing across the same hierarchy level ...
	1. Yanran Guan, Han Liu, Kun Liu, Kangxue Yin, Ruizhen Hu, Oliver van Kaick, Yan Zhang, Ersin Yumer, Nathan Carr, Radomir Mech, and Hao Zhang, "FAME: 3D Shape Generation via Functionality-Aware Model Evolution", IEEE Trans. on Visualization and Computer Graphics (TVCG), Vol. 28, No. 4, pp. 1758-1772, 2022. [arXiv \| bibtex] We introduce a modeling tool which can evolve a set of 3D objects in a functionality-aware manner. Our goal is for the evolution to generate large and diverse sets of plausible 3D objects for data augmentation, constrained modeling, as well as open-ended exploration to possibly inspire new designs. Starting with an initial population of 3D objects belonging to one or more functional categories, we evolve the shapes through part re-combination to produce generations of hybrids or crossbreeds between parents from the heterogeneous shape collection ...

2021

	11. Zhiqin Chen and Hao Zhang, "Neural Marching Cubes", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 40, No. 6, 2021. [arXiv \| code \| bibtex] We introduce Neural Marching Cubes (NMC), a data-driven approach for extracting a triangle mesh from a discretized implicit field. We re-cast MC from a deep learning perspective, by designing tessellation templates more apt at preserving geometric features, and learning the vertex positions and mesh topologies from training meshes, to account for contextual information from nearby cubes. We develop a compact per-cube parameterization to represent the output triangle mesh, while being compatible with neural processing, so that a simple 3D convolutional network can be employed for the training. We evaluate our neural MC approach by quantitative and qualitative comparisons to all well-known MC variants, demonstrating its superiority in faithful reconstruction of sharp features and mesh topology.
	10. Jiongchao Jin, Arezou Fatemi (equal contribution), Wallace Lira, Fenggen Yu, Biao Leng, Rui Ma, Ali Mahdavi-Amiri, and Hao Zhang, "RaidaR: A Rich Annotated Image Dataset of Rainy Street Scenes", Second ICCV Workshop on Autonomous Vehicle Vision (AVVision), 2021. [dataset \| arXiv \| bibtex] We introduce RaidaR, a rich annotated image dataset of rainy street scenes, to support autonomous driving research. The new dataset contains the largest number of rainy images (58,542) to date, 5,000 of which provide semantic segmentations and 3,658 provide object instance segmentations. The RaidaR images cover a wide range of realistic rain-induced artifacts, including fog, droplets, and road reflections, which can effectively augment existing street scene datasets to improve data-driven machine perception during rainy weather.
	9. Lin Gao, Tong Wu, Yu-Jie Yuan, Ming-Xian Lin, Yu-Kun Lai, and Hao Zhang, "TM-NET: Deep Generative Networks for Textured Meshes", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 40, No. 6, 2021. [arXiv \| project page \| bibtex] We introduce TM-NET, a novel deep generative model for synthesizing textured meshes in a part-aware manner. Once trained, the network can generate novel textured meshes from scratch or predict textures for a given 3D mesh, without image guidance. Plausible and diverse textures can be generated for the same mesh part, while texture compatibility between parts in the same shape is achieved via conditional generation. Specifically, our method produces texture maps for individual shape parts, each as a deformable box, leading to a natural UV map with minimal distortion. The network separately embeds part geometry (via a PartVAE) and part texture (via a TextureVAE) into their respective latent spaces ...
	8. Han Zhang, Yusong Yao, Ke Xie, Chi-Wing Fu, Hao Zhang, and Hui Huang, "Continuous Aerial Path Planning for 3D Urban Scene Reconstruction", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 40, No. 6, 2021. [Project page \| bibtex] We introduce a path-oriented drone trajectory planning algorithm, which performs continuous image acquisition along an aerial path, aiming to optimize both the scene reconstruction quality and path quality. Specifically, our method takes as input a rough 3D scene proxy and produces a drone trajectory and image capturing setup, which efficiently yields a high-quality reconstruction of the 3D scene based on three optimization objectives: one maximize the amount of 3D scene information that can be acquired along the entirety of the trajectory, another one to optimize the scene capturing efficiency by maximizing the scene information that can be acquired per unit length along the aerial path, and the last one to minimize the total turning angles along the aerial path, so as to reduce the number of sharp turns.
	7. Huan Fu, Bowen Cai, Lin Gao, Lingxiao Zhang, Cao Li, Zengqi Xun, Chengyue Sun, Yiyun Fei, Yu Zheng, Ying Li, Yi Liu, Peng Liu, Lin Ma, Le Weng, Xiaohang Hu, Xin Ma, Qian Qian, Rongfei Jia, Binqiang Zhao, and Hao Zhang, "3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics", ICCV, 2021. [arXiv \| bibtex] We introduce 3D-FRONT (3D Furnished Rooms with layOuts and semaNTics), a new, large-scale, and comprehensive repository of synthetic indoor scenes highlighted by professionally designed layouts and a large number of rooms populated by high-quality textured 3D models with style compatibility. From layout semantics down to texture details of individual objects, our dataset is freely available to the academic community and beyond. Currently, 3D-FRONT contains 18,797 rooms diversely furnished by 3D objects, far surpassing all publicly available scene datasets. In addition, the 7,302 furniture objects all come with high-quality textures ...
	6. Rinon Gal, Amit Bermano, Hao Zhang, and Daniel Cohen-Or, "MRGAN: Multi-Rooted 3D Shape Generation with Unsupervised Part Disentanglement", ICCV Workshop on Structural and Compositional Learning on 3D Data (StruCo3D), 2021. [arXiv \| bibtex] We present MRGAN, a multi-rooted adversarial network which generates part-disentangled 3D point-cloud shapes without part-based shape supervision. The network fuses multiple branches of tree-structured graph convolution layers which produce point clouds, with learnable constant inputs at the tree roots. Each branch learns to grow a different shape part, offering control over the shape generation at the part level. Our network encourages disentangled generation of semantic parts via two key ingredients: a root-mixing training strategy which helps decorrelate the different branches to facilitate disentanglement, and a set of loss terms designed with part disentanglement and shape semantics in mind.
	5. Manyi Li and Hao Zhang, "D^2IM-Net: Learning Detail Disentangled Implicit Fields from Single Images", CVPR, 2021. [arXiv \| bibtex] We present the first single-view 3D reconstruction network aimed at recovering geometric details from an input image which encompass both topological shape structures and surface features. Our key idea is to train the network to learn a detail disentangled reconstruction consisting of two functions, one implicit field representing the coarse 3D shape and the other capturing the details. Given an input image, our network, coined D2IM-Net, encodes it into global and local features which are respectively fed into two decoders. The base decoder uses the global features to reconstruct a coarse implicit field, while the detail decoder reconstructs, from the local features, two displacement maps, defined over the front and back sides of the captured object. The final 3D reconstruction is a fusion between the base shape and the displacement maps, with three losses enforcing the recovery of coarse shape, overall structure, and surface details via a novel Laplacian term.
	4. Zhiqin Chen, Vladimir Kim, Matthew Fisher, Noam Aigerman, Hao Zhang, and Siddhartha Chaudhuri, "DECOR-GAN: 3D Shape Detailization by Conditional Refinement", CVPR (oral), 2021. [arXiv \| bibtex] We introduce a deep generative network for 3D shape detailization, akin to stylization with the style being geometric details. We address the challenge of creating large varieties of high-resolution and detailed 3D geometry from a small set of exemplars by treating the problem as that of geometric detail transfer. Given a low-resolution coarse voxel shape, our network refines it, via voxel upsampling, into a higher-resolution shape enriched with geometric details. The output shape preserves the overall structure (or content) of the input, while its detail generation is conditioned on an input "style code" corresponding to a detailed exemplar.
	3. Akshay Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, and Hao Zhang, "LayoutGMN: Neural Graph Matching for Structural Layout Similarity", CVPR, 2021. [arXiv \| bibtex] We present a deep neural network to predict structural similarity between 2D layouts by leveraging Graph Matching Networks (GMN). Our network, coined LayoutGMN, learns the layout metric via neural graph matching, using an attention-based GMN designed under a triplet network setting. To train our network, we utilize weak labels obtained by pixel-wise Intersection-over-Union (IoUs) to define the triplet loss. Importantly, LayoutGMN is built with a structural bias which can effectively compensate for the lack of structure awareness in IoUs.
	2. Yiming Qian, Hao Zhang, and Yasutaka Furukawa, "Roof-GAN: Learning to Generate Roof Geometry and Relations for Residential Houses", CVPR, 2021. [arXiv \| bibtex] This paper presents Roof-GAN, a novel generative adversarial network that generates structured geometry of residential roof structures as a set of roof primitives and their relationships. Given the number of primitives, the generator produces a structured roof model as a graph, which consists of 1) primitive geometry as raster images at each node, encoding facet segmentation and angles; 2) inter-primitive colinear/coplanar relationships at each edge; and 3) primitive geometry in a vector format at each node, generated by a novel differentiable vectorizer while enforcing the relationships.
	1. Or Patashnik, Dov Danon, Hao Zhang, and Daniel Cohen-Or, "BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer", CVPR Workshop on Learning from Limited and Imperfect Data (L2ID), 2021. [Project page \| arXiv \| bibtex] State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity. We introduce a new unsupervised translation network, BalaGAN, specifically designed to tackle the domain imbalance problem. We leverage the latent modalities of the richer domain to turn the image-to-image translation problem, between two imbalanced domains, into a balanced, multi-class, and conditional translation problem, more resembling the style transfer setting. Specifically, we analyze the source domain and learn a decomposition of it into a set of latent modes or classes, without any supervision. This leaves us with a multitude of balanced cross-domain translation tasks, between all pairs of classes, including the target domain. During inference, the trained network takes as input a source image, as well as a reference or style image from one of the modes as a condition, and produces an image which resembles the source on the pixel-wise level, but shares the same mode as the reference.

2020

	13. Xiaogang Wang, Yuelang Xu, Kai Xu, Andrea Tagliasacchi, Bin Zhou, Ali Mahdavi-Amiri, and Hao Zhang, "PIE-NET: Parametric Inference of Point Cloud Edges", NeurIPS, 2020. [arXiv \| bibtex] We introduce an end-to-end learnable technique to robustly identify feature edges in 3D point cloud data. We represent these edges as a collection of parametric curves (i.e., lines, circles, and B-splines). Accordingly, our deep neural network, coined PIE-NET, is trained for parametric inference of edges. The network is trained on the ABC dataset and relies on a "region proposal" architecture, where a first module proposes an over-complete collection of edge and corner points, and a second module ranks each proposal to decide whether it should be considered.
	12. Kangxue Yin, Zhiqin Chen, Siddhartha Chaudhuri, Matt Fisher, Vladimir Kim, and Hao Zhang, "COALESCE: Component Assembly by Learning to Synthesize Connections", 3D Vision (3DV), (oral), 2020. [arXiv \| bibtex] We introduce COALESCE, the first data-driven framework for component-based shape assembly which employs deep learning to synthesize part connections. To handle geometric and topological mismatches between parts, we remove the mismatched portions via erosion, and rely on a joint synthesis step, which is learned from data, to fill the gap and arrive at a natural and plausible part joint. Given a set of input parts extracted from different objects, COALESCE automatically aligns them and synthesizes plausible joints to connect the parts into a coherent 3D object represented by a mesh. The joint synthesis network, designed to focus on joint regions, reconstructs the surface between the parts by predicting an implicit shape representation that agrees with existing parts, while generating a smooth and topologically meaningful connection.
	11. Ali Mahdavi-Amiri, Fenggen Yu, Haisen Zhao, Adriana Schulz, and Hao Zhang, "VDAC: Volume Decompose-and-Carve for Subtractive Manufacturing", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 39, No. 6, 2020. [PDF \| Project page \| bibtex] We introduce carvable volume decomposition for efficient 3-axis CNC machining of 3D freeform objects, where our goal is to develop a fully automatic method to jointly optimize setup and path planning. We formulate our joint optimization as a volume decomposition problem which prioritizes minimizing the number of setup directions while striving for a minimum number of continuously carvable volumes, where a 3D volume is continuously carvable, or simply carvable, if it can be carved with the machine cutter traversing a single continuous path. Geometrically, carvability combines visibility and monotonicity and presents a new shape property which had not been studied before.
	10. Ruizhen Hu, Juzhan Xu, Bin Chen, Minglun Gong, Hao Zhang, and Hui Huang, "TAP-Net: Transport-and-Pack using Reinforcement Learning", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 39, No. 6, 2020. [Project page \| arXiv \| bibtex] We introduce the transport-and-pack (TAP) problem, a frequently encountered instance of real-world packing, and develop a neural optimization solution based on reinforcement learning. Given an initial spatial configuration of boxes, we seek an efficient method to iteratively transport and pack the boxes compactly into a target container. Due to obstruction and accessibility constraints, our problem has to add a new search dimension, i.e., finding an optimal transport sequence, to the already immense search space for packing alone. Using a learning-based approach, a trained network can learn and encode solution patterns to guide the solution of new problem instances instead of executing an expensive online search.
	9. Zili Yi, Zhiqin Chen, Hao Cai, Wendong Mao, Minglun Gong, and Hao Zhang, "BSD-GAN: Branched Generative Adversarial Networks for Scale-Disentangled Learning and Synthesis of Images", IEEE Trans. on Image Processing, Vol. 29, pp. 9073-9083, 2020. [arXiv \| code \| bibtex] We introduce BSD-GAN, a novel multi-branch and scale-disentangled training method which enables unconditional Generative Adversarial Networks (GANs) to learn image representations at multiple scales, benefiting a wide range of generation and editing tasks. The key feature of BSD-GAN is that it is trained in multiple branches, progressively covering both the breadth and depth of the network, as resolutions of the training images increase to reveal finer-scale features. Specifically, each noise vector, as input to the generator network of BSD-GAN, is deliberately split into several sub-vectors, each corresponding to, and is trained to learn, image representations at a particular scale. During training, we progressively "de-freeze" the sub-vectors, one at a time, as a new set of higher-resolution images is employed for training and more network layers are added.
	8. Wallace Lira, Johannes Merz, Daniel Ritchie, Daniel Cohen-Or, and Hao Zhang, "GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation", ECCV, 2020. [arXiv \| Youtube video \| bibtex] We introduce GANhopper, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images which resemble weighted hybrids between images from the two input domains. Our network is trained on unpaired images from the two domains only, without any in-between images. All hops are produced using a single generator along each direction. In addition to the standard cycle-consistency and adversarial losses, we introduce a new hybrid discriminator, which is trained to classify the intermediate images produced by the generator as weighted hybrids, with weights based on a predetermined hop count.
	7. Jiongchao Jin, Akshay Gadi Patil, Zhang Xiong, and Hao Zhang, "DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction", ECCV, 2020. [arXiv \| bibtex] We introduce a differential visual similarity metric to train deep neural networks for 3D reconstruction, aimed at improving reconstruction quality. The metric compares two 3D shapes by measuring distances between multi-view images differentiably rendered from the shapes. Importantly, the image-space distance is also differentiable and measures visual similarity, rather than pixel-wise distortion. Specifically, the similarity is defined by mean-squared errors over HardNet features computed from probabilistic keypoint maps of the compared images. Our differential visual shape similarity metric can be easily plugged into various 3D reconstruction networks, replacing their distortion-based losses, such as Chamfer or Earth Mover distances, so as to optimize the network weights to produce reconstructions with better structural fidelity and visual quality.
	6. Hao Xu, Ka Hei Hui, Chi-Wing Fu, and Hao Zhang, "TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network", ACM Transactions on Graphics (Special Issue of SIGGRAPH), Vol. 39, No. 4, 2020. [Project page \| arXiv \| Code \| bibtex] We introduce the first neural optimization framework to solve a classical instance of the tiling problem. Namely, we seek a non-periodic tiling of an arbitrary 2D shape using one or more types of tiles: the tiles maximally fill the shape’s interior without overlaps or holes. To start, we reformulate tiling as a graph problem by modeling candidate tile locations in the target shape as graph nodes and connectivity between tile locations as edges. We build a graph convolutional neural network, coined TilinGNN, to progressively propagate and aggregate features over graph edges and predict tile placements. Our network is self-supervised and trained by maximizing the tiling coverage on target shapes, while avoiding overlaps and holes between the tiles. After training, TilinGNN has a running time that is roughly linear to the number of candidate tile locations, significantly outperforming traditional combinatorial search.
	5. Ruizhen Hu, Zeyu Huang, Yuhan Tang, Oliver van Kaick, Hao Zhang, and Hui Huang, "Graph2Plan: Learning Floorplan Generation from Layout Graphs", ACM Transactions on Graphics (Special Issue of SIGGRAPH), Vol. 39, No. 4, 2020. [Project page \| arXiv \| bibtex] We introduce a learning framework for automated floorplan generation which combines generative modeling using deep neural networks and user-in-the-loop designs to enable human users to provide sparse design constraints. Such constraints are represented by a layout graph. The core component of our learning framework is a deep neural network, Graph2Plan, which is trained on RPLAN, a large-scale dataset consisting of 80K annotated, human-designed floorplans. The network converts a layout graph, along with a building boundary, into a floorplan that fulfills both the layout and boundary constraints.
	4. Zhiqin Chen, Andrea Tagliasacchi, and Hao Zhang, "BSP-Net: Generating Compact Meshes via Binary Space Partitioning", CVPR, (oral), 2020. Best Student Paper Award. [Project page (code+video) \| arXiv \| bibtex] Polygonal meshes are ubiquitous in the digital 3D domain. Leading methods for learning generative models of shapes rely on implicit functions, and generate meshes only after expensive iso-surfacing routines. To overcome these challenges, we are inspired by a classical spatial data structure from computer graphics, Binary Space Partitioning (BSP), to facilitate 3D learning. The core ingredient of BSP is an operation for recursive subdivision of space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition. Importantly, BSP-Net is unsupervised since no convex shape decompositions are needed for training. The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built on a set of planes. The convexes inferred by BSP-Net can be easily extracted to form a polygon mesh, without any need for iso-surfacing. The generated meshes are compact (i.e., low-poly) and well suited to represent sharp geometry; they are guaranteed to be watertight and can be easily parameterized.
	3. Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Li Yi, Leonidas J. Guibas, and Hao Zhang, "AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss", CVPR, (oral), 2020. [arXiv \| bibtex] We introduce AdaSeg, a deep neural network architecture for adaptive co-segmentation of a set of 3D shapes represented as point clouds. Differently from the familiar single-instance segmentation problem, co-segmentation is intrinsically contextual: how a shape is segmented can vary depending on the set it is in. Hence, our network features an adaptive learning module to produce a consistent shape segmentation which adapts to a set.
	2. Rundi Wu, Yixin Zhuang, Kai Xu, Hao Zhang, and Baoquan Chen, "PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes", CVPR, 2020. [arXiv \| bibtex] We introduce PQ-NET, a deep neural network which represents and generates 3D shapes via sequential part assembly. The input to our network is a 3D shape segmented into parts, where each part is first encoded into a feature representation using a part autoencoder. The core component of PQ-NET is a sequence-to-sequence or Seq2Seq autoencoder which encodes a sequence of part features into a latent vector of fixed size, and the decoder reconstructs the 3D shape, one part at a time, resulting in a sequential assembly. The latent space formed by the Seq2Seq encoder encodes both part structure and fine part geometry. The decoder can be adapted to perform several generative tasks including shape autoencoding, interpolation, novel shape generation, and single-view 3D reconstruction, where the generated shapes are all composed of meaningful parts.
	1. Siddhartha Chaudhuri, Daniel Ritchie, Jiajun Wu, Kai Xu, and Hao Zhang, "Learning Generative Models of 3D Structures", Computer Graphics Forum (Eurographics STAR), 2020. [Project page \| PDF \| bibtex] 3D models of objects and scenes are critical to many academic disciplines and industrial applications. Of particular interest is the emerging opportunity for 3D graphics to serve artificial intelligence: computer vision systems can benefit from synthetically- generated training data rendered from virtual 3D scenes, and robots can be trained to navigate in and interact with real-world environments by first acquiring skills in simulated ones. One of the most promising ways to achieve this is by learning and applying generative models of 3D content: computer programs that can synthesize new 3D shapes and scenes. To allow users to edit and manipulate the synthesized 3D content to achieve their goals, the generative model should also be structure-aware: it should express 3D shapes and scenes using abstractions that allow manipulation of their high-level structure. This state-of-the- art report surveys historical work and recent progress on learning structure-aware generative models of 3D shapes and scenes.

2019

	11. Kangxue Yin, Zhiqin Chen, Hui Huang, Daniel Cohen-Or, and Hao Zhang, "LOGAN: Unpaired Shape Transform in Latent Overcomplete Space", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 38, No. 6, Article 198, 2019. One of six papers selected for press release at SIGGRAPH Asia. [arXiv \| code \| bibtex] We introduce LOGAN, a deep neural network aimed at learning general-purpose shape transforms from unpaired domains. The network is trained on two sets of shapes, e.g., tables and chairs, while there is neither a pairing between shapes from the domains as supervision nor any point-wise correspondence between any shapes. Once trained, LOGAN takes a shape from one domain and transforms it into the other. Our network consists of an autoencoder to encode shapes from the two input domains into a common latent space, where the latent codes concatenate multi-scale shape features, resulting in an overcomplete representation. The translator is based on a latent generative adversarial network (GAN), where an adversarial loss enforces cross-domain translation while a feature preservation loss ensures that the right shape features are preserved for a natural shape transform.
	10. Lin Gao, Jie Yang, Tong Wu, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai, and Hao Zhang, "SDM-NET: Deep Generative Network for Structured Deformable Mesh", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 38, No. 6, Article 243, 2019. [Project page \| arXiv \| bibtex] We introduce SDM-NET, a deep generative neural network which produces structured deformable meshes. Specifically, the network is trained to generate a spatial arrangement of closed, deformable mesh parts, which respect the global part structure of a shape collection, e.g., chairs, airplanes, etc. Our key observation is that while the overall structure of a 3D shape can be complex, the shape can usually be decomposed into a set of parts, each homeomorphic to a box, and the finer-scale geometry of the part can be recovered by deforming the box. The architecture of SDM-NET is that of a two-level variational autoencoder (VAE). At the part level, a PartVAE learns a deformable model of part geometries. At the structural level, we train a Structured Parts VAE (SP-VAE), which jointly learns the part structure of a shape collection and the part geometries, ensuring a coherence between global shape structure and surface details.
	9. Hao Xu, Ka Hei Hui, Chi-Wing Fu, and Hao Zhang, "Computational LEGO Technic Design", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 38, No. 6, Article 196, 2019. Included in the SIGGRAPH Asia 2019 technical paper trailer. [arXiv \| bibtex] We introduce a method to automatically compute LEGO Technic models from user input sketches, optionally with motion annotations. The generated models resemble the input sketches with coherently-connected bricks and simple layouts, while respecting the intended symmetry and mechanical properties expressed in the inputs. This complex computational assembly problem involves an immense search space, and a much richer brick set and connection mechanisms than regular LEGO. To address it, we first comprehensively model the brick properties and connection mechanisms, then formulate the construction requirements into an objective function, accounting for faithfulness to input sketch, model simplicity, and structural integrity. Next, we model the problem as a sketch cover, where we iteratively refine a random initial layout to cover the input sketch, while guided by the objective. At last, we provide a working system to analyze the balance, stress, and assemblability of the generated model.
	8. Zhihao Yan, Ruizhen Hu, Xingguang Yan, Luanmin Chen, Oliver van Kaick, Hao Zhang, and Hui Huang, "RPM-Net: Recurrent Prediction of Motion and Parts from Point Cloud", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 38, No. 6, Article 240, 2019. [Project page \| bibtex] We introduce RPM-Net, a deep learning-based approach which simultaneously infers movable parts and hallucinates their motions from a single, un-segmented, and possibly partial, 3D point cloud shape. RPM-Net is a novel Recurrent Neural Network (RNN), composed of an encoder-decoder pair with interleaved Long Short-Term Memory (LSTM) components, which together predict a temporal sequence of point-wise displacements for the input shape. At the same time, the displacements allow the network to learn moveable parts, resulting in a motion-based shape segmentation. Recursive applications of RPM-Net on the obtained parts can predict finer-level part motions, resulting in a hierarchical object segmentation. Furthermore, we develop a separate network to estimate part mobilities, e.g., per part motion parameters, from the segmented motion sequence.
	7. Zhiqin Chen, Kangxue Yin, Matt Fisher, Siddhartha Chaudhuri, and Hao Zhang, "BAE-NET: Branched Autoencoder for Shape Co-Segmentation", ICCV, 2019. [arXiv \| code \| bibtex] We treat shape co-segmentation as a representation learning problem and introduce BAE-NET, a branched autoencoder network, for the task. The unsupervised BAE-NET is trained with all shapes in an input collection using a shape reconstruction loss, without ground-truth segmentations. Specifically, the network takes an input shape and encodes it using a convolutional neural network, whereas the decoder concatenates the resulting feature code with a point coordinate and outputs a value indicating whether the point is inside/outside the shape. Importantly, the decoder is branched: each branch learns a compact representation for one commonly recurring part of the shape collection, e.g., airplane wings. By complementing the shape reconstruction loss with a label loss, BAE-NET is easily tuned for one-shot learning.
	6. Nadav Schor, Oren Katzier, Hao Zhang, and Daniel Cohen-Or, "CompoNet: Learning to Generate the Unseen by Part Synthesis and Composition", ICCV, 2019. [arXiv \| code \| bibtex] Data-driven generative modeling has made remarkable progress by leveraging the power of deep neural networks. A reoccurring challenge is how to sample a rich variety of data from the entire target distribution, rather than only from the distribution of the training data. In other words, we would like the generative model to go beyond the observed training samples and learn to also generate "unseen" data. In our work, we present a generative neural network for shapes that is based on a part-based prior, where the key idea is for the network to synthesize shapes by varying both the shape parts and their compositions.
	5. Zhiqin Chen and Hao Zhang, "Learning Implicit Fields for Generative Shape Modeling", CVPR, arXiv:1812.02822, 2019. [PDF \| code \| bibtex] We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not ...
	4. Manyi Li, Akshay Gadi Patil, Kai Xu, Siddhartha Chaudhuri, Owais Khan, Ariel Shamir, Changhe Tu, Baoquan Chen, Daniel Cohen-Or, and Hao Zhang, "GRAINS: Generative Recursive Autoencoders for INdoor Scenes", ACM Transactions on Graphics, Vol. 38, No. 2, Article 12, presented at SIGGRAPH, 2019. [arXiv \| bibtex] We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recursive autoencoder, or RvNN-VAE, which performs scene object grouping during its encoding phase and scene generation during decoding.
	3. Yuan Gan, Yan Zhang, and Hao Zhang, "Qualitative Organization of Photo Collections via Quartet Analysis and Active Learning", Proc. of Graphics Interface, 2019. [PDF] We introduce the use of qualitative analysis and active learning to photo album construction. Given a heterogeneous collection of pho- tos, we organize them into a hierarchical categorization tree (C-tree) based on qualitative analysis using quartets instead of relying on conventional, quantitative image similarity metrics. The main moti- vation is that in a heterogeneous collection, quantitative distances may become unreliable between dissimilar data and there is unlikely a single metric that is well applicable to all data.
	2. Pengfei Xu, Jiangqiang Ding, Hao Zhang, and Hui Huang, "Discernible Image Mosaic with Edge-Aware Adaptive Tiles", Computational Visual Media (CVM), 2019. [Project page \| bibtex] We present a novel method to produce discernible image mosaics, with relatively large image tiles replaced by images drawn from a database, to resemble a target image. Since visual edges strongly support content perception, we compose our mosaic via edge-aware photo retrieval to best preserve visual edges in the target image. Moreover, unlike most previous works which apply a pre-determined partition to an input image, our image mosaics are composed by adaptive tiles, whose sizes are determined based on the available images and an objective of maximizing resemblance to the target.
	1. Kangxue Yin, Hui Huang, Edmond S. L. Ho, Hao Wang, Taku Komura, Daniel Cohen-Or, and Hao Zhang, "A Sampling Approach to Generating Closely Interacting 3D Pose-pairs from 2D Annotations", IEEE Trans. on Visualization and Computer Graphics (TVCG), Vol. 25, No. 6, pages 2217-2227, 2019. [PDF \| bibtex] We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs, for a given motion category, e.g., wrestling or salsa dance. With much difficulty in acquiring close interactions using 3D sensors, our approach utilizes abundant existing video data which cover many human activities. Instead of treating the data generation problem as one of reconstruction, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. Given a motion category and a set of video frames depicting the motion with the 2D pose-pair in each frame annotated, we start the sampling with one or few seed 3D pose-pairs which are manually created based on the target motion category. The initial set is then augmented by MCMC sampling around the seeds, via the Metropolis-Hastings algorithm and guided by a probability density function (PDF) that is defined by two terms to bias the sampling towards 3D pose-pairs that are physically valid and plausible for the motion category.

2018

	11. Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Renjiao Yi, and Hao Zhang, "SCORES: Shape Composition with Recursive Substructure Priors", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 37, No. 6, Article 211, 2018. [arXiv \| Project page \| bibtex] We introduce SCORES, a recursive neural network for shape composition. Our network takes as input sets of parts from two or more source 3D shapes and a rough initial placement of the parts. It outputs an optimized part structure for the composed shape, leading to high-quality geometry construction. A unique feature of our composition network is that it is not merely learning how to connect parts. Our goal is to produce a coherent and plausible 3D shape, despite large incompatibilities among the input parts. The network may significantly alter the geometry and structure of the input parts and synthesize a novel shape structure based on the inputs, while adding or removing parts to minimize a structure plausibility loss.
	10. Wallace Lira, Chi-Wing Fu, and Hao Zhang, "Fabricable Eulerian Wires for 3D Shape Abstraction", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 37, No. 6, Article 240, 2018. [Project page \| bibtex] We present a fully automatic method that finds a small number of machine fabricable wires with minimal overlap to reproduce a wire sculpture design as a 3D shape abstraction. Importantly, we consider non-planar wires, which can be fabricated by a wire bending machine, to enable efficient construction of complex 3D sculptures that cannot be achieved by previous works. We call our wires Eulerian wires, since they are as Eulerian as possible with small overlap to form the target design together.
	9. Shuhua Li, Ali Mahdavi-Amiri, Ruizhen Hu, Han Liu, Changqing Zou, Oliver van Kaick, Xiuping Liu, Hui Huang, and Hao Zhang, "Construction and Fabrication of Reversible Shape Transforms", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 37, No. 6, Article 190, 2018. [Paper \| Youtube video \| bibtex] We study a new and elegant instance of geometric dissection of 2D shapes: reversible hinged dissection, which corresponds to a dual transform between two shapes where one of them can be dissected in its interior and then inverted inside-out, with hinges on the shape boundary, to reproduce the other shape, and vice versa. We call such a transform reversible inside-out transform or RIOT. Since it is rare for two shapes to possess even a rough RIOT, let alone an exact one, we develop both a RIOT construction algorithm and a quick filtering mechanism to pick, from a shape collection, potential shape pairs that are likely to possess the transform. Our construction algorithm is fully automatic. It computes an approximate RIOT between two given input 2D shapes, whose boundaries can undergo slight deformations, while the filtering scheme picks good inputs for the construction.
	8. Rui Ma, Akshay Gadi Patil (co-first author), Matt Fisher, Manyi Li, Soren Pirk, Binh-Son Hua, Sai-Kit Yeung, Xin Tong, Leonidas J. Guibas, and Hao Zhang, "Language-Driven Synthesis of 3D Scenes Using Scene Databases", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 37, No. 6, Article 212, 2018. [Project page \| bibtex] We introduce a novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases. The advantage of natural language editing interfaces is strongest when performing semantic operations at the sub-scene level, acting on groups of objects. We learn how to manipulate these sub-scenes by analyzing existing 3D scenes. We perform edits by first parsing a natural language command from the user and trans- forming it into a semantic scene graph that is used to retrieve corresponding sub-scenes from the databases that match the command. We then augment this retrieved sub-scene by incorporating other objects that may be implied by the scene context. Finally, a new 3D scene is synthesized by aligning the augmented sub-scene with the user’s current scene, where new objects are spliced into the environment, possibly triggering appropriate adjustments to the existing scene arrangement.
	7. Xuelin Chen, Honghua Li, Chi-Wing Fu, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen, "3D Fabrication with Universal Building Blocks and Pyramidal Shells", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 37, No. 6, Article 189, 2018. [Project page \| bibtex] We introduce a computational solution for cost-efficient 3D fabrication using universal building blocks. Our key idea is to employ a set of universal blocks, which can be massively prefabricated at a low cost, to quickly assemble and constitute a significant internal core of the target object, so that only the residual volume need to be 3D printed online. We further improve the fabrication efficiency by decomposing the residual volume into a small number of printing-friendly pyramidal pieces.
	6. Changqing Zou, Qian Yu, Ruofei Du, Haoran Mo, Yi-Zhe Song, Tao Xiang, Chengyi Gao, Baoquan Chen, and Hao Zhang, "SketchyScene: Richly-Annotated Scene Sketches", ECCV, 2018. [PDF \| bibtex] We contribute the first large-scale dataset of scene sketches, SketchyScene, with the goal of advancing research on sketch understanding at both the object and scene level. The dataset is created through a novel and carefully designed crowdsourcing pipeline, enabling users to efficiently generate large quantities realistic and diverse scene sketches. SketchyScene contains more than 29,000 scene-level sketches, 7,000+ pairs of scene templates and photos, and 11,000+ object sketches. All objects in the scene sketches have ground-truth semantic and instance masks. The dataset is also highly scalable and extensible, easily allowing augmenting and/or changing scene composition. We demonstrate the potential impact of SketchyScene by training new computational models for semantic segmentation of scene sketches and showing how the new dataset enables several applications including image retrieval, sketch colorization, editing, and captioning, etc. We will release the complete crowdsourced dataset to the community.
	5. Kangxue Yin, Hui Huang, Daniel Cohen-Or, and Hao Zhang, "P2P-NET: Bidirectional Point Displacement Net for Shape Transform", ACM Transactions on Graphics (Special Issue of SIGGRAPH), Vol. 37, No. 4, Article 152, 2018. [PDF \| arXiv \| bibtex] We introduce P2P-NET, a general-purpose deep neural network which learns geometric transformations between point-based shape representations from two domains, e.g., meso-skeletons and surfaces, partial and complete scans, etc. The architecture of the P2P-NET is that of a bi-directional point dis- placement network, which transforms a source point set to a prediction of the target point set with the same cardinality, and vice versa, by applying point-wise displacement vectors learned from data. P2P-NET is trained on paired shapes from the source and target domains, but without relying on point-to-point correspondences between the source and target point sets. The training loss combines two uni-directional geometric losses, each enforc- ing a shape-wise similarity between the predicted and the target point sets, and a cross-regularization term to encourage consistency between displace- ment vectors going in opposite directions.
	4. Ruizhen Hu, Zhihao Yan, Jingwen Zhang, Oliver van Kaick, Ariel Shamir, Hao Zhang, and Hui Huang, "Predictive and Generative Neural Networks for Object Functionality", ACM Transactions on Graphics (Special Issue of SIGGRAPH), Vol. 37, No. 4, Article 151, 2018. [arXiv \| Project Page \| bibtex] Humans can predict the functionality of an object even without any surroundings, since their knowledge and experience would allow them to "hallucinate" the interaction or usage scenarios involving the object. We develop predictive and generative deep convolutional neural networks to replicate this feat. Our networks are trained on a database of scene contexts, called interaction contexts, each consisting of a central object and one or more surrounding objects, that represent object functionalities. Given a 3D object in isolation, our functional similarity network (fSIM-NET), a variation of the triplet network, is trained to predict the functionality of the object by inferring functionality-revealing interaction contexts involving the object. fSIM-NET is complemented by a generative network (iGEN-NET) and a segmentation network (iSEG-NET). iGEN-NET takes a single voxelized 3D object and synthesizes a voxelized surround, i.e., the interaction context which visually demonstrates the object's functionalities. iSEG-NET separates the interacting objects into different groups according to their interaction types.
	3. Haisen Zhao, Hao Zhang, Shiqing Xin, Yuanmin Deng, Changhe Tu, Wenping Wang, Daniel Cohen-Or, and Baoquan Chen, "DSCarver: Decompose-and-Spiral-Carve for Subtractive Manufacturing", ACM Transactions on Graphics (Special Issue of SIGGRAPH), Vol. 37, No. 4, Article 137, 2018. [PDF \| bibtex] We present an automatic algorithm for subtractive manufacturing of freeform 3D objects using high-speed CNC machining. Our method decomposes the input object's surface into a small number of patches each of which is fully accessible and machinable by the CNC machine, in continuous fashion, under a fixed drill-object setup configuration. This is achieved by covering the input surface using a minimum number of accessible regions and then extracting a set of machinable patches from each accessible region. For each patch obtained, we compute a continuous, space-filling, and iso-scallop tool path, in the form of connected Fermat spirals, which conforms to the patch boundary. Furthermore, we develop a novel method to control the spacing of Fermat spirals based on directional surface curvature and adapt the heat method to obtain iso-scallop carving.
	2. Fenggen Yu, Yan Zhang, Kai Xu, Ali Mahdavi-Amiri, and Hao Zhang, "Semi-Supervised Co-Analysis of 3D Shape Styles from Projected Lines", ACM Transactions on Graphics, Vol. 37, No. 2, Article 21, 2018. (Awarded the Replicability Stamp!) [PDF \| arXiv \| Project page \| bibtex] We present a semi-supervised co-analysis method for learning 3D shape styles from projected feature lines, achieving style patch localization with only weak supervision. Given a collection of 3D shapes spanning multiple object categories and styles, we perform style co-analysis over projected feature lines of each 3D shape and then backproject the learned style features onto the 3D shapes.
	1. Manyi Li, Noa Fish, Lili Cheng, Changhe Tu, Daniel Cohen-Or, Hao Zhang, and Baoquan Chen, "Class-Sensitive Shape Dissimilarity Metric", Graphical Models, , 2018. [PDF \| bibtex] Shape dissimilarity is a fundamental problem with many applications such as shape exploration, retrieval, and classification. Given a collection of shapes, all existing methods develop a consistent global metric to compareand organize shapes. The global nature of the involved shape descriptors implies that overall shape appearanceis compared. These methods work well to distinguishshapes from different categories, but often fail for fine-grained classes within the same category. In this paper, we develop a dissimilarity metric for fine-grained classes by fusing together multiple distinctive metrics for different classes. The fused metric measures the dissimilarities among inter-class shapes by observing their unique traits.

2017

	7. Ruizhen Hu, Wenchao Li, Oliver van Kaick, Ariel Shamir, Hao Zhang, and Hui Huang, "Learning to Predict Part Mobility from a Single Static Snapshot", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 36, No. 6, Article 227, 2017. [PDF \| Project Page \| bibtex] We introduce a method for learning a model for the mobility of parts in 3D objects. Our method allows not only to understand the dynamic function- alities of one or more parts in a 3D object, but also to apply the mobility functions to static 3D models. Specifically, the learned part mobility model can predict mobilities for parts of a 3D object given in the form of a single static snapshot reflecting the spatial configuration of the object parts in 3D space, and transfer the mobility from relevant units in the training data ...
	6. Zhaoliang Lun, Changqing Zou (joint first author), Haibin Huang, Evangelos Kalogerakis, Ping Tan, Marie-Paule Cani, and Hao Zhang, "Learning to Group Discrete Graphical Patterns", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 36, No. 6, Article 225, 2017. [PDF \| Project page \| bibtex] We introduce a deep learning approach for grouping discrete patterns common in graphical designs. Our approach is based on a convolutional neural network architecture that learns a grouping measure defined over a pair of pattern elements. Motivated by perceptual grouping principles, the key feature of our network is the encoding of element shape, context, symmetries, and structural arrangements. These element properties are all jointly considered and appropriately weighted in our grouping measure ...
	5. Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong, "DualGAN: Unsupervised Dual Learning for Image-to-Image Translation", ICCV, also available at arXiv:1704.02510, 2017. [PDF \| bibtex] Conditional Generative Adversarial Networks (GANs) for cross-domain image-to-image translation have made much progress recently. Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN. However, human labeling is expensive, even impractical, and large quantities of data may not always be available. Inspired by dual learning from natural language translation, we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In our architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. The closed loop made by the primal and dual tasks allows images from either domain to be translated and then reconstructed. Hence a loss function that accounts for the reconstruction error of images can be used to train the translators.
	4. Warunika Ranaweera, Parmit Chilana, Daniel Cohen-Or, and Hao Zhang, "ExquiMo: An Exquisite Corpse Tool for Co-creative 3D Shape Modeling", International Conference on Computer-Aided Design and Computer Graphics (CAD/Graphics), Zhangjiajie, China, August 25-27, 2017. One of three Best Paper Awards at the conference. [PDF \| bibtex] We introduce a shape modeling tool, ExquiMo, which is guided by the idea of improving the creativity of 3D shape designs through collaboration. Inspired by the game of Exquisite Corpse, our tool allocates distinct parts of a shape to multiple players who model the assigned parts in a sequence. Our approach is motivated by the understanding that effective surprise leads to creative outcomes. Hence, to maintain the surprise factor of the output, we conceal the previously modeled parts from the most recent player. Part designs from individual players are fused together to produce an often unexpected, hence creative, end result ...
	3. Chenyang Zhu, Renjiao Yi, Wallace Lira, Ibraheem Alhashim, Kai Xu, and Hao Zhang, "Deformation-Driven Shape Correspondence via Shape Recognition", ACM Transactions on Graphics (Special Issue of SIGGRAPH), Vol. 36, No. 4, Article 51, 2017. [Project page \| PDF reduced (1.7MB) \| bibtex] Many approaches to shape comparison and recognition start by establishing a shape correspondence. We ``turn the table'' and show that quality shape correspondences can be obtained by performing many shape recognition tasks. What is more, the method we develop computes a fine-grained, topology-varying part correspondence between two 3D shapes where the core evaluation mechanism only recognizes shapes globally. This is made possible by casting the part correspondence problem in a deformation-driven framework and relying on a data-driven ``deformation energy'' which rates visual similarity between deformed shapes and models from a shape repository. Our basic premise is that if a correspondence between two chairs (or airplanes, bicycles, etc.) is correct, then a reasonable deformation between the two chairs anchored on the correspondence ought to produce plausible, ``chair-like'' in-between shapes.
	2. Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, Leonidas Guibas, "GRASS: Generative Recursive Autoencoders for Shape Structures", ACM Transactions on Graphics (Special Issue of SIGGRAPH), Vol. 36, No. 4, Article 52, 2017. One of six papers selected for press release at SIGGRAPH. [PDF \| arXiv \| Project page \| bibtex] We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures the hierarchical structures of varying complexity despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. Finally, our structure synthesis framework is augmented by a second trained module that produces fine-grained part geometry, conditioned on global and local structural context, leading to a full generative pipeline for 3D shapes.
	1. Ruizhen Hu, Wenchao Li, Oliver van Kaick, Hui Huang, Melinos Averkiou, Daniel Cohen-Or, and Hao Zhang, "Co-Locating Style-Defining Elements on 3D Shapes", ACM Transactions on Graphics (to be presented at SIGGRAPH), Vol. 36, No. 3, pp. 33:1-33:15, 2017. [PDF \| bibtex] We introduce a method for co-locating style-defining elements over a set of 3D shapes. Our goal is to translate high-level style descriptions, such as "Ming" or "European" for furniture models, into explicit and localized regions over the geometric models that characterize each style. For each style, the set of style-defining elements is defined as the union of all the elements that are able to discriminate the style. Another property of the style-defining elements is that they are frequently-occurring, reflecting shape characteristics that appear across multiple shapes of the same style ...

2016

	8. Rui Ma, Honghua Li, Changqing Zou, Zicheng Liao, Xin Tong, and Hao Zhang, "Action-Driven 3D Indoor Scene Evolution", ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 35, No. 6, Article 173, 2016. [Projet page \| bibtex] We introduce a framework for action-driven evolution of 3D indoor scenes, where the goal is to simulate how scenes are altered by human actions, and specifically, by object placements necessitated by the actions. To this end, we develop an action model with each type of action combining information about one or more human poses, one or more object categories, and spatial configurations of object-object and object-human relations for the action. Importantly, all these pieces of information are learned from annotated photos.
	7. Lei Li, Zhe Huang, Changqing Zou, Chiew-Lan Tai, Rynson W.H. Lau, Hao Zhang, Ping Tan, and Hongbo Fu, "Model-driven Sketch Reconstruction with Structure-oriented Retrieval", SIGGRAPH Asia Technical Brief, 2016. [PDF \| bibtex] We propose an interactive system that aims at lifting a 2D sketch into a 3D sketch with the help of existing models in shape collections. The key idea is to exploit part structure for shape retrieval and sketch reconstruction. We adopt sketch-based shape retrieval and develop a novel matching algorithm which considers structure in addition to traditional shape features.
	6. Zeinab Sadeghipour, Zicheng Liao, Ping Tan, and Hao Zhang, "Learning 3D Scene Synthesis from Annotated RGB-D Images", Computer Graphics Forum (Special Issue of SGP), Vol. 35, No. 5, pp. 197-206, 2016. [PDF \| bibtex] We present a data-driven method for synthesizing 3D indoor scenes by inserting objects progressively into an initial, possibly, empty scene. Instead of relying on few hundreds of hand-crafted 3D scenes, we take advantage of existing large-scale annotated RGB-D datasets, in particular, the SUN RGB-D database consisting of 10,000+ depth images of real scenes, to form the prior knowledge for our synthesis task. Our object insertion scheme follows a co-occurrence model and an arrangement model, both learned from the SUN dataset.
	5. Ruizhen Hu, Oliver van Kaick, Bojian Wu, Hui Huang, Ariel Shamir, and Hao Zhang, "Learning How Objects Function via Co-Analysis of Interactions", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 35, No. 4, Article 47, 2016. [PDF \| Projet page \| bibtex] We introduce a co-analysis method which learns a functionality model for an object category, e.g., strollers or backpacks. Like previous works on functionality, we analyze object-to-object interactions and intra-object properties and relations. Differently from previous works, our model goes beyond providing a functionalityoriented descriptor for a single object; it prototypes the functionality of a category of 3D objects by co-analyzing typical interactions involving objects from the category.
	4. Changqing Zou, Junjie Cao, Warunika Ranaweera, Ibraheem Alhashim, Ping Tan, Alla Sheffer, and Hao Zhang, "Legible Compact Calligrams", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 35, No. 4, Article 122, 2016. [PDF \| bibtex] A calligram is an arrangement of words or letters that creates a visual image, and a compact calligram fits one word into a 2D shape. We introduce a fully automatic method for the generation of legible compact calligrams which provides a balance between conveying the input shape, legibility, and aesthetics.
	3. Haisen Zhao, Fanglin Gu, Qi-Xing Huang, Jorge Garcia, Yong Chen, Changhe Tu, Bedrich Benes, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen, "Connected Fermat Spirals for Layered Fabrication", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 35, No. 4, Article 100, 2016. [PDF \| Project page \| bibtex] We develop a new kind of "space-filling" curves, connected Fermat spirals, and show their compelling properties as a tool path fill pattern for layered fabrication. Unlike classical space-filling curves such as the Peano or Hilbert curves, which constantly wind and bind to preserve locality, connected Fermat spirals are formed mostly by long, low-curvature paths. This geometric property, along with continuity, influences the quality and efficiency of layered fabrication.
	2. Lili Wan, Changqing Zou, and Hao Zhang, "Full and Partial Shape Similarity through Sparse Descriptor Reconstruction", The Visual Computer, to appear, 2016. [PDF \| bibtex] We introduce a novel approach to measure similarity between two 3D shapes based on sparse reconstruction of shape descriptors. The main feature of our approach is its applicability to handle incomplete shapes. We characterize the shapes by learning a sparse dictionary from their local descriptors. The similarity between two shapes A and B is defined by the error incurred when reconstructing B's descriptor set using the basis signals from A’s dictionary.
	1. Daniel Cohen-Or and Hao Zhang, "From inspired modeling to creative modeling", Visual Computer (invited paper), Vol. 32, No. 1, 2016. [PDF \| bibtex] An intriguing and reoccurring question in many branches of computer science is whether machines can be creative, like humans. In this exploratory paper, we examine the problem from a computer graphics, and more specifically, geometric modeling, perspective. We focus our discussions on the weaker but still intriguing question: "Can machines assist or inspire humans in a creative endeavor for the generation of geometric forms?"

2015

	11. Ibraheem Alhashim, Kai Xu, Yixin Zhuang, Junjie Cao, Patricio Simari, and Hao Zhang, "Deformation-Driven Topology-Varying 3D Shape Correspondence", ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 34, No. 6, Article 236, 2015. [PDF \| Project page \| bibtex] We present a deformation-driven approach to topology-varying 3D shape correspondence. In this paradigm, the best correspondence between two shapes is the one that results in a minimal-energy, possibly topology-varying, deformation that transforms one shape to conform to the other while respecting the correspondence. Our deformation model allows both geometric and topological operations such as part split, duplication, and merging ...
	10. Xuelin Chen, Hao Zhang, Jinjie Lin, Ruizhen Hu, Lin Lu, Qixing Huang, Bedrich Benes, Daniel Cohen-Or, and Baoquan Chen, "Dapper: Decompose-and-Pack for 3D Printing", ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 34, No. 6, Article 213, 2015. [PDF \| Project page \| bibtex] We pose the decompose-and-pack or DAP problem, which tightly combines shape decomposition and packing. While in general, DAP seeks to decompose an input shape into a small number of parts which can be efficiently packed, our focus is geared towards 3D printing. The goal is to optimally decompose-and-pack a 3D object into a printing volume to minimize support material, build time, and assembly cost. We present Dapper, a global optimization algo- rithm for the DAP problem which can be applied to both powder- and FDM-based 3D printing.
	9. Yang Zhou, Kangxue Yin, Hui Huang, Hao Zhang, Minglun Gong, and Daniel Cohen-Or, "Generalized Cylinder Decomposition", ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 34, No. 6, Article 171, 2015. [PDF \| Project page \| bibtex] Decomposing a complex shape into geometrically simple primitives is a fundamental problem in geometry processing. We are interested in a shape decomposition problem where the simple primitives sought are generalized cylinders. We introduce a quantitative measure of cylindricity for a shape part and develop a cylindricity-driven optimization algorithm, with a global objective function, for generalized cylinder decomposition.
	8. Ruizhen Hu, Chenyang Zhu, Oliver van Kaick, Ligang Liu, Ariel Shamir, and Hao Zhang, "Interaction Context (ICON): Towards a Geometric Functionality Descriptor", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 34, No. 4, Article 83, 2015. [PDF \| Project page \| bibtex] We introduce a contextual descriptor which aims to provide a geometric description of the functionality of a 3D object in the context of a given scene. Differently from previous works, we do not regard functionality as an abstract label or represent it implicitly through an agent. Our descriptor, called interaction context or ICON for short, explicitly represents the geometry of object-to-object interactions. Our approach to object functionality analysis is based on the key premise that functionality should mainly be derived from interactions between objects and not objects in isolation.
	7. Honghua Li, Ruizhen Hu (co-first author), Ibraheem Alhashim, and Hao Zhang, "Foldabilizing Furniture", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 34, No. 4, Article 90, 2015. [PDF \| bibtex] We introduce the foldabilization problem for space-saving furniture design. Namely, given a 3D object representing a piece of furniture, the goal is to apply a minimum amount of modification to the object so that it can be folded to save space —-- the object is thus foldabilized. We focus on one instance of the problem where folding is with respect to a prescribed folding direction and allowed object modifications include hinge insertion and part shrinking. We develop an automatic algorithm for foldabilization by formulating and solving a nested optimization problem ...
	6. Lili Wan, Jingyu Jiang, and Hao Zhang, "Incomplete 3D Shape Retrieval via Sparse Dictionary Learning", Pacific Graphics (short paper), 2015. [PDF \| bibtex] In this paper, we are interested in the problem of 3D shape retrieval where the query shape is incomplete with moderate to significant portions of the original shape missing. The key idea of our method is to grasp the basis local descriptors for each shape in the retrieved database by sparse dictionary learning and apply them in sparsely coding the local descriptors of an incomplete query
	5. Honghua Li and Hao Zhang, "Shape Compaction", in Perspectives in Shape Analysis, Dagstuhl Seminar, editors: M. Breu, A. Bruckstein, P. Maragos, and S. Wuhrer, to appear, 2015. [PDF \| bibtex] We cover techniques designed for compaction of shape representations or shape configurations. The goal of compaction is to reduce storage space, a fundamental problem in many application domains. Compaction of shape representations focuses on reducing the memory space allocated for storing the shape geometry data digitally, whilst shape compaction techniques in the physical domain reduce the physical space occupied by shape configurations ...
	4. Daniel Cohen-Or, Chen Greif, Tao Ju, Niloy J. Mitra, Ariel Shamir, Olga Sorkine-Hornung, and Hao Zhang, A Sampler of Useful Computational Tools for Applied Geometry, Computer Graphics, and Image Processing, CRC Press, 2015. A Sampler of Useful Computational Tools for Applied Geometry, Computer Graphics, and Image Processing shows how to use a collection of mathematical techniques to solve important problems in applied mathematics and computer science areas. The book discusses fundamental tools in analytical geometry ...
	3. Qian Zheng, Zhuming Hao, Hui Huang, Kai Xu, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen, "Skeleton-Intrinsic Symmetrization of Shapes", Computer Graphics Forum (Special Issue of Eurographics), Vol. 34, No. 2, pp. 275-286, 2015. [PDF \| bibtex] Enhancing the self-symmetry of a shape is of fundamental aesthetic virtue. In this paper, we are interested in recov- ering the aesthetics of intrinsic reflection symmetries, where an asymmetric shape is symmetrized while keeping its general pose and perceived dynamics. The key challenge to intrinsic symmetrization is that the input shape has only approximate reflection symmetries, possibly far from perfect. The main premise of our work is that curve skeletons provide a concise and effective shape abstraction for analyzing approximate intrinsic symmetries as well as symmetrization. By measuring intrinsic distances over a curve skeleton for symmetry analysis, symmetrizing the skeleton, and then propagating the symmetrization from skeleton to shape, our approach to shape symmetrization is skeleton-intrinsic ...
	2. Hadar Averbuch-Elor, Yunhai Wang, Yiming Qian, Minglun Gong, Johannes Kopf, Hao Zhang, and Daniel Cohen-Or, "Distilled Collections from Textual Image Queries", Computer Graphics Forum (Special Issue of Eurographics), Vol. 34, No. 2, pp. 131-142, 2015. [PDF \| bibtex] We present a distillation algorithm which operates on a large, unstructured, and noisy collection of internet images returned from an online object query. We introduce the notion of a distilled set, which is a clean, coherent, and structured subset of inlier images. In addition, the object of interest is properly segmented out throughout the distilled set. Our approach is unsupervised, built on a novel clustering scheme, and solves the distillation and object segmentation problems simultaneously. In essence, instead of distilling the collection of images, we distill a collection of loosely cutout foreground “shapes”, which may or may not contain the queried object. Our key observation, which motivated our clustering scheme, is that outlier shapes are expected to be random in nature, whereas, inlier shapes, which do tightly enclose the object of interest, tend to be well supported by similar shapes captured in similar views ...
	1. Zhenbao Liu, Caili Xie, Shuhui Bu, Xiao Wang, and Hao Zhang, "Indirect Shape Analysis for 3D Shape Retrieval", Computer & Graphics (Special Issue of SMI 2014), Vol. 46, pp. 110-116, 2015. [PDF \| bibtex] We introduce indirect shape analysis, or ISA, where a given shape is analyzed not based on geometric or topological features computed directly from the shape itself, but by studying how external agents interact with the shape. The potential benefits of ISA are two-fold. First, agent-object interactions often reveal an object’s function, which plays a key role in shape understanding. Second, compared to direct shape analysis, ISA, which utilizes pre-selected agents, is less affected by imperfections of, or inconsistencies between, the geometry or topology of the analyzed shapes. We employ digital human models as the external agents and develop a prototype ISA scheme for 3D shape classification and retrieval ...

2014

	6. Ruizhen Hu, Honghua Li, Hao Zhang, and Daniel Cohen-Or "Approximate Pyramidal Shape Decomposition", ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 33, No. 6, Article 213, 2014. [Project page \| PDF \| bibtex] A shape is pyramidal if it has a flat base with the remaining boundary forming a height function over the base. Pyramidal shapes are optimal for molding, casting, and layered 3D printing. We introduce an algorithm for approximate pyramidal shape decomposition. The general exact pyramidal decomposition problem is NP-hard. We turn this problem into an NP-complete Exact Cover Problem which admits a practical solution ... Our solution is equally applicable to 2D or 3D shapes, to shapes with polygonal or smooth boundaries, with or without holes ...
	5. Kangxue Yin, Hui Huang, Hao Zhang, Minglun Gong, Daniel Cohen-Or, and Baoquan Chen, "Morfit: Interactive Surface Reconstruction from Incomplete Point Clouds with Curve-Driven Topology and Geometry Control", ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 33, No. 6, Article 202, 2014. [Project page \| PDF (lowres 2MB) \| Code \| bibtex] We present an interactive technique for surface reconstruction from incomplete and sparse scans of 3D objects possessing sharp features ... We factor 3D editing by the user into two "orthogonal" interactions acting on skeletal and profile curves of the underlying shape, controlling its topology and geometric features, respectively. For surface completion, we introduce a novel skeleton-driven morph-to-fit, or morfit, scheme which reconstructs the shape as an ensemble of generalized cylinders. Morfit is a hybrid operator which optimally interpolates between adjacent curve profiles (the "morph") and snaps the surface to input points (the "fit") ...
	4. Ibraheem Alhashim, Honghua Li, Kai Xu, Junjie Cao, Rui Ma, and Hao Zhang, "Topology-Varying 3D Shape Creation via Structural Blending", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 33, No. 4, Article 158, 2014. [Project page \| Code \| bibtex] We introduce an algorithm for generating novel 3D models via topology-varying shape blending. Given a source and a target shape, our method blends them topologically and geometrically, producing continuous series of in-betweens as new shape creations. The blending operations are defined on a spatio-structural graph composed of medial curves and sheets. Such a shape abstraction is structure-oriented, part-aware, and facilitates topology manipulations. Fundamental topological operations including split and merge are realized by allowing one-to-many correspondences between the source and the target ...
	3. Kai Xu, Rui Ma, Hao Zhang, Chenyang Zhu, Ariel Shamir, Daniel Cohen-Or, and Hui Huang, "Organizing Heterogeneous Scene Collections through Contextual Focal Points", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 33, No. 4, Article 35, 2014. [Project page \| bibtex] We introduce focal points for characterizing, comparing, and organizing collections of complex and heterogeneous data and apply the concepts and algorithms developed to collections of 3D indoor scenes. We represent each scene by a graph of its constituent objects and define focal points as representative substructures in a scene collection. To organize a heterogenous scene collection, we cluster the scenes based on a set of extracted focal points: scenes in a cluster are closely connected when viewed from the perspective of the representative focal points of that cluster ... The problem of focal point extraction is intermixed with the problem of clustering groups of scenes based on their representative focal points. We present a co-analysis algorithm ...
	2. Xiaowu Chen, Dongqing Zou, Jianwei Li, Xiaochun Cao, Qinping Zhao, and Hao Zhang, "Sparse Dictionary Learning for Edit Propagation of High-resolution Images", CVPR, pp. 2854-2861, 2014. [PDF \| bibtex] We introduce the use of sparse representation for edit propagation of high-resolution images or video. Previous approaches for edit propagation typically employ a global optimization over the whole set of image pixels, incurring a prohibitively high memory and time consumption for high-resolution images. Rather than propagating an edit pixel by pixel, we follow the principle of sparse representation to obtain a compact set of representative samples (or features) and perform edit propagation on the samples instead ...
	1. Hui Wang, Patricio Simari, Zhixun Su, and Hao Zhang, "Spectral Global Intrinsic Symmetry Invariant Functions", Proc. of Graphics Interface, pp. 209-215, 2014. [Project page \| bibtex] We introduce spectral Global Intrinsic Symmetry Invariant Functions (GISIFs), a class of GISIFs obtained via eigendecomposition of the Laplace-Beltrami operator on compact Riemannian manifolds. We discretize the spectral GISIFs for 2D manifolds approximated either by triangle meshes or point clouds. In contrast to GISIFs obtained from geodesic distances, our spectral GISIFs are robust to local topological changes. Additionally, for symmetry analysis our spectral GISIFs can be viewed as generalizations of the classical Heat (HKSs) and Wave Kernel Signatures (WKSs), and, as such, represent a more expressive and versatile class of functions ...

2013

	11. Yunhai Wang, Minglun Gong, Tianhua Wang, Daniel Cohen-Or Hao Zhang, and Baoquan Chen, "Projective Analysis for 3D Shape Segmentation", ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 32, No. 6, Article 192, 2013. [PDF \| bibtex] We introduce projective analysis for semantic segmentation and labeling of 3D shapes. The analysis treats an input 3D shape as a collection of 2D projections, label each projection by transferring knowledge from existing labeled images, and back-projects and fuses the labelings on the 3D shape ... Projective analysis simplifies the processing task by working in a lower-dimensional space, circumvents the requirement of having complete and well-modeled 3D shapes, and addresses the data challenge for 3D shape analysis by leveraging the massive image data.
	10. Zhenbao Liu, Sicong Tang, Shuhui Bu, and Hao Zhang, "New Evaluation Metrics for Mesh Segmentation", Computer & Graphics (Special Issue of SMI), Vol. 37, No. 6, pp. 553-564, 2013. [PDF \| bibtex] The four metrics adopted by the well-known Princeton Segmentation Benchmark have been extensively applied to evaluate mesh segmentation algorithms. However, comparison to only a single ground-truth is problematic since one object may have multiple semantic segmentations. We propose two novel metrics to support comparison with multiple ground-truth mesh segmentations, which we call Similarity Hamming Distance (SHD) and Adaptive Entropy Increment (AEI) ...
	9. Hao Zhang, Kai Xu, Wei Jiang, Jinjie Lin, Daniel Cohen-Or, and Baoquan Chen, "Layered Analysis of Irregular Facades via Symmetry Maximization", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 32, No. 4, pp. 121:1-121:10, 2013. [PDF \| Project page \| bibtex] We present an algorithm for hierarchical and layered analysis of irregular facades, seeking a high-level understanding of facade structures. By introducing layering into the analysis, we no longer view a facade as a flat structure, but allow it to be structurally separated into depth layers, enabling more compact and natural interpretations of building facades. Computationally, we perform a symmetry-driven search for an optimal hierarchical decomposition defined by split and layering operations applied to an input facade. The objective is symmetry maximization ...
	8. Oliver van Kaick, Kai Xu, Hao Zhang, Yanzhen Wang, Shuyang Sun, Ariel Shamir, and Daniel Cohen-Or, "Co-Hierarchical Analysis of Shape Structures", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 32, No. 4, Article 69, 2013. [PDF \| bibtex] We introduce an unsupervised co-hierarchical analysis of a set of shapes, aimed at discovering their hierarchical part structures and revealing relations between geometrically dissimilar yet functionally equivalent shape parts across the set. The core problem is that of representative co-selection. For each shape in the set, one representative hierarchy (tree) is selected from among many possible interpretations of the hierarchical structure of the shape. Collectively, the selected tree representatives maximize the within-cluster structural similarity among them.
	7. Shi-Sheng Huang, Ariel Shamir, Chao-Hui Shen, Hao Zhang, Alla Sheffer, Shi-Min Hu, and Daniel Cohen-Or, "Qualitative Organization of Collections of Shapes via Quartet Analysis", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 32, No. 4, pp. 71:1-10, 2013. [Project page \| bibtex] We present a method for organizing a heterogeneous collection of 3D shapes for overview and exploration. Instead of relying on quantitative distances, which may become unreliable between dissimilar shapes, we introduce a qualitative analysis which utilizes multiple distance measures but only in cases where the measures can be reliably compared. Our analysis is based on the notion of quartets, each defined by two pairs of shapes, where the shapes in each pair are close to each other, but far apart from the shapes in the other pair.
	6. Hui Huang, Shihao Wu, Daniel Cohen-Or, Minglun Gong, Hao Zhang, Guiqing Li, and Baoquan Chen, "L1-Medial Skeleton of Point Cloud", ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 32, No. 4, Article 65, 2013. SIGGRAPH 2025 Test-of-Time Award! [PDF \| Project page \| bibtex] We introduce L1-medial skeleton as a curve skeleton representation for 3D point cloud data. The L1-median is well-known as a robust global center of an arbitrary set of points. We make the key observation that adapting L1-medians locally to a point set representing a 3D shape gives rise to a one-dimensional structure, which can be seen as a localized center of the shape ...
	5. Niloy Mitra, Michael Wand, Hao Zhang, Daniel Cohen-Or, and Martin Bokeloh, "Structure-Aware Shape Processing," Eurographics State-of-the-Art Report (STAR), 2013. [PDF \| bibtex] In this survey paper, we organize, summarize, and present the key concepts and methodological approaches towards efficient structure-aware shape processing. We discuss common models of structure, their implementation in terms of mathematical formalism and algorithms, and explain the key principles in the context of a number of state-of- the-art approaches. Further, we attempt to list the key open problems and challenges, both at the technical and at the conceptual level, to make it easier for new researchers to better explore and contribute to this topic.
	4. Wei Jiang, Kai Xu, Zhiquan Cheng, and Hao Zhang, "Skeleton-Based Intrinsic Symmetry Detection on Point Clouds," Graphical Models, Vol. 75, No. 4, pp. 177-188, 2013. [PDF \| bibtex] We present a skeleton-based algorithm for intrinsic symmetry detection on imperfect 3D point cloud data. The data imperfections such as noise and incompleteness make it difficult to reliably compute geodesic distances, We introduce L1-medial skeleton as a curve skeleton representation for 3D point cloud data. The L1-median is well-known as a robust global center of an arbitrary set of points. We make the key observation that adapting L1-medians locally to a point set representing a 3D shape gives rise to a one-dimensional structure, which can be seen as a localized center of the shape ...
	3. Oliver van Kaick, Hao Zhang, and Ghassan Hamarneh, "Bilateral Maps for Partial Matching" Computer Graphics Forum, Vol. 32, No. 6, pp. 189-200, 2013. [PDF \| bibtex] We introduce bilateral map, a local shape descriptor whose region of interest is defined by two feature points. Compared to the classical descriptor definition using single points, the bilateral approach exploits the use of a second point to place more constraints on the selection of the spatial context for feature analysis. This leads to a descriptor where the shape of the region of interest is anisotropic and adapts to the context of the two points, making it more refined for shape analysis, in particular, partial matching.
	2. Honghua Li, Hao Zhang, Yanzhen Wang, Junjie Cao, Ariel Shamir, and Daniel Cohen-Or, "Curve Style Analysis in a Set of Shapes," Computer Graphics Forum, Vol. 32, No. 6, pp. 77-88, 2013. [PDF \| bibtex] We pose the open question "how to extract styles from geometric shapes?" and address one instance of the problem. Speciﬁcally, we present an unsupervised algorithm for identifying curve styles in a set of shapes ...
	1. Hui Huang, Shihao Wu, Minglun Gong, Daniel Cohen-Or, Uri Ascher, and Hao Zhang, "Edge-Aware Point Set Resampling," ACM Trans. on Graphics (presented at SIGGRAPH 2013), Volume 32, Number 1, Article 9, 2013. [Project page with source code \| bibtex] We propose a resampling approach to process a noisy and possibly outlier-ridden point set in an edge-aware manner. Our key idea is to first resample away from the edges so that reliable normals can be computed at the samples, and then based on reliable data, we progressively resample the point set while approaching the edge singularities ...

2012

	10. Kai Xu, Hao Zhang, Wei Jiang, Ramsay Dyer, Zhiquan Cheng, Ligang Liu, and Baoquan Chen, "Multi-Scale Partial Intrinsic Symmetry Detection," ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 31, No. 6, Article 181, 2012. [PDF \| Project page (with data) \| bibtex] We present an algorithm for multi-scale partial intrinsic symmetry detection over 2D and 3D shapes, where the scale of a symmetric region is defined by intrinsic distances between symmetric points over the region. To identify prominent symmetric regions which overlap and vary in form and scale, we decouple scale extraction and symmetry extraction by performing two levels of clustering. First, significant symmetry scales are identified by clustering sample point pairs from an input shape ...
	9. Honghua Li, Ibraheem Alhashim, Hao Zhang, Ariel Shamir, and Daniel Cohen-Or, "Stackabilization," ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 31, No. 6, Article 158, 2012. [PDF \| Code \| bibtex] We introduce the geometric problem of stackabilization: how to geometrically modify a 3D object so that it is more amenable to stacking. Given a 3D object and a stacking direction, we define a measure of stackability, which is derived from the gap between the lower and upper envelopes of the object in a stacking configuration along the stacking direction. The main challenge in stackabilization lies in the desire to modify the object's geometry only subtly so that the intended functionality and aesthetic appearance of the original object are not significantly affected ...
	8. Hui Huang, Minglun Gong, Daniel Cohen-Or, Yaobin Ouyang, Fuwen Tao, and Hao Zhang, "Field-Guided Registration for Feature-Conforming Shape Composition," ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 31, No. 6, Article 179, 2012. [PDF \| bibtex] We present an automatic shape composition method to fuse two shape parts which may not overlap and possibly contain sharp features, a scenario often encountered when modeling man-made objects. At the core of our method is a novel field-guided approach to automatically align two input parts in a feature-conforming manner. The key to our field-guided shape registration is a natural continuation of one part into the ambient field as a means to introduce an overlap with the distant part, which then allows a surface-to-field registration ...
	7. Yunhai Wang, Shmulik Asafi, Oliver van Kaick, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen, "Active Co-Analysis of a Set of Shapes," ACM Trans. on Graphics (Special Issue of SIGGRAPH Asia), Vol. 31, No. 6, Article 165, 2012. [PDF \| Project page \| The Shape COSEG Dataset \| bibtex] We consider the use of a semi-supervised learning method where the user actively assists in the co-analysis by iteratively providing input that progressively constrains the system. We introduce a novel constrained clustering method based on a spring system which embeds elements to better respect their inter-distances in feature space together with the user given set of constraints. We also present an active learning method that suggests to the user where his input is likely to be the most effective in refining the results.
	6. Nima Aghdaii, Hamid Younesy, and Hao Zhang, "5-6-7 Meshes: Remeshing and Analysis" Computer & Graphics, extended version of GI'12 paper, Vol. 36, No. 8, pp. 1072-1083, 2012. [PDF \| bibtex] We introduce a new type of meshes called 5-6-7 meshes, analyze their properties, and present a 5-6-7 remeshing algorithm. A 5-6-7 mesh is a closed triangle mesh where each vertex has valence 5, 6, or 7. We prove that it is always possible to convert an arbitrary mesh into a 5-6-7 mesh. We present a remeshing algorithm which converts a closed triangle mesh with arbitrary genus into a 5-6-7 mesh which a) closely approximates the original mesh geometrically, e.g., in terms of feature preservation, and b) has a comparable vertex count as the original mesh.
	5. Andrea Tagliassachi, Ibraheem Alhashim, Matt Olson, and Hao Zhang, "Mean Curvature Skeletons," Computer Graphics Forum (Special Issue of SGP), Volume 31, Number 5, pp. 1735-1744, 2012. [PDF \| bibtex] We formulate the skeletonization problem via mean curvature flow (MCF). While the classical application of MCF is surface fairing, we take advantage of its area-minimizing characteristic to drive the curvature flow towards the extreme so as to collapse the input mesh geometry and obtain a skeletal structure. By analyzing the differential characteristics of the flow, we reveal that MCF locally increases shape anisotropy. This justifies the use of curvature motion for skeleton computation, and leads to the generation of what we call "mean curvature skeletons" ...
	4. Kai Xu, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen "Fit and Diverse: Set Evolution for Inspiring 3D Shape Galleries," ACM Trans. on Graphics (Special Issue of SIGGRAPH), Vol. 31, No. 4, pp. 57:1-57:10, 2012. [PDF (15 MB) \| bibtex] We introduce set evolution as a means for creative 3D shape modeling, where an initial population of 3D models is evolved to produce generations of novel shapes. Part of the evolving set is presented to a user as a shape gallery to offer modeling suggestions. User preferences define the fitness for the evolution so that over time, the shape population will mainly consist of individuals with good fitness. However, to inspire the user's creativity, we must also keep the evolving set diverse. Hence the evolution is ``fit and diverse'' ...
	3. Nima Aghdaii, Hamid Younesy, and Hao Zhang, "5-6-7 Meshes," Proc. of Graphics Interface, pp. 27-34, 2012. [PDF \| bibtex] A 5-6-7 mesh is a closed triangle mesh where each vertex has valence 5, 6, or 7. An intriguing question is whether it is always possible to convert an arbitrary mesh into a 5-6-7 mesh. In this paper, we answer the question in the positive. We present a 5-6-7 remeshing algorithm which converts any closed triangle mesh with arbitrary genus into a 5-6-7 mesh which a) closely approximates the original mesh geometrically, e.g., in terms of feature preservation, and b) has a comparable vertex count as the original mesh.
	2. Hui Wang, Zhixun Su, Jinjie Cao, Ye Wang, and Hao Zhang, "Empirical Mode Decomposition on Surfaces," Graphical Models (Special Issue of GMP), Vol. 74, No. 4, pp. 173-183, 2012. [bibtex] Empirical Mode Decomposition (EMD) is a powerful tool for the analysis of non-stationary and nonlinear signals, and has drawn a great deal of attention in various areas. In this paper, we generalize the classical EMD from Euclidean space to surfaces represented as triangular meshes. Inspired by the EMD, we also make a first step in using the extremal envelope method for feature-preserving smoothing.
	1. Ibraheem Alhashim, Hao Zhang, and Ligang Liu, "Detail-Replicating Shape Stretching," the Visual Computer, Vol. 28, No. 12, pp. 1153-1166, 2012. [PDF \| Video \| Code \| bibtex] We propose a simple and efficient method that helps create model variations by applying non-uniform stretching on 3D models with organic geometric details. The method replicates the geometric details and synthesizes extensions by adopting texture synthesis techniques on surface details.

2011

	9. Oana Sidi, Oliver van Kaick, Yanir Kleiman, Hao Zhang, and Daniel Cohen-Or, "Unsupervised Co-Segmentation of a Set of Shapes via Descriptor-Space Spectral Clustering," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2011), Volume 30, Number 6, Article 126, 2011. [PDF (11 MB) \| bibtex] We introduce an algorithm for unsupervised co-segmentation of a set of shapes so as to reveal the semantic shape parts and establish their correspondence across the set. Our algorithm exploits a key enabling feature of the input set, namely, dissimilar parts may be ``linked'' through third-parties present in the set ...
	8. Jinjie Lin, Daniel Cohen-Or, Hao Zhang, Cheng Liang, Andrei Sharf, Oliver Deussen, and Baoquan Chen, "Structure-Preserving Retargeting of Irregular 3D Architecture," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2011), Volume 30, Number 6, Article 183, 2011. [PDF \| Highres PDF (29MB) \| bibtex] We present an algorithm for interactive structure-preserving retargeting of irregular 3D architecture models, offering the modeler an easy-to-use tool to quickly generate a variety of 3D models that resemble an input piece in its structural style ...
	7. Andrea Tagliassachi, Matt Olson, Hao Zhang, Ghassan Hamarneh, and Daniel Cohen-Or, "VASE: Volume-Aware Surface Evolution for Surface Reconstruction from Incomplete Point Clouds," Computer Graphics Forum (Special Issue of SGP), Volume 30, Number 5, pp. 1563-1571, 2011. [PDF \| bibtex] Objects with many concavities are difficult to acquire using laser scanners. The resulting point scan typically suffers from large amounts of missing data. We introduce weak volumetric priors which assume that the volume of a shape varies smoothly and that each point cloud sample is visible from outside the shape. Specifically, the union of view-rays given by the scanner implicitly carves the exterior volume, while volumetric smoothness regularizes the internal volume.
	6. Kai Xu, Hanlin Zheng, Hao Zhang, Daniel Cohen-Or, Ligang Liu, and Yueshan Xiong, "Photo-Inspired Model-Driven 3D Object Modeling," ACM Trans. on Graphics (Proceedings of SIGGRAPH 2011), Volume 30, Number 4, pp. 80:1-80:10, 2011. [PDF \| bibtex] We introduce an algorithm for 3D object modeling where the user draws creative inspiration from an object captured in a single photograph. Our method leverages the rich source of photographs for creative 3D modeling. However, with only a photo as a guide, creating a 3D model from scratch is a daunting task. We support the modeling process by utilizing an available set of 3D candidate models. Specifically, the user creates a digital 3D model as a geometric variation from a 3D candidate.
	5. Matt Olson, Ramsay Dyer, Hao Zhang, and Alla Sheffer, "Point Set Silhouettes via Local Reconstruction," Computer & Graphics (Special Issue of SMI 2011), Volume 35, Number 3, pp. 500-509, 2011. [PDF (4MB) \| bibtex] We present an algorithm to compute the silhouette set of a point cloud. Previous methods extract point set silhouettes by thresholding point normals, which can lead to simultaneous over- and under-detection of silhouettes. We argue that additional information such as surface curvature is necessary to resolve these issues. To this end, we develop a local reconstruction scheme using Gabriel and intrinsic Delaunay criteria and defi?ne point set silhouettes based on the notion of a silhouette generating set ...
	4. Yanzhen Wang, Kai Xu, Jun Li, Hao Zhang, Ariel Shamir, Ligang Liu, Zhiquan Cheng, and Yueshan Xiong, "Symmetry Hierarchy of Man-Made Objects," Computer Graphics Forum (Special Issue of Eurographics 2011), Volume 30, Number 2, pp. 287-296, 2011. [Project page \| PDF (14MB) \| PDF reduced (500K) \| bibtex] We introduce symmetry hierarchy of man-made objects, a high-level structural representation of a 3D model providing a symmetry-induced, hierarchical organization of the model's constituent parts. We show that symmetry hierarchy naturally implies a hierarchical segmentation that is more meaningful than those produced by local geometric considerations. We also develop an application of symmetry hierarchies for structural shape editing.
	3. Oliver van Kaick, Andrea Tagliasacchi, Oana Sidi, Hao Zhang, Daniel Cohen-Or, Lior Wolf, and Ghassan Hamarneh, "Prior Knowledge for Part Correspondence," Computer Graphics Forum (Special Issue of Eurographics 2011), Volume 30, Number 2, pp. 553-562, 2011. [PDF (10 MB) \| PDF reduced \| bibtex] We stipulate that under challenging scenarios, shape correspondence by humans involves recognition of the shape parts where prior knowledge on the parts would play a more dominant role than geometric similarity. We introduce an approach to part correspondence which incorporates prior knowledge and combines the knowledge with content-driven analysis based on geometric similarity between the matched shapes ...
	2. Oliver van Kaick, Hao Zhang, Ghassan Hamarneh, Daniel Cohen-Or, "A Survey on Shape Correspondence," Computer Graphics Forum (extended version of Eurographics STAR), Volume 30, Number 6, pp. 1681-1707, 2011. [PDF \| bibtex] We review methods that are designed to compute correspondences between geometric shapes represented by triangle meshes, contours, or point sets. This survey is motivated in part by some recent developments in space-time registration, where one seeks to correspond non-rigid and time-varying surfaces, and semantic shape analysis, which underlines a recent trend to incorporate shape understanding into the analysis pipeline ...
	1. Joe Kahlert, Matt Olson, and Hao Zhang, "Width-Bounded Geodesic Strips for Surface Tiling," The Visual Computer, Vol. 27, No. 1, pp. 45-56, 2011. [PDF \| bibtex] We present an algorithm for computing families of geodesic curves over an open mesh patch to partition the patch into strip-like segments. Specifically, the segments can be well approximated using strips obtained by trimming long, rectangular pieces of material possessing a prescribed width. We call this width-bounded geodesic strip tiling of a curved surface, a problem with practical applications such as the surfacing of curved roofs.

2010

	10. Kai Xu, Honghua Li, Hao Zhang, Daniel Cohen-Or, Yueshan Xiong, and Zhiquan Cheng, "Style-Content Separation by Anisotropic Part Scales," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2010), Volume 29, Number 6, pp. 184:1-184:10, 2010. [PDF (10MB) \| Project page \| bibtex] We perform co-analysis of a set of man-made 3D objects to allow the creation of novel instances derived from the set. We analyze the objects at the part level and treat the anisotropic part scales as a shape style. The co-analysis then allows style transfer to synthesize new objects. The key to co-analysis is part correspondence, where a major challenge is the handling of large style variations and diverse geometric content in the shape set. We propose style-content separation as a means to address this challenge ...
	9. Shy Shalom, Ariel Shamir, Hao Zhang, and Daniel Cohen-Or, "Cone Carving for Surface Reconstruction," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2010), Volume 29, Number 6, Article 150, 2010. [PDF \| bibtex] We present cone carving, a novel space carving technique towards topologically correct surface reconstruction from an incomplete scanned point cloud. The technique utilizes the point samples not only for local surface position estimation but also to obtain global visibility information under the assumption that each acquired point is visible from a point laying outside the shape. This enables associating each point with a generalized cone, called the visibility cone, that carves a portion of the outside ambient space of the shape from the inside out.
	8. Yotam Livny, Feilong Yan, Matt Olson, Baoquan Chen, Hao Zhang, and Jihad El-Sana, "Automatic Reconstruction of Tree Skeletal Structures from Point Clouds," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2010), Volume 29, Number 6, Article 151, 2010. [PDF (20MB) \| PDF reduced (64K) \| bibtex] In this paper, we perform active laser scanning of real world vegetation and present an automatic approach that robustly reconstructs skeletal structures of trees, from which full geometry can be generated. The core of our method is a series of {\it global optimizations} that fit skeletal structures to the often sparse, incomplete, and noisy point data. A significant benefit of our approach is its ability to reconstruct multiple overlapping trees simultaneously without segmentation.
	7. Liangliang Nan, Andrei Sharf, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen, "SmartBoxes for Interactive Urban Reconstruction," ACM Trans. on Graphics (Proceeding of SIGGRAPH 2010), Volume 29, Number 4, Article 93, 2010. [PDF \| Highres PDF (17MB) \| bibtex] We introduce an interactive tool which enables a user to quickly assemble an architectural model directly over a 3D point cloud acquired from large-scale scanning of an urban scene. The user loosely defines and manipulates simple building blocks, which we call SmartBoxes, over the point samples. These boxes quickly snap to their proper locations to conform to common architectural structures. The key idea is that the building blocks are smart ...
	6. Lior Shapira, Shy Shalom, Ariel Shamir, Daniel Cohen-Or, and Hao Zhang, "Contextual Part Analogies in 3D Objects," International Journal of Computer Vision, Vol. 89, No. 1-2, pp. 309-326, 2010. [PDF \| bibtex] We address the problem of finding analogies between parts of 3D objects. By partitioning an object into meaningful parts and finding analogous parts in other objects, not necessarily of the same type, based on a contextual signature, many analysis and modeling tasks could be enhanced ...
	5. Junjie Cao, Andrea Tagliasacchi, Matt Olson, Hao Zhang, and Zhixun Su, "Point Cloud Skeletons via Laplacian-Based Contraction," Proc. of IEEE SMI, pp. 187-197, 2010. [PDF \| Project and code page \| bibtex] We present an algorithm for curve skeleton extraction via Laplacian-based contraction. Our algorithm can be applied to surfaces with boundaries, polygon soups, and point clouds. We develop a contraction operation that is designed to work on generalized discrete geometry data, particularly point clouds, via local Delaunay triangulation and topological thinning ...
	4. Hao Zhang, Oliver van Kaick, and Ramsay Dyer, "Spectral Mesh Processing," (revised and extended version of Eurographics 2007 STAR report) Computer Graphics Forum, Volume 29, Number 6, pp. 1865-1894, 2010. [PDF \| bibtex] We provide the first comprehensive survey on spectral mesh processing. Spectral methods for mesh processing and analysis rely on eigenvalues, eigenvectors, or eigenspace projections derived from appropriately defined mesh operators to carry out desired tasks ...
	3. Oliver van Kaick, Aaron Ward, Ghassan Hamarneh, Mark Schweitzer, and Hao Zhang, "Learning Fourier Descriptors for Computer-Aided Diagnosis of the Supraspinatus," Academic Radiology, Vol. 17, No. 8, pp. 1040-1049, 2010. [PDF \| bibtex] Supraspinatus muscle disorders are frequent and debilitating, resulting in pain and a limited range of shoulder motion. The gold standard for diagnosis involves an invasive surgical procedure ... we present a method to classify 3D shapes of the muscle into the relevant pathology groups, based on MRIs. The method learns the Fourier coefficients that best distinguish the different classes ...
	2. Oliver van Kaick, Hao Zhang, Ghassan Hamarneh, Daniel Cohen-Or, "A Survey on Shape Correspondence," Eurographics 2010 State-of-the-Art Report, TBA. [PDF \| bibtex] We present a review of the correspondence problem targeted towards the computer graphics audience. This survey is motivated by recent developments such as advances in the correspondence of non-rigid or isometric shapes and methods that extract semantic information from the shapes ...
	1. Qian Zheng, Andrei Sharf, Andrea Tagliasacchi, Baoquan Chen, Hao Zhang, Alla Sheffer, Daniel Cohen-Or, "Consensus Skeleton for Non-Rigid Space-Time Registration," Computer Graphics Forum (Proceeding of Eurographics 2010), Volume 29, Number 2, pp. 635-644, 2010. [PDF \| Slides \| bibtex] We introduce the notion of consensus skeletons for non-rigid space-time registration of a deforming shape. Instead of basing the registration on point features, which are local and sensitive to noise, we adopt the curve skeleton of the shape as a global and descriptive feature for the task. Our method uses no template and only assumes that the skeletal structure of the captured shape remains largely consistent over time ...

2009

	10. Kai Xu, Hao Zhang, Andrea Tagliasacchi, Ligang Liu, Guo Li, Min Meng, and Yueshan Xiong, "Partial Intrinsic Reflectional Symmetry of 3D Shapes," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2009), Article 138. [PDF (16 MB) \| PDF (reduced size: 7 MB) \| Project page \| bibtex] While many 3D objects around us exhibit various forms of global symmetries, prominent intrinsic symmetries which exist only on parts of an object are also well recognized ... In this paper, we introduce algorithms to extract and utilize partial intrinsic reflectional symmetries (PIRS) of a 3D shape ...
	9. Hui Huang, Dan Li, Hao Zhang, Uri Ascher, and Daniel Cohen-Or, "Consolidation of Unorganized Point Clouds for Surface Reconstruction," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2009), Article 176. [PDF (8 MB) \| PDF (reduced size: 2 MB) \| bibtex] We consolidate an unorganized point cloud with noise, outliers, non-uniformities, and interference between close-by surface sheets as a preprocess to surface generation ... First, we present a weighted locally optimal projection operator ... Next, we introduce an iterative framework for robust normal estimation, ...
	8. Kai Xu, Daniel Cohen-Or, Tao Ju, Ligang Liu, Hao Zhang, Shizhe Zhou, and Yueshan Xiong, "Feature-Aligned Shape Texturing," ACM Trans. on Graphics (Proceeding of SIGGRAPH Asia 2009), Article 108. [PDF (20 MB) \| PDF (reduced size: 10 MB) \| Project page \| Source Code \| bibtex] We explore the use of salient curves in synthesizing natural-looking, shape-revealing textures on surfaces. Our synthesis is guided by two principles: matching the direction of the texture patterns to those of the salient curves, and aligning the prominent feature lines in the texture to the salient curves exactly ...
	7. Ramsay Dyer, Hao Zhang, and Torsten Moeller, "Gabriel meshes and Delaunay edge flips," Proc. of SIAM/ACM Joint Conf. on Geometric and Physical Modeling (GPM), pp. 295-300, 2009. [PDF \| extended version with more proofs \| bibtex] We undertake a study of the local properties of 2-Gabriel meshes. We show that, under mild constraints on the dihedral angles, such meshes are Delaunay meshes. The analysis is done by means of the Delaunay edge flipping algorithm and it reveals the details of the distinction between these two mesh structures ...
	6. Andrea Tagliasacchi, Hao Zhang, and Daniel Cohen-Or, "Curve Skeleton Extraction from Incomplete Point Cloud," ACM Trans. on Graphics (Proceeding of SIGGRAPH 2009), Volume 28, Number 3, Article 71, 9 pages, DOI = 10.1145/1531326.1531377. [PDF \| Project page \| bibtex] We present an algorithm for curve skeleton extraction from imperfect point clouds where large portions of the data may be missing. Our construction is primarily based on a novel notion of generalized rotational symmetry axis (ROSA) of a point set with normals, via a variational formulation ...
	5. Kai Xu, Hao Zhang, Daniel Cohen-Or, and Yueshan Xiong, "Dynamic Harmonic Fields for Surface Processing," Computers and Graphics (Special Issue of SMI), Vol. 33, pp. 391-398, 2009. [PDF \| bibtex] We propose a method for fast updating of harmonic fields defined on polygonal meshes, enabling real-time insertion and deletion of constraints. Our approach utilizes the penalty method to enforce constraints in harmonic field computation. It maintains the symmetry of the Laplacian system ...
	4. Xiaoxing Li, Tao Jia, and Hao Zhang, "Expression-Insensitive 3D Face Recognition using Sparse Representation," CVPR, pp. 2575-2582, 2009. [PDF \| bibtex] We present a face recognition method based on sparse representation for recognizing 3D face meshes under expressions using low-level geometric features ... To handle facial expressions, we design a feature pooling and ranking scheme to collect various types of low-level geometric features and rank them ...
	3. Kai Xu, Zhiquan Cheng, Yanzhen Wang, Yueshan Xiong, and Hao Zhang, "Quality Encoding for Tetrahedral Mesh Optimization," Computers and Graphics (Special Issue of SMI), Vol. 33, pp. 250-261, 2009. [PDF \| bibtex ] We define quality differential coordinates (QDC) for per-vertex encoding of the quality of a tetrahedral mesh. Our formulation allows the incorporation of element quality metrics into QDC construction to penalize badly shaped and inverted tetrahedra ...
	2. Rong Liu, Hao Zhang, Ariel Shamir, and Daniel Cohen-Or, "A Part-Aware Surface Metric for Shape Analysis," Computer Graphics Forum (Special Issue of Eurographics 2009), Vol. 28, No. 2, 397-406, 2009. [PDF \| bibtex] The notion of parts in a shape plays an important role in many geometry problems. At the same time, many such problems utilize a surface metric to assist shape analysis and understanding. The main contribution of our work is to bring together these two fundamental concepts ...
	1. Matt Olson and Hao Zhang, "Tangential Distance Fields for Mesh Silhouette Problems,," Computer Graphics Forum, Vol. 28, No. 1, pp. 84-100, 2009. [PDF \| bibtex] We introduce a novel class of distance fields for a given surface defined by its tangent planes. At each point in space, we assign a scalar value which is a weighted sum of distances to these tangent planes. We use four applications to illustrate the benefit of using the resulting TDF scalar field: view point selection, ...

2008

3. Ramsay Dyer, Hao Zhang, and Torsten Moeller, "Surface sampling and the intrinsic Voronoi diagram," Computer Graphics Forum (Special Issue of SGP), Volume 27, Number 5, pp. 1431-1439, 2008. (won Best Paper Award at SGP) [PDF | bibtex]

We develop adaptive sampling criteria which guarantee a topologically faithful mesh and demonstrate an improvement and simplification over earlier results, albeit restricted to 2D surfaces. These sampling criteria are based on the strong convexity radius and the injectivity radius ...

2. Hao Zhang, Alla Sheffer, Daniel Cohen-Or, Qingnan Zhou, Oliver van Kaick, and Andrea Tagliasacchi, "Deformation-Driven Shape Correspondence," Computer Graphics Forum (Special Issue of SGP), Volume 27, Number 5, pp. 1393-1402, 2008. [PDF | bibtex | Project page (UBC | SFU)]

We present an automatic feature correspondence algorithm capable of handling large, non-rigid shape variations, as well as partial matching ... The search is deformation-driven, prioritized by a self-distortion energy measured on meshes deformed according to a given correspondence ...

1. Rong Liu, Hao Zhang, and James Busby, "Convex Hull Covering of Polygonal Scenes for Accurate Collision Detection in Games," Proc. of Graphics Interface 2008, pp. 203-210. [PDF | bibtex]

We look at a particular instance of the convex decomposition problem which arises from real-world game development. Given a collection of polyhedral surfaces (possibly with boundaries, holes, and complex interior structures) that model the scene geometry in a game environment, we wish to find a small set of convex hulls ...

2007

	8. Oliver van Kaick, Ghassan Hamarneh, Hao Zhang, and Paul Wighton, "Contour Correspondence via Ant Colony Optimization," Proc. of Pacific Graphics 2007, pp. 271-280. [Oliver's page with paper and MATLAB code \| bibtex] We formulate contour correspondence as a Quadratic Assignment Problem (QAP), incorporating proximity information. By maintaining the neighborhood relation between points this way, we show that better matching results are obtained in practice. We propose the first Ant Colony Optimization (ACO) algorithm ...
	7. Ramsay Dyer, Hao Zhang, and Torsten Moeller, "Delaunay Mesh Construction," Proc. of Eurographics Symposium on Geometry Processing (SGP), pp. 273-282. [PDF \| bibtex] We present algorithms to produce Delaunay meshes from arbitrary triangle meshes by edge flipping and geometry-preserving refinement and prove their correctness. In particular we show that edge flipping serves to reduce mesh surface area, and that a poorly sampled input mesh may yield unflippable edges necessitating refinement ...
	6. Hao Zhang, Oliver van Kaick, and Ramsay Dyer, "Spectral Methods for Mesh Processing and Analysis," Proc. of Eurographics 2007 State of the Art Report, pp. 1-22. [PDF \| bibtex] Spectral methods for mesh processing and analysis rely on the eigenvalues, eigenvectors, or eigenspace projections derived from appropriately defined mesh operators to carry out desired tasks. This state-of-the-art report aims to provide a comprehensive survey on the spectral approach ...
	5. Rong Liu and Hao Zhang, "Mesh Segmentation via Spectral Embedding and Contour Analysis," Computer Graphics Forum (Special Issue of Eurographics 2007), Vol. 26, pp. 385-394, 2007. [PDF \| bibtex] We propose a mesh segmentation algorithm where at each step, a sub-mesh embedded in 3D is first spectrally projected into the plane with a contour extracted from the planar embedding. Transforming the shape analysis problem to the 2D domain facilitates our segmentability analysis and sampling tasks ...
	4. Xiaoxing Li and Hao Zhang, "Adapting Geometric Attributes for Expression-Invariant 3D Face Recognition," Proc. of Shape Modeling International (SMI) 2007, pp. 21-32. [PDF \| bibtex] We investigate the use of multiple intrinsic geometric attributes, including angles, geodesic distances, and curvatures, for 3D face recognition ... As invariance to facial expressions holds the key to improving recognition performance, we propose to train for the component-wise weights ...
	3. Ramsay Dyer, Hao Zhang, and Torsten Moeller, "Voronoi-Delaunay Duality and Delaunay Meshes," Proc. of ACM Symposium on Solid and Physical Modeling (SPM) 2007, pp. 415-420. [PDF \| bibtex] We define a Delaunay mesh to be a manifold triangle mesh whose edges form an intrinsic Delaunay triangulation or iDT of its vertices ... We show that meshes constructed from a smooth surface by taking an iDT or a restricted Delaunay triangulation, do not in general yield a Delaunay mesh ...
	2. Varun Jain, Hao Zhang, and Oliver van Kaick, "Non-Rigid Spectral Correspondence of Triangle Meshes," International Journal on Shape Modeling (via invitation to Special Issue of SMI 2006), Volume 13, Number 1, pp. 101-124. [PDF \| bibtex] We present an algorithm for finding a meaningful correspondence between two triangle meshes, which is designed to handle general non-rigid transformations. Our algorithm operates on embeddings of the two shapes in the spectral domain so as to normalize them with respect to uniform scaling and rigid-body transformation.
	1. Varun Jain and Hao Zhang, "A Spectral Approach to Shape-Based Retrieval of Articulated 3D Models," Computer-Aided Design (via invitation to Special Issue of GMP 2006), Vol. 39, Issue 5, pp. 398-407, 2007. [PDF \| DOI \| bibtex] We present an approach for robust shape retrieval from databases containing articulated 3D models. Each shape is represented by the eigenvectors of an appropriately defined affinity matrix, forming a spectral embedding which achieves normalization against rigid-body transformations, shape articulation ...

2006

	8. John Li and Hao Zhang, "Nonobtuse Remeshing and Decimation," in Proc. of Symposium on Geometry Processing (SGP) 2006 (short paper), pp.235-238. [PDF \| bibtex] We propose an algorithm for guaranteed nonobtuse remeshing and nonobtuse mesh decimation. Our strategy for the remeshing problem is to first convert an input mesh, using a modified Marching Cubes algorithm, into a rough approximate mesh that is guaranteed to be nonobtuse. We then apply iterative "deform-to-fit" ...
	7. Matt Olson and Hao Zhang, "Silhouette Extraction in Hough Space," Computer Graphics Forum (Special Issue on Eurographics 2006), Volume 25, Number 3, pp. 273-282, 2006. [PDF \| bibtex] We present an efficient silhouette extractor for triangle meshes under perspective projection in the Hough space. The more favorable point distribution in Hough space allows us to obtain significant performance gains over the traditional dual-space based techniques ...
	6. Varun Jain and Hao Zhang, "Shape-Based Retrieval of Articulated 3D Models Using Spectral Embedding," in Proceeding of Geometric Modeling and Processing 2006, pp. 295-308. [PDF \| bibtex] We present a spectral approach for robust shape retrieval from databases containing articulated 3D shapes. We show absolute improvement in retrieval performance when conventional shape descriptors are used in the spectral domain on the McGill database of articulated 3D shapes. We also propose a simple eigenvalue-based descriptor ...
	5. Rong Liu, Hao Zhang, and Oliver van Kaick, "Spectral Sequencing based on Graph Distance," in Proceeding of Geometric Modeling and Processing 2006 (poster paper), pp. 632-638. [PDF \| bibtex] In this paper, we treat optimal mesh layout generation as a problem of preserving graph distances and propose to use the subdominant eigenvector of a kernel (affinity) matrix for sequencing ...
	4. Rong Liu, Varun Jain, and Hao Zhang, "Subsampling for Efficient Spectral Mesh Processing," in Proceeding of Computer Graphics International 2006, Lecture Notes in Computer Science 4035, H.-P. Seidel, T. Nishita, and Q. Peng, Eds., pp. 172-184, 2006. (acceptance rate: 10%) [PDF \| bibtex] We apply Nystrom method, a sub-sampling and reconstruction technique, to speed up spectral mesh processing. We first relate this method to Kernel Principal Component Analysis (KPCA). This enables us to derive a novel measure in the form of a matrix trace, based soly on sampled data, to quantify the quality of Nystrom approximation ...
	3. Varun Jain and Hao Zhang, "Robust 3D Shape Correspondence in the Spectral Domain," in Proceeding of International Conference on Shape Modeling and Applications (SMI) 2006, pp. 118-129, 2006. [PDF \| bibtex] We present an algorithm for finding a meaningful correspondence between two 3D shapes given as triangle meshes. Our algorithm operates on embeddings of the two shapes in the spectral domain so as to normalize them with respect to uniform scaling, rigid-body transformation and shape bending ...
	2. Andrew Clements and Hao Zhang, "Minimum Ratio Contours on Surface Meshes," in Proceeding of International Conference on Shape Modeling and Applications (SMI) 2006, pp. 26-37, 2006. [PDF \| bibtex] We present a novel approach for discretely optimizing contours on the surface of a triangle mesh. This is achieved through the use of a minimum ratio cycle (MRC) algorithm, where we compute a contour having the minimal ratio between a novel contour energy term and the length of the contour ...
	1. Xiaoxing Li, Greg Mori, and Hao Zhang, "Expression-Invariant Face Recognition with Expression Classification," in Proceeding of Canadian Conference on Computer and Robot Vision (CRV) 2006, pp. 77-83, 2006. [PDF \| bibtex] Facial expression, which changes face geometry, usually has an adverse effect on the performance of a face recognition system. On the other hand, face geometry is a useful cue for recognition. Taking these into account, we utilize the idea of separating geometry and texture information in a face image ...

2005 -

9. Hao Zhang and Rong Liu, "Mesh Segmentation via Recursive and Visually Salient Spectral Cuts," in Proceeding of Vision, Modeling, and Visualization 2005, pp. 429-436, 2005. [PDF | bibtex]

8. Varun Jain and Hao Zhang, "Robust 2D Shape Correspondence using Geodesic Shape Context," in Proceeding of Pacific Graphics 2005, (short paper), pp. 121-124, 2005. [bibtex]

7. Hao Zhang, "Discrete Combinatorial Laplacian Operators for Digital Geometry Processing," in Proc. of SIAM Conference on Geometric Design and Computing, pp. 575-592, 2004. [PDF | bibtex]

6. Rong Liu and Hao Zhang, "Segmentation of 3D Meshes through Spectral Clustering," in Proceeding of Pacific Graphics 2004, pp. 298-305. [PDF | bibtex]

5. Hao Zhang and Hendrik C. Blok, "Optimal Mesh Signal Transforms," in Proceeding of IEEE Geometric Modeling and Processing 2004 (poster paper), pp. 373-379. [bibtex]

4. Hao Zhang and Eugene Fiume, "Butterworth Filtering and Implicit Fairing of Irregular Meshes," in Proceedings of Pacific Graphics 2003 (short paper), pp. 502-506. [bibtex]

3. Hao Zhang and Eugene Fiume, "Mesh Smoothing with Shape or Feature Preservation," in Advances in Modeling, Animation, and Rendering, J. Vince and R. Earnshaw, editors, pp. 167-182, Springer 2002. Also as Proceeding of Computer Graphics International 2002.

2. Hao Zhang and Eugene Fiume, "Shape Matching of 3-D Contours using Normalized Fourier Descriptors," in Proceeding of International Conference on Shape Modeling and Applications (SMI), IEEE Computer Society, pp. 261-268, 2002. [bibtex]

1. John A. Brzozowski and Hao Zhang, "Delay-Insensitivity and Semi-Modularity," Formal Methods in System Design, Kluwer Academic Publishers, March 2000, vol. 16, pp. 191-218, 2000.

SIGGRAPH/TOG: 58; ICCV/CVPR/ECCV/NeurIPS: 17; SGP: 7; Eurographics: 8; EGSTAR: 4; CGF: 19.

	9. Maham Tanveer, Yang Zhou, Simon Nicklaus, Ali Mahdavi-Amiri, Hao Zhang, Krishna Kumara Singh, and Nanxuan Zhao, "MultiCOIN: Multi-Modal COntrollable Video INbetweening", preprint, 2025. [Project page \| arXiv \| bibtex] We introduce MultiCOIN, a video inbetweening framework that allows multi-modal controls, including depth transition and layering, motion trajectories, text prompts, and target regions for movement localization, while achieving a balance between flexibility, ease of use, and precision for fine-grained video interpolation. To achieve this, we adopt the Diffusion Transformer (DiT) architecture as our video generative model, due to its proven capability to generate high-quality long videos. To ensure compatibility between DiT and our multi-modal controls, we map all motion controls into a common sparse and user-friendly point-based representation as the video/noise input. Further, we separate content controls and motion controls into two branches to encode the required features ...
	8. Sauradip Nag, Daniel Cohen-Or, Hao Zhang, and Ali Mahdavi-Amiri, "In-2-4D: Inbetweening from Two Single-View Images to 4D Generation", ACM SIGGRAPH Asia, 2025. [Project page \| arXiv \| Youtube video \| bibtex] We pose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening to interpolate two single-view images. In contrast to video/4D generation from only text or a single image, our interpolative task can leverage more precise motion control to better constrain the generation. Given two monocular RGB images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D, without making assumptions on the object category, motion type, length, or complexity. To handle such arbitrary and diverse motions, we utilize a foundational video interpolation model for motion prediction and employ a hierarchical approach through keyframes to address large frame-to-frame motion gaps can lead to ambiguous interpretations ...
	7. Qimin Chen, Yuezhi Yang, Yifan Wang, Vladimir Kim, Siddhartha Chaudhuri, Hao Zhang, and Zhiqin Chen, "ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction", ACM SIGGRAPH Asia, 2025. [Project page \| arXiv \| bibtex] We introduce a 3D detailizer, a neural model which can instantaneously (in <1s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes, with varied structures, whose local details all share a consistent style and appearance.
	6. Sai Raj Kishore Perla, Aditya Vora, Sauradip Nag, Ali Mahdavi-Amiri, and Hao Zhang, "ASIA: Adaptive 3D Segmentation using Few Image Annotations", ACM SIGGRAPH Asia, 2025. [Project page \| arXiv \| bibtex] We introduce ASIA (Adaptive 3D Segmentation using few Image Annotations), a novel framework that enables segmentation of possibly non-semantic and non-text describable "parts" in 3D. Our segmentation is controllable through a few user-annotated in-the-wild images, which are easier to collect than multi-view images, less demanding to annotate than 3D models, and more precise than potentially ambiguous text descriptions. Our method leverages the rich priors of text-to-image diffusion models, such as Stable Diffusion, to transfer segmentations from image space to 3D, even when the annotated and target objects differ significantly in geometry or structure. During training, we optimize a text token for each segment and fine-tune our model with a novel cross-view part correspondence loss.
	5. Ruiqi Wang and Hao Zhang, "RESAnything: Attribute Prompting for Arbitrary Referring Segmentation", NeurIPS, 2025. [arXiv \| Project page \| bibtex] We present an open-vocabulary and zero-shot method for arbitrary referring expression segmentation (RES), targeting more general input expressions than those handled by prior works. Specifically, our inputs encompass both object- and part-level labels as well as implicit references pointing to properties or qualities of object/part function, design, style, material, etc. Our model, coined RESAnything, leverages Chain-of-Thoughts (CoT) reasoning, where the key idea is attribute prompting. We generate detailed descriptions of object/part attributes including shape, color, and location for potential segment proposals through systematic prompting of a large language model (LLM), where the proposals are produced by a foundational image segmentation model.
	4. Yilin Liu, Duoteng Xu, Xingyao Yu, Xiang Xu, Daniel Cohen-Or, Hao Zhang, and Hui Huang, "HoLa: B-Rep Generation using a Holistic Latent Representation", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2025. [arXiv \| Project page \| bibtex] We introduce a novel representation for learning and generating Computer-Aided Design (CAD) models in the form of boundary representations (B-Reps). Our representation unifies the continuous geometric properties of B-Rep primitives in different orders (e.g., surfaces and curves) and their discrete topological relations in a holistic latent (HoLa) space. This is based on the simple observation that the topological connection between two surfaces is intrinsically tied to the geometry of their intersecting curve. Such a prior allows us to reformulate topology learning in B-Reps as a geometric reconstruction problem in Euclidean space. Specifically, we eliminate the presence of curves, vertices, and all the topological connections in the latent space by learning to distinguish and derive curve geometries from a pair of surface primitives via a neural intersection network ...
	3. Changhao Li, Yu Xin, Xiaowei Zhou, Ariel Shamir, Hao Zhang, Ligang Liu, and Ruizhen Hu, "MASH: Masked Anchored SpHerical Distances for 3D Shape Representation and Generation", ACM SIGGRAPH, 2025. [Project page \| arXiv \| bibtex] We introduce Masked Anchored SpHerical Distances (MASH), a novel multi-view and parametrized representation of 3D shapes. Inspired by multi-view geometry and motivated by the importance of perceptual shape understanding for learning 3D shapes, MASH represents a 3D shape as a collection of observable local surface patches, each defined by a spherical distance function emanating from an anchor point. We further leverage the compactness of spherical harmonics to encode the MASH functions, combined with a generalized view cone with a parameterized base that masks the spatial extent of the spherical function to attain locality. We develop a differentiable optimization algorithm capable of converting any point cloud into a MASH representation accurately approximating ground-truth surfaces with arbitrary geometry and topology ...
	2. Qirui Huang, Runze Zhang, Kangjun Liu, Minglun Gong, Hao Zhang, and Hui Huang, "ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points", CVPR (highlight), 2025. [arXiv \| Project \| bibtex] We introduce ArcPro, a novel learning framework built on architectural programs to recover structured 3D abstractions from highly sparse and low-quality point clouds. Specifically, we design a domain-specific language (DSL) to hierarchically represent building structures as a program, which can be efficiently converted into a mesh. We bridge feedforward and inverse procedural modeling by using a feedforward process for training data synthesis, allowing the network to make reverse predictions. We train an encoder-decoder on the points-program pairs to establish a mapping from unstructured point clouds to architectural programs, where a 3D convolutional encoder extracts point cloud features and a transformer decoder autoregressively predicts the programs in a tokenized form ...
	1. Dingdong Yang, Yizhi Wang, Konrad Schindler, Ali Mahdavi-Amiri, and Hao Zhang, "GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation", ICLR, 2025. [Project page \| arXiv \| bibtex] We propose GALA, a novel representation of 3D shapes that (i) excels at capturing and reproducing complex geometry and surface details, (ii) is computationally efficient, and (iii) lends itself to 3D generative modelling with modern, diffusion-based schemes. The key idea of GALA is to exploit both the global sparsity of surfaces within a 3D volume and their local surface properties ...

	12. Liqiang Lin, Wenpeng Wu, Chi-Wing Fu, Hao Zhang, and Hui Huang, "CRAYM: Neural Field Optimization via Camera RAY Matching", NeurIPS, 2024. [Project page \| arXiv \| bibtex] We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesis (NVS) and 3D geometry reconstruction. One key reason for matching camera rays, instead of pixels as in prior works, is that the camera rays can be parameterized by the feature volume to carry both geometric and photometric information. Multi-view consistencies involving the camera rays and scene rendering can be naturally integrated into the joint optimization and network training, to impose physically meaningful constraints to improve the final quality of both the geometric reconstruction and photorealistic rendering. We demonstrate the effectiveness of CRAYM for both NVS and geometry reconstruction, over dense- or sparse-view settings, with qualitative and quantitative comparisons to state-of-the-art alternatives.
	11. Fenggen Yu, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, and Hao Zhang, "DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly", ECCV, 2024. [arXiv \| bibtex] We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object. By leveraging differentiable volume rendering, our method does not require 3D supervision. Architecturally, our network follows the general pipeline of an image-conditioned neural radiance field (NeRF) exemplified by pixelNeRF for color prediction. As our core contribution, we introduce differen- tial primitive assembly (DPA) into NeRF to output a 3D occupancy field in place of density prediction, where the predicted occupancies serve as opacity values for volume rendering. Our network, coined DPA-Net, produces a union of convexes ...
	10. Ruiqi Wang, Akshay Gadi Patil, Fenggen Yu, and Hao Zhang, "Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images", ECCV, 2024. [Project page \| arXiv \| bibtex] We introduce the first active learning (AL) model for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes. Specifically, our goal is to obtain fully validated segmentation results by humans while minimizing manual effort. To this end, we employ a transformer that utilizes a masked-attention mechanism to supervise the active segmentation. To enhance the network tailored to moveable parts, we introduce a coarse-to-fine AL approach which first uses an object-aware masked attention and then a pose-aware one, leveraging the hierarchical nature of the problem and a correlation between moveable parts and object poses and interaction directions.
	9. Qimin Chen, Zhiqin Chen, Vladimir Kim, Noam Aigerman, Hao Zhang, and Siddhartha Chaudhuri, "DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement", ECCV, 2024. [arXiv \| bibtex] We present a 3D modeling method which enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box extrusion tool or via generative modeling), a user can directly "paint" desired target styles representing compelling geometric details, from input exemplar shapes, over different regions of the coarse shape. These regions are then up-sampled into high-resolution geometries which adhere with the painted styles.
	8. Sai Raj Kishore Perla, Yizhi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "EASI-Tex: Edge-Aware Mesh Texturing from Single Image", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2024. [arXiv \| bibtex] We introduce a novel approach for single-image mesh texturing, which employs diffusion models with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to rectify the discrepancies by respecting both shape semantics and edge features in the inputs to produce clean and sharp mesh texturization. Leveraging a pre-trained Stable Diffusion generator, our method is capable of transferring textures in the absence of a direct guide from the single-view image.
	7. Zhiqin Chen, Qimin Chen, Hang Zhou, and Hao Zhang, "DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation", ACM SIGGRAPH, 2024. [arXiv \| bibtex] We present an unsupervised 3D shape co-segmentation method which learns a set of deformable part templates from a shape collection. To accommodate structural variations in the collection, our network composes each shape by a selected subset of template parts which are affine-transformed. To maximize the expressive power of the part templates, we introduce a per-part deformation network to enable the modeling of diverse parts with substantial geometry variations, while imposing constraints on the deformation capacity to ensure fidelity to the originally represented parts.
	6. Yilin Liu, Jialei Chen, Shanshan Pan, Daniel Cohen-Or, Hao Zhang, and Hui Huang, "Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2024. [arXiv \| bibtex] We introduce a novel method for acquiring boundary representations (B-Reps) of 3D CAD models which involves a two-step process: it first applies a spatial partitioning, referred to as the "split", followed by a "fit" operation to derive a single primitive within each partition. Specifically, our partitioning aims to produce the classical Voronoi diagram of the set of ground-truth (GT) B-Rep primitives.
	5. Jingyu Hu, Kai-Hei Hui, Zhengzhe liu, Hao Zhang, and Chi-Wing Fu, "CNS-Edit: 3D Shape Editing via Coupled Neural Shape Optimization", ACM SIGGRAPH, 2024. [arXiv \| bibtex] We introduce a new approach based on a coupled representation and a neural volume optimization to implicitly perform 3D shape editing in latent space. This work has three innovations. First, we design the coupled neural shape (CNS) representation for supporting 3D shape editing. This representation includes a latent code, which captures high-level global semantics of the shape, and a 3D neural feature volume, which provides a spatial context to associate with the local shape changes given by the editing. Second, ...
	4. Yizhi Wang, Wallace Lira, Wenqi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction", CVPR, 2024. [Project page \| arXiv \| bibtex] We introduce multi-slice reasoning, a new notion for single-view 3D reconstruction which challenges the current and prevailing belief that multi-view synthesis is the most natural conduit between single-view and 3D. Our key observation is that object slicing is more advantageous than altering views to reveal occluded structures. Specifically, slicing is more occlusion-revealing since it can peel through any occluders without obstruction. In the limit, i.e., with infinitely many slices, it is guaranteed to unveil all hidden object parts.
	3. Dingdong Yang, Yizhi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "BRICS: Bi-level feature Representation of Image CollectionS", preprint, 2023. [Project page \| arXiv \| bibtex] We present BRICS, a bi-level feature representation for image collections, which consists of a key code space on top of a feature grid space. Specifically, our representation is learned by an autoencoder to encode images into continuous key codes, which are used to retrieve features from groups of multi-resolution feature grids. Our key codes and feature grids are jointly trained continuously with well-defined gradient flows, leading to high usage rates of the feature grids and improved generative modeling compared to discrete Vector Quantization (VQ). Differently from existing continuous representations such as KL-regularized latent codes, our key codes are strictly bounded in scale and variance.
	2. Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, and Hao Zhang, "RoSI: Recovering 3D Shape Interiors from Few Articulation Images", preprint, 2023. [arXiv \| bibtex] The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only few views, our method infers the interior planes that are observable in the input images.
	1. Zeyu Huang, Sisi Dai, Kai Xu, Hao Zhang, Hui Huang, and Ruizhen Hu, "DINA: Deformable INteraction Analogy", Graphical Models, selected paper from Computational Visual Media (CVM), 2024. [arXiv \| bibtex] We introduce deformable interaction analogy (DINA) as a means to generate close interactions between two 3D objects. Given a single demo interaction between an anchor object (e.g., a hand) and a source object (e.g., a mug grasped by the hand), our goal is to generate many analogous 3D interactions between the same anchor object and various new target objects (e.g. a toy airplane), where the anchor object is allowed to be rigid or deformable.

	15. Aditya Vora, Akshay Gadi Patil, and Hao Zhang, "DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization", NeurIPS, 2023. [arXiv \| bibtex] We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed and leaving significant gaps between the sparse views, by learning a set of neural templates that act as surface priors. Our method, coined DiViNet, operates in two stages. The first stage learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help “stitch” the surfaces over sparse regions.
	14. Fenggen Yu, Qimin Chen, Maham Tanveer, Ali Mahdavi-Amiri, and Hao Zhang, "D²CSG: Unsupervised Learning of Compact CSG Trees with Dual Complements and Dropouts", NeurIPS, 2023. [arXiv \| bibtex] We present D²CSG, a neural model composed of two dual and complementary network branches, with dropouts, for unsupervised learning of compact constructive solid geometry (CSG) representations of 3D CAD shapes. Our network is trained to reconstruct a 3D shape by a fixed-order assembly of quadric primitives, with both branches producing a union of primitive intersections or inverses. A key difference between D2CSG and all prior neural CSG models is its dedicated residual branch to assemble the potentially complex shape complement, which is subtracted from an overall shape modeled by the cover branch. With the shape complements, our network is provably general, while the weight dropout further improves compactness of the CSG tree by removing redundant primitives.
	13. Qimin Chen, Zhiqin Chen, Hang Zhou, and Hao Zhang, "ShaDDR: Real-Time Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering", ACM SIGGRAPH Asia, 2023. [arXiv \| bibtex] We present ShaDDR, an example-based deep generative neural network which produces a high-resolution textured 3D shape through geometry detailization and conditional texture generation applied to an input coarse voxel shape. Trained on a small set of detailed and textured exemplar shapes, our method learns to detailize the geometry via multi-resolution voxel upsampling and generate textures on voxel surfaces via differentiable rendering against exemplar texture images from a few views. The generation is realtime, taking less than 1 second to produce a 3D model with voxel resolutions up to 512^3. The generated shape preserves the overall structure of the input coarse voxel model, while the style of the generated geometric details and textures can be manipulated through learned latent codes.
	12. Jingyu Hu, Kai-Hei Hui, Zhengzhe liu, Hao Zhang, and Chi-Wing Fu, "CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration", ACM SIGGRAPH Asia, 2023. [arXiv \| bibtex] This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space. Many recent methods have been developed to encode 3D shapes into a learned latent shape space to enable generative design and modeling. Yet, existing methods lack effective exploration mechanisms, despite the rich information. To this end, we propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration. Our idea is threefold. First, we couple the CLIP and shape spaces by generating paired CLIP and shape codes through sketch images and training a mapper network to connect the two spaces. Second, to explore the space around a given shape, we formulate a co-optimization strategy to search for the CLIP code that better matches the geometry of the shape. Third, we design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape.
	11. Zihao Yan, Fubao Su, Mingyang Wang, Ruizhen Hu, Hao Zhang, and Hui Huang, "Interaction-Driven Active 3D Reconstruction with Object Interiors", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), 2023. [Project page \| bibtex] We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and interior geometries of a target 3D object. Unlike other works in active vision which focus on optimizing camera viewpoints to better investigate the environment, the primary feature of our reconstruction is an analysis of the interactability of various parts of the target object and the ensuing part manipulation by a robot to enable scanning of occluded regions. As a result, an understanding of part articulations of the target object is obtained on top of complete geometry acquisition. Our method operates fully automatically by a Fetch robot with built-in RGBD sensors. It iterates between interaction analysis and interaction-driven reconstruction, scanning and reconstructing detected moveable parts one at a time, where both the articulated part detection and mesh reconstruction are carried out by neural networks.
	10. Juzhan Xu, Minglun Gong, Hao Zhang, Hui Huang, and Ruizhen Hu, "Neural Packing: from Visual Sensing to Reinforcement Learning", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 42, No, 6, 2023. [Project page \| bibtex] We present a complete learning framework to solve the real-world transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline from partial observations of input objects via RGBD sensing and recognition to final box placement, via robotic motion planning, to arrive at a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforcement learning (RL), to solve the NP-hard combinatorial optimization problem. Our network simultaneously selects an object to pack and determines the final packing location, based on a judicious encoding of the continuously evolving states of partially observed source objects and available spaces in the target container, using separate encoders both enabled with attention mechanisms.
	9. Fenggen Yu, Yiming Qian, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, and Hao Zhang, "HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling", ICCV, 2023. [arXiv \| bibtex] We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the small and intricate parts. For the same reason, the necessary data annotation effort is tremendous, motivating approaches to minimize human involvement. Our labeling tool iteratively verifies or modifies part labels predicted by a deep neural network, with human feedback continually improving the network prediction. To effectively reduce human efforts, we develop two novel features in our tool, hierarchical and symmetry-aware active labeling. Our human-in-the-loop approach, coined HAL3D, achieves 100% accuracy (barring human errors) on any test set with pre-defined hierarchical part labels, with 80% time-saving over manual effort.
	8. Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, and Hao Zhang, "DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion", ICCV, 2023. [Project page \| arXiv \| bibtex] We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable. To address an assortment of challenges with our task at hand including conflicting goals (artistic stylization vs. legibility), lack of ground truth, and immense search space, our approach utilizes large language models to bridge texts and visual images for stylization and build an unsupervised generative model with a diffusion model backbone. Specifically, we employ the denoising generator in Latent Diffusion Model (LDM), with the key addition of a CNN-based discriminator to adapt the input style onto the input text. The discriminator uses rasterized images of a given letter/word font as real samples and output of the denoising generator as fake samples ...
	7. Yizhi Wang, Zeyu Huang, Ariel Shamir, Hui Huang, Hao Zhang, and Ruizhen Hu, "ARO-Net: Learning Implicit Fields from Anchored Radial Observations", CVPR, 2023. [arXiv \| bibtex] We introduce anchored radial observations (ARO), a novel shape encoding for learning neural field representation of shapes that is category-agnostic and generalizable amid significant shape variations. The main idea behind our work is to reason about shapes through partial observations from a set of viewpoints, called anchors. We develop a general and unified shape representation by employing a fixed set of anchors, via Fibonacci sampling, and designing a coordinate-based deep neural network to predict the occupancy value of a query point in space. Differently from prior neural implicit models, that use global shape feature, our shape encoder operates on contextual, query-specific features ...
	6. Akshay Gadi Patil, Supriya Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, and Hao Zhang, "Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes", Computer Graphics Forum (State-of-the-Art Report), 2023. [arXiv \| bibtex] This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, various indoor scene datasets available for research in the aforementioned areas, and discuss notable works employing machine learning models for such scene modeling tasks based on these representations. Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With respect to analysis, we focus on four basic scene understanding tasks – 3D object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene similarity. And for synthesis, we mainly discuss neural scene synthesis works, though also highlighting model-driven methods that allow for human-centric, progressive scene synthesis ...
	5. Zeyu Huang, Juzhan Xu, Sisi Dai, Kai Xu, Hao Zhang, Hui Huang, and Ruizhen Hu, "NIFT: Neural Interaction Field and Template for Object Manipulation", International Conference on Robotics and Automation (ICRA), 2023. [arXiv \| bibtex] We introduce NIFT, Neural Interaction Field and Template, a descriptive and robust interaction representation of object manipulations to facilitate imitation learning. Given a few object manipulation demos, NIFT guides the generation of the interaction imitation for a new object instance by matching the Neural Interaction Template (NIT) extracted from the demos to the Neural Interaction Field (NIF) defined for the new object. Specifically, the NIF is a neural field which encodes the relationship between each spatial point and a given object, where the relative position is defined by a spherical distance function rather than occupancies or signed distances, which are commonly adopted by conventional neural fields but less informative ...
	4. Hang Zhou, Rui Ma, Lingxiao Zhang, Lin Gao, Ali Mahdavi-Amiri, and Hao Zhang, "SAC-GAN: Structure-Aware Image Composition", IEEE Trans. on Visualization and Computer Graphics (TVCG), 2023. [arXiv \| bibtex] We introduce an end-to-end learning framework for image-to-image composition, aiming to seamlessly compose an object represented as a cropped patch from an object image into a background scene image. As our approach emphasizes more on semantic and structural coherence of the composed images, rather than their pixel-level RGB accuracies, we tailor the input and output of our network with structure-aware features and design our network losses accordingly, with ground truth established in a self-supervised setting through the object cropping. Specifically, our network takes the semantic layout features from the input scene image, features encoded from the edges and silhouette in the input object patch, as well as a latent code as inputs, and generates a 2D spatial affine transform defining the translation and scaling of the object patch.
	3. Liqiang Lin, Pengdi Huang, Chi-Wing Fu, Kai Xu, Hao Zhang, and Hui Huang, "One Point is All You Need: Directional Attention Point for Feature Learning", Science China Information Sciences (SCIS), Vol. 66, No. 1, 2023. [PDF \| arXiv \| bibtex] We present a novel attention-based mechanism for learning enhanced point features for tasks such as point cloud classification and segmentation. Our key message is that if the right attention point is selected, then “one point is all you need” — not a sequence as in a recurrent model and not a pre-selected set as in all prior works. Also, where the attention point is should be learned, from data and specific to the task at hand. Our mechanism is characterized by a new and simple convolution, which combines the feature at an input point with the feature at its associated attention point. We call such a point adirectional attention point (DAP) ...
	2. Zhiqin Chen, Andrea Tagliasacchi, and Hao Zhang, "Learning Mesh Representations via Binary Space Partitioning Tree Networks", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) (invited and extended article from CVPR 2020 as Best Student Paper Award winner), Vol. 45, No. 4, pp. 4870-4881, 2023. [arXiv \| Project page (code+video) \| bibtex] Polygonal meshes are ubiquitous, but have only played a relatively minor role in the deep learning revolution. State-of-the-art neural generative models for 3D shapes learn implicit functions and generate meshes via expensive iso-surfacing. We overcome these challenges by employing a classical spatial data structure from graphics, Binary Space Partitioning (BSP), to facilitate 3D learning. The core operation of BSP involves recursive subdivision of 3D space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition without supervision. The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built over a set of planes, where the planes and convexes are both defined by learned network weights.
	1. Tong Wu, Lin Gao, Lingxiao Zhang, Yu-Kun Lai, and Hao Zhang, "STAR-TM: STructure Aware Reconstruction of Textured Mesh from Single Image", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2023. [arXiv \| bibtex] We present a novel method for single-view 3D reconstruction of textured meshes, with a focus to address the primary challenge surrounding texture inference and transfer. Our key observation is that learning textured reconstruction in a structure-aware and globally consistent manner is effective in handling the severe ill-posedness of the texturing problem and significant variations in object pose and texture details. Specifically, we perform structured mesh reconstruction, via a retrieval-and-assembly approach, to produce a set of genus-zero parts parameterized by deformable boxes and endowed with semantic information. For texturing, we first transfer visible colors from the input image onto the unified UV texture space of the deformable boxes. Then we combine a learned transformer model for per-part texture completion with a global consistency loss to optimize inter-part texture consistency. Our texture completion model operates in a VQ-VAE embedding space and is trained end-to-end, with the transformer training enhanced with retrieved texture instances to improve texture completion performance amid significant occlusion.

	6. Yilin Liu, Liqiang Lin, Ke Xie, Chi-Wing Fu, Hao Zhang, and Hui Huang, "Learning Reconstructability for Drone Aerial Path Planning", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), 2022. [Project \| arXiv \| bibtex] We introduce the first learning-based reconstructability predictor to improve view and path planning for large-scale 3D urban scene acquisition using unmanned drones. In contrast to previous heuristic approaches, our method learns a model that explicitly predicts how well a 3D urban scene will be reconstructed from a set of viewpoints. To make such a model trainable and simultaneously applicable to drone path planning, we simulate the proxy-based 3D scene reconstruction during training to set up the prediction. Specifically, the neural network we design is trained to predict the scene reconstructability as a function of the proxy geometry, a set of viewpoints, and optionally a series of scene images acquired in flight ...
	5. Zhiqin Chen, Andrea Tagliasacchi, Thomas Funkhouser, and Hao Zhang, "Neural Dual Contouring", ACM Transactions on Graphics (Special Issue of SIGGRAPH), 2022. [arXiv \| bibtex] We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC). Like traditional DC, it produces exactly one vertex per grid cell and one quad for each grid edge intersection, a natural and efficient structure for reproducing sharp features. However, rather than computing vertex locations and edge crossings with hand-crafted functions that depend directly on difficult-to-obtain surface gradients, NDC uses a neural network to predict them. As a result, NDC can be trained to produce meshes from signed or unsigned distance fields, binary voxel grids, or point clouds (with or without normals); and it can produce open surfaces in cases where the input represents a sheet or partial surface.
	4. Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang, "CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly", CVPR, 2022. [arXiv \| bibtex] We introduce CAPRI-Net, a neural network for learning compact and interpretable implicit representations of 3D computer-aided design (CAD) models, in the form of adaptive primitive assemblies. Our network takes an input 3D shape that can be provided as a point cloud or voxel grids, and reconstructs it by a compact assembly of quadric surface primitives via constructive solid geometry (CSG) operations. The network is self-supervised with a reconstruction loss, leading to faithful 3D reconstructions with sharp edges and plausible CSG trees, without any ground-truth shape assemblies.
	3. Qimin Chen, Johannes Merz, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang, "UNIST: Unpaired Neural Implicit Shape Translation Network", CVPR, 2022. [arXiv \| bibtex] We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. Our model is built on autoencoding implicit fields, rather than point clouds which represents the state of the art. Furthermore, our translation network is trained to perform the task over a latent grid representation which combines the merits of both latent-space processing and position awareness, to not only enable drastic shape transforms but also well preserve spatial features and fine local details for natural shape translations.
	2. Chengjie Niu, Manyi Li, Kai Xu, and Hao Zhang, "RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures", CVPR, 2022. [arXiv \| bibtex] We introduce RIM-Net, a neural network which learns recursive implicit fields for unsupervised inference of hierarchical shape structures. Our network recursively decomposes an input 3D shape into two parts, resulting in a binary tree hierarchy. Each level of the tree corresponds to an assembly of shape parts, represented as implicit functions, to reconstruct the input shape. At each node of the tree, simultaneous feature decoding and shape decomposition are carried out by their respective feature and part decoders, with weight sharing across the same hierarchy level ...
	1. Yanran Guan, Han Liu, Kun Liu, Kangxue Yin, Ruizhen Hu, Oliver van Kaick, Yan Zhang, Ersin Yumer, Nathan Carr, Radomir Mech, and Hao Zhang, "FAME: 3D Shape Generation via Functionality-Aware Model Evolution", IEEE Trans. on Visualization and Computer Graphics (TVCG), Vol. 28, No. 4, pp. 1758-1772, 2022. [arXiv \| bibtex] We introduce a modeling tool which can evolve a set of 3D objects in a functionality-aware manner. Our goal is for the evolution to generate large and diverse sets of plausible 3D objects for data augmentation, constrained modeling, as well as open-ended exploration to possibly inspire new designs. Starting with an initial population of 3D objects belonging to one or more functional categories, we evolve the shapes through part re-combination to produce generations of hybrids or crossbreeds between parents from the heterogeneous shape collection ...

	11. Zhiqin Chen and Hao Zhang, "Neural Marching Cubes", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 40, No. 6, 2021. [arXiv \| code \| bibtex] We introduce Neural Marching Cubes (NMC), a data-driven approach for extracting a triangle mesh from a discretized implicit field. We re-cast MC from a deep learning perspective, by designing tessellation templates more apt at preserving geometric features, and learning the vertex positions and mesh topologies from training meshes, to account for contextual information from nearby cubes. We develop a compact per-cube parameterization to represent the output triangle mesh, while being compatible with neural processing, so that a simple 3D convolutional network can be employed for the training. We evaluate our neural MC approach by quantitative and qualitative comparisons to all well-known MC variants, demonstrating its superiority in faithful reconstruction of sharp features and mesh topology.
	10. Jiongchao Jin, Arezou Fatemi (equal contribution), Wallace Lira, Fenggen Yu, Biao Leng, Rui Ma, Ali Mahdavi-Amiri, and Hao Zhang, "RaidaR: A Rich Annotated Image Dataset of Rainy Street Scenes", Second ICCV Workshop on Autonomous Vehicle Vision (AVVision), 2021. [dataset \| arXiv \| bibtex] We introduce RaidaR, a rich annotated image dataset of rainy street scenes, to support autonomous driving research. The new dataset contains the largest number of rainy images (58,542) to date, 5,000 of which provide semantic segmentations and 3,658 provide object instance segmentations. The RaidaR images cover a wide range of realistic rain-induced artifacts, including fog, droplets, and road reflections, which can effectively augment existing street scene datasets to improve data-driven machine perception during rainy weather.
	9. Lin Gao, Tong Wu, Yu-Jie Yuan, Ming-Xian Lin, Yu-Kun Lai, and Hao Zhang, "TM-NET: Deep Generative Networks for Textured Meshes", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 40, No. 6, 2021. [arXiv \| project page \| bibtex] We introduce TM-NET, a novel deep generative model for synthesizing textured meshes in a part-aware manner. Once trained, the network can generate novel textured meshes from scratch or predict textures for a given 3D mesh, without image guidance. Plausible and diverse textures can be generated for the same mesh part, while texture compatibility between parts in the same shape is achieved via conditional generation. Specifically, our method produces texture maps for individual shape parts, each as a deformable box, leading to a natural UV map with minimal distortion. The network separately embeds part geometry (via a PartVAE) and part texture (via a TextureVAE) into their respective latent spaces ...
	8. Han Zhang, Yusong Yao, Ke Xie, Chi-Wing Fu, Hao Zhang, and Hui Huang, "Continuous Aerial Path Planning for 3D Urban Scene Reconstruction", ACM Transactions on Graphics (Special Issue of SIGGRAPH Asia), Vol. 40, No. 6, 2021. [Project page \| bibtex] We introduce a path-oriented drone trajectory planning algorithm, which performs continuous image acquisition along an aerial path, aiming to optimize both the scene reconstruction quality and path quality. Specifically, our method takes as input a rough 3D scene proxy and produces a drone trajectory and image capturing setup, which efficiently yields a high-quality reconstruction of the 3D scene based on three optimization objectives: one maximize the amount of 3D scene information that can be acquired along the entirety of the trajectory, another one to optimize the scene capturing efficiency by maximizing the scene information that can be acquired per unit length along the aerial path, and the last one to minimize the total turning angles along the aerial path, so as to reduce the number of sharp turns.
	7. Huan Fu, Bowen Cai, Lin Gao, Lingxiao Zhang, Cao Li, Zengqi Xun, Chengyue Sun, Yiyun Fei, Yu Zheng, Ying Li, Yi Liu, Peng Liu, Lin Ma, Le Weng, Xiaohang Hu, Xin Ma, Qian Qian, Rongfei Jia, Binqiang Zhao, and Hao Zhang, "3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics", ICCV, 2021. [arXiv \| bibtex] We introduce 3D-FRONT (3D Furnished Rooms with layOuts and semaNTics), a new, large-scale, and comprehensive repository of synthetic indoor scenes highlighted by professionally designed layouts and a large number of rooms populated by high-quality textured 3D models with style compatibility. From layout semantics down to texture details of individual objects, our dataset is freely available to the academic community and beyond. Currently, 3D-FRONT contains 18,797 rooms diversely furnished by 3D objects, far surpassing all publicly available scene datasets. In addition, the 7,302 furniture objects all come with high-quality textures ...
	6. Rinon Gal, Amit Bermano, Hao Zhang, and Daniel Cohen-Or, "MRGAN: Multi-Rooted 3D Shape Generation with Unsupervised Part Disentanglement", ICCV Workshop on Structural and Compositional Learning on 3D Data (StruCo3D), 2021. [arXiv \| bibtex] We present MRGAN, a multi-rooted adversarial network which generates part-disentangled 3D point-cloud shapes without part-based shape supervision. The network fuses multiple branches of tree-structured graph convolution layers which produce point clouds, with learnable constant inputs at the tree roots. Each branch learns to grow a different shape part, offering control over the shape generation at the part level. Our network encourages disentangled generation of semantic parts via two key ingredients: a root-mixing training strategy which helps decorrelate the different branches to facilitate disentanglement, and a set of loss terms designed with part disentanglement and shape semantics in mind.
	5. Manyi Li and Hao Zhang, "D^2IM-Net: Learning Detail Disentangled Implicit Fields from Single Images", CVPR, 2021. [arXiv \| bibtex] We present the first single-view 3D reconstruction network aimed at recovering geometric details from an input image which encompass both topological shape structures and surface features. Our key idea is to train the network to learn a detail disentangled reconstruction consisting of two functions, one implicit field representing the coarse 3D shape and the other capturing the details. Given an input image, our network, coined D2IM-Net, encodes it into global and local features which are respectively fed into two decoders. The base decoder uses the global features to reconstruct a coarse implicit field, while the detail decoder reconstructs, from the local features, two displacement maps, defined over the front and back sides of the captured object. The final 3D reconstruction is a fusion between the base shape and the displacement maps, with three losses enforcing the recovery of coarse shape, overall structure, and surface details via a novel Laplacian term.
	4. Zhiqin Chen, Vladimir Kim, Matthew Fisher, Noam Aigerman, Hao Zhang, and Siddhartha Chaudhuri, "DECOR-GAN: 3D Shape Detailization by Conditional Refinement", CVPR (oral), 2021. [arXiv \| bibtex] We introduce a deep generative network for 3D shape detailization, akin to stylization with the style being geometric details. We address the challenge of creating large varieties of high-resolution and detailed 3D geometry from a small set of exemplars by treating the problem as that of geometric detail transfer. Given a low-resolution coarse voxel shape, our network refines it, via voxel upsampling, into a higher-resolution shape enriched with geometric details. The output shape preserves the overall structure (or content) of the input, while its detail generation is conditioned on an input "style code" corresponding to a detailed exemplar.
	3. Akshay Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, and Hao Zhang, "LayoutGMN: Neural Graph Matching for Structural Layout Similarity", CVPR, 2021. [arXiv \| bibtex] We present a deep neural network to predict structural similarity between 2D layouts by leveraging Graph Matching Networks (GMN). Our network, coined LayoutGMN, learns the layout metric via neural graph matching, using an attention-based GMN designed under a triplet network setting. To train our network, we utilize weak labels obtained by pixel-wise Intersection-over-Union (IoUs) to define the triplet loss. Importantly, LayoutGMN is built with a structural bias which can effectively compensate for the lack of structure awareness in IoUs.
	2. Yiming Qian, Hao Zhang, and Yasutaka Furukawa, "Roof-GAN: Learning to Generate Roof Geometry and Relations for Residential Houses", CVPR, 2021. [arXiv \| bibtex] This paper presents Roof-GAN, a novel generative adversarial network that generates structured geometry of residential roof structures as a set of roof primitives and their relationships. Given the number of primitives, the generator produces a structured roof model as a graph, which consists of 1) primitive geometry as raster images at each node, encoding facet segmentation and angles; 2) inter-primitive colinear/coplanar relationships at each edge; and 3) primitive geometry in a vector format at each node, generated by a novel differentiable vectorizer while enforcing the relationships.
	1. Or Patashnik, Dov Danon, Hao Zhang, and Daniel Cohen-Or, "BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer", CVPR Workshop on Learning from Limited and Imperfect Data (L2ID), 2021. [Project page \| arXiv \| bibtex] State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity. We introduce a new unsupervised translation network, BalaGAN, specifically designed to tackle the domain imbalance problem. We leverage the latent modalities of the richer domain to turn the image-to-image translation problem, between two imbalanced domains, into a balanced, multi-class, and conditional translation problem, more resembling the style transfer setting. Specifically, we analyze the source domain and learn a decomposition of it into a set of latent modes or classes, without any supervision. This leaves us with a multitude of balanced cross-domain translation tasks, between all pairs of classes, including the target domain. During inference, the trained network takes as input a source image, as well as a reference or style image from one of the modes as a condition, and produces an image which resembles the source on the pixel-wise level, but shares the same mode as the reference.