I am a PhD student at UC San Diego and a member of the Center for Visual Computing. My research focuses mainly on deep learning and computer vision. I am currently working on projects related to inverse rendering, such as shape reconstruction and material enhancement. My goal is to leverage computer vision and graphics techniques to enable realistic content creation for AR/VR applications. I am also interested in representation learning, feature disentanglement, and video prediction, and have worked on generative models for video and on domain adaptation.
Ph.D. student in Computer Science and Engineering
Non-degree, Computer Science
Non-degree, Computer Science
B.S. in Physics & B.A. in Economics
Supervised by Prof. Manmohan Chandraker
Supervised by Prof. Yu-Chiang Frank Wang
Supervised by Dr. Yu-Chiang Frank Wang
Recovering the 3D shape of transparent objects using a small number of unconstrained natural images is an ill-posed problem. Complex light paths induced by refraction and reflection have prevented both traditional and deep multiview stereo from solving this challenge. We propose a physically-based network to recover the 3D shape of transparent objects using a few images acquired with a mobile phone camera, under a known but arbitrary environment map. Our novel contributions include a normal representation that enables the network to model complex light transport through local computation, a rendering layer that models refractions and reflections, a cost volume specifically designed for normal refinement of transparent shapes, and a feature mapping based on predicted normals for 3D point cloud reconstruction. We render a synthetic dataset to encourage the model to learn refractive light transport across different views. Our experiments show successful recovery of high-quality 3D geometry for complex transparent shapes using as few as 5-12 natural images.
Full paper: [PDF]
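As a minimal illustration of the refraction and reflection that the rendering layer has to account for, the sketch below applies Snell's law and the mirror-reflection formula to a single ray at a single surface point. It is not the paper's rendering layer; the function names and the refractive index of 1.5 (a typical value for glass) are assumptions made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def reflect(d, n):
    """Mirror the unit direction d about the unit surface normal n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def refract(d, n, eta):
    """Refract the unit direction d at a surface with unit normal n.

    n must point against the incoming ray, and eta is the ratio
    n_incident / n_transmitted. Returns None on total internal reflection.
    """
    cos_i = -dot(d, n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # Snell's law, squared
    if sin2_t > 1.0:
        return None  # total internal reflection: no transmitted ray
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))

# Hypothetical setup: a ray hitting a flat glass surface at 45 degrees.
IOR_GLASS = 1.5  # assumed refractive index, not a value from the paper
incoming = normalize((1.0, 0.0, -1.0))
normal = (0.0, 0.0, 1.0)
transmitted = refract(incoming, normal, 1.0 / IOR_GLASS)
mirrored = reflect(incoming, normal)
```

Because both directions depend only on the local normal and the incoming ray, a per-pixel normal is enough to trace the first bounce, which is one way to read the "local computation" mentioned in the abstract.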
In this paper, we address a novel and challenging task of video inference, which aims to infer video sequences from given non-consecutive video frames. Taking such frames as the anchor inputs, our focus is to recover possible video sequence outputs based on the observed anchor frames at the associated time. With the proposed Stochastic and Recurrent Conditional GAN (SR-cGAN), we are able to preserve visual content across video frames with the additional ability to handle possible temporal ambiguity. In the experiments, we show that our SR-cGAN not only produces preferable video inference results but can also be applied to the related tasks of video generation, video interpolation, video inpainting, and video prediction.
We present a novel and unified deep learning framework that is capable of learning domain-invariant representations from data across multiple domains. Realized by adversarial training with the additional ability to exploit domain-specific information, the proposed network is able to perform continuous cross-domain image translation and manipulation, and produces desirable output images accordingly. In addition, the resulting feature representation exhibits superior performance in unsupervised domain adaptation, which also verifies the effectiveness of the proposed model in learning disentangled features for describing cross-domain data.
While representation learning aims to derive interpretable features for describing visual data, representation disentanglement further structures such features so that particular image attributes can be identified and manipulated. However, one cannot easily address this task without observing ground truth annotation for the training data. To address this problem, we propose a novel deep learning model, the Cross-Domain Representation Disentangler (CDRD). By observing fully annotated source-domain data and unlabeled target-domain data of interest, our model bridges the information across data domains and transfers the attribute information accordingly. Thus, cross-domain feature disentanglement and adaptation can be performed jointly. In the experiments, we provide qualitative results to verify our disentanglement capability. We further confirm that our model can be applied to classification tasks in unsupervised domain adaptation, and performs favorably against state-of-the-art image disentanglement and translation methods.
Full paper: [arXiv] / Code: To be updated soon.
Person re-identification (Re-ID) aims at recognizing the same person from images taken across different cameras. To address this task, one typically requires a large amount of labeled data for training an effective Re-ID model, which might not be practical for real-world applications. To alleviate this limitation, we choose to exploit a sufficient amount of pre-existing labeled data from a different (auxiliary) dataset. By jointly considering such an auxiliary dataset and the dataset of interest (but without label information), our proposed adaptation and re-identification network (ARN) performs unsupervised domain adaptation, which leverages information across datasets and derives domain-invariant features for Re-ID purposes. In our experiments, we verify that our network performs favorably against state-of-the-art unsupervised Re-ID approaches, and even outperforms a number of baseline Re-ID methods which require fully supervised data for training.
PyTorch implementation of "Isospectralization, or how to hear shape, style, and correspondence," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
TensorFlow implementation of Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder, SIGGRAPH 2017.
TensorFlow implementation of Variational Autoencoders and Generative Adversarial Networks