In this work, we focus on a novel task: category-level functional hand-object manipulation synthesis covering both rigid and articulated object categories. Given an object geometry, an initial human hand pose, and a sparse control sequence of object poses, our goal is to generate a physically plausible hand-object manipulation sequence that performs like a human being. To address this challenge, we first design CAnonicalized Manipulation Spaces (CAMS), a two-level space hierarchy that canonicalizes the hand poses in an object-centric and contact-centric view. Benefiting from the representation capability of CAMS, we then present a two-stage framework for synthesizing human-like manipulation animations. Our framework achieves state-of-the-art performance for both rigid and articulated categories with impressive visual effects.
Our framework consists mainly of a CVAE-based planner module and an optimization-based synthesizer module. Given the generation condition as input, the planner first generates a per-stage CAMS representation containing contact reference frames and finger embedding sequences. The synthesizer then optimizes the whole manipulation animation based on the CAMS embedding.
CAnonicalized Manipulation Spaces apply a two-level canonicalization to the manipulation representation. At the root level, the canonicalized contact targets (top right) describe the discrete contact information. At the leaf level, the canonicalized finger embedding (bottom right) transforms finger motion from global space into local reference frames defined on the contact targets.
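The leaf-level idea can be illustrated with a minimal sketch: a fingertip trajectory in world coordinates is re-expressed in the local frame attached to its contact target. The function name, shapes, and frame convention below are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def canonicalize_finger_traj(traj_world, frame_R, frame_t):
    """Map a (T, 3) fingertip trajectory from world coordinates into a
    contact reference frame given by rotation frame_R (3, 3, columns are
    the frame axes in world space) and origin frame_t (3,).

    Local coordinates are R^T (p - t), computed row-wise as (p - t) @ R.
    Hypothetical sketch of the leaf-level canonicalization."""
    return (traj_world - frame_t) @ frame_R

# Toy example: a contact frame at (1, 0, 0) with identity rotation.
traj = np.array([[1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0]])
local = canonicalize_finger_traj(traj, np.eye(3), np.array([1.0, 0.0, 0.0]))
```

Because the embedding is relative to the contact frame, the same local finger motion can describe contacts at different positions on different object instances, which is what makes the representation category-level.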
A CVAE-based motion planner module takes the task configuration and object shape as inputs and generates a CAMS sample of the motion corresponding to the input.
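A minimal CVAE skeleton shows how such a planner can sample a motion representation conditioned on task and shape features. All dimensions, layer sizes, and the class name are illustrative assumptions; the real planner operates on the full CAMS representation.

```python
import torch
import torch.nn as nn

class PlannerCVAE(nn.Module):
    """Toy conditional VAE: encode (motion, condition) into a latent,
    decode (latent, condition) back into a motion sample."""

    def __init__(self, cond_dim=128, motion_dim=256, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(motion_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim))   # outputs mu and logvar
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, motion_dim))
        self.latent_dim = latent_dim

    def forward(self, motion, cond):
        mu, logvar = self.enc(torch.cat([motion, cond], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, cond], -1)), mu, logvar

    @torch.no_grad()
    def sample(self, cond):
        # At test time, draw a latent from the prior and decode it.
        z = torch.randn(cond.shape[0], self.latent_dim)
        return self.dec(torch.cat([z, cond], -1))
```

At inference, only `sample` is used: different latent draws yield different plausible manipulation plans for the same input condition.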
The synthesizer adopts a two-stage optimization scheme: it first optimizes the MANO pose parameters to best fit the CAMS finger embedding, and then optimizes the contact effect to improve physical plausibility.
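The two-stage idea can be sketched as a gradient-based fitting loop: stage one fits pose parameters to a target embedding, and stage two adds a contact penalty. The embedding function, contact loss, and pose parameterization below are toy stand-ins, not the actual MANO-based synthesizer.

```python
import torch

def synthesize(target_embed, embed_fn, contact_loss_fn, steps=300, lr=1e-2):
    """Toy two-stage optimization: stage 1 fits the embedding,
    stage 2 additionally penalizes contact violations."""
    pose = torch.zeros_like(target_embed).requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for stage in (1, 2):
        for _ in range(steps):
            opt.zero_grad()
            # Fitting term: match the target CAMS-style embedding.
            loss = ((embed_fn(pose) - target_embed) ** 2).mean()
            if stage == 2:
                # Refinement term: improve physical plausibility.
                loss = loss + contact_loss_fn(pose)
            loss.backward()
            opt.step()
    return pose.detach()
```

With an identity embedding and a penalty on negative values, for example, the optimizer recovers a non-negative target pose; in the real system the fitting term instead goes through the hand model's forward kinematics.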
Qualitative results on Kettle, Laptop, Pliers, and Scissors (input sequence plus two additional views for each category).

Additional qualitative results on Laptop, Pliers, and Scissors.
Comparisons: Ours vs GraspTTA vs ManipNet (shown from two additional views).
@article{zheng2023cams,
title={CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis},
author={Zheng, Juntian and Zheng, Qingyuan and Fang, Lixing and Liu, Yun and Yi, Li},
journal={arXiv preprint arXiv:2303.15469},
year={2023}
}
If you have any questions, please feel free to contact us:
Lixing Fang: flx20@mails.tsinghua.edu.cn