3DBODY.TECH 2018 - Paper 18.154

K. Ino et al., "Grasping Hand Pose Estimation from RGB Images Using Digital Human Model by Convolutional Neural Network", in Proc. of 3DBODY.TECH 2018 - 9th Int. Conf. and Exh. on 3D Body Scanning and Processing Technologies, Lugano, Switzerland, 16-17 Oct. 2018, pp. 154-160, https://doi.org/10.15221/18.154.

Title:

Grasping Hand Pose Estimation from RGB Images Using Digital Human Model by Convolutional Neural Network

Authors:

Kentaro INO 1, Naoto IENAGA 1, Yuta SUGIURA 1, Hideo SAITO 1, Natsuki MIYATA 2, Mitsunori TADA 2

1 Keio University, Yokohama, Japan;
2 Digital Human Research Group, Human Informatics Research Institute, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan

Abstract:

Recently, there has been an increase in research on estimating hand poses from images. Because of the hand's high degree of freedom and self-occlusion, multi-view or depth images are often used. Our objective was to estimate hand poses specifically while grasping objects. When holding something, the hand moves in many directions. If the camera is too distant from the hand, the hand may move out of range, while widening the viewing angle reduces the resolution below usable limits. One possible solution was developed by Kashiwagi: by setting the camera on an object, the hand's pose can be estimated regardless of its position. However, Kashiwagi's method cannot be used without estimating the fingertips' positions. Recently, another method using a convolutional neural network (CNN), which is useful for estimating complex poses, has been proposed. Unfortunately, it is difficult to collect the large number of images with ground truth needed for learning. In this research, we focused on creating a large dataset by generating hand pose images in DhaibaWorks from a digital human model and motion-captured data. We evaluated the model by calculating the distance between the estimated pose and the ground truth on the test data, which was approximately 12.3 mm on average. In comparison, the average distance in related work was 18.5 mm. We also tested our method with ordinary camera images and confirmed that it can be used in the real world. Our method provides a new means of dataset generation: annotations are produced automatically with motion capture technology, which reduces the time required. In future work, we will improve the architecture of the CNN and shorten the execution time for real-time processing.
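
The evaluation described in the abstract (average distance between the estimated pose and the ground truth) can be illustrated with a minimal sketch. It assumes, without confirmation from the abstract, that a pose is a list of 3D joint positions in millimetres and that the reported figure is a mean of per-joint Euclidean distances averaged over test samples; the function names `mean_joint_distance` and `mean_error_over_dataset` are hypothetical and are not taken from the paper.

```python
import math


def mean_joint_distance(estimated, ground_truth):
    """Mean Euclidean distance between corresponding 3D joints.

    Assumption: each pose is a non-empty sequence of (x, y, z) tuples,
    and both poses list the joints in the same order and unit (e.g. mm).
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("poses must be non-empty and have the same joint count")
    total = sum(math.dist(p, q) for p, q in zip(estimated, ground_truth))
    return total / len(estimated)


def mean_error_over_dataset(estimated_poses, ground_truth_poses):
    """Average the per-pose error over all test samples."""
    if not estimated_poses or len(estimated_poses) != len(ground_truth_poses):
        raise ValueError("datasets must be non-empty and have the same size")
    errors = [
        mean_joint_distance(est, gt)
        for est, gt in zip(estimated_poses, ground_truth_poses)
    ]
    return sum(errors) / len(errors)


# Toy data only (not from the paper): two poses with two joints each.
toy_estimates = [[(0, 0, 0), (3, 4, 0)], [(1, 1, 1), (2, 2, 2)]]
toy_truth = [[(0, 0, 0), (0, 0, 0)], [(1, 1, 1), (2, 2, 2)]]
toy_error = mean_error_over_dataset(toy_estimates, toy_truth)
```

Under this reading, figures such as 12.3 mm and 18.5 mm are directly comparable only if both are computed over the same joint set and in the same coordinate frame.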

Details:

Full paper: 18154ino.pdf
Proceedings: 3DBODY.TECH 2018, 16-17 Oct. 2018, Lugano, Switzerland
Pages: 154-160
DOI: 10.15221/18.154

Copyright notice

© Hometrica Consulting - Dr. Nicola D'Apuzzo, Switzerland, www.hometrica.ch.
Reproduction of the proceedings or any parts thereof (excluding short quotations for use in the preparation of reviews and technical and scientific papers) may be made only after obtaining the specific approval of the publisher. The papers appearing in the proceedings reflect the authors' opinions. Their inclusion in these publications does not necessarily constitute endorsement by the editor or by the publisher. Authors retain all rights to individual papers.
