# H2O
For more detailed information, please visit our official website (https://taeinkwon.com/projects/h2o/).
# How to merge tar.part files?
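Each archive is split into parts. Concatenate the parts (shown here for `subject1_v1_1`), then extract the merged archive:

```shell
cat subject1_v1_1.tar.gz.parta* > subject1_v1_1.tar.gz
tar -xzf subject1_v1_1.tar.gz
```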
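
On systems without `cat` (for example, plain Windows), the same merge can be sketched in Python. This is an illustrative helper, not part of the dataset tooling; `merge_parts` is a hypothetical name. It relies on the part suffixes (`partaa`, `partab`, ...) sorting in the same order the shell glob expands them.

```python
import glob
import shutil


def merge_parts(archive):
    """Concatenate '<archive>.parta*' chunks into 'archive' in sorted order.

    Mirrors `cat archive.parta* > archive`: for these suffixes, sorted
    names give the same byte order as the shell glob.
    """
    parts = sorted(glob.glob(glob.escape(archive) + ".parta*"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {archive}")
    with open(archive, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, no full read into memory
    return parts
```

Usage: `merge_parts("subject1_v1_1.tar.gz")`, then extract the result as usual.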
# Visualization code (H2OPlayer)
The visualization code is available at https://github.com/taeinkwon/h2oplayer.
# Dataset Structure
Each subject directory (for example, `subject1`) is organized as follows:

```
.
├── h1
│ ├── 0
│ │ ├── cam0
│ │ │ ├── rgb
│ │ │ ├── depth
│ │ │ ├── cam_pose
│ │ │ ├── hand_pose
│ │ │ ├── hand_pose_MANO
│ │ │ ├── obj_pose
│ │ │ ├── obj_pose_RT
│ │ │ ├── action_label (only in cam4)
│ │ │ ├── rgb256 (only in cam4)
│ │ │ ├── verb_label
│ │ │ └── cam_intrinsics.txt
│ │ ├── cam1
│ │ ├── cam2
│ │ ├── cam3
│ │ └── cam4
│ ├── 1
│ ├── 2
│ ├── 3
│ └── ...
├── h2
├── k1
├── k2
└── ...
```
cam0 to cam3 are fixed cameras; cam4 is a head-mounted camera (egocentric view).
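
A small path helper for this layout, sketched under the assumption that all subject directories are extracted into one common root (the root path and the function name `modality_dir` are hypothetical). It follows the `subjectN/h1` naming of the split lists and encodes the rule from the tree that `action_label` and `rgb256` exist only for cam4.

```python
from pathlib import Path

# Per-camera folders listed in the Dataset Structure tree.
MODALITIES = {
    "rgb", "depth", "cam_pose", "hand_pose", "hand_pose_MANO",
    "obj_pose", "obj_pose_RT", "verb_label", "action_label", "rgb256",
}
# Folders that, per the tree, exist only for the head-mounted camera.
CAM4_ONLY = {"action_label", "rgb256"}
CAMERAS = {f"cam{i}" for i in range(5)}  # cam0-cam3 fixed, cam4 egocentric


def modality_dir(root, subject, seq, index, cam, modality):
    """Return root/subject/seq/index/cam/modality after validating names.

    seq is e.g. 'h1' (as in the 'subject1/h1' split ids); index is the
    numbered subfolder (0, 1, ...).
    """
    if cam not in CAMERAS:
        raise ValueError(f"unknown camera: {cam}")
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality}")
    if modality in CAM4_ONLY and cam != "cam4":
        raise ValueError(f"{modality} is only provided for cam4")
    return Path(root) / subject / seq / str(index) / cam / modality
```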
The sequences are split into training, validation, and test sets as follows:

```python
# Subjects 1, 2, 3
train_sequences = ['subject1/h1', 'subject1/h2', 'subject1/k1', 'subject1/k2', 'subject1/o1', 'subject1/o2',
                   'subject2/h1', 'subject2/h2', 'subject2/k1', 'subject2/k2', 'subject2/o1', 'subject2/o2',
                   'subject3/h1', 'subject3/h2', 'subject3/k1']
# Subject 3
val_sequences = ['subject3/k2', 'subject3/o1', 'subject3/o2']
# Subject 4
test_sequences = ['subject4/h1', 'subject4/h2', 'subject4/k1', 'subject4/k2', 'subject4/o1', 'subject4/o2']
```
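
The split lists follow a simple rule: subjects 1 and 2 are fully in training, subject 3 is divided between training and validation, and subject 4 is held out for testing. The sketch below restates that rule so a sequence id can be classified without copying the lists; `split_of` and `sequences` are hypothetical helper names, and the six per-subject ids are read off the lists.

```python
SUBJECTS = ["subject1", "subject2", "subject3", "subject4"]
SEQS = ["h1", "h2", "k1", "k2", "o1", "o2"]  # ids appearing in the split lists


def split_of(sequence):
    """Return 'train', 'val' or 'test' for an id such as 'subject3/k2'."""
    subject, seq = sequence.split("/")
    if subject not in SUBJECTS or seq not in SEQS:
        raise KeyError(sequence)
    if subject in ("subject1", "subject2"):
        return "train"
    if subject == "subject3":
        # Subject 3: h1, h2, k1 for training; k2, o1, o2 for validation.
        return "train" if seq in ("h1", "h2", "k1") else "val"
    return "test"  # subject4


def sequences(split):
    """List every sequence id in a split, in subject-then-sequence order."""
    return [f"{s}/{q}" for s in SUBJECTS for q in SEQS
            if split_of(f"{s}/{q}") == split]
```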
### rgb
RGB images at 1280 x 720 resolution.
### rgb256
RGB images resized to 455 x 256 resolution.
### depth
Depth images at 1280 x 720 resolution.
### cam_pose
Camera-to-world transformation (cam_to_world): 16 numbers forming a 4x4 camera matrix.
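
A minimal sketch for reading a `cam_pose` file and applying it to a point. It assumes (this is not stated above) that each file stores the 16 numbers as whitespace-separated text in row-major order, and that `cam_to_world` maps homogeneous camera-frame points to world coordinates. The helper names are hypothetical.

```python
def parse_matrix4(values):
    """Reshape 16 numbers into a 4x4 matrix, assuming row-major order."""
    if len(values) != 16:
        raise ValueError(f"expected 16 numbers, got {len(values)}")
    return [list(values[4 * r:4 * r + 4]) for r in range(4)]


def load_cam_pose(path):
    """Read a cam_pose file (assumed whitespace-separated text) as 4x4."""
    with open(path) as f:
        return parse_matrix4([float(tok) for tok in f.read().split()])


def transform_point(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point (x, y, z)."""
    vec = (*point, 1.0)
    out = [sum(matrix[r][c] * vec[c] for c in range(4)) for r in range(4)]
    return tuple(out[i] / out[3] for i in range(3))  # dehomogenize
```

Under these assumptions, `transform_point(load_cam_pose(path), joint)` would move a camera-frame joint into world coordinates.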
### hand_pose
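3D hand joint positions relative to the camera (cam_to_hand), 128 numbers in total: 1 annotation flag (0: not annotated, 1: annotated) + 21 * 3 (x, y, z of each joint, in order) for the left hand, followed by 1 + 21 * 3 for the right hand. The first 64 numbers belong to the left hand and the next 64 numbers to the right hand.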
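
A minimal parser for the 128-number layout described above. It assumes one frame's numbers are already loaded into a flat list (the on-disk file format is not specified here); `parse_hand_pose` is a hypothetical name. Hands whose flag is 0 should normally be skipped, since their joints are not annotated.

```python
NUM_JOINTS = 21
HAND_BLOCK = 1 + NUM_JOINTS * 3  # 64 numbers per hand: flag + 21 * (x, y, z)


def parse_hand_pose(values):
    """Split 128 numbers into per-hand annotation flags and joint lists."""
    if len(values) != 2 * HAND_BLOCK:
        raise ValueError(f"expected {2 * HAND_BLOCK} numbers, got {len(values)}")
    hands = {}
    for i, side in enumerate(("left", "right")):  # left hand comes first
        block = values[i * HAND_BLOCK:(i + 1) * HAND_BLOCK]
        coords = block[1:]
        joints = [tuple(coords[3 * j:3 * j + 3]) for j in range(NUM_JOINTS)]
        hands[side] = {"annotated": int(block[0]) == 1, "joints": joints}
    return hands
```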
### hand_pose_MANO
MANO hand parameters, 124 numbers in total: 1 annotation flag (0: not annotated, 1: annotated) + 3 translation values + 48 pose values + 10 shape values for the left hand, followed by 1 + 3 + 48 + 10 for the right hand. The first 62 numbers belong to the left hand and the next 62 numbers to the right hand.
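
The per-hand size follows from the field counts: 1 + 3 + 48 + 10 = 62. The sketch below slices one frame's flat list accordingly, assuming the numbers are already loaded (file format not specified here); `parse_hand_pose_mano` is a hypothetical name. In the usual MANO convention the 48 pose values are 16 axis-angle rotations (global plus 15 joints), but the sketch keeps them as a flat list.

```python
MANO_FIELDS = (("trans", 3), ("pose", 48), ("shape", 10))
MANO_BLOCK = 1 + sum(size for _, size in MANO_FIELDS)  # 62 numbers per hand


def parse_hand_pose_mano(values):
    """Split 124 numbers into per-hand flag, translation, pose and shape."""
    if len(values) != 2 * MANO_BLOCK:
        raise ValueError(f"expected {2 * MANO_BLOCK} numbers, got {len(values)}")
    hands = {}
    for i, side in enumerate(("left", "right")):  # left hand comes first
        block = values[i * MANO_BLOCK:(i + 1) * MANO_BLOCK]
        hand = {"annotated": int(block[0]) == 1}
        offset = 1
        for name, size in MANO_FIELDS:
            hand[name] = list(block[offset:offset + size])
            offset += size
        hands[side] = hand
    return hands
```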
### obj_pose
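Object pose relative to the camera (cam_to_obj), 64 numbers in total: 1 (object class) + 21 * 3 (x, y, z of each point, in order). The 21 points are 1 center, 8 corners, and 12 edge midpoints of the object's 3D bounding box.

| Class | Object |
|-------|--------|
| 0 | background (no object) |
| 1 | book |
| 2 | espresso |
| 3 | lotion |
| 4 | spray |
| 5 | milk |
| 6 | cocoa |
| 7 | chips |
| 8 | cappuccino |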
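
A sketch that unpacks the 64 numbers into named parts. It assumes the numbers are already loaded into a flat list and that the 21 points are stored in the order listed above (center, then corners, then edge midpoints); which corner is which is not specified here, so corners and midpoints stay as plain lists. `parse_obj_pose` is a hypothetical name.

```python
OBJECT_CLASSES = ["background", "book", "espresso", "lotion", "spray",
                  "milk", "cocoa", "chips", "cappuccino"]
NUM_POINTS = 21  # 1 center + 8 corners + 12 edge midpoints


def parse_obj_pose(values):
    """Split 64 numbers into the object class and its 21 keypoints."""
    if len(values) != 1 + NUM_POINTS * 3:
        raise ValueError(f"expected 64 numbers, got {len(values)}")
    class_id = int(values[0])
    points = [tuple(values[1 + 3 * k:4 + 3 * k]) for k in range(NUM_POINTS)]
    return {
        "class_id": class_id,
        "class_name": OBJECT_CLASSES[class_id],
        "center": points[0],
        "corners": points[1:9],
        "edge_midpoints": points[9:],
    }
```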
### obj_pose_RT
17 numbers: 1 (object class) + 16 numbers forming a 4x4 object pose matrix (rotation and translation).
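
A sketch for splitting the 17 numbers into the class id and the rigid transform. It assumes row-major order and the common `[R | t; 0 0 0 1]` layout, which is an assumption rather than something stated above; `parse_obj_pose_rt` is a hypothetical name.

```python
def parse_obj_pose_rt(values):
    """Split 17 numbers into class id, 4x4 matrix, 3x3 rotation, translation.

    Assumes row-major order with the [R | t; 0 0 0 1] layout.
    """
    if len(values) != 17:
        raise ValueError(f"expected 17 numbers, got {len(values)}")
    matrix = [list(values[1 + 4 * r:5 + 4 * r]) for r in range(4)]
    return {
        "class_id": int(values[0]),
        "matrix": matrix,
        "rotation": [row[:3] for row in matrix[:3]],  # upper-left 3x3
        "translation": [row[3] for row in matrix[:3]],  # last column
    }
```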
### verb_label
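| Label | Verb |
|-------|------|
| 0 | background (no verb) |
| 1 | grab |
| 2 | place |
| 3 | open |
| 4 | close |
| 5 | pour |
| 6 | take out |
| 7 | put in |
| 8 | apply |
| 9 | read |
| 10 | spray |
| 11 | squeeze |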
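
Per-frame verb ids are often easier to use as temporal segments. The sketch below groups consecutive equal ids with `itertools.groupby`, assuming the `verb_label` files yield one integer id per frame (an assumption; the file format is not described above). `verb_segments` is a hypothetical name.

```python
from itertools import groupby

VERBS = ["background", "grab", "place", "open", "close", "pour",
         "take out", "put in", "apply", "read", "spray", "squeeze"]


def verb_segments(frame_labels):
    """Group consecutive frames with the same verb id.

    Returns (start, end, verb_name) tuples with end exclusive.
    """
    segments = []
    start = 0
    for verb_id, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, VERBS[verb_id]))
        start += length
    return segments
```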
### action_label
Each action combines a verb (`verb_label`) with a noun (object class).

| Label | Action |
|-------|--------|
| 0 | background |
| 1 | grab book |
| 2 | grab espresso |
| 3 | grab lotion |
| 4 | grab spray |
| 5 | grab milk |
| 6 | grab cocoa |
| 7 | grab chips |
| 8 | grab cappuccino |
| 9 | place book |
| 10 | place espresso |
| 11 | place lotion |
| 12 | place spray |
| 13 | place milk |
| 14 | place cocoa |
| 15 | place chips |
| 16 | place cappuccino |
| 17 | open lotion |
| 18 | open milk |
| 19 | open chips |
| 20 | close lotion |
| 21 | close milk |
| 22 | close chips |
| 23 | pour milk |
| 24 | take out espresso |
| 25 | take out cocoa |
| 26 | take out chips |
| 27 | take out cappuccino |
| 28 | put in espresso |
| 29 | put in cocoa |
| 30 | put in cappuccino |
| 31 | apply lotion |
| 32 | apply spray |
| 33 | read book |
| 34 | read espresso |
| 35 | spray spray |
| 36 | squeeze lotion |
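
The action labels follow a fixed verb-then-object order, so each `action_label` can be decomposed into a `verb_label` id and an object class id. The sketch below re-encodes the action table group by group (the grouping is read off the table, not taken from an official mapping file) and rebuilds the same numbering; all names are hypothetical.

```python
OBJECTS = ["background", "book", "espresso", "lotion", "spray",
           "milk", "cocoa", "chips", "cappuccino"]
VERBS = ["background", "grab", "place", "open", "close", "pour",
         "take out", "put in", "apply", "read", "spray", "squeeze"]

# (verb, objects) groups in the order action labels are assigned.
ACTION_GROUPS = [
    ("grab", OBJECTS[1:]),
    ("place", OBJECTS[1:]),
    ("open", ["lotion", "milk", "chips"]),
    ("close", ["lotion", "milk", "chips"]),
    ("pour", ["milk"]),
    ("take out", ["espresso", "cocoa", "chips", "cappuccino"]),
    ("put in", ["espresso", "cocoa", "cappuccino"]),
    ("apply", ["lotion", "spray"]),
    ("read", ["book", "espresso"]),
    ("spray", ["spray"]),
    ("squeeze", ["lotion"]),
]


def build_action_table():
    """Map action label -> (verb label, object class); 0 is background."""
    table = {0: (0, 0)}
    action_id = 1
    for verb, objects in ACTION_GROUPS:
        for obj in objects:
            table[action_id] = (VERBS.index(verb), OBJECTS.index(obj))
            action_id += 1
    return table


ACTION_TABLE = build_action_table()


def action_name(action_id):
    """Return the readable name, e.g. 23 -> 'pour milk'."""
    if action_id == 0:
        return "background"
    verb_id, obj_id = ACTION_TABLE[action_id]
    return f"{VERBS[verb_id]} {OBJECTS[obj_id]}"
```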
## For Actions
- [Train set file](action_labels/action_train.txt)
- [Validation set file](action_labels/action_val.txt)
- [Test set file](action_labels/action_test.txt)
## For Poses
- [Train set file](pose_lists/pose_train.txt)
- [Validation set file](pose_lists/pose_val.txt)
- [Test set file](pose_lists/pose_test.txt)
### cam_intrinsics.txt
Six numbers: fx, fy, cx, cy, width, height (focal lengths and principal point in pixels, followed by the image width and height).
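
A sketch of a standard pinhole projection with these intrinsics. It assumes zero lens distortion and that 3D points (for example, joints from `hand_pose`) are expressed in the camera frame with z pointing forward; whether H2O uses exactly this convention is an assumption. `parse_intrinsics` and `project` are hypothetical names.

```python
def parse_intrinsics(text):
    """Parse 'fx fy cx cy width height' into a dict of floats."""
    fx, fy, cx, cy, width, height = (float(tok) for tok in text.split())
    return {"fx": fx, "fy": fy, "cx": cx, "cy": cy,
            "width": width, "height": height}


def project(point, intr):
    """Project a camera-frame 3D point to pixels: u = fx*x/z + cx, v = fy*y/z + cy.

    Returns (u, v, inside), where inside tells whether the pixel falls
    within the image bounds.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    u = intr["fx"] * x / z + intr["cx"]
    v = intr["fy"] * y / z + intr["cy"]
    inside = 0 <= u < intr["width"] and 0 <= v < intr["height"]
    return u, v, inside
```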
# Citations
If you find this dataset useful, please consider citing:
```
@InProceedings{Kwon_2021_ICCV,
author = {Kwon, Taein and Tekin, Bugra and St\"uhmer, Jan and Bogo, Federica and Pollefeys, Marc},
title = {H2O: Two Hands Manipulating Objects for First Person Interaction Recognition},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {10138-10148}
}
```