The 20BN-JESTER dataset is a large collection of densely labeled video clips that show humans performing predefined hand gestures in front of a laptop camera or webcam. The dataset was created by a large number of crowd workers. It allows for training robust machine learning models to recognize human hand gestures. It is available free of charge for academic research. Commercial licenses are available upon request.
A paper with supplementary material will be published soon on arXiv.
The video data is provided as one large TGZ archive, split into parts of at most 1 GB each. The total download size is 20.8 GB. The archive contains directories numbered from 1 to 148092. Each directory corresponds to one video and contains JPG images with a height of 100 px and variable width. The JPG images were extracted from the original videos at 12 frames per second. The filenames of the JPGs start at 00001.jpg. The number of JPGs per directory varies because the length of the original videos varies.
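The directory layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names `list_frames` and `clip_duration_seconds` are hypothetical, and it assumes the archive has already been extracted so that each numbered directory holds its zero-padded JPG frames.

```python
from pathlib import Path

FRAMES_PER_SECOND = 12  # extraction rate stated in the dataset description


def list_frames(video_dir):
    """Return the JPG frames of one video directory in playback order."""
    # Zero-padded names (00001.jpg, 00002.jpg, ...) sort correctly as strings.
    return sorted(Path(video_dir).glob("*.jpg"))


def clip_duration_seconds(num_frames, fps=FRAMES_PER_SECOND):
    """Approximate clip length implied by the number of extracted frames."""
    return num_frames / fps
```

For example, calling `list_frames` on the directory named `1` under your extraction root gives the frames of the first video, and passing the length of that list to `clip_duration_seconds` gives a rough estimate of the clip length.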
This dataset is made available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). It can be used for academic research free of charge. If you seek to use the data for commercial purposes, please contact us.
Please register or log in to download the dataset.
|Total number of videos||
|Test Set (w/o labels)||
|Class|Number of videos|
|Pushing Hand Away|5,434|
|Pulling Hand In|5,379|
|Sliding Two Fingers Left|5,345|
|Sliding Two Fingers Right|5,244|
|Sliding Two Fingers Down|5,410|
|Sliding Two Fingers Up|5,262|
|Pulling Two Fingers In|5,315|
|Pushing Two Fingers Away|5,358|
|Rolling Hand Forward|5,165|
|Rolling Hand Backward|5,031|
|Turning Hand Counterclockwise|4,181|
|Turning Hand Clockwise|3,980|
|Zooming In With Full Hand|5,307|
|Zooming Out With Full Hand|5,330|
|Zooming In With Two Fingers|5,355|
|Zooming Out With Two Fingers|5,379|
|Doing other things|12,416|
If you have created a model on the training set that performs well on the validation set, we encourage you to run it on the test set (which is published without any class labels, as you might have noticed). Please prepare a .csv file with the video's ID in the first column and your predicted class label in the second column (as a string matching the wording used in the training and validation sets). Use a semicolon as the separator. You can then upload your .csv file here (user login required) to be ranked on the leaderboard and to benchmark your approach against those of other machine learners. We look forward to your submission.
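The submission format described above can be produced with Python's standard `csv` module. This is a minimal sketch under stated assumptions: the function name `write_submission` is hypothetical, the predictions are assumed to be held in a dictionary mapping video IDs to label strings, and no header row is written because the description does not mention one.

```python
import csv


def write_submission(predictions, path):
    """Write one 'video id;predicted label' row per test video."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")  # semicolon separator, as requested
        # Assumption: no header row; the description does not mention one.
        for video_id, label in sorted(predictions.items()):
            writer.writerow([video_id, label])
```

A call such as `write_submission({1: "Pushing Hand Away", 2: "Pulling Hand In"}, "submission.csv")` (with made-up video IDs) would write two rows, ordered by video ID. The label strings should be copied from the training and validation label files rather than retyped, since the wording must match.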
TRN (CVPR'18 submission)
Ford's Gesture Recognition System
2D CNN (RGB)
Twenty Billion Neurons' Jester System
3D convolutional neural network
Just random guessing…