deep learning paper reading notes

creation date: 2018-01-11, latest update: 2018-01-11


Based on Awesome Deep Learning Papers, plus my own literature summaries

Famous Machine Learning Conferences

Famous Challenges / Datasets


Depth Estimation

Depth Fusion

RGB-D data and its usage

6D pose estimation





Body pose estimation


Big list of both body and hand datasets



Hand pose estimation

The most challenging part of this task is not the architecture but the lack of large, clean, public datasets.


list of more datasets here

Hand Papers

Most of the papers use depth-only or RGB+D data to estimate hand pose. It is probably possible to convert RGB to depth with another model first, but that extra stage would likely make the pipeline even slower.

Anomaly Detection (Images / Videos)

Anomaly Detection (Time Series)

Generative Adversarial Networks (GANs)

Style Transfers

Understanding / Generalization / Transfer

Optimization / Training Techniques

Unsupervised / Generative Models

CNN Feature Extractors

Image: Object Detection

Image: Segmentation

Image / Video / Etc

Natural Language Processing / RNNs

Speech / Other Domain

Reinforcement Learning / Robotics

Credit card fraud detection

Weather Classification

There is no agreed-upon public dataset, and very few DL papers are dedicated to the topic.

The common dataset used is 1 sunny/cloudy dataset with 10k images. Other recent papers 2 have constructed their own datasets, which are not open to the public yet. However, the BDD100K dataset also has a labeled weather attribute, so we should consider using that.
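
As a rough sketch of how the BDD100K weather attribute could be used, the snippet below counts images per weather label and selects image names for one class. It assumes the label file is a JSON list of per-image records with a `name` field and an `attributes.weather` field; the inline `SAMPLE_LABELS` records, the helper names, and the weather values are illustrative assumptions and should be checked against the downloaded release.

```python
import json
from collections import Counter

# Hypothetical sample mimicking the assumed label layout; with real data
# this would come from json.load() on the downloaded label file instead.
SAMPLE_LABELS = json.loads("""
[
  {"name": "a.jpg", "attributes": {"weather": "clear", "timeofday": "daytime"}},
  {"name": "b.jpg", "attributes": {"weather": "rainy", "timeofday": "night"}},
  {"name": "c.jpg", "attributes": {"weather": "clear", "timeofday": "night"}},
  {"name": "d.jpg", "attributes": {}}
]
""")


def get_weather(record):
    """Return the weather label of one record, or 'undefined' if missing."""
    return record.get("attributes", {}).get("weather", "undefined")


def weather_counts(labels):
    """Count how many images carry each weather label."""
    return Counter(get_weather(record) for record in labels)


def filter_by_weather(labels, weather):
    """Return the image names whose weather label equals `weather`."""
    return [record["name"] for record in labels if get_weather(record) == weather]


counts = weather_counts(SAMPLE_LABELS)
clear_images = filter_by_weather(SAMPLE_LABELS, "clear")
```

Checking the class counts first is useful here because weather labels tend to be heavily imbalanced, which affects how a classifier should be trained and evaluated.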

There are 3 types of models proposed thus far.

So far, the DL methods have outperformed traditional ones by a wide margin.

A new alternative would be to add sensor data (temperature/humidity) and ensemble a sensor-based model with the CNN model. For that matter, how accurate would predictions from sensor data alone be?
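
One simple way to try the ensemble idea is late fusion: average the class probabilities from the CNN and from a sensor-based classifier. The sketch below is only an illustration under assumptions; the function name `late_fusion`, the `cnn_weight` parameter, the class list, and the probability values are all hypothetical and not taken from any paper in these notes.

```python
CLASSES = ["sunny", "cloudy"]  # assumed two-class setup


def late_fusion(cnn_probs, sensor_probs, cnn_weight=0.7):
    """Weighted average of two class-probability vectors, renormalized."""
    if len(cnn_probs) != len(sensor_probs):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= cnn_weight <= 1.0:
        raise ValueError("cnn_weight must be in [0, 1]")
    fused = [
        cnn_weight * c + (1.0 - cnn_weight) * s
        for c, s in zip(cnn_probs, sensor_probs)
    ]
    total = sum(fused)
    return [p / total for p in fused]


def predict(probs):
    """Return the class name with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best]


# Made-up scores: the CNN is unsure, the sensor model favors "cloudy".
cnn_probs = [0.55, 0.45]
sensor_probs = [0.10, 0.90]

fused = late_fusion(cnn_probs, sensor_probs, cnn_weight=0.5)
fused_label = predict(fused)
sensor_only_label = predict(late_fusion(cnn_probs, sensor_probs, cnn_weight=0.0))
```

Setting `cnn_weight=0.0` scores the sensor model on its own, so the same evaluation loop can answer the sensor-only question and sweep the weight to see whether fusion helps at all.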

Autonomous driving

Face Detection

Own discovery of Research Papers

Other papers still unassorted

Articles and Videos

Classic Papers published before 2012

HW / SW / Dataset

Book / Survey / Review

Video Lectures / Tutorials / Blogs




  1. Caffe: Convolutional architecture for fast feature embedding, Y. Jia et al.

  2. SFace: An Efficient Network for Face Detection in Large Scale Variations (Megvii Inc. Face++)
    • A new dataset called 4K-Face is also introduced to evaluate face detection performance under extremely large scale variations.
    • The SFace architecture shows promising results on the new 4K-Face benchmarks.
    • In addition, the method runs at 50 frames per second (fps) with 80% AP on the standard WIDER FACE dataset, outperforming state-of-the-art algorithms by almost one order of magnitude in speed while achieving comparable accuracy.