Week 2

Learning Objectives

  • Implement the basic building blocks of ResNets in a deep neural network using Keras

  • Train a state-of-the-art neural network for image classification

  • Implement a skip connection in your network

  • Create a dataset from a directory

  • Preprocess and augment data using the Keras Sequential API

  • Adapt a pre-trained model to new data and train a classifier using the Functional API and MobileNet

  • Fine-tune a classifier's final layers to improve accuracy (a transfer-learning sketch follows this list)
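
A rough sketch of the workflow these objectives describe, assuming tf.keras and MobileNetV2; the dataset directory, image size, filter counts and the unfreeze cutoff are illustrative placeholders, not the assignment's exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = (160, 160)

# Create a dataset from a directory (hypothetical path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=32)

# Preprocess and augment data with the Keras Sequential API.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.2),
])

# Adapt a pre-trained MobileNetV2 to the new data with the Functional API.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False                       # freeze the pre-trained weights first

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = augment(inputs)
x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1)(x)                 # binary classifier head
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)              # train the new head on the frozen base

# Fine-tune: unfreeze the later layers of the base model at a lower learning rate.
base.trainable = True
for layer in base.layers[:100]:              # keep earlier layers frozen (cutoff is illustrative)
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)              # continue training with the unfrozen layers
```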

Video-wise Summary

  1. Why look at case studies?: CNNs have come a long way since Yann LeCun and his team introduced the concept in 1998. Studying the highly successful models developed since then helps us understand each architecture and the purpose of the techniques it incorporates.

  2. Classic Networks: The LeNet-5, AlexNet and VGG-16 architectures are discussed. LeNet-5 is covered first, where the pattern of increasing depth and decreasing image size across layers is observed: it is a simple CNN with two sets of conv and pooling layers, followed by a couple of fully connected layers and a softmax layer for classification. AlexNet is a much bigger network with more layers and noticeably better accuracy; Local Response Normalization (LRN) is also discussed there, but it isn't very helpful for improving accuracy. VGG-16 has 16 layers with weights and about 138 million parameters to train.

  3. ResNets: The skip connections used in ResNets are discussed. A residual block can learn the identity mapping far more easily than a plain multilayer net, so extra layers cannot hurt performance; this solves the accuracy-degradation problem seen in very deep plain networks, and deeper ResNets in fact give better accuracy and results, which aligns with intuition and theory (see the residual-block sketch after this list).

  4. Why ResNets Work?: The reason is given in the ResNets point above: because the added layers can easily learn the identity function, a deeper ResNet does no worse than its shallower counterpart and often does better.

  5. Networks in Networks and 1x1 Convolutions: The depth of the output volume equals the number of 1x1 filters (see the image for how the operation is done). A 1x1 convolution can shrink, expand, or keep the depth of the volume unchanged while leaving height and width untouched (a small shape sketch follows this list).

  6. Inception Network Motivation: Instead of committing to a single filter type/size at each step, an inception block applies several filter sizes in parallel and concatenates all the outputs into one activation volume, which (intuitively) extracts richer features and thus improves the model's accuracy. 1x1 convolutions are used in inception networks to reduce the computational cost drastically and thus save on training time; this is the bottleneck layer, where the depth of the data/image is first shrunk with 1x1 convolutions and then expanded again by the larger filters (an inception-block sketch follows this list).

  7. Inception Network: The full architecture of the Inception network (GoogLeNet) is discussed, along with the regularizing effect of its side branches, which try to classify objects from the activations of layers in the middle of the network.

  8. MobileNet: Designed for mobile phones and other devices with limited computational power, trading a little accuracy for far fewer computations. It replaces the standard convolution with a depthwise separable convolution, i.e. a depthwise convolution followed by a 1x1 pointwise convolution, which reduces the number of computations performed quite drastically (a depthwise-separable sketch follows this list).
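
A minimal residual (identity) block sketch in Keras, loosely following the ResNet idea of adding the block input back to its output; the filter counts and input shape are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    """Two conv layers whose output is added to the unchanged input (skip connection)."""
    shortcut = x                                    # save the input for the skip connection
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])                 # skip connection: output = F(x) + x
    return layers.Activation("relu")(x)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = identity_block(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
```

Because the block's output is F(x) + x, setting the conv weights near zero makes the block compute (roughly) the identity, which is why adding such blocks does not degrade accuracy.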
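A small shape sketch showing that the output depth equals the number of 1x1 filters: a 28x28x192 volume shrunk and then expanded again (filter counts are illustrative, not taken from the slides).

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.keras.Input(shape=(28, 28, 192))
shrunk = layers.Conv2D(filters=32, kernel_size=1, activation="relu")(x)          # 28x28x32
expanded = layers.Conv2D(filters=256, kernel_size=1, activation="relu")(shrunk)  # 28x28x256
print(shrunk.shape, expanded.shape)   # (None, 28, 28, 32) (None, 28, 28, 256)
```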
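A rough sketch of one inception-style block: parallel 1x1, 3x3, 5x5 and pooling branches concatenated along the channel axis, with 1x1 bottleneck convolutions in front of the larger filters to cut computation; the filter counts are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x):
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)

    b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(x)    # bottleneck
    b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)

    b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(x)    # bottleneck
    b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)

    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)

    return layers.Concatenate()([b1, b2, b3, b4])   # stack all branches depth-wise

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_block(inputs)                   # 28x28x(64+128+32+32)
model = tf.keras.Model(inputs, outputs)
```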
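A minimal sketch of a depthwise separable convolution as used in MobileNet: a per-channel (depthwise) 3x3 convolution followed by a 1x1 pointwise convolution that mixes channels; the shapes and filter counts are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(112, 112, 32))
x = layers.DepthwiseConv2D(kernel_size=3, padding="same", activation="relu")(inputs)  # one 3x3 filter per input channel
x = layers.Conv2D(filters=64, kernel_size=1, activation="relu")(x)                     # pointwise 1x1 conv mixes channels
model = tf.keras.Model(inputs, x)

# Compared with a standard Conv2D(64, 3), this needs roughly
# (3*3*32 + 32*64) multiplications per output position instead of 3*3*32*64.
```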

Programming Assignments

Important Slides/Pictures

ResNet
From Paper
Inception Network Architecture

Resources

LeNet-5 Paper
ResNet Paper
AlexNet Paper
VGG-16 Paper
GoogLeNet (Inception) Paper
CNN W2 Slides
