Week 1

Learning Objectives

  • Explain the convolution operation

  • Apply two different types of pooling operations

  • Identify the components used in a convolutional neural network (padding, stride, filter, ...) and their purpose

  • Build a convolutional neural network

  • Implement convolutional and pooling layers in numpy, including forward propagation

  • Implement helper functions to use when implementing a TensorFlow model

  • Create a mood classifier using the TF Keras Sequential API

  • Build a ConvNet to identify sign language digits using the TF Keras Functional API

  • Build and train a ConvNet in TensorFlow for a binary classification problem

  • Build and train a ConvNet in TensorFlow for a multiclass classification problem

  • Explain different use cases for the Sequential and Functional APIs

Video-wise Summary

  1. Computer Vision: Image classification and object detection are framed as supervised learning problems. A normal (fully connected) neural network mapped to every pixel of a large image would make a terrible classifier, with millions of connections within a couple of layers.

  2. Edge Detection Example: The convolution operation and its notation were introduced to us. An example of a filter/kernel capable of detecting vertical edges in an image illustrated how convolutions can extract higher-level information from low-level image data, which is why they prove so effective in computer vision. Note: convolution in DL is technically cross-correlation (the kernel is not flipped).
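The vertical-edge example from the lecture can be sketched in numpy. This is a minimal illustration (the naive loop version, not a vectorized or framework implementation); the image values and function name are my own, but the 3x3 vertical-edge filter is the one from the video:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' cross-correlation — what DL frameworks call convolution."""
    h, w = image.shape
    f, _ = kernel.shape
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

# 6x6 image: bright left half, dark right half -> one vertical edge in the middle
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

# Vertical-edge filter from the lecture
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

out = conv2d(image, kernel)
print(out)  # large values (30) in the middle columns mark the edge
```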

  3. More Edge Detection: More edge-detection examples, including horizontal edges and edges at an angle, plus a look at how backprop works in CNNs (the filter values can be learned rather than hand-designed).

  4. Padding: The padding operation, used to retain the size of a matrix while convolving it, was discussed. Another reason for padding: without it "you're throwing away a lot of the information near the edge of the image". Odd-dimension filters are preferred because "when you have an odd dimension filter, such as three by three or five by five, then it has a central position and sometimes in computer vision it's nice to have a distinguisher, it's nice to have a pixel you can call the central pixel, so you can talk about the position of the filter." Two types of padding are most widely used when building models: 'valid' padding (no padding) and 'same' padding (padding chosen so the output retains the shape of the input tensor/matrix).
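A quick sketch of both ideas with `np.pad` — for 'same' padding with stride 1 and an odd f x f filter, the amount is p = (f - 1) / 2 (the example array is just an assumption for illustration):

```python
import numpy as np

# 'Same' padding for an odd f x f filter with stride 1: p = (f - 1) // 2
f = 3
p = (f - 1) // 2

x = np.arange(25, dtype=float).reshape(5, 5)
x_padded = np.pad(x, pad_width=p, mode="constant", constant_values=0)

print(x.shape)         # (5, 5)
print(x_padded.shape)  # (7, 7): a 'valid' conv with a 3x3 filter now gives 5x5 back
```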

  5. Strided Convolutions: Strides in the convolution operation were discussed; with stride s the filter jumps s positions at a time, shrinking the output accordingly.
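The general output-size formula from the lectures, with padding and stride together, can be written as a one-liner (function name is mine):

```python
import math

def conv_output_size(n, f, p, s):
    """Output height/width of a convolution: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(7, 3, 0, 2))  # 7x7 input, 3x3 filter, stride 2 -> 3
print(conv_output_size(6, 3, 1, 1))  # 'same' padding keeps 6 -> 6
```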

  6. Convolution over Volume: Images normally have multiple channels (depth), so the filters are also 3D matrices/arrays, and the sum over all channels of the element-wise products gives a 2D output matrix. # channels in the input tensor = # channels in the filter. Using multiple 3D kernels on the input tensor/image gives multiple channels in the output tensor: # of filters = depth of the output matrix/tensor.
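A minimal numpy sketch of convolving over a volume (naive loops, valid padding, stride 1 assumed). It makes the two rules above concrete: the filter's channel count must match the input's, and the number of filters becomes the output depth:

```python
import numpy as np

def conv_volume(x, filters):
    """x: (H, W, C); filters: (f, f, C, K) -> output: (H-f+1, W-f+1, K)."""
    H, W, C = x.shape
    f, _, Cf, K = filters.shape
    assert C == Cf, "# input channels must equal # filter channels"
    out = np.zeros((H - f + 1, W - f + 1, K))
    for k in range(K):                      # one output channel per filter
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, k] = np.sum(x[i:i+f, j:j+f, :] * filters[..., k])
    return out

x = np.random.rand(6, 6, 3)        # e.g. a 6x6 RGB image
W = np.random.rand(3, 3, 3, 2)     # two 3x3x3 filters
print(conv_volume(x, W).shape)     # (4, 4, 2): # of filters = output depth
```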

  7. One Layer of a CNN: One layer = convolving the input with the filters (the filter values are the weights), adding a bias to each filter's output, then applying an activation function to the result — analogous to Z = WA + b, A = g(Z) in a fully connected layer. (See pic1 for details.)
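The forward pass of one conv layer can be sketched as below (a simplified single-example, valid-padding, stride-1 version of what Assignment 1 builds; ReLU is assumed as the activation):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def conv_layer_forward(a_prev, W, b):
    """a_prev: (H, W_in, C); W: (f, f, C, K); b: (1, 1, K) -> A: (H-f+1, W_in-f+1, K)."""
    H, W_in, C = a_prev.shape
    f, _, _, K = W.shape
    Z = np.zeros((H - f + 1, W_in - f + 1, K))
    for k in range(K):
        for i in range(Z.shape[0]):
            for j in range(Z.shape[1]):
                # convolve, then add this filter's bias
                Z[i, j, k] = np.sum(a_prev[i:i+f, j:j+f, :] * W[..., k]) + b[0, 0, k]
    return relu(Z)  # activation gives the layer's output A

a = np.random.rand(6, 6, 3)
W = np.random.randn(3, 3, 3, 4)
b = np.zeros((1, 1, 4))
print(conv_layer_forward(a, W, b).shape)  # (4, 4, 4)
```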

  8. Simple CNN Example: A very simple, primitive CNN was worked through, with quantitative details on the sizes, channels, and kernels. One takeaway: you typically start with larger images, e.g. 39 by 39; the height and width stay roughly the same for a while and then gradually trend down as you go deeper in the network. CNNs predominantly use convolution layers, pooling layers, and fully connected layers (the latter mostly at the end, for classification with softmax).

  9. Pooling Layers: I learnt about the pooling operation, both max and average pooling (pretty elementary ones), which is used to shrink the tensor and help the conv layers extract better higher-level features. Pooling layers have only hyperparameters (filter size, stride), no trainable parameters, and padding is usually not used. Max pooling is used more often than average pooling (why?:- cause it was found to work better 😏). The size-reduction formula for pooling layers is in pic2.
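Both pooling modes on a single 2D slice can be sketched in a few lines (the same output-size formula applies, with p = 0; the example array and function name are mine):

```python
import numpy as np

def pool2d(x, f=2, stride=2, mode="max"):
    """Max or average pooling on a 2D slice; no padding, no trainable parameters."""
    H, W = x.shape
    out_h = (H - f) // stride + 1
    out_w = (W - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1., 3., 2., 1.],
              [2., 9., 1., 1.],
              [1., 3., 2., 3.],
              [5., 6., 1., 2.]])
print(pool2d(x, mode="max"))  # [[9. 2.] [6. 3.]]
print(pool2d(x, mode="avg"))  # [[3.75 1.25] [3.75 2.  ]]
```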

  10. CNN Example: An example of a typical CNN (the LeNet-5 architecture) was given (pic3) and explained to us, and its various features were revised. Normally the # of channels keeps increasing through a CNN, while the activation size falls at a steady, not-too-fast rate for good accuracy/performance. Also, one conv layer plus its pooling layer may be counted as a single layer in DL literature, i.e. papers.

  11. Why Convolutions?: We addressed why CNNs work better than ANNs with FC (fully connected) layers in computer vision. The main points are parameter sharing and sparsity of connections. Also learned that CNNs have translation invariance (but lack invariance to rotation or skew). The Deep Learning Book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville offers a more comprehensive explanation, pages 329 to 335: https://www.deeplearningbook.org/contents/convnets.html
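The parameter-sharing point is easy to see with a back-of-the-envelope count (the 32x32x3 → 28x28x6 shapes are an assumed illustration, the kind of comparison made in the video):

```python
# A fully connected layer from a 32x32x3 input to a 28x28x6 output:
fc_params = (32 * 32 * 3) * (28 * 28 * 6)   # one weight per input-output pair

# The same output produced by six 5x5x3 conv filters (+ 1 bias each):
conv_params = (5 * 5 * 3 + 1) * 6

print(fc_params)    # ~14.5 million
print(conv_params)  # 456
```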

Programming Assignments

Assignment 1: Convolutional Model, Step by Step. Building ConvNets from scratch using numpy. https://github.com/zestyoreo/Coursera_Courses/tree/main/DL%20Specialization/Course3%20(Convolutional%20Neural%20Networks)/W1/A1

Assignment 2: Convolution Model Application. Building ConvNets to create a mood classifier and identify sign language digits, with the TF Keras Sequential and Functional APIs. https://github.com/zestyoreo/Coursera_Courses/tree/main/DL%20Specialization/Course3%20(Convolutional%20Neural%20Networks)/W1/A2

Important Slides/Pictures

Resources

LeNet-5 Paper - Yann LeCun et al.
CNN W1 Quiz
CNN W1 Slides

The presentation we used @ the MLoc weekly technical meet.

CNN Presentation
