Few-Shot Learning (2/3): Siamese Networks
Summary
TL;DR: This lecture explores Siamese networks, a type of deep learning model used for one-shot learning. It discusses two training methods: learning pairwise similarity scores and triplet loss. The first method involves creating positive and negative sample pairs to teach the network to distinguish between same and different classes. The second method uses triplets of images (anchor, positive, and negative samples) to refine feature extraction, aiming to minimize intra-class variation and maximize inter-class separation. The lecture concludes with the application of these networks for one-shot prediction, where the model must classify new samples based on a small support set.
Takeaways
- **Siamese Networks**: The lecture introduces Siamese networks, which are analogous to Siamese twins, physically connected but with separate bodies, used for learning similarities and differences between data samples.
- **Pairwise Similarity Training**: The first method for training Siamese networks involves learning pairwise similarity scores using positive and negative sample pairs, where positive pairs are of the same class and negative pairs are of different classes.
- **Data Preparation**: For training, a large dataset is required with labeled classes, from which positive and negative samples are prepared to teach the model about sameness and difference.
- **Positive Samples**: Positive samples are created by randomly sampling two images from the same class, labeling them as '1' to indicate they belong to the same category.
- **Negative Samples**: Negative samples are constructed by sampling one image from one class and another from a different class, labeling them as '0' to signify they are of different categories.
- **Convolutional Neural Network (CNN)**: A CNN is used for feature extraction, with layers including convolutional, pooling, and a flatten layer, to transform input images into feature vectors.
- **Feature Vector Comparison**: The network outputs two feature vectors from two input images, which are then compared by calculating the absolute difference, resulting in a vector z that represents their similarity.
- **Loss Function and Backpropagation**: The loss function measures the difference between the predicted similarity score and the actual label, using cross-entropy. Backpropagation is employed to update the model parameters to minimize this loss.
- **Model Update Process**: Both the convolutional and fully connected layers of the network are updated during training, with gradients flowing from the loss function to the dense layers and then to the convolutional layers.
- **Triplet Loss Method**: An alternative training method involves triplet loss, where an anchor, a positive sample, and a negative sample are used to encourage the network to learn a feature space where intra-class distances are small and inter-class distances are large.
- **One-Shot Prediction**: After training, the Siamese network can be used for one-shot prediction, where the model makes predictions based on a support set and a query image, identifying the class of the query by comparing it to the support set samples.
Q & A
What is the analogy behind the name 'Siamese Network'?
-The name 'Siamese Network' is an analogy to 'Siamese twins', where two individuals are physically connected. In the context of neural networks, it refers to the structure where two identical subnetworks share weights and are used to process two input samples.
How are positive samples defined in the training of a Siamese network?
-Positive samples in a Siamese network are defined as pairs of images that belong to the same class. These samples are used to teach the network to recognize similarities between items of the same category.
What role do negative samples play in training a Siamese network?
-Negative samples are pairs of images from different classes. They are used to teach the network to distinguish between different categories, ensuring that the network learns to identify dissimilarities.
How does the convolutional neural network contribute to feature extraction in a Siamese network?
-The convolutional neural network in a Siamese network is responsible for extracting feature vectors from the input images. It processes the images through convolutional and pooling layers, and outputs feature vectors that are then used to calculate similarity scores.
What is the significance of the feature vectors h1 and h2 in a Siamese network?
-The feature vectors h1 and h2 represent the outputs of the convolutional neural network for two input images. The difference between these vectors is used to calculate a similarity score, which is a key aspect of the Siamese network's functionality.
Why is the output of a Siamese network a scalar value between 0 and 1?
-The output of a Siamese network is a scalar value between 0 and 1 because it represents the similarity score between two input images. A value close to 1 indicates high similarity (same class), while a value close to 0 indicates low similarity (different classes).
What is the purpose of the sigmoid activation function in the output layer of a Siamese network?
-The sigmoid activation function in the output layer of a Siamese network is used to squash the output into a value between 0 and 1, which corresponds to the probability of the two inputs being from the same class.
How does the triplet loss method differ from the pairwise similarity method in training a Siamese network?
-The triplet loss method involves training the network using three images: an anchor, a positive sample, and a negative sample. The goal is to minimize the distance between the anchor and positive sample while maximizing the distance to the negative sample, unlike the pairwise method which focuses on pairs of images.
What is the concept of 'one-shot prediction' in the context of Siamese networks?
-One-shot prediction refers to the ability of a Siamese network to make predictions based on very few examples, typically just one. This is particularly useful in few-shot learning scenarios where the model must generalize from limited data.
How does the support set assist in one-shot prediction with a Siamese network?
-The support set provides a set of examples from known classes that the network can use for comparison when making predictions about a new, unseen query image. This allows the network to find the most similar class to the query, even if it was not present in the training data.
Outlines
Introduction to Siamese Networks
This paragraph introduces the concept of Siamese networks, drawing an analogy with Siamese twins to explain the connection between two identical subnetworks. It previews two methods for training Siamese networks: learning pairwise similarity scores and triplet loss. For the first method, a large dataset with labeled classes is used to prepare positive and negative samples. Positive samples are pairs of images from the same class, labeled '1', while negative samples are pairs from different classes, labeled '0'. The paragraph also describes the construction of a convolutional neural network for feature extraction, which outputs feature vectors from two input images. The feature vectors are then processed to calculate a similarity score between the two inputs, which should be close to '1' for the same class and close to '0' for different classes.
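The pair-construction procedure described in this paragraph can be sketched in a few lines. This is a minimal illustration, assuming the dataset is stored as a dict mapping class names to lists of images; `make_pairs` and that layout are hypothetical conveniences, not code from the lecture:

```python
import random

def make_pairs(dataset, num_pairs):
    """Build positive (label 1) and negative (label 0) training pairs.

    `dataset` is an assumed dict mapping class name -> list of images;
    any stand-in for the images (e.g. file paths) works here.
    """
    classes = list(dataset)
    pairs = []
    for _ in range(num_pairs // 2):
        # Positive pair: two images sampled from the same class, labeled 1.
        c = random.choice(classes)
        a, b = random.sample(dataset[c], 2)
        pairs.append((a, b, 1))
        # Negative pair: one image each from two different classes, labeled 0.
        c1, c2 = random.sample(classes, 2)
        pairs.append((random.choice(dataset[c1]), random.choice(dataset[c2]), 0))
    return pairs
```

Balancing positives and negatives as above mirrors the lecture's advice to prepare the same number of each.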
Training Siamese Networks with Pairwise Similarity
This paragraph delves into the training process of Siamese networks using pairwise similarity. It explains how the network is trained using positive and negative samples, with the goal of minimizing the difference between the predicted scalar and the target label using a loss function, such as cross-entropy. The training involves updating the model parameters through backpropagation and gradient descent. The paragraph also describes the structure of the Siamese network, which consists of a shared convolutional neural network for feature extraction and fully connected layers that process the difference between feature vectors to output a scalar similarity score.
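A minimal sketch of the pairwise branch described here: a shared extractor f, the element-wise absolute difference z, a dense layer, a sigmoid, and a cross-entropy loss. The random tanh projection standing in for the real CNN is purely an assumption made so the sketch is runnable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the shared CNN f: a fixed random projection from a
# flattened 28x28 image to a 64-dim feature vector (an assumption; the
# lecture uses convolutional and pooling layers here).
W_f = rng.normal(size=(64, 784)) / 28.0

def f(x):
    # Both branches call the SAME f with the SAME weights (shared extractor).
    return np.tanh(W_f @ x)

w, b = rng.normal(size=64), 0.0  # dense layer mapping z to a scalar

def similarity(x1, x2):
    z = np.abs(f(x1) - f(x2))                   # element-wise |h1 - h2|
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))   # sigmoid -> score in (0, 1)

def bce_loss(score, target):
    # Cross-entropy between the predicted score and the 0/1 target label.
    eps = 1e-12
    return -(target * np.log(score + eps) + (1 - target) * np.log(1 - score + eps))
```

Note that for identical inputs z is the zero vector, so the score is exactly sigmoid(b); training would push scores toward 1 for positive pairs and 0 for negative pairs by backpropagating gradients of `bce_loss` through both the dense layer and the shared extractor.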
Triplet Loss for Siamese Network Training
The third paragraph introduces the triplet loss method for training Siamese networks, which involves selecting three images as a training sample: an anchor, a positive sample from the same class as the anchor, and a negative sample from a different class. The paragraph explains how the convolutional neural network extracts feature vectors from these images and calculates the squared L2 distance between the positive sample and the anchor (d_positive) and between the negative sample and the anchor (d_negative). The goal is to minimize d_positive while maximizing d_negative, with a margin (alpha) to ensure that the negative distance is significantly larger than the positive distance, indicating that the network can distinguish between different classes.
Applying Triplet Loss in Feature Space
This paragraph further discusses the application of triplet loss in the feature space. It explains the concept of encouraging the positive distance (between the anchor and positive sample) to be small and the negative distance (between the anchor and negative sample) to be large, with a margin (alpha) to ensure proper classification. The loss function is defined such that if the negative distance is not sufficiently larger than the positive distance plus the margin, a loss is incurred. This approach helps in updating the model parameters to better separate feature vectors of different classes in the feature space.
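The margin-based loss described in this and the previous paragraph can be written in a few lines on already-extracted feature vectors. `triplet_loss` is an illustrative name and `alpha=0.2` an illustrative default, not values fixed by the lecture:

```python
import numpy as np

def triplet_loss(h_anchor, h_pos, h_neg, alpha=0.2):
    """max(d_positive + alpha - d_negative, 0) on extracted feature vectors.

    d_positive and d_negative are squared L2 distances in the feature
    space, matching the lecture's definitions; alpha is the margin
    hyperparameter that d_negative must exceed d_positive by.
    """
    d_positive = np.sum((h_anchor - h_pos) ** 2)
    d_negative = np.sum((h_anchor - h_neg) ** 2)
    return max(d_positive + alpha - d_negative, 0.0)
```

When the negative is already farther from the anchor than the positive by more than alpha, the loss is zero and no gradient flows; otherwise minimizing the loss pulls the positive pair together and pushes the negative pair apart.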
One-Shot Prediction with Siamese Networks
The final paragraph discusses the application of trained Siamese networks for one-shot prediction. It explains how the network can be used to classify a query image based on a support set containing classes not present in the training set. The process involves comparing the query image with images in the support set to find similarity scores. The paragraph also summarizes the two methods of training Siamese networks: using pairwise similarity scores and triplet loss. It concludes by emphasizing the importance of the support set in providing additional information for classifying queries that do not appear in the training set.
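The nearest-support-class prediction step for the triplet-trained network might look like this sketch (function and variable names are hypothetical):

```python
import numpy as np

def predict_one_shot(query_feat, support_feats):
    """Return the support-set class whose feature vector is nearest
    (squared L2 distance) to the query's feature vector."""
    distances = {cls: float(np.sum((query_feat - h) ** 2))
                 for cls, h in support_feats.items()}
    return min(distances, key=distances.get)
```

With the pairwise-score variant, one would instead take the argmax of the similarity scores rather than the argmin of the distances; the comparison loop over the support set is otherwise the same.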
Keywords
Siamese Networks
Pairwise Similarity
Convolutional Neural Network (CNN)
Feature Vector
Backpropagation
Gradient Descent
Loss Function
Triplet Loss
One-Shot Learning
Support Set
Highlights
Introduction to Siamese networks, inspired by the physical connection of Siamese twins.
Two training methods for Siamese networks: learning pairwise similarity scores and triplet loss.
Explanation of positive and negative samples for training, with examples of tigers, cars, and elephants.
Building a convolutional neural network for feature extraction, including convolutional and pooling layers.
Training the neural network with prepared pairs and labels to output feature vectors.
Emphasis on the shared convolutional unit for feature extraction in Siamese networks.
Calculation of the difference vector (z) between feature vectors and its processing through dense layers.
Use of the sigmoid activation function to obtain a similarity score between 0 and 1.
Network structure analogy to Siamese twins, highlighting the connection and individuality.
Loss function definition using cross-entropy for training with positive and negative samples.
Backpropagation and gradient descent for updating model parameters in both convolutional and dense layers.
One-shot prediction using the trained Siamese network with a support set and query image.
Triplet loss method for training, involving an anchor, a positive sample, and a negative sample.
Definition of the triplet loss function with a margin (alpha) to encourage class separation.
Feature space explanation with examples of anchor, positive, and negative samples.
Training objective to minimize the positive distance and maximize the negative distance in feature space.
Application of the trained Siamese network for one-shot prediction with a new query and support set.
Summary of few-shot learning challenges and the role of the support set in prediction.
Conclusion and transition to the next lecture on pretraining and fine-tuning for few-shot learning.
Transcripts
In this lecture we study Siamese networks. The name "Siamese" is an analogy to Siamese twins: two babies born physically connected to each other. I will introduce two ways of training Siamese networks. The first method is learning pairwise similarity scores; you can read the two papers listed below for more details.

The Siamese network needs to be trained on a big dataset. The data are labeled, and each class contains many samples. Using the training set, we need to prepare positive samples and negative samples. Positive samples tell the model what kinds of things are the same; negative samples tell it what things are different.

Positive samples are obtained in this way. Randomly sample an image from the training set; for example, we get this tiger. Then sample another image from the same class; we get another tiger. This is a positive sample pair, and we label it as 1, meaning the two are of the same kind. We can do the same to get two cars and two elephants; the labels of these pairs are ones.

Negative samples are constructed in this way. Randomly sample an image from the entire training set; for example, we get this car. Then exclude the car class and randomly sample an image from the rest of the training set; for example, we get this elephant. Label the pair as 0; zero means the two images are different. Do the same to get more negative sample pairs, such as the husky and tiger pair and the elephant and cow pair, and label the negative pairs as zeros.
Let's build a convolutional neural network for feature extraction. The network can have convolutional layers, pooling layers, and a flatten layer. The input is an image; denote the input image by x. The output is a feature vector; denote it by f(x).

Let's train the neural network using the training data we have prepared. We have prepared many pairs, such as the two tigers, as well as their labels. The tigers are from the same class, so the label of this pair is 1. The two tigers are the input of the neural network. The convolutional neural network we built a minute ago is denoted by the function f. The network outputs two feature vectors extracted from the two input images, denoted by h1 and h2. I want to emphasize that there is only one convolutional network for feature extraction: the two f's are the same neural network we built.

Then calculate h1 minus h2. The result is a vector; take the absolute value of every entry, and let the result be the vector z. z is the difference between the two feature vectors. Then use several dense layers to process the vector z; the output of these layers is a scalar. Finally, apply the sigmoid activation function to the scalar and obtain a number between 0 and 1. The final output measures the similarity between the two inputs. If the two input images are from the same class, the output should be close to 1; if they are from different classes, the output should be close to 0.

By looking at the network structure, you can easily understand why it is called a Siamese network: Siamese twins are connected to each other. In the figure, the twins have their own bodies, but their heads are connected.
We have previously prepared the label: the two images are both tigers, so the label is 1. We set 1 as the target, and we hope the scalar output by the network is close to the target 1. We use a loss function to measure the difference between the target and the predicted scalar. The loss can be the cross-entropy of the target and the prediction, which measures the difference between the two. Having the loss, we can use backpropagation to calculate the gradients, then perform gradient descent to update the model parameters.

The model has two parts. One is the convolutional neural network, denoted by f, which extracts features from the input images; note that the two f's are exactly the same convolutional network with the same parameters. The other part is the fully connected layers, which map the vector z to a scalar between 0 and 1. During training, both parts are updated using backpropagation. The gradient flows from the loss function to the vector z and to the parameters of the dense layers. Knowing the gradient of the loss with respect to the dense layers' parameters, we can update those parameters by gradient descent. We then further propagate the gradient from the vector z to the convolutional network f and use it to update the parameters of the convolutional layers. To this end, we have performed one round of updates.

To train the model, we prepare the same number of positive samples and negative samples. A negative sample means the two images are different objects; a negative pair is labeled as 0. We hope the prediction by the network is close to 0, which means the network knows the two inputs are different. Then do the same as before: propagate the gradient from the loss to the dense layers and the convolutional layers to update the model parameters.
After training the model, we can use it for one-shot prediction. In this example, the support set is 6-way, 1-shot: there are six classes, and each class has one sample. Note that the six classes are not in the training set, which is why few-shot learning is difficult. Now we have a query. We know the query must belong to one of the six classes in the support set, so we need to choose one out of the six. We can compare the query image with the images in the support set one by one. Taking the query and the fox as input, the Siamese network predicts a score between 0 and 1; this tells us the similarity between the query and the fox is 0.2. Then compare the query image with the squirrel and get a similarity score of 0.9. Do the same to find all the similarity scores, then identify the largest among them. We find the query most similar to the squirrel, with a similarity score of 0.9, so we predict that the query is a squirrel.

We have now trained the Siamese network for computing pairwise similarity scores. Next, let's study another method for training the Siamese network: the triplet loss.
We prepare the training data in a different way. From such a training set, each time we select three images as one training sample. It works in three steps. First, randomly select an image from the entire training set as the anchor; for example, this tiger is selected and becomes the anchor. Record the anchor. Second, from the same class, randomly select an image as the positive sample; we get another tiger. Record the positive sample; the positive sample and the anchor are from the same class. Lastly, exclude the tiger class and randomly sample an image from the rest of the training set as the negative sample; we happen to get this elephant. Record the negative sample; the negative sample and the anchor are from different classes.

To this end, we have an anchor x_a, a positive sample x_pos, and a negative sample x_neg. Feed the three images to the convolutional neural network f. Although the name is Siamese network, there is only one convolutional neural network: the three f's are the same network. The convolutional neural network extracts three feature vectors from the three images.
Calculate the distance between the positive sample and the anchor in the feature space: let d_positive be the squared L2 norm of f(x_pos) - f(x_a). We do the same to find the distance between the negative sample and the anchor: let d_negative be the squared L2 norm of f(x_a) - f(x_neg).

We hope the learned network f has the following property: feature vectors from the same class are nearby, while feature vectors from different classes are well separated. Thus d_positive should be small, because the positive sample and the anchor belong to the same class, and d_negative should be large, because the negative sample and the anchor are from different classes.

I want to reiterate the relation among the three samples. This is the feature space; the convolutional neural network maps images to feature vectors. This is the anchor, a tiger image; its feature vector is the red dot. This is the positive sample, another tiger; its feature vector is the green dot. The squared distance between these two feature vectors is d_positive, and we hope it is as small as possible. The elephant is the negative sample; it is from a different class, and its feature vector is the blue dot, extracted by the same convolutional neural network. Let d_negative be the squared distance between the blue and red feature vectors; it measures how different the negative sample is from the anchor. We hope d_negative is as large as possible. d_negative should be much larger than d_positive; otherwise the model cannot distinguish between the tiger and the elephant.
Based on the idea we discussed, let's define the loss function. d_positive is the squared L2 distance between the positive sample and the anchor in the feature space. Intuitively, the feature vectors of two tigers should be close, so we encourage d_positive to be small. d_negative is the squared L2 distance between the negative sample and the anchor in the feature space. We hope the feature vectors of different classes are far apart; the feature vector of an elephant should be far from the tigers', so we encourage d_negative to be big.

We can define a margin alpha. Alpha is positive; it is a tuning hyperparameter. Ideally, d_negative is big and d_positive is small. In our example, d_negative is the distance between an elephant and a tiger, while d_positive is the distance between two tigers, so d_negative should be much larger than d_positive. If d_negative is greater than d_positive by a margin of alpha, then we believe the classification is correct, and the loss is zero. If the condition is not satisfied, which means d_negative is not sufficiently larger than d_positive, then we consider it a failure: the model cannot tell the difference between an elephant and a tiger, and there should be a loss. Let the loss be d_positive plus alpha minus d_negative. We encourage the loss to be small: a small loss means d_positive is small, so the two tigers are close in the feature space, and it also means d_negative is big, so the tiger and elephant can be well separated.

In sum, we define the loss function as follows. If d_positive + alpha - d_negative is greater than zero, then that quantity is the loss; it means we cannot distinguish between an elephant and a tiger. Otherwise, if d_positive + alpha - d_negative is less than zero, which means the classification is right, then there is no loss; the loss is simply zero. Such a loss function is called the triplet loss. It is based on a triplet of samples: the anchor, the positive sample, and the negative sample. With the loss function at hand, we can take the derivative of the loss with respect to the model parameters and then perform gradient descent to update them. After an update, the elephant and the tiger will be farther apart in the feature space, whereas the tigers will be closer.
After training the Siamese network, we can use the network for one-shot prediction. We are given a support set; the classes in the support set are not contained in the training set. We are given a query image; it belongs to one of the six classes in the support set, and we want to classify it. We compare the query image with the images in the support set in this way: use the convolutional neural network to extract features from all the images, then compute the distances between the feature vectors. For example, the query and the fox have a distance of 231 in the feature space, while the query and the squirrel have a distance of 19. Do the same to compute all the distances, then find the smallest. The distance between the query image and the squirrel class is 19, which is the smallest among all the distances, so the model believes the query image is most similar to the squirrel class and thereby predicts that the query is a squirrel.
In this lecture we learned Siamese networks for solving few-shot learning. Let me summarize. Here is the basic idea of solving few-shot learning: we first train a Siamese network on a large-scale training set; the Siamese network learns the similarity and difference between things. After training, we can use the Siamese network to make predictions. What makes few-shot learning different from standard supervised learning is that the query's class does not appear in the training set. For example, the query is a squirrel, but the training set does not have a squirrel class. Thus, to recognize the query, we must provide additional information: the support set.

The support set is called k-way, n-shot. K-way means the support set has k classes; the more classes there are, the harder the prediction. N-shot means each class has n samples; the fewer samples there are, the harder the prediction. The hardest problem is one-shot learning, that is, making a prediction based on only one sample per class. With the trained Siamese network at hand, we can compare the query with every sample in the support set to find similarity scores, then take the class of the support-set sample with the highest similarity score as the prediction.

I have elaborated on two ways of training the Siamese network. One is using the Siamese network to predict a pairwise similarity score. Each time, we select a pair of two images as the input of the Siamese network. The images are transformed by convolutional layers and dense layers, and the output is a similarity score between 0 and 1: 1 means the two inputs are from the same class, and 0 means the two inputs are different. The target is either 0 or 1; if the two inputs are from the same class, the target is 1, otherwise the target is 0. Define the loss function as the difference between the prediction and the target. The goal of training is to minimize the loss, equivalently making the prediction closer to the target. In this way, the learned neural network can predict the similarity between two inputs.

The other way of training the Siamese network is to use the triplet loss. Each time, select three images as inputs: the anchor x_a, the positive sample x_pos, and the negative sample x_neg. Then use the convolutional neural network to extract features from the inputs, obtaining three feature vectors. Let d_positive be the squared distance between the positive sample and the anchor in the feature space, and let d_negative be the squared distance between the negative sample and the anchor. The objective of training is to make d_positive as small as possible, that is, to make the two tigers close in the feature space, and also to make d_negative as big as possible, that is, to make the tiger far from the elephant in the feature space. With the trained network at hand, we can compare the query image and the labeled images in the feature space; the prediction is made based on the distances in the feature space.

I have finished teaching Siamese networks. Thank you for watching this video. The link to my slides can be found below the video. In the next lecture, I will introduce pretraining and fine-tuning for few-shot learning.