— Ch. 1 · Origins And Development —
FaceNet.
At the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Florian Schroff, Dmitry Kalenichenko, and James Philbin, then researchers at Google, unveiled their work on facial recognition. The presentation marked the first public introduction of FaceNet to the computer vision community. Prior to this, no other system had achieved such high accuracy on standard datasets under unrestricted protocols. The team's aim was to map face images into a mathematical space in which faces could be compared directly.
Neural Network Architecture
The NN1 model used roughly 140 million parameters across its convolutional layers and required approximately 1.6 billion floating-point operations to process a single image. Training mini-batches contained about 1,800 images. Each identity in a batch was represented by roughly 40 images of that person, alongside randomly sampled faces from other identities. The architecture comprised convolutional blocks labeled conv1 through conv6, with kernel sizes ranging from 3x3 to 7x7.

Triplet Loss Innovation
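The identity-balanced mini-batches described above can be sketched roughly as follows. This is an illustrative sampler, not FaceNet's actual pipeline; the function name, the `image_paths_by_id` data layout, and the fill strategy are assumptions for the example.

```python
import random

def sample_identity_batch(image_paths_by_id, batch_size=1800, per_identity=40):
    """Build a FaceNet-style mini-batch: up to `per_identity` images
    for each identity, drawn from shuffled identities until the batch
    is full. `image_paths_by_id` maps an identity label to a list of
    its image paths (hypothetical data layout)."""
    batch = []
    ids = list(image_paths_by_id)
    random.shuffle(ids)
    for identity in ids:
        imgs = image_paths_by_id[identity]
        # Cap at per_identity and at the space remaining in the batch.
        take = min(per_identity, len(imgs), batch_size - len(batch))
        batch.extend(random.sample(imgs, take))
        if len(batch) >= batch_size:
            break
    return batch
```

Grouping many images of the same person in one batch is what makes online triplet mining possible: every batch is guaranteed to contain valid anchor-positive pairs.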
A key innovation was the triplet loss function, which maps face images into a 128-dimensional Euclidean space. Similarity between two faces is measured by the squared Euclidean distance between their L2-normalized embeddings in that space. To keep training tractable, the system introduced online triplet mining, selecting informative triplets within each mini-batch rather than across the whole dataset. The network was optimized with stochastic gradient descent and standard backpropagation, and the triplet loss has since become central to one-shot learning problems well beyond facial recognition.