One of the most common problems we can solve using machine learning is classifying samples into different categories. However, using supervised classification models becomes impossible if we don't have enough instances of a particular class to train a model.
Imagine you want to validate the quality of circuit boards using a picture. You have access to thousands of working circuit board images, but it's improbable that you'll have images illustrating every possible defect. How can you frame this problem and train a model to solve it?
A great way to introduce autoencoders is with an anomaly detection example. A way to approach the circuit board problem is to find a way to detect outliers that differ significantly from the norm. That's the goal of anomaly detection problems, which have one particular characteristic: these anomalous observations are rare and, therefore, it's hard to collect enough data.
We need to build a model that can leverage all the data representing typical instances and use it to detect any outliers that represent anomalies.
Think of autoencoders as data compression algorithms that use neural networks. The initial part of the network compresses the original input into an intermediate representation, and the second part reverses the process to reconstruct the original information. In other words, an autoencoder tries to learn an approximation to the identity function.
A critical characteristic of an autoencoder is the bottleneck. The bottleneck is the section that stores the compressed representation of the data. This portion of the network sits between the encoder and decoder and restricts the number of hidden units to prevent the autoencoder from memorizing the input values.
If there's any structure in the original data, autoencoders will learn it. The encoding process summarizes the data, so the decoding process can reproduce it back, keeping as much fidelity as possible. Any unique characteristic not representative of the whole dataset will never make it into the bottleneck.
Let's get back to our circuit board problem, where we want to build a model to flag any picture that doesn't look correct, but we don't have any sample showing problematic boards. We can use an autoencoder to solve this problem.
An autoencoder will learn the essential characteristics of a working circuit board. It will summarize the dataset of pictures into those features that better represent a board. Imagine each image is 100x100 pixels, and we include a bottleneck with 5,000 neurons. The autoencoder must represent 10,000 pixels in half the space, so it will have to throw away any information that's not critical to reproduce the essence of a working circuit board.
In summary, the autoencoder will learn to represent every characteristic that makes a working circuit board and discard any information that's not common in the training dataset.
Anytime we show the network a picture of a circuit board, the autoencoder will reproduce it with a low error. But anytime we use a circuit board that doesn't look like every other board, the reproduction error will be high because the autoencoder can't reproduce any of the unusual characteristics of a circuit board that looks different.
Looking at this reproduction error is the key that helps us use autoencoders for anomaly detection tasks.
Autoencoders are helpful beyond anomaly detection applications. For example, we can teach autoencoders to remove noise from pictures or audio by showing them samples with noise and expecting them to reproduce their corresponding clean version.
In those cases, we teach the encoder to develop an intermediate representation that helps the decoder produce clean samples. After we train this autoencoder, we can remove the noise from any image or audio similar to the input dataset.
Autoencoders are also helpful for information retrieval, imputation, feature extraction, and dimensionality reduction problems.
Autoencoders are neural networks designed in a way they can learn any existing structure in a dataset. They create a compact representation of the data we can leverage later in different applications.
A critical characteristic of the design of an autoencoder is its bottleneck, which forces the network to compress the original data into a meaningful representation.
Autoencoders are a great approach to tackling anomaly detection and noise reduction problems.