Leveraging Active Learning to Optimize Your Computer Vision Pipeline

Our Active Learning tool helps users decide which images to annotate next by scoring each image with metrics calculated from the model's predictions, so that annotation effort goes to the images that best help to train the model.

Leonard So

What is Active Learning?

Machine learning model training and image data annotation are the two most time-consuming processes in the machine learning pipeline. Oftentimes, data provided in training can be repetitive, unclear, or outright incorrect due to human error, making the model training less effective and invalidating all the hard work put into annotations. 

To offset these issues, you can use methods such as active learning to assist in image selection or image filtering, so that you annotate and train on smaller datasets that are less time-consuming to label and more meaningful for the model to learn from. This increases model training speed and provides meaningful insight into the diversity of your dataset and what your model might be struggling with.

Types of Active Learning

Active learning methods generally fall into a few categories. The first is uncertainty-based methods, which select samples using metrics derived from the model's predictions, such as entropy, margins, or posterior probabilities. The second is distribution-based methods, which estimate the distribution of unlabelled samples relative to the labelled samples and select the samples that best represent what the labelled set is missing.

There are also different ways in which active learning is applied to data. The main types are pool-based sampling, stream-based selective sampling, and membership query synthesis. As users have to provide their own data before using our platform, we will focus on methods that can be applied to pool-based sampling.

Additionally, there are different underlying techniques that can be used. These can generally be separated into machine learning techniques and purely statistical evaluations. Machine learning techniques typically involve additional modules or additional model training on top of the baseline computer vision model in order to learn the uncertainty or metric as an output. Statistical evaluations are a much easier alternative to their machine learning counterparts. However, they can lack depth and insight into the inner workings of the model, as they are purely focused on prediction results.
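To make the statistical, pool-based setting concrete, here is a minimal sketch of three common uncertainty scores computed from class probabilities, followed by a selection step over a small pool. The function names, the sample probabilities, and the choice to rank by entropy are illustrative assumptions, not part of any specific platform.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def margin(probs):
    """Gap between the two highest probabilities; a small gap means high uncertainty."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

def least_confidence(probs):
    """One minus the highest probability."""
    return 1.0 - max(probs)

def select_from_pool(pool_probs, budget):
    """Return the indices of the `budget` most uncertain samples, ranked by entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical softmax outputs for a pool of three unlabelled images.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # very uncertain prediction
    [0.70, 0.20, 0.10],  # moderately uncertain prediction
]

selected = select_from_pool(pool_probs, budget=2)
print(selected)
```

With a budget of two, the confident first image is left out and the two less certain images are chosen for annotation.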

Below, we outline several different state-of-the-art techniques that you can try in your own training as well as describe our preferred active learning method and why we’ve chosen to support it on our platform.

Area Under the Margin (AUM)

AUM seeks to solve the issue of incorrectly labelled data using metrics developed from machine learning techniques. Using AUM to take anomalous data out of your dataset is a good way to improve training efficiency. It can also be viewed as an uncertainty metric that identifies images the model finds more difficult to learn from, paired with a threshold that separates annotations that are merely uncertain from those that are actually incorrect.

The general approach of Area Under the Margin is to calculate, for each training example, the difference between the logits that the model assigns to different classes. These differences are calculated and averaged over the initial set of epochs in model training, and the average is used as the metric for how uncertain the model is about the image's classification. AUM also includes a method for calculating a threshold score that distinguishes images that are difficult to classify from ones that are outright labelled incorrectly.

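The AUM of each training example is its average margin. The margin is the difference between the logit of the example's assigned class and the largest logit among the remaining classes. An example that the model learns easily keeps a large positive margin, while a mislabelled example tends to have a negative one.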
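The calculation can be sketched in a few lines. This sketch measures the margin from the logit of the assigned label, as in the original AUM formulation, and averages it over the recorded epochs. The function names and the logit values are made up for illustration, and the threshold step is left out.

```python
def logit_margin(logits, assigned_label):
    """Assigned-class logit minus the largest logit among the other classes."""
    others = [z for k, z in enumerate(logits) if k != assigned_label]
    return logits[assigned_label] - max(others)

def area_under_margin(logits_per_epoch, assigned_label):
    """Average the margin of one training example over the recorded epochs."""
    margins = [logit_margin(logits, assigned_label) for logits in logits_per_epoch]
    return sum(margins) / len(margins)

# Hypothetical logits for two examples over three epochs (three classes).
# Both examples are annotated as class 0.
clean_example = [[2.0, 0.5, 0.1], [3.0, 0.4, 0.0], [4.0, 0.2, -0.5]]
suspect_example = [[0.2, 1.5, 0.1], [0.1, 2.5, 0.0], [0.3, 3.0, -0.2]]

aum_clean = area_under_margin(clean_example, assigned_label=0)
aum_suspect = area_under_margin(suspect_example, assigned_label=0)

print(f"clean: {aum_clean:.3f}, suspect: {aum_suspect:.3f}")
```

The second example keeps pulling towards class 1 even though it is annotated as class 0, so its average margin is negative, which is the kind of signal AUM uses to flag a possibly incorrect label.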

While AUM is a simple and effective method, it has several requirements and limitations that may not suit every use case. Firstly, AUM is calculated over the course of model training, so it cannot be used to evaluate previously saved models. It also requires indexed datasets as inputs so that it can track which results correspond to which examples. Most importantly, AUM is designed to tackle classification problems, and currently there is no demonstrated way of adapting AUM to object detection problems.

Learning Loss for Active Learning

Learning loss for active learning, by contrast, has a simple goal backed by machine learning: given any image, predict the loss value associated with the computer vision task at hand. It attaches a simple neural network module on top of the computer vision model during training, where the module learns to match the loss values based on the image features used by the model. While the metric itself is not likely to be precise, it can be an easy qualitative way to detect images that are completely different from the set of images that the model performs well on.

Shows the general loss-predicting module architecture. [Source]

This simple solution has merits, most notably that it is task-agnostic, meaning that it can be used for active learning in any computer vision task. However, it does not integrate task-specific information, so its accuracy and utility suffer as tasks grow more complicated. Additionally, because it is a machine learning solution, it has to be trained alongside your computer vision model.
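As a rough illustration of the idea of predicting loss from features, the sketch below fits a small linear head with stochastic gradient descent and then ranks unlabelled images by predicted loss. This is a deliberately simplified stand-in: the published method uses a neural module trained jointly with the main model, whereas the linear model, the squared-error objective, the feature vectors, and all names here are assumptions made for illustration.

```python
def predict_loss(weights, bias, features):
    """Predicted task loss for one feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def train_loss_predictor(features, losses, epochs=2000, lr=0.05):
    """Fit a linear head so that its output approximates the observed losses."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(features, losses):
            error = predict_loss(weights, bias, x) - target
            # Gradient step on the squared error of this sample.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical 2-D image features and the task loss observed for each image.
train_features = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.4], [0.2, 0.9], [0.8, 0.1]]
train_losses = [0.4, 2.6, 1.4, 1.3, 1.7]

weights, bias = train_loss_predictor(train_features, train_losses)

# Rank unlabelled images so that the highest predicted loss comes first.
unlabelled = {"img_a": [0.15, 0.1], "img_b": [0.95, 0.9], "img_c": [0.5, 0.5]}
ranked = sorted(
    unlabelled,
    key=lambda name: predict_loss(weights, bias, unlabelled[name]),
    reverse=True,
)
print(ranked)
```

Images at the top of the ranking are the ones the model is expected to handle worst, which makes them natural candidates for annotation.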

Multiple Instance Active Learning for Object Detection (MI-AOD)

MI-AOD seeks to address the issues presented by simpler methods like AUM and learning loss by providing a more robust, task-specific machine learning solution for determining uncertainty in object detection. Its goal is to select informative images from an unlabelled set by learning and re-weighting instance uncertainty with discrepancy learning and multiple instance learning (MIL), so that it can give one aggregated measurement of per-image uncertainty.

To learn instance-level uncertainty, MI-AOD uses an instance uncertainty learning module that places two adversarial instance classifiers on top of the detection network, and the network is trained on the disparity between the two classifiers' predictions. An additional MIL module re-weights instance uncertainty based on the image-level classification loss in order to determine the significant instances in each image. Using multi-stage training, MI-AOD can learn instance uncertainty as well as reduce the distribution bias between labelled and unlabelled instances by fine-tuning the network.

Multi-staged training to learn multiple types of uncertainty. [Source]

MI-AOD is among the state-of-the-art methods for measuring uncertainty in object detection, because the framework is designed specifically to determine the instance uncertainty latent in the model. The downside is a larger, more complex network structure and more extensive training. Therefore, unless very precise uncertainty is required, the trade-off in computational complexity and training time may not be worth it.

Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint

This is an empirical statistical approach to measuring uncertainty that uses entropy as the main basis of uncertainty for individual instances. It then applies several techniques to filter the entropy values, refining the sum of entropies so that it is more in line with goals such as intra-image diversity and inter-image diversity.

The method builds on three entropy-related components. First, there is basic entropy, which is calculated from each instance's maximum confidence value. Second, there is Entropy-based Non-Maximum Suppression (ENMS), which takes the instances with the most entropy and removes other instances in the same image with sufficiently similar features. Third, to improve inter-image diversity, there is the Diverse Prototype, which creates entropy-weighted feature prototypes for each class in each image. These prototypes are representative of the class features in each image, and comparing prototypes is a computationally efficient way to determine whether an image is unique enough to consider. The method also creates weighted budgets based on class imbalance and selects images that contain more minority-class instances.
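The basic entropy and ENMS steps can be sketched as follows. This is a minimal illustration under stated assumptions: entropy is taken as the binary entropy of each instance's maximum confidence, similarity is the cosine similarity between instance feature vectors, and the threshold value, dictionary keys, and sample detections are all hypothetical. The prototype and class-budget steps are not shown.

```python
import math

def binary_entropy(confidence):
    """Entropy (in nats) of one detection, treating its confidence as a two-outcome event."""
    if confidence <= 0.0 or confidence >= 1.0:
        return 0.0
    return -(
        confidence * math.log(confidence)
        + (1.0 - confidence) * math.log(1.0 - confidence)
    )

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def enms_image_score(instances, similarity_threshold=0.9):
    """Sum the entropy of the instances that survive redundancy suppression."""
    remaining = sorted(
        instances,
        key=lambda inst: binary_entropy(inst["confidence"]),
        reverse=True,
    )
    score = 0.0
    while remaining:
        best = remaining.pop(0)  # highest-entropy instance still available
        score += binary_entropy(best["confidence"])
        # Drop instances whose features are too similar to the one just kept.
        remaining = [
            inst for inst in remaining
            if cosine_similarity(inst["feature"], best["feature"]) < similarity_threshold
        ]
    return score

# Hypothetical detections from one image: the first two are near-duplicates.
instances = [
    {"confidence": 0.55, "feature": [1.0, 0.0]},
    {"confidence": 0.60, "feature": [0.99, 0.05]},
    {"confidence": 0.90, "feature": [0.0, 1.0]},
]

image_score = enms_image_score(instances)
naive_score = sum(binary_entropy(inst["confidence"]) for inst in instances)
print(f"ENMS score: {image_score:.3f}, naive sum: {naive_score:.3f}")
```

Because the near-duplicate detection is suppressed, the image score counts that uncertainty only once, so an image full of similar uncertain objects does not outrank an image with more varied ones.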

Showing the three different types of entropy calculations. [Source]

The benefit of this method is that it can be applied to any classification task, so long as the model outputs class confidence, and it is by far the quickest way to gain access to uncertainty statistics. Annotating the images it selects also targets the predictions where the model is least confident, which is where additional training data is most likely to help. The drawback is that, like other statistical approaches, it offers little depth into the inner workings of the model.

For our platform, we have chosen this method because it provides a simple way to test and evaluate pre-existing models trained on our platform, its results are easily explainable, and it is sufficiently effective for picking images that improve the robustness of the model.

How Does Active Learning Work on Nexus?

Active learning on Nexus attaches to our model deployment API service as an extra output on prediction results. Therefore, you will need a pre-existing deployment of the model you want to evaluate. When the API deployment is used for predictions, the active learning metrics can be added as an egress routine to evaluate the model's performance on the asset. You can then apply your own logic, such as a threshold, to determine whether the asset should be uploaded onto the Nexus platform for annotation.
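The thresholding logic mentioned above could look something like the sketch below. The dictionary keys, file names, and threshold value are hypothetical placeholders rather than the actual Nexus response format, and the sketch assumes that a higher metric value means the model is more uncertain about the asset.

```python
def should_queue_for_annotation(metric_value, threshold=0.5):
    """Return True when the metric suggests the asset is worth annotating.

    Assumes a higher metric value means higher model uncertainty.
    """
    return metric_value >= threshold

# Hypothetical prediction results. The key names are illustrative and do not
# reflect the real response schema of any deployment API.
prediction_results = [
    {"asset": "cam01_0001.jpg", "active_learning_metric": 0.12},
    {"asset": "cam01_0002.jpg", "active_learning_metric": 0.81},
    {"asset": "cam01_0003.jpg", "active_learning_metric": 0.50},
]

to_annotate = [
    result["asset"]
    for result in prediction_results
    if should_queue_for_annotation(result["active_learning_metric"], threshold=0.5)
]
print(to_annotate)
```

Only the assets that pass the threshold would then be uploaded for annotation, which keeps each retraining batch small.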

This system enables you to annotate data in batches and to train while you annotate. It also lets you reduce the number of images that you need to add at each stage of retraining. To learn how to automate the active learning process with Datature's Python SDK, see our extensive tutorial.

Our Developer’s Roadmap

The introduction of our active learning metric is just the beginning of our commitment to facilitate the machine learning model life cycle entirely on our platform. To that end, we plan to add active learning as an explicit component in the workflow, to tie in with dataset versioning and model versioning as well. Further down the roadmap, we will also be introducing model deployment performance tracking so that you can receive automated signals to watch for data drift or other declines in performance.

Want to Get Started?

If you have questions, feel free to join our Community Slack to post your questions or contact us about how active learning fits in with your usage. 

For more detailed information about the Active Learning functionality, customization options, or answers to any common questions you might have, read more about the process on our Developer Portal.
