How to Improve Data Annotation Quality with a Consensus Algorithm

We are excited to introduce the Consensus Tool, which encourages annotation collaboration and strengthens quality control by assessing labellers and their annotations.

Wei Loon Cheng
Editor

What is a Consensus Algorithm?

Annotation consensus is a method of evaluating and reconciling multiple labels assigned to the same data point in a dataset. When annotators disagree, it resolves the conflict by determining a single label that best represents the true label for that data point. Generally, this is done by assigning a score to each label and then selecting the label with the highest score. The scores can be derived from metrics that represent the level of agreement between the annotators labelling the same data point.

Why is Consensus Important?

Data annotation is an essential process in machine learning: it involves labelling datasets with accurate and meaningful information so that models can learn from the data. This task can be tedious when there is a large quantity of data to be labelled, so it is often carried out by a team of annotators to reduce annotation time.

However, there may be inconsistencies and errors in labelling, caused by factors such as differing interpretations of the data, levels of expertise, and levels of attention. Sub-optimal labelling can lead to poor model performance, which is costly to fix since the model may have to be retrained. A common solution is to have teams of annotators perform multiple rounds of annotation on each asset to reduce variability in the results. With multiple annotations on a single asset, however, data annotation consensus algorithms are needed to automatically evaluate and resolve labelling discrepancies, resulting in greater annotation accuracy and consistency.

Furthermore, consensus can be used as a performance metric for teams to ensure that labellers uphold the highest annotation quality standards for each project. Ultimately, the metrics produced from consensus algorithms can be utilized in many ways to improve the efficiency and quality of your data labelling pipeline.

What are Some Consensus Methods?

Majority Voting

This method groups annotations from multiple labellers based on their similarity and then selects the most representative annotation for each group by majority vote. The similarity between annotations can be measured with a number of metrics, such as Intersection over Union (IoU), which measures the overlap between two sets of labels. IoU is typically used to evaluate object detection models during validation and testing: it is the ratio of the area of intersection between the ground truth bounding box and the predicted bounding box to the area of their union. One benefit of IoU is that it generalises to other types of annotations, such as polygon masks.

Source: https://www.baeldung.com/cs/object-detection-intersection-vs-union

In the case of data annotation, IoU can measure the degree of overlap between annotations from different annotators. The higher the IoU score, the greater the overlap between the two sets of annotations, which suggests that both annotators agree on the label assigned to a particular object. When multiple labels have high IoU scores, the label with the highest score can be chosen as the winner. A low IoU score, however, indicates that the two annotators have differing opinions on the label for that object. The reviewer can then manually review both labels to determine the correct one, and if both are deemed incorrect, bring in a third annotator to resolve the conflict.

Diagram displaying how Intersection over Union is used to evaluate multiple sets of labels.

In the example above, two labellers annotate the same object but their annotations differ. The consensus score is taken to be the calculated IoU, which falls below Nexus' default IoU threshold of 0.8. Using the consensus algorithm, these labels would therefore be rejected automatically, without any effort from the manual reviewer.
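To make the calculation concrete, here is a minimal Python sketch of the IoU comparison described above, assuming axis-aligned bounding boxes in (x1, y1, x2, y2) format. The box coordinates and the `iou` helper are illustrative and not part of the Nexus API; only the 0.8 threshold follows the default mentioned above.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two labellers annotate the same object slightly differently (illustrative values).
labeller_1 = (100, 100, 200, 200)
labeller_2 = (120, 110, 210, 205)

score = iou(labeller_1, labeller_2)
IOU_THRESHOLD = 0.8  # default threshold referenced above

if score < IOU_THRESHOLD:
    print(f"IoU {score:.2f} below threshold -> labels rejected for review")
else:
    print(f"IoU {score:.2f} meets threshold -> annotators agree")
```

With these example coordinates the IoU works out to roughly 0.63, so the pair of labels would fall below the 0.8 threshold and be rejected automatically, matching the scenario in the diagram.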

Model Prediction Benchmarking

This method involves using a trained model to generate benchmark labels for the dataset. Each set of annotations is compared solely against the benchmark labels to calculate a similarity score, and the annotations with the highest agreement with the benchmark labels are selected as the consensus labels. This method is useful when a trained model already exists and a new dataset needs to be labelled.
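As a rough illustration of this approach, the sketch below scores each annotator's set of boxes against a model's benchmark labels using mean best-match IoU and keeps the set with the highest agreement. The annotator names, box values, and helper functions are assumptions made for the example, not Nexus internals.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def agreement(annotations: List[Box], benchmark: List[Box]) -> float:
    """Mean best-match IoU of an annotator's boxes against the benchmark boxes."""
    if not annotations or not benchmark:
        return 0.0
    return sum(max(iou(box, ref) for ref in benchmark) for box in annotations) / len(annotations)

# Benchmark labels predicted by a trained model (illustrative values).
benchmark_labels = [(100, 100, 200, 200), (300, 50, 380, 140)]

# Each annotator's labels for the same asset (illustrative values).
annotator_labels: Dict[str, List[Box]] = {
    "annotator_a": [(102, 98, 198, 204), (305, 55, 382, 138)],
    "annotator_b": [(130, 120, 230, 220), (290, 40, 360, 150)],
}

# Select the annotator whose labels agree most with the benchmark.
scores = {name: agreement(boxes, benchmark_labels) for name, boxes in annotator_labels.items()}
winner = max(scores, key=scores.get)
print(scores, "-> consensus labels from:", winner)
```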

How Does Consensus Work On Nexus?

Consensus adds an extra layer of quality assessment to Annotation Automation. It is implemented as a majority voting tool that can be seamlessly incorporated into your annotation workflow right from the start without the need for a trained model. To include consensus in your annotation workflow, simply add a Consensus block after a To Annotate block in your workflow. The Consensus block acts as a preliminary review stage for reconciling multiple sets of annotations, so it must come before any To Review blocks.

Example of an annotation workflow incorporating consensus.

An asset will only be sent for consensus review if at least two labellers have annotated it. If there is only one set of annotations, the asset will bypass the Consensus block and move on to the subsequent stage in your workflow.

You can specify certain users to take on the role of reviewers in the Consensus block. These users will be responsible for reviewing duplicate annotations by multiple labellers and selecting which annotations to accept or reject. All users will be selected by default.

If you are tasked with managing the Consensus block reviews, simply activate Review Mode on the Annotator page. On the right sidebar, you will see a list of all labellers who have annotated the selected asset, together with a few actions. By default, all labellers’ annotations are visible and overlaid on the asset, and the default action shown below the image when all labellers are selected is to reject all annotations.

Consensus interface when Review Mode is activated. By default, all sets of annotations are visible.

If all sets of annotations for an asset are rejected, and the next stage via the Reject route of the Consensus block is to assign another labeller to re-annotate the asset, the new labeller will be able to view all past annotations from every labeller who previously annotated that asset. They can choose to edit the old annotations, or to delete them and create new ones.

Annotator interface when all sets of annotations of an asset are rejected and sent for re-annotation.

To view a particular labeller’s annotations in Review Mode, simply click on their name. The reviewer can then choose to accept that labeller’s annotations.

Consensus interface with a particular labeller's annotations selected.

Only one set of annotations for each asset can be accepted to advance the asset to the next stage in the annotation workflow via the Accept route in the Consensus block. The remaining sets of annotations are irreversibly discarded when this happens, so do make sure that you are accepting a satisfactory set of annotations before committing the review.

Only one set of annotations can be accepted before moving on to the next stage in the annotation workflow.

The Accept All Assets button will, for each asset, automatically accept the set of annotations whose average consensus score is at least 80%. If multiple sets exceed this threshold, the set with the highest consensus score is accepted. If none of the sets meet this criterion, all of them are rejected.
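For intuition, here is a minimal sketch of that selection rule, assuming per-annotation consensus scores are available for each labeller's set. The function name and data are illustrative only; the actual behaviour of the Accept All Assets button is handled by Nexus.

```python
from typing import Dict, List, Optional

def select_annotation_set(consensus_scores: Dict[str, List[float]],
                          threshold: float = 0.8) -> Optional[str]:
    """Pick the labeller whose set of annotations has the highest average
    consensus score, provided that average meets the acceptance threshold."""
    averages = {
        labeller: sum(scores) / len(scores)
        for labeller, scores in consensus_scores.items()
        if scores
    }
    if not averages:
        return None
    best = max(averages, key=averages.get)
    # Accept only if the best average meets the threshold; otherwise reject all sets.
    return best if averages[best] >= threshold else None

# Per-annotation consensus scores for each labeller on one asset (illustrative values).
scores = {
    "annotator_a": [0.92, 0.85, 0.88],
    "annotator_b": [0.75, 0.81, 0.64],
}
accepted = select_annotation_set(scores)
print("Accepted set:", accepted if accepted else "none (all rejected)")
```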

Our Developer’s Roadmap

Consensus is one of the tools that we introduced to empower teams to collaborate seamlessly and effectively using our new Annotation Workflow. We have roadmaps in place to introduce other tools that will further improve the collaborative annotation experience, such as tracking certain performance metrics of labellers like average annotation time per asset.

Want to Get Started?

If you have questions, feel free to join our Community Slack to post them, or contact us about how active learning fits in with your usage.

For more detailed information about the Consensus functionality, customization options, or answers to any common questions you might have, read more about the process on our Developer Portal.
