How To Train YOLOX Object Detection Model On A Custom Dataset

Learn how you can train a YOLOX model to recognize your custom data (without code) for your next computer vision project.

Marcus Neo

What is an Object Detection Model?

Object detection models are an extension of image classification models. Image classification models take an image and tell you what is in it. For example, with a model trained to identify cats and dogs, one can easily classify images as "Cat", "Dog", or neither, as such:

However, what happens when there is a cat and a dog in the image? A simple workaround is to train the model on an additional class called “Cat and Dog”, as such.

Sounds like an easy fix? Definitely, but only if the image contains exactly one cat and one dog. What happens when there are multiple cats and dogs in a single image? If you are content with classifying them all as "Cat and Dog", the model might manage without additional training. However, if you want to differentiate an image with one cat and one dog from an image with multiple cats and dogs, you would have to include yet another class, "Cats and Dogs".

You now see where this is going. For a simple cat and dog dataset alone, you would have to create an ever-growing number of classes for the model to train on. Perhaps a simple image classification model is not what you are looking for.

An elegant solution for this is to utilise an object detection model. While classification models answer the “What” question, object detection models take it a step further and attempt the “Where” question. By using an object detection model on the same set of images, you could expect this result:

How Do Classification-Based Object Detection Models Work?

Classification-based object detection models are created by repurposing classification models to perform detection. Essentially, these models first extract smaller sub-images from the main image. These are called region proposals.

These models then perform classification on the sub-images. The process looks something like this:

The main caveat of classification-based models is that the convolutional neural network that performs the classification step must be executed for each of the regions proposed within the image. For demonstration purposes, only three regions were showcased. In reality, however, thousands of such regions may be proposed for each image, making the computation extremely expensive.
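The cost argument can be illustrated with a toy sketch. This is not a real detector: `classify` stands in for an expensive CNN forward pass, and the point is simply that the pass runs once per proposed region, so compute scales linearly with the number of proposals.

```python
forward_passes = 0  # count how many times the "CNN" runs

def classify(region):
    """Stand-in for an expensive CNN forward pass on one cropped region."""
    global forward_passes
    forward_passes += 1
    return "cat"  # dummy label

def detect_by_classification(image, proposals):
    """Classification-based detection: one classifier run per proposal."""
    return [((x, y, w, h), classify(None)) for (x, y, w, h) in proposals]

# Three proposals for demonstration; real proposal generators emit thousands.
proposals = [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)]
detections = detect_by_classification(image=None, proposals=proposals)
print(forward_passes)  # 3 — one forward pass per proposed region
```

With thousands of proposals per image, this loop is what makes classification-based detection so expensive.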

What is YOLOX?

YOLOX is a regression-based model that works differently from its classification-based peers. As a member of the aptly named "You Only Look Once" family, the YOLOX model does not use regions of interest (ROIs), but instead looks at the whole image just once in order to perform detection.

YOLOX distinguishes itself from other models in the YOLO family because it utilises a decoupled head and an anchor-free approach, in contrast to the coupled-head, anchor-based pipeline used by other YOLO models.

In the paper YOLOX: Exceeding YOLO Series in 2021, Ge et al. point out that a major disadvantage of traditional YOLO models is their coupled head, which harms performance (classification-based models mainly use decoupled heads). By switching to a decoupled head, the YOLOX team showed that the model can reach an optimal average precision (AP) in fewer training epochs.

[Source]: Ge, Zheng, et al. "Yolox: Exceeding yolo series in 2021." arXiv preprint arXiv:2107.08430 (2021).

This anchor-free mechanism enables the YOLOX model to yield the following advantages:

  1. The predictions are less domain specific and more generalised.
  2. The complexity of the detection head is reduced.
  3. The number of predictions for each image is reduced.
  4. The number of design parameters which need heuristic tuning is reduced.

All these make the training and decoding phases of the YOLOX model much simpler compared to other YOLO models. As a result, the training time for the YOLOX model is considerably faster than its counterparts in the YOLO family of models.

How Does the YOLOX Model Work?

The YOLOX model first takes an image as input, then divides it into an N×N grid of cells. Box regression, classification and confidence prediction are performed in a single regression step, and the model combines these results to give the final output.
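The single-pass idea can be sketched as follows. This is a hedged, purely illustrative mock, not the real YOLOX head: the grid size `N`, the field names and the zeroed values are all placeholders, chosen only to show that one forward pass yields a box, a confidence score and class scores for every grid cell simultaneously.

```python
N = 4            # the image is divided into an N x N grid (illustrative)
NUM_CLASSES = 2  # e.g. "Cat" and "Dog"

def single_pass():
    """Fake forward pass: one combined prediction per grid cell."""
    return [
        {
            "cell": (row, col),
            "box": (0.5, 0.5, 1.0, 1.0),    # box offsets (cx, cy, w, h)
            "confidence": 0.0,              # objectness score
            "classes": [0.0] * NUM_CLASSES, # per-class scores
        }
        for row in range(N)
        for col in range(N)
    ]

predictions = single_pass()
print(len(predictions))  # 16 — every cell predicted in one pass
```

Contrast this with the classification-based approach: the cost here is fixed by the grid, not by how many regions a proposal stage happens to emit.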

To learn more about the YOLOX architecture and its performance, check out the original research paper done by the Megvii Technology team here.

Training YOLOX on Custom Data Using Nexus

Datature’s Nexus platform provides an easy and straightforward way to train a YOLOX model for your own custom data. This is made especially simple considering that there is no code required to implement the model training process!

The steps to training the YOLOX model are categorised into:

  1. Create your project
  2. Upload your images
  3. Label your images
  4. Define your training workflow
  5. Monitor your training progress

If you haven’t already done so, simply sign up on Nexus to begin training your model. It’s free!

1. Create your project

To create your project, click on the Create Project button located on the main page of Nexus. Decide on your project name and you are all set.

Click on the Create Project button located on the main page of Nexus
Decide on your project name and the type of detection that you're looking to train your model on

2. Upload your images

Upon the creation of your project, you will be redirected to your project homepage. Selecting Assets located at the left sidebar will bring you to the asset upload page. Here, you can drag and drop your images to upload them to the Nexus platform.

Drag and drop your images to upload them to the Nexus platform

3. Label your images

Once your images have been uploaded, you may begin annotating them. Machine learning models have to be trained on annotated images before being able to make accurate predictions for future unseen images.

To access the annotator, click on Annotator located at the left sidebar. Within the annotator, you will be prompted to create labels for your classes. Here, we create the two labels "Cat" and "Dog".

Annotations are made simple by the Nexus platform. All you need to do is select the Rectangle annotation type at the right toolbar, then hold the mouse button and drag over your selection. Upon releasing the mouse button, a rectangle will appear, meaning you have successfully made your first annotation.

To change between classes, click on your desired class name located at the bottom right of the screen before continuing your annotations.

4. Define your training workflow

Once you have annotated your entire dataset (you will need at least 200 annotated images for your model to train well), you can move on to the next stage: defining your training workflow. The Nexus platform simplifies this process by handling the complicated aspects of the training workflow, such as hyperparameter selection. All that needs to be done is to define how data flows from your dataset to the model.

Back to the home page, select Workflow at the left toolbar to navigate to the workflows page. Create a new workflow by selecting the green Create Workflow button.

Next, right click on the canvas to select the Dataset block.

It is essential to split your dataset into separate training and validation sets. By altering the train-test ratio in the Dataset block, you can decide what proportion of your images will be used for each. Here, I will keep the default of 0.3, which means that 70% of my dataset will be used for training and 30% for validation.
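Nexus performs this split for you, but the underlying operation is easy to picture. The sketch below is a generic, hypothetical implementation of a shuffled 70/30 split, not Datature's actual code; the filenames and the fixed seed are illustrative.

```python
import random

def split_dataset(filenames, validation_ratio=0.3, seed=42):
    """Shuffle, then split off the validation fraction (0.3 by default)."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = filenames[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_ratio)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

images = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val = split_dataset(images)
print(len(train), len(val))  # 700 300
```

Shuffling before splitting matters: without it, images uploaded in class order would leave one class over-represented in the validation set.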

Next, to increase the amount of data that your model can train on, implementing an Augmentation block is recommended. To do so, right click the canvas once again and select Augmentations. Link the output of the Dataset block to the Augmentations block by holding the mouse button at the bottom of the Dataset block and dragging it to the top of the Augmentations block. Within the Augmentations block, you may select several augmentations to be applied to your dataset. Note that augmentations should be applied in moderation; too many can distort your data and lead to poor results. You may view your augmentations by clicking on the Preview Augmentations button located on the bottom right of the screen.
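To make the idea concrete, here is a minimal, hypothetical augmentation written in plain Python: a horizontal flip on a tiny 2-D list standing in for an image, applied with some probability the way augmentation pipelines typically do. Real platforms chain many such transforms (flips, rotations, colour jitter) and adjust the annotations to match.

```python
import random

def horizontal_flip(image):
    """Mirror each row of the 'image' left-to-right."""
    return [row[::-1] for row in image]

def maybe_augment(image, p=0.5, rng=None):
    """Apply the flip with probability p, as augmentation pipelines do."""
    rng = rng or random.Random()
    return horizontal_flip(image) if rng.random() < p else image

img = [[1, 2, 3],
       [4, 5, 6]]
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
```

Each augmented variant is a "new" training example, which is why augmentation effectively enlarges a small dataset, and also why stacking too many aggressive transforms can push the training data away from what the model will see in production.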

Finally, we select our YOLOX model. Nexus provides three YOLOX variants, namely YOLOX-S, YOLOX-M and YOLOX-L. Each of these models supports two image sizes, 320x320 and 640x640. To select a model, right click on the canvas once more and hover over Models to see the list of model architectures. Hover over the YOLO Models option and select your desired YOLOX variant. Here, we will be using the YOLOX-M model that accepts an image size of 640x640.

Within the model block, you can select the batch size and the number of steps for your model to train on. Here we select a batch size of 8, with 10,000 training steps. Assuming we have a dataset of 1,000 images and a validation split of 0.3, there will be 700 images used for training. With a batch size of 8 and 10,000 steps, the model will see 80,000 samples, which corresponds to roughly 114 epochs.
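The steps-to-epochs arithmetic above generalises to any run. A small helper (a sketch, not part of the platform) makes the relationship explicit:

```python
def approx_epochs(num_images, validation_split, batch_size, steps):
    """Approximate passes over the training split for a given run.

    One step processes one batch, so the model sees batch_size * steps
    samples in total; dividing by the training-set size gives epochs.
    """
    train_images = round(num_images * (1 - validation_split))
    samples_seen = batch_size * steps
    return samples_seen / train_images

epochs = approx_epochs(num_images=1000, validation_split=0.3,
                       batch_size=8, steps=10_000)
print(round(epochs, 1))  # 80,000 samples / 700 images ≈ 114.3
```

Inverting the formula is handy too: to target a given number of epochs, set steps = epochs * train_images / batch_size.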

Finally, once you connect the Augmentations block to the Model block, click on Run Training at the bottom right of the screen, then select your hardware acceleration and checkpointing strategies. Once that is done, select Start Training and you are on your way!

5. Monitor your training progress

Once the training has begun, Nexus will take care of the run and update you on its progress in real time. You can track the losses as well as the evaluation metrics while the model is training. If you activated the Evaluation Preview option before the training, you can also watch how the model improves after each checkpoint!

6. Exporting your Model

Once your training has completed, you can export your model for your respective use case. Back on the Project Homepage, select Artifacts at the left toolbar to head to the artifacts page. You can then select the artifact of your choice to export for use!

7a. Using your Model for prediction using scripts

Datature has provided several sample scripts for you to make predictions using your newly exported models. For the YOLOX model that you have just exported, you can start making predictions by running this prediction script.
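The exact loading code depends on the export format, and the sample scripts handle it for you. As a generic illustration of one step nearly all prediction scripts perform — filtering the model's raw detections by a confidence threshold — here is a hedged sketch; the detection dictionaries, field names and scores below are made up for the example.

```python
def filter_detections(detections, score_threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d["score"] >= score_threshold]

# Hypothetical raw output from a detector: label, confidence, box corners.
raw = [
    {"label": "Cat", "score": 0.92, "box": (10, 20, 50, 60)},
    {"label": "Dog", "score": 0.31, "box": (5, 5, 15, 25)},
    {"label": "Dog", "score": 0.77, "box": (40, 40, 90, 100)},
]
kept = filter_detections(raw, score_threshold=0.5)
print([d["label"] for d in kept])  # ['Cat', 'Dog']
```

Raising the threshold trades recall for precision: fewer boxes survive, but those that do are more likely to be correct.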

7b. Using your Model for prediction without scripts

Should you be unfamiliar with writing code, and would like to make predictions without any hassle, Datature has created a program called Portal. By downloading and using Portal, you will then be able to make predictions without any code involved!

Additional Deployment Capabilities That You Could Explore

Now that you have your very own YOLOX model ready for deployment, why not consider hosting it on Datature as well? In just a few clicks, your model will be deployed on our cloud and ready for prediction! Sounds appealing? Head over to our API Deployment page to learn more about this feature!

Our Developer’s Roadmap

At Datature, we are committed to facilitating a seamless end-to-end machine learning experience for first-timers and advanced developers alike. To that end, we plan to continually release state-of-the-art models for tasks such as object detection, instance segmentation and semantic segmentation. Soon, we will be releasing two more semantic segmentation models, namely UNet and FCN, so do stay tuned!

Want to Get Started?

If you have questions, feel free to join our Community Slack to post your questions or contact us about how YOLOX fits in with your usage.

For more detailed information about the Model Training pipeline and other state-of-the-art model offerings in Nexus, or answers to any common questions you might have, read more about Nexus on our Developer Portal.

Build models with the best tools.
