How To Use API Deployment For Trained Model Inference

Leonard So
September 27, 2022

What is API Deployment?

Once you have trained and evaluated a model that you are happy with on Nexus, there are a variety of ways in which you can use your model for prediction.

Datature has introduced a new way for you to access your trained machine learning model: REST API deployment! An API deployment makes your model available online for inference. Nexus can now host an API deployment of your machine learning model with low code requirements, so you and your collaborators can receive inferences just by sending the input data in an API request.

How does a REST API Function?

A REST API is an interface that allows two computer systems to communicate over the internet. An API following the REST architectural design has many benefits. To name a few:

  • Scalability: REST APIs handle each request independently of other requests, so they can be easily scaled to handle changes in request volume and variety without causing communication bottlenecks.
  • Technological Independence: The programming languages used or changes made on the server or client side have no effect on the REST API’s ability to communicate with the user. This means that the user can continue to use their deployment without worrying that changes on either end will interrupt their required communication.
  • Security: REST APIs can authenticate requests in a variety of ways, either blocking unauthorized users from the API entirely or limiting them to certain types of requests.

In our case, the trained machine learning model is hosted on dedicated GPUs. The deployed API receives your request, which includes the prediction input and your authorization, runs inference on the hosted model, and returns the prediction output as JSON, ready for use in the next steps of your machine learning pipeline or specific use case.

What are the Benefits of API Deployment?

Our API deployment of trained models furthers Datature’s mission to democratize the power of computer vision through low code requirements and ease of use.

API deployment on the Nexus platform gives you uninterrupted access to your trained models for inference without the need to reload the model or manage it on your own. This takes the responsibility of deployment off of you so that you can focus on utilising the predictions in the most effective manner possible.

Additionally, a remotely hosted API allows users around the world who lack specialized hardware, advanced GPUs, or sufficient compute power to access and fully utilise computer vision models. Predictions can be requested from anything ranging from a laptop to a microcontroller. This lowers the barrier not just to training these models, but also to integrating them easily into applications, research and machines. As long as the end user has an internet connection, even such small devices can make use of your advanced models in whatever contexts are required.

APIs also enable you to use code to automate the process of getting predictions, thus further streamlining and automating your computer vision pipeline. API request libraries are ubiquitous in mainstream programming languages, so automating your own requests for your machine learning pipeline is a simple and common procedure to set up. We make that even simpler by auto-generating the code so you just have to fill in your own information to get started.
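As a rough illustration of that kind of automation, here is a minimal Python sketch that fans a list of image URLs out to a prediction function concurrently. The names `predict_batch` and `fake_predict` are hypothetical, and `fake_predict` is only a stand-in so the sketch runs without a network connection. In practice you would pass in a function that calls your own deployment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def predict_batch(
    image_urls: List[str],
    predict: Callable[[str], dict],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Call `predict` once per URL on a thread pool and map URL -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(predict, image_urls))
    return dict(zip(image_urls, results))


def fake_predict(image_url: str) -> dict:
    """Hypothetical stand-in for a real API call; returns an empty result."""
    return {"image_url": image_url, "predictions": []}


if __name__ == "__main__":
    urls = ["https://example.com/a.jpg", "https://example.com/b.jpg"]
    for url, result in predict_batch(urls, fake_predict).items():
        print(url, result)
```

Because each request is independent of the others, running them in parallel like this is usually safe, although the right `max_workers` value depends on how many instances your deployment has.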

How does API deployment work on Nexus?

If you have trained a machine learning model on Nexus, you will have training artifacts of your model ready to be deployed. To deploy a model, you simply have to select Create Deployment on your preferred artifact. Datature provides several customization options for your API deployment so that it matches your requirements. You can improve performance by changing the number of available instances, or with our Multi-GPUs Support, which offers vertical scaling through the type of GPU hosting your model and horizontal scaling through the number of GPUs used. Once you’re happy with your choices, setup takes only 5 to 7 minutes, and then your API will be deployed!

Making a Deployment:

Using Your API Deployment:

Once your API has been deployed, you can start making API requests. Nexus makes this easy to test and use in a few different ways. We also facilitate API management and monitoring. All of this can be found on the Deploy page, which you can reach through the sidebar menu on your project page.


To make an API request, you should use code to make the call. As long as your preferred programming language has tools to make API requests, it is compatible with our API deployment. In line with our goal of low code requirements, Nexus auto-generates the code necessary to make API requests in all the most common programming languages (e.g., Shell, Python, JavaScript, Node, Java, Go, R, Ruby, C++). Once you have filled in the relevant information pertaining to your request, such as your project secret for authorization and your image URL for image data, you can now run the request in your chosen language!
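To give a feel for what such a request involves, here is a minimal Python sketch using only the standard library. It is not the code Nexus generates. The endpoint URL, the `image_url` payload field, and the bearer-style `Authorization` header are all assumptions made for illustration, so copy the real values and field names from the auto-generated code on your Deploy page.

```python
import json
import urllib.request


def build_prediction_request(
    endpoint_url: str, project_secret: str, image_url: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the image URL."""
    # Assumed payload shape: a JSON object with a single "image_url" field.
    payload = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the real header name may differ.
            "Authorization": f"Bearer {project_secret}",
        },
        method="POST",
    )


def send_prediction_request(
    request: urllib.request.Request, timeout: float = 30.0
) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_prediction_request(
        "https://example.invalid/predict",  # placeholder endpoint
        "YOUR_PROJECT_SECRET",
        "https://example.com/image.jpg",
    )
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the part you need to fill in (secret, image URL) easy to inspect before any network call is made.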

Shell Auto-Generated Code:

Python Auto-Generated Code:

The output of your machine learning model will be a JSON response containing the information you would expect from a model prediction, such as the annotation ID, bounding box information, confidence, tag, or polygonal mask coordinates.
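The exact JSON schema is not reproduced here, so the following Python sketch assumes a hypothetical one: a top-level `predictions` list whose items carry `annotation_id`, `tag`, `confidence`, and optionally `bounding_box` or `polygon`. It shows one way to turn such a response into typed objects. Adjust the key names to match what your deployment actually returns.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical response body; the key names are illustrative assumptions.
SAMPLE_RESPONSE = """
{"predictions": [
  {"annotation_id": 0, "tag": "cat", "confidence": 0.92,
   "bounding_box": [0.1, 0.2, 0.5, 0.6]},
  {"annotation_id": 1, "tag": "dog", "confidence": 0.41,
   "polygon": [[0.1, 0.1], [0.3, 0.1], [0.2, 0.4]]}
]}
"""


@dataclass
class Prediction:
    annotation_id: int
    tag: str
    confidence: float
    bounding_box: Optional[List[float]] = None
    polygon: Optional[List[List[float]]] = None


def parse_predictions(raw_json: str) -> List[Prediction]:
    """Convert the (assumed) JSON body into a list of Prediction objects."""
    body = json.loads(raw_json)
    return [
        Prediction(
            annotation_id=item["annotation_id"],
            tag=item["tag"],
            confidence=float(item["confidence"]),
            bounding_box=item.get("bounding_box"),
            polygon=item.get("polygon"),
        )
        for item in body.get("predictions", [])
    ]


if __name__ == "__main__":
    for prediction in parse_predictions(SAMPLE_RESPONSE):
        print(prediction.tag, prediction.confidence)
```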


Besides making calls, you can also manage your API deployment in various ways. We provide a real-time graph showing the number of requests being made. We also provide a Stats tab that shows essential information about the operation of your API, such as uptime, request rate, error rate, latency, concurrent requests, and usages. If you want to improve or alter the performance to adapt to your current needs, we provide a tab for you to reconfigure the customizable options that you initialized your API with. With this, we empower our users with the means of obtaining real-time predictions from models that they have trained - just with a few lines of code.


How to Test Your Inference API

If you select the Test Your API button at the bottom, a drop-down box will become available for you to drag an image into for prediction. The output simulates what you will receive when using the code, giving you an easy, interactive way to understand how the API is functioning and a quick way to get predictions for single examples. It shows the JSON output on the right and the image with its corresponding predictions visualized on the left. It also includes a Confidence Threshold slider that hides annotations below a chosen confidence value.
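If you want the same kind of filtering in your own code, a minimal Python sketch could look like the following. It assumes each prediction is a dictionary with a `confidence` value between 0 and 1, and that annotations at or above the threshold are kept. Both are assumptions about the output format and the slider's behavior, not documented facts.

```python
from typing import Dict, List


def filter_by_confidence(
    predictions: List[Dict], threshold: float
) -> List[Dict]:
    """Keep only predictions whose confidence is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [p for p in predictions if p["confidence"] >= threshold]


if __name__ == "__main__":
    sample = [
        {"tag": "cat", "confidence": 0.92},
        {"tag": "dog", "confidence": 0.41},
    ]
    print(filter_by_confidence(sample, 0.5))
```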


Additional Deployment Capabilities That You Could Explore

Once your API is up and running, you can now fully utilize your machine learning model for inference, taking the usage of your machine learning pipeline to the next level. Inference capabilities are extended to a broad range of electronic devices as long as they have internet access. With our API management tools, you can always alter the deployment’s capability as needed, and we will continue to increase deployment customizability, with new options like switching artifacts.

Our Developer’s Roadmap

Additionally, we have a roadmap in place to continue fleshing out the flexible REST API architecture with more types of tools and requests so that you can optimize your usage for your specific use cases. In particular, we will be adding pre-prediction and post-prediction routines to help you refine the data you send and specify the exact information you get back. You can look forward to options such as image resizing for preprocessing, or post-prediction tools like filtering by class, confidence, IoU, or histogram data. On top of API deployment, please keep an eye out for a new feature, model-assisted labelling, which helps annotate new training images, furthering our goal of a complete CI/CD pipeline and reducing manual annotation workload.
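Until such routines are available, post-prediction filtering can be done on the client side. The Python sketch below is not a preview of the planned features. It only illustrates two of the ideas mentioned, class filtering and IoU (intersection over union), assuming boxes in `(xmin, ymin, xmax, ymax)` form and predictions as dictionaries with a `tag` key, both of which are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    # Degenerate boxes with zero total area are treated as non-overlapping.
    return intersection / union if union > 0 else 0.0


def filter_by_class(
    predictions: List[Dict], allowed_tags: Iterable[str]
) -> List[Dict]:
    """Keep only predictions whose tag is in the allowed set."""
    allowed = set(allowed_tags)
    return [p for p in predictions if p["tag"] in allowed]


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(filter_by_class([{"tag": "cat"}, {"tag": "dog"}], ["cat"]))
```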

Need Help?

If you have questions, feel free to join our Community Slack to post your questions or contact us about how API deployment fits in with your usage. 

For more detailed information about the Deployable API functionality, customization options, or answers to any common questions you might have, read more about the API deployment on our Developer Portal.
