What are Ontologies?
The idea of thinking about datasets ontologically was popularized by Kitchin and McArdle in 2016 as a way to systematically define and categorize the scope of a dataset. It has since been embraced as a way to enforce dataset standards and encourage clarity. Ontologies are used in many ways, but in machine learning they principally describe the structure of data labels and how those labels relate to each other hierarchically.
Why are Ontologies Needed?
While the concept described above is broad, using ontologies in practice directly addresses many of the problems that datasets have today. Because unstructured data is so voluminous and varied, data and annotation inconsistencies are rampant. Changing requirements also turn datasets into an amalgamation of smaller subsets, each with its own quirks, rather than one uniform dataset that is easy to interpret and use.
Ontologies on Nexus
We have integrated Ontologies into the Nexus platform to allow users to define annotation requirements during the labelling process. The core of this release is the ability to add attributes to class tags.
This iteration of Ontologies supports three attribute types: Categorical, Numerical, and Text.
The categorical attribute type allows you to assign user-defined categories to your annotations. Multiple categories can be selected if the annotation belongs to more than one. If the attribute is set as required, a default value must be chosen from the attribute's existing categories. This is useful when an attribute can be described by a fixed set of options, such as clothing items whose class tag is the type of clothing and whose attribute is one of a set of fashion brands.
In the example below, we create a categorical attribute called Brand that contains the fashion brands. We’ve created five categories: Clark’s, Esprit, Nike, Puma, and Reebok.
Once the attribute is created, we label the object below as slippers, and set the attribute, Brand, to Clark’s.
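To make the rules above concrete, here is a minimal Python sketch of a categorical attribute. The `CategoricalAttribute` class and the dictionary layout of the annotation are our own illustrative assumptions and not the Nexus data model. They only mirror the behaviour described: multiple selections are allowed, and a required attribute needs a default drawn from its categories.

```python
from dataclasses import dataclass, field


@dataclass
class CategoricalAttribute:
    """Hypothetical in-memory model of a categorical attribute (not the Nexus schema)."""

    name: str
    categories: list[str]
    required: bool = False
    default: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A required attribute needs a default drawn from its own categories.
        if self.required and not self.default:
            raise ValueError(f"required attribute {self.name!r} needs a default value")
        self.validate(self.default)

    def validate(self, selection: list[str]) -> list[str]:
        # Multiple selections are allowed, but each must be a known category.
        unknown = [value for value in selection if value not in self.categories]
        if unknown:
            raise ValueError(f"unknown categories for {self.name!r}: {unknown}")
        return selection


brand = CategoricalAttribute(
    name="Brand",
    categories=["Clark's", "Esprit", "Nike", "Puma", "Reebok"],
    required=True,
    default=["Clark's"],
)

# Label an object as slippers and set its Brand attribute.
annotation = {"tag": "slippers", "attributes": {"Brand": brand.validate(["Clark's"])}}
```

Keeping the selection as a list, even for a single brand, means the multi-select case needs no special handling.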
The numerical attribute type should be used for any attribute data that is quantitative. Because it is so general, it supports a wide range of use cases. If the attribute is required, a default numerical value must be provided. In the context of object tracking, numerical attributes can be used to assign integer instance IDs to objects across the frames of a video, or to assign velocity vectors using two attributes in tandem, with one attribute representing speed and the other representing angle.
Here, we show numeric attributes, Speed and Bearing, as well as a categorical attribute, Vehicle Type, for each individual instance of the class Vehicle. On Datature, when one uses Interpolation Mode or Tracking Mode, the instances are automatically assigned to the same instance ID track as shown at the bottom of the video bar. As such, attributes can be assigned across the same instance, but can be modified across frames as well.
In the video below, instances of vehicles exist across frames, and each vehicle has a speed associated with it at each frame. The selected annotation has a speed of 73 mph, a bearing of 91 degrees, and a Vehicle Type of Car at the first frame, but these values can be changed as the video plays out.
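If you later want to turn the Speed and Bearing attributes into a velocity vector, the conversion is a short piece of trigonometry. The sketch below assumes that the bearing is a compass bearing measured clockwise from north, which is our assumption and not something the platform enforces. The function name `velocity_components` is likewise hypothetical.

```python
import math


def velocity_components(speed: float, bearing_deg: float) -> tuple[float, float]:
    """Return (east, north) velocity components for a compass bearing.

    Assumes the bearing is measured clockwise from north, in degrees.
    """
    theta = math.radians(bearing_deg)
    return speed * math.sin(theta), speed * math.cos(theta)


# Values from the first frame of the selected annotation: 73 mph at 91 degrees.
east, north = velocity_components(speed=73.0, bearing_deg=91.0)
```

Under this convention, a bearing of 91 degrees points almost due east with a small southward component, so `east` is close to the full speed and `north` is slightly negative.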
The string (Text) attribute type can be used to assign free-form text to an annotation that does not fit into fixed categories. If the attribute is required, a default text must be provided. This is helpful for use cases that require more varied descriptions of an object. Another common use case is assigning text prompts to their corresponding images in order to build image-prompt pair datasets. This is useful for Generative AI applications, such as fine-tuning Stable Diffusion, or even using the prompt labels as input to generate variation in your dataset.
In this example, we are creating a classification type dataset, in which labels apply to an entire image. These can be easily used to form an image-prompt pair dataset, in which the caption is the prompt and the image is the expected output. We have simple class categories, Real World Image and AI Generated Image.
In the Advanced Tag Options, the class AI Generated Image has string attributes, Prompt and Negative Prompt, as well as a categorical attribute, Artistic Style, where users can select multiple categories for styles that were used to generate the AI generated image.
The image below is assigned the AI Generated Image label, together with the prompt “Create a painting of a snowy mountain.”, the negative prompt “No urban environment” and the artistic styles of “watercolor” and “tranquil” to generate the image.
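As a rough illustration of how such labels could become an image-prompt pair dataset, the sketch below turns annotation records into caption rows. The record layout, the file names, and the `to_prompt_pairs` helper are all hypothetical and chosen for this example only. They do not reflect a Nexus export format.

```python
import json

# Hypothetical annotation records; field names and file names are illustrative only.
annotations = [
    {
        "image": "snowy_mountain.png",
        "tag": "AI Generated Image",
        "attributes": {
            "Prompt": "Create a painting of a snowy mountain.",
            "Negative Prompt": "No urban environment",
            "Artistic Style": ["watercolor", "tranquil"],
        },
    },
    {"image": "street.jpg", "tag": "Real World Image", "attributes": {}},
]


def to_prompt_pairs(records):
    """Keep AI-generated images that have a prompt and build one caption per image."""
    pairs = []
    for record in records:
        attributes = record["attributes"]
        prompt = attributes.get("Prompt")
        if record["tag"] != "AI Generated Image" or not prompt:
            continue
        # Append the selected styles to the prompt to form a single caption.
        styles = attributes.get("Artistic Style", [])
        caption = ", ".join([prompt.rstrip(".")] + styles)
        pairs.append(
            {
                "file_name": record["image"],
                "text": caption,
                "negative_text": attributes.get("Negative Prompt", ""),
            }
        )
    return pairs


pairs = to_prompt_pairs(annotations)
# One JSON object per line is a common layout for caption metadata files.
jsonl = "\n".join(json.dumps(pair) for pair in pairs)
```

Folding the artistic styles into the caption is one design choice among several. You could equally keep them as a separate column and let the training script decide how to combine them.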
Managing Your Ontology
To create or modify these attributes, you can select Manage Tags → Advanced Tag Options, and select any class tag. You will be shown a menu as seen in the image below. On the left side, you can change the tag properties such as default color of the tag, the type of annotation that is being made with the tag, the unique tag ID, and the tag description. On the top-right section, you can create new attributes with options to set an Attribute Name, the type of attribute (Categorical, Numerical, Text), whether it is required or not, and choose a default value if it is. Created attributes will be displayed below, where you can view and modify existing attributes.
In this example, we are creating an Optical Character Recognition (OCR) dataset, so we just need one string attribute that is required, Written Text.
Ontologies on the Annotator
Once the attributes have been saved, you can create new annotations with that class tag. The Attributes tab on the right will appear once you have selected an existing annotation. In this case, we have created a new bounding box annotation using our Rectangle tool. Since we previously set the attribute Written Text as required in our OCR dataset, the annotation appears in red to indicate that the annotation attributes have not been sufficiently filled. You will not be able to submit the annotation to the next stage of the annotation workflow unless they have been filled in.
Once the attributes have been filled in and saved, the color of the annotation returns to its normal class color and the attribute values are saved. In this case, we have entered the text corresponding to what is encapsulated by the bounding box, which states “Large Party Gratuity (18.00%)”.
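The check behind the red highlight can be thought of as a simple completeness test. The sketch below is our own approximation of that logic and not the Annotator's implementation. The `missing_required` helper and the attribute spec layout are assumptions made for illustration.

```python
def missing_required(attribute_specs, values):
    """Return the names of required attributes that have no usable value yet."""
    missing = []
    for spec in attribute_specs:
        value = values.get(spec["name"])
        # Treat absent values and blank strings as unfilled.
        if spec["required"] and (value is None or str(value).strip() == ""):
            missing.append(spec["name"])
    return missing


# The OCR tag from the example has one required text attribute.
specs = [{"name": "Written Text", "type": "Text", "required": True}]

new_box = {}  # freshly drawn bounding box, nothing entered yet
filled_box = {"Written Text": "Large Party Gratuity (18.00%)"}

can_submit_new = not missing_required(specs, new_box)
can_submit_filled = not missing_required(specs, filled_box)
```

Here the new box would be blocked from submission, while the filled box would pass.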
Ontologies for Model Training
Ontologies provide a simple framework for using attributes as ground truth labels for training. Many datasets are created with an initial goal that evolves as the dataset grows. Ontologies let the same raw data serve several purposes at the training stage, because either the primary class tag or any of its attributes can act as the ground truth label for supervised model training. As Datature expands its model offerings, users will be able to use tags and attributes in conjunction with custom-designed model training, targeting traditional computer vision tasks as well as multimodal tasks.
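One way to picture this is a small helper that picks the training target from either the class tag or a named attribute. The sketch below uses hypothetical annotation records (the second vehicle and its values are made up for the example), and the `ground_truth` and `build_labels` functions are our own illustration, not part of any Datature tooling.

```python
# Hypothetical annotation records; the second vehicle is invented for illustration.
annotations = [
    {"tag": "Vehicle", "attributes": {"Vehicle Type": "Car", "Speed": 73}},
    {"tag": "Vehicle", "attributes": {"Vehicle Type": "Truck", "Speed": 55}},
]


def ground_truth(annotation, target=None):
    """Use the class tag when target is None, otherwise the named attribute."""
    if target is None:
        return annotation["tag"]
    return annotation["attributes"][target]


def build_labels(records, target=None):
    """Map each record's ground truth value to an integer class index."""
    values = [ground_truth(record, target) for record in records]
    classes = sorted(set(values))
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[value] for value in values]


# Same raw data, two different supervised targets.
detector_classes, detector_labels = build_labels(annotations)
type_classes, type_labels = build_labels(annotations, target="Vehicle Type")
```

The first call treats every record as the single class Vehicle, while the second turns the Vehicle Type attribute into a two-class target without relabelling anything.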
Exporting Ontologies and Annotations
To keep your data transparent and safely stored, you can export ontologies alongside your annotations using our Python SDK. The ontology structure, together with other class tag metadata, is stored in a separate ontology file. Within the exported annotation file, each annotation instance contains the annotation bounds as well as the selected attribute values.
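Once you have both files, a quick consistency check is to confirm that every attribute on an annotation is declared in the ontology for its tag. The sketch below uses only the Python standard library and makes no SDK calls. The JSON layouts shown are purely hypothetical stand-ins, so please check the Developer Portal for the actual export schema before adapting it.

```python
import json

# Hypothetical file contents; the real export schema may differ.
ontology_json = """
{"tags": [{"name": "Vehicle",
           "attributes": [{"name": "Speed", "type": "Numerical"},
                          {"name": "Bearing", "type": "Numerical"},
                          {"name": "Vehicle Type", "type": "Categorical"}]}]}
"""
annotations_json = """
[{"tag": "Vehicle", "bound": [[10, 20], [110, 80]],
  "attributes": {"Speed": 73, "Bearing": 91, "Vehicle Type": ["Car"]}}]
"""

ontology = json.loads(ontology_json)
annotations = json.loads(annotations_json)

# Attribute names declared for each class tag in the ontology.
declared = {
    tag["name"]: {attribute["name"] for attribute in tag["attributes"]}
    for tag in ontology["tags"]
}


def undeclared_attributes(annotation):
    """Return attribute names on the annotation that its tag does not declare."""
    allowed = declared.get(annotation["tag"], set())
    return sorted(set(annotation["attributes"]) - allowed)


# Index of each problematic annotation mapped to its undeclared attribute names.
problems = {
    i: undeclared_attributes(annotation)
    for i, annotation in enumerate(annotations)
    if undeclared_attributes(annotation)
}
```

An empty `problems` dictionary means the annotations and the ontology agree on attribute names.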
Try It On Your Own Data
Ontologies aren’t just limited to simple class tag attributes, but can provide further clarity on a dataset through features like inter-class or inter-object relationships, attribute conditionality, and more complex attributes. To facilitate this, we plan to roll out these features for a more comprehensive Ontology structure. Additionally, we will be providing more options in tag and attribute management to allow for setting more rules to enforce more structured data, such as specifying rounding and step size for numerical data or text character limit for strings.
Want to Get Started?
If you have questions, feel free to join our Community Slack to post your questions or contact us about how Ontologies fit in with your usage.
For more detailed information about the Ontology customization options, or answers to any common questions you might have, read more on our Developer Portal.