What is Data Annotation?
Data annotation is an integral component of the MLOps pipeline. It consists of labelling training data with ground truth annotations that represent the desired output of a predictive machine learning model. Annotation requirements vary with the computer vision task and the real-world context. Object detection tasks require bounding boxes around the objects of interest, whereas segmentation tasks require detailed binary masks that outline each object. The real-world context also determines the level of quality and detail necessary for a satisfactory outcome.
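To make the difference between the two annotation types concrete, here is a minimal sketch that derives a bounding box from a binary mask. The representation (a mask as nested lists of 0/1 pixels, a box as inclusive pixel coordinates) and the function name are assumptions chosen for illustration, not a format prescribed by any particular tool.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # rows of 0/1 pixels (assumed representation)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_bounding_box(mask: Mask) -> Optional[Box]:
    """Return the tightest box around all nonzero pixels, or None if the mask is empty."""
    # Column indices of every foreground pixel.
    xs = [x for row in mask for x, value in enumerate(row) if value]
    # Row indices of every row that contains at least one foreground pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
example_box = mask_to_bounding_box(example_mask)
```

The mask records the exact shape of the object, while the box keeps only its extent. This is why segmentation annotations generally take more effort to produce than detection annotations.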
What Can the Annotation Process Look Like in Practice?
MLOps pipelines in the industry are pushing for increasingly complex processes to ensure that production-level model deployments remain effective despite changes in the surrounding environment. At the same time, the machine learning space as a whole is gravitating towards a more data-centric approach. Efficient use of data is therefore at the heart of an effective machine learning pipeline.
A naive approach is to collect as much data as possible over an extended period, annotate it all, and process the entire dataset in one go. However, because data annotation and other subprocesses are time-consuming, problems like data drift and model drift can occur between raw data collection and production-level model deployment.
Therefore, a more agile and iterative approach is needed. MLOps pipelines now lean towards a system in which smaller batches of data are processed, allowing intermediary models to be deployed and then adjusted when necessary with new incoming data.
However, such a system is complex and takes considerable time and effort to build well. Because it is designed to be agile, each step in the process must be made as efficient as possible. While automated data collection and preprocessing are well established, data annotation at its core requires some form of human involvement. This lack of automation can become a serious bottleneck in terms of both efficiency and person-hours.
How Can Model Assisted Labelling Help?
Model assisted labelling is a technique that takes advantage of the iterative process described above by using previous iterations of trained models to help with annotation. An unannotated image is sent as input to the trained model, which then makes a prediction. The predictions serve as proposed annotations for the input image. Model assisted labelling is particularly effective because its ability to annotate images automatically improves as the model improves. More efficient image labelling leads to faster creation of annotated datasets for retraining, which in turn helps to further improve the model.
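The core idea can be sketched in a few lines. Everything named here (`Proposal`, `Predictor`, `propose_annotations`, `stub_predict`) is hypothetical and exists only for illustration. The stub stands in for a previously trained model, and none of this is the Nexus API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Proposal:
    tag: str           # predicted class label
    confidence: float  # model score, assumed to lie in [0, 1]


# A predictor is any callable that maps an image identifier to proposals.
Predictor = Callable[[str], List[Proposal]]


def propose_annotations(image_ids: List[str], predict: Predictor) -> Dict[str, List[Proposal]]:
    """Run the previous model iteration over unannotated images."""
    return {image_id: predict(image_id) for image_id in image_ids}


def stub_predict(image_id: str) -> List[Proposal]:
    # Placeholder standing in for a real trained model (assumption).
    return [Proposal("cat", 0.9), Proposal("dog", 0.4)]


proposals = propose_annotations(["img_001", "img_002"], stub_predict)
```

In an iterative pipeline, the proposals a human accepts would be added to the training set, and the retrained model would replace `stub_predict` in the next round.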
How Does Model Assisted Labelling Work on Nexus?
Model Assisted Labelling on Nexus uses pre-existing artifacts in your project as representations of previous iterations of your trained models. Once you have created an online deployment using our Inference API, our annotator can use the deployment to make predictions on the currently selected image. To confirm that a deployment is running and available, check the Deployment tab of the project page, which should show a display like the one below.
To access Model Assist on the Annotator, go to the Annotator page, select Model Settings, and then select the preferred model deployment. This option will not be available if there are no ongoing deployments.
Once the appropriate model deployment is selected, the Model Assist option becomes available. Press that button or the ‘M’ key, and predicted annotations will appear on the image as greyed-out annotations like those below.
To select an individual predicted annotation for committing, click on the corresponding greyed-out polygon. Once selected, the annotation changes to the color of its predicted tag. To make selecting annotations easier, the Assist Settings, found on the left side of the annotator, provide several options.
The top part of the settings is a confidence threshold slider. It filters the proposed annotations that appear on the screen by their confidence score. The higher the threshold is set, the fewer suggested annotations will appear. This can be used to keep only the annotations that the model is more confident in, which are therefore more likely to be accurate.
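Conceptually, the slider behaves like a simple filter over the model's suggestions. The sketch below is an illustration under assumptions, not the Annotator's actual implementation. The `Suggestion` type and `visible_suggestions` function are hypothetical, and whether the real comparison is inclusive (`>=`) is also an assumption.

```python
from typing import List, NamedTuple


class Suggestion(NamedTuple):
    tag: str
    confidence: float  # model score, assumed to lie in [0, 1]


def visible_suggestions(suggestions: List[Suggestion], threshold: float) -> List[Suggestion]:
    """Keep only suggestions whose confidence is at or above the threshold."""
    return [s for s in suggestions if s.confidence >= threshold]


suggestions = [
    Suggestion("car", 0.95),
    Suggestion("car", 0.62),
    Suggestion("person", 0.30),
]
shown_low = visible_suggestions(suggestions, 0.25)   # permissive threshold
shown_high = visible_suggestions(suggestions, 0.60)  # stricter threshold
```

Raising the threshold from 0.25 to 0.60 drops the low-confidence `person` suggestion and keeps the two `car` suggestions.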
The two buttons at the bottom control how suggested annotations are committed to the annotator. The first button, Commit, commits only the annotations that the user has manually selected from the greyed-out suggestions. Selected annotations are shown in their corresponding tag colors, and once committed they appear as regular annotations.
The second button commits all suggested annotations currently shown on the annotator. This can be a faster way to fully annotate images if the model is well trained and has demonstrated strong performance in proposing accurate annotations. In this way, one can simply flip through images, select the Model Assist option, and commit all annotations, thus annotating an entire image in two clicks.
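The difference between the two commit modes can be sketched as follows. This is a conceptual illustration only. The `Suggestion` class and both functions are hypothetical names, and the list passed in is assumed to hold the suggestions currently shown on screen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestion:
    tag: str
    selected: bool = False   # True once the user clicks the greyed-out shape
    committed: bool = False  # True once it becomes a regular annotation


def commit_selected(suggestions: List[Suggestion]) -> int:
    """Commit only the suggestions the user has selected; return how many."""
    count = 0
    for s in suggestions:
        if s.selected and not s.committed:
            s.committed = True
            count += 1
    return count


def commit_all(suggestions: List[Suggestion]) -> int:
    """Commit every suggestion not yet committed; return how many."""
    count = 0
    for s in suggestions:
        if not s.committed:
            s.committed = True
            count += 1
    return count


suggestions = [Suggestion("car", selected=True), Suggestion("person"), Suggestion("bike")]
committed_first = commit_selected(suggestions)  # commits only the selected "car"
committed_rest = commit_all(suggestions)        # commits the remaining two
```

Committing selectively suits an early model whose suggestions need review, while committing everything suits a model that has already proven reliable.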
Overall, model assisted labelling can reduce the annotation of an image to just two clicks. This greatly accelerates the annotation process and is one more way that Nexus reduces bottlenecks at the annotation stage.
Want to Get Started?
Model Assisted Labelling is available to users who are able to create deployments through our Inference API feature. If you are interested in this feature, please contact us so we can discuss how to best support your use case!
Because model assisted labelling naturally suits an iterative process, you can use our Asset Group Management functionality to record and manage your dataset with more precision as it continuously evolves.
Our Developer’s Roadmap
Data annotation is a part of the pipeline that we are very focused on. We will keep providing tools that reduce the tedium and frustration that often come with annotation, and we will keep improving the Annotator so that it remains one of the best annotation tool suites in the MLOps space.
Need Help?
If you have questions, feel free to join our Community Slack to post your questions or contact us about how model assisted labelling fits in with your usage.