Annotate Images & Videos with Segment Anything 2.0 on Datature Nexus

Accelerate your annotation tasks on Datature Nexus with greater precision by leveraging the convenience and powerful generalization capabilities of Segment Anything 2.0

Wei Loon Cheng
Editor

What is Segment Anything 2.0?

Segment Anything 2.0 (SAM 2.0) is Meta’s newest version of Segment Anything (SAM). SAM 2.0 extends upon the capabilities of SAM by allowing for real-time promptable visual segmentation in images and videos. To accomplish this, SAM 2.0 employs a simple transformer architecture leveraging “streaming memory” to store information about existing predictions and prompts.

Additionally, Meta released the dataset that SAM 2.0 was trained on, Segment Anything Video, or SA-V. SA-V is a geographically diverse dataset, evaluated for fairness, featuring 35.5M masks across 50.9K videos, around 53x more masks than any existing video segmentation dataset. To learn more about the key improvements of SAM 2.0, check out our introductory article.

What’s New In SAM 2.0?

SAM 2.0 builds on the original SAM with several exciting improvements:

  1. Enhanced Accuracy and Speed: SAM 2.0 offers better segmentation accuracy and faster processing times, making it great for real-time use.
  2. Compact Model Sizes: SAM 2.0 has a broader model offering with four different model sizes, all more compact than their predecessors, ranging from around 150 MB to 880 MB versus 350 MB to 2.2 GB for the original SAM (see the loading sketch after this list).
  3. Video Segmentation: SAM 2.0 can now segment videos, tracking and segmenting objects across video frames consistently.
  4. Refined Prompting Mechanisms: The new model supports more advanced prompting techniques, giving users more control over the segmentation process.
  5. Expanded Dataset: SAM 2.0 is trained on an even larger and more diverse dataset, improving its ability to handle different image and video types.
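
For readers who want to experiment with these model sizes directly, below is a minimal sketch of loading the released checkpoints with Meta’s open-source sam2 package. The config and checkpoint names follow the facebookresearch/sam2 repository, but the local paths are assumptions, and this is separate from how Nexus runs the model for you in the browser.

```python
# A minimal sketch: loading Meta's open-source SAM 2 checkpoints.
# Names follow the facebookresearch/sam2 repository; local paths will differ.
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# The four released sizes, from smallest (~150 MB) to largest (~880 MB).
CHECKPOINTS = {
    "tiny": ("sam2_hiera_t.yaml", "checkpoints/sam2_hiera_tiny.pt"),
    "small": ("sam2_hiera_s.yaml", "checkpoints/sam2_hiera_small.pt"),
    "base_plus": ("sam2_hiera_b+.yaml", "checkpoints/sam2_hiera_base_plus.pt"),
    "large": ("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt"),
}

config, checkpoint = CHECKPOINTS["tiny"]
device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = SAM2ImagePredictor(build_sam2(config, checkpoint, device=device))
```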

How to Use SAM 2.0 on Datature Nexus

Datature strives to make state-of-the-art technology accessible and easy to use for all. Compared to its predecessor, SAM 2.0 is even better adapted to visual features in most contexts and industry use cases without any pre-training. With such generalized utility out of the box, it is the perfect tool to accelerate annotation on Nexus’ Annotator. With this in mind, we’re integrating SAM 2.0 into our Annotator in three main ways.

IntelliBrush Leveraging SAM 2.0

The first way integrates SAM 2.0 as an option in our flagship IntelliBrush feature, a model-assisted interactive segmentation tool driven by positive and negative clicks. IntelliBrush greatly speeds up annotation by allowing users to annotate complex objects with just a few clicks instead of painstakingly plotting vertices. IntelliBrush also works well across a variety of use cases and environments, with no pre-training or downstream fine-tuning necessary.

Over the years, we have been experimenting with interactive labelling, from the release of our initial IntelliBrush algorithm to the addition of new backbone models (such as the original SAM) to suit an increasing variety of use cases.

Why SAM 2.0?

Our previous comparisons between IntelliBrush and SAM showcased the pros and cons of each method. While SAM was more granular, able to segment subsections of objects, IntelliBrush was more responsive to multiple positive and negative prompts. However, we find that SAM 2.0 draws on the strengths of both existing methods, allowing users more fine-grained control over the exact objects (or parts of them) that they want to segment.

SAM 2.0 has also proven to be less computationally intensive than both SAM and IntelliBrush, giving users a more seamless experience when annotating multiple objects and further reducing the time and effort needed to annotate large datasets.

SAM 2.0 for Image Annotation

To use SAM 2.0 for annotating images on Nexus, simply activate the IntelliBrush tool by clicking the button on the right sidebar of the Annotator page, or press the hotkey T. The default backbone is our flagship IntelliBrush model, so you will need to select the SAM 2.0 option in Intelli-Settings.

Once IntelliBrush has been activated, the model will begin computing the image embeddings. After a couple of seconds, you will be able to use IntelliBrush as usual. In addition to the normal IntelliBrush annotations with left and right clicks, hovering your mouse over parts of the image will show what IntelliBrush proposes as an initial object mask.
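
To make the click-to-mask flow concrete, here is a rough sketch of what a single positive click translates to with Meta’s open-source sam2 package. Nexus handles all of this for you in the browser, so the file name and coordinates below are purely illustrative.

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_t.yaml", "checkpoints/sam2_hiera_tiny.pt")
)

# set_image computes the image embeddings once; subsequent clicks are cheap.
image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# One positive click (label 1) at pixel (x=500, y=300),
# like a left-click in the Annotator.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 300]]),
    point_labels=np.array([1]),
    multimask_output=True,  # several candidate masks, each with a quality score
)
best_mask = masks[np.argmax(scores)]  # boolean (H, W) mask for the clicked object
```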

What about Domain-Specific Images?

SAM 2.0 has proven to outperform its predecessor on most domain-specific images, thanks to its strong zero-shot generalization capabilities. This allows domain experts to label up to 10x faster by leveraging their expertise and the “predictive” nature of IntelliBrush.

Specifically for challenging images from the medical industry, we found that SAM 2.0 is able to identify more precise masks, enabling more accurate post-detection analysis, such as calculating the size of malignant tumours.

Multiple Prompt Refinement

In certain scenarios, a single prompt may be insufficient to precisely segment objects. We recommend using multiple positive (left-click) and negative (right-click) prompts to further refine your annotation mask. Do note that conflicting prompts may confuse the model, so ensure that the areas surrounding positive and negative prompts are clearly defined.
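
As an illustration of how these prompts combine, the sketch below feeds two positive points and one negative point to the open-source sam2 image predictor; the coordinates are invented for the example.

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_t.yaml", "checkpoints/sam2_hiera_tiny.pt")
)
predictor.set_image(np.array(Image.open("example.jpg").convert("RGB")))

# Two positive clicks (label 1) on the object, one negative click (label 0)
# on an adjacent region the mask should exclude.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[420, 280], [460, 330], [520, 300]]),
    point_labels=np.array([1, 1, 0]),
    multimask_output=False,  # with several prompts, one refined mask suffices
)
```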

Everything Mode

In Everything mode, the tool automatically creates predicted masks for every object detected in the image, after which users can simply select their preferred masks.

These proposals appear as greyed-out masks with a dashed outline. From here, choose the appropriate class tag, then select and assign the masks you deem appropriate and accurate; these will appear as solid masks in their assigned tag colors. Once satisfied, you can confirm the annotations.

This is a quick way to reduce annotation time, and users will notice similarities with our Model-Assisted Labelling tool. However, the underlying SAM 2.0 model identifies most objects in a class-agnostic manner, whereas Model-Assisted Labelling leverages your previously trained models on Nexus to label your data with classes assigned as well, given the contextual knowledge gathered from prior trainings.
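
For intuition, Everything mode behaves much like the automatic mask generator in the open-source sam2 package, sketched below under the assumption that you filter proposals by predicted quality; in Nexus you make this selection visually instead.

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

mask_generator = SAM2AutomaticMaskGenerator(
    build_sam2("sam2_hiera_t.yaml", "checkpoints/sam2_hiera_tiny.pt")
)

image = np.array(Image.open("example.jpg").convert("RGB"))
proposals = mask_generator.generate(image)  # one dict per proposed object mask

# Keep only confident proposals, akin to accepting greyed-out masks by eye.
keep = [p for p in proposals if p["predicted_iou"] > 0.9]
print(f"kept {len(keep)} of {len(proposals)} class-agnostic proposals")
```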

Video Tracking

SAM 2.0’s impressive zero-shot capabilities are further enhanced by its spatio-temporal awareness, allowing it to propagate annotations from one video frame across a series of subsequent frames, even accounting for object motion and occlusion. This cuts annotation time by up to 90%, as users only need to annotate a tiny fraction of the frames in a video and let the model do the heavy lifting. This feature is currently in beta testing, so contact us if you would like to try it out!
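
While the Nexus feature is still in beta, the open-source sam2 video predictor gives a feel for the propagation mechanic. The sketch below assumes a directory of extracted JPEG frames and uses method names from the facebookresearch/sam2 repository; it is not Datature’s internal implementation.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "sam2_hiera_t.yaml", "checkpoints/sam2_hiera_tiny.pt"
)

with torch.inference_mode():
    # init_state indexes the video; here, a directory of JPEG frames.
    state = predictor.init_state(video_path="frames/")

    # Annotate one object with a single positive click on the first frame.
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[500, 300]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Streaming memory then propagates the mask through all later frames.
    for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # one boolean mask per object
```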

Try It On Your Own Data

Datature makes it easy to try all these tools on the Nexus platform. With Datature’s Starter plan, users can sign up for free with no credit card required and receive 500 IntelliBrush credits to use on IntelliBrush and Everything mode, as well as the rest of the intelligent tools available, including AI Mask Refinement and Video Interpolation. To learn more about the credit system and how to purchase more credits, please visit our pricing page.

Our Developer’s Roadmap

Datature is always invested in utilizing the latest research to improve our platform. Given how rapidly machine learning is evolving in both the foundation model and computer vision spaces, we are closely monitoring and reviewing new research for new features, such as exploring how Everything mode can be extended to videos.

Want to Get Started?

If you have questions, feel free to join our Community Slack to post them, or contact us if you wish to try out Meta’s SAM 2.0 on Datature Nexus.

For more detailed information about SAM 2.0’s functionality, customization options, or answers to any common questions you might have, read more on our Developer Portal.
