Image segmentation is a crucial task in computer vision, enabling precise pixel-level classification for applications like autonomous vehicles and healthcare. It allows the identification and separation of objects, regions, or shapes, which is vital in high-stakes environments.
However, segmentation tasks pose challenges, especially in data representation. The typical bitmask output—arrays of 0s and 1s—maps to the image dimensions, marking pixels as part of an object or background. While effective, this approach can become inefficient with large images or complex scenes, increasing storage demands and slowing down machine learning workflows.
Datature has introduced solutions to improve bitmask handling within ML pipelines. In this article, we’ll explore the limitations of traditional bitmask storage and how Datature optimises data representation.
Types of Bitmask Representations
Bitmasks are central to image segmentation, storing each pixel’s class in a compact format. They come in two types: dense and sparse. Dense bitmasks prioritise speed and random access, while sparse bitmasks focus on storage and transfer efficiency through data compression. Let’s explore both approaches and their trade-offs.
Dense Representations
Dense representations are straightforward to visualise. Imagine a 3x3 bitmask like this, where 1 indicates the masked area and 0 represents a transparent or unmasked area:

1 1 0
0 1 1
1 0 0

This might be stored as a dense array, 110011100. In this approach, each pixel's value can be accessed quickly, as it corresponds directly to its position in the array. For instance, the formula B[y * w + x] = b(x, y) lets us retrieve the bit value b(x, y) at any pixel location (x, y), where w is the row width.
Dense representations extend easily to higher dimensions, such as 3D bitmasks. For instance, a 3D bitmask value might be accessed with a formula like B[z * (h * w) + y * w + x], enabling fast, random access to individual pixels. However, while dense representations are efficient to read and manipulate, they also consume considerable memory. Even though each pixel only takes up one bit, this method is storage-intensive, especially for large images or high-resolution datasets.
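As a concrete sketch, the indexing formulas above can be written in Python, using plain lists to stand in for a packed bit array (function names are illustrative):

```python
# The 3x3 example mask from the text, flattened row by row: 110011100
w, h = 3, 3
B = [1, 1, 0, 0, 1, 1, 1, 0, 0]

def get_2d(B, x, y, w):
    """Random access into a flattened 2D bitmask: B[y * w + x]."""
    return B[y * w + x]

def get_3d(B, x, y, z, w, h):
    """Random access into a flattened 3D bitmask: B[z * (h * w) + y * w + x]."""
    return B[z * (h * w) + y * w + x]

print(get_2d(B, 0, 0, w))  # 1: top-left pixel is masked
print(get_2d(B, 2, 0, w))  # 0: top-right pixel is background
print(get_2d(B, 1, 2, w))  # 0: bottom-middle pixel is background
```

A production implementation would pack eight pixels per byte rather than using one list element per pixel, but the index arithmetic is the same.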
Sparse Representations
Sparse representations improve storage efficiency by removing redundancy. Instead of storing every bit, they store only the indices of specific pixel types—like object pixels (1 bits) or background pixels (0 bits). For example, if a bitmask mostly contains background, only the object pixels' indices are stored, significantly reducing storage needs.
A naive sparse representation, however, isn’t always a perfect fit for vision tasks, where the ratio of 1 to 0 pixels can vary widely. For bitmasks where most pixels are occupied by objects (or vice versa), storing just a few indices might save storage. But, in many cases, we can’t predict whether a sparse or dense format will save space in advance.
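A minimal sketch of this naive sparse idea, using the 3x3 example mask from earlier (variable names are illustrative):

```python
# Naive sparse representation: store only the indices of the 1 bits.
dense = [1, 1, 0, 0, 1, 1, 1, 0, 0]  # the 3x3 example mask

sparse = [i for i, bit in enumerate(dense) if bit == 1]
print(sparse)  # [0, 1, 4, 5, 6]

# Reconstruct the dense form from the sparse indices plus the known length.
on = set(sparse)
restored = [1 if i in on else 0 for i in range(len(dense))]
assert restored == dense

# Whether this saves space depends on the 1:0 ratio. Here 5 of 9 bits are
# set, so storing five multi-bit indices is larger than the dense form.
```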
Run-Length Encoding (RLE)
The aforementioned challenge can be tackled by a method called Run-Length Encoding (RLE), which has been a popular, lossless method for efficiently storing bitmasks since as far back as 1967. When dealing with binary bitmasks that represent segmentation data—where pixels are either “on” or “off,” representing object vs. background—RLE can be especially useful. RLE reduces storage by encoding consecutive sequences of pixels (known as "runs") rather than storing each individual pixel, making it a highly efficient choice for many computer vision tasks.
In a binary bitmask with only two pixel types (e.g., black for background and white for the object), we can use RLE to encode consecutive runs of the same type. Using the same example as above, the bitmask string 110011100 can be encoded as 2(1) 2(0) 3(1) 2(0), which reflects two white pixels, two black pixels, three white pixels, and finally, two black pixels.
In RLE, we can simplify this further by using signed integers, where negative values represent black pixels and positive values represent white pixels. The result would be: 2, -2, 3, -2. This method uses the sign of each number to signify the pixel type, with negative values for background and positive values for objects.
Unlike dense representations, where bitmask size depends on the image dimensions (width x height), RLE scales with the number of runs rather than total pixels. This is ideal for vision tasks, where segmented objects are typically clustered into contiguous regions, resulting in long, uniform pixel runs. In fact, even with complex segmentation masks, RLE size tends to be proportional to the number of rows rather than the number of pixels. This makes RLE highly effective for images with large homogeneous areas—such as satellite or aerial imagery—where thousands of pixels may represent the same class.
To further optimize RLE storage, variable-width integers (varints) can be used. Varints store shorter runs using fewer bytes, compressing data further when runs are short, a small but useful enhancement over fixed-byte storage.
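One plausible varint layout matching the description later in this article (first byte: a continuation bit, a sign bit, and 6 payload bits; subsequent bytes: a continuation bit plus 7 payload bits) can be sketched as follows. This is an illustrative format, not necessarily the exact one Datature uses:

```python
def varint_encode(run):
    """Encode one signed run length. First byte: bit 7 = continuation,
    bit 6 = sign, bits 0-5 = low 6 bits of the magnitude. Later bytes:
    bit 7 = continuation, bits 0-6 = the next 7 bits. (Assumed layout.)"""
    sign = 0x40 if run < 0 else 0
    mag = abs(run)
    first = sign | (mag & 0x3F)
    mag >>= 6
    out = [first | 0x80 if mag else first]
    while mag:
        b = mag & 0x7F
        mag >>= 7
        out.append(b | 0x80 if mag else b)
    return bytes(out)

def varint_decode(buf, pos=0):
    """Decode one signed run length, returning (value, next_position)."""
    b = buf[pos]; pos += 1
    neg = bool(b & 0x40)
    mag, shift = b & 0x3F, 6
    while b & 0x80:
        b = buf[pos]; pos += 1
        mag |= (b & 0x7F) << shift
        shift += 7
    return (-mag if neg else mag), pos

print(len(varint_encode(63)))   # 1 byte: the magnitude fits in 6 bits
print(len(varint_encode(64)))   # 2 bytes: needs a continuation byte
```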
The main drawback of RLE is limited random access. Retrieving the value of a single pixel requires sequentially traversing each run up to that pixel’s position, slowing down random access. For some vision applications, this may not pose a problem, but in the context of Datature’s Nexus platform—where fast, scalable operations are essential—this could be a consideration. For instance, tools like our built-in Annotator, which may handle extremely large images (over 10,000 pixels per side in cases like aerial or satellite imagery), depend on rapid interaction and require techniques like image tiling or splicing. These operations often benefit from a trade-off between RLE's compact storage and the performance characteristics of dense storage.
Run-End Encoding (REE)
Regarding efficient bitmask storage, RLE is often the go-to choice. However, another method, known as Run-End Encoding (REE), offers some unique advantages by balancing elements of both sparse and RLE formats. While RLE uses runs of consecutive values to compress bitmasks, REE represents these runs as a sequence of indices where values switch between black and white (or 0 and 1). This makes it an intriguing choice for computer vision tasks requiring efficient segmentation mask handling.
In REE, the transition points between consecutive black and white runs are stored as indices. Taking our previous example of the bitmask 110011100, REE would represent it as: 0 2 4 7. In this case, we assume that the starting run is black (0), so the first REE index is 0, where the bitmask starts with 1. At each index in REE, the value flips between black and white, allowing us to represent the entire bitmask based on transition points alone.
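A sketch of REE encoding and decoding under the convention above (an implicit leading black run, so a mask that starts with 1 has its first flip at index 0):

```python
def ree_encode(bits):
    """Record the indices where the pixel value flips, assuming the
    mask conceptually starts from a black (0) run."""
    indices, prev = [], 0
    for i, b in enumerate(bits):
        if b != prev:
            indices.append(i)
            prev = b
    return indices

def ree_decode(indices, length):
    """Rebuild the dense bitmask from its flip points and total length."""
    bits, value, prev = [], 0, 0
    for idx in indices + [length]:
        bits.extend([value] * (idx - prev))
        value ^= 1
        prev = idx
    return bits

bits = [1, 1, 0, 0, 1, 1, 1, 0, 0]
idx = ree_encode(bits)
print(idx)                              # [0, 2, 4, 7]
assert ree_decode(idx, len(bits)) == bits
```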
On the surface, REE might appear to offer similar or even less compression than RLE, especially since it uses indices, and the indices towards the end are often large, requiring more than one byte. However, REE offers distinct advantages:
- Efficient Pixel Lookup with Binary Search: While REE doesn't fully support random access, it allows pixel value lookup via binary search in O(log n) time. Because the indices are ordered and alternate between pixel types, identifying a specific pixel type within the bitmask is faster compared to RLE's sequential lookup.
- Optimized Boolean Operations: REE’s ordered indices make it ideal for performing Boolean operations on overlapping bitmasks, such as union, intersection, and subtraction. By leveraging a method similar to the merge sort algorithm, two REE-encoded bitmasks can be combined in a single pass, generating a new bitmask based on the chosen Boolean operation. This approach makes REE highly efficient for complex vision tasks that involve comparing or combining multiple bitmasks.
- Flexible Chunking and Combining: REE also excels in situations where bitmasks need to be split into chunks and later recombined. For example, if we split an REE-encoded bitmask into separate chunks, each chunk can be easily merged using the exclusive OR (XOR) operation. Thanks to XOR's commutative and reversible properties, the merging order doesn’t matter; no matter which chunks arrive first, they can be combined to reconstruct the original bitmask. This feature is particularly valuable when streaming large images over a network, as REE allows stitching to begin immediately without waiting for all chunks to arrive.
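The binary-search lookup from the first point above can be sketched with Python's bisect module: since the value flips at every stored index, the pixel value at position p is simply the parity of the number of flip points at or before p.

```python
from bisect import bisect_right

def ree_lookup(indices, p):
    """O(log n) pixel lookup in an REE-encoded mask: count the flip
    points at or before position p; odd count means the pixel is 1."""
    return bisect_right(indices, p) % 2

indices = [0, 2, 4, 7]  # REE of the example mask 110011100
print([ree_lookup(indices, p) for p in range(9)])
# [1, 1, 0, 0, 1, 1, 1, 0, 0]
```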
Concretely, a bitmask represented in REE can be divided into separate, mutually exclusive chunks and later recombined simply by applying the exclusive OR (XOR) operation. Due to the commutative and reversible properties of XOR, the order in which the chunks are joined doesn't matter; regardless of the sequence, we always end up with the same final bitmask.

This characteristic makes REE particularly useful when streaming bitmask data over a network. Since chunks can arrive in any order, there's no need to wait for all of them before beginning recombination: stitching can start as soon as any chunk arrives, enabling efficient and flexible processing even when data transfers are asynchronous or prone to network delays.
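A sketch of this recombination: XORing two REE-encoded masks reduces to merging their sorted index lists in a single pass, with flips at identical indices cancelling out (since two flips at the same position undo each other).

```python
from heapq import merge

def ree_xor(a, b):
    """XOR two REE-encoded bitmasks in one pass over their sorted
    flip-point lists; a flip survives unless it appears in both."""
    out = []
    for idx in merge(a, b):
        if out and out[-1] == idx:
            out.pop()        # matching flips cancel
        else:
            out.append(idx)
    return out

# Split the REE mask [0, 2, 4, 7] into mutually exclusive chunks and
# recombine them; arrival order does not matter.
chunks = [[0, 2], [4], [7]]
merged = []
for c in reversed(chunks):
    merged = ree_xor(merged, c)
print(merged)  # [0, 2, 4, 7]
```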
REE vs. RLE
A summary of the differences between REE and RLE is shown in the table below:
One intriguing advantage of the strictly increasing nature of REE indices is that it allows for efficient compression using binary trees, offering results comparable to RLE with varints.
Efficient Compression of REE Using Binary Trees
Since REE relies on a series of indices that are laid out in a strictly ordered fashion, these indices can be efficiently encoded using a binary tree structure. Drawing inspiration from Octree compression—commonly used in 3D graphics and point cloud data—we can build a binary tree where the leaves represent the REE indices and the branches encode the presence or absence of downstream indices.
Building the Compression Tree
The binary tree is recursively defined: empty branches are represented by a single 0 bit, while non-empty branches start with a 1 bit, followed by the preorder traversal of the subtree. This structure relies on the indices' strictly increasing nature and allows for efficient compression.
- Empty branches: Represented by a 0 bit, indicating no indices exist in that segment of the tree.
- Non-empty branches: Start with a 1 bit, followed by a preorder traversal of all downstream indices, efficiently capturing the presence and structure of the data.
The result of this recursive structure is a compressed bitstream obtained through a preorder traversal of the entire binary tree.
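A sketch of the recursive encoding and decoding described above, keeping the bitstream as a Python list of 0/1 values for clarity (a real implementation would pack these bits into bytes):

```python
from bisect import bisect_left

def tree_encode(indices, lo, hi, out):
    """Encode sorted indices over the range [lo, hi): an empty branch
    emits a 0 bit; a non-empty branch emits a 1 bit followed by the
    preorder traversal of its two halves."""
    i, j = bisect_left(indices, lo), bisect_left(indices, hi)
    if i == j:               # no indices fall in this range
        out.append(0)
        return
    out.append(1)
    if hi - lo == 1:         # unit range: the single index lo is present
        return
    mid = (lo + hi) // 2
    tree_encode(indices, lo, mid, out)
    tree_encode(indices, mid, hi, out)

def tree_decode(bits, lo, hi, pos):
    """Preorder decode; returns (indices, next_bit_position)."""
    if bits[pos] == 0:
        return [], pos + 1
    pos += 1
    if hi - lo == 1:
        return [lo], pos
    mid = (lo + hi) // 2
    left, pos = tree_decode(bits, lo, mid, pos)
    right, pos = tree_decode(bits, mid, hi, pos)
    return left + right, pos

indices = [0, 2, 4, 7]             # REE flip points of 110011100
stream = []
tree_encode(indices, 0, 9, stream) # m = 9 possible index values
print(len(stream))                 # 17 bits vs. 4 x 32 = 128 bits raw
decoded, _ = tree_decode(stream, 0, 9, 0)
assert decoded == indices
```

Note how the structural invariant from the text holds: a non-empty branch always has at least one non-empty child, so the two child bits of any subtree can never both be 0.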
Compression Efficiency and Performance
The encoding and decoding of this binary tree structure are simple and can be performed in a single pass with a time complexity of O(n log m), where:
- n is the number of run-end indices to encode, and
- m is the range of the indices.
This process is efficient in both time and space, especially considering that in raw form, these indices are typically stored as 4-byte Int32 values. In practice, this technique results in a compression ratio of about 1 to 2 bytes per run, a significant reduction when compared to the original 4-byte representation.
Error Detection and Further Compression
An added benefit of this binary tree approach is its inherent error-checking capability. Since the two branches of a non-empty subtree can never both be 0, this structural property allows for self-error detection during encoding or decoding, making the process more robust and reliable.
Moreover, when repetitive patterns appear in the bitmask data, they are preserved in the compressed bitstream. This preservation makes the compressed data particularly amenable to further compression using general-purpose compression tools like gzip, which can further reduce the size of the bitstream, especially when dealing with large, homogeneous regions in the data.
Comparison with Run-Length Encoding (RLE)
When comparing this binary tree-based REE compression to traditional RLE, it's important to note the strengths and limitations of each approach:
- RLE with varints is efficient for encoding short runs but becomes less effective once the run length exceeds 63: at that point a varint requires at least two bytes, since only 6 bits remain for the run length in the first byte after accounting for the continuation and sign bits.
- REE with binary tree compression, on the other hand, offers more consistent compression performance across a wide range of run lengths, especially when dealing with large datasets where indices may be spread across large, sparse areas.
Evaluating the Space Efficiency of Binary Tree-Compressed REEs
We applied binary tree-compressed REEs on images of varying complexities, revealing several important insights into their space efficiency:
- Run Length Efficiency: The average length of each run has a direct impact on compression efficiency. Images with longer, continuous runs (like large contiguous regions of similar pixel values) generate fewer runs overall, reducing the storage size.
- Significant Space Savings: Binary tree-compressed REE format substantially reduces memory usage compared to dense bitmask representations, particularly in images with structured or clustered pixel values.
- Adaptability Across Image Types: This encoding method efficiently adapts to images of different sizes and complexities. While images with shorter run lengths (more alternating pixel types) naturally require more runs and slightly larger storage, the compression remains effective across diverse content types, maintaining reasonable sizes even in high-detail scenarios.
Image Width Delimited RLE
While REE excels in bulk operations and compression efficiency, image-width delimited RLE shines in tasks that require row-based operations. This format simplifies a number of tasks that are challenging with other bitmask representations, including REE.

For instance, when translating a bitmask, image-width delimited RLE ensures that runs stay strictly within the confines of a row, avoiding wrap-around issues and making the translation operation much more straightforward. This is particularly helpful for other 2D bitmask manipulations, such as centroid and bounding box computations, cut-and-paste operations, finding islands and holes in fill operations, and more.
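A sketch of image-width delimited RLE and a horizontal translation on it, using the signed-run convention from earlier (a tidier implementation would also merge adjacent same-sign runs after editing):

```python
from itertools import groupby

def row_rle_encode(mask, w):
    """Encode each row independently so runs never cross row
    boundaries. Signed convention: positive = 1s, negative = 0s."""
    rows = [mask[r * w:(r + 1) * w] for r in range(len(mask) // w)]
    return [[len(list(g)) * (1 if v else -1) for v, g in groupby(row)]
            for row in rows]

def translate_right(rows, dx, w):
    """Shift a mask right by dx pixels: prepend a background run to
    each row and trim the row back to width w. Because every run is
    confined to its row, nothing can wrap into the row below."""
    out = []
    for runs in rows:
        shifted = [-dx] if dx else []
        budget = w - dx                  # pixels left to fill
        for r in runs:
            if budget <= 0:
                break
            take = min(abs(r), budget)
            shifted.append(take if r > 0 else -take)
            budget -= take
        out.append(shifted)
    return out

mask = [1, 1, 0,
        0, 1, 1,
        1, 0, 0]
rows = row_rle_encode(mask, 3)
print(rows)                          # [[2, -1], [-1, 2], [1, -2]]
print(translate_right(rows, 1, 3))   # [[-1, 2], [-1, -1, 1], [-1, 1, -1]]
```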
Try It Out Yourself
To visualize bitmask encoding and see firsthand the space-saving differences between RLE and REE compression, check out our demo notebook, or run it on Colab. This interactive resource offers a clear comparison of each method's efficiency, showing how much storage you can save with the right encoding technique.
Conclusion: Efficient Bitmask Data Representation at Scale
At Datature, we understand that bitmask data representation plays a crucial role in the performance and efficiency of computer vision workflows. By carefully selecting the most appropriate data structures for different stages of the process, we ensure that our Nexus platform delivers high performance without sacrificing flexibility or ease of use.
Key Engineering Decisions Made at Datature
- Storage and Transfer: We use a binary tree compressed REE format to efficiently handle sparse data with low storage requirements.
- Visualization and Manipulation: The REE format is preferred for operations on pixel runs, improving memory efficiency, especially in undo/redo stacks. It also enables quick Boolean operations (union, intersection, etc.) through a single pass.
- 2D Manipulation: For tasks like cut and paste, centroid computation, and bounding box extraction on our Annotator, we use image-width delimited RLE, which allows for easy affine transformations (translation, scaling, rotation).
One of the key benefits of our approach is the ease of format conversion. Whether moving between dense bitmasks, REE, image-width delimited RLE, or binary tree compressed REE, the transitions are seamless and computationally inexpensive, typically requiring only a single pass.
Ultimately, Datature’s flexible and efficient approach to bitmask data representation ensures users can easily manage large, complex datasets, streamlining the annotation process while maintaining high performance and scalability. This thoughtful combination of data structures empowers users to tackle a wide range of computer vision challenges with greater speed and accuracy.