Resources

Glossary

Definitions of AI terms, platform features, and machine learning concepts.

Action Recognition

Action recognition is the process of identifying and categorizing human actions or movements in videos or images, such as walking, running, or dancing, to enable computer systems to understand and respond to these actions automatically.

Activation Function

A mathematical function applied after each neural network layer that introduces non-linearity, allowing the model to learn complex patterns like edges, textures, and shapes rather than only simple linear relationships.
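
As an illustrative sketch (not tied to any particular framework), two widely used activation functions can be written in plain Python:

```python
import math

def relu(x):
    # Rectified Linear Unit: passes positives through, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.3))              # 0.0
print(relu(1.5))               # 1.5
print(round(sigmoid(0.0), 2))  # 0.5
```

ReLU is the common default for hidden layers; sigmoid is typically reserved for binary outputs.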

Active learning

Active learning is a training approach where the model doesn’t just passively consume labeled data - it actively chooses which unlabeled examples would be most valuable to label next. By prioritizing samples the model is uncertain about or that best cover the data space, you can reach a target accuracy with far fewer labeled examples (and lower labeling cost) than random labeling.

Agentic Vision

Agentic vision refers to AI systems that autonomously perceive visual information, plan multi-step actions, and execute tasks in visual environments, combining VLMs with tool use, memory, and decision-making capabilities.

Anchor Box

An anchor box is one of a set of predefined bounding box templates of various sizes and aspect ratios that object detection models use as reference shapes when predicting object locations.

Anchor-Free Detection

An object detection approach that predicts bounding boxes directly from feature map locations, removing the need for predefined anchor box templates and the manual tuning they require.

Annotation

Annotation is the process of labeling your data to teach your deep learning model the outcome you want to predict. Generally, bounding boxes are used to train for object detection and polygons are used to train for instance segmentation.

Annotation Format

Annotation format is the specific method used to encode annotations, describing each bounding box's size and position (COCO, YOLO, TXT, etc.).
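
For example, the COCO and YOLO formats describe the same box differently: COCO stores absolute top-left coordinates with width and height, while YOLO stores center coordinates normalized by image size. A minimal conversion sketch:

```python
def coco_to_yolo(box, img_w, img_h):
    # COCO: [x_min, y_min, width, height] in absolute pixels
    # YOLO: [x_center, y_center, width, height] normalized to [0, 1]
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

print(coco_to_yolo([50, 100, 200, 100], img_w=400, img_h=400))
# [0.375, 0.375, 0.5, 0.25]
```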

Anomaly Detection

The task of identifying patterns in images that deviate from what is expected, such as defective products on a manufacturing line, unusual structures in medical scans, or damaged infrastructure in inspection photos.

Application Programming Interface (API)

An application programming interface is a mechanism that lets software components communicate with other software, databases, or applications. Companies can use APIs to support digital transformation or build an ecosystem. We use a REST API to allow users to easily import their models into our platform.

Attention Mechanism

A technique that lets a neural network focus on the most relevant parts of its input by computing weighted combinations where important regions receive higher influence on the final prediction.
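
A minimal sketch of scaled dot-product attention for a single query, in plain Python (the list-based vectors here are illustrative, not a production implementation):

```python
import math

def attention(query, keys, values):
    # Score each key against the query, scaled by sqrt of the dimension
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns scores into weights that sum to 1
    exp = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exp) for e in exp]
    # Output is the weighted combination of value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

out = attention(query=[1.0, 0.0], keys=[[1.0, 0.0], [0.0, 1.0]], values=[[10.0], [20.0]])
# the first value dominates because the query matches the first key better
```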

Attribute/attribute group

An attribute is an item of data used in machine learning, and attribute groups define clusters of attributes that capture additional information about a product.

Augmentation

Augmentation improves dataset robustness. It allows users to expand their existing dataset through positional or color-space transformations. These techniques prevent the model from leaning on specific features while training.

Automated Machine Learning (AutoML)

AutoML automates the tasks involved in building and optimizing machine learning models for real-world applications. It covers the whole process from loading a raw dataset to deploying the ML model.

Backpropagation

The algorithm neural networks use to learn — it calculates how much each weight contributed to prediction errors and adjusts them accordingly, repeating this process over every batch of training data.

Batch Normalization

A training technique that stabilizes learning by normalizing each layer's inputs to have zero mean and unit variance, allowing higher learning rates and reducing sensitivity to weight initialization.
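
The core normalization step can be sketched in plain Python (omitting the learnable scale and shift parameters a real batch norm layer also applies):

```python
import math

def batch_norm(values, eps=1e-5):
    # Normalize a batch of activations to zero mean and unit variance;
    # eps guards against division by zero when the variance is tiny
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

normed = batch_norm([2.0, 4.0, 6.0, 8.0])
# normed has mean ~0 and variance ~1
```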

Bounding box

A bounding box is a rectangular region of an image that encloses an object and is described by its (x, y) coordinates.

CLIP (Contrastive Language-Image Pre-training)

CLIP is an OpenAI model that learns to connect images and text by training on 400 million image-text pairs from the internet, enabling zero-shot image classification and powering the vision encoders inside most modern VLMs.

COCO

COCO (Common Objects in Context) is a large-scale image dataset whose annotations are stored in JSON format, widely used to benchmark model performance on common object detection problems.

Chain-of-Thought Reasoning

A prompting technique where a model works through a problem step by step before producing a final answer, improving accuracy on tasks that require multi-step logic or spatial reasoning.

Class Imbalance

A dataset condition where some classes have far more examples than others, causing models to favor the majority class and miss the rare cases that actually matter unless corrected with resampling or modified loss functions.

Classification

Classification is a machine learning task where data is categorized into predefined classes or labels. The goal is to build a model that can predict the correct label for new, unseen data based on patterns and features learned from a training dataset. It's widely used in various applications, such as spam detection or image recognition.

Clustering

Clustering is an unsupervised technique that groups data points by similarity without requiring labels.

Computer Vision

Computer Vision is the science of enabling computers to see and understand images and video. This is accomplished by developing algorithms that can make sense of visual content, for example detecting people or objects in an image or video, or being able to read road signs.

Confusion Matrix

A confusion matrix is a table used in machine learning to evaluate the performance of a classification model. It summarizes the model's predictions by showing the true positive, true negative, false positive, and false negative counts, enabling the assessment of accuracy, precision, recall, and other metrics.
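
For a binary classifier, the four counts can be computed directly (a minimal sketch; the labels here are illustrative):

```python
def confusion_matrix(y_true, y_pred):
    # Counts for a binary classifier: (TP, FP, FN, TN)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# tp=2, fp=1, fn=1, tn=1 -> accuracy = (tp + tn) / total = 3 / 5 = 0.6
```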

Contrastive Learning

A self-supervised training method where a model learns to produce similar representations for related images and different representations for unrelated ones, without needing class labels.

Convolutional neural networks (CNN)

A CNN is a neural network with at least one convolutional layer, typically used for image recognition and classification.

Cross-Attention

Cross-attention is a transformer mechanism where one input sequence (such as text) attends to another (such as image patches), enabling vision-language models to fuse visual and textual information for tasks like captioning and grounding.

DETR (Detection Transformer)

An object detection architecture from Meta AI that uses learned object queries and a transformer decoder to directly predict detections, removing the need for anchor boxes and non-maximum suppression.

Data Labeling

The process of adding structured annotations to raw images — such as bounding boxes, segmentation masks, or class tags — so that models can learn from them. Label quality directly sets the ceiling for model performance.

Data Preprocessing

The steps applied to raw images and annotations before training — resizing, normalizing pixel values, converting color spaces, and transforming labels to match the model's expected input format.

Dataset Splitting

Dividing a labeled dataset into training, validation, and test subsets so the model can learn from one portion, tune hyperparameters on another, and be evaluated fairly on data it has never seen.
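
A common 70/15/15 split can be sketched as follows (the ratios and seed are illustrative choices):

```python
import random

def split_dataset(items, train=0.7, val=0.15, seed=42):
    # Shuffle once with a fixed seed for reproducibility, then slice
    rng = random.Random(seed)
    items = items[:]
    rng.shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train_set, val_set, test_set = split_dataset(list(range(100)))
# 70 / 15 / 15 items, with no overlap between subsets
```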

Deep Learning

A branch of machine learning that uses neural networks with many layers to automatically learn hierarchical features from data — raw pixels become edges, edges become textures, textures become parts, and parts become recognizable objects.

Depth Estimation

A computer vision task that predicts how far each pixel in an image is from the camera, producing a depth map used in autonomous driving, robotics, augmented reality, and 3D scene reconstruction.

Diffusion Models

A class of generative models that create images by learning to reverse a gradual noising process — starting from random static and iteratively refining it into a coherent image, as used in Stable Diffusion and DALL-E.

Domain Adaptation

Techniques for closing the gap between training data and real-world deployment data when they come from different conditions, such as a model trained on daytime images struggling when used at night.

Dropout

A regularization technique that randomly deactivates a fraction of neurons during each training step, forcing the network to spread learned features more evenly and reducing overfitting on small datasets.
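
A sketch of inverted dropout, the variant most frameworks use, where surviving activations are scaled up during training so no rescaling is needed at inference:

```python
import random

def dropout(values, p=0.5, training=True, seed=0):
    # At inference time dropout is disabled and inputs pass through unchanged
    if not training:
        return values[:]
    rng = random.Random(seed)
    # Drop each activation with probability p; scale survivors by 1/(1-p)
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]

print(dropout([1.0, 2.0], training=False))  # [1.0, 2.0] — unchanged at inference
```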

Edge AI

Running trained models directly on local hardware — cameras, phones, industrial controllers — instead of sending data to cloud servers, enabling low-latency decisions, data privacy, and offline operation.

Encoder-Decoder Architecture

A neural network design where an encoder compresses input into a compact representation and a decoder expands it back to the desired output, commonly used in image segmentation with skip connections to preserve spatial detail.

Epoch

One complete pass through the entire training dataset. Most models train for multiple epochs, seeing each image many times, because a single pass rarely extracts all learnable patterns from the data.

Explainable AI

Methods and tools that show which parts of an image a model focused on when making a prediction, using techniques like Grad-CAM heatmaps or SHAP scores to make otherwise opaque decisions interpretable.

F1 Score

The harmonic mean of precision and recall, providing a single number that balances both metrics. Unlike a simple average, it penalizes extreme imbalances — high precision with low recall still yields a low F1.
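
The penalty for imbalance is easy to see numerically (a minimal sketch from raw counts):

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=8, fp=2, fn=8))
# precision = 0.8, recall = 0.5 -> F1 ~ 0.615, below the simple average of 0.65
```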

Feature Pyramid Network (FPN)

A multi-scale feature extraction architecture that combines deep semantic features with shallow spatial detail through a top-down pathway, helping detection models handle objects at all sizes.

Few-Shot Learning

Few-shot learning is a machine learning approach where a model learns to recognize new categories or perform new tasks from just a handful of labeled examples, typically 1-10 samples per class.

Fine-Tuning

Continuing to train a pre-trained model on your specific dataset so it adapts its general visual knowledge to your particular classes, image style, and domain — reducing the labeled data and training time needed.

Foundation Models

Foundation models are large models pre-trained on massive image datasets. They serve as a starting point for various computer vision tasks like object detection, image classification, and segmentation, providing a base of learned features and patterns that can be fine-tuned for specific vision-related applications.

GAN (Generative Adversarial Network)

A generative model with two competing networks — a generator that creates synthetic images and a discriminator that tries to distinguish fakes from real photos — each improving through their rivalry.

Generative AI

Generative AI refers to artificial intelligence systems capable of generating data, content, or objects autonomously. These systems, often based on deep learning models like GANs, can produce images, text, audio, or other forms of data, allowing them to create new and original content based on patterns learned from training data.

Gesture Recognition

Gesture recognition is a technology that interprets human gestures or body movements to control and interact with computers or other devices. It allows users to convey commands, input data, or interact with a system through natural movements, making it valuable in applications like gaming, virtual reality, and user interfaces.

Gradient Descent

The optimization algorithm that trains neural networks by computing how much each weight contributes to prediction error, then adjusting all weights in the direction that reduces that error.
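
The update rule can be sketched on a one-dimensional toy problem, minimizing f(x) = x², whose gradient is 2x:

```python
def gradient_descent(start, lr=0.1, steps=100):
    # Repeatedly step against the gradient of f(x) = x^2
    x = start
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(gradient_descent(start=10.0))  # converges toward the minimum at x = 0
```

In a neural network the same idea applies to millions of weights at once, with the gradient supplied by backpropagation.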

Ground Truth

The verified, human-annotated labels — bounding boxes, masks, or class tags — that a model is trained and evaluated against. Ground truth quality sets the ceiling for how accurate a model can become.

Hallucination (in Vision-Language Models)

Hallucination in vision-language models occurs when the model generates text describing objects, attributes, or relationships that are not actually present in the input image, producing confident but factually incorrect outputs.

Hyperparameter Tuning

The process of finding the best training configuration — learning rate, batch size, epochs, augmentation strength — since these settings control how well the model learns and must be chosen before training begins.

Image Captioning

Image captioning is the task of automatically generating a natural language description of an image, combining visual understanding with text generation to produce sentences like "a dog playing fetch in a park."

Image Embeddings

Fixed-length numerical vectors that represent an image's visual content in compact form. Similar-looking images produce vectors that are close together, enabling visual search, clustering, and duplicate detection.
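
Similarity between embedding vectors is typically measured with cosine similarity, sketched here in plain Python:

```python
import math

def cosine_similarity(a, b):
    # ~1.0 means same direction (similar content); 0.0 means orthogonal (unrelated)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0, same direction
```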

Instance Segmentation

Instance segmentation is a computer vision task that combines object detection and semantic segmentation. It identifies and delineates individual objects within an image, assigning each pixel to a specific object instance. This provides a detailed understanding of the spatial extent and location of distinct objects in an image.

Intersection over Union (IoU)

A metric that measures overlap between a predicted bounding box or mask and the ground truth by dividing their intersection area by their union area. A score of 1.0 means perfect alignment; 0.0 means no overlap.
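
The metric can be computed directly from corner coordinates (a minimal sketch assuming (x_min, y_min, x_max, y_max) boxes):

```python
def iou(box_a, box_b):
    # Intersection rectangle corners (clamped to zero if boxes don't overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
# intersection 25, union 175 -> ~0.143
```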

Keypoint Detection

Keypoint detection is a computer vision task that identifies and localizes specific points or landmarks in an image. These keypoints represent important features, such as corners or interest points, and are often used for tasks like object tracking, pose estimation, and image alignment.

Knowledge Distillation

A compression technique where a smaller "student" model is trained to replicate the behavior of a larger "teacher" model, transferring learned knowledge into a faster, more deployable form.

Learning Rate

The training hyperparameter that controls how much a model's weights change per gradient update. Too large and training becomes unstable; too small and the model converges slowly or gets stuck.

LoRA (Low-Rank Adaptation)

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that adapts large pre-trained models by injecting small trainable matrices into existing layers, reducing memory and compute requirements by 10-100x compared to full fine-tuning.

Loss Function

The mathematical function that measures how far a model's predictions are from the correct answers, producing the error signal that gradient descent uses to update weights during training.

Machine Learning Operations (MLOps)

MLOps, short for Machine Learning Operations, is a set of practices and tools that combine machine learning with DevOps to manage the end-to-end machine learning lifecycle. It encompasses model development, deployment, monitoring, and automation, enabling efficient, scalable, and reliable machine learning operations in production environments.

Mask R-CNN

A two-stage instance segmentation model that detects objects with bounding boxes, classifies them, and predicts a pixel-level mask for each one — identifying both what objects are present and which pixels belong to each.

Mean Average Precision (mAP)

The primary evaluation metric for object detection, summarizing both classification accuracy and localization quality into a single score by averaging precision across all classes and IoU thresholds.

Model Deployment

Model deployment is the act of making a machine learning model operational and accessible for real-world use. It involves integrating the model into a software application, cloud service, or other systems, so it can make predictions or decisions based on new data in a practical, automated, and scalable manner.

Model Inference

Running a trained model on new data to generate predictions — the production phase where the model outputs bounding boxes, class labels, segmentation masks, or other results from inputs it has never seen before.

Model Pruning

A compression technique that removes low-importance weights or entire structural units from a trained network to make it smaller and faster, with minimal impact on accuracy.

Model Serving

Deploying trained models behind stable APIs or inference endpoints so applications can send data and receive predictions, with production concerns like request batching, autoscaling, and version management handled automatically.

Multimodal AI

Multimodal AI refers to artificial intelligence systems that can process and reason across multiple types of data (such as images, text, audio, and video) rather than being limited to a single input format.

Multimodal Alignment

Multimodal alignment is the process of training a model so that related concepts from different data types (like an image of a dog and the word dog) map to nearby points in a shared representation space.

Multimodal Embedding

A multimodal embedding is a vector representation in a shared space where images, text, and other data types are mapped so that semantically similar items from different modalities are positioned close together.

Multimodal Fusion

Multimodal fusion is the process of combining information from different data types (such as images and text) into a unified representation that a model can reason over jointly.

Multimodal Learning

Training models to understand and reason across multiple data types at once — such as images and text together — so they learn how different modalities relate to each other.

Neural Network

A computational model made of layers of connected nodes that learn to transform input data into useful outputs by adjusting connection weights through exposure to training examples.

Non-Maximum Suppression (NMS)

A post-processing step in object detection that eliminates duplicate bounding box predictions for the same object by keeping the highest-confidence detection and discarding heavily overlapping alternatives.
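
A minimal sketch of greedy NMS (boxes as (x_min, y_min, x_max, y_max) tuples; the 0.5 threshold is a typical but illustrative choice):

```python
def iou(a, b):
    # Overlap ratio of two (x_min, y_min, x_max, y_max) boxes
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy NMS: keep the highest-scoring box, drop heavy overlaps, repeat
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])
# the second box overlaps the first too heavily and is suppressed
```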

ONNX (Open Neural Network Exchange)

An open-source model format that lets you train in one framework (like PyTorch) and deploy on different hardware through a common runtime, without rewriting inference code.

Object Detection

Object detection is a computer vision task that involves identifying and locating objects within images or videos. It goes beyond image classification by not only classifying objects but also drawing bounding boxes around them, providing information about their positions in the image. It's widely used in applications like autonomous driving and image analysis.

Object Tracking

Object tracking is a computer vision process that involves monitoring and following the movement of objects within a sequence of images or a video stream over time. It assigns a unique identity to each object and tracks its position and motion as it moves through the frames, enabling applications like video surveillance and autonomous vehicles.

Open Vocabulary Detection

Open vocabulary detection is an object detection approach that identifies objects using free-text descriptions rather than a fixed list of class labels, removing the need to retrain for new object categories.

OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source library of programming functions for real-time computer vision, image processing, and machine learning. Originally developed by Intel in 2000, it is now the most widely used computer vision library in the world.

Optical Character Recognition (OCR)

The technology that converts images of printed, handwritten, or scene text into machine-readable characters, used for document digitization, license plate reading, and receipt processing.

Optical Flow

The pattern of apparent motion between consecutive video frames, represented as a vector field where each pixel gets a direction and magnitude showing how it moved from one frame to the next.

Oriented Bounding Box (OBB)

A rotated rectangle that tightly encloses an object at an arbitrary angle, used when standard axis-aligned boxes would waste significant area on tilted targets like ships in satellite imagery or angled text.

Panoptic Segmentation

A unified segmentation task that labels every pixel in an image with both a class (sky, road, car) and an instance identity, combining background labeling with individual object separation.

Patch Embedding

Patch embedding is the process of splitting an image into fixed-size patches and converting each patch into a vector representation, transforming a 2D image into a sequence of tokens that a transformer can process.

Pose Estimation

Pose estimation is a computer vision task that identifies and calculates the positions and orientations of key body parts or objects within an image or video, often in the context of human pose analysis. It's used in applications such as motion capture, gesture recognition, and augmented reality.

Precision and Recall

Two metrics that measure detection quality from opposite angles — precision is the fraction of correct predictions out of all predictions made, recall is the fraction of actual objects the model found.

Prompt Engineering for Vision

Prompt engineering for vision is the practice of crafting effective text inputs to guide vision-language models toward accurate and useful outputs, including techniques specific to visual tasks like spatial descriptions and chain-of-thought reasoning.

Quantization

A model optimization technique that reduces the numerical precision of a neural network's weights and activations - for example, converting 32-bit floating-point values to 8-bit integers (INT8). Quantization significantly reduces model size, memory usage, and inference latency, making it essential for deploying models on edge devices and mobile hardware.
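
Symmetric linear quantization to INT8 can be sketched as follows (a simplified per-tensor scheme; real toolchains also handle zero points, calibration, and per-channel scales):

```python
def quantize_int8(values):
    # Map floats to the symmetric int8 range [-127, 127] via a single scale
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats from the integer codes
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# restored values match the originals to within one quantization step
```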

ROC Curve

A plot that shows a binary classifier's performance across all confidence thresholds, with the area under the curve (AUC) summarizing overall separation ability into a single number, where 0.5 corresponds to random guessing and 1.0 to perfect separation.

Real-Time Object Detection

Models that locate and classify objects fast enough for live applications, typically at 30+ frames per second. The YOLO family has defined this space since 2015 by treating detection as a single-pass problem.

Referring Expression Comprehension

Referring expression comprehension is the task of localizing a specific object in an image from a natural language description that distinguishes it from other objects, such as the taller glass on the right.

Regularization

A collection of training techniques — including weight decay, dropout, and data augmentation — that prevent a model from memorizing training data and instead force it to learn patterns that generalize to new inputs.

Retrieval-Augmented Generation (RAG)

An architecture that grounds language model responses in retrieved external documents or images rather than relying on memorized training data, reducing made-up answers and enabling responses about new information.

Segment Anything Model (SAM)

Meta AI's foundation model for image segmentation that produces high-quality masks from simple prompts — a point click, bounding box, or text description — without any task-specific training.

Self-Supervised Learning

A training approach where the model generates its own supervision from raw data — for example, masking part of an image and learning to reconstruct it — building useful visual features without human annotation.

Semantic Segmentation

Semantic segmentation is a computer vision task that classifies each pixel in an image to a specific object category or class. It provides a detailed understanding of the objects' spatial layout and enables the delineation of object boundaries in an image, making it useful in applications like image analysis and autonomous driving.

Semi-Supervised Learning

A training approach that combines a small set of labeled examples with a large pool of unlabeled data, using techniques like pseudo-labeling to extract learning signal from both.

SigLIP

SigLIP (Sigmoid Loss for Language-Image Pre-training) is Google's successor to CLIP that replaces the softmax contrastive loss with a sigmoid-based loss, enabling better scaling and serving as the vision encoder in PaliGemma and other modern VLMs.
