What is Image Classification?
Image classification is a pivotal task in computer vision that automates the assignment of labels or categories to images based on their visual content. It is performed by machine learning models, particularly convolutional neural networks (CNNs), trained on labeled datasets. The process typically begins with feature extraction, in which the model derives meaningful visual features from the raw pixel values of an image; these features serve as the foundation for the model's predictions. The result is a model that can classify new, unseen images into predefined categories, enabling automated decision-making.
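To make the final step of that pipeline concrete, here is a minimal sketch of how a linear classification head turns extracted features into a predicted label; the feature vector, weights, and class names below are all made up for illustration:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical feature vector extracted from an image (e.g. by a CNN backbone)
features = [0.2, 1.5, -0.3, 0.8]

# Hypothetical weights of a linear classification head for three classes
weights = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.1, 0.8, 0.4, -0.6],
    [0.2, 0.1, -0.5, 0.9],
]
labels = ["cat", "dog", "bird"]

# One score (logit) per class, then scores -> probabilities
logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
probs = softmax(logits)
prediction = labels[probs.index(max(probs))]
```

In a real CNN the features and weights are learned jointly during training; the final prediction step, however, looks much like this.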
Classification is a fundamental tool for organizing, analyzing, and extracting insights from data in a wide range of domains. It empowers machines to make informed decisions, enhances human decision-making processes, and contributes to the automation of tasks, ultimately driving progress and innovation in numerous fields.
What are Some Applications of Classification?
Unlike tasks such as object detection and segmentation, classification provides no information about the location, size, or other attributes of the objects in an image, so it may be considered less precise. However, it remains critical for use cases that only require identifying whether particular objects or scenarios are present, and it is widely adopted because classification models are generally more lightweight than object detection or segmentation models. Its many applications across various domains include:
- Medical Diagnosis: In healthcare, image classification models are employed to diagnose diseases from medical images such as X-rays, MRIs, and CT scans, facilitating early detection and treatment.
- Retail & E-commerce: Classification models can categorize products based on images, aiding in inventory management, recommendation systems, and visual search.
- Security & Surveillance: These models are vital for security systems, recognizing and alerting to unusual or potentially threatening objects or individuals in real-time.
- Content-Based Image Retrieval: Image classification is used in image search engines to retrieve images that share labels with a query image based on their visual content.
What can Ultralytics YOLOv8 Achieve?
One of the cutting-edge methods for image classification is YOLOv8, the latest evolution of the popular YOLO (You Only Look Once) series of object detection and classification models. We will explore the diverse applications of classification and delve into how to train a custom YOLOv8 model to classify human actions. YOLOv8 brings state-of-the-art capabilities to the realm of image classification, such as:
- High Classification Accuracy: While YOLOv8 is primarily known for object detection, its deep neural network architecture and advanced training techniques allow it to perform well in classification tasks as well.
- Real-Time Processing: Despite its impressive accuracy, YOLOv8 maintains real-time processing speeds. This makes it suitable for applications where low latency is crucial, such as real-time image classification in video streams or surveillance systems.
- Task Flexibility: YOLOv8 inherits the ability to not only classify objects in images but also detect and locate them. This means that in addition to categorizing objects within images, it can also provide information about where those objects are located, making it versatile for tasks that require both localization and classification.
- Efficient Training: YOLOv8 employs efficient training strategies, including transfer learning and data augmentation, which can significantly reduce the amount of labeled data required for model training. This can be especially beneficial when dealing with limited training data.
Training A Custom YOLOv8 Classification Model on Nexus
Training a custom classification model is made easy with Nexus. For this example, you can use the “Human Action Detection - Artificial Intelligence” dataset, which contains 15,000 images labeled with 16 distinct actions such as “sleeping”, “running”, and “texting”.
Create a Project
On the Nexus homepage, you can first create a new project by clicking on the “Create New Project” card and selecting “Classification” for Type and “Image” for Content. You can name this project “Human Action Classification”.
Upload Your Dataset
The next step is to upload your images. Select “Assets” under the Project Overview on the project home page, then drag and drop the folder containing your images into the site’s uploader. After a few moments, your images will be imported into the project.
Annotate Your Dataset with Annotation Tools
To annotate images for classification, head over to the Annotator and, for each image, select your class either by using the hotkey next to the associated tag or by selecting it in the right column tab. You can then select “Assign” or use the hotkey “A” to assign the class to the image.
If you have already prepared annotations, you can perform a batched annotation upload by clicking “Upload Annotations” under the “Upload/Export Annotations” tab on the Dataset Page and dropping your .csv file into the uploader. The .csv file should have two columns, with the headers “filename” and “label”.
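As a quick sketch, a .csv file in that two-column layout can be generated with Python's standard csv module; the filenames and labels below are hypothetical:

```python
import csv

# Hypothetical image filenames paired with their action labels
annotations = [
    ("image_001.jpg", "sleeping"),
    ("image_002.jpg", "running"),
    ("image_003.jpg", "texting"),
]

with open("annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "label"])  # the two expected headers
    writer.writerows(annotations)

# Read the file back to verify its layout
with open("annotations.csv", newline="") as f:
    rows = list(csv.reader(f))
```

Each filename in the .csv should match the name of an image already uploaded to the project.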
Creating Your Training Workflow
After the images have been annotated, you can create your own custom workflow. Upon entering the workflow page, you are prompted to use a recommended workflow, which creates a valid workflow based on your preferences. In this instance, I chose the recommended workflow for a small model with fast inference speed, which produces a workflow consisting of a dataset, several augmentations, and a YOLOv8 Classification model with 320x320 image resolution. You can also choose to build your own workflow from scratch using the option at the bottom corner.
Once the workflow is created and ready for training, you can click on the green “Run Training” button at the bottom right corner and start training your model.
Training Your Model
After starting the training, you are redirected to the Training Run page, where you can monitor the performance of your model in real time. Since this is a classification model, the only loss tracked is the classification loss. Evaluation metrics are also generated at a set interval to track your model’s performance on the validation dataset; these include Accuracy, Precision, Average Confidence, and F1 Score.
To further visualize the results of the training, you can examine several additional features such as the Advanced Evaluation Preview and Confusion Matrix. With the Advanced Evaluation, you can see what predictions were being made by the model for a sample of the validation dataset at each evaluation step. Using the confusion matrix, you can observe the specific classes that are confusing the model.
These visualizations give you a more qualitative understanding of how the model is performing and a deeper view of the types of errors it makes. In particular, seeing which classes the confusion matrix shows being mistaken for one another can help you examine why those classes appear similar to the model, and bolster the dataset with greater variety to improve performance when retraining.
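For intuition, a confusion matrix can be computed from ground-truth and predicted labels in a few lines of Python; the labels below are hypothetical:

```python
from collections import Counter

# Hypothetical ground-truth and predicted labels for a validation sample
y_true = ["running", "sleeping", "texting", "running", "texting", "sleeping"]
y_pred = ["running", "sleeping", "running", "running", "texting", "texting"]

classes = sorted(set(y_true) | set(y_pred))
pair_counts = Counter(zip(y_true, y_pred))

# matrix[i][j] = number of images of true class i predicted as class j
matrix = [[pair_counts[(t, p)] for p in classes] for t in classes]
```

Off-diagonal entries reveal which class pairs the model confuses; large off-diagonal counts are a signal to add more varied examples of those classes.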
Understanding the Classification Metrics
There are a few main metrics used when examining classification trainings. The only loss used is the classification loss, which is in fact cross-entropy loss. For a single image with one-hot ground-truth label y and predicted class probabilities p over C classes, it can be defined by the formula: L = -Σ y_c log(p_c), where the sum runs over all C classes.
Cross-entropy loss is the common standard because it punishes confident incorrect predictions far more heavily than a loss such as mean-squared error.
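To illustrate this numerically: with a one-hot label, the sum reduces to the negative log of the probability assigned to the true class, so a confident wrong answer costs far more than a confident right one:

```python
import math

def cross_entropy(probs, true_index):
    # -log of the probability the model assigned to the true class
    return -math.log(probs[true_index])

# A confident, correct prediction incurs a small loss
low_loss = cross_entropy([0.05, 0.90, 0.05], true_index=1)
# A confident, wrong prediction is punished heavily
high_loss = cross_entropy([0.90, 0.05, 0.05], true_index=1)
```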
For evaluation, the typical metrics for classification algorithms in statistics are used, primarily recall, precision, F1 score, and average confidence. Recall is the proportion of images of a given class that the model classifies correctly, while precision is the proportion of images assigned to a class that actually belong to it. Average confidence is the mean confidence the model has in its classifications, and the F1 score, the harmonic mean of precision and recall, is a typical summary of a classifier's overall capability.
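A minimal sketch of how precision, recall, F1, and accuracy could be computed for a multi-class classifier, using hypothetical validation labels (average confidence is omitted since it requires the model's raw scores):

```python
def per_class_metrics(y_true, y_pred, positive_class):
    # Treat one class as "positive" and every other class as "negative"
    tp = sum(t == positive_class and p == positive_class for t, p in zip(y_true, y_pred))
    fp = sum(t != positive_class and p == positive_class for t, p in zip(y_true, y_pred))
    fn = sum(t == positive_class and p != positive_class for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical ground-truth and predicted labels for a validation sample
y_true = ["running", "sleeping", "texting", "running", "texting", "sleeping"]
y_pred = ["running", "sleeping", "running", "running", "texting", "texting"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
for cls in sorted(set(y_true)):
    p, r, f1 = per_class_metrics(y_true, y_pred, cls)
    print(f"{cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Per-class values like these can then be averaged (macro-averaged) to summarize performance across all classes.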
Try It On Your Own Data
To get started with your own classification project, all you have to do is sign up for a Free Tier account, upload your images and annotations, and train your model right away. If you’re interested in which visual features your model attends to during classification, you can read our article on using Eigen-CAM on your trained model.
Our Developer’s Roadmap
Datature is always looking to expand its capabilities to support new use cases. With our latest release supporting classification tasks, we are on the lookout for users interested in additional features that would make our task support more robust. Currently, we are exploring model explainability features such as Eigen-CAM to improve the interpretability of classification models, but we are always open to expanding our feature set to include a wider variety.
Want to Get Started?
If you have questions, feel free to join our Community Slack to post your questions or contact us about how classification fits in with your usage.
For more detailed information about the image classification functionality, customization options, or answers to any common questions you might have, read more about the process on our Developer Portal.
Build models with the best tools.
Develop ML models in minutes with Datature