What is Model Explainability?
As deep learning solutions rapidly evolve and become increasingly prevalent across a broad range of industries, it is becoming just as critical to understand how models arrive at their predictions. Deep learning models are, by their very nature, black boxes: there is no way to directly translate each computation into a logical step that human intuition can follow. Model explainability tools are therefore necessary to extract intermediate representations from the model at various stages and convert them into a human-interpretable picture of what the model appears to be learning.
In the specific case of computer vision, models can contain millions of parameters and hundreds of layers, which makes it even harder to parse through them and extract salient, interpretable information. A common goal of model explainability tools in computer vision is to determine which spatial regions or visual features a model focuses on when making predictions. This is particularly beneficial in tasks such as image classification or object detection, where the prediction outputs alone are usually not detailed enough to show which features the model considers significant in a given region.
What is Eigen-CAM?
Eigen-CAM is one such model explainability tool for convolutional neural networks (CNNs), a proven and well-established architecture for computer vision models. Released in 2020 by Muhammad and Yeasin, it builds on class activation maps (CAM) and focuses on making sense of what a model learns from visual data in order to arrive at its predictions. Class activation maps are useful for visualization in the model explainability space because the notion of significant features aligns with how humans generally comprehend vision: anyone can look at a class activation map, compare it with the contents of the original image, and judge whether the model is truly grasping the visual concepts that a human would consider important.
When used in the context of a machine learning operations (MLOps) pipeline, being able to see how far your model has progressed in learning the correct features is a useful qualitative indicator of whether your training setup is sound and whether you need to add more augmentations for variability. It also provides a space for sanity checks, in case your model is actually focusing on entirely the wrong object when making its predictions.
While there are several techniques, such as Grad-CAM and its variants, for constructing class activation maps, Eigen-CAM stands out for its simplicity of implementation and plug-and-play integration, as it requires no re-training or layer modification. Eigen-CAM computes and visualizes the principal components of the learned features/representations from the convolutional layers. This produces a heatmap that can be overlaid on the input image to highlight which parts of the image induced the greatest magnitude of activation in the convolutional layers. As convolutional layers become spatially more compact deeper in the network, they shift from picking up small visual details to learning more global representations of the image, so the principal components extracted from one convolutional layer are not necessarily similar to those extracted from another. Typical visualizations might show heatmaps across several different layers or aggregate them into a single heatmap to highlight the overall focus of the model.
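For intuition, here is a minimal NumPy sketch of the core Eigen-CAM computation, assuming you already have the activations of one convolutional layer as an array of shape (channels, height, width). The function name and normalization details are illustrative rather than taken from any particular library:

```python
import numpy as np

def eigen_cam(activations: np.ndarray) -> np.ndarray:
    """Project conv activations onto their first principal component.

    activations: feature maps from a single convolutional layer,
                 shaped (channels, height, width).
    Returns a (height, width) saliency map normalized to [0, 1].
    """
    c, h, w = activations.shape
    # Flatten each spatial location into a row of channel responses.
    flattened = activations.reshape(c, -1).T           # (h*w, c)
    flattened = flattened - flattened.mean(axis=0)     # center before SVD
    # First right singular vector = first principal component of the features.
    _, _, vt = np.linalg.svd(flattened, full_matrices=False)
    projection = flattened @ vt[0]                     # (h*w,)
    cam = projection.reshape(h, w)
    # Keep positive activations and normalize so the map renders as a heatmap.
    cam = np.maximum(cam, 0)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam
```

Because the map is built purely from a forward pass over the learned feature maps, no gradients or class labels are needed, which is what makes the technique plug-and-play.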
Class activation maps like these are most beneficial for computer vision tasks that require the least detail in their predictions, such as classification, where the entire image is mapped to a single tag; the map provides deeper insight into how the model arrives at the predicted class. Tasks such as object detection benefit less, since the model is already trained to localize objects, but the maps can still be useful for spotting anomalies in detections. For tasks such as segmentation, where the output is already the specific outline of the object, the class activation map may not reveal much additional information.
Integrating Eigen-CAM with Ultralytics YOLOv8
Eigen-CAM can be integrated with any YOLOv8 model with relative ease. If you have not trained a YOLOv8 model before, you can easily do so on Datature's Nexus platform. To demonstrate the benefits of Eigen-CAM, we first train models for the tasks of classification and object detection.
Check out Section 4 of our How To Train YOLOX Object Detection Model On A Custom Dataset article for more details on creating a project, uploading and annotating your images and defining your training workflow. Since the article was published, we have added YOLOv8 Object Detection to the list of models we provide, and YOLOv8 Classification models will be coming very soon.
As shown above, when defining your workflow, you can right-click anywhere in the canvas and hover over Models to select Ultralytics YOLOv8 Object Detection and view the different base model options we provide: YOLOv8-Nano, Medium, Large, and Xtra-Large, with input dimensions ranging from 320x320 to 2048x2048. To learn more about model selection, you can read more here.
Once your model has been trained, you can export it as a PyTorch model and save it to your preferred local path. Before running the Jupyter Notebook, ensure you also have the additional files in this folder here, as the helper classes and functions they contain will be needed. You can then run the Jupyter notebook, replacing the model path in the notebook with yours, along with the path to the input image. The expected output is a heatmap rendered over the input image, where red spots indicate greater activation magnitude. If the red spots sit over areas that are visually significant for identifying the unique qualities of the object, that is a good indicator that your model is learning correctly.
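As a rough sketch of what the notebook does, the snippet below loads an exported model and renders an Eigen-CAM overlay. The helper imports are assumed to come from the adapted YOLO-V8-CAM implementation linked in the references, so treat the module paths, the `task` argument, and the placeholder file paths as assumptions rather than a fixed API:

```python
import cv2
import numpy as np
from ultralytics import YOLO

# Helper classes adapted from the YOLO-V8-CAM repository (see References).
# The module paths below are assumptions and may differ in your local copy.
from yolo_cam.eigen_cam import EigenCAM
from yolo_cam.utils.image import show_cam_on_image

# Replace these placeholder paths with your exported model and input image.
model = YOLO("path/to/exported_yolov8.pt")

img = cv2.imread("path/to/input_image.jpg")
img = cv2.resize(img, (640, 640))
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
float_img = np.float32(rgb_img) / 255.0

# Target one of the last convolutional blocks; earlier layers tend to give
# finer-grained, less global heatmaps.
target_layers = [model.model.model[-2]]

cam = EigenCAM(model, target_layers, task="od")   # "od" = object detection
grayscale_cam = cam(rgb_img)[0, :, :]             # (H, W) saliency map
cam_image = show_cam_on_image(float_img, grayscale_cam, use_rgb=True)
cv2.imwrite("eigen_cam_overlay.jpg", cv2.cvtColor(cam_image, cv2.COLOR_RGB2BGR))
```

The saved overlay is the heatmap described above: brighter red regions correspond to spatial locations whose features align most strongly with the first principal component of the targeted layer's activations.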
Sample Notebook for Running Eigen-CAM on YOLOv8 Object Detection Models
We have attached one of our notebooks using the Red Blood Cell dataset and a model that we have trained on the Datature Nexus Platform for a sample analysis.
As Eigen-CAM is a class activation map extraction technique, it will be able to construct class heatmaps for all possible classes that your model is trained on.
To see the different class heatmaps, you can index into the output of the EigenCAM class instance and use each entry as an individual heatmap. As shown below, for both classification and object detection, you will see the different parts of the image that the model associated with the various classes. Because Eigen-CAM finds the salient features aligned with the principal components of the learned representation, the n-th eigenvector after the singular value decomposition often reveals a different, interesting activation map.
This could be particularly revelatory for object detection, in which instances of different classes can appear in a single image. One can use these heatmaps to determine whether the correct visual features are being attributed to their associated classes.
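Building on the earlier sketch (same assumed helper names and variables), indexing into the EigenCAM output might look like the following; whether each index corresponds to a class or to a principal component depends on the implementation you use, so treat this as illustrative:

```python
import matplotlib.pyplot as plt

# Assumption: the EigenCAM call returns a stack of heatmaps, one per
# principal component / class, that can be iterated over directly.
heatmaps = cam(rgb_img)

plt.figure(figsize=(4 * len(heatmaps), 4))
for i, heatmap in enumerate(heatmaps):
    overlay = show_cam_on_image(float_img, heatmap, use_rgb=True)
    plt.subplot(1, len(heatmaps), i + 1)
    plt.imshow(overlay)
    plt.title(f"Component {i}")
    plt.axis("off")
plt.show()
```

Laying the overlays side by side makes it easy to check whether the regions highlighted for each component line up with the objects the model is supposed to associate with each class.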
What’s Next?
If you want to try out any of the features described above, please feel free to sign up for an account and train a model. You can also check out some of our own helpful visualization tools such as Advanced Evaluation as well. To access the Jupyter Notebook, you can just click here.
Our Developer’s Roadmap
Model explainability is an important part of democratizing computer vision. Machine learning as a field can feel intimidating and inaccessible because the theoretical knowledge needed to understand architectures is significant, but model explainability tools help distill these complexities into simple, meaningful outputs that serve as solid indicators during model training. Datature is committed to keeping computer vision approachable regardless of the new tasks or architectures we release, so we will continue adding tools like these for your use.
Want to Get Started?
If you have questions, feel free to join our Community Slack to post your questions or contact us about how explainable AI fits in with your usage.
For more detailed information about Eigen-CAM functionality, customization options, or answers to any common questions you might have, read more about the process on our Developer Portal.
References
Muhammad, M. B., & Yeasin, M. (2020). Eigen-CAM: Class Activation Map using Principal Components. 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-7. IEEE. https://ieeexplore.ieee.org/abstract/document/9206626
Original Eigen-CAM implementation: jacobgil/pytorch-grad-cam (Advanced AI explainability for computer vision): https://github.com/jacobgil/pytorch-grad-cam
Notebook adapted from this YOLOv8 Implementation: https://github.com/rigvedrs/YOLO-V8-CAM