Model Inference

Model inference is the process of running a trained model on new input data to generate predictions. Training is where the model learns; inference is where it puts that learning to use. In computer vision, inference means feeding an image or video frame through the network and getting back bounding boxes, class labels, segmentation masks, keypoints, or embeddings.

Inference performance is measured in latency (milliseconds per image) and throughput (images per second). These depend on model size, input resolution, hardware (CPU vs. GPU vs. dedicated accelerator), batch size, and optimization level. A ResNet-50 classifier might run at 5ms per image on a modern GPU but 200ms on a CPU. YOLO26 achieves real-time detection at 30+ FPS on an NVIDIA Jetson by being designed specifically for fast inference with efficient backbone operations.

Production inference requires more than just running the model. It includes preprocessing (resizing, normalizing, padding inputs to expected format), post-processing (NMS for detection, thresholding for segmentation, decoding model outputs into usable coordinates), batching (grouping multiple inputs for parallel processing on GPU), and serving infrastructure (REST APIs, gRPC endpoints, model version management). Optimization techniques like quantization (INT8 inference), TensorRT compilation, and ONNX Runtime acceleration can deliver 2-5x speedup over naive PyTorch inference. Datature supports inference through its API deployment and Outpost edge deployment features.

Resources

Relevant Blog Posts ↘

Glossary

Our Blog

Documentation

Exploring MemryX AI Accelerator Performance with Datature Vision Models

MIN READ

March 4, 2026

Datature partnered with MemryX to test the MX3 M.2 AI Accelerator, achieving 18× faster inference speeds and strong accuracy (mAP 0.90) using a YOLOv8 Nano model. Together, Datature’s vision AI platform and MemryX’s efficient edge hardware enable fast, cost-effective, and privacy-focused computer vision deployment from cloud to edge.

Read

Building a Simple Inference Dashboard with Streamlit

MIN READ

March 4, 2026

Streamlit empowers users to visualise inference results using custom trained computer vision models while staying within the ease of a Python environment.

Read

How To Use API Deployment For Trained Model Inference

MIN READ

March 4, 2026

We are excited to introduce REST API deployment - a new way for users to access their trained machine learning model with low code requirements.

Read

Get Started Now

Get Started using Datature’s computer vision platform now for free.

Book Demo