Video classification assigns a label to an entire video clip based on the action or event it contains. Rather than classifying frames in isolation, the model learns temporal patterns across the full sequence. This tutorial walks through training a MoViNet video classification model on Datature Nexus in about seven minutes.
What This Tutorial Covers
- Uploading video data to Datature Nexus
- Labeling video clips with activity or event categories
- Selecting MoViNet as the training architecture
- Configuring training settings and launching the run
- Reviewing per-clip predictions on held-out test data
Why MoViNet
MoViNet (Mobile Video Networks) was designed by Google specifically for video understanding. It processes temporal information across frames rather than treating each frame as an independent image. That matters because the same frame can mean different things depending on what comes before and after it. A raised hand in one context is a wave; in another, it's a throw.
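A toy numpy sketch (not MoViNet itself, and all feature values here are made-up stand-ins) illustrates why temporal order matters: two clips built from the same frames in different order look identical to a model that pools per-frame features independently, but differ once frame-to-frame dynamics are considered.

```python
# Toy illustration of frame-independent vs. temporal features.
# The 8-dim random vectors stand in for per-frame features; no real
# video model is involved.
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 8))    # 4 frames, 8-dim feature each

clip_a = frames                     # original order
clip_b = frames[::-1]               # same frames, reversed in time

# Frame-independent pooling: identical for both clips, so the two
# sequences are indistinguishable to a per-frame classifier.
print(np.allclose(clip_a.mean(axis=0), clip_b.mean(axis=0)))   # True

# Consecutive-frame differences capture motion direction, so the
# reversed clip produces different features.
print(np.allclose(np.diff(clip_a, axis=0),
                  np.diff(clip_b, axis=0)))                    # False
```

A raised hand followed by downward motion reads differently from one followed by a release; architectures like MoViNet learn such order-dependent features directly from the clip.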
Where Video Classification Applies
- Sports analytics: identifying play types, scoring events, or athlete actions from game footage
- Security: detecting unusual activities (fights, falls, perimeter breaches) in surveillance streams
- Manufacturing: classifying production stages or flagging process deviations on assembly lines
- Content moderation: screening user-uploaded video for prohibited content
- Medical: analyzing surgical procedures or patient movement patterns
Any task that requires understanding what happens over time in a video sequence (not just what a single frame looks like) fits this model type.

