Precision and recall are the two basic metrics for evaluating how well a model detects or classifies things. Precision answers "of everything the model flagged as positive, how much was actually correct?" It's calculated as TP / (TP + FP), where TP and FP are the counts of true and false positives. Recall answers "of everything that actually exists, how much did the model find?" It's calculated as TP / (TP + FN), where FN is the count of false negatives. Together, they capture the two ways a model can fail: false alarms (hurting precision) and missed detections (hurting recall).
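The two formulas can be sketched in a few lines of Python; the helper name and toy data here are illustrative, not from any library:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged, how many correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual, how many found
    return precision, recall

# Four actual positives; the model flags four items, three of them correctly:
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Note the guards against zero denominators: a model that flags nothing has undefined precision, and conventions for that edge case vary.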
There's a natural tension between them. Lowering the confidence threshold means the model flags more detections, catching more true positives (higher recall) but also introducing more false positives (lower precision). Raising the threshold does the opposite. The precision-recall curve plots this tradeoff across all thresholds, and the area under it gives Average Precision (AP). In object detection, these metrics are computed at specific IoU thresholds: a prediction only counts as a true positive if it overlaps sufficiently with a ground truth box.
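The threshold tradeoff and the IoU matching rule can both be demonstrated with toy data (the function names and scores below are made up for illustration):

```python
def pr_at_threshold(y_true, scores, t):
    """Precision and recall when only detections scoring >= t are kept."""
    y_pred = [1 if s >= t else 0 for s in scores]
    tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 1)
    fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 0 and yp == 1)
    fn = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

# Three real positives, two negatives; higher score = more confident.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.45]
# Lowering the threshold raises recall but lowers precision:
for t in (0.8, 0.5, 0.3):
    p, r = pr_at_threshold(y_true, scores, t)
    print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")

# Two boxes overlapping in half their width: IoU = 50 / 150 = 1/3,
# so this prediction fails a 0.5 IoU threshold and counts as a false positive.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Sweeping `t` over all distinct score values produces the full precision-recall curve; AP is then the area under those points.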
Which metric matters more depends on the application. Medical screening prioritizes recall because missing a tumor is worse than a false alarm. A self-driving car's emergency braking system may prioritize precision to avoid unnecessary hard stops. The F1 score (harmonic mean of precision and recall) gives a balanced single number when you care about both equally.
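The F1 score is a one-liner, but the choice of harmonic rather than arithmetic mean matters, as this sketch shows:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean punishes imbalance: a model with precision 0.9 but
# recall 0.1 scores far below their arithmetic mean of 0.5.
print(round(f1_score(0.9, 0.1), 2))  # 0.18
print(round(f1_score(0.5, 0.5), 2))  # 0.5
```

Because the harmonic mean is dragged toward the smaller of the two values, a high F1 requires the model to do well on both precision and recall at once.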

