When to Stop Training

Training doesn’t need to run "forever". In real projects, the best results come from stopping at the right moment:

  • not too early (model hasn’t learned yet)

  • not too late (model starts to overfit / memorize)


In AugeLab Studio, training usually ends when it reaches the configured max iterations, or when you click Stop Training. The Training Chart helps you decide whether it’s worth continuing.

If this is your first training, start with the Starter Checklist.

Monitor Training Progress

During training, monitor the progress of the model and watch the relationship between:

  • Loss

  • mAP

  • IOU

  • Iterations

Loss and mAP are shown on a chart like the one below:

Good training example chart
Example: Good training (loss decreases, mAP increases then plateaus)

Quick Rule (what usually works)

If you only remember one rule:

Stop when validation mAP stops improving for a long time, or when it starts going down while loss keeps going down.

That second case is the classic sign of overfitting.
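
As a rough illustration of that rule, the sketch below applies a simple "patience" check to a list of validation mAP values read off the Training Chart. The history values, the patience window, and the min_delta threshold are all hypothetical; tune them to your own runs.

```python
def should_stop(map_history, patience=10, min_delta=0.001):
    """Return True when validation mAP has not improved by at least
    min_delta over the last `patience` evaluations."""
    if len(map_history) <= patience:
        return False  # not enough evaluations to judge yet
    best_before = max(map_history[:-patience])
    recent_best = max(map_history[-patience:])
    return recent_best < best_before + min_delta

# Hypothetical mAP readings: rises, plateaus, then slowly declines
map_history = [0.12, 0.35, 0.51, 0.62, 0.66, 0.67, 0.67, 0.66,
               0.66, 0.65, 0.65, 0.64, 0.64, 0.63, 0.63, 0.62]
print(should_stop(map_history, patience=8))  # True -> worth stopping
```

The same check also covers the second case: if mAP has been falling for a while, the best value sits outside the recent window and the rule fires.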

Common Training Patterns (cheat sheet)

These patterns are common in real use. For each one, look at the chart first, then read the explanation.


These example charts are generated for documentation purposes; they illustrate typical curve shapes rather than results from a specific dataset.

Insufficient data

Insufficient data example chart
Insufficient data: too few points / too short run (noisy early metrics)

Explanation:

  • What it means: you don’t have enough signal yet to trust the trend.

  • Likely causes: too few images, too short run, weak/too small validation split.

  • What to do: train longer; add data; ensure validation exists and includes real variety.

Low variance

Low variance example chart
Low variance: loss plateaus and mAP barely improves

Explanation:

  • What it means: the model learns the "easy repetition" quickly, then stops getting new information.

  • Likely causes: repetitive dataset (same background/angle/light), missing negatives, missing edge cases.

  • What to do: add variety (angles, backgrounds, lighting), add negatives, capture hard cases on purpose.

Overtraining

Overtraining example chart
Overtraining: loss keeps improving, but mAP peaks (even very high) and then degrades

Overtraining is not always catastrophic, but it usually indicates memorization rather than generalization. For tightly controlled environments (fixed camera, fixed lighting), it can be acceptable.

  • What it means: the model is getting better at the training set, but worse at validation (memorization).

  • Likely causes: not enough variety, too-small validation, duplicates/near-duplicates.

  • What to do: stop and keep best weights; add more variety; increase validation split; remove duplicates.

Model not learning

Model not learning example chart
Model not learning: loss stays high/flat, mAP stays near zero

Explanation:

  • What it means: training is not progressing in a meaningful way.

  • Likely causes: wrong labels/classes, class IDs mismatch, broken annotation format, incorrect config/settings.

  • What to do: verify .names order vs label IDs; spot-check labels; confirm YOLO format; adjust training settings.
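
One quick way to catch the ".names order vs label IDs" problem is to check that every class ID in the label files is a valid index into the .names file. A minimal sketch, assuming a Darknet/YOLO-style layout; the obj.names and dataset/labels paths are placeholders for your own files:

```python
from pathlib import Path

names_file = Path("obj.names")          # placeholder path
labels_dir = Path("dataset/labels")     # placeholder path

class_names = [n.strip() for n in names_file.read_text().splitlines() if n.strip()]
num_classes = len(class_names)

out_of_range = []
for label_path in labels_dir.glob("*.txt"):
    for line_no, line in enumerate(label_path.read_text().splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue                    # skip empty lines
        class_id = int(parts[0])
        if not 0 <= class_id < num_classes:
            out_of_range.append((label_path.name, line_no, class_id))

print(f"{num_classes} classes in .names, {len(out_of_range)} out-of-range class IDs")
for name, line_no, class_id in out_of_range[:20]:
    print(f"  {name}:{line_no} -> class id {class_id}")
```

This only catches IDs that fall outside the .names list; a swapped order with still-valid IDs has to be caught by spot-checking drawn boxes against their class names.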

Corrupted dataset

Corrupted dataset example chart
Corrupted dataset: unstable loss spikes and erratic mAP

Explanation:

  • What it means: training is being disrupted by inconsistent or broken data.

  • Likely causes: corrupted image files, invalid labels, mixed sources/resolutions, "empty-but-contains-objects" images.

  • What to do: run dataset checks; remove corrupted data; fix label format; re-export a clean set.
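
A minimal sketch of such a dataset check, assuming the Pillow library is installed and your images live in a folder like the placeholder below. It only finds unreadable or truncated files; label problems need a separate check (see the Fast debugging checklist at the end of this page).

```python
from pathlib import Path
from PIL import Image

images_dir = Path("dataset/images")     # placeholder path
corrupted = []

for image_path in sorted(images_dir.glob("*")):
    if image_path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
        continue
    try:
        with Image.open(image_path) as img:
            img.verify()                # cheap header/structure check
        with Image.open(image_path) as img:
            img.load()                  # full decode to catch truncated files
    except Exception as exc:
        corrupted.append((image_path.name, str(exc)))

print(f"Found {len(corrupted)} unreadable image files")
for name, reason in corrupted:
    print(f"  {name}: {reason}")
```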

Good training

Good training example chart
Good training: steady learning and a stable high plateau

Explanation:

  • What it means: healthy learning and generalization.

  • Likely causes: consistent labels + enough variety.

  • What to do: stop when mAP plateaus; validate on real footage / a β€œgolden set”; deploy best weights.

Loss

Loss is a training-fit signal. It represents how well the model is fitting the training batches.

Loss is useful, but it can be misleading:

  • Loss can keep decreasing even when the model is already overfitting.

  • Loss does not guarantee "real-world performance".


Loss alone is not enough to judge accuracy. Use mAP to understand generalization on validation data.

**Loss ≤ 2.0**

Often indicates "learning has started", but quality may still be poor. Use it as a sign that the pipeline works, not as a finish line.


**Loss ≤ 1.0**

Commonly a usable baseline on many focused datasets.

**Loss ≤ 0.5**

Often indicates a well-fit model on a clean, consistent dataset. After this point improvements can be slow, and overfitting risk increases.

Loss thresholds are not universal (why)

Loss values depend on model architecture, image size, classes, label noise, augmentation, and dataset complexity. Use loss thresholds to build intuition, not as a universal pass/fail.

mAP

The mAP (mean average precision) metric combines both precision and recall to provide a comprehensive evaluation of the model's accuracy in detecting objects in an image.

It is calculated by evaluating predictions against ground-truth labels at specific IoU thresholds (exact details depend on the training backend/settings).
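
For intuition, the sketch below shows the usual textbook construction: per-class AP as the area under an interpolated precision-recall curve at one IoU threshold, and mAP as the mean of the per-class AP values. All numbers are made up, and your training backend may use a different interpolation or set of thresholds.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve (all-point interpolation).
    `recall` must be sorted in ascending order."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Hypothetical precision/recall points for one class at one IoU threshold
recall = np.array([0.1, 0.4, 0.4, 0.7, 0.9])
precision = np.array([1.0, 0.9, 0.75, 0.7, 0.6])
print(f"AP  = {average_precision(recall, precision):.3f}")

# mAP is the mean of the per-class AP values (class names are hypothetical)
ap_per_class = {"bolt": 0.91, "nut": 0.84, "washer": 0.78}
print(f"mAP = {sum(ap_per_class.values()) / len(ap_per_class):.3f}")
```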


Practical interpretation:

  • A stable plateau is often more important than chasing the last +1%.

  • Very high mAP (like 95–99%) on a small or repetitive dataset is a common overfitting trap.

  • If mAP peaks then drops, see Over-Fitting.

IOU

IOU (Intersection over Union) measures the overlap between a predicted bounding box and its ground-truth box for an individual detection. By contrast, mAP evaluates the overall performance of the object detection model across all object categories, considering both precision and recall.


The higher the IOU value, the tighter the predicted box fits the object.

You can track per-detection IOU values in the Training Window logs.
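
For reference, IOU is a few lines of arithmetic. A self-contained sketch using corner-format boxes (x_min, y_min, x_max, y_max) in pixels; the example boxes are made up:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x_min, y_min, x_max, y_max) format."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted box that overlaps the ground-truth box fairly tightly
ground_truth = (100, 100, 200, 200)
prediction = (110, 105, 205, 195)
print(f"IoU = {iou(ground_truth, prediction):.2f}")  # about 0.78
```

An IOU of 1.0 means perfect overlap and 0.0 means no overlap; a prediction usually only counts as correct above a chosen IOU threshold.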

Fine Tuning

Training Time

Define a maximum training time budget based on available computational resources and project constraints. If the model does not achieve satisfactory performance within the allocated time, consider stopping training and exploring other approaches such as:

  • Manually analyze annotation accuracy

  • Check class variety

  • Choose different model sizes and batch sizes

  • Increase database size

Over-Fitting

Avoid overfitting by monitoring how mAP behaves over time.

The most reliable "real life" overfit signal is:

loss decreases, but mAP peaks and then gets worse.

Overfitting is not always "catastrophic" on very constrained, fixed-camera setups. But if you care about robustness (different lighting, different shifts, different backgrounds), overfitting will show up quickly.

What usually helps:

  • Add more variety (new days, new lighting, new backgrounds)

  • Add negatives that look like your real environment

  • Tighten label consistency (same style across labelers)

  • Increase validation split so mAP is harder to "cheat"

Overtraining example chart
Overtraining example: mAP peaks then drops while loss continues decreasing
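
If you log loss and mAP over time, this pattern can also be flagged programmatically. A rough heuristic sketch with hypothetical histories and an arbitrary drop tolerance; treat it as a prompt to inspect the chart, not as an automatic decision:

```python
def looks_overfit(loss_history, map_history, drop_tolerance=0.02):
    """Heuristic: loss is still trending down, but validation mAP
    has fallen noticeably below its earlier peak."""
    if len(loss_history) < 5 or len(map_history) < 5:
        return False
    loss_still_falling = loss_history[-1] < min(loss_history[:-5])
    map_drop = max(map_history) - map_history[-1]
    return loss_still_falling and map_drop > drop_tolerance

# Hypothetical logged values: loss keeps improving, mAP peaked then slid
loss_history = [3.1, 1.8, 1.2, 0.9, 0.7, 0.55, 0.45, 0.38, 0.33, 0.29]
map_history  = [0.20, 0.45, 0.61, 0.68, 0.71, 0.70, 0.68, 0.66, 0.64, 0.62]
print(looks_overfit(loss_history, map_history))  # True -> stop, keep best weights
```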

Balancing Time and Performance

Balance the training time with the desired model performance. In some cases, additional training iterations may improve performance, but the returns may diminish over time. Weigh the benefits against the computational cost and the urgency of the project.

Depending on the number of classes and the database size, the training process typically takes anywhere from a day to a week.

Starter Checklist

Database:

Model:

Training (stop if):

Fast debugging checklist (when things look wrong)
  1. Spot-check 20–50 images across the dataset (not just the first page)

  2. Confirm class mapping:

  • .names file order matches label IDs

  • no missing/extra classes

  3. Spot-check label files (a validation sketch follows this checklist):

  • YOLO format: class x_center y_center width height (normalized)

  • boxes are in-bounds and not zero-sized

  4. If mAP looks "too good to be true":

  • validation split may be too small or too similar to training

  • you may have duplicates / near-duplicates

  5. If training is unstable or runs out of memory (OOM):

  • increase subdivisions or reduce batch

  • temporarily reduce input resolution to debug
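
The label-file spot-check from step 3 can be partially automated. A minimal sketch that scans YOLO-format .txt files for the most common problems (wrong field count, values outside [0, 1], zero-sized or out-of-bounds boxes); the dataset/labels path is a placeholder:

```python
from pathlib import Path

labels_dir = Path("dataset/labels")     # placeholder path
problems = []

for label_path in sorted(labels_dir.glob("*.txt")):
    for line_no, line in enumerate(label_path.read_text().splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue                    # empty line: image with no objects
        if len(parts) != 5:
            problems.append((label_path.name, line_no, "expected 5 fields"))
            continue
        try:
            int(parts[0])               # class id must parse as an integer
            x, y, w, h = (float(v) for v in parts[1:])
        except ValueError:
            problems.append((label_path.name, line_no, "non-numeric value"))
            continue
        if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
            problems.append((label_path.name, line_no, "value outside [0, 1]"))
        if w <= 0.0 or h <= 0.0:
            problems.append((label_path.name, line_no, "zero-sized box"))
        elif x - w / 2 < 0 or x + w / 2 > 1 or y - h / 2 < 0 or y + h / 2 > 1:
            problems.append((label_path.name, line_no, "box extends outside the image"))

print(f"{len(problems)} label issues found")
for name, line_no, reason in problems[:20]:
    print(f"  {name}:{line_no} -> {reason}")
```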
