Introducing Notate ML

Rajaram Gurumurthi
Jan 24, 2022

Create object detection datasets from the comfort of your phone

Click, Tag, Download

Preparing input image datasets is a critical and challenging task in any Machine Learning project. Notate ML, now available on the App Store, brings the power and usability of Apple’s mobile devices to this task, speeding it up and delivering higher-quality training data to your object detection model.

A typical object detection task in computer vision involves processing a two-dimensional color image (either a photo or a frame in a video) to identify objects of interest and drawing labelled bounding boxes around them. As of this writing, there are two broad classes of bounding box object detectors:

Two-stage detectors: In stage 1, the detector identifies “regions of interest” where it thinks objects might exist. In stage 2, it checks whether any of these regions contains a known object and, if one is recognized, also predicts its bounding box. The R-CNN series of architectures applies this approach to great effect, achieving state-of-the-art precision on COCO and other datasets (see reference #3).

One-stage detectors: Believing that there is only one life (per soul) and no time for two stages, these detectors classify objects and predict their bounding boxes in a single pass of the image through the neural network. The YOLO (You Only Look Once) series is the prime example of this class of detectors. YOLOv3, the last version published by its original authors, achieves close to best-in-class precision at blinding speeds (see reference #1), and has since been surpassed by YOLOv4 and v5.

While architecture selection and fine-tuning are fascinating intellectual activities, building an effective detector also involves the equally important, labor-intensive and deeply unglamorous task of preparing the training data.

The intent of Notate ML is to make this latter part a bit easier by a) allowing you to tag objects at the point of photo capture and b) taking advantage of the enhanced usability of mobile devices.

Notate ML enables you to:

  • Create a dataset and type, scan, or speak your labels into it
  • Snap new pictures, or import old ones from your photo library
  • Crop images, draw bounding boxes, tag and label objects of interest
  • Export images and annotations for training with YOLO, Apple Create ML, or Google AutoML

Each object detection training framework expects its input data in a different schema, and the most important difference is how the bounding boxes are specified.

Bounding box coordinates by framework
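To make these differences concrete, here is a minimal Python sketch that expresses one box, given in pixels with the origin at the image's top-left corner, in each of the three export schemas. It reflects our reading of the public format descriptions (YOLO/Darknet label files, Create ML JSON annotations, and the AutoML Vision CSV in references #4 and #5); the `Box` type and the helper functions are hypothetical and not part of Notate ML, so check each framework's documentation before relying on the exact layout.

```python
"""Hypothetical helpers (not part of Notate ML) that express one pixel-space
bounding box in the YOLO, Create ML and AutoML Vision export schemas."""
import json
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in pixels, origin at the image's top-left corner."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_yolo_line(box: Box, class_id: int, img_w: int, img_h: int) -> str:
    # YOLO label file line: "class x_center y_center width height",
    # with all four geometry values normalized to [0, 1] by the image size.
    xc = (box.x_min + box.x_max) / 2 / img_w
    yc = (box.y_min + box.y_max) / 2 / img_h
    w = (box.x_max - box.x_min) / img_w
    h = (box.y_max - box.y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def to_create_ml(box: Box) -> dict:
    # Create ML annotation: x and y are the box CENTER, in pixels (not normalized).
    return {
        "label": box.label,
        "coordinates": {
            "x": (box.x_min + box.x_max) / 2,
            "y": (box.y_min + box.y_max) / 2,
            "width": box.x_max - box.x_min,
            "height": box.y_max - box.y_min,
        },
    }


def to_automl_row(box: Box, gcs_uri: str, img_w: int, img_h: int,
                  split: str = "TRAIN") -> str:
    # AutoML Vision CSV row: top-left and bottom-right vertices, normalized
    # to [0, 1]; the two unused vertex slots are left empty.
    x0, y0 = box.x_min / img_w, box.y_min / img_h
    x1, y1 = box.x_max / img_w, box.y_max / img_h
    return f"{split},{gcs_uri},{box.label},{x0:.6f},{y0:.6f},,,{x1:.6f},{y1:.6f},,"


if __name__ == "__main__":
    box = Box("cat", x_min=100, y_min=50, x_max=300, y_max=250)
    print(to_yolo_line(box, class_id=0, img_w=416, img_h=416))
    print(json.dumps({"image": "cat.jpg", "annotations": [to_create_ml(box)]}))
    # The bucket path below is a placeholder.
    print(to_automl_row(box, "gs://my-bucket/cat.jpg", img_w=416, img_h=416))
```

The same rectangle therefore appears as a normalized center and size (YOLO), a pixel center and size (Create ML), and normalized corner coordinates (AutoML), which is why an exporter has to convert rather than copy the numbers.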

Some tips for using this app effectively:

  1. Keep your datasets small (< 1000 images). Smaller datasets are easier to navigate, and they give you more flexibility in choosing which ones to use for training and validation.
  2. At a minimum, keep your validation datasets separate from your training datasets. This prevents training photos from leaking into the validation dataset, which would artificially inflate validation scores.
  3. Crop your photos before tagging them. While mobile devices allow you to capture very high resolution photos (e.g. 4032 x 3024 for a Dual Camera), they are typically shrunk to much smaller sizes (e.g. 416 x 416 for YOLO) during training, causing significant loss of information and reduced scores.
  4. When importing photos and images from external sources, make sure you use them in accordance with their license and usage restrictions.
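The arithmetic behind tip 3 can be sketched in a few lines of Python. This assumes a plain resize of the whole image to 416 x 416; some YOLO implementations letterbox instead to preserve the aspect ratio, which changes the exact numbers but not the order of magnitude. The function name and the 1024 x 1024 crop are illustrative choices, not values from Notate ML.

```python
"""Rough arithmetic behind tip 3: how large does an object end up after the
training pipeline resizes the image?"""


def scaled_size(obj_w: float, obj_h: float, src_w: int, src_h: int,
                dst_w: int = 416, dst_h: int = 416) -> tuple[float, float]:
    # Assumes a plain (non-letterboxed) resize of the whole image to dst_w x dst_h.
    return obj_w * dst_w / src_w, obj_h * dst_h / src_h


# A 200 x 200 px object inside a full 4032 x 3024 photo...
full = scaled_size(200, 200, 4032, 3024)
# ...versus the same object after cropping the photo to a 1024 x 1024 region around it.
cropped = scaled_size(200, 200, 1024, 1024)

print(f"uncropped: {full[0]:.1f} x {full[1]:.1f} px")
print(f"cropped:   {cropped[0]:.1f} x {cropped[1]:.1f} px")
```

Uncropped, the object shrinks to roughly 21 x 28 px (and is distorted, since a 4:3 photo is squeezed into a square); after cropping it keeps about 81 x 81 px, giving the detector far more detail to learn from.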

Please log your issues and suggestions here.

References
  1. Joseph Redmon, Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv preprint, 2018.
  2. Mingxing Tan, Ruoming Pang, Quoc V. Le. EfficientDet: Scalable and Efficient Object Detection. CVPR, 2020.
  3. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS, 2015.
  4. Formatting a Training Data CSV. AutoML Vision Object Detection documentation, Google Cloud.
  5. Training Object Detection Models in Create ML. WWDC Notes, Apple.