Train an object detection model

Create a custom model for use with ArcGIS Survey123 smart assistants

Barbara Webster

February 28, 2024

What is a Smart Assistant?

Smart annotation generated from an object detection model

ArcGIS Survey123 supports using a smart assistant to enhance image collection with machine learning models. This includes storing EXIF metadata in an image, annotating an image, or redacting an object in an image.

To use a smart assistant, survey authors configure a survey to reference a TensorFlow Lite model that is stored in the survey files. Smart assistants can be used with either an image classification model or an object detection model.

For more information on smart assistants in Survey123, see the smart assistants documentation.

Continue reading to learn how to create your own custom model and use it in a survey.

Workflow to create a model

Creating a TensorFlow Lite model takes three steps. First, prepare a collection of training images. Next, use the images to train a model with the EfficientDet class of the arcgis.learn module in the ArcGIS API for Python. Finally, once the model files are created, use them in Survey123 to test the model's predictions.

The training notebook can be downloaded here.

1. Prepare training data

The training dataset consists of a collection of photos of the objects of interest, along with a bounding box and label for each object depicted.

Images

Source - Find images online or take photos on your own. Some organizations may already have collections of images depicting the objects of interest. Ensure that you comply with any licensing restrictions when procuring photos online.

File format - Ensure that all your images are in JPG format.

Relevance - Choose images that closely resemble the images you will use in the smart assistant workflow. This helps train a more effective and applicable model. Keep in mind that photo angle, distance to object, lighting, and environmental conditions can affect the appearance of an object in an image.

Quantity - More images generally lead to better model performance, but at the cost of longer training times and greater computing power requirements.
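The file format requirement above can be checked before you start labeling. The following is a minimal sketch, assuming your photos sit in a single folder; the function names are hypothetical, and accepting both .jpg and .jpeg extensions is an assumption. It flags files that are not JPEG either by extension or by content, which catches images that were renamed rather than converted.

```python
from pathlib import Path

# JPEG files begin with the start-of-image marker FF D8, followed by FF.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def is_jpeg(path):
    """Return True if the file content starts with the JPEG signature."""
    with open(path, "rb") as f:
        return f.read(3) == JPEG_SIGNATURE


def find_non_jpeg(folder):
    """Return sorted names of files that are not JPEG by extension or content."""
    bad = []
    for p in sorted(Path(folder).iterdir()):
        if not p.is_file():
            continue
        # Assumption: both .jpg and .jpeg extensions are acceptable.
        if p.suffix.lower() not in {".jpg", ".jpeg"} or not is_jpeg(p):
            bad.append(p.name)
    return bad
```

Calling find_non_jpeg on your image folder returns a list of file names to convert or remove; an empty list means every file passed both checks.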

Labels

Labelling software - Label the images using software that creates PASCAL VOC training sets, such as labelImg. For more details, see https://pypi.org/project/labelImg, and for easy installation on Windows, see the installer file here.

Quantity - Limit the number of annotations to a maximum of 500 per image. This limit also caps the number of detections the model can return for an image at 500.

Bounding box configuration - Ensure that each bounding box is at least 8 pixels by 8 pixels. In addition, each box should measure at least 0.01 times the length of the shorter side of the image.
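The label rules above can be checked automatically once labelImg has written its PASCAL VOC XML files. The sketch below is one possible check, not part of the training notebook. It assumes the standard VOC layout (a size element with width and height, and one object element per box), and it reads the two size rules as both applying to each side of a box; the function name and constants are hypothetical.

```python
import xml.etree.ElementTree as ET

MAX_BOXES = 500        # maximum annotations per image
MIN_SIDE_PX = 8        # minimum box side in pixels
MIN_SIDE_RATIO = 0.01  # minimum box side relative to the image's shorter side


def check_voc_annotation(xml_text):
    """Return a list of problems found in one PASCAL VOC annotation."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    # Assumption: a box side must satisfy both minimum-size rules.
    min_side = max(MIN_SIDE_PX, MIN_SIDE_RATIO * min(width, height))

    objects = root.findall("object")
    problems = []
    if len(objects) > MAX_BOXES:
        problems.append(f"{len(objects)} annotations exceed the limit of {MAX_BOXES}")

    for obj in objects:
        name = obj.findtext("name")
        box = obj.find("bndbox")
        w = float(box.findtext("xmax")) - float(box.findtext("xmin"))
        h = float(box.findtext("ymax")) - float(box.findtext("ymin"))
        if w < min_side or h < min_side:
            problems.append(
                f"'{name}' box is {w:.0f}x{h:.0f} px; minimum side is {min_side:.0f} px"
            )
    return problems
```

To use it, read each XML file in your labels folder as text and pass it to check_voc_annotation; an empty list means the annotation satisfies the rules as interpreted here.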

File organization

Create a training dataset folder named “train” that contains an “images” folder and a “labels” folder. Save your images and labels to these folders when working in labelImg.
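The folder layout described above can also be created and checked with a short script. This is a minimal sketch with hypothetical function names. It assumes that labelImg saves each label as an XML file with the same base name as its image, which is how its PASCAL VOC output is normally written.

```python
from pathlib import Path


def create_training_folders(root):
    """Create root/train/images and root/train/labels; return the train folder."""
    train = Path(root) / "train"
    for sub in ("images", "labels"):
        (train / sub).mkdir(parents=True, exist_ok=True)
    return train


def unmatched_files(train_dir):
    """Return (images without a label, labels without an image) by base name."""
    train = Path(train_dir)
    images = {
        p.stem for p in (train / "images").iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg"}
    }
    labels = {
        p.stem for p in (train / "labels").iterdir()
        if p.suffix.lower() == ".xml"
    }
    return sorted(images - labels), sorted(labels - images)
```

Running unmatched_files on the “train” folder after labeling shows which photos still need labels and which label files have lost their image.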

2. Train a model

Step 1: Clone your ArcGIS Pro environment.

Step 2: Install the deep-learning-essentials package in ArcGIS Pro.

Step 3: Open the sample notebook provided above in ArcGIS Pro. Follow the instructions included in the notebook and run the notebook.
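For orientation, the training step in arcgis.learn generally follows the pattern sketched below. This is not the notebook's code: the function name, batch size, and epoch count are illustrative assumptions, and the notebook remains the authoritative reference for the exact parameters and any environment setup it requires.

```python
def train_tflite_detector(train_dir, model_name, epochs=20, batch_size=8):
    """Train an EfficientDet model on PASCAL VOC data and save it as TFLite.

    Hypothetical helper; run it in the cloned ArcGIS Pro environment
    with the deep learning packages installed.
    """
    # Imported here so the function can be defined without arcgis installed.
    from arcgis.learn import EfficientDet, prepare_data

    # train_dir is expected to contain the "images" and "labels" folders.
    data = prepare_data(
        train_dir,
        dataset_type="PASCAL_VOC_rectangles",
        batch_size=batch_size,
    )
    model = EfficientDet(data)
    model.fit(epochs)
    # Assumption: saving with framework="tflite" writes the .tflite and .emd files.
    model.save(model_name, framework="tflite")
    return model
```

The batch size and epoch count shown here are placeholders; suitable values depend on the size of your dataset and the hardware available.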

3. Test the model

Next, add the model to Survey123 to use with a smart assistant.

Download the Object detection test model survey.zip file.
Open Survey123 Connect.
Create a new survey based on the downloaded XLSForm by dragging the XLSForm into the main gallery in Connect.
In Connect, click the File button to open the survey files.
Paste the 'scripts' folder and its contents into the survey files folder.
In Connect, click the XLSForm button to open the XLSForm associated with the newly created survey.
Update the model names listed in the bind::esri:parameters column (in cells X3 and X9). Replace "[myModel]" with the name of your TFLite file and save.
In Connect, click the File button again to open the survey files.
Copy the TFLite and .emd files that you created into the media folder.
Update and publish your survey to test your model. This survey contains one image question with smart attributes, and a second image question with smart annotation. Both have previews enabled so you will be able to see the objects that are detected in the camera preview. There are also two note questions demonstrating methods to extract data from the image with smart attributes enabled.
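If you retrain often, copying the model files into the survey's media folder can be scripted. The following is a minimal sketch, assuming you know the path to the survey folder that Connect opens with the File button; the function name and both folder arguments are hypothetical.

```python
import shutil
from pathlib import Path


def copy_model_to_media(model_dir, survey_dir):
    """Copy .tflite and .emd files from model_dir into survey_dir/media.

    Returns the sorted names of the files that were copied.
    """
    media = Path(survey_dir) / "media"
    media.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(model_dir).iterdir()):
        if src.is_file() and src.suffix.lower() in {".tflite", ".emd"}:
            shutil.copy2(src, media / src.name)
            copied.append(src.name)
    return copied
```

After copying, the model name entered in the XLSForm still has to match the name of the TFLite file, as described in the steps above.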