Object Detection made easier with IceVision (Part-1)

meghal darji
5 min read · Feb 23, 2022

Have you ever attempted to build an object detection model but gave up because it got overwhelming? Maybe you should give IceVision a try.

IceVision is an agnostic Computer Vision framework that simplifies the Object Detection pipeline and allows you to get your object detection model up and running within a few minutes.

In this 2-part blog post, I will show you how to build your own object detection model with any dataset you want. We will also compare 2 different model architectures, RetinaNet and VFNet, to see which one works better. So open your notebooks and we’ll get started. Ohh, I meant your Jupyter notebook. You might want to use Google Colab, because we’ll be using GPUs during training.

Let’s first go over the pipeline to get a better understanding of the tasks at every step.

Object Detection Pipeline

  1. Data Collection
  2. Data Parsing
  3. Creating Data Augmentations and Transformations
  4. Model Selection
  5. Training the model
  6. Inference/Testing

Data Collection

It goes without saying that any AI model needs a good amount of data to train on. There’s no way around it. But I am guessing you have your data ready since you’re here. If not, I would suggest you start looking for a dataset of the objects that you want your model to detect. You can find a dataset of interest on any of the following:

a. Kaggle: https://www.kaggle.com/datasets
b. VisualData Discovery: https://visualdata.io/discovery
c. Roboflow: https://public.roboflow.com/

Setting up the notebook

Once you have your dataset ready, it’s time to get started with the code. We’ll start with all the installations and imports. In your Jupyter notebook, type in the following lines:
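Below is a minimal setup sketch, assuming a fresh Colab runtime. (IceVision’s docs also provide an installation script that pins compatible versions of its dependencies; if the plain pip install gives you trouble, use that instead.)

```python
# Install IceVision with all optional dependencies
# (this can take a few minutes; restart the runtime if Colab asks you to)
!pip install icevision[all]

# icevision.all pulls in parsers, transforms, models, and utilities in one import
from icevision.all import *
```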

Here I will be using the “Road Sign Detection” dataset from Kaggle, which requires the Kaggle library to download. So if you plan on using any dataset from Kaggle, this is how you do it:
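Here is a sketch of the usual Kaggle CLI workflow. The dataset slug and destination folder below are assumptions for the Road Sign Detection dataset; swap in the slug from your own dataset’s Kaggle page.

```python
# Install the Kaggle CLI and register your API token
# (download kaggle.json from your Kaggle account settings first)
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download and unzip the dataset (slug assumed: andrewmvd/road-sign-detection)
!kaggle datasets download -d andrewmvd/road-sign-detection
!unzip -q road-sign-detection.zip -d road-signs
```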

Data Parsing

Object Detection data labels consist of bounding box coordinates and class labels. Such labels are usually stored in the Pascal VOC format or the COCO format. We need to parse the data so that it can be fed to a neural network model. IceVision supports VOC and COCO parsing out of the box: its parsers module provides ready-made parser classes for both formats.
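The Road Sign dataset ships VOC-style XML annotations, so the built-in VOC parser does the job here. A sketch, with the directory layout assumed from the unzipped Kaggle archive:

```python
# Point the parser at the images and their VOC XML annotations
data_dir = Path("road-signs")
parser = parsers.VOCBBoxParser(
    annotations_dir=data_dir / "annotations",
    images_dir=data_dir / "images",
)

# parse() reads every annotation and randomly splits the records
# into training and validation sets
train_records, valid_records = parser.parse()
```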

You can check if the data has been successfully parsed by using the class_map attribute of the parser object:
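```python
# Sanity check: the parser should have picked up all the class labels
parser.class_map
```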

The class_map attribute should return dictionary-like data containing the class labels as keys and integers as the corresponding values. It would look something like this:

<ClassMap: {‘background’: 0, ‘trafficlight’: 1, ‘speedlimit’: 2, ‘crosswalk’: 3, ‘stop’: 4}>

Data Augmentations and Transformations

One of the most crucial aspects of any Computer Vision task is data augmentation and transformation for robust training and better results. IceVision supports the Albumentations library to perform data transformations. The aug_tfms function randomly applies various transformations such as rotation, cropping, and horizontal flips. We will apply the transformations to the training set. So let’s create the transformations and apply them:
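Here is a typical setup (the presize and image_size values below are common choices from the IceVision tutorials, not requirements; tune them for your data):

```python
image_size = 384

# Training transforms: random augmentations (rotation, crops, flips, ...)
# followed by normalization
train_tfms = tfms.A.Adapter(
    [*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()]
)

# Validation transforms: deterministic resize-and-pad plus normalization
valid_tfms = tfms.A.Adapter(
    [*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()]
)

# Datasets tie the parsed records to their transforms
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
```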

Once we apply the transformations, it is important to visualize them before we start training the model. You can visualize a few samples using the following code:
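```python
# Drawing the same record several times shows different random augmentations
# (the sample count and grid layout are arbitrary choices)
samples = [train_ds[0] for _ in range(6)]
show_samples(samples, ncols=3)
```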

You can see some of the samples below. Notice how the bounding box coordinates are transformed along with the images. Without IceVision, you would have to re-annotate the transformed images yourself, so this saves a lot of time.

I think that is enough for the first part of Object Detection. We’ll cover the rest in Part-2. Let’s summarize what we’ve done so far:

a. We set up our Jupyter notebook by installing and importing the required libraries.
b. Collected our data and parsed it.
c. Applied augmentations and transformations to our images.

In the next part, we’ll go through the following steps:
a. Choosing the Model
b. Finding the best learning rate and training the model
c. Testing the model on novel images (Inference)

See you in Part-2. Cheers!

This is the first blog post on an object detection project that I am currently working on, where the main aim is to build an end-to-end pipeline with a good feedback loop. To give you a glimpse into the future, I will be working on data versioning, model monitoring, productionization, and a lot more. I can’t give away everything now! I will be sharing my progress regularly on LinkedIn and Twitter, so feel free to connect/follow me.
