Object Detection made easier with IceVision (Part-2)

meghal darji
5 min read · Feb 23, 2022


Hello and welcome back to Part 2 of the object detection tutorial using IceVision. If you haven’t gone through Part 1 yet, I recommend reading it first.

Alright, let’s get started with the remaining steps in the pipeline, which are:
a. Model Selection
b. Choosing the best learning rate and training
c. Inference/Testing

Model Selection

IceVision supports several state-of-the-art models to select from; RetinaNet, VFNet, and YOLO are a few examples. To learn more about the supported models, head over to the IceVision documentation.
To choose a model, we have to select a library, a model that the library supports, and a backbone corresponding to the selected model. For this tutorial, I’ll show how you can train two models: RetinaNet and VFNet. You only need three lines of code to select a model and instantiate it:
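A minimal sketch of those three lines, assuming the parser (and its class_map) built in Part 1; the exact backbone names vary with the IceVision version:

```python
from icevision.all import *

# Select a library (mmdetection), a model (VFNet), and a backbone
model_type = models.mmdet.vfnet
backbone = model_type.backbones.resnet50_fpn_mstrain_2x
model = model_type.model(backbone=backbone(pretrained=True),
                         num_classes=len(parser.class_map))
```

To train RetinaNet instead, swap in models.mmdet.retinanet with one of its backbones (for example, resnet50_fpn_1x).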

Now it’s time to load our data so that we can start training, and to decide on a metric for evaluating the model’s performance during training. For this tutorial, we’ll choose the COCO metric, which computes the Mean Average Precision (mAP).
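Assuming the train_ds and valid_ds Datasets from Part 1, that looks roughly like this (the batch size is an arbitrary choice):

```python
# Build data loaders from the Datasets created in Part 1
train_dl = model_type.train_dl(train_ds, batch_size=16, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=16, num_workers=4, shuffle=False)

# COCO metric: reports Mean Average Precision (mAP) over bounding boxes
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
```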

Training

IceVision is an agnostic framework, meaning it plugs into other deep learning libraries: you can train your models using either fastai2 or PyTorch Lightning. Let’s walk through both of these methods.

1. Training using fastai2

Before we train the model, it is important to find an optimal learning rate so that the model learns quickly. The lr_find method searches for a good learning rate to train the model with. This is accomplished by the code block below:
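Here is a sketch, assuming the data loaders and metrics defined above:

```python
# Wrap the model, data loaders, and metric in a fastai Learner
learn = model_type.fastai.learner(dls=[train_dl, valid_dl],
                                  model=model, metrics=metrics)

# Sweep learning rates and plot loss vs. lr; pick a value near the
# steepest downward slope (fastai also prints a suggested value)
learn.lr_find()
```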

The next step is to start training. We’ll train our model for 50 epochs.
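For example (the learning rate below is a placeholder; substitute the value suggested by lr_find for your run):

```python
# Train the frozen model for one epoch, then unfreeze and
# train the full model for 50 epochs
learn.fine_tune(50, 1e-4, freeze_epochs=1)
```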

2. Training using PyTorch Lightning

Create a new model class LightModel, create an instance of this class, and train that instance as shown below.
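A minimal sketch, assuming the same model, data loaders, and metrics as before (the optimizer and learning rate are illustrative choices):

```python
import pytorch_lightning as pl
from torch.optim import SGD

# The adapter only requires us to define configure_optimizers
class LightModel(model_type.lightning.ModelAdapter):
    def configure_optimizers(self):
        return SGD(self.parameters(), lr=1e-4)

light_model = LightModel(model, metrics=metrics)

# Standard Lightning training loop (gpus=1 reflects the Lightning
# API at the time of writing)
trainer = pl.Trainer(max_epochs=50, gpus=1)
trainer.fit(light_model, train_dl, valid_dl)
```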

Once the model has been trained, you can save the model weights along with the metadata using the save_icevision_checkpoint() method.
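For example (the model/backbone names, image size, and file path below are illustrative and should match your own setup):

```python
from icevision.models.checkpoint import save_icevision_checkpoint

# Save the weights plus metadata (classes, model/backbone names,
# image size) so the model can be rebuilt at inference time
save_icevision_checkpoint(model,
                          model_name="mmdet.vfnet",
                          backbone_name="resnet50_fpn_mstrain_2x",
                          classes=parser.class_map.get_classes(),
                          img_size=384,
                          filename="./vfnet_checkpoint.pth")
```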

Let’s take a look at how our models perform when trained for 20 and for 50 epochs. We’ll also compare the training progress of VFNet and RetinaNet after 20 epochs.

VFNet vs. RetinaNet after 20 epochs of training

As you can see in the pictures above, after 20 epochs the COCO metric for VFNet is 0.72, while that of RetinaNet is 0.677. The difference might seem small, but a gap of 0.043 on an object detection task is substantial.

Training the VFNet for 50 epochs results in a COCO metric score of 0.734, a very small improvement for 30 additional epochs.

Inference

Now that we have our model trained to 0.734 mAP, it’s time to test its performance on novel images. This can be done either with the end2end_detect() method, used for single-image inference, or with the predict_from_dl() method, used for batch inference.

To perform single-image inference, pick an image from your test directory and load it as shown below.
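Something like this, with the path swapped for one of your own test images:

```python
from PIL import Image

# The path here is a stand-in for your own test image
img = Image.open("road_signs/test/road800.png")
```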

Now all you have to do is call end2end_detect(), as shown below. Let’s take a look at a few predictions made by our VFNet on new images.
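A sketch of the call, assuming the valid_tfms transforms from Part 1 (the detection threshold is an illustrative choice):

```python
# Run single-image inference; returns the detected boxes, scores,
# and labels for the image
pred_dict = model_type.end2end_detect(img, valid_tfms, model,
                                      class_map=parser.class_map,
                                      detection_threshold=0.5)
```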

Inference results

To perform batch inference, we build an infer_dl data loader and get predictions on it using the predict_from_dl() method. The predictions can be visualized alongside the original images using the show_preds() method.
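A sketch, assuming a test_ds Dataset built the same way as the validation set in Part 1:

```python
# Build an inference data loader and predict in batches
infer_dl = model_type.infer_dl(test_ds, batch_size=4, shuffle=False)
preds = model_type.predict_from_dl(model, infer_dl, keep_images=True)

# Visualize a few predictions next to their images
show_preds(preds=preds[:4])
```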

That’s it, we’ve done it: we have successfully built an object detection model that can detect four types of road/traffic signs. You can try any other dataset of your choice and train any model you like on it. It was easy, wasn’t it? Before wrapping up, let’s go through all the steps briefly:

  1. We collected our data from Kaggle using the Kaggle API
  2. Using the parser object, we parsed the annotations
  3. We used the Albumentations library to create more training data
  4. We selected 2 different models to compare the results
  5. We trained the model in two different ways: using fastai2 and PyTorch Lightning
  6. Finally, we tested our model on novel images.

That’s it for this tutorial; I will be coming up with more in the future. These first two posts are part of an object detection project I am currently working on, where the main aim is to build an end-to-end pipeline with a good feedback loop. To give you a glimpse of what’s ahead, I will be working on data versioning, model monitoring, productionization, and a lot more. I can’t give away everything now! I will be sharing my progress regularly on LinkedIn and Twitter, so feel free to connect/follow me.
