Object Detection made easier with IceVision (Part-2)
Hello and welcome back to part-2 of the object detection tutorial using IceVision. If you haven’t gone through part-1 yet, I recommend reading it first.
Alright, let’s get started with the remaining steps in the pipeline which are:
a. Model Selection
b. Choosing the best learning rate and training
c. Inference/Testing
Model Selection
IceVision supports several state-of-the-art models to select from. RetinaNet, VFNet, and YOLO are a few examples. To learn more about the supported models, head over to the IceVision documentation.
To choose a model, we’ll have to select a library, a model that the library supports, and a backbone corresponding to the selected model. For this tutorial, I’ll show how you can train 2 models: RetinaNet and VFNet. You only need 3 lines of code to select a model and instantiate it, as shown below.
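Here’s a minimal sketch of that step for VFNet. The backbone name comes from IceVision’s mmdet model zoo, and `parser.class_map` is assumed to be the class map built by the parser in part-1:

```python
from icevision.all import *

# Pick the library (mmdet), the model (VFNet), and a backbone
model_type = models.mmdet.vfnet
backbone = model_type.backbones.resnet50_fpn_mstrain_2x

# Instantiate the model; class_map comes from the parser built in part-1
model = model_type.model(
    backbone=backbone(pretrained=True),
    num_classes=len(parser.class_map),
)

# For RetinaNet instead:
# model_type = models.mmdet.retinanet
# backbone = model_type.backbones.resnet50_fpn_1x
```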
Now, it’s time to load our data so that we can start training our model, and to decide on a metric to evaluate the model’s performance during training. For this tutorial, we’ll choose the COCO metric, which computes the mean Average Precision (mAP).
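A sketch of that step, assuming the `train_ds` and `valid_ds` datasets built in part-1; the batch size and worker count are illustrative values:

```python
# Build dataloaders from the datasets created in part-1
train_dl = model_type.train_dl(train_ds, batch_size=16, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=16, num_workers=4, shuffle=False)

# COCO metric: mean Average Precision (mAP) on the validation set
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
```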
Training
IceVision is an agnostic framework, meaning you can plug it into other deep learning libraries: you can train your models using either fastai2 or PyTorch Lightning. Let’s walk through both of these methods.
1. Training using fastai2
Before we train the model, it is important to find an optimal learning rate for faster learning. The lr_find method finds a good learning rate to train the model with. This is accomplished by the code block below:
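A minimal sketch, assuming the dataloaders, model, and metrics defined above:

```python
# Wrap the model and dataloaders in a fastai Learner
learn = model_type.fastai.learner(
    dls=[train_dl, valid_dl], model=model, metrics=metrics
)

# Plot loss against learning rate and suggest a good value
learn.lr_find()
```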
The next step is to start the training. We’ll train our model for 50 epochs.
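With fastai we can use fine_tune, which first trains the head with the body frozen and then unfreezes everything. The 1e-4 learning rate below is an assumed value; in practice, pick one near what lr_find suggests:

```python
# 1 frozen epoch for the head, then 50 epochs end to end
learn.fine_tune(50, 1e-4, freeze_epochs=1)
```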
2. Training using PyTorch Lightning
Create a new model class LightModel, create an instance of this model class, and train this model instance as shown below.
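A sketch under the same assumptions (the model, metrics, and dataloaders from above); the learning rate and epoch count are illustrative:

```python
import pytorch_lightning as pl
from torch.optim import Adam

# Adapter that turns the IceVision model into a LightningModule
class LightModel(model_type.lightning.ModelAdapter):
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-4)

light_model = LightModel(model, metrics=metrics)

# Train for 50 epochs on a single GPU
trainer = pl.Trainer(max_epochs=50, gpus=1)
trainer.fit(light_model, train_dl, valid_dl)
```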
Once the model has been trained, you can save the model weights along with the metadata using the save_icevision_checkpoint() method.
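Something like the following, where the file path, image size, and version string are assumptions you should adapt to your own setup:

```python
from icevision.models.checkpoint import save_icevision_checkpoint

save_icevision_checkpoint(
    model,
    model_name="mmdet.vfnet",
    backbone_name="resnet50_fpn_mstrain_2x",
    classes=parser.class_map.get_classes(),
    img_size=384,                               # assumed training image size
    filename="./models/vfnet_checkpoint.pth",   # assumed path
    meta={"icevision_version": "0.12.0"},       # assumed version string
)
```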
Let us take a look at how our models perform when trained for 50 epochs and 20 epochs. We’ll also compare the training progress after 20 epochs for VFNet and RetinaNet.
As you can see in the pictures above, after 20 epochs the COCO metric for VFNet is 0.72 and that of RetinaNet is 0.677. Although the difference might seem small, a gap of 0.043 on an object detection task is substantial.
Training the VFNet for 50 epochs results in a COCO metric score of 0.734, which is a very small improvement for 30 additional epochs.
Inference
Now that we have our model trained to 0.734 mAP, it’s time to test its performance on novel images. This can be done either with the end2end_detect() method, which is used for single-image inference, or with the predict_from_dl() method, which is used for batch inference.
To perform single-image inference, pick an image from your test images directory and load it as shown below.
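A sketch, using a hypothetical test image path:

```python
from PIL import Image

# Hypothetical path; point this at one of your own test images
img = Image.open("test_images/road_sign_01.png")
```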
Now all you have to do is call end2end_detect() as shown below. Let’s take a look at a few predictions made by our VFNet on new images.
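A minimal sketch; the image size and detection threshold are assumed values, and the transforms mirror the validation transforms from part-1:

```python
# Validation-style transforms: resize/pad + normalize (size is assumed)
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(384), tfms.A.Normalize()])

# Run the full pipeline on one image: transforms, forward pass, postprocessing
pred_dict = model_type.end2end_detect(
    img,
    valid_tfms,
    model,
    class_map=parser.class_map,
    detection_threshold=0.5,   # assumed confidence cutoff
    return_img=True,           # include the annotated image in the result
)

pred_dict["img"]  # image with predicted boxes drawn
```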
To perform batch inference, we build an infer_dl data loader and get predictions on it with predict_from_dl(). The predictions can be visualized alongside the original images using the show_preds() method.
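A sketch, assuming a test_ds dataset built the same way as the validation set in part-1:

```python
# Batch inference over a whole dataset
infer_dl = model_type.infer_dl(test_ds, batch_size=4, shuffle=False)
preds = model_type.predict_from_dl(model, infer_dl, keep_images=True)

# Visualize predictions next to the original images
show_preds(preds=preds[:4])
```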
That’s it, we’ve done it: we have successfully built an object detection model that can now detect 4 types of road/traffic signs. You can try using any other dataset of your choice and train any model of your choice on that data. It was easy, wasn’t it? Before wrapping up, let’s go through all the steps briefly:
- We collected our data from Kaggle using the Kaggle API
- Using the parser object, we parsed the annotations
- We used the Albumentations library to create more training data
- We selected 2 different models to compare the results
- We trained the model in 2 different ways: using fastai2 and PyTorch Lightning
- Finally, we tested our model on novel images.
That’s it for this tutorial. I will be coming up with more tutorials in the future. This is just the first 2-part blog post on an object detection project that I am currently working on where the main aim is to build an end-to-end pipeline with a good feedback loop. To give you a glimpse into the future, I will be working on data versioning, monitoring the model, productionization, and a lot more. I can’t give away everything now! I will be sharing my progress regularly on LinkedIn and Twitter so feel free to connect/follow me.