Vehicle detection is a vital component of traffic management and road safety. As traffic cameras and drones come into wider use, the ability to detect and classify different types of vehicles in real-time video feeds has become increasingly important. In this article, we explore techniques used for vehicle detection and show how they can be applied, first to a single image and then to a real-time video feed.
Deep Learning for Vehicle Detection
One of the most popular techniques for vehicle detection is deep learning. With advances in computer vision and neural networks, deep learning has become a powerful tool for object detection. The most commonly used deep learning architecture for vehicle detection is the convolutional neural network (CNN). A CNN learns visual features, such as edges, textures, and shapes, directly from training images and detects objects based on those features.
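To make the idea of a "feature" concrete, here is a minimal sketch of the sliding-window operation at the heart of a convolutional layer, written in plain NumPy. The function name `conv2d_valid`, the toy image, and the hand-picked edge kernel are all assumptions made for illustration; a real CNN learns its kernel weights from data and stacks many such layers.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise product of the kernel and the image patch under it
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy 6x6 grayscale image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (Sobel-like); a CNN would learn such weights
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response)
```

The response is large only in the columns where the window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" a feature.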
One of the most popular deep learning frameworks for vehicle detection is TensorFlow. The TensorFlow Object Detection API is a powerful tool that can be used to train a model for vehicle detection. The API provides pre-trained models and tools for training custom models.
Code example: vehicle detection with the TensorFlow Object Detection API
import cv2
import numpy as np
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = '/path/to/frozen_inference_graph.pb'
# Label map file that maps class IDs to the label strings drawn on each box.
PATH_TO_LABELS = '/path/to/label_map.pbtxt'
# Number of classes to detect
NUM_CLASSES = 1
# Load a (frozen) TensorFlow model into memory.
# This is the TensorFlow 1.x graph API, accessed through tf.compat.v1 so it also runs on TensorFlow 2.x.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# Detection
with detection_graph.as_default():
    with tf.compat.v1.Session(graph=detection_graph) as sess:
        # Define input and output tensors for detection_graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for the corresponding object.
        # The score is shown on the result image, together with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        # Load image using OpenCV (which returns BGR channel order)
        image = cv2.imread('path/to/image.jpg')
        # The model expects RGB input, so convert the channel order first
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        image_expanded = np.expand_dims(image_rgb, axis=0)
        # Perform the actual detection by running the model with the image as input
        (boxes, scores, classes, num) = sess.run(
            [detection_boxes, detection_scores, detection_classes, num_detections],
            feed_dict={image_tensor: image_expanded})
        # Draw the results of the detection (i.e. visualize the results)
        vis_util.visualize_boxes_and_labels_on_image_array(
            image,
            np.squeeze(boxes),
            np.squeeze(classes).astype(np.int32),
            np.squeeze(scores),
            category_index,
            use_normalized_coordinates=True,
            line_thickness=8)
        # Save the image with the detections
        cv2.imwrite('path/to/image_with_detections.jpg', image)
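If you need the raw detections rather than a drawn image (for counting vehicles, say), the box and score arrays can be post-processed directly. The sketch below assumes the usual Object Detection API convention of boxes given as normalized [ymin, xmin, ymax, xmax] values, which matches the use_normalized_coordinates=True setting above. The helper name `boxes_to_pixels`, the 0.5 threshold, and the sample numbers are made up for illustration.

```python
import numpy as np

def boxes_to_pixels(boxes, scores, image_height, image_width, min_score=0.5):
    """Keep detections scoring at least `min_score`; return (x1, y1, x2, y2) pixel boxes."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64).reshape(-1)
    keep = scores >= min_score
    # Assumed layout: each row is normalized [ymin, xmin, ymax, xmax]
    ymin, xmin, ymax, xmax = boxes[keep].T
    pixel_boxes = np.stack([xmin * image_width, ymin * image_height,
                            xmax * image_width, ymax * image_height], axis=1)
    return np.rint(pixel_boxes).astype(int), scores[keep]

# Made-up model outputs with a leading batch dimension, as sess.run would return
example_boxes = np.array([[[0.10, 0.20, 0.50, 0.60],
                           [0.00, 0.00, 0.25, 0.25]]])
example_scores = np.array([[0.90, 0.30]])

pixel_boxes, kept_scores = boxes_to_pixels(
    example_boxes, example_scores, image_height=480, image_width=640)
print(pixel_boxes, kept_scores)
```

Here only the first detection survives the threshold, and its box maps to the pixel rectangle from (128, 48) to (384, 240) in a 640x480 image.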
Example #2
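Here is how you could implement vehicle detection in real-time video feeds using the popular YOLO (You Only Look Once) object detector, loaded here through OpenCV's dnn module. The example expects the files yolov3.weights, yolov3.cfg, and coco.names in the working directory and reads frames from traffic.mp4: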
import cv2
import numpy as np
# Load the YOLO model
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# Read the class names the model was trained on, one per line
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
# Names of the output layers; this call avoids index handling that differs across OpenCV versions
output_layers = net.getUnconnectedOutLayersNames()
# One random color per class for drawing
colors = np.random.uniform(0, 255, size=(len(classes), 3))
# Get the video feed
cap = cv2.VideoCapture("traffic.mp4")
while True:
    ret, frame = cap.read()
    # Stop when the video ends or a frame cannot be read
    if not ret:
        break
    height, width, channels = frame.shape
    # Detect objects in the current frame
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)
    # Collect the boxes, confidences, and class IDs of confident detections
    class_ids = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                # Object detected
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                # Rectangle coordinates (top-left corner)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    # Non-maximum suppression drops overlapping boxes for the same object
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    # Draw the surviving boxes with their class labels and confidences
    for i in range(len(boxes)):
        if i in indexes:
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            color = colors[class_ids[i]].tolist()
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            cv2.putText(frame, label, (x, y + 30), cv2.FONT_HERSHEY_PLAIN, 3, color, 2)
            cv2.putText(frame, str(round(confidences[i], 2)), (x, y + 60), cv2.FONT_HERSHEY_PLAIN, 2, color, 2)
    # Show the processed video feed
    cv2.imshow("Video Feed", frame)
    # waitKey lets the window refresh; press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
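The call to cv2.dnn.NMSBoxes in the loop above is what keeps one box per vehicle when the network fires several overlapping boxes for the same object. As a rough illustration of the idea, here is a greedy non-maximum suppression sketch in NumPy. It is not OpenCV's implementation and may differ from it in details; the names `iou` and `greedy_nms` and the demo boxes are assumptions made for this example. Boxes use the same [x, y, w, h] layout as the loop above.

```python
import numpy as np

def iou(box, others):
    """Intersection over union between one [x, y, w, h] box and an array of such boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union

def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    # Candidates sorted by descending score, low-confidence ones dropped
    order = [i for i in np.argsort(-scores) if scores[i] > score_threshold]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(int(best))
        if not order:
            break
        # Discard remaining boxes that overlap the kept box too much
        overlaps = iou(boxes[best], boxes[order])
        order = [i for i, o in zip(order, overlaps) if o <= iou_threshold]
    return keep

# Two heavily overlapping boxes on one vehicle plus one separate box
demo_boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
demo_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(demo_boxes, demo_scores)
print(kept)
```

In the demo, the second box overlaps the first with an IoU of about 0.68, above the 0.4 threshold, so it is suppressed and only indices 0 and 2 remain.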