How can YOLO be used to accurately count the number of people in an image?

If you want to create a system that can count people, how would you do it?

It’s simple: follow the tutorial in this article step by step. All you need is a computer or laptop capable of running Python and a web camera (a video file also works as a substitute).

In the example below, we use the 「Accurate human detection model」 from yolo.dog, which can accurately detect parts of the human figure such as heads and bodies in images.

You could also opt for the general model provided on the official YOLOv8 website, but the results won’t be as good, especially for crowded groups, smaller figures, people in swimming pools, etc. To detect heads and bodies reliably, the 「Accurate human detection model」 is recommended.

  • Install the necessary Python packages
pip install ultralytics
pip install streamlit
pip install dill
  • Create a file named count.py, and then edit it with your preferred tool.
# Import necessary packages
import cv2
import math
import streamlit as st
from ultralytics import YOLO

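Load the YOLOv8 model and define the names for the first and second classes (you can modify these class names as they will appear on the web page). Note that 'yolov8s.pt' is the file name of the general model; to count heads and bodies, replace it with the path to the weights file of the 「Accurate human detection model」.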

net = YOLO('yolov8s.pt')
classNames =["head", "body"]

Define three Streamlit components to display on the web page: the image, text for the head count, and text for the body count.

FRAME_WINDOW = st.image([])
TXT_HEAD_COUNT = st.empty()
TXT_BODY_COUNT = st.empty()

Set up to use the first (0) web camera.

# Use Webcam
cap = cv2.VideoCapture(0)

# If you are using a video file instead of a webcam, use the following line instead
# cap = cv2.VideoCapture("demo.mp4")
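To switch between a webcam and a video file without editing the script, the source can be derived from a string such as a command-line argument. The sketch below is only an illustration and not part of the original tutorial; `parse_source` is a hypothetical helper name. It relies on cv2.VideoCapture accepting either an integer device index or a file path.

```python
def parse_source(value: str):
    """Return an int camera index for digit-only strings, else the path unchanged."""
    value = value.strip()
    return int(value) if value.isdecimal() else value


# Usage (assumes OpenCV is installed and imported as cv2):
#   cap = cv2.VideoCapture(parse_source("0"))         # first webcam
#   cap = cv2.VideoCapture(parse_source("demo.mp4"))  # video file
```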

Start reading image frames, and if a frame is obtained, begin the loop.

frameOK, img = cap.read()
while frameOK:

Send the image to the model net for detection and store the results in results. In addition, reset the head_count and body_count variables to zero, as the number of detected items in the image will be stored in these variables later.

    # Perform detection
    results = net(img, stream=True)

    # Variables to store the counts
    head_count, body_count = 0, 0

For each result r detected, extract the bounding box, confidence, and class id. The detected class id can be converted to a name using the classNames defined earlier.

Finally, draw a rectangle with cv2.rectangle and print text on the image with cv2.putText.

    for r in results:
        boxes = r.boxes

        for box in boxes:
            # bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # convert to int values

            # confidence, rounded up to two decimal places
            confidence = math.ceil((box.conf[0]*100))/100

            # class id: 0 is counted as a head, 1 as a body
            cls = int(box.cls[0])
            if cls == 0:
                color = (0, 255, 0)
                head_count += 1
            elif cls == 1:
                color = (255, 0, 0)
                body_count += 1
            else:
                # skip class ids that have no entry in classNames
                continue

            # object details
            org = [x1, y1]
            font = cv2.FONT_HERSHEY_SIMPLEX
            fontScale = 1
            thickness = 1

            cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness)
            cv2.putText(img, classNames[cls], org, font, fontScale, color, thickness)
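The confidence value computed above is not used when counting, so every box is counted no matter how weak the detection is. As a minimal sketch, assuming each detection has been reduced to a (class id, confidence) pair, the counting step could ignore low-confidence boxes as shown below. The helper name `count_detections`, the 0.5 threshold, and the sample values are made up for illustration.

```python
def count_detections(detections, class_names, min_conf=0.5):
    """Count detections per class name, ignoring low-confidence boxes.

    detections: iterable of (class_id, confidence) pairs.
    """
    counts = {name: 0 for name in class_names}
    for cls, conf in detections:
        # Skip weak detections and class ids that have no name
        if conf < min_conf or not 0 <= cls < len(class_names):
            continue
        counts[class_names[cls]] += 1
    return counts


# Example with made-up values: (class_id, confidence)
sample = [(0, 0.91), (1, 0.88), (0, 0.32), (1, 0.75), (5, 0.99)]
print(count_detections(sample, ["head", "body"]))  # {'head': 1, 'body': 2}
```

Inside the detection loop, the pairs could be built as `(int(box.cls[0]), float(box.conf[0]))`. Ultralytics also accepts a `conf` argument at prediction time, for example `net(img, stream=True, conf=0.5)`, which drops weak boxes before they reach the loop.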

Convert the image format from BGR to RGB using cv2.cvtColor so that the correct colors are displayed on the Streamlit web page. Then, use the FRAME_WINDOW, TXT_HEAD_COUNT, and TXT_BODY_COUNT defined earlier to display the image and text respectively.

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    FRAME_WINDOW.image(img)

    TXT_HEAD_COUNT.text("Detected people's head: {}".format(head_count))
    TXT_BODY_COUNT.text("Detected people's body: {}".format(body_count))
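The conversion is needed because OpenCV stores pixels in blue-green-red order, while Streamlit displays images as red-green-blue by default. For a three-channel image, the conversion amounts to reversing the channel order of every pixel. The pure-Python illustration below works on single pixel tuples only; `bgr_to_rgb` is a made-up helper name, and real frames should still go through cv2.cvtColor.

```python
def bgr_to_rgb(pixel):
    """Reverse the channel order of one (B, G, R) pixel tuple."""
    b, g, r = pixel
    return (r, g, b)


# Colors as written for OpenCV drawing calls (BGR order)
print(bgr_to_rgb((255, 0, 0)))  # blue in BGR becomes (0, 0, 255) in RGB
print(bgr_to_rgb((0, 255, 0)))  # green is unchanged: (0, 255, 0)
```

This is also why a color such as (255, 0, 0) passed to cv2.rectangle appears blue rather than red on the page.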

Finally, get the next frame from the webcam or video file to start a new loop.

    frameOK, img = cap.read()
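The read-before-the-loop and read-at-the-end pattern used here can also be wrapped in a generator, so the frame is read in one place and the capture is released when the stream ends. This is a sketch rather than part of the original script: `iter_frames` and `FakeCapture` are hypothetical names, and the fake class only stands in for cv2.VideoCapture so the example runs without a camera.

```python
def iter_frames(cap):
    """Yield frames from a capture object until a read fails, then release it."""
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()


class FakeCapture:
    """Stand-in for cv2.VideoCapture with the same read()/release() methods."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.released = False

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

    def release(self):
        self.released = True


fake = FakeCapture(["frame-1", "frame-2"])
print(list(iter_frames(fake)))  # ['frame-1', 'frame-2']
print(fake.released)            # True
```

With a real camera, the main loop would then read `for img in iter_frames(cv2.VideoCapture(0)):` in place of the `while frameOK:` loop.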

The following is the complete code.

# Import necessary libraries
import cv2
import math
import streamlit as st
from ultralytics import YOLO

# Load the YOLOv8 model (replace the file name with the path to your head/body weights)
net = YOLO('yolov8s.pt')

classNames =["head", "body"]

FRAME_WINDOW = st.image([])
TXT_HEAD_COUNT = st.empty()
TXT_BODY_COUNT = st.empty()

# Use Webcam
cap = cv2.VideoCapture(0)
# Use a video file
#cap = cv2.VideoCapture("demo.mp4")

# Read the frame
frameOK, img = cap.read()
while frameOK:
    # Perform detection
    results = net(img, stream=True)

    # variables to store the head and body counts
    head_count, body_count = 0, 0

    # coordinates
    for r in results:
        boxes = r.boxes

        for box in boxes:
            # bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            # convert to int values
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2) 

            # confidence
            confidence = math.ceil((box.conf[0]*100))/100

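            # class id: 0 is counted as a head, 1 as a body
            cls = int(box.cls[0])
            # set color and add number
            if cls == 0:
                color = (0, 255, 255)
                head_count += 1
            elif cls == 1:
                color = (0, 255, 0)
                body_count += 1
            else:
                # skip class ids that have no entry in classNames
                continue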

            # object details
            org = [x1, y1]
            font = cv2.FONT_HERSHEY_SIMPLEX
            fontScale = 1
            thickness = 1
            #draw bbox and print text on image
            cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness)
            cv2.putText(img, classNames[cls], org, font, fontScale, color, thickness)

    # Display the results using Streamlit
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    FRAME_WINDOW.image(img)

    TXT_HEAD_COUNT.text("Detected people's head: {}".format(head_count))
    TXT_BODY_COUNT.text("Detected people's body: {}".format(body_count))
    
    frameOK, img = cap.read()

Finally, save the count.py code file and execute the command below to open a web page that displays the detection results.

Of course, you can also change the 5998 port to another number.

streamlit run count.py --server.port 5998