
Hand Gesture Recognition Using Python And OpenCV

Gesture recognition has been an interesting and popular problem in the field of computer vision for a long time. Over the last few years, many forward-looking companies have been trying to build good products based on gesture recognition. Some of them have succeeded to an extent, but there is still a lot to be done before this becomes a great product. So in this tutorial, we are going to make an app that can detect your gesture and then perform an action based on that gesture, using Python and OpenCV.

What is Gesture Recognition

Gesture recognition is the mathematical interpretation of a human motion by a computing device. In personal computing, a gesture is most often used as an input command. Using gesture recognition, a human can communicate with a machine much more easily. Gesture technology has been one of the most successful approaches to date, and gesture recognition has many applications in improving human and machine interaction.

What We Are Going To Do

Let’s suppose we have a camera, and when somebody makes a hand gesture in front of it, the camera reacts and performs an action based on that gesture. That is exactly what we are going to do in this tutorial: we will detect the hand gesture using Python and OpenCV and perform some action based on it. There are many algorithms on the Internet that give very good and accurate results for gesture recognition, but here we will keep things as simple as possible.


Prerequisites

Here I am assuming that you have basic knowledge of Python, OpenCV, and NumPy. In addition, you should be familiar with image processing and some basic operations on images, such as segmentation and thresholding. You will also need a yellow piece of paper that can be worn on a finger for the image segmentation. So let’s start working on this cool tutorial.

Step By Step Flow Of Gesture Recognition

Since I am using only image processing for the gesture recognition, I will be using only the direction of movement to determine the gesture.

  • Take one frame at a time and convert it from the RGB colour space to the HSV colour space for better yellow colour segmentation.
  • Use a mask for the yellow colour.
  • Blur and threshold the mask.
  • If yellow colour is found and it crosses a reasonable area threshold, start creating a gesture.
  • The direction of movement of the yellow cap is calculated by taking the difference between the old centre and the new centre of the yellow colour after every 5th iteration, or frame (see the small worked example after this list).
  • Take the directions and store them in a list until the yellow cap disappears from the frame.
  • Process the created direction list, and use the processed direction list to take a certain action, like a keyboard shortcut.
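
To make the direction step concrete, here is a small worked example (the numbers are made up for illustration):

old_centerx, old_centery = 100, 120                            # centre five frames ago
centerx, centery = 130, 122                                    # current centre
diffx = centerx - old_centerx                                  # 30, the cap moved right
diffy = centery - old_centery                                  # 2, almost no vertical movement
# In image coordinates y grows downwards, so a negative diffy means "up" (North).
# Here diffx > 15 and abs(diffy) <= 15, so the direction is "E" (East).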

Let’s Start Some Coding For Gesture Recognition

gesture_action.py

Let us begin with all the important imports and a few global variables.

import cv2
import numpy as np 
from collections import deque
import pyautogui as gui
from gesture_api import do_gesture_action

cam = cv2.VideoCapture(0)                                      # Camera Object
yellow_lower = np.array([7, 96, 85])                           # HSV yellow lower
yellow_upper = np.array([255, 255, 255])                       # HSV yellow upper
screen_width, screen_height = gui.size()
camx, camy = 360, 240                                          # Resize resolution
buff = 128
line_pts = deque(maxlen = buff)                                # Deque that stores the recent centre points of the yellow patch

The gesture_api is a different file that I created; do_gesture_action is a function in that file. yellow_lower and yellow_upper can be determined by using this python program, so in your case these values might be different under different lighting conditions. The easiest way to use it is to put the yellow paper in front of the camera, then slowly increase the lower parameters (H_MIN, S_MIN, V_MIN) one by one, and then slowly decrease the upper parameters (H_MAX, S_MAX, V_MAX). When the adjusting is done, you will find that only the yellow paper shows up as a white patch and the rest of the image is dark.
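
If the linked program is not at hand, here is a minimal sketch of such a calibration script (the window and trackbar names are my own, and I assume a webcam at index 0):

import cv2
import numpy as np

def nothing(x):                                                # trackbars need a callback; we ignore it
    pass

cv2.namedWindow("Trackbars")
for name, init in [("H_MIN", 0), ("S_MIN", 0), ("V_MIN", 0),
                   ("H_MAX", 255), ("S_MAX", 255), ("V_MAX", 255)]:
    cv2.createTrackbar(name, "Trackbars", init, 255, nothing)

cam = cv2.VideoCapture(0)
while True:
    _, img = cam.read()
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lower = np.array([cv2.getTrackbarPos(n, "Trackbars") for n in ("H_MIN", "S_MIN", "V_MIN")])
    upper = np.array([cv2.getTrackbarPos(n, "Trackbars") for n in ("H_MAX", "S_MAX", "V_MAX")])
    cv2.imshow("Mask", cv2.inRange(hsv, lower, upper))         # adjust until only the yellow paper stays white
    if cv2.waitKey(1) == ord('q'):
        break

cam.release()
cv2.destroyAllWindows()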

Now let’s get into the main function and some of its local variables.

def gesture_action():
    global line_pts                                            # line_pts is reassigned when a gesture completes, so declare it global
    centerx, centery = 0, 0                                    # Present location of the centre of the yellow patch
    old_centerx, old_centery = 0, 0                            # Previous location of the centre of the yellow patch
    area1 = 0                                                  # Area of the yellow patch
    c = 0                                                      # Frame counter; the centre difference is taken every 5 frames
    flag_do_gesture = 0                                        # Set to 1 once a completed gesture's action has been performed
    flag0 = True                                               # True while no yellow object is being tracked in the frame

    created_gesture_hand1 = []                                 # stores the direction of the movement

With that out of the way, we can now extract each frame and perform the required operations. These are the steps we will follow:

    1. Get a frame.
    2. Flip and resize the image to 360*240 for faster processing.
    3. Convert the frame from the RGB colour space to the HSV colour space.
    4. Use the yellow colour mask to segment the yellow colour.
    5. Every camera has some flaws that introduce noise into the frame, so we need to reduce that noise, and the easiest way to do so is to heavily blur the frame.
    6. Threshold the blurred mask (Otsu's method picks the threshold value automatically) to get an almost exact shape of the yellow patch.
    7. Take the contours of the thresholded frame.
    8. Repeat the above steps for every frame.
while True:
    _, img = cam.read()

    # Resize for faster processing. Flipping for better orientation
    img = cv2.flip(img, 1)
    img = cv2.resize(img, (camx, camy))

    # Convert to HSV for better color segmentation
    imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Mask for yellow color
    mask = cv2.inRange(imgHSV, yellow_lower, yellow_upper)

    # Blurring to reduce noise
    blur = cv2.medianBlur(mask, 15)
    blur = cv2.GaussianBlur(blur , (5,5), 0)

    # Thresholding
    _,thresh = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
    cv2.imshow("Thresh", thresh)

    # findContours returns 3 values in OpenCV 3.x and only 2 in OpenCV 4.x; [-2] picks contours in both
    contours = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2]

After getting the contours, we have two cases:

  1. If the number of contours is greater than zero, then yellow coloured objects are in the frame.
  2. If the number of contours is zero, then no yellow coloured objects are in the frame.
Case 1- Yellow coloured objects in the frame
  1. Assign 0 to flag_do_gesture.
  2. Take the contour that has the maximum area. Let us call this max_contour.
  3. Find a minimum area rectangle that surrounds the max_contour.
  4. Take the width and height of the rectangle.
  5. Find the area of the rectangle by width*height.
  6. If the area crosses a reasonable threshold, start making a gesture. I found the threshold by experimenting with different values; in my case, it was 450.
  7. If the area of the contour crosses the threshold then find the centre of the yellow object.
  8. Draw a rectangular box around it.
  9. Draw a dot at the centre.
  10. Append the centre to the deque line_pts.
  11. Update the centre after every 5th iteration or frame.
  12. At the 5th iteration take the difference between the old center (x1, y1) and new center (x2, y2). I have used diffx = (x2-x1) and diffy = (y2-y1).
  13. Hence values of diffx and diffy give us the direction of movement.
  14. If the flag0 is False then append the direction to the created_gesture_hand1 list.
  15. Draw a line for all the points in line_pts
  16. Assign False to flag0.
Case 2- No yellow coloured objects in the frame
  1. Empty the deque line_pts.
  2. Process created_gesture_hand1 by removing the ‘St’ entries and collapsing consecutive duplicate directions. Let us call the result processed_gesture_hand1.
  3. If flag_do_gesture is 0 and processed_gesture_hand1 is not empty, then take the action corresponding to that particular gesture.
  4. Assign 1 to flag_do_gesture. This ensures the gesture action runs only once and not repeatedly.
  5. Empty created_gesture_hand1.
  6. Assign True to flag0.

Enough said. In code, it looks something like this:

if len(contours) == 0:                                                  # Completion of a gesture
    line_pts = deque(maxlen = buff)                                     # Empty the deque
    processed_gesture_hand1 = tuple(process_created_gesture(created_gesture_hand1))
    if flag_do_gesture == 0:                                            # flag_do_gesture to make sure that gesture runs only once and not repeatedly
        if processed_gesture_hand1 != ():
            do_gesture_action(processed_gesture_hand1)
        flag_do_gesture = 1
    print(processed_gesture_hand1)                                      # for debugging purposes
    created_gesture_hand1 = []
    flag0 = True
else:
    flag_do_gesture = 0
    max_contour = max(contours, key = cv2.contourArea)
    rect1 = cv2.minAreaRect(max_contour)
    (w, h) = rect1[1]
    area1 = w*h
    if area1 > 450:
        center1 = list(rect1[0])
        box = cv2.boxPoints(rect1)                                      # corner points of the bounding rectangle
        box = box.astype(int)                                           # np.int0 also works, but was removed in NumPy 2.0
        cv2.drawContours(img,[box],0,(0,0,255),2)
        centerx = center1[0] = int(center1[0])                          # center of the rectangle
        centery = center1[1] = int(center1[1])
        cv2.circle(img, (centerx, centery), 2, (0, 255, 0), 2)
        line_pts.appendleft(tuple(center1))
        if c == 0:
            old_centerx = centerx
            old_centery = centery
        c += 1

        diffx, diffy = 0, 0
        if c > 5:                                                       # check after every 5 iteration the new center
            diffx = centerx - old_centerx
            diffy = centery - old_centery
            c = 0

        if flag0 == False:
        # the difference between the old center and the new center determines the direction of the movement
            if abs(diffx) <= 10 and abs(diffy) <= 10:
                created_gesture_hand1.append("St")
            elif diffx > 15 and abs(diffy) <= 15:
                created_gesture_hand1.append("E")
            elif diffx < -15 and abs(diffy) <= 15:
                created_gesture_hand1.append("W")
            elif abs(diffx) <= 15 and diffy < -15:
                created_gesture_hand1.append("N")
            elif abs(diffx) <= 15 and diffy > 15:
                created_gesture_hand1.append("S")
            elif diffx > 25 and diffy > 25:
                created_gesture_hand1.append("SE")
            elif diffx < -25 and diffy > 25:
                created_gesture_hand1.append("SW")
            elif diffx > 25 and diffy < -25:
                created_gesture_hand1.append("NE")
            elif diffx < -25 and diffy < -25:
                created_gesture_hand1.append("NW")

        for i in range(1, len(line_pts)):
            if line_pts[i - 1] is None or line_pts[i] is None:
                continue
            cv2.line(img, line_pts[i-1], line_pts[i], (0, 255, 0), 2)

        flag0 = False

The process_created_gesture function looks like this:

def process_created_gesture(created_gesture):
    """
    Remove all the 'St' (stationary) entries and collapse consecutive
    duplicate directions into one.
    """
    if created_gesture != []:
        for i in range(created_gesture.count("St")):
            created_gesture.remove("St")
        for j in range(len(created_gesture)):                  # one pass per possible duplicate
            for i in range(len(created_gesture) - 1):
                if created_gesture[i] == created_gesture[i+1]:
                    del created_gesture[i+1]                   # delete by index; remove() would delete the first equal value anywhere in the list
                    break
    return created_gesture
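
As a quick sanity check (my own example, not from the original post):

print(process_created_gesture(["St", "N", "N", "St", "E", "E", "E"]))
# 'St' entries are dropped and consecutive duplicates collapse, printing ['N', 'E']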

So the whole file gesture_action.py looks like this.

import cv2
import numpy as np
import pyautogui as gui
from gesture_api import do_gesture_action
from collections import deque

cam = cv2.VideoCapture(0)
yellow_lower = np.array([7, 96, 85])                          # HSV yellow lower
yellow_upper = np.array([255, 255, 255])                      # HSV yellow upper
screen_width, screen_height = gui.size()
camx, camy = 360, 240                                         # Resize resolution
buff = 128
line_pts = deque(maxlen = buff)

def process_created_gesture(created_gesture):
    """
    Remove all the 'St' (stationary) entries and collapse consecutive
    duplicate directions into one.
    """
    if created_gesture != []:
        for i in range(created_gesture.count("St")):
            created_gesture.remove("St")
        for j in range(len(created_gesture)):                  # one pass per possible duplicate
            for i in range(len(created_gesture) - 1):
                if created_gesture[i] == created_gesture[i+1]:
                    del created_gesture[i+1]                   # delete by index; remove() would delete the first equal value anywhere in the list
                    break
    return created_gesture

def gesture_action():
    global line_pts                                            # line_pts is reassigned below, so declare it global
    centerx, centery = 0, 0
    old_centerx, old_centery = 0, 0
    area1 = 0
    c = 0
    flag_do_gesture = 0
    flag0 = True

    created_gesture_hand1 = []

    while True:
        _, img = cam.read()

        # Resize for faster processing. Flipping for better orientation
        img = cv2.flip(img, 1)
        img = cv2.resize(img, (camx, camy))

        # Convert to HSV for better color segmentation
        imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

        # Mask for yellow color
        mask = cv2.inRange(imgHSV, yellow_lower, yellow_upper)

        # Blurring to reduce noise
        blur = cv2.medianBlur(mask, 15)
        blur = cv2.GaussianBlur(blur , (5,5), 0)

        # Thresholding
        _,thresh = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
        cv2.imshow("Thresh", thresh)

        # findContours returns 3 values in OpenCV 3.x and only 2 in OpenCV 4.x; [-2] picks contours in both
        contours = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2]

        w, h = 0, 0
        if len(contours) == 0:                                                  # Completion of a gesture
            line_pts = deque(maxlen = buff)                                     # Empty the deque
            processed_gesture_hand1 = tuple(process_created_gesture(created_gesture_hand1))
            if flag_do_gesture == 0:                                            # flag_do_gesture to make sure that gesture runs only once and not repeatedly
                if processed_gesture_hand1 != ():
                    do_gesture_action(processed_gesture_hand1)
                flag_do_gesture = 1
            print(processed_gesture_hand1)                                      # for debugging purposes
            created_gesture_hand1 = []
            flag0 = True
        else:
            flag_do_gesture = 0
            max_contour = max(contours, key = cv2.contourArea)
            rect1 = cv2.minAreaRect(max_contour)
            (w, h) = rect1[1]
            area1 = w*h
            if area1 > 450:
                center1 = list(rect1[0])
                box = cv2.boxPoints(rect1)                                      # corner points of the bounding rectangle
                box = box.astype(int)                                           # np.int0 also works, but was removed in NumPy 2.0
                cv2.drawContours(img,[box],0,(0,0,255),2)
                centerx = center1[0] = int(center1[0])                          # center of the rectangle
                centery = center1[1] = int(center1[1])
                cv2.circle(img, (centerx, centery), 2, (0, 255, 0), 2)
                line_pts.appendleft(tuple(center1))
                if c == 0:
                    old_centerx = centerx
                    old_centery = centery
                c += 1

                diffx, diffy = 0, 0
                if c > 5:                                                       # check after every 5 iteration the new center
                    diffx = centerx - old_centerx
                    diffy = centery - old_centery
                    c = 0

                if flag0 == False:
                # the difference between the old center and the new center determines the direction of the movement
                    if abs(diffx) <= 10 and abs(diffy) <= 10:
                        created_gesture_hand1.append("St")
                    elif diffx > 15 and abs(diffy) <= 15:
                        created_gesture_hand1.append("E")
                    elif diffx < -15 and abs(diffy) <= 15:
                        created_gesture_hand1.append("W")
                    elif abs(diffx) <= 15 and diffy < -15:
                        created_gesture_hand1.append("N")
                    elif abs(diffx) <= 15 and diffy > 15:
                        created_gesture_hand1.append("S")
                    elif diffx > 25 and diffy > 25:
                        created_gesture_hand1.append("SE")
                    elif diffx < -25 and diffy > 25:
                        created_gesture_hand1.append("SW")
                    elif diffx > 25 and diffy < -25:
                        created_gesture_hand1.append("NE")
                    elif diffx < -25 and diffy < -25:
                        created_gesture_hand1.append("NW")

                for i in range(1, len(line_pts)):
                    if line_pts[i - 1] is None or line_pts[i] is None:
                        continue
                    cv2.line(img, line_pts[i-1], line_pts[i], (0, 255, 0), 2)

                flag0 = False

        cv2.imshow("IMG", img)
        if cv2.waitKey(1) == ord('q'):
            break

    cv2.destroyAllWindows()
    cam.release()


gesture_action()

gesture_api.py

This file contains nothing but the gesture directions and the keyboard shortcuts they need to emulate. For example, a square can be made using the directions (North, East, South, West). Now let’s say that when a square is made, we need to emulate the keyboard shortcut Winkey (for Windows) or altleft+f1 (for KDE), and so on. We have two cases for keyboard shortcut emulation.

  • Only one key press needs to be emulated, e.g. Winkey
  • More than one key press needs to be emulated, e.g. Winkey + l, alt + f4, etc.

For the first case, we need to just press the key.

For the second case, we need to hold all the keys except the last key, press the last key and then un-hold the keys.

In code, this can be accomplished as follows:

import pyautogui as gui
import os

GEST_START = ("N", "E", "S", "W")
GEST_CLOSE = ("SE", "N", "SW")
GEST_COPY = ("W", "S", "E")
GEST_PASTE = ("SE", "NE")
GEST_CUT = ("SW", "N", "SE")
GEST_ALT_TAB = ("SE", "SW")
GEST_ALT_SHIFT_TAB = ("SW", "SE")
GEST_MAXIMISE = ("N",)
GEST_MINIMISE = ("S",)
GEST_LOCK = ("S", "E")
GEST_TASK_MANAGER = ("E", "W", "S")
GEST_NEW_FILE = ("N", "SE", "N")
GEST_SELECT_ALL = ("NE", "SE", "NW", "W")

# Gesture set containing the directions and the key press actions
GESTURES = {GEST_CUT: ('ctrlleft', 'x'),
GEST_CLOSE: ('altleft', 'f4'),
GEST_ALT_SHIFT_TAB: ('altleft', 'shiftleft', 'tab'),
GEST_PASTE: ('ctrlleft', 'v'),
GEST_ALT_TAB: ('altleft', 'tab'),
GEST_COPY: ('ctrlleft', 'c'),
GEST_NEW_FILE: ('ctrlleft', 'n'),
GEST_SELECT_ALL: ('ctrlleft', 'a')}

# Windows PCs
if os.name == 'nt':
    GESTURES[GEST_START] = ('winleft',)
    GESTURES[GEST_LOCK] = ('winleft', 'l')
    GESTURES[GEST_TASK_MANAGER] = ('ctrlleft', 'shiftleft', 'esc')

# Linux using KDE
else:
    GESTURES[GEST_START] = ('altleft', 'f1')
    GESTURES[GEST_LOCK] = ('ctrlleft', 'altleft', 'l')
    GESTURES[GEST_TASK_MANAGER] = ('ctrlleft', 'esc')

def do_gesture_action(gesture):
    if gesture in GESTURES:                                    # is this direction sequence a known gesture?
        keys = list(GESTURES[gesture])
        last_key = keys.pop()                                  # get the last key press
        if len(keys) >= 1:                                     # case 2
            for key in keys:                                   # hold all the keys except the last key
                gui.keyDown(key)
        gui.press(last_key)                                    # press the last key; in case 1 it is the only key
        if len(keys) >= 1:
            keys.reverse()                                     # un-holding the keys
            for key in keys:
                gui.keyUp(key)
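
For example, calling it with the GEST_START pattern defined above should open the start menu (a usage sketch of my own, not from the original post):

do_gesture_action(("N", "E", "S", "W"))                        # presses winleft on Windows, altleft+f1 on KDE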

That’s it for now. If you run into any trouble, you can find the full code on my GitHub profile from this link.

Summary

Yes, and that’s about it. Using only two files and image processing, we have successfully implemented a very simple and naive gesture recognition system using Python and OpenCV. You could extend this idea by implementing more things, like counting fingers and performing actions based on the count, or you could build an automated robot using the Arduino or Raspberry Pi platform.

He is an engineering student from West Bengal, India. He loves to write things about Python.