Computer vision paint using color tracking

This is an assignment from my multimedia course. I wrote a small application that lets people draw on a computer by moving two pieces of colored paper. The application is based on the idea of color tracking.

As we know, a video is a sequence of digital images, each stored as a 2D array. Each cell of the array holds the color value of the pixel at that point.

For this assignment, I used the HSV color space to distinguish different colors. In HSV space, each cell holds three values, Hue, Saturation and Value, which together represent the color at that pixel. HSV is better than RGB for color tracking because it is more robust to lighting conditions: the Hue channel stays relatively stable when the brightness changes.
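To illustrate why Hue is stable under lighting changes, here is a small sketch using the standard-library `colorsys` module, scaled to the 0–179 / 0–255 ranges commonly used for 8-bit HSV images (the helper name `rgb_to_hsv_cv` is illustrative):

```python
import colorsys

# Convert an RGB color (0-255 per channel) to HSV on 8-bit image scales:
# Hue 0-179, Saturation 0-255, Value 0-255.
def rgb_to_hsv_cv(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(h * 179), int(s * 255), int(v * 255)

# The same red paper seen under bright and dim light: Value drops,
# but Hue (and here even Saturation) stays the same, which is why
# HSV suits color tracking.
bright = rgb_to_hsv_cv(200, 40, 40)
dim = rgb_to_hsv_cv(100, 20, 20)
print(bright, dim)
```

In RGB all three channels change with brightness, so a single threshold per channel cannot separate "same color, darker" from "different color".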

The application setup
This application requires a camera to capture images of the user, which are used both to control the cursor and to draw on the screen. The higher the quality of the webcam, the better the tracking; however, a 1.3 MP webcam is sufficient for this application.

The control objects
The control objects are used to control the application and to draw with it. There are two of them. The first one, called the cursor control object, is used to control the cursor. The camera detects the position of this object and normalizes its coordinates to the drawing field of the application.
After the coordinates are calculated, the application shows the cursor on the screen, and the cursor moves according to the movement of the first control object.
The second control object, called the event generator object, is used to trigger events. When the two objects appear together, it acts like a left mouse click. With this second object, we can select the pen, select a color, or draw on the screen by following the movement of our hand.

Controlling the application therefore requires two colored control objects, and the two colors must be different.

The color training phase
The color training phase is launched at the beginning of the application.
Two control objects are trained in order.

  • The cursor control object is trained first.
  • The event generator object is trained second.

To train a color, the application asks the user to hold a solid-colored object still at the center of the camera's viewport. The application then captures 60 consecutive images of the object and calculates the mean and standard deviation of its color. Finally, the mean color is stored for further use.
To make sure the images all show a single solid-colored object, the application continuously takes images from the camera and pushes them into a queue of 60 images. If the number of images exceeds 60, the oldest one is dequeued. For each image in the queue, we calculate the mean and standard deviation; if the standard deviation is too large, the image is not of a solid color and is discarded.
The process of adding and removing images continues until the application obtains 60 consecutive frames of the same color, and the mean of those frames becomes the trained color.
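The queue-based loop above can be sketched as follows. `capture_hue` is a hypothetical stand-in for sampling the object's color at the center of the frame (simulated here); the real application additionally computes a per-image standard deviation to reject non-solid objects:

```python
import random
from collections import deque
from statistics import mean, stdev

# Hypothetical helper: sample the object's Hue at the camera center.
# Here it simulates a steady green-ish object (Hue ~90 plus sensor noise).
def capture_hue():
    return 90 + random.uniform(-1.0, 1.0)

QUEUE_SIZE = 60   # number of consecutive frames required
MAX_STDEV = 5.0   # if the sampled colors vary more than this, keep waiting

frames = deque(maxlen=QUEUE_SIZE)  # the oldest frame is dequeued automatically
while True:
    frames.append(capture_hue())
    if len(frames) < QUEUE_SIZE:
        continue
    # Unstable colors keep the loop running: new frames push old ones out
    # until 60 consecutive samples agree; then their mean becomes the model.
    if stdev(frames) <= MAX_STDEV:
        trained_hue = mean(frames)
        break
```

Using `deque(maxlen=60)` gives the "dequeue the oldest" behavior for free.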

Color detection
Color detection is applied to both the cursor control object and the event generator object. The application keeps capturing images from the camera; for each image, it scans through every pixel and compares it with the trained color. A pixel is considered a match when its Hue is within ±10 of the trained Hue, its Saturation is within ±30 of the trained Saturation, and its Value lies between 100 and 255. Hue is constrained more strictly than the other two channels because it is the main characteristic of the color. These thresholds come from repeated experiments with objects of different colors under different lighting conditions.

We create a binary image in which pixels similar to the trained color are marked white while all other pixels are marked black. This binary image is called the threshold image.
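With the thresholds above, building the threshold image can be sketched in NumPy (the function name `threshold_image` and the tiny 2×2 test array are illustrative):

```python
import numpy as np

# Build the threshold (binary) image for a trained HSV color.
# hsv is an H x W x 3 array; thresholds follow the text: +/-10 Hue,
# +/-30 Saturation, Value between 100 and 255.
def threshold_image(hsv, trained_h, trained_s):
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    mask = (
        (np.abs(h.astype(int) - trained_h) <= 10)
        & (np.abs(s.astype(int) - trained_s) <= 30)
        & (v >= 100) & (v <= 255)
    )
    return (mask * 255).astype(np.uint8)  # white = match, black = everything else

# Tiny 2x2 example: only the top-left pixel matches the trained color (60, 200):
# the others fail on Hue, Saturation, and Value respectively.
img = np.array([[[60, 200, 180], [120, 200, 180]],
                [[60, 90, 180], [60, 200, 50]]], dtype=np.uint8)
print(threshold_image(img, 60, 200))
```

Vectorizing the comparison like this avoids the explicit per-pixel loop while keeping the same logic.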

The threshold image then undergoes erosion and dilation to reduce noise and improve tracking accuracy. The kernel parameter for erosion and dilation is set to 4; this number also comes from repeated experiments.

Erode and Dilate
Erode and Dilate are two morphological techniques used to process geometric structures; they are grounded in set theory, lattice theory, topology and random functions.
Dilation is essentially a convolution of the original image A with a kernel B: as B is scanned over the image, the maximal pixel value overlapped by B is computed, and the image pixel under the anchor point is replaced with that value.
Erosion, on the other hand, is the converse operation: it computes the local minimum over the area of the kernel. As B is scanned over the image, we find the minimum pixel value overlapped by B and replace the image pixel under the anchor point with it.
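A minimal NumPy sketch of the two operations as local min/max filters over a square kernel (OpenCV's `erode`/`dilate` do the same with a structuring element; the kernel size here is illustrative):

```python
import numpy as np

# Local min/max filter over a square ksize x ksize window.
def _morph(img, ksize, op):
    pad = ksize // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            window = padded[y:y + ksize, x:x + ksize]
            out[y, x] = op(window)
    return out

def erode(img, ksize=3):
    return _morph(img, ksize, np.min)   # local minimum shrinks white regions

def dilate(img, ksize=3):
    return _morph(img, ksize, np.max)   # local maximum grows white regions

# A lone white pixel (noise) disappears after erosion.
noisy = np.zeros((5, 5), dtype=np.uint8)
noisy[2, 2] = 255
print(erode(noisy).max())  # 0: the isolated speck is removed
```

Eroding first removes specks of noise; dilating afterwards restores the surviving regions roughly to their original size.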

Object tracking
After the color is detected, we need to find the position of the object in order to track it.
The purpose of tracking the object is to synchronize its movement with the cursor on the screen. This involves two main steps.
1) Compute the object position: Recall from the color detection section that the result is a binary image indicating the pixels where the trained color appears. Based on this information, we can compute the object position. Assuming there is only one object of the trained color, there is a single solid white region.
The x and y coordinates of the region center can be computed as

x = sum(xi) / area
y = sum(yi) / area

where xi, yi are the coordinates of every point in the region, and area is the number of pixels in the region.
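This centroid computation is straightforward once the threshold image is available; `object_position` is an illustrative name:

```python
import numpy as np

# Compute the object position as the centroid of the white region
# in the threshold image: x = sum(xi) / area, y = sum(yi) / area.
def object_position(threshold):
    ys, xs = np.nonzero(threshold)   # coordinates of every white pixel
    area = len(xs)
    if area == 0:
        return None                  # object not visible in this frame
    return float(xs.sum()) / area, float(ys.sum()) / area

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:7] = 255                 # a 3x4 white block
print(object_position(mask))         # (4.5, 3.0): center of the block
```

Returning `None` when the mask is empty lets the caller skip frames where the object left the camera view.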

2) Normalize the coordinates: We need to map coordinates in the camera coordinate system to the coordinate system of the drawing field of the application.
The normalization formula:
x’ = (x/width)*width’
y’ = (y/height)*height’

Where x’, y’ are the normalized values; width, height are the resolution of the camera image; and width’, height’ are the size of the drawing field of the application.
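The normalization is a simple linear mapping; for example:

```python
# Map camera coordinates to drawing-field coordinates:
# x' = (x / width) * width',  y' = (y / height) * height'
def normalize(x, y, cam_size, field_size):
    width, height = cam_size
    field_w, field_h = field_size
    return x / width * field_w, y / height * field_h

# A point at the center of a 640x480 camera frame lands at the
# center of an 800x600 drawing field.
print(normalize(320, 240, (640, 480), (800, 600)))  # (400.0, 300.0)
```

Note that if the camera and drawing field have different aspect ratios, this mapping stretches the motion slightly along one axis.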

Event handling
1) The application interface: The interface has three main components: the menu, the color panel, and the drawing field. The menu offers the following features: pen, brush, rectangle, circle and eraser. The first four select what kind of graphic the user wants to draw, while the last clears the drawing field.
The color panel is a list of colors for drawing. It also includes a custom color option, which learns a color from the camera instead of choosing from the given list.
The drawing field is a white section where the user controls the cursor and draws.


2) The events: As stated in previous sections, an event happens when both the cursor control object and the event generator object appear in the camera view. Depending on the region the cursor control object is pointing to, different events are generated. If the cursor is pointing to the menu section, the ‘choose cursor’ event is generated, which changes the current graphic tool. The selected tool is surrounded by a red rectangle.

If the cursor is pointing to the color section, the ‘choose cursor’ event is generated, which changes the current drawing color. The selected color is surrounded by a red rectangle.
If the ‘custom color’ feature in the color panel is selected, a new window appears and asks the user to hold a solid-colored object still at the center of the camera view. This works like the color training phase, and the newly trained color becomes the drawing color.
If the cursor is pointing to the drawing field, the drawing event is generated. The drawing features are discussed below.

1) Pen and Brush: The difference between pen and brush is that a brush draws a thicker line than a pen. Drawing a line requires two separate points. When a drawing event is first launched, the first point is recorded. In the next frame, if the drawing event still holds, a line is drawn from the previous point to the current position of the cursor, and the new position replaces the previous one.
If the event terminates, the first position is reset.
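The two-point logic above can be sketched as a small state machine; `on_frame` and `segments` are illustrative names:

```python
# State for the pen tool: while the drawing event holds, connect the
# previous cursor position to the current one; reset when the event ends.
prev = None
segments = []  # line segments committed to the drawing field

def on_frame(event_active, cursor):
    global prev
    if not event_active:
        prev = None                      # event terminated: reset first position
        return
    if prev is not None:
        segments.append((prev, cursor))  # draw a line from prev to cursor
    prev = cursor                        # new position replaces the previous one

# Three frames of a stroke, a gap, then the start of a new stroke.
for active, pos in [(True, (0, 0)), (True, (2, 1)), (True, (4, 3)),
                    (False, None), (True, (9, 9))]:
    on_frame(active, pos)

print(segments)  # two segments from the first stroke, none yet from the second
```

Resetting `prev` on event termination is what prevents a stray line from being drawn between two separate strokes.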
2) Rectangle and Ellipse: Drawing a rectangle or ellipse differs from the pen and brush in that a line drawn with pen or brush is saved directly to the drawing field as the cursor moves, while a rectangle or ellipse is saved only when the event terminates.
To make this possible, we need a temporary image on which the rectangle or ellipse is drawn while the cursor moves. At the moment the event terminates, the temporary image is copied to the main drawing field.
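A minimal sketch of the temporary-image approach, using a NumPy array as the drawing field and a filled rectangle for simplicity (the real application draws outlined shapes):

```python
import numpy as np

# Draw a filled rectangle between two corner points (illustrative helper).
def draw_rect(img, p1, p2, value=255):
    x1, x2 = sorted((p1[0], p2[0]))
    y1, y2 = sorted((p1[1], p2[1]))
    img[y1:y2 + 1, x1:x2 + 1] = value

canvas = np.zeros((8, 8), dtype=np.uint8)   # persistent drawing field
start = (1, 1)                               # recorded when the event starts

for cursor in [(3, 3), (5, 4)]:              # cursor moves while the event holds
    preview = canvas.copy()                  # temporary image, redrawn each frame
    draw_rect(preview, start, cursor)        # shown on screen, not yet saved

canvas[:] = preview                          # event terminates: commit the preview
print(int(canvas.sum() // 255))              # 20 pixels: the final 5x4 rectangle
```

Because each frame starts from a fresh copy of the canvas, the intermediate rectangles never accumulate; only the final one is committed.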

The video demo of the application is available at:
Source code: or github

