JavaScript motion detection


Prerequisite knowledge

JavaScript beginner/intermediate, HTML5 beginner/intermediate and a basic knowledge of jQuery. Due to its use of the getUserMedia and audio APIs, the demo requires Chrome Canary and must be run via a local web server.


In this article, I discuss how to detect a user’s movement using a webcam stream in JavaScript. The demo shows a video captured from the user’s webcam. The user can play notes on a xylophone built in HTML, using their own physical movements, in real time. The demo is based on two HTML5 “works in progress”: the getUserMedia API displays the user’s webcam, and the Audio API plays the xylophone notes.
I also show how to use a blend mode to capture the user’s movement. Blend modes are a common feature in languages that have a graphics API and in almost any graphics software. To quote Wikipedia about blend modes, “Blend modes in digital image editing are used to determine how two Layers are blended into each other.” Blend modes are not natively supported in JavaScript but, because a blend mode is nothing more than a mathematical operation between pixels, I create a “difference” blend mode myself.
I made a demo to show where web technologies are heading. JavaScript and HTML5 will provide tools for new types of interactive web applications. Exciting!

HTML5 getUserMedia API

As of the writing of this article, the getUserMedia API is still a work in progress. With the fifth major release of HTML by the W3C, there has been a surge of new APIs offering access to native hardware devices. The navigator.getUserMedia() API provides the tools to enable a website to capture audio and video.
Many articles on different blogs cover the basics of how to use the API. For that reason, I do not explain it too much in detail. At the end of this article is a list of useful links about getUserMedia if you wish to learn more about it. Here, I show you how I enabled it for this demo.
Today, the getUserMedia API is usable only in Opera 12 and Chrome Canary, neither of which is a public release yet. For this demo, you must use Chrome Canary because it supports the AudioContext used to play the xylophone notes. AudioContext is not supported by Opera.
Once Chrome Canary is installed and launched, enable the API. In the address bar, type: about:flags. Under the Enable MediaStream option, click the toggle to enable the API.

Figure 1. Enable MediaStream.

Lastly, due to security restrictions, you must run the sample files from a local web server, because camera access is denied for pages loaded from file:/// URLs.
Now that everything is ready and enabled, add a video tag to play the webcam stream. The stream received is attached to the src property of the video tag using JavaScript.

In the JavaScript, you go through two steps. The first one is to find out if the user’s browser can use the getUserMedia API.

function hasGetUserMedia() {
    return !!(navigator.getUserMedia || navigator.webkitGetUserMedia ||
        navigator.mozGetUserMedia || navigator.msGetUserMedia);
}

The second step is to try to get the stream from the user’s webcam.
Note: I’m using jQuery in this article and in the demo, but feel free to use any selector you like, such as a native document.querySelector.

var webcamError = function(e) {
    alert('Webcam error!', e);
};

var video = $('#webcam')[0];

if (navigator.getUserMedia) {
    navigator.getUserMedia({audio: true, video: true}, function(stream) {
        video.src = stream;
    }, webcamError);
} else if (navigator.webkitGetUserMedia) {
    navigator.webkitGetUserMedia({audio: true, video: true}, function(stream) {
        video.src = window.webkitURL.createObjectURL(stream);
    }, webcamError);
} else {
    //video.src = 'video.webm'; // fallback.
}
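The prefix checks above can also be folded into a single lookup. The following is a minimal sketch, not part of the demo: findGetUserMedia is a hypothetical helper name, and it takes the navigator object as a parameter so that it can be exercised outside a browser.

```javascript
// Hypothetical helper (not from the demo): returns the first available
// getUserMedia variant, bound to the object it was found on, or null.
function findGetUserMedia(nav) {
  var names = ['getUserMedia', 'webkitGetUserMedia',
               'mozGetUserMedia', 'msGetUserMedia'];
  for (var i = 0; i < names.length; i++) {
    if (typeof nav[names[i]] === 'function') {
      // bind() keeps "this" pointing at the navigator-like object
      return nav[names[i]].bind(nav);
    }
  }
  return null; // no variant found: show the video fallback instead
}
```

In a browser you would call findGetUserMedia(navigator). The success callback still has to handle the difference between assigning the stream directly and going through createObjectURL, as the code above does.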

You are now ready to display a webcam stream in an HTML page. The next section provides an overview of the structure and assets used in the demo.
If you hope to use getUserMedia in production, I recommend following Addy Osmani’s work. Addy has created a shim for the getUserMedia API with a Flash fallback.

Blend mode difference

The detection of the user’s movement is performed using a blend mode difference.
Blending two images using difference is nothing more than subtracting pixel values. The Wikipedia article about blend modes describes difference as follows: “Difference subtracts the top layer from the bottom layer or the other way round, to always get a positive value. Blending with black produces no change, as values for all colors are 0.”
To perform this operation, you need two images. The code loops over each pixel and subtracts each color channel of the second image from the corresponding color channel of the first image.
For example, with two red images, the color channels at every pixel in both images are:
– red: 255 (0xFF)
– green: 0
– blue: 0
The following operations subtract the color values of these images:
– red: 255 – 255 = 0
– green: 0 – 0 = 0
– blue: 0 – 0 = 0
In other words, applying a blend mode difference on two identical images produces a black image. Let’s discuss how that is useful and where those images come from.
Taking the process step by step, first draw an image from the webcam stream in a canvas, at a certain interval. In the demo, I draw 60 images per second, which is more than you need. The current image displayed in the webcam stream is the first image that you blend into another.
The second image is another capture from the webcam, taken at the previous time interval. Now that you have two images, subtract their pixel values. If the images are identical (in other words, if the user is not moving at all), the operation produces a black picture.
The magic happens when the user starts to move. The image taken at the current time interval is slightly different from the image of the previous time interval. If you subtract different values, some colors start to appear. This means that something moved between these two frames!
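The idea can be tried on tiny hand-made frames before touching the webcam. The following is a minimal sketch under simple assumptions: the frames are plain arrays of RGBA values, diffFrames is a hypothetical name, and Math.abs is used for clarity.

```javascript
// Per-channel absolute difference of two flat RGBA arrays of equal length.
function diffFrames(current, previous) {
  var out = new Array(current.length);
  for (var i = 0; i < current.length; i += 4) {
    out[i]     = Math.abs(current[i]     - previous[i]);     // red
    out[i + 1] = Math.abs(current[i + 1] - previous[i + 1]); // green
    out[i + 2] = Math.abs(current[i + 2] - previous[i + 2]); // blue
    out[i + 3] = 255;                                        // keep the result opaque
  }
  return out;
}

var redFrame   = [255, 0, 0, 255, 255, 0, 0, 255];   // two red pixels
var movedFrame = [255, 0, 0, 255, 200, 40, 10, 255]; // second pixel changed

var still = diffFrames(redFrame, redFrame);   // [0, 0, 0, 255, 0, 0, 0, 255]
var moved = diffFrames(movedFrame, redFrame); // [0, 0, 0, 255, 55, 40, 10, 255]
```

Identical frames give only black pixels, while the changed pixel shows up as a non-black value.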

Figure 2. Example of a blended image from a webcam stream.

It starts to make sense as the motion detection process is almost finished. The last step is looping over all of the pixels in the blended image to determine if there are some pixels that are not black.

Assets preparation

To make this demo work, you need several things that are pretty common to all websites. You need, of course, an HTML page that contains some canvas and video tags, and some JavaScript to make the application run.
Then, you need an image of the xylophone ideally with a transparent background (png). In addition, you need images for each key on the xylophone, as a separate image with a small brightness change. Using a rollover effect, these images give the user visual feedback, highlighting the triggered note.
You need an audio file for each note’s sound; mp3 files will do. Playing a xylophone without sound wouldn’t be much fun.
Finally, I think it is important to show a video fallback of the demo in case the user can’t or doesn’t want to use their webcam. You need to create a video and encode it to mp4, ogg, and webm to cover all browsers. The next section provides a list of the tools used to encode the videos. You can find all the assets in the demo zip file.

Encode HTML5 videos

I encoded the video demo fallback into the common three formats needed to display HTML5 videos.
I had some trouble with the webm format, as most of the tools I found for encoding would not give me the option to choose a bitrate. The bitrate, usually expressed in kbps (kilobits per second), determines both the video file size and its quality. I ended up using Online ConVert Video converter to convert to the WebM format (VP8), which worked well.
For the mp4 version, you can use tools from the Adobe Creative Suite, such as the Adobe Media Encoder or After Effects. You can also use a free alternative such as Handbrake.
For the ogg format, I used tools that encode to all formats at once. I wasn’t happy with the quality of their webm and mp4 output and couldn’t find a way to change it, but the ogg video was fine. You can use either Easy HTML5 Video or Miro.

Prepare the HTML

This is the last step before starting to code some JavaScript: set up the required HTML tags. (I won’t list everything I used, such as the video demo fallback code, which you can download in the source.)
You need a simple video tag to receive the user’s webcam feed. Don’t forget to set the autoplay property. Without it, the stream pauses on the first frame received.

The video tag will not actually be displayed; its CSS display style is set to none. Instead, draw the received stream in a canvas so you can use it for the motion detection. Create the canvas to draw the webcam stream:

You also need another canvas to show what’s happening in real time during the motion detection:

Create a div that contains the xylophone images. Place the xylophone on top of the webcam: the user can virtually use their hand to play with it. On top of the xylophone are placed the notes, hidden, that display on a rollover when the note is triggered.

JavaScript motion detection

The JavaScript steps to make this demo work are as follows:
  • detect if the application can use the getUserMedia API
  • detect if the webcam stream is being received
  • load the sounds of the xylophone notes
  • start a time interval and call an update function
  • at each time interval, draw the webcam feed onto a canvas
  • at each time interval, blend the current webcam image into the previous one
  • at each time interval, draw the blended image onto a canvas
  • at each time interval, check the pixel color values in the areas of the xylophone notes
  • at each time interval, play a specific xylophone note if motion is detected

First step: prepare variables

Prepare some variables to store what you need for the drawing and motion detection. You need two references to the canvas elements, a variable to store the context of each canvas, and a variable to store the drawn webcam stream. Also store the x axis position of each xylophone note, the note sounds to play, and some sound-related variables.

var notesPos = [0, 82, 159, 238, 313, 390, 468, 544];

var timeOut, lastImageData;
var canvasSource = $("#canvas-source")[0];
var canvasBlended = $("#canvas-blended")[0];

var contextSource = canvasSource.getContext('2d');
var contextBlended = canvasBlended.getContext('2d');

var soundContext, bufferLoader;
var notes = [];

Also invert the x axis of the webcam stream so the user feels like they are in front of a mirror. This makes their movement to reach the notes a bit easier. Here is how to do that:

contextSource.translate(canvasSource.width, 0);
contextSource.scale(-1, 1);
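To see what these two calls do to a coordinate, the arithmetic can be written out by hand. This is only an illustrative sketch; mirroredX is a hypothetical name and is not used by the demo.

```javascript
// After translate(width, 0) followed by scale(-1, 1), a point drawn at
// horizontal position x ends up at width - x on the canvas.
function mirroredX(x, width) {
  return width - x;
}

var leftEdge  = mirroredX(0, 640);   // 640: the left edge lands on the right
var rightEdge = mirroredX(640, 640); // 0: the right edge lands on the left
var somewhere = mirroredX(100, 640); // 540
```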

Second step: update and draw video

Create a function called update that will be executed 60 times per second and will call other functions that draw the webcam stream onto a canvas, blend the images, and detect the motion.

function update() {
    drawVideo();
    blend();
    checkAreas();
    timeOut = setTimeout(update, 1000/60);
}

Drawing the video onto a canvas is quite easy and it takes only one line:

function drawVideo() {
    contextSource.drawImage(video, 0, 0, video.width, video.height);
}

Third step: build the blend mode difference

Create a helper function to ensure that the result of the subtraction is always positive. You can use the built-in function Math.abs, but I wrote an equivalent with bitwise operators. Using bitwise operators can result in better performance. You don’t have to understand it exactly; just use it as it is:

function fastAbs(value) {
    // equivalent to Math.abs();
    return (value ^ (value >> 31)) - (value >> 31);
}
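If you prefer to check the helper rather than take it on trust, a quick comparison against Math.abs is enough. The sketch below repeats fastAbs so that it runs on its own; the variable names are only illustrative. It also shows one caveat: bitwise operators convert their operands to 32-bit integers, so fractional values are truncated.

```javascript
function fastAbs(value) {
  // equivalent to Math.abs() for 32-bit integers
  return (value ^ (value >> 31)) - (value >> 31);
}

// Compare with Math.abs over every possible difference between two
// 8-bit color channel values (-255 to 255).
var mismatches = 0;
for (var d = -255; d <= 255; d++) {
  if (fastAbs(d) !== Math.abs(d)) {
    mismatches++;
  }
}
// mismatches is 0

// Fractions are dropped by the bitwise operators:
var truncated = fastAbs(-2.5); // 2, whereas Math.abs(-2.5) is 2.5
```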

Now write the blend mode difference. The function receives three parameters:
  • a flat array of pixels to store the result of the subtraction
  • a flat array of pixels of the current webcam stream image
  • a flat array of pixels of the previous webcam stream image
The arrays of pixels are flattened and contain the color channel values of red, green, blue and alpha:
  • pixels[0] = red value
  • pixels[1] = green value
  • pixels[2] = blue value
  • pixels[3] = alpha value
  • pixels[4] = red value
  • pixels[5] = green value
  • and so on…
In the demo, the webcam stream has a width of 640 pixels and a height of 480 pixels. The size of the array is: 640 * 480 * 4 = 1,228,800.
The best way to loop over the array of pixels is to process four values per iteration (red, green, blue, alpha), which brings the loop down to 307,200 iterations, one per pixel.

function difference(target, data1, data2) {
    var i = 0;
    while (i < (data1.length / 4)) {
        var red = data1[i*4];
        var green = data1[i*4+1];
        var blue = data1[i*4+2];
        var alpha = data1[i*4+3];
        ++i;
    }
}
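The index arithmetic behind this loop can be written out for a 640 by 480 frame. This is a small sketch for illustration; pixelOffset and the constant names are hypothetical and do not appear in the demo.

```javascript
var WIDTH = 640;
var HEIGHT = 480;
var CHANNELS = 4; // red, green, blue, alpha

// Offset of the red value of the pixel at column x, row y.
// Pixels are stored row by row, four values per pixel.
function pixelOffset(x, y, width) {
  return (y * width + x) * CHANNELS;
}

var totalValues = WIDTH * HEIGHT * CHANNELS;    // 1228800 array entries
var totalPixels = WIDTH * HEIGHT;               // 307200 loop iterations
var firstPixel  = pixelOffset(0, 0, WIDTH);     // 0
var lastPixel   = pixelOffset(639, 479, WIDTH); // 1228796
```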

You can now subtract the pixel values of the images. For performance (and it seems to make a big difference), skip the subtraction when the color channel is already 0, and always set the alpha to 255 (0xFF). Here is the finished blend mode difference function (feel free to optimize!):

function difference(target, data1, data2) {
    // blend mode difference
    if (data1.length != data2.length) return null;
    var i = 0;
    while (i < (data1.length * 0.25)) {
        target[4*i] = data1[4*i] == 0 ? 0 : fastAbs(data1[4*i] - data2[4*i]);
        target[4*i+1] = data1[4*i+1] == 0 ? 0 : fastAbs(data1[4*i+1] - data2[4*i+1]);
        target[4*i+2] = data1[4*i+2] == 0 ? 0 : fastAbs(data1[4*i+2] - data2[4*i+2]);
        target[4*i+3] = 0xFF;
        ++i;
    }
}

I used a slightly different version in the demo to get better accuracy. I created a threshold function to be applied to the color values. It changes the pixel color value to black when it is at or below a certain limit, and to white when it is above the limit. You can also use it as is:

function threshold(value) {
    return (value > 0x15) ? 0xFF : 0;
}

I also averaged the three color channels, so every pixel in the resulting image is either black or white.

function differenceAccuracy(target, data1, data2) {
    if (data1.length != data2.length) return null;
    var i = 0;
    while (i < (data1.length * 0.25)) {
        var average1 = (data1[4*i] + data1[4*i+1] + data1[4*i+2]) / 3;
        var average2 = (data2[4*i] + data2[4*i+1] + data2[4*i+2]) / 3;
        var diff = threshold(fastAbs(average1 - average2));
        target[4*i] = diff;
        target[4*i+1] = diff;
        target[4*i+2] = diff;
        target[4*i+3] = 0xFF;
        ++i;
    }
}

The result is something like this black and white image.

Figure 3. The blend mode difference results in a black and white image.
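To get a feel for the limit, it helps to run a single pixel through the averaging and threshold steps. The following sketch uses hypothetical names (LIMIT, channelAverage, thresholdedDiff) and Math.abs instead of fastAbs; the limit 0x15 is 21 in decimal.

```javascript
var LIMIT = 0x15; // 21 in decimal, the limit used by threshold()

// Average of the red, green and blue values of one pixel.
function channelAverage(pixel) {
  return (pixel[0] + pixel[1] + pixel[2]) / 3;
}

// White (0xFF) if the two pixels differ by more than LIMIT, black (0) otherwise.
function thresholdedDiff(pixelA, pixelB) {
  var diff = Math.abs(channelAverage(pixelA) - channelAverage(pixelB));
  return diff > LIMIT ? 0xFF : 0;
}

var smallChange = thresholdedDiff([100, 100, 100], [110, 110, 110]); // 0: a difference of 10 is ignored
var largeChange = thresholdedDiff([100, 100, 100], [130, 130, 130]); // 255: a difference of 30 counts
```

Small variations, such as sensor noise between two frames, stay black, and only larger changes turn white.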

Fourth step: blend canvas

The function to blend the images is now ready; you just need to pass it the right values: the arrays of pixels.
The canvas drawing API provides a method, getImageData, to retrieve an instance of an ImageData object. This object contains useful properties, such as width and height, and also the data property, which is the array of pixels you need. Also, create an empty ImageData instance to receive the result, and store the current webcam image for the next iteration of the time interval. Here is the function that blends and draws the result in a canvas:

function blend() {
    var width = canvasSource.width;
    var height = canvasSource.height;
    // get webcam image data
    var sourceData = contextSource.getImageData(0, 0, width, height);
    // create an image if the previous image doesn’t exist
    if (!lastImageData) lastImageData = contextSource.getImageData(0, 0, width, height);
    // create a ImageData instance to receive the blended result
    var blendedData = contextSource.createImageData(width, height);
    // blend the 2 images
    differenceAccuracy(blendedData.data, sourceData.data, lastImageData.data);
    // draw the result in a canvas
    contextBlended.putImageData(blendedData, 0, 0);
    // store the current webcam image
    lastImageData = sourceData;
}
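The bookkeeping in blend(), which compares against the previous frame and then remembers the current one, can be isolated from the canvas. Here is a minimal sketch of that pattern; createFrameDiffer and sumOfDifferences are hypothetical names, and plain arrays stand in for ImageData.

```javascript
// Returns a function that compares each new frame with the one before it.
function createFrameDiffer(diffFn) {
  var last = null;
  return function (current) {
    if (!last) last = current; // first call: compare the frame with itself
    var result = diffFn(current, last);
    last = current;            // remember this frame for the next call
    return result;
  };
}

// Illustrative comparison: the sum of the absolute differences.
function sumOfDifferences(a, b) {
  var total = 0;
  for (var i = 0; i < a.length; i++) {
    total += Math.abs(a[i] - b[i]);
  }
  return total;
}

var nextFrame = createFrameDiffer(sumOfDifferences);
var first  = nextFrame([10, 10, 10, 255]); // 0: nothing to compare against yet
var second = nextFrame([40, 10, 10, 255]); // 30: the red value changed
var third  = nextFrame([40, 10, 10, 255]); // 0: identical to the previous frame
```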

Fifth step: search for pixels

The final step for this demo is the motion detection using the blended images we created in the previous section.
When preparing the assets, place eight images of the notes of the xylophone to use them as rollovers. Use these note positions and sizes as rectangle areas to retrieve the pixels from the blended image. Then, loop over them to find some white pixels.
In the loop, average the color channels of each pixel and add the result to a variable. After the loop completes, compute the overall average of all the pixels in this area.
Avoid noise or small motions by setting a limit of 10. If you find a value that is more than 10, consider that something has moved since the last frame. This is the motion detection!
Then, play the corresponding sound and show the note rollover. The function looks like this:

function checkAreas() {
    // loop over the note areas
    for (var r=0; r<8; ++r) {
        // get the pixels in a note area from the blended image
        var blendedData = contextBlended.getImageData(
            notes[r].area.x,
            notes[r].area.y,
            notes[r].area.width,
            notes[r].area.height);
        var i = 0;
        var average = 0;
        // loop over the pixels
        while (i < (blendedData.data.length / 4)) {
            // make an average between the color channels
            average += (blendedData.data[i*4] + blendedData.data[i*4+1] + blendedData.data[i*4+2]) / 3;
            ++i;
        }
        // calculate the average of the color values of the note area
        average = Math.round(average / (blendedData.data.length / 4));
        if (average > 10) {
            // over a small limit, consider that a movement is detected
            // play a note and show a visual feedback to the user
            playSound(notes[r]);
            notes[r].visual.style.display = "block";
            $(notes[r].visual).fadeOut();
        }
    }
}
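The averaging and the limit of 10 can be tried without a canvas. The sketch below is an illustration only: averageBrightness, hasMotion and makeRegion are hypothetical names, and makeRegion simply builds a fake note area with a given number of white pixels.

```javascript
// Average of the color channels over all pixels of a flat RGBA array.
function averageBrightness(data) {
  var pixelCount = data.length / 4;
  var sum = 0;
  for (var i = 0; i < pixelCount; i++) {
    sum += (data[i*4] + data[i*4+1] + data[i*4+2]) / 3;
  }
  return Math.round(sum / pixelCount);
}

function hasMotion(data, limit) {
  return averageBrightness(data) > limit;
}

// Build a fake blended area: whitePixels white pixels, the rest black.
function makeRegion(totalPixels, whitePixels) {
  var data = [];
  for (var i = 0; i < totalPixels; i++) {
    var value = i < whitePixels ? 255 : 0;
    data.push(value, value, value, 255);
  }
  return data;
}

var noise    = hasMotion(makeRegion(100, 3), 10); // false: average is 8
var movement = hasMotion(makeRegion(100, 5), 10); // true: average is 13
```

With these numbers, a handful of white pixels caused by noise stays under the limit, while a slightly larger white area triggers the note.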

Where to go from here

I hope this brings you some new ideas to create video-based interactions, or to step on a territory that has been owned by Flash developers such as myself for years: video-interactive campaigns based on either pre-rendered videos or webcam streams.
You can learn more information about the getUserMedia API by following the development of the Chrome and Opera browsers, and on the draft of the API on the W3C website.
Creatively, if you are looking for ideas for motion detection or video-based applications, I recommend that you follow the experiments made by Flash developers and motion designers using After Effects. Now that JavaScript and HTML are gaining momentum and new capabilities, there is a lot to learn, try, and experiment with.

Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License+Adobe Commercial Rights
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Permissions beyond the scope of this license, pertaining to the examples of code included within this work are available at Adobe.