Hackathon writeup: Augmented Reality Snake
I've been at Harvey Mudd College for almost three months now, and I'm enjoying it a lot. So far, I've participated in two hackathons: first, MuddHacks, a hardware hackathon, and then last weekend, the 5C Hackathon, a software hackathon that spans the 5 Claremont Colleges (Harvey Mudd, Pomona, Scripps, Pitzer, Claremont McKenna). In this post, I'm going to be talking about the project I made for the second of the two hackathons.
But wait! You haven't talked about the first one yet! You can't do a part 2 without doing a part 1! I'd like to tell you all about how I got caught in a time machine that someone built at the hardware hackathon, so my past self posted about the first hackathon in this timeline's future, but I'd be lying (and then I'd have to get the government to come brainwash you to preserve national security). So instead I'll explain that I haven't actually finished the project from the first hackathon yet. Both of these were 12-hour hackathons, and about 10.5 hours into the first one, I realized that our chosen camera angle introduced interference patterns that made our image analysis code completely ineffective. I'm hoping to find some time this weekend or next to finish that up, and I'll write that project up then.
Anyways, about this project. My partner Hamzah Khan and I wanted to do some sort of computer vision/image analysis project. Originally, we considered trying to detect certain features in an image and then use those as the basis for a game (I thought it would be really cool to play snake on the walls of HMC, which are mostly made of square bricks). But feature detection is pretty difficult, and we decided it wasn't a good choice for a 12-hour project. Instead, we came up with the idea of doing an augmented-reality project based on detecting a very specific marker pattern. We also wanted to do it in Javascript, because we both knew it pretty well and wanted to be able to run the game on mobile devices.
Although some Javascript AR libraries exist, we decided not to use them: they were kind of slow, and they stopped working once the camera viewed the marker from too sharp an angle. Also, we wanted to challenge ourselves and do more than just use a library.
The first challenge was getting an image from the device camera. We used the HTML5 getUserMedia API to obtain a video stream, which we then rendered into a canvas element and extracted the pixel data. Strangely enough, on Android devices, the Google Chrome browser does not give web applications access to the back-facing camera, only the front one. We avoided this problem by simply switching to Firefox for Android, which does let you use the back camera. (Yay for the open-source community!) We didn't test it with any Apple devices, so I'm not sure whether or not it works on those.
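For the curious, the capture step boils down to something like this (a minimal sketch using the modern promise-based getUserMedia API and the standard `facingMode` constraint for the back camera; at the time we used the prefixed, callback-style variants):

```javascript
// Grab the camera stream, draw each frame into a canvas, and read the pixels back.
var video = document.createElement('video');
var canvas = document.createElement('canvas');
var ctx = canvas.getContext('2d');

navigator.mediaDevices.getUserMedia({ video: { facingMode: 'environment' } })
  .then(function (stream) {
    video.srcObject = stream;
    return video.play();
  });

function grabFrame() {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  ctx.drawImage(video, 0, 0);
  // The returned ImageData holds an RGBA Uint8ClampedArray we can hand to the detector.
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}
```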
To make the marker pattern, I took a QR code and stripped all the data, resulting in an image that looks somewhat like this:
This allowed us to use the jsqrcode library to detect the QR code. The jsqrcode library, which is a port of the ZXing QR code library, internally performs several steps: it makes the image black and white, then it tries to find the characteristic 1:1:3:1:1 ratio of the QR code finder patterns (the big squares), then it uses the calculated positions of the finder patterns (and also the smaller alignment pattern) to generate a transformation matrix, then it samples the pixels of the QR code using the matrix, and finally it runs the QR code decoding algorithm to extract the data. We were able to modify the code to simply stop once it found the transformation matrix, and then hand that transformation matrix to our code.
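As an aside, that 1:1:3:1:1 test is the heart of the finder-pattern search: scanning across a row, each big square shows up as runs of dark, light, dark, light, dark pixels in roughly that ratio. A sketch of the idea (not jsqrcode's actual code):

```javascript
// Given five consecutive run lengths (dark, light, dark, light, dark) from a
// scanline, check whether they approximate the 1:1:3:1:1 finder-pattern ratio.
function looksLikeFinderPattern(runs) {
  var total = runs.reduce(function (a, b) { return a + b; }, 0);
  if (total < 7) return false;      // a finder pattern is 7 modules wide
  var moduleSize = total / 7;
  var maxVariance = moduleSize / 2; // allow 50% slack per module
  var expected = [1, 1, 3, 1, 1];
  return runs.every(function (run, i) {
    return Math.abs(run - expected[i] * moduleSize) < expected[i] * maxVariance;
  });
}
```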
Before we move on, a quick introduction to transformation matrices. A transformation matrix is a matrix that allows you to convert from a point in one coordinate space to a point in another. Basically, for some desired point in QR-coordinate space (a location on the QR code), we can multiply the coordinates of that point by the transformation matrix and obtain the coordinates of that point in screen space (a given pixel position on the screen). This is complicated slightly by the fact that the matrix works on homogeneous coordinates, so our desired 2d point $(x,y)$ ends up being represented as a vector $<x,y,1>$, and the result of the matrix multiplication is a vector $<x',y',w'>$ where the screen coordinates are $(\frac{x'}{w'}, \frac{y'}{w'})$. If you want to learn more about transformation matrices and homogeneous coordinates, I would suggest reading the Wikipedia page on transformation matrices.
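Concretely, mapping a QR-space point to a screen pixel looks like this (assuming the 3x3 matrix is stored as a row-major array of nine numbers, which is an assumption about layout rather than how jsqrcode actually stores it):

```javascript
// Apply a 3x3 perspective transform to a 2D point using homogeneous coordinates:
// (x, y) -> (x', y', w') -> (x'/w', y'/w').
function applyTransform(m, x, y) {
  var xp = m[0] * x + m[1] * y + m[2];
  var yp = m[3] * x + m[4] * y + m[5];
  var wp = m[6] * x + m[7] * y + m[8];
  return { x: xp / wp, y: yp / wp };
}
```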
Unfortunately, the QR code detection is quite slow on mobile devices, and it doesn't work well for extreme angles of the marker. Because of this, we decided to switch to another algorithm once we had determined an initial transformation matrix. Using the JSFeat library, we performed FAST corner detection on each frame, identifying the most prominent corners in the image. The QR code marker we chose has many distinct corners, and we know their exact positions in QR code space, so we ran those positions through the transformation matrix to find where we expected the corners to be. Then, we simply assumed that the corners would not move much, so we matched each expected corner position with the closest position of an actual corner detected by FAST in the image.
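A rough sketch of that tracking step, with JSFeat's grayscale conversion and FAST detector (the call names follow the JSFeat samples, so treat them as approximate rather than our exact code):

```javascript
// Detect FAST corners, then match each expected marker corner (already projected
// into screen space with the current transform) to the nearest detected corner.
// width and height are the video frame dimensions from the capture step.
var gray = new jsfeat.matrix_t(width, height, jsfeat.U8_t | jsfeat.C1_t);
var corners = [];
for (var i = 0; i < width * height; i++) {
  corners[i] = new jsfeat.keypoint_t(0, 0, 0, 0);
}

function trackCorners(imageData, expectedPoints) {
  jsfeat.imgproc.grayscale(imageData.data, width, height, gray);
  jsfeat.fast_corners.set_threshold(20);
  var count = jsfeat.fast_corners.detect(gray, corners, 3);

  // Assume the marker barely moved since the last frame: for each expected
  // corner position, take the closest FAST corner actually found in the image.
  return expectedPoints.map(function (p) {
    var best = corners[0], bestDist = Infinity;
    for (var j = 0; j < count; j++) {
      var dx = corners[j].x - p.x, dy = corners[j].y - p.y;
      var d = dx * dx + dy * dy;
      if (d < bestDist) { bestDist = d; best = corners[j]; }
    }
    return { x: best.x, y: best.y };
  });
}
```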
Now, we had a correspondence between the corner positions in QR code space and the updated positions of those corners in image space. Luckily, the JSFeat library includes functions to compute a homography, which is essentially the transformation matrix that best maps one set of points to another. Thus, we were able to use these matching sets of points to generate an updated version of our transformation matrix.
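The corresponding update looks roughly like this (again with names borrowed from the JSFeat motion-estimation samples, so treat the exact API as an assumption):

```javascript
// Fit a homography that maps marker-space points onto the freshly matched
// screen-space corners, giving us a refreshed transformation matrix.
var transform = new jsfeat.matrix_t(3, 3, jsfeat.F32_t | jsfeat.C1_t);
var homographyKernel = new jsfeat.motion_model.homography2d();

function updateTransform(markerPoints, screenPoints) {
  // markerPoints and screenPoints are equal-length arrays of {x, y}.
  homographyKernel.run(markerPoints, screenPoints, transform, markerPoints.length);
  return transform; // 3x3 matrix, row-major, in transform.data
}
```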
Once we had a way to find the transformation matrix, the next step was to overlay our game on top of the QR code. We used three.js, a popular 3D Javascript library, to do this. Three.js makes it very easy to display the video stream in the background: you can simply initialize a texture with a video element, and three.js will update the texture for you. Three.js is designed for 3D transformation matrices, however, and our perspective transformation matrix was a 2D transformation. To get around this, we wrote a custom three.js "camera" that generates a transformation matrix that simply ignores the depth of the point. (We did this by inserting a row of all zeroes and a column of all zeroes, except for a 1 where they intersected.)
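In sketch form it looks something like this (using current three.js names like `THREE.VideoTexture` and `scene.background`, and glossing over the scaling from pixel coordinates into three.js's clip space, so consider it a rough outline rather than our exact code):

```javascript
// Show the camera feed behind the scene and build a 4x4 projection matrix from
// the 3x3 perspective transform by inserting a row and column of zeroes with a
// 1 where they intersect, so the point's depth is simply passed through.
var videoTexture = new THREE.VideoTexture(video);
scene.background = videoTexture;

function projectionFromHomography(h) {
  // h is the 3x3 transform [a b c; d e f; g h i] as a row-major array.
  return new THREE.Matrix4().set(
    h[0], h[1], 0, h[2],
    h[3], h[4], 0, h[5],
       0,    0, 1,    0,
    h[6], h[7], 0, h[8]
  );
}

var camera = new THREE.Camera();
camera.projectionMatrix.copy(projectionFromHomography(transform.data));
```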
Actually, our matrix wasn't quite that simple. On mobile devices, we were able to "fake" a limited 3D effect using the accelerometer, which let us determine which direction was "up", assuming that the marker is lying face-up on some surface. We then inserted two numbers $D_x$ and $D_y$ into the matrix that nudge points in the "up" direction by an amount proportional to the z-coordinate of the point in QR code space:
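The original post shows the matrix as an image; reconstructed from the description above, it plausibly had this shape, where $a$ through $i$ are the entries of the 2D transformation matrix and $(x', y', w')$ are defined as before:

$$
\begin{pmatrix}
a & b & D_x & c \\
d & e & D_y & f \\
0 & 0 & 1 & 0 \\
g & h & 0 & i
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x' + D_x z \\ y' + D_y z \\ z \\ w' \end{pmatrix}
$$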
With this camera set up, drawing the snake on the screen was relatively easy: we just told three.js to draw various boxes at various locations on our 25x25 block game board, and it applied the transformation and drew them to the screen using WebGL.
One cool aspect of how we used three.js is that we drew the overlay with the multiply blend mode. As a result, the texture and shadows of the paper that the QR code is on are included in the overlay, avoiding the jarring lighting contrast that seems to characterize most examples of augmented reality. Instead, it looks like the game of snake is printed or drawn on the page.
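Putting those two pieces together, drawing a single snake segment comes down to something like this (a sketch with current three.js class names; colors and sizes are illustrative):

```javascript
// One unit in our scene corresponds to one cell of the 25x25 board in QR-code
// space. MultiplyBlending darkens the video underneath, so the paper's texture
// and shadows remain visible through the overlay.
var material = new THREE.MeshBasicMaterial({
  color: 0x33cc33,
  blending: THREE.MultiplyBlending,
  transparent: true
});

function drawSegment(col, row) {
  var box = new THREE.Mesh(new THREE.BoxGeometry(1, 1, 1), material);
  box.position.set(col + 0.5, row + 0.5, 0.5); // center the box on its cell
  scene.add(box);
  return box;
}
```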
Once we had all of this working, we implemented the actual snake game logic. We made the snake wrap around the screen, but also marked the QR code regions as occupied, so if you hit them, you lose and have to start over. We also allowed users to control the snake either by tapping on the left and right sides of the screen or by using the "a" and "d" keys on the keyboard.
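The core movement rule fits in a few lines (illustrative only, not our exact code):

```javascript
// Advance the snake's head one cell, wrapping at the edges of the 25x25 board.
// Cells covered by the QR code's squares (and by the snake's body) are marked
// as occupied; hitting one ends the game.
var BOARD_SIZE = 25;

function step(head, dir, occupied) {
  var next = {
    x: (head.x + dir.x + BOARD_SIZE) % BOARD_SIZE,
    y: (head.y + dir.y + BOARD_SIZE) % BOARD_SIZE
  };
  // occupied is a Set of "x,y" strings.
  if (occupied.has(next.x + ',' + next.y)) {
    return null; // collision: you lose and start over
  }
  return next;
}
```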
Here are some screenshots of the game in action, both on a computer and on my HTC One:
We ended up finishing about half an hour before the end of the hackathon, so we went to the awards room to wait. Every team presented for about a minute in front of the judges, which was pretty cool; projects included an interesting strategy game, a feature-packed campus news app, a weird program that allowed you to control your computer with nonsense syllables, an app that transformed pictures into Lego mosaics, and many more. After all the presentations, they moved on to awards, and to my complete and utter surprise, we won first place!
Overall, I really liked the experience. I originally wanted to add multiplayer, but given the time constraint I think we accomplished a lot. And according to the judges, it was pretty cool too!
If you want to try it out, you'll have to print out the marker I included above. If you click on it, you should get a bigger version. Then you can run the game by clicking here. Also, you can see the source code on GitHub.
I'm currently considering trying to adapt some of our code to detect specific objects other than QR codes. I think it would be really cool to play augmented reality snake on the wall of the Shanahan Center, for example:
A nice, gridlike wall in the Shanahan.
Theoretically, this could be accomplished by swapping the QR code detection with ORB feature detection and the corner tracking with optical flow tracking, both of which are provided by the JSFeat library. But those are adventures for another day! Stay tuned for a post about that idea, if I ever end up finishing it.