Wednesday, September 3, 2008

Gestures without Libraries, Toolkits or Training

Jacob O. Wobbrock, Andrew D. Wilson, Yang Li

Summary
The goal of this paper was to produce the $1 recognizer--a recognizer that is both accurate and simple to implement (less than 100 lines of code).
The algorithm works as follows (a rough code sketch follows the list):
1- Resample the point path: compute the total path length, then divide it by the number of points minus 1 (32 <= N <= 256; N = 64 seems to work well) to get the distance between points. Then redraw the shape with this even spacing between points.
2- Rotate the sketch based on the indicative angle (i.e. the angle from the figure's centroid to its first point) so that this angle becomes horizontal; at matching time a golden section search refines the angular alignment
3- Scale to a reference square and translate centroid to origin
4- Recognize the sketch by comparing it to each template, using the average Euclidean distance between corresponding points of the sketch and the template
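To make the steps concrete, here is a minimal Python sketch of steps 1-3 plus the distance at the heart of step 4. This is my paraphrase of the paper's pseudocode, not the authors' code: the golden section search that refines rotation during matching is omitted, and the function names are mine.

```python
import math

def resample(points, n=64):
    """Step 1: resample a stroke to n evenly spaced points along its path."""
    pts = list(points)
    path_len = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    interval = path_len / (n - 1)
    resampled = [pts[0]]
    d = 0.0
    i = 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if seg > 0 and d + seg >= interval:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            resampled.append(q)
            pts.insert(i, q)   # q starts the next segment
            d = 0.0
        else:
            d += seg
        i += 1
    if len(resampled) < n:     # rounding may drop the final point
        resampled.append(pts[-1])
    return resampled

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def rotate_to_zero(pts):
    """Step 2: rotate so the indicative angle (centroid -> first point) is zero."""
    cx, cy = centroid(pts)
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def scale_and_translate(pts, size=250.0):
    """Step 3: scale to a size x size square, then move the centroid to the origin."""
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    scaled = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def path_distance(a, b):
    """Step 4 core: average point-to-point Euclidean distance to a template."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

A candidate stroke and every template go through the same pipeline, and the template with the smallest path_distance wins.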

Discussion
If it is possible to create enough templates, then this method seems to have some potential; however, I think that would be pretty difficult. Also, finding that angle is a fairly complicated thing to do, and it doesn't allow for a robust input set, since orientation is difficult to reproduce. I would like to see an animation of the system trying to find the correct angle, as I think that would be instructive. Also, if they changed the location of the angle end-points, then the rotation might be easier to do.

MARQS

Brandon Paulson and Tracy Hammond
Journal on Multimodal User Interfaces, October 2007

Summary
MARQS is a dual-classifier sketch retrieval system that builds stronger associations with sketches the more it is used. Its principal domain in the paper was finding personal photos and music albums.
The algorithm is broken down like so (a rough sketch of the features and ranking follows the list):
1- The major axis is found by locating the two points farthest from one another; the sketch is then rotated to make that axis horizontal
2- Determine the bounding-box aspect ratio (width/height)
3- Determine the pixel density (ratio of black pixels to total pixels in the bounding box)
4- Determine the average curvature: the sum of the curvature values at all points in all strokes divided by the total sketch length
5- Count the number of perceived corners via segmentation (refer to "Sketch based interfaces: early processing for sketch understanding" by T.M. Sezgin et al.)
6- If only a single example exists, calculate its features and compare them to the database examples
7- Calculate the normalized total error
8- Display sketches with lowest errors
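As a rough illustration of features 2-4 and the ranking in steps 6-8, here is a hedged Python sketch. The coarse occupancy grid stands in for the paper's black-pixel count, and the error normalization is my assumption, not necessarily the paper's:

```python
import math

def bounding_box(points):
    xs, ys = [x for x, _ in points], [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def aspect_ratio(points):
    """Feature 2: bounding-box width / height."""
    x0, y0, x1, y1 = bounding_box(points)
    return (x1 - x0) / ((y1 - y0) or 1.0)

def pixel_density(points, cell=5.0):
    """Feature 3, approximated: fraction of occupied cells in a coarse grid
    over the bounding box (stands in for counting black pixels)."""
    x0, y0, x1, y1 = bounding_box(points)
    cols = max(1, int((x1 - x0) / cell))
    rows = max(1, int((y1 - y0) / cell))
    filled = {(min(int((x - x0) / cell), cols - 1),
               min(int((y - y0) / cell), rows - 1)) for x, y in points}
    return len(filled) / (cols * rows)

def average_curvature(strokes):
    """Feature 4: summed turning angles over total sketch length."""
    total_angle, total_len = 0.0, 0.0
    for pts in strokes:
        total_len += sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
        for i in range(1, len(pts) - 1):
            a1 = math.atan2(pts[i][1] - pts[i - 1][1], pts[i][0] - pts[i - 1][0])
            a2 = math.atan2(pts[i + 1][1] - pts[i][1], pts[i + 1][0] - pts[i][0])
            total_angle += abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))
    return total_angle / total_len if total_len else 0.0

def rank_by_error(query_feats, database):
    """Steps 6-8, hypothetically: rank stored sketches by total per-feature
    error, each term normalized by the query value (my assumption)."""
    def error(feats):
        return sum(abs(q - f) / (abs(q) or 1.0)
                   for q, f in zip(query_feats, feats))
    return sorted(database, key=lambda item: error(item[1]))  # item = (name, feats)
```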

This algorithm returns the correct sketch as the top-ranked result 70% of the time, within the top two 87.6%, the top three 95.5%, and the top four 98% of the time (initially only four results are returned).

Discussion
This is fascinating to me. The feature set doesn't seem all that robust, but I guess capturing curvature and density yields some powerful recognition. I wonder if center of gravity and area could be as powerful? It seemed to me that the pixel density assumes black-and-white pictures--sketches. I wonder if the accuracy could be increased by allowing the user to color different lines of the sketch. It might be unreasonable to make users remember which lines they colored which way, but if they kept it simple then it might not be that difficult.

Monday, September 1, 2008

Visual Similarity of Pen Gestures

A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels

Summary
The increasing popularity of pen (stylus) based computer input necessitates studying the difficulties associated with learning new gestures that mimic the natural way we write on paper. This paper conducted two studies to establish why users find gestures similar, then used that data to build a gesture design tool that helps designers create gesture sets that are easy to remember and perform.
As the heart of this paper is the experiments, I will forego the "Related Work" section, which basically describes pen interfaces and the machines that use them, perceptual similarity (how changes in geometric and perceptual properties influence perceived similarity), and multi-dimensional scaling (a way of reducing a data set to allow for pattern recognition in 2- or 3-D).
Trial 1
The purpose of this trial was to determine which measurable geometric properties influence gestures' perceived similarity (accomplished with MDS) AND to produce a model that could predict how similar two gestures would appear to a person (accomplished with regression analysis). A gesture set with a wide degree of separability was constructed. Subjects were shown "triads" (three randomly chosen gestures; 364 triads per user) and asked to pick the one that was most different. The 21 users' responses were used to construct the dissimilarity matrix.
The model they built correlated with users' perceived gesture similarities at 0.74.
Trial 2
The purpose of this trial was to study three things:
1- total absolute angle and aspect
2- length and area
3- rotation related features
Because the feature sets became too large to show all possible combinations of the three sets built, a fourth feature set containing features from the first three was constructed. Subjects were then shown triads from all four sets, for a total of 538 triads. The model they built using MDS and regression gave a correlation of 0.71.
When the model from the first trial was applied to the data of the second trial, only a correlation of 0.56 was found, and it was even worse for model two on data set one.

Discussion
The goal of the experiments and the way they were conducted was great. I am a big fan of getting people to use the systems we build in order to evaluate them. After all, no matter how clever we think our solutions are, we are oftentimes too close to them to see them objectively. That being said, I am not satisfied that the second part of their research goal was accomplished. For a specific environment, being able to correlate at 0.74 might be adequate, but since the goal was to build this tool for any gesture environment, having a correlation of only 0.54 means that the system is just "guessing" and not providing the insights a designer would want or need. Secondly, the authors mention that in their regression analysis they removed features that were "only obtainable by subjective judgement". I am not exactly sure what that means, but it makes their numbers of 0.74 and 0.71 seem a little weaker to me.
I think it would be interesting to repeat this experiment to determine if gender/race/left handed vs right handed people aggregate into similar gesture classifiers.

Specifying Gestures by Example

Dean Rubine
Computer Graphics, Vol. 25, Num 4, July 1991

Summary
Rubine built a hand gesture recognition architecture called GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture). With this architecture he developed GDP, a gesture-based drawing program that recognizes a small set of single-stroke gestures.
GDP's gestures are modeled with GRANDMA.
The heart of this paper is the way the author describes the feature set, the classification, the training, and the rejection, so that is what I will focus on.
Feature Set
Thirteen features are computed, each in constant time regardless of the gesture size. The features are all mathematical functions involving slopes, sums, and angles; refer to the paper for the formulae. A few are sketched below.
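To give a flavor of the feature set, here is a sketch of a few of the thirteen features in Python (my reimplementation from the paper's descriptions; illustrative, not the complete set):

```python
import math

def example_features(pts):
    """A few Rubine-style features for one stroke of (x, y) points."""
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    # Length of the bounding-box diagonal.
    bbox_diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    # Distance between the first and last point.
    end_dist = math.dist(pts[0], pts[-1])
    # Total stroke length: sum of segment lengths.
    total_len = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    # Total signed turning angle along the stroke.
    total_angle = 0.0
    for i in range(1, len(pts) - 1):
        a1 = math.atan2(pts[i][1] - pts[i - 1][1], pts[i][0] - pts[i - 1][0])
        a2 = math.atan2(pts[i + 1][1] - pts[i][1], pts[i + 1][0] - pts[i][0])
        total_angle += math.atan2(math.sin(a2 - a1), math.cos(a2 - a1))
    return [bbox_diag, end_dist, total_len, total_angle]
```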
Gesture Classification
Each gesture class c has a weight vector; an input gesture with feature vector f is scored with the linear evaluation v_c = w_c0 + sum_i (w_ci * f_i), and the class with the maximum score wins.
Training
Training is accomplished with a linear discriminator, which is used to find the weights. Basically, the mean feature vector for each gesture class is obtained by averaging the feature vectors of that class's training examples. The weights are then calculated via some statistics using the common covariance matrix and its inverse. A rough sketch follows.
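Here is a rough numpy sketch of this training-and-classification scheme, assuming the 13-dimensional feature vectors are computed elsewhere (function names are mine; the pseudo-inverse is a safeguard I added for singular covariance matrices):

```python
import numpy as np

def train(examples_by_class):
    """Train per-class linear weights from {class: [feature vectors]}."""
    classes = list(examples_by_class)
    means = {c: np.mean(np.asarray(examples_by_class[c], float), axis=0)
             for c in classes}
    # Pool the per-class covariance estimates into one common matrix.
    dim = means[classes[0]].shape[0]
    pooled = np.zeros((dim, dim))
    total = 0
    for c in classes:
        X = np.asarray(examples_by_class[c], float) - means[c]
        pooled += X.T @ X
        total += X.shape[0]
    pooled /= max(total - len(classes), 1)
    inv = np.linalg.pinv(pooled)  # pseudo-inverse guards against singularity
    weights = {}
    for c in classes:
        w = inv @ means[c]
        weights[c] = (-0.5 * float(w @ means[c]), w)  # (w_c0, w_c1..w_cN)
    return weights

def classify(weights, f):
    """Pick the class maximizing the linear evaluation w_c0 + w_c . f."""
    f = np.asarray(f, float)
    return max(weights, key=lambda c: weights[c][0] + float(weights[c][1] @ f))
```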
Rejection
Rejection is the act of intentionally not classifying a gesture due to ambiguity. This is needed because a linear discriminator will always pick some class. Rejection is not strictly necessary as long as a robust undo capability is provided.
The great thing about this technique is that it is very accurate for small gesture sets. The author provides evaluations showing that for gesture sets of fewer than 26 classes, accuracy is better than 98%.

Discussion
As the first true gesture recognition paper I have read, I am very excited about the potential. While this paper basically focused on the way something is drawn, I would anticipate numerous collisions between mathematically similar shapes in a free-form sketching process. For an editor, it makes sense that only a small number of actions need to be recognized, but for a free "piece of paper" the amount of training required would be mind-boggling. However, I enjoyed this paper as a first step toward targeted recognition and hope that there are other means of generalizing.