Ben's Blog: 2008

Wednesday, September 3, 2008

Gestures without Libraries, Toolkits or Training

Jacob O. Wobbrock, Andrew D. Wilson, Yang Li

Summary
The goal of this paper was to produce the $1 recognizer, a recognizer that is simple to create (fewer than 100 lines of code) and accurate.
The algorithm is as follows:
1- Resample the point path by calculating the total path length. Divide this length by the number of points N minus 1 (32 <= N <= 256; N = 64 seems to work well) to get the distance between points. Then redraw the shape with this spacing between points.
2- Rotate the shape based on the indicative angle, i.e., the angle between the figure's centroid and its first point, so that this angle becomes 0 (horizontal). During matching, golden section search then fine-tunes the rotation.
3- Scale to a reference square and translate centroid to origin
4- Recognize the sketch by comparing it to a template (using Euclidean distance between sketch and template)
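The four steps above can be sketched in Python. This is a minimal sketch under my own assumptions, not the authors' pseudocode: points are (x, y) tuples, the reference square is 250 units on a side, N = 64, and the golden section search covers ±45° with a 2° stopping tolerance. The helper names (resample, rotate_to_zero, and so on) are hypothetical.

```python
import math


def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))


def resample(pts, n=64):
    """Step 1: resample the stroke into n points spaced evenly along its path."""
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out = [pts[0]]
    acc = 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # floating-point rounding can drop the final point
        out.append(pts[-1])
    return out[:n]


def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def rotate_by(pts, theta):
    """Rotate all points by theta radians about the centroid."""
    cx, cy = centroid(pts)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
            for x, y in pts]


def rotate_to_zero(pts):
    """Step 2: rotate so the indicative angle (centroid vs. first point) becomes 0."""
    cx, cy = centroid(pts)
    return rotate_by(pts, -math.atan2(cy - pts[0][1], cx - pts[0][0]))


def scale_and_translate(pts, size=250.0):
    """Step 3: scale (non-uniformly) to a size x size square, then move centroid to origin."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w = (max(xs) - min(xs)) or 1.0  # guard against zero width (assumption)
    h = (max(ys) - min(ys)) or 1.0
    scaled = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]


def normalize(pts, n=64):
    return scale_and_translate(rotate_to_zero(resample(pts, n)))


def path_distance(a, b):
    """Average Euclidean distance between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def distance_at_best_angle(pts, tmpl, lo=-math.radians(45), hi=math.radians(45),
                           tol=math.radians(2)):
    """Golden section search for the rotation that best matches the template."""
    phi = 0.5 * (math.sqrt(5) - 1)
    x1 = phi * lo + (1 - phi) * hi
    f1 = path_distance(rotate_by(pts, x1), tmpl)
    x2 = (1 - phi) * lo + phi * hi
    f2 = path_distance(rotate_by(pts, x2), tmpl)
    while abs(hi - lo) > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = phi * lo + (1 - phi) * hi
            f1 = path_distance(rotate_by(pts, x1), tmpl)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - phi) * lo + phi * hi
            f2 = path_distance(rotate_by(pts, x2), tmpl)
    return min(f1, f2)


def recognize(stroke, templates):
    """Step 4: return the name of the template (name -> normalized points) closest to stroke."""
    candidate = normalize(stroke)
    return min(templates, key=lambda name: distance_at_best_angle(candidate, templates[name]))
```

Templates are stored already normalized, so only the candidate stroke pays the preprocessing cost at recognition time.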

Discussion
If it is possible to create enough templates, then this method seems to have some potential; however, I think that would be pretty difficult. Also, finding that angle is a fairly complicated thing to do, and it doesn't allow for a robust input set because orientation is difficult to reproduce. I would like to see an animation of the system trying to find the correct angle, as I think that would be instructive. Also, if they changed the location of the angle end-points, then it might be easier to do the rotation.

MARQS

Brandon Paulson and Tracy Hammond
Journal on Multimodal User Interfaces, October 2007

Summary
MARQS is a dual-classifier sketch retrieval system that builds stronger associations with sketches the more it is used. Its principal domain in the paper was finding personal photos and music albums.
The algorithm is broken down like so:
1- Calculate the major axis by finding the two points farthest from one another, then rotate the sketch so that this axis becomes the horizontal axis
2- Determine the bounding box aspect ratio (width/height)
3- Determine pixel density (ratio of black pixels to the total number of pixels in the bounding box)
4- Determine average curvature as the sum of the curvature values of all the points in all strokes divided by the total sketch length
5- Count the number of perceived corners via segmentation (refer to “Sketch based interfaces: early processing for sketch understanding” by T.M. Sezgin et al.)
6- If only a single example exists then calculate the features and compare to the database examples
7- Calculate the normalized total error
8- Display sketches with lowest errors
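Below is a rough Python sketch of the feature extraction and ranking steps. Several pieces are my own assumptions rather than the paper's definitions: pixel density is approximated as the fraction of cells in a 20 x 20 grid over the bounding box that contain a stroke point (so strokes should be densely sampled), curvature is taken as the summed absolute turning angle divided by total stroke length, the normalized error divides each feature difference by the stored feature's magnitude, and the corner count is left out because it depends on Sezgin's segmentation. All function names are hypothetical.

```python
import math


def major_axis_angle(pts):
    """Angle of the segment joining the two points farthest from one another."""
    a, b = max(((p, q) for i, p in enumerate(pts) for q in pts[i + 1:]),
               key=lambda pair: math.dist(*pair))
    return math.atan2(b[1] - a[1], b[0] - a[0])


def rotate(pts, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in pts]


def features(strokes, grid=20):
    """Return [aspect ratio, pixel density, average curvature] for a list of strokes."""
    theta = -major_axis_angle([p for s in strokes for p in s])
    strokes = [rotate(s, theta) for s in strokes]  # step 1: major axis horizontal
    pts = [p for s in strokes for p in s]
    min_x, min_y = min(p[0] for p in pts), min(p[1] for p in pts)
    w = (max(p[0] for p in pts) - min_x) or 1e-9
    h = (max(p[1] for p in pts) - min_y) or 1e-9
    aspect = w / h  # step 2
    # step 3 (approximation): fraction of grid cells over the bounding box holding a point
    cells = {(min(int((x - min_x) / w * grid), grid - 1),
              min(int((y - min_y) / h * grid), grid - 1)) for x, y in pts}
    density = len(cells) / grid ** 2
    # step 4 (approximation): total absolute turning angle / total stroke length
    turning = length = 0.0
    for s in strokes:
        for i in range(1, len(s)):
            length += math.dist(s[i - 1], s[i])
        for i in range(1, len(s) - 1):
            a1 = math.atan2(s[i][1] - s[i - 1][1], s[i][0] - s[i - 1][0])
            a2 = math.atan2(s[i + 1][1] - s[i][1], s[i + 1][0] - s[i][0])
            turn = a2 - a1
            turning += abs(math.atan2(math.sin(turn), math.cos(turn)))  # wrap to [-pi, pi]
    curvature = turning / length if length else 0.0
    return [aspect, density, curvature]


def normalized_error(query, stored):
    """Sum of feature differences, each relative to the stored value (assumed normalization)."""
    return sum(abs(q - s) / (abs(s) or 1.0) for q, s in zip(query, stored))


def retrieve(query_strokes, database, k=4):
    """Steps 7-8: rank stored sketches (name -> feature list) by error, lowest first."""
    q = features(query_strokes)
    return sorted(database, key=lambda name: normalized_error(q, database[name]))[:k]
```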

This algorithm returns the correct sketch as the top-ranked result 70% of the time, in the top 2 87.6% of the time, in the top 3 95.5% of the time, and in the top 4 98% of the time (only four sketches are returned initially).

Discussion
This is fascinating to me. The feature set doesn't seem to be that robust, but I guess capturing curvature and density really does produce a powerful recognizer. I wonder if center of gravity and area could have as much power. It seemed to me that the pixel density is based only on black-and-white pictures, i.e., sketches. I wonder if the accuracy could be increased by allowing the user to color different lines of the sketch. It might be unreasonable to make the user remember which color they used for which lines, but if the user kept it simple then it might not be that difficult.

Monday, September 1, 2008

Visual Similarity of Pen Gestures

A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels

Summary
The increasing popularity of pen (stylus) based computer input necessitates a study as to the difficulties associated with learning new gestures to mimic the natural way we write on paper. This paper conducted two studies to establish why users found gestures similar and then took that data and built a gesture design tool to help gesture designers build gesture sets that are easy to remember/perform.
As the heart of this paper is the experiments, I will skip the "Related Work" section, which basically describes the pen interface and the machines that use it, perceptual similarity (changes in geometric and perceptual properties influence perceived similarity), and multi-dimensional scaling (MDS, a way of reducing the dimensionality of the data to allow for pattern recognition in 2-D or 3-D).
Trial 1
The purpose of this trial was to determine which measurable geometric properties influence the perceived similarity of gestures (accomplished with MDS) AND to produce a model that could predict how similar people would perceive two gestures to be (accomplished with regression analysis). A gesture set with a wide degree of separability was constructed. Subjects were given a "triad" (three randomly chosen gestures; 364 triads shown to each user) and asked to pick the one that was most different. The 21 users' responses were used to construct the dissimilarity matrix.
The model they built predicted gesture similarities with a correlation of 0.74.
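One way to turn the triad answers into a dissimilarity matrix, and then to score a model against it, is sketched below in Python. The counting rule (the gesture picked as most different becomes more dissimilar from the other two members of its triad, normalized by how often each pair was shown together) and the use of a plain Pearson correlation over all gesture pairs are my assumptions; the paper itself fed the matrix into MDS and regression analysis. Function names are hypothetical.

```python
import math
from itertools import combinations


def dissimilarity_from_triads(n, judgments):
    """judgments: iterable of ((a, b, c), odd), where odd is the gesture picked as most
    different. Returns an n x n matrix holding, for each pair, the fraction of its shared
    triads in which one of the two was picked as the odd one out (assumed rule)."""
    picked = [[0] * n for _ in range(n)]
    shown = [[0] * n for _ in range(n)]
    for triad, odd in judgments:
        for a, b in combinations(triad, 2):
            shown[a][b] += 1
            shown[b][a] += 1
            if odd in (a, b):
                picked[a][b] += 1
                picked[b][a] += 1
    return [[picked[i][j] / shown[i][j] if shown[i][j] else 0.0 for j in range(n)]
            for i in range(n)]


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def model_correlation(observed, predicted):
    """Correlate a model's predicted dissimilarities with observed ones over all pairs."""
    pairs = list(combinations(range(len(observed)), 2))
    return pearson([observed[i][j] for i, j in pairs],
                   [predicted[i][j] for i, j in pairs])
```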
Trial 2
The purpose of this trial was to study three things:
1- total absolute angle and aspect
2- length and area
3- rotation related features
Because the feature sets became too large to show all possible combinations of the three sets, a fourth set containing features from the first three was built. Subjects were then shown triads from all 4 sets, for a total of 538 triads. The model they built using MDS and regression gave a correlation of 0.71.
When the model from the first trial was applied to the data from the second trial, the correlation was only 0.56, and it was even worse for model two on data one.

Discussion
The goal of the experiments and the way they were conducted were great. I am a big fan of getting people to use the systems we build in order to evaluate them. After all, no matter how clever we think our solutions are, we are often too close to them to see them objectively. That being said, I am not satisfied that the second part of their research goal was accomplished. For a specific environment, a correlation of 0.74 might be adequate, but since the goal was to build this tool for any gesture environment, a correlation of only 0.56 means that the system is just "guessing" and not providing the insights a designer would want or need. Secondly, the authors mention that in their regression analysis they removed features that were "only obtainable by subjective judgement". I am not exactly sure what that means, but it makes their numbers of 0.74 and 0.71 seem a little weaker to me.
I think it would be interesting to repeat this experiment to determine whether people grouped by gender, race, or handedness (left vs. right) perceive gesture similarity in systematically different ways.

Specifying Gestures by Example

Dean Rubine
Computer Graphics, Vol. 25, Num 4, July 1991

Summary
Rubine built a gesture recognition architecture called GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture). With this architecture he developed GDP, a gesture-based drawing program that recognizes a small set of single-stroke gestures.
GDP's gestures are modeled with GRANDMA.
The heart of this paper is in the way the author describes the feature set, the classification, the training, and the rejection so that is what I will focus on.
Feature Set
Thirteen features are computed for each gesture. They can be updated incrementally, so each new input point is processed in constant time regardless of the gesture's size. The features are all mathematical functions involving slopes, sums, and angles; refer to the paper for the formulae.
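The 13 features can be computed in a single pass over the samples, which is what makes constant-time-per-point updates possible. This Python sketch follows my reading of Rubine's definitions (initial angle from the first and third points, bounding-box diagonal length and angle, start-to-end distance and direction, total length, total/absolute/squared turning angle, maximum squared speed, and duration). It uses atan2 instead of a plain arctangent to avoid dividing by zero, and it recomputes everything from scratch rather than updating incrementally. The function name is hypothetical.

```python
import math


def rubine_features(pts):
    """pts: list of (x, y, t) samples (at least 3). Returns features f1..f13 as a list."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    ts = [p[2] for p in pts]
    # f1, f2: cosine and sine of the initial angle (first point to third point)
    d0 = math.hypot(xs[2] - xs[0], ys[2] - ys[0]) or 1.0
    f1, f2 = (xs[2] - xs[0]) / d0, (ys[2] - ys[0]) / d0
    # f3, f4: length and angle of the bounding-box diagonal
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    f3, f4 = math.hypot(w, h), math.atan2(h, w)
    # f5: first-to-last distance; f6, f7: cosine and sine of that direction
    f5 = math.hypot(xs[-1] - xs[0], ys[-1] - ys[0])
    f6 = (xs[-1] - xs[0]) / f5 if f5 else 0.0
    f7 = (ys[-1] - ys[0]) / f5 if f5 else 0.0
    # f8: total length; f9-f11: total, absolute, squared turning angle; f12: max squared speed
    f8 = f9 = f10 = f11 = f12 = 0.0
    for p in range(1, len(pts)):
        dx, dy, dt = xs[p] - xs[p - 1], ys[p] - ys[p - 1], ts[p] - ts[p - 1]
        f8 += math.hypot(dx, dy)
        if dt > 0:
            f12 = max(f12, (dx * dx + dy * dy) / (dt * dt))
        if p >= 2:
            pdx, pdy = xs[p - 1] - xs[p - 2], ys[p - 1] - ys[p - 2]
            theta = math.atan2(dx * pdy - pdx * dy, dx * pdx + dy * pdy)
            f9 += theta
            f10 += abs(theta)
            f11 += theta * theta
    f13 = ts[-1] - ts[0]  # duration
    return [f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13]
```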
Gesture Classification
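Each gesture class c has a set of weights. A gesture with features $f_1, \ldots, f_{13}$ is scored against every class as $v_c = w_{c0} + \sum_{i=1}^{13} w_{ci} f_i$, and it is assigned to the class with the largest $v_c$.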
Training
Training is accomplished with a linear discriminator, which is used to find the weights. For each gesture class, the feature vectors of its training examples are averaged to give a mean feature vector for that class. The weights are then calculated from these means and the inverse of a covariance matrix estimated jointly over all classes.
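The training and classification steps can be sketched in pure Python as follows. Per-class means are averaged from the examples, a covariance matrix is pooled across classes, and each class's weights come from the inverted matrix. The small ridge term added to the diagonal is my assumption, included so the matrix stays invertible when features are redundant or training examples are few; the function names are hypothetical.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]


def train(examples, ridge=1e-6):
    """examples: dict class -> list of feature vectors. Returns class -> (w0, [w1..wk])."""
    k = len(next(iter(examples.values()))[0])
    means = {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in examples.items()}
    cov = [[0.0] * k for _ in range(k)]
    for c, vs in examples.items():
        for v in vs:
            diff = [a - b for a, b in zip(v, means[c])]
            for i in range(k):
                for j in range(k):
                    cov[i][j] += diff[i] * diff[j]
    # pooled estimate: divide by (total examples - number of classes)
    denom = max(sum(len(vs) for vs in examples.values()) - len(examples), 1)
    for i in range(k):
        for j in range(k):
            cov[i][j] /= denom
        cov[i][i] += ridge
    inv = invert(cov)
    weights = {}
    for c, mu in means.items():
        w = [sum(inv[i][j] * mu[i] for i in range(k)) for j in range(k)]
        w0 = -0.5 * sum(w[i] * mu[i] for i in range(k))
        weights[c] = (w0, w)
    return weights


def scores(weights, f):
    return {c: w0 + sum(wi * fi for wi, fi in zip(w, f)) for c, (w0, w) in weights.items()}


def classify(weights, f):
    s = scores(weights, f)
    return max(s, key=s.get)
```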
Rejection
Rejection is the act of intentionally not classifying a gesture because it is ambiguous. Some form of rejection is needed because a linear discriminator will always pick a class, even for an ambiguous gesture. Rejection can be skipped, though, as long as a robust undo capability is provided.
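One rejection rule is to estimate, from the classifier scores, how likely the winning class is relative to the others and to reject when that estimate is low. The sketch below computes the estimate as 1 / sum over j of exp(v_j - v_best) from a dictionary of per-class scores; the 0.95 default threshold and the function names are adjustable assumptions.

```python
import math


def ambiguity_probability(scores):
    """scores: dict class -> v_c. Returns (best class, estimated probability it is correct)."""
    best = max(scores, key=scores.get)
    # every exponent is <= 0, so this cannot overflow
    p = 1.0 / sum(math.exp(v - scores[best]) for v in scores.values())
    return best, p


def classify_or_reject(scores, threshold=0.95):
    """Return the winning class, or None when the gesture is too ambiguous."""
    best, p = ambiguity_probability(scores)
    return best if p >= threshold else None
```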
The great thing about this technique is that it is very accurate for small gesture sets. The author provides evaluations showing better than 98% accuracy for gesture sets with fewer than 26 gestures.

Discussion
This is the first true gesture recognition paper I have read, and I am very excited about its potential. While this paper basically focused on the way something is drawn, I would anticipate numerous collisions between mathematically similar shapes once users are free to sketch whatever they like. For an editor, it makes sense that only a small number of actions need to be recognized, but for a free-form "piece of paper" the amount of training would be mind-boggling. However, I enjoyed this paper as a first step toward targeted recognition and hope that there are other means of generalizing.

Thursday, August 28, 2008

Introduction to Sketch Recognition

Summary
Tracy Hammond and Kenneth Mock

The paper starts with a description of Sutherland's system, Sketchpad (reviewed here).
A discussion of passive vs. active digitizers follows. A digitizer is the technology that allows Tablet PCs to detect where the writing instrument is. Passive digitizers allow finger-based input, but have the drawback that if one's palm touches the screen, an unintended click will be recorded. This phenomenon is called vectoring. Active digitizers emit electromagnetic signals that are reflected by a pen or stylus and can be interpreted by the system as either hover mode or tapping mode.
Tablet PCs come in two types: convertible and slate. Convertible Tablet PCs allow the screen to be rotated and folded down over the keyboard so that it can be written on. In the slate style, there is no keyboard.
There are also USB-connected pen interfaces. The Intuos and Graphire tablets capture both pen position and pressure data. AirLiner is a wireless slate; multiple units can be used to edit the same SMART Board. Another type of input device is the Cintiq 21UX display, which is high resolution and can be used as a second monitor.
One useful piece of software is Camtasia, which can record screen interaction and audio, making for robust presentations that can be watched later. Microsoft Windows XP and Vista have handwriting recognition built in, and some versions of Linux can be loaded onto a Tablet PC. Apple offers some handwriting recognition software but no devices, as the company was burned early on by immature technology. Hammond has created a software suite called LADDER which can recognize mechanical systems and simulate the systems drawn. Other areas where this type of recognition has been applied include chemistry, finite state machines (FSMs), and music.
The FLUID framework allows anyone to design a graphical diagram, create a LADDER domain, integrate it with GUILD (which automatically generates sketch recognition and editing as specified by the LADDER domain), and load it onto a Tablet PC for interaction.
Two case studies were discussed: 11th- and 12th-grade pre-calculus, calculus, and trigonometry, and 8th-grade pre-algebra and algebra. Both examples show how a teacher could benefit from Tablet PCs by describing how they could be used in class.

Discussion
This paper was a good overview of the physical devices and software available for sketch-based capture and processing. The most interesting parts of the paper came from the ways the technology can be used to enhance the learning experience and the role of the teacher. While I would like to have seen more analysis of the success rates of using the technology in classrooms, I believe this is a great avenue for class interaction and facilitated learning not normally seen. It is engaging and active rather than passively listening to a lecture.

Wednesday, August 27, 2008

SketchPad Review

Summary
Ivan E Sutherland, 1963
AFIPS Conference Proceedings Volume 23

Sketchpad is a human-computer interactive system capable of taking input from a combination of light pen, light-sensing screen, and keyboard and manipulating the input objects. By defining some elementary constructs (points, lines, circles) and some basic operations on these constructs (magnification, copying, deleting, drawing), applications ranging from circuit design to bridge stress analysis become possible.
The paper begins with an example of turning an irregular six-sided shape into a regular hexagon by inscribing it in a circle. After fixing some constraints (vertices on the circle), the user was able to force the system to construct a regular hexagon. Definitions worth remembering are:
SUBPICTURE: any subpart of the resulting image that can be repurposed as part of another image.
CONSTRAINT: basic relationships between picture parts--vertical, horizontal, parallel lines; points lie on circles, symbols appear in some orientation and relationship between symbols.
COPYING: Making equal copies of a picture part
The underlying structure that makes this work is the ring structure, an expanded version of the n-component element described by Ross (Ross, D.T., Rodriguez, J.E., "Theoretical Foundations for the Computer Aided Design System"). In particular, all references made to a particular block are collected together by a chain of pointers originating within that block. It is called a "ring" because the chain of pointers starts and ends with the block itself, closing the loop. The ring structure is built dynamically through the use of the keypad.
The basic ring structure operations are insertion, removal, combination (i.e., joining multiple ring structures), and auxiliary operations that step through ring members by following the forward (flink) or backward (blink) links. It is the boundary conditions (i.e., rings with 0 or 1 members) that are important to handle robustly.
The light pen is the main instrument for getting data into the system and for manipulating data already in it (coordinate input vs. demonstrative input). Pen tracking is accomplished by placing the pen anywhere on the screen (blank space or another line) and moving it. When aiming at a line or point, the pen uses only 1/8 inch of its total 1/2 inch sensing area to determine whether it is close to an "aimed at" object. Next, the system determines which objects are topologically related to the ones seen (favoring end points and attachment points). To aid human interaction, a pseudo pen location is used to snap to the part that is aimed at rather than the exact point the pen is pointed at.
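A ring is essentially a circular doubly linked list whose owning block doubles as the start and end marker. The Python sketch below (class and function names are hypothetical, not Sketchpad's) shows insertion, removal, merging, and traversal along the forward links. Because the owner acts as a sentinel, the 0- and 1-member boundary cases need no special handling, except that merging must check for an empty ring.

```python
class Block:
    """A ring node. A block that owns a ring also serves as the ring's start and end."""

    def __init__(self, name):
        self.name = name
        self.flink = self  # forward link
        self.blink = self  # backward link


def insert_after(anchor, member):
    member.blink, member.flink = anchor, anchor.flink
    anchor.flink.blink = member
    anchor.flink = member


def unlink(member):
    member.blink.flink = member.flink
    member.flink.blink = member.blink
    member.flink = member.blink = member  # leave the removed block as an empty ring


def members(owner):
    """Follow flinks from the owner until the ring closes back on it."""
    node = owner.flink
    while node is not owner:
        yield node.name
        node = node.flink


def merge(owner_a, owner_b):
    """Splice all of owner_b's members onto the end of owner_a's ring."""
    if owner_b.flink is owner_b:  # empty ring: nothing to move
        return
    first, last, tail = owner_b.flink, owner_b.blink, owner_a.blink
    tail.flink, first.blink = first, tail
    last.flink, owner_a.blink = owner_a, last
    owner_b.flink = owner_b.blink = owner_b
```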
The display uses 20 bits to define the position and 16 bits to associate the spot with the ring structure it belongs to. This results in all spots in a line being associated with the ring structure for that line and all spots in an instance being associated to the ring structure composite for that instance. It also has a resolution capable of magnifying up to 2000 times. Lastly, all the displayed parts are generated by points and the difference equations:
Lines: $x_i = x_{i-1} + \Delta x$, $y_i = y_{i-1} + \Delta y$
Circles: $x_i = x_{i-2} + \frac{2}{R}(y_{i-1} - y_c)$, $y_i = y_{i-2} - \frac{2}{R}(x_{i-1} - x_c)$
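The two difference equations can be run directly. In this Python sketch (function names are hypothetical), the circle uses a step of 1/R radians, so each step advances about one unit of arc. The circle recurrence needs two seed points; placing the second one step clockwise from the first, as done here, is my assumption, and it keeps the recurrence on the circle instead of exciting a spurious oscillation.

```python
import math


def draw_line(x0, y0, dx, dy, steps):
    """x_i = x_{i-1} + dx, y_i = y_{i-1} + dy."""
    pts = [(x0, y0)]
    for _ in range(steps):
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    return pts


def draw_circle(xc, yc, r):
    """x_i = x_{i-2} + (2/R)(y_{i-1} - yc), y_i = y_{i-2} - (2/R)(x_{i-1} - xc)."""
    h = 1.0 / r
    # two seeds: a point on the circle and the point one step clockwise from it
    pts = [(xc + r, yc), (xc + r * math.cos(h), yc - r * math.sin(h))]
    for _ in range(int(2 * math.pi * r)):  # roughly one full turn
        (xa, ya), (xb, yb) = pts[-2], pts[-1]
        pts.append((xa + 2 * h * (yb - yc), ya - 2 * h * (xb - xc)))
    return pts
```

Only additions and multiplications by the constant 2/R appear inside the loop, so the points can be produced without evaluating sine or cosine per point.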
Text and digits are presented via special tables made of 36 equally spaced characters.
Abstractions (like viewing only constraints or digits) can be displayed because the data is inherent in the ring structure storage schema.
Sketchpad uses 3 generic recursive functions:
1) Expansion of instances -- subpictures within subpictures
2) Recursive deletion -- ring consistency must be maintained so removing some parts may remove others
3) Recursive Merging -- combination of two similar picture parts results in the reconstitution of the ring structure to account for the additional picture parts
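Recursive expansion of instances amounts to walking a tree of subpictures while composing their placements. A minimal Python sketch, assuming each instance is placed with only a translation and a uniform scale (Sketchpad also supported rotation, which is omitted here); the Picture class and expand function are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    lines: list = field(default_factory=list)      # ((x1, y1), (x2, y2)) in local coordinates
    instances: list = field(default_factory=list)  # (Picture, dx, dy, scale) placements


def expand(pic, dx=0.0, dy=0.0, scale=1.0):
    """Flatten a picture and all nested instances into world-coordinate line segments."""
    out = [((dx + scale * x1, dy + scale * y1), (dx + scale * x2, dy + scale * y2))
           for (x1, y1), (x2, y2) in pic.lines]
    for sub, sdx, sdy, s in pic.instances:
        # compose the child's placement with this picture's placement
        out.extend(expand(sub, dx + scale * sdx, dy + scale * sdy, scale * s))
    return out
```

Because a master picture is shared by all of its instances, editing the master changes every expanded copy the next time it is drawn.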
Attachers are special locations on an instance that allow the instance to be manipulated (e.g., added to other pictures), and the light pen has the same affinity for them as it does for end points. They are necessary because they allow instances to be combined with each other, thus assembling a more detailed picture.
The major improvement Sketchpad has over conventional pen and paper is the inclusion of mathematical constraints that the computer can reconcile with the user's input in real time. This parallelism allows some degree of analysis during the design phase. For example, by building constraints into various line segments, bridges can be built and analyzed for load stresses with some degree of veracity.
Constraint satisfaction is accomplished most frequently with a one-pass method instead of relaxation. Variables that are tied together by a single constraint are said to be adjacent. If a variable has so few constraints that it can be changed to satisfy all of them, it is said to be free. A free variable is effectively solved and no longer adds to the constraint pool, which can ripple through the rest of the constraints and free other variables in turn.
The author shows the usefulness of Sketchpad with a few examples. In particular, he shows how Sketchpad handles linkages, dimension lines, bridges (load analysis), artistic pictures (a winking girl), and circuit design (apparently not very easy even with Sketchpad).
Lastly, some future work on three-dimensional drawings is discussed. Johnson is doing most of the research in this area and plans to make it compatible with Sketchpad.
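The one-pass method can be read as repeatedly peeling off free variables. The Python sketch below is my interpretation, with hypothetical names: a variable counts as free when it has no more remaining constraints than a per-variable capacity (default 1), free variables are removed along with their constraints, and the removal order is reversed to give the order in which variables are re-evaluated. If no variable is free, it returns None, the point at which Sketchpad would fall back to relaxation.

```python
def one_pass_order(constraints, capacity):
    """constraints: dict name -> set of variables it involves.
    capacity: dict variable -> number of constraints it can satisfy alone (default 1).
    Returns a re-evaluation order, or None if the one-pass method gets stuck."""
    remaining = {name: set(vs) for name, vs in constraints.items()}
    unsolved = set().union(*constraints.values())
    removed = []
    progress = True
    while unsolved and progress:
        progress = False
        for v in sorted(unsolved):
            mine = [c for c, vs in remaining.items() if v in vs]
            if len(mine) <= capacity.get(v, 1):
                # v is free: drop it and its constraints, which may free adjacent variables
                removed.append(v)
                unsolved.discard(v)
                for c in mine:
                    del remaining[c]
                progress = True
    return removed[::-1] if not unsolved else None
```

For a chain of constraints a-b and b-c, it returns ['c', 'b', 'a']: c can take any value, b is then set to satisfy its constraint with c, and a is set last to satisfy its constraint with b.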

Discussion

This paper is interesting because it attempts to bridge the gap between the computer as a box that a human uses to crunch numbers and the computer as a sophisticated tool that helps solve problems by working with the user in the user's natural mode of problem solving. It is also interesting because of the use of the ring structure to capture not only data points but also constraints and other metadata used in describing the pictures. Finally, it is interesting how successfully recursion can be used to generate the complex instances that can be created with the system.
I think this system could be extended to include newer technologies like touch screens and larger color displays. It would be interesting if the system could color-code lines based on the context of the picture drawn. For example, if someone built a bridge and added load constraints, a line would turn red when the structure exceeded the constraint. I am not sure whether that is good research or just an improvement to the system, but it would be cool.

Tuesday, August 26, 2008

Ben's First Blog

image of me coming soon :-)

bullweister at gmail.com (bullweister@gmail.com)
BS Chemical Engineering Texas A&M
MS Computer Science Texas State
1st year PhD

I am taking this class because I want to build an understanding of the science behind sketch recognition so that I can help mature this field and possibly extend it to others as well.

Probably the most important experience I have is from my time with Applied Research at USAA in San Antonio. It was there that I first saw the types of technologies that are helpful in business and in communication in general. During my tenure at USAA my teams were able to provide innovative solutions to both the business side and the education side of the house.

In 10 years I want to be a professor at an established university wherein I will be in charge of a lab that alters human-computer interaction. I will also continue my role as an entrepreneur and build businesses that can help people all over the world.

I think the next biggest advances in computing will be in the way computers engage us in their processing. In particular, I expect immersive environments and learning systems that can help solve problems through creative explorations of their own design rather than through algorithmic formalism. Sketch recognition is a great way to bring this paradigm about.

Best courses were neural networks and AI.

I would become a wolf if I could change into any animal. Wolves are great examples of freedom and cooperation.

Favorite Slogan: "Quickly friends! Toward the danger!" -- MST3K quote
Favorite Movie: The Lord of the Rings Trilogy
Interesting Fact: I have scored from half field in competitive soccer 3 times in my life :-)
