Perceptual Organization is about finding salient or significant groupings, collections and labelings of image data that identify visual structure and events supporting later interpretation and reasoning processes. An example is identifying as a single object the collection of disparate curve fragments comprising the slash through the word, "grouping," below.
Our visual systems see the slash as a single object. How do we do that? How can we get machines to do that? I offer partial answers in my CVPR '92 and CVGIP '93 papers.
More recently I have been looking at labeling of image structure such as occlusion relations in "construction paper" worlds. A classic example is the Kanizsa Triangle, in which we perceive a white triangular surface in front of black circles and a black triangle. I have introduced an appropriate ontology for labeling the various ways that overlapping surfaces of different lightnesses can generate contours, T-junctions, and L-junctions, and I have developed rules for assigning labels to contours and junctions that accord with human perception. See my paper on this.
[Figure: Kanizsa Triangle] Optimal contour and junction labeling of the Kanizsa Triangle found by my program. Arrows indicate direction of surface overlap. Thin lines (other than contrast edges) are modal completion edges; dashed lines are amodal completion edges (occluded contours).
Check out my Java transparency applet that explores the interpretation of surface lightness and transparency at X-junctions. (This applet uses Java AWT 1.1 functions, which unfortunately are known not to work in Netscape 4.7 on the Macintosh. Also, it can take a little while to load all the classes, so you may need to be patient.)
One of the applications for this work is the ZombieBoard Whiteboard Scanner. The user interface to ZombieBoard is a real-time computer vision system that monitors the board, watching for people to draw commands.
Another application of perceptual organization is in perceptually supported image editors. Our research prototype is called ScanScribe; see the UIST '03 paper.
Symbolic Token Grouping on the Scale-Space Blackboard
Following Marr's Primal Sketch ideas, my approach is to label visual events explicitly by associating them with symbolic tokens. Tokens are maintained on a spatially- and scale-indexed data structure called the scale-space blackboard (PAMI '90).
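To make "spatially- and scale-indexed" concrete, here is a minimal sketch of such a token store. It is not the data structure from the PAMI '90 paper: the names `Token` and `ScaleSpaceBlackboard`, the grid cell size, and the choice to bucket scale by octave (log2) are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str     # hypothetical label, e.g. "edge" or "blob"
    x: float
    y: float
    scale: float  # characteristic size of the visual event, in pixels


class ScaleSpaceBlackboard:
    """Toy token store bucketed by grid cell and by log2(scale) level."""

    def __init__(self, cell=16.0):
        self.cell = cell                  # assumed spatial bucket width
        self.buckets = defaultdict(list)  # (cx, cy, level) -> tokens

    def _level(self, scale):
        return int(round(math.log2(scale)))

    def add(self, token):
        key = (int(token.x // self.cell),
               int(token.y // self.cell),
               self._level(token.scale))
        self.buckets[key].append(token)

    def query(self, x, y, scale, radius):
        """Tokens within `radius` of (x, y) whose scale level is
        within one level of the query scale's level."""
        level = self._level(scale)
        x0, x1 = int((x - radius) // self.cell), int((x + radius) // self.cell)
        y0, y1 = int((y - radius) // self.cell), int((y + radius) // self.cell)
        hits = []
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                for lv in (level - 1, level, level + 1):
                    for t in self.buckets.get((cx, cy, lv), ()):
                        if math.hypot(t.x - x, t.y - y) <= radius:
                            hits.append(t)
        return hits
```

The point of indexing by scale as well as position is that a grouping procedure asking "what is near this token?" can restrict itself to tokens of comparable size, so a fine edge fragment and a page-sized blob at the same location do not compete.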
Lattice Clustering
Token grouping algorithms are employed to examine the spatial relationships among tokens and to identify, and perhaps label as new tokens, configurations of tokens that satisfy criteria for perceptual significance. In some ways these grouping procedures resemble conventional clustering algorithms, except that the result must be not a tree but a lattice (ICCV '95).
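The difference between a tree and a lattice of groupings can be illustrated with a small sketch. This is not the ICCV '95 algorithm; it only captures one property, that a token may belong to more than one group at once. The class `GroupingLattice` and the token names are hypothetical.

```python
from collections import defaultdict


class GroupingLattice:
    """Toy grouping structure in which a child may have several parents."""

    def __init__(self):
        self.members = {}                # group name -> frozenset of children
        self.parents = defaultdict(set)  # child name -> set of parent groups

    def add_group(self, name, children):
        self.members[name] = frozenset(children)
        for child in children:
            self.parents[child].add(name)

    def shared(self):
        """Children that belong to more than one group."""
        return {c for c, ps in self.parents.items() if len(ps) > 1}

    def is_tree(self):
        """True only if every child has at most one parent group."""
        return not self.shared()


# Three line segments forming two corners that share the middle segment.
lattice = GroupingLattice()
lattice.add_group("corner_a", ["seg1", "seg2"])
lattice.add_group("corner_b", ["seg2", "seg3"])
print(lattice.shared())   # seg2 participates in both corners
print(lattice.is_tree())
```

A conventional hierarchical clustering would have to assign `seg2` to one corner or the other; allowing multiple parents lets both perceptually valid groupings coexist.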
Visual Routines
A computational bottleneck occurs when many grouping algorithms might profitably be run and many potentially significant configurations could be labeled. Deciding how, among the various regions of the image, data, and knowledge space, to deploy limited computational resources is a fundamental problem addressed by Visual Routines. I think we need a "Feynman Diagram" of Visual Routines to help us design our programs and understand how they interact with the visual world.
Unsupervised Learning
Salient perceptual-level structure in images reflects regularities in the visual world. Most visual systems cannot be hardwired to detect all the significant visual events of use to them. Instead, visual systems need to discover regularities on their own, through learning. One theoretical problem in learning is figuring out how to partition observed data according to different underlying phenomena. An example is noticing that the same figure appears on the left and right sides of the picture below, but because of its neighbors, one might preferably be interpreted as an X on a line, the other as touching arrows.
I have done some work on unsupervised learning of regularities in binary data when there are interacting underlying causes, known as the Multiple Cause Mixture Model (Neural Computation '95). I have also done work on detecting regularities in data arising when high-dimensional data is found to be constrained to lower-dimensional manifolds (PAMI '89).
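As an illustration of what "interacting underlying causes" can mean for binary data, the sketch below combines several hidden causes with a noisy-OR rule: an observed bit is off only if every active cause fails to turn it on. Treat the noisy-OR choice, the function name `noisy_or`, and the example numbers as assumptions for illustration rather than a restatement of the Neural Computation '95 formulation.

```python
def noisy_or(activities, weights):
    """Predicted probability that each observed bit is on.

    activities: list of K cause activities m_k in [0, 1]
    weights:    K rows of J values c[k][j] in [0, 1], the strength with
                which cause k turns on bit j (hypothetical parameters)
    """
    n_bits = len(weights[0])
    predictions = []
    for j in range(n_bits):
        p_off = 1.0
        for m_k, row in zip(activities, weights):
            p_off *= 1.0 - m_k * row[j]  # cause k fails to turn on bit j
        predictions.append(1.0 - p_off)
    return predictions


# Two causes, each responsible for one end of a 3-bit pattern.
weights = [[1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
print(noisy_or([1.0, 1.0], weights))  # both causes active
print(noisy_or([1.0, 0.0], weights))  # only the first cause active
```

Unlike a standard mixture model, where each data vector is explained by exactly one cluster, this kind of rule lets several causes be active at once and superimpose their effects on the same observation.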
Texture Scale Space
Texture is what we call it when too much is happening to describe a configuration of data in detail, so we resort to aggregate descriptions. A given pattern might appear as a simple configuration of elements when viewed close up, through a narrow window, but as a complex "texture" when viewed at a broader scale. I have made two attempts to build programs reflecting the notion of a texture scale-space. I have a story and a position paper.
Mid-Crack Contour-Based Thinning
I have developed a pretty darned fast, efficient, but rather hairy algorithm for thinning binary figures based on eroding the boundary contour. One trick is to encode the boundary using a "mid-crack" representation. This is on my list of things to write up.
Calibration of a Pan/Tilt Head
I developed an algorithm to calibrate the parameters of a pan/tilt device in a workspace, which is quite different from calibrating a camera. It's a big hairy nonlinear optimization. I have not written this up yet; Hollerbach and Bennett worked on calibration of a general closed linkage, which is probably related.
Back to Eric Saund's Home Page.