Tracking Puzzle Pieces For A Smart Objects Interface

Lori L. Scarlatos

Department of Computer and Information Science
Brooklyn College of the City University of New York

Technical Report 1-98

ABSTRACT

Smart object interfaces enable computers to respond to one or more users' manipulations of a physical environment. These unobtrusive interfaces are especially well suited for providing guidance in collaborative learning environments because they allow students to play and explore quite naturally. In this effort, puzzle pieces are the smart objects in an interface designed to help middle school children to understand and appreciate mathematical and scientific concepts. This technical report explores different strategies for tracking the smart objects, and describes my approach for this project.


1. Introduction

Graphical user interfaces rely on metaphors to achieve a familiarity that allows people to learn to use them with minimal effort. Yet nothing is as easy as manipulating things in the real world. Consider, for example, a group of children attempting to solve a puzzle on a computer. One child - who is particularly adept at translating real-world actions into cursor manipulations - operates the mouse while the other children passively look on. Now compare this to a situation where a group of children is working with a 3D puzzle. They're all on the floor; everyone is participating and actively learning.

With smart objects interfaces, a computer responds to one's very natural manipulations of objects in the physical realm. This allows people to focus on the task at hand without having to worry about how to give instructions to a machine. Yet building a smart objects interface is non-trivial. A key problem is tracking multiple objects simultaneously in a noisy (real world) environment.

This report traces my search for a solution to this problem. It begins by describing the motivation for this work: a museum of mathematics with wonderful puzzles for learning, but not enough staff to provide the level of help that visitors need. The next section documents my search for technologies to track the smart objects, which included a very informative visit to the MIT Media Laboratories. I then discuss the strategy that I settled on, the algorithms being employed, and the parameters that I am experimenting with. The conclusions summarize progress thus far and describe the next steps in this project.

2. Motivation: Learning With Mathematical Puzzles

For many students, math and science are "hard" subjects that must be avoided at all costs. Yet all children - even infants - are intrigued by physical puzzles. Babies will spend hours putting things in containers and taking them out again; older children will see how high they can pile things before they fall down. For many scientists and mathematicians, the job itself is to solve puzzles. By showing children the physical puzzles behind math and science, we can gain their interest and help them to understand and appreciate more abstract concepts.

Bernard Goudreau, an engineer and mathematics teacher, recognized this possibility and rose to the challenge. He built twenty-two mathematical puzzles and activities, and installed them in his Museum of Mathematics in Art and Science which he founded in 1980. Located in New Hyde Park, NY, this unique learning and resource center reaches more than 15,000 people annually through workshops, programs, special events and exhibitions. A large number of the visitors enjoy working with the puzzles most of all. Yet the help of a skilled instructor is often needed to clarify puzzle objectives, remind players of the rules, provide helpful hints when the players are "stuck", encourage players when they're on the right track, and explain the underlying mathematical concepts. With only one instructor available for each group of up to 35 visitors, students don't always get the help they need, and so they give up in frustration. The smart objects project was initiated to overcome this problem. Puzzles equipped with smart object interfaces can passively "observe" students as they play, offering help, hints, reminders and explanations only when they are needed.

Figure 1. The Tangram

To test these ideas out, I chose to implement a smart objects interface for an old Chinese puzzle known as the Tangram. Five triangles, one square, and one parallelogram, precisely cut from a large square, make up the pieces of this puzzle. Although one may choose to reconstruct literally hundreds of different shapes with the Tangram pieces, the first (and most important) challenge is to reconstruct the square from the pieces. In solving this initial problem, one may discover underlying principles of geometry.

3. Background: Sensor Technologies and Tangible Media

Initially I thought that I could use either ultrasonic or magnetic sensors to track the multiple pieces. Scientists, after all, can track animals with radio collars. Libraries tag books to keep track of what's coming and going. Department stores tag articles of clothing to deter shoplifters. Now consumers may even drive through a tollbooth without stopping, or they may show a tagged credit card to start up a gas pump, and they will be billed later.

A sensor may be defined as a device that receives a signal or stimulus and responds with an electrical signal [Frade97a]. Different types of sensors are designed to detect different types of stimuli. For example, magnetoresistive sensors can detect the proximity, position, or rotation of the source of a magnetic field. Acoustic sensors detect sound waves within a given spectral range. These sound waves may be either emitted from a source, or emitted by the sensor and reflected back by the source. Alone, they are capable of detecting the relative proximity of the sound source; triangulating results from a pair of acoustic sensors can yield a three-dimensional location. Piezoelectric devices, which generate an electrical charge when subjected to stress, are used to sense bending of materials. Photoelectric sensors can best detect motion or the presence of an object [Banne96a]. The inertial navigation systems of aircraft employ sets of orthogonal accelerometers and gyroscopes [Verpl96a].

While tracking the smart objects, I wanted to satisfy these objectives:

Sensors are getting smaller and cheaper all the time, and I found several examples of computer interfaces that deploy them. Sensors have recently been used to track human movement, so that the human body becomes the input device. For example, sensors can detect head motion and hand gestures in virtual reality interfaces [Mulde94a] and performance art [Osull]. More recently, researchers in the Physics and Media group at the MIT Media Laboratories have been using the user's body as an electrical conductor for graphical user interfaces [Smith98a] and intra-communication among devices on the body [Zimme96a]. Other researchers at the MIT Media Labs are actively working on tangible interfaces and self-sensing devices. Some in the Physics and Media group are investigating self-sensing everyday objects [Verpl96a]. The Epistemology and Learning group has developed programmable play objects that allow children to build their own robots and other devices that communicate with one another [Resni96a]. The work being done by the Tangible Media group relates most directly to the smart objects effort [Ishii97a, Gorbe98a, Under98a]. This group is investigating ways of using physical objects and environments as input and output devices.

However, in most of these cases, only one object is being sensed or tracked at a time. Although networks of sensors are used in military applications [Abbas96a] and in the tracking of animals, these sensors do not have the precision required to track small puzzle pieces on a table. In computer graphics, motion tracking relies on computer vision techniques to accurately track multiple articulation points on an actor [Foley90a]. Yet this is typically done in a controlled setting where street clothing and other visual noise are not factors.

Given that the current literature didn't address my needs at the time, I decided that my best course of action would be to visit the MIT Media Laboratories. Several good ideas came out of my meetings at the MIT Media Laboratories in January 1998 (see Appendix A). The remainder of this section reviews these potential strategies for tracking smart objects.

3.1 Electromagnetic Coils

Electromagnetic coils encoded with identifying data may be embedded in chips, credit cards, and other small objects. A wired sensor - in a table top, mouse pad, etc. - may then "read" the data from the coils that come in contact with it. In a demonstration that I saw, a web browser goes to a URL that it reads from a chip that is placed on the wired mouse pad.

This would be useful for detecting which puzzle pieces are on the tabletop, but it could not determine where they are. I could get around this by associating different sensors with different regions of the table, and expecting students to place puzzle pieces in the correct spots. For the Tangram, this would require a networked grid of sensors in a restrictive puzzle frame. That might be overly complicated and expensive to implement.

3.2 Infrared Emitters and Receivers

Sharp infrared units emulate television remote controls. Controlled by PIC microcontrollers, the infrared emitter communicates with an infrared receiver, also controlled by a PIC. A good deal of information may be packed into this infrared signal. I saw this demonstrated in something called GroupWear: conference badges that communicate with one another.

One problem with this approach is that the infrared emitters are rather delicate, and might be broken by the rough handling they would encounter in the math museum. Although they could be embedded in the puzzle pieces, the emitter and receiver must then be perfectly aligned so that they may "see" one another. This would not work well for a free-form puzzle like the Tangram.

3.3 Triangles

In the Triangles project, Maggie Orth and Matt Gorbet developed triangular pieces (very much like puzzle pieces) that have an embedded 8-pin PIC to identify each one [Gorbe98a]. Serial connectors extend to the edges, where they hook up with adjacent puzzle pieces. Magnets on the edges help to properly align the triangles. A "mother triangle" provides the power for the entire puzzle.

This is a more attractive approach because the puzzle pieces need only be aligned with one another, which is necessary to solve the puzzle anyway. The difficulty is that one piece must be wired for power, and that piece probably should not be moved. For puzzles that fit within a frame, this power supply could come from the frame. However, once again, the puzzle pieces become too delicate for middle school students, and too expensive for this project.

3.4 Capacitive Sensors

By using the players' bodies as conductors, a set of receptors embedded in the table could detect the precise position of a student's hand over the table. The student would have to be standing on a floor pad generating a low voltage electrical current. Resistors could then be embedded in the puzzle pieces, so that sensors in the table could detect the overall resistance. This would indicate which puzzle piece was being lifted and moved.

This approach has several problems. First, we couldn't tell which piece was being moved if the player were to simply slide it into place (rather than lifting it). Second, it is not clear that the receptors would be able to effectively track multiple hands moving multiple puzzle pieces. Finally, the hardware required to set up the environment may be prohibitively expensive.

3.5 Computer Vision

John Underkoffler has implemented a "luminous-tangible" interface that uses computer vision to track multiple objects [Under98a]. Each object in his system is marked with a pattern of color spots, created with 3M reflective tape overlaid with colored cellophane. A light source next to the camera causes those spots to appear much lighter than everything else in the scene. This makes it easy to threshold out all of the irrelevant details. He wrote his own computer vision package to determine the orientation of the pieces based on the order of the spots.

This is the most promising approach that I saw. The materials - including the camera - are relatively cheap and easy to set up. Furthermore, this is the only approach I saw that demonstrated the tracking of multiple objects. One problem with this approach is occlusion: if part of the pattern is covered, it will be difficult to tell what a piece is. Interference might also be caused by reflective articles of clothing or jewelry.

4. Computer Vision Approach

This investigation of sensor technologies convinced me that, for now at least, computer vision is the most feasible means of tracking multiple objects simultaneously. I therefore decided to adapt Underkoffler's approach for the smart objects. This adaptation incorporates the following changes.

  1. Cover all or most of each puzzle piece with a reflective pattern. This enables the software to identify a piece even when a player is holding it or another piece is partially covering it. This also allows me to extend these ideas when I turn to 3D puzzles, because I won't have to worry about the pattern facing away from the camera.
  2. View the puzzle pieces from below through a Plexiglas table. Because most pieces should be flat on the table most of the time, this greatly reduces the number of frames in which one or more pieces will be partially obscured.

The remainder of this section describes how computer vision is being used to track smart objects.

4.1 Tagging Smart Objects

There are several ways that puzzle pieces may be tagged with reflective patterns or coverings. Here are some of the more promising approaches.

Mark each puzzle piece with a pattern of tinted reflective spots. Underkoffler's approach is attractive because of the relative simplicity with which pieces may be found and tracked. In fact, software is commercially available for tracking small spots of color such as this [Rozin98a]. The biggest drawback is that those markings might easily be covered when the piece is in play.

Cover each puzzle piece with a unique tinted reflective surface. Reflective road paint, of the kind used to mark dividing lines on roadways, provides the base. Decorator tints are used to produce the different colors that identify the individual pieces. In puzzles with only a few distinct piece types (there are five in the Tangram), it is easy to produce distinct colors. Color quantization may then be used to do both segmentation and feature extraction at the same time, as sketched below. The problem in larger puzzles is coming up with enough distinct tints that are easily separable.
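As a rough illustration of this idea (the reference tints, tolerance, and function name below are assumptions for the sketch, not values from the actual system), classifying each pixel against a small table of measured tints performs segmentation and piece identification in a single pass:

    /* Illustrative sketch only: reference tints would be measured from the
     * actual painted pieces during setup; these values are placeholders. */
    typedef struct { unsigned char r, g, b; } RGB;

    #define NUM_TINTS 5   /* five distinct piece types in the Tangram */
    static const RGB ref_tint[NUM_TINTS] = {
        {220, 60, 60}, {60, 200, 60}, {60, 60, 220}, {220, 220, 60}, {200, 60, 200}
    };

    /* Return the index of the nearest reference tint, or -1 if the pixel is
     * farther than tol (squared RGB distance) from every tint (background). */
    int classify_pixel(RGB p, long tol)
    {
        int best = -1;
        long best_d = tol;
        for (int i = 0; i < NUM_TINTS; i++) {
            long dr = (long)p.r - ref_tint[i].r;
            long dg = (long)p.g - ref_tint[i].g;
            long db = (long)p.b - ref_tint[i].b;
            long d = dr * dr + dg * dg + db * db;
            if (d < best_d) { best_d = d; best = i; }
        }
        return best;
    }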

Cover all pieces with the same reflective surface, and use the geometry to distinguish one piece from another. Dark edges (created with electrical tape) can help to delineate the pieces. Although this eliminates the problem of producing distinct tints for all the pieces, this further complicates the identification process.

Figure 2. Two different bar code patterns for identifying puzzle pieces. These patterns are distinguished by the number of narrow bars between each pair of wide bars.

Cover the pieces with bar-code-like patterns. This must be a repeating pattern, so that the code can be detected even if part of the piece is obscured. One simple way to encode the pieces is to vary the number of narrow bars between each pair of wide bars on the piece, as shown in Figure 2. The number of possible patterns may be extended by coloring the bars, though it is important to remember that patterns may be viewed backwards (i.e. upside-down).
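The sketch below shows one way such a repeating code might be read along a single scanline, assuming an earlier step has already measured the widths of the successive bars; the threshold separating narrow from wide bars, and the function name, are assumptions made for illustration:

    /* Hypothetical decoder: count the narrow bars between each pair of wide
     * bars along one scanline.  A consistent count identifies the piece;
     * an inconsistent count (or no complete interval) is rejected. */
    int decode_bar_pattern(const int *bar_width, int nbars, int wide_threshold)
    {
        int count = -1;   /* narrow bars seen since the last wide bar */
        int code = -1;    /* agreed-upon count; -1 until established  */

        for (int i = 0; i < nbars; i++) {
            if (bar_width[i] >= wide_threshold) {   /* wide bar            */
                if (count >= 0) {                   /* a complete interval */
                    if (code < 0)
                        code = count;
                    else if (code != count)
                        return -1;                  /* inconsistent        */
                }
                count = 0;
            } else if (count >= 0) {                /* narrow bar          */
                count++;
            }
        }
        return code;   /* e.g. 1 or 2 for the two patterns in Figure 2 */
    }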

4.2 Computer Vision

Computer vision algorithms are typically robust and reliable as long as they are suited to the situation in which they are used. Therefore the algorithms being implemented are not new. Rather, I have adapted common techniques for my own purposes, making simplifying assumptions that allow the code to run more efficiently.

A color QuickCam captures images of the puzzle on the table at regular time intervals. The code that processes these images is written in C. Given a color image and a model of the pieces, the software returns identifiers indicating where each piece is in the scene. Missing or partially obscured pieces are also noted.

Puzzle pieces are identified in the following steps:

  1. Segment the image, finding the colored areas or spots marking the puzzle pieces.
  2. Extract features, correlating the segmented areas to internal models of the pieces.
  3. Determine the positions and orientations of the pieces.

After identifying all of the pieces in this manner, spatial relationships among the pieces may be determined. By comparing images over time, it is possible to focus on those areas where the puzzle arrangement has changed, thereby reducing processing time.
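A minimal sketch of how these steps might be organized as a driver loop follows. The type and function names (grab_frame, segment_image, match_pieces, report_pieces) are placeholders standing in for the capture and vision routines, not the names used in the actual code:

    /* Hypothetical driver loop for the tracking module. */
    typedef struct Image Image;
    typedef struct Region Region;
    typedef struct PieceState PieceState;   /* id, position, orientation, flags */

    Image  *grab_frame(void);                          /* QuickCam capture       */
    Region *segment_image(Image *frame, Image *prev);  /* colored areas or spots */
    void    match_pieces(Region *regions, PieceState *pieces, int npieces);
    void    report_pieces(const PieceState *pieces, int npieces);
    void    free_image(Image *img);

    void track_puzzle(PieceState *pieces, int npieces)
    {
        Image *prev = NULL;
        for (;;) {
            Image *frame = grab_frame();                  /* capture at regular intervals */
            Region *regions = segment_image(frame, prev); /* step 1; prev lets the code
                                                             focus on changed areas       */
            match_pieces(regions, pieces, npieces);       /* steps 2-3: correlate regions
                                                             to models, update positions
                                                             and orientations, flag any
                                                             missing or obscured pieces   */
            report_pieces(pieces, npieces);
            if (prev) free_image(prev);
            prev = frame;
        }
    }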

4.2.1 Image Segmentation

Because the software is looking for specific reflective colors in the scene, background noise may be eliminated readily. An area of a particular color is found by scanning the image for a pixel of the specified color (within a given tolerance), and then growing the region around that pixel to include all neighboring pixels of the same color. The remainder of the image is then scanned for additional areas of the same color. This procedure is repeated for all specified colors.
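The sketch below illustrates this scan-and-grow step, assuming the pixels have already been classified into tint indices (-1 for background) as in the earlier sketch; the function name and flat-array layout are assumptions for illustration:

    #include <stdlib.h>

    /* Assign a region id to every connected group of pixels that share the
     * same tint index.  tint[] holds the per-pixel classification (-1 =
     * background); region[] receives -1 for background, otherwise an id. */
    void grow_regions(const int *tint, int *region, int w, int h)
    {
        int *stack = malloc(sizeof(int) * w * h);
        int next_id = 0;

        for (int i = 0; i < w * h; i++) region[i] = -1;

        for (int start = 0; start < w * h; start++) {
            if (tint[start] < 0 || region[start] >= 0) continue;  /* background or done */
            int id = next_id++, top = 0;
            region[start] = id;
            stack[top++] = start;
            while (top > 0) {                                     /* grow the region */
                int p = stack[--top], x = p % w, y = p / w;
                int nb[4] = { p - 1, p + 1, p - w, p + w };
                int ok[4] = { x > 0, x < w - 1, y > 0, y < h - 1 };
                for (int k = 0; k < 4; k++) {
                    if (ok[k] && region[nb[k]] < 0 && tint[nb[k]] == tint[p]) {
                        region[nb[k]] = id;
                        stack[top++] = nb[k];
                    }
                }
            }
        }
        free(stack);
    }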

4.2.2 Puzzle Piece Model

Each distinct type of puzzle piece is represented by the following information. This information is established in a separate setup procedure.

In addition, I track the following for each piece:
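The attribute lists themselves are not reproduced above; purely to illustrate how such a model and its per-piece tracking state might be organized in C, a sketch could look like the following, where every field name is an assumption rather than the report's actual representation:

    /* Hypothetical sketch of a piece model and its tracked state. */
    typedef struct {
        int           id;                      /* which piece type this is            */
        unsigned char tint_r, tint_g, tint_b;  /* reference color of its covering     */
        int           nvertices;               /* outline geometry (3 for a triangle) */
        float         vertex[4][2];            /* outline in the piece's own frame    */
    } PieceModel;

    typedef struct {
        const PieceModel *model;
        float x, y;            /* current position on the table         */
        float angle;           /* current orientation                   */
        int   visible;         /* 0 if the piece is missing or obscured */
    } PieceInstance;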

4.2.3 Puzzle Piece Methods

Puzzle pieces support a series of methods that aid the tracking process by updating the attributes listed earlier. These methods include the following.
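Since the method list is likewise not reproduced here, the prototypes below are only a guess at the kinds of operations such methods would perform, reusing the hypothetical types from the previous sketch:

    /* Hypothetical method set; names and signatures are illustrative only. */
    typedef struct Region Region;               /* segmented colored area */
    typedef struct PieceInstance PieceInstance;

    int  piece_locate(PieceInstance *p, const Region *regions, int nregions);
    void piece_set_pose(PieceInstance *p, float x, float y, float angle);
    void piece_mark_missing(PieceInstance *p);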

5. Conclusions

This report explores techniques for tracking smart objects and describes the strategy that I have adopted for my initial experiment. Yet tracking smart objects is only the first step in the process of developing a smart objects interface. I plan to take the following next steps.

6. Acknowledgments

I thank Bill Wilhelms and Tony Scarlatos for initially suggesting the problem of smart objects, and for introducing me to the Goudreau Museum of Mathematics in Art and Science. Thanks also go to Beth Deaner, the director of the Goudreau Museum, and her staff for being so helpful and supportive throughout this project. I am grateful to Brian Smith, John Underkoffler, Brygg Ullmer, Josh Smith, and Richard Borovoy at the MIT Media Labs for showing me their own innovative work in user interface design, and brainstorming on possible interfaces for the Goudreau Museum puzzles. Others who provided valuable input are Dayton Clark, Yuliya Dushkina, Natalya Griogoryev, and Shalva Landy at Brooklyn College. Finally, I wish to thank the National Science Foundation for supporting this work through a POWRE research planning grant.

7. References

[Abbas96a] Abbas, H, Zue, DP, Farooq, M, Parkinson, G, Blanchette, M: Track-Independent Estimation Schemes for Registration in a Network of Sensors, Proceedings of 35th Conference on Decision and Control, pp. 2563-2568, 1996.

[Banne96a] Banner Engineering Corp.: Handbook of Photoelectric Sensors, 1996.

[Foley90a] Foley, J, Van Dam, A, Feiner, S, Hughes, J: Computer Graphics: Principles and Practice, 2nd Edition, Addison-Wesley, 1990.

[Frade97a] Fraden, J: Handbook of Modern Sensors, 2nd Edition, American Institute of Physics Press, Woodbury, NY, 1997.

[Gorbe98a] Gorbet, M, Orth, M, Ishii, H: Triangles: Tangible Interface for Manipulation and Exploration of Digital Information Topography, Proceedings of CHI '98, pp. 49-56, 1998.

[Ishii97a] Ishii, H, Ullmer, B: Tangible Bits: Towards Seamless Interfaces Between People, Bits and Atoms, Proceedings of CHI '97, pp. 234-241, 1997.

[Mulde94a] Mulder, A: Human Movement Tracking Technology, Simon Fraser University School of Kinesiology Technical Report 94-1, 1994.

[Osull] O'Sullivan, D: Physical Computing: A Hands On How To Guide for Artists, http://www.itp.tsoa.nyu.edu/~alumni/dano/physical/physical.html.

[Pavli82a] Pavlidis, T: Algorithms for Graphics and Image Processing, Computer Science Press, 1982.

[Resni96a] Resnick, M, Martin, F, Sargent, R, Silverman, B: Programmable Bricks: Toys to Think With, IBM Systems Journal, vol. 35, nos. 3&4, pp. 443-452, 1996.

[Rozin98a] Rozin, D: Track Them Colors, available online at http://www.itp.nyu.edu/~danny/Xtras.html.

[Smith98a] Smith, J, White, T, Dodge, C, Allport, D, Paradiso, J, Gershenfeld, N: Electric Field Sensing for Graphical Interfaces, IEEE Computer Graphics and Applications, May 1998.

[Under98a] Underkoffler, J, Ishii, H: Illuminating Light: An Optical Design Tool with a Luminous-Tangible Interface, Proceedings of CHI '98, pp. 542-549, 1998.

[Verpl96a] Verplaetse, C: Inertial Proprioceptive Devices: Self-Motion-Sensing Toys and Tools, IBM Systems Journal, vol. 35, nos. 3&4, pp. 639-651, 1996.

[Zimme96a] Zimmerman, TG: Personal Area Networks: Near-field Intrabody Communication, IBM Systems Journal, vol. 35, nos. 3&4, pp. 609-618, 1996.

Appendix A. A Visit to the MIT Media Labs

I visited the MIT Media Laboratories on January 23, 1998 at the invitation of assistant professor Brian Smith. When I went there, I planned to implement the Tower of Hanoi puzzle first, hence the frequent references to that puzzle. This appendix documents what I saw and did on that day.

10:00 am - Brian Smith

Dr. Smith demonstrated a "mouse pad" that plugs into the computer; placing thin plastic disks on it causes a web browser to bring up the URL encoded in each disk. Electromagnetic coils containing encoded information are embedded in the disks. Power and sensors in the mouse pad read the information and send it to the computer. This technology was developed by Swatch; the chips are produced by EM Microelectronic-Marin Sarfid. A potential application is embedding coils in department store credit cards. Placing a card on the mouse pad would allow customers to charge purchases directly to the department store over the web.

10:30 - 11:00 am - Marina Bers (Epistemology & Learning)

The focus of this work is trying to learn about children's "inner world" through narrative. Ms. Bers created a program that allows children (or others) to create a "sage" that will listen to a story and then tell a related story intended to teach something. The sage is represented by a photograph, and is accompanied by a stuffed rabbit "assistant" driven by small motors. Potential relationships are pre-programmed. The author writes the stories and the key words used to describe and access those stories. The author also scripts the interaction between user, sage, and assistant using graph representations. A similar process programs the behavior of the assistant. A "Handy Board", a circuit board with a small computer programmed in a Logo-like language, drives a series of motors that cause the assistant to move. The animal is plugged into a box which is the source of power and the connection to the computer. This application targets 9-11 year olds, although the test group (children in a hospital awaiting transplants) ranged from ages 9 to 18.

11:00 - 11:30 am - Mitchel Resnick (Epistemology & Learning)

Dr. Resnick, director of the group, focuses on having children learn by building behaviors. He doesn't believe a computer can effectively teach by just telling students about things. He uses a Cricket, a small computer board with a programmed chip (using a Logo-like language), powered by small batteries, with infrared lights and sensors for communication. The program is downloaded from the computer to the Cricket via infrared. The Cricket is used in Lego constructions and interactive sculptures to drive the objects as determined by the program on the chip. Children build these devices in a "computer clubhouse" (initially in Boston's Computer Museum, now being moved to community centers) where they can work on a single project over an extended period of time. One problem has been a prevalence of boys who tend to form "techie" cliques which might intimidate girls and newcomers. To overcome this, the clubhouse now offers a "girls day" on which only girls may come. This project targets middle school children.

Other applications are being investigated. Beads on a necklace communicate with one another through inductive coils (must be in direct contact with one another). Behavior of the bead (usually sending/blocking infrared signals) is programmed on a chip and powered with a watch battery. Combining different beads produces different patterns of flashing lights on the necklace. A ball with an accelerometer (and Cricket) embedded inside can detect sudden accelerations such as throwing the ball in the air and catching it.

11:30 am - noon - Josh Smith (Physics & Media)

The Media Lab focuses on disconnecting content from representation (such as a newspaper, book, radio, etc.). It's natural to try to couple information with new representations. Mr. Smith is primarily working with capacitive sensors, which send forth (and detect) low levels of electrical charge. The advantage of using electrical fields is that they are not sensitive to metals (unlike magnetic fields). A receiver is able to detect very slight changes in position, although translation to absolute position values is non-trivial. Applications include:

Other relevant technologies are:

Josh says that tracking the positions of multiple objects is an as-yet unsolved problem that many people are working on. Some ideas for the smart objects that he offered are:

noon - 1:30 pm - lunch with Brian Smith

1:30 - 2:00 pm - Kevin Brooks (Interactive Cinema)

Mr. Brooks is working on narrative, providing tools that help writers to write better. He is interested in answering the question: how can you tell the story better?

He suggests talking to Carol Strohecker (formerly at MIT, now at Mitsubishi) who is working on mathematical puzzles in software (stro@merl.com).

2:00 - 2:30 pm - Philip Tiongson (Interactive Cinema)

This work focuses on gathering information for storytelling (e.g. news stories on the web). How can one reveal context while telling the story?

2:30 - 3:00 pm - Rick Borovoy (Epistemology & Learning)

Mr. Borovoy is working on GroupWear: tools for collaboration using wearable computing devices. For example, "name tags" that hold the wearer's responses to 5 questions communicate with one another via infrared. The tags contain infrared units (Sharp) that emulate Sony remote controls. The Sharp emitter/receiver is controlled by a PIC (a microcontroller available from Digikey, an electronics component distributor). PICs are usually programmed using an assembly-like language, but a Basic Stamp is programmable in Basic.

He suggests drilling holes in the Tower of Hanoi disks and embedding laser emitters in them, and then placing laser receivers in the poles. The big difficulty is that perfect line-of-sight must be achieved. A possible solution would be to use square pegs and holes with receivers (or emitters) on all 4 sides.

3:00 - 4:00 pm - Sola Grantham, Erin Panttaja, Kimiko Ryokai (Gesture & Narrative Language)

This group is working on technologies for getting people in cybercafes to interact. Examples are a table top keyboard that users must reach across, and a tabletop that generates a character on a centrally located screen that can interact with other characters from similar tabletops. The group is also working on games that play on language.

They suggest getting the students (i.e. users of the puzzles) to come up with hints and useful guides that would help people in their age group to understand the puzzle without solving it for them. They also suggest providing simpler puzzles (e.g. a Tower of Hanoi with fewer disks) to help students to think about the problem. Then the puzzle doesn't really need to know what the students are actually doing; it can simply provide the hints when they hit the "help" button.

4:00 - 4:30 pm - Brygg Ullmer, John Underkoffler (Tangible Media)

Mr. Underkoffler is working on a luminous-tangible interface that uses an "I/O bulb". This is composed of an incandescent light next to a video camera (Panasonic KS-152) filtered with semi-opaque plastic (for input) and an InFocus LitePro 620 projector (for output). Objects on the table are identified by colored spot patterns, which are captured with the glimpser software and interpreted by the voodoo software, all running on an SGI O2. Spots are created with pieces of 3M Scotchlite reflectors (available at Pearl Paint or a bicycle shop) overlaid with colored cellophane. Co-locating the light source with the camera causes the spots to appear much brighter than anything else in the scene, allowing thresholding.

Mr. Ullmer is working with blocks tagged with a resistor ID. This ID is used to access a central database from a variety of devices: a whiteboard, a printer, a projector. This needs two wires (ground and power) to read the resistor information.

The triangles project uses an 8-pin PIC to identify each triangle. Triangle information is passed between triangles via serial connectors along the edges that make contact when two triangles are attached (via a series of magnets). Power comes from a "mother triangle", which gets the whole thing communicating.

Touch memories (a.k.a. eyebuttons, available at http://www.eyebutton.com, made by Dallas Semiconductor) store a unique identifier in a small (watch battery size) container. Another identifying chip was less than 1 cm square. Again, two wires (power and ground) are needed to read these memories. Brygg suggests making the Tower of Hanoi center post the ground and giving an outer layer power for reading the memories. Although the order of the disks can't be detected directly, the system can track disks being added and removed to maintain an accurate picture of the towers' states.