Technical Overview of DataGlyphs®
PARC DataGlyphs are a robust
and unobtrusive method of embedding computer-readable
data on surfaces such as paper, labels, plastic,
glass, or metal.
How Data Is Encoded
DataGlyphs encode information
- text, data, graphics - in thousands of tiny glyphs.
Each glyph consists of a 45-degree diagonal line,
as short as one one-hundredth of an inch or less,
depending on the resolution of the printing and
scanning that's used. Each glyph represents a single
binary 0 or 1, depending on whether it slopes to
the right or left.
Leaning either
forward or back, DataGlyphs represent the ones
and zeroes in binary digital data.
The glyphs are laid down in
groups on a regular, finely spaced grid forming
unobtrusive, evenly textured gray areas. Even when
individual glyphs are large enough to be resolved
by the human eye, in groups they form a pleasing
pattern that is not distracting.
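The bit-per-glyph idea described above can be sketched in a few lines of Python. This is only an illustration, not PARC's encoder: the function names are hypothetical, text characters stand in for printed strokes, and the choice of which slope means 0 and which means 1 is an assumption, since the text only says that the slope direction carries the bit.

```python
def bits_from_bytes(data: bytes) -> list[int]:
    """Expand bytes into a flat list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def render_glyphs(data: bytes, columns: int = 16) -> str:
    """Draw one diagonal stroke per bit on a regular grid of text cells.

    Assumption: a backslash stands for 0 and a forward slash for 1.
    """
    strokes = ["/" if bit else "\\" for bit in bits_from_bytes(data)]
    rows = [strokes[i:i + columns] for i in range(0, len(strokes), columns)]
    return "\n".join("".join(row) for row in rows)


def read_glyphs(text: str) -> bytes:
    """Recover the bytes from a rendered grid, ignoring line breaks."""
    bits = [1 if ch == "/" else 0 for ch in text if ch in "/\\"]
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    pattern = render_glyphs(b"DataGlyph")
    print(pattern)
    print(read_glyphs(pattern))
```

A real DataGlyph also carries the synchronization lattice and error-correction data, which this sketch leaves out.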
Robustness and Error Correction
In addition to the data, each
DataGlyph contains an embedded synchronization lattice
or skeleton - a repeating fixed pattern of glyphs
that marks the DataGlyph's boundaries and serves
as a clocking track to improve the reliability of
reading. Groups of glyphs, representing bytes of
data, are laid down within this frame.
The data is grouped into blocks
of a few dozen bytes each, and error-correction
code is added to each block. Individual applications
determine the amount of error correction necessary.
Of course, higher levels of error correction require
larger overall DataGlyphs for a given amount of
data, but improve the reliability with which the
information can be read back - often a worthwhile
trade-off, especially when the DataGlyph will sustain
a high level of image noise (for example, during
fax transmissions) or when the glyph block will
be subjected to rough handling.
For reliability, each
DataGlyph contains a measure of error correction
appropriate to the application. Glyphs are also
laid into a synchronization frame and randomized
so that the data survives damage to the document.
As a final step, the bytes of
data are randomly dispersed across the entire area
of the DataGlyph. Thus, if any part of the DataGlyph
is severely damaged, the damage to any individual
block of data will be slight, and the error-correcting
code will easily be able to recover all the information
encoded, despite the damage.
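The effect of dispersing bytes can be illustrated with a small simulation. This is a sketch under stated assumptions, not the DataGlyph layout algorithm: the seeded shuffle, the block size of 32 bytes, and the function names are all hypothetical stand-ins for whatever dispersal the real format uses.

```python
import random
from collections import Counter


def scatter_order(total_bytes: int, seed: int = 0) -> list[int]:
    """Return a pseudorandom permutation mapping layout slot -> source byte index."""
    order = list(range(total_bytes))
    random.Random(seed).shuffle(order)
    return order


def damaged_per_block(order: list[int], block_size: int, damaged_slots) -> Counter:
    """Count how many bytes each block loses when the given layout slots are destroyed."""
    return Counter(order[slot] // block_size for slot in damaged_slots)


block_size, num_blocks = 32, 16
total = block_size * num_blocks

# 32 adjacent layout slots wiped out, for example by a stain or a staple.
damage = range(100, 132)

scattered = damaged_per_block(scatter_order(total, seed=1), block_size, damage)
in_order = damaged_per_block(list(range(total)), block_size, damage)

if __name__ == "__main__":
    print("worst-hit block, bytes laid out in order:", max(in_order.values()))
    print("worst-hit block, bytes dispersed:", max(scattered.values()))
```

With the bytes laid out in order, nearly all of the damage falls on one block and may exceed what its error-correcting code can repair. With dispersal, the same damage is spread thinly over many blocks, so each block loses only a few bytes.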
Together, built-in error correction
code and data randomization give DataGlyphs a very
high level of reliability, even in the face of damage
from ink marks, staples, coffee spills, and the
other vicissitudes of a paper document's life.
Superior Data Density
The amount of data that can
be encoded in a DataGlyph of a given size will vary
with the quality of the imprinting and scanning
equipment to be used.
DataGlyphs offer a data density nearly twice that
of PDF417, one of the most popular forms of 2D bar codes.
For example, with one- and two-dimensional
bar codes, the minimum feature size that can be
used is 0.0075 inch - three dots at 400 dpi. At that
density, and with minimal height, Code 39 (the most
commonly used general-purpose linear bar code) can
only achieve a density of about 25 binary bytes
per square inch. Code 128 can achieve about 40 bytes
per square inch.
The two-dimensional bar codes,
such as PDF417, do much better. PDF417 achieves a maximum
data density of 2,960 bits (or 370 binary bytes)
per square inch, with no error correction, at 400
dpi. But with realistic error correction of 27%,
the effective data rates for PDF417 are about 270
bytes per square inch.
At the same resolution and level
of error correction, DataGlyphs can carry nearly
500 bytes per square inch.
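The figures quoted above can be checked with a little arithmetic. The sketch below assumes that "error correction of 27%" means 27% of the raw capacity is spent on redundancy, and it uses 500 bytes per square inch as a round stand-in for "nearly 500"; the variable names are illustrative only.

```python
DPI = 400
MIN_FEATURE_DOTS = 3
PDF417_MAX_BITS_PER_SQ_IN = 2960
ERROR_CORRECTION_FRACTION = 0.27   # assumed to be the share of capacity used for redundancy
DATAGLYPH_BYTES_PER_SQ_IN = 500    # "nearly 500" in the text, so an upper figure

# Three dots at 400 dpi give the minimum bar code feature size in inches.
min_feature_inch = MIN_FEATURE_DOTS / DPI

# 2,960 bits per square inch is 370 bytes per square inch before error correction.
pdf417_raw_bytes = PDF417_MAX_BITS_PER_SQ_IN / 8

# Spending 27% of the capacity on redundancy leaves about 270 bytes per square inch.
pdf417_effective_bytes = pdf417_raw_bytes * (1 - ERROR_CORRECTION_FRACTION)

# DataGlyphs compared with PDF417 at the same resolution and error correction.
density_ratio = DATAGLYPH_BYTES_PER_SQ_IN / pdf417_effective_bytes

if __name__ == "__main__":
    print(f"minimum feature size: {min_feature_inch:.4f} inch")
    print(f"PDF417 raw:           {pdf417_raw_bytes:.0f} bytes/sq in")
    print(f"PDF417 effective:     {pdf417_effective_bytes:.1f} bytes/sq in")
    print(f"DataGlyph / PDF417:   {density_ratio:.2f}x")
```

Under these assumptions the ratio comes out a little under 2, which matches the claim that DataGlyphs offer nearly twice the data density of PDF417.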
As with other visual encoding
schemes, the density of DataGlyph encoding is determined
by four factors:
- The resolution at
which the encoding is created and scanned.
High-resolution devices such as office laser printers
and document scanners permit denser marking patterns,
and thus denser encoding, than low-resolution
devices such as dot-matrix printers and fax machines.
- The amount of error
correction used. The process of printing
and scanning unavoidably degrades image-encoded
data. In high-density encoding, where the print
and scan defects are likely to be large compared
to the encoding feature size, more of the encoding
features will be lost or misread as a result of
such degradation. As a countermeasure, some system
of redundancy (that is, error correction) must be
used to keep the failure rate within reasonable
bounds. Redundant coding consumes extra
space, reducing the effective data density. Again,
how much error correction must be employed will
vary from application to application. But there
must always be some in any real-world application
of any encoding scheme. The data densities that
can be achieved using no error correction are
theoretical upper bounds, unlikely to be of practical
use.
- The data compression
used. Data can be compressed as it's
encoded. For example, if all the data is numeric,
there's no need to use one byte (8 bits) per digit;
about 3.32 bits (log2 of 10) will suffice. When text is encoded,
it can be compressed by factors of two or more
by means of character encoding or other compression
techniques. For example, the full text of the Gettysburg
Address, often used to demonstrate high-density
encoding, contains 268 words, or about 1,450 characters.
But the entire speech can easily be represented
in less than 900 bytes.
- The fixed overhead
of the synchronization frame and header.
For DataGlyphs, the synchronization frame is a
fixed proportion of the data area. DataGlyphs
also have a very small fixed header.
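The 3.32-bits-per-digit figure in the compression factor above is log2(10), and it can be approached by packing a string of decimal digits as one large integer. The sketch below is one simple way to do that, not the scheme DataGlyphs actually use; the function names and the leading-1 sentinel (which preserves leading zeros) are assumptions made for illustration.

```python
import math


def pack_digits(digits: str) -> bytes:
    """Pack a string of decimal digits into as few whole bytes as its value needs.

    A sentinel digit 1 is prepended so that leading zeros survive the round trip.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of ASCII digits")
    value = int("1" + digits)
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def unpack_digits(packed: bytes) -> str:
    """Reverse pack_digits, dropping the sentinel digit."""
    return str(int.from_bytes(packed, "big"))[1:]


if __name__ == "__main__":
    account = "00123456789012345678"
    packed = pack_digits(account)
    print(f"bits per digit in theory: {math.log2(10):.2f}")
    print(f"{len(account)} digits stored as text: {len(account)} bytes")
    print(f"{len(account)} digits packed:         {len(packed)} bytes")
    print(unpack_digits(packed) == account)
```

For the 20-digit example, one byte per digit costs 20 bytes while the packed form takes 9, slightly above the theoretical 20 x 3.32 / 8 bytes because of the sentinel and rounding up to whole bytes.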
The size of the DataGlyph
required to encode 80 bytes of information depends
on the device(s) to be used for printing and/or
scanning. DataGlyphs for faxing are often drawn
disproportionately large for added reliability in
the face of the "noise" that frequently affects
fax images.