Recent publications by SPIA members
Reverse chronological order, then alphabetical by first author.
Availability and distribution subject to terms of respective
copyright agreements.
In particular, the availability of some items might be delayed until
date of publication, and other items might not be available for general
download at all. If you are interested in an item for which a download
link does not appear, please contact the author, as often the
copyright agreements allow limited personal distribution.
|
|
E. Gaussier, C. Goutte, K. Popat, and F. Chen, ``A hierarchical model for
clustering and categorising documents,'' in Proceedings of the 24th
BCS-IRSG European Colloquium on IR Research, March 2002.
to appear.
BibTeX entry
We propose a new hierarchical generative model for textual
data, where words may be generated by topic-specific distributions at
any level in the hierarchy. This model is naturally well-suited to
clustering documents in preset or automatically generated hierarchies, as
well as categorising new documents in an existing hierarchy. Training
algorithms are derived for both cases, and illustrated on real data by
clustering news stories and categorising newsgroup messages. Finally,
the generative model may be used to derive a Fisher kernel expressing
similarity between documents.
|
|
|
P. Sarkar, ``An iterative algorithm for optimal style-conscious field
classification,'' in [Submitted for review] Proceedings of the sixteenth
ICPR, (Quebec City), IEEE Computer Society Press, 2002.
BibTeX entry,
Available here
|
|
|
P. Sarkar, H. S. Baird, and J. Henderson, ``Triage of OCR output using
'confidence' scores,'' in [accepted for publication in] Proceedings of
SPIE/IS&T 2002 Document Recognition & Retrieval IX Conf. (DR&R IX), (San
Jose, California, USA), January 20-25 2002.
BibTeX entry,
Available here
|
|
|
G. E. Kopec, M. R. Said, and K. Popat, ``N-gram language models for document
image decoding,'' in Proceedings of IS&T/SPIE Electronic Imaging 2002:
Document Recognition and Retrieval IX, January 2002.
BibTeX entry,
PDF,
PS
This paper explores the problem of incorporating
linguistic constraints into document image decoding, a communication
theory approach to document recognition. Probabilistic character
n-grams (n=2-5) are used in a two-pass strategy where the decoder
first uses a very weak language model to generate a lattice of
candidate output strings. These are then re-scored in the second pass
using the full language model. Experimental results based on both
synthesized and scanned data show that this approach is capable of
improving the error rate by a factor of two to ten depending on the
quality of the data and the details of the language model used.
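The second-pass re-scoring described above can be sketched as follows. This is a minimal illustration, not the authors' decoder: the add-one smoothing, the `^` start-of-string padding symbol, the `lm_weight` parameter, and the function names are all assumptions introduced here, and the first-pass lattice is stood in for by a plain list of (string, image score) candidates.

```python
import math
from collections import Counter


def train_char_ngrams(text, n):
    """Count character n-grams and their (n-1)-character contexts."""
    padded = "^" * (n - 1) + text  # '^' is an assumed start-of-string symbol
    ngrams = Counter(padded[i:i + n] for i in range(len(text)))
    contexts = Counter(padded[i:i + n - 1] for i in range(len(text)))
    return ngrams, contexts, len(set(text))


def log_prob(s, model, n):
    """Add-one-smoothed log probability of string s under the n-gram model."""
    ngrams, contexts, vocab_size = model
    padded = "^" * (n - 1) + s
    total = 0.0
    for i in range(len(s)):
        gram = padded[i:i + n]
        # vocab_size + 1 reserves probability mass for unseen characters.
        total += math.log((ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab_size + 1))
    return total


def rescore(candidates, model, n, lm_weight=1.0):
    """Pick the candidate maximizing image score plus weighted LM log probability."""
    return max(candidates, key=lambda c: c[1] + lm_weight * log_prob(c[0], model, n))[0]


model = train_char_ngrams("the cat sat on the mat " * 20, 3)
# Hypothetical first-pass output: (candidate string, image-match log score).
first_pass = [("tbe cat", -0.9), ("the cat", -1.0)]
print(rescore(first_pass, model, 3))
```

Here the weak first pass slightly prefers the misread candidate, and the language model is what tips the choice in the second pass.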
|
|
|
K. Toutanova, F. Chen, K. Popat, and T. Hofmann, ``Text classification in a
hierarchical mixture model for small training sets,'' in Proceedings of
the ACM Conference on Information and Knowledge Management (CIKM), November
2001.
BibTeX entry,
PDF,
PS
Documents are commonly categorized into hierarchies of topics, such as
the ones maintained by Yahoo! and the Open Directory project, in order
to facilitate browsing and other interactive forms of information
retrieval. In addition, topic hierarchies can be utilized to overcome
the sparseness problem in text categorization with a large number of
categories, which is the main focus of this paper. This paper presents
a hierarchical mixture model which extends the standard naive
Bayes classifier and previous hierarchical approaches. Improved
estimates of the term distributions are made by differentiation of
words in the hierarchy according to their level of
generality/specificity. Experiments on the Newsgroups and the
Reuters-21578 dataset indicate improved performance of the proposed
classifier in comparison to other state-of-the-art methods on datasets
with a small number of positive examples.
|
|
|
D. S. Bloomberg, T. P. Minka, and K. Popat, ``Document image decoding using
iterated complete path search with subsampled heuristic scoring,'' in
Proceedings of the IAPR 2001 International Conference on Document Analysis and
Recognition (ICDAR 2001), September 2001.
BibTeX entry,
PDF,
PS
It has been shown that the computation time of Document Image Decoding
can be significantly reduced by employing heuristics in the search
for the best decoding of a text line. In the
Iterated Complete Path (ICP) method, template matches are performed only
along the best path found by dynamic programming on each iteration.
When the best path stabilizes, the decoding is optimal and no
more template matches need be performed.
In this way, only a tiny fraction of potential template matches
must be evaluated, and the computation time is typically dominated by
the evaluation of the initial heuristic upper-bound for each
template at each location in the image. The time to
compute this bound depends on the resolution at which the
matching scores are found. At lower resolution,
the heuristic computation is reduced,
but because a weaker bound is used, the number of Viterbi
iterations is increased. We present the optimal (lowest
upper-bound) heuristic for any degree of subsampling
of multilevel template and/or interpolation, for use in text line
decoding with ICP.
The optimal degree of subsampling depends
on image quality, but it is typically found that a small
amount of template subsampling is effective in
reducing the overall decoding time.
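The iteration described above can be sketched on a toy one-dimensional text line. This is an illustration under stated assumptions, not the authors' implementation: the function `icp_decode`, its arguments, and the representation of a text line as integer positions with fixed template widths are hypothetical. The sketch only relies on the property stated in the abstract, namely that the heuristic is an upper bound on the true template-match score.

```python
def icp_decode(line_width, widths, exact_score, upper_bound):
    """Iterated Complete Path search over a 1-D trellis.

    widths:      dict mapping template name -> width in pixels (assumed layout)
    exact_score: function (template, x) -> true match score (expensive)
    upper_bound: function (template, x) -> heuristic score >= exact_score
    Returns (path, score, number of exact template matches performed).
    """
    scores = {}  # (template, x) -> (value, is_exact)
    matches = 0
    while True:
        # Dynamic programming pass using the current mix of bounds and exact scores.
        best = [float("-inf")] * (line_width + 1)
        back = [None] * (line_width + 1)
        best[0] = 0.0
        for x in range(line_width):
            if best[x] == float("-inf"):
                continue
            for t, w in widths.items():
                if x + w > line_width:
                    continue
                if (t, x) not in scores:
                    scores[(t, x)] = (upper_bound(t, x), False)
                cand = best[x] + scores[(t, x)][0]
                if cand > best[x + w]:
                    best[x + w] = cand
                    back[x + w] = (t, x)
        if back[line_width] is None:
            raise ValueError("no complete path covers the line")
        # Trace the best complete path back from the end of the line.
        path, pos = [], line_width
        while pos > 0:
            t, x = back[pos]
            path.append((t, x))
            pos = x
        path.reverse()
        # Re-score only the edges on the best path that are still heuristic.
        changed = False
        for edge in path:
            if not scores[edge][1]:
                scores[edge] = (exact_score(*edge), True)
                matches += 1
                changed = True
        if not changed:
            # Every edge on the best path is exact and all others are upper
            # bounds, so no other path can score higher.
            return path, best[line_width], matches
```

A looser bound (as from heavier subsampling) is cheaper to compute but tends to require more passes of the outer loop, which is the trade-off the abstract describes.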
|
|
|
A. L. Coates, H. S. Baird, and R. J. Fateman, ``Pessimal print: a reverse
Turing test,'' in Proceedings of the IAPR 2001 International
Conference on Document Analysis and Recognition (ICDAR 2001), September 2001.
BibTeX entry
We exploit the gap in ability between human and machine vision
systems to craft a family of automatic challenges that tell
human and machine users apart via graphical interfaces including
Internet browsers.
Turing proposed [Tur50] a method whereby human judges might validate
``artificial intelligence'' by failing to distinguish between human
and machine interlocutors.
Stimulated by the ``chat room problem'' posed by Udi Manber
of Yahoo!, and influenced by the CAPTCHA project [BAL00] of
Manuel Blum et al. of Carnegie Mellon Univ.,
we propose a variant of the Turing test using pessimal print:
that is,
low-quality images of machine-printed text synthesized pseudo-randomly
over certain ranges of words, typefaces, and image degradations.
We show experimentally that judicious choice of
these ranges can ensure that the images are legible to human readers
but illegible to several of the best present-day optical character
recognition (OCR) machines.
Our approach is motivated by a decade of research on performance evaluation
of OCR machines [RJN96,RNN99] and on
quantitative stochastic models of document image quality [Bai92,Kan96].
The slow pace of evolution of OCR and other species of machine vision
over many decades [NS96,Pav00] suggests that pessimal print will defy
automated attack for many years. Applications include `bot' barriers and
database rationing.
|
|
|
P. Sarkar and G. Nagy, ``Style consistency in isogenous patterns,'' in
Proceedings of the Sixth ICDAR, (Seattle, USA), pp. 1169-1174, September
2001.
BibTeX entry,
Available here
|
|
|
T. M. Breuel, ``Implicit manipulation of constraint sets for geometric matching
under translation and rotation,'' in Scandinavian Conference on Image
Analysis (SCIA 2001), (Bergen, Norway), June 2001.
BibTeX entry
|
|
|
T. M. Breuel, ``Classification by probabilistic clustering,'' in
Proceedings of the 2001 International Conference on Acoustics, Speech, and
Signal Processing (ICASSP 2001), (Salt Lake City, Utah), IEEE, May 2001.
BibTeX entry
|
|
|
K. Popat, ``Decoding of text lines in grayscale document images,'' in
Proceedings of the 2001 International Conference on Acoustics, Speech, and
Signal Processing (ICASSP 2001), (Salt Lake City, Utah), IEEE, May 2001.
BibTeX entry,
PDF,
PS
|
|
|
T. M. Breuel and K. Popat, ``Recent work in the document image decoding group
at Xerox PARC,'' in Proceedings of the DOD-sponsored Symposium on
Document Image Understanding Technology (SDIUT 2001), April 2001.
BibTeX entry
|
|
|
T. M. Breuel, ``Modeling the sample distribution for clustering by OCR,'' in
Proceedings of IS&T/SPIE Electronic Imaging 2001: Document Recognition
and Retrieval VIII, January 2001.
BibTeX entry,
PDF,
PS
|
|
|
T. P. Minka, D. S. Bloomberg, and K. Popat, ``Document image decoding using the
iterated complete path heuristic,'' in Proceedings of IS&T/SPIE
Electronic Imaging 2001: Document Recognition and Retrieval VIII, January
2001.
BibTeX entry,
PDF,
PS
|
|
|
K. Popat, ``Document image compression by adaptive-offset quantization,'' in
Proceedings of IS&T/SPIE Electronic Imaging 2001: Document Recognition
and Retrieval VIII, January 2001.
BibTeX entry,
PDF,
PS
|
|
|
K. Popat, D. Greene, J. Romberg, and D. S. Bloomberg, ``Adding linguistic
constraints to document image decoding: Comparing the iterated complete path
and stack algorithms,'' in Proceedings of IS&T/SPIE Electronic Imaging
2001: Document Recognition and Retrieval VIII, January 2001.
BibTeX entry,
PDF,
PS
|
|
|
H. S. Baird, ``State of the art of document image degradation modeling,'' in
Proceedings of the 4th IAPR Workshop on Document Analysis Systems (DAS
2000), (Rio de Janeiro), December 2000.
Invited plenary talk.
BibTeX entry,
PDF,
PS
The literature on models of document image degradation is
reviewed, and open problems are listed.
In response to the unpleasant fact that the accuracy of document
recognition algorithms falls drastically when image quality degrades
even slightly, researchers in the last decade have intensified their
study of explicit, quantitative, parameterized models of image defects that occur during
printing and scanning.
Several models have been proposed, some motivated by the physics of image
formation and others by the surface statistics of
image distributions.
A wide range of techniques for estimating parameters of these models
has been explored.
These models, in the form of pseudo-random generators of synthetic images,
permit, for the first time, investigations into fundamental
properties of concrete image recognition problems including
the Bayes error of problems and the asymptotic accuracy
and domain of competency of classifier technologies.
The use of massive sets of synthetic images, in the construction and
testing of high-performance classifiers, has accelerated in the last
few years.
Open problems include the search for methods for comparing competing
models and sound methodologies for the use of synthetic data in engineering.
|
|
|
T. M. Breuel, ``Layout analysis by exploring the space of segmentation
parameters,'' in Proceedings of the 4th IAPR Workshop on Document
Analysis Systems (DAS 2000), December 2000.
BibTeX entry,
PDF,
PS
|
|
|
K. Popat, D. Bloomberg, and D. Greene, ``Adding linguistic constraints to
document image decoding,'' in Proceedings of the 4th IAPR Workshop on
Document Analysis Systems (DAS 2000), December 2000.
BibTeX entry,
PDF,
PS
|
|
|
T. Kanungo, R. M. Haralick, H. S. Baird, W. Stuetzle, and D. Madigan, ``A
statistical, nonparametric methodology for document degradation model
validation,'' IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 22, pp. 1209-1223, November 2000.
BibTeX entry
Printing, photocopying, and scanning processes degrade the
image quality of a document. Statistical models of these degradation
processes are crucial for document image understanding research. Models
allow us to predict system performance, conduct controlled experiments to
study the breakdown points of the systems, create large multilingual data sets
with groundtruth for training classifiers, design optimal noise removal
algorithms, choose values for the free parameters of the algorithms,
and so on. Although research in document understanding started many decades
ago, only two document degradation models have been proposed thus far.
Furthermore, no attempts have been made to statistically validate these
models. In this paper, we present a statistical methodology that can
be used to validate local degradation models. This method is based on a
nonparametric, two-sample permutation test. Another standard statistical
device, the power function, is then used to choose between algorithm
variables such as distance functions. Since the validation and the power
function procedures are independent of the model, they can be used to
validate any other degradation model. A method for comparing any two
models is also described. It uses p-values associated with the
estimated models to select the model that is closer to the real world.
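A generic two-sample permutation test of the kind the abstract refers to can be sketched as below. This is an illustration only: the paper's own test statistic and distance functions are not given here, so the absolute difference of sample means (`mean_gap`) is a stand-in assumption, as are the function names, the trial count, and the fixed random seed.

```python
import random


def permutation_test(a, b, statistic, trials=2000, seed=0):
    """Estimate a p-value for the hypothesis that a and b share a distribution.

    The pooled samples are repeatedly shuffled and split into two groups of
    the original sizes; the p-value is the fraction of shuffles whose
    statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (hits + 1) / (trials + 1)


def mean_gap(a, b):
    """Stand-in statistic: absolute difference of the sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


real = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
synthetic = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
print(permutation_test(real, synthetic, mean_gap))
```

In a model-validation setting, a small p-value would suggest that the synthetic sample can be told apart from the real one. Repeating the test across parameter offsets and recording the rejection rate gives an empirical power function.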
|
|
|
T. M. Breuel, ``Handwriting recognition on US census forms,'' in
Mathematical Morphology and its applications to image and signal processing:
Proceedings of the Fifth International Symposium on Mathematical Morphology
(ISMM 2000), June 2000.
Invited Plenary Talk.
BibTeX entry
|
|
|
K. Popat and D. S. Bloomberg, ``Two-stage lossy/lossless compression of
grayscale document images,'' in Mathematical Morphology and its
applications to image and signal processing: Proceedings of the Fifth
International Symposium on Mathematical Morphology (ISMM 2000), June 2000.
BibTeX entry,
PDF,
PS
|
|
|
H. S. Baird and F. Chen, ``Document image retrieval.'' Special Issue of
Information Retrieval journal, Vol. 2, Nos. 2/3, May 2000.
BibTeX entry
We are pleased to offer a selection of research papers on current
topics at the interface between the information retrieval (IR) and
document image analysis (DIA) fields. These two R&D fields have
benefited from mutual awareness and interdisciplinary collaboration
since 1992, when they were consciously brought together for this
purpose by Dr. Thomas Nartker and the staff of the Information
Sciences Research Institute of the University of Nevada in an influential
series of annual Symposia on Document Analysis and Information Retrieval
(1992-1996, Las Vegas). Since then, similar purposes have been
served by the gradually expanding annual SPIE Conference on Document
Recognition and Retrieval (1992-present, San Jose). We hope that
our special issue may also play a constructive role in stimulating
and amplifying substantive technical interactions between our two
communities.
Classical IR methods commonly operate on passages of plain,
correctly encoded text. If however that text is represented by
images of paper documents, then fresh technical issues and
opportunities for innovative solutions arise both for IR and DIA
researchers. These issues and the approaches that have been used to
resolve them are enumerated in M. Mitra and B.B. Chaudhuri's
critical survey of the literature on information retrieval from
document images.
Limitations of current DIA technology can compromise text-based
IR results. In particular, text extracted by machine vision means
(e.g. OCR systems) is usually corrupt to some degree. Paul B. Kantor
and Ellen M. Voorhees provide an overview of a TREC (Text REtrieval
Conference) track that evaluates the impact of this fact on text-based
IR performance, comparing different retrieval methods operating on page
images to which OCR techniques have been applied.
For DIA researchers, location and recognition of text in complex
document images remains a central challenge. With the advent of the
Internet have come document representations (e.g. HTML) which combine
encoded text and text-as-image (e.g. GIF, JPEG). As rendered by browsers,
it is often not apparent to the human reader when text-as-image is
employed - yet it is natural to want to search on it as easily as
on encoded text. Web authors often incorporate text-as-image
representations for visual appeal, but the open-ended variety in design
that this leads to increases the challenge in locating and recognizing
such text. Daniel Lopresti and Jiangying Zhou describe one of the
earliest sustained attempts to put text-as-image on an equal footing
with encoded text in WWW images for retrieval purposes.
Classical IR systems employ a number of techniques, such as the
bag-of-words model which is of course oblivious to reading order; also,
in domain-specific IR, custom non-stop-word lexica may be used.
Within the DIA world, it is notoriously harder to preserve reading order
than to isolate individual words, and it is sometimes easier to spot
whole-word images belonging to a constrained lexicon than to provide
exhaustive character-by-character transcriptions. Thus IR methods can
inform DIA strategies: for example, spotting words from a constrained
lexicon for use in a bag-of-words model. William J. Williams, Eugene
J. Zalubas, and Alfred O. Hero, III discuss research motivated in this
way, to spot content words reliably in low-quality document images
captured by FAX machines.
Textual content that appears in printed pages may be provided with
significant contextual clues, for example in the organization of the page
layout, in typographic design choices, and in proximity to images or graphics.
Such non-textual ``meta-data'' is often multi-dimensional and multimedia.
We are pleased to present two papers focused on this rapidly advancing
frontier of DIA research. Jianying Hu, Ramanujan Kashi, and Gordon Wilfong
describe methods for the comparison and classification of documents based
not on textual contents but on page-layout similarities. As with many IR
methods, their image-based approach supports not only retrieval but
categorization and information extraction. Rohini K. Srihari, Zhongfei Zhang,
and Aibing Rao discuss the indexing and retrieval of documents in an even
broader setting, where the documents themselves are multimodal, that is
containing both text and images of natural scenes. In this task, it is
the images, not the text, which are the targets of retrieval. Text is
both located and recognized, but principally as collateral data accompanying
the sought-for images.
The six papers of this special issue are representative, we believe, of
the best current research at the boundary between the IR and DIA fields,
but they do not exhaust its potential topics. For some retrieval purposes
it is best to operate directly on the document image - even compressed
forms of the image - rather than on its textual content, for example to
identify nearly identical documents or search for semi-textual ``terms''
such as logos or handwritten annotations and signatures. Categorization
of (multi-page) document images, as a prelude to retrieval or for other
purposes, is a significant area of current work only glancingly touched
on in these papers. The DIA community's long experience in two-dimensional
and multi-modal content analysis deserves to find application in text-based
IR (e.g. tabular data, HTML). And, surely, insights that arose first in
text-based retrieval will continue to stimulate innovative image-based
document processing.
We are grateful for the patience of the authors during a long-protracted
review process, for the thoughtful service of two dozen anonymous
referees, and for David S. Doermann's generously offered advice in the
early stages of planning for this issue. We thank the Editors of the
IR Journal for inviting us to assemble the issue. We are confident
that its themes will remain provocative and thus that it will be succeeded
by other special issues, in this and other journals, similarly focused
on the boundary between information retrieval and document image analysis.
|
|
|
P. Sarkar and G. Nagy, ``Classification of style-constrained pattern-fields,''
in Proceedings of the fifteenth ICPR, (Barcelona), pp. 859-862, IEEE
Computer Society Press, 2000.
BibTeX entry,
Available here
|
|
|
P. Sarkar, Style consistency in pattern fields.
PhD thesis, Rensselaer Polytechnic Institute, Troy, NY, 2000.
BibTeX entry,
Available here
|
|
|
T. Berger, P. Chou, M. Effros, N. Farvardin, T. Fischer, W. R. Gardner, R. M.
Gray, N. S. Jayant, R. Laroia, U. Madhow, M. W. Marcellin, J. W. Modestino,
D. L. Neuhoff, A. Orlitsky, K. Popat, K. Ramchandran, J. A. Storer,
V. Vaishampayan, K. Zeger, and Z. Zhang, ``Workshop report: NSF sponsored
workshop on joint source-channel coding,'' tech. rep., California Institute
of Technology, October 1999.
BibTeX entry,
PDF,
PS
|
|
|
H. S. Baird, ``Document image quality: Making fine discriminations,'' in
Proceedings of the IAPR 1999 International Conference on Document Analysis
and Recognition (ICDAR 1999), (Bangalore, India), pp. 459-462, September
1999.
BibTeX entry,
PDF,
PS
We estimate empirically the smallest changes in document image quality
that can be distinguished reliably and fully automatically by Kanungo's
bootstrapping method [Kan96].
Six parameters of a physics-based document-image degradation model [Bai92]
are varied, one at a time: for each, over a range of parameter-value
differences, two sets of synthetic images are generated pseudorandomly
and the two sets tested for statistical equivalence using Kanungo's method.
The rate at which Kanungo's method rejects the hypothesis that the two
sets are drawn from the same distribution is analyzed as a function
of parameter difference (a specialized ``power function'').
The finest discriminations afforded by the method are given
by the width of the power function at a low fixed reject threshold.
The data show that remarkably fine discriminations are possible
- often subtler than are evident to visual inspection -
for all six parameters.
As few as 25 reference images are sufficient.
These results suggest that Kanungo's method is sufficiently sensitive
to a wide range of physics-based image degradations to serve as
an engineering foundation for many image-quality estimation and
OCR engineering purposes.
|
|
|
P. Sarkar and G. Nagy, ``Heeding more than the top template,'' in
Proceedings of the Fifth International Conference on Document Analysis and
Recognition, (Bangalore, India), September 1999.
BibTeX entry,
Available here
|
|
|
H. S. Baird, ``Model-directed document image analysis,'' in Proceedings of
the DOD-sponsored Symposium on Document Image Understanding Technology (SDIUT
1999), (Annapolis, MD), April 1999.
Invited published talk.
BibTeX entry,
PDF,
PS
If current OCR engineering trends continue, then, we believe,
so-called ``general-purpose'' systems - that is, fully automatic and
nonretargetable systems -
will leave many potential users unsatisfied, and lucrative
application niches unfilled, for years to come.
However, for users who care enough to volunteer some manual effort -
to help customize the system to their document(s) - significantly higher
accuracy may be achievable, without delay.
We discuss in detail two state-of-the-art document recognition systems
- Lucent Technologies' Table Reader System (TRS)
and Xerox's ``document image decoding'' (DID) research prototype -
which yield high accuracy by reliance on explicitly stated models
of properties of the target document, whether iconic (known
typefaces and image degradations), geometric (restricted classes of
layouts), or symbolic (linguistic and pragmatic contextual constraints).
How great are the performance advantages that can be realized by sacrificing
automation in these ways? To what extent can the necessary customizations
be (semi-)automated? We outline recent and planned research at Xerox PARC
motivated by these questions.
|
|
|
G. Nagy and P. Sarkar, ``Modeling statistical dependence in pattern
classification,'' in Proceedings of the IAPR Workshop on Statistical
Methods for Image Processing, (Uppsala), 1999.
BibTeX entry
|
|
|
J. Kanai and H. S. Baird, ``Document image understanding and retrieval.''
Special Issue of Computer Vision and Image Understanding journal, Vol. 70,
No. 3, June 1998.
BibTeX entry
Vast archives of information, handwritten and machine printed on paper,
have accumulated over centuries. Advances in computer and
communication technologies now offer drastically improved ways to store,
retrieve, and distribute their contents. Billions of paper documents wait
to be made accessible via electronic media.
Document image understanding and retrieval research seeks to discover
methods for automatically extracting and organizing information from
handwritten and machine printed paper documents containing text, line
drawings, maps, music scores, etc. Its characteristic problems
include some of the earliest attacked by computer-vision pioneers.
The field has long been distinguished by close and productive ties
between the academic and commercial communities. Today, document analysis
research supports a viable industry which, stimulated by the growing
demand for digital archives, the proliferation of inexpensive personal
document scanners, and the ubiquity of FAXes, is poised for rapid growth.
But the performance of these technologies still lags far behind human
abilities. Many technical problems, critically important on both
theoretical and practical grounds, remain open.
We are pleased to offer a collection of state-of-the-art papers
touching on topics of current research interest. We begin with
Doermann's up-to-date critical survey of the literature
on document image retrieval, which reveals the rich interplay between the
document analysis and information retrieval research communities.
One example of this genre is the strikingly versatile language-independent
text categorization system described by Bayer, Kressel, Mogg-Schneider,
and Renz.
Chen and Bloomberg show that English-language textual document images
can be summarized without any resort to image pattern recognition
(this won the Outstanding Paper award at the 1997 IAPR International
Conference on Document Analysis and Recognition).
Such surprising instances of non-trivial yet ``OCR-free''
document processing may be harbingers of a new generation of
architectures for document analysis systems.
Document images are usually compressed before being exchanged and
archived. It is sometimes possible to analyze compressed document
images without fully decompressing them. Spitz demonstrates that non-trivial
characteristics, such as skew angles and specially designed logos, can
be extracted directly and extremely rapidly from images compressed by
the CCITT Group III and IV methods. Kia, Doermann, Rosenfeld, and
Chellappa provide a
compression technique for document images which explicitly enables
such ``compressed-domain'' processing, as one of several improvements
to a symbolic-compression system.
The great variety of geometric arrangements of text blocks on printed pages
poses daunting challenges. Antonacopoulos' fast `white-space tiling'
method copes well with an unusually wide range of skewed, non-rectangular
layouts.
Kise and Sato, motivated by similar goals and similarly choosing to
analyze the white background, prove that methods based on area
Voronoi diagrams are also effective - an example of the continuing
relevance of computational geometry to document analysis.
Of course the generic problem of segmentation - the partitioning of complex
images into regions which we can more easily recognize or analyze further -
pervades the field. Hu and Yan attempt the segmentation of handwriting,
in off-line (static) images, into individual characters.
Hidden Markov model (HMM) techniques, having been applied with notable
success in speech recognition, are increasingly being adapted to
selected sub-problems in document analysis. Knerr, Augustin, Baret,
and Price apply HMMs to word recognition in handwritten checks.
The robust detection of `graphical primitives' such as straight lines
and circular arcs is an inescapable subtask in graphics recognition.
Wenyin and Dori present a painstaking study of software-engineering
aspects arising in algorithms for this purpose. Ogier, Mullot, Labiche,
and Lecourtier give an architectural tour of a complete system for the
knowledge-guided interpretation of city maps, putting to use some general
principles of human visual perception. This is one of many experiments
within our field - still cautiously exploratory - in the exploitation
of cognitive science.
These eleven papers were subjected to the exhaustive CVIU process of review
and revision. We would like to thank, enthusiastically, our twenty-seven highly
professional referees for admirable devotion to their anonymous duties,
and the authors for responding to the referees' advice gracefully
and thoroughly. Finally, but no less ardently, we are grateful to
Editor-in-Chief Avi Kak for his kind invitation to us to assemble these
papers, and to him and Karen Rado and other CVIU Editorial Office staff for their
unfailing support and advice during our protracted labor.
We hope that this Special Issue will stimulate greater understanding,
mutual interest, and collaboration between the computer vision and
document image analysis research communities. Only a decade ago these
now divergent communities were unified. We continue to share
our most strongly held aspiration: to build machines able to infer
complete and highly accurate interpretations of the contents of complex
images - whether scenes of the 3-D physical world or the `visible
speech' of 2-D documents.
|
|
|
T. K. Ho and H. S. Baird, ``Pattern classification with compact distribution
maps,'' Computer Vision and Image Understanding, vol. 70, pp. 101-110,
March 1998.
BibTeX entry
A difficult problem in classification is representing
the class-conditional distributions concisely and faithfully. We propose a
way of mapping such distributions and its use in constructing a similarity
metric. A classifier using this metric can achieve low error rates and useful
confidence scores permitting reliable reject behavior. We illustrate the method
by an application in a challenging character recognition problem with
thousands of classes. For applications to arbitrary domains, we present a
method to automatically construct feature transformations that are suitable
for such mappings.
|
|
|
P. Sarkar, G. Nagy, J. Zhou, and D. Lopresti, ``Spatial sampling of printed
patterns,'' IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 20, pp. 344-351, March 1998.
BibTeX entry,
Available here
|
|
|
G. Nagy, A. Samal, S. Seth, T. Fisher, E. Guthman, K. Kalafala, L. Li,
P. Sarkar, and Y. Xu, ``A prototype for adaptive association of street names
with streets on maps,'' in Graphics Recognition: Algorithms and Systems
(K. Tombre and A. Chhabra, eds.), vol. 1389 of Springer Lecture Notes in
Computer Science, pp. 302-313, 1998.
BibTeX entry
|
|
|
T. K. Ho and H. S. Baird, ``Large-scale simulation studies in image pattern
recognition,'' IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 19, pp. 1067-1079, October 1997.
BibTeX entry
Many obstacles to progress in image pattern recognition
result from the fact that the per-class distributions are often too irregular
to be well-approximated by simple analytical functions. Simulation studies
offer one way to circumvent these obstacles. We present three closely related
studies of machine-printed character recognition that rely on synthetic
data generated pseudorandomly in accordance with an explicit stochastic
model of document image degradations. The unusually large scale of experiments
- involving several million samples - that this methodology makes possible
has allowed us to compute sharp estimates of the intrinsic difficulty
(Bayes risk) of concrete recognition problems, as well as the asymptotic
accuracy and domain of competency of classifiers.
|
|
|
J. Y. Zhou, D. Lopresti, P. Sarkar, and G. Nagy, ``Spatial sampling effects on
scanned 2-D patterns,'' in Advances in Visual Forms Analysis
(C. Arcelli, L. P. Cordella, and G. S. di Baja, eds.), Singapore: World
Scientific, 1997.
BibTeX entry,
Available here
|
|
|
D. Lopresti, J. Zhou, G. Nagy, and P. Sarkar, ``Spatial sampling effects in
optical character recognition,'' in Proceedings of the Third
International Conference on Document Analysis and Recognition, pp. 309-314,
1995.
BibTeX entry,
Available here
|
|
|
G. E. Kopec and P. A. Chou, ``Document image decoding using Markov source
models,'' IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 16, pp. 602-617, June 1994.
BibTeX entry,
PDF,
PS
|
|
|
P. Sarkar, ``Random phase spatial sampling effects in digitized patterns,''
Master's thesis, Rensselaer Polytechnic Institute, 1994.
BibTeX entry
|
|
|
T. Kanungo, H. S. Baird, and R. M. Haralick, ``Performance evaluation: Theory,
practice, and impact.'' Special Issue of Int'l J. on Document Analysis and
Recognition.
In Press, November 2001.
BibTeX entry
The document image analysis research community has been
distinguished for over a decade by a serious and sustained
commitment to sound methodologies for measuring the
performance of algorithms and systems. Objective, quantitative,
and standardized performance evaluation methods are essential
aids in our attempts to understand the behavior of our
systems, predict their future performance, compare rival systems,
identify the particular strengths and weaknesses of proposed
technologies, and track the progress of our community's research
achievements from year to year. We feel that the time is ripe
to offer, in a journal special issue, a selection of the strongest
papers having, as their principal theme, performance evaluation
theory, practice, or impact in a large-scale application.
The manuscripts submitted were reviewed by highly qualified expert
referees in a thorough two-stage review procedure.
The articles that we have been happy to accept all enjoy a combination
of originality, high technical merit, and clear relevance to the topic.
Evaluating geometrical page-layout segmentation algorithms is a
challenging task, in part due to the diversity of metrics that have
been plausibly proposed for measuring the similarity of two segmentations.
J. Hu, R. Kashi, D. Lopresti, and G. Wilfong discuss a methodology
for evaluating systems that extract tables from document images in
their article ``Evaluating the performance of table processing
algorithms.'' One of their innovations, applicable to a wide variety
of layout segmentation tasks, is to probe two
``table graphs'' at random, counting similarities and dissimilarities,
to accumulate a statistical measure of match.
In ``Large scale address recognition systems - truthing, testing,
tools and other evaluation issues,'' S. Setlur, A. Lawson,
V. Govindaraju, and S. Srihari describe the methodology they used for
evaluating, on a stunningly large scale, a USPS postal address
recognition system. Their methodology samples a live stream of
postal images to create ground-truthed images and evaluates the
system using encoding rate and error rate metrics.
When, as not infrequently happens, the number of original ground-truth
documents available for experimentation is severely limited, performance
evaluation results, however carefully calculated, can be inaccurate
and misleading.
In their article ``A statistical approach to the generation of a
database for evaluating OCR software,'' F. S. Brundick, A. E. M. Brodeen,
and M. S. Taylor propose a bootstrapping approach to the generation
of sufficiently large databases of synthetic ground-truthed documents.
These documents can then be printed and scanned to acquire test images,
each corresponding by construction to known ground truth.
Contributing an interesting variation to the large literature on
applications of string-matching algorithms
to DIA problems, C. Fang, C. Liu, L. Peng, and X. Ding present
a specialized algorithm, in ``Automatic performance evaluation of
printed Chinese recognition systems,'' that assists researchers in
the evaluation and characterization of character-segmentation errors.
In ``An empirical measure of performance of document image segmentation
algorithm,'' A. K. Das, S. K. Saha, and B. Chanda argue for a new
graph-based evaluation metric for page-layout segmentation algorithms,
and provide an algorithm to compute the metric.
Statistical classifiers form an integral part of many document image
analysis systems.
In their article ``Performance evaluation of pattern classifiers for
handwritten character recognition,'' Liu, Sako and Fujisawa compare the
performance of well-known statistical classifiers as a function of
training sample size, outlier resistance, and ambiguity rejection.
Agreement within the research community on standardized metrics, datasets,
and software tools is of course an essential foundation for the most
effective use of evaluation methodology.
S. Mao and T. Kanungo, in ``Software architecture of PSET: A page
segmentation evaluation toolkit,'' describe in detail the rationale and
architecture of public-domain software tools offered for use in the
evaluation of a broad class of page-layout segmentation algorithms.
We would like to thank all the authors who submitted manuscripts to this
special issue. We regret that only a fraction of the submissions, all
of them interesting, could be included. We owe a special debt of
gratitude to the many able reviewers who generously commented, often
in extraordinary detail, on the submissions. Finally, we would like to
acknowledge the good-hearted patience of the authors and of the journal's
editors in spite of this special issue's long gestation.
|
|