Quantitative Content Analysis Area


Selected Publications

Authoritativeness Estimation

Ayman Farahat, Geoff Nunberg, Francine Chen, and Charles Mathis, "AuGEAS (AUthoritativeness Grading, Estimation, and Sorting)," Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002), 2002, pp. 194-202.

Topic Analysis

Thorsten Brants, Francine Chen, Ioannis Tsochantaridis, "Topic-Based Document Segmentation with Probabilistic Latent Semantic Analysis," Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002), 2002, pp. 211-218.

Thorsten Brants, Francine Chen, Ayman Farahat, "Arabic Document Topic Analysis," Proceedings of Arabic Language Resources and Evaluation: Status and Prospects (LREC-2002), Las Palmas, Spain, 2002.

Eric Gaussier, Cyril Goutte, Kris Popat, Francine Chen, A hierarchical model for clustering and categorising documents, Proceedings of 24th European Colloquium in IR research, 2002.

Kristina Toutanova, Francine Chen, Kris Popat, Thomas Hofmann, Text classification in a hierarchical mixture model for small training sets, Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2001.

Multi-Modal Scatter/Gather

Francine Chen, Ullas Gargi, Les Niles, Hinrich Schuetze, Multi-modal browsing of images in web documents, Proceedings of SPIE Document Recognition and Retrieval VI, Vol. 3651, pp. 122-133, 1999.

Statistical Natural Language Processing

Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge MA, 1999.

Clustering

Hinrich Schütze, The hypertext concordance: a better back-of-the-book index. Proceedings of Computerm '98, pages 101-104, 1998.

Hinrich Schütze and Craig Silverstein, Projections for efficient document clustering, Proceedings of SIGIR, pages 74-81, 1997.

Document Image Summarization

Francine R. Chen and Dan S. Bloomberg, Extraction of indicative summary sentences from imaged documents, Proceedings of the IAPR 1997 International Conference on Document Analysis and Recognition (ICDAR97), pp. 227-232, Aug 1997.

Dan S. Bloomberg and Francine R. Chen, Document image summarization without OCR, Proceedings of the IEEE International Conference on Image Processing (ICIP96).

Francine R. Chen and Dan S. Bloomberg, "Summarization of imaged documents without OCR", Computer Vision and Image Understanding Vol. 70, no. 3, pp. 307-320, 1998.

Genre

Brett Kessler, Geoff Nunberg, and Hinrich Schütze, Automatic Detection of Genre, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Meeting of the European Chapter of the Association for Computational Linguistics, pages 32-38, Morgan Kaufmann Publishers, San Francisco CA, 1997.

The Future of the Book

Geoff Nunberg (editor), The Future of the Book, University of California Press, 1996.

Text Summarization

Julian Kupiec, Jan Pedersen and Francine Chen, A trainable document summarizer, SIGIR95, and in Advances in Automatic Text Summarization, Mani and Maybury, eds., 1999.

Scatter/Gather

Marti A. Hearst, David R. Karger, and Jan O. Pedersen, Scatter/Gather as a Tool for the Navigation of Retrieval Results, Proceedings of the 1995 AAAI Fall Symposium on Knowledge Navigation.

D. Cutting, D. Karger, and J. Pedersen, Constant interaction-time scatter/gather browsing of large document collections, Proceedings of SIGIR'93.

D. Cutting, D. Karger, J. Pedersen, and J.W. Tukey, Scatter/Gather: A cluster-based approach to browsing large document collections, Proceedings of SIGIR'92.

Word Sense Disambiguation

Hinrich Schütze, Automatic Word Sense Discrimination, Computational Linguistics, Volume 24, Number 1, pages 97-124, 1998.

Hinrich Schütze and Jan O. Pedersen, Information Retrieval Based on Word Senses, Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 161-175, Las Vegas NV, 1995.

Hinrich Schütze, Dimensions of Meaning, Proceedings of Supercomputing, pages 787-796, Minneapolis MN, 1992.

Classification

David Hull, Jan O. Pedersen and Hinrich Schütze, Method Combination for Document Filtering, Proceedings of SIGIR, pages 279-298, 1996.

Hinrich Schütze, David Hull and Jan O. Pedersen, A Comparison of Classifiers and Document Representations for the Routing Problem, Proceedings of SIGIR, pages 229-237, 1995.

Marti A. Hearst and Hinrich Schütze, Customizing a Lexicon to Better Suit a Computational Task, In Branimir Boguraev and James Pustejovsky, "Corpus Processing for Lexical Acquisition", MIT Press, Cambridge MA, to appear.

Word-Image Spotter

Francine Chen, Dan Bloomberg, and Lynn Wilcox, Detection and location of multi-character sequences in lines of imaged text, Journal of Electronic Imaging, Jan 1996.

Speaker Segmentation

Lynn Wilcox, Francine Chen, Don Kimber, and Vijay Balasubramanian, Segmentation of speech using speaker identification, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP94).

Question Answering

Julian Kupiec, MURAX: A Robust Linguistic Approach For Question-Answering Using An On-Line Encyclopedia, Proc. SIGIR-93, Pittsburgh, PA, June 27 - July 1, 1993, pp. 181-190.

Context-Free Grammars

Julian Kupiec, An Algorithm for Estimating the Parameters of Unrestricted Hidden Stochastic Context-Free Grammars, Proceedings of the 15th International Conference on Computational Linguistics (COLING-92), Nantes, France, Aug 23-28, 1992, pp 387-393.
PARC, 3333 Coyote Hill Road, Palo Alto, CA 94304