Style consistency in pattern fields

Prateek Sarkar

Abstract

In many pattern classification tasks, groups or {\em fields} of patterns have a common origin. Words, lines or documents in the same font, by the same writer or printed with the same device are examples of fields of characters. Measurements or observations on different patterns (pattern-features) that co-occur in a field bear traits of the common origin. They are therefore not independently distributed, but related through an underlying {\em style} that can be attributed to the origin. Traditionally, patterns in a field are classified without modeling such inter-pattern feature dependence. In {\em style-conscious classification} we classify entire fields at a time so that each pattern can affect and guide the classification of other patterns in the field by furnishing information about the underlying style, in addition to any linguistic context. Pattern-features are concatenated to form field-features. We model field-feature distributions of each field-class as a mixture of style-conditional distributions, weighted by style-probabilities. Within each style we model the field-feature distribution as an independent combination (product) of the pattern-feature distributions. We implement style-conscious classification using Gaussian basis functions. Classifier parameters are estimated from training fields that are not labeled by style, using Expectation-Maximization, where the contribution of a pattern to each Gaussian constituent is weighted by both the probability of the constituent given the pattern-feature, and the probability of the style given the field-feature. On five-font, ten-class machine-printed Arabic numerals with simple moment features, style-conscious classification reduced errors by nearly 25\%. On ten-class, hand-printed numerals, systems trained on 150 writers reduced pattern-errors by up to 19\% on a test-set comprising 150 new writers. 
We show, by simulation, that longer fields, well-separated class distributions, fields composed of more than one class, and inter-style class confusions all favor style-conscious classification. Other candidate applications of style-conscious classification are printed text, handwriting and speech recognition.
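
As a rough illustration of the field model described in the abstract, the sketch below scores a field under every candidate labeling as a style-weighted mixture, where each style contributes a product of per-pattern Gaussian densities. It is a toy under stated assumptions and not the thesis implementation: features are one-dimensional, all Gaussians share one standard deviation, field-classes are equally likely (no linguistic context), and the names `gaussian_pdf`, `field_likelihood`, `classify_field`, `MEANS`, `STYLE_PRIORS` and `SIGMA`, along with all numeric values, are hypothetical.

```python
import math
from itertools import product


def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def field_likelihood(field, labels, means, style_priors, sigma):
    """p(field | labels): a mixture over styles; within each style the
    patterns are treated as independent, so their densities multiply."""
    total = 0.0
    for style, prior in enumerate(style_priors):
        within_style = 1.0
        for x, c in zip(field, labels):
            within_style *= gaussian_pdf(x, means[style][c], sigma)
        total += prior * within_style
    return total


def classify_field(field, means, style_priors, sigma):
    """Label the whole field at once by exhaustive search over labelings
    (assumes equal prior probability for every field-class)."""
    num_classes = len(means[0])
    candidates = product(range(num_classes), repeat=len(field))
    return max(
        candidates,
        key=lambda labels: field_likelihood(field, labels, means, style_priors, sigma),
    )


# Hypothetical toy parameters: MEANS[style][class].
# Class 1 in style 0 and class 0 in style 1 share the same mean (2.0),
# which is an inter-style class confusion.
MEANS = [[0.0, 2.0], [2.0, 4.0]]
STYLE_PRIORS = [0.5, 0.5]
SIGMA = 0.5

if __name__ == "__main__":
    print(classify_field((0.1, 2.0), MEANS, STYLE_PRIORS, SIGMA))
    print(classify_field((2.0, 3.9), MEANS, STYLE_PRIORS, SIGMA))
```

With these toy parameters, a pattern with feature value 2.0 is equally likely under either class when it is scored in isolation with the styles averaged out. Scored jointly with a second pattern from the same field, the companion pattern indicates the style and the ambiguity is resolved. The exhaustive search grows exponentially with field length, so it is only practical for short fields in a sketch like this one.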

Download thesis

PostScript (9M) / gzipped (1M) / Slides

Bibtex entry

@phdthesis{sarkar:phdthesis
, author = "P. Sarkar"
, title = "Style consistency in pattern fields"
, school = "Rensselaer Polytechnic Institute"
, address = "Troy"
, year = "2000"
}
Last modified: Wed Mar 7 16:44:51 PST 2001