Interactive Information Services Using World-Wide Web Hypertext

A Folk Music Database

The Digital Tradition Folk Song Server is a World-Wide Web application that provides an interactive interface to a database of 4000 songs. It provides full text search of lyrics, category keyword search, and audio retrieval of melodies [Putz2]. The original Digital Tradition database is distributed as a single-user standalone application for the IBM-PC. For the WWW version, the song lyrics and melodies were converted to a simple ASCII format, and a Plexus HTTP server module was created using the Perl scripting language [Wall1].

Data Representation

Each song has a title, lyrics, and usually one or more category keywords such as "ballad" or "sailor". Most of the songs also have at least one melody stored in the database, represented in a simple ASCII music notation. The lyrics and melodies are stored in two large files, each with a companion table of contents file. Since the total amount of text is less than 6 million characters, a typical full text search can be performed in about four seconds, assuming the text is stored on a local hard disk. Title and keyword searches can be performed in a fraction of a second, since the server keeps the list of titles and keywords in memory. This kind of linear scan is well suited to many applications of this size, but it would be too slow for much larger amounts of data, where a more sophisticated inverted word index would be more appropriate.
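The storage scheme described above can be sketched roughly as follows. This is an illustration in Python, not the server's Perl code: the table of contents layout (title, offset, length), the function names, and the two sample songs are all assumptions made for the example, and the text is held in memory rather than read from disk.

```python
import re

def build_store(songs):
    """Concatenate song texts into one blob and record (title, offset, length).

    The (title, offset, length) layout is a hypothetical table of contents
    format; the paper does not describe the real one.
    """
    parts, toc, offset = [], [], 0
    for title, text in songs:
        toc.append((title, offset, len(text)))
        parts.append(text)
        offset += len(text)
    return "".join(parts), toc

def linear_scan(blob, toc, pattern):
    """Return the titles of songs whose text matches pattern.

    Every song is examined in turn, so the cost grows with the total
    amount of text; no word index is consulted.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [title for title, start, length in toc
            if regex.search(blob[start:start + length])]

# Illustrative data only, not taken from the database files.
SONGS = [
    ("Drunken Sailor", "What shall we do with a drunken sailor\nEarly in the morning\n"),
    ("Shenandoah", "Oh Shenandoah, I long to see you\nAway, you rolling river\n"),
]
BLOB, TOC = build_store(SONGS)
print(linear_scan(BLOB, TOC, r"\bsailor\b"))
```

An inverted word index would replace the loop in `linear_scan` with a lookup from each word to the songs that contain it, at the cost of building and storing that index.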

Digital Tradition User Interface

The Digital Tradition server provides different kinds of information displays depending on what information is being presented. Each display is presented as a formatted HTML document with hypertext links. The two main display formats are a search query and results page and a song lyrics display. There are also lists of category keywords, song titles and song tunes which allow browsing the database without performing a search.

Full Text Searches

This application provides a word-based text search capability. Although the underlying implementation uses regular expressions to find occurrences of search terms, a much simpler query syntax is provided. While regular expressions can be very powerful for specifying search patterns, most end users are unprepared to deal with the confusing and arcane syntax of regular expressions. And the vast majority of users' searches are for simple word or phrase combinations anyway.

The server accepts queries using the standard HTTP convention of ending a URL with a question mark ("?") character followed by the query terms. Adjacent words in a query are interpreted as a phrase (i.e. the words must occur together in the specified order for a match to occur). When the special character "&" is used between query terms (indicating an AND operation), the terms may occur separated and in any order. When the special character "|" is used between terms (indicating an OR operation), only one of the terms need occur for a match. Normally each word in a query must exactly match an entire word in the text (ignoring punctuation and case). However, the special character "*" will match any sequence of letters or digits. This is most useful for matching variations of a word, such as singular or plural endings.
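One way to translate this query syntax into regular expressions is sketched below in Python. It is not the server's actual Perl implementation. The function names are hypothetical, and the sketch assumes that "&" binds more tightly than "|", since the text does not say how the two operators combine.

```python
import re

ALNUM = "A-Za-z0-9"

def phrase_to_regex(phrase):
    """Turn adjacent query words into one pattern that matches them in order."""
    parts = []
    for word in phrase.split():
        # "*" stands for any run of letters or digits; the rest is literal.
        parts.append(re.escape(word).replace(r"\*", "[" + ALNUM + "]*"))
    # Words may be separated by spaces or punctuation, and the phrase must
    # begin and end on whole-word boundaries.
    body = ("[^" + ALNUM + "]+").join(parts)
    return "(?<![" + ALNUM + "])" + body + "(?![" + ALNUM + "])"

def matches(query, text):
    """Evaluate a query as an OR of AND groups of phrases (assumed precedence)."""
    for alternative in query.split("|"):
        terms = [t for t in alternative.split("&") if t.strip()]
        if terms and all(re.search(phrase_to_regex(t), text, re.IGNORECASE)
                         for t in terms):
            return True
    return False

TEXT = "What shall we do with a drunken sailor, early in the morning?"
print(matches("drunken sailor", TEXT))    # phrase, words in order
print(matches("morning & sailor", TEXT))  # AND, any order
print(matches("sail*", TEXT))             # wildcard for word endings
```

Keeping the translation in one small function means end users never see the generated patterns, which is the point of offering the simpler syntax.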

Category Keyword and Title Searches

Most songs in the database are categorized with one or more keywords used to group songs by topic or genre. All lyrics and title searches also check each song's list of keywords. It is also possible to search explicitly for keywords by using an "@" character as a prefix (e.g. "@sailor"). This is useful for keywords (such as "father") which occur frequently in lyrics on other topics. This categorization scheme and the use of "@" to mark category keywords was inherited from the original PC-based version of the database, and it works well with the regular expression searches used by the WWW server.
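The difference between a plain term and an "@" term can be illustrated with a small Python sketch. The function name, the representation of keywords as a set of lowercase strings, and the sample songs are assumptions for the example; the paper does not describe how the server stores keywords internally.

```python
def term_matches(term, lyrics, keywords):
    """Match a single-word term against one song.

    "@word" consults only the category keywords; a plain word is checked
    against both the lyrics and the keywords.
    """
    word = term.lower()
    if word.startswith("@"):
        return word[1:] in keywords
    lyric_words = {w.strip(".,;:!?\"'()").lower() for w in lyrics.split()}
    return word in lyric_words or word in keywords

# Illustrative songs: the first mentions a father, the second is about one.
song_a = ("My father was a fisherman.", {"sailor"})
song_b = ("Sleep, my baby.", {"father", "lullaby"})

print(term_matches("father", *song_a), term_matches("@father", *song_a))
print(term_matches("father", *song_b), term_matches("@father", *song_b))
```

Here the plain term finds both songs, while the "@" form finds only the song categorized under "father".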

As an alternative to the full text search capability, searches may optionally be performed on just the song titles and keywords. The same query syntax is used as for full text searches. The user can set a display option to specify whether each song's category keywords are listed under the titles.

Search Terms Displayed in Context

Using HTML for presentation of search query results provides a great deal of flexibility. Using the presentation markup capabilities of HTML, the Digital Tradition server shows lines of text that match a search query below each song title with the search terms highlighted. When the full text is displayed, the search terms are highlighted there as well. See the search results and song lyrics examples.
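A minimal Python sketch of this kind of in-context highlighting follows. The choice of the `<b>` tag and the function names are assumptions; the paper does not say which markup the server emits. The sketch expects a non-empty list of words.

```python
import html
import re

def word_pattern(words):
    """Compile a case-insensitive whole-word pattern for the given words."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def highlight(line, words):
    """Return the line as HTML with each matching word wrapped in <b> tags."""
    out, pos = [], 0
    for m in word_pattern(words).finditer(line):
        # Escape the text between matches so "&" and "<" stay valid HTML.
        out.append(html.escape(line[pos:m.start()]))
        out.append("<b>" + html.escape(m.group(0)) + "</b>")
        pos = m.end()
    out.append(html.escape(line[pos:]))
    return "".join(out)

def matching_lines(lyrics, words):
    """Return only the lines that contain a search word, highlighted."""
    pattern = word_pattern(words)
    return [highlight(line, words)
            for line in lyrics.splitlines() if pattern.search(line)]

LYRICS = "What shall we do with a drunken sailor\nEarly in the morning"
print(matching_lines(LYRICS, ["sailor"]))
```

The same `highlight` function can be applied to every line of a song to produce the full lyrics display with the terms marked.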

Audio Output

Currently, the only widely supported format for delivering music via the World-Wide Web is telephone-quality sampled digital audio (8-bit u-law). When a client requests a melody in audio format (by selecting a link), the server converts the stored note list into audio samples using a public domain music software package called Csound [Verc1].
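To give a sense of what this conversion involves, the Python sketch below renders a note list to 8-bit u-law bytes. It does not use Csound: it substitutes a plain sine oscillator, and the note list format (frequency in Hz, duration in seconds), the 8000 Hz sampling rate, and the function names are all assumptions made for illustration. The encoder follows the common G.711-style companding algorithm for 16-bit input.

```python
import math

RATE = 8000   # samples per second; assumed telephone-quality rate
BIAS = 0x84   # standard u-law bias added before companding
CLIP = 32635  # largest magnitude that can be encoded after biasing

def linear_to_ulaw(sample):
    """Compress one signed 16-bit sample into an 8-bit u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent) from the position of the highest set bit.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law bytes are stored with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def render(notes, amplitude=12000):
    """Render (frequency_hz, seconds) pairs as sine tones in u-law bytes."""
    out = bytearray()
    for frequency, seconds in notes:
        for i in range(int(round(seconds * RATE))):
            value = amplitude * math.sin(2 * math.pi * frequency * i / RATE)
            out.append(linear_to_ulaw(int(value)))
    return bytes(out)

audio = render([(440.0, 0.5), (494.0, 0.5)])
print(len(audio))
```

Under these assumptions each second of melody costs 8000 bytes regardless of how few notes it contains, so two half-second notes already occupy 8000 bytes.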

This is a grossly inefficient method of sending simple tunes across a wide area network. A possible improvement is to provide the melodies to clients in Standard MIDI Format, as software MIDI emulators are now becoming available.

Using Hypertext for Documentation and Examples

Authored hypertext documents can be combined with the interactive service in useful ways. Links from the interface point to a variety of pages which use HTML hypertext for documentation about the application. Some of the documentation pages in turn include examples with links that perform sample database queries.
