Spellchecker Proposal

July 16, 2007

The I-Share OPAC Team feels strongly that a spellchecking tool of some kind should be incorporated into the catalog. Spelling errors in search terms have been reported to cause a large number of failed searches. Texas A&M University found, for instance, that 45% of the title searches executed in its Voyager catalog returned a null set, in large part (but not entirely) because of spelling errors. Users often abandon failed searches and move on without knowing why the searches failed, which leaves them both frustrated and unsuccessful in their attempts to find and use our resources for research.

After a cursory examination of other libraries’ implementations of some currently available spellchecking tools, such as Jaunter’s Lucien, Google’s spellchecker, Yahoo’s spellchecker, and aspell, the committee has agreed upon certain characteristics of an ideal spellchecker for I-Share.

In general, we feel an effective spellchecking tool would:

  • Catch common spelling errors, such as
    • letter transpositions (e.g., "untied states" for "united states")
    • inadvertent spaces between parts of words (e.g., "s mith"), and hyphenated, two-word, or compound-word variants (e.g., "full-text" or "full text" vs. "fulltext")
    • errors within phrases, which should be treated intelligently (e.g., ‘vendi vini vici’ should suggest ‘veni vidi vici’, not ‘vend vidi vici’)
  • Automatically perform basic stemming of words (looking for alternate endings)
  • Detect possible errors in both Quick and Advanced searches, even when the search is entered via multiple input fields, or when the misspelling occurs in one word of a multi-word search argument
  • Behave differently depending on the type of search performed
    • For browse searches with no matches, the spellchecker should jump the user to the appropriate spot in the index, and also make “did you mean?” suggestions
      • For author browses, even if a match is found, common alternate spellings of similar names should be suggested in addition to delivering the user to the matched name
  • Be invoked, generally, only if spelling errors are made
    • If keyword searches result in at least one match, the spellchecker should only make “did you mean?” suggestions if the search used a common misspelling (e.g., harrassment)
    • Should not make suggestions on searches with only correct spellings
    • Should return results, regardless, for exactly what was searched, along with “did you mean?” suggestions, rather than making assumptions about what the user was actually searching for
  • Permit us to use our own indexes as spellchecking “dictionaries,” and allow us to use different indexes for different types of searches (e.g., the author index for author browses, but not for title searches). Early results from an examination of Urbana-Champaign’s search logs seem to indicate that the words users most often misspell and fail to correct are names or foreign phrases. For example, in one search Bhagavad-Gita was misspelled several times as Bhagvada-Gita. Such mistakes could be caught by using our own indexes or by including multiple dictionaries.
  • Check spelling against dictionaries in multiple languages, since our collections are multilingual.
  • Ideally, use a double-metaphone or similar phonetic algorithm to increase accuracy, although this may be difficult to check for.
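
As a rough illustration of two of the characteristics above (catching letter transpositions, and using our own indexes as per-search-type dictionaries), the following Python sketch may be helpful. It is not taken from any of the products examined. The index contents, the names INDEXES, osa_distance and suggest, and the distance threshold of 2 are all assumptions made for the example; a production system would draw its vocabulary from the actual catalog indexes.

```python
# Hypothetical vocabularies standing in for the catalog's own indexes.
INDEXES = {
    "title": {"united", "states", "bhagavad", "gita"},
    "author": {"smith", "smyth"},
}


def osa_distance(a: str, b: str) -> int:
    """Edit distance that counts an adjacent-letter swap as one edit
    (the "optimal string alignment" variant of Damerau-Levenshtein)."""
    prev2 = None                      # row i-2, needed for transpositions
    prev = list(range(len(b) + 1))    # row i-1
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur[j] = min(
                prev[j] + 1,          # deletion
                cur[j - 1] + 1,       # insertion
                prev[j - 1] + cost,   # substitution or match
            )
            # Adjacent transposition, e.g. "ti" typed as "it".
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def suggest(term: str, search_type: str, max_distance: int = 2) -> list:
    """Return "did you mean?" candidates from the index for this search type.

    A term already present in the index is treated as correctly spelled,
    so no suggestions are made for it.
    """
    vocabulary = INDEXES[search_type]
    term = term.lower()
    if term in vocabulary:
        return []
    scored = [(osa_distance(term, word), word) for word in vocabulary]
    return [word for dist, word in sorted(scored) if dist <= max_distance]


print(suggest("untied", "title"))    # candidates from the title index
print(suggest("bhagvada", "title"))
print(suggest("smtih", "author"))    # candidates from the author index only
```

Because the author and title vocabularies are kept separate, a misspelled name is matched only against names when the search is an author browse. Comparing every term against a whole index, as this sketch does, would be too slow for a real catalog; it is meant only to show the behavior being asked for.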

We also feel that an effective spellchecker must not negatively impact catalog response time, as this would further discourage users in their searching.
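
One possible way to limit the added cost, consistent with the invocation rules listed above, is to run the expensive dictionary comparison only when a search returns no matches, and otherwise to do nothing more than a constant-time lookup in a list of known common misspellings. The sketch below is hypothetical: COMMON_MISSPELLINGS, did_you_mean and the suggest_fn parameter are names invented for the example, and the search results themselves are assumed to be returned to the user exactly as searched.

```python
# Hypothetical list of frequent misspellings and their corrections.
COMMON_MISSPELLINGS = {"harrassment": "harassment"}


def did_you_mean(terms, hit_count, suggest_fn):
    """Decide which search terms, if any, get "did you mean?" suggestions.

    terms      -- the words of the search, as typed by the user
    hit_count  -- number of matches the search returned as typed
    suggest_fn -- callable returning candidate spellings for one term
                  (assumed to return an empty list for correct spellings)
    """
    suggestions = {}
    for term in terms:
        key = term.lower()
        if key in COMMON_MISSPELLINGS:
            # Cheap lookup: done even when the search found matches.
            suggestions[term] = [COMMON_MISSPELLINGS[key]]
        elif hit_count == 0:
            # Costly comparison: done only for failed searches.
            candidates = suggest_fn(key)
            if candidates:
                suggestions[term] = candidates
    return suggestions


# A successful search containing a common misspelling still gets a hint.
print(did_you_mean(["sexual", "harrassment"], 12, lambda t: []))
```

With this arrangement a search that succeeds and contains no known misspelling incurs only one dictionary lookup per word.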

The I-Share OPAC Team has not been able to identify a commercial product or implementation that can provide spellchecking in both the Quick and Advanced Search. Until such a product is found or developed, however, providing a spellchecker for Quick Search alone would still represent a service improvement: Quick Search is the default search page for most I-Share libraries and is more heavily used than Advanced Search.

Several existing implementations can provide spellchecking for Quick Search. Each requires a different degree of customization and number of person-hours to incorporate into a production system, and each would need closer review by those in charge of implementing this feature.
