Concept search techniques were developed because of the limitations of classical Boolean keyword search technologies when dealing with large, unstructured digital collections of text. Keyword searches often return results that include many non-relevant items (false positives) or that exclude too many relevant items (false negatives) because of the effects of synonymy and polysemy. Synonymy means that two or more words in the same language have the same meaning, and polysemy means that many individual words have more than one meaning.
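To make the two failure modes concrete, here is a minimal sketch of a Boolean AND keyword search in Python. The function names (`tokenize`, `boolean_and_search`) and the three sample documents are illustrative assumptions, not part of any particular search product.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def boolean_and_search(query_terms, documents):
    """Return the indices of documents that contain every query term verbatim."""
    terms = {t.lower() for t in query_terms}
    return [i for i, doc in enumerate(documents) if terms <= tokenize(doc)]

# Hypothetical collection; the user wants documents about combustion.
docs = [
    "The company decided to fire the manager.",    # polysemy: matches, not relevant
    "A blaze destroyed the warehouse overnight.",   # synonymy: relevant, no match
    "The fire spread quickly through the forest.",  # relevant and matches
]

hits = boolean_and_search(["fire"], docs)
print(hits)
```

Under these assumptions the search returns the employment document (a false positive caused by polysemy) and misses the "blaze" document (a false negative caused by synonymy).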
Polysemy is a major obstacle for all computer systems that attempt to deal with human language. In English, most frequently used terms have several common meanings. For example, the word fire can mean a combustion activity, to terminate employment, to launch, or to excite (as in fire up). For the 200 most-polysemous terms in English, the typical verb has more than twelve common meanings, or senses, and the typical noun has more than eight. For the 2000 most-polysemous terms in English, the typical verb has more than eight common senses and the typical noun has more than five.
In addition to the problems of polysemy and synonymy, keyword searches can inadvertently exclude misspelled words as well as variations on the stems (or roots) of words (for example, strike vs. striking). Keyword searches are also susceptible to optical character recognition (OCR) scanning processes, which can introduce random errors into the text of documents (often referred to as noisy text).
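The sketch below illustrates why exact string matching misses stem variations and noisy text, and how a more tolerant comparison could recover them. The suffix stripper `crude_stem` is a toy written for this example (not a real stemming algorithm such as Porter's), and the one-edit tolerance is an arbitrary assumption.

```python
def exact_match(term, text):
    """Plain keyword test: the term must appear as a whole token."""
    return term in text.lower().split()

def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def crude_stem(word):
    """Toy stemmer (assumption): strip one common suffix, keep at least 3 letters."""
    for suffix in ("ing", "es", "ed", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def tolerant_match(term, text, max_edits=1):
    """Match if any token shares the term's crude stem or is within max_edits."""
    target = crude_stem(term)
    return any(crude_stem(tok) == target or edit_distance(tok, term) <= max_edits
               for tok in text.lower().split())

variant = "workers are striking today"   # stem variation of "strike"
noisy = "the str1ke began at noon"       # simulated OCR error: "i" read as "1"

print(exact_match("strike", variant), tolerant_match("strike", variant))
print(exact_match("strike", noisy), tolerant_match("strike", noisy))
```

Both sample sentences fail the exact test and pass the tolerant one. A loose tolerance also raises the risk of false positives, so the threshold is a trade-off rather than a fixed rule.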
A concept search can overcome these challenges by employing word sense disambiguation (WSD) and other techniques to derive the actual meanings of the words, and their underlying concepts, rather than simply matching character strings as keyword search technologies do.
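As one illustration of word sense disambiguation, the following sketch applies a simplified Lesk-style heuristic: it picks the sense whose gloss shares the most words with the surrounding sentence. This is only one of many possible WSD approaches, and the text does not say which one a given concept search system uses. The sense inventory `SENSES`, its glosses, and the stopword list are invented for this example.

```python
import re

# Small hand-picked stopword list (assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "to", "of", "in", "or", "and", "from",
             "his", "her", "was", "is", "for", "by", "with", "on", "at"}

# Hypothetical sense inventory for "fire"; real systems use a lexical database.
SENSES = {
    "fire": {
        "combustion": "flames heat smoke burning of material combustion",
        "dismiss": "terminate the employment of a worker job dismiss",
        "launch": "discharge or launch a weapon gun missile shoot",
    }
}

def content_words(text):
    """Return the set of lowercase alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(word, sentence, senses=SENSES):
    """Choose the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence) - {word}
    scores = {name: len(context & content_words(gloss))
              for name, gloss in senses[word].items()}
    # On a tie, max() keeps the first sense listed in the inventory.
    return max(scores, key=scores.get)

print(simplified_lesk("fire", "The manager had to fire the worker from his job"))
print(simplified_lesk("fire", "Smoke and flames rose as the fire spread"))
```

With this toy inventory, the first sentence resolves to the "dismiss" sense and the second to "combustion", so a search for combustion-related documents could skip the first sentence even though it contains the string "fire".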
