For example, a stemmer for English should identify the string "cats" (and possibly "catlike", "catty", etc.) as based on the root "cat", and "stemmer", "stemming", and "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word "fish".
The stemming process applies a few steps in sequence to stem a given word.
The first step removes plurals and the suffixes -ed and -ing,
e.g.
fishing -> fish, feed -> feed, agreed -> agree
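To make the first step concrete, here is a minimal Python sketch. It is loosely modeled on the published Porter rules, which the steps in this post appear to follow, and not on the Stemmer class itself. The function names (`is_consonant`, `measure`, `has_vowel`, `step1`) are made up for illustration. The sketch also leaves out the rule that restores a final e on short stems, so "hoping" comes out as "hop" rather than "hope".

```python
def is_consonant(word: str, i: int) -> bool:
    """True if word[i] is a consonant; 'y' counts as a vowel after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Count vowel-then-consonant transitions in the stem."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def has_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def step1(word: str) -> str:
    """Simplified first step: strip plurals, then -eed / -ed / -ing."""
    # Plurals: -sses -> -ss, -ies -> -i, -s -> (nothing), but keep -ss.
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]

    # -eed becomes -ee only when the stem before it has measure > 0.
    if word.endswith("eed"):
        if measure(word[:-3]) > 0:
            word = word[:-1]
        return word

    # -ed and -ing are removed only when the remaining stem has a vowel.
    for suffix in ("ed", "ing"):
        if word.endswith(suffix) and has_vowel(word[: -len(suffix)]):
            word = word[: -len(suffix)]
            if word.endswith(("at", "bl", "iz")):
                word += "e"  # e.g. "disabl" -> "disable"
            elif (
                len(word) >= 2
                and word[-1] == word[-2]
                and is_consonant(word, len(word) - 1)
                and word[-1] not in "lsz"
            ):
                word = word[:-1]  # undouble, e.g. "matt" -> "mat"
            break
    return word
```

With this sketch, `step1("fishing")` returns `"fish"`, `step1("feed")` stays `"feed"`, and `step1("agreed")` becomes `"agree"`, matching the examples above.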
The next step turns a terminal y into i when there is another vowel in the stem. The third step maps double suffixes to single ones, so -ization maps to -ize, and so on.
The fourth step deals with -ic-, -ful, -ness, etc. in the same way as step three. The fifth step removes -ant, -ence, etc. from the given word.
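The later steps all follow the same pattern: find a known suffix, check a condition on the stem that is left, and then replace or drop the suffix. The sketch below shows that pattern in Python. The rule tables are deliberately partial, the names (`y_to_i`, `apply_rules`, `DOUBLE_SUFFIXES`, `ADJ_SUFFIXES`, `FINAL_SUFFIXES`) are invented for illustration, and the measure thresholds are taken from the published Porter rules as an assumption, since the post itself does not state them.

```python
def is_consonant(word: str, i: int) -> bool:
    """True if word[i] is a consonant; 'y' counts as a vowel after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Count vowel-then-consonant transitions in the stem."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def y_to_i(word: str) -> str:
    """Turn a terminal y into i when the rest of the word contains a vowel."""
    stem = word[:-1]
    if word.endswith("y") and any(
        not is_consonant(stem, i) for i in range(len(stem))
    ):
        return stem + "i"
    return word


# Partial, illustrative rule tables, ordered longest suffix first.
DOUBLE_SUFFIXES = [("ational", "ate"), ("ization", "ize"),
                   ("fulness", "ful"), ("tional", "tion")]
ADJ_SUFFIXES = [("icate", "ic"), ("ness", ""), ("ful", "")]
FINAL_SUFFIXES = [("ement", ""), ("ance", ""), ("ence", ""),
                  ("ment", ""), ("ant", ""), ("ent", "")]


def apply_rules(word: str, rules, min_measure: int) -> str:
    """Apply the first rule whose suffix matches, if the stem is long enough."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if measure(stem) > min_measure:
                return stem + replacement
            return word  # suffix matched but the stem is too short: leave as is
    return word
```

For example, `apply_rules("organization", DOUBLE_SUFFIXES, 0)` gives `"organize"`, `apply_rules("goodness", ADJ_SUFFIXES, 0)` gives `"good"`, and `apply_rules("dependent", FINAL_SUFFIXES, 1)` gives `"depend"`. A word like "rational" is left alone because the stem "r" is too short to count as a word on its own.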
In an indexing application, when a word is submitted for indexing, the application first calls the Stemmer class and then indexes the stemmed word.
Stemming saves space and reduces response time, since the index stores a single key for a group of related words instead of several keys.
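The idea of a single key per word family can be sketched with a small inverted index in Python. Everything here is hypothetical: `StemmedIndex` is an illustrative class, and `toy_stem` is a crude stand-in for the real Stemmer class that only strips a few suffixes.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for the Stemmer class: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class StemmedIndex:
    """Inverted index that stores document ids under stemmed keys."""

    def __init__(self, stem: Callable[[str], str] = toy_stem) -> None:
        self._stem = stem
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Stem each token first, then index the stemmed form.
        for token in text.lower().split():
            self._postings[self._stem(token)].add(doc_id)

    def search(self, term: str) -> Set[int]:
        # Queries go through the same stemmer so they hit the same key.
        return set(self._postings.get(self._stem(term.lower()), set()))

    def keys(self) -> List[str]:
        return sorted(self._postings)
```

If documents 1, 2, and 3 contain "fishing", "fished", and "fish", the index ends up with the single key "fish", and a search for any of the three forms returns all three documents.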