Thursday, April 9, 2009

Lucene.Net Index Building Process

Index building is the process of writing the given data to the index files. Before the data is indexed, it is analyzed by an analyzer. During analysis, the input strings are split into tokens. All of the tokens are then converted to lower case by the lower case filter. After that, stop words are removed by the stop word filter.

The following English words are considered stop words.

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
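The first three analysis steps (tokenizing, lower casing, and stop word removal) can be sketched in a few lines of Python. This is only an illustration of the idea, not Lucene.Net code: the function name analyze and the regular-expression tokenizer are assumptions made for this example, and the real analyzer may split text differently.

```python
import re

# The stop word list quoted above.
STOP_WORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such",
    "t", "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
])


def analyze(text):
    """Tokenize, lower-case, and drop stop words (hypothetical helper)."""
    # Assumption: a token is any run of ASCII letters or digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Lower case filter: normalize every token to lower case.
    lowered = [token.lower() for token in tokens]
    # Stop word filter: keep only tokens that are not in the list.
    return [token for token in lowered if token not in STOP_WORDS]


print(analyze("The Quick brown Fox is in the Index"))
```

Lower casing happens before the stop word check, so "The" and "the" are both removed.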

Finally, the remaining lower-case tokens are stemmed using the Porter stemmer. After that, the Index Writer writes the stemmed words to the index file.
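To make the last two steps more concrete, here is a rough Python sketch of stemming tokens and recording them in an index. Everything in it is an assumption for illustration: toy_stem is a crude suffix stripper and NOT the Porter algorithm, and the "index" is just an in-memory dictionary from stemmed term to document ids, not the on-disk format the Index Writer produces.

```python
def toy_stem(token):
    """Crude stand-in for a stemmer (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Strip the suffix only if at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def add_document(index, doc_id, tokens):
    """Record each stemmed token under the given document id.

    `tokens` is assumed to be already lower-cased and free of stop words.
    """
    for token in tokens:
        index.setdefault(toy_stem(token), set()).add(doc_id)


index = {}
add_document(index, 1, ["indexing", "words"])
add_document(index, 2, ["indexed", "files"])
print(sorted(index))
```

Because "indexing" and "indexed" reduce to the same stem here, both documents end up under one term, which is the reason for stemming before writing to the index.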
