Lucene.Net Index Building Process

Index building is the process of writing the given data to the index files. Before the data is indexed, it is analyzed by an analyzer. During analysis, the input strings are split into tokens. All tokens are then converted to lower case by the lower case filter. After that, stop words are removed by the stop word filter.

The following English words are treated as stop words.

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
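To make the first three analysis steps concrete, here is a minimal sketch in plain Python. It is not the Lucene.Net API: the function name `analyze` and the regular expression used for tokenizing are assumptions made for illustration, and the real tokenizer may split text differently. Only the stop word set is taken from the list above.

```python
import re

# Stop word set copied from the list above.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such",
    "t", "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
}


def analyze(text):
    """Hypothetical helper: tokenize, lowercase, then drop stop words."""
    # Assumption: a token is any run of ASCII letters or digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Lower case filter.
    lowered = [token.lower() for token in tokens]
    # Stop word filter.
    return [token for token in lowered if token not in STOP_WORDS]


print(analyze("The Quick Brown Fox is in the Garden"))
```

The order matters here: lowercasing runs before the stop word check, so "The" and "the" are both removed.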

Finally, the remaining lower-case tokens are stemmed using the Porter stemmer. The IndexWriter then writes the stemmed words to the index file.
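The stemming and writing steps can be sketched in the same spirit. The Python below is an illustration only, not Lucene.Net code: `stem_step_1a` implements just step 1a of the Porter algorithm (plural suffixes), not the full stemmer, and `build_index` is a hypothetical stand-in that keeps an in-memory mapping instead of writing real index files.

```python
from collections import defaultdict


def stem_step_1a(word):
    """Porter step 1a only: SSES -> SS, IES -> I, SS -> SS, S -> ''."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def build_index(docs):
    """Hypothetical stand-in for the index writer.

    Maps each stemmed term to the sorted ids of the documents containing it.
    """
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[stem_step_1a(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Tokens are assumed to be already lowercased and free of stop words.
docs = {1: ["cats", "caresses"], 2: ["cat", "ponies"]}
index = build_index(docs)
print(index["cat"])
```

Because "cats" and "cat" reduce to the same stem, a search for either form can find both documents, which is the reason for stemming before the terms are written.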
