Gopika's Blog: Scoring and Ranking in Lucene.Net

The scoring feature prioritizes and sorts search results by their relevance to the search query. The scoring formula combines several factors. The formula used to calculate the score value is shown below.

Score of document d for query q = ∑ (over each term t in q) tf(t in d) · idf(t) · boost(t.field in d) · lengthNorm(t.field in d)

The table below lists how each of these functions is calculated, along with a description of each.

Function	Description
tf(t in d) = sqrt(freq)	Term frequency factor for the term (t) in the document (d). The more often a term occurs in a document, the higher that document's score.
idf(t) = log(numDocs/(docFreq+1)) + 1	Inverse document frequency of the term. Common terms are less important than uncommon ones, so this factor gives a high value to a term that occurs in only a few documents and a low value to a term that occurs in most documents.
boost(t.field in d)	Field boost, as set during indexing. Boosting gives higher priority to a term or field. This is useful in similarity search for giving priority to the most important areas.
lengthNorm(t.field in d) = 1/sqrt(numTerms)	Normalization value of a field, given the number of terms within the field. This value is computed during indexing and stored in the index. It yields a higher score when a term matches in a field with fewer terms.
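
To make the behaviour of these factors concrete, here is a minimal Python sketch of the three computed functions from the table. It is only an illustration of the formulas, not Lucene.Net code: the function and parameter names are my own, and I am assuming that log means the natural logarithm.

```python
import math

# Illustrative re-implementations of the formulas in the table above.
# These names are hypothetical; they are not the Lucene.Net API.

def tf(freq):
    """Term frequency factor: sqrt of how often the term occurs in the document."""
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    """Inverse document frequency (natural log assumed)."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def length_norm(num_terms):
    """Length normalization for a field containing num_terms terms."""
    return 1.0 / math.sqrt(num_terms)

# A term that occurs 4 times contributes twice as much as one that occurs once.
print(tf(1), tf(4))

# A rare term (in 1 of 1000 docs) gets a larger idf than a common one (in 900 of 1000).
print(idf(1000, 1) > idf(1000, 900))

# A match in a 4-term field is weighted more than a match in a 100-term field.
print(length_norm(4), length_norm(100))
```

The square root in tf means that repeating a term gives diminishing returns, so a document cannot dominate the results just by repeating the same word many times.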

Search results are ranked by their score value. Documents with a high score value rank high, and documents with a low score value rank low.
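
As a rough end-to-end illustration of ranking, the sketch below scores a tiny made-up corpus and sorts it by score, highest first. Everything here is an assumption for the example: the documents, the query, the single-field model, the default boost of 1.0, and the helper name `score`. It applies the formula from this post directly and does not call Lucene.Net.

```python
import math

def score(query_terms, doc_terms, all_docs, boost=1.0):
    """Sum tf * idf * boost * lengthNorm over the query terms (one field per document)."""
    num_docs = len(all_docs)
    norm = 1.0 / math.sqrt(len(doc_terms))  # lengthNorm of this document's field
    total = 0.0
    for t in query_terms:
        freq = doc_terms.count(t)
        if freq == 0:
            continue  # a term that does not occur contributes nothing
        doc_freq = sum(1 for terms in all_docs.values() if t in terms)
        idf = math.log(num_docs / (doc_freq + 1)) + 1.0
        total += math.sqrt(freq) * idf * boost * norm
    return total

# Hypothetical corpus: document name -> list of terms in its single field.
docs = {
    "doc1": "lucene scoring ranks search results".split(),
    "doc2": "lucene lucene scoring".split(),
    "doc3": "a long document that mentions search only once among many other words".split(),
}
query = ["lucene", "scoring"]

# Rank documents by descending score.
ranked = sorted(docs, key=lambda name: score(query, docs[name], docs), reverse=True)
for rank, name in enumerate(ranked, start=1):
    print(rank, name, round(score(query, docs[name], docs), 3))
```

In this toy corpus doc2 ranks first: it is the shortest field and contains "lucene" twice. doc3 matches neither query term, so it scores zero and ranks last.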
