Thursday, April 23, 2009

Scoring and Ranking in Lucene.Net

Lucene.Net uses scoring to prioritize and sort search results according to their relevance to the search query. The score combines several factors, as shown in the formula below.

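Score of document d for query q:

$$\mathrm{score}(q,d) = \sum_{t \in q} \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t) \cdot \mathrm{boost}(t.\mathrm{field} \text{ in } d) \cdot \mathrm{lengthNorm}(t.\mathrm{field} \text{ in } d)$$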
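As a worked example, take a hypothetical single-term query with the following setup:

- The index holds numDocs = 1000 documents, and the term occurs in docFreq = 9 of them.
- In document d, the term appears freq = 4 times in a field that holds numTerms = 16 terms.
- The field boost is 1.0.

These numbers are made up for illustration. As in Lucene, log is the natural logarithm.

$$
\begin{aligned}
\mathrm{tf} &= \sqrt{4} = 2\\
\mathrm{idf} &= \ln\frac{1000}{9+1} + 1 = \ln 100 + 1 \approx 5.6052\\
\mathrm{lengthNorm} &= \frac{1}{\sqrt{16}} = 0.25\\
\mathrm{score} &\approx 2 \times 5.6052 \times 1.0 \times 0.25 \approx 2.8026
\end{aligned}
$$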

Each factor is listed below with how it is calculated and what it does.

tf(t in d) = sqrt(freq)

Term frequency factor for the term t in the document d, where freq is the number of times t occurs in d. This factor gives a higher score to documents in which the term occurs more often.
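The tf factor can be sketched in Python as follows. The function name tf is a hypothetical choice that mirrors the formula. The loop shows that the factor grows with freq, but sub-linearly: quadrupling the count only doubles tf.

```python
import math


def tf(freq: int) -> float:
    """Term frequency factor: square root of the number of times t occurs in d."""
    return math.sqrt(freq)


# Compare the factor for increasing term counts.
for freq in (1, 4, 16):
    print(freq, tf(freq))
```

Taking the square root keeps a document that repeats a term many times from completely dominating one that mentions it only a few times.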

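idf(t) = log(numDocs/(docFreq+1)) + 1

Inverse document frequency of the term, where numDocs is the number of documents in the index and docFreq is the number of documents that contain t. Common terms are less important than uncommon ones, so this factor gives a high value to a term that occurs in only a few documents and a low value to a term that occurs in most documents.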
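The idf factor can be sketched in Python as follows, using made-up counts for a 1000-document index. The function name idf is a hypothetical choice. The logarithm is the natural log, as computed by Java's Math.log and .NET's Math.Log. Python's / operator performs floating-point division, so the ratio is not truncated.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: natural log of numDocs / (docFreq + 1), plus 1."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0


# A rare term (in 9 documents) versus a common one (in 899 documents).
rare = idf(1000, 9)
common = idf(1000, 899)
print(round(rare, 4), round(common, 4))
```

The +1 inside the ratio avoids a division by zero when docFreq is 0. The outer +1 keeps idf close to 1 rather than 0 for a term that appears in nearly every document.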

boost(t.field in d)

Field boost, as set during indexing. Boosting gives a term or field higher priority, which is useful in similarity search for making the most important parts of a document count more.

lengthNorm(t.field in d) = 1/sqrt(numTerms)

Normalization value of a field, given the number of terms (numTerms) within the field. This value is computed during indexing and stored in the index. This factor returns a higher score when a term matches in a field with fewer terms.
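The lengthNorm factor can be sketched in Python as follows. The function name length_norm and the field sizes are hypothetical. The example shows that the same match is worth ten times more in a 4-term field than in a 400-term field. Note that Lucene does not keep this float as-is: the norm is compressed into a single byte at indexing time, so the value used at search time is only an approximation of the exact result computed here.

```python
import math


def length_norm(num_terms: int) -> float:
    """Length normalization: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


title_norm = length_norm(4)    # e.g. a short 4-term title field
body_norm = length_norm(400)   # e.g. a long 400-term body field
print(title_norm, body_norm)
```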

Search results are ranked by their score: documents with a high score rank higher, and documents with a low score rank lower.
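The following Python sketch combines the four factors to score a tiny, made-up index and then ranks the hits. Everything in it is hypothetical and for illustration only: the DOCS data, the FIELD_BOOST values, and the doc_freq, score and rank helpers. It follows the simplified formula in this post, not Lucene.Net's actual implementation, which also applies query-level factors such as coord and queryNorm. Each query term is a (field, term) pair. Breaking ties by document order is a choice made for this sketch.

```python
import math

# Hypothetical toy index: each document maps a field name to its list of tokens.
DOCS = [
    {"title": ["lucene", "scoring"], "body": ["how", "lucene", "computes", "a", "score"]},
    {"title": ["search", "tips"], "body": ["lucene", "lucene", "lucene", "index", "tips"]},
    {"title": ["cooking"], "body": ["pasta", "recipes"]},
]

# Hypothetical index-time field boosts: matches in the title count double.
FIELD_BOOST = {"title": 2.0, "body": 1.0}


def doc_freq(field, term):
    """Number of documents whose field contains the term at least once."""
    return sum(1 for d in DOCS if term in d.get(field, []))


def score(doc, query):
    """Sum of tf * idf * boost * lengthNorm over the (field, term) pairs in the query."""
    total = 0.0
    for field, term in query:
        tokens = doc.get(field, [])
        freq = tokens.count(term)
        if freq == 0:
            continue  # the term does not occur in this field, so it adds nothing
        tf = math.sqrt(freq)
        idf = math.log(len(DOCS) / (doc_freq(field, term) + 1)) + 1.0
        boost = FIELD_BOOST.get(field, 1.0)
        length_norm = 1.0 / math.sqrt(len(tokens))
        total += tf * idf * boost * length_norm
    return total


def rank(query):
    """Return (doc_id, score) pairs with a non-zero score, highest score first."""
    scored = [(i, score(d, query)) for i, d in enumerate(DOCS)]
    hits = [(i, s) for i, s in scored if s > 0]
    # Higher score ranks first; equal scores fall back to document order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


query = [("title", "lucene"), ("body", "lucene")]
for doc_id, s in rank(query):
    print(doc_id, round(s, 4))
```

Document 1 repeats "lucene" three times in its body. Even so, document 0 ranks first, because its match in the short, boosted title field outweighs the higher term frequency.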
