Gopika's Blog: 4/1/09

Scoring and Ranking in Lucene.Net

Scoring is used to prioritize and sort the search results by their relevance to the search query. The scoring formula combines several factors. The formula used to calculate the score value is shown below.

Score of document d for query q = ∑ (over each term t in q) tf(t in d) · idf(t) · boost(t.field in d) · lengthNorm(t.field in d)

The table below lists how each of those functions is calculated, together with a description of each function.

Function	Description
tf(t in d) = sqrt(freq)	Term frequency factor for the term (t) in the document (d). This factor gives a higher score to a document in which the term occurs more frequently.
idf(t) = log(numDocs/(docFreq+1)) + 1	Inverse document frequency of the term. Common terms are less important than uncommon ones. This factor gives a high value to a term which occurs in only a few documents and a low value to a term which occurs in most documents.
boost(t.field in d)	Field boost, as set during indexing. Boosting is used to give higher priority to a term or field. This is useful when some fields are more important to the search than others.
lengthNorm(t.field in d) = 1/sqrt(numTerms)	Normalization value of a field, given the number of terms within the field. This value is computed during indexing and stored in the index. This factor gives a higher score when a term matches in a field with fewer terms.

Ranking of the search results is based on the score value of each result. Documents with a high score value are ranked high and documents with a low score value are ranked low.
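To get a feeling for how the factors interact, the functions from the table can be written out directly. The listing below is only a minimal sketch and does not call Lucene.Net: the class name ScoringSketch, the method names and all the numbers are made up for illustration, and the natural logarithm is assumed for idf.

```csharp
using System;

public static class ScoringSketch
{
    // tf(t in d) = sqrt(freq)
    public static double Tf(int freq)
    {
        return Math.Sqrt(freq);
    }

    // idf(t) = log(numDocs / (docFreq + 1)) + 1, natural logarithm assumed
    public static double Idf(int numDocs, int docFreq)
    {
        return Math.Log((double)numDocs / (docFreq + 1)) + 1.0;
    }

    // lengthNorm(t.field in d) = 1 / sqrt(numTerms)
    public static double LengthNorm(int numTerms)
    {
        return 1.0 / Math.Sqrt(numTerms);
    }

    // Contribution of a single term to the score of a document
    public static double TermScore(int freq, int numDocs, int docFreq, double boost, int numTerms)
    {
        return Tf(freq) * Idf(numDocs, docFreq) * boost * LengthNorm(numTerms);
    }

    public static void Main()
    {
        // Hypothetical index of 1000 documents; the term occurs 4 times
        // in a field of 100 terms, with no extra boost (1.0).
        double rare = TermScore(4, 1000, 9, 1.0, 100);      // term found in 9 documents
        double common = TermScore(4, 1000, 499, 1.0, 100);  // term found in 499 documents

        Console.WriteLine("rare term:   " + rare.ToString("F3"));
        Console.WriteLine("common term: " + common.ToString("F3"));
    }
}
```

With these made-up numbers the rare term contributes roughly 1.12 and the common term roughly 0.34, so a match on the rare term pushes a document higher in the ranking.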

Did You Mean Feature for a Search Application

This article describes how a word suggestion feature can be added to a search application which uses Lucene.Net for indexing and searching.
The word suggestion feature is mainly useful when a user enters a misspelled word, so that the correct word can be suggested to the user.

To implement the Did You Mean feature, the n-gram method is used. The n-gram method divides the misspelled word into overlapping sub words (n-grams) whose lengths depend on the length of the word.
The idea behind the n-gram method is that a misspelling is mainly caused by one or two wrong letters, so it only affects a few n-grams. Therefore we can recognize the correct word by taking the word which shares the highest proportion of n-grams with the misspelled word.

Two sets of n-grams are created, and their lengths depend on the length of the word.
If the word length is greater than five, the grams have lengths three and four.
If the word length is five, the grams have lengths two and three.
If the word length is less than five, the grams have lengths one and two.

If we consider the misspelled word "univesity" (for "university"), then the n-gram query is like
start 3:uni^2.0 end 3:ity gram 3:uni gram 3:niv gram 3:ive gram 3:ves gram 3:esi gram 3:sit gram 3:ity start 4:univ^2.0 end 4:sity gram 4:univ gram 4:nive gram 4:ives gram 4:vesi gram 4:esit gram 4:sity
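The length rules and the grams in the query above can be reproduced with a few lines of C#. This is only a sketch of the idea, not the code of the spell checker itself: the class name NGramSketch and its methods are invented for illustration, and the input is the misspelling "univesity", whose grams match the ones in the query.

```csharp
using System;
using System.Collections.Generic;

public static class NGramSketch
{
    // Shortest gram length for a word of the given length
    public static int MinGram(int wordLength)
    {
        if (wordLength > 5) return 3;
        if (wordLength == 5) return 2;
        return 1;
    }

    // Longest gram length for a word of the given length
    public static int MaxGram(int wordLength)
    {
        if (wordLength > 5) return 4;
        if (wordLength == 5) return 3;
        return 2;
    }

    // All overlapping sub words of length n, from left to right
    public static List<string> Grams(string word, int n)
    {
        List<string> grams = new List<string>();
        for (int i = 0; i + n <= word.Length; i++)
        {
            grams.Add(word.Substring(i, n));
        }
        return grams;
    }

    public static void Main()
    {
        string word = "univesity";
        for (int n = MinGram(word.Length); n <= MaxGram(word.Length); n++)
        {
            List<string> grams = Grams(word, n);
            if (grams.Count == 0) continue; // word shorter than n

            // The first and last grams are the positional start and end grams
            Console.WriteLine("start " + n + ": " + grams[0]);
            Console.WriteLine("end " + n + ": " + grams[grams.Count - 1]);
            Console.WriteLine("gram " + n + ": " + string.Join(" ", grams.ToArray()));
        }
    }
}
```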

We index the start and end n-grams separately because they are positional, unlike the other n-grams.
For example the words "eat" and "ate" have the same set of 1-grams (a, e, t), and only the position of the letters tells them apart.

Also we use two sets of n-grams to increase the accuracy of the suggestions. For example, let's assume that the words "ball" and "belt" are in the index file and the word "bell" is not.

Since the word length is less than five, 1-grams and 2-grams are created.

For the word "ball" the grams are
1-gram = b a l l
2-gram = ba al ll

For the word "belt"
1-gram = b e l t
2-gram = be el lt

For the search word "bell"
1-gram = b e l l
2-gram = be el ll

Looking at those 1-grams and 2-grams, we can see that "bell" matches the same number of 1-grams in both words, but a different number of 2-grams. Therefore by taking two sets of grams we can select the closest word.
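The comparison above can be checked by counting the grams that two words share. The listing below is a small sketch for that; the class GramOverlapSketch and its methods are hypothetical names, and repeated grams (such as the two "l" letters) are counted separately.

```csharp
using System;
using System.Collections.Generic;

public static class GramOverlapSketch
{
    // All overlapping sub words of length n, from left to right
    public static List<string> Grams(string word, int n)
    {
        List<string> grams = new List<string>();
        for (int i = 0; i + n <= word.Length; i++)
        {
            grams.Add(word.Substring(i, n));
        }
        return grams;
    }

    // Number of n-grams the two words have in common, counting repeats
    public static int SharedGrams(string a, string b, int n)
    {
        List<string> remaining = Grams(b, n);
        int shared = 0;
        foreach (string gram in Grams(a, n))
        {
            // Remove deletes only the first match and returns false if there is none
            if (remaining.Remove(gram))
            {
                shared++;
            }
        }
        return shared;
    }

    public static void Main()
    {
        string searchWord = "bell";
        foreach (string candidate in new string[] { "ball", "belt" })
        {
            Console.WriteLine(candidate
                + ": shared 1-grams = " + SharedGrams(searchWord, candidate, 1)
                + ", shared 2-grams = " + SharedGrams(searchWord, candidate, 2));
        }
    }
}
```

For "ball" this gives 3 shared 1-grams and 1 shared 2-gram, and for "belt" 3 shared 1-grams and 2 shared 2-grams, so only the 2-grams show that "belt" is the closer word.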

The C# code for the implementation is shown below. Two classes and one interface are used for the implementation.

The DidYouMeanParser interface is shown below.

using System;
using System.Collections.Generic;
using System.Text;
using Lucene.Net.Search;

namespace Indexer
{
    /// <summary>
    /// DidYouMeanParser interface
    /// </summary>
    public interface DidYouMeanParser
    {
        Query parse(string queryString);
        Query suggest(string queryString);
    }
}

The DidYouMeanParser interface is implemented by the DidYouMeanParserClass as shown below.

using System;
using System.Collections.Generic;
using System.Text;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;

namespace Indexer.DidYouMean
{
    /// <summary>
    /// Class implementing the DidYouMeanParser interface
    /// </summary>
    public class DidYouMeanParserClass : DidYouMeanParser
    {
        private string defaultField;
        private string spellIndexDirectory;

        public DidYouMeanParserClass(string defaultField, string spellIndexDirectory)
        {
            this.defaultField = defaultField;
            this.spellIndexDirectory = spellIndexDirectory;
        }

        public Query parse(string queryString)
        {
            return new TermQuery(new Term(defaultField, queryString));
        }

        public Query suggest(string queryString)
        {
            try
            {
                SpellChecker spellChecker = new SpellChecker(spellIndexDirectory);
                // A word that exists in the spell index needs no suggestion
                if (spellChecker.Exist(queryString))
                {
                    return null;
                }
                // Ask for the single closest word
                string[] similarWords = spellChecker.SuggestSimilar(queryString, 1);
                if (similarWords.Length == 0)
                {
                    return null;
                }
                return new TermQuery(new Term(defaultField, similarWords[0]));
            }
            catch (Exception e)
            {
                throw new ParseException(e.Message);
            }
        }
    }
}

The misspelled word is compared with the words in the index file. For that we create a temporary index file, built from the existing index file, which has the n-grams as fields.
The DidYouMeanIndexer class shown below is used to create it.

using System;
using System.Collections.Generic;
using System.Text;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Spell;

namespace Indexer
{
    /// <summary>
    /// Class used to create the temporary index file for DidYouMeanParser
    /// </summary>
    public class DidYouMeanIndexer
    {
        public void createSpellIndex(string field,
            string originalIndexDirectory,
            string spellIndexDirectory)
        {
            IndexReader indexReader = null;
            try
            {
                // Read the terms of the given field from the existing index
                indexReader = IndexReader.Open(originalIndexDirectory);
                Dictionary dictionary = new LuceneDictionary(indexReader, field);
                // Write the n-grams of those terms to the spell index
                SpellChecker spellChecker = new SpellChecker(spellIndexDirectory);
                spellChecker.IndexDictionary(dictionary);
            }
            finally
            {
                if (indexReader != null)
                {
                    indexReader.Close();
                }
            }
        }
    }
}

After that we can provide the Did You Mean feature by passing the misspelled words to the suggest method of the DidYouMeanParserClass.

Integrate SQL Server Reports with MS CRM

To integrate SQL Server Reports with MS CRM, select "Reports" under the MS CRM "Workplace" as shown below.

Then click on "New" to add a new report as shown in the figure below.

After that click on the "Browse" button shown in the figure below, and select the report to be added. Enter the details of the report such as Name, Description, etc.

Finally click the Save button to save the report. The added report is then displayed under Reports in the MS CRM Workplace.

Unable to edit my information using “My Settings” link in the SharePoint Site

There are two links at the top of the SharePoint site, "My Settings" and "My Site". I was unable to edit my information using the "My Settings" link. Also, when I clicked on the "My Site" link it displayed an error message saying that the site cannot be created because it has already been created. Other users had the same two problems.

I was able to solve both problems by changing the "My Site Settings" in SharePoint 3.0 Central Administration. The User Name in "My Site Settings" was changed to Domain Users as shown in the figure below.

Changing the User Name to Domain Users allows every user in the domain to create a My Site and to maintain his or her own profile. My Site Settings can be found on the created Shared Service page.

Creating a Sample Custom Web Part for SharePoint

We have to create custom web parts for SharePoint sites according to our requirements, because the web parts required for a site may not be available in the Web Part Gallery of SharePoint.
I created a simple web part which displays the user currently logged in to SharePoint. The steps below were used to create it and use it in a SharePoint site.

• Created a Class Library project in Microsoft Visual Studio 2005
• Then the code below was added for the web part. As listed, it renders the text held in its DisplayText property

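using System.Web.UI.WebControls.WebParts;

public class SimpleWebPart : WebPart
{
    // Text rendered by the web part
    private string displayText = "Hello World!";

    // WebBrowsable makes the property editable in the tool pane,
    // Personalizable makes its value persist
    [WebBrowsable(true), Personalizable(true)]
    public string DisplayText
    {
        get { return displayText; }
        set { displayText = value; }
    }

    // Writes the text directly into the page output
    protected override void Render(System.Web.UI.HtmlTextWriter writer)
    {
        writer.Write(displayText);
    }
}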

• The AssemblyInfo.cs file in the class library project was modified by adding the code below (the attribute is in the System.Security namespace)
[assembly: AllowPartiallyTrustedCallers()]

• Then gave a strong name to the assembly by opening the project properties and selecting the “Signing” tab as shown in the figure below.

• Found the public key token of the assembly by using a Reflector tool. To find the public key token, drag and drop the compiled assembly into the Reflector. It then shows the public key token of the assembly as shown in the figure below.

• Then I located the bin folder into which the dll file has to be deployed. MOSS 2007 creates every portal in the inetpub\wwwroot\wss folder. The easiest way to find the bin folder in this folder hierarchy is to go through the inetmgr console. Locate the appropriate portal (the one to which you want to deploy the web part), identified by its port number. Right click it and select Properties. Under the Home Directory tab, note the path in the Local path text box as shown in the figure below.

• Right clicked on the project name in the VS.Net 2005 IDE and clicked Properties. On the Build page, pasted the path copied from the inetmgr console into the Output Path as shown in the figure below.

• Then created a new SafeControl entry for the web part assembly by modifying the web.config file of the portal (it is in the folder that contains the bin folder). The code is given below.

<SafeControls>
.
.
.
<SafeControl Assembly="NewWebPart" Namespace="NewWebPart" TypeName="*" Safe="True" />
</SafeControls>

• To add the created web part to the web part gallery of the SharePoint site, first clicked on the “Site Actions” button and then selected Site Settings as shown in the figure below.

• On the “Site Settings” page, under the Galleries column, clicked on “Web Parts” as shown in the figure below.

• On the “Web Part Gallery” page clicked on the “New” button to add the new web part assembly to the gallery as shown in the figure below.

• On the “New Web Parts” page located the created web part in the list, checked the check box on its left and clicked on the “Populate Gallery” button at the top of the page as shown in the figure below.

• Then we can add the created web part to a web part zone.

Lucene.Net Logical Index Structure

In the index file, data is kept as Documents. Each document contains several fields, and each field consists of a name and value pair. Therefore the index file stores several documents and each document contains several fields.

Fields are used to keep information in the index file in different ways. There are four field types.

They are listed in the table below with a description of each field type.

Field Type	Description
Keyword	Constructs a string value field that is not tokenized, but is indexed and stored. It is useful for non-text fields and for fields whose information we want to keep as it is. For example, the name of a video file is kept in this type of field.
Text	Constructs a string value field that is tokenized, indexed and stored. Information in this type of field is returned with the hits. This type of field is useful to keep the content of a video frame.
UnIndexed	Constructs a string value field that is neither tokenized nor indexed, but is stored in the index.
UnStored	Constructs a string value field that is tokenized and indexed, but is not stored in the index.
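The four field types differ only in three yes/no properties: stored, indexed and tokenized. The sketch below writes the table out as data to make the differences easy to compare. It does not use the Lucene.Net Field class; FieldTypeSketch and FieldType are hypothetical names used only for this illustration.

```csharp
using System;

public static class FieldTypeSketch
{
    // One row of the table: name of the field type and its three properties
    public struct FieldType
    {
        public string Name;
        public bool Stored;
        public bool Indexed;
        public bool Tokenized;

        public FieldType(string name, bool stored, bool indexed, bool tokenized)
        {
            Name = name;
            Stored = stored;
            Indexed = indexed;
            Tokenized = tokenized;
        }
    }

    public static readonly FieldType[] Types = new FieldType[]
    {
        new FieldType("Keyword",   true,  true,  false),
        new FieldType("Text",      true,  true,  true),
        new FieldType("UnIndexed", true,  false, false),
        new FieldType("UnStored",  false, true,  true)
    };

    public static void Main()
    {
        foreach (FieldType t in Types)
        {
            Console.WriteLine("{0,-10} stored={1,-5} indexed={2,-5} tokenized={3}",
                t.Name, t.Stored, t.Indexed, t.Tokenized);
        }
    }
}
```

Read this way, a field can be searched only if it is indexed, and its original value comes back with the hits only if it is stored.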

Lucene.Net Index Building Process

The index building process adds the given data to the index files. Before the data is indexed, it is analyzed by an analyzer. During the analysis the given strings are split into tokens. Then all the tokens are converted to lower case using the lower case filter. After that stop words are removed using the stop word filter.

The English words below are considered to be stop words.

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

Finally the remaining lower case tokens are stemmed using the Porter stemmer. After that the Index Writer writes the stemmed words to the index file.
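The first three analysis steps (tokenizing, lower casing and stop word removal) can be sketched in plain C# as shown below. This is not the Lucene.Net analyzer: AnalysisSketch is a hypothetical class, the tokenizer simply splits on every character that is not a letter or digit, and the Porter stemming step is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AnalysisSketch
{
    // The stop word list given above
    static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"
    };

    // Tokenizes the text, lower cases every token and drops the stop words
    public static List<string> Analyze(string text)
    {
        List<string> tokens = new List<string>();
        StringBuilder current = new StringBuilder();

        // The trailing space makes sure the last token is flushed as well
        foreach (char c in text + " ")
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                string token = current.ToString();
                if (!StopWords.Contains(token))
                {
                    tokens.Add(token);
                }
                current.Length = 0; // start the next token
            }
        }
        return tokens;
    }

    public static void Main()
    {
        List<string> tokens = Analyze("The Players are at the Stadium");
        Console.WriteLine(string.Join(" ", tokens.ToArray()));
    }
}
```

For the sample sentence only "players" and "stadium" are left. In the real process the stemmer would then reduce such words to their stems before they are written to the index.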

IPL 2009 Match Schedule

April 2009

Sat 18
12:30 local, 10:30 GMT, 16:00 IST
1st match - Chennai Super Kings v Mumbai Indians
Newlands, Cape Town

Sat 18

16:30 local, 14:30 GMT, 20:00 IST
2nd match - Bangalore Royal Challengers v Rajasthan Royals
Newlands, Cape Town

Sun 19
12:30 local, 10:30 GMT, 16:00 IST
3rd match - Delhi Daredevils v Kings XI Punjab
Newlands, Cape Town

Sun 19

16:30 local, 14:30 GMT, 20:00 IST
4th match - Deccan Chargers v Kolkata Knight Riders
Newlands, Cape Town

Mon 20

16:30 local, 14:30 GMT, 20:00 IST
5th match - Bangalore Royal Challengers v Chennai Super Kings
St George's Park, Port Elizabeth

Tue 21
12:30 local, 10:30 GMT, 16:00 IST
6th match - Kings XI Punjab v Kolkata Knight Riders
Kingsmead, Durban

Tue 21

16:30 local, 14:30 GMT, 20:00 IST
7th match - Rajasthan Royals v Mumbai Indians
Kingsmead, Durban

Wed 22

16:30 local, 14:30 GMT, 20:00 IST
8th match - Bangalore Royal Challengers v Deccan Chargers
Newlands, Cape Town

Thu 23
12:30 local, 10:30 GMT, 16:00 IST
9th match - Chennai Super Kings v Delhi Daredevils
Kingsmead, Durban

Thu 23

16:30 local, 14:30 GMT, 20:00 IST
10th match - Kolkata Knight Riders v Rajasthan Royals
Newlands, Cape Town

Fri 24

16:30 local, 14:30 GMT, 20:00 IST
11th match - Bangalore Royal Challengers v Kings XI Punjab
Kingsmead, Durban

Sat 25
12:30 local, 10:30 GMT, 16:00 IST
12th match - Deccan Chargers v Mumbai Indians
Kingsmead, Durban

Sat 25

16:30 local, 14:30 GMT, 20:00 IST
13th match - Chennai Super Kings v Kolkata Knight Riders
Newlands, Cape Town

Sun 26
12:30 local, 10:30 GMT, 16:00 IST
14th match - Bangalore Royal Challengers v Delhi Daredevils
St George's Park, Port Elizabeth

Sun 26

16:30 local, 14:30 GMT, 20:00 IST
15th match - Kings XI Punjab v Rajasthan Royals
Newlands, Cape Town

Mon 27
12:30 local, 10:30 GMT, 16:00 IST
16th match - Deccan Chargers v Chennai Super Kings
Kingsmead, Durban

Mon 27

16:30 local, 14:30 GMT, 20:00 IST
17th match - Kolkata Knight Riders v Mumbai Indians
St George's Park, Port Elizabeth

Tue 28

16:30 local, 14:30 GMT, 20:00 IST
18th match - Delhi Daredevils v Rajasthan Royals
SuperSport Park, Centurion

Wed 29
12:30 local, 10:30 GMT, 16:00 IST
19th match - Bangalore Royal Challengers v Kolkata Knight Riders
Kingsmead, Durban

Wed 29

16:30 local, 14:30 GMT, 20:00 IST
20th match - Kings XI Punjab v Mumbai Indians
Kingsmead, Durban

Thu 30
12:30 local, 10:30 GMT, 16:00 IST
21st match - Delhi Daredevils v Deccan Chargers
SuperSport Park, Centurion

Thu 30

16:30 local, 14:30 GMT, 20:00 IST
22nd match - Chennai Super Kings v Rajasthan Royals
SuperSport Park, Centurion

May 2009

Fri 1
12:30 local, 10:30 GMT, 16:00 IST
23rd match - Kolkata Knight Riders v Mumbai Indians
Buffalo Park, East London

Fri 1

16:30 local, 14:30 GMT, 20:00 IST
24th match - Bangalore Royal Challengers v Kings XI Punjab
Kingsmead, Durban

Sat 2
12:30 local, 10:30 GMT, 16:00 IST
25th match - Deccan Chargers v Rajasthan Royals
St George's Park, Port Elizabeth

Sat 2

16:30 local, 14:30 GMT, 20:00 IST
26th match - Chennai Super Kings v Delhi Daredevils
New Wanderers Stadium, Johannesburg

Sun 3
12:30 local, 10:30 GMT, 16:00 IST
27th match - Kings XI Punjab v Kolkata Knight Riders
St George's Park, Port Elizabeth

Sun 3

16:30 local, 14:30 GMT, 20:00 IST
28th match - Bangalore Royal Challengers v Mumbai Indians
New Wanderers Stadium, Johannesburg

Mon 4

16:30 local, 14:30 GMT, 20:00 IST
29th match - Chennai Super Kings v Deccan Chargers
Buffalo Park, East London

Tue 5
12:30 local, 10:30 GMT, 16:00 IST
30th match - Kings XI Punjab v Rajasthan Royals
Kingsmead, Durban

Tue 5

16:30 local, 14:30 GMT, 20:00 IST
31st match - Delhi Daredevils v Kolkata Knight Riders
Kingsmead, Durban

Wed 6

16:30 local, 14:30 GMT, 20:00 IST
32nd match - Deccan Chargers v Mumbai Indians
SuperSport Park, Centurion

Thu 7
12:30 local, 10:30 GMT, 16:00 IST
33rd match - Bangalore Royal Challengers v Rajasthan Royals
SuperSport Park, Centurion

Thu 7

16:30 local, 14:30 GMT, 20:00 IST
34th match - Chennai Super Kings v Kings XI Punjab
SuperSport Park, Centurion

Fri 8

16:30 local, 14:30 GMT, 20:00 IST
35th match - Delhi Daredevils v Mumbai Indians
Buffalo Park, East London

Sat 9
12:30 local, 10:30 GMT, 16:00 IST
36th match - Deccan Chargers v Kings XI Punjab
De Beers Diamond Oval, Kimberley

Sat 9

16:30 local, 14:30 GMT, 20:00 IST
37th match - Chennai Super Kings v Rajasthan Royals
De Beers Diamond Oval, Kimberley

Sun 10
12:30 local, 10:30 GMT, 16:00 IST
38th match - Bangalore Royal Challengers v Mumbai Indians
St George's Park, Port Elizabeth

Sun 10

16:30 local, 14:30 GMT, 20:00 IST
39th match - Delhi Daredevils v Kolkata Knight Riders
New Wanderers Stadium, Johannesburg

Mon 11

16:30 local, 14:30 GMT, 20:00 IST
40th match - Deccan Chargers v Rajasthan Royals
De Beers Diamond Oval, Kimberley

Tue 12
12:30 local, 10:30 GMT, 16:00 IST
41st match - Bangalore Royal Challengers v Kolkata Knight Riders
SuperSport Park, Centurion

Tue 12

16:30 local, 14:30 GMT, 20:00 IST
42nd match - Kings XI Punjab v Mumbai Indians
SuperSport Park, Centurion

Wed 13

16:30 local, 14:30 GMT, 20:00 IST
43rd match - Deccan Chargers v Delhi Daredevils
Kingsmead, Durban

Thu 14
12:30 local, 10:30 GMT, 16:00 IST
44th match - Bangalore Royal Challengers v Chennai Super Kings
Kingsmead, Durban

Thu 14

16:30 local, 14:30 GMT, 20:00 IST
45th match - Mumbai Indians v Rajasthan Royals
Kingsmead, Durban

Fri 15

16:30 local, 14:30 GMT, 20:00 IST
46th match - Delhi Daredevils v Kings XI Punjab
OUTsurance Oval, Bloemfontein

Sat 16
12:30 local, 10:30 GMT, 16:00 IST
47th match - Chennai Super Kings v Mumbai Indians
St George's Park, Port Elizabeth

Sat 16

16:30 local, 14:30 GMT, 20:00 IST
48th match - Deccan Chargers v Kolkata Knight Riders
New Wanderers Stadium, Johannesburg

Sun 17
12:30 local, 10:30 GMT, 16:00 IST
49th match - Deccan Chargers v Kings XI Punjab
New Wanderers Stadium, Johannesburg

Sun 17

16:30 local, 14:30 GMT, 20:00 IST
50th match - Delhi Daredevils v Rajasthan Royals
OUTsurance Oval, Bloemfontein

Mon 18

16:30 local, 14:30 GMT, 20:00 IST
51st match - Chennai Super Kings v Kolkata Knight Riders
SuperSport Park, Centurion

Tue 19

16:30 local, 14:30 GMT, 20:00 IST
52nd match - Delhi Daredevils v Bangalore Royal Challengers
New Wanderers Stadium, Johannesburg

Wed 20
12:30 local, 10:30 GMT, 16:00 IST
53rd match - Kolkata Knight Riders v Rajasthan Royals
Kingsmead, Durban

Wed 20

16:30 local, 14:30 GMT, 20:00 IST
54th match - Chennai Super Kings v Kings XI Punjab
Kingsmead, Durban

Thu 21
12:30 local, 10:30 GMT, 16:00 IST
55th match - Delhi Daredevils v Mumbai Indians
SuperSport Park, Centurion

Thu 21

16:30 local, 14:30 GMT, 20:00 IST
56th match - Bangalore Royal Challengers v Deccan Chargers
SuperSport Park, Centurion

Fri 22

16:30 local, 14:30 GMT, 20:00 IST
1st Semi-Final - TBC v TBC
SuperSport Park, Centurion

Sat 23

16:30 local, 14:30 GMT, 20:00 IST
2nd Semi-Final - TBC v TBC
New Wanderers Stadium, Johannesburg

Sun 24

16:30 local, 14:30 GMT, 20:00 IST
Final - TBC v TBC
New Wanderers Stadium, Johannesburg