Saturday, November 29, 2008

Stemming

The stemming process plays a major role in indexing applications. In many cases, several words have similar meanings and can be treated as equivalent for the purposes of information retrieval. For that reason, stemmers have been developed to reduce a word to its root form. Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, which is generally a written word form. Stemming, often called conflation, is useful for indexing in search engines and for other natural language processing problems.

For example, a stemmer for English should identify the string "cats" (and possibly "catlike", "catty", etc.) as based on the root "cat", and "stemmer", "stemming", and "stemmed" as based on "stem". A stemming algorithm may reduce the words "fishing", "fished", "fish", and "fisher" to the root word "fish"; the exact result depends on the stemmer.

The stemming process applies a few steps to a given word. The steps below are those of Porter's stemming algorithm, a widely used stemmer for English.

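The first step removes plurals and the suffixes -ed and -ing, for example:

fishing -> fish, feed -> feed, agreed -> agree

"feed" is left unchanged because the -eed rule only applies when the remaining stem contains a vowel followed by a consonant: "agr" in "agreed" does, but "f" in "feed" does not.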

The next step turns a terminal y into i when there is another vowel in the stem (happy -> happi). The step after that maps double suffixes to single ones, so -ization maps to -ize, and so on.

The fourth step deals with -ic-, -ful, -ness, etc., in the same way as the third step. The fifth step removes suffixes such as -ant and -ence from the word.
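The first step and the y -> i rule can be sketched in Java as follows. This is a minimal sketch assuming lower-case input made only of the letters a-z. It implements steps 1a, 1b, and 1c of Porter's published algorithm, including the "measure" m (the number of vowel-consonant sequences in a stem) that decides when a suffix may be removed. The later steps (-ization, -ness, -ence, and so on) are left out, so it is not a complete stemmer. The class name PorterStep1 is hypothetical.

```java
/**
 * Sketch of steps 1a, 1b and 1c of Porter's stemming algorithm:
 * plurals, -ed/-ing, and terminal y -> i. Later steps are omitted.
 * Assumes a lower-case word made of the letters a-z.
 */
class PorterStep1 {

    // a, e, i, o, u are vowels; y counts as a vowel only after a consonant.
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // m in the form [C](VC)^m[V]: how many vowel runs are followed by consonants.
    static int measure(String s) {
        int m = 0, i = 0, n = s.length();
        while (i < n && isConsonant(s, i)) i++;          // skip leading consonants
        while (i < n) {
            while (i < n && !isConsonant(s, i)) i++;     // a run of vowels
            if (i == n) break;
            while (i < n && isConsonant(s, i)) i++;      // then a run of consonants
            m++;
        }
        return m;
    }

    static boolean hasVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isConsonant(s, i)) return true;
        }
        return false;
    }

    // Ends with a doubled consonant, e.g. "hopp".
    static boolean endsDoubleConsonant(String s) {
        int n = s.length();
        return n >= 2 && s.charAt(n - 1) == s.charAt(n - 2) && isConsonant(s, n - 1);
    }

    // Ends consonant-vowel-consonant where the last letter is not w, x or y, e.g. "hop".
    static boolean endsCvc(String s) {
        int n = s.length();
        if (n < 3) return false;
        char last = s.charAt(n - 1);
        return isConsonant(s, n - 3) && !isConsonant(s, n - 2) && isConsonant(s, n - 1)
                && last != 'w' && last != 'x' && last != 'y';
    }

    static String stem(String w) {
        if (w.length() <= 2) return w;  // Porter's reference code skips very short words

        // Step 1a: plurals (sses -> ss, ies -> i, s -> removed, ss kept).
        if (w.endsWith("sses") || w.endsWith("ies")) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1);
        }

        // Step 1b: -eed, -ed, -ing.
        boolean removed = false;
        if (w.endsWith("eed")) {
            if (measure(w.substring(0, w.length() - 3)) > 0) {
                w = w.substring(0, w.length() - 1);      // agreed -> agree
            }
        } else if (w.endsWith("ed") && hasVowel(w.substring(0, w.length() - 2))) {
            w = w.substring(0, w.length() - 2);
            removed = true;
        } else if (w.endsWith("ing") && hasVowel(w.substring(0, w.length() - 3))) {
            w = w.substring(0, w.length() - 3);
            removed = true;
        }
        if (removed) {                                   // tidy up after -ed/-ing removal
            if (w.endsWith("at") || w.endsWith("bl") || w.endsWith("iz")) {
                w = w + "e";                             // conflat -> conflate
            } else if (endsDoubleConsonant(w)
                    && !(w.endsWith("l") || w.endsWith("s") || w.endsWith("z"))) {
                w = w.substring(0, w.length() - 1);      // hopp -> hop
            } else if (measure(w) == 1 && endsCvc(w)) {
                w = w + "e";                             // hop (from hoping) -> hope
            }
        }

        // Step 1c: terminal y -> i when the stem has another vowel.
        if (w.endsWith("y") && hasVowel(w.substring(0, w.length() - 1))) {
            w = w.substring(0, w.length() - 1) + "i";
        }
        return w;
    }
}
```

For example, stem("fishing") and stem("fished") both return "fish", stem("agreed") returns "agree", and stem("feed") stays "feed". The clean-up rules after -ed/-ing removal are what let "hopping" become "hop" while "hoping" becomes "hope".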

In an indexing application, each word to be indexed is first passed to the stemmer, and the stemmed word is then added to the index.

Stemming saves space and reduces response time, because a single key is stored for several related words instead of one key per word. Query words have to be stemmed the same way so that they match the stored keys.
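The effect of stemming on an index can be sketched as a toy inverted index in plain Java. This is an illustration only, not how Lucene or any particular engine stores its data. The class StemmedIndex and the method naiveStem are hypothetical; naiveStem is a crude stand-in for a real stemmer such as Porter's and only strips -ing, -ed, and a plural -s. Words are stemmed both when they are indexed and when they are searched.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class StemmedIndex {
    // Hypothetical stand-in for a real stemmer: strips -ing, -ed and a plural -s.
    static String naiveStem(String word) {
        if (word.endsWith("ing") && word.length() > 5) return word.substring(0, word.length() - 3);
        if (word.endsWith("ed") && word.length() > 4) return word.substring(0, word.length() - 2);
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Inverted index: key -> ids of the documents containing it.
    final Map<String, Set<Integer>> postings = new TreeMap<>();
    final boolean stemming;

    StemmedIndex(boolean stemming) {
        this.stemming = stemming;
    }

    // The same normalization is used for indexing and for queries.
    String normalize(String token) {
        String t = token.toLowerCase();
        return stemming ? naiveStem(t) : t;
    }

    void add(int docId, String text) {
        for (String token : text.split("[^A-Za-z]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(normalize(token), k -> new TreeSet<>()).add(docId);
        }
    }

    Set<Integer> search(String word) {
        return postings.getOrDefault(normalize(word), new TreeSet<>());
    }
}
```

Indexing the texts "fishing boats" (id 0), "fished from boats" (id 1), and "a fish" (id 2) gives 6 keys without stemming (fishing, boats, fished, from, a, fish) but only 4 with it (fish, boat, from, a). On the stemmed index, search("fishing") returns documents 0, 1, and 2, while the unstemmed index returns only document 0.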

Lucene for Indexing Applications

Lucene is an open-source information retrieval library. It was originally written in Java, but it has been ported to other programming languages, including Delphi, Perl, C++, Python, Ruby, PHP, and C#.

It can be used in any application that requires full-text indexing and search.

The key classes we will use to build a search engine are:

  • Document - The Document class represents a document in Lucene. We index Document objects and get Document objects back when we do a search.
  • Field - The Field class represents a section of a Document. A Field object contains a name for the section and the actual data.
  • Analyzer - The Analyzer class is an abstract class that provides an interface for turning the text of a Document's fields into tokens that can be indexed. There are several useful implementations of this class, but the most commonly used is StandardAnalyzer.
  • IndexWriter - The IndexWriter class is used to create and maintain indexes.
  • IndexSearcher - The IndexSearcher class is used to search through an index.
  • QueryParser - The QueryParser class parses a user's query string into a Query object that can be run against an index.
  • Query - The Query class is an abstract class that holds the search criteria, for example as created by the QueryParser.
  • Hits - The Hits class contains the Document objects returned by running the Query against the index. (Later versions of Lucene removed Hits in favor of TopDocs.)

Sunday, August 10, 2008

SQL Server 2005 Full-Text Search Indexing

SQL Server 2005 Full-Text Search lets us build full-text indexes on MS SQL Server tables. Here we index a sample database and search it from a Web-based or Windows-based application. Indexing the sample database takes a few queries.

To do that, I created a table named VTx in MS SQL Server 2005 with the columns "VideoText", "VideoTitle", and "VideoFrame". To index the table on its "VideoText" column, we can use the following steps.

First, I created a full-text catalog named VTxCatalog using the following query:

CREATE FULLTEXT CATALOG VTxCatalog

Then I enabled full-text indexing for the database by running sp_fulltext_database:

exec sp_fulltext_database 'enable'

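Then I created the full-text index on the VTx table:

CREATE FULLTEXT INDEX ON VTx
(
VideoText
)
-- PK_VTx must be a unique, non-nullable, single-column index on VTx
KEY INDEX PK_VTx ON VTxCatalog
-- Keep the full-text index updated automatically as rows change
WITH CHANGE_TRACKING AUTO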

After inserting some records into the table, I was able to search it with the following query:

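SELECT * FROM VTx WHERE FREETEXT(VideoText, @texttosearch)

The query searches the indexed VideoText column (VTx has no column named Text). Whatever the application, pass the search string as the @texttosearch parameter instead of concatenating it into the SQL text, which would expose the query to SQL injection. FREETEXT matches on meaning rather than exact wording, so the result set also includes rows containing inflected forms of the given words.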

Tuesday, March 18, 2008

SharePoint Search

I had a problem with SharePoint Search: when I searched for content, no results were displayed even though the data was available. The following steps solved the problem.

* I clicked "Search Settings" on the administration page of the Shared Services Provider I had created, as shown below.

[Screenshot: the "Search Settings" link on the Shared Services administration page]

* Then I clicked "Content sources and crawl schedules" under Configure Search Settings, as shown below.

[Screenshot: "Content sources and crawl schedules" under Configure Search Settings]

* After that, I opened the content source's context menu by clicking its down arrow and selected "Start Full Crawl" to index all files. This starts the indexing process, as shown in the figure below.

[Screenshot: the content source's context menu with "Start Full Crawl"]

* Once the crawl had indexed the content, SharePoint Search worked correctly.

Thursday, February 21, 2008

Retrieving a Value from MS CRM and Assigning It to Another Variable

We can retrieve data from MS CRM much as we retrieve data from a database. As the code below shows, we first use the ColumnSet class to specify the attributes we want to retrieve.

Then we specify the conditions to apply using the ConditionExpression class. That class also lets us specify a condition operator, like the operators in SQL queries.

Next, we combine the conditions using the FilterExpression class. We then build the query with the QueryExpression class and use it to retrieve the data, as shown in the code below.


ColumnSet cols = new ColumnSet();
cols.Attributes = new string[] { "name", "accountnumber" }; //name and accountnumber are two attributes.

ConditionExpression condition = new ConditionExpression();
condition.AttributeName = "accountnumber"; // Match records by their accountnumber value
condition.Operator = ConditionOperator.Equal;
// str1 is an input array from the surrounding code (not shown in this post)
condition.Values = new string[] { str1[0].Trim() };

FilterExpression filter = new FilterExpression();
filter.FilterOperator = LogicalOperator.And;
filter.Conditions = new ConditionExpression[] { condition };

QueryExpression query = new QueryExpression();
query.EntityName = EntityName.account.ToString();
query.ColumnSet = cols;
query.Criteria = filter;

// Create the Web service request object.
RetrieveMultipleRequest retrieve = new RetrieveMultipleRequest();
retrieve.Query = query;

// Execute the Web service request.
// service is an authenticated CrmService instance created beforehand (not shown)
RetrieveMultipleResponse retrieved =
(RetrieveMultipleResponse)service.Execute(retrieve);

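BusinessEntityCollection entities = retrieved.BusinessEntityCollection;

// Guard against an empty result before reading the first account.
account ac = entities.BusinessEntities.Length > 0 ? entities.BusinessEntities[0] as account : null;
string nameinCrm = (ac != null) ? ac.name : null;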

I hope this is helpful!