Saturday, November 29, 2008

Lucene for Indexing Applications

Lucene is an open-source information retrieval library. It was originally written in Java, but has been ported to other programming languages, including Delphi, Perl, C++, Python, Ruby, PHP and C#.

It can be used in any application that requires full-text indexing and searching.

These are the key classes that we will use to build a search engine:

  • Document - The Document class represents a document in Lucene. We index Document objects and get Document objects back when we do a search.
  • Field - The Field class represents a section of a Document. The Field object will contain a name for the section and the actual data.
  • Analyzer - The Analyzer class is an abstract class that provides the interface for taking the text of a Document's fields and turning it into tokens that can be indexed. There are several useful implementations of this class, but the most commonly used is the StandardAnalyzer class.
  • IndexWriter - The IndexWriter class is used to create and maintain indexes.
  • IndexSearcher - The IndexSearcher class is used to search through an index.
  • QueryParser - The QueryParser class is used to parse a search string entered by the user into a Query object that can be run against an index.
  • Query - The Query class is an abstract class that contains the search criteria created by the QueryParser.
  • Hits - The Hits class contains the Document objects that are returned by running the Query object against the index.
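
To make the roles of these classes more concrete, here is a minimal sketch of the same pipeline written in plain Java. It is a toy illustration only and does not use Lucene's API: MiniIndex and its methods are hypothetical names invented for this example. The analyze method stands in for an Analyzer, addDocument for IndexWriter, and search for the QueryParser and IndexSearcher working together. Requiring every query term to match is a simplification chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy illustration only: these names are hypothetical and are NOT Lucene's API.
class MiniIndex {
    // term -> ids of the documents containing it (the "inverted index")
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Plays the role of an Analyzer: lower-case the text and split it into tokens.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Plays the role of IndexWriter: record every token of one document's text.
    void addDocument(String id, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(id);
        }
    }

    // Plays the role of QueryParser + IndexSearcher: analyze the query string,
    // then return the ids of documents that contain every query term.
    List<String> search(String queryText) {
        Set<String> result = null;
        for (String term : analyze(queryText)) {
            Set<String> docs = postings.getOrDefault(term, new LinkedHashSet<>());
            if (result == null) {
                result = new LinkedHashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        MiniIndex index = new MiniIndex();
        index.addDocument("doc1", "Lucene is an information retrieval library");
        index.addDocument("doc2", "Full text indexing and searching");
        System.out.println(index.search("retrieval LIBRARY")); // prints [doc1]
        System.out.println(index.search("lucene searching"));  // prints []
    }
}
```

The same analysis step is applied to both the indexed text and the query, which is why "LIBRARY" in the query still matches "library" in the document.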
