Devblog: Searching Sitecore Content using Lucene

Roundedcube
August 30, 2011
Web Development, Sitecore

Overview

When you need to search content within Sitecore, you have a few options available. You can use Google Custom Search, or the more involved Google Search Appliance (or its smaller sibling, the Google Mini). The latter is an actual rack-mountable server that you must configure outside of any programming code. Another option, and the subject of this post, is to use the search feature within Sitecore itself.

The Sitecore search feature is built on the Lucene search engine, an option many people overlook. Having mostly implemented Google search options in the past, I have come to appreciate a few aspects of the Sitecore/Lucene solution, including:

• Speed and accuracy

• Thoroughness (i.e. search directly within specific Sitecore data fields)

• Works directly with Sitecore “Item” objects

• Easy sorting of results

• Both wildcard and keyword searches

This post covers a simple, basic approach to configuring and using the Sitecore search feature for a single content type. The solution shown here was implemented in a Sitecore 6.3.1 instance that also had Google Custom Search installed and working just fine. The decision to use a Lucene search stemmed from the need for a narrow, content-specific search within an interior part of the website that also needed up-to-the-second accuracy.

Creating an Index

The first step in the process is to configure Sitecore so it indexes a part of your website. This is done by updating the site's web.config file. To make these changes easier, simply create a new “.config” file (named anything you wish) and place it in the “/App_Config/Include/” folder. Sitecore automatically merges any .config file located in the “Include” folder at runtime. I must say a huge thank you to Sitecore for this organizational bliss!

Here is an example .config file:

[Image: example .config file]

Key Details

From the XML above, the key areas needing customization and attention include (in order of appearance):

<database id="web">

The name of the Sitecore database to be indexed. Depending on your situation, you may want to index the “master” database for your content. However, I would think that 99% of the time, almost everyone will want to capture content from the public-facing web database.

<index id="resource_search_index" ...>

This is the name of the search index (you can have as many as you like) that your code will reference.

<param desc="folder">resource_search_index</param>

Specifies the name of the local file system folder where the Lucene index files will be stored. By default, all search indexes are stored in a folder named “Indexes” within the Sitecore “Data” folder. Note: the Data folder is the same one used to store the Sitecore license file.

<Database>web</Database>

As mentioned above, this is another reference to the database you wish to index.

<Root>/sitecore/content/Home/SomeSectionPage/ResourceParent</Root>

This is the parent path within the Sitecore tree, beneath which any indexed content must reside.

<resource>{012B24FB-6CC2-465B-9DD9-9734371BDA2C}</resource>

This tag specifies the GUID of the Sitecore template whose items the search engine should index. Items based on any other template are ignored.
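
The following small Python sketch makes these settings a little more concrete by modeling the filtering they describe. It is not Sitecore's API. The Item and IndexDefinition classes and the C:\inetpub\MySite\Data path are hypothetical, and the case-insensitive path comparison is an assumption. Under those assumptions, an item ends up in the index only if it meets all three conditions:

• It comes from the configured database.
• It sits beneath the <Root> path.
• It is based on one of the listed templates.

The sketch also shows where the index files would land under the default Data\Indexes location.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a Sitecore content item."""
    path: str
    template_id: str


@dataclass(frozen=True)
class IndexDefinition:
    """Mirrors the settings in the example .config (field names are illustrative)."""
    index_id: str             # <index id="...">
    database: str             # <Database>
    root: str                 # <Root>
    template_ids: frozenset   # <resource> template GUIDs, stored upper-case
    folder: str               # <param desc="folder">

    def includes(self, item: Item, source_db: str) -> bool:
        # Only content from the configured database is crawled.
        if source_db.lower() != self.database.lower():
            return False
        # The item must live beneath <Root>; comparing case-insensitively is an assumption.
        root = self.root.rstrip("/").lower() + "/"
        if not item.path.lower().startswith(root):
            return False
        # Items based on any other template are ignored.
        return item.template_id.upper() in self.template_ids

    def folder_path(self, data_folder: str) -> PureWindowsPath:
        # By default the Lucene files live in <Data>\Indexes\<folder>.
        return PureWindowsPath(data_folder) / "Indexes" / self.folder


resource_index = IndexDefinition(
    index_id="resource_search_index",
    database="web",
    root="/sitecore/content/Home/SomeSectionPage/ResourceParent",
    template_ids=frozenset({"{012B24FB-6CC2-465B-9DD9-9734371BDA2C}"}),
    folder="resource_search_index",
)

# A hypothetical item based on the indexed template, beneath the root.
guide = Item("/sitecore/content/Home/SomeSectionPage/ResourceParent/Guide",
             "{012b24fb-6cc2-465b-9dd9-9734371bda2c}")
print(resource_index.includes(guide, "web"))     # True
print(resource_index.includes(guide, "master"))  # False
print(resource_index.folder_path(r"C:\inetpub\MySite\Data"))
# C:\inetpub\MySite\Data\Indexes\resource_search_index
```

Appending "/" to the root before the prefix check keeps a sibling such as ".../ResourceParentArchive" from being mistaken for a child of ".../ResourceParent".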

The Code

The following C# code consists of a main search method that returns a generic collection of Sitecore Items and two helper methods that perform either a keyword or a wildcard search.

[Image: C# search code]

With the comments and extra examples removed, there really isn’t a lot of code to get Lucene up and running. I applaud Sitecore for integrating this search engine to this degree and look forward to using it again on the next project.
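
The difference between the two helper methods can be illustrated without Sitecore at all. The Python sketch below is a toy in-memory inverted index, not the Sitecore or Lucene API. ToyIndex, tokenize and the sample items are hypothetical, and real Lucene analysis, scoring and query parsing are considerably richer. The two search styles work as follows:

• A keyword search matches a whole indexed term exactly.
• A wildcard search expands * and ? against every indexed term.

These roughly correspond to what Lucene's TermQuery and WildcardQuery do. The hits are then sorted by title.

```python
import fnmatch
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    # Lower-case alphanumeric tokens: a rough stand-in for a simple Lucene analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Tiny inverted index (term -> item ids), for illustration only."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)
        self.titles = {}

    def add(self, item_id: str, title: str, body: str) -> None:
        self.titles[item_id] = title
        for term in tokenize(title + " " + body):
            self.postings[term].add(item_id)

    def keyword_search(self, word: str) -> set:
        # Exact term match, similar in spirit to a Lucene TermQuery.
        return set(self.postings.get(word.lower(), ()))

    def wildcard_search(self, pattern: str) -> set:
        # '*' and '?' are expanded against every indexed term, like a WildcardQuery.
        pattern = pattern.lower()
        hits = set()
        for term, ids in self.postings.items():
            if fnmatch.fnmatchcase(term, pattern):
                hits |= ids
        return hits

    def search(self, query: str) -> list:
        # Pick the helper based on the query, then sort the hits by title.
        if "*" in query or "?" in query:
            hits = self.wildcard_search(query)
        else:
            hits = self.keyword_search(query)
        return sorted(hits, key=lambda item_id: self.titles[item_id].lower())


index = ToyIndex()
index.add("r1", "Solar Panels Guide", "Installing solar panels")
index.add("r2", "Annual Report", "Solarium and panel data")
index.add("r3", "Budget Worksheet", "Spreadsheet template")

print(index.search("solar"))   # ['r1']
print(index.search("solar*"))  # ['r2', 'r1'] (sorted by title)
print(index.search("panel?"))  # ['r1']
```

Note how "solar" as a keyword misses "Solarium," while the wildcard form "solar*" picks it up. That is the practical reason to offer both kinds of search.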

Caveats

• When the index is configured for the Sitecore Web database only, the index is updated only during publishing. Even apart from the issue mentioned below, I would suggest coding for this scenario from the beginning unless the rules for your application are different.

• If you do happen to index the Master database, keep in mind that simply updating/saving content will invoke a re-index. Granted, some people may not mind waiting for 100,000 items, with a dozen fields each, to re-index, but I personally try to limit those experiences.

• If the Sitecore database configured for indexing becomes corrupted, as mine did, you will most likely receive zero results for any query except on a few Standard Value fields like “name.” On the bright side, the Lucene search never crashed with a YSOD (Yellow Screen of Death), which reassures me that the code will very likely handle such failures gracefully in production as well.
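
The update behavior described in the first two caveats can be modeled with a toy Python sketch. ToySite and its methods are hypothetical and say nothing about how Sitecore actually schedules or batches index updates. The model only captures the rule stated above:

• Editors save to master.
• Publishing copies master to web.
• The index refreshes only on the event that touches the database it is configured for.

```python
from dataclasses import dataclass, field


@dataclass
class ToySite:
    """Hypothetical model of when a search index sees content changes.

    Editors save to "master"; publishing copies master to "web".
    indexed_db plays the role of <Database> in the index config.
    """
    indexed_db: str
    master: dict = field(default_factory=dict)
    web: dict = field(default_factory=dict)
    index: dict = field(default_factory=dict)
    rebuilds: int = 0

    def save(self, item_id: str, text: str) -> None:
        self.master[item_id] = text
        if self.indexed_db == "master":
            self._reindex()  # saving content refreshes a master index right away

    def publish(self) -> None:
        self.web = dict(self.master)
        if self.indexed_db == "web":
            self._reindex()  # a web index only refreshes on publish

    def _reindex(self) -> None:
        source = self.master if self.indexed_db == "master" else self.web
        self.index = dict(source)  # full rebuild of the toy index, for simplicity
        self.rebuilds += 1


site = ToySite(indexed_db="web")
site.save("r1", "Solar panels guide")
print(site.index)  # {} : saved, but not yet searchable
site.publish()
print(site.index)  # {'r1': 'Solar panels guide'}
```

Switching indexed_db to "master" makes every save trigger a rebuild. In a real site with many items, that is exactly the waiting the second caveat warns about.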

Troubleshooting

Index Viewer Module

One issue I encountered while implementing the Lucene search had nothing to do with the Sitecore search mechanism itself. Instead, a corrupted “Master” database caused the Sitecore “Web” database to become corrupted upon publishing.

While diagnosing why no search results were being returned, I found an extremely helpful module named “IndexViewer” in the Sitecore Shared Source library on the SDN developer website. This module lets you perform many types of searches using the Lucene engine and displays a dropdown list of all your indexes. I highly recommend this tool not only for validating the integrity of an index, but also because it lets you rebuild an index associated with the Sitecore Web database without performing a publish operation. Cheers to the authors for a very helpful tool!

Sitecore-integrated Lucene index browser:

http://trac.sitecore.net/IndexViewer

Author: Phil Leara
