
Date Range Searches in Lucene

27th Dec


by Gary H

One of the first problems you are likely to come across when you start building searches with Lucene is searching across date ranges. Lucene is a text search engine, so it is not immediately obvious how to apply it to dates. The answer is to take a step back and look at how a date is represented.

If we think of a date as a collection of fixed-length fields ordered from most significant to least significant, we can write it as a year (most significant) followed by a month, then a day, and so on, giving us something like 20121225 or 10661014. Looking at the date like this, it becomes apparent that we can represent it as a sortable integer. For example, 20121225 (25th Dec 2012) comes after 20121224 (24th Dec 2012), both of which are later than 20000101 (1st Jan 2000).
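As a rough sketch of the encoding described above, the conversion can be written in plain C#. The `DateKey` class and its method names are made up for illustration and are not part of Lucene. It assumes the invariant culture is wanted, so that the digits do not depend on the machine's calendar settings:

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Lucene): turns a date into a sortable
// integer of the form yyyyMMdd and back again.
public static class DateKey
{
    // 25th Dec 2012 becomes 20121225.
    public static int ToDayKey(DateTime date)
    {
        // InvariantCulture keeps the Gregorian calendar regardless of locale.
        string digits = date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
        return int.Parse(digits, CultureInfo.InvariantCulture);
    }

    // 20121225 becomes 25th Dec 2012 (midnight).
    public static DateTime FromDayKey(int key)
    {
        // D8 pads years below 1000 with leading zeros so the format still matches.
        string digits = key.ToString("D8", CultureInfo.InvariantCulture);
        return DateTime.ParseExact(digits, "yyyyMMdd", CultureInfo.InvariantCulture);
    }
}
```

Because the most significant part comes first, comparing two keys as integers gives the same answer as comparing the dates themselves, which is what makes a numeric range search possible.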

The next step is to get this numerical data into Lucene and then to search it. We can do this using a range query, specifically a NumericRangeQuery. To do this, begin by indexing your dates using a NumericField and adding them to your document like this:

    var df = new NumericField(Fields.AmendedDate);
    df.SetIntValue(int.Parse(itemToIndex.startDate.ToString("yyyyMMdd")));
    doc.Add(df);

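You can make your indexing a little faster by reusing your NumericField across many documents (see the documentation). If you need more resolution (to the hour, minute or second), extend the format string passed to ToString to include those parts, ordered from most to least significant. Bear in mind that anything finer than the hour no longer fits in an int (yyyyMMddHHmm produces values such as 201212251330, which is larger than int.MaxValue), so at that point switch to long.Parse, SetLongValue and NumericRangeQuery.NewLongRange.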
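For finer resolution, a sketch along these lines may help. `DateTimeKey` and `ToSecondKey` are hypothetical names chosen for illustration, and the choice of second-level resolution is an assumption:

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Lucene): encodes a DateTime down to the
// second as a sortable long of the form yyyyMMddHHmmss.
public static class DateTimeKey
{
    public static long ToSecondKey(DateTime value)
    {
        // HH is the 24-hour clock; using hh (12-hour) would break the ordering.
        string digits = value.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        // Fourteen digits overflow an int, hence long.Parse.
        return long.Parse(digits, CultureInfo.InvariantCulture);
    }
}
```

A key produced this way would be indexed with SetLongValue instead of SetIntValue, and searched with a long range instead of an int range.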

With your dates all nicely indexed, you are now ready to search across them. To do this we use a NumericRangeQuery:

    var q = NumericRangeQuery.NewIntRange(Fields.AmendedDate,
                                          int.Parse(SearchFrom.ToString("yyyyMMdd")),
                                          int.Parse(SearchTo.ToString("yyyyMMdd")),
                                          true, true);

The two true arguments make both ends of the range inclusive. This query can then be used to search:

    yourIndexSearcher.Search(q, null, 1000);

or conjoined to an existing query like:

    masterQuery.Add(q, BooleanClause.Occur.MUST);

Splitting your search in this way is faster than using a textual term search due to the nature of how numeric fields are indexed (again, see the documentation). This is also generally faster than using a filter (see this post with a brief explanation including an answer from one of the Lucene maintainers).

C# , Lucene , Search
