{
StartSolrServer
SolrClearIndex
[ "PreLoad" { SolrAddDoc > : 50000] : 4
SolrCommit
Wait(220)
[ "WarmupSearches" { SolrSearch > : 4 ] : 1
# Get a new near-real-time reader, sequentially as fast as possible:
[ "UpdateIndexView" { SolrCommit > : *] : 1 &
# Index with 2 threads, each adding 100 docs per sec
[ "Indexing" { SolrAddDoc > : * : 100/sec ] : 2 &
# Redline search (from queries.txt) with 4 threads
[ "Searching" { SolrSearch > : * ] : 4 &
# Wait 60 sec, then wrap up
Wait(60)
}
StopSolrServer
RepSumByPref Indexing
RepSumByPref Searching
RepSumByPref UpdateIndexView
This algorithm starts up the Solr example server (I’m using out-of-the-box settings for this test), clears the current index, and then loads 200,000 Wikipedia docs into the index (4 threads adding 50,000 docs each). Not a large index by any stretch, but it lets us see the timing effects of various commit and merge activities well enough to make some simple judgements. After committing, the algorithm waits 220 seconds. This is required on the latest trunk version because commits no longer wait for background merges to complete, so we wait long enough for those merges to finish and not interfere with the benchmark. The wait is not necessary on the older version, where the commit call does not return until the background merges are finished.
Next we do 4 searches to warm up the index just a bit before starting a background thread that calls commit sequentially, as fast as possible. Then we start two more background threads, each adding Wikipedia documents at a target rate of 100 docs per second, and 4 background threads that each query Solr as fast as possible. We continue this barrage for a minute and then look at the results.
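As a quick sanity check on the workload described above, the totals implied by the algorithm file can be worked out in a few lines of Python. This is only back-of-the-envelope arithmetic; the constant and function names are mine and are not part of the benchmark package.

```python
# Figures read off the algorithm file; the names are illustrative only.
PRELOAD_THREADS = 4
DOCS_PER_PRELOAD_THREAD = 50_000
INDEXING_THREADS = 2
TARGET_DOCS_PER_SEC_PER_THREAD = 100
RUN_SECONDS = 60


def preload_docs():
    """Docs loaded before the timed part of the benchmark starts."""
    return PRELOAD_THREADS * DOCS_PER_PRELOAD_THREAD


def target_indexing_rate():
    """Combined docs/sec the two indexing threads are aiming for."""
    return INDEXING_THREADS * TARGET_DOCS_PER_SEC_PER_THREAD


def max_docs_during_run():
    """Upper bound on docs added in the timed minute if nothing stalls."""
    return target_indexing_rate() * RUN_SECONDS


print("preloaded docs:", preload_docs())
print("target docs/sec:", target_indexing_rate())
print("max docs in timed run:", max_docs_during_run())
```

So the best case for the timed minute is 12,000 additional documents on top of the 200,000 preloaded ones.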
The Before Picture
This is a graph of the “refresh” times: the time it took to perform each commit and open up a new view on the index. In this case, the index was refreshed 400 times in the minute we allowed the benchmark to run. For the most part, the refresh time really does not look too bad. The average “refresh” time is actually just 150ms. Now that Lucene and Solr work mostly per segment, this process can naturally be pretty fast. And this is a pretty small index really. There is a troubling spike in this minute though: one “refresh” took about 23 seconds! The reason is that the commit triggered background merges, and Solr waited for those background merges to finish before opening a new IndexSearcher and releasing the commit lock.
It gets worse though. Not only was the refresh time hurt, but while that commit lock was held, neither of our 2 indexing threads could get a document into the index! They were effectively stalled. Over that minute, we were only able to index the Wikipedia documents at 13.91 documents per second, far below our target of 100 documents per second for each thread! There was also a very large block of time where no indexing happened at all. Less troubling, our 4 threads were able to query at a rate of 11.24 queries per second (this can likely vary wildly depending on the ‘challenge’ of the queries.txt file) [UPDATE 9/4/2011: the search rate is very low due to a problem with the initial benchmark - many queries ended up malformed - without so many errors, search performance jumps drastically]. But overall, this is not an optimal use of this desktop’s resources.
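The numbers above hang together: 400 refreshes at an average of 150 ms add up to the full minute, which is what you would expect from a thread that commits back to back. A small sketch of that arithmetic follows. It assumes the reported 13.91 docs/sec is the aggregate rate across both indexing threads, which the benchmark report does not state explicitly.

```python
REFRESHES = 400
AVG_REFRESH_MS = 150
RUN_SECONDS = 60
MEASURED_DOCS_PER_SEC = 13.91  # assumption: aggregate over both threads
TARGET_DOCS_PER_SEC = 2 * 100  # 2 threads, 100 docs/sec each


def total_refresh_seconds():
    """Wall-clock time spent inside commits by the single commit thread."""
    return REFRESHES * AVG_REFRESH_MS / 1000


def docs_indexed_in_run():
    """Approximate number of docs that made it in during the minute."""
    return MEASURED_DOCS_PER_SEC * RUN_SECONDS


def fraction_of_target():
    """Measured indexing rate as a fraction of the combined target."""
    return MEASURED_DOCS_PER_SEC / TARGET_DOCS_PER_SEC


print(f"time in commits: {total_refresh_seconds():.1f} s of {RUN_SECONDS} s")
print(f"docs indexed: about {docs_indexed_in_run():.0f}")
print(f"share of target rate: {fraction_of_target():.1%}")
```

Under that assumption, roughly 835 documents were indexed in the minute, about 7% of what the two threads were asking for.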
The After Picture
Now we try with the new UpdateHandler. The new UpdateHandler no longer blocks updates while a commit is in progress. Nor does it wait for background merges to complete before opening a new IndexSearcher and returning.
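To see why a commit that holds the lock hurts indexing so much, here is a toy timeline model. It is not how Solr’s UpdateHandler is implemented: the function name, the one-second resolution, and the single 23-second commit window are all illustrative assumptions, and the model makes no attempt to reproduce the measured rates.

```python
def docs_indexed(total_seconds, target_rate, commit_windows, updates_block_on_commit):
    """Count docs indexed on a toy timeline with one-second resolution.

    commit_windows is a list of (start_second, duration_seconds) pairs for
    slow commits. When updates_block_on_commit is True, no document can be
    added during any second covered by a commit window.
    """
    blocked_seconds = set()
    if updates_block_on_commit:
        for start, duration in commit_windows:
            end = min(start + duration, total_seconds)
            blocked_seconds.update(range(start, end))
    return (total_seconds - len(blocked_seconds)) * target_rate


# One slow commit of 23 seconds (hypothetical start time) in a 60 second run.
slow_commits = [(10, 23)]
old_style = docs_indexed(60, 200, slow_commits, updates_block_on_commit=True)
new_style = docs_indexed(60, 200, slow_commits, updates_block_on_commit=False)
print("blocking commits:", old_style)
print("non-blocking commits:", new_style)
```

Even in this simplified picture, a single long commit removes its whole duration from the indexing budget when updates have to wait for it.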
The results are not bad: a low average refresh time of 116.74 ms, and no 23 second spike. There are still spikes, but they are infrequent and stay below 2.5 seconds at worst. Micro spikes.
Even better though, our indexing rate is now 125.48 documents per second (vs 13.91 before). This is a fantastic increase, roughly 9x, and likely without the large gaps of no indexing activity. The search performance dropped to 2.8 queries per second (from 11.24), but no doubt this is largely because of all the additional indexing activity that was able to take place. There was a lot more work the CPUs could now do that they couldn’t before; since the indexing threads soaked up more CPU resources, queries were allocated fewer.
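Putting the before and after figures side by side makes the trade clearer. The sketch below just divides the reported numbers; the dictionary keys are my own labels, and the 150 ms “before” refresh time is the rounded average quoted earlier.

```python
# Reported figures from the two runs; key names are illustrative labels.
BEFORE = {"refresh_ms": 150.0, "index_docs_per_sec": 13.91, "queries_per_sec": 11.24}
AFTER = {"refresh_ms": 116.74, "index_docs_per_sec": 125.48, "queries_per_sec": 2.8}


def ratio(metric):
    """After/before ratio for one metric (above 1.0 means the value went up)."""
    return AFTER[metric] / BEFORE[metric]


for metric in BEFORE:
    print(f"{metric}: {BEFORE[metric]} -> {AFTER[metric]} ({ratio(metric):.2f}x)")
```

Indexing throughput goes up by about a factor of nine, while query throughput falls to about a quarter of its previous value and the average refresh time drops by a little over a fifth.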
The After Picture With Lucene NRT
While I was changing around the UpdateHandler, a simple natural extension was to allow the use of Lucene’s NRT feature when opening new views of the index. This feature allows you to skip certain steps that a full commit performs. The tradeoff is that nothing is guaranteed to be on stable storage, but the benefit is very fast “refresh” times.
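The visibility versus durability tradeoff can be illustrated with a small toy class. This is a hypothetical stand-in written for this post, not Lucene’s or Solr’s API: `ToyIndex`, `nrt_reopen`, and `hard_commit` are made-up names, and real NRT readers work on segments rather than Python lists.

```python
import os
import tempfile


class ToyIndex:
    """Hypothetical stand-in for an index; not Lucene's or Solr's API."""

    def __init__(self, path):
        self.path = path
        self.pending = []       # added, not yet searchable
        self.visible = []       # searchable
        self.durable_count = 0  # docs known to be fsynced to disk

    def add(self, doc):
        self.pending.append(doc)

    def nrt_reopen(self):
        """Make pending docs searchable without touching stable storage."""
        self.visible.extend(self.pending)
        self.pending.clear()

    def hard_commit(self):
        """Make pending docs searchable and force everything to disk."""
        self.nrt_reopen()
        with open(self.path, "w", encoding="utf-8") as f:
            f.write("\n".join(self.visible))
            f.flush()
            os.fsync(f.fileno())
        self.durable_count = len(self.visible)

    def search(self, term):
        return [doc for doc in self.visible if term in doc]


with tempfile.TemporaryDirectory() as tmp:
    index = ToyIndex(os.path.join(tmp, "toy.idx"))
    index.add("near real time search")
    index.nrt_reopen()
    print("searchable:", index.search("real"), "durable docs:", index.durable_count)
    index.hard_commit()
    print("durable docs after hard commit:", index.durable_count)
```

After the NRT-style reopen the document is searchable but nothing has been synced to disk; only the full commit pays for the fsync.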



