Solr took another step toward increasing its NoSQL datastore capabilities with the addition of realtime get.
Background
As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a commit (or a new Near Real Time softCommit) needs to be done before those changes are visible. Even with Solr’s new NRT (Near Real Time) capabilities, it’s probably not advisable to reopen the searcher more than once a second. However, there are some use cases that require the absolute latest version of a document, as opposed to just a very recent version. This is where Solr’s new realtime get comes to the rescue: the latest version of a document can be retrieved without reopening the searcher and risking disruption to other normal search traffic.
The Realtime-Get API
The realtime get handler is registered at the /get URL. As an example, a request like
http://localhost:8983/solr/get?id=SOLR1000&fl=id,name&wt=json
returns a response like
{"doc":{"id":"SOLR1000","name":"Solr, the Enterprise Search Server"}}
Notice that the optional fl (field list) parameter works as normal, allowing you to select the fields you want returned.
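For readers who want to call the handler from code, here is a minimal Python sketch of the request shown above. The helper names (build_get_url, parse_get_response, realtime_get) are hypothetical and not part of Solr. The base URL assumes the example server on localhost:8983, and the handling of a missing document assumes the "doc" value comes back null or absent, which you should verify against your build.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_get_url(base_url, doc_id, fields=None):
    """Build a realtime get URL like the one in the example above."""
    params = {"id": doc_id}
    if fields:
        # fl is optional; it is sent as a comma-separated field list.
        params["fl"] = ",".join(fields)
    params["wt"] = "json"
    # safe="," keeps the commas in fl readable instead of percent-encoding them.
    return f"{base_url}/get?{urlencode(params, safe=',')}"


def parse_get_response(body):
    """Return the document dict from a JSON response body, or None.

    Assumption: a missing document shows up as a null or absent "doc" key.
    """
    return json.loads(body).get("doc")


def realtime_get(base_url, doc_id, fields=None):
    """Fetch the latest version of a document (requires a running Solr)."""
    with urlopen(build_get_url(base_url, doc_id, fields)) as resp:
        return parse_get_response(resp.read().decode("utf-8"))
```

With a Solr instance running, realtime_get("http://localhost:8983/solr", "SOLR1000", ["id", "name"]) would issue the same request as the example URL and return the contents of the "doc" object.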
There’s also a realtime get component that can be inserted into any request handler, including the standard request handler.
How it works
The realtime get feature uses transaction logging to keep track of uncommitted updates to the index. When a get request for a document is received, this log is checked first, and the document is retrieved from there if found. If it’s not found, then the latest opened searcher is used to retrieve the document. Checking the log is very fast, and IO reads from the log are fully concurrent for maximum scalability.
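The lookup order described above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not Solr's implementation: the class ToyRealtimeIndex and its methods are hypothetical, the "log" is a plain dictionary rather than an on-disk transaction log, and deletes and concurrency are ignored.

```python
class ToyRealtimeIndex:
    """Toy model: uncommitted updates live in a log, searches see a snapshot."""

    def __init__(self):
        self._snapshot = {}  # what the latest opened searcher can see
        self._log = {}       # uncommitted updates, keyed by document id

    def add(self, doc):
        # An update is recorded in the log but is not yet visible to searches.
        self._log[doc["id"]] = doc

    def commit(self):
        # A commit makes logged updates visible to the searcher.
        self._snapshot.update(self._log)
        self._log.clear()

    def search_get(self, doc_id):
        # A normal search only sees the point-in-time snapshot.
        return self._snapshot.get(doc_id)

    def realtime_get(self, doc_id):
        # Check the log first; fall back to the snapshot if not found.
        if doc_id in self._log:
            return self._log[doc_id]
        return self._snapshot.get(doc_id)


index = ToyRealtimeIndex()
index.add({"id": "SOLR1000", "name": "first version"})
uncommitted_via_search = index.search_get("SOLR1000")   # not visible yet
uncommitted_via_get = index.realtime_get("SOLR1000")    # served from the log
```

In this model a document added but not yet committed is invisible to search_get while realtime_get already returns it, which mirrors the behavior the post describes.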
Try it out
Download a recent nightly build of Solr 4.0-dev and follow the Quick Start guide on the Solr wiki. Feedback on the solr-user mailing list is always appreciated!
This looks promising. Is this a master-only feature, or can slaves also reconcile using the master log?