Author | yonik
Solr Cloud has garnered a ton of interest lately. In light of the planned release of Solr 4.0 in September, the community would like to hear from someone who has had experience with Solr Cloud, and the upcoming ApacheCon / Lucene EuroCon… represents a perfect opportunity.

LucidWorks would like to invite the community to step up and submit a paper about their experience with Solr Cloud. So much so that we will pay to send the
The first alpha release of Solr 4 is quickly approaching, bringing powerful new features to enhance existing Solr-powered applications, as well as enabling new applications by further blurring the lines between full-text search and NoSQL.

The largest set of features goes by the development code-name “Solr Cloud” and involves bringing easy scalability to Solr. Distributed indexing with no single points of failure… has been designed from the ground up for near real-time (NRT), and
Advanced Filter Caching is a relatively new feature in Solr, available in version 3.4 and above. It allows precise control over how Solr handles filter queries in order to maximize performance, including the ability to specify whether a filter is cached, the order in which filters are evaluated, and whether a filter is applied as a post filter.

Filter Queries in Solr

Adding a filter expressed as a query to a Solr request is a snap… simply add an additional fq… parameter for each
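Since the excerpt stops before its example, here is a minimal Java sketch that builds a select URL with one `fq` parameter per filter. The `buildSelectUrl` helper and the field names (`inStock`, `price`, `popularity`) are illustrative assumptions. The `{!cache=false}` and `cost` local params and the `frange` post filter are assumed to follow Solr 3.4+ local-params syntax: non-cached filters are ordered by increasing cost, and a filter runs as a post filter only when its query type supports it, it is not cached, and its cost is at least 100.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class FilterQueryUrl {
    // Hypothetical helper: appends one URL-encoded fq parameter per filter query.
    static String buildSelectUrl(String base, String q, List<String> filters) {
        StringBuilder sb = new StringBuilder(base).append("?q=").append(enc(q));
        for (String fq : filters) {
            sb.append("&fq=").append(enc(fq));
        }
        return sb.toString();
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> filters = List.of(
                "inStock:true",                      // default: cached in the filter cache
                "{!cache=false}price:[0 TO 100]",    // evaluated per request, not cached
                // not cached and cost >= 100: eligible to run as a post filter
                "{!frange l=10 u=100 cache=false cost=200}mul(popularity,price)");
        System.out.println(buildSelectUrl("http://localhost:8983/solr/select", "ipod", filters));
    }
}
```

Each filter gets its own `fq` parameter rather than being ANDed into one query, which is what lets Solr cache (or skip caching) each filter independently.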

Background

I needed a really good hash function for the distributed indexing we’re implementing for Solr. Because it will be used to partition documents, it needs to be really high quality (well distributed) so that we don’t end up with uneven shards. It also needs to be cross-platform, so that a client could calculate the hash value itself, if desired, to predict which node holds a given document.

MurmurHash3…

MurmurHash3 is one of the top favorite new hash functions
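The excerpt doesn't include the post's code, so what follows is a minimal Java sketch of the 32-bit x86 variant of MurmurHash3 as published by its author; it is not necessarily the exact code Solr adopted. Blocks are read as little-endian bytes explicitly, which is what makes the result identical on every platform. The `shardIndex` helper is hypothetical: it shows one way a client could predict a document's shard by splitting the unsigned 32-bit hash space into equal contiguous ranges, an assumption rather than Solr's actual routing scheme.

```java
import java.nio.charset.StandardCharsets;

final class Murmur3Sketch {
    static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // end of the last full 4-byte block

        // Body: mix each little-endian 4-byte block into the hash state.
        for (int i = offset; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 0-3 bytes (the cases intentionally fall through).
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: fold in the length, then avalanche the bits (fmix32).
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Hypothetical routing helper: treat the hash as unsigned and split [0, 2^32) into equal ranges.
    static int shardIndex(int hash, int numShards) {
        return (int) (((hash & 0xffffffffL) * numShards) >>> 32);
    }

    public static void main(String[] args) {
        byte[] id = "IW-02".getBytes(StandardCharsets.UTF_8);
        int h = murmurhash3_x86_32(id, 0, id.length, 0);
        System.out.printf("hash=%08x shard=%d of 4%n", h, shardIndex(h, 4));
    }
}
```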
Solr took another step toward increasing its NoSQL datastore capabilities with the addition of realtime get.

Background

As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a commit (or a new Near Real Time softCommit…) needs to be done before those changes are visible. Even with Solr’s new NRT (Near Real Time) capabilities, it’s probably not advisable to reopen the
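To make the snapshot-versus-latest distinction concrete, here is a toy Java model, not Solr's implementation: searches read only the last committed snapshot, while a realtime get also consults updates made since that commit, roughly the role an update log plays. The class and method names are hypothetical, and deletes are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

final class RealtimeGetModel {
    private Map<String, String> committed = new HashMap<>();      // snapshot that searches see
    private final Map<String, String> pending = new HashMap<>();  // updates since last commit

    void add(String id, String doc) {
        pending.put(id, doc);
    }

    // Publish a new point-in-time snapshot containing all pending updates.
    void commit() {
        Map<String, String> next = new HashMap<>(committed);
        next.putAll(pending);
        committed = next;
        pending.clear();
    }

    // A normal search only sees committed state.
    String search(String id) {
        return committed.get(id);
    }

    // Realtime get: the newest version of the document, committed or not.
    String realtimeGet(String id) {
        String latest = pending.get(id);
        return latest != null ? latest : committed.get(id);
    }

    public static void main(String[] args) {
        RealtimeGetModel idx = new RealtimeGetModel();
        idx.add("book1", "v1");
        System.out.println(idx.search("book1") + " / " + idx.realtimeGet("book1"));
        idx.commit();
        idx.add("book1", "v2");
        System.out.println(idx.search("book1") + " / " + idx.realtimeGet("book1"));
    }
}
```

The point of the design is that a lookup by id never has to wait for (or force) a commit or searcher reopen.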
Lucene’s default ranking function uses factors such as tf (term frequency), idf (inverse document frequency), and norm to help calculate relevancy scores. Solr now exposes these factors as function queries.
  • docfreq(field,term) returns the number of documents that contain the term in the field.
  • termfreq(field,term) returns the number of times the term appears in the field for that document.
  • idf(field,term) returns the inverse document frequency for the given term, using the Similarity for the field.
  • tf(field,term) returns the
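The statistics behind these functions can be sketched over a tiny in-memory corpus. This Java sketch tokenizes on whitespace (a simplification of real analysis) and, as an assumption, uses the classic Lucene DefaultSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)); Solr's `tf` and `idf` delegate to whatever Similarity is configured for the field, so the numbers can differ.

```java
import java.util.List;

final class TermStatsSketch {
    // termfreq(field,term): occurrences of the term in one document's field value.
    static int termfreq(String fieldValue, String term) {
        int n = 0;
        for (String tok : fieldValue.trim().split("\\s+")) {
            if (tok.equals(term)) n++;
        }
        return n;
    }

    // docfreq(field,term): number of documents whose field contains the term at least once.
    static int docfreq(List<String> fieldValues, String term) {
        int n = 0;
        for (String v : fieldValues) {
            if (termfreq(v, term) > 0) n++;
        }
        return n;
    }

    // Assumed DefaultSimilarity-style formulas (computed in double here, float in Lucene).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("ipod nano ipod", "ipod cable", "usb cable");
        int df = docfreq(docs, "ipod");
        int freq = termfreq(docs.get(0), "ipod");
        System.out.printf("docfreq=%d termfreq=%d tf=%.4f idf=%.4f%n",
                df, freq, tf(freq), idf(df, docs.size()));
    }
}
```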
I previously introduced Solr’s Result Grouping, also called Field Collapsing, which limits the number of documents shown for each “group”, normally defined as the unique values in a field or function query.

Since then, there have been a number of bug fixes, performance improvements, and feature enhancements. To get them, you’ll need a recent nightly build of Solr 4.0-dev, or the newly released LucidWorks Enterprise v1.6, our commercial version of Solr.

Feature Enhancements…

One improvement is the
Solr has been able to produce JSON results for a long time by adding wt=json to any query. A new capability has recently been added to allow indexing in JSON, as well as issuing other update commands such as deletes and commits.

All of the functionality that was available through XML update commands can now be given in JSON. For example, you can index a document like so:

$ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d ' …
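The body of that curl command is cut off in this excerpt. As a hedged sketch, the Java below builds request bodies in the shape assumed here for Solr's JSON update commands, `{"add":{"doc":{...}}}`, `{"delete":{"id":...}}` and `{"commit":{}}`; the helper names are hypothetical, and only string-valued fields are handled. The resulting string would go after `-d` in a request to http://localhost:8983/solr/update/json with the JSON content type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class JsonUpdateBody {
    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Hypothetical helper: one "add" command for a document with string-valued fields.
    static String addCommand(Map<String, String> fields) {
        StringJoiner doc = new StringJoiner(",", "{", "}");
        fields.forEach((k, v) -> doc.add(quote(k) + ":" + quote(v)));
        return "{\"add\":{\"doc\":" + doc + "}}";
    }

    static String deleteByIdCommand(String id) {
        return "{\"delete\":{\"id\":" + quote(id) + "}}";
    }

    static String commitCommand() {
        return "{\"commit\":{}}";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps field order stable
        fields.put("id", "TestDoc1");
        fields.put("title", "test1");
        System.out.println(addCommand(fields));
    }
}
```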
Result Grouping, also called Field Collapsing, has been committed to Solr! This functionality limits the number of documents for each “group”, usually defined by the unique values in a field (just like field faceting).

You can think of it like faceted search, except instead of just getting a count, you get the top documents for that constraint or category. There are tons of potential use cases:
  • For web search, only show 1 or
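The core idea can be sketched in a few lines of Java: walk the hits in score order and keep at most N per distinct field value. This is a simplification (Solr does the work while collecting results, and the request parameters assumed to correspond are roughly `group=true&group.field=...&group.limit=...`); the `Hit` record and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupingSketch {
    // A search hit: document id, value of the grouping field, relevancy score.
    record Hit(String id, String groupValue, double score) {}

    // Input must be sorted by descending score. Each group keeps at most `limit` hits,
    // and groups come out in the order of their best hit (LinkedHashMap keeps insertion order).
    static Map<String, List<Hit>> group(List<Hit> rankedHits, int limit) {
        Map<String, List<Hit>> groups = new LinkedHashMap<>();
        for (Hit h : rankedHits) {
            List<Hit> docs = groups.computeIfAbsent(h.groupValue(), k -> new ArrayList<>());
            if (docs.size() < limit) {
                docs.add(h);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a1", "apple", 3.0),
                new Hit("a2", "apple", 2.5),
                new Hit("d1", "dell", 2.0),
                new Hit("a3", "apple", 1.0));
        group(hits, 1).forEach((value, docs) -> System.out.println(value + " -> " + docs));
    }
}
```

With a limit of 1 this behaves like the web-search use case: one result per site, instead of a page dominated by a single source.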
Solr has been able to slurp in CSV for quite some time, and now I’ve finally got around to adding the ability to output query results in CSV also. The output format matches what the CSV loader can slurp.

Adding a simple wt=csv to a query request will cause the docs to be written in a CSV format that can be loaded into something like Excel:

http://localhost:8983/solr/select?q=ipod&fl=id,cat,name,popularity,price,score&wt=csv
id,cat,name,popularity,price,score
IW-02,"electronics,connector",iPod & iPod Mini USB 2.0 Cable,1,11.5,0.98867977
…
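A minimal Java sketch of the escaping that produces a row like the one above: a value is quoted only when it contains a comma, a double quote, or a line break, and embedded quotes are doubled. The assumption, suggested by the `cat` column, is that a multi-valued field is joined with commas and then escaped as a single value; Solr's CSV writer has its own options, and this is not its code.

```java
import java.util.List;
import java.util.StringJoiner;

final class CsvRowSketch {
    // Quote only when needed; embedded double quotes are doubled (RFC 4180 style).
    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    // Assumption: each field's values are joined with commas, then escaped as one value.
    static String row(List<List<String>> fieldValues) {
        StringJoiner line = new StringJoiner(",");
        for (List<String> values : fieldValues) {
            line.add(escape(String.join(",", values)));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(row(List.of(
                List.of("IW-02"),
                List.of("electronics", "connector"),
                List.of("iPod & iPod Mini USB 2.0 Cable"),
                List.of("1"), List.of("11.5"), List.of("0.98867977"))));
    }
}
```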