Tag | Nutch
After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem. For anyone who follows the amalgamation of projects that I like to call the Lucene Ecosystem (the Apache projects: Lucene, Solr, Nutch, Mahout, Tika, PyLucene, Lucy, Lucene.NET, Droids, ManifoldCF (the Lucene Connector Framework), OpenNLP and UIMA), you know it…
In preparing for my upcoming talk on Apache Hadoop and Search, I thought I would try using Nutch (the genesis of Hadoop) to index some content into Solr. I started by following Sami Siren’s excellent post on Nutch and Solr… (which worked flawlessly with 1.1 for me on OS X) to get up and running, but quickly hoped there was a much easier way to do this than typing in all
It’s that time of year, so I thought I would take a look back at the year that was for the Lucene Ecosystem… and maybe look ahead just a little bit too. First and foremost, it should be obvious to even the most casual observer that the Apache Lucene communities are thriving. Not only is it a great time to be involved in open source, it’s a great time to be involved in Lucene. Both
Apache Nutch, a subproject of Apache Lucene, is open source web-search software. It builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats. Apache Nutch 1.0 contains almost 200 resolved issues and improvements, including Solr integration, a new indexing framework, and a new scoring framework, to mention just a few. Nutch 1.0 is available from here….