This document is the documentation for the Carrot 2 framework. It describes the Carrot 2 application suite and the API developers can use to integrate Carrot 2 clustering algorithms into their code. It also provides a reference of all Carrot 2 components and their attributes. Carrot 2 is a library and a set of supporting applications you can use to build a search results clustering engine. Such an engine organizes your search results into topics, fully automatically and without external knowledge such as taxonomies or preclassified content.
|Published (Last):||1 October 2006|
In the same year, version 2. In , version 3. Document sources provide data for further processing. Typically, they would e. NET software without installing a Java runtime. NET Framework version 3.
Workbench compatibility for Ubuntu distributions. Document source updates and removals of non-functional document sources. A bugfix for. Other minor improvements. Servlet API bug fixes, Workbench bug fixes, removed Google document source, fixed language codes for a few languages. Upgrade of the Morfologik Polish dictionary, infrastructural changes and adjustments allowing C2 to operate under stricter security manager policies.
Minor bug fixes and improvements: customization of the Solr adapter XSLT, Workbench tweaks for larger inputs, updated dependencies. Ajax support in the Document Clustering Server, improved Bing document source, Workbench improvements, bug fixes. Experimental support for clustering Arabic and Korean content, a command-line application for clustering in batch mode, LGPL-licensed dependencies removed.
Experimental support for clustering Chinese content, search results clustering plugin for Apache Solr. First official release, binaries available on SourceForge.
Incubation releases, source code available on SourceForge.
The clustering (or cluster analysis) plugin attempts to automatically discover groups of related search hits (documents) and assign human-readable labels to these groups. By default in Solr, the clustering algorithm is applied to the search result of each single query; this is called on-line clustering. While Solr contains an extension for full-index clustering (off-line clustering), this section focuses on on-line clustering only. Clusters discovered for a given query can be perceived as dynamic facets. This is beneficial when regular faceting is difficult (field values are not known in advance) or when the queries are exploratory in nature. The query issued to the system was Solr. It seems clear that faceting could not yield a similar set of groups, although the goals of both techniques are similar: to let the user explore the set of search results and either rephrase the query or narrow the focus to a subset of current documents.
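To make the "dynamic facets" idea concrete, the following is a minimal sketch in Python of how a client might treat clusters like facets: it maps each cluster label to its documents and narrows the hit list to one cluster. The endpoint URL, the core name, the `clustering` request parameter, and the `clusters`/`labels`/`docs` response layout are assumptions made for illustration and should be checked against your own Solr configuration. The sample response is hand-written, not real Solr output.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and core name; adjust to your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_clustering_url(query, rows=100):
    """Build a query URL that asks Solr to cluster the returned hits.

    Assumes the clustering component is configured on this handler and is
    switched on with a `clustering` request parameter.
    """
    params = {"q": query, "rows": rows, "wt": "json", "clustering": "true"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def clusters_as_facets(response):
    """Map each cluster label to the ids of the documents in that cluster."""
    facets = {}
    for cluster in response.get("clusters", []):
        label = ", ".join(cluster["labels"])
        facets[label] = list(cluster["docs"])
    return facets


def narrow_to_cluster(response, label):
    """Return only the hits that belong to the cluster with the given label."""
    wanted = set(clusters_as_facets(response).get(label, []))
    return [doc for doc in response["response"]["docs"] if doc["id"] in wanted]


# Hand-written sample shaped like a clustered response (an assumed layout).
SAMPLE_RESPONSE = json.loads("""
{
  "response": {"numFound": 3, "docs": [
    {"id": "d1", "title": "Faceted search with Solr"},
    {"id": "d2", "title": "Solr clustering component"},
    {"id": "d3", "title": "Field faceting in Solr"}
  ]},
  "clusters": [
    {"labels": ["Faceted Search"], "score": 1.0, "docs": ["d1", "d3"]},
    {"labels": ["Clustering"], "score": 0.5, "docs": ["d2"]}
  ]
}
""")

facets = clusters_as_facets(SAMPLE_RESPONSE)
faceting_hits = narrow_to_cluster(SAMPLE_RESPONSE, "Faceted Search")
```

Unlike regular facets, the labels here are not field values known in advance; they exist only for this one result set, so a client would recompute the mapping for every query.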