ExtMiner is an extensible framework and user interface for combining various structured search and document clustering techniques. It is released as open source under the MIT/X license; some components are released under the Apache or LGPL licenses.

ExtMiner's visualization component, GridChart, is based on code from JOpenChart and is released separately from the rest of the application under the LGPL license.

ExtMiner depends on the following projects:

A general overview of the system can be found in the following paper:
M. Nurminen, A. Honkaranta and T. Kärkkäinen. ExtMiner: Combining Multiple Ranking and Clustering Algorithms for Structured Document Retrieval. In International workshop on Integrating Data Mining, Databases and Information Retrieval (IDDI'05), 16th International Workshop on Database and Expert Systems Applications, 22-26 August 2005, Copenhagen, Denmark. pp. 1036-1040. ISBN 0-7695-22424-9.

The basic principle for indexing structured data in ExtMiner is based on the approach described in:
O. Gospodnetic. Parsing, indexing, and searching XML with Digester and Lucene, IBM developerWorks, 2003.
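
As a rough illustration of the general idea behind that approach (parse structured XML, treat each child element of a record as a named field, and index the fields so they can be searched separately), here is a minimal Python sketch. It is not ExtMiner's code and uses neither Digester nor Lucene; the sample XML, the element names, and the functions parse_records, build_index and search are all hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; the element names are illustrative only.
SAMPLE_XML = """
<reports>
  <report><title>First quarterly report</title><author>Alice</author></report>
  <report><title>Second quarterly summary</title><author>Bob</author></report>
</reports>
"""


def parse_records(xml_text, record_tag):
    """Turn each <record_tag> element into a dict of field name -> text."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for elem in root.iter(record_tag):
        # Each direct child element becomes one named field of the record.
        fields = {child.tag: (child.text or "").strip() for child in elem}
        records.append(fields)
    return records


def build_index(records):
    """Build an inverted index keyed by (field name, lowercased token)."""
    index = defaultdict(set)
    for doc_id, fields in enumerate(records):
        for field, text in fields.items():
            for token in text.lower().split():
                index[(field, token)].add(doc_id)
    return index


def search(index, field, term):
    """Return the sorted ids of records whose given field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


if __name__ == "__main__":
    records = parse_records(SAMPLE_XML, "report")
    index = build_index(records)
    for doc_id in search(index, "title", "quarterly"):
        print(doc_id, records[doc_id]["title"])
```

Keying the index on the field name as well as the token is what makes the search structured: the same word can be looked up in the title field without matching occurrences in the author field.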

