Project Aims
The project aims to implement a system consisting of three basic components:
- A harvester that reads documents from a file system or via HTTP and turns them into material that can later be indexed. It should be configurable to ignore files or directories, should understand several file formats (HTML, PDF, PostScript, RTF, DVI) via plugins or external applications, and should store the processed information in a database.
- A web frontend providing a UI for the search functions.
- An indexer (working on the material delivered to the database by the harvester) and retrieval engine (cooperating with the web frontend), implementing Latent Semantic Indexing.
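The harvester described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the names PLUGINS, is_ignored, html_to_text and harvest are hypothetical, a plain dictionary stands in for the database, and only file-system harvesting with an HTML and a plain-text plugin is shown. Formats such as PDF, PostScript, RTF or DVI would need a plugin that wraps an external converter.

```python
import fnmatch
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(raw):
    """Strip tags and collapse whitespace (a deliberately simple plugin)."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical plugin table: file extension -> function turning raw content
# into indexable text. Other formats would register their converter here.
PLUGINS = {
    ".html": html_to_text,
    ".txt": lambda raw: " ".join(raw.split()),
}


def is_ignored(name, patterns):
    """True if a file or directory name matches any shell-style ignore pattern."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)


def harvest(root, ignore_patterns, store):
    """Walk root, skip ignored entries, and put extracted text into store."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not is_ignored(d, ignore_patterns)]
        for name in filenames:
            if is_ignored(name, ignore_patterns):
                continue
            plugin = PLUGINS.get(os.path.splitext(name)[1].lower())
            if plugin is None:
                continue  # no plugin registered for this format
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                store[path] = plugin(fh.read())
    return store
```

A call such as harvest("/path/to/pages", ["*.tmp", "CVS"], {}) would return a mapping from file path to extracted text. Keeping the format handling in a lookup table means that support for a new format can be added without touching the directory walk.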
The project will be tested on the pages of the student journal UNiMUT.
As of 2003-11-03, the latter two components have been split off into a new project.