Ontology-based Information-Filtering and Retrieval
Dr. Dominik Kuropka
Hasso-Plattner-Inst. für Softwaresystemtechnik
Prof.-Dr.-Helmertstr. 2-3, 14482 Potsdam
Phone: (0331) 55 09-193
Fax: (0331) 55 09-189
http://bpt.hpi.uni-potsdam.de
Motivation
Cheap mass storage and the increasing interconnectivity of
computers have led to a rapid increase in available documents.
This has raised the flow of information in business, science
and administration to a point where it exceeds human
processing capacity. To cope with this problem, automated systems
for Information Filtering and Retrieval are needed. This tutorial
will give a short overview of linguistics, followed by a classification
and theoretical evaluation of popular IF&IR models, which
motivates the need for ontology-based IF&IR models. The
main part will deal with two ontology-based IF&IR models:
the Topic-based Vector Space Model (TVSM) and the Enhanced
TVSM (eTVSM). Finally, some implementation aspects and practical issues
of those models, as well as quantitative evaluation methods
for IF&IR systems in general, will be addressed.
Outline
1. Introduction to the issue of Information Filtering (IF)
and Information Retrieval (IR)
2. Basic definitions
2.1 Architecture of IF&IR systems
2.2 Basics of computational linguistics
2.3 Ontologies
3. Classification of popular IF&IR models and theory-based
evaluation
3.1 Models without term interdependencies
3.2 Models with immanent term interdependencies
3.3 Models with transcendent term interdependencies
4. Topic-based Vector Space Model (TVSM)
4.1 Concept
4.2 Stopwords, stemming and synonym lemmas
4.3 Comparison with other models and critique
5. Enhanced TVSM (eTVSM)
5.1 Concept
5.2 Connection to Ontologies
5.3 Implementation using relational databases
5.4 Comparison with other models and critique
6. Practical usage of the eTVSM
6.1 Ontology creation and reuse of available ontologies
6.2 Application to IF and IR
6.3 Quantitative evaluation of IF and IR systems