
Data Mining

Data mining (or knowledge discovery in databases) is a new research area that develops methods and systems for extracting interesting and useful information from large sets of data. Data mining methods can be used in a variety of application areas, such as commercial databases, telecommunication alarm sequences, and epidemiological data. The area combines techniques from databases, statistics, and machine learning.

The Data Mining research group has developed data mining methods and studied the theory of data mining. The research started in the late 1980s in the context of developing tools for inferring integrity constraints from databases.

The recent focus of the research group has been on the analysis of event (sequence) data. We have considered methods for finding recurrent episodes within event sequences, and for using these episodes to locate strong rules about the occurrences of the events. We also apply Markov chain Monte Carlo methods to examine the interdependence of events in more detail. Clustering methods have also been applied to locate regularities in the data.
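As an illustration of what finding recurrent episodes and deriving rules from them can look like, the following is a minimal sketch and not the group's actual algorithm. It assumes integer timestamps, treats an episode simply as an unordered set of event types, and measures its frequency as the fraction of sliding time windows in which all of those event types occur. The function names, the window convention, and the confidence measure are assumptions made for this example.

```python
from typing import FrozenSet, List, Tuple

Event = Tuple[int, str]  # (integer timestamp, event type); an assumed representation


def window_frequency(events: List[Event], episode: FrozenSet[str], width: int) -> float:
    """Fraction of sliding windows of `width` time units that contain
    every event type in `episode` (order inside the window is ignored)."""
    if not events:
        return 0.0
    times = [t for t, _ in events]
    first, last = min(times), max(times)
    # Windows are half-open intervals [start, start + width). They slide one
    # time unit at a time so that every event falls into exactly `width` windows.
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        seen = {etype for t, etype in events if start <= t < start + width}
        if episode <= seen:
            hits += 1
    return hits / len(starts)


def rule_confidence(events: List[Event], antecedent: FrozenSet[str],
                    consequent: FrozenSet[str], width: int) -> float:
    """Confidence of the rule 'antecedent => consequent': among windows that
    contain the antecedent, the share that also contain the consequent."""
    base = window_frequency(events, antecedent, width)
    if base == 0.0:
        return 0.0
    return window_frequency(events, antecedent | consequent, width) / base
```

A rule would be called strong in this sketch when both the frequency of the combined episode and the confidence exceed user-chosen thresholds. This brute-force version rescans the sequence for every window, so it is only suitable for small examples.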

Because data mining produces large amounts of information, we are also looking at the problem of selecting and visualizing the discovered regularities, to help the user make use of the data mining results. In connection with the Document Management group, we have also considered discovering document structures from marked documents.

The group has also studied the theory of data mining, e.g., by looking at the relationship between the logical complexity of the discovered sentences and the sample size needed for discovery, and by investigating various frameworks for data mining.

The research is carried out in several projects funded by the Academy of Finland, TEKES, and the EU ESPRIT research programme. The group cooperates closely with the Document Management and Machine Learning groups.

The members of the Data Mining group are Prof. Heikki Mannila (group leader), Ph.Lic. Helena Ahonen, M.Sc. Oskari Heinonen, M.Sc. Mika Klemettinen, M.Sc. Pirjo Ronkainen, M.Th. Marko Salmenkivi, M.Sc. Riikka Suramo, M.Sc. Hannu Toivonen, and Dr. Inkeri Verkamo.

Publications: [18-20, 162, 163, 170-178, 206, 216, 217, 227, 232, 241].

Home page: http://www.cs.helsinki.fi/research/pmdm/datamining/


