Abstract
The goal of a data integration system is to allow users to query diverse information sources through a schema that is familiar to them. However, different users may prefer different schemas, and the data may be stored in data sources that use still other schemas. To integrate data, mapping rules must be defined that map entities of the data sources to entities of the users' schemas. In large information systems with many data sources that serve sophisticated applications, there can be many such mapping rules, and they can be complex. The purpose of this paper is to study the performance of alternative query processing techniques for data integration systems with many complex mapping rules. A new approach, mapping data to queries (MDQ), is presented. Extensive performance experiments show that this approach performs well for complex mapping rules and queries, and scales significantly better with the number of rules than the state of the art, which is based on query rewrite. In fact, its performance is close to that of an ideal system in which a single schema is used by all sources and queries.
Permanent link
https://doi.org/10.3929/ethz-a-006835897
Publication status
published
Journal / series
Technical report / [ETH, Department of Computer Science
Volume
Publisher
Swiss Federal Institute of Technology
Subject
INFORMATION STORAGE + INFORMATION RETRIEVAL (INFORMATION SYSTEMS); INFORMATIONSSPEICHERUNG + INFORMATIONSGEWINNUNG (INFORMATIONSSYSTEME); SPECIAL PROGRAMMING METHODS; ABFRAGEN (INFORMATIONSSYSTEME); SPEZIELLE PROGRAMMIERMETHODEN; QUERIES (INFORMATION SYSTEMS)
Organisational unit
02150 - Dep. Informatik / Dep. of Computer Science
ETH Bibliography
yes