This data isolation is an unintended artifact of the data modeling methodology that results in the development of disparate data models.
Disparate data models, when instantiated as databases, form disparate databases.
The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991, for the Integrated Public Use Microdata Series (IPUMS).
Once multiple data models have been recast, they share one or more commonality relationships that relate the structural metadata now common to them.
Commonality relationships are a peer-to-peer type of entity relationship that relates the standardized data entities of multiple data models.
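A minimal sketch of how such relationships might be derived is given here. It assumes that each entity in a recast model has been annotated with the name of a standardized data entity. The model names, entity names, and the `commonality_relationships` helper are hypothetical and serve only as illustration.

```python
from itertools import combinations

# Hypothetical recast data models: entity name -> standardized data entity.
models = {
    "billing": {"Customer": "Party", "Invoice": "Document"},
    "support": {"Client": "Party", "Ticket": "Document", "Agent": "Party"},
}


def commonality_relationships(models):
    """Pair entities from different models that share a standardized entity."""
    links = []
    # Compare every unordered pair of models once (peer-to-peer, no hierarchy).
    for (model_a, entities_a), (model_b, entities_b) in combinations(
        sorted(models.items()), 2
    ):
        for entity_a, standard_a in sorted(entities_a.items()):
            for entity_b, standard_b in sorted(entities_b.items()):
                if standard_a == standard_b:
                    links.append(
                        (f"{model_a}.{entity_a}", f"{model_b}.{entity_b}", standard_a)
                    )
    return links


links = commonality_relationships(models)
for left, right, standard in links:
    print(f"{left} <-> {right} (via {standard})")
```

Each resulting tuple links two entities from different models through the standardized entity they share, so neither model is treated as the parent of the other.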
A common strategy for resolving such problems is to use ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts.
This approach represents ontology-based data integration. On the other hand, combining research results from different bioinformatics repositories requires benchmarking the similarities computed from different data sources against a single criterion, such as positive predictive value.
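As a rough illustration of the ontology-based approach, the sketch below resolves differing field names from two sources to shared concepts. The ontology contents, the concept names, and the helpers `build_term_index` and `to_shared_vocabulary` are assumptions made for this example. A real ontology would normally be expressed in a dedicated language rather than a Python dictionary.

```python
# Hypothetical ontology: each concept lists the source-specific terms that denote it.
ONTOLOGY = {
    "person.birth_date": {"dob", "birthdate", "date_of_birth"},
    "person.family_name": {"surname", "last_name", "lname"},
}


def build_term_index(ontology):
    """Invert the ontology into a lookup from source term to concept."""
    index = {}
    for concept, terms in ontology.items():
        for term in terms:
            index[term.lower()] = concept
    return index


def to_shared_vocabulary(record, term_index):
    """Rename a record's fields to ontology concepts; unknown fields are kept as-is."""
    resolved = {}
    for field, value in record.items():
        concept = term_index.get(field.lower(), field)
        resolved[concept] = value
    return resolved


TERM_INDEX = build_term_index(ONTOLOGY)

# Two sources describe the same person with conflicting schema terms.
source_a = {"DOB": "1970-01-01", "surname": "Ng"}
source_b = {"date_of_birth": "1970-01-01", "last_name": "Ng"}

unified_a = to_shared_vocabulary(source_a, TERM_INDEX)
unified_b = to_shared_vocabulary(source_b, TERM_INDEX)
print(unified_a == unified_b)
```

Because both records are expressed in the ontology's vocabulary after mapping, they can be compared or merged without knowing either source's original schema.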
As of 2010, some of the work in data integration research concerns the semantic integration problem.
This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources.
Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time.
In the early 1980s, computer scientists began designing systems for interoperability of heterogeneous databases.
By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration.