Modeling and Meta-modeling

Database Models


  • Description
The development and exploitation of data models are the basic processes on which the database engineering domain is built. Models allow database structures to be described at the appropriate level of abstraction, so that schemas can be evaluated, transformed and reasoned about rigorously. They are at the core of design methodologies and CASE tools. Conceptual models, such as the Entity-relationship model and some interpretations of UML class diagrams, aim at describing data/information structures at the conceptual, technology-independent level, while the many logical models currently available (relational, object-relational, XML, and the like) are intended to represent data structures as they are implemented by data managers (or by families thereof).
Families of data models have been designed for various application contexts. They are classified into five categories:
  1. abstract database models for information system design: Individual model, IDA Entity-relationship model, GAM, GER model, relational model
  2. DBMS models: SPHINX data models, NDBS model, virtual data models (wrappers)
  3. specific models: decision support data model, temporal data models
  4. models for CASE tools: DB-MAIN model
  5. model analysis: DBMS models, UML data model
  • The Individual model (abstract models). Reference [P74-04] reports on the first version of the Individual model (a variant of the ER model), which was the main component of the MERISE methodology. This model emerged from heated discussions in a French-Belgian think tank [P74-03]. Hubert Tardieu, the architect of the Merise methodology, was a member of the team (though not a co-author of [P74-03], for reasons I can no longer remember).
  • The IDA Entity-relationship model (abstract models). In 1983, François Bodart and Yves Pigneur published a book describing the IDA methodology, comprising models and design methods for various aspects of information system conceptual design [B83]. One of the models (called Entity-relationship model, though significantly different from P. Chen’s model) was devoted to the specification of information structures. A second book addressed logical database design [B86].
  • The GAM (abstract models). The Generalized Access Model derived from the early work on technology-independent logical models [P74-01]. This binary model was the basis of three research lines: (1) database performance analysis [P76-02] [P77-01], (2) a first study on schema transformation [P81-02] and (3) logical design methodology, notably through schema/algorithm co-transformations [B86].
  • The GER model (abstract models). The Generic Entity-relationship model (GER) is a wide-spectrum information/data structure specification model. It encompasses the main concepts and constructs of most popular modeling formalisms, be they value-based or object-based. It has been given a precise semantics via an extended version of the NF2 (non-first normal form, or nested) relational model [P89-01] [P96-10]. Through a specialization mechanism, such usual models as Entity-relationship, UML class diagrams and ORM can be rigorously specified and compared [P90-01]. Similarly, the GER can be used to define standard logical data models such as the relational, object-relational, CODASYL, IMS, XML or plain file structure models. The GER model has been used to study transformation-based database engineering processes [P06-10].
  • Relational model (theory) (abstract models). Some important theoretical aspects of the relational model, in particular the normalization process, have been developed in books [B09] and [B07]. The emphasis is on concepts and techniques that practitioners can apply to database design problems.
  • The SPHINX data models (DBMS models). The SPHINX DBMS was based on a hierarchy of two data models. In current terminology, the first one would be called conceptual [P74-02] and the second one logical [P74-01], though both were technology independent (I must confess that the titles of the papers are misleading!). The concept of a technology-independent logical model (a PIM in the MDE vocabulary) was popular in the seventies. This idea was later reused in the GAM, GER and DB-MAIN models. The SPHINX system and its data models are described in project reports [R78-01] to [R78-05].
  • The NDBS data model (DBMS models). NDBS (Network Database System, 1986-1996) was an educational database management environment allowing Turbo Pascal programs to manage and use complex data in an efficient, though very intuitive, way. The NDBS data model was a variant of the ER model that offered flat entity types (with simple attributes), single-component identifiers and one-to-many relationship types [NDBS-86].
  • Virtual data models (wrappers) (DBMS models). Several architectures have been based on the concept of wrapper. A wrapper is a software component attached to a data source (file, spreadsheet, database, web page) that provides its users (typically application programs) with a data model different from that of the data source. A wrapper forms a virtual DBMS that offers a virtual data model as well as a virtual data manipulation language (to some extent, JDBC, ODBC and ADO are generic wrappers). Wrappers have been used as natural interfaces to network databases [P81-01], as building blocks in federated databases [P01-03] and in an evolutionary migration architecture allowing easy conversion of application programs [P08-05]. Normally, a wrapper simulates a new technology on top of a legacy data manager [P01-03]. In the migration architecture, by contrast, inverse wrappers simulate the legacy technology (typically CODASYL or standard files) on top of an SQL DBMS in order to minimize the adaptation of the source code [P04-04].
  • Decision support data model (specific models). References [P90-02] and [B94] describe a decision support model coupling a database with a computing model. The data model is a simple Entity-relationship model (just like that of NDBS) in which attributes are either basic or derived. A derived attribute is defined by a derivation expression referencing other attributes of the schema. The derivation language is based on ADL (see below). This application is described in more detail in the theme Databases and Computing Models, which shows that such a model can be implemented as an active database.
  • Temporal data models (specific models). The DB-MAIN model has been extended to express temporal aspects of data (transaction, valid, bi-temporal) at the conceptual, logical and physical levels. A specific methodology has been designed and code generation rules have been implemented for active relational databases [P01-02].
  • The DB-MAIN model (models for CASE tools). This model is a partial, graphical implementation of the GER. Due to its origin, it has also been called GER in some publications. It has been developed for the DB-MAIN CASE environment. A precise definition can be found in the DB-MAIN manuals [T09-01], in DB design tutorials [T02-01] [T02-02] and in the textbook [B09]. This model has been used since 1990 in all the publications of the LIBD.
  • DBMS models (model analysis). Descriptions of the data models specific to the most popular DBMS are available in various references: SQL2 and SQL3 in [B09], hierarchical or IMS in [P09-03] and [B02-02], network or CODASYL DBTG in [P09-04], [TR03-01] and [B02-02]. Elementary knowledge of legacy data models is important in database reverse engineering. Also, it can't be bad to show young IT professionals that database technology did not begin with their first Ctrl-Alt-Del. On the contrary, it has a long history, and many of the seemingly innovative data management features already existed in the seventies (e.g., triggers, predicates, dynamic DML, metadata), and sometimes before!
  • UML data model (model analysis). UML class diagrams are often proposed to express database schemas. The ability of this formalism to describe conceptual schemas has been studied in references [R02-01] and [B09]. It appears that, by discarding some ill-designed constructs and by adding a small number of constructs (such as identifiers and other basic constraints), it is possible to define a variant of UML (called DB-UML) that is well suited to database schemas. DB-UML has also been implemented in the DB-MAIN CASE tool, together with bi-directional global schema transformations with the DB-MAIN GER model.
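To make the wrapper concept described in the list above more concrete, here is a minimal Python sketch. It is not the architecture of [P01-03], [P04-04] or [P08-05]. It assumes a flat text source in CSV form, and the class `CsvWrapper`, its methods `scan` and `select`, and the sample data are all hypothetical names chosen for illustration. The `select` method plays the role of a very small virtual data manipulation language on top of a source that only knows lines of text.

```python
import csv
import io


class CsvWrapper:
    """Hypothetical wrapper: presents a flat CSV source as typed records.

    The data source only offers lines of text; client programs see
    dictionaries with typed values and a small selection interface.
    """

    def __init__(self, text, types=None):
        self._text = text            # the wrapped data source (CSV text)
        self._types = types or {}    # column name -> conversion function

    def scan(self):
        """Yield every record of the source, converted to the virtual model."""
        for row in csv.DictReader(io.StringIO(self._text)):
            yield {col: self._types.get(col, str)(val) for col, val in row.items()}

    def select(self, predicate=lambda rec: True, columns=None):
        """Yield the records satisfying `predicate`, projected on `columns`."""
        for rec in self.scan():
            if predicate(rec):
                yield {c: rec[c] for c in (columns or rec)}


# Usage: the client program never parses the source itself.
SOURCE = "id,name,city\n1,Smith,Namur\n2,Jones,Paris\n"
customers = CsvWrapper(SOURCE, types={"id": int})
in_namur = list(customers.select(lambda r: r["city"] == "Namur", ["id", "name"]))
print(in_namur)
```

An inverse wrapper would follow the same pattern in the other direction, offering the legacy access primitives to existing programs while storing the data in an SQL DBMS.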
Finally, let us mention an interesting proposal by Yannis Tzitzikas to optimize the graphical layout of large ER schemas, based on three physical laws (electricity, magnetism, mechanics) [P05-06].
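The general idea of a force-directed layout can be sketched as follows. This is a generic spring-embedder written for illustration, not the algorithm of [P05-06]: it assumes only an electrical analogy (every pair of nodes repels) and a mechanical one (every edge acts as a spring), and it leaves out both the magnetic analogy and the link analysis mentioned in that paper. The function name, its parameters and the three-entity schema are hypothetical.

```python
import math


def _distance(p, q):
    """Euclidean distance, bounded below to avoid division by zero."""
    return max(math.hypot(p[0] - q[0], p[1] - q[1]), 1e-9)


def force_directed_layout(nodes, edges, iterations=300, k=1.0, step=0.05):
    """Return {node: (x, y)} computed by a basic spring-embedder.

    Assumptions: every pair of nodes repels with magnitude k*k/d and every
    edge pulls its two ends together with magnitude d*d/k, where d is the
    current distance between the two nodes.
    """
    n = len(nodes)
    # Deterministic start on the unit circle, so that runs are reproducible.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Signed magnitudes: positive pushes the pair apart, negative pulls.
        pairs = [(u, v, k * k / _distance(pos[u], pos[v]))
                 for i, u in enumerate(nodes) for v in nodes[i + 1:]]
        pairs += [(u, v, -_distance(pos[u], pos[v]) ** 2 / k) for u, v in edges]
        for u, v, magnitude in pairs:
            d = _distance(pos[u], pos[v])
            ux, uy = (pos[u][0] - pos[v][0]) / d, (pos[u][1] - pos[v][1]) / d
            force[u][0] += magnitude * ux
            force[u][1] += magnitude * uy
            force[v][0] -= magnitude * ux
            force[v][1] -= magnitude * uy
        for v in nodes:
            # Cap each move at 0.1 so a large force cannot blow up the layout.
            fx, fy = force[v]
            length = math.hypot(fx, fy)
            scale = min(step, 0.1 / length) if length > 0 else 0.0
            pos[v][0] += scale * fx
            pos[v][1] += scale * fy
    return {v: (x, y) for v, (x, y) in pos.items()}


# Usage on a tiny schema: ORDER is linked to both other entity types.
schema_nodes = ["CUSTOMER", "ORDER", "PRODUCT"]
schema_edges = [("CUSTOMER", "ORDER"), ("ORDER", "PRODUCT")]
layout = force_directed_layout(schema_nodes, schema_edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

In this sketch, linked entity types settle close to each other while unrelated ones drift apart, which is the property that makes such layouts readable on large schemas.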
  • Keywords
ER model, Individual model, Merise model, UML class diagrams, wide-spectrum model, GER model, DB-MAIN model, logical data model, temporal model, legacy data model, relational model, network model, hierarchical model, OO model, OR model, XML model, decision support, large schema layout, semantic and statistical aspects of models, IS-A relation
  • Resources
[B09] Jean-Luc Hainaut. Bases de données - Concepts, utilisation et développement, Dunod, Collection Sciences Sup, Paris, 2009. [book description]
[P09-04] Jean-Luc Hainaut. Network Data Model, in Encyclopedia of Database Systems, Liu, L. and Özsu, T. (Eds), Springer-Verlag, 2009. [full text]
[P09-03] Jean-Luc Hainaut. Hierarchical Data Model, in Encyclopedia of Database Systems, Liu, L. and Özsu, T. (Eds), Springer-Verlag, 2009. [full text]
[T09-01] DB-MAIN Reference Manual, 2009.
[P08-05] Anthony Cleve, Jean Henrard, Didier Roland and Jean-Luc Hainaut. Wrapper-based System Evolution - Application to CODASYL to Relational Migration, in Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR’08), pages 13-22, IEEE Computer Society, 2008. [description]
[B07] Jean-Luc Hainaut. Introduction pratique à la théorie relationnelle des bases de données, September 2007, 238 pages. [full text]
[P06-10] Jean-Luc Hainaut. The Transformational Approach to Database Engineering, in Generative and Transformational Techniques in Software Engineering, Lecture Notes in Computer Science, Volume 4143, pages 95-143, Springer, 2006. [description]
[P05-06] Yannis Tzitzikas and Jean-Luc Hainaut. How to Tame a Very Large ER Diagram (Using Link Analysis and Force-Directed Drawing Algorithms), in Proceedings of the 24th International Conference on Conceptual Modeling, (ER’05), Lecture Notes in Computer Science, Volume 3716, pages 144-159, Springer-Verlag, 2005. [description] [full text]
[P04-04] Jean Henrard, Anthony Cleve and Jean-Luc Hainaut. Inverse Wrappers for Legacy Information Systems Migration, in Proceedings of 1st International Workshop on Wrapper Techniques for Legacy Systems, (WCRE’04/WRAP’04), Computer Science Report, Volume 04-34, pages 30-43, Technische Universiteit Eindhoven, 2004. [description] [full text]
[TR03-01] Jean-Luc Hainaut, Introduction aux SGBD CODASYL DBTG 71, DB-MAIN Technical report, September 2003, 60 pages, [full text]
[T02-01] Jean-Luc Hainaut, First steps in Database design, Technical report, 2002 [full text]
[T02-02] Jean-Luc Hainaut, Introduction to database design, Technical report, 2002 [full text]
[B02-02] Jean-Luc Hainaut, Introduction to Database Reverse Engineering, May 2002 [full text]
[R02-01] Jean-Luc Hainaut, UML ou ERA : quel modèle pour l'analyse de l'information ?, Technical report, 2002 [full text]
[P01-03] Philippe Thiran and Jean-Luc Hainaut. Wrapper Development for Legacy Data Reuse, in Proceedings of the 8th Working Conference on Reverse Engineering, (WCRE’01), pages 198-207, IEEE Computer Society, 2001. [description] [full text]
[P01-02] Virginie Detienne and Jean-Luc Hainaut. CASE Tool Support for Temporal Database Design, in Proceedings of the 20th International Conference on Conceptual modeling, (ER’01), Lecture Notes in Computer Science, Volume 2224, pages 208-224, Springer-Verlag, 2001. [description] [full text]
[P97-03] Jean-Luc Hainaut, Jean Henrard, Jean-Marc Hick, Didier Roland and Vincent Englebert. Contribution to the Reverse Engineering of OO Applications - Methodology and Case Study, in Proceedings of the IFIP TC2/WG2.6 Seventh Conference on Database Semantics, (DS-7), IFIP Conference Proceedings, Volume 124, pages 131-161, Chapman and Hall, 1997. [description] [full text]
[P96-08] Jean-Luc Hainaut. Specification preservation in schema transformations - Application to semantics and statistics, Data and Knowledge Engineering, 16(1): Elsevier Science Publish., 1996. [description] [full text]
[P96-05] Jean-Luc Hainaut, Jean-Marc Hick, Vincent Englebert, Jean Henrard and Didier Roland. Understanding implementations of IS-A Relations, in Proceedings of 15th International Conference on Conceptual Modeling, (ER’96), Lecture Notes in Computer Science, Volume 1157, pages 42-57, Springer-Verlag, 1996. [description] [full text]
[P96-03] Jean-Luc Hainaut, Didier Roland, Vincent Englebert, Jean-Marc Hick and Jean Henrard. Database Reverse Engineering - A Case Study, in Actes du 2ème Forum International d’Informatique Appliquée, ESIG, 1996. [description] [full text]
[B94] Jean-Luc Hainaut. Bases de données et modèles de calcul - Outils et méthodes pour l'utilisateur, InterEditions (Dunod), Paris, 1994. [book description]
[P90-02] Jean-Luc Hainaut. Systèmes d’aide à la décision : une approche méthodologique intégrée, in Actes du congrès INFORSID 1990, pages 7-34, Eyrolles-Afcet, 1990. [description] [full text]
[P90-01] Jean-Luc Hainaut. Entity-Relationship models : formal specification and comparison, in Proceedings of the 9th International Conference on the Entity-Relationship Approach (ER’90), pages 53-64, ER Institute, 1990. [description] [full text]
[P89-01] Jean-Luc Hainaut. A Generic Entity-Relationship Model, in Proceedings of the IFIP WG 8.1 Conference on Information System Concepts: an in-depth analysis, pages 109-138, North-Holland, 1989. [description] [full text]
[NDBS-86] Jean-Luc Hainaut. NDBS - A simple database system for small computers. Reference manual. University of Namur, 1986-1996.
[B86] Jean-Luc Hainaut, Conception assistée des applications informatiques - Conception de la base de données, Masson, 1986. [description]
[B83] François Bodart and Yves Pigneur, Conception assistée des applications informatiques - Méthode, modèles, outils. Masson, 1983. 2nd edition in 1989.
[P81-01] Yves Delvaux and Jean-Luc Hainaut. Système portable de manipulation de bases de données, in Actes du congrès AFCET 1981, pages 385-395, Editions Hommes et Techniques, 1981. [description] [full text]
[P78-01] Baudouin Le Charlier. Quelques réflexions concernant les modèles et langages de bases de données, in Actes des communications de l'Ecole d'été 1978 de l'AFCET, pages 176-185, Publication FUNDP, 1978. [full text]
[R78-01] Jean-Luc Hainaut, Baudouin Le Charlier, et al., Système de conception et d'exploitation de bases de données - Volume 1 : Modèles et Langages. Rapport final du projet CIPS I2/15, Institut d'informatique, Université de Namur, 1978
[R78-02] Jean-Luc Hainaut, Baudouin Le Charlier, et al., Système de conception et d'exploitation de bases de données - Volume 2 : Manuel de référence des langages. Rapport final du projet CIPS I2/15, Institut d'informatique, Université de Namur, 1978
[R78-03] Jean-Luc Hainaut, Baudouin Le Charlier, et al., Système de conception et d'exploitation de bases de données - Volume 3 : Une implémentation du modèle d'accès. Rapport final du projet CIPS I2/15, Institut d'informatique, Université de Namur, 1978
[R78-04] Jean-Luc Hainaut, Baudouin Le Charlier, et al., Système de conception et d'exploitation de bases de données - Volume 4 : Le système SPHINX, Utilisation, fonctionnement et description interne. Rapport final du projet CIPS I2/15, Institut d'informatique, Université de Namur, 1978
[R78-05] Jean-Luc Hainaut, Baudouin Le Charlier, et al., Système de conception et d'exploitation de bases de données - Volume 5 : Exemples d'application. Rapport final du projet CIPS I2/15, Institut d'informatique, Université de Namur, 1978
[P77-01] Jean-Luc Hainaut. Some Tools for Data Independence in Multilevel Data Base Systems, in Proceedings of the IFIP WC on Modelling in Data Base Management Systems, pages 187-211, North-Holland, 1977. [description] [full text]
[P76-02] Jean-Luc Hainaut. Evaluation des performances d’une base de données par modèle probabiliste, in Cahier INFORSID, actes de la conférence sur la représentation des systèmes d’information : Maquette, modèle et prototype, volume 2, pages 177-221, Public. IRIA, 1976. [description] [full text]
[P74-04] Claude Deheneffe, Jean-Luc Hainaut and Hubert Tardieu. The Individual Model, in Proceedings of the International Workshop on Data Structure Models for Information Systems, pages 89-118, Presses Universitaires de Namur, 1974. [description] [full text]
[P74-02] Claude Deheneffe, Henri Hennebert and Walter Paulus. A Relational Model for a Data Base, in Proceedings of the IFIP congress 74, pages 1022-1025, North-Holland, 1974. [description] [full text]
[P74-01] Jean-Luc Hainaut and Baudouin Le Charlier. An Extensible Semantic Model of Data Base and Its Data language, in Proceedings of the IFIP Congress 74, pages 1026-1030, North-Holland, 1974. [description] [full text]
