Lewis Chapter 05: Designing Relational Database Systems
Metadata
Title: Designing Relational Database Systems
Number: 5
Book: CO2209: Database Systems
Core Ideas
Presents a ‘top-down’ approach to designing a database system.
We start with conceptual modelling or semantic design, avoiding implementation details and focussing on the real-life information that must be modelled.
Then we turn to logical design, translating the conceptual model to one that it is implementable according to relational theory.
Finally we turn to physical design, the DBMS to be used and the extensions of the logical model it enables (indices, caches etc).
Conceptual Modelling
Conceptual design is carried out within a theory or semantic model - a finite set of concepts by which a real life system can be represented.
The most popular semantic model in relational db design is the Entity-Relationship model, in which there are just three concepts:
Entities - a thing we want to model that can be uniquely identified and that has some kind of independent existence.
Attributes - information that describes an aspect of an entity.
Relationship - A connection or dependency between two entities.
We need to distinguish between entity types (a class, eg Planet) and entity instances (an object, eg Mars). At this level we focus on types.
Weak entities are those whose existence depends on the continued existence of other entities. EG a customer’s bank account depends on the customer.
Strong entities are those that aren’t weak. Entries will not be invalidated by the removal of another entity.
The entity is defined by its set of attributes. An attribute is defined on a domain which is the set of values from which the attribute can draw.
There are a number of characterisations possible for attibutes:
simple vs composite, a simple attribute has no internal structure - it is atomic or scalar (all values in relational theory). A composite attribute has an internal structure that can be broken down into further attributes, themselves simple or composite.
single or multi-valued, a single-valued attribute is one that takes one value from the associated domain (like date of birth). A multi-valued attribute has more than one value, eg email addresses belonging to a person, ERD has these as a double oval.
Base or derived, a derived attribute can be deduced from a set of attributes already in the system. Indicated in the ERD by a dotted or dashed border line for the ellipse. A base attribute cannot be derived from other attributes, and is raw information for the system.
Primary key, the attribute or attributes that uniquely identify any instance of the type. Underlined in the ERD.
Relationships also have types and instances. A relationship type is defined on entity types. Entities involved in a relationship are participants.
The diamond in the ERD is a double if one of the entities in the relationship is weak.
Most relationships are binary, involving two entity types. They may also be unary or recursive involving one entity type. Or even ternary involving three (eg Student, Tutor, Course).
Relations can have attributes, although some prefer to not allow this, and place an extra entity in the model instead.
If all instances of an entity type must have a given relationship, that entity’s participation in teh relationship is total, if it is not required, its participation is partial. Total participation can be denoted in an ERD with a double line.
For cardinality, we can sometimes use (<min>,<max>) instead of just 1 or n to indicate the cardinality for example (0,1), (1,10) or (5,n).
Type Hierarchies
Entity Types can have hierarchical relations. Take Student and Person. All entities of type Student are of type Person. Person is, accordingly a supertype of Student, and Student a subtype of Person.
In a simple model in which a type can only have one immediate supertype, then the subtype inherits all properties of its supertype, and participate in all the supertype’s relationships.
Fan Traps
We need to be wary of fan traps, where indirect relationships stand in for direct relationships. For example, if a Faculty includes departments, and it employs tutors. If there is no direct relationship between the tutor and the department we have no way of knowing which department houses the tutor. We need to instead flip the relationship so that the Department employs the tutor instead.
Chasm Traps
But that might give rise to another problem, what if the tutor was employed direct by the faculty? Now we have a chasm trap, since the department’s participation is only partial. A chasm trap occurs when:
Two entities are linked by a pathway that travels through at least one other entity; and
Participation of at least one of the entities on the path is partial for at least one of the relalationships.
Relational Model
Once we have an ER model we need to translate it to a relational model. Lots of ways to do this.
Entities
Entities are mapped to base relations in the relational model.
All of the entity’s simple attributes map to attributes in the base relation.
Compound attributes should be turned into simple attributes.
Primary key of the entity is primary key of the relation.
If the entity is weak it must contain a foreign key to the strong entity on which it depends, which should be part of the primary key.
Relationships
Mapping relationships is trickier.
For 1:1 relationships, if participation is partial on both sides, the choice of which entity will carry the foreign key is arbitrary. If participation of one entity is total, the other partial, the total participation entity has the foreign key, and can be constrained to ban null values. If participation on both sides is total consider whether they can be merged to one table, if not choice of which will carry the foreign key is arbitrary.
For 1:n relationships, the ‘many’ entity carries the foreign key referencing the single entity.
Type Hierachies
How we represent type hierarchies depends on whether the subtypes are disjoint, ie can an entity belong to multiple subtypes.
If disjoint, each subtype should have a base relation with all attributes that would have been inherited.
If not, the parent type should also be represented as a base relation, with the primary key of the parent acting as a foreign key of the subtypes.
Preparing the ER Model
Preliminary steps to prepare the model include:
flatten composite attributes. Perhaps just attach the simple attributes directly, or if really needed represent the compound attribute as an entity in its own right.
eliminate multi-valued attributes. An attribute that has a 1:n relationship to the parent entity (like email address), it has to be converted to an entity in its won right.
eliminate derived attributes, derived attributes should be removed from the model.
eliminate relations with attributes, if a relation itself has attributes, it needs to be converted to an entity.#
reduce complex relationships - relationships with 3 or more entities must be converted to binary. The relationship might be turned into an entity, or each implied binary relationship can be treated separately.
eliminate m:n relationships - Through a link table.
After all this we still have to go through normalisation which is treated in the next section.