The database designer is responsible for incorporating elements to promote the accuracy and reliability of stored data within the database. There are many different techniques that can be used to encourage data integrity, with some of these dependant on what database technology is being used. This article will concentrate on the different types of data integrity techniques available whilst working with Microsoft Access:
"Every non-null foreign key value must match an existing primary key value"
If a value exists in the foreign key field of a table, then there must be a matching value in the primary key field of the table to which it is related. Referential Integrity is all about preserving the validity of the foreign key values.
Depending on the database product, the database designer will probably need to invoke referential integrity between tables. Usually this will be as part of the table relationship definition, or in a way you are joining tables.
If we take a look at the following example of two tables that are related in a One-To-Many relationship:
In the above, we see that the DepartmentCode field is the foreign key field in the Employees table. With Referential Integrity, we are stating that if a value is entered into the DepartmentCode field in this table, it must match the primary key field (DepartmentCode) in the Departments table.
Referential Integrity is important as it ensures that the database contains valid and usable data and records by preserving the connections between related tables. Without it, the relationships could quickly become meaningless, and results returned via queries on the data would return unreliable results.
Referential Integrity will not allow you to change values in the primary or parent table that would create orphan records in the related or child table.
Note: Orphan record - a record in a related, or child table, with no matching record in the primary or parent table.
As with anything in the real world, things can alter and you will need to ensure that the database can cope with this. Code names such as DepartmentCode will get revised, and departments can close or merge, therefore we need to be able to maintain the data when changes required will violate referential integrity rules.
RDBMS products generally handle these changes through cascading updates and deletes (different products may handle this differently, and have different names and techniques for this). In some database products you may need to create rules or triggers or use an operator.
How does cascading updates work?
In the example tables above, if we change a DepartmentCode in the Departments table (for example we change the IS code to IT), all occurrences of the IS code in the Employees table will automatically change to IT. The change in the Departments table "cascades" to any related tables. This keeps that data valid and in compliance with referential integrity.
If we do not select to cascade updates then the primary key value in the parent table can not be changed if the value is contained in a related record in a child table.
If the primary key field in the parent table is a link field between several tables, then the option must be selected for all related tables or cascade updates will not work.
If we delete a record in the Departments table, then the cascade deletes option will delete all related records in the Employees table that contained the deleted department. Most database products will warn the user before making this (potentially dangerous) deletion.
Access will first delete all related child records and then delete the primary record from the parent table.
If you do not select the cascade deletes option, Microsoft Access will not allow you to delete a record that has related records in another table. In these cases, you would need to delete all related records in the child table(s) and then delete the primary key value from the parent table.
Referential Integrity is enforced at the database level, in that it controls the integrity of the data between tables. As the database designer, you can also do things at both field and table levels to help ensure data integrity. Business rules should be implemented to ensure that the data entered meets the requirements of a particular setting for the database.
Business rules should be documented as they are implemented. This should detail each rule, where and how it is implemented and enforced within the database design. Over time these rules may change, and having each and every rule documented will make it much easier to find and modify the design.
As you implement a rule, it is important that each one is tested. Does the rule give the intended result? What happens when the rule is violated?
Good application design will also give the user feedback (messages) when a rule is broken, and allow them to rectify any changes they were making.
Using Field Properties - Each of the fields that are contained in the database has properties associated with it. These properties may be referred to as elements or attributes of the field. These enable you, as the database designer, to place constraints on the values that may be entered into that field.
Data Types - the most obvious constraint that can be placed on the fields in your database will be done with the selection of a data type for the field. Data types may vary by RDBMS, however in general they will be pretty much the same; usually, you will also be able to create custom data types through code.
As you begin to collect information regarding the design of the database, you will be defining what types of data can, or should be entered into the fields that you define.
Most of these data types can also be restricted further by setting allowable sizes (some may already have default values that can not be changed). Some of the data types may also allow you to define a format, for example the amount of decimal places.
Database fields can be set as required, ensuring that the user is unable to save the record until a value is entered. This is the same as saying whether a field can contain NULL values or not. As the designer, you can also designate whether a field will allow duplicate values; this is usually done in conjunction with setting an index on a field. Primary Key fields automatically have these constraints applied.
Depending on the database product, you may be able to apply further constraints, such as applying input masks on certain fields and data types. For example, there are many ways that a date value could be entered, so applying an input mask will guide the user and ensure that the data is consistently entered in the same format.
A default value property enables you to designate a value that will be entered into a field, unless it is overwritten by another value, for example automatically entering today's date into an OrderPlaced field.
Again, depending on the database product, other types of validation may be available. You could limit the choices of entries into a field by defining a list of possible values allowed. If your business only ever ships products to local counties, you could set up a field list of only those values (defining a domain of allowable values). This technique works best on the field level when the list of values is short; otherwise it is better to implement lookup or validation tables (discussed below).
You can also set up a range of permissible values, for example a discount value of between 10 and 25%.
Field Comparisons - Database tables also have properties that you can use to set a validation rule on records in the table. By doing this, you can set a rule that compares the value of one field in the record to that of another value, in another field, in the same record. This rule is ran before the record is saved.
An example of this would be to compare dates, as part of your business rules. You business may have a rule in place that a OrderDespatchDate must be no more than 3 days after the OrderPlacedDate. The rule would look something like:
OrderDespatchDate <= OrderPlacedDate + 3
If the rule is violated, an error message can be displayed, and the data must be amended before the record can be saved.
A validation table is created to promote data integrity. Normally, a validation table will consist of two fields; one is the primary key, and the other holds the values used by some other field in the database. The validation table normally will hold a static set of values, enabling you to store a master set of values in one location and, by referencing those values instead of entering values directly into a field, you can ensure consistent values are used.
For example, if we have an Orders table that holds the data on the Orders placed with our company, it will probably have a field detailing DespatchVia information:
To ensure that the data in the DespatchVia field were method approved by the company (company business rule), we would add a DespatchVia table to the database as detailed below:
|UPS||United Parcel Service|
The DespatchVia field in the Orders table would then be used as a foreign key and a one-to-many relationship is established between the DespatchVia table and the Orders table. At the application level, it could be set up so that the user would see the values in the DespatchViaDetail field. The actual value stored in the DespatchVia field would be the corresponding value in the DespatchViaCode field of the table.