<b>Loss of Information</b>

Next: Normalization Using Functional Up: Pitfalls in Relational Previous: Representation of Information

Loss of Information

The previous example might seem to suggest that we should decompose schemes as much as possible.
This too can lead to a bad design.
Consider a design where Borrow-scheme is decomposed into two schemes
We construct our new relations from borrow by:
It appears that we can reconstruct the borrow relation by performing a natural join on the new schemes amt and loan.
Figure 6.5 shows what we get by computing amt loan.
We notice that there are tuples in amt loan that are not in borrow.
How did this happen?
- The intersection of the two schemes is amount, so the natural join is made on the basis of equality in the loan amount.
- If two loans are for the same amount, there will be four tuples in the natural join.
- Two of these tuples will be spurious - they will not appear in the original borrow relation, and should not appear in the database.
- Although we have more tuples in amt loan, we have less information.
- Because of this, we call this a lossy or lossy-join decomposition.
- A decomposition that is not lossy-join is called a lossless-join decomposition.
- The only way we could make a connection between loan and amt was through amount.
When we decomposed Lending-scheme into Borrow-scheme and Branch-scheme earlier, we did not have a similar problem. Why not?
- The only way we could represent a relationship between tuples in the two relations was through bname.
- This did not cause problems.
- For a given branch name, there is exactly one assets value and branch city.
The branch associated with a loan depends on the customer, not on the amount of the loan (which is not unique).
We'll make a more formal definition of lossless-join:
- Let be a relation scheme.
- A set of relation schemes is a decomposition of if
- That is, every attribute in appears in at least one for .
- Let be a relation on , and let
- That is, is the database that results from decomposing into .
- It is always the case that:
- To see why this is, consider a tuple .
  - When we compute the relations , the tuple gives rise to one tuple in each .
  - These tuples combine together to regenerate when we compute the natural join of the .
  - Thus every tuple in appears in .
- However, in general,
- We saw an example of this inequality in our decomposition of borrow into amt and loan.
- In order to have a lossless-join decomposition, we need to impose some constraints on the set of possible relations.
- Let represent a set of constraints on the database.
- A decomposition of a relation scheme is a lossless-join decomposition for if, for all relations on scheme that are legal under :
In other words, a lossless-join decomposition is one in which, for any legal relation , if we decompose and then ``recompose'' , we get what we started with - no more and no less.