Descending denormalization refers to the process of adding non-key attributes of a parent relationship to a child relationship in order to eliminate the join operation. This method is used when it is often necessary to display the non-key attribute of the parent relationship and the attributes of the child relationship [1].
The process of restructuring database relations by the method of downward denormalization can be described by the following formulas [2]:
where - the relation
after restructuring,
A – set of relations attributes,
– attribute added to relation
,
- set of
attribute values,
D – a set of hierarchical links between relations,
F – a set of functional dependencies between attributes,
B – set of non-key relations attributes,
– foreign key defining the dependency
.
To obtain information containing attributes from the relations R and S, it is necessary to execute a query of the form:
,
where L(b) – subset of non-key attributes of the relation R,
L(S) – a subset of the attributes of the relation S,
– the primary key of the relation R,
– the foreign key of the relationship S to determine its relationship with the relationship R.
After restructuring by decending denormalization, the query to get the same information looks like this:
where – subset of non-key attributes of the relation R, duplicated in relation to S’.
For denormalization to be effective the inequality must be satisfied
where C1 – the cost of executing the modified request,
C2 – the cost of executing the original request.
The cost refers to the number of blocks that must be read from the disk. Thus, C1 will be equal to the number of blocks that the denormalized ratio occupies [3].
The hash join is performed in two steps. In the first step, called linking, both relationships are partitioned according to some hash function that gives uniformly distributed random values. In this case, each equivalent section of the connected relations must have the same values of the connected attributes. At the second stage, called the breakdown, each of the sections of one of the relations is read (as a rule, a smaller section is selected), and matches are found in the equivalent section of the second relation [4, p. 755].
If we assume that the partition does not overflow during the build stage and the entire hash table is placed in RAM, then the cost of the hash join operation can be calculated using the following formula:

This formula takes into account the need to read the relations R and S when they are partitioned, write each of the resulting partitions to disk, and re-read each of the sections of the relations R and S to find the corresponding rows.
If the partitioning of relations cannot be performed in a single pass (, where buff – the size of the buffer for performing the hash join operation), a recursive partitioning algorithm must be used. In this case, the cost estimate can be expressed using the formula:
To confirm the findings, an experiment was conducted using a database designed for statistical accounting in a bookstore. The database contains the tables “Books” (R) and “Customers” (S). The first table stores data about the book: title, author, number of pages, and cost. The second one contains information about customers. The second table includes a foreign key to link to the first one. Periodically, a report is compiled for, the indicators of which are determined for all buyers and separately for those who purchased books of a value higher than a certain one. To calculate the number of customers, it will be necessary to join the two tables. When you include the “cost” (b) column in the “Customers” table, the join operation is no longer required.
The experiment was performed with a different number of tuples in the “Customers” table: 20000, 40000, and 60000 tuples. The results were averaged over 10 experiments in Oracle DB, PostgreSQL, and MySQL.
Table 1 – The dependence of the query execution time on the number of tuples of relations
DBMS
Number of tuples |
Oracle |
PostgreSQL | MySQL | |||
Norm | Denorm | Norm | Denorm | Norm |
Denorm |
|
20000 | 30ms | 23ms | 44ms | 26ms | 385ms | 165ms |
40000 | 45ms | 33ms | 56ms | 36ms | 438ms | 219ms |
60000 | 55ms | 44ms | 78ms | 49ms | 481ms | 272ms |
References
- Designing data Warehouses for Business Awareness Systems applications: an online lecture course. [Electronic resource]. – Access mode: URL: https://intuit.ru/studies/higher_education/3406/courses/455/info (20.05.2021).
- Kungurtsev A. Zinovatnaya S. The model of restructurization the relational database by using scheme denormalization of relations. // Proceedings of the Odessa Polytechnic University. – 2006. – № 2(22). – С. 105-111.
- Kungurtsev A. Zinovatnaya S. Analysis of method of descending denormalization introduction database restructurization advisability // Proceedings of the Odessa Polytechnic University. – 2006. – № 1(25). – С. 104-109.
- Connolly T. Begg C. Database systems. A Practical Approach to Design, Implementation, and Management. 3rd edition. : Eng trans..– M.: Publishing house “Williams”. – 2003. – 1440 с.