INVESTIGATION OF THE EFFECTIVENESS OF USING THE METHOD OF DESCENDING DENORMALIZATION IN ORDER TO ELIMINATE THE HASH JOIN

Levitskii Andrei Aleksandrovich
Sevastopol State University

Abstract
This article discusses one of the methods for restructuring database relations - descending denormalization. An algorithm for calculating the cost of performing a join operation by a hash and a condition under which denormalization leads to an increase in query performance are given. Comparisons of the efficiency of the denormalization method in various database management systems are made.

Keywords: DBMS, descending denormalization, join methods, relational database, SQL query

Category: 05.00.00 Technical sciences

Article reference:
Levitskii A.A. Investigation of the effectiveness of using the method of descending denormalization in order to eliminate the hash join // Modern scientific researches and innovations. 2021. № 5 [Electronic journal]. URL: https://web.snauka.ru/en/issues/2021/05/95533

View this article in Russian

Descending denormalization refers to the process of adding non-key attributes of a parent relationship to a child relationship in order to eliminate the join operation. This method is used when it is often necessary to display the non-key attribute of the parent relationship and the attributes of the child relationship [1].

The process of restructuring database relations by the method of downward denormalization can be described by the following formulas [2]:

where - the relation after restructuring,
A – set of relations attributes,
– attribute added to relation ,
- set of attribute values,
D – a set of hierarchical links between relations,
F – a set of functional dependencies between attributes,
B – set of non-key relations attributes,
– foreign key defining the dependency .

To obtain information containing attributes from the relations R and S, it is necessary to execute a query of the form:

,
where L(b) – subset of non-key attributes of the relation R,
L(S) – a subset of the attributes of the relation S,
– the primary key of the relation R,
– the foreign key of the relationship S to determine its relationship with the relationship R.

After restructuring by decending denormalization, the query to get the same information looks like this:

where – subset of non-key attributes of the relation R, duplicated in relation to S’.

For denormalization to be effective the inequality must be satisfied

where C₁ – the cost of executing the modified request,
C₂ – the cost of executing the original request.
The cost refers to the number of blocks that must be read from the disk. Thus, C₁ will be equal to the number of blocks that the denormalized ratio occupies [3].
The hash join is performed in two steps. In the first step, called linking, both relationships are partitioned according to some hash function that gives uniformly distributed random values. In this case, each equivalent section of the connected relations must have the same values of the connected attributes. At the second stage, called the breakdown, each of the sections of one of the relations is read (as a rule, a smaller section is selected), and matches are found in the equivalent section of the second relation [4, p. 755].
If we assume that the partition does not overflow during the build stage and the entire hash table is placed in RAM, then the cost of the hash join operation can be calculated using the following formula:

This formula takes into account the need to read the relations R and S when they are partitioned, write each of the resulting partitions to disk, and re-read each of the sections of the relations R and S to find the corresponding rows.
If the partitioning of relations cannot be performed in a single pass (, where buff – the size of the buffer for performing the hash join operation), a recursive partitioning algorithm must be used. In this case, the cost estimate can be expressed using the formula:

To confirm the findings, an experiment was conducted using a database designed for statistical accounting in a bookstore. The database contains the tables “Books” (R) and “Customers” (S). The first table stores data about the book: title, author, number of pages, and cost. The second one contains information about customers. The second table includes a foreign key to link to the first one. Periodically, a report is compiled for, the indicators of which are determined for all buyers and separately for those who purchased books of a value higher than a certain one. To calculate the number of customers, it will be necessary to join the two tables. When you include the “cost” (b) column in the “Customers” table, the join operation is no longer required.
The experiment was performed with a different number of tuples in the “Customers” table: 20000, 40000, and 60000 tuples. The results were averaged over 10 experiments in Oracle DB, PostgreSQL, and MySQL.

Table 1 – The dependence of the query execution time on the number of tuples of relations

DBMS Number of tuples	Oracle		PostgreSQL		MySQL
DBMS Number of tuples	Norm	Denorm	Norm	Denorm	Norm	Denorm
20000	30ms	23ms	44ms	26ms	385ms	165ms
40000	45ms	33ms	56ms	36ms	438ms	219ms
60000	55ms	44ms	78ms	49ms	481ms	272ms

References

Designing data Warehouses for Business Awareness Systems applications: an online lecture course. [Electronic resource]. – Access mode: URL: https://intuit.ru/studies/higher_education/3406/courses/455/info (20.05.2021).
Kungurtsev A. Zinovatnaya S. The model of restructurization the relational database by using scheme denormalization of relations. // Proceedings of the Odessa Polytechnic University. – 2006. – № 2(22). – С. 105-111.
Kungurtsev A. Zinovatnaya S. Analysis of method of descending denormalization introduction database restructurization advisability // Proceedings of the Odessa Polytechnic University. – 2006. – № 1(25). – С. 104-109.
Connolly T. Begg C. Database systems. A Practical Approach to Design, Implementation, and Management. 3rd edition. : Eng trans..– M.: Publishing house “Williams”. – 2003. – 1440 с.

All articles of author «Левицкий Андрей Александрович»

Contact author (comments/reviews)

Write comment

You must authorise to write a comment.

Если Вы еще не зарегистрированы на сайте, то Вам необходимо зарегистрироваться:

For Authors

About journal

INVESTIGATION OF THE EFFECTIVENESS OF USING THE METHOD OF DESCENDING DENORMALIZATION IN ORDER TO ELIMINATE THE HASH JOIN

Contact author (comments/reviews)

Write comment