Next: Equivalence of Expressions
Up: Query Processing
Previous: Query Processing
-
Why do we need to optimize?
- A high-level relational query is generally non-procedural in nature.
- It says ``what'', rather than ``how'' to find it.
- When a query is presented to the system, it is useful to find an
efficient method of finding the answer, using the existing database structure.
- Usually worthwhile for the system to spend some time on strategy
selection.
- Typically can be done using information in main memory, with little
or no disk access.
- Execution of the query will require disk accesses.
- Transfer of data from disk is slow, relative to the speed of
main memory and the CPU
- It is advantageous to spend a considerable amount of processing to save
disk accesses.
-
Do we really optimize?
- Optimizing means finding the best of all possible methods.
- The term ``optimization'' is a bit of a misnomer here.
- Usually the system does not calculate the cost of all
possible strategies.
- Perhaps ``query improvement'' is a better term.
-
Two main approaches:
- Rewriting the query in a more effective manner.
- Estimating the cost of various execution strategies for the query.
Usually both strategies are combined.
- The difference in execution time between a good strategy and a bad one
may be huge.
- Thus this is an important issue in any DB system.
- In network and hierarchical systems, optimization is left for the most
part to the application programmer.
- Since the DML language statements are embedded in the host language, it
is not easy to transform a hierarchical or network query to another one,
unless one has knowledge about the entire application program.
- As a relational query can be expressed entirely in a relational query
language without the use of a host language, it is possible to optimize
queries automatically.
- SQL is suitable for human use, but internally a query should be
represented in a more useful form, like the relational algebra.
-
So, first the system must translate the query into its internal form.
Then optimization begins:
- Find an equivalent expression that is more efficient to execute.
- Select a detailed strategy for processing the query.
(Choose specific indices to use, and order in which tuples are to be
processed, etc.)
-
Final choice of a strategy is based primarily on the number of disk accesses
required.
Next: Equivalence of Expressions
Up: Query Processing
Previous: Query Processing
Osmar Zaiane
Sun Jul 26 17:45:14 PDT 1998