From 596731dcf8ba4ad4bce42a5924dba53a74eaba62 Mon Sep 17 00:00:00 2001 From: "Thomas G. Lockhart" Date: Wed, 29 Jul 1998 06:07:54 +0000 Subject: [PATCH] Information moved to doc/src/sgml/geqo.sgml. --- doc/README.GEQO | 160 -------------------------------------------------------- doc/TODO.GEQO | 59 --------------------- 2 files changed, 219 deletions(-) delete mode 100644 doc/README.GEQO delete mode 100644 doc/TODO.GEQO diff --git a/doc/README.GEQO b/doc/README.GEQO deleted file mode 100644 index 6d8a3a353f..0000000000 --- a/doc/README.GEQO +++ /dev/null @@ -1,160 +0,0 @@ - -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* -Genetic Query Optimization in Database Systems -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* - - Martin Utesch - - - - Institute of Automatic Control - University of Mining and Technology - Freiberg, Germany - - 02/10/1997 - - -1.) Query Handling as a Complex Optimization Problem -==================================================== - - Among all relational operators the most difficult one to process and -optimize is the JOIN. The number of alternative plans to answer a query -grows exponentially with the number of JOINs included in it. Further -optimization effort is caused by the support of a variety of *JOIN -methods* (e.g., nested loop, index scan, merge join in Postgres) to -process individual JOINs and a diversity of *indices* (e.g., r-tree, -b-tree, hash in Postgres) as access paths for relations. - - The current Postgres optimizer implementation performs a *near- -exhaustive search* over the space of alternative strategies. This query -optimization technique is inadequate to support database application -domains that evolve the need for extensive queries, such as artifcial -intelligence. - - The Institute of Automatic Control at the University of Mining and -Technology Freiberg, Germany encountered the described problems as its -folks wanted to take the Postgres DBMS as the backend for a decision -support knowledge based system for the maintenance of an electrical -power grid. The DBMS needed to handle large JOIN queries for the -inference machine of the knowledge based system. - - Performance difficulties within exploring the space of possible query -plans arose the demand for a new optimization technique being developed. - - In the following we propose the implementation of a *Genetic -Algorithm* as an option for the database query optimization problem. - - -2.) Genetic Algorithms (GA) -=========================== - - The GA is a heuristic optimization method which operates through -determined, randomized search. The set of possible solutions for the -optimization problem is considered as a *population* of *individuals*. -The degree of adaption of an individual to its environment is specified -by its *fitness*. - - The coordinates of an individual in the search space are represented -by *chromosomes*, in essence a set of character strings. A *gene* is a -subsection of a chromosome which encodes the value of a single parameter -being optimized. Typical encodings for a gene could be *binary* or -*integer*. - - Through simulation of the evolutionary operations *recombination*, -*mutation*, and *selection* new generations of search points are found -that show a higher average fitness than their ancestors. - - According to the "comp.ai.genetic" FAQ it cannot be stressed too -strongly that a GA is not a pure random search for a solution to a -problem. A GA uses stochastic processes, but the result is distinctly -non-random (better than random). - -Structured Diagram of a GA: ---------------------------- - -P(t) generation of ancestors at a time t -P''(t) generation of descendants at a time t - -+=========================================+ -|>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<| -+=========================================+ -| INITIALIZE t := 0 | -+=========================================+ -| INITIALIZE P(t) | -+=========================================+ -| evalute FITNESS of P(t) | -+=========================================+ -| while not STOPPING CRITERION do | -| +-------------------------------------+ -| | P'(t) := RECOMBINATION{P(t)} | -| +-------------------------------------+ -| | P''(t) := MUTATION{P'(t)} | -| +-------------------------------------+ -| | P(t+1) := SELECTION{P''(t) + P(t)} | -| +-------------------------------------+ -| | evalute FITNESS of P''(t) | -| +-------------------------------------+ -| | t := t + 1 | -+===+=====================================+ - - -3.) Genetic Query Optimization (GEQO) in PostgreSQL -=================================================== - - The GEQO module is intended for the solution of the query -optimization problem similar to a traveling salesman problem (TSP). -Possible query plans are encoded as integer strings. Each string -represents the JOIN order from one relation of the query to the next. -E. g., the query tree /\ - /\ 2 - /\ 3 - 4 1 is encoded by the integer string '4-1-3-2', -which means, first join relation '4' and '1', then '3', and -then '2', where 1, 2, 3, 4 are relids in PostgreSQL. - - Parts of the GEQO module are adapted from D. Whitley's Genitor -algorithm. - - Specific characteristics of the GEQO implementation in PostgreSQL -are: - -o usage of a *steady state* GA (replacement of the least fit - individuals in a population, not whole-generational replacement) - allows fast convergence towards improved query plans. This is - essential for query handling with reasonable time; - -o usage of *edge recombination crossover* which is especially suited - to keep edge losses low for the solution of the TSP by means of a GA; - -o mutation as genetic operator is deprecated so that no repair - mechanisms are needed to generate legal TSP tours. - - The GEQO module gives the following benefits to the PostgreSQL DBMS -compared to the Postgres query optimizer implementation: - -o handling of large JOIN queries through non-exhaustive search; - -o improved cost size approximation of query plans since no longer - plan merging is needed (the GEQO module evaluates the cost for a - query plan as an individual). - - -References -========== - -J. Heitk"otter, D. Beasley: ---------------------------- - "The Hitch-Hicker's Guide to Evolutionary Computation", - FAQ in 'comp.ai.genetic', - 'ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html' - -Z. Fong: --------- - "The Design and Implementation of the Postgres Query Optimizer", - file 'planner/Report.ps' in the 'postgres-papers' distribution - -R. Elmasri, S. Navathe: ------------------------ - "Fundamentals of Database Systems", - The Benjamin/Cummings Pub., Inc. - diff --git a/doc/TODO.GEQO b/doc/TODO.GEQO deleted file mode 100644 index 3e5b9f4f76..0000000000 --- a/doc/TODO.GEQO +++ /dev/null @@ -1,59 +0,0 @@ - -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= -* Things left to done for the PostgreSQL * -= Genetic Query Optimization (GEQO) = -* module implementation * -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= -* Martin Utesch * Institute of Automatic Control * -= = University of Mining and Technology = -* utesch@aut.tu-freiberg.de * Freiberg, Germany * -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= - - -1.) Basic Improvements -=============================================================== - -a) improve freeing of memory when query is already processed: -------------------------------------------------------------- -with large JOIN queries the computing time spent for the genetic query -optimization seems to be a mere *fraction* of the time Postgres -needs for freeing memory via routine 'MemoryContextFree', -file 'backend/utils/mmgr/mcxt.c'; -debugging showed that it get stucked in a loop of routine -'OrderedElemPop', file 'backend/utils/mmgr/oset.c'; -the same problems arise with long queries when using the normal -Postgres query optimization algorithm; - -b) improve genetic algorithm parameter settings: ------------------------------------------------- -file 'backend/optimizer/geqo/geqo_params.c', routines -'gimme_pool_size' and 'gimme_number_generations'; -we have to find a compromise for the parameter settings -to satisfy two competing demands: -1. optimality of the query plan -2. computing time - -c) find better solution for integer overflow: ---------------------------------------------- -file 'backend/optimizer/geqo/geqo_eval.c', routine -'geqo_joinrel_size'; -the present hack for MAXINT overflow is to set the Postgres integer -value of 'rel->size' to its logarithm; -modifications of 'struct Rel' in 'backend/nodes/relation.h' will -surely have severe impacts on the whole PostgreSQL implementation. - -d) find solution for exhausted memory: --------------------------------------- -that may occur with more than 10 relations involved in a query, -file 'backend/optimizer/geqo/geqo_eval.c', routine -'gimme_tree' which is recursively called; -maybe I forgot something to be freed correctly, but I dunno what; -of course the 'rel' data structure of the JOIN keeps growing and -growing the more relations are packed into it; -suggestions are welcome :-( - - -2.) Further Improvements -=============================================================== -Enable bushy query tree processing within PostgreSQL; -that may improve the quality of query plans. -- 2.11.0