How Tables are arranged in JOINS

I often get asked how does an optimizer (being in Oracle or MySQL) decide which would be the outer table in a join. Consider the following three tables: T1 (100 rows), T2 (10 rows) and T3 (50000 rows).

If I have a query which joins T1->T2->T3 the optimizer will not join T1 with T2 and then joining the result set with T3. Instead, it will choose the table with the lowest number of rows, in this case T2, and then use it as the outermost table.

The join will then be T2->T1->T3 (note that the order T1->T3 or T3->T1 will not make a difference if they are related to each other as I will explain later on).

But why does the optimizer take the smallest table as the outer table? Very simple if you think about it. In a join, there will always be a full scan of all the rows in the outermost table because the query needs to look at all the rows in this table and the result will be used as a filter to the next table. Thus you can easily optimize the inner tables by using indexes on the joined columns. However the full scan on the outermost table is inevitable, so the optimizer decides to use the one with the least amount of rows.

In some cases the outermost table would still be very large. What happens in this case? In MySQL you can try to use a covering index to include all the columns referenced in the query for the outermost table - in this manner MySQL will do a full index scan (Type: Index in the Explain Plan) rather than retrieving the data from the table.

What does this mean in practical query tuning? This means that before you start implementing indexes on every column of all the joined tables, first you need to check the explain plan and the order of the join as per the optimizer. Then start your analysis and implement indexes accordingly.

No comments: