What is a row in SQL

Querying the top N rows

Top-N queries are queries that limit the result to a certain number of rows. These are often queries for the most recent or "best" entries. For efficient execution, the ranking must be done with a pipelined order by.

The easiest way to fetch only the first rows of a query is to read just those rows and then have the application abort the execution. However, the optimizer cannot anticipate this when it creates the execution plan. To choose the best plan, the optimizer needs to know whether all rows will ultimately be fetched. If they are, a full table scan with explicit sorting can be the best option. If only the first ten rows are required, a pipelined order by is often better, even though the database then has to read each row individually. So the optimizer needs to know from the start whether all rows are needed.

Tip

Tell the database if you don't need all of the rows.

The SQL standard ignored this need for a long time. The corresponding extension (fetch first) was only introduced with SQL:2008 and is available in IBM DB2, PostgreSQL, SQL Server (from 2012) and Oracle (from 12c). One reason is that this extension is a "non-core feature". Another is that the individual databases offer their own proprietary solutions, which have been established for many years.

The following examples show the respective syntax for querying the ten most recent sales. The basis is always the same: listing all sales, starting with the newest. The respective Top-N addition merely aborts the execution as soon as ten rows have been fetched.

DB2

DB2 has supported the standard fetch first syntax since at least version 9 (LUW and z/OS).
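As a minimal sketch, the standard syntax could look like this. The table name sales and the column sale_date are assumptions made for illustration; they are not taken from the text above.

```sql
-- Ten most recent sales, SQL:2008 syntax.
-- "sales" and "sale_date" are hypothetical names.
SELECT *
  FROM sales
 ORDER BY sale_date DESC
 FETCH FIRST 10 ROWS ONLY
```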

The proprietary limit keyword has been supported since DB2 LUW 9.7, provided the corresponding compatibility setting is enabled.

MySQL

With MySQL and PostgreSQL you can restrict the number of rows with the limit clause.
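A short sketch of the limit clause, again assuming a hypothetical table sales with a sale_date column:

```sql
-- Ten most recent sales with the proprietary LIMIT clause.
-- "sales" and "sale_date" are hypothetical names.
SELECT *
  FROM sales
 ORDER BY sale_date DESC
 LIMIT 10
```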

Oracle

The Oracle database has supported the fetch first extension since version 12c. With older versions you have to use the ROWNUM pseudo-column, which numbers the rows of the result. Wrapping the query in an outer query makes it possible to formulate a corresponding filter on that number.
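For older Oracle versions, the ROWNUM approach might be sketched as follows. The names sales and sale_date are assumptions for illustration. The nesting matters: ROWNUM is assigned before the order by clause of the same query block is applied, so the filter has to sit in an outer query around the sorted one.

```sql
-- Ten most recent sales on Oracle releases before 12c.
-- "sales" and "sale_date" are hypothetical names.
SELECT *
  FROM (
       SELECT *
         FROM sales
        ORDER BY sale_date DESC
       )
 WHERE rownum <= 10
```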

PostgreSQL

PostgreSQL has supported the fetch first extension since version 8.4. The previously used limit syntax (analogous to MySQL) can still be used with current versions.

SQL Server

With SQL Server you can restrict the number of rows by adding a TOP clause.
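A minimal sketch of the TOP syntax, assuming a hypothetical table sales with a sale_date column:

```sql
-- Ten most recent sales with SQL Server's TOP clause.
-- "sales" and "sale_date" are hypothetical names.
SELECT TOP 10 *
  FROM sales
 ORDER BY sale_date DESC
```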

Starting with release 2012, SQL Server also implements the fetch first extension.
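On SQL Server, the fetch first clause is only accepted together with an offset clause, so a sketch of the standard-style variant needs an explicit offset of zero. The table and column names are, as before, assumptions for illustration.

```sql
-- Ten most recent sales on SQL Server 2012 or later.
-- OFFSET is mandatory before FETCH in this dialect.
-- "sales" and "sale_date" are hypothetical names.
SELECT *
  FROM sales
 ORDER BY sale_date DESC
OFFSET 0 ROWS
 FETCH FIRST 10 ROWS ONLY
```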

What is special about these SQL queries is that the databases recognize them as Top-N queries.

Important

The database can only optimize a query for a partial result if it knows this from the start.

If the database knows that only ten rows will be fetched, it may prefer a pipelined order by.

DB2

The Top-N behavior cannot be read directly from a DB2 execution plan if no sort operation is necessary (otherwise the execution plan view shows it in brackets next to the sort operation).

In this specific example, however, one can assume that it is a Top-N query, as there is a sudden drop in the row estimate that cannot be explained by filter predicates (the Predicate Information section of the execution plan is empty).

Oracle

The execution plan shows the planned termination with the COUNT STOPKEY operation. This means that the Top-N query was recognized.
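One way to check this yourself on Oracle is to display the plan with EXPLAIN PLAN and DBMS_XPLAN. The following is a sketch under the assumption of a hypothetical table sales with a sale_date column; the plan output itself depends on your data, indexes and statistics and is therefore not reproduced here.

```sql
-- Store the execution plan of the Top-N query in the plan table.
-- "sales" and "sale_date" are hypothetical names.
EXPLAIN PLAN FOR
SELECT *
  FROM (
       SELECT *
         FROM sales
        ORDER BY sale_date DESC
       )
 WHERE rownum <= 10;

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the query was recognized as a Top-N query, the plan should contain an operation with STOPKEY in its name.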

Using the appropriate syntax is only half the battle. The execution can only be aborted efficiently if the underlying operations are executed in a pipelined manner. This means that the order by clause must be covered by an index. In this example, that is an index on the column the sales are sorted by. The explicit sort operation is then no longer needed, and the database can deliver the rows directly in index order. So only the rows that are actually returned are read.
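An index that covers the order by clause might be sketched like this. The index name sales_dt, the table sales and the column sale_date are all assumptions for illustration. An ascending index is usually sufficient for a descending order by on a single column, because most databases can read the index backwards.

```sql
-- Hypothetical index covering "ORDER BY sale_date".
CREATE INDEX sales_dt ON sales (sale_date);

-- With this index in place, the query below can be answered by
-- reading the first ten index entries from the newest end,
-- without an explicit sort (limit syntax as in MySQL/PostgreSQL).
SELECT *
  FROM sales
 ORDER BY sale_date DESC
 LIMIT 10;
```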

Important

A pipelined Top-N query doesn't need to read and sort all of the data.

If there is no index that can be used for a pipelined order by, the database must read and sort the table completely. The first result row can only be delivered after the last row has been read from the table.

DB2
Oracle

This execution plan essentially corresponds to the variant in which the application aborts the query. However, using the Top-N syntax is somewhat more efficient, since the database does not have to materialize the entire result, but only the ten most recent entries. The memory requirement is therefore significantly lower. In the execution plan of the Oracle database, this optimization is indicated by the STOPKEY addition to the sort operation (SORT ORDER BY STOPKEY).

The strength of a pipelined Top-N query lies not only in the immediate performance gain, but also in the better scalability. While the response time of a Top-N query without a pipelined order by grows with the table size, the response time of a pipelined execution depends only on the number of selected rows. In other words, a pipelined Top-N query is always equally fast, regardless of the table size. Only when the depth of the index tree grows does the execution become slightly slower.

Figure 7.1 shows the performance behavior with increasing data volume. The linear growth of the response time is clearly visible for the execution without a pipelined order by. With pipelined execution, the response time remains constant.

Figure 7.1 Scaling of Top-N queries

The execution time of a pipelined Top-N query does not depend on the table size, but it does increase with the number of selected rows. The response time doubles if you fetch twice as many rows. This applies in particular to pagination queries that load further results: the entries of the first page must be skipped before the entries for the second page appear. But there is a remedy for this as well, as the next section shows.