Friday, June 30, 2006
Summarizing Numerous Rows
It is often the case that web applications display a paginated resultset. The most familiar example is search engine results. However, the usefulness of the results is in providing both detail and distinctness. SELECT DISTINCT isn't sufficient in this case because collapsed rows cannot be distinguished. Furthermore, it would be advantageous to have a summarizing threshold.
For this reason, I've created an example query which summarizes strictly more than 30 rows. This query uses the world database to display continents and countries. The query summarizes continents with more than 30 countries. A flag is provided so that an application can identify and display the summarized rows differently:
(SELECT continent, NULL AS country, 1 AS more FROM country GROUP BY continent HAVING COUNT(continent)>30) UNION (SELECT c0.continent, c0.name, 0 AS more FROM country AS c0, (SELECT continent FROM country GROUP BY continent HAVING COUNT(continent)<=30) AS c1 WHERE c0.continent=c1.continent) ORDER BY continent ASC, country ASC;
The query works by making a UNION of abbreviated and detailed results. Detailed results are obtained from the unabbreviated result set joined with the details.
For web applications, pagination can be performed appending LIMIT <offset>, <quantity> . However, it requires the entire result set to be obtained and sorted. ORDER BY for large result sets is relatively processor intensive. However, it ensures that output remains deterministic after additional INSERTs Therefore, it could be omitted.
If you're trying to generate output similar to search engines then be aware that search engines use a collection of tricks to significantly reduce sort and pagination overhead.