在数据库中如何分类、分组并总计SQL数据[builder.com.cn]

2023-10-31 14:32

本文主要是介绍在数据库中如何分类、分组并总计SQL数据[builder.com.cn],希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

您需要了解如何使用某些SQL子句和运算符来安排SQL数据,从而对它进行高效分析。下面这些建议告诉您如何建立语句,获得您希望的结果。

以有意义的方式安排数据可能是一种挑战。有时您只需进行简单分类。通常您必须进行更多处理——进行分组以利于分析与总计。可喜的是,SQL提供了大 量用于分类、分组和总计的子句及运算符。下面的建议将有助于您了解何时进行分类、何时分组、何时及如何进行总计。欲了解每个子句和运算符的详细信息,请查 看在线书籍 。

#1 :分类排序

通常,我们确实需要对所有数据进行排序。SQL的ORDER BY子句将数据按字母或数字顺序进行排列。因此,同类数据明显分类到各个组中。然而,这些组只是分类的结果,它们并不是真正的组。ORDER BY显示每一个记录,而一个组可能代表多个记录。

#2 :减少组中的相似数据

分类与分组的最大不同在于:分类数据显示(任何限定标准内的)所有记录,而分组数据不显示这些记录。GROUP BY子句减少一个记录中的相似数据。例如,GROUP BY能够从重复那些值的源文件中返回一个唯一的邮政编码列表:

SELECT ZIP

FROM Customers

GROUP BY ZIP

仅包括那些在GROUP BY和SELECT列列表中字义组的列。换句话说,SELECT列表必须与GROUP列表相匹配。只有一种情况例外:SELECT列表能够包含聚合函数。(而GROUP BY不支持聚合函数。)

记住,GROUP BY不会对作为结果产生的组分类。要对组按字母或数字顺序排序,增加一个ORDER BY子句(#1)。另外,在GROUP BY子句中您不能引用一个有别名的域。组列必须在根本数据中,但它们不必出现在结果中。

#3 :分组前限定数据

您可以增加一个WHERE子句限定由GROUP BY分组的数据。例如,下面的语句仅返回肯塔基地区顾客的邮政编码列表。

SELECT ZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ZIP

在GROUP BY子句求数据的值之前,WHERE对数据进行过滤,记住这一点很重要。

和GROUP BY一样,WHERE不支持聚合函数。

#4 :返回所有组

当您用WHERE过滤数据时,得到的组只显示那些您指定的记录。符合组定义但不满足子句条件的数据将不会出现在组中。不管WHERE条件如何,如果 您想包括所有数据,增加一个ALL子句。例如,在前面的语句中增加一个ALL子句会返回所有邮政编码组,而不仅仅是肯塔基地区的组。

SELECT ZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ALL ZIP

照这个样子,这两个子句会造成冲突,您可能不会以这种方式使用ALL子句。当您用聚合求一个列的值时,应用ALL子句很方便。例如,下面的语句计算每个肯塔基邮政编码的顾客数目,同时显示其它邮政编码值。

SELECT ZIP, Count(ZIP) AS KYCustomersByZIP

FROM Customers

WHERE State = 'KY'

GROUP BY ALL ZIP

得到的组由根本数据中的所有邮政编码值构成。但是,聚合列(KYCustomerByZIP)显示为0,因为除肯塔基邮政编码组外没有别的组。

远程查询不支持GROUP BY ALL。

#5 :分组后限定数据

WHERE子句(#3)在 GROUP BY子句之前求数据的值。当您希望在分组以后限定数据时,使用HAVING。通常,不管您使用WHERE还是HAVING,得到的结果相同。但要记住,这 两个子句不能互换,这点很重要。如果您存在疑问,这里有一条应用指南:过滤记录时使用WHERE;过滤组时使用HAVING。

一般,您会用HAVING,利用聚合来求一个组的值。例如,下面的语句返回一个邮政编码列表,但这个表内可能不包含根本数据源中的每个邮政编码:

SELECT ZIP, Count(ZIP) AS CustomersByZIP

FROM Customers

GROUP BY ZIP

HAVING Count(ZIP) = 1

仅仅那些只有一名顾客的组出现在结果中。

#6 :详细了解WHERE 和HAVING

如果您仍然对WHERE和HAVING的用法感到迷惑,应用下面的指导方法:

  • WHERE出现在GROUP BY之前;SQL在它分组记录前求WHERE子句的值。
  • HAVING出现在GROUP BY之后;SQL在它分组记录后求HAVING子句的值。

#7 :用聚合总计分组值

分组数据有助于对数据进行分析,但有时您还需要组本身以外的其它信息。您可以增加一个聚合函数来总计分组数据。例如,下面的语句为每次排序显示一个小计:

SELECT OrderID, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY OrderID

与其它的组一样,SELECT和GROUP BY列表必须相匹配。在SELECT子句中包含一个聚合是这一规则的唯一例外。

#8 :总计聚合

您可以通过显示每个组的小计进一步总计数据。SQL的ROLLUP运算符为每个组显示一个额外的记录,一个小计。那个记录是用聚合函数在每个组中求所有记录的值的结果。下面的语句为每个组合计OrderTotal列。

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH ROLLUP

一个包含20和25这两个OrderTotal值的组的ROLLUP行将显示OrderTotal值45。ROLLUP结果的第一个值是唯一的,因为它求所有组记录的值。那个值是整个记录集的总和。

ROLLUP不支持聚合函数中的DISTINCT或GROUP BY ALL子句。

#9 :总计每一列

CUBE运算符比ROLLUP更进一步,它返回每个组中每个值的总数。得到的结果与ROLLUP相似,但CUBE包括组中每一列的一个额外记录。下面的语句显示每个组的小计和每名顾客的一个额外总数。

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH CUBE

用CUBE得到的总计最为复杂。不仅完成聚合与ROLLUP的工作,而且还求定义组的其它列的值。也就是说,CUBE总计每一个可能的列组合。

CUBE不支持GROUP BY ALL。

#10 :给总计排序

当CUBE的结果杂乱无章时(一般都是这样),可以增加一个GROUPING函数,如下所示:

SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotal

FROM Orders

GROUP BY Customer, OrderNumber

WITH CUBE

其结果包括每一行的两个额外的值。

  • 值1表明左边的值是一个总计值——ROLLUP或CUBE的运算符的结果。
  • 值0表明左边的值是一个原始GROUP BY子句产生的详细记录。

英文原版

10 tips for sorting, grouping, and summarizing SQL data

Takeaway: Arranging SQL data so that you can effectively analyze it requires an understanding of how to use certain SQL clauses and operators. These tips will help you figure out how to build statements that will give you the results you want.

This article is also available as a PDF download .

Arranging data in a manner that's meaningful can be a challenge. Sometimes all you need is a simple sort. Often, you need more--you need groups you can analyze and summarize. Fortunately, SQL offers a number of clauses and operators for sorting, grouping, and summarizing. The following tips will help you discern when to sort, when to group, and when and how to summarize. For detailed information on each clause and operator, see Books Online .

#1: Bring order with a sort

More often than not, all your data really needs is a little order. SQL's ORDER BY clause organizes data in alphabetic or numeric order. Consequently, similar values sort together in what appear to be groups. However, the apparent groups are a result of the sort; they aren't true groups. ORDER BY displays each record whereas a group may represent multiple records.

#2: Reduce similar values into a group

The biggest difference between sorting and grouping is this: Sorted data displays all the records (within the confines of any limiting criteria) and grouped data doesn't. The GROUP BY clause reduces similar values into one record. For instance, a GROUP BY clause can return a unique list of ZIP codes from a source that repeats those values:

SELECT ZIPFROM CustomersGROUP BY ZIP

Include only those columns that define the group in both the GROUP BY and SELECT column lists. In other words, the SELECT list must match the GROUP BY list, with one exception: The SELECT list can include aggregate functions. (GROUP BY doesn't allow aggregate functions.)

Keep in mind that GROUP BY won't sort the resulting groups. To arrange groups alphabetically or numerically, add an ORDER BY clause (# 1). In addition, you can't refer to an aliased field in the GROUP BY clause. Group columns must be in the underlying data, but they don't have to appear in the results.

#3: Limit data before it's grouped

You can limit the data that GROUP BY groups by adding a WHERE clause. For instance, the following statement returns a unique list of ZIP codes for just Kentucky customers:

SELECT ZIPFROM CustomersWHERE State = 'KY'GROUP BY ZIP

It's important to remember that WHERE filters data before the GROUP BY clause evaluates it.

Like GROUP BY, WHERE doesn't support aggregate functions.

#4: Return all groups

When you use WHERE to filter data, the resulting groups display only those records you specify. Data that fits the group's definition but does not meet the clause's conditions won't make it to a group. Include ALL when you want to include all data, regardless of the WHERE condition. For instance, adding ALL to the previous statement returns all of the ZIP groups, not just those in Kentucky:

SELECT ZIPFROM CustomersWHERE State = 'KY'GROUP BY ALL ZIP

As is, the two clauses are in conflict, and you probably wouldn't use ALL in this way. ALL comes in handy when you use an aggregate to evaluate a column. For example, the following statement counts the number of customers in each Kentucky ZIP, while also displaying other ZIP values:

SELECT ZIP, Count(ZIP) AS KYCustomersByZIPFROM CustomersWHERE State = 'KY'GROUP BY ALL ZIP

The resulting groups comprise all ZIP values in the underlying data. However, the aggregate column (KYCustomersByZIP) would display 0 for any group other than a Kentucky ZIP.

Remote queries don't support GROUP BY ALL.

#5: Limit data after it's grouped

The WHERE clause (# 3) evaluates data before the GROUP BY clause does. When you want to limit data after it's grouped, use HAVING. Often, the result will be the same whether you use WHERE or HAVING, but it's important to remember that the clauses are not interchangeable. Here's a good guideline to follow when you're in doubt: Use WHERE to filter records; use HAVING to filter groups.

Usually, you'll use HAVING to evaluate a group using an aggregate. For instance, the following statement returns a unique list of ZIP codes, but the list might not include every ZIP code in the underlying data source:

SELECT ZIP, Count(ZIP) AS CustomersByZIPFROM CustomersGROUP BY ZIPHAVING Count(ZIP) = 1

Only those groups with just one customer make it to the results.

#6: Get a closer look at WHERE and HAVING

If you're still confused about when to use WHERE and when to use HAVING, apply the following guidelines:

  • WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
  • HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.

#7: Summarize grouped values with aggregates

Grouping data can help you analyze your data, but sometimes you'll need a bit more information than just the groups themselves. You can add an aggregate function to summarize grouped data. For instance, the following statement displays a subtotal for each order:

SELECT OrderID, Sum(Cost * Quantity) AS OrderTotalFROM OrdersGROUP BY OrderID

As with any other group, the SELECT and GROUP BY lists must match. Including an aggregate in the SELECT clause is the only exception to this rule.

#8: Summarize the aggregate

You can further summarize data by displaying a subtotal for each group. SQL's ROLLUP operator displays an extra record, a subtotal, for each group. That record is the result of evaluating all the records within each group using an aggregate function. The following statement totals the OrderTotal column for each group:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotalFROM OrdersGROUP BY Customer, OrderNumberWITH ROLLUP

The ROLLUP row for a group with two OrderTotal values of 20 and 25 would display an OrderTotal of 45. The first record in a ROLLUP result is unique because it evaluates all of the group records. That value is a grand total for the entire recordset.

ROLLUP doesn't support DISTINCT in aggregate functions or the GROUP BY ALL clause.

#9: Summarize each column

The CUBE operator goes a step further than ROLLUP by returning totals for each value in each group. The results are similar to ROLLUP, but CUBE includes an additional record for each column in the group. The following statement displays a subtotal for each group and an additional total for each customer:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotalFROM OrdersGROUP BY Customer, OrderNumberWITH CUBE

CUBE gives the most comprehensive summarization. It not only does the work of both the aggregate and ROLLUP, but also evaluates the other columns that define the group. In other words, CUBE summarizes every possible column combination.

CUBE doesn't support GROUP BY ALL.

#10: Bring order to summaries

When the results of a CUBE are confusing (and they usually are), add the GROUPING function as follows:

SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotalFROM OrdersGROUP BY Customer, OrderNumberWITH CUBE

The results include two additional values for each row:

  • The value 1 indicates that the value to the left is a summary value--the result of the ROLLUP or CUBE operator.
  • The value 0 indicates that the value to the left is a detail record produced by the original GROUP BY clause.

这篇关于在数据库中如何分类、分组并总计SQL数据[builder.com.cn]的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/316032

相关文章

一文详解Python中数据清洗与处理的常用方法

《一文详解Python中数据清洗与处理的常用方法》在数据处理与分析过程中,缺失值、重复值、异常值等问题是常见的挑战,本文总结了多种数据清洗与处理方法,文中的示例代码简洁易懂,有需要的小伙伴可以参考下... 目录缺失值处理重复值处理异常值处理数据类型转换文本清洗数据分组统计数据分箱数据标准化在数据处理与分析过

大数据小内存排序问题如何巧妙解决

《大数据小内存排序问题如何巧妙解决》文章介绍了大数据小内存排序的三种方法:数据库排序、分治法和位图法,数据库排序简单但速度慢,对设备要求高;分治法高效但实现复杂;位图法可读性差,但存储空间受限... 目录三种方法:方法概要数据库排序(http://www.chinasem.cn对数据库设备要求较高)分治法(常

mysql外键创建不成功/失效如何处理

《mysql外键创建不成功/失效如何处理》文章介绍了在MySQL5.5.40版本中,创建带有外键约束的`stu`和`grade`表时遇到的问题,发现`grade`表的`id`字段没有随着`studen... 当前mysql版本:SELECT VERSION();结果为:5.5.40。在复习mysql外键约

SQL注入漏洞扫描之sqlmap详解

《SQL注入漏洞扫描之sqlmap详解》SQLMap是一款自动执行SQL注入的审计工具,支持多种SQL注入技术,包括布尔型盲注、时间型盲注、报错型注入、联合查询注入和堆叠查询注入... 目录what支持类型how---less-1为例1.检测网站是否存在sql注入漏洞的注入点2.列举可用数据库3.列举数据库

数据库oracle用户密码过期查询及解决方案

《数据库oracle用户密码过期查询及解决方案》:本文主要介绍如何处理ORACLE数据库用户密码过期和修改密码期限的问题,包括创建用户、赋予权限、修改密码、解锁用户和设置密码期限,文中通过代码介绍... 目录前言一、创建用户、赋予权限、修改密码、解锁用户和设置期限二、查询用户密码期限和过期后的修改1.查询用

Python将大量遥感数据的值缩放指定倍数的方法(推荐)

《Python将大量遥感数据的值缩放指定倍数的方法(推荐)》本文介绍基于Python中的gdal模块,批量读取大量多波段遥感影像文件,分别对各波段数据加以数值处理,并将所得处理后数据保存为新的遥感影像... 本文介绍基于python中的gdal模块,批量读取大量多波段遥感影像文件,分别对各波段数据加以数值处

Mysql虚拟列的使用场景

《Mysql虚拟列的使用场景》MySQL虚拟列是一种在查询时动态生成的特殊列,它不占用存储空间,可以提高查询效率和数据处理便利性,本文给大家介绍Mysql虚拟列的相关知识,感兴趣的朋友一起看看吧... 目录1. 介绍mysql虚拟列1.1 定义和作用1.2 虚拟列与普通列的区别2. MySQL虚拟列的类型2

使用MongoDB进行数据存储的操作流程

《使用MongoDB进行数据存储的操作流程》在现代应用开发中,数据存储是一个至关重要的部分,随着数据量的增大和复杂性的增加,传统的关系型数据库有时难以应对高并发和大数据量的处理需求,MongoDB作为... 目录什么是MongoDB?MongoDB的优势使用MongoDB进行数据存储1. 安装MongoDB

mysql数据库分区的使用

《mysql数据库分区的使用》MySQL分区技术通过将大表分割成多个较小片段,提高查询性能、管理效率和数据存储效率,本文就来介绍一下mysql数据库分区的使用,感兴趣的可以了解一下... 目录【一】分区的基本概念【1】物理存储与逻辑分割【2】查询性能提升【3】数据管理与维护【4】扩展性与并行处理【二】分区的

MySQL中时区参数time_zone解读

《MySQL中时区参数time_zone解读》MySQL时区参数time_zone用于控制系统函数和字段的DEFAULTCURRENT_TIMESTAMP属性,修改时区可能会影响timestamp类型... 目录前言1.时区参数影响2.如何设置3.字段类型选择总结前言mysql 时区参数 time_zon