SQL Query Optimization: Fixing a Query for Drivers Over the Weekly Mileage Limit
2025-03-13 00:31:14
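Excluding weekly totals that fall short of the threshold in a database query
This problem looks familiar: "Subquery returned more than 1 value" is an error that comes up often in SQL Server queries. At its core, you want the rows for drivers who drove more than 1,000 miles in a week, but the WHERE clause uses a subquery in a position where a single value is expected, and that subquery returns several values.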
Cause
The error comes from the subquery in the WHERE clause:
SELECT (
sum(TotalMiles)
FROM #TEMP4
GROUP BY Driver, DATEADD(wk,DATEDIFF(wk,0,shipdate),0)-1)
Having sum(TotalMiles) > 1000
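The subquery tries to compute each driver's weekly total mileage and keep the totals above 1000. But a WHERE clause needs a condition that evaluates to true or false for each row, and a subquery used as a value in that condition must return exactly one value. Because of the GROUP BY, this subquery returns one total per driver per week, and HAVING only trims that set. SQL has no way to compare one row against a whole set, much as you cannot compare Zhang San against every student in a school at once. That is the cause of the error.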
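To make the "set, not a single value" point concrete, here is a minimal sketch that runs a query of the same shape against an in-memory SQLite database through Python's sqlite3 module. The `trips` table, its columns, and the sample rows are hypothetical stand-ins for #TEMP4, and SQLite is used only to count the rows; it does not reproduce SQL Server's error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for #TEMP4 with a precomputed week start.
conn.execute("CREATE TABLE trips (driver TEXT, week_start TEXT, total_miles REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("A", "2025-03-02", 700),
        ("A", "2025-03-02", 500),   # driver A, first week: 1200 miles
        ("A", "2025-03-09", 300),   # driver A, second week: 300 miles
        ("B", "2025-03-02", 1100),  # driver B, first week: 1100 miles
    ],
)

# Same shape as the failing subquery: one SUM per (driver, week) group.
group_sums = conn.execute(
    """
    SELECT SUM(total_miles)
    FROM trips
    GROUP BY driver, week_start
    HAVING SUM(total_miles) > 1000
    """
).fetchall()

# Two groups pass the HAVING filter, so the query yields two rows, not one value.
print(len(group_sums))
```

With more than one qualifying driver-week, a query like this cannot serve as the single value that a comparison in WHERE expects.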
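Solutions
The key is applying the condition "weekly total mileage over 1,000" to every individual row. Here are several options.
Option 1: Use a window function (recommended)
Window functions are well suited to this kind of problem. They let you compute an aggregate (such as the weekly total mileage) without collapsing the rows, and attach that value to every row.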
WITH WeeklyTotals AS (
SELECT
InvoiceNumber,
Dataflow,
BillTo,
ShipDate,
cy.cmp_name AS ShipperName,
c.cty_name AS OriginCity,
OriginState,
c.cty_zip AS OriginZip,
c.cty_latitude AS OriginLatitude,
c.cty_longitude AS OriginLongitude,
c.cty_region2 AS OriginRegion,
cy1.cmp_name AS ConsigneeName,
c1.cty_name AS DestCity,
DestState,
c1.cty_zip AS DestZip,
c1.cty_latitude AS DestLatitude,
c1.cty_longitude AS DestLongitude,
c1.cty_region2 AS DestRegion,
c.cty_region2 + ' || ' + c1.cty_region2 AS Lane,
Driver,
Delivery,
WeekStart,
SUM(Vol) AS Vol,
SUM(IBVol) AS IBVol,
SUM(OBVol) AS OBVol,
CASE WHEN Dataflow = 'Outbound' THEN SUM(Match) ELSE 0 END AS Match,
SUM(Weight) AS Weight,
SUM(Tons) AS Tons,
SUM(TotalMiles) AS TotalMiles,
SUM(Miles) AS Miles,
SUM(LDMiles) AS LDMiles,
SUM(MTMiles) AS MTMiles,
SUM(LDMiles) + SUM(MTMiles) AS [Total Miles],
SUM(MinCharge) AS MinCharge,
SUM(Pickups) AS Pickups,
SUM(Linehaul) AS Linehaul,
SUM(FSC) AS FSC,
SUM(Fixed) AS Fixed,
SUM(Tractors) AS Tractors,
SUM([Tractor Cost]) AS [Tractor Cost],
SUM(Trailers) AS Trailers,
SUM([Trailer Cost]) AS [Trailer Cost],
SUM(Tractors) AS Drivers,
SUM(SUM(TotalMiles)) OVER (PARTITION BY Driver, DATEADD(wk, DATEDIFF(wk, 0, shipdate), 0) - 1) AS WeeklyDriverMiles -- window over the grouped sums: weekly total per driver
FROM #temp3
INNER JOIN company cy (NOLOCK) ON Shipper = cy.cmp_id
INNER JOIN company cy1 (NOLOCK) ON Consignee = cy1.cmp_id
INNER JOIN city c (NOLOCK) ON origincity = c.cty_code
INNER JOIN city c1 (NOLOCK) ON DestCity = c1.cty_code
GROUP BY WeekStart, DataFlow, Delivery, ShipDate, BillTo, Driver, Shipper, OriginCity, OriginState, Consignee, DestCity, DestState, InvoiceNumber, c.cty_latitude, c.cty_longitude,
cy.cmp_name, cy1.cmp_name, c1.cty_name, c.cty_name, c1.cty_latitude, c1.cty_longitude, c1.cty_zip, c.cty_zip, c1.cty_region2, c.cty_region2
)
SELECT *
FROM WeeklyTotals
WHERE WeeklyDriverMiles > 1000;
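Code explanation:
- WITH WeeklyTotals AS (...): defines a common table expression (CTE) named WeeklyTotals. A CTE behaves like a temporary named result set that exists only for the current statement.
- SUM(...) OVER (PARTITION BY Driver, DATEADD(wk, DATEDIFF(wk, 0, shipdate), 0) - 1) AS WeeklyDriverMiles: this is the key part.
  - SUM(...): adds up the mileage.
  - OVER (...): marks this as a window function.
  - PARTITION BY Driver, DATEADD(wk, DATEDIFF(wk, 0, shipdate), 0) - 1: tells SQL Server how to partition the rows, here by Driver and by week start date. DATEDIFF(wk, 0, shipdate) counts the week boundaries since day 0 (1900-01-01, a Monday), DATEADD(wk, ..., 0) turns that count back into a Monday, and - 1 steps back one day to the Sunday that starts the week. The sum is computed separately for each driver and week.
  - Because the CTE also has a GROUP BY, the window has to be applied to the grouped aggregate, that is SUM(SUM(TotalMiles)) OVER (...). A bare SUM(TotalMiles) OVER (...) is rejected there because TotalMiles is neither in the GROUP BY nor inside an aggregate.
  - The result is that the WeeklyDriverMiles column carries each driver's weekly total on every row.
- SELECT * FROM WeeklyTotals WHERE WeeklyDriverMiles > 1000: selects from the WeeklyTotals CTE and keeps only the rows whose WeeklyDriverMiles exceeds 1000.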
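The week-start expression is the least obvious part of the query, so here is a minimal Python sketch of the same arithmetic. It assumes the documented SQL Server behavior that the integer 0 converts to 1900-01-01 and that DATEDIFF with the wk datepart always treats Sunday as the first day of the week; the function name `week_start` is hypothetical, and any time-of-day portion is ignored.

```python
from datetime import date, timedelta

# What the literal 0 means once SQL Server converts it to a datetime (a Monday).
SQL_SERVER_DAY_ZERO = date(1900, 1, 1)


def week_start(ship_date: date) -> date:
    """Mirror DATEADD(wk, DATEDIFF(wk, 0, shipdate), 0) - 1 for a plain date."""
    last_sunday_before_zero = SQL_SERVER_DAY_ZERO - timedelta(days=1)  # 1899-12-31
    # DATEDIFF(wk, 0, d): number of Saturday-to-Sunday boundaries crossed since day zero.
    weeks = (ship_date - last_sunday_before_zero).days // 7
    # DATEADD(wk, weeks, 0) lands on a Monday; subtracting one day gives the Sunday before it.
    return SQL_SERVER_DAY_ZERO + timedelta(weeks=weeks) - timedelta(days=1)


print(week_start(date(2025, 3, 13)))  # 2025-03-09
```

Under these assumptions every ship date maps to the Sunday on or before it, so all trips from Sunday through Saturday land in the same partition.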
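How it works:
The advantage of a window function is that, unlike GROUP BY, it computes an aggregate such as SUM without collapsing the rows into one. The aggregate is attached to every row instead. SQL Server does not allow a window function directly inside a WHERE clause, which is why the calculation sits in a CTE and the outer query's WHERE filters on the resulting column.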
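The difference between collapsing rows and annotating them can be seen in a small sketch, again using an in-memory SQLite database from Python. The `trips` table and its data are hypothetical, and the window function needs SQLite 3.25 or newer, which is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical trip rows: (invoice, driver, week_start, total_miles).
conn.execute(
    "CREATE TABLE trips (invoice INTEGER, driver TEXT, week_start TEXT, total_miles REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2025-03-02", 700),
        (2, "A", "2025-03-02", 500),
        (3, "A", "2025-03-09", 300),
        (4, "B", "2025-03-02", 1100),
    ],
)

# GROUP BY collapses the four trips into one row per driver-week.
grouped = conn.execute(
    "SELECT driver, week_start, SUM(total_miles) FROM trips GROUP BY driver, week_start"
).fetchall()

# The window function keeps all four trips and adds the weekly total to each.
windowed = conn.execute(
    """
    SELECT invoice, SUM(total_miles) OVER (PARTITION BY driver, week_start)
    FROM trips
    """
).fetchall()

# Filtering happens one level up, on the column the inner query produced.
kept = conn.execute(
    """
    SELECT invoice
    FROM (
        SELECT invoice,
               SUM(total_miles) OVER (PARTITION BY driver, week_start) AS weekly_driver_miles
        FROM trips
    )
    WHERE weekly_driver_miles > 1000
    ORDER BY invoice
    """
).fetchall()

print(len(grouped), len(windowed), kept)
```

Invoice 3 is dropped because driver A's second week totals only 300 miles, while both trips from the 1,200-mile week are kept.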
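Safety note:
Use NOLOCK with care, especially on tables that are written to frequently. It can read uncommitted data (dirty reads), and while data is being changed it can also skip rows or read them twice. If data consistency matters, remove (NOLOCK). In this query it applies only to the lookups against company and city, so if those tables rarely change, keeping (NOLOCK) carries little risk.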
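Option 2: Use a JOIN
This approach first picks out the driver and week-start combinations whose weekly total exceeds 1000, then joins them back to the original table.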
WITH DriverWeeklyTotals AS (
SELECT
Driver,
DATEADD(wk, DATEDIFF(wk, 0, shipdate), 0) - 1 AS WeekStart
FROM #temp3
GROUP BY Driver, DATEADD(wk, DATEDIFF(wk, 0, shipdate), 0) - 1
HAVING SUM(TotalMiles) > 1000
)
SELECT
t4.*
FROM #temp4 t4
INNER JOIN DriverWeeklyTotals dwt ON t4.Driver = dwt.Driver AND DATEADD(wk, DATEDIFF(wk, 0, t4.shipdate), 0) - 1 = dwt.WeekStart;
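Code explanation:
- First build DriverWeeklyTotals: group by Driver and WeekStart, and use HAVING to keep only the weeks whose total exceeds 1000.
- Then join #temp4 to DriverWeeklyTotals on Driver and WeekStart. This returns the #temp4 rows that belong to a driver-week with more than 1,000 miles driven.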
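How it works:
Compute the qualifying driver and week-start combinations first, then use the JOIN to match them against the original data so that only the rows meeting the condition remain.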
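The aggregate-then-join pattern can be sketched on a tiny data set as follows. The `trips` table and its column names are hypothetical, and a precomputed `week_start` column stands in for the DATEADD/DATEDIFF expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical trip rows: (invoice, driver, week_start, total_miles).
conn.execute(
    "CREATE TABLE trips (invoice INTEGER, driver TEXT, week_start TEXT, total_miles REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2025-03-02", 700),
        (2, "A", "2025-03-02", 500),
        (3, "A", "2025-03-09", 300),
        (4, "B", "2025-03-02", 1100),
    ],
)

joined = conn.execute(
    """
    WITH driver_weekly_totals AS (
        -- Step 1: the (driver, week) pairs whose total exceeds the threshold.
        SELECT driver, week_start
        FROM trips
        GROUP BY driver, week_start
        HAVING SUM(total_miles) > 1000
    )
    -- Step 2: keep only the detail rows that belong to one of those pairs.
    SELECT t.invoice
    FROM trips AS t
    INNER JOIN driver_weekly_totals AS d
            ON t.driver = d.driver
           AND t.week_start = d.week_start
    ORDER BY t.invoice
    """
).fetchall()

print(joined)  # [(1,), (2,), (4,)]
```

Because the CTE yields at most one row per driver-week, the join cannot duplicate detail rows.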
Option 3: Use EXISTS
This approach uses an EXISTS subquery to check, for each row, whether a matching driver-week satisfies the condition.
SELECT
t4.*
FROM #temp4 t4
WHERE EXISTS (
SELECT 1
FROM #temp3 t3
WHERE t3.Driver = t4.Driver
AND DATEADD(wk, DATEDIFF(wk, 0, t3.shipdate), 0) - 1 = DATEADD(wk, DATEDIFF(wk, 0, t4.shipdate), 0) - 1
GROUP BY t3.Driver, DATEADD(wk, DATEDIFF(wk, 0, t3.shipdate), 0) - 1
HAVING SUM(t3.TotalMiles) > 1000
);
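Code explanation:
- The subquery after EXISTS is evaluated for each row of #temp4.
- Inside the subquery, the WHERE clause matches the #temp3 rows with the same Driver and week start, the rows are grouped by Driver and week start, and HAVING checks whether SUM(t3.TotalMiles) for that group exceeds 1000.
- If it does, the EXISTS subquery returns a row and the outer SELECT keeps the #temp4 row. Otherwise the row is left out.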
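How it works:
The idea is the same as the JOIN, expressed with EXISTS. EXISTS can often stop searching as soon as it finds one matching row, but here the subquery has to sum the whole driver-week before HAVING can be evaluated, so do not assume it is faster than the JOIN. Compare the execution plans on your own data.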
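A correlated EXISTS with an aggregate condition can be sketched the same way. The `trips` table is hypothetical, with a precomputed `week_start` column in place of the DATEADD/DATEDIFF expression; the GROUP BY is kept inside the subquery, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical trip rows: (invoice, driver, week_start, total_miles).
conn.execute(
    "CREATE TABLE trips (invoice INTEGER, driver TEXT, week_start TEXT, total_miles REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2025-03-02", 700),
        (2, "A", "2025-03-02", 500),
        (3, "A", "2025-03-09", 300),
        (4, "B", "2025-03-02", 1100),
    ],
)

kept = conn.execute(
    """
    SELECT t.invoice
    FROM trips AS t
    WHERE EXISTS (
        -- Correlated on the outer row's driver and week, then summed.
        SELECT 1
        FROM trips AS w
        WHERE w.driver = t.driver
          AND w.week_start = t.week_start
        GROUP BY w.driver, w.week_start
        HAVING SUM(w.total_miles) > 1000
    )
    ORDER BY t.invoice
    """
).fetchall()

print(kept)  # [(1,), (2,), (4,)]
```

The subquery sees only the rows of one driver-week at a time, so the GROUP BY produces at most one group and HAVING decides whether EXISTS is true.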
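Option 4 (optional, based on the existing code structure): create the table first, then insert the data
Since you already have #temp4, the weekly-total calculation and the filter can be folded into the step that populates #temp4.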
-- First create a temp table that includes the weekly total mileage
SELECT
WeekStart,
DataFlow,
Delivery,
ShipDate,
BillTo,
Driver,
Shipper,
OriginCity,
OriginState,
Consignee,
DestCity,
DestState,
InvoiceNumber,
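-- SELECT ... INTO requires unique column names, so the joined columns get distinct aliases
c.cty_latitude AS OriginLatitude,
c.cty_longitude AS OriginLongitude,
cy.cmp_name AS ShipperName,
cy1.cmp_name AS ConsigneeName,
c1.cty_name AS DestCityName,
c.cty_name AS OriginCityName,
c1.cty_latitude AS DestLatitude,
c1.cty_longitude AS DestLongitude,
c1.cty_zip AS DestZip,
c.cty_zip AS OriginZip,
c1.cty_region2 AS DestRegion,
c.cty_region2 AS OriginRegion,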
SUM(Vol) AS Vol,
SUM(IBVol) AS IBVol,
SUM(OBVol) AS OBVol,
CASE WHEN Dataflow = 'Outbound' THEN SUM(Match) ELSE 0 END AS Match,
SUM(Weight) AS Weight,
SUM(Tons) AS Tons,
SUM(TotalMiles) AS TotalMiles,
SUM(Miles) AS Miles,
SUM(LDMiles) AS LDMiles,
SUM(MTMiles) AS MTMiles,
SUM(LDMiles) + SUM(MTMiles) AS [Total Miles],
SUM(MinCharge) AS MinCharge,
SUM(Pickups) AS Pickups,
SUM(Linehaul) AS Linehaul,
SUM(FSC) AS FSC,
SUM(Fixed) AS Fixed,
SUM(Tractors) AS Tractors,
SUM([Tractor Cost]) AS [Tractor Cost],
SUM(Trailers) AS Trailers,
SUM([Trailer Cost]) AS [Trailer Cost],
SUM(Tractors) AS Drivers,
WeeklyDriverMiles -- weekly total, computed in the derived table below
INTO #temp4_with_weekly
FROM (
SELECT *,
SUM(TotalMiles) OVER (PARTITION BY Driver, DATEADD(wk, DATEDIFF(wk, 0, shipdate), 0) - 1) AS WeeklyDriverMiles
FROM #temp3
) AS Subquery
INNER JOIN company cy (NOLOCK) ON Shipper = cy.cmp_id
INNER JOIN company cy1 (NOLOCK) ON Consignee = cy1.cmp_id
INNER JOIN city c (NOLOCK) ON origincity = c.cty_code
INNER JOIN city c1 (NOLOCK) ON DestCity = c1.cty_code
GROUP BY WeekStart, DataFlow, Delivery, ShipDate, BillTo, Driver, Shipper, OriginCity, OriginState, Consignee, DestCity, DestState, InvoiceNumber, c.cty_latitude, c.cty_longitude,
cy.cmp_name, cy1.cmp_name, c1.cty_name, c.cty_name, c1.cty_latitude, c1.cty_longitude, c1.cty_zip, c.cty_zip, c1.cty_region2, c.cty_region2, WeeklyDriverMiles;
-- Then filter the rows and insert them into the final #temp4 table (which must already exist with a matching column list)
INSERT INTO #temp4
SELECT *
FROM #temp4_with_weekly
WHERE WeeklyDriverMiles > 1000;
-- Finally, query #temp4
SELECT *
FROM #temp4;
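Code explanation
- WeeklyDriverMiles is computed while #temp4_with_weekly is being created.
- WHERE WeeklyDriverMiles > 1000 filters the rows that are inserted into #temp4.
- The final table is then queried.
How it works: the filter is combined with building the temp tables. Note that SELECT ... INTO fails if #temp4_with_weekly already exists, so drop it first. INSERT INTO #temp4 SELECT * requires #temp4 to exist with the same columns in the same order, including WeeklyDriverMiles. SELECT ... INTO also requires unique column names, so joined columns that share a name (for example c.cty_name and c1.cty_name) need distinct aliases.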
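The two-step staging flow can be sketched in miniature as follows. The table names `trips`, `trips_with_weekly`, and `qualifying_trips` are hypothetical stand-ins for #temp3, #temp4_with_weekly, and #temp4; SQLite's CREATE TEMP TABLE ... AS SELECT plays the role of SELECT ... INTO, and SQLite 3.25 or newer is assumed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical trip rows: (invoice, driver, week_start, total_miles).
conn.execute(
    "CREATE TABLE trips (invoice INTEGER, driver TEXT, week_start TEXT, total_miles REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2025-03-02", 700),
        (2, "A", "2025-03-02", 500),
        (3, "A", "2025-03-09", 300),
        (4, "B", "2025-03-02", 1100),
    ],
)

# Step 1: staging table that carries the weekly total on every row.
conn.execute(
    """
    CREATE TEMP TABLE trips_with_weekly AS
    SELECT invoice,
           SUM(total_miles) OVER (PARTITION BY driver, week_start) AS weekly_driver_miles
    FROM trips
    """
)

# Step 2: the final table exists up front with the same column list as the staging table.
conn.execute("CREATE TEMP TABLE qualifying_trips (invoice INTEGER, weekly_driver_miles REAL)")
conn.execute(
    """
    INSERT INTO qualifying_trips
    SELECT * FROM trips_with_weekly
    WHERE weekly_driver_miles > 1000
    """
)

final_rows = conn.execute("SELECT * FROM qualifying_trips ORDER BY invoice").fetchall()
print(final_rows)
```

The SELECT * in the insert only works because both tables have the same columns in the same order; listing the columns explicitly is the safer habit in longer scripts.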
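Summary and recommendations:
- Prefer the window function (Option 1): it is usually the most concise approach and often the most efficient.
- JOIN (Option 2) and EXISTS (Option 3) also work: their logic is somewhat more intuitive, but how they perform relative to the window function depends on the data and the indexes.
- Option 4 combines the filter with building the temp tables. It is the longest of the four, but it fits the existing temp-table workflow.
Pick the option that suits your situation and preferences. Testing the performance of the different options is strongly recommended, especially with large data volumes.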