
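In a relational database, summarizing data from several related tables usually means using JOIN. But when one parent row has several independent one-to-many relationships, for example a sale that has multiple sale_lines and also multiple cash_transactions, joining all the tables directly and then applying an aggregate such as SUM easily counts the same rows more than once and produces wrong results.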
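Consider a database with the tables and columns used by the queries in this article: sale (id, currency_items_sold_in), sale_lines (sale_id, price_paid), cash_transactions (sale_id, received_currency_id, received_amount, converted_currency_id, converted_amount) and currency (iso_number, iso_code).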
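The catch is that a sale_lines row is always in the same currency as its sale, whereas the received currency (received_currency_id) or the converted currency (converted_currency_id) of a cash_transactions row may differ from the sale currency. If you compute total sales, total received and total converted amounts by simply LEFT JOINing sale, sale_lines and cash_transactions and then grouping by sale.currency_items_sold_in, two things go wrong: every sale line is repeated once for each cash transaction of the same sale (and vice versa), which inflates the sums, and amounts received or converted in other currencies end up reported under the sale currency.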
The following SQL illustrates the problem (based on the SQL Fiddle from the original question):
-- Incorrect: aggregates over duplicated rows
SELECT
    s.currency_items_sold_in,
    SUM(sl.price_paid) AS "price_paid" -- this SUM is wrong
FROM sale s
LEFT JOIN sale_lines sl ON sl.sale_id = s.id
LEFT JOIN cash_transactions ct ON ct.sale_id = s.id
GROUP BY s.currency_items_sold_in;
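To make the inflation concrete, here is a minimal sketch that runs the same double LEFT JOIN against an in-memory SQLite database from Python. The table layout is trimmed to the columns the query needs and the sample rows are hypothetical; SQLite is used only because it is self-contained, and the original fiddle may target a different engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema, inferred from the queries in this article.
    CREATE TABLE sale (id INTEGER PRIMARY KEY, currency_items_sold_in INTEGER);
    CREATE TABLE sale_lines (sale_id INTEGER, price_paid NUMERIC);
    CREATE TABLE cash_transactions (sale_id INTEGER, received_amount NUMERIC);

    -- Hypothetical data: one sale in DKK (ISO 4217 numeric code 208)
    -- with two sale lines and three cash transactions.
    INSERT INTO sale VALUES (1, 208);
    INSERT INTO sale_lines VALUES (1, 300), (1, 200);
    INSERT INTO cash_transactions VALUES (1, 100), (1, 100), (1, 100);
""")

# Same join shape as the incorrect query: both child tables joined to sale.
naive_sum, joined_rows = conn.execute("""
    SELECT SUM(sl.price_paid), COUNT(*)
    FROM sale s
    LEFT JOIN sale_lines sl ON sl.sale_id = s.id
    LEFT JOIN cash_transactions ct ON ct.sale_id = s.id
""").fetchone()

# Reference value: sum the sale lines without any join.
true_sum = conn.execute("SELECT SUM(price_paid) FROM sale_lines").fetchone()[0]

print(joined_rows)  # 6: 2 sale lines x 3 cash transactions
print(naive_sum)    # 1500: every sale line is counted three times
print(true_sum)     # 500
```

The number of joined rows per sale is the product of its sale line count and its cash transaction count, which is why the error grows with the number of payments.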
-- Attempt: pre-aggregate in subqueries, but the cash_transactions amounts may still mix currencies
SELECT
    s.currency_items_sold_in,
    SUM(sale_line_aggregates.price_paid) AS "total_price_paid",
    SUM(cash_transaction_aggregates.converted_amount) AS "total_converted_amount",
    SUM(cash_transaction_aggregates.received_amount) AS "total_received_amount"
FROM sale s
LEFT JOIN (
    SELECT
        sale_id,
        SUM(price_paid) AS price_paid
    FROM sale_lines
    GROUP BY sale_id
) AS sale_line_aggregates ON sale_line_aggregates.sale_id = s.id
LEFT JOIN (
    SELECT
        sale_id,
        SUM(converted_amount) AS converted_amount,
        SUM(received_amount) AS received_amount
    FROM cash_transactions
    GROUP BY sale_id
) AS cash_transaction_aggregates ON cash_transaction_aggregates.sale_id = s.id
GROUP BY s.currency_items_sold_in;
In the second query, total_received_amount and total_converted_amount are pre-aggregated per sale_id, so rows are no longer duplicated. However, if the cash_transactions of a single sale use several received_currency_id or converted_currency_id values, the totals grouped by s.currency_items_sold_in are still sums across mixed currencies, which limits their business value.
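The following sketch, using Python's sqlite3 module with a trimmed schema and hypothetical sample rows, shows what per-sale pre-aggregation does and does not fix: the sale line total is now correct, but a payment received in SEK is still added to a payment received in DKK and reported under the sale currency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema, inferred from the queries in this article.
    CREATE TABLE sale (id INTEGER PRIMARY KEY, currency_items_sold_in INTEGER);
    CREATE TABLE sale_lines (sale_id INTEGER, price_paid NUMERIC);
    CREATE TABLE cash_transactions (
        sale_id INTEGER,
        received_currency_id INTEGER,
        received_amount NUMERIC
    );

    -- Hypothetical data: sale 1 is priced in DKK (208) but is paid
    -- partly in DKK (208) and partly in SEK (752).
    INSERT INTO sale VALUES (1, 208);
    INSERT INTO sale_lines VALUES (1, 300), (1, 200);
    INSERT INTO cash_transactions VALUES (1, 208, 300), (1, 752, 400);
""")

# Pre-aggregate each child table per sale_id, then group by sale currency.
row = conn.execute("""
    SELECT s.currency_items_sold_in,
           SUM(sla.price_paid),
           SUM(cta.received_amount)
    FROM sale s
    LEFT JOIN (SELECT sale_id, SUM(price_paid) AS price_paid
               FROM sale_lines GROUP BY sale_id) AS sla
           ON sla.sale_id = s.id
    LEFT JOIN (SELECT sale_id, SUM(received_amount) AS received_amount
               FROM cash_transactions GROUP BY sale_id) AS cta
           ON cta.sale_id = s.id
    GROUP BY s.currency_items_sold_in
""").fetchone()

print(row)  # (208, 500, 700): 700 is 300 DKK + 400 SEK reported as DKK
```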
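To solve this, we need a finer-grained aggregation strategy. The core idea is to aggregate each metric on its own, grouped by the currency it is actually denominated in, and only then join the per-currency totals to the currency table.
This approach avoids the row inflation caused by joining several one-to-many tables directly, and it lets us summarize each amount by its actual currency dimension.
Here is the SQL implementation: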
WITH CTE_SALE AS (
    -- CTE that exposes every sale's id together with its sale currency
    SELECT
        id AS sale_id,
        currency_items_sold_in AS iso_number
    FROM sale
)
SELECT
    curr.iso_code AS currency,
    COALESCE(line.price_paid, 0) AS total_price_paid,
    COALESCE(received.amount, 0) AS total_received_amount,
    COALESCE(converted.amount, 0) AS total_converted_amount
FROM currency AS curr
LEFT JOIN (
    -- Sum sale line amounts, grouped by the currency of the sale
    SELECT
        s.iso_number,
        SUM(sl.price_paid) AS price_paid
    FROM sale_lines sl
    JOIN CTE_SALE s ON s.sale_id = sl.sale_id
    GROUP BY s.iso_number
) AS line
    ON line.iso_number = curr.iso_number
LEFT JOIN (
    -- Sum received amounts of cash transactions, grouped by received currency
    SELECT
        tr.received_currency_id AS iso_number,
        SUM(tr.received_amount) AS amount
    FROM cash_transactions tr
    JOIN CTE_SALE s ON s.sale_id = tr.sale_id
    GROUP BY tr.received_currency_id
) AS received
    ON received.iso_number = curr.iso_number
LEFT JOIN (
    -- Sum converted amounts of cash transactions, grouped by converted currency
    SELECT
        tr.converted_currency_id AS iso_number,
        SUM(tr.converted_amount) AS amount
    FROM cash_transactions AS tr
    JOIN CTE_SALE s ON s.sale_id = tr.sale_id
    GROUP BY tr.converted_currency_id
) AS converted
    ON converted.iso_number = curr.iso_number;
Code walkthrough:
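CTE_SALE (common table expression): exposes each sale's id as sale_id and its currency_items_sold_in as iso_number, so the subqueries can join on consistent names.
line subquery: sums sale_lines.price_paid, grouped by the currency of the parent sale.
received subquery: sums received_amount, grouped by received_currency_id. The join to CTE_SALE only restricts the rows to transactions that belong to an existing sale.
converted subquery: sums converted_amount, grouped by converted_currency_id, with the same join to CTE_SALE.
Main query: starts from currency and LEFT JOINs the three aggregates on iso_number, so every currency appears exactly once; COALESCE turns a missing total into 0.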
Example of the expected output:
| currency | total_price_paid | total_received_amount | total_converted_amount |
|---|---|---|---|
| DKK | 500 | 300 | 700 |
| SEK | 200 | 400 | 0 |
| NOK | 0 | 150 | 0 |
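The result reads as follows: sales priced in DKK total 500, 300 was received in DKK, and 700 was converted into DKK; sales priced in SEK total 200 and 400 was received in SEK, with nothing converted into SEK; no sales were priced in NOK, but 150 was received in NOK.
Each column therefore has an unambiguous currency dimension, and amounts in different currencies are never mixed.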
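As a quick check, the per-currency approach can be run end to end. The sketch below assumes an in-memory SQLite database, a schema inferred from the queries in this article, and made-up sample rows chosen to reproduce the table above; the amounts are illustrative and do not reflect real exchange rates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema, inferred from the queries in this article.
    CREATE TABLE currency (iso_number INTEGER PRIMARY KEY, iso_code TEXT);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, currency_items_sold_in INTEGER);
    CREATE TABLE sale_lines (sale_id INTEGER, price_paid NUMERIC);
    CREATE TABLE cash_transactions (
        sale_id INTEGER,
        received_currency_id INTEGER, received_amount NUMERIC,
        converted_currency_id INTEGER, converted_amount NUMERIC
    );

    -- Hypothetical sample data (ISO 4217 numeric codes: DKK 208, SEK 752, NOK 578).
    INSERT INTO currency VALUES (208, 'DKK'), (752, 'SEK'), (578, 'NOK');
    INSERT INTO sale VALUES (1, 208), (2, 752);
    INSERT INTO sale_lines VALUES (1, 300), (1, 200), (2, 200);
    INSERT INTO cash_transactions VALUES
        (1, 208, 300, 208, 300),
        (2, 752, 400, 208, 260),
        (2, 578, 150, 208, 140);
""")

# Each metric is aggregated by its own currency, then joined to currency.
QUERY = """
WITH CTE_SALE AS (
    SELECT id AS sale_id, currency_items_sold_in AS iso_number FROM sale
)
SELECT curr.iso_code,
       COALESCE(line.price_paid, 0),
       COALESCE(received.amount, 0),
       COALESCE(converted.amount, 0)
FROM currency AS curr
LEFT JOIN (
    SELECT s.iso_number, SUM(sl.price_paid) AS price_paid
    FROM sale_lines sl
    JOIN CTE_SALE s ON s.sale_id = sl.sale_id
    GROUP BY s.iso_number
) AS line ON line.iso_number = curr.iso_number
LEFT JOIN (
    SELECT tr.received_currency_id AS iso_number, SUM(tr.received_amount) AS amount
    FROM cash_transactions tr
    JOIN CTE_SALE s ON s.sale_id = tr.sale_id
    GROUP BY tr.received_currency_id
) AS received ON received.iso_number = curr.iso_number
LEFT JOIN (
    SELECT tr.converted_currency_id AS iso_number, SUM(tr.converted_amount) AS amount
    FROM cash_transactions tr
    JOIN CTE_SALE s ON s.sale_id = tr.sale_id
    GROUP BY tr.converted_currency_id
) AS converted ON converted.iso_number = curr.iso_number
"""

# Key the rows by currency code, since the query has no ORDER BY.
totals = {code: (paid, received, converted)
          for code, paid, received, converted in conn.execute(QUERY)}

for code in ("DKK", "SEK", "NOK"):
    print(code, *totals[code])
# DKK 500 300 700
# SEK 200 400 0
# NOK 0 150 0
```

Because the query has no ORDER BY, the row order is not guaranteed; the sketch keys the rows by currency code instead of relying on position.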
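When an aggregate query spans several one-to-many relationships and multi-currency transactions, joining everything directly and then aggregating inflates the rows and yields inaccurate results. Combining a CTE with step-by-step aggregation avoids these pitfalls: each business metric (sales, received and converted amounts) is summarized independently by its own currency dimension, and the results are then joined together. This keeps the totals accurate and makes the query easier to read and maintain.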