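The previous article showed how to use a CTE to concatenate the values of a one-to-many related column in SQL. While researching it I came across an alternative that uses SQL Server's FOR XML PATH('') trick to turn multiple rows into a single string (reference), and it looked quite concise as well. At the time I suspected that a subquery plus FOR XML conversion would perform worse than a CTE (it turns out it does not!), and since the CTE was the more interesting challenge, I chose to explore the CTE route.
Reader ChoeChin left a comment mentioning the FOR XML approach, which made me reconsider: for heavy, large-scale use, performance cannot be ignored. Is the CTE really faster than FOR XML? That called for an experiment. After some testing, the conclusion is that I had misjudged the relative performance of the two approaches.
The original test sample was too small to show any performance difference. I used the following script to generate 100,000 systems, and made the Name column of AppSystem and the (AppName, Engineer) columns of AppSupport primary keys to ensure query performance.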
DECLARE @I INT, @NO VARCHAR(6);
SET @I = 0;
-- Start from empty tables
TRUNCATE TABLE AppSystem;
TRUNCATE TABLE AppSupport;
-- 25,000 iterations x 4 systems = 100,000 AppSystem rows
WHILE @I < 25000
BEGIN
    SET @NO = CONVERT(VARCHAR(6), @I);
    INSERT INTO AppSystem VALUES ('HR-' + @NO, 'Mars-' + @NO);
    INSERT INTO AppSystem VALUES ('ERP-' + @NO, 'Jupiter-' + @NO);
    INSERT INTO AppSystem VALUES ('POS-' + @NO, 'Venus-' + @NO);
    INSERT INTO AppSystem VALUES ('MAIL-' + @NO, 'Apolo-' + @NO);
    -- Each system gets one to three support engineers
    INSERT INTO AppSupport VALUES ('HR-' + @NO, 'Jeffrey-' + @NO);
    INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Jeffrey-' + @NO);
    INSERT INTO AppSupport VALUES ('POS-' + @NO, 'Jeffrey-' + @NO);
    INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Darkthread-' + @NO);
    INSERT INTO AppSupport VALUES ('POS-' + @NO, 'Darkthread-' + @NO);
    INSERT INTO AppSupport VALUES ('HR-' + @NO, 'Alice-' + @NO);
    INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Bob-' + @NO);
    INSERT INTO AppSupport VALUES ('MAIL-' + @NO, 'Jeffrey-' + @NO);
    SET @I = @I + 1;
END
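The script above expects both tables to exist already. Their definitions come from the previous article and are not repeated here, so the following is only a minimal sketch of a schema that matches the primary keys described above. The column types, lengths, and the constraint name PK_AppSupport are assumptions, chosen just wide enough for the generated values; it would need to run once before the data-generation script.

```sql
-- Hypothetical schema: column types, lengths and constraint name are assumptions,
-- not taken from the original article.
CREATE TABLE AppSystem (
    Name     VARCHAR(16) NOT NULL PRIMARY KEY,  -- e.g. 'MAIL-24999'
    CodeName VARCHAR(16) NOT NULL               -- e.g. 'Jupiter-24999'
);

CREATE TABLE AppSupport (
    AppName  VARCHAR(16) NOT NULL,              -- matches AppSystem.Name
    Engineer VARCHAR(24) NOT NULL,              -- e.g. 'Darkthread-24999'
    CONSTRAINT PK_AppSupport PRIMARY KEY (AppName, Engineer)
);
```

The composite key on (AppName, Engineer) lets lookups by AppName seek into the key, which is what every query in this test filters or joins on.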
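I designed four queries:
- The final improved CTE version, using the ROW_NUMBER() approach suggested by 路人乙 to pick out the fully concatenated row
- My earliest CTE version, which uses a JOIN to find the fully concatenated row
- The standard FOR XML PATH approach
- The FOR XML PATH approach with its repeated subquery moved into a CTE, to see whether that improves performance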
;WITH SupportCTE (RowNum, Name, CodeName, Engineer, Engineers) AS
(
    -- Anchor member: start each system with its first engineer (alphabetically)
    SELECT 1, A.Name, A.CodeName, MIN(B.Engineer),
        CAST(MIN(B.Engineer) AS VARCHAR(MAX)) AS Engineers
    FROM AppSystem A JOIN AppSupport B ON A.Name = B.AppName
    GROUP BY A.Name, A.CodeName
    UNION ALL
    -- Recursive member: append every engineer that sorts after the current one
    SELECT A.RowNum + 1, A.Name, A.CodeName, B.Engineer,
        CAST(A.Engineers + ', ' + B.Engineer AS VARCHAR(MAX)) AS Supports
    FROM SupportCTE A JOIN AppSupport B
        ON A.Name = B.AppName AND B.Engineer > A.Engineer
)
--Trial 1
SELECT Name, CodeName, Engineers
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY RowNum DESC) AS Pos
    FROM SupportCTE
) T
WHERE Pos = 1
ORDER BY Name
--Trial 2 (swap with the Trial 1 SELECT above to run it)
--SELECT A.Name, A.CodeName, A.Engineers FROM SupportCTE A
--JOIN (SELECT Name, MAX(RowNum) AS MaxRowNum FROM SupportCTE GROUP BY Name) B
--ON A.Name = B.Name AND A.RowNum = B.MaxRowNum
--ORDER BY A.Name
GO
--Trial 3
SELECT DISTINCT d.Name, d.CodeName,
(
    -- Correlated subquery: concatenate all engineers of the current system
    SELECT c.Engineer + ',' FROM
    (
        SELECT a.Name, a.CodeName, b.Engineer
        FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
    ) c
    WHERE c.Name = d.Name FOR XML PATH('')
) AS Engineer
FROM
(
    SELECT a.Name, a.CodeName, b.Engineer
    FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
) d
ORDER BY Name
GO
--Trial 4
;WITH SupportCTE (Name, CodeName, Engineer) AS
(
    -- The join that Trial 3 repeats twice, declared once
    SELECT a.Name, a.CodeName, b.Engineer
    FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
)
SELECT DISTINCT d.Name, d.CodeName,
(
    SELECT c.Engineer + ',' FROM SupportCTE c
    WHERE c.Name = d.Name FOR XML PATH('')
) AS Engineer
FROM SupportCTE d
ORDER BY Name
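The results were as follows. Using the Total execution time column as the basis for comparison, the CTE + ROW_NUMBER() query took 2.96 seconds and the CTE + JOIN query took 7.14 seconds, while the FOR XML PATH query took only 0.59 seconds and the FOR XML PATH + CTE query took 0.64 seconds (no faster).
Even the faster CTE version was about 5 times slower than FOR XML PATH, which overturned my expectations…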
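The figures above were read from the Total execution time column on the client side. As a complementary check, server-side timings and read counts can be collected with SET STATISTICS TIME and SET STATISTICS IO. The sketch below assumes the AppSystem and AppSupport tables from this article exist; the query in the middle is a simplified illustrative stand-in, not one of the four measured trials, and any of them can be pasted in its place.

```sql
-- Assumes the AppSystem and AppSupport tables from this article exist.
SET STATISTICS TIME ON;  -- report parse/compile and execution times per statement
SET STATISTICS IO ON;    -- report logical and physical reads per table

-- Query under test (illustrative stand-in; replace with any of the trials)
SELECT s.Name, s.CodeName,
    (
        SELECT b.Engineer + ','
        FROM AppSupport b
        WHERE b.AppName = s.Name
        FOR XML PATH('')
    ) AS Engineer
FROM AppSystem s
ORDER BY s.Name;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The timings appear in the Messages output rather than in the result grid, and they will not match the client-side totals exactly because they exclude network transfer and result rendering.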
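The execution plans hint at where the difference comes from. The plan for the FOR XML PATH query is simple, and most of its time goes to the JOIN.
The plan for the CTE query is considerably more complex, and as much as 84% of its cost is spent on a Sort! Judging from where the Sort appears, it most likely happens when ROW_NUMBER() is used to find the fully concatenated row. My guess is that the CTE's intermediate result is held in memory and cannot benefit from indexes or similar mechanisms, so it becomes a bottleneck as soon as the data volume grows.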
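It would be unfair to conclude from this that CTEs perform poorly. The CTE used here for column concatenation relies on MIN, GROUP BY, and ROW_NUMBER(), which makes it far heavier and more complex than FOR XML PATH. Recursive CTEs are powerful and hard to replace in the right situations, and it does not take much of a change in the example to reach a case that FOR XML PATH cannot handle. All I can say is that using a recursive CTE in this particular case is like using a sledgehammer to crack a nut. If the goal is simply to concatenate a one-to-many column into a single string in SQL, the FOR XML PATH approach performs better.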
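One last note: the FOR XML PATH queries above have a small flaw, in that the Engineer column ends with an extra comma.
On the C# side this is easily handled with TrimEnd(','). Doing it in T-SQL is slightly more verbose but not hard, as in the following example. (Thanks to reader 艾里克斯 for contributing the STUFF solution.)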
-- The comma is prepended to each name and STUFF removes the leading one,
-- so there is no trailing comma left to trim in the outer query
SELECT Name, CodeName, Engineer
FROM (
    SELECT DISTINCT d.Name, d.CodeName,
    STUFF(
        (
            SELECT ',' + c.Engineer FROM
            (
                SELECT a.Name, a.CodeName, b.Engineer
                FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
            ) c
            WHERE c.Name = d.Name FOR XML PATH('')
        ), 1, 1, ''
    ) AS Engineer
    FROM
    (
        SELECT a.Name, a.CodeName, b.Engineer
        FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
    ) d
) X
ORDER BY Name
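To see the comma handling in isolation, here is a minimal sketch that needs no tables. The table variable @T, the variable @Raw, and the two sample names are hypothetical and exist only for illustration. It compares the raw FOR XML PATH output with two ways of cleaning it up: trimming the trailing comma with SUBSTRING, and prepending the comma and removing the leading one with STUFF.

```sql
-- Hypothetical sample data; @T and its rows are illustrative only.
DECLARE @T TABLE (Engineer VARCHAR(20));
INSERT INTO @T VALUES ('Alice');
INSERT INTO @T VALUES ('Bob');

-- Raw concatenation: every name is followed by a comma
DECLARE @Raw VARCHAR(MAX);
SET @Raw = (SELECT Engineer + ',' FROM @T ORDER BY Engineer FOR XML PATH(''));

SELECT
    @Raw AS WithTrailingComma,
    -- Option 1: cut off the last character when there is something to cut
    CASE WHEN LEN(@Raw) > 1
         THEN SUBSTRING(@Raw, 1, LEN(@Raw) - 1)
         ELSE @Raw
    END AS TrimmedBySubstring,
    -- Option 2: put the comma in front and delete the first character
    STUFF(
        (SELECT ',' + Engineer FROM @T ORDER BY Engineer FOR XML PATH('')),
        1, 1, ''
    ) AS TrimmedByStuff;
```

With these two rows the first column should read Alice,Bob, and the other two Alice,Bob. The two options should not be combined: applying SUBSTRING to a string that STUFF has already cleaned would cut off the last character of the final name.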