How do I parse XML declared as UTF-8 from an NVARCHAR(MAX) column?

K4t*_*ini 3 xml sql t-sql sql-server sql-server-2012

I'm working on a problem with an XML string stored in a column of type NVARCHAR(MAX) (I can't change the type of this column).

Here is my table (WorkingHours):

CREATE TABLE WorkingHours(
    [ID] [int] NOT NULL PRIMARY KEY,
    [CONTENT] [nvarchar](MAX) NOT NULL,
    -- ...
);

Here is a sample value of the [CONTENT] column:

<?xml version="1.0" encoding="UTF-8"?>
    <calendar>
        <day number="1" worked_day="no">
            <interval number="1" begin_hour="08:30" end_hour="12:00"/>
            <interval number="2" begin_hour="13:30" end_hour="17:00"/>
            <interval number="3" begin_hour="" end_hour=""/>
        </day>
        <day number="2" worked_day="no">
            <interval number="1" begin_hour="08:30" end_hour="12:00"/>
            <interval number="2" begin_hour="13:30" end_hour="17:00"/>
            <interval number="3" begin_hour="" end_hour=""/>
        </day>
        <day number="3" worked_day="no">
            <interval number="1" begin_hour="08:30" end_hour="12:00"/>
            <interval number="2" begin_hour="13:30" end_hour="17:00"/>
            <interval number="3" begin_hour="" end_hour=""/>
        </day>
        <day number="4" worked_day="no">
            <interval number="1" begin_hour="08:30" end_hour="12:00"/>
            <interval number="2" begin_hour="13:30" end_hour="17:00"/>
            <interval number="3" begin_hour="" end_hour=""/>
        </day>
        <day number="5" worked_day="no">
            <interval number="1" begin_hour="08:30" end_hour="12:00"/>
            <interval number="2" begin_hour="13:30" end_hour="17:00"/>
            <interval number="3" begin_hour="" end_hour=""/>
        </day>
        <day number="6" worked_day="no">
            <interval number="1" begin_hour="" end_hour=""/>
            <interval number="2" begin_hour="" end_hour=""/>
            <interval number="3" begin_hour="" end_hour=""/>
        </day>
        <day number="7" worked_day="no">
            <interval number="1" begin_hour="" end_hour=""/>
            <interval number="2" begin_hour="" end_hour=""/>
            <interval number="3" begin_hour="" end_hour=""/>
        </day>
    </calendar>

As you can see, the XML declaration says the data is encoded as UTF-8.

Now I want to parse this data to run some calculations:

DECLARE @RawContent [nvarchar](MAX) = (
    SELECT wh.[CONTENT]
    FROM [WorkingHours] wh 
    WHERE wh.[ID] = 100);

DECLARE @XMLContent [Xml] = @RawContent; -- KO (fails)
-- DECLARE @XMLContent [Xml] = CAST(@RawContent AS XML);  -- KO (fails)
-- DECLARE @XMLContent [Xml] = CONVERT(XML, @RawContent); -- KO (fails)

-- Just a test to query XML data.
SELECT 
    C.WD.value('@number', 'int') AS DayId         
FROM @XMLContent.nodes('/calendar/day') AS C(WD);   

I don't know how to convert the result (an nvarchar(max) value containing a UTF-8 XML string) into an XML value. SQL Server returns the following error:

"Unable to switch encoding"

It points to the CAST line (where I define the @XMLContent variable).

Any ideas on how to solve this?

Jer*_*ert 6

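Remove the XML declaration at the top. It is meaningless and in fact incorrect, because the data is already encoded as UTF-16 (it is stored as NVARCHAR). If you can't change the data that's already stored, you'll have to rely on a (slightly fragile) string replacement: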

CAST(REPLACE(wh.[CONTENT], '<?xml version="1.0" encoding="UTF-8"?>', '') AS XML)

Note that explicitly declaring the encoding as UTF-16 works as well, although it adds nothing.
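
For illustration, here is a minimal sketch of both variants applied to the query from the question. The inline @RawContent value is a shortened, hypothetical stand-in for a real [CONTENT] row, and the variable names @Stripped and @Redeclared are made up for this example. It assumes the declaration in the stored data matches the literal string exactly (same spacing and quoting; case sensitivity depends on the collation), which is why the replacement approach is fragile.

```sql
-- Hypothetical, shortened sample standing in for WorkingHours.[CONTENT].
DECLARE @RawContent nvarchar(max) =
    N'<?xml version="1.0" encoding="UTF-8"?>
<calendar>
    <day number="1" worked_day="no">
        <interval number="1" begin_hour="08:30" end_hour="12:00"/>
    </day>
    <day number="2" worked_day="no">
        <interval number="1" begin_hour="13:30" end_hour="17:00"/>
    </day>
</calendar>';

-- Variant 1: strip the declaration entirely before casting.
DECLARE @Stripped xml =
    CAST(REPLACE(@RawContent,
                 N'<?xml version="1.0" encoding="UTF-8"?>',
                 N'') AS xml);

-- Variant 2: keep the declaration, but make it match the actual
-- encoding of an nvarchar value (UTF-16).
DECLARE @Redeclared xml =
    CAST(REPLACE(@RawContent,
                 N'encoding="UTF-8"',
                 N'encoding="UTF-16"') AS xml);

-- Same test query as in the question; it should return one row per <day>.
SELECT C.WD.value('@number', 'int') AS DayId
FROM @Stripped.nodes('/calendar/day') AS C(WD);
```

Either variable can be queried with .nodes() in the same way. If the stored declarations vary (single quotes, extra whitespace, a standalone attribute), the literal in REPLACE will not match and the cast will keep failing for those rows.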