use*_*162 4 xml sql-server varbinary
I am receiving an image file as XML data, where each byte of the image is a node holding a decimal value. For example, for this sample .png file, the XML I get is:
DECLARE @xml XML = N'<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<XmlData>
<Element>
<id>Test</id>
<image>
<Element>137</Element><Element>80</Element><Element>78</Element><Element>71</Element><Element>13</Element><Element>10</Element><Element>26</Element><Element>10</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>13</Element><Element>73</Element><Element>72</Element><Element>68</Element><Element>82</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>20</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>20</Element><Element>8</Element><Element>6</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>141</Element><Element>137</Element><Element>29</Element><Element>13</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>4</Element><Element>103</Element><Element>65</Element><Element>77</Element><Element>65</Element><Element>0</Element><Element>0</Element><Element>177</Element><Element>143</Element><Element>11</Element><Element>252</Element><Element>97</Element><Element>5</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>9</Element><Element>112</Element><Element>72</Element><Element>89</Element><Element>115</Element><Element>0</Element><Element>0</Element><Element>14</Element><Element>193</Element><Element>0</Element><Element>0</Element><Element>14</Element><Element>193</Element><Element>1</Element><Element>184</Element><Element>145</Element><Element>107</Element><Element>237</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>24</Element><Element>116</Element><Element>69</Element><Element>88</Element><Element>116</Element><Element>83</Element><Element>111</Element><Element>102</Element><Element>116</Element><Element>119</Element><Element>97</Element><Element>114</Element><Element>101</Element><Element>0</Element><Element>112</Element><Element>97</Element><Element>105</Element><Element>110</Element><Element>116</Element><Element>46</Element><Element>110</Element><Element>101</Element>
<Element>116</Element><Element>32</Element><Element>52</Element><Element>46</Element><Element>48</Element><Element>46</Element><Element>54</Element><Element>252</Element><Element>140</Element><Element>99</Element><Element>223</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>108</Element><Element>73</Element><Element>68</Element><Element>65</Element><Element>84</Element><Element>56</Element><Element>79</Element><Element>99</Element><Element>24</Element><Element>5</Element><Element>184</Element><Element>192</Element><Element>114</Element><Element>32</Element><Element>254</Element><Element>14</Element><Element>196</Element><Element>30</Element><Element>96</Element><Element>30</Element><Element>21</Element><Element>192</Element><Element>126</Element><Element>32</Element><Element>254</Element><Element>15</Element><Element>196</Element><Element>9</Element><Element>96</Element><Element>30</Element><Element>21</Element><Element>192</Element><Element>48</Element><Element>55</Element><Element>80</Element><Element>7</Element><Element>136</Element><Element>29</Element><Element>208</Element><Element>240</Element><Element>121</Element><Element>32</Element><Element>6</Element><Element>25</Element><Element>216</Element><Element>142</Element><Element>36</Element><Element>6</Element><Element>195</Element><Element>34</Element><Element>64</Element><Element>140</Element><Element>23</Element><Element>128</Element><Element>98</Element><Element>19</Element><Element>164</Element><Element>153</Element><Element>88</Element><Element>60</Element><Element>27</Element><Element>136</Element><Element>241</Element><Element>130</Element><Element>213</Element><Element>64</Element><Element>12</Element><Element>242</Element><Element>34</Element><Element>50</Element><Element>126</Element><Element>15</Element><Element>196</Element><Element>32</Element><Element>205</Element><Element>215</Element><Element>145</Element><Element>196</Element><Element>96</Element><Element>56</Element>
<Element>3</Element><Element>136</Element><Element>73</Element><Element>6</Element><Element>32</Element><Element>141</Element><Element>32</Element><Element>3</Element><Element>71</Element><Element>147</Element><Element>13</Element><Element>249</Element><Element>128</Element><Element>234</Element><Element>6</Element><Element>250</Element><Element>0</Element><Element>113</Element><Element>5</Element><Element>16</Element><Element>43</Element><Element>128</Element><Element>121</Element><Element>163</Element><Element>0</Element><Element>1</Element><Element>24</Element><Element>24</Element><Element>0</Element><Element>127</Element><Element>60</Element><Element>48</Element><Element>197</Element><Element>152</Element><Element>102</Element><Element>243</Element><Element>130</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>73</Element><Element>69</Element><Element>78</Element><Element>68</Element><Element>174</Element><Element>66</Element><Element>96</Element><Element>130</Element>
</image>
</Element>
</XmlData>'
In binary:
SELECT * FROM OPENROWSET(BULK 'C:\test.png', SINGLE_BLOB) AS q;
==========
BulkColumn
----------
0x89504E470D0A1A0A0000000D49484452000000140000001408060000008D891D0D0000000467414D410000B18F0BFC6105000000097048597300000EC100000EC101B8916BED0000001874455874536F667477617265007061696E742E6E657420342E302E36FC8C63DF0000006C49444154384F631805B8C07220FE0EC41E601E15C07E20FE0FC409601E15C030375007881DD0F079200619D88E2406C322408C17806213A499583C1B88F182D5400CF222327E0FC420CDD791C4603803884906208D200347930DF980EA06FA007105102B8079A300011818007F3C30C59866F3820000000049454E44AE426082
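How can I retrieve the image file from the XML as varbinary?
I asked a similar question a while ago, so I tried the following query, but the resulting binary data is incorrect: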
SELECT r.c.value('id[1]', 'varchar(50)') AS id,
CONVERT(VARBINARY(MAX), (SELECT (t.u.value('.','tinyint')) FROM r.c.nodes('image/Element') AS t(u) FOR XML PATH(''))) AS image
FROM @xml.nodes('/XmlData/Element') AS r(c);
=============
id image
-------------
Test 0x3100330037003800300037003800370031003100330031003000320036003100300030003000300031003300370033003700320036003800380032003000300030003200300030003000300032003000380036003000300030003100340031003100330037003200390031003300300030003000340031003000330036003500370037003600350030003000310037003700310034003300310031003200350032003900370035003000300030003900310031003200370032003800390031003100350030003000310034003100390033003000300031003400310039003300310031003800340031003400350031003000370032003300370030003000300032003400310031003600360039003800380031003100360038003300310031003100310030003200310031003600310031003900390037003100310034003100300031003000310031003200390037003100300035003100310030003100310036003400360031003100300031003000310031003100360033003200350032003400360034003800340036003500340032003500320031003400300039003900320032003300300030003000310030003800370033003600380036003500380034003500360037003900390039003200340035003100380034003100390032003100310034003300320032003500340031003400310039003600330030003900360033003000320031003100390032003100320036003300320032003500340031003500310039003600390039003600330030003200310031003900320034003800350035003800300037003100330036003200390032003000380032003400300031003200310033003200360032003500320031003600310034003200330036003600310039003500330034003600340031003400300032003300310032003800390038003100390031003600340031003500330038003800360030003200370031003300360032003400310031003300300032003100330036003400310032003200340032003300340035003000310032003600310035003100390036003300320032003000350032003100350031003400350031003900360039003600350036003300310033003600370033003600330032003100340031003300320033003700310031003400370031003300320034003900310032003800320033003400360032003500300030003100310033003500310036003400330031003200380031003200310031003600330030003100320034003200340030003100320037003600300034003800310039003700310035003200310030003200320034003300310033003000300030003000300037003300360039003700380036003800310037003
4003600360039003600310033003000
Sol*_*zky 16
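This is close, but it is missing a few pieces. The line where you extract the decimal value as TINYINT from each <Element> in the XML (e.g. 137, 80, 78, etc.) is fine, but FOR XML PATH('') then converts those values back into strings and concatenates them, leaving a UTF-16 encoded string of "1378078...". Converting that to VARBINARY merely converts each digit character ("1", "3", "7", "8", etc.) into its binary / hex code point: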
SELECT CONVERT(VARBINARY(20), N'1378078');
-- 0x3100330037003800300037003800
-- 0x 3100 3300 3700 3800 3000 3700 3800 -- each character separated for readability
-- XML in SQL Server is encoded as UTF-16, same as NCHAR / NVARCHAR.
-- Each of these characters in UTF-16 is two bytes: 0x31 + 0x00, 0x33 + 0x00, etc.
-- Each pair of bytes is in reverse order due to "endianness". 0x3100 is really 0x0031.
SELECT NCHAR(0x0031), NCHAR(0x0033), NCHAR(0x0037), NCHAR(0x0038), NCHAR(0x0030),
NCHAR(0x0037), NCHAR(0x0038);
-- 1 3 7 8 0 7 8
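Instead, you need to do the following:
1. Convert each TINYINT value (e.g. 137) to its hex / BINARY value (0x89).
2. Convert that BINARY value to a VARCHAR string, but without the leading "0x" (this requires using the CONVERT function, not CAST, so that you can specify the "style" of 2).
3. Let FOR XML PATH('') put it all together into a string of the form 89504E470D0A1A0A0000...
4. Apply the "style" of 2 again in the outer CONVERT(VARBINARY(MAX), ... so that it knows that 89504E470D0A1A0A0000... is just 0x89504E470D0A1A0A0000... without the leading "0x".
Putting those pieces into your query, we get the following: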
SELECT r.c.value('id[1]', 'varchar(50)') AS [id],
CONVERT(VARBINARY(MAX),
(SELECT CONVERT(VARCHAR(3),
CONVERT(BINARY(1),
t.u.value('.', 'tinyint')
),
2 -- style creates binary string without the leading "0x"
)
FROM r.c.nodes('image/Element') AS t(u)
FOR XML PATH('')
),
2 -- style creates binary string without the leading "0x"
) AS [image]
FROM @xml.nodes('/XmlData/Element') AS r(c);
That returns the following for the image field:
0x89504E470D0A1A0A0000000D49484452000000140000001408060000008D891D0D0000000467414D410000B18F0BFC6105000000097048597300000EC100000EC101B8916BED0000001874455874536F667477617265007061696E742E6E657420342E302E36FC8C63DF0000006C49444154384F631805B8C07220FE0EC41E601E15C07E20FE0FC409601E15C030375007881DD0F079200619D88E2406C322408C17806213A499583C1B88F182D5400CF222327E0FC420CDD791C4603803884906208D200347930DF980EA06FA007105102B8079A300011818007F3C30C59866F3820000000049454E44AE426082
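To sanity-check the two-step conversion without the full sample document, a minimal sketch on just the first eight bytes (the PNG signature) might look like the following. The @Sample variable and the [Signature] alias are names made up for this illustration, and, like the query above, it assumes nodes() returns the elements in document order.

```sql
-- Hypothetical mini-document: only the first 8 bytes of the sample PNG.
DECLARE @Sample XML = N'<image>
<Element>137</Element><Element>80</Element><Element>78</Element><Element>71</Element>
<Element>13</Element><Element>10</Element><Element>26</Element><Element>10</Element>
</image>';

SELECT CONVERT(VARBINARY(MAX),
         (SELECT CONVERT(VARCHAR(2),
                   CONVERT(BINARY(1), t.u.value('.', 'tinyint')),
                   2)        -- style 2: hex digits, no leading "0x"
          FROM   @Sample.nodes('/image/Element') AS t(u)
          FOR XML PATH('')   -- concatenates to '89504E470D0A1A0A'
         ),
         2                   -- style 2: parse hex digits that have no leading "0x"
       ) AS [Signature];
-- 0x89504E470D0A1A0A
```

The result matches the first eight bytes of the OPENROWSET output shown in the question, which is a quick way to confirm that both style 2 conversions are in place.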
Things to consider:
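Sending binary data as <Element>137</Element><Element>80</Element>... is probably the worst / least efficient way to do it. I realize that you said you are receiving this information in this format, so you are likely not responsible for it and have no control over it. However, just so everyone understands what is going on here (please see the UPDATE section below), each byte of the binary data is written as a decimal string and wrapped in <Element> and </Element> tags { why not <Byte>? } (19 characters).
How does this work out? Well, only 10 of the decimal values ( 0 - 9 ) are 1 character / 2 bytes. Another 90 ( 10 - 99 ) are 2 characters / 4 bytes, while the remaining 156 values ( 100 - 255 ) are 3 characters / 6 bytes. So most of the possible values take up the full 6 bytes, and only a small fraction take up the minimum of 2 bytes. This means that the average space taken up per original byte is probably between 2 and 3 characters / 4 and 6 bytes (I guess they call that "5" in some places ;-) ?).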
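The average mentioned above can be worked out directly. This is only a back-of-the-envelope sketch: it assumes every byte value from 0 to 255 is equally likely, which real image files do not guarantee, and the column aliases are made up for the illustration.

```sql
-- 10 values use 1 character, 90 use 2 characters, 156 use 3 characters.
-- Assumption: all 256 byte values are equally likely.
SELECT (10 * 1 + 90 * 2 + 156 * 3) / 256.0     AS [AvgCharsPerByte],      -- roughly 2.57
       (10 * 1 + 90 * 2 + 156 * 3) * 2 / 256.0 AS [AvgUtf16BytesPerByte]; -- roughly 5.14
```

Under that assumption the digits alone cost a little over 5 bytes per original byte in UTF-16, before counting the 38 bytes of tags.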
For your specific sample data, you can run the following query to see the breakdown:
;WITH cte AS
( SELECT LEN(CONVERT(VARCHAR(3), r.c.value('.', 'tinyint'))) AS [Length]
FROM @xml.nodes('/XmlData/Element/image/Element') AS r(c)
)
SELECT cte.[Length] AS [ElementLength], COUNT(*) AS [ElementCount]
FROM cte
GROUP BY cte.[Length];
That returns:
ElementLength ElementCount
------------- ------------
1 55
2 98
3 85
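Now we can multiply each ElementCount by ( ElementLength * 2 ) to get the number of bytes. We also need to account for the <Element> tags, which add another 38 bytes for each of the 238 original bytes: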
SELECT (55 * 2) + (98 * 4) + (85 * 6) + (238 * 38)
-- 10,056 bytes !!!
To put that in perspective, we should compare it to the original binary size to see the bloat:
SELECT 10056 / 238.0 -- 42.25 times larger !!!
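Meaning, if you receive a 1 MB image (not unreasonable), it will be represented by 42.25 MB of XML. Yikes! (please see the UPDATE section below)
But before anyone starts griping about XML, this is not the fault of XML which, while admittedly a bloated format, can do much better than this. XML (at least in SQL Server) supports handling binary data via Base64 encoding / decoding. For example, using the same test PNG binary value, we can convert it to a string in XML using the FOR XML PATH clause with the BINARY BASE64 option: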
DECLARE @PngImage VARBINARY(MAX);
SET @PngImage = 0x89504E470D0A1A0A0000000D49484452000000140000001408060000008D891D0D0\
000000467414D410000B18F0BFC6105000000097048597300000EC100000EC101B8916BED000000187445\
5874536F667477617265007061696E742E6E657420342E302E36FC8C63DF0000006C49444154384F63180\
5B8C07220FE0EC41E601E15C07E20FE0FC409601E15C030375007881DD0F079200619D88E2406C322408C\
17806213A499583C1B88F182D5400CF222327E0FC420CDD791C4603803884906208D200347930DF980EA0\
6FA007105102B8079A300011818007F3C30C59866F3820000000049454E44AE426082;
SELECT @PngImage AS [PngImage]
FOR XML PATH('Test'), BINARY BASE64;
That gives us:
<Test>
<PngImage>iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAABGdBTUEAALGPC/xhBQAAAAlwSFlzAAAOwQAADsEBuJFr7QAAABh0RVh0U29mdHdhcmUAcGFpbnQubmV0IDQuMC42/Ixj3wAAAGxJREFUOE9jGAW4wHIg/g7EHmAeFcB+IP4PxAlgHhXAMDdQB4gd0PB5IAYZ2I4kBsMiQIwXgGITpJlYPBuI8YLVQAzyIjJ+D8QgzdeRxGA4A4hJBiCNIANHkw35gOoG+gBxBRArgHmjAAEYGAB/PDDFmGbzggAAAABJRU5ErkJggg==</PngImage>
</Test>
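Doing a DATALENGTH(N'iVBORw0KGg...') gives us 640 bytes, and including the <PngImage> tags ( DATALENGTH(N'<PngImage>iVBORw0KGg...</PngImage>') ) gives us a total of 682 bytes. That is total bytes, not characters (only 2.87 times larger than the original binary), compared with 10,056 total bytes for the current method.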
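The 640 and 682 byte figures follow from how Base64 works: every 3 input bytes become 4 output characters (with padding on the last group), and each character takes 2 bytes in an NVARCHAR string. A small sketch of that arithmetic, with a hypothetical @RawBytes variable and made-up column aliases:

```sql
DECLARE @RawBytes INT = 238;   -- size of the sample PNG

SELECT ((@RawBytes + 2) / 3) * 4                AS [Base64Chars],   -- 320
       ((@RawBytes + 2) / 3) * 4 * 2            AS [Utf16Bytes],    -- 640
       DATALENGTH(N'<PngImage></PngImage>')     AS [TagBytes],      -- 42
       ((@RawBytes + 2) / 3) * 4 * 2
         + DATALENGTH(N'<PngImage></PngImage>') AS [TotalBytes];    -- 682
```

The integer division rounds the input up to whole groups of 3 bytes, which is why 238 bytes yield 80 groups and 320 characters.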
And how much effort does it take to get the VARBINARY(MAX) value out of the Base64-encoded string? Absolutely none:
DECLARE @PngImage XML = N'<Test>
<PngImage>iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAABGdBTUEAALGPC/xhBQAAAAlwSFlzAAAOwQAADsEBuJFr7QAAABh0RVh0U29mdHdhcmUAcGFpbnQubmV0IDQuMC42/Ixj3wAAAGxJREFUOE9jGAW4wHIg/g7EHmAeFcB+IP4PxAlgHhXAMDdQB4gd0PB5IAYZ2I4kBsMiQIwXgGITpJlYPBuI8YLVQAzyIjJ+D8QgzdeRxGA4A4hJBiCNIANHkw35gOoG+gBxBRArgHmjAAEYGAB/PDDFmGbzggAAAABJRU5ErkJggg==</PngImage>
</Test>';
SELECT @PngImage.value('(/Test/PngImage)[1]', 'VARBINARY(MAX)');
Returns:
0x89504E470D0...
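So, you might want to mention this to whoever is sending you this data. Base64 encoding is fairly standardized and is easy to encode / decode in most languages.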
UPDATE
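While researching something else, I found that the XML datatype in SQL Server is fairly well optimized. It does not actually store the full text of the XML document. At the very least, it creates a dictionary (i.e. an array) of unique element and attribute names, assigns each one a number, and uses that number to reference them within the document. This saves a lot of space. So, in this particular XML document, not only are the 19 characters of <Element></Element> (38 bytes encoded as UTF-16) not repeated for each element within the <image> node, but the name Element appears only once in the main dictionary, even though it is used at two different levels of the structure.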
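This can only be seen by saving the document to an XML field in a table and then viewing the data page via DBCC PAGE. That view shows some number of bytes used for document overhead, the dictionary of element names (containing "Element" only once), and the 238 elements (representing the bytes of the PNG), each with 5 bytes of overhead.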
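I re-checked the size of each variation, this time checking the size of the original @xml variable directly (as seen in the question), then converting it to both NVARCHAR and VARCHAR, and then doing similar checks on the Base64-encoded XML by making the following change to the second-to-last test above: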
DECLARE @ConvertedXml XML;
SET @ConvertedXml = (
SELECT @PngImage AS [PngImage]
FOR XML PATH('Test'), BINARY BASE64
);
SELECT DATALENGTH(@PngImage) AS [PNG],
DATALENGTH(@ConvertedXml) AS [XmlBytes],
DATALENGTH(CONVERT(NVARCHAR(4000), @ConvertedXml)) AS [NVarCharBytes],
DATALENGTH(CONVERT(VARCHAR(4000), @ConvertedXml)) AS [VarCharBytes];
Results:
Format                                   Size (in bytes)
---------------------------------------  ---------------
PNG (original file)                                  238
XML in VARCHAR (or ASCII text file)                 5094
XML in NVARCHAR (or UTF-16 text file)              10188
XML datatype                                        2295
Base64-XML datatype                                  690
Base64-NVARCHAR                                      708
Base64-VARCHAR                                       354
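As you can see, the size varies depending on how it is stored. The XML in the question originally came from a file. So that file, probably a regular ASCII / ANSI text file, is 5094 bytes. If it were stored in a VARCHAR field, it would be the same size. If the file was saved with UTF-16 encoding, then it is actually 10,188 bytes, which is the same size it would be if stored in an NVARCHAR field. But that same XML document, stored in an XML field or variable in SQL Server, is only 2295 bytes! That is kinda cool :-). Still, 2295 bytes is roughly 10 times larger than the original PNG file.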
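But Base64 encoding is still the best way to store and transfer binary values in XML. That exact same PNG file is only 690 bytes when stored in an XML field or variable. If that XML is stored in an NVARCHAR field or a UTF-16 text file, it is only 708 bytes, and it is only 354 bytes if stored in a VARCHAR field or an ASCII text file.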
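To compare the formats on the same footing as the earlier "times larger" figure, the measured sizes can be divided by the 238 byte original. This is just a sketch over the numbers reported in the results above; the derived table alias v and the [TimesOriginal] column are made up for the illustration.

```sql
-- Sizes copied from the results table; 238 is the original PNG size in bytes.
SELECT v.[Format],
       v.[SizeInBytes],
       CONVERT(DECIMAL(6, 2), v.[SizeInBytes] / 238.0) AS [TimesOriginal]
FROM  (VALUES ('XML in NVARCHAR (or UTF-16 text file)', 10188),
              ('XML in VARCHAR (or ASCII text file)',    5094),
              ('XML datatype',                           2295),
              ('Base64-NVARCHAR',                         708),
              ('Base64-XML datatype',                     690),
              ('Base64-VARCHAR',                          354)
      ) AS v([Format], [SizeInBytes])
ORDER BY v.[SizeInBytes] DESC;
```

Laid out this way, the element-per-byte document ranges from roughly 10 to 43 times the original size depending on storage, while every Base64 variant stays under 3 times.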