We have a staging table in Snowflake named "portfolio" with a Variant column named "cdc_xml" that stores XML documents loaded from S3 by Snowpipe.
The XML looks like this:
<xyz>
<jmsTimestamp>1570068080385</jmsTimestamp>
<portfolio>
<id>1234</id>
<portfolioNumber>909</portfolioNumber>
<portfolioName>Hello World</portfolioName>
<master>
<attribute fieldName="active" value="1" oldValue="0"/>
<attribute fieldName="name" value="Hello Co" oldValue="Hello Company"/>
<attribute fieldName="startDate" value="04/02/1988" oldValue="04/01/1988"/>
</master>
<characteristics>
<characteristic fieldName="currency" value="JPY" oldValue="USD"/>
<characteristic fieldName="duplicate" value="YES" oldValue="NO"/>
<characteristic fieldName="clone" value="TRUE" oldValue="FALSE"/>
</characteristics>
</portfolio>
</xyz>
Below is the Snowflake lateral flatten code that parses the XML to retrieve every "@fieldName" and "@value" at the <master><attribute> level and every "@fieldName" and "@value" at the <characteristics><characteristic> level. All of this data is retrieved as name-value pairs.
-- flatten the characteristics nested structure to get all characteristic nvps
select 'XYZ' as source_name,
xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
get(flt1.value, '@fieldName')::string as field_name,
nvl(decode(get(flt1.value, '@value')::string, '', null, get(flt1.value, '@value')::string), '\b') as field_value -- deletion CDC if new value is null or empty
from staging.portfolio src1,
lateral flatten(xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'characteristics'):"$") flt1
union
-- flatten the master nested structure to get all attribute nvps
select 'XYZ' as source_name,
xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
get(flt2.value, '@fieldName')::string as field_name,
nvl(decode(get(flt2.value, '@value')::string, '', null, get(flt2.value, '@value')::string), '\b') as field_value -- deletion CDC if new value is null or empty
from staging.portfolio src2,
lateral flatten(xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'master'):"$") flt2
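It works for the sample provided above. However, if the XML looks like the first one below (only one instance of the nested <master><attribute> structure), that single <master><attribute> instance is not parsed, and both its "@fieldName" and "@value" come back NULL (instead of "startDate" and "11/02/1988").
Similarly, if the XML looks like the one at the bottom (only one instance of the nested <characteristics><characteristic> structure), that single <characteristics><characteristic> instance is not parsed, and both its "@fieldName" and "@value" come back NULL (instead of "clone" and "TRUE").
Any help is appreciated. Thanks in advance!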
<xyz>
<jmsTimestamp>1570068080300</jmsTimestamp>
<portfolio>
<id>9876</id>
<portfolioNumber>808</portfolioNumber>
<portfolioName>Another Example</portfolioName>
<master>
<attribute fieldName="startDate" value="11/02/1988" oldValue="11/01/1988"/>
</master>
<characteristics>
<characteristic fieldName="currency" value="JPY" oldValue="USD"/>
<characteristic fieldName="duplicate" value="YES" oldValue="NO"/>
<characteristic fieldName="clone" value="TRUE" oldValue="FALSE"/>
</characteristics>
</portfolio>
</xyz>
<xyz>
<jmsTimestamp>1570068080300</jmsTimestamp>
<portfolio>
<id>9876</id>
<portfolioNumber>808</portfolioNumber>
<portfolioName>Another Example</portfolioName>
<master>
<attribute fieldName="active" value="0" oldValue="1"/>
<attribute fieldName="name" value="Example Inc" oldValue="Example LLC"/>
<attribute fieldName="startDate" value="11/02/1988" oldValue="11/01/1988"/>
</master>
<characteristics>
<characteristic fieldName="clone" value="TRUE" oldValue="FALSE"/>
</characteristics>
</portfolio>
</xyz>
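Very similar to the solution Simeon Pilgrim just provided: you can unconditionally convert each element list to an array. When a parent has only one child element, its "$" content is that single element rather than an array of elements, so FLATTEN tries to "explode" the element into its component attributes (which is what you are experiencing). Wrapping the content in TO_ARRAY avoids that, so this will also work: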
select 'XYZ' as source_name,
xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
get(flt1.value, '@fieldName')::string as field_name,
nvl(decode(get(flt1.value, '@value')::string, '', null, get(flt1.value, '@value')::string), '\b') as field_value -- deletion CDC if new value is null or empty
from staging.portfolio src1,
lateral flatten(to_array(xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'characteristics'):"$")) flt1
union
-- flatten the master nested structure to get all attribute nvps
select 'XYZ' as source_name,
xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
get(flt2.value, '@fieldName')::string as field_name,
nvl(decode(get(flt2.value, '@value')::string, '', null, get(flt2.value, '@value')::string), '\b') as field_value -- deletion CDC if new value is null or empty
from staging.portfolio src2,
lateral flatten(to_array(xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'master'):"$")) flt2
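To see why the single-element case behaves differently, it can help to inspect the type of the value that FLATTEN receives. The following is a minimal sketch, not part of the original answer: it uses inline PARSE_XML literals (illustrative sample documents, not rows from staging.portfolio) and assumes the standard Snowflake functions TYPEOF, PARSE_XML and TO_ARRAY.

```sql
-- Sketch: compare the "$" content of a parent with one child vs. several children.
-- The XML literals below are illustrative samples, not data from staging.portfolio.
select 'one child' as case_name,
       typeof(x:"$") as content_type,                     -- type FLATTEN sees without TO_ARRAY
       typeof(to_array(x:"$")::variant) as wrapped_type   -- type after the unconditional TO_ARRAY
from (select parse_xml('<master><attribute fieldName="startDate" value="11/02/1988"/></master>') as x)
union all
select 'two children',
       typeof(x:"$"),
       typeof(to_array(x:"$")::variant)
from (select parse_xml('<master><attribute fieldName="active" value="0"/><attribute fieldName="name" value="Example Inc"/></master>') as x);
```

If the "one child" row reports a non-array content type while the "two children" row reports an array, that matches the behaviour described in the question, and the wrapped_type column shows that TO_ARRAY makes both cases uniform before FLATTEN runs.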