Chu*_*yTM · tags: xml, xpath, xquery, xquery-3.0
I have an XML file whose elements follow one another as flat siblings instead of being nested, like this:
<title>
  <subtitle>
    <topic att="TopicTitle">Topic title 1</topic>
    <content att="TopicSubtitle">topic subtitle 1</content>
    <content att="Paragraph">paragraph text 1</content>
    <content att="Paragraph">paragraph text 2</content>
    <content att="TopicSubtitle">topic subtitle 2</content>
    <content att="Paragraph">paragraph text 1</content>
    <content att="Paragraph">paragraph text 2</content>
    <topic att="TopicTitle">Topic title 2</topic>
    <content att="TopicSubtitle">topic subtitle 1</content>
    <content att="Paragraph">paragraph text 1</content>
    <content att="Paragraph">paragraph text 2</content>
    <content att="TopicSubtitle">topic subtitle 2</content>
    <content att="Paragraph">paragraph text 1</content>
    <content att="Paragraph">paragraph text 2</content>
  </subtitle>
</title>
I'm using XQuery in BaseX, and I'd like to turn this into a table with the following columns:
Title       Subtitle    TopicTitle     TopicSubtitle     Paragraph
Irrelevant  Irrelevant  Topic title 1  Topic Subtitle 1  paragraph text 1
Irrelevant  Irrelevant  Topic title 1  Topic Subtitle 1  paragraph text 2
Irrelevant  Irrelevant  Topic title 1  Topic Subtitle 2  paragraph text 1
Irrelevant  Irrelevant  Topic title 1  Topic Subtitle 2  paragraph text 2
Irrelevant  Irrelevant  Topic title 2  Topic Subtitle 1  paragraph text 1
Irrelevant  Irrelevant  Topic title 2  Topic Subtitle 1  paragraph text 2
Irrelevant  Irrelevant  Topic title 2  Topic Subtitle 2  paragraph text 1
Irrelevant  Irrelevant  Topic title 2  Topic Subtitle 2  paragraph text 2
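I'm new to XQuery and XPath, but I've learned the basics of navigating nodes and selecting what I need. What I don't know yet is how to handle sequential data like this when I want to turn it into nested XML or a table (CSV?). Can anyone help?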
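You can use a tumbling window (https://www.w3.org/TR/xquery-30/#id-windows) to turn the flat XML into nested XML. The outer window starts a new group at each topic element; the inner window then splits the rest of that group (tail($w)) at every TopicSubtitle. For example: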
for tumbling window $w in title/subtitle/*
  start $t when $t instance of element(topic)
return
  <topic title="{$t/@att}">
  {
    for tumbling window $content in tail($w)
      start $c when $c/@att = 'TopicSubtitle'
    return
      <subtopic title="{$c/@att}">
      {
        tail($content) ! <para>{node()}</para>
      }
      </subtopic>
  }
  </topic>
which gives
<topic title="TopicTitle">
  <subtopic title="TopicSubtitle">
    <para>paragraph text 1</para>
    <para>paragraph text 2</para>
  </subtopic>
  <subtopic title="TopicSubtitle">
    <para>paragraph text 1</para>
    <para>paragraph text 2</para>
  </subtopic>
</topic>
<topic title="TopicTitle">
  <subtopic title="TopicSubtitle">
    <para>paragraph text 1</para>
    <para>paragraph text 2</para>
  </subtopic>
  <subtopic title="TopicSubtitle">
    <para>paragraph text 1</para>
    <para>paragraph text 2</para>
  </subtopic>
</topic>
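To see what the outer windowing clause actually groups, here is a small sketch that runs only that clause and reports, for each window, its starting topic and how many elements it holds. The sample document is inlined as a hypothetical variable $doc so the query runs on its own (for example in the BaseX GUI); against the real file you would select title/subtitle/* as in the answer.

```xquery
(: Sketch: inspect the windows built by the outer tumbling window clause.
   $doc is a hypothetical inlined copy of the question's sample. :)
let $doc :=
  <title>
    <subtitle>
      <topic att="TopicTitle">Topic title 1</topic>
      <content att="TopicSubtitle">topic subtitle 1</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
      <content att="TopicSubtitle">topic subtitle 2</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
      <topic att="TopicTitle">Topic title 2</topic>
      <content att="TopicSubtitle">topic subtitle 1</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
      <content att="TopicSubtitle">topic subtitle 2</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
    </subtitle>
  </title>
for tumbling window $w in $doc/subtitle/*
  (: a new window opens at every topic element :)
  start $t when $t instance of element(topic)
return
  (: report the window's first item and its size :)
  <window topic="{$t}" items="{count($w)}"/>
```

With this sample it should return two window elements, one per topic, each holding 7 items (the topic, two subtitles and four paragraphs). The inner clause in the answer applies the same idea to tail($w), opening a new window at each TopicSubtitle.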
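Building on that, you can turn the whole thing into semicolon-separated data: store each topic's and subtopic's text in a value attribute, then output one line per para by taking, for each of its ancestors, the element's own text, else its value attribute, else 'Irrelevant':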
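(: Build nested XML with the topic/subtopic text in @value, then emit one
   semicolon-separated line per para: for each ancestor-or-self element,
   take its own text, else its @value, else 'Irrelevant'. Lines are joined
   with a newline character reference. :)
string-join(
  <title>
    <subtitle>
    {
      for tumbling window $w in title/subtitle/*
        start $t when $t instance of element(topic)
      return
        <topic title="{$t/@att}" value="{$t}">
        {
          for tumbling window $content in tail($w)
            start $c when $c/@att = 'TopicSubtitle'
          return
            <subtopic title="{$c/@att}" value="{$c}">
            {
              tail($content) ! <para>{node()}</para>
            }
            </subtopic>
        }
        </topic>
    }
    </subtitle>
  </title>//para ! string-join(ancestor-or-self::* ! (text(), @value, 'Irrelevant')[1], ';'),
  '&#10;'
)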
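If you only need the table and not the nested XML, a different technique (plain XPath navigation rather than windowing) can also produce it: for each paragraph, look back along the preceding-sibling axis to the nearest topic and the nearest TopicSubtitle. This is a sketch that assumes every paragraph is preceded by a subtitle within its own topic; the inlined $doc variable and the header row are illustrative additions, not part of the original answer.

```xquery
(: Alternative sketch using preceding-sibling instead of windows.
   $doc is a hypothetical inlined copy of the question's sample. :)
let $doc :=
  <title>
    <subtitle>
      <topic att="TopicTitle">Topic title 1</topic>
      <content att="TopicSubtitle">topic subtitle 1</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
      <content att="TopicSubtitle">topic subtitle 2</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
      <topic att="TopicTitle">Topic title 2</topic>
      <content att="TopicSubtitle">topic subtitle 1</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
      <content att="TopicSubtitle">topic subtitle 2</content>
      <content att="Paragraph">paragraph text 1</content>
      <content att="Paragraph">paragraph text 2</content>
    </subtitle>
  </title>
let $rows :=
  for $p in $doc/subtitle/content[@att = 'Paragraph']
  (: preceding-sibling is a reverse axis, so [1] is the closest match :)
  let $topic := $p/preceding-sibling::topic[1]
  let $sub := $p/preceding-sibling::content[@att = 'TopicSubtitle'][1]
  return string-join(('Irrelevant', 'Irrelevant', $topic, $sub, $p), ';')
(: header row first, then one line per paragraph :)
return string-join(('Title;Subtitle;TopicTitle;TopicSubtitle;Paragraph', $rows), '&#10;')
```

This yields a header line followed by eight data lines. The trade-off: a paragraph that appears before the first TopicSubtitle of a topic would pick up the previous topic's subtitle, and each lookup rescans earlier siblings, so the windowing version may scale better on large files.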