How to convert sequential tags into nested tags or a table using XQuery

Chu*_*yTM (score 0) · tags: xml, xpath, xquery, xquery-3.0

I have an XML file whose tags are sequential (flat) rather than nested, like this:

<title>
    <subtitle>
        <topic att="TopicTitle">Topic title 1</topic>
        <content att="TopicSubtitle">topic subtitle 1</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
        <content att="TopicSubtitle">topic subtitle 2</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>

        <topic att="TopicTitle">Topic title 2</topic>
        <content att="TopicSubtitle">topic subtitle 1</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
        <content att="TopicSubtitle">topic subtitle 2</content>
        <content att="Paragraph">paragraph text 1</content>
        <content att="Paragraph">paragraph text 2</content>
    </subtitle>
</title>

I'm using XQuery in BaseX, and I want to convert this into a table with the following columns:

Title      Subtitle      TopicTitle      TopicSubtitle      Paragraph
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 1   paragraph text 1
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 1   paragraph text 2
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 2   paragraph text 1
Irrelevant Irrelevant    Topic title 1   Topic Subtitle 2   paragraph text 2
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 1   paragraph text 1
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 1   paragraph text 2
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 2   paragraph text 1
Irrelevant Irrelevant    Topic title 2   Topic Subtitle 2   paragraph text 2

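I'm new to XQuery and XPath, but I've learned the basics of navigating nodes and selecting what I need. What I don't know yet is how to take this sequential data and turn it into nested XML or a table (CSV?). Can anyone help?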

Answer by Mar*_*nen (score 5)

You can use a tumbling window (https://www.w3.org/TR/xquery-30/#id-windows) to turn the flat XML into nested XML, for example:

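(: One window per <topic>, running up to (not including) the next <topic> :)
for tumbling window $w in title/subtitle/*
    start $t when $t instance of element(topic)
return
    <topic
        title="{$t/@att}">
        {
            (: Inside a topic, start a sub-window at each TopicSubtitle :)
            for tumbling window $content in tail($w)
                start $c when $c/@att = 'TopicSubtitle'
            return
                <subtopic
                    title="{$c/@att}">
                    {
                        (: tail() drops the subtitle; the remaining items are paragraphs :)
                        tail($content) ! <para>{node()}</para>
                    }
                </subtopic>
        }
    </topic>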
This produces the following output:

<topic title="TopicTitle">
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
</topic><topic title="TopicTitle">
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
    <subtopic title="TopicSubtitle">
        <para>paragraph text 1</para>
        <para>paragraph text 2</para>
    </subtopic>
</topic>
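To make the window semantics concrete, here is a minimal standalone sketch (a toy illustration, not part of the answer above) on a plain string sequence, where 'H' stands in for a heading such as a <topic> element. With only a start condition, a tumbling window runs from each matching item up to, but not including, the next match; items before the first match belong to no window, and tail() drops the heading itself.

```xquery
(: Toy input: 'H' marks a heading, the other items are its content :)
for tumbling window $w in ('H', 'a', 'b', 'H', 'c')
    start $s when $s = 'H'
(: $w holds the whole window; $s is bound to its first item :)
return <group head="{$s}">{ string-join(tail($w), ',') }</group>
```

This should yield two group elements, one containing "a,b" and one containing "c", which is exactly how the answer splits the flat children of <subtitle> at each <topic>.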

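Building on that, I think you can convert the whole thing into semicolon-separated data. The version below also stores each title's text in a value attribute, then, for every <para>, joins the text (or else the @value, or else 'Irrelevant') of the para and each of its ancestors: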

string-join(
<title>
    <subtitle>
        {
            for tumbling window $w in title/subtitle/*
                start $t when $t instance of element(topic)
            return
                <topic
                    title="{$t/@att}"
                    value="{$t}">
                    {
                        for tumbling window $content in tail($w)
                            start $c when $c/@att = 'TopicSubtitle'
                        return
                            <subtopic
                                title="{$c/@att}"
                                value="{$c}">
                                {
                                    tail($content) ! <para>{node()}</para>
                                }
                            </subtopic>
                    }
                </topic>
        }
    </subtitle>
</title>//para ! string-join(ancestor-or-self::* ! (text(), @value, 'Irrelevant')[1], ';'), '&#10;')
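If only the table is needed, the intermediate tree can be skipped. The sketch below is an alternative, not the answer's code: it chains two window clauses in one FLWOR expression and emits one semicolon-separated line per paragraph, preceded by a header line. Like the answer, it assumes the context item is the document from the question; the 'Irrelevant' placeholders for Title and Subtitle simply mirror the question's table.

```xquery
(: Assumes the context item is the document shown in the question :)
let $header := string-join(
  ('Title', 'Subtitle', 'TopicTitle', 'TopicSubtitle', 'Paragraph'), ';')
let $rows :=
  (: Outer window: one per <topic>, up to the next <topic> :)
  for tumbling window $topic in title/subtitle/*
      start $t when $t instance of element(topic)
  (: Inner window: one per TopicSubtitle inside that topic :)
  for tumbling window $sub in tail($topic)
      start $s when $s/@att = 'TopicSubtitle'
  (: The remaining items of the inner window are the paragraphs :)
  for $p in tail($sub)
  return string-join(
    ('Irrelevant', 'Irrelevant', string($t), string($s), string($p)), ';')
return string-join(($header, $rows), '&#10;')
```

For the sample input the first two lines should be `Title;Subtitle;TopicTitle;TopicSubtitle;Paragraph` and `Irrelevant;Irrelevant;Topic title 1;topic subtitle 1;paragraph text 1`. A semicolon is used as the delimiter, as in the answer; switching to commas would require quoting any field that itself contains a comma.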