How to get all the HTML in a document or node that contains shadowRoot elements

Question

How to get all the HTML in a document or node that contains shadowRoot elements

Mos*_*oss 3 javascript web-scraping shadow-dom custom-element native-web-component

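I have not seen a satisfactory answer to this question. It is essentially a duplicate of this question, but that one was closed improperly and the answers given there are inadequate.

I came up with my own solution, which I will post below.

This is useful for web scraping or, in my case, for running tests on a JavaScript library that deals with custom elements. I first make sure the library produces the output I want, then use this function to grab the HTML of a given test output, and keep the copied HTML as the expected output to compare against in future test runs.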
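As a rough illustration of that testing workflow, here is a minimal sketch of how a saved HTML string might be compared against freshly extracted HTML. The names normalizeHTML and matchesExpected are hypothetical helpers, not part of any library, and the whitespace handling is an assumption that would not be safe for whitespace-sensitive content such as pre elements.

```javascript
// Hypothetical helper: collapse insignificant whitespace so that two HTML
// strings can be compared without failing on indentation differences.
// Assumption: the markup contains no whitespace-sensitive elements (e.g. <pre>).
function normalizeHTML(html) {
    return html
        .replace(/>\s+</g, '><')   // drop whitespace between adjacent tags
        .replace(/\s+/g, ' ')      // collapse remaining runs of whitespace
        .trim()
}

// Hypothetical helper: true when the extracted HTML matches the saved copy.
function matchesExpected(actualHTML, expectedHTML) {
    return normalizeHTML(actualHTML) === normalizeHTML(expectedHTML)
}
```

In a test, the first argument would be the string produced by the extraction function and the second the HTML copied from an earlier, verified run.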

Answer 1

Mos*_*oss 5

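Here is a function that does what was asked. Note that it ignores HTML comments and other fringe content. It does retrieve regular elements, text nodes, and custom elements that have shadowRoots. It also handles slotted template content. It has not been tested exhaustively, but it seems to work well enough for my needs.

Use it like extractHTML(document.body) or extractHTML(document.getElementById('app')).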

function extractHTML(node) {
            
    // return a blank string if not a valid node
    if (!node) return ''

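    // if it is a text node, return the trimmed textContent, escaped so that
    // characters like < and & survive the innerHTML assignment further down
    if (node.nodeType===3) return node.textContent.trim()
        .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')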

    //beyond here, only deal with element nodes
    if (node.nodeType!==1) return ''

    let html = ''

    // clone the node for its outer html sans inner html
    let outer = node.cloneNode()

    // if the node has a shadowroot, jump into it
    node = node.shadowRoot || node
    
    if (node.children.length) {
        
        // we checked for children but now iterate over childNodes
        // which includes #text nodes (and even other things)
        for (let n of node.childNodes) {
            
            // if the node is a slot
            if (n.assignedNodes) {

                // a slot can have several assigned nodes, so extract each one
                const assigned = n.assignedNodes()
                if (assigned.length) {
                    for (const a of assigned) html += extractHTML(a)

                // an unassigned slot: keep its fallback content
                } else { html += n.innerHTML }

            // node is not a slot, recurse
            } else { html += extractHTML(n) }
        }

    // node has no children
    } else { html = node.innerHTML }

    // insert all the (children's) innerHTML 
    // into the (cloned) parent element
    // and return the whole package
    outer.innerHTML = html
    return outer.outerHTML
    
}
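
The checks at the top of the function are what cause comments and other fringe nodes to be dropped. The sketch below spells out that dispatch using the standard numeric nodeType values; describeHandling is a hypothetical helper added only for illustration, and it inspects nothing but the nodeType property.

```javascript
// Numeric nodeType values defined by the DOM standard.
const ELEMENT_NODE = 1
const TEXT_NODE = 3
const COMMENT_NODE = 8

// Hypothetical helper mirroring the early returns in extractHTML:
// reports how a node would be treated, based only on its nodeType.
function describeHandling(node) {
    if (!node) return 'skipped: not a node'
    if (node.nodeType === TEXT_NODE) return 'text: trimmed textContent'
    if (node.nodeType === ELEMENT_NODE) return 'element: cloned and recursed into'
    return 'skipped: nodeType ' + node.nodeType
}
```

In a browser, describeHandling(document.createComment('x')) falls through to the last branch, which is why comments never appear in the extracted HTML.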

Archived:	4 years, 7 months ago
Viewed:	2,974 times
Last activity:	4 years, 7 months ago