How can I cut HTML so that closing tags are preserved?

Mon*_*LiH 7 html javascript server-side next.js

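How can I create a preview of a blog post that is stored as HTML? In other words, how can I "cut" the HTML while making sure the tags are closed properly? Currently I render the whole thing on the front end (with React's dangerouslySetInnerHTML) and then set overflow: hidden and height: 150px. I would much prefer a way to cut the HTML directly. That way I would not need to send the entire HTML stream to the front end; with 10 blog post previews, that is a lot of HTML being sent that the visitor never even sees.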

Say I have this HTML (the entire blog post):

<body>
   <h1>Test</h1>
   <p>This is a long string of text that I may want to cut.. blah blah blah foo bar bar foo bar bar</p>
</body>

Trying to slice it (to make the preview) will not work, because the tags end up unmatched:

<body>
   <h1>Test</h1>
   <p>This is a long string of text <!-- Oops! unclosed tags -->

What I really want is this:

<body>
   <h1>Test</h1>
   <p>This is a long string of text</p>
</body>

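I am using next.js, so any node.js solution should work fine. Is there a way to do this (for example, a library on the next.js server side)? Or do I just have to parse the HTML myself (server-side) and then fix the unclosed tags?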

Som*_*ceS 0

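Guessing the rendered height of each element ahead of time is quite complicated. However, you can cut the entry by character count using the following pseudo-rules: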

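    1. First, define the maximum number of characters to keep.
    2. Starting from the beginning: if you encounter an HTML tag (recognized with a regular expression such as < .. > or < .. />), find its closing tag.
    3. Then continue scanning for tags from where you stopped.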

Here is a quick JavaScript suggestion I just wrote (it can probably be improved, but this is the idea):

let str = `<body>
   <h1>Test</h1>
   <p>This is a long string of text that I may want to cut.. blah blah blah foo bar bar foo bar bar</p>
</body>`;

const MAXIMUM = 100; // Maximum characters for the preview
let currentChars = 0; // Will hold how many characters we kept until now

let list = str.split(/(<\/?[A-Za-z0-9]*>)/g); // Split by tags (only matches tags without attributes); the capture group keeps the tags in the list

const isATag = (s) => (s[0] === '<'); // Returns true if it is a tag
const tagName = (s) => s.replace('<', '').replace('>', '').replace('/', ''); // Get the tag name
const findMatchingTag = (list, i) => {
    let name = tagName(list[i]);
    let searchingregex = new RegExp(`<\/ *${name} *>`,'g'); // The regex for the matching closing tag
    let sametagregex = new RegExp(`< *${name} *>`,'g'); // The regex for an opening tag with the same name (nested tags with the same name must be skipped)
    let buffer = 0; // Counts how many tags with the same name are open at an inner hierarchy level; their closing tags must be skipped
    for(let j=i+1;j<list.length;j++){
        if(list[j].match(sametagregex)!=null) buffer++;
        if(list[j].match(searchingregex)!=null){
            if(buffer>0) buffer--;
            else{
                return j;
            }
        }
    }
    return -1;
}

let k = 0;
let endCut = false;
let cutArray = new Array(list.length);
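while (currentChars < MAXIMUM && !endCut && k < list.length) { // As long as we are still within the character limit and within the array
    if (cutArray[k] !== undefined) { // A closing tag that was already placed (and counted) together with its opening tag
        k++;
        continue;
    }
    if (isATag(list[k])) { // Handling tags: find the matching closing tag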
        let matchingTagindex = findMatchingTag(list, k);
        if (matchingTagindex != -1) {
            if (list[k].length + list[matchingTagindex].length + currentChars < MAXIMUM) { // Include the tag and its closing tag only if both fit within the limit; otherwise end the cut process
                currentChars += list[k].length + list[matchingTagindex].length;
                cutArray[k] = list[k];
                cutArray[matchingTagindex] = list[matchingTagindex];
            }
            else {
                endCut = true;
            }
        }
        else {
            if (list[k].length + currentChars < MAXIMUM) { // Include the unmatched tag only if it fits within the limit; otherwise end the cut process
                currentChars += list[k].length;
                cutArray[k] = list[k];
            }
            else {
                endCut = true;
            }
        }
    }
    else { // In case it isn't a tag - trim the text
        let cutstr = list[k].substring(0, MAXIMUM - currentChars)
        currentChars += cutstr.length;
        cutArray[k] = cutstr;
    }
    k++;
}

console.log(cutArray.join(''))
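
A variation on the same idea, as a minimal sketch rather than a definitive implementation: instead of looking ahead for each closing tag, keep a stack of the tags that are currently open and close whatever is left once the budget runs out. Unlike the snippet above, this version counts only text characters (not the tags themselves) against the limit and also accepts tags with attributes. The names truncateHtml, VOID_TAGS and maxTextChars are made up for this illustration, and the sketch assumes reasonably well-formed HTML with no ">" inside attribute values or comments; for arbitrary input a real HTML parser is the safer choice.

```javascript
// Hypothetical helper, not part of the snippet above.
// Elements that never have a closing tag, so they must not go on the stack.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

function truncateHtml(html, maxTextChars) {
  // The capturing group makes split() keep the tags in the resulting array.
  const tokens = html.split(/(<[^>]+>)/);
  const openTags = []; // stack of tag names that are currently open
  let remaining = maxTextChars; // text characters we may still emit
  let out = '';

  for (const token of tokens) {
    if (remaining <= 0) break; // budget used up: stop reading the input

    if (!token.startsWith('<')) {
      // Plain text: keep as much as the budget allows.
      const kept = token.slice(0, remaining);
      out += kept;
      remaining -= kept.length;
      continue;
    }

    out += token; // tags (and comments) are copied through unchanged
    const tag = /^<(\/?)([A-Za-z][A-Za-z0-9-]*)/.exec(token);
    if (!tag) continue; // comment, doctype, etc.: nothing to track

    const name = tag[2].toLowerCase();
    if (tag[1] === '/') {
      // Closing tag: pop it if it matches the innermost open tag.
      if (openTags[openTags.length - 1] === name) openTags.pop();
    } else if (!VOID_TAGS.has(name) && !token.endsWith('/>')) {
      openTags.push(name);
    }
  }

  // Close everything that is still open, innermost first.
  while (openTags.length > 0) out += `</${openTags.pop()}>`;
  return out;
}

const post = '<body><h1>Test</h1><p>This is a long string of text that I may want to cut</p></body>';
console.log(truncateHtml(post, 33));
// prints: <body><h1>Test</h1><p>This is a long string of text</p></body>
```

Because whitespace between tags counts as text here, it can be worth collapsing it before truncating if the stored HTML is indented.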