Using ASP.NET, how can I reliably strip HTML tags from a given string (i.e. without using a regular expression)? I am looking for something like PHP's strip_tags.
<ul><li>Hello</li></ul>
"Hello"
I am trying not to reinvent the wheel, but so far I have not found anything that meets my needs.
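One regex-free approach to the question above is a small character scanner. The following is a minimal sketch, not a full HTML parser: `HtmlText` and `StripTags` are hypothetical names, and it assumes that every `<` opens a tag and the next `>` closes it. A literal `<` in the text, a `>` inside an attribute value, comments, and `<script>` or `<style>` bodies are therefore not handled correctly.

```csharp
using System.Net;
using System.Text;

public static class HtmlText
{
    // Hypothetical helper: copies every character that lies outside <...>
    // and drops everything between an opening '<' and the next '>'.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        var sb = new StringBuilder(html.Length);
        bool insideTag = false;

        foreach (char c in html)
        {
            if (c == '<')
            {
                insideTag = true;          // start skipping
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;         // tag finished, resume copying
            }
            else if (!insideTag)
            {
                sb.Append(c);              // ordinary text
            }
        }

        // Turn entities such as &amp; back into the characters they stand for.
        return WebUtility.HtmlDecode(sb.ToString());
    }
}
```

With the example from the question, `HtmlText.StripTags("<ul><li>Hello</li></ul>")` returns `Hello`. For untrusted or malformed markup, a real parser is the safer choice.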
How do I remove all HTML tags using a regular expression in C#? My string looks like this:
"<div>hello</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div> </div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div>"
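I have seen some related questions here, but they do not quite address the problem I am facing.
I want to use HTML Agility Pack to remove unwanted tags from my HTML without losing the content inside those tags.
For example, in my scenario I want to keep the tags "b", "i" and "u".
Given input like this: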
<p>my paragraph <div>and my <b>div</b></div> are <i>italic</i> and <b>bold</b></p>
The resulting HTML should be:
my paragraph and my <b>div</b> are <i>italic</i> and <b>bold</b>
I tried using HtmlNode's Remove method, but it removes my content as well. Any suggestions?
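A common way to drop a tag but keep what is inside it is to "unwrap" the element: move its children in front of it, then remove the now-empty element. The sketch below assumes the HtmlAgilityPack NuGet package is referenced; `TagFilter` and `KeepOnly` are hypothetical names, and the exact output depends on how HTML Agility Pack parses the (invalid) `<div>` inside `<p>` nesting.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TagFilter
{
    // Hypothetical helper: removes every element whose name is not in
    // allowedTags, but keeps that element's children in its place.
    public static string KeepOnly(string html, ISet<string> allowedTags)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Breadth-first walk over the tree, starting at the top-level nodes.
        var queue = new Queue<HtmlNode>(doc.DocumentNode.ChildNodes);
        while (queue.Count > 0)
        {
            HtmlNode node = queue.Dequeue();
            if (node.NodeType != HtmlNodeType.Element)
            {
                continue; // text and comment nodes are left untouched
            }

            // Snapshot the children, because the tree is modified below.
            var children = new List<HtmlNode>(node.ChildNodes);
            bool keep = allowedTags.Contains(node.Name);
            HtmlNode parent = node.ParentNode;

            foreach (HtmlNode child in children)
            {
                if (!keep)
                {
                    parent.InsertBefore(child, node); // lift child out of the unwanted tag
                }
                queue.Enqueue(child);                 // children still need checking
            }

            if (!keep)
            {
                parent.RemoveChild(node);             // drop the unwanted wrapper
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

A call for the scenario in the question might look like `TagFilter.KeepOnly(input, new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b", "i", "u" })`.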
How do I remove the HTML tags from the following string?
<P style="MARGIN: 0cm 0cm 10pt" class=MsoNormal><SPAN style="LINE-HEIGHT: 115%;
FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #333333; FONT-SIZE: 9pt">In an
email sent just three days before the Deepwater Horizon exploded, the onshore
<SPAN style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> manager in charge of
the drilling rig warned his supervisor that last-minute procedural changes were
creating "chaos". April emails were given to government investigators by <SPAN
style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> and reviewed by The Wall
Street Journal and are the most direct evidence yet that workers on the rig
were unhappy …
For example:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>title</title>
</head>
<body>
<a href="aaa.asp?id=1"> I want to get this text </a>
<div>
<h1>this is my want!!</h1>
<b>this is my want!!!</b>
</div>
</body>
</html>
The result is:
I want to get this text
this is my want!!
this is my want!!!
I have a string that contains HTML images, for example:
string str = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";
I want to get the length of the string without the images, and the number of images. So the result should be:
int strLenght = 111;
int imagesCount= 3;
Could you please tell me the most efficient way to do this?
Thanks
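For a string like the one above, where the only markup is a handful of simple `<img ... />` tags, one option is a single regular expression that both counts and removes them. This is a sketch under that assumption: `ImageStats` and `Measure` are hypothetical names, and the pattern would miscount if an attribute value contained a `>` character. The question does not say whether the spaces left around each removed image should be counted, so the length returned here need not equal the 111 quoted above.

```csharp
using System.Text.RegularExpressions;

public static class ImageStats
{
    // Matches <img ...> and <img ... />, case-insensitively.
    private static readonly Regex ImgTag =
        new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the length of the text once all image
    // tags are removed, together with the number of image tags found.
    public static (int TextLength, int ImageCount) Measure(string html)
    {
        int imageCount = ImgTag.Matches(html).Count;
        string textOnly = ImgTag.Replace(html, string.Empty);
        return (textOnly.Length, imageCount);
    }
}
```

Calling `ImageStats.Measure(str)` on the example string reports 3 images. If the markup can contain other tags or arbitrary attributes, an HTML parser is more robust than a pattern match.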