Jas*_*son 0 vb.net asp.net infinite-loop
My site got kicked off the server today, and I think this function is the culprit. Can anyone tell me what the problem is? I can't seem to figure it out:
    Public Function CleanText(ByVal str As String) As String
        'removes HTML tags and other characters that title tags and descriptions don't like
        If Not String.IsNullOrEmpty(str) Then
            'mini db of extended tags to get rid of
            Dim indexChars() As String = {"<a", "<img", "<input type=""hidden"" name=""tax""", "<input type=""hidden"" name=""handling""", "<span", "<p", "<ul", "<div", "<embed", "<object", "<param"}
            For i As Integer = 0 To indexChars.GetUpperBound(0) 'loop through indexchars array
                Dim indexOfInput As Integer = 0
                Do 'get rid of links
                    indexOfInput = str.IndexOf(indexChars(i)) 'find instance of indexChar
                    If indexOfInput <> -1 Then
                        Dim indexNextLeftBracket As Integer = str.IndexOf("<", indexOfInput) + 1
                        Dim indexRightBracket As Integer = str.IndexOf(">", indexOfInput) + 1
                        'check to make sure a right bracket hasn't been left off a tag
                        If indexNextLeftBracket > indexRightBracket Then 'normal case
                            str = str.Remove(indexOfInput, indexRightBracket - indexOfInput)
                        Else
                            'add the right bracket right before the next left bracket, just remove everything
                            'in the bad tag
                            str = str.Insert(indexNextLeftBracket - 1, ">")
                            indexRightBracket = str.IndexOf(">", indexOfInput) + 1
                            str = str.Remove(indexOfInput, indexRightBracket - indexOfInput)
                        End If
                    End If
                Loop Until indexOfInput = -1
            Next
        End If
        Return str
    End Function
Wouldn't something like this be simpler? (OK, I know it doesn't do exactly the same thing as the posted code):
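    // requires: using System.Text.RegularExpressions;
    public string StripHTMLTags(string text)
    {
        // lazily match from "<" to the nearest ">", across newlines
        return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
    }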
(Converting it to VB.NET should be trivial!)
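For reference, a minimal sketch of what that VB.NET conversion might look like. The module name `HtmlCleaner` and the empty-input guard are assumptions added for illustration; the pattern is the one from the C# snippet above.

```vbnet
Imports System.Text.RegularExpressions

Public Module HtmlCleaner
    ' Hypothetical module name; only the regex pattern comes from the C# snippet.
    Public Function StripHtmlTags(ByVal text As String) As String
        ' Assumed guard: Regex.Replace throws on Nothing, so return early,
        ' mirroring the IsNullOrEmpty check in the original CleanText.
        If String.IsNullOrEmpty(text) Then Return text
        ' Lazily match "<", any characters (including newlines), then the first ">".
        Return Regex.Replace(text, "<(.|\n)*?>", String.Empty)
    End Function
End Module
```

Unlike the posted `CleanText`, this removes every tag, not only the ones in the `indexChars` list, and it leaves an unterminated `<` with no closing `>` in place.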
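Note: if you run this function often, there are two performance improvements you can make to the Regex.
One is to use a precompiled expression, which requires a slight rewrite.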
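One way to read "precompiled" is a single shared `Regex` instance built with `RegexOptions.Compiled` and reused on every call. The sketch below assumes that reading; the names `CompiledHtmlCleaner` and `TagPattern` are hypothetical.

```vbnet
Imports System.Text.RegularExpressions

Public Module CompiledHtmlCleaner
    ' Built once when the module is first used, then reused by every call.
    ' RegexOptions.Compiled costs more at construction in exchange for faster matching.
    Private ReadOnly TagPattern As New Regex("<(.|\n)*?>", RegexOptions.Compiled)

    Public Function StripHtmlTags(ByVal text As String) As String
        ' Assumed guard for Nothing or empty input.
        If String.IsNullOrEmpty(text) Then Return text
        Return TagPattern.Replace(text, String.Empty)
    End Function
End Module
```

Whether the compiled option pays off depends on how often the function runs; for an occasional call, the static `Regex.Replace` is probably enough.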
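The second is to use the non-capturing form of the group. The .NET regex engine supports the (?:) syntax, which lets you group without the performance cost of remembering the captured text as a backreference. With this syntax, the regex above can be changed to: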
    @"<(?:.|\n)*?>"
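To see the difference between the two forms, here is a small VB.NET sketch that strips with either pattern and reports how many groups a match carries. The module and function names are hypothetical.

```vbnet
Imports System.Text.RegularExpressions

Public Module NonCapturingDemo
    Public Const CapturingPattern As String = "<(.|\n)*?>"
    Public Const NonCapturingPattern As String = "<(?:.|\n)*?>"

    ' Number of groups on the first match, counting group 0 (the whole match).
    Public Function GroupCount(ByVal pattern As String, ByVal html As String) As Integer
        Return Regex.Match(html, pattern).Groups.Count
    End Function

    ' Removes everything the given pattern matches.
    Public Function Strip(ByVal pattern As String, ByVal html As String) As String
        Return Regex.Replace(html, pattern, String.Empty)
    End Function
End Module
```

Both patterns should strip the same text; the capturing form reports two groups (the whole match plus the parenthesized one), while the non-capturing form reports only the whole match.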