AppleScript: Cleaning a string

Spa*_*Dog 2 applescript

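I have a string that contains illegal characters I want to remove, but I don't know which kinds of characters may be present.

I built a list of the characters I do not want filtered out and put together this script (adapted from another script I found on the web).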

on clean_string(TheString)
    --Store the current TIDs. To be polite to other scripts.
    set previousDelimiter to AppleScript's text item delimiters
    set potentialName to TheString
    set legalName to {}
    set legalCharacters to {"a", "b", "c", "d", "e", "f", ¬
        "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", ¬
        "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", ¬
        "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", ¬
        "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", ¬
        "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", ¬
        "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", ¬
        "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", ¬
        "/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", ¬
        "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}

    --Whatever you want to eliminate.
    --Now iterate through the characters checking them.
    repeat with thisCharacter in the characters of potentialName
        set thisCharacter to thisCharacter as text
        if thisCharacter is in legalCharacters then
            set the end of legalName to thisCharacter
            log (legalName as string)

        end if
    end repeat
    --Make sure that you set the TIDs before making the
    --list of characters into a string.
    set AppleScript's text item delimiters to ""
    --Check the name's length.
    if length of legalName is greater than 32 then
        set legalName to items 1 thru 32 of legalName as text
    else
        set legalName to legalName as text
    end if
    --Restore the current TIDs. To be polite to other scripts.
    set AppleScript's text item delimiters to previousDelimiter
    return legalName
end clean_string

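The problem is that this script is very slow and gives me timeouts.

What I am doing is checking the string character by character against the legalCharacters list. If the character is in the list, it is kept. If not, it is ignored.

Is there a faster way to do this?

Something like

"look at every character of TheString and remove those that are not in legalCharacters"

Thanks for any help.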


mar*_*dge 6

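What non-ASCII characters are you running into? What is your file encoding?

It is much more efficient to process text with a shell script and tr, sed, or perl. All three are installed by default in OS X.

You can use a shell script with tr (as in the example below) to strip carriage returns and line feeds, and you can also use sed to strip spaces (not shown in the example below):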

set clean_text to do shell script "echo " & quoted form of the_string & "| tr -d '\\r\\n' "
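The sed variant mentioned above is not shown in the answer, so here is a minimal sketch of what it might look like as a plain shell pipeline. The function name `strip_spaces` is hypothetical, and `printf` is used in place of `echo` only to avoid echo's portability quirks.

```shell
#!/bin/sh
# strip_spaces is a hypothetical helper name, not part of the original answer.
strip_spaces() {
    # Feed the argument to sed and delete every space character.
    printf '%s' "$1" | sed 's/ //g'
}

strip_spaces 'Sample text with spaces'
```

From AppleScript, the same sed expression could presumably be appended to the `do shell script` call in place of the tr command, with the input passed through `quoted form of` as in the example above.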

Technical Note TN2065: do shell script in AppleScript

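Or, using perl, this removes every character that is not alphanumeric or whitespace (the | inside the brackets is literal, so pipe characters are kept as well):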

set x to quoted form of "Sample text. smdm#$%%&"
set y to do shell script "echo " & x & " | perl -pe 's/[^[:alnum:]|[:space:]]//g'"

Search SO for other examples of processing text with tr, sed, and perl from AppleScript. Or search MacScripter / AppleScript | Forums.