Regex to replace gibberish

Jos*_*shB 11 regex

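I have to clean up some OCR output in which handwriting has been recognized as gibberish. Any suggestions for a regex to strip out the random characters? Example: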


Federal prosecutors on Monday charged a Miami man with the largest 
case of credit and debit card data theft ever in the United States, 
accusing the one-time government informant of swiping 130 million 
accounts on top of 40 million he stole previously.

, ':, Ie
':... 11'1
. '(.. ~!' ': f I I
. " .' I ~
I' ,11 l
I I I ~ \ :' ,! .~ , .. r, 1 , ~ I . I' , .' I ,.
, i
I ; J . I.' ,.\ ) ..
. : I
'I', I
.' '
r,"

Gonzalez is a former informant for the U.S. Secret Service who helped 
the agency hunt hackers, authorities say. The agency later found out that 
he had also been working with criminals and feeding them information 
on ongoing investigations, even warning off at least one individual, 
according to authorities.

eh....l
~.\O ::t
e;~~~
s: ~ ~. 0
qs c::; ~ g
o t/J (Ii .,
::3 (1l Il:l
~ cil~ 0 2:
t:lHj~(1l
. ~ ~a
0~ ~ S'
N ("b t/J :s
Ot/JIl:l"-<:!
v'g::!t:O
-....c......
VI (:ll <' 0
:= - ~
< (1l ::3
(1l ~ '
t/J VJ ~
Pl
.....
....
(II

Rus*_*ell 0

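Well, a character class of symbols would match some of the gibberish. Perhaps you could also check the words against a dictionary?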

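There also seem to be a lot of line breaks where the gibberish is (the junk lines are short), so that could be another indicator.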
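
A rough sketch of how those three hints (symbol density, dictionary lookup, short lines) might be combined, in Python. Everything here is an assumption made for illustration: the names `KNOWN_WORDS`, `looks_like_gibberish` and `clean`, the 0.3 and 30 thresholds, and the "two out of three signals" rule. The tiny word set only stands in for a real dictionary file, so treat this as a starting point to tune against your own OCR output, not a finished filter.

```python
import re

# Hypothetical stand-in for a real dictionary; load a proper word list in practice.
KNOWN_WORDS = {"the", "of", "to", "on", "and", "in", "he", "with",
               "that", "for", "is", "had", "at", "who", "say"}

# Runs of two or more letters; single letters are too common in OCR noise.
TOKEN_RE = re.compile(r"[a-z]{2,}")


def looks_like_gibberish(line, words=KNOWN_WORDS, max_symbol_ratio=0.3, min_len=30):
    """Return True if at least two of three heuristic signals fire (thresholds are guesses)."""
    stripped = line.strip()
    if not stripped:
        return False  # keep blank lines so paragraph breaks survive

    # Signal 1: a large share of the non-space characters are not letters or digits.
    non_space = [c for c in stripped if not c.isspace()]
    symbols = sum(1 for c in non_space if not c.isalnum())
    too_many_symbols = symbols / len(non_space) > max_symbol_ratio

    # Signal 2: the line contains no word from the dictionary.
    tokens = TOKEN_RE.findall(stripped.lower())
    no_known_word = not any(t in words for t in tokens)

    # Signal 3: the line is short (the junk in the sample is broken into short lines).
    is_short = len(stripped) < min_len

    return (too_many_symbols + no_known_word + is_short) >= 2


def clean(text):
    """Drop the lines flagged as gibberish and keep everything else unchanged."""
    return "\n".join(l for l in text.splitlines() if not looks_like_gibberish(l))
```

Requiring two signals instead of one keeps a short but legitimate last line of a paragraph (such as "according to authorities.") from being thrown away just because it is short.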