grep/regex can't find accented words

Question

God*_*her 4 regex unicode grep cat non-ascii-characters

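I'm trying to build a regex that picks out the words in a file whose letters all come from the letters of a given pattern word.

My problem is that the regex doesn't find accented words, even though my text file contains many of them.

My command lines are: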

cat input/words.txt | grep '^[éra]\{1,4\}$' > output/words_era.txt
cat input/words.txt | grep '^[carroça]\{1,7\}$' > output/words_carroca.txt

The contents of the file are:

carroça
éra
éssa
roça
roco
rato
onça
orça
roca

How can I fix this?

Answer 1

eph*_*ent 8

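If your file is encoded in ISO-8859-1 but your system locale is UTF-8, the match will fail: grep interprets both the pattern and the file in the locale's encoding, so the single byte that stands for é in the file does not match the é you typed in the pattern.

Either convert the file to UTF-8 or run grep under an ISO-8859-1 locale.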

# convert from ISO-8859-1 to the environmental locale before grepping
# output will be in the current locale
$ iconv -f 8859_1 input/words.txt | grep ...

# run grep with an ISO-8859-1 locale
# output will be in ISO-8859-1 encoding
$ cat input/words.txt | env LC_ALL=en_US grep ...
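
If you are not sure which encoding the file uses, a rough check can help before you choose between the two options. The following is only a sketch: the scratch directory and file names are made up for illustration, and it assumes that any file which is not valid UTF-8 is ISO-8859-1, which may not hold for your data.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)
latin1="$workdir/words_latin1.txt"
utf8="$workdir/words_utf8.txt"

# Write three words as ISO-8859-1 bytes:
# \351 is é and \347 is ç in that encoding.
printf '\351ra\nro\347a\nrato\n' > "$latin1"

# iconv exits non-zero on an invalid input sequence, so a
# UTF-8 to UTF-8 round trip doubles as a validity check.
if iconv -f UTF-8 -t UTF-8 "$latin1" > /dev/null 2>&1; then
    # Already valid UTF-8 (or plain ASCII): use it unchanged.
    cp "$latin1" "$utf8"
else
    # Assumption: input that is not valid UTF-8 is ISO-8859-1.
    iconv -f ISO-8859-1 -t UTF-8 "$latin1" > "$utf8"
fi

# The pattern from the question, run against the converted copy.
grep '^[éra]\{1,4\}$' "$utf8" || echo "no match (is a UTF-8 locale active?)"
```

Under a UTF-8 locale the final grep should print éra. You can check the encoding your shell is using with `locale charmap`.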
