sed, grep, or tr command to return only Latin characters from a UTF-8 file

ixt*_*lix 2 grep character-encoding sed regular-expression

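I'm working with the text of the 300 Tang Poems, which unfortunately is a single file containing both Chinese and English. Since I'm interested in "extracting" the English, I'd like to use sed, grep, or tr to simply return all lines that contain Latin characters. So, for example, given this text: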

051
????
??
????????

???????? ???????? 
???????? ???????? 
???????? ???????? 
???????? ???????? 
???????? ???????? 
???????? ???????? 
???????? ???????? 
???????? ???????? 
???????? ????????

Seven-character-ancient-verse
Li Qi
ON HEARING AN WANSHAN PLAY THE REED-PIPE

Bamboo from the southern hills was used to make this pipe. 
And its music, that was introduced from Persia first of all, 
Has taken on new magic through later use in China. 
And now the Tartar from Liangzhou, blowing it for me, 
Drawing a sigh from whosoever hears it, 
Is bringing to a wanderer's eyes homesick tears.... 
Many like to listen; but few understand. 
To and fro at will there's a long wind flying, 
Dry mulberry-trees, old cypresses, trembling in its chill. 
There are nine baby phoenixes, outcrying one another; 
A dragon and a tiger spring up at the same moment; 
Then in a hundred waterfalls ten thousand songs of autumn 
Are suddenly changing to The Yuyang Lament; 
And when yellow clouds grow thin and the white sun darkens, 
They are changing still again to Spring in the Willow Trees. 
Like Imperial Garden flowers, brightening the eye with beauty, 
Are the high-hall candles we have lighted this cold night, 
And with every cup of wine goes another round of music.

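I'd like a command that returns just the 051 line, skips the Chinese, and then returns the "Seven-character-ancient-verse" line and everything after it.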

Gil*_*il' 6

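The following Perl command prints the lines that do not contain any Chinese characters (characters in the Han script). -CIO tells perl that standard input and standard output are encoded in UTF-8.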

perl -CIO -lne '/\p{Han}/ or print'