Count how many lines start with which character

Question

cod*_*ode 1 command-line grep text-processing

I have a file containing many lines of file names.

I want to count, in one go, how many lines start with the character "a", how many with "b", and so on.

What command should I run?

Answer 1

Sté*_*las 8

For single-character letters:

< file cut -c1 | grep '[[:alpha:]]' | LC_ALL=C sort | LC_ALL=C uniq -c | sort -k 2
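
To try the single-character pipeline, it can be wrapped in a shell function and fed a few lines. This is only a sketch: the function name count_initials and the sample lines are hypothetical, and the pipeline inside is the one above.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline above: reads lines on stdin and
# prints "<count> <letter>" for every letter that starts at least one line.
count_initials() {
  cut -c1 | grep '[[:alpha:]]' | LC_ALL=C sort | LC_ALL=C uniq -c | sort -k 2
}

# Made-up sample: lines starting with a digit or "-" are dropped by grep.
printf '%s\n' apple avocado banana 3com -dash- cherry | count_initials
```

Here "3com" and "-dash-" are not counted, because their first character is not a letter.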

To handle combining characters, in a UTF-8 locale:

< file PERLIO=:utf8 perl -Mlocale -MUnicode::Normalize -lne '
  $_=NFKD($_); $n{$&}++ if /^[[:alpha:]]/u && /^\X/u;
  END{for $i (sort keys %n) {print "$n{$i} $i"}}'

(Replace $n{$&} with $n{lc$&} for a case-insensitive count.)
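
For plain US-ASCII data, a similar case-insensitive count can be sketched without perl by folding upper case to lower case with tr before taking the first character. This swaps tr in for the lc trick and does nothing special for non-ASCII letters; the function name count_initials_ci and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: case-insensitive count of initial letters, ASCII only.
# In the C locale tr maps A-Z to a-z, so "Apple" and "apple" share one key.
count_initials_ci() {
  LC_ALL=C tr '[:upper:]' '[:lower:]' | cut -c1 | LC_ALL=C grep '[[:alpha:]]' |
    LC_ALL=C sort | LC_ALL=C uniq -c
}

printf '%s\n' Apple apple Banana 42 | count_initials_ci
```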

On input like this:

fix
été
-dash-
éléphant
?????????
??????
alphabet
3com
foo
?-letter
?-letter

In my locale, the first one outputs:

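because in éléphant above (which my version of Firefox displays incorrectly, as it puts the accent on the l), the first é is written as two Unicode characters, e and \U0301 (combining acute accent), whereas in été it is \U00E9, the precomposed e with acute accent.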
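
The two spellings of é can be told apart by dumping their bytes. This sketch writes the UTF-8 encodings with printf octal escapes (U+00E9 is c3 a9; e followed by U+0301 is 65 cc 81) and uses od to show them.

```shell
#!/bin/sh
# Precomposed é: the single code point U+00E9, encoded in UTF-8 as c3 a9.
precomposed=$(printf '\303\251' | od -An -tx1)
# Decomposed é: "e" (65) followed by U+0301 COMBINING ACUTE ACCENT (cc 81).
decomposed=$(printf 'e\314\201' | od -An -tx1)

printf 'precomposed:%s\n' "$precomposed"
printf 'decomposed: %s\n' "$decomposed"
```

Both render as é, but a byte- or code-point-based tool sees two different strings.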

The second one outputs:

1 ?
1 ?
1 a
2 é
2 f
1 ?
1 ?

(There, all variants of é have been converted to e\U0301, the normalized decomposed form.)

Whereas cut -c 1 | grep '[[:alpha:]]' | sort | uniq -c would output:

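because in my locale the sort order of ? and ? is not defined, so they sort the same and count as the same character as far as sort and uniq are concerned.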

(Note that you need a POSIX cut for the above. My GNU cut is not one, as it treats characters as bytes, so I had to use ksh93's builtin cut for that.)

If the data is US-ASCII only, you can simplify it to:

(export LC_ALL=C; < file cut -c 1 | grep '[[:alpha:]]' | sort | uniq -c)
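
Run on a small made-up sample, the US-ASCII version also shows that in the C locale sort compares byte values, so upper-case initials sort before lower-case ones and are counted separately. The temporary file and its contents are hypothetical.

```shell
#!/bin/sh
# Made-up sample file in a temporary location.
file=$(mktemp)
printf '%s\n' Zebra apple Apple avocado 1x > "$file"

# The US-ASCII pipeline from above; in the C locale "A" (65) and "Z" (90)
# sort before "a" (97).
result=$(export LC_ALL=C; < "$file" cut -c 1 | grep '[[:alpha:]]' | sort | uniq -c)
printf '%s\n' "$result"

rm -f "$file"
```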

Or, if you want to report 0 for any of the 52 US-ASCII letters that are not found:

< file LC_ALL=C awk '{n[substr($0,1,1)]++};END{
  for(i=65;i<=122;i++) if (i < 91 || i > 96) {
    c=sprintf("%c",i);print 0+n[c], c}}'
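
Fed a few made-up lines, the awk command prints exactly 52 lines, A to Z then a to z, with 0 for letters that start no line. The wrapper function letter_report is hypothetical; its body is the awk program above.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk command above (reads stdin).
letter_report() {
  LC_ALL=C awk '{n[substr($0,1,1)]++};END{
    for(i=65;i<=122;i++) if (i < 91 || i > 96) {
      c=sprintf("%c",i);print 0+n[c], c}}'
}

report=$(printf '%s\n' apple avocado Banana 3com | letter_report)
# 52 lines: 26 upper-case letters, then 26 lower-case letters.
printf '%s\n' "$report" | wc -l
# Letters that start no line are reported with a count of 0.
printf '%s\n' "$report" | grep -e '^2 a$' -e '^1 B$' -e '^0 Z$'
```

The "3com" line is counted under "3" inside awk but never printed, since only the letter codes 65 to 90 and 97 to 122 are reported.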

Asked:	12 years, 2 months ago
Viewed:	2578 times
Last activity:	10 years, 1 month ago