Taking the average of data from a text file

Question


I have a text file like the one below, with two columns of numbers between marker strings:

1   23
2   29
3   21
4   18
5   19
6   18
7   19
8   24
Cluster analysis done for this configuration!

1   23
2   22
3   19
4   18
5   23
6   17
7   19
8   31
9   21
10   27
11   19
Cluster analysis done for this configuration!

1   22
2   26
3   27
4   23
5   25
6   32
7   23
8   19
9   19
10   18
11   30
12   21
13   23
14   16
Cluster analysis done for this configuration!

1   23
2   19
3   23
4   27
5   20
6   17
7   15
8   22
9   16
10   23
11   20
12   23
Cluster analysis done for this configuration!


The desired output is:

1 22.75
2 24.0
3 22.5
4 21.5
5 21.75
6 21.0
7 19.0
8 24.0
9 18.666666666666668
10 22.666666666666668
11 23.0
12 22.0
13 23.0
14 16.0


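I want the average of the second-column values for each number in the first column. In this example, the average for "1" is (23+23+22+23)/4 = 22.75, and likewise for "2", "3", and so on. Note that the number of rows between the 'Cluster analysis....' strings differs from block to block, but that does not matter. For example, the average for "14" here is simply 16, because no block other than the third contains a "14".

My idea was to somehow print all the numbers between the 'Cluster analysis....' strings, perhaps store them in an array, and then just take the average, but I could not implement that in code. Can anyone give me a clue?

I have no preference for programming language; it just needs to solve the problem. I was thinking of bash/shell, but Python is welcome too.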

Answer 1

Enr*_*lis 6

awk '/^[0-9]+ +[0-9]+$/ { # pick only lines with two numbers
         arr[$1] += $2    # accumulate the numbers in indexed bins
         n[$1]++          # keep track of how many numbers are in each bin
     }
     END {                        # finally,
         for (e in arr)           # for each bin (visited in unspecified order)
             print e, arr[e]/n[e] # print the key and its average
     }' your_input_file | sort -n # put the keys back in numeric order
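
Since the question also welcomes Python, here is a minimal sketch of the same binning idea. The function name `column_averages` and the shortened `sample` data are made up for illustration, and it assumes data rows consist of exactly two non-negative integers separated by whitespace:

```python
from collections import defaultdict


def column_averages(lines):
    """Average the second column per first-column key, skipping non-data lines."""
    totals = defaultdict(float)  # key -> running sum of values
    counts = defaultdict(int)    # key -> number of values seen
    for line in lines:
        fields = line.split()
        # Keep only rows made of exactly two non-negative integers;
        # marker strings and blank lines are skipped.
        if len(fields) == 2 and all(f.isdecimal() for f in fields):
            key, value = int(fields[0]), int(fields[1])
            totals[key] += value
            counts[key] += 1
    # Sort the keys numerically so that 10 comes after 9, not after 1.
    return {key: totals[key] / counts[key] for key in sorted(totals)}


# Hypothetical, shortened sample in the same layout as the question's file.
sample = """\
1   23
2   29
Cluster analysis done for this configuration!

1   23
2   22
3   19
Cluster analysis done for this configuration!
""".splitlines()

for key, mean in column_averages(sample).items():
    print(key, mean)
```

For the real file, an open file object can be passed instead of `sample`, for example `with open("your_input_file") as fh: column_averages(fh)`.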

