Parse a txt file on a specific delimiter, then convert it to a CSV file

HB8*_*B87 4 text-processing csv

I have the following file, named OpenSimStats.txt:

TestreportsRootAgentCount=0agent(s)
TestreportsChildAgentCount=0childagent(s)
TestreportsGCReportedMemory=10MB(Global)
TestreportsTotalObjectsCount=0Object(s)
TestreportsTotalPhysicsFrameTime=0ms
TestreportsPhysicsUpdateFrameTime=0ms
TestreportsPrivateWorkingSetMemory=2144MB(Global)
TestreportsTotalThreads=0Thread(s)(Global)
TestreportsTotalFrameTime=89ms
TestreportsTotalEventFrameTime=0ms
TestreportsLandFrameTime=0ms
TestreportsLastCompletedFrameAt=25msago
TestreportsTimeDilationMonitor=1
TestreportsSimFPSMonitor=55.3333320617676
TestreportsPhysicsFPSMonitor=55.4766654968262
TestreportsAgentUpdatesPerSecondMonitor=0persecond
TestreportsActiveObjectCountMonitor=0
TestreportsActiveScriptsMonitor=0
TestreportsScriptEventsPerSecondMonitor=0persecond
TestreportsInPacketsPerSecondMonitor=0persecond
TestreportsOutPacketsPerSecondMonitor=0persecond
TestreportsUnackedBytesMonitor=0
TestreportsPendingDownloadsMonitor=0
TestreportsPendingUploadsMonitor=0
TestreportsTotalFrameTimeMonitor=18.18239402771ms
TestreportsNetFrameTimeMonitor=0ms
TestreportsPhysicsFrameTimeMonitor=0.0106373848393559ms
TestreportsSimulationFrameTimeMonitor=0.17440040409565ms
TestreportsAgentFrameTimeMonitor=0ms
TestreportsImagesFrameTimeMonitor=0ms
TestreportsSpareFrameTimeMonitor=18.1818199157715ms
TestreportsLastReportedObjectUpdates=0
TestreportsSlowFrames=1

I want to convert this file into a CSV file like the following:

TestreportsRootAgentCount,TestreportsChildAgentCount,...,TestreportsSlowFrames
0,0,10,0,0...,1

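What I mean is:

  1. Take everything before and after the delimiter, which in this case is "=".
  2. Put everything to the left of the delimiter on one line, separated by commas.
  3. Insert a newline at the end.
  4. Then put whatever comes after the delimiter (=), numbers only (no units or other characters after the number), on a second line, again separated by commas.
  5. Then insert a newline.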

Any ideas or suggestions on how to do this in a Linux shell script? Perhaps with sed or gawk?

Sat*_*ura 9

The 9 paths to OpenSim enlightenment:

With sed and some shell magic:

sed 's/=.*//' OpenSimStats.txt | paste -sd, >out.csv
sed 's/.*=//; s/[^0-9]*$//' OpenSimStats.txt | paste -sd, >>out.csv
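To see what each of the two sed passes contributes, here is a minimal sketch that runs the same pipeline on a three-line excerpt of the input. The variable names `sample`, `header`, and `values` are made up for this illustration, and `paste` is given an explicit `-` operand so that it reads standard input on non-GNU systems as well.

```shell
# A made-up three-line excerpt in the same key=value format as OpenSimStats.txt
sample='TestreportsRootAgentCount=0agent(s)
TestreportsGCReportedMemory=10MB(Global)
TestreportsSimFPSMonitor=55.3333320617676'

# Row 1: drop everything from "=" onward, then join the lines with commas
header=$(printf '%s\n' "$sample" | sed 's/=.*//' | paste -sd, -)

# Row 2: drop everything up to "=", then strip the trailing non-digit unit
values=$(printf '%s\n' "$sample" | sed 's/.*=//; s/[^0-9]*$//' | paste -sd, -)

printf '%s\n%s\n' "$header" "$values"
# prints:
# TestreportsRootAgentCount,TestreportsGCReportedMemory,TestreportsSimFPSMonitor
# 0,10,55.3333320617676
```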

With sed, no shell magic:

sed -n 's/=.*//; 1{ h; b; }; H; $ { x; s/\n/,/g; p; }' OpenSimStats.txt >out.csv
sed -n 's/.*=//; s/[^0-9]*$//; 1{ h; b; }; H; $ { x; s/\n/,/g; p; }' OpenSimStats.txt >>out.csv

With shell magic and a little bit of sed:

paste -sd, <(cut -d= -f1 OpenSimStats.txt) <(cut -d= -f2 OpenSimStats.txt | sed 's/[^0-9]*$//')

With cut and some shell magic:

cut -d= -f1 OpenSimStats.txt | paste -sd, >out.csv
cut -d= -f2 OpenSimStats.txt | sed 's/[^0-9]*$//' | paste -sd, >>out.csv

With GNU datamash:

sed 's/=/,/; s/[^0-9]*$//' OpenSimStats.txt | datamash -t, transpose

perl

perl -lnE 's/\D+$//o;
    ($a, $b) = split /=/;
    push @a, $a; push @b, $b;
    END { $, = ","; say @a; say @b }' OpenSimStats.txt

grep

grep -o '^[^=]*' OpenSimStats.txt | paste -sd, >out.csv
grep -Eo '[0-9.]+' OpenSimStats.txt | paste -sd, >>out.csv
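One thing to keep in mind with the grep route: `-o` prints every numeric run on a line, so it relies on the keys containing no digits, which happens to hold for this file. The sketch below illustrates the difference on a hypothetical key that does contain a digit (`Region2ScriptTime` is invented for the example), and shows one possible way to anchor the match on the `=` instead.

```shell
# Hypothetical input line whose key contains a digit
line='Region2ScriptTime=5ms'

# Unanchored pattern: matches the "2" in the key as well as the value
naive=$(printf '%s\n' "$line" | grep -Eo '[0-9.]+' | paste -sd, -)

# Anchored on "=": match "=" plus the number, then cut off the leading "="
anchored=$(printf '%s\n' "$line" | grep -o '=[0-9.]*' | cut -c2-)

printf 'naive:    %s\nanchored: %s\n' "$naive" "$anchored"
# prints:
# naive:    2,5
# anchored: 5
```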

bash

#! /usr/bin/env bash
line1=()
line2=()
while IFS='=' read -r a b; do
    line1+=("$a")
    [[ $b =~ ^[0-9.]+ ]]
    line2+=("$BASH_REMATCH")
done <OpenSimStats.txt
( set "${line1[@]}"; IFS=,; echo "$*" ) >out.csv
( set "${line2[@]}"; IFS=,; echo "$*" ) >>out.csv

awk

awk -F= '
    NR==1 { a = $1; sub(/[^0-9]+$/, "", $2); b = $2; next }
    { a = a "," $1; sub(/[^0-9]+$/, "", $2); b = b "," $2 }
    END { print a; print b }' OpenSimStats.txt

Bonus 10th path for the data nerds, with csvtk:

csvtk replace -d= -f 2 -p '\D+$' -r '' <OpenSimStats.txt | csvtk transpose

Bonus 11th path, with vim:

:%s/\D*$//
:%s/=/\r/
qaq
:g/^\D/y A | normal dd
:1,$-1 s/\n/,/
"aP
:2,$-2 s/\n/,/
:d 1
:w out.csv


oli*_*liv 8

awk to the rescue:

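awk -F= '{ a[NR,1] = $1; a[NR,2] = $2 }   # store key and raw value for each line
         END {
            # first row: the keys, comma-separated
            for (i = 1; i < NR; i++) {
                printf "%s,", a[i,1]
            }
            print a[i,1]
            # second row: the values; +0 coerces to a number and drops any unit suffix
            for (i = 1; i < NR; i++) {
                printf "%s,", a[i,2] + 0
            }
            print a[i,2] + 0
        }' file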

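The array a is filled with the keys from the first column ($1) and the values from the second column ($2).

Once all lines have been read, the END block loops over the array twice: once to print the keys, and once to print the values.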
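A caveat on coercing the value to a number with `+0`: when awk turns a non-integer number back into a string, it formats it with `CONVFMT` (or `OFMT` for `print`), both of which default to `%.6g`, so long fractions are shortened. The sketch below compares that with stripping the trailing unit as a string, which leaves the digits untouched. The variable names `raw`, `numeric`, and `stripped` are made up for this illustration.

```shell
# One of the longer values from the input file, with its unit
raw='55.3333320617676ms'

# Numeric coercion: the unit is dropped, but the result is formatted with %.6g
numeric=$(awk -v v="$raw" 'BEGIN { printf "%s\n", v + 0 }')

# String stripping: remove the trailing non-digits and keep every digit
stripped=$(awk -v v="$raw" 'BEGIN { sub(/[^0-9]+$/, "", v); print v }')

printf 'numeric:  %s\nstripped: %s\n' "$numeric" "$stripped"
# prints:
# numeric:  55.3333
# stripped: 55.3333320617676
```

If full precision matters, replacing `a[i,2]+0` with a `sub()` on the stored value, as in the awk path of the other answer, would be one way to keep it.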