Abd*_*del 7 unix csv whitespace sed
I have the following sed command:
sed 's/\s/,/g' input > output.csv
(I got the command from this related thread.)
It turns the following input:
SNP A1 A2 FRQ INFO OR SE P
10:33367054 C T 0.9275 0.9434 1.1685 0.1281 0.1843
10:33367707 G A 0.9476 0.9436 1.0292 0.1530 0.8244
10:33367804 G C 0.4193 1.0443 0.9734 0.0988 0.6443
10:33368119 C A 0.9742 0.9343 1.0201 0.1822 0.9156
into:
SNP,,A1,,A2,,,,,FRQ,,,,INFO,,,,,,OR,,,,,,SE,,,,,,,P
10:33367054,,,C,,,T,,0.9275,,0.9434,,1.1685,,0.1281,,0.1843
10:33367707,,,G,,,A,,0.9476,,0.9436,,1.0292,,0.1530,,0.8244
10:33367804,,,G,,,C,,0.4193,,1.0443,,0.9734,,0.0988,,0.6443
10:33368119,,,C,,,A,,0.9742,,0.9343,,1.0201,,0.1822,,0.9156
I need a command that converts each run of consecutive spaces into a single comma, giving me output like this:
SNP,A1,A2,FRQ,INFO,OR,SE,P
10:33367054,C,T,0.9275,0.9434,1.1685,0.1281,0.1843
10:33367707,G,A,0.9476,0.9436,1.0292,0.1530,0.8244
10:33367804,G,C,0.4193,1.0443,0.9734,0.0988,0.6443
10:33368119,C,A,0.9742,0.9343,1.0201,0.1822,0.9156
Any ideas?
fed*_*qui 12
If you want to use sed, you can use this:
$ sed 's/ \{1,\}/,/g' file
SNP,A1,A2,FRQ,INFO,OR,SE,P
10:33367054,C,T,0.9275,0.9434,1.1685,0.1281,0.1843
10:33367707,G,A,0.9476,0.9436,1.0292,0.1530,0.8244
10:33367804,G,C,0.4193,1.0443,0.9734,0.0988,0.6443
10:33368119,C,A,0.9742,0.9343,1.0201,0.1822,0.9156
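It is based on glenn jackman's answer to "How to strip multiple spaces to one using sed?".
To match any whitespace character rather than only spaces, it can also be written as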
sed 's/[[:space:]]\{1,\}/,/g' file
Also note that you can use sed -i.bak '...' file for in-place editing: the original file is backed up as file.bak, and file holds the edited content.
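The in-place variant can be tried safely on a throwaway copy first. The following is a minimal sketch, assuming a scratch directory created with mktemp and a two-line sample in the shape of the question's data; the variable names are illustrative only.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
printf 'SNP   A1  A2\n10:33367054   C   T\n' > "$workdir/file"

# Edit in place; the backup suffix is attached directly to -i.
sed -i.bak 's/ \{1,\}/,/g' "$workdir/file"

edited=$(cat "$workdir/file")       # converted content
backup=$(cat "$workdir/file.bak")   # untouched original
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the scratch directory.
rm -r "$workdir"
```

If the result is wrong, the original can be restored by moving file.bak back over file.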
But I think it is clearer with tr. With it, you can squeeze the runs of spaces and then replace each remaining space with a comma:
$ tr -s ' ' < file | tr ' ' ','
SNP,A1,A2,FRQ,INFO,OR,SE,P
10:33367054,C,T,0.9275,0.9434,1.1685,0.1281,0.1843
10:33367707,G,A,0.9476,0.9436,1.0292,0.1530,0.8244
10:33367804,G,C,0.4193,1.0443,0.9734,0.0988,0.6443
10:33368119,C,A,0.9742,0.9343,1.0201,0.1822,0.9156
Step by step, the first command alone gives:
$ tr -s ' ' < file
SNP A1 A2 FRQ INFO OR SE P
10:33367054 C T 0.9275 0.9434 1.1685 0.1281 0.1843
10:33367707 G A 0.9476 0.9436 1.0292 0.1530 0.8244
10:33367804 G C 0.4193 1.0443 0.9734 0.0988 0.6443
10:33368119 C A 0.9742 0.9343 1.0201 0.1822 0.9156
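From man tr:
tr [OPTION]... SET1 [SET2]
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character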
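The two tr calls can probably be folded into one. When -s is given together with two sets, tr translates first and then squeezes repeats of the characters in the second set. The sketch below assumes that behaviour; the function name spaces_to_commas is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: translate spaces to commas, then squeeze repeated commas.
spaces_to_commas() {
    tr -s ' ' ','
}

result=$(printf 'SNP   A1  A2     FRQ\n10:33367054   C   T  0.9275\n' | spaces_to_commas)
printf '%s\n' "$result"
```

One caveat: because the squeeze applies to commas, any run of commas already present in the input would be collapsed as well, and tabs are left untouched.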
小智 9
If you enable extended regular expressions with -r, you can append + to \s to mean "one or more":
$ sed -r 's/\s+/,/g' file.txt
SNP,A1,A2,FRQ,INFO,OR,SE,P
10:33367054,C,T,0.9275,0.9434,1.1685,0.1281,0.1843
10:33367707,G,A,0.9476,0.9436,1.0292,0.1530,0.8244
10:33367804,G,C,0.4193,1.0443,0.9734,0.0988,0.6443
10:33368119,C,A,0.9742,0.9343,1.0201,0.1822,0.9156
For reference:
-r, --regexp-extended
use extended regular expressions in the script.
Note: on Mac OS X, sed is the BSD version and lacks the GNU extensions, so you have to use the -E flag instead:
-E Interpret regular expressions as extended (modern) regular expressions rather
than basic regular expressions (BRE's). The re_format(7) manual page fully
describes both formats.
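Since \s is not part of the POSIX regular expression syntax, a spelling that should behave the same under both GNU and BSD sed is the bracket class [[:space:]] combined with -E. The sketch below assumes a sed that accepts -E and uses a small made-up sample mixing spaces and tabs.

```shell
#!/bin/sh
# Sample input with spaces and a tab between fields.
input=$(printf 'SNP  A1\tA2   FRQ\n10:33367054   C \t T  0.9275\n')

# -E selects extended regular expressions;
# [[:space:]]+ matches one or more whitespace characters.
converted=$(printf '%s\n' "$input" | sed -E 's/[[:space:]]+/,/g')
printf '%s\n' "$converted"
```

Unlike the space-only patterns, this one also converts tabs, which matters if the input is a mix of both.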
Here is a very simple solution with awk:
awk '{$1=$1}1' OFS=, file
SNP,A1,A2,FRQ,INFO,OR,SE,P
10:33367054,C,T,0.9275,0.9434,1.1685,0.1281,0.1843
10:33367707,G,A,0.9476,0.9436,1.0292,0.1530,0.8244
10:33367804,G,C,0.4193,1.0443,0.9734,0.0988,0.6443
10:33368119,C,A,0.9742,0.9343,1.0201,0.1822,0.9156
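The assignment $1=$1 makes awk rebuild each line from its fields, so every run of whitespace between fields is replaced by the output field separator (OFS), here a comma. The trailing 1 is an always-true pattern that prints the rebuilt line.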
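One practical difference from the sed commands shows up when a line has leading or trailing spaces: awk's default field splitting ignores them, whereas sed turns them into commas too. A minimal sketch, assuming a made-up sample line padded on both sides:

```shell
#!/bin/sh
# Sample line with leading and trailing spaces.
line='   10:33367054   C   T  0.9275  '

# awk: assigning $1 rebuilds the record from its fields, joined by OFS.
with_awk=$(printf '%s\n' "$line" | awk -v OFS=, '{ $1 = $1; print }')

# sed: every run of spaces becomes a comma, including those at the line edges.
with_sed=$(printf '%s\n' "$line" | sed 's/ \{1,\}/,/g')

printf 'awk: %s\nsed: %s\n' "$with_awk" "$with_sed"
```

Here awk yields 10:33367054,C,T,0.9275 while sed yields the same fields wrapped in an extra leading and trailing comma, so the awk form may be the safer choice for padded input.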