Tag: text-processing

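Merging files on a common column value

I want to use an awk command to join file 1 (1 million lines) and file 2 (10,000 lines) into a new file 3 (which should have 1 million lines).

File 1: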

 471808241 29164840 1 10001 156197396 
 471722917 21067410 1 31001 135961856 
 471941441 20774160 1 7001  180995072 
 471568655 29042630 1 15001 157502996 
 471524711 20716360 1 4001  180226817 
 471873918 29583520 1 2001  128567298 
 471568650 29042631 1 15002 157502910


File 2:

610146 156197396 
531101 135961856 
704011 180226817 
502216 128567298 
707012 180995072 
615246 157502996 
685221 157502910


Expected output:

471808241 29164840 1 10001 156197396 610146 
471722917 21067410 1 31001 135961856 531101 
471941441 20774160 1 7001  180995072 707012 
471568655 …

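One possible approach, sketched under the assumption that column 5 of file 1 matches column 2 of file 2 and that file 2 fits in memory. The names file1, file2 and file3 and the shortened sample data are placeholders, not taken from a confirmed answer.

```shell
# Placeholder sample data in a temporary directory (two rows from the question).
tmp=$(mktemp -d)
printf '%s\n' '471808241 29164840 1 10001 156197396' \
              '471722917 21067410 1 31001 135961856' > "$tmp/file1"
printf '%s\n' '531101 135961856' '610146 156197396' > "$tmp/file2"

# First pass (NR==FNR): read file2 and map its column 2 to its column 1.
# Second pass: print each line of file1 followed by the value mapped from column 5.
awk 'NR == FNR { id[$2] = $1; next } { print $0, id[$5] }' \
    "$tmp/file2" "$tmp/file1" > "$tmp/file3"
cat "$tmp/file3"
```

Lines of file 1 with no match in file 2 are still printed, with an empty value appended, so file 3 keeps the same line count as file 1.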

command-line text-processing

Nyd*_*enn

2017 02-26

3
votes

1
answer

1967
views

Replacing a character in a file with sed

I need to replace a single character in /etc/request-key.conf

The file format is:

###############################################################################
#
# Copyright (C) 2005 Red Hat, Inc. All Rights Reserved.
# Written by David Howells (dhowells@redhat.com)
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version
# 2 of the License, or (at your option) any later version.
#
###############################################################################


###############################################################################
#
# We can run programs or scripts …


command-line sed text-processing

eek*_*nky

2019 09-13

3
votes

1
answer

289
views

How do I remove trailing non-letter characters from every line?

I am trying to remove the trailing characters that are not letters:

support.help1.com,,
support.help1.com.
support.help1.com9
support.help1.com*
support.help1.com@@
support.help1.com##
support.help1.com%%
support.help1.com^
support.help1.com
support.help1.com,
support.help1.com-


I want my output to look like this:

support.help1.com
support.help1.com
support.help1.com
support.help1.com
support.help1.com
support.help1.com
support.help1.com
support.help1.com
support.help1.com
support.help1.com
support.help1.com

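A minimal sketch of one way to do this with sed, assuming "letters" means the [:alpha:] class in the current locale. The helper name strip_trailing is hypothetical.

```shell
# Hypothetical helper: delete any run of non-letter characters at the end of each line.
strip_trailing() {
  sed 's/[^[:alpha:]]*$//' "$@"
}

# A few of the sample lines from the question.
printf '%s\n' 'support.help1.com,,' 'support.help1.com9' \
              'support.help1.com@@' 'support.help1.com' | strip_trailing
```

The digit inside help1 is untouched because the pattern is anchored to the end of the line.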

command-line text-processing

Sar*_*kar

2019 11-06

3
votes

1
answer

220
views

Getting the lines between special-character patterns

Below is my input file:

---
{
  "date":"2015-09-24",
  "title":"Getting Started with Git",
  "template":"post",
  "thumbnail":"content/thumbnails/test.jpeg",
  "slug":"getting-started-with-git",
  "categories":[ "cat1", "Focus", "Mustang" ],
  "tags":[ "Fiesta", "Focus", "Mustang" ]
}
---

#Hello

---
This is sample
---

```
var x=1;
entry.forEach(function(item){
    x=x++;
})
```


What I expect in the output is the content between the first two ' --- ' lines:

{
  "date":"2015-09-24",
  "title":"Getting Started with Git",
  "template":"post",
  "thumbnail":"content/thumbnails/test.jpeg",
  "slug":"getting-started-with-git",
  "categories":[ "cat1", "Focus", "Mustang" ],
  "tags":[ "Fiesta", "Focus", "Mustang" ]
}


How can I achieve this? With awk, I can only do it if I replace ' --- ' with strings such as 'start'/'end':

awk '/start/{f=1;next} /end/{f=0;exit} f'  $FILE_PATH

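Because the opening and closing markers are identical, one option is to count them instead of matching two different patterns. The sketch below assumes each marker line is exactly `---` with no surrounding whitespace. FILE_PATH is the variable name from the question, the file content is a shortened sample, and front_matter is a hypothetical variable name.

```shell
# Shortened sample of the input file from the question.
FILE_PATH=$(mktemp)
cat > "$FILE_PATH" <<'EOF'
---
{
  "title":"Getting Started with Git"
}
---

#Hello

---
This is sample
---
EOF

# n counts marker lines: print while exactly one has been seen, stop at the second.
front_matter=$(awk '/^---$/ { if (++n == 2) exit; next } n == 1' "$FILE_PATH")
printf '%s\n' "$front_matter"
```

Exiting at the second marker means the later `---` pair around "This is sample" is never read.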

command-line awk text-processing

PKV*_*PKV

lucky-day

3
votes

1
answer

85
views

Editing FASTA headers

I want to remove part of the header of every sequence in a FASTA file, so that only the Otu number appears as the header.

So from:

>M02300_51_000000000-CJMTC_1_1115_17014_15334   Otu0001  
T-AC--GG-AG-GGT---GCA-A-G-C--G-T-T--AA-T-CGG-AA--TT-A-C-T


I want to change it to:

>Otu0001  
T-AC--GG-AG-GGT---GCA-A-G-C--G-T-T--AA-T-CGG-AA--TT-A-C-T


I believe this should be possible with a sed command, but I have not been able to get it to work. Any help would be great! Thanks in advance.
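A minimal sed sketch, assuming every header has the form `>`, a read identifier, whitespace, then the Otu label, as in the example above. The helper name otu_headers is hypothetical.

```shell
# Hypothetical helper: on header lines, drop the first whitespace-delimited
# field after ">" together with the whitespace that follows it.
otu_headers() {
  sed -E 's/^>[^[:space:]]+[[:space:]]+/>/' "$@"
}

printf '%s\n' '>M02300_51_000000000-CJMTC_1_1115_17014_15334   Otu0001' \
              'T-AC--GG-AG-GGT---GCA-A-G-C--G-T-T--AA-T-CGG-AA--TT-A-C-T' | otu_headers
```

Sequence lines do not start with `>`, so they pass through unchanged. A header with only one field is also left as it is.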

command-line sed text-processing

kat*_* HM

2020 07-01

3
votes

1
answer

1449
views

How do I list Docker containers with AWK?

I would like to use the following tools:

grep
sed
AWK

to work with Docker.

Listing the containers:

docker container ls | awk '{print $1}'


Result:

CONTAINER
490e3d669259
a44230a617e1


How can I omit the "header"?

Here is the full output of docker container ls, which should prove useful:

CONTAINER ID  IMAGE  COMMAND                 CREATED        STATUS        PORTS                                        NAMES
490e3d669259  jetty  "/docker-entrypoint.…"  3 minutes ago  Up 3 minutes  0.0.0.0:80->8080/tcp, 0.0.0.0:443->8443/tcp  quirky_antonelli
a44230a617e1  jetty  "/docker-entrypoint.…"  4 minutes ago  Up 4 minutes  8080/tcp                                     goofy_hamilton


I am only looking for the values under the CONTAINER heading.
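One way to drop the header is to have awk skip the first record with an `NR > 1` condition. The sketch below runs against the sample output from the question rather than a live Docker daemon. The variable sample and the helper name container_ids are hypothetical.

```shell
# Sample text copied from the question, standing in for `docker container ls` output.
sample='CONTAINER ID  IMAGE  COMMAND                 CREATED        STATUS        PORTS                                        NAMES
490e3d669259  jetty  "/docker-entrypoint.…"  3 minutes ago  Up 3 minutes  0.0.0.0:80->8080/tcp, 0.0.0.0:443->8443/tcp  quirky_antonelli
a44230a617e1  jetty  "/docker-entrypoint.…"  4 minutes ago  Up 4 minutes  8080/tcp                                     goofy_hamilton'

# Hypothetical helper: skip record 1 (the header) and print the first field of the rest.
container_ids() {
  awk 'NR > 1 { print $1 }'
}

printf '%s\n' "$sample" | container_ids
```

Against a running daemon the same filter would be used as `docker container ls | container_ids`. If only the IDs are needed, `docker container ls -q` prints them without a header and needs no post-processing.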

command-line text-processing docker

Nic*_*ers

2020 08-05

3
votes

1
answer

2199
views

Run ls but skip the first 3 files

Suppose ls -t returns:


How do I skip the first 3 results, so that the result is only:

4
5
6


I know I can run ls -t | head -3, but that keeps only the first 3 lines, whereas I need to skip the first 3 lines.

1
2
3

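The counterpart of `head -3` here is `tail -n +4`, which starts printing at line 4. A minimal sketch, using printf to stand in for the `ls -t` output (the six-line listing is assumed from the expected result above). The helper name skip_first_three is hypothetical.

```shell
# Hypothetical helper: print everything from line 4 onward.
skip_first_three() {
  tail -n +4
}

# printf stands in for `ls -t` so the example does not depend on real files.
printf '%s\n' 1 2 3 4 5 6 | skip_first_three
```

On a real directory this would be `ls -t | skip_first_three`. This counts lines, so it is only reliable when file names contain no newline characters.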

command-line ls text-processing

Foo*_*Bar

2020 08-11

3
votes

1
answer

1439
views

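Adding a space with "sed" only under specific conditions

I need to add a space after each occurrence of #, but only when the # is at the start of the line and is followed by at least one character that is not a space. For example, this code: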

echo "# ok" | sed "s|^#[^ ]|# |g"


returns # ok as expected, but this code:

echo "#ok" | sed "s|^#[^ ]|# |g"


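returns # k rather than the expected # ok, because the character matched by [^ ] is replaced along with the #.
How do I get # ok as the output?

Edit:

This is the code that solved my problem, thanks to @FedonKadifeli. This input: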

echo -e "#ok\n# ok\n #ok\n#ok #ok\n##ok #ok"

prints:

#ok
# ok
 #ok
#ok #ok
##ok #ok


And this code:

echo -e "#ok\n# ok\n #ok\n#ok #ok\n##ok #ok" | sed -r 's|^#(#*)([^[:space:]#])|#\1 \2|g'

prints:

# ok
# ok
 #ok
# ok #ok
## ok #ok


command-line sed text-processing

Mar*_*mbo

2020 09-26

3
votes

2
answers

306
views

How to remove only the duplicate lines that immediately follow each other in a file

Suppose I have the following file:

$ cat test.txt
a
-----
b
-----
-----
c
-----
-----
-----
d
-----
e
-----
-----


Now I want to remove every -----, but only when it repeats the line directly before it. So the result should look like this:

a
-----
b
-----
c
-----
d
-----
e
-----


I tried grep -Pvz -- "-----\n-----", but that did not work.
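A possible awk sketch that remembers the previous line and suppresses a `-----` line only when the line before it was also `-----`. The helper name squeeze_separators is hypothetical.

```shell
# Hypothetical helper: collapse runs of "-----" lines into a single line.
squeeze_separators() {
  awk '
    !($0 == "-----" && prev == "-----")   # print unless this line and the previous one are both separators
    { prev = $0 }                         # remember the current line for the next comparison
  ' "$@"
}

printf '%s\n' a ----- b ----- ----- c ----- ----- ----- d | squeeze_separators
```

If collapsing every run of identical adjacent lines is acceptable, plain `uniq test.txt` does the same job. The awk version only touches the separator lines, so repeated data lines are kept.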

command-line text-processing

Cas*_*Cas

2021 08-04

3
votes

1
answer

241
views

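How to replace a string in multiple files in multiple folders with different hierarchy on the Linux command line

I have many files with the extension .launch spread across different folders inside a parent directory, and the hierarchy of the *.launch files is not always the same, e.g. src/folder/sth.launch and src/folder2/../../another.launch, so this solution does not work here!

How can I use Linux commands to replace one string, xarco.py, with another string, xarco, in all of these *launch files across different folders and levels? Thanks in advance.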
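Since find descends to any depth, the differing hierarchy need not matter. The sketch below assumes the files end in .launch and that the replacement string is xarco. The directory tree, file contents and the variable root are hypothetical samples.

```shell
# Hypothetical sample tree in a temporary directory, standing in for the parent directory.
root=$(mktemp -d)
mkdir -p "$root/src/folder" "$root/src/folder2/a/b"
echo 'cmd xarco.py model.xacro' > "$root/src/folder/sth.launch"
echo 'cmd xarco.py other.xacro' > "$root/src/folder2/a/b/another.launch"

# find locates *.launch files at any depth; sed edits each one in place
# and keeps the original next to it with a .bak suffix.
find "$root/src" -type f -name '*.launch' \
     -exec sed -i.bak 's/xarco\.py/xarco/g' {} +

cat "$root/src/folder2/a/b/another.launch"
```

The dot in `xarco\.py` is escaped so it matches a literal dot rather than any character. Once the result has been checked, the .bak copies can be deleted.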

command-line bash text-processing

Bil*_*lal

2021 10-15

3
votes

1
answer

2210
views

Tag statistics

command-line ×10

text-processing ×10

sed ×3

awk ×1

bash ×1

docker ×1

ls ×1
