How do I repeat the content of a file n times?

Question

Oli*_*Oli 21 command-line text-processing

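I'm trying to benchmark two different ways of processing a file. I have a small amount of input data, but to get a good comparison I need to repeat the tests many times.

Rather than just repeating the tests, I would like to duplicate the input data many times (e.g. 1000 times), so that a 3-line file becomes 3000 lines and I can run a much more meaningful test.

I pass the input data in by filename: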

mycommand input-data.txt

Answer 1

cuo*_*glm 22

You don't need input-duplicated.txt.

Try:

mycommand <(perl -0777pe '$_=$_ x 1000' input-data.txt)

Explanation

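-0777: -0 sets the input record separator (the Perl special variable $/, which is a newline by default). Setting it to a value greater than 0400 causes Perl to slurp the entire input file into memory.
-pe: -p means "print each input record after applying the script given by -e to it".
$_=$_ x 1000: $_ is the current input record. Since -0777 makes Perl read the whole file at once, $_ holds the whole file, and x 1000 causes 1000 copies of the whole file to be printed.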

Answer 2

Oli*_*Oli 13

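I originally thought I would have to generate a secondary file, but I can simply loop over the original file in Bash and use some redirection to make it appear as a file.

There are probably a dozen different ways of doing the loop, but here are four: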

mycommand <( seq 1000 | xargs -I{} cat input-data.txt )
mycommand <( for _ in {1..1000}; do cat input-data.txt; done )
mycommand <((for _ in {1..1000}; do echo input-data.txt; done) | xargs cat )
mycommand <(awk '{for(i=0; i<1000; i++)print}' input-data.txt)  #*

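The third method is improvised from maru's comment below and builds one big list of input filenames for cat. xargs splits this into as many arguments per call as the system allows. It is much faster than n separate cat invocations.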
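
To see how xargs batches the filename list, the sketch below reports how many arguments each batch received. The temporary file and sample data are assumptions for illustration, and the list is generated with yes and head rather than the loop above. Because xargs splits its input on whitespace, this assumes the path contains no spaces.

```shell
# Throwaway sample input (hypothetical data).
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/input-data.txt"

# Emit the filename 1000 times; xargs packs as many names as fit into each cat call.
yes "$tmpdir/input-data.txt" | head -n 1000 | xargs cat > "$tmpdir/input-duplicated.txt"

# Same list again, but report the argument count of each batch (one number per batch).
batches=$(yes "$tmpdir/input-data.txt" | head -n 1000 | xargs sh -c 'echo "$#"' sh)

total_lines=$(wc -l < "$tmpdir/input-duplicated.txt")
echo "arguments per batch: $batches"
echo "total lines: $total_lines"
rm -rf "$tmpdir"
```

The number of batches depends on the system's argument-size limits, but the counts always add up to 1000.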

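The awk method (inspired by terdon's answer) is probably the most optimised, but it duplicates one line at a time rather than repeating the file as a whole. That may or may not suit a particular application, but it is lightning fast and efficient.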
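
The difference between per-line duplication and whole-file repetition is easiest to see with a small count. A minimal sketch, assuming a hypothetical 3-line file and a repeat count of 2 instead of 1000:

```shell
# Throwaway sample input (hypothetical data).
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/input-data.txt"

# Per-line duplication: each line is printed twice before awk moves to the next.
per_line=$(awk '{for(i=0; i<2; i++)print}' "$tmpdir/input-data.txt" | tr '\n' ' ')

# Whole-file repetition: the file is output in full, twice.
whole_file=$(cat "$tmpdir/input-data.txt" "$tmpdir/input-data.txt" | tr '\n' ' ')

echo "per line:   $per_line"
echo "whole file: $whole_file"
rm -rf "$tmpdir"
```

The first variant yields a a b b c c, the second a b c a b c. If the command being benchmarked is sensitive to line order, only the second matches what the question asks for.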

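However, all of this generates the data on the fly. Bash may produce output much more slowly than a program can read it, so for a benchmark you should generate a new file instead. Fortunately, that is only a simple extension: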

(for _ in {1..1000}; do echo input-data.txt; done) | xargs cat > input-duplicated.txt
mycommand input-duplicated.txt

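Both of your commands run cat N times. Wouldn't it be more efficient to run cat once and give it the same argument N times? Something like `cat $(for i in {1..N}; do echo filename; done)`. This is subject to the argument size limit, but it should be faster. (3 upvotes)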

Answer 3

evi*_*oup 8

I would just use a text editor.

vi input-data.txt
gg (move cursor to the beginning of the file)
yG (yank til the end of the file)
G (move the cursor to the last line of the file)
999p (paste the yanked text 999 times)
:wq (save the file and exit)

If you absolutely need to do it from the command line (this requires vim to be installed, because vi doesn't have the :normal command), you can use:

vim -es -u NONE "+normal ggyGG999p" +wq input-data.txt

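Here, -es (or -e -s) makes vim run silently, so it shouldn't take over your terminal window, and -u NONE stops it from reading your vimrc, which should make it run a little faster than it otherwise would (possibly much faster, if you use a lot of vim plugins).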

Answer 4

ter*_*don 7

Here is an awk solution:

awk '{a[NR]=$0}END{for (i=0; i<1000; i++){for(k in a){print a[k]}}}' file

It is essentially as fast as @Gnuc's Perl (I ran each 1000 times and took the average time):

$ for i in {1..1000}; do 
 (time awk '{a[NR]=$0}END{for (i=0;i<1000;i++){for(k in a){print a[k]}}}' file > a) 2>&1 | 
    grep -oP 'real.*?m\K[\d\.]+'; done | awk '{k+=$1}END{print k/1000}'; 
0.00426

$ for i in {1..1000}; do 
  (time perl -0777pe '$_=$_ x 1000' file > a ) 2>&1 | 
    grep -oP 'real.*?m\K[\d\.]+'; done | awk '{k+=$1}END{print k/1000}'; 
0.004076
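
One caveat on the awk solution above: `for (k in a)` visits array indices in an order that awk leaves unspecified, so on some implementations the lines may not come out in their original order. The sketch below loops over line numbers instead. It is a variant under that assumption rather than the version that was timed, and the sample file and the count of 2 are hypothetical.

```shell
# Throwaway sample input (hypothetical data).
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/file"

# Store each line under its line number, then replay 1..NR in order, n times.
ordered=$(awk -v n=2 '{a[NR]=$0} END{for (i=0; i<n; i++) for (k=1; k<=NR; k++) print a[k]}' "$tmpdir/file" | tr '\n' ' ')

echo "$ordered"
rm -rf "$tmpdir"
```

Passing the count with -v n=... also avoids editing the script to change the number of repetitions.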

Answer 5

小智 5

Here is a simple one-liner with no scripting involved:

mycommand <(cat `yes input-data.txt | head -1000 | paste -s`)

Explanation

`yes input-data.txt | head -1000 | paste -s` produces the text input-data.txt 1000 times, separated by whitespace
The text is then passed to cat as a list of files
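
What the inner pipeline produces can be inspected with a smaller count. A minimal sketch, using 3 copies instead of 1000; the explicit `-` operand to paste is an addition for portability, since some paste implementations require a file operand.

```shell
# Three copies of the name instead of 1000, to keep the output readable.
file_list=$(yes input-data.txt | head -n 3 | paste -s -)

# paste -s joins the lines with tabs; show them as '|' to make them visible.
visible=$(printf '%s\n' "$file_list" | tr '\t' '|')
echo "$visible"

# Unquoted, the shell splits the list on whitespace into separate arguments.
set -- $file_list
arg_count=$#
echo "arguments: $arg_count"
```

Because the shell splits the list on whitespace, this approach assumes the filename itself contains no spaces.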
