The getmerge command in Hadoop data copy

Question


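My goal is to read all files in a directory whose names start with "trans", merge them into a single file, and then load that single file into an HDFS location.

My source directory is /user/cloudera/inputfiles/

Assume the directory above contains many files, but I only need the ones whose names start with "trans".

My target directory is /user/cloudera/transfiles/

So I tried the command below: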

hadoop dfs -getmerge /user/cloudera/inputfiles/trans* /user/cloudera/transfiles/records.txt


But the command above does not work.

If I try the command below, it works:

hadoop dfs -getmerge /user/cloudera/inputfiles /user/cloudera/transfiles/records.txt


Any suggestions on how to merge only some of the files from an HDFS location and store the merged single file in another HDFS location?

Answer 1

Ash*_*ish 5

Here is the usage of the getmerge command:

Usage: hdfs dfs -getmerge <src> <localdst> [addnl]

Takes a source directory and a destination file as input and 
concatenates files in src into the destination local file. 
Optionally addnl can be set to enable adding a newline character at the
end of each file.


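It expects a directory as the first argument, and it writes the merged output to a file on the local filesystem, not to HDFS.

You can try the cat command like this: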
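Since the usage text names the destination as a local file, a whole-directory merge that should end up in HDFS needs a second upload step. The following is a minimal sketch under assumptions: the helper name getmerge_to_hdfs and the HDFS_CMD override variable are hypothetical, invented for illustration, and `hadoop fs` is assumed as the default client. It merges every file in the directory, not only those starting with "trans".

```shell
#!/bin/sh
# Hypothetical helper: merge an HDFS directory into a local file, then upload it.
# HDFS_CMD is an override hook invented for this sketch; it defaults to "hadoop fs".
getmerge_to_hdfs() {
    src_dir=$1     # HDFS directory whose files are concatenated
    local_file=$2  # local path: getmerge writes to the local filesystem
    dst_file=$3    # HDFS path for the merged result
    # Upload only if the merge step succeeded.
    ${HDFS_CMD:-hadoop fs} -getmerge "$src_dir" "$local_file" &&
        ${HDFS_CMD:-hadoop fs} -put "$local_file" "$dst_file"
}

# Example call (the /tmp path is an assumption):
# getmerge_to_hdfs /user/cloudera/inputfiles /tmp/records.txt /user/cloudera/transfiles/records.txt
```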

hadoop dfs -cat /user/cloudera/inputfiles/trans* > /<local_fs_dir>/records.txt
hadoop dfs -copyFromLocal /<local_fs_dir>/records.txt /user/cloudera/transfiles/records.txt

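If the intermediate local file is not wanted, the same cat approach can be piped straight back into HDFS, because -put reads from standard input when its source is given as "-". This is only a sketch under assumptions: the function name merge_hdfs_glob and the HDFS_CMD override variable are hypothetical, and `hadoop fs` is assumed as the default client.

```shell
#!/bin/sh
# Hypothetical helper: concatenate all HDFS files matching a glob into one HDFS file.
# HDFS_CMD is an override hook invented for this sketch; it defaults to "hadoop fs".
merge_hdfs_glob() {
    src_glob=$1  # quoted by the caller so HDFS, not the local shell, expands the glob
    dst_file=$2  # HDFS path for the merged result
    # -cat streams the matching files to stdout; "-put -" writes stdin to dst_file.
    ${HDFS_CMD:-hadoop fs} -cat "$src_glob" | ${HDFS_CMD:-hadoop fs} -put - "$dst_file"
}

# Example call with the paths from the question:
# merge_hdfs_glob '/user/cloudera/inputfiles/trans*' /user/cloudera/transfiles/records.txt
```

Note that -put normally refuses to overwrite an existing destination, so the target file should not exist beforehand.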

Archived:	10 years, 8 months ago
Views:	16145
Last activity:	7 years, 8 months ago