How can I copy files as fast as possible?

joh*_*ohn 16 unix linux bash ubuntu scp

I am running a shell script on machineA that copies files from machineB or machineC to machineA.

If a file is not on machineB, then it should definitely be on machineC. So I first try to copy it from machineB, and if it is not there, I go to machineC for the same file.

machineB and machineC have folders named in YYYYMMDD format inside this directory:

/data/pe_t1_snapshot

Whichever folder in YYYYMMDD format has the latest date inside that directory is the one I pick as the full path from which to start copying files.

So if the latest date folder inside /data/pe_t1_snapshot is 20140317, then my full path will be:

/data/pe_t1_snapshot/20140317

That is where I need to start copying files from on machineB or machineC. I need to copy around 400 files to machineA, and each file is 1.5 GB in size.

Currently I have the shell script below, which works fine since I am using scp, but somehow it takes ~2 hours to copy the 400 files to machineA, which I think is far too long. :(

Here is my shell script:

#!/bin/bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9) # this will have more file numbers around 200
SECONDARY_PARTITION=(1 2 4 6 8) # this will have more file numbers around 200

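# find the newest YYYYMMDD snapshot directory on each filer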
dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)

echo $dir1
echo $dir2

if [ "$dir1" = "$dir2" ]
then
    # delete all the files first
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
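        # try machineB first; fall back to machineC if the file is missing there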
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.
    done

    # delete all the files first
    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.
    done
fi

I am copying the PRIMARY_PARTITION files into the PRIMARY folder and the SECONDARY_PARTITION files into the SECONDARY folder on machineA.

Is there any way to move the files to machineA faster? Can I copy 10 files at a time, or 5 files at a time, to speed this up, or is there any other approach?

Note: machineA is running on an SSD.

Update:

Here is the parallel shell script I tried; the top of the script is the same as shown above.

if [ "$dir1" = "$dir2" ] && [ "$length1" -gt 0 ] && [ "$length2" -gt 0 ]
then
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.) &
          WAITPID="$WAITPID $!"        
    done

    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.) &
          WAITPID="$WAITPID $!"        
    done
     wait $WAITPID
     echo "All files done copying."
fi

These are the errors I get with the parallel shell script:

channel 24: open failed: administratively prohibited: open failed
channel 25: open failed: administratively prohibited: open failed
channel 26: open failed: administratively prohibited: open failed
channel 28: open failed: administratively prohibited: open failed
channel 30: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 32: open failed: administratively prohibited: open failed
channel 36: open failed: administratively prohibited: open failed
channel 37: open failed: administratively prohibited: open failed
channel 38: open failed: administratively prohibited: open failed
channel 40: open failed: administratively prohibited: open failed
channel 46: open failed: administratively prohibited: open failed
channel 47: open failed: administratively prohibited: open failed
channel 49: open failed: administratively prohibited: open failed
channel 52: open failed: administratively prohibited: open failed
channel 54: open failed: administratively prohibited: open failed
channel 55: open failed: administratively prohibited: open failed
channel 56: open failed: administratively prohibited: open failed
channel 57: open failed: administratively prohibited: open failed
channel 59: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 61: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 64: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 68: open failed: administratively prohibited: open failed
channel 72: open failed: administratively prohibited: open failed
channel 74: open failed: administratively prohibited: open failed
channel 76: open failed: administratively prohibited: open failed
channel 78: open failed: administratively prohibited: open failed
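Those mux_client_request_session and "administratively prohibited" errors typically mean that too many sessions were multiplexed over the single ControlMaster connection and the server refused the extra ones (sshd's MaxSessions limit defaults to 10). A minimal sketch of throttling the loop to five concurrent copies at a time, reusing the variables and file-name pattern from the script above:

MAX_JOBS=5
count=0
for el in "${PRIMARY_PARTITION[@]}"
do
    # try machineB first, then machineC, in the background
    (scp david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. \
        || scp david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.) &
    count=$((count + 1))
    if [ "$count" -ge "$MAX_JOBS" ]
    then
        wait      # let the current batch of five finish before starting more
        count=0
    fi
done
wait              # wait for any remaining transfers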

ooh*_*ode 21

You can try this command:

rsync

From

man rsync

you will see: the rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package.
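As a rough sketch (not necessarily the answerer's exact intent), the per-file scp calls in the question could be swapped for rsync while keeping the same fallback logic; -a preserves attributes and --partial lets an interrupted 1.5 GB transfer resume instead of restarting:

for el in "${PRIMARY_PARTITION[@]}"
do
    rsync -a --partial david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data "$PRIMARY"/ \
        || rsync -a --partial david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data "$PRIMARY"/
done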

  • +1 for not reinventing the wheel (8 upvotes)

osg*_*sgx 7

You can try HPN-SSH (High Performance SSH/SCP) - http://www.psc.edu/index.php/hpn-ssh or http://hpnssh.sourceforge.net/

The HPN-SSH project is a set of patches for OpenSSH (scp is part of it) that better tunes various TCP and internal buffers. There is also a "none" cipher ("none cipher switching") that disables encryption, which may also help you (if you are not sending the data over a public network).

Both compression and encryption take CPU time; sometimes it is faster to push uncompressed files over 10 Gbit Ethernet than to wait for the CPU to compress and encrypt them.

You can profile your setup:

  • Measure the network bandwidth between the machines with iperf or netperf (see the sketch after this list). Compare it with what your actual network hardware (NIC capabilities, switches) should deliver; with a good setup you should get more than 80-90% of the advertised speed.
  • Using the speed reported by iperf or netperf, work out how much data you have and how long the network alone would need to move that much data. Is there a huge difference compared with your actual transfer time?
    • If your CPU is fast, the data is compressible and the network is slow, compression will help you.
  • Take a look at top, vmstat, iostat.
    • Is any CPU core loaded at 100% (run top and press 1 to see the cores)?
    • Are there too many interrupts (in) in vmstat 1? What about context switches (cs)?
    • What is the file read speed in iostat 1? Are your hard drives fast enough to read the data on the senders and write it on the receiver?
  • You can try whole-system profiling with perf top or perf record -a. Is scp or the Linux network stack doing a lot of the computation? If you can install dtrace or ktap, try off-cpu profiling as well.
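
A minimal sketch of the bandwidth check from the first bullet (assuming iperf is installed on both machines and machineB is reachable by that hostname):

# on machineB: run the iperf server
$ iperf -s

# on machineA: measure throughput to machineB for 30 seconds
$ iperf -c machineB -t 30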


hda*_*nte 6

You have 1.5 GB * 400 = 600 GB of data. Unrelated to the answer itself, I would suggest that the machine setup looks wrong if you need to transfer this amount of data; you probably should have generated the data on machineA in the first place.

Transferring 600 GB of data in 2 hours is a transfer rate of ~85 MB/s, which means you have probably reached the transfer limit of your disk drives or (almost) of the network. I believe you will not be able to transfer faster with any other command.

If the machines are close to each other, I think the fastest way to copy is to physically remove the storage from machines B and C, put it into machine A, and then copy locally without going over the network. The time for that is the time to move the drives around, plus the disk transfer time. I am afraid, however, that the copy will not be much faster than 85 MB/s.

The fastest network transfer command I can think of is netcat, because it has no encryption-related overhead. In addition, if the files are not media files, you should compress them with a compressor that compresses faster than 85 MB/s; I know lzop and lz4 can sustain more than that speed. So my command line to transfer a single directory is (BSD netcat syntax):

Machine A:

$ nc -l 2000 | lzop -d | tar x

Machine B or C (can be executed from machine A with the help of ssh):

$ tar c directory | lzop | nc machineA 2000

If you are transferring already-compressed media files, drop the compressor.
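
For example, both ends could be driven from machine A like this (a rough sketch; it assumes BSD netcat on both machines, that machineA is resolvable from machineB, and that port 2000 is open between them):

$ nc -l 2000 | lzop -d | tar xf - -C /export/home/david/dist/primary &
$ ssh david@machineB 'cd /data/pe_t1_snapshot/20140317 && tar cf - . | lzop | nc machineA 2000'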

The commands for organizing the directory structure are irrelevant to speed, so I will not write them here; you can reuse your own code for that.

This is the fastest method I can think of, but again, I don't believe it will be faster than what you already have.


小智 5

You definitely want to give rclone a try. This thing is blazingly fast:

sudo rclone sync /usr/home/fred/temp -P -L --transfers 64

Transferred:     17.929G / 17.929 GBytes, 100%, 165.692 MBytes/s, ETA 0s
Errors:          75 (retrying may help)
Checks:          691078 / 691078, 100%
Transferred:     345539 / 345539, 100%
Elapsed time:    1m50.8s

This is a local copy on a LITEONIT LCS-256 (256GB) SSD.
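
For the scenario in the question, rclone can also pull over SSH through its sftp backend; a rough sketch (the machineB remote would first have to be set up with rclone config, and the paths and file-name filter are taken from the question):

$ rclone copy machineB:/data/pe_t1_snapshot/20140317 /export/home/david/dist/primary \
      --include "t1_weekly_1680_*_200003_5.data" --transfers 10 -P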