Copying (appending) multiple files into a single destination file

MWa*_*ace 2 powershell merge cmd append copy-item

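I'm running into a strange problem when using PowerShell to merge multiple csv files into one. I've done this many times at the cmd prompt on Windows 7, but here the output contains only the earliest file. The command is the standard one: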

C:\> copy *.csv output.csv

As I said, all I get in the new file is the earliest csv and nothing else. Is this a difference between PowerShell and the plain cmd prompt?

Thanks, Michael

mkl*_*nt0 5

As noted by lit in the comments, in PowerShell copy is a built-in alias of the Copy-Item cmdlet, which functions differently from cmd.exe's internal copy command:

  • As of PowerShell 7.2.1, Copy-Item does not support merging multiple files into a single destination file. See the bottom section for a - potentially content-modifying - Get-Content solution.

  • Currently, if Copy-Item's -Destination argument (the second positional argument, output.csv in your case) is a file, all -Path arguments (the first positional argument, *.csv in your case) are sequentially copied to the same destination file - in other words: the last file that matches wildcard pattern *.csv "wins", and output.csv is simply a copy of it alone - see GitHub issue #12805 for a discussion.


To use cmd.exe's copy command, which merges the input files to form the destination file, call via cmd /c:

cmd /c 'copy /y /b *.csv output.csv'

Caveat: As discussed in aschipfl's helpful answer, how a preexisting output.csv file is handled depends on whether output.csv happens to be the first file matched by wildcard pattern *.csv or not. Either use the workaround proposed there, or simply ensure that no output.csv file is present beforehand.

Note the addition of:

  • /y, which suppresses a confirmation prompt if the destination file already exists

  • /b, which copies in binary mode; this prevents an "EOF character" (the Substitute character, 0x1a, which you can produce interactively with Ctrl-Z) from being appended to the destination file.

As an aside: on Unix-like platforms you could use sh -c 'cat *.csv > output.csv', but there you'd always have to first ensure that there's no preexisting output.csv file, as that would result in an endlessly growing file.


Alternatively, you may use the Get-Content cmdlet to merge multiple text files, as proposed by lit and refined by zett42 in the comments on the question, but doing so can change the character encoding and newline format, which may or may not be desired in a given use case:

# !! Caveat: may change character encoding and newline format.
# !! -Encoding utf8 used as an example.
Get-Content *.csv -Exclude output.csv | Set-Content -Encoding utf8 output.csv
  • Get-Content, assuming it interprets a text file's encoding correctly (based on a file's BOM, if present, and assuming a default otherwise), loads a file's lines into .NET strings, and the information about the file's character encoding is not preserved.

  • Similarly, file-writing cmdlets such as Out-File (and its effective alias >) and Set-Content operate on .NET input strings and use a default encoding when saving to a file - though a different encoding may be requested via the -Encoding parameter.

    • In other words: If your input file had a consistent, non-default encoding that you want to preserve in the destination file, you (a) need to know what that encoding is and (b) request its use via -Encoding.

    • Note: Windows PowerShell defaults to the system's legacy ANSI code page for Get-Content and Set-Content, and to UTF-16LE ("Unicode") for Out-File / >. By contrast, PowerShell (Core) 7+ now commendably uses (BOM-less) UTF-8, consistently across all cmdlets.

  • Additionally, because files are read line by line by Get-Content by default, information about the specific newline format is lost. The file-saving cmdlets then use the platform-native newline sequence (CRLF ("`r`n") on Windows, LF ("`n") on Unix-like platforms), so the destination file may end up with a different newline format. Also, the information as to whether a given file had a trailing newline is lost.

    • At the expense of having to read each file into memory in full (which normally isn't a problem with text files), you can preserve the original newline format and trailing-newline status by combining Get-Content -Raw with Set-Content -NoNewLine:

      Get-Content -Raw *.csv -Exclude output.csv | 
        Set-Content -Encoding utf8 -NoNewLine output.csv
      
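To make the point about preserving a known, non-default encoding concrete, here is a small self-contained sketch. It assumes, purely for illustration, that all input files are UTF-16LE ("Unicode"); the demo directory and the file names a.csv and b.csv are hypothetical. The same encoding is passed to both Get-Content and Set-Content, and -Raw / -NoNewLine keep the original newlines:

```powershell
# Hypothetical demo directory and sample files; adjust to your own data.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'merge-encoding-demo'
$null = New-Item -ItemType Directory -Force -Path $dir
Push-Location $dir
try {
  # Create two sample input files encoded as UTF-16LE with CRLF newlines.
  "a,b`r`n1,2`r`n" | Set-Content -NoNewLine -Encoding unicode a.csv
  "a,b`r`n3,4`r`n" | Set-Content -NoNewLine -Encoding unicode b.csv

  # Read each file in full with the known encoding and write the result
  # with the same encoding, without appending extra newlines.
  Get-Content -Raw -Encoding unicode *.csv -Exclude output.csv |
    Set-Content -NoNewLine -Encoding unicode output.csv
}
finally {
  Pop-Location
}
```

Note that the merged file still goes through decoding and re-encoding, so this only reproduces the input faithfully if the assumed encoding is actually the one used by every input file.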

As for use cases:

  • You can use Get-Content + Set-Content for one or more of the following scenarios:

    • If your input files are text files that use varying character encodings (all of which Get-Content needs to be able to recognize), so as to create a consistently encoded destination file.

    • Similarly, even if the input files have the same encoding, you can choose to transcode the content, i.e. to choose a different encoding for the destination file.

    • If you want to normalize the newline format to the platform-native format and possibly also to ensure the existence of a trailing newline.

  • Otherwise, if the input files' content must be preserved as-is - which is especially true for binary files - use the cmd /c 'copy ...' approach.

    • Solving this in PowerShell would require nontrivial use of lower-level .NET APIs.
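As a rough idea of what such a .NET-based approach might look like, here is a minimal sketch. The function name Join-FileBytes and its parameters are hypothetical (not part of PowerShell); it copies each input file's bytes unmodified into the destination via System.IO streams, so character encoding and newlines are never interpreted:

```powershell
# Hypothetical helper: concatenates files byte for byte, in the order given.
function Join-FileBytes {
  param(
    [Parameter(Mandatory)] [string[]] $LiteralPath,
    [Parameter(Mandatory)] [string] $Destination
  )
  # .NET's working directory usually differs from PowerShell's,
  # so resolve everything to full file-system paths first.
  $destFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)
  $outStream = [System.IO.File]::Create($destFull)  # creates or truncates
  try {
    foreach ($p in $LiteralPath) {
      $inStream = [System.IO.File]::OpenRead((Convert-Path -LiteralPath $p))
      try {
        $inStream.CopyTo($outStream)  # raw byte copy, no decoding
      }
      finally {
        $inStream.Dispose()
      }
    }
  }
  finally {
    $outStream.Dispose()
  }
}

# Possible usage (collect the inputs first, so that a preexisting
# output.csv is never read while it is being written):
#   $inputs = (Get-ChildItem -File -Filter *.csv |
#     Where-Object Name -ne 'output.csv' | Sort-Object Name).FullName
#   Join-FileBytes -LiteralPath $inputs -Destination output.csv
```

Because the input list is determined before the destination is created, this sketch sidesteps the question of whether output.csv itself matches the wildcard pattern; the sort order by name is an assumption you may want to change.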