Determining and changing filename encoding on Windows

nix*_*xer 5 windows encoding filesystems character-encoding smb

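I have some files on my Windows server with accented characters in their names. In Windows Explorer the files display fine, but running "dir" in a command prompt with default settings shows replacement characters.

For example, the character ö is displayed as o" in the listing. This causes problems when the files are accessed over SMB from other platforms, probably because of an encoding/code page conflict. Not all files have the problem, and I don't know where the problem files came from.

Example: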

E:\folder\files>dir
 Volume in drive E is data
 Volume Serial Number is 5841-C30E

 Directory of E:\folder\files

07/05/2016  07:46 PM    <DIR>          .
07/05/2016  07:46 PM    <DIR>          ..
12/01/2015  11:12 AM            14,105 file with o" character.xlsx
01/22/2015  05:30 PM            11,598 file with correct ö character.xlsx
               2 File(s)         25,703 bytes
               2 Dir(s)  2,727,491,600,384 bytes free

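I have changed the file and directory names, but you get the idea.

Do you know how these names came about? Maybe they were copied or created using another platform or tool?

How can I find and rename all the problem files in bulk? I looked at a few GUI renaming utilities, but they don't reveal the problem and only work with the names as displayed in Windows Explorer.

The filesystem on the drive is ReFS; could that have something to do with it?

EDIT: Running the PowerShell command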

Y:\test>powershell -c Get-ChildItem ^|ForEach-Object {$x=$_.Name; For ($i=0;$i
-lt $x.Length; $i++) {\"{0} {1} {2}\" -f $x,$x[$i],[int]$x[$i]}}
file with o¨ character.xlsx o 111
file with o¨ character.xlsx ¨ 776

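Cleaned up to show only the relevant part.

So it looks like the character is actually a combining diaeresis (code point 776, U+0308) rather than a vertical quotation mark. As far as I understand, that is what you would expect when Unicode normalization is involved.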

Jos*_*efZ 5

I can reproduce your issue using the following simple PowerShell script:

$RatedName = "šöü"                            # set sample string
$FormDName = $RatedName.Normalize("FormD")    # its Canonical Decomposition
$FormCName = $FormDName.Normalize("FormC")    #     followed by Canonical Composition
                                              # list each string character by character
($RatedName,$FormDName,$FormCName) | ForEach-Object {
    $charArr = [char[]]$_
    "$_"      # display string in new line for better readability
              # display each character together with its Unicode codepoint
    For( $i=0; $i -lt $charArr.Count; $i++ ) {
        $charInt = [int]$charArr[$i]
        # next "Try-Catch-Finally" code snippet adopted from my "Alt KeyCode Finder"
        #                                       http://superuser.com/a/1047961/376602
        Try {
            # Get-CharInfo module downloadable from http://poshcode.org/5234
            #        to add it into the current session: use Import-Module cmdlet
            $charInt | Get-CharInfo |% {
                $ChUCode = $_.CodePoint
                $ChCtgry = $_.Category
                $ChDescr = $_.Description
            }
        }
        Catch {
            $ChUCode = "U+{0:x4}" -f $charInt
            if ( $charInt -le 0x1F -or ($charInt -ge 0x7F -and $charInt -le 0x9F))
                 { $ChCtgry = "Control" } else { $ChCtgry = "" }
            $ChDescr = ""
        }
        Finally { $ChOut = $charArr[$i] }
        "{0} {1,-2} {2} {3,5} {4}" -f $i, $charArr[$i], $ChUCode, $charInt, $ChDescr
    }
}
# create sample files
$RatedName | Out-File "D:\test\1097217Rated$RatedName.txt" -Encoding utf8
$FormDName | Out-File "D:\test\1097217FormD$FormDName.txt" -Encoding utf8
$FormCName | Out-File "D:\test\1097217FormC$FormCName.txt" -Encoding utf8


""                                 # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        "       : file name is already normalized $_"
    }
}

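The script above has been updated in two ways: first, it displays more information about the composed and decomposed Unicode characters, i.e. their Unicode names (see the Get-CharInfo module); second, it embeds a very artless draft of a possible solution.

Output from the cmd prompt: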

==> powershell -c D:\PShell\SU\1097217.ps1
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis
šöü
0 s  U+0073   115 Latin Small Letter S
1 ̌  U+030C   780 Combining Caron
2 o  U+006F   111 Latin Small Letter O
3 ̈  U+0308   776 Combining Diaeresis
4 u  U+0075   117 Latin Small Letter U
5 ̈  U+0308   776 Combining Diaeresis
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis

       : file name is already normalized D:\test\1097217FormCšöü.txt
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217
FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt".
       : file name is already normalized D:\test\1097217Ratedšöü.txt

==> dir /b D:\test\1097217*
1097217FormCšöü.txt
1097217FormDšöü.txt
1097217Ratedšöü.txt

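In fact, the dir output above looks like 1097217FormDsˇo¨u¨.txt in the cmd window. My Unicode-aware browser composes the strings listed above, but a Unicode analyzer shows the real characters, as does the following image:

(Image: combining accents)

However, the next example shows the problem in full: the for loop changes the combining accents into ordinary (spacing) accents, so the file is no longer found: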

==> for /F "delims=" %G in ('dir /b /S D:\test\1097217*') do @echo %~nxG & dir /B %~fG
1097217FormCšöü.txt
1097217FormCšöü.txt
1097217FormDsˇo¨u¨.txt
File Not Found
1097217Ratedšöü.txt
1097217Ratedšöü.txt

==>

Here is a very artless draft of a possible solution (see the output above):

""                                 # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        "       : file name is already normalized $_"
    }
}

(ToDo: call Rename-Item only when necessary):

Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ($true) {                                         ### ToDo
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    }
}

And its output (again, the strings are rendered composed here; the image below shows how the cmd window really looks):

What if: Performing the operation "Rename File" on target "Item: D:\test\1097217
FormCšöü.txt Destination: D:\test\1097217FormCšöü.txt".
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217
FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt".
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217
Ratedšöü.txt Destination: D:\test\1097217Ratedšöü.txt".

(Image: combining accents)

(Image: updated cmd output)
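One possible way to fill in the ToDo, offered as a minimal sketch rather than a tested solution: instead of comparing string lengths, ask .NET whether the name is already in Normalization Form C via `String.IsNormalized`. The helper name `Get-NormalizedName` is hypothetical (it is not part of the original script), and the `D:\test` folder is simply the sample location used above. The check is ordinal; PowerShell's own `-eq`/`-ne` on strings is culture-aware and may treat canonically equivalent strings as equal, so it is avoided here.

```powershell
# Hypothetical helper (not from the original answer): returns the FormC name
# when the input is not yet normalized, and $null when no rename is needed.
function Get-NormalizedName {
    param([Parameter(Mandatory)][string]$Name)

    $formC = [System.Text.NormalizationForm]::FormC
    if ($Name.IsNormalized($formC)) { return $null }
    return $Name.Normalize($formC)
}

# Sample folder from the script above; skipped silently if it does not exist.
$folder = 'D:\test'
if (Test-Path -LiteralPath $folder) {
    Get-ChildItem -LiteralPath $folder | ForEach-Object {
        $newName = Get-NormalizedName -Name $_.Name
        if ($null -ne $newName) {
            # -WhatIf only reports the rename; remove it to actually rename.
            Rename-Item -LiteralPath $_.FullName -NewName $newName -WhatIf
        }
    }
}
```

Compared with the length test, this should also catch names whose normalized form has the same length, for example a name containing U+212B (ANGSTROM SIGN), which normalizes to the single character U+00C5.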


nix*_*xer 1

Based on JosefZ's script, here is a modified version that works recursively:

Get-ChildItem "X:\" -Recurse | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    $file = $_.Fullname
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -LiteralPath "$file" -NewName "$y" -WhatIf
        Write-Host "renamed file $file"
    }
}

Remove -WhatIf after testing. I also ran into problems with paths that were too long, but that is a topic for another post.
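A possible refinement, sketched under assumptions and not tested against a real ReFS volume: when directories themselves have decomposed names, renaming a parent before its children may leave the children's collected paths stale. The sketch below collects all candidates first and processes the longest paths first, so every item is handled before any of its ancestors. It also skips a rename when the normalized name already exists in the same folder. The function name `Get-NormalizationRenamePlan` is hypothetical, `X:\` is the drive from the script above, and the sketch does nothing about the long-path problem.

```powershell
# Hypothetical helper: list every item under $Root whose name is not in
# Normalization Form C, longest full path first (children before parents).
function Get-NormalizationRenamePlan {
    param([Parameter(Mandatory)][string]$Root)

    $formC = [System.Text.NormalizationForm]::FormC
    Get-ChildItem -LiteralPath $Root -Recurse |
        Where-Object { -not $_.Name.IsNormalized($formC) } |
        Sort-Object -Property { $_.FullName.Length } -Descending |
        ForEach-Object {
            [pscustomobject]@{
                Path    = $_.FullName
                NewName = $_.Name.Normalize($formC)
            }
        }
}

$root = 'X:\'   # drive used in the script above; adjust as needed
if (Test-Path -LiteralPath $root) {
    foreach ($entry in @(Get-NormalizationRenamePlan -Root $root)) {
        $parent = [System.IO.Path]::GetDirectoryName($entry.Path)
        $target = [System.IO.Path]::Combine($parent, $entry.NewName)
        if (Test-Path -LiteralPath $target) {
            # Both forms of the name exist side by side; leave this one alone.
            Write-Warning "Skipped '$($entry.Path)': '$target' already exists."
            continue
        }
        # -WhatIf only reports the rename; remove it after checking the output.
        Rename-Item -LiteralPath $entry.Path -NewName $entry.NewName -WhatIf
    }
}
```

Sorting by full path length relies on a child's full path always being longer than its parent's, which is enough to guarantee the children-first order without computing directory depth.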