nix*_*xer 5 windows encoding filesystems character-encoding smb
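I have some files on my Windows server whose names contain certain accented characters. Windows Explorer displays them correctly, but running "dir" at a command prompt with default settings shows substitute characters.

For example, the character ö is displayed as o" in the listing. This causes problems when the files are accessed over SMB from other platforms, probably because of an encoding/code page conflict. Not all files are affected, and I don't know where the problem files come from.

Example: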

E:\folder\files>dir
 Volume in drive E is data
 Volume Serial Number is 5841-C30E

 Directory of E:\folder\files

07/05/2016  07:46 PM    <DIR>          .
07/05/2016  07:46 PM    <DIR>          ..
12/01/2015  11:12 AM            14,105 file with o" character.xlsx
01/22/2015  05:30 PM            11,598 file with correct ö character.xlsx
               2 File(s)         25,703 bytes
               2 Dir(s)  2,727,491,600,384 bytes free

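I have changed the file and directory names, but you get the idea.

Do you know how these names came about? Were they perhaps copied or created with another platform or tool?

How can I find and rename all the problem files in bulk? I looked at several GUI rename utilities, but they don't detect the problem and only work with the names as shown in Windows Explorer.

The file system on the drive is ReFS. Could that have something to do with it?

Edit: running this PowerShell command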

Y:\test>powershell -c Get-ChildItem ^|ForEach-Object {$x=$_.Name; For ($i=0;$i -lt $x.Length; $i++) {\"{0} {1} {2}\" -f $x,$x[$i],[int]$x[$i]}}
file with o¨ character.xlsx o 111
file with o¨ character.xlsx ¨ 776

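(Output trimmed to show only the relevant part.)

So it looks like the character is actually a combining diaeresis (code 776, U+0308) and not a vertical quote. As far as I understand, that is what you would expect when Unicode normalization is involved.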
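
To make the composed/decomposed distinction concrete, here is a minimal sketch. It uses Python's standard `unicodedata` module rather than PowerShell, purely for illustration; the helper name `code_points` is made up for this example.

```python
import unicodedata

composed = "\u00f6"      # ö stored as one precomposed code point
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS (decimal 776)

def code_points(text):
    """Return (decimal code, Unicode name) for every character in text."""
    return [(ord(ch), unicodedata.name(ch)) for ch in text]

print(code_points(composed))
print(code_points(decomposed))

# NFC composes the pair into one code point; NFD splits it again.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Both strings render as the same glyph in a Unicode-aware viewer, but they are different sequences of code points, so they count as two different file names on a file system that does not normalize names.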
I can reproduce your problem with the following simple PowerShell script:

$RatedName = "šöü"                           # set sample string
$FormDName = $RatedName.Normalize("FormD")   # its Canonical Decomposition
$FormCName = $FormDName.Normalize("FormC")   # followed by Canonical Composition
# list each string character by character
($RatedName,$FormDName,$FormCName) | ForEach-Object {
    $charArr = [char[]]$_
    "$_"   # display string in new line for better readability
    # display each character together with its Unicode codepoint
    For( $i=0; $i -lt $charArr.Count; $i++ ) {
        $charInt = [int]$charArr[$i]
        # next "Try-Catch-Finally" code snippet adopted from my "Alt KeyCode Finder"
        # http://superuser.com/a/1047961/376602
        Try {
            # Get-CharInfo module downloadable from http://poshcode.org/5234
            # to add it into the current session: use Import-Module cmdlet
            $charInt | Get-CharInfo |% {
                $ChUCode = $_.CodePoint
                $ChCtgry = $_.Category
                $ChDescr = $_.Description
            }
        }
        Catch {
            $ChUCode = "U+{0:x4}" -f $charInt
            if ( $charInt -le 0x1F -or ($charInt -ge 0x7F -and $charInt -le 0x9F))
                { $ChCtgry = "Control" } else { $ChCtgry = "" }
            $ChDescr = ""
        }
        Finally { $ChOut = $charArr[$i] }
        "{0} {1,-2} {2} {3,5} {4}" -f $i, $charArr[$i], $ChUCode, $charInt, $ChDescr
    }
}
# create sample files
$RatedName | Out-File "D:\test\1097217Rated$RatedName.txt" -Encoding utf8
$FormDName | Out-File "D:\test\1097217FormD$FormDName.txt" -Encoding utf8
$FormCName | Out-File "D:\test\1097217FormC$FormCName.txt" -Encoding utf8


""   # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        " : file name is already normalized $_"
    }
}

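The script above has been updated in two ways. First, it displays more information about the composed/decomposed Unicode characters, namely their Unicode names (see the Get-CharInfo module). Second, it embeds a very artless draft of a possible solution.

Output from the cmd prompt: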

==> powershell -c D:\PShell\SU\1097217.ps1
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis
šöü
0 s  U+0073   115 Latin Small Letter S
1 ̌  U+030C   780 Combining Caron
2 o  U+006F   111 Latin Small Letter O
3 ̈  U+0308   776 Combining Diaeresis
4 u  U+0075   117 Latin Small Letter U
5 ̈  U+0308   776 Combining Diaeresis
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis

 : file name is already normalized D:\test\1097217FormCšöü.txt
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt".
 : file name is already normalized D:\test\1097217Ratedšöü.txt

==> dir /b D:\test\1097217*
1097217FormCšöü.txt
1097217FormDšöü.txt
1097217Ratedšöü.txt

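In fact, the dir output above looks like 1097217FormDsˇo¨u¨.txt in the cmd window. My Unicode-aware browser composes the strings as listed above, but a Unicode analyser shows the true characters, as did a screenshot in the original post.

However, the next example shows the problem in full: the for loop changes the combining accents to ordinary (spacing) ones: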

==> for /F "delims=" %G in ('dir /b /S D:\test\1097217*') do @echo %~nxG & dir /B %~fG
1097217FormCšöü.txt
1097217FormCšöü.txt
1097217FormDsˇo¨u¨.txt
File Not Found
1097217Ratedšöü.txt
1097217Ratedšöü.txt

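
To see why the second dir in the loop reports File Not Found, it can help to compare the name as stored with the name as cmd echoes it. This is only a sketch in Python for illustration. The code points U+02C7 and U+00A8 are read off the displayed output above, so treat them as an assumption about what cmd actually handed to the second command.

```python
import unicodedata

# Name as stored on disk: base letters followed by combining marks.
stored_name = "1097217FormDs\u030co\u0308u\u0308.txt"
# Name as echoed by the for loop: spacing accents (assumed from the output).
echoed_name = "1097217FormDs\u02c7o\u00a8u\u00a8.txt"

def non_ascii(text):
    """List the non-ASCII characters of text with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text if ord(ch) > 0x7F]

print(non_ascii(stored_name))
print(non_ascii(echoed_name))

# The two names are different strings, so looking up the echoed one fails.
print(stored_name == echoed_name)

# NFC repairs the stored name but leaves the echoed one as it is, because
# spacing accents do not compose with the preceding letter.
print(unicodedata.normalize("NFC", stored_name))
print(unicodedata.normalize("NFC", echoed_name) == echoed_name)
```

If that assumption holds, a name that has passed through cmd text output cannot be repaired by normalization, which is a reason to do the renaming from the real file names rather than from parsed dir output.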

Here is a very artless draft of a possible solution (see the output above):

""   # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        " : file name is already normalized $_"
    }
}

And a variant that attempts the rename for every file (ToDo: call Rename-Item only when necessary):

Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ($true) {   ### ToDo
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    }
}

and its output (again, the strings are rendered composed here; the cmd window itself shows the accents as separate characters):

What if: Performing the operation "Rename File" on target "Item: D:\test\1097217FormCšöü.txt Destination: D:\test\1097217FormCšöü.txt".
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt".
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217Ratedšöü.txt Destination: D:\test\1097217Ratedšöü.txt".

[Image: updated cmd output]
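
One possible way to fill in the ToDo is to compare the name with its normalized form directly, instead of comparing lengths. The sketch below shows the idea in Python for illustration; `nfc_target` is a made-up helper name. In PowerShell the same test could presumably be written with .NET's `String.IsNormalized`, but I have not tested that against the script above.

```python
import unicodedata

def nfc_target(name):
    """Return the NFC form of name, or None when no rename is needed."""
    normalized = unicodedata.normalize("NFC", name)
    return normalized if normalized != name else None

# Decomposed name: needs a rename to the composed form.
print(nfc_target("file with o\u0308 character.xlsx"))
# Already composed: nothing to do.
print(nfc_target("file with correct \u00f6 character.xlsx"))
# Same length before and after, so a length comparison would miss it:
# ANGSTROM SIGN (U+212B) normalizes to LATIN CAPITAL LETTER A WITH RING ABOVE.
print(nfc_target("\u212b.txt"))
```

Comparing the strings costs nothing extra and also catches the rarer cases where normalization replaces characters without changing the length.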
Based on JosefZ's script, here is a modified version that works recursively:
Get-ChildItem "X:\" -Recurse | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    $file = $_.Fullname
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -LiteralPath "$file" -NewName "$y" -WhatIf
        Write-Host "renamed file $file"
    }
}
Remove -WhatIf after testing. I ran into problems with paths that were too long, but that is a topic for another post.
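
For anyone doing the cleanup from another platform over SMB, here is a rough sketch of the same recursive idea in Python. It is not a port of the script above, just an illustration under some assumptions: the function names `plan_nfc_renames` and `apply_plan` are made up, the walk goes bottom-up so a folder is renamed only after its contents, and an existing target name is skipped rather than overwritten. Long paths are not handled.

```python
import os
import unicodedata

def plan_nfc_renames(root):
    """Return (old_path, new_path) pairs for entries whose names are not NFC.

    The walk is bottom-up, so applying the pairs in order renames a folder
    only after everything inside it has been dealt with.
    """
    plan = []
    for folder, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            target = unicodedata.normalize("NFC", name)
            if target == name:
                continue  # already normalized
            old_path = os.path.join(folder, name)
            new_path = os.path.join(folder, target)
            if os.path.lexists(new_path):
                # Assumed policy: never overwrite, leave it for manual review.
                print(f"skipped (target exists): {old_path!r}")
                continue
            plan.append((old_path, new_path))
    return plan

def apply_plan(plan, dry_run=True):
    """Print the planned renames, or perform them when dry_run is False."""
    for old_path, new_path in plan:
        if dry_run:
            print(f"would rename {old_path!r} -> {new_path!r}")
        else:
            os.rename(old_path, new_path)

if __name__ == "__main__":
    # "." is only a placeholder; point it at the share or folder to check.
    apply_plan(plan_nfc_renames("."), dry_run=True)
```

The dry run plays the same role as -WhatIf: review the printed list first, then call `apply_plan(plan, dry_run=False)`.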