How can I display Unicode character names and their hexadecimal codes with PowerShell?

Nem*_*XXX 3 unicode powershell surrogate-pairs emoji

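Because the default Windows PowerShell console font does not support emoji, I would like to display their surrogate-pair hex codes, and preferably their Unicode character names as well, for debugging.

I know how to convert an emoji to a byte array, but I have not figured out how to convert it to surrogate-pair hex codes and a Unicode character name.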

$ThumbsUp = "👍"
$Bytes = [system.Text.Encoding]::UTF8.GetBytes($ThumbsUp)
# output
#240
#159
#145
#141

What I need is the following output:

$Hex = 0x1F44D 
$CharName = "Thumbs Up Sign"

That is, the following command should convert the hex value back to the emoji:

[char]::ConvertFromUtf32($Hex)
# output
# 👍

zet*_*t42 5

Partial answer - I only know how to get the UTF-32 code point:

$ThumbsUp = "👍"
$utf32bytes = [System.Text.Encoding]::UTF32.GetBytes( $ThumbsUp )
# Pass start index 0 explicitly; Windows PowerShell has no single-argument overload
$codePoint = [System.BitConverter]::ToUInt32( $utf32bytes, 0 )
"0x{0:X}" -f $codePoint

Output:

0x1F44D
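As a complement, here is a minimal sketch of getting both the code point and the surrogate-pair hex codes without going through a byte array. It assumes the string holds exactly one emoji; the variable names `$hex` and `$surrogates` are only illustrative. The emoji is built from its code point so that the snippet does not depend on the script file's encoding.

```powershell
# Build the emoji from its code point (independent of file encoding).
$ThumbsUp = [char]::ConvertFromUtf32(0x1F44D)

# Code point of the character starting at index 0; a surrogate pair is combined.
$codePoint = [char]::ConvertToUtf32($ThumbsUp, 0)
$hex = '0x{0:X}' -f $codePoint

# A .NET string is a sequence of UTF-16 code units; format each one as hex.
$surrogates = foreach ($unit in $ThumbsUp.ToCharArray()) { '0x{0:X4}' -f [int]$unit }

$hex                     # 0x1F44D
$surrogates -join ','    # 0xD83D,0xDC4D
```

For a character inside the Basic Multilingual Plane the string has a single code unit, so `$surrogates` would hold just one value.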

For the character name, you can find an answer here: Finding out Unicode character name in .Net
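One possible approach to the name, sketched here under assumptions rather than taken from the linked answer, is to look the code point up in a local copy of UnicodeData.txt from unicode.org. `Get-UnicodeName` is a hypothetical helper name, and the two sample lines only imitate the semicolon-separated format of that file so that the snippet runs without a download. Characters that the file lists only as First/Last ranges (for example CJK ideographs) are not handled.

```powershell
# Hypothetical helper: find a character name in UnicodeData.txt-style lines.
function Get-UnicodeName {
    param(
        [Parameter(Mandatory)][int]$CodePoint,
        [Parameter(Mandatory)][string[]]$UnicodeDataLines
    )
    # Each line looks like "CODE;NAME;CATEGORY;..." with CODE as 4 to 6 hex digits.
    $prefix = '{0:X4};' -f $CodePoint
    foreach ($line in $UnicodeDataLines) {
        if ($line.StartsWith($prefix, [System.StringComparison]::Ordinal)) {
            return ($line -split ';')[1]
        }
    }
    return $null
}

# Sample lines in the UnicodeData.txt format; a real run would read the file, e.g.
# $lines = [System.IO.File]::ReadAllLines('.\UnicodeData.txt')   # assumed local path
$lines = @(
    '0072;LATIN SMALL LETTER R;Ll;0;L;;;;;N;;;0052;;0052'
    '1F44D;THUMBS UP SIGN;So;0;ON;;;;;N;;;;;'
)
Get-UnicodeName -CodePoint 0x1F44D -UnicodeDataLines $lines   # THUMBS UP SIGN
```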


Jos*_*efZ 5

Perhaps the following script (part of a broader project of mine) can help. The script defines a fairly complex Get-CharInfo function.


Example: 'r Ř',0x1F44D|chr -OutUni -OutHex -OutStr -IgnoreWhiteSpace

r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint                              Category Description
---- ---------                              -------- -----------
   r {U+0072, 0x72}                  LowercaseLetter Latin Small Letter R
   Ř {U+0158, 0xC5,0x98}             UppercaseLetter Latin Capital Letter R With Caron
  👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D}               So THUMBS UP SIGN (0xd83d,0xdc4d)
 #             ↑ UTF-8                               ↑ name          ↑ surrogates
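The surrogate values shown in parentheses follow from standard UTF-16 arithmetic. The sketch below works it through for U+1F44D; the variable names are only illustrative, and the formula applies only to code points above U+FFFF.

```powershell
$codePoint = 0x1F44D

# Supplementary code points are first offset by 0x10000, leaving a 20-bit value.
$offset = $codePoint - 0x10000

# The top 10 bits go into the high surrogate, the bottom 10 bits into the low one.
$high = 0xD800 + ($offset -shr 10)
$low  = 0xDC00 + ($offset -band 0x3FF)

'(0x{0:x4},0x{1:x4})' -f $high, $low    # (0xd83d,0xdc4d)

# Cross-check against .NET's own conversion.
$s = [char]::ConvertFromUtf32($codePoint)
$matches = ([int]$s[0] -eq $high) -and ([int]$s[1] -eq $low)
$matches                                 # True
```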

Code (comment-based help is at the end of the function body):

# Get-CharInfo function. Activate dot-sourced
# . .\_get-CharInfo_2.1.ps1
# Comment-based help at the end of the function body
# History notes at the very end of the script

if ( -not ('Microsofts.CharMap.UName' -as [type]) ) {
  Add-Type -Name UName -Namespace Microsofts.CharMap -MemberDefinition $(
    switch ("$([System.Environment]::SystemDirectory -replace
                '\\', '\\')\\getuname.dll") {
    {Test-Path -LiteralPath $_ -PathType Leaf} {@"
[DllImport("${_}", ExactSpelling=true, SetLastError=true)]
private static extern int GetUName(ushort wCharCode,
    [MarshalAs(UnmanagedType.LPWStr)] System.Text.StringBuilder buf);

public static string Get(char ch) {
    var sb = new System.Text.StringBuilder(300);
    UName.GetUName(ch, sb);
    return sb.ToString();
}
"@
    }
    default {'public static string Get(char ch) { return "???"; }'}
    })
}
function Get-CharInfo {
    [CmdletBinding()]
    [OutputType([System.Management.Automation.PSCustomObject],
                [System.Array])]
    param(
        # named or positional: a string or a number e.g. 'r Ř👍'
        # pipeline: an array of strings and numbers, e.g 'r Ř',0x1f44d
        [Parameter(Position=0, Mandatory, ValueFromPipeline)]
        $InputObject,
        # + Write-Host Python-like Unicode literal e.g. \u0072\u0020\u0158\U0001F44D
        [Parameter()]
        [switch]$OutUni,
        # + Write-Host array of hexadecimals e.g. 0x0072,0x0020,0x0158,0x0001F44D
        [Parameter()]
        [switch]$OutHex,
        # + Write-Host concatenated string e.g. r Ř👍
        [Parameter()]
        [switch]$OutStr,
        # choke down whitespaces ( $s -match '\s' ) from output
        [Parameter()]
        [switch]$IgnoreWhiteSpace,
        # from https://www.unicode.org/Public/UNIDATA/UnicodeData.txt
        [Parameter()]
        [string]$UnicodeData = 'D:\Utils\CodePages\UnicodeData.txt'
    )
    begin {
        Set-StrictMode -Version latest
        if ( [string]::IsNullOrEmpty( $UnicodeData) ) { $UnicodeData = '::' }
        Function ReadUnicodeRanges {
            if ($Script:UnicodeFirstLast.Count -eq 0) {
                $Script:UnicodeFirstLast = @'
                    First,Last,Category,Description
                    128,128,Cc-Control,Padding Character
                    129,129,Cc-Control,High Octet Preset
                    132,132,Cc-Control,Index
                    153,153,Cc-Control,Single Graphic Character Introducer
                    13312,19903,Lo-Other_Letter,CJK Ideograph Extension A
                    19968,40956,Lo-Other_Letter,CJK Ideograph
                    44032,55203,Lo-Other_Letter,Hangul Syllable
                    94208,100343,Lo-Other_Letter,Tangut Ideograph
                    101632,101640,Lo-Other_Letter,Tangut Ideograph Supplement
                    131072,173789,Lo-Other_Letter,CJK Ideograph Extension B
                    173824,177972,Lo-Other_Letter,CJK Ideograph Extension C
                    177984,178205,Lo-Other_Letter,CJK Ideograph Extension D
                    178208,183969,Lo-Other_Letter,CJK Ideograph Extension E
                    183984,191456,Lo-Other_Letter,CJK Ideograph Extension F
                    196608,201546,Lo-Other_Letter,CJK Ideograph Extension G
                    983040,1048573,Co-Private_Use,Plane 15 Private Use
                    1048576,1114109,Co-Private_Use,Plane 16 Private Use
'@ | ConvertFrom-Csv -Delimiter ',' |
                ForEach-Object {
                    [PSCustomObject]@{
                        First      = [int]$_.First
                        Last       = [int]$_.Last
                        Category   = $_.Category
                        Description= $_.Description
                    }
                }
            }
            foreach ( $FirstLast in $Script:UnicodeFirstLast) {
                if ( $FirstLast.First -le $ch -and $ch -le $FirstLast.Last ) {
                    $out.Category = $FirstLast.Category
                    $out.Description = $FirstLast.Description + $nil
                    break
                }
            }
        }
        $AuxHex = [System.Collections.ArrayList]::new()
        $AuxStr = [System.Collections.ArrayList]::new()
        $AuxUni = [System.Collections.ArrayList]::new()
        $Script:UnicodeFirstLast = @()
        $Script:UnicodeDataLines = @()
        function ReadUnicodeData {
            if ( $Script:UnicodeDataLines.Count -eq 0 -and (Test-Path $UnicodeData) ) {
                 $Script:UnicodeDataLines = @([System.IO.File]::ReadAllLines(
                        $UnicodeData, [System.Text.Encoding]::UTF8))
            }
            $DescrLine = $Script:UnicodeDataLines -match ('^{0:X4}\;' -f $ch)
            if ( $DescrLine.Count -gt 0) {
                $u0, $Descr, $Categ, $u3 = $DescrLine[0] -split ';'
                $out.Category = $Categ
                $out.Description = $Descr + $nil
            }
        }
        function out {
            param(
                [Parameter(Position=0, Mandatory=$true )] $ch,
                [Parameter(Position=1, Mandatory=$false)]$nil=''
                 )
            if (0 -le $ch -and 0xFFFF -ge $ch) {
                [void]$AuxHex.Add('0x{0:X4}' -f $ch)
                $s = [char]$ch
                [void]$AuxStr.Add($s)
                [void]$AuxUni.Add('\u{0:X4}' -f $ch)
                $out = [pscustomobject]@{
                    Char      = $s
                    CodePoint = ('U+{0:X4}' -f $ch),
                        (([System.Text.UTF32Encoding]::UTF8.GetBytes($s) |
                            ForEach-Object { '0x{0:X2}' -f $_ }) -join ',')
                    Category  = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
                    Description = [Microsofts.CharMap.UName]::Get($ch)
                }
                if ( $out.Description -eq 'Undefined' ) { ReadUnicodeRanges }
                if ( $out.Description -eq 'Undefined' ) { ReadUnicodeData }
            } elseif (0x10000 -le $ch -and 0x10FFFF -ge $ch) {
                [void]$AuxHex.Add('0x{0:X8}' -f $ch)
                $s = [char]::ConvertFromUtf32($ch)
                [void]$AuxStr.Add($s)
                [void]$AuxUni.Add('\U{0:X8}' -f $ch)
                $out = [pscustomobject]@{
                    Char        = $s
                    CodePoint   = ('U+{0:X}' -f $ch),
                        (([System.Text.UTF32Encoding]::UTF8.GetBytes($s) |
                            ForEach-Object { '0x{0:X2}' -f $_ }) -join ',')
                    Category    = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($s, 0)
                    Description = '???' + $nil
                }
                ReadUnicodeRanges
                if ( $out.Description -eq ('???' + $nil) ) { ReadUnicodeData }
            } else {
                Write-Warning ('Character U+{0:X4} is out of range' -f $ch)
                $s = $null
            }
            if (( $null -eq $s ) -or
                ( $IgnoreWhiteSpace.IsPresent -and ( $s -match '\s' ))
               ) {
            } else {
                $out
            }
        }
    }
    process {
        #if ($PSBoundParameters['Verbose']) {
        #    Write-Warning "InputObject $InputObject, type = $($InputObject.GetType().Name)"
        #}
        if ( ($InputObject -as [int]) -gt 0xFFFF -and
             ($InputObject -as [int]) -le 0x10ffff ) {
            $InputObject = [string][char]::ConvertFromUtf32($InputObject)
        }
        if ($null -cne ($InputObject -as [char])) {
            #Write-Verbose "A $([char]$InputObject) InputObject character"
            out $([int][char]$InputObject) ''
        } elseif (  $InputObject -isnot [string] -and
                    $null -cne ($InputObject -as [int])) {
            #Write-Verbose "B $InputObject InputObject"
            out $([int]$InputObject) ''
        } else {
            $InputObject = [string]$InputObject
            #Write-Verbose "C $InputObject InputObject.Length $($InputObject.Length)"
            for ($i = 0; $i -lt $InputObject.Length; ++$i) {
                if (  [char]::IsHighSurrogate($InputObject[$i]) -and
                      (1+$i) -lt $InputObject.Length -and
                      [char]::IsLowSurrogate($InputObject[$i+1])) {
                    $aux = ' (0x{0:x4},0x{1:x4})' -f [int]$InputObject[$i],
                                                   [int]$InputObject[$i+1]
                    # Write-Verbose "surrogate pair $aux at position $i"
                    out $([char]::ConvertToUtf32($InputObject[$i], $InputObject[1+$i])) $aux
                    $i++
                } else {
                    out $([int][char]$InputObject[$i]) ''
                }
            }
        }
    }
    end {
        if ( $OutStr.IsPresent -or $PSBoundParameters['Verbose']) {
            Write-Host -ForegroundColor Magenta -Object $($AuxStr -join '')
        }
        if ( $OutHex.IsPresent -or $PSBoundParameters['Verbose']) {
            Write-Host -ForegroundColor Cyan -Object $($AuxHex -join ',')
        }
        if ( $OutUni.IsPresent -or $PSBoundParameters['Verbose']) {
            Write-Host -ForegroundColor Yellow -Object $($AuxUni -join '')
        }
    }
<#
.SYNOPSIS
Return basic information about supplied Unicode characters.

.DESCRIPTION
Return information about supplied Unicode characters:
    - as a PSCustomObject for programming purposes,
    - in a human-readable form, and
    - with optional additional output to the Information Stream.

Properties of the output PSCustomObject are as follows:

Char        The character itself (if renderable)
CodePoint   [string[]]Unicode CodePoint, its UTF-8 byte sequence
Category    General Category (long name or abbreviation)
Description Name (and surrogate pair in parentheses if apply).

.INPUTS
    An array of characters, strings and numbers (in any combination)
    can be piped to the function as parameter $InputObject, e.g as
    "ΧАB",[char]4301,191,0x1F3DE | Get-CharInfo
    or (the same in terms of decimal numbers) as
    935,1040,66,4301,191,127966 | Get-CharInfo

    On the other side, the $InputObject parameter supplied named
    or positionally must be of the only base type: either a number
    or a character or a string.
    The same input as a string:
    Get-CharInfo -InputObject 'ΧАBჍ¿🏞'

    -Verbose implies all -OutUni, -OutHex and -OutStr

.OUTPUTS
    [System.Management.Automation.PSCustomObject]
    [Object[]]    (an array like [PSCustomObject[]])

.NOTES
    The UnicodeData.txt file (if used) must be saved locally
    from https://www.unicode.org/Public/UNIDATA/UnicodeData.txt
    (currently Unicode 13.0.0)

    The UnicodeData.txt file is not required however, in such case,
    Get-CharInfo function could be return inaccurate properties
    Category and Description for characters above BMP, see Example-3.

.LINK
    Unicode® Standard Annex #44: Unicode Character Database (UCD)
.LINK
    https://www.unicode.org/reports/tr44/
.LINK
    https://www.unicode.org/reports/tr44/#General_Category_Values

.EXAMPLE
# full (first three lines are in the Information Stream)
'r Ř👍'|Get-CharInfo -OutUni -OutHex -OutStr -IgnoreWhiteSpace

r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint                             Category Description
---- ---------                             -------- -----------
   r {U+0072, 0x72}                 LowercaseLetter Latin Small Letter R
   Ř {U+0158, 0xC5,0x98}            UppercaseLetter Latin Capital Letter R W...
  👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D}              So THUMBS UP SIGN (0xd83d,0...


.EXAMPLE
# shortened version of above (output is the same)
'r Ř👍'|chr -Verbose -IgnoreWhiteSpace

.EXAMPLE
# inaccurate (inexact) output above BMP if missing UnicodeData.txt
'r Ř👍'|chr -Verbose -IgnoreWhiteSpace -UnicodeData .\foo.bar

r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint                             Category Description
---- ---------                             -------- -----------
   r {U+0072, 0x72}                 LowercaseLetter Latin Small Letter R
   Ř {U+0158, 0xC5,0x98}            UppercaseLetter Latin Capital Letter R W...
  👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D}     OtherSymbol ??? (0xd83d,0xdc4d)


.FUNCTIONALITY
Tested: Windows 8.1/64bit, Powershell 4
        Windows 10 /64bit, Powershell 5
        Windows 10 /64bit, Powershell Core 6.2.0
        Windows 10 /64bit, Powershell Core 7.1.0
#>
}
Set-Alias -Name chr -Value Get-CharInfo

<#
HISTORY NOTES

Origin by: http://poshcode.org/5234
           http://fossil.include-once.org/poshcode/artifact/5757dbbd0bc26c84333e7cf4ccc330ab89447bf679e86ddd6fbd3589ca24027e

License: CC0
  https://creativecommons.org/publicdomain/zero/1.0/legalcode

Activate dot-sourced like this (apply a real path instead of .\):
. .\Get-CharInfo.ps1

Improved by: /sf/users/240758311/
             (to version 2)
#>