是否存在指定的（子）索引分隔符？

Question

是否存在指定的（子）索引分隔符？

iRo*_*Ron 2 .net indexing powershell hashtable

背景

在 PowerShell 中构建哈希表以通过特定属性快速访问对象是很常见的，例如基于以下内容建立索引LastName：

$List =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

$Index = @{}
$List |ForEach-Object { $Index[$_.LastName] = $_ }

$Index.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

Run Code Online (Sandbox Code Playgroud)

在某些情况下，需要在两个（甚至更多）属性上构建索引，例如 theFirstName和 the LastName。为此，您可以创建一个多维密钥，例如：

$Index = @{}
$List |ForEach-Object {
     $Index[$_.FirstName] = @{}
     $Index[$_.FirstName][$_.LastName] = $_
}

$Index.James.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

Run Code Online (Sandbox Code Playgroud)

但连接这两个属性更容易（甚至可能更快）。如果仅用于检查条目是否存在：如果条目不存在，$Index.ContainsKey('James').ContainsKey('Cook')则可能会发生错误。要连接属性，需要在属性之间使用分隔符，否则不同的属性列表可能最终会成为相同的键。正如这个例子：和。FirstName
AshlyBergAshLyberg

$Index = @{}
$List |ForEach-Object { $Index["$($_.FirstName)`t$($_.LastName)"] = $_ }

$Index."James`tCook"

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

Run Code Online (Sandbox Code Playgroud)

注意：以上是最小的、可重复的示例。在现实生活中，我多次遇到以下问题，其中通常包括连接背景和索引中使用的属性数量可变的对象。

问题：

对于这种情况，加入（串联）属性是一个好习惯吗？
如果是，是否有（标准？）分隔符？（意味着永远不应在属性名称中使用/存在的字符或字符序列）

Answer 1

mkl*_*nt0 5

多组件哈希表（字典）键没有内置分隔符。

至于自定义分隔符：对于组件本身中不太可能出现的字符，最好的选择是（带有代码点的NUL字符），您可以将其表示为在 PowerShell 中。然而，在每次查找时执行基于约定的字符串操作是很尴尬的（例如），并且通常只有在对关键组件进行字符串化可行的情况下才有效- 或者如果它们一开始都是字符串，如您的示例所示。0x0"`0"$Index."James`0Cook"

使用数组作为多组件键在语法上更可取，但使用集合通常不能按原样工作，因为 .NET引用类型通常不会有意义地比较不同的实例，即使它们碰巧表示相同的数据 - 请参阅此答案。

注意：以下假设充当键的集合元素确实进行有意义的比较（它们本身是字符串或 .NET值类型或具有自定义相等逻辑的 .NET 引用类型）。如果该假设不成立，则没有强大的通用解决方案，但链接答案中显示的基于 CLIXML 序列化的尽力而为的方法可能会起作用，这是您自己提出的。

zett42 的有用答案使用tuples，它确实对成员包含相同数据的不同实例执行有意义的比较。然而，需要为每个添加/修改/查找构造一个元组实例在语法上很尴尬（例如，
$Index.([Tuple]::Create('James', 'Cook'))

有一种方法可以使常规 PowerShell数组作为哈希表键工作，这种方式只会增加创建哈希表（调用构造函数）的复杂性，同时允许常规数组语法进行添加/更新和查找（例如，$Index.('James', 'Cook')）。

注意：以下内容与hashtables相同[ordered]，但是必须通过其真实类型名称来引用，以便能够调用构造，即[System.Collections.Specialized.OrderedDictionary].
但是，它不适用于通用词典( [System.Collections.Generic.Dictionary[TKey, TValue]])。

# Sample objects for the hashtable. $list = ConvertFrom-Csv @' Id, LastName, FirstName, Country 1, Aerts, Ronald, Belgium 2, Berg, Ashly, Germany 3, Cook, James, England 4, Duval, Frank, France 5, Lyberg, Ash, England 6, Fischer, Adam, Germany '@ # Initialize the hashtable with a structural equality comparer, i.e. # a comparer that compares the *elements* of the array and only returns $true # if *all* compare equal. # This relies on the fact that [System.Array] implements the # [System.Collections.IStructuralEquatable] interface. $dict = [hashtable]::new([Collections.StructuralComparisons]::StructuralEqualityComparer) # Add entries that map the combination of first name and last name # to each object in $list. # Note the regular array syntax. $list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ }) # Use regular array syntax for lookups too. # Note: CASE MATTERS $dict.('James', 'Cook')
Run Code Online (Sandbox Code Playgroud)
重要提示：与常规 PowerShell 哈希表不同，上面执行区分大小写的比较（如 zett42 的元组解决方案所做的那样）。

进行不区分大小写的比较需要更多工作[System.Collections.IEqualityComparer]，因为需要接口的自定义实现，即提供以下内容的不区分大小写的实现：[System.Collections.StructuralComparisons]::StructuralEqualityComparer

# Case-insensitive IEqualityComparer implementation for # use of arrays as dictionary keys. # Note: Dictionary keys cannot be $null, so there is no need for $null checks. class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer { [bool] Equals([object] $o1, [object] $o2) { return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase) } [int] GetHashCode([object] $o) { return ([System.Collections.IStructuralEquatable] [array] $o).GetHashCode([StringComparer]::InvariantCultureIgnoreCase) } } # Pass an instance of the custom equality comparer to the constructor. $dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
Run Code Online (Sandbox Code Playgroud)
笔记：

Santiago Squarzon发现了([System.Collections.IStructuralEquatable] $o).GetHashCode([StringComparer]::InvariantCultureIgnoreCase)一种内置方法，可以根据数组元素的不区分大小写的哈希码来获取数组的哈希码。

下面的原始解决方案逐个元素计算数组的不区分大小写的哈希码，这既比较麻烦又效率较低。也许他们仍然对哈希码的计算方式感兴趣。

可选阅读：逐元素哈希码实现：

# Case-insensitive IEqualityComparer implementation for arrays. # See the bottom section of this answer for a better .NET 7+ alternative. class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer { [bool] Equals([object] $o1, [object] $o2) { return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase) } [int] GetHashCode([object] $o) { if ($o -isnot [Array]) { return $o.GetHashCode() } [int] $hashCode = 0 foreach ($el in $o) { if ($null -eq $el) { continue } elseif ($el -is [string]) { $hashCode = $hashCode -bxor $el.ToLowerInvariant().GetHashCode() } else { $hashCode = $hashCode -bxor $el.GetHashCode() } } return $hashCode } } $list = ConvertFrom-Csv @' Id, LastName, FirstName, Country 1, Aerts, Ronald, Belgium 2, Berg, Ashly, Germany 3, Cook, James, England 4, Duval, Frank, France 5, Lyberg, Ash, England 6, Fischer, Adam, Germany '@ # Pass an instance of the custom equality comparer to the constructor. $dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new()) $list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ }) # Now, case does NOT matter. $dict.('james', 'cook')
Run Code Online (Sandbox Code Playgroud)

关于上面自定义比较器类中的.GetHashCode()实现的注释：

.GetHashCode()需要自定义实现来为所有比较相等的对象返回相同的哈希码（一个值）（即，如果是，并且必须返回相同的值）。[int]$o1 -eq $o2$true$o1.GetHashCode()$o2.GetHashCode()

虽然哈希码不需要是唯一的（并且不能在所有情况下都是唯一的），但理想情况下，共享相同哈希码的对象尽可能少，因为这会减少所谓的冲突数量，从而降低哈希表的查找效率- 有关背景信息，请参阅相关的维基百科文章。

上面的实现使用相当简单的-bxor基于（按位异或）的算法，这会为具有相同元素但顺序不同的两个数组生成相同的哈希码。

帮助.GetHashCode()主题显示了更复杂的方法，包括使用辅助元组实例，因为它的哈希码算法是顺序感知的 - 虽然简单，但这种方法的计算成本很高，并且需要更多的工作才能获得更好的性能。请参阅底部部分了解 .NET 7+ 选项。

zett42的碰撞测试代码（改编），它确定 1000 个具有给定数量的元素（随机字符串值）的数组中有多少个导致相同的哈希码，即产生碰撞，并从中计算碰撞百分比。如果您需要提高上述实现的效率，您可以使用此代码来测试它（也可能测量测试的运行时间以查看不同实现的比较）。

# Create an instance of the custom comparer defined above. $cmp = [CaseInsensitiveArrayEqualityComparer]::new() $numArrays = 1000 foreach ($elementCount in 2..5 + 10) { $numUniqueHashes = ( 1..$numArrays | ForEach-Object { $cmp.GetHashCode(@(1..$elementCount | ForEach-Object { "$(New-Guid)" })) } | Sort-Object -Unique ).Count [pscustomobject] @{ ElementCount = $elementCount CollisionPercentage = '{0:P2}' -f (($numArrays - $numUniqueHashes) / $numArrays) } }
Run Code Online (Sandbox Code Playgroud)
所有测试的 about 输出 0%，因此该-bxor方法似乎足以防止冲突，至少与随机字符串发生冲突，并且不包括仅元素顺序不同的数组变体。请继续阅读，了解卓越的 .NET 7+ 解决方案。

.NET 7+ 中的高级自定义相等比较器实现（至少需要 PowerShell 7.3 的预览版本）：

zett42 指出[HashCode]::Combine()，在 .NET 7+ 中可用，可以实现更高效的实现，因为它：

具有订单意识

允许在单个操作中确定多个值的哈希码。

笔记：

该方法仅限于最多8数组元素- 但对于多组件来说应该足够了。

要组合的值（在本例中为数组元素）必须作为单独的参数传递给方法 - 将数组作为一个整体传递并不能按预期工作。这使得实现起来有些麻烦。

# .NET 7+ / PowerShell 7.3+ # Case-insensitive IEqualityComparer implementation for arrays # using [HashCode]::Combine() - limited to 8 elements. class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer { [bool] Equals([object] $o1, [object] $o2) { return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase) } [int] GetHashCode([object] $o) { if ($o -isnot [Array] -or 0 -eq $o.Count) { return $o.GetHashCode() } $o = $o.ForEach({ $_ -is [string] ? $_.ToLowerInvariant() : $_ }) $hashCode = switch ($o.Count) { 1 { [HashCode]::Combine($o[0]) } 2 { [HashCode]::Combine($o[0], $o[1]) } 3 { [HashCode]::Combine($o[0], $o[1], $o[2]) } 4 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3]) } 5 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4]) } 6 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5]) } 7 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6]) } 8 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6], $o[7]) } default { throw 'Not implemented for more than 8 array elements.' } } return $hashCode } }
Run Code Online (Sandbox Code Playgroud)
然而，正如 zett42 指出的那样，您可以通过在循环中迭代调用来克服值计数限制[HashCode]::Combine() 。

在不区分大小写的实现的情况下，考虑到您无论如何都需要一个循环，即为了调用.ToLowerInvariant()类型[string]值（这就是.ForEach()上面的调用隐式执行的操作），这并不是太多的开销。

这是他的实现：

# .NET 7+ / PowerShell 7.3+ # Case-insensitive IEqualityComparer implementation for arrays # using [HashCode]::Combine() *iteratively*, with *no* element-count limit. class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer { [bool] Equals([object] $o1, [object] $o2) { return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase) } [int] GetHashCode([object] $o) { if ($o -isnot [Array] -or 0 -eq $o.Count) { return $o.GetHashCode() } $hashCode = 0 $o.ForEach({ $value = $_ -is [string] ? $_.ToLowerInvariant() : $_ $hashCode = [HashCode]::Combine( $hashCode, $value ) }) return $hashCode } }
Run Code Online (Sandbox Code Playgroud)

谢谢，@圣地亚哥。答案很简单：因为我不知道:)干得好。请查看我的更新。 (2认同)

归档时间：	3 年，4 月前
查看次数：	291 次
最近记录：	2 年，10 月前