Jul*_*iet 20 c# performance treap data-structures
现在我总是听说二进制搜索树比随机选择的数据更快地构建,而不是有序数据,因为有序数据需要显式重新平衡以保持树高最小.
最近我实现了一个不可变的treap,一种特殊的二叉搜索树,它使用随机化来保持自己相对平衡.与我的预期相反,我发现我可以持续建立一个快速约2倍的treap,并且通常比有序数据更好地平衡 - 而且我不知道为什么.
这是我的treap实现:
这是一个测试程序:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;
namespace ConsoleApplication1
{
class Program
{
static Random rnd = new Random();
const int ITERATION_COUNT = 20;
static void Main(string[] args)
{
List<double> rndTimes = new List<double>();
List<double> orderedTimes = new List<double>();
rndTimes.Add(TimeIt(50, RandomInsert));
rndTimes.Add(TimeIt(100, RandomInsert));
rndTimes.Add(TimeIt(200, RandomInsert));
rndTimes.Add(TimeIt(400, RandomInsert));
rndTimes.Add(TimeIt(800, RandomInsert));
rndTimes.Add(TimeIt(1000, RandomInsert));
rndTimes.Add(TimeIt(2000, RandomInsert));
rndTimes.Add(TimeIt(4000, RandomInsert));
rndTimes.Add(TimeIt(8000, RandomInsert));
rndTimes.Add(TimeIt(16000, RandomInsert));
rndTimes.Add(TimeIt(32000, RandomInsert));
rndTimes.Add(TimeIt(64000, RandomInsert));
rndTimes.Add(TimeIt(128000, RandomInsert));
string rndTimesAsString = string.Join("\n", rndTimes.Select(x => x.ToString()).ToArray());
orderedTimes.Add(TimeIt(50, OrderedInsert));
orderedTimes.Add(TimeIt(100, OrderedInsert));
orderedTimes.Add(TimeIt(200, OrderedInsert));
orderedTimes.Add(TimeIt(400, OrderedInsert));
orderedTimes.Add(TimeIt(800, OrderedInsert));
orderedTimes.Add(TimeIt(1000, OrderedInsert));
orderedTimes.Add(TimeIt(2000, OrderedInsert));
orderedTimes.Add(TimeIt(4000, OrderedInsert));
orderedTimes.Add(TimeIt(8000, OrderedInsert));
orderedTimes.Add(TimeIt(16000, OrderedInsert));
orderedTimes.Add(TimeIt(32000, OrderedInsert));
orderedTimes.Add(TimeIt(64000, OrderedInsert));
orderedTimes.Add(TimeIt(128000, OrderedInsert));
string orderedTimesAsString = string.Join("\n", orderedTimes.Select(x => x.ToString()).ToArray());
Console.WriteLine("Done");
}
static double TimeIt(int insertCount, Action<int> f)
{
Console.WriteLine("TimeIt({0}, {1})", insertCount, f.Method.Name);
List<double> times = new List<double>();
for (int i = 0; i < ITERATION_COUNT; i++)
{
Stopwatch sw = Stopwatch.StartNew();
f(insertCount);
sw.Stop();
times.Add(sw.Elapsed.TotalMilliseconds);
}
return times.Average();
}
static void RandomInsert(int insertCount)
{
Treap<double> tree = new Treap<double>((x, y) => x.CompareTo(y));
for (int i = 0; i < insertCount; i++)
{
tree = tree.Insert(rnd.NextDouble());
}
}
static void OrderedInsert(int insertCount)
{
Treap<double> tree = new Treap<double>((x, y) => x.CompareTo(y));
for(int i = 0; i < insertCount; i++)
{
tree = tree.Insert(i + rnd.NextDouble());
}
}
}
}
Run Code Online (Sandbox Code Playgroud)
这是一个比较随机和有序插入时间的图表,以毫秒为单位:
Insertions Random Ordered RandomTime / OrderedTime
50 1.031665 0.261585 3.94
100 0.544345 1.377155 0.4
200 1.268320 0.734570 1.73
400 2.765555 1.639150 1.69
800 6.089700 3.558350 1.71
1000 7.855150 4.704190 1.67
2000 17.852000 12.554065 1.42
4000 40.157340 22.474445 1.79
8000 88.375430 48.364265 1.83
16000 197.524000 109.082200 1.81
32000 459.277050 238.154405 1.93
64000 1055.508875 512.020310 2.06
128000 2481.694230 1107.980425 2.24
Run Code Online (Sandbox Code Playgroud)
我没有在代码中看到任何使有序输入渐近比无序输入更快的输入,所以我无法解释其中的差异.
为什么从有序输入构建treap比随机输入快得多?
Gug*_*uge 10
我运行了你的代码,我认为它与旋转次数有关.在有序输入期间,旋转次数是最佳的,并且树将永远不必旋转.
在随机输入期间,树将必须执行更多旋转,因为它可能必须来回旋转.
要真正找到答案,我必须为每次运行添加左右旋转数字的计数器.你可以自己做.
更新:
我在rotateleft和rotateright上设置了断点.在有序输入期间,从不使用rotateright.在随机输入期间,两者都被击中,在我看来它们被更频繁地使用.
更新2:
我在50项订购运行中添加了一些输出(为了清晰起见,用整数代替),以了解更多信息:
TimeIt(50, OrderedInsert)
LastValue = 0, Top.Value = 0, Right.Count = 0, Left.Count = 0
RotateLeft @value=0
LastValue = 1, Top.Value = 1, Right.Count = 0, Left.Count = 1
LastValue = 2, Top.Value = 1, Right.Count = 1, Left.Count = 1
LastValue = 3, Top.Value = 1, Right.Count = 2, Left.Count = 1
RotateLeft @value=3
RotateLeft @value=2
RotateLeft @value=1
LastValue = 4, Top.Value = 4, Right.Count = 0, Left.Count = 4
LastValue = 5, Top.Value = 4, Right.Count = 1, Left.Count = 4
LastValue = 6, Top.Value = 4, Right.Count = 2, Left.Count = 4
RotateLeft @value=6
LastValue = 7, Top.Value = 4, Right.Count = 3, Left.Count = 4
LastValue = 8, Top.Value = 4, Right.Count = 4, Left.Count = 4
RotateLeft @value=8
RotateLeft @value=7
LastValue = 9, Top.Value = 4, Right.Count = 5, Left.Count = 4
LastValue = 10, Top.Value = 4, Right.Count = 6, Left.Count = 4
RotateLeft @value=10
RotateLeft @value=9
RotateLeft @value=5
RotateLeft @value=4
LastValue = 11, Top.Value = 11, Right.Count = 0, Left.Count = 11
LastValue = 12, Top.Value = 11, Right.Count = 1, Left.Count = 11
RotateLeft @value=12
LastValue = 13, Top.Value = 11, Right.Count = 2, Left.Count = 11
RotateLeft @value=13
LastValue = 14, Top.Value = 11, Right.Count = 3, Left.Count = 11
LastValue = 15, Top.Value = 11, Right.Count = 4, Left.Count = 11
RotateLeft @value=15
RotateLeft @value=14
LastValue = 16, Top.Value = 11, Right.Count = 5, Left.Count = 11
LastValue = 17, Top.Value = 11, Right.Count = 6, Left.Count = 11
RotateLeft @value=17
LastValue = 18, Top.Value = 11, Right.Count = 7, Left.Count = 11
LastValue = 19, Top.Value = 11, Right.Count = 8, Left.Count = 11
RotateLeft @value=19
LastValue = 20, Top.Value = 11, Right.Count = 9, Left.Count = 11
LastValue = 21, Top.Value = 11, Right.Count = 10, Left.Count = 11
RotateLeft @value=21
LastValue = 22, Top.Value = 11, Right.Count = 11, Left.Count = 11
RotateLeft @value=22
RotateLeft @value=20
RotateLeft @value=18
LastValue = 23, Top.Value = 11, Right.Count = 12, Left.Count = 11
LastValue = 24, Top.Value = 11, Right.Count = 13, Left.Count = 11
LastValue = 25, Top.Value = 11, Right.Count = 14, Left.Count = 11
RotateLeft @value=25
RotateLeft @value=24
LastValue = 26, Top.Value = 11, Right.Count = 15, Left.Count = 11
LastValue = 27, Top.Value = 11, Right.Count = 16, Left.Count = 11
RotateLeft @value=27
LastValue = 28, Top.Value = 11, Right.Count = 17, Left.Count = 11
RotateLeft @value=28
RotateLeft @value=26
RotateLeft @value=23
RotateLeft @value=16
RotateLeft @value=11
LastValue = 29, Top.Value = 29, Right.Count = 0, Left.Count = 29
LastValue = 30, Top.Value = 29, Right.Count = 1, Left.Count = 29
LastValue = 31, Top.Value = 29, Right.Count = 2, Left.Count = 29
LastValue = 32, Top.Value = 29, Right.Count = 3, Left.Count = 29
RotateLeft @value=32
RotateLeft @value=31
LastValue = 33, Top.Value = 29, Right.Count = 4, Left.Count = 29
RotateLeft @value=33
RotateLeft @value=30
LastValue = 34, Top.Value = 29, Right.Count = 5, Left.Count = 29
RotateLeft @value=34
LastValue = 35, Top.Value = 29, Right.Count = 6, Left.Count = 29
LastValue = 36, Top.Value = 29, Right.Count = 7, Left.Count = 29
LastValue = 37, Top.Value = 29, Right.Count = 8, Left.Count = 29
RotateLeft @value=37
LastValue = 38, Top.Value = 29, Right.Count = 9, Left.Count = 29
LastValue = 39, Top.Value = 29, Right.Count = 10, Left.Count = 29
RotateLeft @value=39
LastValue = 40, Top.Value = 29, Right.Count = 11, Left.Count = 29
RotateLeft @value=40
RotateLeft @value=38
RotateLeft @value=36
LastValue = 41, Top.Value = 29, Right.Count = 12, Left.Count = 29
LastValue = 42, Top.Value = 29, Right.Count = 13, Left.Count = 29
RotateLeft @value=42
LastValue = 43, Top.Value = 29, Right.Count = 14, Left.Count = 29
LastValue = 44, Top.Value = 29, Right.Count = 15, Left.Count = 29
RotateLeft @value=44
LastValue = 45, Top.Value = 29, Right.Count = 16, Left.Count = 29
LastValue = 46, Top.Value = 29, Right.Count = 17, Left.Count = 29
RotateLeft @value=46
RotateLeft @value=45
LastValue = 47, Top.Value = 29, Right.Count = 18, Left.Count = 29
LastValue = 48, Top.Value = 29, Right.Count = 19, Left.Count = 29
LastValue = 49, Top.Value = 29, Right.Count = 20, Left.Count = 29
Run Code Online (Sandbox Code Playgroud)
有序的项目总是自然地添加到树的右侧.当右侧比左侧大时,会发生旋转左侧.Rotateright永远不会发生.每次树加倍时,大致选择一个新的顶级节点.优先级值的随机性会稍微抖动它,因此在此次运行中它会变为0,1,4,11,29.
随机运行揭示了一些有趣的事:
TimeIt(50, RandomInsert)
LastValue = 0,748661640914465, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 0
LastValue = 0,669427539533669, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 1
RotateRight @value=0,669427539533669
LastValue = 0,318363281115127, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 2
RotateRight @value=0,669427539533669
LastValue = 0,33133987678743, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 3
RotateLeft @value=0,748661640914465
LastValue = 0,955126694382693, Top.Value = 0,955126694382693, Right.Count = 0, Left.Count = 4
RotateRight @value=0,669427539533669
RotateLeft @value=0,33133987678743
RotateLeft @value=0,318363281115127
RotateRight @value=0,748661640914465
RotateRight @value=0,955126694382693
LastValue = 0,641024029180884, Top.Value = 0,641024029180884, Right.Count = 3, Left.Count = 2
LastValue = 0,20709771951991, Top.Value = 0,641024029180884, Right.Count = 3, Left.Count = 3
LastValue = 0,830862050331599, Top.Value = 0,641024029180884, Right.Count = 4, Left.Count = 3
RotateRight @value=0,20709771951991
RotateRight @value=0,318363281115127
LastValue = 0,203250563798123, Top.Value = 0,641024029180884, Right.Count = 4, Left.Count = 4
RotateLeft @value=0,669427539533669
RotateRight @value=0,748661640914465
RotateRight @value=0,955126694382693
LastValue = 0,701743399585478, Top.Value = 0,641024029180884, Right.Count = 5, Left.Count = 4
RotateLeft @value=0,669427539533669
RotateRight @value=0,701743399585478
RotateLeft @value=0,641024029180884
LastValue = 0,675667521858433, Top.Value = 0,675667521858433, Right.Count = 4, Left.Count = 6
RotateLeft @value=0,33133987678743
RotateLeft @value=0,318363281115127
RotateLeft @value=0,203250563798123
LastValue = 0,531275219531392, Top.Value = 0,675667521858433, Right.Count = 4, Left.Count = 7
RotateRight @value=0,748661640914465
RotateRight @value=0,955126694382693
RotateLeft @value=0,701743399585478
LastValue = 0,704049674190604, Top.Value = 0,675667521858433, Right.Count = 5, Left.Count = 7
RotateRight @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
LastValue = 0,161392807104342, Top.Value = 0,161392807104342, Right.Count = 13, Left.Count = 0
RotateRight @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateLeft @value=0,161392807104342
LastValue = 0,167598206162266, Top.Value = 0,167598206162266, Right.Count = 13, Left.Count = 1
LastValue = 0,154996359793002, Top.Value = 0,167598206162266, Right.Count = 13, Left.Count = 2
RotateLeft @value=0,33133987678743
LastValue = 0,431767346538495, Top.Value = 0,167598206162266, Right.Count = 14, Left.Count = 2
RotateRight @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateLeft @value=0,167598206162266
LastValue = 0,173774613614089, Top.Value = 0,173774613614089, Right.Count = 14, Left.Count = 3
RotateRight @value=0,830862050331599
LastValue = 0,76559642412029, Top.Value = 0,173774613614089, Right.Count = 15, Left.Count = 3
RotateRight @value=0,76559642412029
RotateLeft @value=0,748661640914465
RotateRight @value=0,955126694382693
RotateLeft @value=0,704049674190604
RotateLeft @value=0,675667521858433
LastValue = 0,75742144871383, Top.Value = 0,173774613614089, Right.Count = 16, Left.Count = 3
LastValue = 0,346844367844446, Top.Value = 0,173774613614089, Right.Count = 17, Left.Count = 3
RotateRight @value=0,830862050331599
LastValue = 0,787565814232251, Top.Value = 0,173774613614089, Right.Count = 18, Left.Count = 3
LastValue = 0,734950566540915, Top.Value = 0,173774613614089, Right.Count = 19, Left.Count = 3
RotateLeft @value=0,20709771951991
RotateRight @value=0,318363281115127
RotateLeft @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateRight @value=0,75742144871383
RotateLeft @value=0,173774613614089
LastValue = 0,236504829598826, Top.Value = 0,236504829598826, Right.Count = 17, Left.Count = 6
RotateLeft @value=0,830862050331599
RotateLeft @value=0,787565814232251
RotateLeft @value=0,76559642412029
RotateRight @value=0,955126694382693
LastValue = 0,895606500048007, Top.Value = 0,236504829598826, Right.Count = 18, Left.Count = 6
LastValue = 0,599106418713511, Top.Value = 0,236504829598826, Right.Count = 19, Left.Count = 6
LastValue = 0,8182332901369, Top.Value = 0,236504829598826, Right.Count = 20, Left.Count = 6
RotateRight @value=0,734950566540915
LastValue = 0,704216948572647, Top.Value = 0,236504829598826, Right.Count = 21, Left.Count = 6
RotateLeft @value=0,346844367844446
RotateLeft @value=0,33133987678743
RotateRight @value=0,431767346538495
RotateLeft @value=0,318363281115127
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateRight @value=0,75742144871383
LastValue = 0,379157059536854, Top.Value = 0,236504829598826, Right.Count = 22, Left.Count = 6
RotateLeft @value=0,431767346538495
LastValue = 0,46832062046431, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 6
RotateRight @value=0,154996359793002
LastValue = 0,0999000217299443, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 7
RotateLeft @value=0,20709771951991
LastValue = 0,229543754006524, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 8
RotateRight @value=0,8182332901369
LastValue = 0,80358425984326, Top.Value = 0,236504829598826, Right.Count = 24, Left.Count = 8
RotateRight @value=0,318363281115127
LastValue = 0,259324726769386, Top.Value = 0,236504829598826, Right.Count = 25, Left.Count = 8
RotateRight @value=0,318363281115127
LastValue = 0,307835293145774, Top.Value = 0,236504829598826, Right.Count = 26, Left.Count = 8
RotateLeft @value=0,431767346538495
LastValue = 0,453910283024381, Top.Value = 0,236504829598826, Right.Count = 27, Left.Count = 8
RotateLeft @value=0,830862050331599
LastValue = 0,868997387527021, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 8
RotateLeft @value=0,20709771951991
RotateRight @value=0,229543754006524
RotateLeft @value=0,203250563798123
LastValue = 0,218358597354199, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 9
RotateRight @value=0,0999000217299443
RotateRight @value=0,161392807104342
LastValue = 0,0642934488431986, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 10
RotateRight @value=0,154996359793002
RotateLeft @value=0,0999000217299443
LastValue = 0,148295871982489, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 11
LastValue = 0,217621828065078, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 12
RotateRight @value=0,599106418713511
LastValue = 0,553135806020878, Top.Value = 0,236504829598826, Right.Count = 29, Left.Count = 12
LastValue = 0,982277666210326, Top.Value = 0,236504829598826, Right.Count = 30, Left.Count = 12
RotateRight @value=0,8182332901369
LastValue = 0,803671114520948, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 12
RotateRight @value=0,203250563798123
RotateRight @value=0,218358597354199
LastValue = 0,19310415405459, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 13
LastValue = 0,0133136604043253, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 14
RotateLeft @value=0,46832062046431
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateRight @value=0,75742144871383
LastValue = 0,483394719419719, Top.Value = 0,236504829598826, Right.Count = 32, Left.Count = 14
RotateLeft @value=0,431767346538495
RotateRight @value=0,453910283024381
LastValue = 0,453370328738061, Top.Value = 0,236504829598826, Right.Count = 33, Left.Count = 14
LastValue = 0,762330518459124, Top.Value = 0,236504829598826, Right.Count = 34, Left.Count = 14
LastValue = 0,699010426969738, Top.Value = 0,236504829598826, Right.Count = 35, Left.Count = 14
Run Code Online (Sandbox Code Playgroud)
轮换发生的次数不是很多,因为树是不平衡的,而是因为优先级是随机选择的.例如,我们在第13次插入时获得4次旋转.我们有一棵5/7平衡的树(很好),但是到了13/0!似乎随机优先事项的使用值得进一步调查.无论如何,很明显看到随机插入比有序插入引起更多的旋转.
存在自平衡树以修复与非随机分布的数据相关的问题.根据定义,它们会牺牲一些最佳案例性能来大大改善与非平衡BST相关的最坏情况性能,特别是排序输入.
你实际上是在过度思考这个问题,因为随机数据与有序数据的较慢插入是任何平衡树的特征.在AVL上试一试,你会看到相同的结果.
卡梅伦有正确的想法,删除优先检查以强制最坏的情况.如果你这样做并检测你的树,这样你就可以看到每个插入物发生了多少次重新平衡,它实际上变得非常明显正在发生什么.插入已排序的数据时,树始终向左旋转,而根的右子项始终为空.插入总是导致一个重新平衡,因为插入节点没有子节点并且不发生递归.另一方面,当您在随机数据上运行它时,几乎立即开始看到每个插入上发生多个重新平衡,在最小的情况下多达5或6个(50个插入),因为它发生在子树上好.
通过重新打开优先级检查,不仅重新平衡通常更便宜,因为更多节点被推入左子树(它们永远不会出现,因为那里没有插入),但它们也不太可能发生.为什么?因为在treap中,高优先级节点浮动到顶部,并且恒定的左旋转(不伴随右旋)开始将所有高优先级节点推送到左子树.结果是,由于概率分布不均匀,重新平衡的发生频率较低.
如果您使用重新平衡代码,您将看到这是真的; 对于排序和随机输入,你最终得到几乎相同数量的左旋转,但是随机输入也提供相同数量的右旋转,这使得总数的两倍.这应该不足为奇 - 高斯输入应该导致旋转的高斯分布.您还会看到排序输入的顶级重新平衡只有大约60-70%,这可能是令人惊讶的,这也是由于排序的输入混乱了优先级的自然分布.
您还可以通过检查插入循环结束时的完整树来验证这一点.随机输入,优先级往往会逐级线性降低; 对于排序的输入,优先级往往会保持很高,直到从底部到达一个或两个级别.
希望我做了一个体面的工作来解释这个...让我知道是否有任何一个太模糊.