从C#中的List <T>中选择N个随机元素

Question

从C#中的List <T>中选择N个随机元素

JC *_*bbs 145 c# random algorithm collections element

我需要一个快速算法从通用列表中选择5个随机元素.例如,我想从a获得5个随机元素List<string>.

Answer 1

使用linq:

YourList.OrderBy(x => rnd.Next()).Take(5)

Run Code Online (Sandbox Code Playgroud)

我认为order by是O(n log(n)),所以如果代码简单性是主要关注点(即使用小列表),我会选择这个解决方案. (4认同)
+1但是如果两个元素从rnd.Next()或类似物中得到相同的数字,那么第一个将被选中而第二个可能不会(如果不再需要更多的元素).但是,根据使用情况,它足够随机. (2认同)
但这不枚举整个列表吗？除非用“快速”来表示，否则OP表示“简单”，而不是“性能”。 (2认同)
这仅在OrderBy()仅为每个元素调用一次键选择器时才有效.如果它想要在两个元素之间进行比较时调用它,那么它每次都会得到一个不同的值,这将搞砸排序.[文档](https://msdn.microsoft.com/en-us/library/vstudio/bb534966%28v=vs.100%29.aspx)没有说明它做了什么. (2认同)
我对 LINQ 的分析和日志记录表明，它对每个元素仅对 OrderBy 表达式求值一次。如果它以任何其他方式这样做，昂贵的 OrderBy 表达式将破坏排序的性能。虽然合同没有承诺，但他们改变它是愚蠢的。 (2认同)
注意`YourList`是否有很多项目,但你只想选择一些.在这种情况下,它不是一种有效的方法. (2认同)

Answer 2

Kyl*_*nin 124

迭代通过并为每个元素使选择的概率=(需要的数量)/(数字左)

因此,如果您有40个项目,那么第一个项目将有5/40的机会被选中.如果是,则下一次有4/39的机会,否则它有5/39的机会.当你到达目的地时,你会得到5件物品,而且在此之前你通常会拥有所有物品.

我觉得这是微不足道的.看起来列表的后端将比前端更频繁地被选中,因为后端将看到更大的概率.例如,如果前三个号码没有被选中,则最后5个号码必须被选中.第一个数字只会看到5/40的机会,但是最后一个数字将比5/40倍更多地看到1/1.在实现此算法之前,您必须先将列表随机化. (29认同)
好吧,我在40个元素的列表上运行了这个算法1000万次,每个元素在被选中时都有5/40(.125)的镜头,然后多次运行该模拟.事实证明,这不是均匀分布的.元素16到22被取消选择(16 = .123,17 = .124),而元素34被取消(34 = .129).元素39和40也被选中,但没有那么多(39 = .1247,40 = .1246) (22认同)
@Ankur:我认为这不具有统计意义.我相信有一个归纳证明,这将提供均匀的分布. (20认同)
我重复了相同的试验1亿次,在我的试验中,选择最少的项目比最常选择的项目少了0.106%. (8认同)
@recursive:证明几乎是微不足道的.我们知道如何为任何K选择K中的K项以及如何从N中选择0项中的任何N.假设我们知道从N-1> = K中均匀地选择K或K-1项的方法.然后我们可以通过选择概率为K/N的第一项,然后使用已知方法从剩余的N-1中选择仍需要的K或K-1项来从N中选择K项. (5认同)
看来您应该能够在N时间内完成此操作，其中N是子集的大小。我们在M时间内完成，其中M是列表的长度。这表明存在更优雅的方法。 (2认同)
无法通过随机选择开始的地方来解决非常轻微的偏见.然后,您从(i + randOffset)%listsize中检索.就是这样,在100M次迭代时,它创建了一个均匀分布到+ - .01%或匹配数据的呈现方式.1249到.1251 (2认同)
@Daniel：如果您仍在对列表进行混排，请假定随机随机混排（因为为什么您会建议这样做？）为什么不简单地采用混排列表的前N个元素？ (2认同)
你能给出一个实现这个的例子吗？这听起来不错，但如何做到对我来说并不简单。 (2认同)

Answer 3

小智 33

public static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount)
{
    return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList();
}

Run Code Online (Sandbox Code Playgroud)

使用 Guid.NewGuid() 并不能确保随机性，而只能确保唯一性。 (2认同)

Answer 4

Tyl*_*ler 26

这实际上是一个比它听起来更难的问题,主要是因为许多数学上正确的解决方案实际上无法实现所有可能性(更多内容见下文).

首先,这里有一些易于实现,正确的,如果你有一个真正的随机数生成器:

(0)凯尔的答案,即O(n).

(1)生成n对[(0,rand),(1,rand),(2,rand),...]的列表,按第二个坐标对它们进行排序,并使用第一个k(对于你,k) = 5)获取随机子集的索引.我认为这很容易实现,虽然它是O(n log n)时间.

(2)初始化一个空列表s = [],它将成为k个随机元素的索引.随机选择{0,1,2,...,n-1}中的数字r,r = rand%n,并将其添加到s.接下来取r = rand%(n-1)并坚持s; 在s中添加少于#元素的#元素以避免冲突.接下来取r = rand%(n-2),并做同样的事情,等等,直到你在s中有k个不同的元素.这具有最坏情况的运行时间O(k ^ 2).所以对于k << n,这可以更快.如果你保持排序并跟踪它有哪些连续的间隔,你可以在O(k log k)中实现它,但它更有效.

@Kyle - 你是对的,第二个想我同意你的回答.我一开始匆匆读了它,并错误地认为你指示按顺序选择每个固定概率为k/n的元素,这本来是错误的 - 但你的自适应方法对我来说是正确的.对于那个很抱歉.

好的,现在对于踢球者:渐近(对于固定的k,n在增长),有n ^ k/k!n个元素中k元素子集的选择[这是(n选择k)的近似].如果n很大,而k不是很小,那么这些数字就很大了.在任何标准32位随机数发生器中,您可以期望的最佳周期长度是2 ^ 32 = 256 ^ 4.因此,如果我们有1000个元素的列表,并且我们想要随机选择5,那么标准随机数生成器就无法实现所有可能性.但是,只要您选择适用于较小的集合并且始终"看起来"随机,那么这些算法应该没问题.

附录:写完之后,我意识到正确实现构思(2)很棘手,所以我想澄清这个答案.要获得O(k log k)时间,您需要一个支持O(log m)搜索和插入的类似数组的结构 - 平衡二叉树可以执行此操作.使用这样的结构来构建一个名为s的数组,这里有一些伪随机:

# Returns a container s with k distinct random numbers from {0, 1, ..., n-1}
def ChooseRandomSubset(n, k):
  for i in range(k):
    r = UniformRandom(0, n-i)                 # May be 0, must be < n-i
    q = s.FirstIndexSuchThat( s[q] - q > r )  # This is the search.
    s.InsertInOrder(q ? r + q : r + len(s))   # Inserts right before q.
  return s

Run Code Online (Sandbox Code Playgroud)

我建议通过一些示例案例来了解这是如何有效地实现上述英语解释的.

对于（1）您可以比排序更快地洗牌列表，对于（2）您将通过使用 % 来偏置您的分布 (3认同)

Answer 5

Fra*_*man 16

我认为所选答案是正确的,非常可爱.我实现它的方式不同,因为我也希望结果是随机顺序的.

    static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>(
        IEnumerable<SomeType> someTypes,
        int maxCount)
    {
        Random random = new Random(DateTime.Now.Millisecond);

        Dictionary<double, SomeType> randomSortTable = new Dictionary<double,SomeType>();

        foreach(SomeType someType in someTypes)
            randomSortTable[random.NextDouble()] = someType;

        return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value);
    }

Run Code Online (Sandbox Code Playgroud)

好一年晚了但是......这不是@ ersin相当简短的答案,如果你得到一个重复的随机数,它不会失败(其中Ersin将偏向重复对的第一项) (2认同)
每次调用时的“Random random = new Random(DateTime.Now.Millisecond);”*绝对*是错误的。每次创建“Random”的新实例都会减少实际的随机性。使用它的“静态只读”实例，最好使用默认构造函数构造。 (2认同)

Answer 6

小智 11

12 年过去了，这个问题仍然存在，我没有找到我喜欢的凯尔解决方案的实现，所以这里是：

public IEnumerable<T> TakeRandom<T>(IEnumerable<T> collection, int take)
{
    var random = new Random();
    var available = collection.Count();
    var needed = take;
    foreach (var item in collection)
    {
        if (random.Next(available) < needed)
        {
            needed--;
            yield return item;
            if (needed == 0)
            {
                break;
            }
        }
        available--;
    }
}

Run Code Online (Sandbox Code Playgroud)

Answer 7

dha*_*kim 10

我刚遇到这个问题,而且更多的谷歌搜索让我想到了随机洗牌的问题:http://en.wikipedia.org/wiki/Fisher-Yates_shuffle

要完全随机地移动列表(就地),请执行以下操作:

要改组n个元素的数组(索引0..n-1):

  for i from n ? 1 downto 1 do
       j ? random integer with 0 ? j ? i
       exchange a[j] and a[i]

Run Code Online (Sandbox Code Playgroud)

如果你只需要前5个元素,那么你不需要从n-1到1运行i,而只需要将它运行到n-5(即:n-5)

让我们说你需要k项,

这变为:

  for (i = n ? 1; i >= n-k; i--)
  {
       j = random integer with 0 ? j ? i
       exchange a[j] and a[i]
  }

Run Code Online (Sandbox Code Playgroud)

选择的每个项目都交换到数组的末尾,因此选择的k个元素是数组的最后k个元素.

这需要时间O(k),其中k是您需要的随机选择元素的数量.

此外,如果您不想修改初始列表,可以在临时列表中记下所有交换,反转该列表,然后再次应用它们,从而执行相反的交换集并返回初始列表而不更改O(k)运行时间.

最后,对于真正的stickler,if(n == k),你应该停在1而不是nk,因为随机选择的整数总是0.

Answer 8

Mar*_*hdy 10

你可以使用它,但订购将在客户端进行

 .AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5);

Run Code Online (Sandbox Code Playgroud)

Answer 9

spo*_*son 8

从算法中的龙, C#中的解释:

int k = 10; // items to select
var items = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
var selected = new List<int>();
double needed = k;
double available = items.Count;
var rand = new Random();
while (selected.Count < k) {
   if( rand.NextDouble() < needed / available ) {
      selected.Add(items[(int)available-1])
      needed--;
   }
   available--;
}

Run Code Online (Sandbox Code Playgroud)

该算法将选择项目列表的唯一标记.

这个实现被打破了,因为使用`var`会导致`needed`和`available`都是整数,这使得`needed/available`总是为0. (2认同)

Answer 10

drz*_*aus 7

正在考虑@JohnShedletsky 关于(释义)接受的答案的评论:

你应该能够在O(subset.Length),而不是O(originalList.Length)

基本上,您应该能够生成subset随机索引,然后从原始列表中提取它们.

方法

public static class EnumerableExtensions {

    public static Random randomizer = new Random(); // you'd ideally be able to replace this with whatever makes you comfortable

    public static IEnumerable<T> GetRandom<T>(this IEnumerable<T> list, int numItems) {
        return (list as T[] ?? list.ToArray()).GetRandom(numItems);

        // because ReSharper whined about duplicate enumeration...
        /*
        items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--;
        */
    }

    // just because the parentheses were getting confusing
    public static IEnumerable<T> GetRandom<T>(this T[] list, int numItems) {
        var items = new HashSet<T>(); // don't want to add the same item twice; otherwise use a list
        while (numItems > 0 )
            // if we successfully added it, move on
            if( items.Add(list[randomizer.Next(list.Length)]) ) numItems--;

        return items;
    }

    // and because it's really fun; note -- you may get repetition
    public static IEnumerable<T> PluckRandomly<T>(this IEnumerable<T> list) {
        while( true )
            yield return list.ElementAt(randomizer.Next(list.Count()));
    }

}

Run Code Online (Sandbox Code Playgroud)

如果你想成为更有效,你可能会使用HashSet的的指标,而不是实际列表中的元素(如果你有复杂类型或昂贵的比较);

单元测试

并确保我们没有任何碰撞等

[TestClass]
public class RandomizingTests : UnitTestBase {
    [TestMethod]
    public void GetRandomFromList() {
        this.testGetRandomFromList((list, num) => list.GetRandom(num));
    }

    [TestMethod]
    public void PluckRandomly() {
        this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct:false);
    }

    private void testGetRandomFromList(Func<IEnumerable<int>, int, IEnumerable<int>> methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) {
        var items = Enumerable.Range(0, 100);
        IEnumerable<int> randomItems = null;

        while( repetitions-- > 0 ) {
            randomItems = methodToGetRandomItems(items, numToTake);
            Assert.AreEqual(numToTake, randomItems.Count(),
                            "Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions);
            if(requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(),
                            "Collisions (non-unique values) found, failed at {0} repetition--", repetitions);
            Assert.IsTrue(randomItems.All(o => items.Contains(o)),
                        "Some unknown values found; failed at {0} repetition--", repetitions);
        }
    }
}

Run Code Online (Sandbox Code Playgroud)

好主意,有问题.(1)如果您的较大列表很大(例如,从数据库中读取),那么您将实现整个列表,这可能会超出内存.(2)如果K接近于N,那么你将在循环中搜索一个无人认领的索引,导致代码需要不可预测的时间.这些问题是可以解决的. (2认同)

Answer 11

har*_*ing 6

Extending from @ers's answer, if one is worried about possible different implementations of OrderBy, this should be safe:

// Instead of this
YourList.OrderBy(x => rnd.Next()).Take(5)

// Temporarily transform 
YourList
    .Select(v => new {v, i = rnd.Next()}) // Associate a random index to each entry
    .OrderBy(x => x.i).Take(5) // Sort by (at this point fixed) random index 
    .Select(x => x.v); // Go back to enumerable of entry

Run Code Online (Sandbox Code Playgroud)

Answer 12

naw*_*fal 5

从组中选择N个随机项不应该与订单有任何关系!随机性是关于不可预测性的,而不是关于组中的洗牌位置.处理某种有序排序的所有答案都必然效率低于不具备这种顺序的答案.由于效率是关键,我会发布一些不会过多改变项目顺序的东西.

1)如果您需要真正的随机值,这意味着对可供选择的元素没有限制(即,一旦选择的项目可以重新选择):

public static List<T> GetTrueRandom<T>(this IList<T> source, int count, 
                                       bool throwArgumentOutOfRangeException = true)
{
    if (throwArgumentOutOfRangeException && count > source.Count)
        throw new ArgumentOutOfRangeException();

    var randoms = new List<T>(count);
    randoms.AddRandomly(source, count);
    return randoms;
}

Run Code Online (Sandbox Code Playgroud)

如果关闭异常标志,则可以多次选择随机项.

如果您有{1,2,3,4},那么它可以为3个项目提供{1,4,4},{1,4,3}等,甚至{1,4,3,2,4} 5项!

这应该很快,因为它没有什么可检查的.

2)如果你需要小组中的个别成员没有重复,那么我会依赖一本字典(正如许多人已经指出的那样).

public static List<T> GetDistinctRandom<T>(this IList<T> source, int count)
{
    if (count > source.Count)
        throw new ArgumentOutOfRangeException();

    if (count == source.Count)
        return new List<T>(source);

    var sourceDict = source.ToIndexedDictionary();

    if (count > source.Count / 2)
    {
        while (sourceDict.Count > count)
            sourceDict.Remove(source.GetRandomIndex());

        return sourceDict.Select(kvp => kvp.Value).ToList();
    }

    var randomDict = new Dictionary<int, T>(count);
    while (randomDict.Count < count)
    {
        int key = source.GetRandomIndex();
        if (!randomDict.ContainsKey(key))
            randomDict.Add(key, sourceDict[key]);
    }

    return randomDict.Select(kvp => kvp.Value).ToList();
}

Run Code Online (Sandbox Code Playgroud)

代码比其他字典方法有点长,因为我不仅添加,而且还从列表中删除,所以它有点两个循环.你可以在这里看到,当我变得相等时,我没有重新订购任何东西.那是因为我认为随机性应该在返回的集合中作为一个整体.我的意思是,如果你想5名从随机的项目,如果它不应该的问题还是,但如果你需要4个从同一组项目,那么它应该产生不可预测的,,等.其次,当随机的项目数是返回的是原始组的一半以上,然后更容易删除countsource.Count1, 2, 3, 4, 51, 3, 4, 2, 51, 2, 3, 4, 51, 2, 3, 41, 3, 5, 22, 3, 5, 4source.Count - count组中的count项目比添加项目.出于性能原因,我使用source而不是sourceDict在remove方法中获取随机索引.

因此,如果您有{1,2,3,4},则最终可能会在{1,2,3},{3,4,1}等3个项目中结束.

3)如果您需要通过考虑原始组中的重复项来从您的组中获得真正不同的随机值,那么您可以使用与上面相同的方法,但是a HashSet将比字典轻.

public static List<T> GetTrueDistinctRandom<T>(this IList<T> source, int count, 
                                               bool throwArgumentOutOfRangeException = true)
{
    if (count > source.Count)
        throw new ArgumentOutOfRangeException();

    var set = new HashSet<T>(source);

    if (throwArgumentOutOfRangeException && count > set.Count)
        throw new ArgumentOutOfRangeException();

    List<T> list = hash.ToList();

    if (count >= set.Count)
        return list;

    if (count > set.Count / 2)
    {
        while (set.Count > count)
            set.Remove(list.GetRandom());

        return set.ToList();
    }

    var randoms = new HashSet<T>();
    randoms.AddRandomly(list, count);
    return randoms.ToList();
}

Run Code Online (Sandbox Code Playgroud)

该randoms变量是HashSet为了避免在最罕见的情况下添加重复项,其中Random.Next可以产生相同的值,尤其是当输入列表很小时.

所以{1,2,2,4} => 3个随机项=> {1,2,4}并且从不{1,2,2}

{1,2,2,4} => 4个随机项=>异常!! 或{1,2,4}取决于标志集.

我使用的一些扩展方法:

static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
    return rnd.Next(source.Count);
}

public static T GetRandom<T>(this IList<T> source)
{
    return source[source.GetRandomIndex()];
}

static void AddRandomly<T>(this ICollection<T> toCol, IList<T> fromList, int count)
{
    while (toCol.Count < count)
        toCol.Add(fromList.GetRandom());
}

public static Dictionary<int, T> ToIndexedDictionary<T>(this IEnumerable<T> lst)
{
    return lst.ToIndexedDictionary(t => t);
}

public static Dictionary<int, T> ToIndexedDictionary<S, T>(this IEnumerable<S> lst, 
                                                           Func<S, T> valueSelector)
{
    int index = -1;
    return lst.ToDictionary(t => ++index, valueSelector);
}

Run Code Online (Sandbox Code Playgroud)

如果它与几十列表中的项目1000个所有关于性能已经被重复10000次,那么你可能需要有更快的随机类比System.Random,但我不认为这是什么大不了的考虑,后者极有可能是从来没有一个瓶颈,它足够快..

编辑:如果你需要重新安排退回物品的顺序,那么没有任何东西可以击败达金的Fisher-Yates方法 - 简短,甜美和简单..

Answer 13

Pau*_*och 5

我结合以上几个答案来创建一个Lazily评估的扩展方法。我的测试表明，凯尔（Order（N））的方法比drzaus使用集合提议随机索引以选择（Order（K））慢许多倍。前者对随机数生成器执行更多调用，并对这些项进行更多次迭代。

我实施的目标是：

1）如果给定的不是IList的IEnumerable，则不要实现完整列表。如果给我一系列不计其数的项目，则我不想耗尽内存。使用Kyle的方法获得在线解决方案。

2）如果我可以确定它是一个IList，请使用drzaus的方法。如果K大于N的一半，我会冒着重击的风险，因为我一次又一次选择许多随机索引而不得不跳过它们。因此，我编写了一个索引列表以不保存。

3）我保证将按照遇到的相同顺序退回这些物品。凯尔算法无需任何改动。drzaus的算法要求我不要按照选择随机索引的顺序发出项目。我将所有索引收集到SortedSet中，然后按已排序的索引顺序发出项目。

4）如果K比N大，并且我颠倒了集合的含义，那么我将枚举所有项目并测试索引是否不在集合中。这意味着我损失了Order（K）运行时间，但是由于在这种情况下K接近N，因此我的损失不大。

这是代码：

    /// <summary>
    /// Takes k elements from the next n elements at random, preserving their order.
    /// 
    /// If there are fewer than n elements in items, this may return fewer than k elements.
    /// </summary>
    /// <typeparam name="TElem">Type of element in the items collection.</typeparam>
    /// <param name="items">Items to be randomly selected.</param>
    /// <param name="k">Number of items to pick.</param>
    /// <param name="n">Total number of items to choose from.
    /// If the items collection contains more than this number, the extra members will be skipped.
    /// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned.</param>
    /// <returns>Enumerable over the retained items.
    /// 
    /// See http://stackoverflow.com/questions/48087/select-a-random-n-elements-from-listt-in-c-sharp for the commentary.
    /// </returns>
    public static IEnumerable<TElem> TakeRandom<TElem>(this IEnumerable<TElem> items, int k, int n)
    {
        var r = new FastRandom();
        var itemsList = items as IList<TElem>;

        if (k >= n || (itemsList != null && k >= itemsList.Count))
            foreach (var item in items) yield return item;
        else
        {  
            // If we have a list, we can infer more information and choose a better algorithm.
            // When using an IList, this is about 7 times faster (on one benchmark)!
            if (itemsList != null && k < n/2)
            {
                // Since we have a List, we can use an algorithm suitable for Lists.
                // If there are fewer than n elements, reduce n.
                n = Math.Min(n, itemsList.Count);

                // This algorithm picks K index-values randomly and directly chooses those items to be selected.
                // If k is more than half of n, then we will spend a fair amount of time thrashing, picking
                // indices that we have already picked and having to try again.   
                var invertSet = k >= n/2;  
                var positions = invertSet ? (ISet<int>) new HashSet<int>() : (ISet<int>) new SortedSet<int>();

                var numbersNeeded = invertSet ? n - k : k;
                while (numbersNeeded > 0)
                    if (positions.Add(r.Next(0, n))) numbersNeeded--;

                if (invertSet)
                {
                    // positions contains all the indices of elements to Skip.
                    for (var itemIndex = 0; itemIndex < n; itemIndex++)
                    {
                        if (!positions.Contains(itemIndex))
                            yield return itemsList[itemIndex];
                    }
                }
                else
                {
                    // positions contains all the indices of elements to Take.
                    foreach (var itemIndex in positions)
                        yield return itemsList[itemIndex];              
                }
            }
            else
            {
                // Since we do not have a list, we will use an online algorithm.
                // This permits is to skip the rest as soon as we have enough items.
                var found = 0;
                var scanned = 0;
                foreach (var item in items)
                {
                    var rand = r.Next(0,n-scanned);
                    if (rand < k - found)
                    {
                        yield return item;
                        found++;
                    }
                    scanned++;
                    if (found >= k || scanned >= n)
                        break;
                }
            }
        }  
    }

Run Code Online (Sandbox Code Playgroud)

我使用了专门的随机数生成器，但是您可以根据需要使用C＃的Random。（FastRandom由Colin Green编写，是SharpNEAT的一部分。周期为2 ^ 128-1，比许多RNG都要好。）

以下是单元测试：

[TestClass]
public class TakeRandomTests
{
    /// <summary>
    /// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability.
    /// </summary>
    [TestMethod]
    public void TakeRandom_Array_Uniformity()
    {
        const int numTrials = 2000000;
        const int expectedCount = numTrials/20;
        var timesChosen = new int[100];
        var century = new int[100];
        for (var i = 0; i < century.Length; i++)
            century[i] = i;

        for (var trial = 0; trial < numTrials; trial++)
        {
            foreach (var i in century.TakeRandom(5, 100))
                timesChosen[i]++;
        }
        var avg = timesChosen.Average();
        var max = timesChosen.Max();
        var min = timesChosen.Min();
        var allowedDifference = expectedCount/100;
        AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average");
        //AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min");
        //AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max");

        var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
        Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
    }

    /// <summary>
    /// Ensure that when randomly choosing items from an IEnumerable that is not an IList, 
    /// all items are chosen with roughly equal probability.
    /// </summary>
    [TestMethod]
    public void TakeRandom_IEnumerable_Uniformity()
    {
        const int numTrials = 2000000;
        const int expectedCount = numTrials / 20;
        var timesChosen = new int[100];

        for (var trial = 0; trial < numTrials; trial++)
        {
            foreach (var i in Range(0,100).TakeRandom(5, 100))
                timesChosen[i]++;
        }
        var avg = timesChosen.Average();
        var max = timesChosen.Max();
        var min = timesChosen.Min();
        var allowedDifference = expectedCount / 100;
        var countInRange =
            timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
        Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
    }

    private IEnumerable<int> Range(int low, int count)
    {
        for (var i = low; i < low + count; i++)
            yield return i;
    }

    private static void AssertBetween(int x, int low, int high, String message)
    {
        Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
        Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
    }

    private static void AssertBetween(double x, double low, double high, String message)
    {
        Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
        Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
    }
}

Run Code Online (Sandbox Code Playgroud)

Answer 14

Jes*_*pez 5

这里有一个基于Fisher-Yates Shuffle的实现，其算法复杂度为 O(n)，其中 n 是子集或样本大小，而不是列表大小，正如 John Shedletsky 指出的那样。

public static IEnumerable<T> GetRandomSample<T>(this IList<T> list, int sampleSize)
{
    if (list == null) throw new ArgumentNullException("list");
    if (sampleSize > list.Count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
    var indices = new Dictionary<int, int>(); int index;
    var rnd = new Random();

    for (int i = 0; i < sampleSize; i++)
    {
        int j = rnd.Next(i, list.Count);
        if (!indices.TryGetValue(j, out index)) index = j;

        yield return list[index];

        if (!indices.TryGetValue(i, out index)) index = i;
        indices[j] = index;
    }
}

Run Code Online (Sandbox Code Playgroud)

归档时间：	17 年前
查看次数：	108789 次
最近记录：	6 年，6 月前