ArrayList vs List vs Dictionary

FSm*_*FSm 0 c# c#-4.0

Can anyone help me? I have the following code:

private List<string> GenerateTerms(string[] docs)
{
    List<string> uniques = new List<string>();

    for (int i = 0; i < docs.Length; i++)
    {
        string[] tokens = docs[i].Split(' ');

        List<string> toktolist = new List<string>(tokens.ToList());

        var query = toktolist.GroupBy(word => word)
             .OrderByDescending(g => g.Count())
             .Select(g => g.Key)
             .Take(20000);              

        foreach (string k in query)
        {
            if (!uniques.Contains(k)) 
                uniques.Add(k);
        }
    }            

    return uniques;            
}

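It generates terms from multiple documents, ranked by highest frequency. I wrote the same procedure using a Dictionary. Both versions took 440 milliseconds. Surprisingly, though, when I wrote the procedure with an ArrayList, as in the code below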

private ArrayList GenerateTerms(string[] docs)
{
    Dictionary<string, int> yy = new Dictionary<string, int>();
    ArrayList uniques = new ArrayList();

    for (int i = 0; i < docs.Length; i++)
    {
        string[] tokens = docs[i].Split(' ');
        yy.Clear();
        for (int j = 0; j < tokens.Length; j++)
        {
            if (!yy.ContainsKey(tokens[j].ToString()))
                yy.Add(tokens[j].ToString(), 1);
            else
                yy[tokens[j].ToString()]++;
        }

        var sortedDict = (from entry in yy
                          orderby entry.Value descending
                          select entry).Take(20000)
                         .ToDictionary(pair => pair.Key, pair => pair.Value);

        foreach (string k in sortedDict.Keys)
        {
            if (!uniques.Contains(k))
                uniques.Add(k);
        }
    }

    return uniques;
}

it took 350 milliseconds. Shouldn't ArrayList be slower than List and Dictionary? Please help me clear up this confusion.

Mar*_*ers 5

Your code does a lot of unnecessary work and uses inefficient data structures.
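
The answer does not say which parts are wasteful. One likely candidate, offered here as an assumption rather than a measured finding, is the `uniques.Contains(k)` check: on both `List<string>` and `ArrayList` it is a linear scan, so the deduplication loop gets slower as the list of unique terms grows. Since `string` is a reference type, `ArrayList` does not box it, so the collection type alone is unlikely to explain the 440 ms versus 350 ms gap. A minimal sketch of the explicit loop with a `HashSet<string>` for the membership test and a `TryGetValue`-based count (the class name `TermSketch` and the sample data are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermSketch
{
    // Same contract as GenerateTerms in the question: up to 20000 most
    // frequent words per document, merged without duplicates.
    public static List<string> GenerateTerms(string[] docs)
    {
        var seen = new HashSet<string>();           // O(1) average membership test
        var uniques = new List<string>();           // keeps first-seen order
        var counts = new Dictionary<string, int>();

        foreach (string doc in docs)
        {
            counts.Clear();
            foreach (string token in doc.Split(' '))
            {
                // TryGetValue leaves count at 0 when the key is missing.
                int count;
                counts.TryGetValue(token, out count);
                counts[token] = count + 1;
            }

            var top = counts.OrderByDescending(pair => pair.Value)
                            .Select(pair => pair.Key)
                            .Take(20000);

            foreach (string term in top)
            {
                // HashSet.Add returns false if the term is already present.
                if (seen.Add(term))
                    uniques.Add(term);
            }
        }

        return uniques;
    }

    public static void Main()
    {
        string[] docs = { "a b a", "b c c c" };
        Console.WriteLine(string.Join(", ", GenerateTerms(docs))); // prints "a, b, c"
    }
}
```

Whether this is actually faster on a given corpus is something to measure rather than assume; with only a handful of unique terms the linear scan may be just as quick.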

Try this instead:

private List<string> GenerateTerms(string[] docs)
{
     var result = docs
         .SelectMany(doc => doc.Split(' ')
                               .GroupBy(word => word)
                               .OrderByDescending(g => g.Count())
                               .Select(g => g.Key)
                               .Take(20000))
         .Distinct()
         .ToList();   
     return result;
}

A refactored version that is easier to read:

private List<string> GenerateTerms(string[] docs)
{
    return docs.SelectMany(doc => ProcessDocument(doc)).Distinct().ToList();
}

private IEnumerable<string> ProcessDocument(string doc)
{
    return doc.Split(' ')
              .GroupBy(word => word)
              .OrderByDescending(g => g.Count())
              .Select(g => g.Key)
              .Take(20000);
}
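
Timings such as the 440 ms and 350 ms quoted in the question are easier to compare when each variant runs on the same input several times after a warm-up call, since the first call includes JIT compilation. A rough sketch using `System.Diagnostics.Stopwatch` follows; the `TimingSketch` class, the `Measure` helper, and the synthetic documents are all hypothetical, and the numbers it prints will depend on the machine and the data:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // The LINQ version from the answer, made static for the harness.
    public static List<string> GenerateTerms(string[] docs)
    {
        return docs.SelectMany(doc => doc.Split(' ')
                                         .GroupBy(word => word)
                                         .OrderByDescending(g => g.Count())
                                         .Select(g => g.Key)
                                         .Take(20000))
                   .Distinct()
                   .ToList();
    }

    // Hypothetical helper: calls f once to warm up, then returns the
    // average elapsed milliseconds over `runs` further calls.
    public static double Measure(Func<string[], List<string>> f, string[] docs, int runs)
    {
        f(docs); // warm-up so JIT cost is not counted
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            f(docs);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / runs;
    }

    public static void Main()
    {
        // Synthetic input: 100 documents of 5000 pseudo-random words each.
        var rng = new Random(42);
        string[] docs = Enumerable.Range(0, 100)
            .Select(d => string.Join(" ",
                Enumerable.Range(0, 5000).Select(w => "w" + rng.Next(2000))))
            .ToArray();

        Console.WriteLine("LINQ version: {0:F1} ms", Measure(GenerateTerms, docs, 10));
    }
}
```

To compare against the versions in the question, make them static and pass each one to `Measure` with the same `docs` array.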

  • @Qaesar: No, you're not right. Just stop talking and try the code. Then comment again. (2 upvotes)