我在为我的工作写的一个程序中遇到了一堵砖墙.你不需要特别了解上下文,但长话短说,我有两个大约约650k记录的集合.
我们假设集合A是我认识的正确的集合,集合B是我知道的不正确的集合.
集合B包含一个复杂的对象,它具有与集合A中的元素相同类型的属性(换句话说,它看起来有点像这样):
// Where T : IComparable
IEnumerable<DateTime> A = ...; // Collection of T elements
IEnumerable<Complex> B = ...; // Collection of complex elements.
class Complex<DateTime>
{
public DateTime Time { get; set; }
.....
}
Run Code Online (Sandbox Code Playgroud)
我的问题是我基本上需要按顺序枚举A并查看A的当前元素是否存在于B中的Complex对象中; 如果它不存在,那么我需要创建一个Complex对象,它将封装该元素(以及其他内容).
当我意识到这两个列表长度为650,000个元素时,会出现问题.我无法减少数据集; 我必须使用这些650,000.现在我已经使用了ICollection.Contains(),我尝试了(天真)二进制搜索的实现,但它只需要太长时间.
你有什么建议吗?
编辑:如果有帮助,T实现IComparable.EDIT2:更多上下文:使用Linq To Objects从DataTable中检索IEnumerable.
IEnumerable<Complex> set = set.Tbl
.Where(dbObject => dbObject.TS.CompareTo(default(DateTime)) != 0)
.Select(Func<DataRow,Complex>) // Function that wraps the DataRow in a Complex object
// Just done to make debugging a little easier so we still have a large sample but small enough that it doesn't make me grow a beard
.Take(100000)
.AsEnumerable<Complex>();
Run Code Online (Sandbox Code Playgroud)
为了完整性,如果这个问题被存档并且其他任何人需要解决这个问题,我当前的实现看起来有点像这样
BDataSet bSet = new BDataSet();
B_LUTableAdapter adap = new B_LUTableAdapter();
adap.Fill(bSet.B_LU);
IEnumerable<Complex> w3 = bSet.B
.Where(dbObject => dbObject.TS.CompareTo(default(DateTime)) != 0)
// Function that just wraps datarow into a complex object
.Select(Func<DataRow, Complex>)
// Just for sake of debugging speed
.Take(100000)
.AsEnumerable<Complex>();
List<Complex> b = bSet.OrderBy(x => x.Time).ToList<Complex>();
// Get last & first timestamps
// Some of the timestamps in b are 01/01/1011 for some reason,
// So we do this check.
Complex start = b.Where(x => x.Time != default(DateTime)).First();
Complex end = b.Last();
List<DateTime> a = new List<DateTime>();
// RoundSeconds reduces seconds in a DateTime to 0.
DateTime current = RoundSeconds(new DateTime(start.Time.Ticks));
while (current.CompareTo(RoundSeconds(end.Time)) <= 0)
{
a.Add(current);
current = current.Add(TimeSpan.FromMinutes(1));
}
IEnumerable<DateTime> times = b.Select(x => x.Time);
var missing = a.Where(dt => times.Contains(dt));
foreach (var dt in missing)
{
adap.Insert(dt, 0, "", "", "", null, 0, 0);
// This has since been changed to List.Add()
}
Run Code Online (Sandbox Code Playgroud)
感谢Cosmin,此问题现已解决,完成的实现是:List expected = new List(); DateTime current = RoundSeconds(new DateTime(start.Time.Ticks));
while (current.CompareTo(RoundSeconds(end.Time)) <= 0)
{
expected.Add(current);
current = current.Add(TimeSpan.FromMinutes(1));
}
Console.WriteLine("Expecting {0} intervals.", expected.Count);
var missing = b.FindAllMissing(expected, x => x.Time);
if(!missing.Any()) return;
Console.WriteLine("{0} missing intervals.", missing.Count());
foreach (var dt in missing)
{
b.Add(new Complex() { /* some values */ });
//Console.WriteLine("\t> Inserted new record at {0}", dt);
}
//.....
public static IEnumerable<Basic> FindAllMissing<Basic, Complex>(this IEnumerable<Complex> complexList,
IEnumerable<Basic> basicList,
Func<Complex, Basic> selector)
{
HashSet<Basic> inComplexList = new HashSet<Basic>();
foreach (Complex c in complexList)
inComplexList.Add(selector(c));
List<Basic> missing = new List<Basic>();
foreach (Basic basic in basicList)
if (!(inComplexList.Contains(basic)))
missing.Add(basic);
return missing;
}
Run Code Online (Sandbox Code Playgroud)
一步步:
O(1)创建T第二个集合中已存在的可快速搜索的列表。我可以建议吗HashSet<T>T第一步中的所有 ' 放入集合中。O(1)现在已经有了O(n)解决方案。下面是一个将该算法实现为通用扩展方法的类,以使其更加 LINQ 友好。使其参数为IEnumerable<T>和 return IEnumerable<T>,对类型 (T和Complex) 没有做出任何假设。在我的测试中,我使用一个列表Tuple<int,int>作为复杂类型,使用一个简单列表int作为简单类型。控制台应用程序List<Tuple<int,int>>用 600000 个值填充 ,然后将 100000 个值放入 simple 中List<int>,使用枚举器来计算 ; 中未找到的所有简单值List<Tuple<int,int>>。它太快了,你没有机会看到它正在工作,当你点击F5它时,只会显示结果。
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
static class FixProblem
{
public static IEnumerable<T> FindAllThatNeedCreating<T, Complex>(this IEnumerable<Complex> list_of_complex, IEnumerable<T> list_of_T, Func<Complex, T> extract)
{
HashSet<T> T_in_list_of_complex = new HashSet<T>();
foreach (Complex c in list_of_complex)
T_in_list_of_complex.Add(extract(c));
List<T> answer = new List<T>();
foreach (T t in list_of_T)
if (!T_in_list_of_complex.Contains(t))
answer.Add(t);
return answer;
}
}
class Program
{
static void Main(string[] args)
{
// Test the code
List<Tuple<int, int>> complex = new List<Tuple<int, int>>();
List<int> simple = new List<int>();
// Fill in some random data
Random rnd = new Random();
for (int i = 1; i < 600000; i++)
complex.Add(new Tuple<int, int>(rnd.Next(), rnd.Next()));
for (int i = 1; i < 100000; i++)
simple.Add(rnd.Next());
// This is the magic line of code:
Console.WriteLine(complex.FindAllThatNeedCreating(simple, x => x.Item1).Count());
Console.ReadKey();
}
}
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1335 次 |
| 最近记录: |