nin*_*xel 184 .net c# linq outer-join full-outer-join
我列出了人员的姓名和名字,以及人员姓名和姓氏的清单.有些人没有名字,有些人没有姓氏; 我想在两个列表上进行完全外连接.
所以以下列表:
ID FirstName
-- ---------
1 John
2 Sue
ID LastName
-- --------
1 Doe
3 Smith
Run Code Online (Sandbox Code Playgroud)
应该产生:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
3 Smith
Run Code Online (Sandbox Code Playgroud)
我是LINQ的新手(如果我是跛脚的话,请原谅我)并找到了很多"LINQ Outer Joins"的解决方案,这些解决方案看起来非常相似,但实际上似乎是留下了外部联接.
到目前为止,我的尝试是这样的:
private void OuterJoinTest()
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });
List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });
var outerJoin = from first in firstNames
join last in lastNames
on first.ID equals last.ID
into temp
from last in temp.DefaultIfEmpty()
select new
{
id = first != null ? first.ID : last.ID,
firstname = first != null ? first.Name : string.Empty,
surname = last != null ? last.Name : string.Empty
};
}
}
public class FirstName
{
public int ID;
public string Name;
}
public class LastName
{
public int ID;
public string Name;
}
Run Code Online (Sandbox Code Playgroud)
但这回归:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
Run Code Online (Sandbox Code Playgroud)
我究竟做错了什么?
seh*_*ehe 186
更新1:提供真正通用的扩展方法FullOuterJoin
更新2:可选地接受IEqualityComparer
密钥类型的自定义
更新3:此实现最近MoreLinq
已成为其中的一部分 - 谢谢大家!
编辑已添加FullOuterGroupJoin
(ideone).我重用了这个GetOuter<>
实现,使得它的性能降低了一些,但我的目标是"高级"代码,而不是现在的优势.
static void Main(string[] args)
{
var ax = new[] {
new { id = 1, name = "John" },
new { id = 2, name = "Sue" } };
var bx = new[] {
new { id = 1, surname = "Doe" },
new { id = 3, surname = "Smith" } };
ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
.ToList().ForEach(Console.WriteLine);
}
Run Code Online (Sandbox Code Playgroud)
打印输出:
{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b = }
{ a = , b = { id = 3, surname = Smith } }
Run Code Online (Sandbox Code Playgroud)
您还可以提供默认值:http://ideone.com/kG4kqO
ax.FullOuterJoin(
bx, a => a.id, b => b.id,
(a, b, id) => new { a.name, b.surname },
new { id = -1, name = "(no firstname)" },
new { id = -2, surname = "(no surname)" }
)
Run Code Online (Sandbox Code Playgroud)
印刷:
{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }
Run Code Online (Sandbox Code Playgroud)
加入是从关系数据库设计中借用的术语:
a
多次出现在元素b
与相应的按键(即:如果没有b
为空).数据库术语称之为inner (equi)join
.a
用于其中没有相应的元件中存在b
.(即:如果b
是空的,甚至是结果).这通常被称为left join
.a
以及b
如果没有相应的元件中的其他存在.(即使结果a
是空的)东西不通常在RDBMS看到的是一组加入[1] :
a
对应于多个b
,它基团与相应的键的记录.当您希望根据公共密钥枚举"已加入"记录时,这通常会更方便.另请参阅GroupJoin,其中也包含一些一般背景说明.
[1](我相信Oracle和MSSQL都有专有扩展)
这是一个通用的"插入式"扩展类
internal static class MyExtensions
{
internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));
var join = from key in keys
let xa = alookup[key]
let xb = blookup[key]
select projection(xa, xb, key);
return join;
}
internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<TA, TB, TKey, TResult> projection,
TA defaultA = default(TA),
TB defaultB = default(TB),
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));
var join = from key in keys
from xa in alookup[key].DefaultIfEmpty(defaultA)
from xb in blookup[key].DefaultIfEmpty(defaultB)
select projection(xa, xb, key);
return join;
}
}
Run Code Online (Sandbox Code Playgroud)
Jef*_*ado 114
我不知道这是否涵盖了所有情况,从逻辑上看似乎是正确的.我们的想法是采用左外连接和右外连接,然后取结果的并集.
var firstNames = new[]
{
new { ID = 1, Name = "John" },
new { ID = 2, Name = "Sue" },
};
var lastNames = new[]
{
new { ID = 1, Name = "Doe" },
new { ID = 3, Name = "Smith" },
};
var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last?.Name,
};
var rightOuterJoin =
from last in lastNames
join first in firstNames on last.ID equals first.ID into temp
from first in temp.DefaultIfEmpty()
select new
{
last.ID,
FirstName = first?.Name,
LastName = last.Name,
};
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
Run Code Online (Sandbox Code Playgroud)
这是写的,因为它在LINQ to Objects中.如果LINQ to SQL或其他,查询处理器可能不支持安全导航或其他操作.您必须使用条件运算符来有条件地获取值.
即
var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last != null ? last.Name : default,
};
Run Code Online (Sandbox Code Playgroud)
Net*_*age 24
我认为大多数这些问题都存在问题,包括已接受的答案,因为它们不能很好地与Linq相比IQueryable,因为服务器往返次数过多,数据返回太多,或客户端执行过多.
对于IEnumerable我不喜欢Sehe的答案或类似因为它有过多的内存使用(一个简单的10000000双列表测试在我的32GB机器上运行Linqpad内存不足).
此外,其他大多数实际上并没有实现正确的Full Outer Join,因为他们使用具有Right Join的Union而不是带有Right Anti Semi Join的Concat,这不仅消除了结果中重复的内部连接行,而且最初在左数据或右数据中存在的任何适当的重复项.
所以这里有我的扩展来处理所有这些问题,生成SQL直接在Linq中实现连接,在服务器上执行,并且比Enumerables上的其他更快且内存更少:
public static class Ext {
public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
return from left in leftItems
join right in rightItems on leftKeySelector(left) equals rightKeySelector(right) into temp
from right in temp.DefaultIfEmpty()
select resultSelector(left, right);
}
public static IEnumerable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
return from right in rightItems
join left in leftItems on rightKeySelector(right) equals leftKeySelector(left) into temp
from left in temp.DefaultIfEmpty()
select resultSelector(left, right);
}
public static IEnumerable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
public static IEnumerable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
var hashLK = new HashSet<TKey>(from l in leftItems select leftKeySelector(l));
return rightItems.Where(r => !hashLK.Contains(rightKeySelector(r))).Select(r => resultSelector(default(TLeft),r));
}
public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) where TLeft : class {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;
public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
var sampleAnonLR = new { left = default(TLeft), rightg = (IEnumerable<TRight>)null };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
var parmC = Expression.Parameter(typeof(TRight), "c");
var argLeft = Expression.PropertyOrField(parmP, "left");
var newleftrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));
return leftItems.AsQueryable().GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
}
public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
var sampleAnonLR = new { leftg = (IEnumerable<TLeft>)null, right = default(TRight) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
var parmC = Expression.Parameter(typeof(TLeft), "c");
var argRight = Expression.PropertyOrField(parmP, "right");
var newrightrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));
return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
}
public static IQueryable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
private static Expression<Func<TP, TResult>> CastSBody<TP, TResult>(LambdaExpression ex, TP unusedP, TResult unusedRes) => (Expression<Func<TP, TResult>>)ex;
public static object Default(this Type type) => type.GetTypeInfo().IsValueType ? Activator.CreateInstance(type) : null;
public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
var sampleAnonLgR = new { leftg = (IEnumerable<TLeft>)null, right = default(TRight) };
var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
var argLeft = Expression.Constant(typeof(TLeft).Default(), typeof(TLeft));
var argRight = Expression.PropertyOrField(parmLgR, "right");
var newrightrs = CastSBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));
return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
}
public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
}
Run Code Online (Sandbox Code Playgroud)
Right Anti-Semi-Join之间的区别主要是Linq to Objects或源代码,但在最终答案中对服务器(SQL)方面产生了影响,删除了不必要的内容JOIN
.
使用LinqKit可以改进Expression
处理合并Expression<Func<>>
到lambda 的手动编码,但如果语言/编译器为此添加了一些帮助,那将会很好.该FullOuterJoinDistinct
和RightOuterJoin
功能出于完整性考虑,但我没有重新实现FullOuterGroupJoin
呢.
我为可以订购密钥的情况编写了另一个完整外连接版本IEnumerable
,这比将左外连接与右反半连接组合快约50%,至少在小集合上.它只排序一次后通过每个集合.
这是一个扩展方法:
public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
{
var leftOuterJoin = from left in leftItems
join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
from right in temp.DefaultIfEmpty()
select new { left, right };
var rightOuterJoin = from right in rightItems
join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
from left in temp.DefaultIfEmpty()
select new { left, right };
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
}
Run Code Online (Sandbox Code Playgroud)
正如您所发现的,Linq没有"外连接"结构.您可以获得的最接近的是使用您所述查询的左外连接.为此,您可以添加在联接中未表示的姓氏列表的任何元素:
outerJoin = outerJoin.Concat(lastNames.Select(l=>new
{
id = l.ID,
firstname = String.Empty,
surname = l.Name
}).Where(l=>!outerJoin.Any(o=>o.id == l.id)));
Run Code Online (Sandbox Code Playgroud)
我猜想@sehe的方法更强大,但是直到我更好地理解它为止,我发现自己已经超越了@MichaelSander的扩展名。我对其进行了修改,以匹配此处描述的内置Enumerable.Join()方法的语法和返回类型。我在@JeffMercado解决方案下针对@ cadrell0的注释附加了“与众不同”的后缀。
public static class MyExtensions {
public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector
) {
var leftJoin =
from left in leftItems
join right in rightItems
on leftKeySelector(left) equals rightKeySelector(right) into temp
from right in temp.DefaultIfEmpty()
select resultSelector(left, right);
var rightJoin =
from right in rightItems
join left in leftItems
on rightKeySelector(right) equals leftKeySelector(left) into temp
from left in temp.DefaultIfEmpty()
select resultSelector(left, right);
return leftJoin.Union(rightJoin);
}
}
Run Code Online (Sandbox Code Playgroud)
在示例中,您将像这样使用它:
var test =
firstNames
.FullJoinDistinct(
lastNames,
f=> f.ID,
j=> j.ID,
(f,j)=> new {
ID = f == null ? j.ID : f.ID,
leftName = f == null ? null : f.Name,
rightName = j == null ? null : j.Name
}
);
Run Code Online (Sandbox Code Playgroud)
将来,随着我学到更多,我觉得我会逐渐流行@sehe的逻辑。但是即使那样,我仍然必须小心,因为我认为,如果可行,至少要有一个与现有“ .Join()”方法的语法相匹配的重载很重要,原因有两个:
我仍然对泛型,扩展,Func语句和其他功能不熟悉,因此欢迎反馈。
编辑:没多久我就意识到我的代码有问题。我在LINQPad中执行.Dump()并查看返回类型。它只是IEnumerable,所以我尝试匹配它。但是当我实际上在扩展名上执行了.Where()或.Select()时,出现了一个错误:“'System Collections.IEnumerable'不包含'Select'和...的定义”。因此,最终我可以匹配.Join()的输入语法,但不能匹配返回行为。
编辑:将 “ TResult”添加到函数的返回类型。在阅读Microsoft文章时错过了这一点,这当然是有道理的。借助此修复程序,现在看来返回行为毕竟符合我的目标。
对于键在两个枚举中都是唯一的情况,我的干净解决方案是:
private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
IEnumerable<Ta> a, IEnumerable<Tb> b,
Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
Func<Ta, Tb, TResult> selector)
{
var alookup = a.ToLookup(key_a);
var blookup = b.ToLookup(key_b);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
keys.UnionWith(blookup.Select(p => p.Key));
return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
}
Run Code Online (Sandbox Code Playgroud)
所以
var ax = new[] {
new { id = 1, first_name = "ali" },
new { id = 2, first_name = "mohammad" } };
var bx = new[] {
new { id = 1, last_name = "rezaei" },
new { id = 3, last_name = "kazemi" } };
var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();
Run Code Online (Sandbox Code Playgroud)
输出:
f: ali l: rezaei
f: mohammad l:
f: l: kazemi
Run Code Online (Sandbox Code Playgroud)