LINQ - Full Outer Join

nin*_*xel 184 .net c# linq outer-join full-outer-join

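I have a list of people's IDs and first names, and a list of people's IDs and surnames. Some people don't have a first name and some don't have a surname; I'd like to do a full outer join on the two lists.

So the following lists: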

ID  FirstName
--  ---------
 1  John
 2  Sue

ID  LastName
--  --------
 1  Doe
 3  Smith

Should produce:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue
 3             Smith

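I'm new to LINQ (so forgive me if I'm being lame) and have found quite a few solutions for "LINQ Outer Joins" which all look quite similar, but really seem to be left outer joins.

My attempts so far go something like this: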

private void OuterJoinTest()
{
    List<FirstName> firstNames = new List<FirstName>();
    firstNames.Add(new FirstName { ID = 1, Name = "John" });
    firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

    List<LastName> lastNames = new List<LastName>();
    lastNames.Add(new LastName { ID = 1, Name = "Doe" });
    lastNames.Add(new LastName { ID = 3, Name = "Smith" });

    var outerJoin = from first in firstNames
        join last in lastNames
        on first.ID equals last.ID
        into temp
        from last in temp.DefaultIfEmpty()
        select new
        {
            id = first != null ? first.ID : last.ID,
            firstname = first != null ? first.Name : string.Empty,
            surname = last != null ? last.Name : string.Empty
        };
}

public class FirstName
{
    public int ID;

    public string Name;
}

public class LastName
{
    public int ID;

    public string Name;
}

But this returns:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue

What am I doing wrong?

seh*_*ehe 186

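Update 1: providing a truly generalized extension method FullOuterJoin
Update 2: optionally accepting a custom IEqualityComparer for the key type
Update 3: this implementation has recently become part of MoreLinq - Thanks guys!

Edit: Added FullOuterGroupJoin (ideone). I reused the GetOuter<> implementation, making this a fraction less performant than it could be, but I'm aiming for "high-level" code, not bleeding-edge optimized, right now.

See it live at http://ideone.com/O36nWc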

static void Main(string[] args)
{
    var ax = new[] { 
        new { id = 1, name = "John" },
        new { id = 2, name = "Sue" } };
    var bx = new[] { 
        new { id = 1, surname = "Doe" },
        new { id = 3, surname = "Smith" } };

    ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
        .ToList().ForEach(Console.WriteLine);
}

Prints the output:

{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b =  }
{ a = , b = { id = 3, surname = Smith } }

You could also supply defaults: http://ideone.com/kG4kqO

    ax.FullOuterJoin(
            bx, a => a.id, b => b.id, 
            (a, b, id) => new { a.name, b.surname },
            new { id = -1, name    = "(no firstname)" },
            new { id = -2, surname = "(no surname)" }
        )

Printing:

{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }

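Explanation of terms used:

Joining is a term borrowed from relational database design:

  • A join will repeat elements from a as many times as there are elements in b with a corresponding key (i.e. nothing if b were empty). Database lingo calls this an inner (equi)join.
  • An outer join also includes elements from a for which no corresponding element exists in b (i.e. there are results even if b were empty). This is usually referred to as a left join.
  • A full outer join includes records from a as well as b if no corresponding element exists in the other (i.e. there are results even if a were empty).

Something not usually seen in an RDBMS is a group join[1]:

  • A group join does the same as described above, but instead of repeating elements from a for multiple corresponding b, it groups the records with corresponding keys. This is often more convenient when you wish to enumerate through "joined" records based on a common key.

See also GroupJoin, which contains some general background explanations as well.


[1] (I believe Oracle and MSSQL have proprietary extensions for this)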
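
The three kinds of join that LINQ supports out of the box can be sketched against the question's sample data as follows. This is only an illustration of the terms above; the class name `JoinKinds` and the string formatting are made up for the example and are not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinKinds
{
    // Sample data from the question: ids 1 and 2 have a first name, ids 1 and 3 a surname.
    static readonly (int Id, string Name)[] A = { (1, "John"), (2, "Sue") };
    static readonly (int Id, string Surname)[] B = { (1, "Doe"), (3, "Smith") };

    // Inner (equi)join: only keys present on both sides survive.
    public static List<string> Inner() =>
        A.Join(B, a => a.Id, b => b.Id, (a, b) => $"{a.Id}:{a.Name}:{b.Surname}").ToList();

    // Left outer join: every element of A, paired with a null surname when B has no match.
    public static List<string> Left() =>
        A.GroupJoin(B, a => a.Id, b => b.Id, (a, bs) => new { a, bs })
         .SelectMany(x => x.bs.Select(b => b.Surname).DefaultIfEmpty(),
                     (x, surname) => $"{x.a.Id}:{x.a.Name}:{surname}")
         .ToList();

    // Group join: one result per element of A, carrying all of its matches as a group.
    public static List<string> Grouped() =>
        A.GroupJoin(B, a => a.Id, b => b.Id,
                    (a, bs) => $"{a.Id}:{a.Name}:[{string.Join(",", bs.Select(b => b.Surname))}]")
         .ToList();
}
```

None of the three ever produces a row for id 3, which is why a full outer join needs extra work on top of them.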

Full code

A generalized "drop-in" extension class for this:

internal static class MyExtensions
{
    internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp ?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   let xa = alookup[key]
                   let xb = blookup[key]
                   select projection(xa, xb, key);

        return join;
    }

    internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<TA, TB, TKey, TResult> projection,
        TA defaultA = default(TA), 
        TB defaultB = default(TB),
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp ?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   from xa in alookup[key].DefaultIfEmpty(defaultA)
                   from xb in blookup[key].DefaultIfEmpty(defaultB)
                   select projection(xa, xb, key);

        return join;
    }
}

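  • Instead of using a Dictionary, you can use a [Lookup](http://msdn.microsoft.com/en-us/library/bb460184.aspx), which contains the functionality expressed in your helper extension methods. For example, you can write `a.GroupBy(selectKeyA).ToDictionary();` as `a.ToLookup(selectKeyA)` and `adict.OuterGet(key)` as `alookup[key]`. Getting the collection of keys is a little trickier, though: `alookup.Select(x => x.Key)`. (4 upvotes)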
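
The property of a lookup that makes it a good fit here is that indexing a key it does not contain returns an empty sequence rather than throwing. A minimal sketch, with the hypothetical names `LookupDemo` and `Probe` and sample rows invented for the illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    public static (int hits, int misses, List<int> keys) Probe()
    {
        var people = new[]
        {
            new { id = 1, name = "John" },
            new { id = 2, name = "Sue" },
            new { id = 2, name = "Susan" },
        };
        ILookup<int, string> lookup = people.ToLookup(p => p.id, p => p.name);

        // Key 2 has two elements; key 3 is absent, so lookup[3] is simply empty.
        // The distinct keys come from enumerating the lookup's groupings.
        return (lookup[2].Count(), lookup[3].Count(), lookup.Select(g => g.Key).ToList());
    }
}
```

This is why `alookup[key].DefaultIfEmpty(defaultA)` in the answer can be applied to any key from the combined key set without a containment check.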

Jef*_*ado 114

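I don't know if this covers all cases, but logically it seems correct. The idea is to take a left outer join and a right outer join, then take the union of the results.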

var firstNames = new[]
{
    new { ID = 1, Name = "John" },
    new { ID = 2, Name = "Sue" },
};
var lastNames = new[]
{
    new { ID = 1, Name = "Doe" },
    new { ID = 3, Name = "Smith" },
};
var leftOuterJoin =
    from first in firstNames
    join last in lastNames on first.ID equals last.ID into temp
    from last in temp.DefaultIfEmpty()
    select new
    {
        first.ID,
        FirstName = first.Name,
        LastName = last?.Name,
    };
var rightOuterJoin =
    from last in lastNames
    join first in firstNames on last.ID equals first.ID into temp
    from first in temp.DefaultIfEmpty()
    select new
    {
        last.ID,
        FirstName = first?.Name,
        LastName = last.Name,
    };
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);

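This works as written since it is in LINQ to Objects. With LINQ to SQL or another provider, the query processor might not support safe navigation or other operations. You'd have to use the conditional operator to conditionally get the values.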

var leftOuterJoin =
    from first in firstNames
    join last in lastNames on first.ID equals last.ID into temp
    from last in temp.DefaultIfEmpty()
    select new
    {
        first.ID,
        FirstName = first.Name,
        LastName = last != null ? last.Name : default,
    };

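  • @cadrell0 Duplicates will occur if a person has both a first name and a last name, so Union is a valid choice. (3 upvotes)
  • Union will eliminate duplicates. If you are not expecting duplicates, or can write the second query to exclude anything that was included in the first, use Concat instead. This is the same as the difference between UNION and UNION ALL in SQL. (2 upvotes)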
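
The difference the comments describe is easy to see on two small result sets that share one row. This is a sketch with made-up names (`UnionVsConcat`, `Counts`); it relies on anonymous types with identical property names and types comparing by value.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnionVsConcat
{
    public static (int union, int concat) Counts()
    {
        // Shapes mimic the left and right outer join results; the id 1 row appears in both.
        var left = new[]
        {
            new { ID = 1, First = "John", Last = "Doe" },
            new { ID = 2, First = "Sue", Last = (string)null },
        };
        var right = new[]
        {
            new { ID = 1, First = "John", Last = "Doe" },
            new { ID = 3, First = (string)null, Last = "Smith" },
        };

        // Union keeps one copy of the shared row; Concat keeps both.
        return (left.Union(right).Count(), left.Concat(right).Count());
    }
}
```

With these inputs Union yields three rows and Concat four, which mirrors UNION versus UNION ALL.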

Net*_*age 24

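I think there are problems with most of these, including the accepted answer, because they don't work well with LINQ over IQueryable: they do too many server round trips, return too much data, or do too much client-side execution.

For IEnumerable I don't like Sehe's answer or similar ones because they use excessive memory (a simple test with two lists of 10000000 items ran LINQPad out of memory on my 32GB machine).

Also, most of the others don't actually implement a proper Full Outer Join because they use a Union with a Right Join instead of a Concat with a Right Anti Semi Join. That not only eliminates the duplicate inner join rows from the result, but also any proper duplicates that existed originally in the left or right data.

So here are my extensions that handle all of these issues, generate SQL as well as implementing the join in LINQ to SQL directly, execute on the server, and are faster and use less memory than the others on Enumerables: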

public static class Ext {
    public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return from left in leftItems
               join right in rightItems on leftKeySelector(left) equals rightKeySelector(right) into temp
               from right in temp.DefaultIfEmpty()
               select resultSelector(left, right);
    }

    public static IEnumerable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return from right in rightItems
               join left in leftItems on rightKeySelector(right) equals leftKeySelector(left) into temp
               from left in temp.DefaultIfEmpty()
               select resultSelector(left, right);
    }

    public static IEnumerable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    public static IEnumerable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        var hashLK = new HashSet<TKey>(from l in leftItems select leftKeySelector(l));
        return rightItems.Where(r => !hashLK.Contains(rightKeySelector(r))).Select(r => resultSelector(default(TLeft),r));
    }

    public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector)  where TLeft : class {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;

    public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        var sampleAnonLR = new { left = default(TLeft), rightg = (IEnumerable<TRight>)null };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
        var parmC = Expression.Parameter(typeof(TRight), "c");
        var argLeft = Expression.PropertyOrField(parmP, "left");
        var newleftrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));

        return leftItems.AsQueryable().GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
    }

    public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        var sampleAnonLR = new { leftg = (IEnumerable<TLeft>)null, right = default(TRight) };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
        var parmC = Expression.Parameter(typeof(TLeft), "c");
        var argRight = Expression.PropertyOrField(parmP, "right");
        var newrightrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
    }

    public static IQueryable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    private static Expression<Func<TP, TResult>> CastSBody<TP, TResult>(LambdaExpression ex, TP unusedP, TResult unusedRes) => (Expression<Func<TP, TResult>>)ex;

    public static object Default(this Type type) => type.GetTypeInfo().IsValueType ? Activator.CreateInstance(type) : null;

    public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        var sampleAnonLgR = new { leftg = (IEnumerable<TLeft>)null, right = default(TRight) };
        var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
        var argLeft = Expression.Constant(typeof(TLeft).Default(), typeof(TLeft));
        var argRight = Expression.PropertyOrField(parmLgR, "right");
        var newrightrs = CastSBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
    }

    public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }
}

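The difference made by using a Right Anti-Semi-Join is mostly moot with LINQ to Objects or in the source, but it makes a difference on the server (SQL) side in the final answer by removing an unnecessary JOIN.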
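
To make the earlier point about duplicates concrete, here is a small LINQ to Objects sketch in which the left input deliberately contains the same row twice. The names `DuplicateRows`, `ViaUnion` and `ViaConcat` are invented for the illustration and are not part of the extensions above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateRows
{
    // The left side deliberately holds the same (id, name) row twice.
    static readonly (int Id, string Name)[] Left = { (1, "John"), (1, "John"), (2, "Sue") };
    static readonly (int Id, string Name)[] Right = { (1, "Doe"), (3, "Smith") };

    static IEnumerable<(int, string, string)> LeftJoin() =>
        from l in Left
        join r in Right on l.Id equals r.Id into g
        from surname in g.Select(x => x.Name).DefaultIfEmpty()
        select (l.Id, l.Name, surname);

    static IEnumerable<(int, string, string)> RightJoin() =>
        from r in Right
        join l in Left on r.Id equals l.Id into g
        from first in g.Select(x => x.Name).DefaultIfEmpty()
        select (r.Id, first, r.Name);

    // Union of left and right joins: set semantics also drop the repeated source row.
    public static int ViaUnion() => LeftJoin().Union(RightJoin()).Count();

    // Left join plus only those right rows that have no left partner (right anti semi join).
    public static int ViaConcat()
    {
        var leftKeys = new HashSet<int>(Left.Select(l => l.Id));
        var rightOnly = Right.Where(r => !leftKeys.Contains(r.Id))
                             .Select(r => (r.Id, (string)null, r.Name));
        return LeftJoin().Concat(rightOnly).Count();
    }
}
```

Here the Union variant returns three rows, silently losing the second John Doe, while the Concat variant returns four, which is what SQL's FULL OUTER JOIN would give for the same data.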

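The hand coding of Expression to handle merging an Expression<Func<>> into a lambda could be improved with LinqKit, but it would be nice if the language/compiler added some help for that. The FullOuterJoinDistinct and RightOuterJoin functions are included for completeness, but I have not re-implemented FullOuterGroupJoin yet.

I wrote another version of a full outer join for IEnumerable for cases where the key is orderable, which is about 50% faster than combining the left outer join with the right anti semi join, at least on small collections. It goes through each collection just once after sorting.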
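
That ordered version is not shown in the answer. Purely as an illustration of the idea, a sort-merge full outer join might look like the sketch below. It is not the author's implementation: the name `FullOuterMergeJoin` is hypothetical, and the sketch assumes keys are unique within each input and comparable through `Comparer<TKey>.Default`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeJoin
{
    // Hypothetical helper: sorts both inputs by key, then walks them in lockstep once.
    // Assumes each key occurs at most once per input.
    public static IEnumerable<TResult> FullOuterMergeJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey,
        Func<TLeft, TRight, TResult> result)
    {
        var cmp = Comparer<TKey>.Default;
        using (var l = left.OrderBy(leftKey, cmp).GetEnumerator())
        using (var r = right.OrderBy(rightKey, cmp).GetEnumerator())
        {
            bool hasL = l.MoveNext(), hasR = r.MoveNext();
            while (hasL || hasR)
            {
                // An exhausted side counts as "greater" so the other side drains.
                int c = !hasR ? -1
                      : !hasL ? 1
                      : cmp.Compare(leftKey(l.Current), rightKey(r.Current));

                if (c < 0)
                {
                    yield return result(l.Current, default(TRight)); // left only
                    hasL = l.MoveNext();
                }
                else if (c > 0)
                {
                    yield return result(default(TLeft), r.Current);  // right only
                    hasR = r.MoveNext();
                }
                else
                {
                    yield return result(l.Current, r.Current);       // keys match
                    hasL = l.MoveNext();
                    hasR = r.MoveNext();
                }
            }
        }
    }
}
```

Handling repeated keys would require buffering each run of equal keys on both sides and emitting their cross product, which this sketch leaves out.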

  • The LINQ expression node type 'Invoke' is not supported in LINQ to Entities. Is there any restriction with this code? I want to perform a FULL JOIN over IQueryables. (3 upvotes)
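
One common way to avoid `Expression.Invoke` is to inline the lambda's body and substitute its parameters with an `ExpressionVisitor`. The sketch below is not part of the answer, and the names `ExpressionInlining` and `Inline` are hypothetical. Whether a particular provider then translates the whole query is a separate question that this sketch does not settle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class ExpressionInlining
{
    // Swaps each mapped parameter of a lambda for its replacement expression.
    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly Dictionary<ParameterExpression, Expression> _map;

        public ParameterReplacer(Dictionary<ParameterExpression, Expression> map)
        {
            _map = map;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            Expression replacement;
            return _map.TryGetValue(node, out replacement) ? replacement : base.VisitParameter(node);
        }
    }

    // Returns lambda.Body with its parameters replaced by args, so no Invoke node is needed.
    public static Expression Inline(LambdaExpression lambda, params Expression[] args)
    {
        if (args.Length != lambda.Parameters.Count)
            throw new ArgumentException("One argument per lambda parameter is required.");

        var map = new Dictionary<ParameterExpression, Expression>();
        for (int i = 0; i < lambda.Parameters.Count; i++)
            map[lambda.Parameters[i]] = args[i];

        return new ParameterReplacer(map).Visit(lambda.Body);
    }
}
```

Under that assumption, a call such as `Expression.Invoke(resultSelector, argLeft, parmC)` in the answer's code could be swapped for `ExpressionInlining.Inline(resultSelector, argLeft, parmC)`.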

Mic*_*der 7

Here is an extension method doing that:

public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
{
    var leftOuterJoin = from left in leftItems
        join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
        from right in temp.DefaultIfEmpty()
        select new { left, right };

    var rightOuterJoin = from right in rightItems
        join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
        from left in temp.DefaultIfEmpty()
        select new { left, right };

    var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);

    return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
}

  • +1. R ⟗ S = (R ⟕ S) ∪ (R ⟖ S), which means a full outer join = left outer join union right outer join! I appreciate the simplicity of this approach. (3 upvotes)

Kei*_*thS 6

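As you've found, LINQ doesn't have an "outer join" construct. The closest you can get is a left outer join using the query you stated. To this, you can add any elements of the last-name list that aren't represented in the join: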

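// Materialize the left join first. The lambda below would otherwise capture the
// reassigned outerJoin variable and recurse into itself when the query is enumerated.
var leftJoin = outerJoin.ToList();
outerJoin = leftJoin.Concat(lastNames.Select(l=>new
                            {
                                id = l.ID,
                                firstname = String.Empty,
                                surname = l.Name
                            }).Where(l=>!leftJoin.Any(o=>o.id == l.id)));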


pwi*_*cox 6

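I'm guessing @sehe's approach is stronger, but until I understand it better, I find myself leap-frogging off of @MichaelSander's extension. I modified it to match the syntax and return type of the built-in Enumerable.Join() method described here. I appended the "distinct" suffix in response to @cadrell0's comment under @JeffMercado's solution.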

public static class MyExtensions {

    public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
        this IEnumerable<TLeft> leftItems, 
        IEnumerable<TRight> rightItems, 
        Func<TLeft, TKey> leftKeySelector, 
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector
    ) {

        var leftJoin = 
            from left in leftItems
            join right in rightItems 
              on leftKeySelector(left) equals rightKeySelector(right) into temp
            from right in temp.DefaultIfEmpty()
            select resultSelector(left, right);

        var rightJoin = 
            from right in rightItems
            join left in leftItems 
              on rightKeySelector(right) equals leftKeySelector(left) into temp
            from left in temp.DefaultIfEmpty()
            select resultSelector(left, right);

        return leftJoin.Union(rightJoin);
    }

}

In the example, you would use it like this:

var test = 
    firstNames
    .FullJoinDistinct(
        lastNames,
        f=> f.ID,
        j=> j.ID,
        (f,j)=> new {
            ID = f == null ? j.ID : f.ID, 
            leftName = f == null ? null : f.Name,
            rightName = j == null ? null : j.Name
        }
    );

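In the future, as I learn more, I have a feeling I'll be migrating to @sehe's logic given its popularity. But even then I'll have to be careful, because I feel it is important to have at least one overload that matches the syntax of the existing ".Join()" method if feasible, for two reasons:

  1. Consistency in methods helps save time, avoid errors, and avoid unintended behavior.
  2. If there ever is an out-of-the-box ".FullJoin()" method in the future, I would imagine it will try to keep to the syntax of the currently existing ".Join()" method if it can. If it does, then migrating to it means you can simply rename your functions without changing the parameters or worrying about different return types breaking your code.

I'm still new to generics, extensions, Func statements, and other features, so feedback is certainly welcome.

EDIT: It didn't take me long to realize there was a problem with my code. I was doing a .Dump() in LINQPad and looking at the return type. It was just IEnumerable, so I tried to match it. But when I actually did a .Where() or .Select() on my extension I got an error: "'System.Collections.IEnumerable' does not contain a definition for 'Select' and ...". So in the end I was able to match the input syntax of .Join(), but not the return behavior.

EDIT: Added "TResult" to the return type for the function. I missed that when reading the Microsoft article, and of course it makes sense. With this fix, it now seems the return behavior is in line with my goals after all.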
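
The error described in the first edit comes from the fact that the standard query operators are defined on the generic `IEnumerable<T>`, not on the non-generic `IEnumerable`. A small sketch with made-up names (`ReturnTypes`, `Untyped`, `Typed`):

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class ReturnTypes
{
    // Non-generic return type: callers cannot use Select or Where on it directly.
    public static IEnumerable Untyped() => new[] { 1, 2, 3 };

    // Generic return type: the usual LINQ operators apply.
    public static IEnumerable<int> Typed() => new[] { 1, 2, 3 };

    public static int Sum()
    {
        // Untyped().Select(...) would not compile; Cast<int>() recovers the element type first.
        int viaCast = Untyped().Cast<int>().Select(x => x * 2).Sum();
        int viaTyped = Typed().Select(x => x * 2).Sum();
        return viaCast + viaTyped;
    }
}
```

Returning `IEnumerable<TResult>`, as the fixed signature does, spares callers the `Cast<T>()` step.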


Gui*_*cha 5

My clean solution for the case where the key is unique in both enumerables:

 private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
            IEnumerable<Ta> a, IEnumerable<Tb> b,
            Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
            Func<Ta, Tb, TResult> selector)
        {
            var alookup = a.ToLookup(key_a);
            var blookup = b.ToLookup(key_b);
            var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
            keys.UnionWith(blookup.Select(p => p.Key));
            return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
        }

So:

    var ax = new[] {
        new { id = 1, first_name = "ali" },
        new { id = 2, first_name = "mohammad" } };
    var bx = new[] {
        new { id = 1, last_name = "rezaei" },
        new { id = 3, last_name = "kazemi" } };

    var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();

Output:

f: ali l: rezaei
f: mohammad l:
f:  l: kazemi