Aggregate functions over a list in Java

Die*_*o D 3 java database mapreduce data-processing

I have a list of Java objects, and I need to reduce it by applying aggregate functions, as a database SELECT would.

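Note: the data is computed from several databases and service calls. I expect thousands of rows, and every row will always have the same number of "cells". That number varies between executions.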

Samples:

Suppose my data is represented as a list of Object[3] (List<Object[]>). My data could be:

[{"A", "X", 1},
{"A", "Y", 5},
{"B", "X", 1},
{"B", "X", 2}]

Sample 1:

SUM over index 2, grouped by indexes 0 and 1

[{"A", "X", 1},
{"A", "Y", 5},
{"B", "X", 3}]
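Because each row is an Object[] and the number of cells changes between executions, the grouping columns have to be chosen at runtime rather than hard-coded. A minimal sketch of Sample 1 under that assumption, keeping the List<Object[]> representation; SumByColumns and sumBy are hypothetical names, not an existing library API, and the aggregated column is assumed to hold numbers:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SumByColumns
{
  // Hypothetical helper: SUM(valueIndex) GROUP BY keyIndexes over Object[] rows.
  // Each result row holds the key columns followed by the sum, like a SQL result row.
  public static List<Object[]> sumBy(List<Object[]> rows, int[] keyIndexes, int valueIndex)
  {
    // List keys have value-based equals/hashCode; LinkedHashMap keeps first-seen group order
    Map<List<Object>, Long> sums = new LinkedHashMap<>();
    for (Object[] row : rows)
    {
      List<Object> key = new ArrayList<>(keyIndexes.length);
      for (int i : keyIndexes)
      {
        key.add(row[i]);
      }
      long value = ((Number) row[valueIndex]).longValue();
      sums.merge(key, value, Long::sum);
    }

    List<Object[]> result = new ArrayList<>();
    for (Map.Entry<List<Object>, Long> entry : sums.entrySet())
    {
      List<Object> out = new ArrayList<>(entry.getKey());
      out.add(entry.getValue());
      result.add(out.toArray());
    }
    return result;
  }

  public static void main(String[] args)
  {
    List<Object[]> data = Arrays.asList(
      new Object[] {"A", "X", 1},
      new Object[] {"A", "Y", 5},
      new Object[] {"B", "X", 1},
      new Object[] {"B", "X", 2}
    );

    // Sample 1: SUM over index 2, grouped by indexes 0 and 1
    for (Object[] row : sumBy(data, new int[] {0, 1}, 2))
    {
      System.out.println(Arrays.toString(row));
    }
    // prints [A, X, 1], [A, Y, 5] and [B, X, 3], one per line
  }
}
```

The sum is accumulated as a long so that mixed Integer/Long columns from different sources can be added without overflow at typical sizes; swap in BigDecimal if the cells can be decimals.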

Sample 2:

MAX over index 2, grouped by index 0

[{"A", "Y", 5},
{"B", "X", 2}]
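Sample 2 returns the whole row that holds the maximum, not just the maximum value. A minimal Java 8 sketch of that behavior on List<Object[]>, again with the key columns chosen at runtime; MaxRowByColumns and maxRowBy are hypothetical names, and the value column is assumed to be numeric:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class MaxRowByColumns
{
  // Hypothetical helper: for each group of keyIndexes, keep the row whose valueIndex cell is largest
  public static List<Object[]> maxRowBy(List<Object[]> rows, int[] keyIndexes, int valueIndex)
  {
    Comparator<Object[]> byValue =
      Comparator.comparingLong(row -> ((Number) row[valueIndex]).longValue());

    Map<List<Object>, Optional<Object[]>> maxima = rows.stream()
      .collect(Collectors.groupingBy(
        row -> key(row, keyIndexes),
        LinkedHashMap::new,
        Collectors.maxBy(byValue)));

    // every group contains at least one row, so each Optional is present
    return maxima.values().stream()
      .map(Optional::get)
      .collect(Collectors.toList());
  }

  // Composite key built from the chosen columns
  private static List<Object> key(Object[] row, int[] keyIndexes)
  {
    return Arrays.stream(keyIndexes)
      .mapToObj(i -> row[i])
      .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    List<Object[]> data = Arrays.asList(
      new Object[] {"A", "X", 1},
      new Object[] {"A", "Y", 5},
      new Object[] {"B", "X", 1},
      new Object[] {"B", "X", 2}
    );

    // Sample 2: MAX over index 2, grouped by index 0
    for (Object[] row : maxRowBy(data, new int[] {0}, 2))
    {
      System.out.println(Arrays.toString(row));
    }
    // prints [A, Y, 5] and [B, X, 2], one per line
  }
}
```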

Does anyone know of a framework or API that can emulate this behavior in Java?

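My first option was to insert all the data into a NoSQL database (such as Couchbase), apply Map-Reduce, and then read back the result. However, this solution has a large overhead.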

My second option was to embed a Groovy script, but that also has a large overhead.

Nic*_*olt 5

If Java 8 is an option, you can do what you want with Stream.collect and the grouping collectors in java.util.stream.Collectors.

For example:

import static java.util.stream.Collectors.*;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class Example
{
  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    // GROUP BY index 0, 1 (LinkedHashMap::new keeps the groups in first-seen order)
    Map<List<Object>, List<List<Object>>> groups = list.stream()
    .collect(groupingBy(Example::newGroup, LinkedHashMap::new, toList()));

    System.out.println(groups);

    // SUM(index 2) GROUP BY index 0, 1
    Map<List<Object>, Integer> sums = list.stream()
    .collect(groupingBy(Example::newGroup, LinkedHashMap::new, summingInt(Example::getInt)));

    System.out.println(sums);

    // MAX(index 2) GROUP BY index 0, 1: keeps the whole row holding the maximum
    Map<List<Object>, Optional<List<Object>>> max = list.stream()
    .collect(groupingBy(Example::newGroup, LinkedHashMap::new, maxBy(Example::compare)));

    System.out.println(max);
  }

  // Composite group key. A List (unlike a Set) keeps column order and duplicates,
  // so ("A", "X") and ("X", "A") stay in different groups.
  private static List<Object> newGroup(List<Object> item)
  {
    return Arrays.asList(item.get(0), item.get(1));
  }

  private static Integer getInt(List<Object> items)
  {
    return (Integer)items.get(2);
  }

  // Integer.compare avoids the overflow that subtracting the two values can cause
  private static int compare(List<Object> items1, List<Object> items2)
  {
    return Integer.compare(getInt(items1), getInt(items2));
  }
}

This gives the following output:

{[A, X]=[[A, X, 1]], [A, Y]=[[A, Y, 5]], [B, X]=[[B, X, 1], [B, X, 2]]}

{[A, X]=1, [A, Y]=5, [B, X]=3}

{[A, X]=Optional[[A, X, 1]], [A, Y]=Optional[[A, Y, 5]], [B, X]=Optional[[B, X, 2]]}
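As the last line of output shows, maxBy wraps each result in an Optional. Since groupingBy only creates a group for a row it has actually seen, that Optional is never empty, so it can be unwrapped inside the collector with Collectors.collectingAndThen. A minimal sketch on the same List<List<Object>> rows; MaxUnwrapped and maxRows are illustrative names:

```java
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;

import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MaxUnwrapped
{
  // MAX(index 2) GROUP BY index 0, 1, returning the row itself instead of Optional<row>
  public static Map<List<Object>, List<Object>> maxRows(List<List<Object>> rows)
  {
    Comparator<List<Object>> byThirdColumn = Comparator.comparingInt(row -> (Integer) row.get(2));

    return rows.stream().collect(groupingBy(
      MaxUnwrapped::key,
      LinkedHashMap::new,
      // a group only exists once it has a row, so Optional::get cannot fail here
      collectingAndThen(maxBy(byThirdColumn), Optional::get)));
  }

  // Composite group key on columns 0 and 1
  private static List<Object> key(List<Object> row)
  {
    return Arrays.asList(row.get(0), row.get(1));
  }

  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    System.out.println(maxRows(list));
    // prints {[A, X]=[A, X, 1], [A, Y]=[A, Y, 5], [B, X]=[B, X, 2]}
  }
}
```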

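Alternatively, taking the Java 8 example as inspiration, you can implement the same functionality in older versions of Java, although more verbosely: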

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Example
{
  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    // Composite key on columns 0 and 1; a List (unlike a Set) keeps column order and duplicates
    Function<List<Object>, List<Object>> groupBy = new Function<List<Object>, List<Object>>()
    {
      @Override
      public List<Object> apply(List<Object> item)
      {
        return Arrays.asList(item.get(0), item.get(1));
      }
    };

    Map<List<Object>, List<List<Object>>> groups = group(
      list,
      groupBy
    );

    System.out.println(groups);

    Map<List<Object>, Integer> sums = sum(
      list,
      groupBy,
      new Function<List<Object>, Integer>()
      {
        @Override
        public Integer apply(List<Object> item)
        {
          return (Integer)item.get(2);
        }
      }
    );

    System.out.println(sums);

    Map<List<Object>, List<Object>> max = max(
      list,
      groupBy,
      new Comparator<List<Object>>()
      {
        @Override
        public int compare(List<Object> items1, List<Object> items2)
        {
          // compareTo instead of subtraction, which can overflow
          return ((Integer)items1.get(2)).compareTo((Integer)items2.get(2));
        }
      }
    );

    System.out.println(max);
  }

  // GROUP BY: LinkedHashMap keeps the groups in first-seen order
  public static <K, V> Map<K, List<V>> group(Collection<V> items, Function<V, K> groupFunction)
  {
    Map<K, List<V>> groupedItems = new LinkedHashMap<K, List<V>>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);

      List<V> itemGroup = groupedItems.get(key);
      if (itemGroup == null)
      {
        itemGroup = new ArrayList<V>();
        groupedItems.put(key, itemGroup);
      }

      itemGroup.add(item);
    }

    return groupedItems;
  }

  // SUM(intGetter) GROUP BY groupFunction
  public static <K, V> Map<K, Integer> sum(Collection<V> items, Function<V, K> groupFunction, Function<V, Integer> intGetter)
  {
    Map<K, Integer> sums = new LinkedHashMap<K, Integer>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);
      Integer sum = sums.get(key);
      int value = intGetter.apply(item);

      sums.put(key, sum != null ? sum + value : value);
    }

    return sums;
  }

  // MAX GROUP BY: keeps the whole item that the comparator ranks highest
  public static <K, V> Map<K, V> max(Collection<V> items, Function<V, K> groupFunction, Comparator<V> comparator)
  {
    Map<K, V> maximums = new LinkedHashMap<K, V>();

    for (V item : items)
    {
      K key = groupFunction.apply(item);
      V maximum = maximums.get(key);

      if (maximum == null || comparator.compare(maximum, item) < 0)
      {
        maximums.put(key, item);
      }
    }

    return maximums;
  }

  // Stand-in for java.util.function.Function, which only exists from Java 8 on
  private static interface Function<T, R>
  {
    public R apply(T value);
  }
}

This gives the following output:

{[A, X]=[[A, X, 1]], [A, Y]=[[A, Y, 5]], [B, X]=[[B, X, 1], [B, X, 2]]}

{[A, X]=1, [A, Y]=5, [B, X]=3}

{[A, X]=[A, X, 1], [A, Y]=[A, Y, 5], [B, X]=[B, X, 2]}   
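In the pre-Java 8 version, sum and max each repeat the same grouping loop, so every new aggregate (COUNT, MIN, AVG, ...) would need another copy. One way to avoid that, sketched here as an assumption rather than as part of the answer above, is to describe an aggregate as a start value plus a step that folds one row into it, and write the grouping loop once. KeyFunction, Reducer, aggregate and the constants below are all hypothetical names; the code avoids lambdas so it stays usable on Java 6/7:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FoldByKey
{
  // Hypothetical interface: extracts the group key from a row
  interface KeyFunction<V, K>
  {
    K keyOf(V item);
  }

  // Hypothetical interface: an aggregate is a start value plus a step that folds in one row
  interface Reducer<V, R>
  {
    R initial();
    R fold(R accumulator, V item);
  }

  // Group key: columns 0 and 1
  static final KeyFunction<List<Object>, List<Object>> FIRST_TWO_COLUMNS =
    new KeyFunction<List<Object>, List<Object>>()
    {
      @Override
      public List<Object> keyOf(List<Object> item)
      {
        return Arrays.asList(item.get(0), item.get(1));
      }
    };

  // COUNT(*)
  static final Reducer<List<Object>, Integer> COUNT = new Reducer<List<Object>, Integer>()
  {
    @Override
    public Integer initial()
    {
      return 0;
    }

    @Override
    public Integer fold(Integer accumulator, List<Object> item)
    {
      return accumulator + 1;
    }
  };

  // SUM(index 2)
  static final Reducer<List<Object>, Integer> SUM_COLUMN_2 = new Reducer<List<Object>, Integer>()
  {
    @Override
    public Integer initial()
    {
      return 0;
    }

    @Override
    public Integer fold(Integer accumulator, List<Object> item)
    {
      return accumulator + (Integer) item.get(2);
    }
  };

  // One grouping loop for every aggregate; only the Reducer changes
  public static <K, V, R> Map<K, R> aggregate(List<V> items, KeyFunction<V, K> keyFunction, Reducer<V, R> reducer)
  {
    Map<K, R> results = new LinkedHashMap<K, R>();
    for (V item : items)
    {
      K key = keyFunction.keyOf(item);
      R current = results.containsKey(key) ? results.get(key) : reducer.initial();
      results.put(key, reducer.fold(current, item));
    }
    return results;
  }

  public static void main(String[] args)
  {
    List<List<Object>> list = Arrays.asList(
      Arrays.<Object>asList("A", "X", 1),
      Arrays.<Object>asList("A", "Y", 5),
      Arrays.<Object>asList("B", "X", 1),
      Arrays.<Object>asList("B", "X", 2)
    );

    System.out.println(aggregate(list, FIRST_TWO_COLUMNS, COUNT));
    // prints {[A, X]=1, [A, Y]=1, [B, X]=2}
    System.out.println(aggregate(list, FIRST_TWO_COLUMNS, SUM_COLUMN_2));
    // prints {[A, X]=1, [A, Y]=5, [B, X]=3}
  }
}
```

With Java 8 the same idea is what Collectors.reducing and custom Collector implementations provide, so this sketch is mainly useful when streams are not available.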