How does DISTINCT work when using JPA and Hibernate

Question

What column does DISTINCT work with in JPA, and is it possible to change it?

Here is an example JPA query using DISTINCT:

select DISTINCT c from Customer c

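This doesn't make much sense to me: which column is the distinct based on? Is it specified on the entity as an annotation? I couldn't find one.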

I would like to specify the column to make the distinction on, something like:

select DISTINCT(c.name) c from Customer c

I'm using MySQL and Hibernate.

Answer 1

Αλέ*_*κος 56

You're close.

select DISTINCT(c.name) from Customer c

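This only returns an array of that column though. How to return whole entities with this approach? (15 upvotes)
@cen - What you're asking for isn't logical. If I have two customers (id = 1234, name = "Joe Customer") and (id = 2345, name = "Joe Customer"), which one should be returned for such a query? The result would be undefined. Now, you could force it with something like this (not sure how the syntax for this would work, but it should give the general idea): `select c from Customer c where c.id in (select min(d.id) from Customer d group by d.name)` ... but that is situation-dependent, because you need to come up with a way to pick one of the entities based on the attributes you have available. (4 upvotes)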
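
To make the "pick one entity per name" idea from the comment above concrete, here is a minimal in-memory sketch in plain Java. It is not a JPQL query and does not touch Hibernate; the `Customer` record and the `lowestIdPerName` helper are hypothetical names introduced only for illustration, and the "lowest id wins" rule is just one possible tie-breaker.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OneCustomerPerName {

    // Hypothetical stand-in for the Customer entity (only id and name).
    record Customer(long id, String name) {}

    // Keeps, for each distinct name, the customer with the smallest id.
    static List<Customer> lowestIdPerName(List<Customer> customers) {
        Map<String, Customer> byName = customers.stream()
            .collect(Collectors.toMap(
                Customer::name,
                Function.identity(),
                BinaryOperator.minBy(Comparator.comparingLong(Customer::id)),
                TreeMap::new)); // sorted by name so the output order is stable
        return List.copyOf(byName.values());
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer(2345, "Joe Customer"),
            new Customer(1234, "Joe Customer"),
            new Customer(3456, "Ann Customer"));
        // Two customers remain: one "Ann Customer" and the "Joe Customer" with id 1234.
        System.out.println(lowestIdPerName(customers));
    }
}
```

The tie-breaking rule is the part that has to come from the application: a distinct on `name` alone cannot say which of the two "Joe Customer" rows should survive.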

Answer 2

Tom*_*asz 13

@Entity
@NamedQuery(name = "Customer.listUniqueNames", 
            query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
        ...

        private String name;

        // Returns each distinct customer name exactly once
        public static List<String> listUniqueNames() {
             return getEntityManager().createNamedQuery(
                   "Customer.listUniqueNames", String.class)
                   .getResultList();
        }
}

Answer 3

kaz*_*aki 11

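Update: please see the top-voted answer.

My own answer is now obsolete. It is kept here only for historical reasons.

Distinct in HQL is usually needed in joins, not in simple examples like your own.

See also: How do you create a Distinct query in HQL

No offense, but how could this ever be accepted as the answer? (12 upvotes)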

Answer 4

Yan*_*ski 10

I agree with kazanaki's answer, and it helped me. I wanted to select the whole entity, so I used

 select DISTINCT(c) from Customer c

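In my case I have a many-to-many relationship, and I want to load the entities together with their collections in one query.

I used LEFT JOIN FETCH, and in the end I had to make the result distinct.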

Answer 5

Vla*_*cea 7

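As I explained in this article, DISTINCT has two meanings in JPA, depending on the underlying JPQL or Criteria API query type.

Scalar queries

For scalar queries, which return a scalar projection, like the following query: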

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

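The DISTINCT keyword should be passed to the underlying SQL statement, because we want the database engine to filter out duplicates before returning the result set: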

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]
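
For a scalar projection, DISTINCT is purely value-based deduplication. As a rough in-memory analogue of what the database is asked to do in the query above, here is a plain-Java sketch; the `DistinctYears` class and the sample dates are made up for illustration, and in a real application this filtering belongs in the database, as the answer says.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctYears {

    // In-memory analogue of:
    // select distinct year(p.createdOn) from Post p order by year(p.createdOn)
    static List<Integer> distinctYears(List<LocalDate> createdOn) {
        return createdOn.stream()
            .map(LocalDate::getYear)
            .distinct()   // value-based: equal years collapse into one
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up creation dates, two per year
        List<LocalDate> createdOn = List.of(
            LocalDate.of(2016, 1, 1),
            LocalDate.of(2018, 5, 7),
            LocalDate.of(2018, 9, 30),
            LocalDate.of(2016, 3, 12));
        System.out.println(distinctYears(createdOn)); // [2016, 2018]
    }
}
```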

Entity queries

For entity queries, DISTINCT has a different meaning.

Without DISTINCT, a query like the following one:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

is going to JOIN the post and post_comment tables like this:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

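However, the parent post record is duplicated in the result set for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.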
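
The following plain-Java sketch imitates why the list ends up as [1, 1]. It is a simplified model, not Hibernate's actual result-set processing: the `Row` and `Post` types and the `materialize` method are hypothetical names used only to show that one parent reference is emitted per joined row while the same instance is reused.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinFetchDuplicates {

    // Hypothetical flattened join row: parent id plus child id.
    record Row(long postId, long commentId) {}

    // Hypothetical simplified parent; not a real entity class.
    static final class Post {
        final long id;
        final List<Long> commentIds = new ArrayList<>();
        Post(long id) { this.id = id; }
    }

    // Emits one parent reference per joined row, reusing one Post instance per id.
    static List<Post> materialize(List<Row> rows) {
        Map<Long, Post> byId = new HashMap<>();
        List<Post> result = new ArrayList<>();
        for (Row row : rows) {
            Post post = byId.computeIfAbsent(row.postId(), Post::new);
            post.commentIds.add(row.commentId());
            result.add(post); // added once per row, hence the duplicates
        }
        return result;
    }

    public static void main(String[] args) {
        // One post with two comments yields two joined rows.
        List<Post> posts = materialize(List.of(new Row(1, 1), new Row(1, 2)));
        System.out.println(posts.size());                 // 2
        System.out.println(posts.get(0) == posts.get(1)); // true
    }
}
```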

To eliminate the duplicate Post entity references, we need to use DISTINCT:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

But then DISTINCT is also passed to the SQL query, which is not desirable at all:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

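When DISTINCT is passed to the SQL query, the execution plan contains an extra Sort phase, which adds overhead without bringing any value: the parent-child combinations are always unique records because of the child PK column: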

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms
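
To see why the SQL-level DISTINCT cannot remove anything here, consider this small plain-Java model. The `Row` record is a hypothetical stand-in for a joined result row; because every row carries a different child primary key, deduplicating whole rows leaves them all in place, while deduplicating only the parent id gives the single value the caller wanted.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

public class RowLevelDistinct {

    // Hypothetical flattened join row: parent id plus child primary key.
    record Row(long postId, long commentId) {}

    // Conceptually what SQL DISTINCT does: drop rows that are equal in every selected column.
    static List<Row> distinctRows(List<Row> rows) {
        return List.copyOf(new LinkedHashSet<>(rows));
    }

    // What the caller actually wanted: each parent id once.
    static List<Long> distinctPostIds(List<Row> rows) {
        return rows.stream().map(Row::postId).distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 1), new Row(1, 2));
        System.out.println(distinctRows(rows).size()); // 2, nothing was removed
        System.out.println(distinctPostIds(rows));     // [1]
    }
}
```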

Entity queries with HINT_PASS_DISTINCT_THROUGH

To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGH JPA query hint:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

Now the SQL query no longer contains DISTINCT, but the duplicate Post entity references are still removed:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

The execution plan confirms that this time there is no extra Sort phase:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms
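
With the hint set to false, the deduplication happens on the Java side instead of in the database. The sketch below shows the general idea of keeping the first occurrence of each object reference; it is an illustration under that assumption, not Hibernate's own code, and `distinctByReference` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReferenceDistinct {

    // Keeps the first occurrence of each object reference, in encounter order.
    static <T> List<T> distinctByReference(List<T> references) {
        // Identity-based set: membership is decided with ==, not equals()
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<T> unique = new ArrayList<>();
        for (T reference : references) {
            if (seen.add(reference)) {
                unique.add(reference);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Object post = new Object();                  // stand-in for a managed Post instance
        List<Object> fromJoin = List.of(post, post); // one reference per joined row
        System.out.println(distinctByReference(fromJoin).size()); // 1
    }
}
```

This pass is a single linear scan over the already fetched list, which is why dropping the SQL-level sort does not cost anything on the application side for result sets of this kind.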

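Thanks, very useful answer!! After reading the article you mention here and the Spring Data JPA reference docs, I got this working on my Spring Data JPA repository by adding this annotation on top of the method: `@QueryHints(@QueryHint(name = "hibernate.query.passDistinctThrough", value = "false"))` (3 upvotes)
Bought it last week, not all the way through it yet though ;-) Probably the best IT book I have read (2 upvotes)
@IsmailYavuz `PASS_DISTINCT_THROUGH` was implemented by [HHH-10965](https://hibernate.atlassian.net/browse/HHH-10965) and is available since Hibernate ORM 5.2.2. Spring Boot 1.5.9 is very old and uses Hibernate ORM 5.0.12. So you need to upgrade your dependencies if you want to benefit from these great features. (2 upvotes)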

Answer 6

fin*_*rod 5

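I would use JPA's constructor expression feature. See also the following answer:

JPQL Constructor Expression - org.hibernate.hql.ast.QuerySyntaxException: Table is not mapped

Following the example in the question, it would be something like this.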

SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c
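
A constructor expression needs a class with a constructor whose parameters match the selected values. As a minimal sketch, `MyNameType` could look like the class below; its shape is an assumption, since the answer does not show it, and the `package com.mypackage;` declaration is left out here only so the snippet compiles on its own.

```java
import java.util.Objects;

// Hypothetical value holder for the constructor expression;
// in the query above it would live in the package com.mypackage.
public class MyNameType {

    private final String name;

    // Must match the argument list of "new com.mypackage.MyNameType(c.name)"
    public MyNameType(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MyNameType
            && Objects.equals(name, ((MyNameType) other).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    @Override
    public String toString() {
        return "MyNameType(" + name + ")";
    }

    public static void main(String[] args) {
        System.out.println(new MyNameType("Joe Customer").getName()); // Joe Customer
    }
}
```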

Archived:	16 years, 5 months ago
Viewed:	135954 times
Last activity:	7 years ago