Postgres LEFT JOIN creates more rows than the left table

Are*_*bre 1 postgresql left-join sql-insert

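I'm running 32-bit Postgres 9.1.3 on Windows 7 x64. (I had to use 32-bit because there was no Windows build of PostGIS compatible with 64-bit Postgres.) (EDIT: As of PostGIS 2.0, it is compatible with 64-bit Postgres on Windows.)

I have a query that left joins a table (consistent.master) with a temporary table, then inserts the resulting data into a third table (consistent.masternew).

Since this is a left join, the resulting table should have the same number of rows as the left table in the query. However, if I run this: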

SELECT count(*)
FROM consistent.master

I get 2085343. But if I run this:

SELECT count(*)
FROM consistent.masternew

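I get 2085703.

How can masternew have more rows than master? Shouldn't masternew have the same number of rows as master, the left table in the query?

Below is the query. The master and masternew tables should be identically structured.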

--temporary table created here
--I am trying to locate where multiple tickets were written on
--a single traffic stop
WITH stops AS (
    SELECT citation_id,
           rank() OVER (ORDER BY offense_timestamp,
                     defendant_dl,
                     offense_street_number,
                     offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction=1
)

--Here's the insert statement. Below you'll see it's
--pulling data from a select query
INSERT INTO consistent.masternew (arrest_id,
  citation_id,
  defendant_dl,
  defendant_dl_state,
  defendant_zip,
  defendant_race,
  defendant_sex,
  defendant_dob,
  vehicle_licenseplate,
  vehicle_licenseplate_state,
  vehicle_registration_expiration_date,
  vehicle_year,
  vehicle_make,
  vehicle_model,
  vehicle_color,
  offense_timestamp,
  offense_street_number,
  offense_street_name,
  offense_crossstreet_number,
  offense_crossstreet_name,
  offense_county,
  officer_id,
  offense_code,
  speed_alleged,
  speed_limit,
  work_zone,
  school_zone,
  offense_location,
  source,
  citing_jurisdiction,
  the_geom)

--Here's the select query that the insert statement is using.    
SELECT stops.stop,
  master.citation_id,
  defendant_dl,
  defendant_dl_state,
  defendant_zip,
  defendant_race,
  defendant_sex,
  defendant_dob,
  vehicle_licenseplate,
  vehicle_licenseplate_state,
  vehicle_registration_expiration_date,
  vehicle_year,
  vehicle_make,
  vehicle_model,
  vehicle_color,
  offense_timestamp,
  offense_street_number,
  offense_street_name,
  offense_crossstreet_number,
  offense_crossstreet_name,
  offense_county,
  officer_id,
  offense_code,
  speed_alleged,
  speed_limit,
  work_zone,
  school_zone,
  offense_location,
  source,
  citing_jurisdiction,
  the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id

In case it matters, I have run VACUUM FULL ANALYZE and reindexed both tables. (I'm not sure of the exact commands; it was done through pgAdmin III.)

Rém*_*émi 9

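A left join does not necessarily return the same number of rows as the left table. It behaves like a normal (inner) join, except that rows of the left table that would not appear in the normal join are added as well. So if more than one row in the right table matches a single row of the left table, the result can contain more rows than the left table.

To do what you want, you should use GROUP BY and count to detect multiples.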
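The row multiplication described above can be reproduced on a tiny example. This is only a sketch: `demo_left` and `demo_right` are hypothetical tables invented for illustration and are not part of the schema in the question.

```sql
-- Hypothetical demo tables, not part of the original schema.
CREATE TEMP TABLE demo_left  (id int);
CREATE TEMP TABLE demo_right (id int, note text);

INSERT INTO demo_left  VALUES (1), (2), (3);
INSERT INTO demo_right VALUES (1, 'a'), (1, 'b'), (2, 'c');

-- id 1 matches two right-hand rows, id 2 matches one, id 3 matches none.
SELECT l.id, r.note
FROM   demo_left l
LEFT   JOIN demo_right r ON r.id = l.id
ORDER  BY l.id, r.note;
-- Returns 4 rows from a 3-row left table:
-- (1, a), (1, b), (2, c), (3, NULL)
```

The left join guarantees at least one output row per left row, not exactly one.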

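-- stops is the CTE from the question; its WITH clause must precede this query.
-- citation_id is qualified because it exists in both stops and master.
select master.citation_id
from stops join consistent.master on stops.citation_id = master.citation_id
group by master.citation_id
having count(*) > 1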
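Because stops in the question is built from consistent.master itself, the join can only multiply rows if some citation_id occurs more than once in master. Assuming that is the cause, a check that does not depend on the CTE might look like the sketch below; the `copies` alias is made up for illustration.

```sql
-- Lists every citation_id that occurs more than once in master,
-- together with how many times it occurs.
SELECT citation_id, count(*) AS copies
FROM   consistent.master
GROUP  BY citation_id
HAVING count(*) > 1
ORDER  BY copies DESC;
```

If this returns no rows, citation_id is unique in master and the extra rows must come from somewhere else.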


And*_*den 7

Sometimes you know there are multiple matches but don't care. You just want to take the first or top entry.
If so, you can use SELECT DISTINCT ON:

FROM consistent.master LEFT JOIN (SELECT DISTINCT ON (citation_id) * FROM stops) s
ON s.citation_id = master.citation_id

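Here citation_id is the column for which you want to take the first (arbitrary) row per match.

You will probably want to make this deterministic by using ORDER BY with some other sortable column: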

SELECT DISTINCT ON (citation_id) * FROM stops ORDER BY citation_id, created_at
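As a rough illustration of how ORDER BY decides which row DISTINCT ON keeps, here is a small self-contained sketch. The `demo_stops` table and its `created_at` column are hypothetical; the stops CTE in the question only exposes citation_id and stop, so there one would have to order by stop or another available column instead.

```sql
-- Hypothetical demo table; created_at stands in for any sortable column.
CREATE TEMP TABLE demo_stops (citation_id int, stop int, created_at date);

INSERT INTO demo_stops VALUES
    (100, 1, '2012-01-02'),
    (100, 2, '2012-01-01'),
    (200, 3, '2012-01-03');

-- Keeps one row per citation_id: the one with the earliest created_at.
SELECT DISTINCT ON (citation_id) citation_id, stop, created_at
FROM   demo_stops
ORDER  BY citation_id, created_at;
-- Returns (100, 2, 2012-01-01) and (200, 3, 2012-01-03)
```

The DISTINCT ON expressions must match the leftmost ORDER BY expressions, which is why citation_id comes first in the ORDER BY list.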

  • Sigh, Google took me to my own answer... I feel like there must be a better way! (3 upvotes)