Jak*_*ake 4 sql postgresql join sum count
I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's just a syntax/punctuation issue that I'm just missing.
I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected for them and how many DNA samples we have of different types for each of them There are three tables that are pertinent to my problem:
Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.
name | birth
-----+-----------
A21 | 1968-07-01
AAR | 2002-03-30
ABB | 1998-09-10
ABD | 2005-03-15
ABE | 1986-01-01
Run Code Online (Sandbox Code Playgroud)
Table: "babtissue" tracks information, including the below three columns, about different tissues that have been collected over the years. Some lines in this table represent tissue samples that we no longer have, but are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.
name | sample_type | avail
-----+-------------+------
A21 | BLOOD | Y
A21 | BLOOD | Y
A21 | TISSUE | N
ABB | BLOOD | Y
ABB | TISSUE | Y
Run Code Online (Sandbox Code Playgroud)
Table: "dna" is similar to babtissue.
name | sample_type | avail
-----+-------------+------
ABB | GDNA | N
ABB | WGA | Y
ACC | WGA | N
ALE | GDNA | Y
ALE | GDNA | Y
Run Code Online (Sandbox Code Playgroud)
Altogether, I'm trying to write a query that will return every name from biograph and tells me in one column how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...
name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21 | 2 | 0 | 0 | 0 | ?
AAR | 0 | 0 | 0 | 0 | ?
ABB | 1 | 1 | 0 | 1 | ?
ACC | 0 | 0 | 0 | 0 | ?
ALE | 0 | 0 | 2 | 0 | ?
Run Code Online (Sandbox Code Playgroud)
(Apologies for the weird formatting above, I'm not very familiar with writing this way)
The latest version of the query that I've tried:
select b.name,
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas
from biograph b
left join babtissue t on b.name=t.name
left join dna d on b.name=d.name
where b.name is not NULL
group by b.name
order by b.name
Run Code Online (Sandbox Code Playgroud)
I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong--too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.
Any ideas?
The numbers are too high because you're joining to babtissue
and then also to dna
, which is going to cause duplicates.
You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...
SELECT
SQ.name,
SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
SQ.gdnas,
SQ.wgas
FROM
(
SELECT
B.name,
SUM(CASE WHEN D.sample_type = 'GDNA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
SUM(CASE WHEN D.sample_type = 'WGA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
FROM
biograph B
LEFT JOIN dna D ON D.name = B.name
GROUP BY
B.name
) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name
Run Code Online (Sandbox Code Playgroud)
Can the name really be NULL?