Finding the shortest path up to ten degrees of separation

nov*_*ice 9 php mysql sql postgresql recursive-query

I have the following three tables in SQL:

select * from movie limit 2;

  id  |           title            | year | content_rating | duration |    lang    |       country        |  gross   |  budget  | director_id 
------+----------------------------+------+----------------+----------+------------+----------------------+----------+----------+-------------
  407 | 102 Dalmatians             | 2000 | G              |      100 | English    | USA                  | 66941559 | 85000000 |        2174
 3699 | 10 Cloverfield Lane        | 2016 | PG-13          |      104 | English    | USA                  | 71897215 | 15000000 |        1327
(2 rows)
select * from actor limit 3;

  id  |         name         | facebook_likes 
------+----------------------+----------------
  408 | Christian Bale       |          23000
 1430 | Donna Murphy         |            553
   66 | Robert Downey Jr.    |          21000
(3 rows)
select * from acting limit 3;

 movie_id | actor_id 
----------+----------
      407 |     2024
     3699 |     1841
     3016 |       11
(3 rows)

Given two actors a1 and a2, I want to find the shortest path between a1 and a2.

For example, let's say a1 = 'Tom Cruise' and a2 = 'Robert Downey Jr'.

The output should be

Tom Cruise was in Days of Thunder with Robert Duvall -> Robert Duvall was in Lucky You with Robert Downey Jr.

In this case, Tom Cruise is 2 degrees away from Robert Downey Jr., with Robert Duvall connecting them. I'd like to output paths of at most 10 degrees and ignore any connections beyond that.

I tried implementing the solution from "SQL query 6 degrees of separation for network analysis" using a recursive CTE, but I don't think I've applied it properly. Help is appreciated, thanks in advance :)

Attempted query:

with recursive cte as (
select actor.name, movie.title, 1 as level from movie
left join acting on acting.movie_id = movie.id 
left join actor on actor.id = acting.actor_id
where actor.name = 'Tom Cruise'
union  
select actor.name, movie.title, level+1 from movie
left join acting on acting.movie_id = movie.id 
left join actor on actor.id = acting.actor_id
inner join cte on cte.name = actor.name
where cte.name = actor.name and cte.level < 10
)
select * from cte

Bur*_*lin 5

I am not sure what the second SELECT in your query would return, but here is one way to get the degrees of separation between actors:

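Suppose we have a table of actor IDs called Origin. To get every actor who has appeared in the same movie as one of the actors in that table, we start from Origin, join Acting and then Movie to get all the movies our origin actors appeared in, and then join Acting and Actor again to get their co-stars. Note that the Acting table appears twice. Applying this to a recursive CTE for your question, where the Origin table is the cte itself, gives the following: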

WITH RECURSIVE cte(id, distance) AS (
    SELECT actor.id, 0 
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT DISTINCT actor.id, cte.distance + 1
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
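)
-- a WITH clause needs a main statement; list every (id, distance) pair found
SELECT id, distance FROM cte;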

After this, cte contains tuples of the form (id, distance), each meaning that there is a path of length distance from Tom Cruise to the actor with that id.

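The DISTINCT is there for efficiency. cte will contain many bad pairs (where the second value is larger than the true distance), especially if the actor graph is dense, but the correct tuple will also be in cte. By correct tuple I mean a tuple (actor, distance) where distance is the length of the shortest path between the starting actor (Tom Cruise, for example) and that actor.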

Edit: my mistake, UNION already removes duplicates, so the DISTINCT is not needed.

To get that distance, we add a SELECT with a GROUP BY clause:

WITH RECURSIVE cte(id, distance) AS (
    SELECT actor.id, 0 
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT actor.id, cte.distance + 1
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
)
SELECT id, MIN(distance) AS distance
FROM cte
GROUP BY id
ORDER BY 2 ASC;
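
To see why the GROUP BY step is needed, the same query can be run on a tiny made-up cast list. The following is a minimal sketch in Python using the standard-library sqlite3 module rather than PostgreSQL; the table contents, the names Actor A to Actor D, and the helpers make_toy_db, raw_pairs and shortest_distances are all hypothetical and exist only for illustration.

```python
import sqlite3

# Same recursive CTE as above, with the start name and depth limit as parameters.
CTE = """
WITH RECURSIVE cte(id, distance) AS (
    SELECT actor.id, 0
    FROM actor
    WHERE actor.name = :start

    UNION

    SELECT actor.id, cte.distance + 1
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= :max_depth
)
"""


def make_toy_db():
    """Build an in-memory database with a hypothetical chain A - B - C - D."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE acting (movie_id INTEGER, actor_id INTEGER);
        INSERT INTO actor VALUES (1, 'Actor A'), (2, 'Actor B'),
                                 (3, 'Actor C'), (4, 'Actor D');
        INSERT INTO movie VALUES (10, 'Movie X'), (20, 'Movie Y'), (30, 'Movie Z');
        INSERT INTO acting VALUES (10, 1), (10, 2), (20, 2), (20, 3), (30, 3), (30, 4);
    """)
    return conn


def raw_pairs(conn, start_name, max_depth=10):
    """Every (id, distance) pair the CTE produces, including non-minimal ones."""
    sql = CTE + "SELECT id, distance FROM cte"
    params = {"start": start_name, "max_depth": max_depth}
    return conn.execute(sql, params).fetchall()


def shortest_distances(conn, start_name, max_depth=10):
    """Map each reachable actor id to its smallest distance from start_name."""
    sql = CTE + "SELECT id, MIN(distance) FROM cte GROUP BY id ORDER BY 2"
    params = {"start": start_name, "max_depth": max_depth}
    return dict(conn.execute(sql, params).fetchall())


if __name__ == "__main__":
    conn = make_toy_db()
    print(len(raw_pairs(conn, "Actor A")), "raw pairs")
    print(shortest_distances(conn, "Actor A"))
```

On this chain of four actors the raw CTE output holds several rows per actor, because a path may bounce back and forth between co-stars until the depth limit stops it; MIN(distance) keeps only the shortest one.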

If you want the result for a specific second actor, say Robert Downey Jr., then this gives you the degrees of separation:

WITH RECURSIVE cte(id, distance) AS (
    SELECT actor.id, 0 
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT actor.id, cte.distance + 1
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
), distance_table (id, distance) AS (
    SELECT id, MIN(distance) AS distance
    FROM cte
    GROUP BY id
)
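SELECT 'Tom Cruise and ' || actor.name || ' are separated by ' ||
       -- FM suppresses the leading padding that TO_CHAR would otherwise add
       COALESCE(TO_CHAR(distance_table.distance, 'FM999999'), 'more than 10') || ' degrees of separation'
FROM actor
LEFT JOIN distance_table ON (actor.id = distance_table.id)
-- the name must match actor.name exactly; the sample rows store it with a trailing period
WHERE actor.name = 'Robert Downey Jr.';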

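Although I don't think you would generally want to compute this kind of information directly in the database, if you want a message describing the path between the actors, like the one you gave (Tom Cruise was in Days of Thunder with Robert Duvall -> Robert Duvall was in Lucky You with Robert Downey Jr.), then something like this could return it: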

WITH RECURSIVE cte(id, name, distance, message) AS (
    SELECT actor.id, actor.name, 0, ''
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT actor.id, actor.name, cte.distance + 1, 
           cte.message || '> ' || cte.name || ' was in ' ||
           movie.title || ' with ' || actor.name || ' '
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
), distance_table (id, distance) AS (
    SELECT id, MIN(distance) AS distance
    FROM cte
    GROUP BY id
)
SELECT id, name, message, distance
FROM cte
WHERE (id, distance) IN (SELECT * FROM distance_table)
ORDER BY distance;
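
For comparison, here is a rough sketch of computing the same path outside the database with a breadth-first search in Python. It assumes the acting rows and the id-to-name and id-to-title lookups have already been fetched into memory; the function name shortest_path_message and the ids in the example are hypothetical, and only the names and titles come from the question.

```python
from collections import defaultdict, deque


def shortest_path_message(acting_rows, actor_names, movie_titles,
                          start_id, goal_id, max_degrees=10):
    """Return 'X was in M with Y -> ...', or None if no path within max_degrees."""
    cast = defaultdict(list)   # movie_id -> actor ids in that movie
    films = defaultdict(list)  # actor_id -> movie ids the actor appeared in
    for movie_id, actor_id in acting_rows:
        cast[movie_id].append(actor_id)
        films[actor_id].append(movie_id)

    # parent[actor] = (previous actor, linking movie); also serves as the visited set
    parent = {start_id: None}
    depth = {start_id: 0}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        if current == goal_id:
            break
        if depth[current] == max_degrees:
            continue  # do not expand past the degree limit
        for movie_id in films[current]:
            for co_star in cast[movie_id]:
                if co_star not in parent:
                    parent[co_star] = (current, movie_id)
                    depth[co_star] = depth[current] + 1
                    queue.append(co_star)

    if goal_id not in parent:
        return None

    # Walk back from the goal to the start, then reverse into reading order.
    steps = []
    node = goal_id
    while parent[node] is not None:
        prev, movie_id = parent[node]
        steps.append(f"{actor_names[prev]} was in {movie_titles[movie_id]} "
                     f"with {actor_names[node]}")
        node = prev
    return " -> ".join(reversed(steps))


if __name__ == "__main__":
    # Hypothetical ids; the (movie_id, actor_id) order matches the acting table.
    acting = [(100, 1), (100, 2), (200, 2), (200, 66)]
    names = {1: "Tom Cruise", 2: "Robert Duvall", 66: "Robert Downey Jr."}
    titles = {100: "Days of Thunder", 200: "Lucky You"}
    print(shortest_path_message(acting, names, titles, 1, 66))
```

Because each actor is queued at most once, this search does not accumulate the repeated longer paths that the recursive CTE collects before they are filtered out, and it returns a single shortest path rather than every tie.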