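I have a bunch of URLs stored in a table, waiting to be scraped by a script. However, many of those URLs are from the same site. I would like to return them in a "site-friendly" order (that is, try to avoid having two URLs from the same site in a row), so that I won't accidentally get blocked for making too many HTTP requests in a short time.
The database layout is like this: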
create table urls (
site varchar, -- holds e.g. www.example.com or stackoverflow.com
url varchar unique
);
Example result:
SELECT url FROM urls ORDER BY mysterious_round_robin_function(site);
http://www.example.com/some/file
http://stackoverflow.com/questions/ask
http://use.perl.org/
http://www.example.com/some/other/file
http://stackoverflow.com/tags
I thought of something like "ORDER BY site <> @last_site DESC", but I have no idea how to go about writing something like that.
For a more detailed explanation of how it works, see this article in my blog:
In the new PostgreSQL 8.4:
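SELECT *
FROM (
     -- number each site's URLs 1, 2, 3, ... in url order
     SELECT site, url, ROW_NUMBER() OVER (PARTITION BY site ORDER BY url) AS rn
     FROM urls
     ) AS q -- PostgreSQL 8.4 requires an alias on a subquery in FROM
ORDER BY
     rn, site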
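To make the ordering concrete, here is a minimal sketch that runs the window-function query against a few sample rows. The temporary table and its five rows are assumptions for illustration (the URLs are the ones from the example result in the question), and the alias q is a made-up name.

```sql
-- Hypothetical sample data, mirroring the URLs from the question.
CREATE TEMP TABLE urls (
    site varchar,
    url  varchar UNIQUE
);

INSERT INTO urls (site, url) VALUES
    ('www.example.com',   'http://www.example.com/some/file'),
    ('www.example.com',   'http://www.example.com/some/other/file'),
    ('stackoverflow.com', 'http://stackoverflow.com/questions/ask'),
    ('stackoverflow.com', 'http://stackoverflow.com/tags'),
    ('use.perl.org',      'http://use.perl.org/');

-- rn numbers the URLs within each site (1, 2, ...). Sorting by rn first
-- returns the first URL of every site, then the second URL of every site,
-- and so on, which interleaves the sites round-robin.
SELECT site, url, rn
FROM (
    SELECT site, url,
           ROW_NUMBER() OVER (PARTITION BY site ORDER BY url) AS rn
    FROM urls
) AS q
ORDER BY rn, site;
```

With these rows, every site contributes one URL with rn = 1 before any site contributes its rn = 2 URL. Note that the interleaving only holds while several sites still have URLs left: if one site has far more URLs than the others, the tail of the result consists of consecutive URLs from that site, so the script may still want a delay between requests.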
For older versions:
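SELECT site,
       (
       -- pick this site's URL at position "total" (0-based), in url order
       SELECT url
       FROM urls ui
       WHERE ui.site = sites.site
       ORDER BY
             url
       OFFSET total
       LIMIT 1
       ) AS url
FROM (
     -- one row per (site, position), positions running from 0 to cnt - 1
     SELECT site, generate_series(0, cnt - 1) AS total
     FROM (
          SELECT site, COUNT(*) AS cnt
          FROM urls
          GROUP BY
                site
          ) s
     ) sites
ORDER BY
      total, site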
The query for older versions returns the same ordering, though it may be less efficient.
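The query for older versions emulates ROW_NUMBER() by expanding every site into one row per URL position. As a rough sketch, the inner part can be run on its own to inspect that intermediate result; it assumes the urls table from the question already exists and holds some rows.

```sql
-- Inner part of the query for older versions, run on its own.
-- Each site is expanded into one row per URL it owns, numbered from 0,
-- so a site with 2 URLs yields the positions 0 and 1.
SELECT *
FROM (
    SELECT site, generate_series(0, cnt - 1) AS total
    FROM (
        SELECT site, COUNT(*) AS cnt
        FROM urls
        GROUP BY site
    ) s
) sites
ORDER BY total, site;
```

The outer query then looks up, for each (site, total) pair, the URL at that offset within the site. That means one correlated subquery per returned row, which is a plausible reason for this version being slower than the window-function one.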