Kai*_*ran 6 sql sql-server sorting natural-sort
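Thanks for taking the time to read all of this; it's a lot! Thanks to all the enthusiasts!
How do I do a natural sort?
That is, order a set of alphanumeric data so that it displays as: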
Season 1, Season 2, Season 10, Season 20
instead of
Season 1, Season 10, Season 2, Season 20
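To make the difference between the two orderings concrete, here is a minimal sketch outside SQL, in Python. The helper name `natural_key` is hypothetical, and the rule it applies (digit runs compare as integers, everything else as lower-cased text) is only one common reading of "natural sort", not a standard.

```python
import re

DIGITS = "0123456789"

def natural_key(text):
    """Build a sort key from alternating digit and non-digit runs."""
    key = []
    for chunk in re.findall(r"[0-9]+|[^0-9]+", text):
        if chunk[0] in DIGITS:
            # Digit runs compare by numeric value, so 2 sorts before 10.
            key.append((0, int(chunk), ""))
        else:
            # Text runs compare as lower-cased strings (an assumption).
            key.append((1, 0, chunk.lower()))
    return key

seasons = ["Season 10", "Season 2", "Season 20", "Season 1"]
plain_order = sorted(seasons)                     # character-by-character
natural_order = sorted(seasons, key=natural_key)  # numbers compared as numbers
```

Here `plain_order` puts "Season 10" ahead of "Season 2", while `natural_order` gives Season 1, 2, 10, 20.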
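I'm using a very practical TV season example, in a very practical format, as the test case.
I'm hoping to accomplish the following:
I spent about 2 hours researching online and another 3 hours building this solution. Some of the references came from:
Some of the solutions found on SO and other sites only work for 90% of cases. However, most or all of them break if the text contains more than one numeric value, or raise a SQL error if no number is found in the text at all.
I created this SQLFiddle link to play with (it includes all of the code below).
Here is the create statement: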
create table tvseason
(
title varchar(100)
);
insert into tvseason (title)
values ('100 Season 03'), ('100 Season 1'),
('100 Season 10'), ('100 Season 2'),
('100 Season 4'), ('Show Season 1 (2008)'),
('Show Season 2 (2008)'), ('Show Season 10 (2008)'),
('Another Season 01'), ('Another Season 02'),
('Another 1st Anniversary Season 01'),
('Another 2nd Anniversary Season 01'),
('Another 10th Anniversary Season 01'),
('Some Show Another No Season Number'),
('Some Show No Season Number'),
('Show 2 Season 1'),
('Some Show With Season Number 1'),
('Some Show With Season Number 2'),
('Some Show With Season Number 10');
Here is my working solution (the only criterion it fails to handle is #7 below):
select
title, "index", titleLeft,
convert(int, coalesce(nullif(titleRightTrim2, ''), titleRight)) titleRight
from
(select
title, "index", titleLeft, titleRight, titleRightTrim1,
case
when PATINDEX('%[^0-9]%', titleRightTrim2) = 0
then titleRightTrim2
else left(titleRightTrim2, PATINDEX('%[^0-9]%', titleRightTrim2) - 1)
end as titleRightTrim2
from
(select
title,
len(title) - PATINDEX('%[0-9] %', reverse(title)) 'index',
left(title, len(title) - PATINDEX('%[0-9] %', reverse(title))) titleLeft,
ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))) titleRight,
ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))) titleRightTrim1,
left(ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))), PATINDEX('% %', ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))))) titleRightTrim2
from
tvseason) x) y
order by
titleLeft, titleRight
Criteria to consider:
Here is the output:
title
100 Season 1
100 Season 2
100 Season 03
100 Season 4
100 Season 10
**Case 7 here**
Another 10th Anniversary Season 01
Another 1st Anniversary Season 01
Another 2nd Anniversary Season 01
Another Season 01
Another Season 02
Show (2008) Season 1
Show (2008) Season 2
Show 2 The 75th Anniversary Season 1
Show Season 1 (2008)
Show Season 2 (2008)
Show Season 10 (2008)
Some Show Another No Season Number
Some Show No Season Number
Some Show With Season Number 1
Some Show With Season Number 2
Some Show With Season Number 10
I think this will do the trick... I'm simply identifying the changes from non-digit to digit. I haven't done any large-scale testing, but it should be reasonably fast.
SET QUOTED_IDENTIFIER ON;
GO
SET ANSI_NULLS ON;
GO
ALTER FUNCTION dbo.tfn_SplitForSort
/* ===================================================================
11/11/2018 JL, Created: Comments
=================================================================== */
--===== Define I/O parameters
(
@string VARCHAR(8000)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_Tally (n) AS (
SELECT TOP (LEN(@string))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n2 a CROSS JOIN cte_n2 b
),
cte_split_string AS (
SELECT
col_num = ROW_NUMBER() OVER (ORDER BY t.n) + CASE WHEN LEFT(@string, 1) LIKE '[0-9]' THEN 0 ELSE 1 END,
string_part = SUBSTRING(@string, t.n, LEAD(t.n, 1, 8000) OVER (ORDER BY t.n) - t.n)
FROM
cte_Tally t
CROSS APPLY ( VALUES (SUBSTRING(@string, t.n, 2)) ) s (str2)
WHERE
t.n = 1
OR SUBSTRING(@string, t.n - 1, 2) LIKE '[0-9][^0-9]'
OR SUBSTRING(@string, t.n - 1, 2) LIKE '[^0-9][0-9]'
)
SELECT
so_01 = ISNULL(MAX(CASE WHEN ss.col_num = 1 THEN CONVERT(FLOAT, ss.string_part) END), 99999999),
so_02 = MAX(CASE WHEN ss.col_num = 2 THEN ss.string_part END),
so_03 = MAX(CASE WHEN ss.col_num = 3 THEN CONVERT(FLOAT, ss.string_part) END),
so_04 = MAX(CASE WHEN ss.col_num = 4 THEN ss.string_part END),
so_05 = MAX(CASE WHEN ss.col_num = 5 THEN CONVERT(FLOAT, ss.string_part) END),
so_06 = MAX(CASE WHEN ss.col_num = 6 THEN ss.string_part END),
so_07 = MAX(CASE WHEN ss.col_num = 7 THEN CONVERT(FLOAT, ss.string_part) END),
so_08 = MAX(CASE WHEN ss.col_num = 8 THEN ss.string_part END),
so_09 = MAX(CASE WHEN ss.col_num = 9 THEN CONVERT(FLOAT, ss.string_part) END),
so_10 = MAX(CASE WHEN ss.col_num = 10 THEN ss.string_part END)
FROM
cte_split_string ss;
GO
The function in use...
SELECT
ts.*
FROM
#tvseason ts
CROSS APPLY dbo.tfn_SplitForSort (ts.title) sfs
ORDER BY
sfs.so_01,
sfs.so_02,
sfs.so_03,
sfs.so_04,
sfs.so_05,
sfs.so_06,
sfs.so_07,
sfs.so_08,
sfs.so_09,
sfs.so_10;
Results:
id title
----------- ------------------------------------------
2 100 Season 1
4 100 Season 2
1 100 Season 03
5 100 Season 4
3 100 Season 10
11 Another 1st Anniversary Season 01
12 Another 2nd Anniversary Season 01
13 Another 10th Anniversary Season 01
9 Another Season 01
10 Another Season 02
16 Show 2 Season 1
6 Show Season 1 (2008)
7 Show Season 2 (2008)
8 Show Season 10 (2008)
14 Some Show Another No Season Number
15 Some Show No Season Number
17 Some Show With Season Number 1
18 Some Show With Season Number 2
19 Some Show With Season Number 10
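For readers who want to see the slot layout without running T-SQL, the following is a rough Python sketch of the same idea: split at every digit/non-digit boundary, keep numbers in the odd slots and text in the even slots, and pad to ten slots. The names `split_for_sort` and `MAX_SLOTS` are hypothetical. NULL is modelled as a `(0, ...)` pair so that it sorts first, as it does in an ascending SQL Server ORDER BY, and the lower-casing of text is an assumption standing in for a case-insensitive collation.

```python
import re

DIGITS = "0123456789"
MAX_SLOTS = 10  # mirrors so_01 .. so_10

def split_for_sort(text):
    """Return a tuple of (is_not_null, value) pairs, one per slot."""
    chunks = re.findall(r"[0-9]+|[^0-9]+", text)
    if chunks and chunks[0][0] not in DIGITS:
        # Text-first strings skip slot 1 so numbers always land in odd slots.
        chunks.insert(0, None)
    chunks = (chunks + [None] * MAX_SLOTS)[:MAX_SLOTS]
    key = []
    for slot, chunk in enumerate(chunks, start=1):
        if slot % 2 == 1:  # numeric slot
            if chunk is None:
                # Slot 1 falls back to 99999999, like the ISNULL on so_01.
                key.append((1, 99999999.0) if slot == 1 else (0, 0.0))
            else:
                key.append((1, float(chunk)))
        else:  # text slot
            key.append((0, "") if chunk is None else (1, chunk.lower()))
    return tuple(key)

titles = [
    "100 Season 10", "Show Season 2 (2008)",
    "Another 10th Anniversary Season 01", "100 Season 2",
    "Another 2nd Anniversary Season 01", "Show Season 10 (2008)",
    "Some Show No Season Number", "Another Season 01",
]
ordered = sorted(titles, key=split_for_sort)
```

As in the function above, anything past the tenth chunk is ignored, so titles that differ only after that point compare as equal.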
--======================================================================
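[Edit 2020-09-23] I was looking back over some of my old posts and, when I came across this one, wanted to see whether I could get it working with a single-value output. Adding 10 columns to the ORDER BY is just clunky... After some thought, it occurred to me that by converting the FLOAT to BINARY and then the BINARY back to VARCHAR, I could reassemble the string with the STRING_AGG() function. The end result is a string that produces the desired sort order.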
CREATE FUNCTION dbo.human_sort_string
/* ===================================================================
09/23/2020 JL, Created: Just a test
=================================================================== */
--===== Define I/O parameters
(
@string varchar(8000)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), -- 10
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b), -- 100
cte_Tally (n) AS (
SELECT TOP (LEN(@string))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n2 a CROSS JOIN cte_n2 b -- 10,000
),
cte_Parsed AS (
SELECT
t.n,
parsed_val = SUBSTRING(@string, ISNULL(NULLIF(t.n, 1), 0) + 1, LEAD(t.n, 1, 8000) OVER (ORDER BY t.n) - ISNULL(NULLIF(t.n, 1), 0))
FROM
cte_Tally t
CROSS APPLY ( VALUES (SUBSTRING(@string, t.n, 2)) ) sv (sub_val)
WHERE
t.n = 1
OR
sv.sub_val LIKE '[0-9][^0-9]'
OR
sv.sub_val LIKE '[^0-9][0-9]'
)
SELECT
sort_string = STRING_AGG(ISNULL(CONVERT(varchar(8000), CONVERT(binary(8), TRY_CONVERT(float, p.parsed_val)), 2), p.parsed_val), '') WITHIN GROUP (ORDER BY p.n)
FROM
cte_Parsed p;
GO
Now the outer query looks like this...
SELECT
ts.id,
ts.title
FROM
#tvseason ts
CROSS APPLY dbo.human_sort_string(ts.title) hss
ORDER BY
hss.sort_string;
The actual results are identical to those of the previous function.
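The single-string trick can be tried out in Python as well. This is a sketch under two assumptions: that packing the number as a big-endian IEEE 754 double mirrors what the FLOAT to BINARY(8) conversion produces, and that all numbers are non-negative (the byte order of negative doubles does not follow their numeric order). The function name is reused from the SQL version purely for illustration.

```python
import re
import struct

DIGITS = "0123456789"

def human_sort_string(text):
    """Replace each digit run with the hex form of its 8-byte double."""
    parts = []
    for chunk in re.findall(r"[0-9]+|[^0-9]+", text):
        if chunk[0] in DIGITS:
            # Big-endian doubles of non-negative numbers sort correctly
            # when their fixed-width hex strings are compared as text.
            packed = struct.pack(">d", float(chunk))
            parts.append(packed.hex().upper())
        else:
            parts.append(chunk)
    return "".join(parts)

titles = ["Show Season 10 (2008)", "Show Season 2 (2008)", "Show Season 1 (2008)"]
ordered = sorted(titles, key=human_sort_string)
```

Because every number becomes exactly 16 hex characters, an ordinary string comparison of the results ranks 2 before 10, which is why one ORDER BY column is enough.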
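Personally, I'd try to avoid doing complex string manipulation in SQL. I'd probably dump it to a text file, process it with a regular expression in something like C# or Python, and then write it back to a separate column in the database. SQL is notoriously bad at string manipulation.
That said, here's my stab at a SQL approach.
The idea is basically to first eliminate any rows that don't contain the string Season [number]. That handles the case where there is no season to parse. I chose to include them as null, but you could just as easily omit them in the where clause or give them some default value. I use the stuff() function to cut off everything up to the number in Season [number], so the string is easier to work with.
Now we have a string that starts with the season number and possibly ends with some garbage. I use a case statement to check whether there is any garbage (anything non-numeric); if there is, I take the leftmost numeric characters and throw the rest away. If it was only numeric to begin with, I leave it as is.
Finally, cast it to an int and sort by it.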
if object_id('tempdb.dbo.#titles') is not null drop table #titles
create table #titles (Title varchar(100))
insert into #titles (Title)
select title = '100 Season 1'
union all select '100 Season 2'
union all select '100 Season 03'
union all select '100 Season 4'
union all select '100 Season 10'
union all select 'Another 10th Anniversary Season 01'
union all select 'Another 1st Anniversary Season 01'
union all select 'Another 2nd Anniversary Season 01'
union all select 'Another Season 01'
union all select 'Another Season 02'
union all select 'Show (2008) Season 1'
union all select 'Show (2008) Season 2'
union all select 'Show 2 The 75th Anniversary Season 1'
union all select 'Show Season 1 (2008)'
union all select 'Show Season 2 (2008)'
union all select 'Show Season 10 (2008)'
union all select 'Some Show Another No Season Number'
union all select 'Some Show No Season Number'
union all select 'Some Show With Season Number 1'
union all select 'Some Show With Season Number 2'
union all select 'Some Show With Season Number 10'
;with src as
(
select
Title,
Trimmed = case when Title like '%Season [0-9]%'
then stuff(title, 1, patindex('%season [0-9]%', title) + 6, '')
else null
end
from #titles
)
select
Season = cast(case when Trimmed like '%[^0-9]%' then left(Trimmed, patindex('%[^0-9]%', Trimmed) - 1)
else Trimmed
end as int),
Title
from src
order by Season