在MySQL中自然排序

Bla*_*laM 75 mysql sql sorting natural-sort

有没有一种优雅的方法在MySQL数据库中进行高性能的自然排序?

例如,如果我有这个数据集:

  • 最终幻想
  • 最终幻想4
  • 最终幻想10
  • 最终幻想12
  • 最终幻想12:Promathia的链条
  • 最终幻想冒险
  • 最终幻想起源
  • 最终幻想战术

任何其他优雅的解决方案,而不是将游戏的名称分解为他们的组件

  • 标题:"最终幻想"
  • 数量:"12"
  • 副标题:"Promathia链"

确保它们以正确的顺序出现?(4后4,而不是2).

这样做是一个痛苦的**,因为时不时有另一个游戏打破了解析游戏标题的机制(例如"Warhammer 40,000","James Bond 007")

slo*_*ype 85

这是一个快速的解决方案:

SELECT alphanumeric, 
       integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
Run Code Online (Sandbox Code Playgroud)

  • 如果一切都是"最终幻想",那就太好了,但它把"高飞"放在FF套件之前. (44认同)
  • 在kludge上堆积kludge:`SELECT alphanumeric,integer FROM sorting_test ORDER BY SOUNDEX(alphanumeric),LENGTH(alphanumeric),alphanumeric`.如果这完全有效,那是因为SOUNDEX方便地丢弃数字,从而确保例如`apple1`在`z1`之前. (5认同)
  • 此解决方案始终不起作用.它有时会破裂.你应该使用这个:http://stackoverflow.com/a/12257917/384864 (4认同)

小智 53

刚发现这个:

SELECT names FROM your_table ORDER BY games + 0 ASC
Run Code Online (Sandbox Code Playgroud)

当数字位于前面时,自然排序也适用于中间.

  • 在我的情况下(数字字段)工作得很好,谢谢. (3认同)
  • 我没有尝试过,但我严重怀疑。它使用前面的数字的原因是因为`games` 用作数字上下文,因此在比较之前转换为数字。如果在中间,它将始终转换为 0,并且排序将变为伪随机。 (2认同)
  • 当数字位于中间(例如“最终幻想 100”或“最终幻想 2”)时,这实际上不起作用。《最终幻想100》将首先上映。然而,当整数首先为“100 Final Fantasy”时,它确实有效 (2认同)

Ric*_*oth 49

与@plalx发布的功能相同,但重写为MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;
Run Code Online (Sandbox Code Playgroud)

用法:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")
Run Code Online (Sandbox Code Playgroud)

  • 这是唯一真正有效的解决方案.我也测试过drupals代码,但有时会失败.谢啦! (4认同)
  • @MarkSteudel 我们使用与此类似的函数(尽管不是这个确切的函数)对多个表进行自然排序,其中最大的约 500 万行。但是,我们不会在查询中直接调用它,而是使用它来设置 `nat_name` 列的值。每次更新行时,我们使用触发器来运行该函数。这种方法以增加一列为代价,为您提供自然排序,而没有真正的性能成本。 (4认同)

Mic*_*ren 21

我认为这就是为什么很多东西按发布日期排序的原因.

解决方案可能是在表中为"SortKey"创建另一列.这可能是标题的清理版本,它符合您为便于排序或计数器而创建的模式.

  • 这绝对是正确的方法,但它本身并不是一个答案! (3认同)

pla*_*alx 16

刚刚MSSQL 2000编写了这个函数:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO
Run Code Online (Sandbox Code Playgroud)


Con*_*oyP 15

MySQL不允许这种"自然排序",因此看起来最好的方法就是如上所述拆分数据集(单独的id字段等),或者失败,根据非标题元素,数据库中的索引元素(日期,数据库中的插入ID等)执行排序.

让db为你做排序几乎总是比将大数据集读入你选择的编程语言并在那里排序更快,所以如果你在这里对db模式有任何控制,那么看看添加如上所述,易于排序的字段,从长远来看,它将为您节省大量的麻烦和维护.

MySQL错误讨论论坛上不时出现添加"自然排序"的请求,许多解决方案围绕剥离数据的特定部分并将其投射到ORDER BY查询的一部分,例如

SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned) 
Run Code Online (Sandbox Code Playgroud)

这种解决方案几乎可以用于上面的最终幻想示例,但不是特别灵活,不太可能干净地扩展到数据集,包括"Warhammer 40,000"和"James Bond 007"我害怕.


Fil*_*lmJ 8

所以,虽然我知道你已经找到了一个令人满意的答案,但我一直在努力解决这个问题,我们之前已经确定它在SQL中不能很好地完成,我们将不得不在JSON上使用javascript阵列.

这是我使用SQL解决它的方法.希望这对其他人有帮助:

我有以下数据:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

我实际上并没有"投"一些东西,虽然我认为这也可能有效.

我首先替换了数据中不变的部分,在本例中是"场景",然后做了一个LPAD来排序.这似乎允许alpha字符串正确排序以及编号的字符串.

我的ORDER BY条款如下:

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')
Run Code Online (Sandbox Code Playgroud)

显然这对原始问题没有帮助,但原始问题并不是那么统一 - 但我想这可能适用于许多其他相关问题,所以把它放在那里.


Jee*_*Bee 5

  1. 在表格中添加排序键(排名). ORDER BY rank

  2. 使用"发布日期"列. ORDER BY release_date

  3. 从SQL中提取数据时,让对象进行排序,例如,如果提取到Set中,使其成为TreeSet,并使您的数据模型实现Comparable并在此处制定自然排序算法(如果您使用插入排序就足够了)一个没有集合的语言)因为在创建模型并将其插入集合时,您将逐个从SQL读取行


小智 5

关于Richard Toth的最佳回复/sf/answers/858054221/

注意包含2字节(或更多)字符和数字的UTF8编码字符串,例如

12 ???
Run Code Online (Sandbox Code Playgroud)

LENGTH()udf_NaturalSortFormat函数中使用MySQL 将返回字符串的字节长度并且不正确,而使用CHAR_LENGTH()将返回正确的字符长度.

在我的情况下,使用LENGTH()导致的查询永远不会完成并导致MySQL的100%CPU使用率

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;
Run Code Online (Sandbox Code Playgroud)

ps我会把这个作为对原版的评论添加但我没有足够的声誉(还)


小智 5

订购:
0
1
2
10
23
101
205
1000
a
aac
b
casdsadsa
css

使用此查询:

选择
    列名
从
    表名
订购依据
    列名 REGEXP '^\d*[^\da-z&\.\' \-\"\!\@\#\$\%\^\*\(\)\;\:\\,\?\/ \~\`\|\_\-]' 降序,
    列名 + 0,
    列名;