Select 30% of the rows for each column value

lui*_*uis 0 sql sql-server random

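Suppose we have a table with a column 'A' whose values range from 0 to N. For each distinct value of 'A', I want to select 30% of the rows that share that value.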

So if I have this:
A|  B
-------
0 hello
0 test
0 hi
1 blah1
1 blah2
1 blah3
1 blah4
1 blah5
1 blah6

Result:
A|  B
-------
0 hello
1 blah1
1 blah4

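The row shown as blah1 could be any other blah row, and the row shown as blah4 could be any other blah row, as long as the two differ. Basically, the selection can be random or can simply skip rows.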

By the way, the actual table is huge, on the order of terabytes, so keep performance in mind.

KM.*_*KM. 6

Try something like this:

DECLARE @YourTable table (A int, b varchar(10))
INSERT @YourTable VALUES (0, 'hello') --OP's data
INSERT @YourTable VALUES (0, 'test')
INSERT @YourTable VALUES (0, 'hi')
INSERT @YourTable VALUES (1, 'blah1')
INSERT @YourTable VALUES (1, 'blah2')
INSERT @YourTable VALUES (1, 'blah3')
INSERT @YourTable VALUES (1, 'blah4')
INSERT @YourTable VALUES (1, 'blah5')
INSERT @YourTable VALUES (1, 'blah6')

;WITH NumberedRows AS
(   SELECT 
        A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY A,B) AS RowNumber
        FROM @YourTable
)
, GroupCounts AS
(   SELECT
        A,MAX(RowNumber) AS MaxA
        FROM NumberedRows
        GROUP BY A
)
SELECT
    n.a,n.b
    FROM NumberedRows           n
        INNER JOIN GroupCounts  c ON n.A=c.A
    WHERE n.RowNumber<=(c.MaxA+1)*0.3

OUTPUT:

a           b
----------- ----------
0           hello
1           blah1
1           blah2

(3 row(s) affected)
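The `(c.MaxA+1)*0.3` cutoff in the WHERE clause decides how many rows each group keeps. The following sketch works through that arithmetic for a few group sizes. The `@GroupSizes` table and its values are hypothetical and not part of the original answer:

```sql
-- Hypothetical group sizes, chosen only to illustrate the cutoff arithmetic
DECLARE @GroupSizes table (GroupSize int)
INSERT @GroupSizes VALUES (2)
INSERT @GroupSizes VALUES (3)
INSERT @GroupSizes VALUES (6)
INSERT @GroupSizes VALUES (10)

SELECT
    GroupSize
    ,(GroupSize+1)*0.3        AS Cutoff    -- same expression as in the WHERE clause
    ,FLOOR((GroupSize+1)*0.3) AS RowsKept  -- how many RowNumber values are <= Cutoff
    FROM @GroupSizes
    ORDER BY GroupSize
```

For the sample data, the group of 3 rows gets a cutoff of 1.2 and keeps 1 row, and the group of 6 rows gets a cutoff of 2.1 and keeps 2 rows, which matches the output above. A group with fewer than 3 rows gets a cutoff below 1, so it returns no rows at all.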

EDIT: based on a good idea from Andriy M's comment:

;WITH NumberedRows AS
(   SELECT 
        A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY A,B) AS RowNumber
            ,COUNT(*) OVER (PARTITION BY A) AS TotalOf
        FROM @YourTable
)
SELECT
    n.a,n.b
    FROM NumberedRows            n
    WHERE n.RowNumber<=(n.TotalOf+1)*0.3
    ORDER BY A

OUTPUT:

a           b
----------- ----------
0           hello
1           blah1
1           blah2

(3 row(s) affected)

EDIT: here is a version that returns "random" rows, using Andriy M's idea:

DECLARE @YourTable table (A int, b varchar(10))
INSERT @YourTable VALUES (0, 'hello') --OP's data
INSERT @YourTable VALUES (0, 'test')
INSERT @YourTable VALUES (0, 'hi')
INSERT @YourTable VALUES (1, 'blah1')
INSERT @YourTable VALUES (1, 'blah2')
INSERT @YourTable VALUES (1, 'blah3')
INSERT @YourTable VALUES (1, 'blah4')
INSERT @YourTable VALUES (1, 'blah5')
INSERT @YourTable VALUES (1, 'blah6')

;WITH NumberedRows AS
(   SELECT 
        A,B,ROW_NUMBER() OVER (PARTITION BY A ORDER BY newid()) AS RowNumber
        FROM @YourTable
)
, GroupCounts AS (SELECT A,COUNT(A) AS MaxA FROM NumberedRows GROUP BY A)
SELECT
    n.A,n.B
    FROM NumberedRows           n
        INNER JOIN GroupCounts  c ON n.A=c.A
    WHERE n.RowNumber<=(c.MaxA+1)*0.3
    ORDER BY n.A

OUTPUT:

a           b
----------- ----------
0           hi
1           blah3
1           blah6

(3 row(s) affected)
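The two edits can presumably be combined: random ordering via `NEWID()` together with the windowed `COUNT(*)`, which removes the `GroupCounts` join. This is a minimal sketch on the same sample data, not a tested solution for the terabyte-sized table. The rows returned differ from run to run, so no output is shown:

```sql
DECLARE @YourTable table (A int, B varchar(10))
INSERT @YourTable VALUES (0, 'hello') --OP's data
INSERT @YourTable VALUES (0, 'test')
INSERT @YourTable VALUES (0, 'hi')
INSERT @YourTable VALUES (1, 'blah1')
INSERT @YourTable VALUES (1, 'blah2')
INSERT @YourTable VALUES (1, 'blah3')
INSERT @YourTable VALUES (1, 'blah4')
INSERT @YourTable VALUES (1, 'blah5')
INSERT @YourTable VALUES (1, 'blah6')

;WITH NumberedRows AS
(   SELECT
        A,B
            -- number the rows of each A group in random order
            ,ROW_NUMBER() OVER (PARTITION BY A ORDER BY NEWID()) AS RowNumber
            -- size of each A group, computed without a separate join
            ,COUNT(*)     OVER (PARTITION BY A)                  AS TotalOf
        FROM @YourTable
)
SELECT
    n.A,n.B
    FROM NumberedRows n
    WHERE n.RowNumber<=(n.TotalOf+1)*0.3
    ORDER BY n.A
```

Ordering by `NEWID()` forces a sort of every row within each partition, so on a very large table it is worth comparing the execution plan against the non-random version before relying on it.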