How do I split a list into 3 parts based on percentages?

Question

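I have a list of files that I want to split into 3 parts: training, validation, and testing. I tried the code below, but I don't know whether it is correct.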

files = glob.glob("/dataset/%s/*" % emotion)
training = files[:int(len(files)*0.8)] #get first 80% of file list
validation = files[-int(len(files)*0.1):] #get middle 10% of file list
testing = files[-int(len(files)*0.1):] #get last 10% of file list

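I'm not sure whether the testing list is a duplicate of the validation list, or whether it correctly holds the last 10% of the file list.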

Answer 1

zip*_*ipa 8

You can use numpy's split:

import numpy as np

train, validate, test = np.split(files, [int(len(files)*0.8), int(len(files)*0.9)])
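
As a small self-contained sketch of the line above: the file names here are made up for illustration and stand in for the `glob` result. One thing worth knowing is that `np.split` returns NumPy arrays rather than Python lists, so `.tolist()` can be used if plain lists are needed downstream.

```python
import numpy as np

# Hypothetical file names, standing in for the glob.glob(...) result
files = [f"file_{i}.png" for i in range(10)]

n = len(files)
# Two cut points (80% and 90%) produce three parts
train, validate, test = np.split(files, [int(n * 0.8), int(n * 0.9)])

# np.split returns NumPy arrays; convert back to plain lists if needed
train, validate, test = train.tolist(), validate.tolist(), test.tolist()

print(len(train), len(validate), len(test))
```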

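@zipa I found your very useful answer through Google today. So even three years later, improving your answer could (possibly) help future readers. (7 upvotes)
@CharlieParker The OP asked for 3 parts, so the values passed to `np.split` (two indices) are how that is done. Given that the answer was accepted three years ago, why do you think an edit would help now? (2 upvotes)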

Answer 2

Cha*_*ker 6

Same as zipa's answer, but with a self-contained example:

# Split a list of files into train/test or train/val/test parts

import numpy as np

def split_two(lst, ratio=(0.5, 0.5)):
    assert np.isclose(sum(ratio), 1.0)  # make sure the ratios sum to 1
    train_ratio = ratio[0]
    # only the "middle" index is needed; the second part is the rest of the list
    indices_for_splitting = [int(len(lst) * train_ratio)]
    train, test = np.split(lst, indices_for_splitting)
    return train, test

def split_three(lst, ratio=(0.8, 0.1, 0.1)):
    train_r, val_r, test_r = ratio
    assert np.isclose(sum(ratio), 1.0)  # make sure the ratios sum to 1
    # only the first 2 indices are needed; the last part is the rest of the list (possibly empty)
    indices_for_splitting = [int(len(lst) * train_r), int(len(lst) * (train_r + val_r))]
    train, val, test = np.split(lst, indices_for_splitting)
    return train, val, test

files = list(range(10))
train, test = split_two(files)
print(train, test)
train, val, test = split_three(files)
print(train, val, test)

Output:

[0 1 2 3 4] [5 6 7 8 9]
[0 1 2 3 4 5 6 7] [8] [9]

See the np.split documentation.

Answer 3

Fly*_*ler 5

Is `testing` a duplicate of `validation`? Yes: you create them in exactly the same way, so you are taking the last 10% for both validation and testing:

files = [1,2,3,4,5,6,7,8,9,10]
training = files[:int(len(files)*0.8)] #[1, 2, 3, 4, 5, 6, 7, 8]
validation = files[-int(len(files)*0.1):] #[10]
testing = files[-int(len(files)*0.1):] #[10]

If you want to stick with your original approach, I suggest something like this (although the np approach is more elegant):

files = [1,2,3,4,5,6,7,8,9,10]
training = files[:int(len(files)*0.8)] #[1, 2, 3, 4, 5, 6, 7, 8]
validation = files[int(len(files)*0.8):int(len(files)*0.9)] #[9]
testing = files[int(len(files)*0.9):] #[10]
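
The slicing above can also be wrapped in a small helper. This is only a sketch: the function name `split_by_fractions` and its parameters are made up for illustration. It uses forward cut points rather than negative indices, which also sidesteps one pitfall of the original code: if `int(len(files)*0.1)` is 0 (fewer than 10 files), `files[-0:]` is the same as `files[0:]` and returns the whole list.

```python
def split_by_fractions(items, train_frac=0.8, val_frac=0.1):
    """Split items into (train, val, test); test gets whatever remains."""
    n = len(items)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return items[:train_end], items[train_end:val_end], items[val_end:]

files = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
training, validation, testing = split_by_fractions(files)

# The three parts do not overlap and together cover the whole list
assert training + validation + testing == files
print(training, validation, testing)
# → [1, 2, 3, 4, 5, 6, 7, 8] [9] [10]
```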
