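I'm new to Python. Does anyone know a good way to do this? I could write a script myself, but using a package would probably be faster.
I have this .csv file (gigabytes in size):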
name, value, time
A, 1, 10
B, 2, 10
C, 3, 10
C, 3, 10 (duplicate: should be ignored, as should incomplete (A, B, C) groups)
A, 4, 12 (rows should be sorted by time: this one belongs at the end, after the time == 11 rows)
B, 5, 12
C, 6, 12
B, 7, 11 (the order of A, B, C within a time may differ)
C, 8, 11
A, 9, 11
I want to convert it into a new .csv file containing:
time, A, B, C
10, 1, 2, 3
11, 9, 7, 8
12, 4, 5, 6
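For comparison with the package-based answer, the transformation described above can also be sketched with only Python's standard library `csv` module. This is a rough sketch, not the asker's script: `pivot_csv` and its `names` parameter are hypothetical, and it assumes A, B, C is the complete set of names, so any time missing one of them is skipped as incomplete. Like the pandas route, it keeps every time in memory (here in a dict). For real files, open both sides with `open(path, newline="")` as the `csv` documentation recommends.

```python
import csv
import io


def pivot_csv(src, dst, names=("A", "B", "C")):
    """Hypothetical stdlib-only helper: long name/value/time rows -> wide rows.

    `names` is assumed to be the complete set of expected names; any time
    that lacks one of them is treated as incomplete and skipped.
    """
    by_time = {}
    # skipinitialspace=True is forwarded to csv.reader and strips the blank
    # after each comma, so the header fields are 'name', 'value', 'time'.
    for row in csv.DictReader(src, skipinitialspace=True):
        # An exact duplicate rewrites the same slot with the same value; a
        # repeated name/time pair with a different value silently replaces it.
        by_time.setdefault(int(row["time"]), {})[row["name"]] = row["value"]
    writer = csv.writer(dst)
    writer.writerow(["time", *names])
    for t in sorted(by_time):  # numeric sort on int keys, so 11 precedes 12
        values = by_time[t]
        if all(n in values for n in names):
            writer.writerow([t, *(values[n] for n in names)])


sample = io.StringIO(
    "name, value, time\n"
    "A, 1, 10\nB, 2, 10\nC, 3, 10\nC, 3, 10\n"
    "A, 4, 12\nB, 5, 12\nC, 6, 12\n"
    "B, 7, 11\nC, 8, 11\nA, 9, 11\n"
)
out = io.StringIO()
pivot_csv(sample, out)
print(out.getvalue())
```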
I think you need drop_duplicates followed by pivot:
# pivot's arguments must be passed by keyword in pandas >= 2.0
df = df.drop_duplicates().pivot(index='time', columns='name', values='value')
print(df)
name  A  B  C
time
10    1  2  3
11    9  7  8
12    4  5  6
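The answer stops at printing the pivoted frame. Below is a minimal end-to-end sketch, assuming pandas >= 1.2 (for using `read_csv` as a context manager). It covers the remaining requirements: reading the file with `skipinitialspace=True` so the header is `name, value, time` rather than `name`, ` value`, ` time`, dropping incomplete groups with `dropna()`, and writing the result with `to_csv`. The helper `csv_to_wide`, its `chunksize` default, and the extra time-13 row are hypothetical. Note that reading in chunks only saves memory when duplicates are common, because the deduplicated rows still all end up in memory before the pivot. Also, `drop_duplicates()` removes only fully identical rows. If the same (time, name) pair appears with two different values, `pivot` raises a `ValueError`, and `drop_duplicates(subset=["time", "name"])` would keep the first such row instead.

```python
import io

import pandas as pd


def csv_to_wide(source, dest, chunksize=1_000_000):
    """Hypothetical helper: long name/value/time CSV -> wide time,A,B,C CSV.

    `source`/`dest` may be paths or file-like objects; the chunksize default
    is an arbitrary illustrative value, not a tuned one.
    """
    # skipinitialspace=True strips the blank after each comma in header and rows.
    with pd.read_csv(source, skipinitialspace=True, chunksize=chunksize) as chunks:
        # Dedupe inside each chunk, then once more for duplicates split across chunks.
        df = pd.concat([c.drop_duplicates() for c in chunks], ignore_index=True)
    df = df.drop_duplicates()
    wide = (
        df.pivot(index="time", columns="name", values="value")  # sorted by time
        .dropna()  # drop times missing any name (incomplete groups)
        .astype(int)  # missing values forced a float dtype; restore integers
    )
    wide.to_csv(dest)  # the 'time' index becomes the first column
    return wide


sample = io.StringIO(
    "name, value, time\n"
    "A, 1, 10\nB, 2, 10\nC, 3, 10\nC, 3, 10\n"
    "A, 4, 12\nB, 5, 12\nC, 6, 12\n"
    "B, 7, 11\nC, 8, 11\nA, 9, 11\n"
    "A, 10, 13\n"  # hypothetical incomplete group: no B or C at time 13
)
out = io.StringIO()
wide = csv_to_wide(sample, out, chunksize=4)
print(out.getvalue())
```

For the real file, `source` and `dest` would be file paths.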