pandas DataFrame.loc is too slow

Let*_*t4U 6 python pandas pandas-groupby

I have a DataFrame with 100K+ rows that looks like this:

   user  document
0  john      book
1  jane   article
2  jane      book
3  jane      book
4   jim   article
5  john      book
6   jim  blogpost
7  jane  blogpost
8  jane  blogpost
9  jane  blogpost

I need a DataFrame like this:

      blogpost  article  book
john         1        3     0
jane         0        0     1
jim          4        0     2

That is, I need the number of downloads for each (user, document) combination.

I am calling .groupby(['user', 'document']) and then using df.loc to set the download count:

df = pd.DataFrame(index=users, columns=documents)
df.fillna(0, inplace=True)

grouped = records.groupby(['user', 'document'])
for elem in grouped:
    user, document = elem[0]
    downloads = len(elem[1])
    df.loc[user, document] = downloads

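However, df.loc is very slow when used this way. When I comment out the df.loc line, the loop finishes quickly, so the df.loc access is almost certainly the bottleneck.

How can I get this result faster?

Minimal working example: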

import pandas as pd

records = pd.DataFrame([
    ('john', 'book'), 
    ('jane', 'article'),
    ('jane','book'),
    ('jane','book'),
    ('jim', 'article'), 
    ('john', 'book'),
    ('jim', 'blogpost'), 
    ('jane', 'blogpost'),
    ('jane', 'blogpost'),
    ('jane', 'blogpost')
    ], columns=['user', 'document'])
print(records)

users = list(set(records['user']))
users.sort()
documents = list(set(records['document']))
documents.sort()

print(users)
print(documents)

df = pd.DataFrame(index=users, columns=documents)
df.fillna(0, inplace=True)
print(df)

grouped = records.groupby(['user', 'document'])
for elem in grouped:
    user, document = elem[0]
    downloads = len(elem[1])
    df.loc[user, document] = downloads

WeN*_*Ben 5

There are many ways to do this without a loop: pivot, pivot_table, crosstab, or groupby with a count.

# Here df is the question's records DataFrame (columns: user, document)
pd.crosstab(df.user, df.document)
Out[1283]: 
document  article  blogpost  book
user                             
jane            1         3     2
jim             1         1     0
john            0         0     2
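For comparison, here is a rough sketch of two of the other loop-free approaches named above, run on the question's sample data. The helper column `n` and the variable names `by_groupby`, `by_pivot`, and `by_crosstab` are made up for illustration, and this is only one way to spell each approach, not necessarily the fastest on 100K+ rows.

```python
import pandas as pd

# Same sample data as the question's minimal working example.
records = pd.DataFrame([
    ('john', 'book'), ('jane', 'article'), ('jane', 'book'), ('jane', 'book'),
    ('jim', 'article'), ('john', 'book'), ('jim', 'blogpost'),
    ('jane', 'blogpost'), ('jane', 'blogpost'), ('jane', 'blogpost'),
], columns=['user', 'document'])

# groupby + size counts rows per (user, document) pair;
# unstack moves 'document' into the columns, filling missing pairs with 0.
by_groupby = records.groupby(['user', 'document']).size().unstack(fill_value=0)

# pivot_table over a helper column of ones ('n' is a hypothetical name);
# summing the ones gives the count for each cell.
by_pivot = records.assign(n=1).pivot_table(
    index='user', columns='document', values='n', aggfunc='sum', fill_value=0
)

# The crosstab from the answer, for comparison.
by_crosstab = pd.crosstab(records['user'], records['document'])

print(by_groupby)
```

All three produce one row per user and one column per document, so no per-cell df.loc assignment is needed. If the result must follow a specific row or column order, such as the question's users and documents lists, reindex with a fill value of 0 can be applied afterwards.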