Mapping values in a pandas DataFrame column to a sequence of numbers

Question


I have a DataFrame like this (please ignore the first column, which is just the index):

    user_id created_at  count
1   12136   2017-02-19  4
2   12136   2017-02-16  4
3   12136   2017-02-17  2
4   72349   2017-02-17  8
5   72349   2017-02-19  2
7   72672   2017-02-20  3
8   72672   2017-02-19  2


I want to map the user_id values to integers starting from 0:

12136 -> 0
72349 -> 1 
72672 -> 2


Likewise for the created_at column (starting from the smallest value):

2017-02-16 -> 0
2017-02-17 -> 1
2017-02-19 -> 2
2017-02-20 -> 3


In the end I should have this DataFrame (note that a count of 0 is added for dates on which a user had no activity):

user_id created_at  count
0       0           4
0       1           2
0       2           4
0       3           0
1       0           0
1       1           8
1       2           2
1       3           0
2       0           0
2       1           0
2       2           2
2       3           3


I also need to get these lists:

label1 = [12136, 72349, 72672]
label2 = ['2017-02-16', '2017-02-17', '2017-02-19', '2017-02-20']


Is there an efficient way to do this?

Answer 1

cs9*_*s95 4

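First, get your lists. Note that unique() returns values in order of first appearance, so sort the result if you need the labels in ascending order.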

list1 = df.user_id.unique()
print(list1)
array([12136, 72349, 72672])

list2 = df.created_at.unique()
print(list2)
array(['2017-02-19', '2017-02-16', '2017-02-17', '2017-02-20'], dtype=object)


Convert the user_id and created_at columns to category codes.

df['user_id'] = df['user_id'].astype('category').cat.codes
df['created_at'] = df['created_at'].astype('category').cat.codes

print(df)
   user_id  created_at  count
1        0           2      4
2        0           0      4
3        0           1      2
4        1           1      8
5        1           2      2
7        2           3      3
8        2           2      2
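
Since category codes follow the sorted order of the categories, the labels can also be read back from the categorical itself, which keeps them aligned with the codes. A minimal sketch, assuming the sample data from the question; the names `user_cat` and `date_cat` are only illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user_id": [12136, 12136, 12136, 72349, 72349, 72672, 72672],
    "created_at": ["2017-02-19", "2017-02-16", "2017-02-17", "2017-02-17",
                   "2017-02-19", "2017-02-20", "2017-02-19"],
    "count": [4, 4, 2, 8, 2, 3, 2],
})

# Keep the categorical Series around so the labels can be read from them
user_cat = df["user_id"].astype("category")
date_cat = df["created_at"].astype("category")

# Categories are inferred in sorted order, so position i is the label for code i
label1 = user_cat.cat.categories.tolist()
label2 = date_cat.cat.categories.tolist()

# Replace the original values with their integer codes
df["user_id"] = user_cat.cat.codes
df["created_at"] = date_cat.cat.codes
```

With this, `label2[code]` gives back the date string for any code in the created_at column.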


Use a groupby and a reindex operation.

df = df.set_index('created_at').groupby('user_id', as_index=False)\
       .apply(lambda x: x.reindex(df.created_at.unique()))\
       .sort_index().reset_index([1])


Clean up your columns.

df.user_id = df.groupby(level=0).user_id.transform(lambda x: x.ffill().bfill())
df['count'] = df['count'].fillna(0)

print(df.astype(int))

   created_at  user_id  count
0           0        0      4
0           1        0      2
0           2        0      4
0           3        0      0
1           0        1      0
1           1        1      8
1           2        1      2
1           3        1      0
2           0        2      0
2           1        2      0
2           2        2      2
2           3        2      3
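
For comparison, the same result can probably be reached without groupby/apply by reindexing against the full user-by-date grid. This is an alternative sketch, not the method used above: it swaps in pd.factorize(sort=True) for the category codes and assumes each (user_id, created_at) pair occurs at most once (aggregate first if it does not). The names `full_index` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user_id": [12136, 12136, 12136, 72349, 72349, 72672, 72672],
    "created_at": ["2017-02-19", "2017-02-16", "2017-02-17", "2017-02-17",
                   "2017-02-19", "2017-02-20", "2017-02-19"],
    "count": [4, 4, 2, 8, 2, 3, 2],
})

# sort=True makes code 0 correspond to the smallest value;
# the second return value holds the labels in code order
user_codes, label1 = pd.factorize(df["user_id"], sort=True)
date_codes, label2 = pd.factorize(df["created_at"], sort=True)
df["user_id"] = user_codes
df["created_at"] = date_codes

# Every (user, date) combination, including those with no activity
full_index = pd.MultiIndex.from_product(
    [range(len(label1)), range(len(label2))],
    names=["user_id", "created_at"],
)

# Reindex onto the full grid; missing pairs get a count of 0
result = (
    df.set_index(["user_id", "created_at"])["count"]
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(result)
```

Passing fill_value=0 to reindex avoids the intermediate NaN values, so the count column stays an integer type and no separate fillna or astype(int) step is needed.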

