Replace pandas column with sorted index

vik*_*kky 7 python pandas pandas-groupby

I have a sample DF and am trying to replace the values of a list of columns with an ascending sorted index:

DF:

import numpy as np
import pandas as pd

# a, b, c are random, so the printed values below come from one particular run
df = pd.DataFrame(np.random.randint(0,10,size=(7,3)),columns=["a","b","c"])
df["d1"]=["Apple","Mango","Apple","Mango","Mango","Mango","Apple"]
df["d2"]=["Orange","lemon","lemon","Orange","lemon","Orange","lemon"]
df["date"] = ["2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-02-01","2002-02-01","2002-02-01"]
df["date"] = pd.to_datetime(df["date"])

    a   b   c    d1      d2       date
0   2   7   9   Apple   Orange  2002-01-01
1   6   0   9   Mango   lemon   2002-01-01
2   8   0   0   Apple   lemon   2002-01-01
3   4   4   4   Mango   Orange  2002-01-01
4   5   0   8   Mango   lemon   2002-02-01
5   6   1   6   Mango   Orange  2002-02-01
6   7   2   7   Apple   lemon   2002-02-01

Step 1:

Group the DF by the "date" column. The sample group for "2002-01-01" is:


        a   b   c    d1      d2       date
    0   2   7   9   Apple   Orange  2002-01-01
    1   6   0   9   Mango   lemon   2002-01-01
    2   8   0   0   Apple   lemon   2002-01-01
    3   4   4   4   Mango   Orange  2002-01-01
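Step 1 could be reproduced roughly like this. It is only a sketch: the frame is hard-coded from the printed table (the original uses random values), and only the columns needed here are included.

```python
import pandas as pd

# Values copied from the printed table, since the question's data is random.
df = pd.DataFrame({
    "c": [9, 9, 0, 4, 8, 6, 7],
    "d1": ["Apple", "Mango", "Apple", "Mango", "Mango", "Mango", "Apple"],
    "d2": ["Orange", "lemon", "lemon", "Orange", "lemon", "Orange", "lemon"],
    "date": pd.to_datetime(["2002-01-01"] * 4 + ["2002-02-01"] * 3),
})

# Group by date and pull out the 2002-01-01 group.
groups = df.groupby("date")
sample = groups.get_group(pd.Timestamp("2002-01-01"))
print(sample)
```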

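Step 2:

Within that group, replace the values of columns ["d1","d2"] with the index (not the DF index) of the ascending-sorted means of c.

For example, in the group above:

mean(c, d1="Apple") = [9+0]/2 => 4.5
mean(c, d1="Mango") = [9+4]/2 => 6.5

so the ascending sorted index is Apple: 0, Mango: 1.

So the values of column d1 will be replaced as follows: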

            a   b   c   d1       d2       date
        0   2   7   9   0      Orange   2002-01-01
        1   6   0   9   1      lemon    2002-01-01
        2   8   0   0   0      lemon    2002-01-01
        3   4   4   4   1      Orange   2002-01-01
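The step 2 arithmetic for this one group can be checked with a few lines. This is a sketch on the four sample rows only, and it assumes ties should share a rank (dense ranking), which the question does not specify.

```python
import pandas as pd

# The 2002-01-01 group, copied from the table above (only c and d1 needed).
group = pd.DataFrame({
    "c": [9, 9, 0, 4],
    "d1": ["Apple", "Mango", "Apple", "Mango"],
})

# Mean of c per d1 label: Apple -> 4.5, Mango -> 6.5
means = group.groupby("d1")["c"].mean()

# Dense rank of the means, shifted to start at 0: Apple -> 0, Mango -> 1
codes = means.rank(method="dense").astype(int) - 1

# Map each row's label to its position in the sorted means.
group["d1"] = group["d1"].map(codes)
print(group["d1"].tolist())  # [0, 1, 0, 1]
```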

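Apply this to the entire df. I have a brute-force approach that iterates over the groups and over each row; any suggestions for a more pandas-based solution would help with efficiency.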

And*_* L. 1

You can use pivot_table and groupby.rank to create the ranks. After that, use map to assign the values back.

df1 = df.pivot_table('c', ['date','d1']).groupby(level=0).rank(method='dense')-1
df['d1'] = df[['date','d1']].agg(tuple, axis=1).map(df1.c).astype('int')

Out[255]:
   a  b  c  d1      d2        date
0  2  7  9   0  Orange  2002-01-01
1  6  0  9   1   lemon  2002-01-01
2  8  0  0   0   lemon  2002-01-01
3  4  4  4   1  Orange  2002-01-01
4  5  0  8   0   lemon  2002-02-01
5  6  1  6   0  Orange  2002-02-01
6  7  2  7   0   lemon  2002-02-01

Note: in group 2002-02-01, Mango and Apple have the same mean of 7, so both are ranked 0.
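The code above replaces only d1, while the question asks for both d1 and d2. Below is a sketch of one way to cover both. It uses groupby.transform instead of the pivot_table/map route, so it is a different technique, not the one from this answer. `rank_by_group_mean` is a made-up helper name, and the frame is hard-coded from the question's printed table because the original data is random.

```python
import pandas as pd

# Values copied from the question's printed table.
df = pd.DataFrame({
    "a": [2, 6, 8, 4, 5, 6, 7],
    "b": [7, 0, 0, 4, 0, 1, 2],
    "c": [9, 9, 0, 4, 8, 6, 7],
    "d1": ["Apple", "Mango", "Apple", "Mango", "Mango", "Mango", "Apple"],
    "d2": ["Orange", "lemon", "lemon", "Orange", "lemon", "Orange", "lemon"],
    "date": pd.to_datetime(["2002-01-01"] * 4 + ["2002-02-01"] * 3),
})

def rank_by_group_mean(frame, col, value="c", by="date"):
    """Hypothetical helper: 0-based dense rank of mean(value) per (by, col)."""
    # Broadcast each (date, label) mean back onto the original rows.
    means = frame.groupby([by, col])[value].transform("mean")
    # Rank those means within each date; equal means share a rank.
    ranks = means.groupby(frame[by]).rank(method="dense")
    return ranks.astype(int) - 1

out = df.copy()
for col in ["d1", "d2"]:
    # Always rank from the untouched df so the columns don't affect each other.
    out[col] = rank_by_group_mean(df, col)
print(out)
```

With method="dense", tied means such as Mango and Apple in 2002-02-01 get the same code. If ties need distinct codes, method="first" is an option, but the order among tied labels then depends on row order.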