How to categorize all columns of a DataFrame at once? (Turn every value into High, Medium, or Low)

Question

CAS*_*_BS 5 python dataframe pandas categorical-data

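I'm trying to convert all the values in my dataset into categorical values: I want every numeric value classified as Low, Average, or High, depending on its quantile.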

So if a value is below the 25th percentile of its series, it should become "Low".
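A minimal sketch of that rule for one column, assuming the cut points are the 25th and 75th percentiles from `Series.quantile` (pandas' default linear interpolation) and that values at or above the 75th percentile count as High. `label_by_quantile` is a hypothetical helper name, and the sample values are the Central Equatoria column from the table below.

```python
import numpy as np
import pandas as pd


def label_by_quantile(s: pd.Series) -> pd.Series:
    """Label each value Low / Average / High relative to its own column's quartiles."""
    q25, q75 = s.quantile([0.25, 0.75])
    labels = np.select(
        [s < q25, s < q75],      # the first matching condition wins
        ["Low", "Average"],
        default="High",          # everything at or above the 75th percentile
    )
    return pd.Series(labels, index=s.index, name=s.name)


col = pd.Series([6.0, 4.0, 3.0, 7.0, 5.0, 7.0, 4.0, 5.0, 4.0, 15.0,
                 10.0, 12.0, 12.0, 8.0, 5.0], name="Central Equatoria")
print(label_by_quantile(col).tolist())
```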

I tried to use assign and then apply the function I wrote:

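def turn_into_categorical(row):
    quantile_level = [.25, .5, .75]
    for r in row:
        # note: `.r` looks up a column literally named "r",
        # not the column that the value r came from
        cut = refugees_T_F_V_P_full_data.r.quantile(quantile_level)
        if r >= cut[.75]:
            return "High"
        elif r >= cut[.25] and r < cut[0.75]:
            return "Average"
        else:
            return "Low"
        # note: every branch returns, so only the first value of each row is ever looked at

# axis=1 passes one row at a time, so at most one label comes back per row
refugees_T_F_V_P_full_data.apply(turn_into_categorical, axis = 1)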

However, this code does not work properly. I also tried it with iterrows, but I'd like to know whether there is a faster way.
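On the "faster way" question: `DataFrame.apply` with its default `axis=0` passes each column, rather than each row, to the function, which is what a per-column quantile needs. A minimal sketch, assuming 25th/75th-percentile boundaries and that only `Year` and `Month` stay untouched; `label_column` and `categorize_all` are hypothetical names, and the four-row frame is a slice of the question's data.

```python
import numpy as np
import pandas as pd


def label_column(s: pd.Series) -> pd.Series:
    """Low below the 25th percentile, High at or above the 75th, Average otherwise."""
    q25, q75 = s.quantile([0.25, 0.75])
    return pd.Series(
        np.select([s < q25, s < q75], ["Low", "Average"], default="High"),
        index=s.index,
    )


def categorize_all(df: pd.DataFrame, keep=("Year", "Month")) -> pd.DataFrame:
    """Return a new frame where every column except those in `keep` is labeled."""
    value_cols = [c for c in df.columns if c not in keep]
    # apply() with the default axis=0 calls label_column once per column
    labeled = df[value_cols].apply(label_column)
    # put the untouched columns back and restore the original column order
    return pd.concat([df.drop(columns=value_cols), labeled], axis=1)[df.columns]


df = pd.DataFrame({
    "Year": [2014, 2014, 2014, 2015],
    "Month": [10, 11, 12, 1],
    "Eastern Equatoria": [1.0, 3.0, 5.0, 2.0],
    "Jonglei": [3.0, 12.0, 11.0, 4.0],
})
print(categorize_all(df))
```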

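Here is the data I want to convert. Every number except Year and Month should be classified as Low, Medium, or High according to its quantile.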

    Year  Month  Central Equatoria  Eastern Equatoria  Gogrial  Jonglei
0   2014     10                6.0                1.0      0.0      3.0   
1   2014     11                4.0                3.0      0.0     12.0   
2   2014     12                3.0                5.0      0.0     11.0   
3   2015      1                7.0                2.0      0.0      4.0   
4   2015      2                5.0                5.0      0.0     10.0   
5   2015      3                7.0                5.0      0.0      8.0   
6   2015      4                4.0                1.0      0.0      6.0   
7   2015      5                5.0                0.0      0.0      7.0   
8   2015      6                4.0                1.0      0.0      6.0   
9   2015      7               15.0                2.0      0.0      9.0   
10  2015      8               10.0                7.0      0.0      9.0   
11  2015      9               12.0                0.0      0.0      8.0   
12  2015     10               12.0                0.0      0.0      5.0   
13  2015     11                8.0                5.0      0.0     10.0   
14  2015     12                5.0                7.0      0.0      3.0
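To reproduce the problem, the table can be rebuilt as a DataFrame, and `DataFrame.quantile` then returns the 25th and 75th percentile of every column in one call. A minimal sketch; the values are transcribed from the table above and the printout is only for inspection. For a constant column such as Gogrial both cut points are 0.0, so a rule like "at or above the 75th percentile is High" puts every Gogrial value in the same bucket.

```python
import pandas as pd

# Values transcribed from the table in the question
df = pd.DataFrame({
    "Year": [2014] * 3 + [2015] * 12,
    "Month": [10, 11, 12] + list(range(1, 13)),
    "Central Equatoria": [6.0, 4.0, 3.0, 7.0, 5.0, 7.0, 4.0, 5.0, 4.0, 15.0, 10.0, 12.0, 12.0, 8.0, 5.0],
    "Eastern Equatoria": [1.0, 3.0, 5.0, 2.0, 5.0, 5.0, 1.0, 0.0, 1.0, 2.0, 7.0, 0.0, 0.0, 5.0, 7.0],
    "Gogrial": [0.0] * 15,
    "Jonglei": [3.0, 12.0, 11.0, 4.0, 10.0, 8.0, 6.0, 7.0, 6.0, 9.0, 9.0, 8.0, 5.0, 10.0, 3.0],
})

# Every column except Year and Month gets its own pair of cut points
value_cols = df.columns.drop(["Year", "Month"])
cuts = df[value_cols].quantile([0.25, 0.75])  # index: 0.25 and 0.75, one column per region
print(cuts)
```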

Expected result (example):

    Year  Month  Central Equatoria  Eastern Equatoria  Gogrial  Jonglei
0   2014     10               High             Medium      Low   Medium
1   2014     11                Low             Medium      Low     High
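Since the question is tagged categorical-data, the labels can also be stored as an ordered pandas categorical, so that Low < Average < High holds for sorting and comparisons. A minimal sketch, assuming the label spelling Low/Average/High (the expected output above writes Medium) and the same quartile boundaries; `LEVELS` and `label_column` are hypothetical names, and the sample is the first five Jonglei values.

```python
import numpy as np
import pandas as pd

# Ordered category type: Low < Average < High
LEVELS = pd.CategoricalDtype(["Low", "Average", "High"], ordered=True)


def label_column(s: pd.Series) -> pd.Series:
    """Quartile-based labels, returned as an ordered categorical Series."""
    q25, q75 = s.quantile([0.25, 0.75])
    labels = np.select([s < q25, s < q75], ["Low", "Average"], default="High")
    return pd.Series(labels, index=s.index, name=s.name).astype(LEVELS)


jonglei = pd.Series([3.0, 12.0, 11.0, 4.0, 10.0], name="Jonglei")
cat = label_column(jonglei)
print(cat.tolist())
print((cat > "Low").tolist())  # comparisons follow the category order
```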

Answer 1

CAS*_*_BS 0

In the end I did it the old-fashioned way:

import pandas as pd

# df is the DataFrame from the question; Year and Month are copied over unchanged
new_df = df[['Year', 'Month']].copy()

for name in df.columns:
    if name in ('Year', 'Month'):
        continue
    quantiles = df[name].quantile([.25, .75])
    new_col = []
    for value in df[name]:
        if value < quantiles[.25]:
            new_col.append("Low")
        elif value < quantiles[.75]:
            new_col.append("Average")
        else:
            # at or above the 75th percentile (NaN also ends up here)
            new_col.append("High")
    new_df[name] = new_col

new_df.head()
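The inner Python loop can be replaced by whole-frame comparisons against the per-column quartiles. A sketch under the same rule as the loop (below the 25th percentile is Low, below the 75th is Average, anything else High), which checks itself against a loop-based version on the first six rows of the question's data; `loop_version` and `vectorized_version` are hypothetical names.

```python
import numpy as np
import pandas as pd


def loop_version(df, skip=("Year", "Month")):
    """Value-by-value labeling, mirroring the loop in the answer."""
    out = {}
    for name in df.columns:
        if name in skip:
            continue
        q = df[name].quantile([0.25, 0.75])
        out[name] = ["Low" if v < q[0.25] else "Average" if v < q[0.75] else "High"
                     for v in df[name]]
    return pd.DataFrame(out, index=df.index)


def vectorized_version(df, skip=("Year", "Month")):
    """Same rule, but one np.select call over the whole block of value columns."""
    values = df.drop(columns=list(skip))
    q = values.quantile([0.25, 0.75])
    # .lt() with a Series aligns it on the columns: each column uses its own cut point
    labels = np.select(
        [values.lt(q.loc[0.25]), values.lt(q.loc[0.75])],
        ["Low", "Average"],
        default="High",
    )
    return pd.DataFrame(labels, index=values.index, columns=values.columns)


df = pd.DataFrame({
    "Year": [2014, 2014, 2014, 2015, 2015, 2015],
    "Month": [10, 11, 12, 1, 2, 3],
    "Central Equatoria": [6.0, 4.0, 3.0, 7.0, 5.0, 7.0],
    "Jonglei": [3.0, 12.0, 11.0, 4.0, 10.0, 8.0],
})
fast = vectorized_version(df)
assert (fast == loop_version(df)).all().all()
print(fast)
```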

Archived: 6 years, 11 months ago
Views: 2692
Last activity: 6 years, 11 months ago