Python 中重复的列范围乘法和除法

Question

Python 中重复的列范围乘法和除法

我正在尝试分组进行一些列范围的乘法和除法，并将它们连接在一起。但我想大规模地做到这一点 - 目前它有点重复 - 具有 A 字符串的列除以具有 D 字符串的列并乘以 WC 列，然后在组列 B 和 C 上向 D 列和 WC 列重复类似的过程。最后，我将它们合并到同一个数据框中。我怎样才能使这个过程更有效率？

输入：

df = pd.DataFrame({"cid" : {0 : "cd1", 1 : "cd2", 2 : "cd3"},
                   "A1970" : {0 : 3.2, 1 : 3.5, 2 : .4},
                   "A1980" : {0 : 3.1, 1 : 3.6, 2 : .5},
                   "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
                   "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
                   "C1970" : {0 : 3.2, 1 : 3.3, 2 : .3},
                   "C1980" : {0 : 3.3, 1 : 3.4, 2 : .3},
                   "D1970" : {0 : 2.4, 1 : 1.3, 2 : .7},
                   "D1980" : {0 : 3.2, 1 : 1.3, 2 : .2},
                   "WC" : {0 :0.5, 1 : 0.3, 2 : .1}
                  }).set_index(['cid'])

#      A1970  A1980  B1970  B1980  C1970  C1980  D1970  D1980   WC
# cid                                                             
# cd1    3.2    3.1    2.5    3.2    3.2    3.3    2.4    3.2  0.5
# cd2    3.5    3.6    1.2    1.3    3.3    3.4    1.3    1.3  0.3
# cd3    0.4    0.5    0.7    0.1    0.3    0.3    0.7    0.2  0.1

Run Code Online (Sandbox Code Playgroud)

加工：

df_a = (df.filter(regex='A')
        .div(df.filter(regex='D').values)
        .multiply(df["WC"], axis="index")
        .add_suffix("_rt"))

#      A1970_rt  A1980_rt
# cid                    
# cd1  0.666667  0.484375
# cd2  0.807692  0.830769
# cd3  0.057143  0.250000

df_b = (df.filter(regex='B')
        .div(df.filter(regex='D').values)
        .multiply(df["WC"], axis="index")
        .add_suffix("_rt"))

df_c = (df.filter(regex='C')
        .div(df.filter(regex='D').values)
        .multiply(df["WC"], axis="index")
        .add_suffix("_rt"))

Run Code Online (Sandbox Code Playgroud)

Answer 1

jez*_*ael 6

解决方案和后MultiIndex几年可能匹配的列：A,B,CD

df1 = df.set_index('WC', append=True)
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract('(\D+)(\d+)', expand=True))

wc = df1.index.get_level_values('WC')
df1 = df1.loc[:, ['A','B','C']].div(df1.xs('D', axis=1)).mul(wc, axis='index')
df1.columns = df1.columns.map('_'.join)
df1 = df1.add_suffix("_rt").reset_index()
print (df1)
   cid   WC  A_1970_rt  A_1980_rt  B_1970_rt  B_1980_rt  C_1970_rt  C_1980_rt
0  cd1  0.5   0.666667   0.484375   0.520833       0.50   0.666667   0.515625
1  cd2  0.3   0.807692   0.830769   0.276923       0.30   0.761538   0.784615
2  cd3  0.1   0.057143   0.250000   0.100000       0.05   0.042857   0.150000

Run Code Online (Sandbox Code Playgroud)

Answer 2

moz*_*way 5

使用底层 numpy 数据：

创建一个新的数据框：

regex = r'^[ABC]'

N = 3 # number of A/B/C columns
# or if needed to calculate it programmatically
# N = df.filter(regex=regex).shape[1] // df.filter(regex='^D').shape[1]

(df.filter(regex=regex)   # get A/B/C columns
   .mul(df['WC'], axis=0) # multiply by WC
   # divide by D (tiled to match the number of A/B/C)
   .div(np.tile(df.filter(regex='D').values, (1,N)))
   .add_suffix('_rt')     # rename columns
)

Run Code Online (Sandbox Code Playgroud)

输出：

     A1970_rt  A1980_rt  B1970_rt  B1980_rt  C1970_rt  C1980_rt
cid                                                            
cd1  0.666667  0.484375  0.520833      0.50  0.666667  0.515625
cd2  0.807692  0.830769  0.276923      0.30  0.761538  0.784615
cd3  0.057143  0.250000  0.100000      0.05  0.042857  0.150000

Run Code Online (Sandbox Code Playgroud)

就地更新原始数据框

     A1970_rt  A1980_rt  B1970_rt  B1980_rt  C1970_rt  C1980_rt
cid                                                            
cd1  0.666667  0.484375  0.520833      0.50  0.666667  0.515625
cd2  0.807692  0.830769  0.276923      0.30  0.761538  0.784615
cd3  0.057143  0.250000  0.100000      0.05  0.042857  0.150000

Run Code Online (Sandbox Code Playgroud)

输出：

        A1970     A1980     B1970  B1980     C1970     C1980  D1970  D1980   WC
cid                                                                            
cd1  0.666667  0.484375  0.520833   0.50  0.666667  0.515625    2.4    3.2  0.5
cd2  0.807692  0.830769  0.276923   0.30  0.761538  0.784615    1.3    1.3  0.3
cd3  0.057143  0.250000  0.100000   0.05  0.042857  0.150000    0.7    0.2  0.1

Run Code Online (Sandbox Code Playgroud)

归档时间：	3 年，11 月前
查看次数：	116 次
最近记录：	3 年，8 月前