Cha*_*lha 5 python transpose pandas
有很多类似标题的问题,但我无法解决我的数据集遇到的问题。
数据集:
ID Country Type Region Gender IA01_Raw IA01_Class1 IA01_Class2 IA02_Raw IA02_Class1 IA02_Class2 QA_Include QA_Comments
SC1 France A Europe Male 4 8 1 J 4 1 yes N/A
SC2 France A Europe Female 2 7 2 Q 6 4 yes N/A
SC3 France B Europe Male 3 7 2 K 8 2 yes N/A
SC4 France A Europe Male 4 8 2 A 2 1 yes N/A
SC5 France B Europe Male 1 7 1 F 1 3 yes N/A
ID6 France A Europe Male 2 8 1 R 3 7 yes N/A
ID7 France B Europe Male 2 8 1 Q 4 6 yes N/A
UC8 France B Europe Male 4 8 2 P 4 2 yes N/A
Run Code Online (Sandbox Code Playgroud)
所需输出:
ID Country Type Region Gender IA Raw Class1 Class2 QA_Include QA_Comments
SC1 France A Europe Male 01 K 8 1 yes N/A
SC1 France A Europe Male 01 L 8 1 yes N/A
SC1 France A Europe Male 01 P 8 1 yes N/A
SC1 France A Europe Male 02 Q 8 1 yes N/A
SC1 France A Europe Male 02 R 8 1 yes N/A
SC1 France A Europe Male 02 T 8 1 yes N/A
SC1 France A Europe Male 03 G 8 1 yes N/A
SC1 France A Europe Male 03 R 8 1 yes N/A
SC1 France A Europe Male 03 G 8 1 yes N/A
SC1 France A Europe Male 04 K 8 1 yes N/A
SC1 France A Europe Male 04 A 8 1 yes N/A
SC1 France A Europe Male 04 P 8 1 yes N/A
SC1 France A Europe Male 05 R 8 1 yes N/A
....
Run Code Online (Sandbox Code Playgroud)
在数据集中,我的列名称为IA[X]_NAME,其中X = 1..9和NAME = Raw, Class1和Class2。
我想做的就是转置这些列,使其看起来像所需输出中所示的表,即IA将显示X值,就像这个raw和类将显示它们的透视值一样。
因此,为了实现它,我将列切片为:
idVars = list(excel_df_final.columns[0:40]) + list(excel_df_final.columns[472:527]) #These contain columns like ID, Country, Type etc
valueVars = excel_df_final.columns[41:472].tolist() #All the IA_ columns
Run Code Online (Sandbox Code Playgroud)
我不知道这一步是否必要,但这给了我完美的列切片,但当我将其放入时,melt它无法正常工作。我几乎尝试了其他问题中可用的所有方法。
pd.melt(excel_df_final, id_vars=idVars,value_vars=valueVars)
Run Code Online (Sandbox Code Playgroud)
我也尝试过这个:
excel_df_final.set_index(idVars)[41:472].unstack()
Run Code Online (Sandbox Code Playgroud)
但没有用,这里是从宽到长的实现,它也不起作用:
pd.wide_to_long(excel_df_final, stubnames = ['IA', 'Raw', 'Class1', 'Class2'], i=idVars, j=valueVars)
Run Code Online (Sandbox Code Playgroud)
我从宽到长得到的错误是:
ValueError:操作数无法与形状一起广播 (95,) (431,)
由于我的数据集实际上有 526 列,因此我将它们分为两个列表,其中一个包含 95 个列名称,其余i431 个是我需要在行中显示的列名称,如示例中所示数据集。
这将帮助您开始。本质是利用set_index,列转换为MultiIndex,然后stack。可能存在更好的解决方案,但我会这样做,因为这是输出的一个简单步骤。
# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out = df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)
Run Code Online (Sandbox Code Playgroud)
out
ID Country Type Region Gender QA_Include QA_Comments IA Class1 Class2 Raw
0 SC1 France A Europe Male yes NaN IA01 8 1 4
1 SC1 France A Europe Male yes NaN IA02 4 1 J
2 SC2 France A Europe Female yes NaN IA01 7 2 2
3 SC2 France A Europe Female yes NaN IA02 6 4 Q
4 SC3 France B Europe Male yes NaN IA01 7 2 3
5 SC3 France B Europe Male yes NaN IA02 8 2 K
6 SC4 France A Europe Male yes NaN IA01 8 2 4
7 SC4 France A Europe Male yes NaN IA02 2 1 A
8 SC5 France B Europe Male yes NaN IA01 7 1 1
9 SC5 France B Europe Male yes NaN IA02 1 3 F
10 ID6 France A Europe Male yes NaN IA01 8 1 2
11 ID6 France A Europe Male yes NaN IA02 3 7 R
12 ID7 France B Europe Male yes NaN IA01 8 1 2
13 ID7 France B Europe Male yes NaN IA02 4 6 Q
14 UC8 France B Europe Male yes NaN IA01 8 2 4
15 UC8 France B Europe Male yes NaN IA02 4 2 P
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
135 次 |
| 最近记录: |