Related questions (0)

Normalize a DataFrame by group

Suppose I have some data generated as follows:

import numpy as np

N = 20  # number of rows
m = 3   # number of data columns
data = np.random.normal(size=(N,m)) + np.random.normal(size=(N,m))**3

Then I create a categorical variable:

indx = np.random.randint(0,3,size=N).astype(np.int32)

And build a DataFrame:

import pandas as pd
df = pd.DataFrame(np.hstack((data, indx[:,None])), 
             columns=['a%s' % k for k in range(m)] + [ 'indx'])

I can get the mean of each group:

df.groupby('indx').mean()

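What I'm not sure how to do is then subtract each group's mean from every column of the original data, so that the data in each column is normalized by the mean within its group. Any suggestions would be appreciated.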
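
One possible way to do this, offered as a minimal sketch and not as the answer from the original thread, is `groupby(...).transform("mean")`, which returns the group means broadcast back to the shape of the original rows. The seeded generator and the names `value_cols`, `group_means`, and `demeaned` are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the question's data (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
N, m = 20, 3
data = rng.normal(size=(N, m)) + rng.normal(size=(N, m)) ** 3
indx = rng.integers(0, 3, size=N)

df = pd.DataFrame(data, columns=[f"a{k}" for k in range(m)])
df["indx"] = indx

# Columns to normalize: everything except the grouping key.
value_cols = [c for c in df.columns if c != "indx"]

# transform("mean") yields one row per original row, holding that row's group mean.
group_means = df.groupby("indx")[value_cols].transform("mean")

# Subtract the group mean from each value, then re-attach the grouping key.
demeaned = df[value_cols] - group_means
demeaned["indx"] = df["indx"]
```

After this step, the mean of every column within each `indx` group should be zero up to floating-point error.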

python pandas

Score: 15 · Answers: 2 · Views: 10k

Aligning DataFrames with the same columns but different index levels

I have two pandas DataFrames: weight has a simple index on the Land Use column, and concentration has a MultiIndex on Land Use and Parameter.

import pandas
from io import StringIO

conc_string = StringIO("""\
Land Use,Parameter,1E,1N,1S,2
Airfield,BOD5 (mg/l),0.418,0.118,0.226,1.063
Airfield,Ortho P (mg/l),0.002,0.001,0.001,0.002
Airfield,TSS (mg/l),1.773,11.47,0.862,0.183
Airfield,Zn (mg/l),0.001,0.001,4.95E-05,0.001
"Commercial",BOD5 (mg/l),0.036,0.0419,,0.315
"Commercial",Cu (mg/l),4.37E-05,7.34E-05,,0.00039
"Commercial",O&G (mg/l),0.0385,0.127,,0.263
Open Space,TSS (mg/l),0.371,3.01,1.209,0.147
Open Space,Zn (mg/l),0.0127,0.0069,0.0132,0.007
"Parking Lot",BOD5 (mg/l),0.924,0.0668,2.603,3.19
"Parking Lot",O&G (mg/l),1.02,0.149,1.347,1.88
"Rooftops",BOD5 (mg/l),0.135,1.00,0.0562,0.310""")

weight_string = StringIO("""\
Land Use,1E,1N,1S,2
Airfield,0.511,0.0227,0.0616,0.394
Commercial,0.0005,0.1704,0,0.1065
Open Space,0.0008,0.005,0.0002,0.0004
"Parking Lot",0.33,0.514,0.252,0.171
Rooftops,0.081,0.028,8.50E-05,0.003""")

concentration = pandas.read_csv(conc_string, index_col=[0,1])
weight = pandas.read_csv(weight_string, index_col=0)

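In this case, the columns (1E, 1N, 1S, and 2) are drainage basins.

What I want to do is divide all of the concentrations by the weight for the Parameter's basin (column name) and …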
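
The question is cut off, so the exact operation the author wanted is not known. As a sketch of the alignment step only, assuming each concentration row should be divided by the weight row that shares its Land Use, the weights can be reindexed along the Land Use level of the MultiIndex first. The trimmed two-basin data and the names `conc`, `wt`, `wt_aligned`, and `ratio` are illustrative assumptions, not part of the original post.

```python
from io import StringIO

import pandas as pd

# Trimmed subset of the question's data (two basins, three rows).
conc = pd.read_csv(StringIO("""\
Land Use,Parameter,1E,1N
Airfield,BOD5 (mg/l),0.418,0.118
Airfield,TSS (mg/l),1.773,11.47
Open Space,TSS (mg/l),0.371,3.01
"""), index_col=[0, 1])

wt = pd.read_csv(StringIO("""\
Land Use,1E,1N
Airfield,0.511,0.0227
Open Space,0.0008,0.005
"""), index_col=0)

# Repeat each weight row once per concentration row with the same Land Use.
land_use = conc.index.get_level_values("Land Use")
wt_aligned = wt.reindex(land_use)

# Give the repeated weights the same MultiIndex so the division aligns row by row.
wt_aligned.index = conc.index

ratio = conc / wt_aligned
```

The same pattern applies to multiplication or any other element-wise operation once the two frames share an index. A weight of zero, which does occur in the full data, would produce inf or NaN and may need separate handling.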

python pandas

Score: 4 · Answers: 1 · Views: 1505

Summing multiple columns with MultiIndex columns

I have a DataFrame created from a pivot table that looks similar to:

import pandas as pd
d = {('company1', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0, 'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
('company1', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 'April- 2014': 50.0, 'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
('company1', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0, 'August- 2012': 0.0, 'August- 2013': 0.0,'August- 2014': 77.0},
('company2', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': …
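
The excerpt ends before the question states which columns should be summed. As a hedged sketch of one common case, assuming the goal is a per-company total across the second column level, the frame can be transposed, grouped on the first level, and transposed back. The dictionary below is a small subset built from values visible in the excerpt; `totals` is a name introduced here.

```python
import pandas as pd

# Small subset of the excerpt's data; tuple keys become MultiIndex columns.
d = {
    ("company1", "False Negative"): {"April- 2012": 112.0, "April- 2013": 370.0},
    ("company1", "False Positive"): {"April- 2012": 0.0, "April- 2013": 544.0},
    ("company2", "False Negative"): {"April- 2012": 112.0, "April- 2013": 370.0},
}
df = pd.DataFrame(d)

# Group the columns by their first level (the company) and sum within each group.
# Transposing first avoids grouping along axis=1, which newer pandas deprecates.
totals = df.T.groupby(level=0).sum().T
```

Here `totals` has one column per company. Summing a single category across companies would instead group on `level=1`.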

python pandas

Score: 1 · Answers: 1 · Views: 783
