ccs*_*csv 6 python iteration pandas
Suppose I want to create an empty DataFrame and fill it with values from a loop.
import pandas as pd
import numpy as np
years = [2013, 2014, 2015]
dn = pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1],
                        }).set_index('Incidents')
    print(df1)
    dn = dn.append(df1, ignore_index=False)
Even with ignore_index set to False, append produces a diagonal-looking matrix:
>>> dn
           2013  2014  2015
Incidents
C             1   NaN   NaN
B             1   NaN   NaN
A             1   NaN   NaN
C           NaN     1   NaN
B           NaN     1   NaN
A           NaN     1   NaN
C           NaN   NaN     1
B           NaN   NaN     1
A           NaN   NaN     1
[9 rows x 3 columns]
It should look like this:
>>> dn
           2013  2014  2015
Incidents
C             1     1     1
B             1     1     1
A             1     1     1
[3 rows x 3 columns]
Is there a better way to do this? And is there a way to fix the append approach?
I am using pandas version '0.13.1-557-g300610e'.
unu*_*tbu 11
import pandas as pd
years = [2013, 2014, 2015]
dn = []
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1],
                        }).set_index('Incidents')
    dn.append(df1)
dn = pd.concat(dn, axis=1)
print(dn)
yields
           2013  2014  2015
Incidents
C             1     1     1
B             1     1     1
A             1     1     1
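To see why the row-wise append produced the diagonal layout while `axis=1` does not, here is a minimal sketch that builds the same three frames and concatenates them both ways. It uses `pd.concat` for the row-wise case as well, since `DataFrame.append` was removed in pandas 2.0. The names `frames`, `stacked`, and `side_by_side` are illustrative and not from the original post.

```python
import pandas as pd

years = [2013, 2014, 2015]
frames = [
    pd.DataFrame({"Incidents": ["C", "B", "A"], year: [1, 1, 1]}).set_index("Incidents")
    for year in years
]

# axis=0 stacks rows and aligns on column labels. Each frame only has its own
# year column, so the other two columns are filled with NaN for its rows.
stacked = pd.concat(frames, axis=0)

# axis=1 places the frames side by side and aligns on the shared index.
side_by_side = pd.concat(frames, axis=1)

print(stacked.shape)
print(side_by_side.shape)
```

Stacking along rows gives 9 rows with mostly missing values, while concatenating along columns gives the 3 rows by 3 columns that the question asks for.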
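Note that calling pd.concat once outside the loop is more time-efficient than calling pd.concat on every iteration of the loop.
Each call to pd.concat allocates space for a new DataFrame and copies all the data from each component DataFrame into it. If you call pd.concat inside the for-loop, you end up doing on the order of n**2 copies, where n is the number of years.
If you accumulate the partial DataFrames in a list and call pd.concat once outside the loop, then Pandas only needs to make n copies to build dn.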
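The copy counts above can be made concrete with a small back-of-the-envelope model. This is only a sketch: it assumes every partial DataFrame has the same size and counts one "copy" per frame's worth of data moved, which simplifies what pandas actually does internally. The function names are hypothetical.

```python
def copies_concat_in_loop(n):
    """Frame copies when the result is rebuilt with pd.concat on every iteration."""
    total = 0
    for i in range(1, n + 1):
        # Iteration i copies the i - 1 frames already accumulated plus the new one.
        total += i
    return total


def copies_concat_once(n):
    """Frame copies when pd.concat is called a single time after the loop."""
    return n


for n in (3, 10, 100):
    print(n, copies_concat_in_loop(n), copies_concat_once(n))
```

Under this model the in-loop version performs n * (n + 1) / 2 copies, which grows on the order of n**2, while the single call performs n.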