lud*_*fet 5 python numpy pandas
I'm looking to create a DataFrame that is the combination (Cartesian product) of two unrelated series.
If we take two DataFrames:
A = ['a','b','c']
B = [1,2,3,4]
dfA = pd.DataFrame(A)
dfB = pd.DataFrame(B)
I'm looking for this output:
    A  B
0   a  1
1   a  2
2   a  3
3   a  4
4   b  1
5   b  2
6   b  3
7   b  4
8   c  1
9   c  2
10  c  3
11  c  4
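One way would be to loop over the lists directly and build the DataFrame from the result, but there must be a better way. I'm sure I'm missing something in the pandas documentation.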
result = []
for i in A:
    for j in B:
        result.append([i, j])
result_DF = pd.DataFrame(result, columns=['A', 'B'])
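Ultimately, I'm looking to combine months with UUIDs. The code below works, but it takes a long time to compute and relies too heavily on the index. A generic solution would clearly be better: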
from datetime import datetime
from uuid import UUID

import pandas as pd

start = datetime(year=2016, month=1, day=1)
end = datetime(year=2016, month=4, day=1)
# pd.date_range replaces pd.DatetimeIndex(start=..., end=...), which was removed in pandas 1.0
months = pd.date_range(start=start, end=end, freq="MS")
benefit = pd.DataFrame(index=months)
A = [UUID('d48259a6-80b5-43ca-906c-8405ab40f9a8'),
     UUID('873a65d7-582c-470e-88b6-0d02df078c04'),
     UUID('624c32a6-9998-49f4-92b6-70e712355073'),
     UUID('7207ab0c-3c7f-477e-b5bc-fbb8059c1dec')]
dfA = pd.DataFrame(A)
result = pd.DataFrame(columns=['A', 'month'])
# Build one block of months per UUID and append it to the result
for i in dfA.index:
    newdf = pd.DataFrame(index=benefit.index)
    newdf['A'] = dfA.iloc[i, 0]
    newdf['month'] = newdf.index
    result = pd.concat([result, newdf])
result
You can use np.meshgrid:
pd.DataFrame(np.array(np.meshgrid(dfA, dfB, )).T.reshape(-1, 2))
    0  1
0   a  1
1   a  2
2   a  3
3   a  4
4   b  1
5   b  2
6   b  3
7   b  4
8   c  1
9   c  2
10  c  3
11  c  4
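To unpack the one-liner above, here is a step-by-step sketch of the same np.meshgrid approach. The variable names (`a`, `b`, `grid_a`, `grid_b`, `pairs`) and the column labels are illustrative, and converting the inputs to explicit NumPy arrays first is an assumption added for clarity, not part of the original expression:

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame(['a', 'b', 'c'])
dfB = pd.DataFrame([1, 2, 3, 4])

# Pull each single column out as a 1-D array (object dtype keeps the strings as-is)
a = dfA[0].to_numpy(dtype=object)
b = dfB[0].to_numpy()

# meshgrid returns two grids, each of shape (len(b), len(a))
grid_a, grid_b = np.meshgrid(a, b)

# Stack to shape (2, len(b), len(a)), transpose to (len(a), len(b), 2),
# then flatten to one (a, b) pair per row, with `a` varying slowest
pairs = np.array([grid_a, grid_b]).T.reshape(-1, 2)

result = pd.DataFrame(pairs, columns=['A', 'B'])
print(result)
```

Because both grids end up in a single array, the columns may come back with `object` dtype; something like `result['B'].astype(int)` should restore the integers if you need them.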
This gives a rough ~2000x speedup for DataFrame objects of length 300 and 400, respectively:
A = ['a', 'b', 'c'] * 100
B = [1, 2, 3, 4] * 100
dfA = pd.DataFrame(A)
dfB = pd.DataFrame(B)
np.meshgrid:
%%timeit
pd.DataFrame(np.array(np.meshgrid(dfA, dfB, )).T.reshape(-1, 2))
100 loops, best of 3: 8.45 ms per loop
Versus cross:
%timeit cross(dfA, dfB)
1 loop, best of 3: 16.3 s per loop
So if I understand your example correctly, you could do:
A = ['a', 'b', 'c']
dfA = pd.DataFrame(A)
start = datetime(year=2016, month=1, day=1)
end = datetime(year=2016, month=4, day=1)
# pd.date_range replaces pd.DatetimeIndex(start=..., end=...), which was removed in pandas 1.0
months = pd.date_range(start=start, end=end, freq="MS")
dfB = pd.DataFrame(months.month)
pd.DataFrame(np.array(np.meshgrid(dfA, dfB, )).T.reshape(-1, 2))
and also get:
    0  1
0   a  1
1   a  2
2   a  3
3   a  4
4   b  1
5   b  2
6   b  3
7   b  4
8   c  1
9   c  2
10  c  3
11  c  4
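If the second column should keep real datetimes (or any other dtype) rather than being stacked into one NumPy array, a cross join may fit the UUID-and-month case better. This is an alternative to the np.meshgrid answer, not part of it: a minimal sketch using `DataFrame.merge` with `how="cross"`, assuming pandas 1.2 or later. The column names `A` and `month` follow the question, and only two of its UUIDs are used for brevity:

```python
from uuid import UUID

import pandas as pd

# Two of the UUIDs from the question, plus the same month-start range
ids = pd.DataFrame({'A': [UUID('d48259a6-80b5-43ca-906c-8405ab40f9a8'),
                          UUID('873a65d7-582c-470e-88b6-0d02df078c04')]})
months = pd.DataFrame({'month': pd.date_range('2016-01-01', '2016-04-01', freq='MS')})

# how="cross" pairs every row of `ids` with every row of `months`,
# keeping each column's original dtype
result = ids.merge(months, how='cross')
print(result)
```

Each UUID is repeated once per month, in the same order as the nested loop in the question, and the `month` column stays a datetime column.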