bph*_*phi 1 python sql sql-server python-2.7 pandas
I have a table with columns such as Date, Identifier, and Price:
| Identifier | Date | Price |
|------------|----------|-------|
| 693477AA|1990/10/31| 100|
| 353477ZB|1991/08/31| 101|
| 123457ZB|1992/08/31| 105|
I am using the pandas read_sql function to fetch the data from a SQL Server database. Using SQL, pandas DataFrame functionality, or both, I need to convert the data into the following pandas DataFrame format:
693477AA 353477ZB 123457ZB
Date
1988-1-1 NaN NaN 99.41
1988-1-2 100.54 NaN 98.11
1988-1-3 99.45 NaN NaN
So there is one (possibly empty) price entry for every DISTINCT date in the table, for the set of identifiers that satisfy a condition.
Right now I have it working with a for loop:
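import pandas as pd

# conn is an open database connection; [Condition] is a placeholder.
ids = pd.read_sql("SELECT DISTINCT Identifier FROM TABLE WHERE [Condition]", conn)["Identifier"]
data = []
for ident in ids:
    # "?" is the qmark parameter style (as in pyodbc); other drivers may differ.
    query = "SELECT Date, Price FROM TABLE WHERE Identifier = ? ORDER BY Date"
    prices = pd.read_sql(query, conn, params=[ident], index_col="Date")["Price"]
    data.append(prices.rename(ident))  # one column per identifier
result = pd.concat(data, axis=1)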
However, this only works for a very restrictive [Condition], because the table is very large (> 3M rows).
How can I use SQL, DataFrame operations, or a combination of both to get the desired format?
Thanks.
We can use the pivot() function:
In [144]: df.pivot(index='Date', columns='Identifier', values='Price').rename_axis(None, axis=1)
Out[144]:
123457ZB 353477ZB 693477AA
Date
1990/10/31 NaN NaN 100.0
1991/08/31 NaN 101.0 NaN
1992/08/31 105.0 NaN NaN
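Applied to the question's setup, the per-identifier loop could be replaced by one query followed by one pivot. The following is only a minimal sketch under stated assumptions: an in-memory SQLite database stands in for SQL Server, and the table name `prices`, the `LIKE` filter standing in for [Condition], and the helper name `load_wide` are all made up for illustration.

```python
import sqlite3

import pandas as pd

# Stand-in for the SQL Server connection (assumption: SQLite is used here
# only so the sketch is self-contained).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (Identifier TEXT, Date TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("693477AA", "1990/10/31", 100.0),
        ("353477ZB", "1991/08/31", 101.0),
        ("123457ZB", "1992/08/31", 105.0),
    ],
)


def load_wide(conn, pattern):
    """Hypothetical helper: fetch the long table once, then pivot it."""
    # The WHERE clause is a placeholder for the question's [Condition].
    query = "SELECT Date, Identifier, Price FROM prices WHERE Identifier LIKE ?"
    long_df = pd.read_sql(query, conn, params=(pattern,))
    # One row per Date, one column per Identifier; missing pairs become NaN.
    wide = long_df.pivot(index="Date", columns="Identifier", values="Price")
    return wide.rename_axis(None, axis=1)


wide = load_wide(conn, "%ZB")
print(wide)
```

Fetching everything in a single query avoids one database round trip per identifier, which is where the loop in the question spends most of its time.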
Or unstack():
In [149]: df.set_index(['Date','Identifier'])['Price'].unstack('Identifier')
Out[149]:
Identifier 123457ZB 353477ZB 693477AA
Date
1990/10/31 NaN NaN 100.0
1991/08/31 NaN 101.0 NaN
1992/08/31 105.0 NaN NaN
Or crosstab():
In [154]: pd.crosstab(index=df['Date'], columns=df['Identifier'],
                      values=df['Price'], aggfunc='first') \
            .rename_axis(None, axis=1)
Out[154]:
123457ZB 353477ZB 693477AA
Date
1990/10/31 NaN NaN 100.0
1991/08/31 NaN 101.0 NaN
1992/08/31 105.0 NaN NaN
Or pivot_table(), filling the missing values with 0:
In [156]: df.pivot_table(index='Date', columns='Identifier', values='Price', fill_value=0).rename_axis(None, axis=1)
Out[156]:
123457ZB 353477ZB 693477AA
Date
1990/10/31 0 0 100
1991/08/31 0 101 0
1992/08/31 105 0 0
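One practical difference between these options: pivot() and unstack() need every (Date, Identifier) pair to be unique, whereas pivot_table() aggregates duplicates (with the mean unless aggfunc is given). A small sketch with made-up rows; the duplicate price 102.0 is invented for the example and does not come from the question.

```python
import pandas as pd

# Two rows share the same (Date, Identifier) pair on purpose.
df = pd.DataFrame({
    "Identifier": ["693477AA", "693477AA", "353477ZB"],
    "Date": ["1990/10/31", "1990/10/31", "1991/08/31"],
    "Price": [100.0, 102.0, 101.0],
})

# pivot() cannot decide which of the two prices to keep and raises.
try:
    df.pivot(index="Date", columns="Identifier", values="Price")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead (default aggfunc is the mean).
table = df.pivot_table(index="Date", columns="Identifier", values="Price")

print(pivot_failed)
print(table)
```

If the table may contain several prices per identifier and date, it is worth choosing the aggregation explicitly (for example aggfunc='first' or aggfunc='last') rather than relying on the default.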
PS: if you prefer to pivot the data on the SQL Server side, please check this question
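For a rough idea of what pivoting on the database side can look like, here is a hedged sketch that builds one `MAX(CASE WHEN ...)` column per identifier. This is plain conditional aggregation, not necessarily the approach from the linked question and not SQL Server's PIVOT operator. SQLite stands in for SQL Server, and the table name `prices` and the helper `pivot_in_sql` are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Stand-in database (assumption: SQLite instead of SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (Identifier TEXT, Date TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("693477AA", "1990/10/31", 100.0),
        ("353477ZB", "1991/08/31", 101.0),
        ("123457ZB", "1992/08/31", 105.0),
    ],
)


def pivot_in_sql(conn, identifiers):
    """Hypothetical helper: one output column per requested identifier."""
    # Identifier values are passed as parameters; only the column aliases
    # are interpolated, with embedded double quotes escaped by doubling.
    columns = ", ".join(
        'MAX(CASE WHEN Identifier = ? THEN Price END) AS "{}"'.format(
            ident.replace('"', '""')
        )
        for ident in identifiers
    )
    query = "SELECT Date, {} FROM prices GROUP BY Date ORDER BY Date".format(columns)
    return pd.read_sql(query, conn, params=list(identifiers), index_col="Date")


sql_wide = pivot_in_sql(conn, ["693477AA", "353477ZB"])
print(sql_wide)
```

The database then returns the data already in wide form, at the cost of a query whose length grows with the number of identifiers.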