Mat*_* M. 5 python union set pandas rolling-computation
I have a DataFrame with a set of ids in one column and a date in another:
import pandas as pd
df = pd.DataFrame([['2018-01-01', {1, 2, 3}],
                   ['2018-01-02', {3}],
                   ['2018-01-03', {3, 4, 5}],
                   ['2018-01-04', {5, 6}]],
                  columns=['timestamp', 'ids'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)
ids
timestamp
2018-01-01 {1, 2, 3}
2018-01-02 {3}
2018-01-03 {3, 4, 5}
2018-01-04 {5, 6}
What I'm looking for is a function that gives me, for each day, the ids from the last x days. So, assuming x=3, I want the result to be:
ids
timestamp
2018-01-01 {1, 2, 3}
2018-01-02 {1, 2, 3}
2018-01-03 {1, 2, 3, 4, 5}
2018-01-04 {3, 4, 5, 6}
I tried
df.rolling(3).agg(set.union)
but this results in the following error:
Traceback (most recent call last):
File "C:\Users\m.manhertz\Envs\demo-8EG6nosu\lib\site-packages\pandas\core\window.py", line 222, in _prep_values
values = _ensure_float64(values)
File "pandas\_libs\algos_common_helper.pxi", line 3182, in pandas._libs.algos.ensure_float64
File "pandas\_libs\algos_common_helper.pxi", line 3187, in pandas._libs.algos.ensure_float64
TypeError: float() argument must be a string or a number, not 'set'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\m.manhertz\Envs\demo-8EG6nosu\lib\site-packages\pandas\core\window.py", line 1561, in aggregate
return super(Rolling, self).aggregate(arg, *args, **kwargs)
File "C:\Users\m.manhertz\Envs\demo-8EG6nosu\lib\site-packages\pandas\core\window.py", line 321, in aggregate
return self.apply(arg, raw=False, args=args, kwargs=kwargs)
File "C:\Users\m.manhertz\Envs\demo-8EG6nosu\lib\site-packages\pandas\core\window.py", line 1580, in apply
func, raw=raw, args=args, kwargs=kwargs)
File "C:\Users\m.manhertz\Envs\demo-8EG6nosu\lib\site-packages\pandas\core\window.py", line 1003, in apply
center=False, raw=raw)
File "C:\Users\m.manhertz\Envs\demo-8EG6nosu\lib\site-packages\pandas\core\window.py", line 844, in _apply
values = self._prep_values(b.values)
File "C:\Users\m.manhertz\Envs\demo-8EG6nosu\lib\site-packages\pandas\core\window.py", line 225, in _prep_values
"".format(values.dtype))
TypeError: cannot handle this type -> object
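Pandas isn't designed to hold iterables such as list, set, or dict inside pd.Series objects. As the traceback shows, rolling tries to convert the values to float64 and fails on the object dtype, so your logic is not vectorisable. Your best option is probably a list comprehension: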
import pandas as pd
df = pd.DataFrame([['2018-01-01', {1, 2, 3}],
                   ['2018-01-02', {3}],
                   ['2018-01-03', {3, 4, 5}],
                   ['2018-01-04', {5, 6}]],
                  columns=['timestamp', 'ids'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)
# For each row i, union the sets in rows i-2..i (a 3-row window, clipped at the start)
df['ids'] = [set.union(*df.iloc[max(0, i-2): i+1, 0]) for i in range(len(df.index))]
print(df)
ids
timestamp
2018-01-01 {1, 2, 3}
2018-01-02 {1, 2, 3}
2018-01-03 {1, 2, 3, 4, 5}
2018-01-04 {3, 4, 5, 6}
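
The list comprehension above hard-codes x=3 and counts rows rather than calendar days. As a rough sketch only, it could be wrapped into helpers that take the window as a parameter. The names `rolling_union` and `rolling_union_by_days` are made up for this illustration and are not part of pandas. The second helper assumes the index is a DatetimeIndex and that "last x days" means the current date plus the x-1 calendar days before it.

```python
import pandas as pd


def rolling_union(series, window):
    """Union of the sets in each row and the (window - 1) rows before it.

    Hypothetical helper: counts rows, so it matches "last x days" only
    when there is exactly one row per day.
    """
    values = series.tolist()
    return pd.Series(
        [set().union(*values[max(0, i - window + 1): i + 1])
         for i in range(len(values))],
        index=series.index,
    )


def rolling_union_by_days(series, days):
    """Union of the sets whose date falls in the last `days` calendar days.

    Hypothetical helper: assumes a DatetimeIndex; the window for a date ts
    covers (ts - days, ts], i.e. ts itself and the days - 1 days before it.
    """
    idx = series.index
    out = []
    for ts in idx:
        mask = (idx > ts - pd.Timedelta(days=days)) & (idx <= ts)
        out.append(set().union(*series[mask]))
    return pd.Series(out, index=idx)


df = pd.DataFrame(
    {'ids': [{1, 2, 3}, {3}, {3, 4, 5}, {5, 6}]},
    index=pd.to_datetime(['2018-01-01', '2018-01-02',
                          '2018-01-03', '2018-01-04']),
)
df.index.name = 'timestamp'

by_rows = rolling_union(df['ids'], window=3)
by_days = rolling_union_by_days(df['ids'], days=3)
print(by_rows)
```

With one row per day, as in the question, both helpers give the same result. They differ only when dates are missing or repeated, in which case the date-based version is probably closer to what "last x days" means.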