zio*_*zio 4 python numpy pandas
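I have a DataFrame and I'm trying to fill in the values of the 'Date' column (text), shown below:
I read the page with dfs=pd.read_html(pageUrl,infer_types=False)
and then take the DataFrame with df=dfs[0]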
    Date  Time  datetime  Year
0         None      None  2007
1  May 1  0:58      None  2007
2         1:00      None  2007
3         1:30      None  2007
4         1:45      None  2007
5         3:45      None  2007
6         4:45      None  2007
7         6:30      None  2007
8         7:15      None  2007
9         7:45      None  2007
df.dtypes
shows:
Date        object
Time        object
datetime    object
Year         int64
dtype: object
First, I tried filling it row by row: if the current 'Date' is empty, step back one row to pick up the previous value:
def fillDate(r):
    if r['Date']=="":
        p=r.shift(-1)
        r['Date']=p['Date']
    return r
then
df.apply(fillDate,axis=1)
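This fills the 'Date' column with the value from 'Time'.
Next I tried axis=0 (column by column) and changed the function so that it only acts on the 'Date' column (I couldn't see how to apply it to a single column):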
def fillDate(r):
    if r.name=='Date':
        if r['Date']=="":
            p=r.shift(-1)
            r['Date']=p['Date']
    return r
then
df.apply(fillDate,axis=0)
which raises the error
KeyError: ('Date', u'occurred at index Date')
The goal is to fill each blank 'Date' with the value from the cell above it.
How can I do this?
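A note on why both attempts behave the way they do, as a minimal sketch on a made-up four-row frame (the data below is hypothetical, not the scraped table). With axis=1 the function receives one row at a time, indexed by column name, so shift(-1) moves values across columns rather than between rows. With axis=0 it receives one column at a time, indexed by row label, so there is no 'Date' key to look up.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question
df = pd.DataFrame({
    "Date": ["", "May 1", "", ""],
    "Time": [None, "0:58", "1:00", "1:30"],
    "Year": [2007, 2007, 2007, 2007],
})

# axis=1: each `r` is a single row whose index is the column names.
row = df.iloc[2]
print(list(row.index))       # ['Date', 'Time', 'Year']
shifted = row.shift(-1)      # shifts values one slot left, across columns
print(shifted["Date"])       # 1:00, i.e. this row's own Time value

# axis=0: each `r` is a single column whose index is the row labels 0..3,
# so looking up the label 'Date' inside it fails.
col = df["Date"]
print("Date" in col.index)   # False
try:
    col["Date"]
except KeyError as exc:
    print("KeyError:", exc)
```

Neither form of apply gives the function access to the neighbouring row, which is why a column-level operation is the more natural fit here.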
In [16]: df = pd.read_fwf(StringIO(data),widths=[5,12,8,8,6],header=0,names=['idx','date','time','datetime','year'])
# simulate what the OP actually has (though this doesn't happen upon read in)
In [30]: df['date'] = df['date'].fillna('')
In [31]: df
Out[31]:
   idx   date  time  datetime  year
0    0         None      None  2007
1    1  May 1  0:58      None  2007
2    2         1:00      None  2007
3    3         1:30      None  2007
4    4         1:45      None  2007
5    5         3:45      None  2007
6    6         4:45      None  2007
7    7         6:30      None  2007
8    8         7:15      None  2007
9    9         7:45      None  2007
In [32]: df.loc[df.date=='','date'] = np.nan
In [33]: df
Out[33]:
   idx   date  time  datetime  year
0    0    NaN  None      None  2007
1    1  May 1  0:58      None  2007
2    2    NaN  1:00      None  2007
3    3    NaN  1:30      None  2007
4    4    NaN  1:45      None  2007
5    5    NaN  3:45      None  2007
6    6    NaN  4:45      None  2007
7    7    NaN  6:30      None  2007
8    8    NaN  7:15      None  2007
9    9    NaN  7:45      None  2007
In [34]: df['date'] = df['date'].ffill()
In [35]: df
Out[35]:
   idx   date  time  datetime  year
0    0    NaN  None      None  2007
1    1  May 1  0:58      None  2007
2    2  May 1  1:00      None  2007
3    3  May 1  1:30      None  2007
4    4  May 1  1:45      None  2007
5    5  May 1  3:45      None  2007
6    6  May 1  4:45      None  2007
7    7  May 1  6:30      None  2007
8    8  May 1  7:15      None  2007
9    9  May 1  7:45      None  2007
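The same two steps (blank to NaN, then forward fill) can probably be chained into one line. The sketch below assumes the blanks really are empty strings, as in the question, and uses a small hypothetical frame in which the 'May 2' rows are invented purely to show the fill restarting at each new date.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's frame
df = pd.DataFrame({
    "Date": ["", "May 1", "", "", "May 2", ""],
    "Time": [None, "0:58", "1:00", "1:30", "0:15", "2:00"],
    "Year": [2007] * 6,
})

# Turn empty strings into NaN, then carry the last seen date forward
df["Date"] = df["Date"].replace("", np.nan).ffill()

print(df["Date"].tolist())
```

The first row stays NaN because there is no earlier value to carry forward, which matches Out[35] above.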