Suppose I have a series of dates in R, as follows:
d <- as.Date(c('2001-01-01', '2001-01-02', '2001-01-04', '2001-01-05'))
The date 2001-01-03 is missing. Is there a quick way to identify this? In practice, my series is much longer than 4 observations.
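The question is about R, but the underlying idea is language-neutral: build every calendar day between the first and last observation, then keep the days that do not appear in the series. Below is a minimal sketch of that idea in Python using only the standard library. The helper name `missing_dates` is hypothetical, and the sketch assumes a non-empty series with daily granularity.

```python
from datetime import date, timedelta


def missing_dates(dates):
    """Return the calendar days between min(dates) and max(dates) that are absent."""
    present = set(dates)
    start, end = min(present), max(present)
    n_days = (end - start).days
    # Every day from start to end, inclusive.
    full_range = (start + timedelta(days=i) for i in range(n_days + 1))
    return [day for day in full_range if day not in present]


# Same four dates as in the question.
d = [date(2001, 1, 1), date(2001, 1, 2), date(2001, 1, 4), date(2001, 1, 5)]
gaps = missing_dates(d)
print(gaps)  # [datetime.date(2001, 1, 3)]
```

The set lookup keeps the check linear in the length of the date range, which matters for a longer series.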
I have two Athena tables with the following queries:
select
date,
uid,
logged_hrs,
extract(hour from start_time) as hour
from schema.table1
where building = 'MKE'
and pt_date between date '2019-01-01' and date '2019-01-09'
and
select
associate_uid as uid,
date(substr(fcdate_utc, 1, 10)) as pt_date,
learning_curve_level
from tenure.learningcurve
where warehouse_id = 'MKE'
and date(substr(fcdate_utc, 1, 10)) between date '2019-01-01' and date '2019-01-09'
I want to join them on uid and pt_date. How can I do that?
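One common way to combine two queries like these is to wrap each one as a subquery and join on both keys. The sketch below shows only that pattern. It runs against an in-memory SQLite database from Python so that it is self-contained, which means the table names `hours` and `curve` and all rows are made up, and Athena-specific functions such as `date(...)` and `extract(...)` are left out. It also assumes the first query exposes its date column as `pt_date`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the two Athena tables.
conn.executescript("""
    create table hours (pt_date text, uid text, logged_hrs real, building text);
    create table curve (associate_uid text, fcdate_utc text,
                        learning_curve_level integer, warehouse_id text);
    insert into hours values
        ('2019-01-01', 'u1', 8.0, 'MKE'),
        ('2019-01-02', 'u1', 7.5, 'MKE'),
        ('2019-01-01', 'u2', 6.0, 'MKE');
    insert into curve values
        ('u1', '2019-01-01T00:00:00Z', 3, 'MKE'),
        ('u2', '2019-01-01T00:00:00Z', 1, 'MKE');
""")

# Each original query becomes an aliased subquery; the join uses both keys.
query = """
    select a.pt_date, a.uid, a.logged_hrs, b.learning_curve_level
    from (
        select pt_date, uid, logged_hrs
        from hours
        where building = 'MKE'
    ) a
    join (
        select associate_uid as uid,
               substr(fcdate_utc, 1, 10) as pt_date,
               learning_curve_level
        from curve
        where warehouse_id = 'MKE'
    ) b
      on a.uid = b.uid and a.pt_date = b.pt_date
    order by a.uid, a.pt_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An inner join drops rows that have no match on both keys (here, u1 on 2019-01-02); a `left join` would keep every row from the first subquery instead.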
I tried:
select (select
date,
uid,
logged_hrs,
extract(hour from start_time) as hour
from schema.table1
where building = 'MKE'
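and pt_date between date '2019-01-01' …
I have the following DataFrame with some missing values. I want to use ffill() to fill the missing values in both var1 and var2, grouping by date and building. I can do this for one variable at a time, but when I try to do it for both, it crashes. How can I do this for both variables at once, while leaving var3 and var4 unmodified?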
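A minimal sketch of one way to do this, assuming a frame shaped like the one in the question (the data here is a smaller made-up sample): select both columns on the grouped object, forward-fill them, and assign the result back to the same columns. The list name `cols` is only for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up sample with the same column layout as the question.
df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01'],
    'building': ['a', 'a', 'b', 'b'],
    'var1': [1.5, np.nan, 2.1, np.nan],
    'var2': [100, np.nan, np.nan, 105],
    'var3': [10, np.nan, np.nan, np.nan],
})

cols = ['var1', 'var2']
# ffill() on the grouped selection returns a frame aligned to df's index,
# so it can be assigned straight back; other columns are not touched.
df[cols] = df.groupby(['date', 'building'])[cols].ffill()
print(df)
```

Forward fill only propagates earlier values within a group, so a missing value at the start of a group (var2 in the first row of building b here) stays missing.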
import numpy as np
import pandas as pd
df = pd.DataFrame({
'date': ['2019-01-01','2019-01-01','2019-01-01','2019-01-01','2019-02-01','2019-02-01','2019-02-01','2019-02-01'],
'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
'var4': [1, 2, 3, 4, 5, 6, 7, 8]
})
df
date building var1 var2 var3 var4
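0 2019-01-01 a 1.5 100.0 …
I want to use Pandas' dropna function with axis=1 to drop columns, but only within a subset of the columns, with some thresh set. More specifically, I want to pass an argument specifying which columns to ignore in the dropna operation. How can I do this? Below is an example of what I've tried.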
import numpy as np
import pandas as pd
df = pd.DataFrame({
'building': ['bul2', 'bul2', 'cap1', 'cap1'],
'date': ['2019-01-01', '2019-02-01', '2019-01-01', '2019-02-01'],
'rate1': [301, np.nan, 250, 276],
'rate2': [250, 300, np.nan, np.nan],
'rate3': [230, np.nan, np.nan, np.nan],
'rate4': [230, np.nan, 245, np.nan],
})
# Only retain columns with at least 3 non-missing values
df.dropna(axis=1, thresh=3)
building date rate1
0 bul2 2019-01-01 301.0
1 bul2 2019-02-01 NaN
2 cap1 2019-01-01 250.0
3 cap1 2019-02-01 276.0 …
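As far as I know, `dropna` with `axis=1` has no parameter for exempting columns, so one workaround is to run it only on the columns that are allowed to be dropped and then reselect from the original frame. The sketch below assumes that reading of the question; the `ignore` list is hypothetical, and the data is a shortened copy of the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'building': ['bul2', 'bul2', 'cap1', 'cap1'],
    'date': ['2019-01-01', '2019-02-01', '2019-01-01', '2019-02-01'],
    'rate1': [301, np.nan, 250, 276],
    'rate2': [250, 300, np.nan, np.nan],
    'rate3': [230, np.nan, np.nan, np.nan],
})

# Hypothetical: columns to keep no matter how many values are missing.
ignore = ['rate3']

# Apply the threshold only to the columns that may be dropped.
candidates = df.columns.difference(ignore)
kept = df[candidates].dropna(axis=1, thresh=3).columns

# Reselect from the original frame to preserve the original column order.
result = df[[c for c in df.columns if c in kept or c in ignore]]
print(result)
```

Here rate2 is dropped because it has only 2 non-missing values, while rate3 survives despite having 1, because it is in `ignore`.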