R - Filtering data from a data frame

Yu *_*eng 5 r filter dataframe

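I'm new to R and really not sure how to filter data in a data frame.

I have created a data frame with two columns: a monthly date and the corresponding temperature. It has 324 rows.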

> head(Nino3.4_1974_2000)
  Month_common               Nino3.4_degree_1974_2000_plain
1   1974-01-15                       -1.93025
2   1974-02-15                       -1.73535
3   1974-03-15                       -1.20040
4   1974-04-15                       -1.00390
5   1974-05-15                       -0.62550
6   1974-06-15                       -0.36915

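The filtering rule is to select temperatures greater than or equal to 0.5 degrees. In addition, the condition must hold for at least 5 consecutive months.

I have already eliminated the rows with temperatures below 0.5 degrees (see below).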

for (i in 1) {
el_nino=Nino3.4_1974_2000[which(Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain >= 0.5),]
}

> head(el_nino)
   Month_common               Nino3.4_degree_1974_2000_plain
32   1976-08-15                      0.5192000
33   1976-09-15                      0.8740000
34   1976-10-15                      0.8864501
35   1976-11-15                      0.8229501
36   1976-12-15                      0.7336500
37   1977-01-15                      0.9276500

However, I still need to extract the runs of at least 5 consecutive months. I hope someone can help me.

mat*_*fee 4

If you can always rely on the spacing being one month, let's temporarily drop the time information:

temps <- Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain

Since the temperatures in this vector are always one month apart, we just need to look for runs where temps[i] >= 0.5, and each run must have length at least 5.
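
The one-month spacing is an assumption worth checking before the dates are dropped. A minimal sketch of such a check, assuming the date column is (or has been converted with as.Date to) class Date; the vector months_demo is invented for illustration:

```r
# Hypothetical monthly dates, invented purely for illustration
months_demo <- seq(as.Date("1974-01-15"), by = "month", length.out = 6)

# Turn each date into a running month index: year * 12 + month number
idx <- as.integer(format(months_demo, "%Y")) * 12L +
  as.integer(format(months_demo, "%m"))

# Consecutive months differ by exactly 1, so this is TRUE when no month is missing
evenly_spaced <- all(diff(idx) == 1L)
print(evenly_spaced)
```

If this prints FALSE for the real data, a gap in the dates would make two separate warm periods look like one run.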

If we do the following:

ofinterest <- temps >= 0.5

we get a vector ofinterest with values like TRUE FALSE FALSE TRUE TRUE ..., where element i is TRUE if temps[i] is >= 0.5 and FALSE otherwise.
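
As a small illustration of that comparison, here is a sketch on a made-up six-value vector (the numbers are invented for the example, not taken from the Nino 3.4 data):

```r
# Hypothetical temperatures, invented purely for illustration
temps_demo <- c(0.7, 0.2, -0.1, 0.5, 0.9, 0.6)

# The comparison is element-wise, so we get one logical value per month
ofinterest_demo <- temps_demo >= 0.5
print(ofinterest_demo)
# [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE
```

Note that a value of exactly 0.5 counts as TRUE because the rule is "greater than or equal to".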

To rephrase your problem, then: we just need to look for places where TRUE occurs at least five times in a row.

To do this, we can use the function rle. ?rle gives:

> ?rle
Description
     Compute the lengths and values of runs of equal values in a vector
     - or the reverse operation.
Value:
     ‘rle()’ returns an object of class ‘"rle"’ which is a list with
     components:
 lengths: an integer vector containing the length of each run.
  values: a vector of the same length as ‘lengths’ with the
          corresponding values.

So we use rle to count up all the streaks of consecutive TRUEs and consecutive FALSEs, and look for streaks of at least 5 TRUEs in a row.
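
To make the shape of the rle result concrete, here is a sketch on a short hand-written logical vector (x_demo is invented for illustration):

```r
# Hypothetical logical vector, invented purely for illustration
x_demo <- c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)

runs_demo <- rle(x_demo)
print(runs_demo$lengths)  # 1 2 5 1
print(runs_demo$values)   # TRUE FALSE TRUE FALSE

# Positions (among the runs, not the original elements) of TRUE runs
# that are at least 5 long
long_true_runs <- which(runs_demo$lengths >= 5 & runs_demo$values)
print(long_true_runs)     # 3
```

The result 3 means "the third run", which is why an extra step is needed to get back to positions in the original vector.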

I'll make up some data to demonstrate:

# for you, temps <- Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain
temps <- runif(1000)

# make a vector that is TRUE when temperature is >= 0.5 and FALSE otherwise
ofinterest <- temps >= 0.5

# count up the runs of TRUEs and FALSEs using rle:
runs <- rle(ofinterest)

# we need to find points where runs$lengths >= 5 (i.e. at least 5 in a row),
# AND runs$values is TRUE (so at least 5 TRUEs in a row).
streakIs <- which(runs$lengths >= 5 & runs$values)

# these are all the el_nino occurrences.
# We need to convert `streakIs` into indices into our original `temps` vector.
# To do this we add up all the `runs$lengths` before `streakIs[i]` and add 1;
#  that gives the index into `temps`.
# that is:
# startMonths <- c()
# for ( n in streakIs ) {
#     startMonths <- c(startMonths, sum(runs$lengths[seq_len(n-1)]) + 1)
# }
#
# However, since this is R we can vectorise with sapply
# (seq_len(n-1) is empty when n is 1, so a streak in the first run starts at 1):
startMonths <- sapply(streakIs, function(n) sum(runs$lengths[seq_len(n-1)])+1)

Now if you do Nino3.4_1974_2000$Month_common[startMonths], you'll get all the months in which an El Niño starts.

It boils down to just a few lines:

runs <- rle(Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain >= 0.5)
streakIs <- which(runs$lengths >= 5 & runs$values)
startMonths <- sapply(streakIs, function(n) sum(runs$lengths[seq_len(n-1)])+1)
Nino3.4_1974_2000$Month_common[startMonths]
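
If every row of each qualifying streak is wanted rather than only its first month, one possible variation (not part of the approach above, just a sketch) is to switch off the short TRUE runs and expand the result back with inverse.rle. The data frame df_demo and its column names are invented stand-ins for Nino3.4_1974_2000:

```r
# Hypothetical stand-in for Nino3.4_1974_2000, invented purely for illustration
df_demo <- data.frame(
  month = seq(as.Date("1974-01-15"), by = "month", length.out = 12),
  temp  = c(0.6, 0.7, 0.1, 0.5, 0.6, 0.8, 0.9, 0.7, 0.2, 0.6, 0.6, 0.3)
)

runs_all <- rle(df_demo$temp >= 0.5)

# Keep only TRUE runs of length >= 5; shorter TRUE runs are set to FALSE
runs_all$values <- runs_all$values & runs_all$lengths >= 5

# inverse.rle() expands the runs back into one logical value per row
keep <- inverse.rle(runs_all)

el_nino_rows <- df_demo[keep, ]
print(el_nino_rows)
```

In this made-up data only rows 4 to 8 survive: the two-month warm spells at the start and near the end are dropped because they are shorter than 5 months.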