Given a vector:
c("kuku", "pupu", "lilu","","ff","rrrr", "", "rrr")
How can I split it by "" to get these 3 vectors:
c("kuku", "pupu", "lilu")
c("ff","rrrr")
c("rrr")
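Each "" acts as a separator, so every run of consecutive non-empty elements forms one group. In R, a common idiom for this is split(x[x != ""], cumsum(x == "")[x != ""]), which numbers the groups with a running count of separators. The Python sketch below only illustrates the same grouping logic on a plain list; split_on_empty is a hypothetical helper name, not part of any library.

```python
from itertools import groupby


def split_on_empty(items):
    """Split a list of strings into runs separated by empty strings ("").

    Leading, trailing, or consecutive "" never produce empty groups.
    """
    # groupby yields runs of elements sharing the same key; the key is True
    # for non-empty strings, so runs with key False are the separators.
    return [
        list(run)
        for non_empty, run in groupby(items, key=lambda s: s != "")
        if non_empty
    ]


x = ["kuku", "pupu", "lilu", "", "ff", "rrrr", "", "rrr"]
print(split_on_empty(x))  # [['kuku', 'pupu', 'lilu'], ['ff', 'rrrr'], ['rrr']]
```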

This is my data_frame object:
structure(list(dt = structure(c(17702, 17702, 17702, 17702, 17703,
17703, 17704, 17705, 17705, 17706, 17706, 17706, 17706), class = "Date"),
uuid_lev = c(4L, 5L, 8L, 10L, 6L, 8L, 8L, 1L, 7L, 2L, 3L,
7L, 9L), mean_call_duration = c(57.8043647700702, 222.806,
132.73, 74.976645858206, 204.53, 138.8385, 138.21, 113.478,
162.656, 127.714, 145.507732189148, 168.676, 73.928), median_call_duration = c(29,
78, 25.6666666666667, 29, 36, 23.875, 23.5, 25, 44, 14, 30,
46, 16), max_call_duration = c(2117, 4589, 5137, 4470, 3966,
5137, 5137, 3249, 5137, 7201, 7201, 5137, …

Given a data frame:
df <- structure(list(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4), b = c(34,
343, 54, 11, 55, 62, 59, -9, 0, -0.5)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
I want to get the last N observations/rows from each group:
df %>%
dplyr::group_by(a) %>%
dplyr::last(2)
gives me the wrong result.
I expect it to be:
a b
1 343
1 54
2 55
2 62
3 59
3 -9
4 0
4 -0.5
Please advise what is wrong here.
The error I get is:
Error in order(order_by)[[n]] : subscript out of bounds
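dplyr::last() returns a single element of a vector, not the last rows of each group; when the data frame is piped in, the 2 most likely lands in the order_by argument, which is consistent with the subscript error above. In dplyr 1.0.0 and later, slice_tail(n = 2) applied to the grouped data frame returns the last two rows per group; on older versions, filter(row_number() > n() - 2) after group_by(a) does the same. The Python sketch below only illustrates that grouping logic on (a, b) tuples; last_n_per_group is a hypothetical helper, and it assumes groups should appear in order of first occurrence.

```python
def last_n_per_group(rows, n, key=lambda row: row[0]):
    """Return the last n rows of each group (assumes n >= 1).

    Rows keep their original order within each group; groups are emitted in
    order of first appearance (dicts preserve insertion order in Python 3.7+).
    """
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    # members[-n:] keeps at most the last n rows; smaller groups are kept whole.
    return [row for members in groups.values() for row in members[-n:]]


a = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
b = [34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5]
for row in last_n_per_group(list(zip(a, b)), n=2):
    print(*row)
```

The printed rows match the expected table in the question.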

I have many strings like this:
2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0
I want to extract the substring that comes right after the last "/" and ends at the "_":
556662
I have already figured out how to extract /01/01/07/556662
by using the following regex: (\/)(.*?)(?=\_)
Please advise how I can capture the correct group.
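The pattern in the question starts matching at the first "/", so the lazy group still spans every intermediate directory. Anchoring on the last "/" avoids that: a greedy .*/ runs up to the last slash, ([^/_]*) captures the characters before the next "_", and _[^/]*$ requires that nothing after that point contains another slash. In R, the same pattern should work with sub(".*/([^/_]*)_[^/]*$", "\\1", x). A minimal Python sketch of that regex, where extract_id is a hypothetical name:

```python
import re

# Greedy ".*/" backtracks to the LAST slash; ([^/_]*) captures the text up to
# the following underscore; "_[^/]*$" ensures no further slash comes after it.
ID_PATTERN = re.compile(r".*/([^/_]*)_[^/]*$")


def extract_id(path):
    """Return the text between the last '/' and the next '_', or None."""
    match = ID_PATTERN.match(path)
    return match.group(1) if match else None


s = "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"
print(extract_id(s))  # 556662
```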

Given the default iris dataframe, how can I configure the purrr::map_dfr() function to run on each row of the dataframe and execute a function foo?
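In R, one common pattern is to map over row indices so that foo receives a one-row data frame each time and map_dfr() row-binds the results, e.g. purrr::map_dfr(seq_len(nrow(iris)), ~ foo(iris[.x, ])), assuming foo returns a data frame. The Python sketch below only illustrates the same "apply per row, then bind the rows" idea on a list of dicts; foo and map_rows_bind are hypothetical stand-ins, and only the first two iris rows are used as sample data.

```python
def foo(row):
    """Hypothetical per-row function: returns a list of output rows (dicts)."""
    return [{"Species": row["Species"], "long_sepal": row["Sepal.Length"] > 5.0}]


def map_rows_bind(rows, f):
    """Apply f to every row and concatenate the returned rows (a row-bind)."""
    out = []
    for row in rows:
        out.extend(f(row))
    return out


iris_head = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 4.9, "Sepal.Width": 3.0, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "setosa"},
]
for out_row in map_rows_bind(iris_head, foo):
    print(out_row)
```

Because foo may return zero, one, or several rows, the bound result can have a different number of rows than the input, which mirrors how map_dfr() row-binds whatever each call returns.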

Here is one row of my df; please keep in mind that value is always a large JSON:
structure(list(Key = "2019/01/04/14/kuku@pupu.com_2ed026cb-8e9f-4392-9cc4-9f580b9d3aab_1345a5a4-3d5b-48a0-a678-67ed09a6f487_2019-01-04-14-52-43-537",
LastModified = "2019-01-04T14:52:44.000Z", ETag = "\"1c6269ab8b7baa85f0d2567de417f0d0\"",
Size = 35280, Owner = "e7c0d260939d15d18866126da3376642e2d4497f18ed762b608ed2307778bdf1",
StorageClass = "STANDARD", Bucket = "comp-kukupupu-streamed-data",
user_name = "kuku@pupu.com", value = list(---here goes a large json),
obs_id = 1137L), row.names = 1L, class = "data.frame")
My function is:
extract_scroll_data <- function(df) {
  tryCatch({
    j <- fromJSON(unlist(df$value))
    if (is_empty(fromJSON(j$sensorsData)) | is_empty(fromJSON(j$eventList))) {
      return(tibble())
    } else {
      return(set_names(as_tibble(fromJSON(j$eventList, bigint_as_char = TRUE),
                                 .name_repair = "unique"),
                       nm …

After sudo pip3 install ray, I created a function foo() defined with the ray decorator:
import ray
ray.init()

@ray.remote
def foo(x):
    print(x)
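I want to be able to use foo in both parallel and regular mode (ignoring the decorator).
If I try to call foo without .remote( blabla_variable ), it returns an error.
Please advise how to "ignore" the decorator when I don't need it.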
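Calling a function decorated with @ray.remote directly raises an error, because the decorator replaces it with a remote-function object that is meant to be invoked through .remote(). One way around this, sketched below under the assumption that keeping two names is acceptable, is to skip the decorator syntax and apply ray.remote as an ordinary function call, which leaves the original foo directly callable. The name foo_remote is hypothetical, and the import is guarded so the serial path also runs without Ray installed.

```python
def foo(x):
    """Plain function: call it directly for regular (serial) use."""
    print(x)
    return x


try:
    import ray

    # ray.remote can be applied as a function, not only as a decorator, so the
    # parallel version is a separate object and foo itself stays unchanged.
    foo_remote = ray.remote(foo)
except ImportError:
    foo_remote = None  # Ray is not installed: only the serial path is available


foo(1)  # regular mode: a direct call, no .remote()
```

Parallel mode would then look like ray.init() followed by ray.get([foo_remote.remote(i) for i in range(3)]), while existing serial code keeps calling foo(i).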

When I run my NER model, I get:
UserWarning: [W031] Model 'en_model' (0.0.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.2)
Please advise how I can fix it.
Python 3.7.9, spaCy 2.3.2, Ubuntu 18.04.
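W031 is a warning rather than an error: as its text indicates, the installed spaCy (2.3.2) does not satisfy the spaCy version recorded for the model (v2.2). Typical options are installing a spaCy 2.2.x release that matches the model, or retraining the model with the installed version; for pip-installed model packages, python -m spacy validate reports which models are compatible. The sketch below only reads the spacy_version constraint from a model's meta.json text so it can be compared with spacy.__version__; the meta contents and the en_model/meta.json path are hypothetical examples, not the actual files.

```python
import json


def declared_spacy_range(meta_text):
    """Return the spacy_version constraint from a model's meta.json text."""
    meta = json.loads(meta_text)
    # Some model metas may omit the field; return None in that case.
    return meta.get("spacy_version")


# Hypothetical meta.json contents for illustration only. In practice, read the
# file from the model directory, e.g. pathlib.Path("en_model/meta.json").read_text().
example_meta = '{"lang": "en", "version": "0.0.0", "spacy_version": ">=2.2.0,<2.3.0"}'
print(declared_spacy_range(example_meta))  # >=2.2.0,<2.3.0
```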