Given a data frame like this:
dframe <- data.frame(id = c(1,2,3), Google = c(2,1,1), Yahoo = c(0,1,1), Amazon = c(1,1,0))
How can I test whether each column holds a binary (0 and 1) representation, that is, whether no row holds a value greater than 1?
Example:
colname, binary_status
Google, False
Yahoo, True
Amazon, True
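One possible approach, as a minimal base R sketch: check each non-id column with `%in%` and collect the result in a small data frame. The names `value_cols`, `binary_status` and `result` are illustrative and not part of the question, and treating `id` as a key column to skip is an assumption.

```r
dframe <- data.frame(id = c(1,2,3), Google = c(2,1,1), Yahoo = c(0,1,1), Amazon = c(1,1,0))

# Assumption: id is a key column and is excluded from the check
value_cols <- setdiff(names(dframe), "id")

# TRUE only when every value in the column is exactly 0 or 1
binary_status <- vapply(dframe[value_cols],
                        function(x) all(x %in% c(0, 1)),
                        logical(1))

result <- data.frame(colname = value_cols, binary_status = unname(binary_status))
print(result)
```

With this check a column that contains `NA` is reported as non-binary, because `NA %in% c(0, 1)` is `FALSE`.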

Given a data frame like this:
data.frame(text = c("separate1: and: more","another 20: 42"))
How can I split each row at the first : only? Expected output example:
data.frame(text1 = c("separate1","another 20"), text2 = c("and: more","42"))
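A sketch of one way to do this in base R: `sub()` replaces only the first match, so two calls can peel off the part before and the part after the first colon. The variable names `df` and `result` are illustrative.

```r
df <- data.frame(text = c("separate1: and: more", "another 20: 42"))

# Drop the first colon and everything after it
df$text1 <- sub(":.*$", "", df$text)

# Drop everything up to and including the first colon, plus the spaces after it
df$text2 <- sub("^[^:]*:[[:space:]]*", "", df$text)

result <- df[c("text1", "text2")]
print(result)
```

A row without any colon is left unchanged in both columns, which may or may not be what is wanted. If tidyr is available, `separate()` with `extra = "merge"` is another option for splitting at the first separator only.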

In a data frame like this:
df <- data.frame(id = c(1,2,3), text = c("hi my name is E","hi what's your name","name here"))
I want to keep only the rows whose text contains both of the words hi and name. Expected output example:
df <- data.frame(id = c(1,2), text = c("hi my name is E","hi what's your name"))
I tried this, but it does not work correctly:
library(tidyverse)
df %>%
filter(str_detect(text, 'name&hi'))
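The attempt above fails because regular expressions have no AND operator: the pattern 'name&hi' searches for the literal text "name&hi". A minimal base R sketch of the usual workaround is to test each word separately and combine the results with `&`. The names `has_hi`, `has_name` and `result` are illustrative, and matching on word boundaries is an assumption about what "word" should mean here.

```r
df <- data.frame(id = c(1,2,3),
                 text = c("hi my name is E", "hi what's your name", "name here"))

# One test per word; \\b restricts the match to whole words
has_hi   <- grepl("\\bhi\\b", df$text, perl = TRUE)
has_name <- grepl("\\bname\\b", df$text, perl = TRUE)

# Keep rows where both tests are TRUE
result <- df[has_hi & has_name, ]
print(result)
```

In the dplyr pipeline from the question, the same idea would be `filter(str_detect(text, "\\bhi\\b"), str_detect(text, "\\bname\\b"))`, since `filter()` combines several conditions with AND.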

Given a data frame with text:
df = data.frame(id=c(1,2), text = c("My best friend John works and Google", "However he would like to work at Amazon as he likes to use python and stay at Canada"))
Without any preprocessing, how can named entities be extracted (named entity recognition) like this?
Example result:
dfresults = data.frame(id=c(1,2), ner_words = c("John, Google", "Amazon, python, Canada"))
I want to avoid using an external list such as:
list <- c("Google", "Yahoo", "Amazon")

From a data frame like the one below, I want to keep each value as recorded at its first (oldest) timestamp:
dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google",
"Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01",
"2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02",
"2008-11-03")), class = "data.frame", row.names = c(NA, -7L))
The expected output is:
id name date
1 Google 2008-11-01
1 Yahoo 2008-11-01
1 Amazon 2008-11-04
2 Amazon 2008-11-01
2 Google 2008-11-02
How can this be done?
The code below keeps only the first record for each id, rather than the first record of every distinct name within each id:
library(data.table)
setDT(dframe)
date_list_first = dframe[order(date)][!duplicated(id)]
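The attempt deduplicates on id alone, so only one row per id survives. A sketch of one fix, shown here in base R so it runs without extra packages: sort by id and date, then drop rows whose (id, name) pair has already been seen. The names `sorted` and `result` are illustrative. Sorting the dates as text is safe here only because they are in ISO `YYYY-MM-DD` format.

```r
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03")
)

# Oldest records first within each id
sorted <- dframe[order(dframe$id, dframe$date), ]

# Keep the first occurrence of every (id, name) combination
result <- sorted[!duplicated(sorted[c("id", "name")]), ]
rownames(result) <- NULL
print(result)
```

With data.table, as in the question, the equivalent idea would be `unique(dframe[order(id, date)], by = c("id", "name"))`.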

In a column of strings, how can I check each row for stray single letters and remove them?
Example:
I am a text r r o n n r and here
and get this as output:
I am a text and here
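The example keeps "I" and "a" but drops the other single letters, so the exact rule is not fully specified. The sketch below assumes the rule "drop every single-letter token except the English words a and I". The function name `remove_stray_letters` and its `keep` argument are hypothetical and only for illustration.

```r
# Assumption: single-letter tokens are noise unless they are listed in `keep`
remove_stray_letters <- function(x, keep = c("a", "I")) {
  vapply(strsplit(x, "[[:space:]]+"), function(tokens) {
    tokens <- tokens[nzchar(tokens)]               # ignore empty tokens from leading spaces
    stray  <- grepl("^[A-Za-z]$", tokens) & !(tokens %in% keep)
    paste(tokens[!stray], collapse = " ")          # rebuild the sentence with single spaces
  }, character(1))
}

cleaned <- remove_stray_letters("I am a text r r o n n r and here")
print(cleaned)
```

Applied to a data frame column, this would look like `df$text <- remove_stray_letters(df$text)`. Splitting on whitespace rather than on word boundaries leaves tokens such as "what's" intact.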

In an LDA analysis:
library(topicmodels)
# parameters for Gibbs sampling
burnin <- 4000
iter <- 2000
thin <- 500
seed <-list(1969,5,25,102855,2012)
nstart <- 5
best <- TRUE
#Number of topics
k <- 10
data("AssociatedPress", package = "topicmodels")
# Run LDA with Gibbs sampling
ldaOut <- LDA(AssociatedPress[1:20,], k, method = "Gibbs",
              control = list(nstart = nstart, seed = seed, best = best,
                             burnin = burnin, iter = iter, thin = thin))
How can I create a grid search to find the best values for these parameters?