Here is some dummy data:
father <- c(1, 1, 1, 1, 1)
mother <- c(1, 1, 1, NA, NA)
children <- c(NA, NA, 2, 5, 2)
cousins <- c(NA, 5, 1, 1, 4)
dataset <- data.frame(father, mother, children, cousins)
dataset
father mother children cousins
1 1 NA NA
1 1 NA 5
1 1 2 1
1 NA 5 1
1 NA 2 4
I want to filter this row:
father mother children cousins
1 1 NA NA
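A minimal sketch of one way to keep only that row in a single filter() call. It assumes dplyr is installed and that the row is identified by both children and cousins being NA; the name `wanted` is made up for illustration.

```r
library(dplyr)

dataset <- data.frame(
  father   = c(1, 1, 1, 1, 1),
  mother   = c(1, 1, 1, NA, NA),
  children = c(NA, NA, 2, 5, 2),
  cousins  = c(NA, 5, 1, 1, 4)
)

# Comma-separated conditions inside filter() are combined with AND.
# `wanted` is a hypothetical name used only in this sketch.
wanted <- dataset %>%
  filter(father == 1, mother == 1, is.na(children), is.na(cousins))

print(wanted)
```

Comparisons such as `children == NA` always return NA, which is why is.na() is used for the missing values.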
I can do it like this:
test <- dataset %>%
  filter(father == 1 & mother == 1) %>%
  filter(is.na(children)) %>%
  filter …

My data:
Caterina Guonçallvez braçeyro
Francisco Ro[dr]í[gueJz luveyro
Johao de Miranda calçeteyro
Lucas Martinz Mal-Cuzinhado, braçeyro
Francisquo d[e] Arruda braçeyro
Francisquo de Miranda braçeyro
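- First name and surname
- First name and surname with brackets and a J (brackets produced by OCR)
- First name and surname with a hyphen
- First name and surname with a particle
- First name and surname with a particle and brackets
Expected output: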
Caterina Guonçallvez
Francisco Ro[dr]í[gueJz
Johao de Miranda
Lucas Martinz Mal-Cuzinhado
Francisquo d[e] Arruda
Francisquo de Miranda
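Names start with a capital letter.
The last part of the name is followed by a space (or a comma and a space) and a word starting with a lowercase letter, such as "braçeyro" or "calçeteyro" (the person's occupation).
data <- readLines("clipboard", encoding = "latin1")
What I tried: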
^([a-zA-Zàáâäãå????èéêë??ìíîï??òóôöõøùúûü??ÿý??ñç?šžÀÁÂÄÃÅ?????ÈÉÊËÌÍÎÏ???ÒÓÔÖÕØÙÚÛÜ??ŸÝ??ÑßÇŒÆ?ŠŽ?ð])\w+[A-Z ,.'-]\w+
This gives:
Antonio Guomez
Caterina Guon
Francisco Ro
Johao de
Francisquo d
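Rather than matching the name itself, a possible sketch is to strip the occupation from the end of each line. This assumes the occupation is always the last space-separated word, optionally preceded by a comma; `records` and `person_names` are hypothetical names, and the sample lines are typed in with \u escapes instead of being read from the clipboard.

```r
# Sample lines from the question, written with \u escapes so the
# snippet does not depend on the encoding of the source file.
records <- c(
  "Caterina Guon\u00e7allvez bra\u00e7eyro",
  "Francisco Ro[dr]\u00ed[gueJz luveyro",
  "Johao de Miranda cal\u00e7eteyro",
  "Lucas Martinz Mal-Cuzinhado, bra\u00e7eyro",
  "Francisquo d[e] Arruda bra\u00e7eyro",
  "Francisquo de Miranda bra\u00e7eyro"
)

# Assumption: the occupation is the last space-separated word,
# optionally preceded by a comma. Remove it and keep the rest.
person_names <- sub(",? [^ ]+$", "", records)

print(person_names)
```

Because the pattern never has to list accented letters, brackets, or hyphens, it sidesteps the long character class in the attempt above. It would fail on an occupation made of two words.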

When working with a 4-row data.frame object in dplyr, I want to create a new "id" column containing a prefix string and a sequence of values.
What I expect:
columnA|columnB|columnC|id
data data data id-1
data data data id-2
data data data id-3
data data data id-4
What I tried:
library(dplyr)
y <- x %>%
  mutate(id = "id- " & seq(from = 1, to = 4, by = 1))
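In R, `&` is the logical AND operator and does not concatenate strings. A minimal sketch using paste0() and row_number() follows; the contents of `x` are a made-up stand-in, since the question does not show the real data.

```r
library(dplyr)

# Hypothetical 4-row stand-in for the `x` object in the question.
x <- data.frame(columnA = rep("data", 4),
                columnB = rep("data", 4),
                columnC = rep("data", 4))

# paste0() joins the prefix with the row number of each row.
y <- x %>%
  mutate(id = paste0("id-", row_number()))

print(y)
```

Using row_number() instead of a hard-coded seq(1, 4) keeps the code valid if the number of rows changes.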

I am working on a genealogy:
I adapted Bob Horton's sqldf example from https://www.r-bloggers.com/exploring-recursive-ctes-with-sqldf/
My data:
person father
Guillou Arthur NA
Cleach Marc NA
Guillou Eric Guillou Arthur
Guillou Jacques Guillou Arthur
Cleach Franck Cleach Marc
Cleach Leo Cleach Marc
Cleach Herbet Cleach Leo
Cleach Adele Cleach Herbet
Guillou Jean Guillou Eric
Guillou Alan Guillou Eric
My result is the descendants of "Guillou Arthur" (a top-level person with no father), ordered by level:
name parent_name level
Guillou Arthur NA 1
Guillou Eric Guillou Arthur 2
Guillou Jacques Guillou Arthur 2
Guillou Alan Guillou Eric 3
Guillou Jean Guillou Eric 3
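For comparison, the same table can be sketched without SQL by walking down the tree one generation at a time in base R. This is not the sqldf recursive query from the linked post; `family`, `descendants()` and `tree` are hypothetical names, and the loop assumes the data contains no cycles.

```r
family <- data.frame(
  person = c("Guillou Arthur", "Cleach Marc", "Guillou Eric",
             "Guillou Jacques", "Cleach Franck", "Cleach Leo",
             "Cleach Herbet", "Cleach Adele", "Guillou Jean",
             "Guillou Alan"),
  father = c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
             "Cleach Marc", "Cleach Leo", "Cleach Herbet",
             "Guillou Eric", "Guillou Eric"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect descendants level by level.
descendants <- function(df, root) {
  result <- data.frame(name = root, parent_name = NA_character_,
                       level = 1L, stringsAsFactors = FALSE)
  current <- root
  level <- 1L
  repeat {
    # Children of everyone found in the previous generation.
    kids <- df[!is.na(df$father) & df$father %in% current, ]
    if (nrow(kids) == 0) break
    level <- level + 1L
    kids <- kids[order(kids$person), ]
    result <- rbind(result,
                    data.frame(name = kids$person,
                               parent_name = kids$father,
                               level = level,
                               stringsAsFactors = FALSE))
    current <- kids$person
  }
  result
}

tree <- descendants(family, "Guillou Arthur")
print(tree)
```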
You can build this table with a recursive query using sqldf:
Data:
person …

I have to read a sav file. I use the haven package:
library(haven)
dataset <- read_sav("datafile.sav")
In the console I can see the labels:
dput(head(voyages$portdep))
structure(c(50422, 50299, 50299, 50299, NA, NA), label = "Port of departure", labels = c(Alicante = 10101,
Barcelona = 10102, Bilbao = 10103, Cadiz = 10104, Figuera = 10105,
Gibraltar = 10106, `La Coruña` = 10107, Santander = 10110, Seville = 10111,
`San Lucar` = 10112, Vigo = 10113, `Spain, port unspecified` = 10199,
Lagos = 10202, Lisbon = 10203, Oporto = 10204, `Ilho do …

I want to update the values in a new column.
Here is my data:
people <- c("father", "parents", "father", "children", "girl", "boy", "grand father", "grand mother", "grandparents")
dataset0 <- data.frame(people)
dataset0
And the output:
father
parents
father
children
girl
boy
grand father
grand mother
grandparents
Expected output:
people people_update
father parents
parents parents
father parents
children children
girl children
boy children
grand father grandparents
grand mother grandparents
grandparents grandparents
I tried using replace():
dataset <- dataset0 %>%
mutate(people_update = replace(people, people =="girl", "children")) %>%
mutate(people_update = replace(people, people =="boy", "children"))
dataset
But this does not work: the second mutate() call cancels the first one.
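The second call overwrites the first because both build people_update from the original people column rather than from the partly updated one. A possible sketch that sets all groups in one step with dplyr's case_when(); the grouping of labels is read off the expected output above.

```r
library(dplyr)

people <- c("father", "parents", "father", "children", "girl", "boy",
            "grand father", "grand mother", "grandparents")
dataset0 <- data.frame(people, stringsAsFactors = FALSE)

# Each condition maps a set of labels to its group; the final
# TRUE branch keeps any value that is not listed.
dataset <- dataset0 %>%
  mutate(people_update = case_when(
    people %in% c("father", "parents")                 ~ "parents",
    people %in% c("girl", "boy", "children")           ~ "children",
    people %in% c("grand father", "grand mother",
                  "grandparents")                      ~ "grandparents",
    TRUE                                               ~ people
  ))

print(dataset)
```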

I am trying to use a slider input to select the number of rows to display in a data table.
Here is my app, with a pageLength of 2:
library(shiny)
library(DT)
# Dummy data
dataset <- data.frame(lng = c(-5, -5, -5, -5, -15, -15, -10),
lat = c(8, 8, 8, 8, 33, 33, 20),
year = c(2018, 2018, 2018, 2017, 2017, 2017, 2016),
type = c('A', 'A', 'A', 'A', 'B', 'B', 'A'),
id =c("1", "1", "1", "1", "2", "2", "3"))
ui <- fluidPage(
sidebarLayout(
sidebarPanel(
sliderInput("rows",
"Number of rows",
min = 1,
max = 50,
value = 1)
),
# datable output
DT::dataTableOutput(outputId = …

dplyr: how to repeat each row according to an integer sequence (1:3)
I am working on a register (the example is about Belgium):
country <- c("belg")
country <- as.data.frame(country)
The register contains 3 pages:
library(dplyr)
country2 <- country %>%
slice(rep(1:n(), each=3)) %>%
mutate(pages = row_number())
My output:
country page
belg 1
belg 2
belg 3
Expected result: each page of the register contains three rows (repeat each row according to the integer sequence 1:3)
country page row_id
belg 1 1
belg 1 2
belg 1 3
belg 2 1
belg 2 2
belg 2 3
...
What I tried:
I added this to my dplyr pipeline:
%>%
group_by(pages) %>%
mutate(row_id = seq(1:3)) %>%
ungroup()
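The grouped mutate() above has only one row per page to work with, so a length-3 sequence cannot fit. One possible sketch is to repeat the slice() step a second time before numbering the rows within each page. The column is named `page` here to match the expected output, and `register` is a hypothetical name.

```r
library(dplyr)

country <- data.frame(country = "belg")

register <- country %>%
  slice(rep(seq_len(n()), each = 3)) %>%   # one row per page
  mutate(page = row_number()) %>%
  slice(rep(seq_len(n()), each = 3)) %>%   # three rows per page
  group_by(page) %>%
  mutate(row_id = row_number()) %>%        # 1, 2, 3 within each page
  ungroup()

print(register)
```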

I have this data.frame object:
subject <- c("Nantes", "Nantes", "Nantes", "Brest", "Brest", "Rennes")
page <- c(1, 2, 3, 1, 2, 1)
rows <- c(2, 3, 4, 6, 2, 3)
df <- data.frame(subject, page, rows)
Here is the output:
subject page rows
Nantes 1 2
Nantes 2 3
Nantes 3 4
Brest 1 6
Brest 2 2
Rennes 1 3
The subject Nantes has page 1, page 2, and page 3. Each page has a different number of rows. For Nantes, page 1 has 2 rows.
What I want: duplicate each row according to the sequence 1:nrow.
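A base R sketch of that idea, assuming "1:nrow" means the sequence from 1 to the value in the rows column of each row; `expanded` is a hypothetical name.

```r
subject <- c("Nantes", "Nantes", "Nantes", "Brest", "Brest", "Rennes")
page <- c(1, 2, 3, 1, 2, 1)
rows <- c(2, 3, 4, 6, 2, 3)
df <- data.frame(subject, page, rows)

# Repeat row i of df as many times as df$rows[i] says.
expanded <- df[rep(seq_len(nrow(df)), times = df$rows),
               c("subject", "page")]

# sequence() concatenates 1:2, 1:3, 1:4, ... into the row counter.
expanded$rows <- sequence(df$rows)
rownames(expanded) <- NULL

print(head(expanded, 9))
```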
For example: I need to duplicate Nantes page 1 twice
subject page rows
Nantes 1 1
Nantes 1 2
Nantes 2 1
Nantes 2 2
Nantes 2 3
Nantes 3 1
Nantes 3 2 …

I want to display my marker labels depending on the zoom level. Based on https://rstudio.github.io/leaflet/shiny.html, I tried using "input$MAPID_zoom". In my example, the labels stored in location_name should be displayed when the zoom level (mapscale) is below 6.
What I tried:
library(shiny)
library(leaflet)
# my data
df <- data.frame(
location_name = c('S1', 'S2'),
lng = c(-1.554136, -2.10401),
lat = c(47.218637, 47.218637),
stringsAsFactors = FALSE)
# UI
ui <- shinyUI(fluidPage(
leafletOutput('map')
))
# server
server <- shinyServer(function(input, output, session) {
mapscale <- observe({
input$map_zoom # get zoom level
})
output$map <- renderLeaflet({
leaflet() %>%
addTiles() %>%
addMarkers(data=df, lng = ~lng, lat = ~lat,
label =~if(mapscale<6, location_name))
})
})
shinyApp(ui = …

I have a dataset:
var1 <- c(333, 213, 456)
var2 <- c(3, 10, 500)
var3 <- c(356, 813, 856)
var4 <- c("aaa", "bbb", "ccc")
var5 <- c(589, 111, 989)
dataset <- data.frame(var1, var2, var3, var4, var5)
I want to keep columns based on a range of values: every value in a kept column must be between 99 and 1000.
What I tried:
library(dplyr)
dataset2 <- dataset %>%
  select_if(~ . > 99 & . < 1000)
What I want:
dataset2: var1, var3, var4
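The attempt fails because select_if() needs a single TRUE or FALSE per column, while the comparison returns one value per element. A possible sketch wraps the test in all() and guards it with is.numeric(). Note that with the sample data this keeps var1, var3 and var5: the character column var4 is dropped by the is.numeric() guard, which is an assumption of this sketch and differs from the list above.

```r
library(dplyr)

dataset <- data.frame(
  var1 = c(333, 213, 456),
  var2 = c(3, 10, 500),
  var3 = c(356, 813, 856),
  var4 = c("aaa", "bbb", "ccc"),
  var5 = c(589, 111, 989),
  stringsAsFactors = FALSE
)

# Keep a column only if it is numeric and every value lies
# strictly between 99 and 1000.
dataset2 <- dataset %>%
  select_if(~ is.numeric(.) && all(. > 99 & . < 1000))

print(names(dataset2))
```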

My data:
data <- data.frame(column1 = c("A","B","C","D"), column2 = c(4, NA, NA, 1))
My pipe:
library(dplyr)
data2 <- data %>%
filter (grepl("A|B|D", column1))
My question: how can I (simply) continue my pipe to add a row containing the total of column2 (total = 5)?
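One possible sketch is to append a one-row summary with bind_rows(). The label "total" in column1 is an assumption, since the question does not say what the new row should contain there.

```r
library(dplyr)

data <- data.frame(column1 = c("A", "B", "C", "D"),
                   column2 = c(4, NA, NA, 1),
                   stringsAsFactors = FALSE)

# summarise(., ...) is evaluated on the filtered data; its single
# row is appended below the filtered rows. na.rm = TRUE skips the NA.
data2 <- data %>%
  filter(grepl("A|B|D", column1)) %>%
  bind_rows(summarise(., column1 = "total",
                      column2 = sum(column2, na.rm = TRUE)))

print(data2)
```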

How can I recode a logical as a factor (or string) in a data.frame?
data <- data.frame(year = c(2015, 2015, 2016, 2016),
column2 = c(4, NA, 9, 1))
library (dplyr)
missing_data <- data %>%
count(year, complete.cases(column2))
names(missing_data)[2] = "col2"
My result:
year col2 n
(dbl) (lgl) (int)
2015 FALSE 1
2015 TRUE 1
2016 TRUE 2
What I want:
year col2 n
(dbl) (int)
2015 unknown 1
2015 known 1
2016 known 2
What I tried (in the dplyr chain):
mutate(col2 = as.factor(col2))
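as.factor() only turns TRUE and FALSE into the labels "TRUE" and "FALSE". A minimal sketch that maps them to "known" and "unknown" with dplyr's if_else(); it names the count column directly inside count() and uses !is.na() in place of complete.cases(), which gives the same result for a single column.

```r
library(dplyr)

data <- data.frame(year = c(2015, 2015, 2016, 2016),
                   column2 = c(4, NA, 9, 1))

missing_data <- data %>%
  count(year, col2 = !is.na(column2)) %>%             # TRUE = value present
  mutate(col2 = if_else(col2, "known", "unknown"))    # logical -> string

print(missing_data)
```

Wrapping the if_else() call in factor() would give a factor instead of a character column.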