Vec*_* JX 5 r dataframe rmysql dplyr
I have the following data frame in R.
ID Amount Date
IK-1 100 2020-01-01
IK-2 110 2020-01-02
IK-3 120 2020-01-03
IK-4 109 2020-01-03
IK-5 104 2020-01-03
I am using the ID column in the following code to fetch some details from MySQL.
library(RMySQL)
conn<- connection
query<-paste0("SELECT c.ID,e.Parameters, d.status
FROM Table1 c
left outer join Table2 d ON d.seq_id=c.ID
LEFT outer JOIN Table3 e ON e.role_id=d.role
where c.ID IN (", paste(shQuote(dataframe$ID, type = "sh"),
collapse = ', '),")
and e.Parameters in
('Section1',
'Section2','Section3',
'Section4');")
res1 <- dbGetQuery(conn,query)
res2<-res1[res1$Parameters=="Section1",4:5]
colnames(res2)[colnames(res2)=="status"] <- "Section1_Status"
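The code above works fine when I pass about 1,000 IDs, but it raises an R termination error when I pass 10,000 or more IDs at once.
How can I create a loop and pass the IDs in batches to get the final output for all 10,000 IDs?
Error message: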
Warning message:
In dbFetch(rs, n = n, ...) : error while fetching rows
小智 5
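Write the data frame of IDs into a temporary table before running your SQL query, then inner join against that table to restrict the result to the IDs you need. That way you avoid the loop. All you have to do is call dbWriteTable with the argument temporary = TRUE.
Example: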
library(DBI)
library(RMySQL)
con <- dbConnect(RMySQL::MySQL(), user='user',
password='password', dbname='database_name', host='host')
# write the IDs into the database as a temporary table
dbWriteTable(conn = con, name = "id_frame", value = dataframe, temporary = TRUE)
res1 <- dbGetQuery(conn = con, "SELECT c.ID,e.Parameters, d.status
FROM Table1 c
left outer join Table2 d ON d.seq_id=c.ID
LEFT outer JOIN Table3 e ON e.role_id=d.role
Inner join id_frame idf on idf.ID = c.ID
and e.Parameters in
('Section1',
'Section2','Section3',
'Section4');")
This should improve the performance of your code, and you no longer need to loop in R with a WHERE clause. Let me know if it does not work properly.
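The temporary-table idea can be tried out without a MySQL server. The following is only a sketch under stated assumptions: it uses an in-memory SQLite database through the RSQLite package as a stand-in for MySQL (RSQLite is not part of the original answer), and the two-column `Table1` and its `status` values are hypothetical toy data. RSQLite's `dbWriteTable` accepts `temporary = TRUE`; whether your MySQL driver does should be checked against its documentation.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the MySQL server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical miniature version of Table1
dbWriteTable(con, "Table1",
             data.frame(ID = c("IK-1", "IK-2", "IK-3"),
                        status = c("open", "closed", "open"),
                        stringsAsFactors = FALSE))

# The IDs to look up go into a temporary table instead of an IN (...) list
ids <- data.frame(ID = c("IK-1", "IK-3"), stringsAsFactors = FALSE)
dbWriteTable(con, "id_frame", ids, temporary = TRUE)

# The inner join keeps only rows whose ID appears in id_frame
res <- dbGetQuery(con, "
  SELECT c.ID, c.status
  FROM Table1 c
  INNER JOIN id_frame idf ON idf.ID = c.ID
  ORDER BY c.ID")

dbDisconnect(con)
```

Because the IDs travel as table rows rather than as query text, the SQL statement stays the same size no matter how many IDs are looked up.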
# Load Packages
library(dplyr) # only needed to create the initial dataframe
library(RMySQL)
# create the initial dataframe
df <- tribble(
~ID, ~Amount, ~Date
, "IK-1" , 100 , "2020-01-01"
, "IK-2" , 110 , "2020-01-02"
, "IK-3" , 120 , "2020-01-03"
, "IK-4" , 109 , "2020-01-03"
, "IK-5" , 104 , "2020-01-03"
) # dates are quoted; unquoted, 2020-01-01 would be evaluated as a subtraction
# first helper function
createIDBatchVector <- function(x, batchSize){
paste0(
"'"
, sapply(
split(x, ceiling(seq_along(x) / batchSize))
, paste
, collapse = "','"
)
, "'"
)
}
# second helper function
createQueries <- function(IDbatches){
paste0("
SELECT c.ID,e.Parameters, d.status
FROM Table1 c
LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
LEFT OUTER JOIN Table3 e ON e.role_id = d.role
WHERE c.ID IN (", IDbatches,")
AND e.Parameters in ('Section1','Section2','Section3','Section4');
")
}
# ------------------------------------------------------------------
# and now the actual script
# first we create a vector that contains one batch per element
IDbatches <- createIDBatchVector(df$ID, 2)
# It looks like this:
# [1] "'IK-1','IK-2'" "'IK-3','IK-4'" "'IK-5'"
# now we create a vector of SQL-queries out of that
queries <- createQueries(IDbatches)
cat(queries) # use cat to show what they look like
# it looks like this:
# SELECT c.ID,e.Parameters, d.status
# FROM Table1 c
# LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
# LEFT OUTER JOIN Table3 e ON e.role_id = d.role
# WHERE c.ID IN ('IK-1','IK-2')
# AND e.Parameters in ('Section1','Section2','Section3','Section4');
#
# SELECT c.ID,e.Parameters, d.status
# FROM Table1 c
# LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
# LEFT OUTER JOIN Table3 e ON e.role_id = d.role
# WHERE c.ID IN ('IK-3','IK-4')
# AND e.Parameters in ('Section1','Section2','Section3','Section4');
#
# SELECT c.ID,e.Parameters, d.status
# FROM Table1 c
# LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
# LEFT OUTER JOIN Table3 e ON e.role_id = d.role
# WHERE c.ID IN ('IK-5')
# AND e.Parameters in ('Section1','Section2','Section3','Section4');
# and now the loop
df_final <- data.frame() # initialize a dataframe
conn <- connection # open a connection
for (query in queries){ # iterate over the queries
df_final <- rbind(df_final, dbGetQuery(conn,query))
}
dbDisconnect(conn) # close the connection
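The batching loop can also be written so that the results are collected in a list and bound together once at the end. This is a minimal sketch, not the answer's own code: `split_into_batches`, `fetch_in_batches`, and `fake_fetch` are hypothetical names, and `fake_fetch` filters a local data frame in place of a real `dbGetQuery` call so that the example runs without a database.

```r
# Split a vector of IDs into consecutive batches of at most batch_size elements
split_into_batches <- function(ids, batch_size) {
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Run fetch_fun once per batch and stack the results into one data frame.
# fetch_fun is a hypothetical callback: it takes a character vector of IDs
# and returns a data frame (in real use it would build the query and call
# dbGetQuery()).
fetch_in_batches <- function(ids, batch_size, fetch_fun) {
  results <- lapply(split_into_batches(ids, batch_size), fetch_fun)
  do.call(rbind, results)
}

# Stand-in for the database: a local lookup table instead of a MySQL query
lookup <- data.frame(ID = paste0("IK-", 1:5),
                     status = c("a", "b", "c", "d", "e"),
                     stringsAsFactors = FALSE)
fake_fetch <- function(batch) lookup[lookup$ID %in% batch, ]

batches <- split_into_batches(lookup$ID, 2)
result  <- fetch_in_batches(lookup$ID, 2, fake_fetch)
```

Binding once with `do.call(rbind, ...)` avoids copying a growing data frame on every iteration, which starts to matter when 10,000 IDs are split into many batches.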