library(rjson)
filenames <- list.files(pattern = "\\.json$") # character vector with one entry per file name; pattern is a regular expression
Now I want to import all of the JSON files into R as a single data frame. How can I do that?
My first attempt was:
myJSON <- lapply(filenames, function(x) fromJSON(file=x)) # should return a list in which each element is one of the JSON files
But the code above takes a while to finish because I have 15,000 files, and I know it will not return a single data frame. Is there a faster way?
Sample JSON file:
{"Reviews": [{"Ratings": {"Service": "4", "Cleanliness": "5"}, "AuthorLocation": "Boston", "Title": "\u201cExcellent Hotel & Location\u201d", "Author": "gowharr32", "ReviewID": "UR126946257", "Content": "We enjoyed the Best Western Pioneer Square....", "Date": "March 29, 2012"}, {"Ratings": {"Overall": "5"},"AuthorLocation": "Chicago",....},{...},....}]}
For anyone looking for a purrr / tidyverse solution here:
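library(tidyverse)  # attaches purrr (map_df) and the %>% pipe
library(jsonlite)   # provides fromJSON() with the flatten argument
path <- "./your_path"
# pattern is a regular expression, so match the ".json" extension explicitly
files <- dir(path, pattern = "\\.json$")
data <- files %>%
  map_df(~ fromJSON(file.path(path, .), flatten = TRUE))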
To run it in parallel:
library(parallel)
cl <- makeCluster(detectCores() - 1)  # leave one core free
json_files <- list.files(path = "your/json/path", pattern = "\\.json$", full.names = TRUE)
json_list <- parLapply(cl, json_files, function(x) rjson::fromJSON(file = x, method = "R"))
stopCluster(cl)
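The parallel call returns a list of parsed files, not the single data frame the question asks for. Here is a minimal sketch of one way to get there, assuming every file has a top-level "Reviews" array like the sample and that jsonlite is installed. The helpers `read_reviews` and `bind_fill` and the two demo files are hypothetical, made up for illustration, and the `source_file` column is an addition that was not in the original post.

```r
library(jsonlite)

# Hypothetical helper: parse one file and return its "Reviews" array as a data frame.
# flatten = TRUE turns the nested "Ratings" object into columns such as Ratings.Service.
read_reviews <- function(path) {
  parsed <- jsonlite::fromJSON(path, flatten = TRUE)
  reviews <- parsed$Reviews
  reviews$source_file <- basename(path)  # remember which file each row came from
  reviews
}

# Hypothetical helper: row-bind data frames whose columns differ, padding gaps with NA.
bind_fill <- function(frames) {
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]
  })
  do.call(rbind, padded)
}

# Two small made-up files shaped like the sample, written to a temporary directory.
demo_dir <- tempfile("json_demo")
dir.create(demo_dir)
writeLines(
  '{"Reviews": [{"Ratings": {"Service": "4", "Cleanliness": "5"}, "Author": "a1"}, {"Ratings": {"Overall": "5"}, "Author": "a2"}]}',
  file.path(demo_dir, "one.json")
)
writeLines(
  '{"Reviews": [{"Ratings": {"Overall": "3"}, "Author": "b1"}]}',
  file.path(demo_dir, "two.json")
)

json_files <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)
reviews_df <- bind_fill(lapply(json_files, read_reviews))
print(reviews_df)
```

With a running cluster `cl`, `parLapply(cl, json_files, read_reviews)` could probably stand in for `lapply` here, since the helper calls `jsonlite::fromJSON` by its full name. Whether that is actually faster for 15,000 small files would need to be measured.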