Posts by Yog*_*mar

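How do I count the values common to different groups?

I am trying to build a data frame for drawing a network graph with the igraph package. I have sample data "mydata_data" and I want to create "expected_data".

I can easily count the customers who visited a particular store, but how do I count the customers common to store x1 and store x2, and so on for every other pair of stores?

I have more than 500 stores, so I don't want to create the columns by hand. Sample data for reproducibility follows: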

mydata_data<-data.frame(
  Customer_Name=c("A","A","C","D","D","B"),
  Store_Name=c("x1","x2","x2","x2","x3","x1"))

expected_data<-data.frame(
 Store_Name=c("x1","x2","x3","x1_x2","x2_x3","x1_x3"), 
 Customers_Visited=c(2,3,1,1,1,0))
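One possible way to get the pairwise counts without creating columns by hand is sketched below in base R. It is an illustration under assumptions, not the accepted answer: the names `incidence`, `co_visits`, `store_pairs` and `result` are made up for this example, and the rows come out in a different order than in "expected_data".

```r
# Sample data from the question.
mydata_data <- data.frame(
  Customer_Name = c("A", "A", "C", "D", "D", "B"),
  Store_Name    = c("x1", "x2", "x2", "x2", "x3", "x1"))

# Customer-by-store incidence matrix: 1 if the customer visited the store,
# 0 otherwise (repeat visits are collapsed to 1).
incidence <- (table(mydata_data$Customer_Name, mydata_data$Store_Name) > 0) * 1

# Store-by-store matrix: the diagonal holds customers per store,
# the off-diagonal cells hold customers shared by two stores.
co_visits <- crossprod(incidence)

# Every unordered pair of stores, one pair per row.
stores      <- colnames(co_visits)
store_pairs <- t(combn(stores, 2))

# Single stores first, then the pairs, looked up by name in co_visits.
result <- data.frame(
  Store_Name        = c(stores, paste(store_pairs[, 1], store_pairs[, 2], sep = "_")),
  Customers_Visited = c(unname(diag(co_visits)), co_visits[store_pairs]))

print(result)
```

With 500 stores this produces about 125,000 pair rows, which should still be manageable; the `result` data frame could then be filtered to pairs with a non-zero count before being passed to igraph.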

r igraph dplyr

Score: 6, answers: 1, views: 147

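How do I read all files in an S3 folder/bucket using sparklyr in R?

I have tried the code below, and variations of it, to read all the files in a given S3 folder, but nothing seems to work. Sensitive information/code has been removed from the script below. There are 6 files, each 6.5 GB.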

# Spark connection (config is defined elsewhere; sensitive details removed)
library(sparklyr)
sc <- spark_connect(master = "local", config = config)

rd_1 <- spark_read_csv(sc, name = "Retail_1",
                       path = "s3a://mybucket/xyzabc/Retail_Industry/*/*",
                       header = FALSE, delimiter = "|")

# This is the S3 bucket/folder for the files (one of the file names is Industry_Raw_Data_000):
# s3://mybucket/xyzabc/Retail_Industry/Industry_Raw_Data_000

This is the error I get:

Error: org.apache.spark.sql.AnalysisException: Path does not exist: s3a://mybucket/xyzabc/Retail_Industry/*/*;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:710)

r amazon-s3 rstudio apache-spark sparklyr

Score: 5, answers: 1, views: 1903

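How do I assign a number to devices that appear on the same days?

I have a data frame (df) with device ID and local date columns. I want to assign a user ID to device IDs that always appear together on all local dates. An example is provided below.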

device_id <- c("x1", "x1", "x1", "x2", "x2", "x3", "x3", "x3", "x4", "x4", "x5", 
           "x5", "x5", "x5", "x5", "x5", "x5", "x6", "x6", "x7", "x7", "x8", 
           "x8", "x9", "x9", "x9")

local_date <- c("2019-01-13", "2019-01-14", "2019-01-15", "2019-01-03", "2019-01-04", 
                "2019-01-10", "2019-01-11", "2019-01-12", "2019-01-11", "2019-01-12", 
                "2019-01-03", "2019-01-05", "2019-01-06", "2019-01-07", "2019-01-08", 
                "2019-01-13", "2019-01-23", "2019-01-03", "2019-01-04", "2019-10-23", 
                "2019-10-28", "2019-10-23", "2019-10-28", "2019-01-13", "2019-01-14", 
                "2019-01-15")

df <- data.frame(device_id, local_date)

df$local_date <- as.Date(df$local_date)

This is the data frame I want to create.

expected_df <- data.frame(device_id=c("x1", "x9", "x2", "x6", "x3", "x4", "x5", "x7", "x8"), 
                          user_id=c(1, 1, 2, 2, …
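A rough sketch of one way to do this grouping in base R follows. It assumes that "always appear together" means two devices have exactly the same set of local dates; that reading is an interpretation of the question, not something it states outright. The names `signature` and `result` are illustrative, and the IDs are numbered by first appearance, which agrees with the visible part of the truncated `expected_df` (x1 and x9 share an ID, as do x2 and x6) but cannot be checked against the rest.

```r
# Sample data from the question (device IDs rebuilt compactly with rep()).
device_id <- rep(paste0("x", 1:9), times = c(3, 2, 3, 2, 7, 2, 2, 2, 3))
local_date <- as.Date(c(
  "2019-01-13", "2019-01-14", "2019-01-15", "2019-01-03", "2019-01-04",
  "2019-01-10", "2019-01-11", "2019-01-12", "2019-01-11", "2019-01-12",
  "2019-01-03", "2019-01-05", "2019-01-06", "2019-01-07", "2019-01-08",
  "2019-01-13", "2019-01-23", "2019-01-03", "2019-01-04", "2019-10-23",
  "2019-10-28", "2019-10-23", "2019-10-28", "2019-01-13", "2019-01-14",
  "2019-01-15"))
df <- data.frame(device_id, local_date)

# Build one string per device listing its sorted, de-duplicated dates.
# Assumption: devices belong to the same user only if these strings match exactly.
signature <- vapply(
  split(format(df$local_date), df$device_id),
  function(d) paste(sort(unique(d)), collapse = ","),
  character(1))

# Number the distinct date signatures in order of first appearance.
result <- data.frame(
  device_id = names(signature),
  user_id   = match(signature, unique(signature)))

# Show devices that share a user ID next to each other.
print(result[order(result$user_id), ])
```

Under this assumption x3 and x4 end up with different IDs, because x3 also appears on 2019-01-10; a looser definition of "together" would need a different grouping rule.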

r dataframe dplyr

Score: 2, answers: 1, views: 46

Tag statistics

r ×3

dplyr ×2

amazon-s3 ×1

apache-spark ×1

dataframe ×1

igraph ×1

rstudio ×1

sparklyr ×1