R,dplyr:该函数可根据条件快速构建互补行的列表

use*_*916 6 r dplyr

我有一个约80,000行乘26列的数据集。这些行对应于“ SKU”或机器人构建集的唯一ID。这些列对应于26个不同的机器人零件。一个单元包含一部分对构建整个机器人的贡献。行的比例之和可能不等于1.0,因为构建集不一定总具有构建整个机器人所需的零件的100%。

主要目标是构建一个接受SKU作为输入并输出互补SKU列表的功能。补充行定义为:

  1. 如果给定的行的列非零值,则补码对该列的值必须为零。

目标是找到补充给定SKU的所有可能的SKU集,从而可以构建整个机器人。另外,重要的是要看到weightedPrice这套“科学怪人” SKU的每个机器人的加权收入(“ ”)。很好地展示了weightedPrice每个补充SKU的添加如何变化。

最低工作玩具示例(MWE):

set.seed(1)
a = runif(n=60, min=0, max=0.2)
a[a<0.12] = 0
n = 10
A = as.data.frame(matrix(a,              
                         nrow=n,         
                         ncol=6,         
                         byrow = TRUE))
A$rowTally <- rowSums(A != 0)
A$sku <- seq(from = 1, to = n)
A$totalDollarSales <- runif(n=n, min=1*10^2, max=1*10^6)
A$totalUnitSales <- runif(n=n, min=1*10^2, max=1*10^6)
names(A) <- c("p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "sku", "totalDollarSales", "totalUnitSales")
A <- A[c("sku", "p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "totalDollarSales", "totalUnitSales")]
A$dollarsPerRobot <- A$totalDollarSales/A$totalUnitSales
A

   sku   p1_prop   p2_prop   p3_prop   p4_prop   p5_prop   p6_prop rowTally
1    1 0         0         0         0.1816416 0         0.1796779        2
2    2 0.1889351 0.1321596 0.1258228 0         0         0                3
3    3 0.1374046 0         0.1539683 0         0.1435237 0.1983812        4
4    4 0         0.1554890 0.1869410 0         0.1303348 0                3
5    5 0         0         0         0         0.1739382 0                1
6    6 0         0         0         0         0.1654747 0.1336933        2
7    7 0.1588480 0         0.1447422 0         0.1641893 0.1294120        4
8    8 0.1565866 0         0         0.1578712 0         0                2
9    9 0.1464627 0.1385463 0         0.1722419 0         0                3
10  10 0         0         0         0         0.1324010 0                1
   totalDollarSales totalUnitSales dollarsPerRobot
1         912884.64       339139.0       2.6917711
2         293674.01       839456.4       0.3498383
3         459119.82       346748.8       1.3240703
4         332461.43       333841.6       0.9958659
5         650905.38       476403.6       1.3662898
6         258090.98       892209.1       0.2892718
7         478597.39       864353.0       0.5537059
8         766334.04       390050.5       1.9647044
9          84338.49       777343.0       0.1084959
10        875333.80       960621.9       0.9112157
Run Code Online (Sandbox Code Playgroud)

我正在尝试编写一个函数:

def frankensteinRobot(df, sku, skuRowTally):
    1. find another SKU in dataframe, df.
       - must have non-overlapping parts with existing SKU set
       - rowTally <= skuRowTally (want to find small SKUs to add)
       - must be relatively same number of totalUnitSales
    2. append new SKU to list, and take mininum of totalUnitSales. 
    3. Calculate the weighted, per robot price
       dollarsPerRobotSKU_1*(1/length(SKU_list))+...+dollarsPerRobotSKU_n*(1/length(SKU_list)) 
       and append to the end of a list so we can track profitability with each additional SKU.
    4. repeat steps 1, 2 & 3.
Run Code Online (Sandbox Code Playgroud)

我只能弄清楚如何找到下一个互补的SKU,而不是整个SKU:

A_candidates <- sapply(data.frame(outer(1:nrow(A), 1:nrow(A), Vectorize(check_compliment))), which)
Run Code Online (Sandbox Code Playgroud)

输入示例:

frankensteinRobot(df = A, sku = 5, skuRowTally = 3)
Run Code Online (Sandbox Code Playgroud)

输出示例(请注意,由于我的MWE只有10行,因此示例输出列表仅包含2个元素,但实际上它们会更长。此外,我不确定哪种数据结构合适。也许是1列的数据框是列表?):

[list of SKUs]; [propSum]; [maxLb]; [list of weightedPrice]

[5, 2]; [propSum=0.6208557]; [maxLb=476403.6]; [0.8580641)
[5, 8]; [propSum=0.488396]; [maxLb=390050.5]; [1.665497]
[5, 9]; [propSum=0.6311891]; [maxLb=476403.6]; [0.7373929]
Run Code Online (Sandbox Code Playgroud)

输入示例:

frankensteinRobot(df = A, sku = 6, skuRowTally = 2)
Run Code Online (Sandbox Code Playgroud)

输出示例:

[6, 8]; [propSum=0.6136258]; [maxLb=390050.5]; [1.126988]
Run Code Online (Sandbox Code Playgroud)

Roa*_*247 2

所以我的编码词汇不是很广泛,但我想我会尝试一下我所知道的,并且我设法用一个小数据集(比你的OP示例稍大)来做到这一点。它似乎有效并产生了非常接近所需输出的结果。我尝试用更大的数据集(甚至不接近 80,000 x 26)来证明它,但它很快就停止了。任何比我更有编码经验的人都可能会发现,考虑到数据集的大小,这不是一个好方法。我不建议在更大的数据集上使用它,但考虑到我花了时间在它上面,它暂时有效,并且也许它可以作为替代更快的函数并实现更好的结果的灵感 - 我想我会无论如何发布它。它确实在一步中产生了一条错误消息,我不知道为什么,但实际上仍然工作得很好。由于错误,我无法将其放入函数中,但脚本完成了这项工作。

# (df = A, SKU = 5, skuRowTally =  26)    
a = runif(n=120, min=0, max=0.2)
a[a<0.12] = 0
n = 20
A = as.data.frame(matrix(a,              
                         nrow=n,         
                         ncol=6,         
                         byrow = TRUE))
A$rowTally <- rowSums(A != 0)
A$sku <- seq(from = 1, to = n)
A$totalDollarSales <- runif(n=n, min=1*10^2, max=1*10^6)
A$totalUnitSales <- runif(n=n, min=1*10^2, max=1*10^6)
names(A) <- c("p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "sku", "totalDollarSales", "totalUnitSales")
A <- A[c("sku", "p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "totalDollarSales", "totalUnitSales")]
A$dollarsPerRobot <- A$totalDollarSales/A$totalUnitSales


Output <- unique(rbind(A[which(A$sku == 5),],A[which(A$rowTally <= 26),])) # change df, SKU and skuRowTally here

for(i in 2:7) { #change 2:7 to your columns with parts props
  if(Output[which(Output$sku == 5),][i] !=  0) { # change SKU here
    Output <- Output[which(Output[,i] == 0),]
    Output <- rbind(A[which(A$sku == 5),],Output) # change SKU here
  }
}

Sets <- vector('list', nrow(Output)) 
head_list <- paste(rep("V",nrow(Output)),seq(1:nrow(Output)),sep="")
for(i in 2:nrow(Output)){
  Sets[[i]] <- as.data.frame(t(combn(Output$sku,i)))
  Sets[[i]] <- Sets[[i]][which(Sets[[i]][,1]==5),] # change SKU here
}

for(i in 2:length(Sets)){
  for(j in min(which(seq(1,length(head_list))>i),na.rm = TRUE):max(which(seq(1,length(head_list))>i),na.rm=TRUE)){
    Sets[[i]][,head_list[j]]<-NA
  }
}

Sets <- do.call(rbind,Sets)

Binary.Output <- Output

for(i in 2:7){ #change 2:7 to your columns with parts props
  Binary.Output[,i] <- ifelse(Binary.Output[,i] == 0,0,1)
}

for(i in 1:nrow(Sets)){
  Sets$Good.Combo[i] <-
    ifelse(any(apply(Binary.Output[which(Binary.Output$sku %in% Sets[i,1:nrow(Output)]),], MARGIN = 2, sum)[2:7] > 1),"BAD","GOOD") #change 2:7 to your columns with parts props
}

Sets <- Sets[which(Sets$Good.Combo == "GOOD"),]

for(i in 1:nrow(Sets)){
  Sets$Total.Parts[i] <-
    sum(Binary.Output[which(Binary.Output$sku %in% Sets[i,1:nrow(Binary.Output)]),][2:7]) #change 2:7 to your columns with parts props
  Sets$Complete[i] <- 
    ifelse(Sets$Total.Parts[i]< ncol(Output[,2:7]), "INCOMPLETE", "COMPLETE")
  Sets$MaxLb[i] <-
    min(Output[which(Output$sku %in% Sets[i,1:nrow(Output)]),"totalDollarSales"],na.rm = TRUE)
  Sets$Prop.Sum[i] <-
    sum(Output[which(Output$sku %in% Sets[i,1:nrow(Output)]),2:7])

}

for(i in 1:nrow(Sets)) {
  DPR <- c(1:length(c(t(Sets[i,1:nrow(Output)]))[which(!is.na(c(t(Sets[i,1:nrow(Output)]))))]))
  for (j in 1:length(DPR))  { 
    DPR[j] <- Output[which(Output$sku %in% Sets[i,1:nrow(Output)]),"dollarsPerRobot"][j]*1/length(DPR)
  }
  Sets$weightedPrice[i] <- sum(DPR)
}

print(Sets)
  V1 V2 V3 V4 V5 Good.Combo Total.Parts   Complete    MaxLb  Prop.Sum weightedPrice
1  5  4 NA NA NA       GOOD           4 INCOMPLETE 82485.02 0.6324902     2.6031580
2  5  7 NA NA NA       GOOD           5 INCOMPLETE 82485.02 0.8323490    13.2757958
3  5  9 NA NA NA       GOOD           4 INCOMPLETE 82485.02 0.6152630     1.4411304
4  5 10 NA NA NA       GOOD           4 INCOMPLETE 82485.02 0.6117570     0.5724854
5  5  4  7 NA NA       GOOD           6   COMPLETE 82485.02 0.9854523    10.5475486
6  5  4  9 NA NA       GOOD           5 INCOMPLETE 82485.02 0.7683664     2.6577717
7  5  4 10 NA NA       GOOD           5 INCOMPLETE 82485.02 0.7648603     2.0786750
Run Code Online (Sandbox Code Playgroud)