我有一个约80,000行乘26列的数据集。这些行对应于“ SKU”或机器人构建集的唯一ID。这些列对应于26个不同的机器人零件。一个单元包含一部分对构建整个机器人的贡献。行的比例之和可能不等于1.0,因为构建集不一定总具有构建整个机器人所需的零件的100%。
主要目标是构建一个接受SKU作为输入并输出互补SKU列表的功能。补充行定义为:
目标是找到补充给定SKU的所有可能的SKU集,从而可以构建整个机器人。另外,重要的是要看到weightedPrice这套“科学怪人” SKU的每个机器人的加权收入(“ ”)。很好地展示了weightedPrice每个补充SKU的添加如何变化。
最低工作玩具示例(MWE):
set.seed(1)
a = runif(n=60, min=0, max=0.2)
a[a<0.12] = 0
n = 10
A = as.data.frame(matrix(a,
nrow=n,
ncol=6,
byrow = TRUE))
A$rowTally <- rowSums(A != 0)
A$sku <- seq(from = 1, to = n)
A$totalDollarSales <- runif(n=n, min=1*10^2, max=1*10^6)
A$totalUnitSales <- runif(n=n, min=1*10^2, max=1*10^6)
names(A) <- c("p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "sku", "totalDollarSales", "totalUnitSales")
A <- A[c("sku", "p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "totalDollarSales", "totalUnitSales")]
A$dollarsPerRobot <- A$totalDollarSales/A$totalUnitSales
A
sku p1_prop p2_prop p3_prop p4_prop p5_prop p6_prop rowTally
1 1 0 0 0 0.1816416 0 0.1796779 2
2 2 0.1889351 0.1321596 0.1258228 0 0 0 3
3 3 0.1374046 0 0.1539683 0 0.1435237 0.1983812 4
4 4 0 0.1554890 0.1869410 0 0.1303348 0 3
5 5 0 0 0 0 0.1739382 0 1
6 6 0 0 0 0 0.1654747 0.1336933 2
7 7 0.1588480 0 0.1447422 0 0.1641893 0.1294120 4
8 8 0.1565866 0 0 0.1578712 0 0 2
9 9 0.1464627 0.1385463 0 0.1722419 0 0 3
10 10 0 0 0 0 0.1324010 0 1
totalDollarSales totalUnitSales dollarsPerRobot
1 912884.64 339139.0 2.6917711
2 293674.01 839456.4 0.3498383
3 459119.82 346748.8 1.3240703
4 332461.43 333841.6 0.9958659
5 650905.38 476403.6 1.3662898
6 258090.98 892209.1 0.2892718
7 478597.39 864353.0 0.5537059
8 766334.04 390050.5 1.9647044
9 84338.49 777343.0 0.1084959
10 875333.80 960621.9 0.9112157
Run Code Online (Sandbox Code Playgroud)
我正在尝试编写一个函数:
def frankensteinRobot(df, sku, skuRowTally):
1. find another SKU in dataframe, df.
- must have non-overlapping parts with existing SKU set
- rowTally <= skuRowTally (want to find small SKUs to add)
- must be relatively same number of totalUnitSales
2. append new SKU to list, and take mininum of totalUnitSales.
3. Calculate the weighted, per robot price
dollarsPerRobotSKU_1*(1/length(SKU_list))+...+dollarsPerRobotSKU_n*(1/length(SKU_list))
and append to the end of a list so we can track profitability with each additional SKU.
4. repeat steps 1, 2 & 3.
Run Code Online (Sandbox Code Playgroud)
我只能弄清楚如何找到下一个互补的SKU,而不是整个SKU:
A_candidates <- sapply(data.frame(outer(1:nrow(A), 1:nrow(A), Vectorize(check_compliment))), which)
Run Code Online (Sandbox Code Playgroud)
输入示例:
frankensteinRobot(df = A, sku = 5, skuRowTally = 3)
Run Code Online (Sandbox Code Playgroud)
输出示例(请注意,由于我的MWE只有10行,因此示例输出列表仅包含2个元素,但实际上它们会更长。此外,我不确定哪种数据结构合适。也许是1列的数据框是列表?):
[list of SKUs]; [propSum]; [maxLb]; [list of weightedPrice]
[5, 2]; [propSum=0.6208557]; [maxLb=476403.6]; [0.8580641)
[5, 8]; [propSum=0.488396]; [maxLb=390050.5]; [1.665497]
[5, 9]; [propSum=0.6311891]; [maxLb=476403.6]; [0.7373929]
Run Code Online (Sandbox Code Playgroud)
输入示例:
frankensteinRobot(df = A, sku = 6, skuRowTally = 2)
Run Code Online (Sandbox Code Playgroud)
输出示例:
[6, 8]; [propSum=0.6136258]; [maxLb=390050.5]; [1.126988]
Run Code Online (Sandbox Code Playgroud)
所以我的编码词汇不是很广泛,但我想我会尝试一下我所知道的,并且我设法用一个小数据集(比你的OP示例稍大)来做到这一点。它似乎有效并产生了非常接近所需输出的结果。我尝试用更大的数据集(甚至不接近 80,000 x 26)来证明它,但它很快就停止了。任何比我更有编码经验的人都可能会发现,考虑到数据集的大小,这不是一个好方法。我不建议在更大的数据集上使用它,但考虑到我花了时间在它上面,它暂时有效,并且也许它可以作为替代更快的函数并实现更好的结果的灵感 - 我想我会无论如何发布它。它确实在一步中产生了一条错误消息,我不知道为什么,但实际上仍然工作得很好。由于错误,我无法将其放入函数中,但脚本完成了这项工作。
# (df = A, SKU = 5, skuRowTally = 26)
a = runif(n=120, min=0, max=0.2)
a[a<0.12] = 0
n = 20
A = as.data.frame(matrix(a,
nrow=n,
ncol=6,
byrow = TRUE))
A$rowTally <- rowSums(A != 0)
A$sku <- seq(from = 1, to = n)
A$totalDollarSales <- runif(n=n, min=1*10^2, max=1*10^6)
A$totalUnitSales <- runif(n=n, min=1*10^2, max=1*10^6)
names(A) <- c("p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "sku", "totalDollarSales", "totalUnitSales")
A <- A[c("sku", "p1_prop", "p2_prop", "p3_prop", "p4_prop", "p5_prop", "p6_prop", "rowTally", "totalDollarSales", "totalUnitSales")]
A$dollarsPerRobot <- A$totalDollarSales/A$totalUnitSales
Output <- unique(rbind(A[which(A$sku == 5),],A[which(A$rowTally <= 26),])) # change df, SKU and skuRowTally here
for(i in 2:7) { #change 2:7 to your columns with parts props
if(Output[which(Output$sku == 5),][i] != 0) { # change SKU here
Output <- Output[which(Output[,i] == 0),]
Output <- rbind(A[which(A$sku == 5),],Output) # change SKU here
}
}
Sets <- vector('list', nrow(Output))
head_list <- paste(rep("V",nrow(Output)),seq(1:nrow(Output)),sep="")
for(i in 2:nrow(Output)){
Sets[[i]] <- as.data.frame(t(combn(Output$sku,i)))
Sets[[i]] <- Sets[[i]][which(Sets[[i]][,1]==5),] # change SKU here
}
for(i in 2:length(Sets)){
for(j in min(which(seq(1,length(head_list))>i),na.rm = TRUE):max(which(seq(1,length(head_list))>i),na.rm=TRUE)){
Sets[[i]][,head_list[j]]<-NA
}
}
Sets <- do.call(rbind,Sets)
Binary.Output <- Output
for(i in 2:7){ #change 2:7 to your columns with parts props
Binary.Output[,i] <- ifelse(Binary.Output[,i] == 0,0,1)
}
for(i in 1:nrow(Sets)){
Sets$Good.Combo[i] <-
ifelse(any(apply(Binary.Output[which(Binary.Output$sku %in% Sets[i,1:nrow(Output)]),], MARGIN = 2, sum)[2:7] > 1),"BAD","GOOD") #change 2:7 to your columns with parts props
}
Sets <- Sets[which(Sets$Good.Combo == "GOOD"),]
for(i in 1:nrow(Sets)){
Sets$Total.Parts[i] <-
sum(Binary.Output[which(Binary.Output$sku %in% Sets[i,1:nrow(Binary.Output)]),][2:7]) #change 2:7 to your columns with parts props
Sets$Complete[i] <-
ifelse(Sets$Total.Parts[i]< ncol(Output[,2:7]), "INCOMPLETE", "COMPLETE")
Sets$MaxLb[i] <-
min(Output[which(Output$sku %in% Sets[i,1:nrow(Output)]),"totalDollarSales"],na.rm = TRUE)
Sets$Prop.Sum[i] <-
sum(Output[which(Output$sku %in% Sets[i,1:nrow(Output)]),2:7])
}
for(i in 1:nrow(Sets)) {
DPR <- c(1:length(c(t(Sets[i,1:nrow(Output)]))[which(!is.na(c(t(Sets[i,1:nrow(Output)]))))]))
for (j in 1:length(DPR)) {
DPR[j] <- Output[which(Output$sku %in% Sets[i,1:nrow(Output)]),"dollarsPerRobot"][j]*1/length(DPR)
}
Sets$weightedPrice[i] <- sum(DPR)
}
print(Sets)
V1 V2 V3 V4 V5 Good.Combo Total.Parts Complete MaxLb Prop.Sum weightedPrice
1 5 4 NA NA NA GOOD 4 INCOMPLETE 82485.02 0.6324902 2.6031580
2 5 7 NA NA NA GOOD 5 INCOMPLETE 82485.02 0.8323490 13.2757958
3 5 9 NA NA NA GOOD 4 INCOMPLETE 82485.02 0.6152630 1.4411304
4 5 10 NA NA NA GOOD 4 INCOMPLETE 82485.02 0.6117570 0.5724854
5 5 4 7 NA NA GOOD 6 COMPLETE 82485.02 0.9854523 10.5475486
6 5 4 9 NA NA GOOD 5 INCOMPLETE 82485.02 0.7683664 2.6577717
7 5 4 10 NA NA GOOD 5 INCOMPLETE 82485.02 0.7648603 2.0786750
Run Code Online (Sandbox Code Playgroud)