Lul*_*ulY 10 string combinations r character overlap
我有一个只有三个单词的字符串,如下所示:
first_string <- c("self", "funny", "nymph")
Run Code Online (Sandbox Code Playgroud)
正如你所看到的,这个向量的单词可以全部组合成一个单词,因为字母中有一些重叠,即我们得到 self fun un ny mph。我们称其为单词列车。
此外,我还有另一个包含很多单词的向量。设第二个向量为:
second_string <- c("house", "garden", "duck", "evil", "fluff")
Run Code Online (Sandbox Code Playgroud)
我想知道第二个字符串的哪些单词可以添加到单词序列中。在这种情况下,这是houseand fluff(可以添加在 self fun ny mphhouse的末尾,并且可以放在and之间)。所以这里的预期输出是:fluffselffunny
expected <- data.frame(word= c("house", "fluff"), word_train= c("selfunnymphouse", "selfluffunnymph"))
Run Code Online (Sandbox Code Playgroud)
重叠可以是任意长度,即自我和滑稽仅与一个角色重叠,但滑稽和若虫在两个角色中重叠。
编辑
新词可以改变第一个词串的词序。例如,如果第二个向量包含单词,hugs我们可以将单词 train nymp h ug s el funny放在和nymph之前。selffunny
Cas*_* V. 11
我想知道你为什么问这个问题,但无论如何,这是一个有趣的练习。这是我的实现:
library('dplyr')
# define cars -------------------------------------------------------------
original_cars <- c("self", "funny", "nymph")
new_cars <- c("house", "garden", "duck", "evil", "fluff")
cars <- c(original_cars, new_cars)
# get all possible connections ('parts') per car --------------------------
car_parts <- lapply(seq_along(cars), \(car_id) {
car = cars[car_id]
n = nchar(car)
ids <- rep(car_id, n)
names <- rep(car, n)
left <- vapply(seq_len(n), \(i) substr(car, 1, i), "")
right <- vapply(seq_len(n), \(i) substr(car, n-i+1, n), "")
overlap <- nchar(left)
data.frame(car.id = ids, car.name = names, left = left, right = right, overlap = overlap)
}) |> do.call(rbind, args=_)
# > car_parts
# car.id car.name left right overlap
# 1 1 self s f 1
# 2 1 self se lf 2
# 3 1 self sel elf 3
# 4 1 self self self 4
# 5 2 funny f y 1
# 6 2 funny fu ny 2
# 7 2 funny fun nny 3
# 8 2 funny funn unny 4
# 9 2 funny funny funny 5
# 10 3 nymph n h 1
# [...]
# get all possible connections between two cars ---------------------------
connections <- inner_join(car_parts |> select(-left),
car_parts |> select(-right),
by = c('overlap', 'right' = 'left'),
suffix = c('.left', '.right')) |>
filter(car.id.left != car.id.right) |>
mutate(connection.id = row_number()) |>
select(connection.id, car.id.left, car.id.right, car.name.left, car.name.right, coupling = right)
rm(car_parts)
# > connections
# connection.id car.id.left car.id.right car.name.left car.name.right coupling
# 1 1 1 2 self funny f
# 2 2 1 8 self fluff f
# 3 3 2 3 funny nymph ny
# 4 4 3 4 nymph house h
# 5 5 4 7 house evil e
# 6 6 4 1 house self se
# 7 7 5 3 garden nymph n
# 8 8 8 2 fluff funny f
# function to store valid trains ------------------------------------------
# example:
# valid_trains <- list()
# valid_trains <- add_valid_train( valid_trains, c(1, 8), c(2) )
add_valid_train <- function(valid_trains, train_cars, train_connections) {
names = c(cars[train_cars[1]],
vapply(train_connections, \(x) connections$car.name.right[x], "") )
couplings = vapply(train_connections, \(x) connections$coupling[x], "")
append(valid_trains, list(list(cars = train_cars, names = names, couplings = couplings)))
}
# function to recursively find next cars to add to train ------------------
# example:
# add_car(9, 5, c(1,2,3), c(1,3,5))
add_car <- function(valid_trains, new_car, new_connection = NULL, train_cars = c(), train_connections = c(), depth = 0) {
cat(strrep(' ',depth), cars[new_car],'\n', sep='')
# store current train as valid
train_cars <- c(train_cars, new_car)
train_connections <- c(train_connections, new_connection)
# find next possible cars to add; save train if no more options, otherwise add all options
options <- connections |> filter(car.id.left == new_car, ! car.id.right %in% train_cars)
if(nrow(options) == 0) valid_trains <- add_valid_train(valid_trains, train_cars, train_connections) # save only the longest options
for(i in seq_len(nrow(options))) valid_trains <- add_car(valid_trains, options$car.id.right[i], options$connection.id[i], train_cars, train_connections, depth+1)
return(valid_trains)
}
# get all valid trains ----------------------------------------------------
valid_trains <- list()
for(i in seq_along(cars)) add_car(valid_trains, i) -> valid_trains
# filter valid trains that have all cars from `original_cars` -------------
mask <- vapply(valid_trains, \(x) all(seq_along(original_cars) %in% x$cars), T)
new_trains <- lapply(valid_trains[mask], \(x) {
x$newcars <- setdiff(x$cars, seq_along(original_cars))
x$newnames <- cars[x$newcars]
x
})
# print names of all trains that contain all 'original' cars:
#
# > sapply(new_trains, \(x) x$names)
# [[1]] "self" "funny" "nymph" "house" "evil"
# [[2]] "self" "fluff" "funny" "nymph" "house" "evil"
# [[3]] "funny" "nymph" "house" "self" "fluff"
# [[4]] "nymph" "house" "self" "funny"
# [[5]] "nymph" "house" "self" "fluff" "funny"
# [[6]] "house" "self" "funny" "nymph"
# [[7]] "house" "self" "fluff" "funny" "nymph"
# [[8]] "garden" "nymph" "house" "self" "funny"
# [[9]] "garden" "nymph" "house" "self" "fluff" "funny"
# [[10]] "fluff" "funny" "nymph" "house" "self"
## All possible trains are in `valid_trains`, all of those where *all* the original cars are used are in `new_trains`.
##
## It is possible that some trains are subsets of others.
Run Code Online (Sandbox Code Playgroud)
编辑:当我查看您自己的实现时,我认为您对最长的火车感兴趣。现在您解释了目的,我调整了算法以获取原始汽车,并查看哪些新车可以单独添加到原始集合中。使用以前的代码,一长串潜在的新名字将创建一些巨大的火车,这对于命名一个家庭来说是非常不可行的。
library('dplyr')
# define cars -------------------------------------------------------------
original_cars <- c("self", "funny", "nymph")
new_cars <- c("house", "garden", "duck", "evil", "fluff")
# function to get all possible connections between a set of cars ----------
# example:
# cars <- c("self", "funny", "nymph", "house")
# get_connections(cars)
#
# > get_connections(c("self", "funny", "nymph", "house"))
# connection.id car.id.left car.id.right car.name.left car.name.right coupling
# 1 1 1 2 self funny f
# 2 2 2 3 funny nymph ny
# 3 3 3 4 nymph house h
# 4 4 4 1 house self se
get_connections <- function(cars) {
# get all connections the cars can make
car_parts <- lapply(seq_along(cars), \(car_id) {
car = cars[car_id]
n = nchar(car)
ids <- rep(car_id, n)
names <- rep(car, n)
left <- vapply(seq_len(n), \(i) substr(car, 1, i), "")
right <- vapply(seq_len(n), \(i) substr(car, n-i+1, n), "")
overlap <- nchar(left)
data.frame(car.id = ids, car.name = names, left = left, right = right, overlap = overlap)
}) |> do.call(rbind, args=_)
# > car_parts
# car.id car.name left right overlap
# 1 1 self s f 1
# 2 1 self se lf 2
# 3 1 self sel elf 3
# 4 1 self self self 4
# 5 2 funny f y 1
# 6 2 funny fu ny 2
# [...]
# return all possible connections between two cars
inner_join(car_parts |> select(-left),
car_parts |> select(-right),
by = c('overlap', 'right' = 'left'),
suffix = c('.left', '.right')) |>
filter(car.id.left != car.id.right) |>
mutate(connection.id = row_number()) |>
select(connection.id, car.id.left, car.id.right, car.name.left, car.name.right, coupling = right)
}
# function to store valid trains ------------------------------------------
# example:
# cars <- c("self", "funny", "nymph", "house")
# connections <- get_connections(cars)
# valid_trains <- list()
# valid_trains <- add_valid_train( cars, connections, valid_trains, c(2, 3), c(2) )
add_valid_train <- function(cars, connections, valid_trains, train_cars, train_connections) {
names = c(cars[train_cars[1]],
vapply(train_connections, \(x) connections$car.name.right[x], "") )
couplings = vapply(train_connections, \(x) connections$coupling[x], "")
append(valid_trains, list(list(cars = train_cars, names = names, couplings = couplings)))
}
# function to recursively find next cars to add to train ------------------
# example:
# cars <- c("self", "funny", "nymph", "house")
# connections <- get_connections(cars)
# valid_trains <- list()
# add_car(cars, connections, valid_trains, 2)
add_car <- function(cars, connections, valid_trains, new_car, new_connection = NULL, train_cars = c(), train_connections = c(), depth = 0) {
cat(strrep(' ',depth), cars[new_car], '\n', sep='')
# store current train as valid
train_cars <- c(train_cars, new_car)
train_connections <- c(train_connections, new_connection)
# find next possible cars to add
options <- connections |> filter(car.id.left == new_car, ! car.id.right %in% train_cars)
for(i in seq_len(nrow(options))) valid_trains <- add_car(cars, connections, valid_trains, options$car.id.right[i], options$connection.id[i], train_cars, train_connections, depth+1)
# save train if no more options
if(nrow(options) == 0) valid_trains <- add_valid_train(cars, connections, valid_trains, train_cars, train_connections)
return(valid_trains)
}
# find individual new cars that can be added to existing cars --------------
results <- lapply(new_cars, function(new_car) {
cat('adding "',new_car,'":\n', sep='')
cars <- c(original_cars, new_car)
connections <- get_connections(cars)
# get all possible trains
valid_trains <- list()
for(i in seq_along(cars)) add_car(cars, connections, valid_trains, i) -> valid_trains
cat('\n')
# return only trains where all cars are used
valid_trains <- valid_trains[ sapply(valid_trains, \(x) length(x$cars)) == length(cars) ]
return(list(new_car = new_car, options = length(valid_trains), trains = valid_trains))
})
for(result in results) {
cat('\n', result$new_car, ': ', result$options, ' options ', sep='')
for(train in result$trains) {
cat('[',train$names,'] ')
}
}
# detailed results are in `results`
Run Code Online (Sandbox Code Playgroud)
house: 4 options [ self funny nymph house ] [ funny nymph house self ] [ nymph house self funny ] [ house self funny nymph ]
garden: 0 options
duck: 0 options
evil: 0 options
fluff: 1 options [ self fluff funny nymph ]
Run Code Online (Sandbox Code Playgroud)