Jay*_*Bee 3 regex string split r
I have an ingredient list like the following:
Ingredients <- "Starch (Corn | Potato | Wheat) | Vegetables (27%) [Pea (23%) (Flakes | Pieces) | Carrot Pieces | Onion Powder | Spinach Powder] | Croutons (10%) (Wheat Flour | Vegetable Oil | Salt | Yeast) | Maltodextrin | Natural Flavours (Contain Milk and Soybeans) | Creamer [Contains Milk | Mineral Salts (339 or 340 | 450 or 451)] | Salt | Mineral Salt (Potassium Chloride) | Sugar | Flavour Enhancer (621) | Vegetable Oil | Bacon Powder (0.5%) | Parsley | Natural Colour (Turmeric) | Burnt Sugar | Food Acid (Lactic) | Pepper Extract"
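I want to split it into separate values in a data frame, under the variable ingredients.
However, I am having trouble writing the code because the delimiter | is used in several ways within the list. I want to split at | only where it is not enclosed in parentheses () or square brackets [], and I really don't know how to approach this.
That is, we should end up with one ingredients value of Starch (Corn | Potato | Wheat), another of Vegetables (27%) [Pea (23%) (Flakes | Pieces) | Carrot Pieces | Onion Powder | Spinach Powder], and another of Salt (plus the other ingredients, but the first two are the trickier cases for me).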
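The regex is adapted from this answer.
The idea is to first replace every | that sits between brackets, ( ) or [ ], with another character (@ in my example). The | characters that remain should then be the true delimiters of the string. Next, split with strsplit and replace each @ back with |. Finally, trimws() removes the unwanted whitespace at the ends of each string.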
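library(dplyr)  # provides the %>% pipe
# Mask | inside () and then inside [] with @, split on the remaining |,
# then restore the masked | and trim surrounding whitespace.
strsplit(gsub("\\|(?=[^()]*\\))", "@", Ingredients, perl=TRUE) %>%
           gsub("\\|(?=[^\\[\\]]*\\])", "@", ., perl=TRUE), "\\|") %>%
  unlist() %>%
  gsub("@", "\\|", .) %>%
  trimws()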
[1] "Starch (Corn | Potato | Wheat)"
[2] "Vegetables (27%) [Pea (23%) (Flakes | Pieces) | Carrot Pieces | Onion Powder | Spinach Powder]"
[3] "Croutons (10%) (Wheat Flour | Vegetable Oil | Salt | Yeast)"
[4] "Maltodextrin"
[5] "Natural Flavours (Contain Milk and Soybeans)"
[6] "Creamer [Contains Milk | Mineral Salts (339 or 340 | 450 or 451)]"
[7] "Salt"
[8] "Mineral Salt (Potassium Chloride)"
[9] "Sugar"
[10] "Flavour Enhancer (621)"
[11] "Vegetable Oil"
[12] "Bacon Powder (0.5%)"
[13] "Parsley"
[14] "Natural Colour (Turmeric)"
[15] "Burnt Sugar"
[16] "Food Acid (Lactic)"
[17] "Pepper Extract"
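To make the masking steps easier to follow, here is a minimal sketch that runs the same two lookahead substitutions one at a time on a shorter string. The input `x` and the names `step1`, `step2` and `parts` are made up for illustration and are not part of the answer. It assumes the data contains no literal @, and that parentheses are not nested inside other parentheses (a | followed by an inner "(" would not be masked by the first pattern).

```r
# Hypothetical short input with the same kinds of nesting as Ingredients
x <- "A (b | c) | D [e | f (g | h)] | E"

# Step 1: mask every | that is followed by ")" with no "(" or ")" in between
step1 <- gsub("\\|(?=[^()]*\\))", "@", x, perl = TRUE)
# step1 is "A (b @ c) | D [e | f (g @ h)] | E"

# Step 2: mask every | that is followed by "]" with no "[" or "]" in between
step2 <- gsub("\\|(?=[^\\[\\]]*\\])", "@", step1, perl = TRUE)
# step2 is "A (b @ c) | D [e @ f (g @ h)] | E"

# Step 3: the remaining | are the real delimiters
parts <- strsplit(step2, "\\|")[[1]]

# Step 4: restore the masked | and trim surrounding whitespace
parts <- trimws(gsub("@", "|", parts, fixed = TRUE))
print(parts)
# [1] "A (b | c)"         "D [e | f (g | h)]" "E"
```

If the real data could contain @, any other character that never occurs in the input would serve as the placeholder.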
You can use a recursive regex:
pat <- r"(([^\[\]|]*[\[(](?:[^\[)(\]]*(?1)?)+[\])])| ([^|]+))"
regmatches(Ingredients, gregexpr(pat, Ingredients, perl = TRUE))
[[1]]
[1] "Starch (Corn | Potato | Wheat)"
[2] " Vegetables (27%) [Pea (23%) (Flakes | Pieces) | Carrot Pieces | Onion Powder | Spinach Powder]"
[3] " Croutons (10%) (Wheat Flour | Vegetable Oil | Salt | Yeast)"
[4] " Maltodextrin "
[5] " Natural Flavours (Contain Milk and Soybeans)"
[6] " Creamer [Contains Milk | Mineral Salts (339 or 340 | 450 or 451)]"
[7] " Salt "
[8] " Mineral Salt (Potassium Chloride)"
[9] " Sugar "
[10] " Flavour Enhancer (621)"
[11] " Vegetable Oil "
[12] " Bacon Powder (0.5%)"
[13] " Parsley "
[14] " Natural Colour (Turmeric)"
[15] " Burnt Sugar "
[16] " Food Acid (Lactic)"
[17] " Pepper Extract"
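The matches above still carry leading and trailing spaces, and the question asks for a data frame with an ingredients column. A minimal sketch of that last step, using three of the matches shown above as stand-in input; the names `matches` and `df` are illustrative, and in practice `matches` would presumably be the first element of the `regmatches()` result:

```r
# Stand-in for regmatches(...)[[1]]: three matches copied from the output above
matches <- c("Starch (Corn | Potato | Wheat)", " Maltodextrin ", " Salt ")

# Trim surrounding whitespace and store the values in a one-column data frame
df <- data.frame(ingredients = trimws(matches), stringsAsFactors = FALSE)
print(df)
```

`stringsAsFactors = FALSE` keeps the column as character on R versions before 4.0, where factors were the default.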