aka*_*h87 1 regex sql apache-spark apache-spark-sql
I am using REGEXP to filter a dataset with 10 rows, shown below:
ID Product
1 "VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)"
2 "MINOXIDIL POWDER"
3 "MENTHOL LOZENGE 10 MG"
4 "ZINC CHLORIDE GRANULES"
5 "CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)"
6 "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"
7 "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)"
8 "METHYLPREDNISOLONE DOSE P (16)"
9 "MILLIPRED DP (13)"
10 "ZONACORT 7 DAY"
and I would like the result to look like this:
ID Product
6 "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"
7 "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)"
8 "METHYLPREDNISOLONE DOSE P (16)"
9 "MILLIPRED DP (13)"
In effect, I want to filter the dataset based on whether the string ends with a number enclosed in parentheses. My attempts so far have not worked.
In base R, we can use grepl to match an opening parenthesis (\\() followed by one or more digits (\\d+) and then a closing parenthesis (\\)) at the end ($) of the string:
subset(df1, grepl("\\(\\d+\\)$", Product))
# ID Product
#6 6 METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)
#7 7 DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)
#8 8 METHYLPREDNISOLONE DOSE P (16)
#9 9 MILLIPRED DP (13)
df1 <- structure(list(ID = 1:10, Product = c("VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)",
"MINOXIDIL POWDER", "MENTHOL LOZENGE 10 MG", "ZINC CHLORIDE GRANULES",
"CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)", "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)",
"DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)", "METHYLPREDNISOLONE DOSE P (16)",
"MILLIPRED DP (13)", "ZONACORT 7 DAY")), class = "data.frame", row.names = c(NA,
-10L))
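The pattern itself is not specific to R. As a minimal sketch, assuming Python's standard `re` module and the ten sample rows copied from the question, the same expression can be checked like this. The names `rows`, `PATTERN`, and `ends_with_parenthesized_number` are illustrative and do not come from the original post.

```python
import re

# Sample rows copied from the question: (ID, Product).
rows = [
    (1, "VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)"),
    (2, "MINOXIDIL POWDER"),
    (3, "MENTHOL LOZENGE 10 MG"),
    (4, "ZINC CHLORIDE GRANULES"),
    (5, "CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)"),
    (6, "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"),
    (7, "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)"),
    (8, "METHYLPREDNISOLONE DOSE P (16)"),
    (9, "MILLIPRED DP (13)"),
    (10, "ZONACORT 7 DAY"),
]

# "(" followed by one or more digits and ")" anchored at the end of the string.
# A raw string keeps the backslashes single, unlike the doubled form in R.
PATTERN = re.compile(r"\(\d+\)$")


def ends_with_parenthesized_number(text):
    """Return True when text ends with digits wrapped in parentheses."""
    return PATTERN.search(text) is not None


# Keep only the IDs whose Product ends in a parenthesized number.
matching_ids = [row_id for row_id, product in rows
                if ends_with_parenthesized_number(product)]
print(matching_ids)  # [6, 7, 8, 9]
```

Since the question is tagged apache-spark-sql: Spark provides an `RLIKE` operator and a `rlike` column method that take Java regular expressions, so the same pattern should carry over, but how the backslashes must be escaped depends on how the string literal is written there, so treat that as something to verify rather than a tested solution.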