Extracting numbers/characters from similar strings with different formats

lll*_*lll 1 regex split r

I'm new to R, and I have a dataset like the following:

  Artist                                                                              Medium.Size
  1     HIROSHI SUGIMOTO (B. 1948)                     gelatin silver print mounted on paper \n 20 x 24 in. (50.8 x 61 cm.)
  2     HIROSHI SUGIMOTO (B. 1948)                     gelatin silver print mounted on paper \n 20 x 24 in. (50.8 x 61 cm.)
  3     HIROSHI SUGIMOTO (B. 1948)                                 gelatin silver print \n 20 x 24 inches (50.7 x 63.2 cm.)
  4     HIROSHI SUGIMOTO (B. 1948)                                 gelatin silver print \n 20 x 24 inches (50.7 x 63.2 cm.)
  5     HIROSHI SUGIMOTO (B. 1948)                   gelatin silver print mounted on paper \n 20 x 24 in. (50.8 x 60.9 cm.)
  6     HIROSHI SUGIMOTO (B. 1948)                     gelatin silver print mounted on paper \n 20 x 24 in. (50.8 x 61 cm.)
  7     Richard Phillips (b. 1963)                                       graphite on paper \n 12 x 8? in. (30.4 x 21.5 cm.)
  8        Marlene Dumas (b. 1953)                       ink, acrylic and graphite on paper \n 26 x 19? in. (66 x 50.1 cm.)
  9       Lisa Yuskavage (b. 1962)                            oil and graphite on panel \n 7 5/8 x 9? in. (19.3 x 24.7 cm.)
  10      Lisa Yuskavage (b. 1962)                    watercolor and graphite on paper \n 7 5/8 x 10? in. (19.3 x 26.6 cm.)
  11      Barnaby Furnas (b. 1973)                      urethane and wax medium on canvas \n 40 x 30 in. (101.6 x 76.2 cm.)

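I want to extract information from the second column: the words describing the medium, which come before the first "\n", and the expression in parentheses.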

I tried using

 split = strsplit(impression$Medium.Size, ", | \n | \\(")

but it seems to return a list whose elements have different lengths:

 [[3517]]
 [1] "oil on canvas\n 25 ? x 32 in." "65.4 x 81.3 cm.)"             

 [[3518]] 
 [1] "bronze with green and brown patina\n Height: 15 in." "38 cm.); Length:  25 5/8"                            
 [3] "65 cm.); Width: 27 5/8 in."                          "70 cm.)"   

What I'd like to get is something like this:

  medium                size
 graphite on paper     50.8*61cm

Jaa*_*aap 5

You can use the splitstackshape package, as follows:

library(splitstackshape)
# Split Medium.Size at the newline into two new columns (wide format)
cSplit(impression, "Medium.Size", sep = "\n", direction = "wide", fixed = TRUE)

This gives you a data.table in which the Medium.Size column is split into two columns.
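
Splitting at the newline separates the medium from the dimensions, but the question also asks for the metric size inside the parentheses. A minimal base R sketch of one way to get both is shown here. It is not part of the original answer, and it assumes that the "\n" in the printed data is a real newline character and that only the first parenthesised group containing "cm" is wanted. The toy `impression` data frame and the `result` name are made up for illustration.

```r
# Toy data mirroring two rows from the question (assumed: real newlines).
impression <- data.frame(
  Artist = c("HIROSHI SUGIMOTO (B. 1948)", "Barnaby Furnas (b. 1973)"),
  Medium.Size = c(
    "gelatin silver print mounted on paper \n 20 x 24 in. (50.8 x 61 cm.)",
    "urethane and wax medium on canvas \n 40 x 30 in. (101.6 x 76.2 cm.)"
  ),
  stringsAsFactors = FALSE
)

x <- impression$Medium.Size

# Medium: the text before the first newline, with surrounding spaces trimmed.
parts  <- strsplit(x, "\n", fixed = TRUE)
medium <- trimws(vapply(parts, `[`, character(1), 1))

# Size: the first parenthesised group that mentions "cm".
# Rows without such a group keep NA instead of shifting the results.
m    <- regexpr("\\([^()]*cm[^()]*\\)", x)
hit  <- !is.na(m) & m > -1L
size <- rep(NA_character_, length(x))
size[hit] <- regmatches(x, m)
size <- gsub("^\\(|\\)$", "", size)  # drop the enclosing parentheses

result <- data.frame(medium = medium, size = size, stringsAsFactors = FALSE)
print(result)
```

Rows such as the bronze example in the question, which carry several parenthesised measurements (height, length, width), would need `gregexpr()` instead of `regexpr()` to collect every match; this sketch keeps only the first.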