I'm using Ruby 1.9, and I'm wondering whether there's a simple regex way to do this.
I have lots of strings that look like some variation of this:
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
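The idea is that I want to break this string into its functional components.
The "grammar" of the string is that there are "keys" consisting of one or more "words or other characters" (e.g. Intervention Model), followed by a colon (:). Each key has a corresponding "value" (e.g. Parallel Assignment) that immediately follows the colon. A value is made up of words, commas (whatever), and its end is marked by a comma. Because a value can itself contain commas (see the Masking value), the comma that ends a value is the one followed by the next key; the last value simply runs to the end of the string.
The number of key/value pairs is variable. I'm also assuming that a colon (:) is not allowed to be part of a value, and a comma (,) is not allowed to be part of a key.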
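The grammar above can be checked without any regex at all. A minimal sketch, assuming exactly the two rules just stated (keys contain no commas, values contain no colons): split on every comma, start a new pair whenever a fragment contains a colon, and otherwise glue the fragment back onto the previous value, because that comma was part of the value. The variable names are illustrative only.

```ruby
# Sketch of the stated grammar without a regex. Assumes, as the question does,
# that keys never contain "," and values never contain ":".
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

pairs = []
str.split(",").each do |fragment|
  if fragment.include?(":")
    # A colon means this fragment starts a new "Key: value" pair.
    key, value = fragment.split(":", 2)
    pairs << [key.strip, value.strip]
  elsif pairs.empty?
    raise ArgumentError, "input does not start with a key"
  else
    # No colon: the comma we split on belonged to the previous value.
    pairs.last[1] << "," << fragment
  end
end

fields = Hash[pairs]
fields.each { |key, value| puts "#{key} => #{value}" }
```

This works on Ruby 1.9 and keeps the Masking value intact, since its inner fragments (" Caregiver", " Investigator", ...) contain no colon.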
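One would think there's a "regexy" way to break this into its components, but my attempt at a suitable matching regex only picks up the first key/value pair, and I'm not sure how to capture the others. Any ideas on how to capture the other matches?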
regex = /(([^,]+?): ([^:]+?,))+?/
=> /(([^,]+?): ([^:]+?,))+?/
irb(main):139:0> str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
=> "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
irb(main):140:0> str.match regex
=> #<MatchData "Allocation: Random," 1:"Allocation: Random," 2:"Allocation" 3:" Random,">
irb(main):141:0> $1
=> "Allocation: Random,"
irb(main):142:0> $2
=> "Allocation"
irb(main):143:0> $3
=> " Random,"
irb(main):144:0> $4
=> nil
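Why the attempt stops after the first pair: String#match returns only the first match, and a quantified group such as (...)+? keeps only the capture from its last iteration (the lazy +? also stops after a single iteration). The regex has just three groups, so $4 is always nil. String#scan, by contrast, returns every non-overlapping match. A sketch assuming the question's own rules, with a hypothetical constant PAIR_RE: a lookahead ends each value at a comma that starts the next "Key:", so the commas inside the Masking value survive.

```ruby
# Sketch: collect every key/value pair with String#scan instead of String#match.
# PAIR_RE is a hypothetical name; it assumes keys contain no "," or ":" and
# values contain no ":" (the question's own assumptions).
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

# \s*                  skip the space left after the previous pair's comma
# ([^,:]+)             key: one or more characters other than "," and ":"
# :\s*                 the colon and any space after it
# (.*?)                value: as few characters as possible...
# (?=,\s*[^,:]+:|\z)   ...up to a comma that starts the next "Key:", or the end
PAIR_RE = /\s*([^,:]+):\s*(.*?)(?=,\s*[^,:]+:|\z)/

# With capture groups, scan returns one [key, value] array per match.
pairs = str.scan(PAIR_RE)

# Hash[] works on Ruby 1.9 (Array#to_h only exists from Ruby 2.1).
fields = Hash[pairs]
fields.each { |key, value| puts "#{key} => #{value}" }
```

The lookahead is what makes the value boundary match the grammar: a comma followed by " Caregiver," is not followed by a colon, so the value keeps going.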
Answer:
irb(main):003:0> pp Hash[ *str.split(/\s*([^,]+:)\s+/)[1..-1] ]
{"Allocation:"=>"Random,",
"Control:"=>"Active Control,",
"Endpoint Classification:"=>"Safety Study,",
"Intervention Model:"=>"Parallel Assignment,",
"Masking:"=>
"Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor),",
"Primary Purpose:"=>"Treatment"}
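The whitespace parts of the regex (\s* and \s+) aren't required, but they help clean up the output a little. The [1..-1] drops the empty string that split returns before the first key. I'll leave the small follow-up cleanup to you, such as removing the colon from the end of each key or the trailing comma from each value.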
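The cleanup left to the reader can be folded in directly. A minimal sketch, assuming the same split regex as above: split keeps whatever the capture group matched, so after [1..-1] the array alternates key, value, key, value; each_slice(2) walks it in pairs, and String#chomp removes a trailing ":" or "," only when one is present.

```ruby
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

# The capture group keeps each "Key:" in the result; [1..-1] drops the
# leading "" that precedes the first key.
parts = str.split(/\s*([^,]+:)\s+/)[1..-1]

fields = {}
parts.each_slice(2) do |key, value|
  # chomp strips the suffix only if present ("Treatment" has no trailing comma).
  fields[key.chomp(":")] = value.chomp(",")
end

fields.each { |key, value| puts "#{key} => #{value}" }
```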