Regular expression to split a string into "key"/"value" pairs when the number of pairs is variable?

use*_*146 3 ruby regex string

I'm using Ruby 1.9, and I'm wondering whether there is a simple regex way to do this.

I have many strings that look like some variation of this:

str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

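The idea is that I want to break this string down into its functional components:

  • Allocation: Random
  • Control: Active Control
  • Endpoint Classification: Safety Study
  • Intervention Model: Parallel Assignment
  • Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor)
  • Primary Purpose: Treatment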

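The "grammar" of the string is that there is a "key", which consists of one or more "words or other characters" (e.g. Intervention Model), followed by a colon (:). Each key has a corresponding "value" (e.g. Parallel Assignment) that immediately follows the colon (:). The "value" consists of words, commas (whatever), but the end of the "value" is marked by a comma; the last value simply runs to the end of the string.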

The number of key/value pairs is variable. I'm also assuming that colons (:) are not allowed in a "value" and that commas (,) are not allowed in a "key".

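One would think there is a "regexy" way to break this into its components, but my attempt at a suitable matching regex only picks up the first key/value pair, and I'm not sure how to capture the others. Any ideas on how to capture the other matches?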

 regex = /(([^,]+?): ([^:]+?,))+?/
=> /(([^,]+?): ([^:]+?,))+?/
irb(main):139:0> str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"
=> "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"
irb(main):140:0> str.match regex
=> #<MatchData "Allocation:  Random," 1:"Allocation:  Random," 2:"Allocation" 3:" Random,">
irb(main):141:0> $1
=> "Allocation:  Random,"
irb(main):142:0> $2
=> "Allocation"
irb(main):143:0> $3
=> " Random,"
irb(main):144:0> $4
=> nil
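A side note on the session above: `String#match` returns only the first match, which is consistent with seeing just the first pair. One possible alternative is `String#scan`, which returns every match. The sketch below assumes the grammar stated in the question (no colon inside a value, no comma inside a key); the pattern and the names `pair_regex`, `pairs`, and `fields` are illustrative, not part of the original attempt.

```ruby
str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

# Key: a run of characters containing neither comma nor colon, then a colon.
# Value: as little as possible, up to a comma that is followed by the next
# "key:" token, or up to the end of the string.
pair_regex = /\s*([^,:]+):\s*(.*?)(?=,\s*[^,:]+:|\z)/

# scan returns one [key, value] array per match.
pairs  = str.scan(pair_regex)
fields = Hash[pairs]

fields.each { |key, value| puts "#{key} => #{value}" }
```

Because the lookahead only accepts a comma that is followed by something shaped like a key, the commas inside the parenthesised `Masking` value should not end that value early.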

Phr*_*ogz 6

irb(main):003:0> pp Hash[ *str.split(/\s*([^,]+:)\s+/)[1..-1] ]
{"Allocation:"=>"Random,",
 "Control:"=>"Active Control,",
 "Endpoint Classification:"=>"Safety Study,",
 "Intervention Model:"=>"Parallel Assignment,",
 "Masking:"=>
  "Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor),",
 "Primary Purpose:"=>"Treatment"}

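The whitespace parts of the regex are not required, but they help clean up the output a bit. I leave the remaining minor cleanup to you, such as removing the colon from the end of each key or the comma from the end of each value.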
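For completeness, one way that cleanup could look, as a minimal sketch: it reuses the same `split` pattern, then strips the trailing colon from each key and the trailing comma from each value. The names `parts` and `result` are made up for illustration, and `Hash[...]` is used instead of `to_h` so the code should also work on Ruby 1.9.

```ruby
str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

# The capture group makes split keep each "Key:" token in the result.
# The first element is the empty string before the first key, so drop it.
parts = str.split(/\s*([^,]+:)\s+/)[1..-1]

# Walk the flat [key, value, key, value, ...] list two items at a time.
cleaned = parts.each_slice(2).map do |key, value|
  [key.chomp(":"), value.sub(/,\z/, "")]  # drop trailing ":" and trailing ","
end

result = Hash[cleaned]
result.each { |key, value| puts "#{key} => #{value}" }
```

Only a comma at the very end of a value is removed, so the commas inside the parentheses of the `Masking` value are left alone.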