I am new to Python and pyparsing, and I need to accomplish the following.
My sample lines of text look like this:
12 items - Ironing Service 11 Mar 2009 to 10 Apr 2009
Washing service (3 Shirt) 23 Mar 2009
I need to extract the item description and the period:
from pyparsing import Combine, Word, alphas, alphanums, nums

tok_date_in_ddmmmyyyy = Combine(Word(nums,min=1,max=2)+ " " + Word(alphas, exact=3) + " " + Word(nums,exact=4))
tok_period = Combine((tok_date_in_ddmmmyyyy + " to " + tok_date_in_ddmmmyyyy)|tok_date_in_ddmmmyyyy)
# tok_desc should match the description, but stop before tok_period:
tok_desc = Word(alphanums+"-()")
How can this be done?
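I would suggest looking at SkipTo as the most appropriate pyparsing class here, since you have a good definition of the text you don't want (the period), but will accept pretty much anything before it. Here are a couple of ways to use SkipTo: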
from pyparsing import SkipTo

text = """\
12 items - Ironing Service 11 Mar 2009 to 10 Apr 2009
Washing service (3 Shirt) 23 Mar 2009"""
# using tok_period as defined in the OP
# parse each line separately
for tx in text.splitlines():
    print(SkipTo(tok_period).parseString(tx)[0])
# or have pyparsing search through the whole input string using searchString
# (this unpacking assumes each match is nested as [[skipped, period]]; if your
# pyparsing version returns the two tokens flat, use "for td, _ in ..." instead)
for [[td,_]] in SkipTo(tok_period,include=True).searchString(text):
    print(td)
Both for loops print the following:
12 items - Ironing Service
Washing service (3 Shirt)
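Since the question asks for the period as well as the description, here is a minimal sketch that returns both from one line. It assumes Python 3 and an installed pyparsing; `tok_date`, `line_parser` and `split_line` are names made up for this illustration, while the date and period grammar is copied from the question. Following SkipTo with `tok_period` itself, rather than using `include=True`, should keep the result a flat pair of tokens.

```python
from pyparsing import Combine, SkipTo, Word, alphas, nums

# Date and period grammar, as defined in the question.
tok_date = Combine(
    Word(nums, min=1, max=2) + " " + Word(alphas, exact=3) + " " + Word(nums, exact=4)
)
tok_period = Combine((tok_date + " to " + tok_date) | tok_date)

# Skip everything up to the period, then match the period itself.
# (line_parser and split_line are hypothetical helper names.)
line_parser = SkipTo(tok_period) + tok_period


def split_line(line):
    """Return (description, period) for a single input line."""
    desc, period = line_parser.parseString(line)
    # The skipped text may keep the whitespace that precedes the period.
    return desc.strip(), period


text = """\
12 items - Ironing Service 11 Mar 2009 to 10 Apr 2009
Washing service (3 Shirt) 23 Mar 2009"""

for tx in text.splitlines():
    print(split_line(tx))
```

The `strip()` call is there because the text collected by SkipTo can end with the space that separates the description from the date.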