gmo*_*evt 1 python regex bibtex
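What is the best way to parse this result in Python? I tried regular expressions but could not get them to work. I am looking for a dictionary with title, author, etc. as keys.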
@article{perry2000epidemiological,
title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
author={Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others},
journal={Journal of public health},
volume={22},
number={3},
pages={427--434},
year={2000},
publisher={Oxford University Press}
}
This looks like a citation format. You can parse it like this:
>>> import re
>>> kv = re.compile(r'\b(?P<key>\w+)={(?P<value>[^}]+)}')
>>> citation = """
... @article{perry2000epidemiological,
... title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
... author={Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others},
... journal={Journal of public health},
... volume={22},
... number={3},
... pages={427--434},
... year={2000},
... publisher={Oxford University Press}
... }
... """
>>> dict(kv.findall(citation))
{'author': 'Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others',
'journal': 'Journal of public health',
'number': '3',
'pages': '427--434',
'publisher': 'Oxford University Press',
'title': 'An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study',
'volume': '22',
'year': '2000'}
The regular expression uses two named capture groups (mostly to show visually which part is which).
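With `findall` the group names are never actually read, so here is a minimal sketch that puts them to use through `finditer` and `match.group(name)`. The names `FIELD` and `sample` are illustrative, the sample entry is shortened, and the optional whitespace around `=` is an addition that is not part of the pattern above; it is there because many `.bib` files write `title = {...}`.

```python
import re

# Same idea as the pattern above, with two assumed tweaks:
# optional whitespace around "=" and re.VERBOSE for readability.
FIELD = re.compile(
    r"""
    \b(?P<key>\w+)          # field name, e.g. title
    \s*=\s*                 # equals sign, optionally padded with whitespace
    \{(?P<value>[^}]+)\}    # value: one or more non-brace characters
    """,
    re.VERBOSE,
)

# Shortened, illustrative entry (not the full citation from the question).
sample = """
@article{perry2000epidemiological,
  title = {An epidemiological study},
  journal={Journal of public health},
  year={2000}
}
"""

# Read each match through its named groups and build the dictionary.
fields = {m.group("key"): m.group("value") for m in FIELD.finditer(sample)}
print(fields)
```

The entry type and citation key (`@article{perry2000epidemiological,`) are skipped because they are not followed by `=`.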
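`[^}]` is convenient as long as you do not expect "nested" curly braces. In other words, a value is simply one or more non-brace characters inside a pair of braces.

You may be looking for a BibTeX parser: https://bibtexparser.readthedocs.io/en/master/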
Source: https://bibtexparser.readthedocs.io/en/master/tutorial.html#step-0-vocabulary

Input/create a bibtex file:

bibtex = """@ARTICLE{Cesar2013,
  author = {Jean César},
  title = {An amazing title},
  year = {2013},
  month = jan,
  volume = {12},
  pages = {12--23},
  journal = {Nice Journal},
  abstract = {This is an abstract. This line should be long enough to test
     multilines...},
  comments = {A comment},
  keywords = {keyword1, keyword2}
}
"""

with open('bibtex.bib', 'w') as bibfile:
    bibfile.write(bibtex)
Parse it:

import bibtexparser

with open('bibtex.bib') as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

print(bib_database.entries)
Output:

[{'journal': 'Nice Journal',
  'comments': 'A comment',
  'pages': '12--23',
  'month': 'jan',
  'abstract': 'This is an abstract. This line should be long enough to test\nmultilines...',
  'title': 'An amazing title',
  'year': '2013',
  'volume': '12',
  'ID': 'Cesar2013',
  'author': 'Jean César',
  'keyword': 'keyword1, keyword2',
  'ENTRYTYPE': 'article'}]