Selective text extraction using Python

Adr*_*ian 1 python text selection

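I'm a beginner in Python and I'm using it for my master's thesis, so I don't know that much. I have a bunch of annual report files (in txt format), and I want to select all the text between "ITEM1." and "ITEM2.". I'm using the re package. My problem is that sometimes those 10-Ks contain a section called "ITEM1A.". I want the code to recognize this, stop at "ITEM1A.", and output the text between "ITEM1." and "ITEM1A.". In the code attached to this post I tried to make it stop at "ITEM1A.", but it doesn't; it keeps going because "ITEM1A." appears multiple times in the file. Ideally it would stop at the first one it sees. The code is below: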

import os
import re

#path to where 10k are
saved_path = "C:/Users/Adrian PC/Desktop/Thesis stuff/10k abbot/python/Multiple 10k/saved files/"

#path to where to save the txt with the selected text between ITEM 1 and ITEM 2
selected_path = "C:/Users/Adrian PC/Desktop/Thesis stuff/10k abbot/python/Multiple 10k/10k_select/"

#get a list of all the items in that specific folder and put it in a variable
list_txt = os.listdir(saved_path)


for text in list_txt:
    file_path = saved_path+text
    file = open(file_path,"r+", encoding="utf-8")
    file_read = file.read()
    # looking between ITEM 1 and ITEM 2
    res = re.search(r'(ITEM[\s\S]*1\.[\w\W]*)(ITEM+[\s\S]*1A\.)', file_read)
    item_text_section = res.group(1)
    saved_file = open(selected_path + text, "w+", encoding="utf-8")     # save the file with the complete names
    saved_file.write(item_text_section)                                 # write to the new text files with the selected text
    saved_file.close()                                                  # close the file
    print(text)                                                         #show the progress
    file.close()

If anyone has any suggestions on how to solve this, that would be great. Thanks!

ARR*_*ARR 5

Try the following regular expression:

ITEM1\.([\s\S]*?)ITEM1A\.

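Adding the question mark makes the `*` quantifier non-greedy, so the match stops at the first occurrence of "ITEM1A." after "ITEM1." instead of running on to the last one.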
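
To illustrate the difference, here is a minimal sketch comparing the greedy and non-greedy forms on a made-up string. The `sample` text and the `extract_item1` helper are hypothetical and not part of the original post. The `ITEM(?:1A|2)\.` alternation is an added assumption meant to cover reports that have no "ITEM1A." section, and it assumes the headings appear exactly as "ITEM1.", "ITEM1A." and "ITEM2." with no space before the number.

```python
import re

# Hypothetical sample standing in for a 10-K; real filings may space headings differently.
sample = (
    "ITEM1. Business overview. "
    "ITEM1A. Risk factors. "
    "See ITEM1A. for details. "
    "ITEM2. Properties."
)

# Greedy: [\s\S]* runs on to the LAST "ITEM1A." in the text.
greedy = re.search(r'ITEM1\.([\s\S]*)ITEM1A\.', sample)

# Non-greedy: [\s\S]*? stops at the FIRST "ITEM1A." after "ITEM1.".
lazy = re.search(r'ITEM1\.([\s\S]*?)ITEM1A\.', sample)


def extract_item1(text):
    """Return the text between "ITEM1." and the first "ITEM1A." or "ITEM2.".

    Returns None when no such section is found (assumed behaviour for this
    sketch), so the caller can skip the file instead of crashing.
    """
    match = re.search(r'ITEM1\.([\s\S]*?)ITEM(?:1A|2)\.', text)
    return match.group(1).strip() if match else None


print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
print(extract_item1(sample))
```

In the loop from the question, `res.group(1)` raises an `AttributeError` when `re.search` finds nothing and returns `None`; checking the result first, as the helper does, is one way to let the loop continue past files without a matching section.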

  • @axm__ Yes, I will. Sorry for the confusion. This is my first post. I'll keep that in mind for the future! Thanks again! (2 upvotes)