Reading the next word in a file in Python

Question


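I am searching for some words in a file in Python. After finding each word, I need to read the next two words from the file. I have looked at several solutions, but I could not find one that reads just the next words.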

# offsetFile - file pointer
# searchTerms - list of words

for line in offsetFile:
    for word in searchTerms:
        if word in line:
           # here get the next two terms after the word


Thanks for your time.

Update: Only the first occurrence is needed. In fact, each word can occur only once in this case.

File:

accept 42 2820 access 183 3145 accid 1 4589 algebra 153 16272 algem 4 17439 algol 202 6530


Words: ['access', 'algebra']

When I search the file and come across 'access' and 'algebra', I need the values 183 3145 and 153 16272, respectively.

Answer 1

kin*_*all 16

A simple way to handle this is to read the file with a generator that yields one word at a time.

def words(fileobj):
    for line in fileobj:
        for word in line.split():
            yield word


Then find the word you are interested in and read the next two words:

with open("offsetfile.txt") as wordfile:
    wordgen = words(wordfile)
    for word in wordgen:
        if word in searchterms:   # searchterms should be a set() to make this fast
            break
    else:
        word = None               # makes sure word is None if the word wasn't found

    foundwords = [word, next(wordgen, None), next(wordgen, None)]


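Now foundwords[0] is the word you found, foundwords[1] is the word after it, and foundwords[2] is the second word after it. If there aren't enough words left, one or more elements of the list will be None. Note that the elements are strings, so convert them with int() if you need numbers.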
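
To make the None padding concrete, here is a minimal sketch that runs the same logic against in-memory text rather than a real file. The helper name `first_match` and the use of `io.StringIO` are assumptions made for illustration only; they are not part of the code above.

```python
import io

def words(fileobj):
    """Yield the whitespace-separated words of a file, one at a time."""
    for line in fileobj:
        for word in line.split():
            yield word

def first_match(fileobj, searchterms):
    """Hypothetical helper: return [word, next, next-after-next] for the first hit.

    Missing entries are None, mirroring the behaviour described above.
    """
    wordgen = words(fileobj)
    for word in wordgen:
        if word in searchterms:
            break
    else:
        word = None  # no search term was found
    return [word, next(wordgen, None), next(wordgen, None)]

full = io.StringIO("accept 42 2820 access 183 3145 accid 1 4589\n")
print(first_match(full, {"access"}))   # ['access', '183', '3145']

short = io.StringIO("accept 42 2820 access 183\n")
print(first_match(short, {"access"}))  # ['access', '183', None]
```

If no search term is present at all, every element of the result is None.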

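It gets a little more complicated if you want to force the match to stay within a single line, but usually you can treat the file as just a sequence of words.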
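
For the line-restricted case, one possible approach is to split each line into a list and slice out the two words that follow the match, so the lookup never crosses a line boundary. This is only a sketch under that assumption; the function name `next_two_in_line` is made up for the example.

```python
import io

def next_two_in_line(fileobj, searchterms):
    """Return [word, next, next-after-next] without crossing a line boundary."""
    for line in fileobj:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token in searchterms:
                following = tokens[i + 1:i + 3]
                # Pad with None when the line ends before two words are found.
                following += [None] * (2 - len(following))
                return [token] + following
    return [None, None, None]

# '3145' sits on the next line, so it is not picked up.
text = io.StringIO("accept 42 2820 access 183\n3145 accid 1 4589\n")
print(next_two_in_line(text, {"access"}))  # ['access', '183', None]
```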

Yes, if you want to find multiple matches, you need an additional loop to keep going. That is easy to add. (2 upvotes)
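
One way that extra loop might look, as a rough sketch: keep iterating over the same generator and record the two following words for each search term the first time it appears. The name `find_all` and the dictionary return shape are assumptions for this example. The two words consumed after a hit are not themselves checked against the search terms, which is fine for data laid out like the sample in the question.

```python
import io

def words(fileobj):
    """Yield the whitespace-separated words of a file, one at a time."""
    for line in fileobj:
        for word in line.split():
            yield word

def find_all(fileobj, searchterms):
    """Map each search term found to the two words that follow its first occurrence."""
    remaining = set(searchterms)
    results = {}
    wordgen = words(fileobj)
    for word in wordgen:
        if word in remaining:
            results[word] = (next(wordgen, None), next(wordgen, None))
            remaining.discard(word)
            if not remaining:
                break  # every term has been found, stop reading
    return results

data = io.StringIO(
    "accept 42 2820 access 183 3145 accid 1 4589 "
    "algebra 153 16272 algem 4 17439 algol 202 6530\n"
)
print(find_all(data, ["access", "algebra"]))
# {'access': ('183', '3145'), 'algebra': ('153', '16272')}
```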
