Extracting words from a file

nik*_*hil 4 python

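I am using Python to open a file and check whether any of a predefined set of words occurs in it. I have put the predefined words in a list and opened the file that has to be tested. Is there any way in Python to extract words from the file rather than lines? That would make my work much easier.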

Hug*_*ell 7

import re

WORD_RE = re.compile(r'\w+')

def get_words_from_string(s):
    # Lowercase the text and return the set of distinct words in it.
    return set(WORD_RE.findall(s.lower()))

def get_words_from_file(fname):
    # Read the whole file as text and return its set of words.
    with open(fname, 'r') as inf:
        return get_words_from_string(inf.read())

def all_words(needle, haystack):
    # True if every word in needle also occurs in haystack.
    return set(needle).issubset(set(haystack))

def any_words(needle, haystack):
    # The words of needle that also occur in haystack (empty set if none).
    return set(needle).intersection(set(haystack))

search_words = get_words_from_string("This is my test")
find_in = get_words_from_string("If this were my test, I is passing")

# sorted() gives a stable order; a bare set prints in arbitrary order.
print(sorted(any_words(search_words, find_in)))

print(all_words(search_words, find_in))

returns

['is', 'my', 'test', 'this']
True

  • (shrug) Sure, parse the file line by line, accumulating the word set as you go. (2 upvotes)
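
The line-by-line approach from the comment above could look roughly like this. It is only a sketch under a few assumptions: the name `words_in_file`, the UTF-8 encoding, and the demo file contents are all made up for illustration and are not from the original answer. It reuses the `\w+` pattern from the answer, but updates one set per line instead of reading the whole file at once.

```python
import os
import re
import tempfile

WORD_RE = re.compile(r"\w+")  # same word pattern as in the answer

def words_in_file(path, encoding="utf-8"):
    """Return the set of lowercase words in a file, read one line at a time."""
    words = set()
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:  # iterating the file yields one line per step
            words.update(WORD_RE.findall(line.lower()))
    return words

# Demo on a throwaway file (contents are hypothetical, for illustration only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("If this were my test,\nI is passing\n")
    demo_path = tmp.name

wanted = {"this", "is", "my", "test"}  # the predefined word list, as a set
found = words_in_file(demo_path)
os.remove(demo_path)

print(sorted(wanted & found))  # listed words that occur in the file
print(wanted <= found)         # True only if every listed word occurs
```

Iterating over the file object keeps only one line in memory at a time, which matters mainly for large files. For small files the whole-file version in the answer is just as good.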