Length of the preceding word

Question

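I have to create a function that takes a single argument, word, and returns the average length (in characters) of the word that precedes word in the text. If word happens to be the first word in the text, the length of the preceding word for that occurrence should be zero. For example: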

>>> average_length("the")
4.4
>>> average_length('whale')
False
>>> average_length('ship.')
3.0


This is what I have written so far:

def average_length(word):
    text = "Call me Ishmael. Some years ago - never mind how long..........."
    words = text.split()
    wordCount = len(words)

    Sum = 0
    for word in words:
        ch = len(word)
        Sum = Sum + ch
    avg = Sum/wordCount
    return avg


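I know this is not right at all, but I am having trouble approaching the problem correctly. The question asks me to find every instance of word in the text and, for each one, take the length of the word immediately preceding it. Not every word from the beginning up to that word, just the one before it.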

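I should also mention that all of the tests will only use the first paragraph of 'Moby Dick' to test my code:

"Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me."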

Answer 1

Pad*_*ham 1

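Based on your requirement for no imports and a simple approach, the following function does the job without importing anything. The comments and variable names should make the logic clear: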

def match_previous(lst, word):
    # keep count of how many times we find a match and the total of the lengths
    matches_count = total_length_sum = 0.0
    # pull first element from list to use as preceding word
    previous_word = lst[0]
    # slice rest of words from the list 
    # so we always compare two consecutive words
    rest_of_words = lst[1:]
    # catch where first word is "word" and add 1 to matches_count
    if previous_word == word:
        matches_count += 1
    for current_word in rest_of_words:
        # if the current word matches our "word"
        # add length of previous word to total_length_sum
        # and increase matches_count.
        if word == current_word:
            total_length_sum += len(previous_word)
            matches_count += 1
        # always update to keep track of word just seen
        previous_word = current_word
    # if  matches_count is 0 we found no word in the text that matched "word"
    return total_length_sum / matches_count if matches_count else False


It takes two arguments, the list of words produced by splitting the text and the word to search for:

In [41]: text = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me."

In [42]: match_previous(text.split(),"the")
Out[42]: 4.4

In [43]: match_previous(text.split(),"ship.")
Out[43]: 3.0

In [44]: match_previous(text.split(),"whale")
Out[44]: False

In [45]: match_previous(text.split(),"Call")
Out[45]: 0.0


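Obviously you can do the same as your own function and take a single argument, splitting the text inside the function. The only way False is returned is if we find no match for the word; you can see that "Call" returns 0.0 because it is the first word in the text.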
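As a rough sketch of that single-argument form: the `TEXT` constant below is a shortened stand-in I made up for illustration (the real tests use the full paragraph), and starting `previous_word` as an empty string is my own variation on the function above, so treat it as one possible way to write it rather than the required solution.

```python
# Hypothetical stand-in text; substitute the full Moby Dick paragraph.
TEXT = "Call me Ishmael. Some years ago - never mind how long precisely"


def average_length(word, text=TEXT):
    """Average length of the word immediately before each occurrence of word."""
    total_length = 0.0
    matches = 0
    # Nothing precedes the first word, so an empty string gives it length 0.
    previous_word = ""
    for current_word in text.split():
        if current_word == word:
            total_length += len(previous_word)
            matches += 1
        # always remember the word just seen
        previous_word = current_word
    # False signals that the word never occurs in the text.
    return total_length / matches if matches else False
```

Because `0.0 == False` is true in Python, callers that need to tell "first word" apart from "not found" should compare with `is False` rather than `==`.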

If we add some prints to the code and use enumerate:

def match_previous(lst, word):
    matches_count = total_length_sum = 0.0
    previous_word = lst[0]
    rest_of_words = lst[1:]
    if previous_word == word:
        print("First word matches.")
        matches_count += 1
    for ind, current_word in enumerate(rest_of_words, 1):
        print("On iteration {}.\nprevious_word = {} and current_word = {}.".format(ind, previous_word, current_word))
        if word == current_word:
            total_length_sum += len(previous_word)
            matches_count += 1
            print("We found a match at index {} in our list of words.".format(ind-1))
        print("Updating previous_word from {} to {}.".format(previous_word, current_word))
        previous_word = current_word
    return total_length_sum / matches_count if matches_count else False


and run it with a small sample list, we can see what happens:

In [59]: match_previous(["bar","foo","foobar","hello", "world","bar"],"bar")
First word matches.
On iteration 1.
previous_word = bar and current_word = foo.
Updating previous_word from bar to foo.
On iteration 2.
previous_word = foo and current_word = foobar.
Updating previous_word from foo to foobar.
On iteration 3.
previous_word = foobar and current_word = hello.
Updating previous_word from foobar to hello.
On iteration 4.
previous_word = hello and current_word = world.
Updating previous_word from hello to world.
On iteration 5.
previous_word = world and current_word = bar.
We found a match at index 4 in our list of words.
Updating previous_word from world to bar.
Out[59]: 2.5


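The advantage of using iter is that we don't need to create a new list by slicing off the remainder. To use it in the code, you just need to change the start of the function to: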

def match_previous(lst, word):
    matches_count = total_length_sum = 0.0
    # create an iterator
    _iterator = iter(lst)
    # pull first word from iterator
    previous_word = next(_iterator)
    if previous_word == word:
        matches_count += 1
    # _iterator will give us all bar the first word we consumed with  next(_iterator)
    for current_word in _iterator:
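
For reference, here is a sketch of how the whole function might look once the loop body from the earlier version is put back in. The default passed to `next()` and the early return for an empty list are my additions, not part of the snippet above, so take them as an assumption about how you would want empty input handled.

```python
def match_previous(lst, word):
    matches_count = total_length_sum = 0.0
    # create an iterator
    _iterator = iter(lst)
    # pull first word from iterator; the None default avoids
    # StopIteration on an empty list (added guard, an assumption)
    previous_word = next(_iterator, None)
    if previous_word is None:
        return False
    if previous_word == word:
        matches_count += 1
    # _iterator now yields every word except the first one
    for current_word in _iterator:
        if word == current_word:
            total_length_sum += len(previous_word)
            matches_count += 1
        # always update to keep track of the word just seen
        previous_word = current_word
    return total_length_sum / matches_count if matches_count else False
```

Called on the sample list from the traced run, `match_previous(["bar", "foo", "foobar", "hello", "world", "bar"], "bar")` gives the same 2.5 as the sliced version.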


Each time we consume an element from the iterator, we move on to the next one:

In [61]: l = [1,2,3,4]

In [62]: it = iter(l)

In [63]: next(it)
Out[63]: 1

In [64]: next(it)
Out[64]: 2
# consumed two of four so we are left with two
In [65]: list(it)
Out[65]: [3, 4]


The only way a dict would really make sense is if you were passing multiple words to your function, which you can do using *args:

def sum_previous(text):
    _iterator = iter(text.split())
    previous_word = next(_iterator)
    # set first k/v pairing with the first word
    # if  "total_lengths" is 0 at the end we know there
    # was only one match at the very start
    avg_dict = {previous_word: {"count": 1.0, "total_lengths": 0.0}}
    for current_word in _iterator:
        # if key does not exist, it creates a new key/value pairing
        avg_dict.setdefault(current_word, {"count": 0.0, "total_lengths": 0.0})
        # update value adding word length and increasing the count
        avg_dict[current_word]["total_lengths"] += len(previous_word)
        avg_dict[current_word]["count"] += 1
        previous_word = current_word
    # return the dict so we can use it outside the function.
    return avg_dict


def match_previous_generator(*args):
    # create our dict mapping words to sum of all lengths of their preceding words.
    # note: text is read from the enclosing (global) scope, it is not a parameter.
    d = sum_previous(text)
    # for every word we pass to the function.
    for word in args:
        # use dict.get with a default of an empty dict
        # to catch when a word is not in our text.
        count = d.get(word, {}).get("count")
        # yield each word and its avg or False for non existing words.
        yield (word, d[word]["total_lengths"] / count if count else False)


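Then just pass in all the words you want to search for (the function picks up text from the enclosing scope) and call list on the generator function: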

In [69]: list(match_previous_generator("the","Call", "whale", "ship."))
Out[69]: [('the', 4.4), ('Call', 0.0), ('whale', False), ('ship.', 3.0)]


Or iterate over it:

In [70]: for tup in match_previous_generator("the","Call", "whale", "ship."):
   ....:     print(tup)
   ....:     
('the', 4.4)
('Call', 0.0)
('whale', False)
('ship.', 3.0)
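
Since the generator above depends on a module-level `text`, you may prefer to pass the text in explicitly. The following is only a sketch of that variation: the `text` parameter, the `stats` and `entry` names, the empty-string starting value, and the `sample` sentence are all my own choices for illustration.

```python
def sum_previous(text):
    """Map each word to its count and the summed lengths of the words before it."""
    stats = {}
    # The first word has no predecessor, so it contributes a length of 0.
    previous_word = ""
    for current_word in text.split():
        # create the entry on first sight, then update it
        entry = stats.setdefault(current_word, {"count": 0.0, "total_lengths": 0.0})
        entry["count"] += 1
        entry["total_lengths"] += len(previous_word)
        previous_word = current_word
    return stats


def match_previous_generator(text, *words):
    stats = sum_previous(text)
    for word in words:
        # stats.get returns None for words that never occur in the text
        entry = stats.get(word)
        yield (word, entry["total_lengths"] / entry["count"] if entry else False)


sample = "the cat saw the big dog"
results = list(match_previous_generator(sample, "the", "cat", "whale"))
print(results)  # [('the', 1.5), ('cat', 3.0), ('whale', False)]
```

Building the dict costs one pass over the text no matter how many words you look up, which is the case where this approach pays off.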

