How to convert words to numbers in Python 3 (own keys and values)?

Question

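I am writing a Python 3 script that will take words in a text file and convert them into numbers (my own, not ASCII, so no ord function). I have assigned each letter to an integer and would like each word to be the sum of its letters' numerical values. The goal is to group each word with the same numerical value into a dictionary. I am having great trouble recombining the split words as numbers and adding them together. I am completely stuck on this script (it is not finished yet).

By the way, I know there is a simpler way to create the l_n dictionary below, but since I have already written it out I am a bit too lazy to change it right now; I will do so once the script is finished.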

l_n = {
    "A": 1, "a": 1,
    "B": 2, "b": 2,
    "C": 3, "c": 3,
    "D": 4, "d": 4,
    "E": 5, "e": 5,
    "F": 6, "f": 6,
    "G": 7, "g": 7,
    "H": 8, "h": 8,
    "I": 9, "i": 9,
    "J": 10, "j": 10,
    "K": 11, "k": 11,
    "L": 12, "l": 12,
    "M": 13, "m": 13,
    "N": 14, "n": 14,
    "O": 15, "o": 15,
    "P": 16, "p": 16,
    "Q": 17, "q": 17,
    "R": 18, "r": 18,
    "S": 19, "s": 19,
    "T": 20, "t": 20,
    "U": 21, "u": 21,
    "V": 22, "v": 22,
    "W": 23, "w": 23,
    "X": 24, "x": 24,
    "Y": 25, "y": 25,
    "Z": 26, "z": 26,
    }

words_list = []

def read_words(file):
    opened_file = open(file, "r")
    contents = opened_file.readlines()

    for i in range(len(contents)):
        words_list.extend(contents[i].split())

    opened_file.close()

    return words_list

read_words("file1.txt")
new_words_list = list(set(words_list))

numbers_list = []
w_n = {}

def words_to_numbers(new_words_list, l_n):
    local_list = new_words_list[:]
    local_number_list = []

    for word in local_list:
        local_number_list.append(word.split())
        for key in l_n:
            local_number_list = local_number_list.replace(  # I am stuck on the logic in this section.

words_to_numbers(new_words_list, l_n)
print(local_list)


I tried looking for an answer on Stack Overflow but could not find one.

Thanks for your help.

Answer 1

Pad*_*ham 6

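You will have to deal with punctuation, but all you need to do is sum the values of each word's letters and group the words by that sum. You can use a defaultdict: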

lines = """am writing a Python script that will take words in a text file and convert them into numbers (my own, not ASCII, so no ord function).
I have assigned each letter to an integer and would like each word to be the sum of its letters' numerical value.
The goal is to group each word with the same numerical value into a dictionary.
I am having great trouble recombining the split words as numbers and adding them together"""

from collections import defaultdict

d = defaultdict(list)
for line in lines.splitlines():
    for word in line.split():
        d[sum(l_n.get(ch,0) for ch in word)].append(word)

Output:

from pprint import pprint as pp

pp(dict(d))
{1: ['a', 'a', 'a'],
 7: ['be'],
 9: ['I', 'I'],
 14: ['am', 'am'],
 15: ['an'],
 17: ['each', 'each', 'each'],
 19: ['and', 'and', 'and'],
 20: ['as'],
 21: ['of'],
 23: ['in'],
 28: ['is'],
 29: ['no'],
 32: ['file'],
 33: ['the', 'The', 'the', 'the'],
 34: ['so'],
 35: ['to', 'to', 'goal', 'to'],
 36: ['have'],
 37: ['take', 'ord', 'like'],
 38: ['(my', 'same'],
 39: ['adding'],
 41: ['ASCII,'],
 46: ['them', 'them'],
 48: ['its'],
 49: ['that', 'not'],
 51: ['great'],
 52: ['own,'],
 53: ['sum'],
 56: ['will'],
 58: ['into', 'into'],
 60: ['word', 'word', 'with'],
 61: ['value.', 'value', 'having'],
 69: ['text'],
 75: ['would'],
 76: ['split'],
 77: ['group'],
 78: ['assigned', 'integer'],
 79: ['words', 'words'],
 80: ['letter'],
 85: ['script'],
 92: ['numbers', 'numbers'],
 93: ['trouble'],
 96: ['numerical', 'numerical'],
 97: ['convert'],
 98: ['Python', 'together'],
 99: ["letters'"],
 100: ['writing'],
 102: ['function).'],
 109: ['recombining'],
 118: ['dictionary.']}


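sum(l_n.get(ch,0) for ch in word) computes the sum of all the letters in the word (characters missing from l_n count as 0). We use that sum as the key and append the word as a value. defaultdict handles repeated keys, so we end up with every word that has the same sum in one list.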
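To make the key computation concrete, here is a minimal sketch that evaluates the sum for a few single words. The helper name `word_value` is hypothetical and not part of the answer, and `l_n` is rebuilt here (assuming the same A/a=1 ... Z/z=26 mapping as the question) only so the snippet runs on its own.

```python
import string

# Assumption: same mapping as the question's l_n (A/a=1 ... Z/z=26),
# rebuilt here only so the snippet is self-contained.
l_n = {}
for i, ch in enumerate(string.ascii_lowercase, start=1):
    l_n[ch] = i
    l_n[ch.upper()] = i

def word_value(word):
    """Hypothetical helper: sum the letter values; unknown characters count as 0."""
    return sum(l_n.get(ch, 0) for ch in word)

print(word_value("goal"))  # 7 + 15 + 1 + 12 = 35
print(word_value("to"))    # 20 + 15 = 35
print(word_value("own,"))  # 15 + 23 + 14 + 0 = 52 (the comma contributes 0)
```

Because "goal" and "to" both sum to 35, they end up under the same key, which matches the 35 entry in the output above.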

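Also, as John commented, you can simply store a single set of lowercase letters in the dict and call .lower on the word: sum(l_n.get(ch,0) for ch in word.lower())

If you want to remove all punctuation, you can use str.translate: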
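A rough sketch of that lowercase-only variant might look like the following. The function name `group_by_value` is hypothetical, and the 26-entry `l_n` is an assumed replacement for the 52-entry dictionary in the question.

```python
import string
from collections import defaultdict

# Assumption: one entry per lowercase letter, a=1 ... z=26.
l_n = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}

def group_by_value(text):
    """Hypothetical helper: group whitespace-separated words by letter sum."""
    groups = defaultdict(list)
    for word in text.split():
        # Lowercase only for the lookup; the stored word keeps its original case.
        groups[sum(l_n.get(ch, 0) for ch in word.lower())].append(word)
    return dict(groups)

print(group_by_value("The the goal to"))
# {33: ['The', 'the'], 35: ['goal', 'to']}
```

Lowercasing only at lookup time keeps the original spelling of each word in the result while halving the size of the mapping.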

from collections import defaultdict
from string import punctuation

# Python 3: build a table that deletes every punctuation character.
# (The Python 2 form was word.translate(None, punctuation).)
table = str.maketrans("", "", punctuation)

d = defaultdict(list)
for line in lines.splitlines():
    for word in line.split():
        word = word.translate(table)
        d[sum(l_n.get(ch,0) for ch in word)].append(word)

Which would output:

{1: ['a', 'a', 'a'],
 7: ['be'],
 9: ['I', 'I'],
 14: ['am', 'am'],
 15: ['an'],
 17: ['each', 'each', 'each'],
 19: ['and', 'and', 'and'],
 20: ['as'],
 21: ['of'],
 23: ['in'],
 28: ['is'],
 29: ['no'],
 32: ['file'],
 33: ['the', 'The', 'the', 'the'],
 34: ['so'],
 35: ['to', 'to', 'goal', 'to'],
 36: ['have'],
 37: ['take', 'ord', 'like'],
 38: ['my', 'same'],
 39: ['adding'],
 41: ['ASCII'],
 46: ['them', 'them'],
 48: ['its'],
 49: ['that', 'not'],
 51: ['great'],
 52: ['own'],
 53: ['sum'],
 56: ['will'],
 58: ['into', 'into'],
 60: ['word', 'word', 'with'],
 61: ['value', 'value', 'having'],
 69: ['text'],
 75: ['would'],
 76: ['split'],
 77: ['group'],
 78: ['assigned', 'integer'],
 79: ['words', 'words'],
 80: ['letter'],
 85: ['script'],
 92: ['numbers', 'numbers'],
 93: ['trouble'],
 96: ['numerical', 'numerical'],
 97: ['convert'],
 98: ['Python', 'together'],
 99: ['letters'],
 100: ['writing'],
 102: ['function'],
 109: ['recombining'],
 118: ['dictionary']}

If you don't want duplicate words to appear, use a set:

d = defaultdict(set)
....
d[sum(l_n.get(ch,0) for ch in word)].add(word)
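
For illustration, the set-based variant could be fleshed out roughly as below. This is a sketch under stated assumptions, not the answer's exact code: the name `unique_groups` is hypothetical, `l_n` is an assumed lowercase-only mapping, and the words are lowercased before storing (unlike the answer) so that "The" and "the" collapse into one entry.

```python
import string
from collections import defaultdict
from string import punctuation

# Assumption: lowercase-only mapping, a=1 ... z=26.
l_n = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}
strip_punct = str.maketrans("", "", punctuation)

def unique_groups(text):
    """Hypothetical helper: group unique, punctuation-free words by letter sum."""
    groups = defaultdict(set)
    for word in text.split():
        word = word.translate(strip_punct).lower()
        if word:  # skip tokens that consisted only of punctuation
            groups[sum(l_n.get(ch, 0) for ch in word)].add(word)
    # Sort keys and members so the result is stable to print and compare.
    return {key: sorted(members) for key, members in sorted(groups.items())}

print(unique_groups("To the goal, to The value."))
# {33: ['the'], 35: ['goal', 'to'], 61: ['value']}
```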


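+1 for keeping a count of the words. Such a simple mapping covers the whole operation; all it needs is a bunch of numbers. By the way, you could also call lower on the letters, so that `l_n` only needs one set of alphabet letters (2 upvotes)
@JohnRuddell, yep, lower would do it. I didn't notice that both cases had the same value (2 upvotes)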

Archived:	10 years, 8 months ago
Views:	4658
Last activity:	10 years, 7 months ago