T9 system to numpad

And*_*rup 2 python

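I'm trying to make something like the T9 system on mobile phones, but using the keyboard instead. I could really use some advice on how to go about it.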

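I've already found a text file containing the words I want to use. I'd like the 2 key to stand for 'abc', 3 for 'def', 4 for 'ghi', and so on. If anyone is bored or can just help me get started down this path, I'd really appreciate it.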
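A useful first building block is the inverse of that mapping: turning a word into the key sequence that types it. This is a minimal sketch, assuming lowercase ASCII words; `KEYPAD`, `LETTER_TO_DIGIT` and `word_to_digits` are hypothetical names, not from any library.

```python
# Standard phone keypad layout (keys 0 and 1 carry no letters).
KEYPAD = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl',
          6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}

# Invert it: each letter maps to the digit (as a string) that types it.
LETTER_TO_DIGIT = {letter: str(digit)
                   for digit, letters in KEYPAD.items()
                   for letter in letters}

def word_to_digits(word):
    """Return the key sequence for `word`, e.g. 'hello' -> '43556'.

    Raises KeyError for characters that are not on the keypad
    (digits, apostrophes, accented letters, ...).
    """
    return ''.join(LETTER_TO_DIGIT[ch] for ch in word.lower())

print(word_to_digits('hello'))                          # 43556
print(word_to_digits('home'), word_to_digits('good'))   # 4663 4663
```

Because several words share one key sequence ('home', 'good' and 'gone' are all 4663), the mapping is many-to-one, which is why a T9 system also needs a way to rank candidates.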

the*_*olf 5

Here is a brute-force T9 emulator:

import itertools

# Keypad layout: digit -> letters printed on that key
n2l={2:'abc',3:'def',4:'ghi',5:'jkl',6:'mno',7:'pqrs',8:'tuv',9:'wxyz'}

with open('/usr/share/dict/words','r') as di:  # UNIX 250k unique word list
    all_words={line.strip() for line in di}

def combos(*nums):
    # Every letter prefix the key presses could spell, e.g. (2,3) -> ('ad','ae',...,'cf')
    letters=[n2l[i] for i in nums]
    return tuple(''.join(p) for p in itertools.product(*letters))

def t9(*nums):
    # str.startswith accepts a tuple: True if the word starts with any of the prefixes
    combo=combos(*nums)
    return sorted(word for word in all_words if word.startswith(combo))

def try_it(*nums):
    l=t9(*nums)   # sorted() already returns a list
    print('  {:10} {:10,} words'.format(','.join(str(i) for i in nums),len(l)))
    if len(l)<100:
        print(nums,'yields:',l)

try_it(2)
try_it(2,3)
try_it(2,3,4)
try_it(2,3,3,4)
try_it(2,3,3,4,5)

Prints:

  2              41,618 words
  2,3             4,342 words
  2,3,4             296 words
  2,3,3,4           105 words
  2,3,3,4,5          16 words
(2, 3, 3, 4, 5) yields: ['aedile', 'aedileship', 'aedilian', 'aedilic', 'aedilitian', 
    'aedility', 'affiliable', 'affiliate', 'affiliation', 'bedikah', 'befilch', 
    'befile', 'befilleted', 'befilmed', 'befilth', 'cedilla']

As you can see, starting from 250k words (a very large set), it takes 5 digits to narrow the candidates down to a manageable number.
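The `combos` approach builds every letter prefix the keys could spell (3^5 = 243 prefixes for (2,3,3,4,5), more once 7 or 9 is involved) and tests each word against all of them. An alternative, not the answer's code, is to translate each word into its digit key once and then compare digit strings. A minimal sketch, with a tiny inline word list standing in for `/usr/share/dict/words`; `digit_keys` and `t9_prefix` are hypothetical names.

```python
import itertools

n2l = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl',
       6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}
letter_to_digit = {c: str(d) for d, letters in n2l.items() for c in letters}

# Stand-in word list; the answer reads /usr/share/dict/words instead.
words = ['affiliate', 'cedilla', 'bed', 'be', 'after', 'note', 'move']

# Precompute each word's digit key once, skipping words with non-keypad chars.
digit_keys = {}
for w in words:
    if all(c in letter_to_digit for c in w):
        digit_keys[w] = ''.join(letter_to_digit[c] for c in w)

def t9_prefix(*nums):
    """Words whose key sequence starts with the typed digits."""
    typed = ''.join(str(n) for n in nums)
    return sorted(w for w, key in digit_keys.items() if key.startswith(typed))

# How many letter prefixes combos(2,3,3,4,5) would have to generate instead.
n_combos = len(list(itertools.product(*(n2l[n] for n in (2, 3, 3, 4, 5)))))

print(t9_prefix(2, 3, 3, 4, 5))  # ['affiliate', 'cedilla']
print(t9_prefix(6, 6, 8, 3))     # ['move', 'note']
print(n_combos)                  # 243
```

For words made only of lowercase keypad letters this selects the same words as the `startswith(combo)` test, but the work per word no longer grows with the number of possible prefixes.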

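While this code is illustrative and can get you started, you'll need two more things:

  1. A smaller set of words ;-) and
  2. A ranking of the more common words to show in the T9 autocomplete area of the UI. (I.e., "affiliate" or "affiliation" is far more likely than "aedile" or "befilth" to be the word intended by (2,3,3,4,5). These need to be ranked somehow...)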
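One possible ranking heuristic, an assumption of this sketch rather than something the answer prescribes: show words whose length equals the number of keys pressed (complete matches) before longer completions, and order each group by a frequency count, breaking ties alphabetically. The weights are the (6,6,8,3) figures from the Take 2 output in this answer; `rank` is a hypothetical helper.

```python
def rank(candidates, counts, n_keys):
    """Order T9 candidates: complete matches (len == n_keys) first,
    then by descending count, then alphabetically."""
    return sorted(candidates,
                  key=lambda w: (len(w) != n_keys, -counts.get(w, 0), w))

# Weights for (6,6,8,3) as printed by the Take 2 code.
counts = {'note': 38, 'notes': 9, 'move': 5, 'moved': 4,
          'novel': 4, 'movement': 3}

print(rank(counts, counts, 4))
# ['note', 'move', 'notes', 'moved', 'novel', 'movement']
```

With plain frequency ordering "notes" (9) would come before "move" (5); putting complete matches first keeps the word the user may have finished typing near the top.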

Take 2

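Here is a quick attempt at weighting. I read the same big dictionary (the common Unix "words" file) and then weight those words by how often they occur in Project Gutenberg's The Adventures of Sherlock Holmes. Any good collection of text would work for this.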
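In this answer's Take 2 code, each dictionary word enters the `Counter` once (it is added from a set) and every occurrence in the book adds one more, so dictionary-only words such as "moud" end up with weight 1 while frequently used words rise to the top. A minimal sketch of that scheme, with a tiny inline word list and sentence standing in for the real files:

```python
import re
from collections import Counter

dictionary = ['note', 'move', 'moud']            # stand-in for the words file
corpus = "Note the move. I made a note of it."   # stand-in for holmes.txt

weights = Counter()
weights.update(set(dictionary))   # each dictionary word counts once
# each corpus occurrence (lowercased) adds one
weights.update(w.lower() for w in re.findall(r'\b\w+\b', corpus))

print(weights['note'], weights['move'], weights['moud'])  # 3 2 1
```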

from collections import Counter
import re
import itertools

all_words=Counter()   # word -> weight
n2l={2:'abc',3:'def',4:'ghi',5:'jkl',6:'mno',7:'pqrs',8:'tuv',9:'wxyz'}
with open('/usr/share/dict/words','r') as di:  # UNIX 250k unique word list
    # Each short dictionary word gets weight 1; len(line) includes the
    # trailing newline, so this keeps only words of at most 4 letters.
    all_words.update({line.strip() for line in di if len(line) < 6})

with open('holmes.txt','r',encoding='utf-8') as fin:   # http://www.gutenberg.org/ebooks/1661.txt.utf-8
    for line in fin:
        # Every occurrence of a word in the book adds 1 to its weight
        all_words.update([word.lower() for word in re.findall(r'\b\w+\b',line)])

def combos(*nums):
    # Every letter prefix the key presses could spell
    letters=[n2l[i] for i in nums]
    return tuple(''.join(p) for p in itertools.product(*letters))

def t9(*nums):
    combo=combos(*nums)
    c1=combos(nums[0])
    # Cheap pre-filter on the first key's letters, then the full prefix test
    first_cut=(word for word in all_words if word.startswith(c1))
    return (word for word in first_cut if word.startswith(combo))

def try_it(*nums):
    s=set(t9(*nums))
    n=10
    print('({}) produces {:,} words. Top {}:'.format(','.join(str(i) for i in nums),
            len(s),min(n,len(s))))
    # Walk all words from heaviest to lightest and print the first n that match
    for i, word in enumerate(
          [w for w in sorted(all_words,key=all_words.get, reverse=True) if w in s],1):
        if i<=n:
            print('\t{:2}:  "{}" -- weighted {}'.format(i, word, all_words[word]))

    print()

try_it(2)
try_it(2,3)
try_it(2,3,4)
try_it(2,3,3,4)
try_it(6,6,8,3)
try_it(2,3,3,4,5)

Prints:

(2) produces 2,584 words. Top 10:
     1:  "and" -- weighted 3089
     2:  "a" -- weighted 2701
     3:  "as" -- weighted 864
     4:  "at" -- weighted 785
     5:  "but" -- weighted 657
     6:  "be" -- weighted 647
     7:  "all" -- weighted 411
     8:  "been" -- weighted 394
     9:  "by" -- weighted 372
    10:  "are" -- weighted 356

(2,3) produces 261 words. Top 10:
     1:  "be" -- weighted 647
     2:  "been" -- weighted 394
     3:  "before" -- weighted 166
     4:  "after" -- weighted 99
     5:  "between" -- weighted 60
     6:  "better" -- weighted 51
     7:  "behind" -- weighted 50
     8:  "certainly" -- weighted 45
     9:  "being" -- weighted 45
    10:  "bed" -- weighted 40

(2,3,4) produces 25 words. Top 10:
     1:  "behind" -- weighted 50
     2:  "being" -- weighted 45
     3:  "began" -- weighted 25
     4:  "beg" -- weighted 13
     5:  "ceiling" -- weighted 10
     6:  "beginning" -- weighted 7
     7:  "begin" -- weighted 6
     8:  "beggar" -- weighted 6
     9:  "begging" -- weighted 4
    10:  "begun" -- weighted 4

(2,3,3,4) produces 5 words. Top 5:
     1:  "additional" -- weighted 4
     2:  "addition" -- weighted 3
     3:  "addicted" -- weighted 1
     4:  "adding" -- weighted 1
     5:  "additions" -- weighted 1

(6,6,8,3) produces 11 words. Top 10:
     1:  "note" -- weighted 38
     2:  "notes" -- weighted 9
     3:  "move" -- weighted 5
     4:  "moved" -- weighted 4
     5:  "novel" -- weighted 4
     6:  "movement" -- weighted 3
     7:  "noted" -- weighted 2
     8:  "moves" -- weighted 1
     9:  "moud" -- weighted 1
    10:  "november" -- weighted 1

(2,3,3,4,5) produces 0 words. Top 0:
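Both attempts scan the whole word list on every keypress, and Take 2 also re-sorts the entire `Counter` inside `try_it`. For an interactive UI, a common alternative (not part of this answer) is to index the words once in a trie keyed by digits and then take the top few candidates with `heapq.nlargest`. A minimal sketch, assuming a `Counter` of weights like the one built in Take 2; `build_trie` and `suggest` are hypothetical names, and the weights are taken from the (2,3,4) and (6,6,8,3) output above.

```python
import heapq
from collections import Counter

n2l = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl',
       6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}
letter_to_digit = {c: d for d, letters in n2l.items() for c in letters}

def build_trie(words):
    """Nested dicts keyed by digit; the '$' key holds words ending at a node."""
    root = {}
    for word in words:
        if not word or not all(c in letter_to_digit for c in word):
            continue  # skip words containing non-keypad characters
        node = root
        for c in word:
            node = node.setdefault(letter_to_digit[c], {})
        node.setdefault('$', []).append(word)
    return root

def suggest(trie, weights, nums, n=10):
    """Top-n words (by weight) whose key sequence starts with nums."""
    node = trie
    for d in nums:
        if d not in node:
            return []
        node = node[d]
    # Collect every word at or below this node (all possible completions).
    found, stack = [], [node]
    while stack:
        cur = stack.pop()
        for key, child in cur.items():
            if key == '$':
                found.extend(child)
            else:
                stack.append(child)
    return heapq.nlargest(n, found, key=lambda w: weights[w])

weights = Counter({'behind': 50, 'being': 45, 'began': 25,
                   'beg': 13, 'ceiling': 10, 'note': 38})
trie = build_trie(weights)
print(suggest(trie, weights, (2, 3, 4), n=3))  # ['behind', 'being', 'began']
print(suggest(trie, weights, (6, 6, 8, 3)))    # ['note']
```

The word list is scanned only once, when the trie is built; each lookup then visits only the nodes below the typed prefix, and `heapq.nlargest` avoids sorting every candidate when only the top few are displayed.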