Suppose I have two lists:
t1 = ["abc","def","ghi"]
t2 = [1,2,3]
How can I merge them in Python so that the output list is:
t = [("abc",1),("def",2),("ghi",3)]
The program I tried is:
t1 = ["abc","def"]
t2 = [1,2]
t = [ ]
for a in t1:
    for b in t2:
        t.append((a,b))
print t
The output is:
[('abc', 1), ('abc', 2), ('def', 1), ('def', 2)]
I don't want the repeated entries.
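The nested loops pair every element of t1 with every element of t2, which is why each string appears more than once. A minimal sketch of pairing the two lists by position instead, using the built-in zip. It assumes Python 3 syntax, unlike the Python 2 print statement in the attempt above:

```python
t1 = ["abc", "def", "ghi"]
t2 = [1, 2, 3]

# zip pairs elements by position; list() materializes the pairs,
# because zip returns a lazy iterator in Python 3.
t = list(zip(t1, t2))
print(t)  # [('abc', 1), ('def', 2), ('ghi', 3)]
```

Note that zip stops at the end of the shorter list, so any extra elements in the longer list are dropped without an error.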
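How can I find the domain of a word using the nltk Python module and WordNet?
Suppose I have words such as (transaction, demand draft, cheque, passbook), and the domain of all of these words is "BANK". How can we obtain this in Python using nltk and WordNet?
I am trying to do it through the hypernym and hyponym relations:
For example: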
from nltk.corpus import wordnet as wn
sports = wn.synset('sport.n.01')
sports.hyponyms()
[Synset('judo.n.01'), Synset('athletic_game.n.01'), Synset('spectator_sport.n.01'), Synset('contact_sport.n.01'), Synset('cycling.n.01'), Synset('funambulism.n.01'), Synset('water_sport.n.01'), Synset('riding.n.01'), Synset('gymnastics.n.01'), Synset('sledding.n.01'), Synset('skating.n.01'), Synset('skiing.n.01'), Synset('outdoor_sport.n.01'), Synset('rowing.n.01'), Synset('track_and_field.n.01'), Synset('archery.n.01'), Synset('team_sport.n.01'), Synset('rock_climbing.n.01'), Synset('racing.n.01'), Synset('blood_sport.n.01')]
and
bark = wn.synset('bark.n.02')
bark.hypernyms()
[Synset('noise.n.01')]
Suppose I have two synsets, synset('car.n.01') and synset('bank.n.01'). If I want to find the distance between these two synsets in the WordNet hierarchy, how can I do that with nltk?
I searched the internet, but I only found similarity algorithms such as lin, resnik, and jcn, which do not solve my problem.
Please help me solve this problem.
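The distance asked about here can be read as the number of hypernym edges on the shortest path that joins the two synsets through a shared ancestor. The sketch below illustrates that idea on a small made-up hierarchy. The node names and links are assumptions for illustration, not real WordNet data, and the helper names `ancestor_distances` and `path_distance` are hypothetical. NLTK's Synset objects provide a `shortest_path_distance` method that returns an edge count of this kind, so with the WordNet corpus installed, `wn.synset('car.n.01').shortest_path_distance(wn.synset('bank.n.01'))` would be the direct call to try.

```python
from collections import deque

# Toy hypernym links (child -> list of parents).
# Made up for illustration; NOT real WordNet data.
HYPERNYMS = {
    "car": ["motor_vehicle"],
    "motor_vehicle": ["vehicle"],
    "vehicle": ["artifact"],
    "bank_building": ["building"],
    "building": ["artifact"],
    "artifact": ["entity"],
    "entity": [],
}


def ancestor_distances(node):
    """Return {ancestor: edge count from node}, with node itself at 0."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def path_distance(a, b):
    """Fewest edges joining a and b through a shared ancestor, or None."""
    dist_a = ancestor_distances(a)
    dist_b = ancestor_distances(b)
    shared = set(dist_a) & set(dist_b)
    if not shared:
        return None
    return min(dist_a[n] + dist_b[n] for n in shared)


print(path_distance("car", "bank_building"))  # 5
```

In the toy data, "car" is three edges below "artifact" and "bank_building" is two edges below it, so the path through "artifact" has length 5.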
I have written the following program for my project.
import glob
import os
path = "/home/madhusudan/1/*.txt"
files = glob.glob(path)
s1 = "<add>\n<doc>\n\t<field name = \"id\">"
s2 = "</field>\n"
s3 = "<field name = \"features\">"
s4 = "</doc>\n</add>"
i = 150
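for file in files:
    f = open(file,"r")
    # Note: this rebinds the built-in name str to the file contents.
    str = f.read()
    file1 = "/home/madhusudan/2/"+os.path.splitext(os.path.basename(file))[0] + ".xml"
    f1 = open(file1,"w")
    # Note: str is now a string, so str(i) cannot be called here.
    # The file was already read above, so this f.read() returns "".
    content = s1 + str(i) + s2 + s3 + f.read() + s2 + s4
    f1.write(content)
    i = i + 1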
When I run this code, I get the following error:
Traceback (most recent call last):
File "test.py", line …
I am working on a polysemy disambiguation project, and for that I am trying to find the polysemous words in an input query. The way I am doing this is:
#! /usr/bin/python
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
stop = stopwords.words('english')
print "enter input query"
string = raw_input()
str1 = [i for i in string.split() if i not in stop]
a = list()
for w in str1:
    if(len(wn.synsets(w)) > 1):
        a.append(w)
Here the list a will contain the polysemous words. But with this approach almost every word is treated as polysemous. For example, if my input query is "milk is white in colour", it stores ('milk','white','colour') as polysemous words.