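My goal is to take 3 queries as input and find which query is most similar to a set of 5 documents.
So far, I have computed the tf-idf of the documents like this: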
from sklearn.feature_extraction.text import TfidfVectorizer

def get_term_frequency_inverse_data_frequency(documents):
    allDocs = []
    for document in documents:
        allDocs.append(nlp.clean_tf_idf_text(document))
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(allDocs)
    return matrix

def get_tf_idf_query_similarity(documents, query):
    tfidf = get_term_frequency_inverse_data_frequency(documents)
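My problem now is this: I have the tf-idf matrix for the documents, but what operations do I perform on the query so that I can compute its cosine similarity to the documents?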
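A minimal sketch of the usual scikit-learn pattern: fit the vectorizer on the documents only, then call `transform` (not `fit_transform`) on the queries with that same fitted vectorizer so they share its vocabulary and IDF weights, and compare with `cosine_similarity`. The helper name `most_similar_query` is hypothetical, and scoring a query by its *mean* similarity over all documents is an assumption; if "most similar" should mean "closest to any single document", use `sims.max(axis=1)` instead. The asker's `nlp.clean_tf_idf_text` preprocessing is left out here; if it is used, it would presumably need to be applied to the queries as well.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def most_similar_query(documents, queries):
    """Return (index of best query, per-query scores).

    Assumption: a query's score is its mean cosine similarity to all documents.
    """
    vectorizer = TfidfVectorizer()
    # Learn vocabulary and IDF weights from the documents only.
    doc_matrix = vectorizer.fit_transform(documents)
    # Project the queries into the same space; do not refit on them.
    query_matrix = vectorizer.transform(queries)
    # sims[i, j] = cosine similarity of query i to document j.
    sims = cosine_similarity(query_matrix, doc_matrix)
    scores = sims.mean(axis=1)
    return int(scores.argmax()), scores


docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "the stock market fell today",
    "cats like to sleep",
    "a dog chased the cat",
]
queries = ["cat on a mat", "stock prices", "weather forecast"]
best, scores = most_similar_query(docs, queries)
print(queries[best])  # cat on a mat
```

Because the asker's function returns only the matrix, it would also have to return the fitted `vectorizer` (or keep it around) so the queries can be transformed with it. A query whose words never occur in the documents, such as "weather forecast" above, becomes an all-zero vector and scores 0.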
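I'm trying to create a Mercurial pre-commit hook that runs pylint. My project uses a virtual environment.
I have set up the hook to run pylint on the changed files, but I get this error: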
Traceback (most recent call last):
  File "/home/barmstrong/.virtualenvs/amp/bin/pylint", line 10, in <module>
    sys.exit(run_pylint())
  File "/home/barmstrong/.virtualenvs/amp/lib/python3.6/site-packages/pylint/__init__.py", line 20, in run_pylint
    Run(sys.argv[1:])
  File "/home/barmstrong/.virtualenvs/amp/lib/python3.6/site-packages/pylint/lint.py", line 1583, in __init__
    linter.load_plugin_modules(plugins)
  File "/home/barmstrong/.virtualenvs/amp/lib/python3.6/site-packages/pylint/lint.py", line 636, in load_plugin_modules
    module = modutils.load_module_from_name(modname)
  File "/home/barmstrong/.virtualenvs/amp/lib/python3.6/site-packages/astroid/modutils.py", line 202, in load_module_from_name
    return load_module_from_modpath(dotted_name.split("."), path, use_sys)
  File "/home/barmstrong/.virtualenvs/amp/lib/python3.6/site-packages/astroid/modutils.py", line 244, in load_module_from_modpath
    mp_file, mp_filename, mp_desc = imp.find_module(part, path)
  File "/usr/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'common'
I believe this is caused by a custom plugin listed in my .pylintrc, which pylint tries to load from my project directory: …
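If that diagnosis is right, the plugin import fails because the project directory is not on `sys.path` when the hook runs pylint. One possible workaround, sketched below under that assumption, is to have the hook launch pylint as a subprocess with the repository root prepended to `PYTHONPATH`. The helpers `build_pylint_env` and `run_pylint` are hypothetical names, and how the hook obtains `repo_root` and the list of changed files is left open.

```python
import os
import subprocess


def build_pylint_env(repo_root, base_env=None):
    """Copy the environment and prepend repo_root to PYTHONPATH so modules in
    the project directory (e.g. a plugin named in .pylintrc) are importable."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = repo_root if not existing else repo_root + os.pathsep + existing
    return env


def run_pylint(repo_root, files, pylint="pylint"):
    """Run pylint on `files` from repo_root and return its exit code.

    `pylint` may be the absolute path to the virtualenv's pylint script.
    Returns 0 without running anything when there are no files to lint.
    """
    if not files:
        return 0
    completed = subprocess.run(
        [pylint, *files], cwd=repo_root, env=build_pylint_env(repo_root)
    )
    return completed.returncode
```

For a `precommit` hook, a non-zero exit status makes Mercurial refuse the commit, so the script can simply exit with the value returned by `run_pylint`.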
When I use the Python wikipedia library to fetch a page, this code raises an error even though the page does exist.
import wikipedia
wikiResults = wikipedia.search("megaman 64")
result = wikiResults[0]
page = wikipedia.page(result)
The error returned:
wikipedia.exceptions.PageError: Page id "mega man legends video games" does not match any pages. Try another id!
What am I doing wrong?
Thanks
Edit: added an MCVE.
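A likely cause, assuming the `wikipedia` package's documented behavior: `wikipedia.page()` runs an auto-suggest step by default (`auto_suggest=True`) that can swap the given title for a different one, which would explain why the error mentions "mega man legends video games" rather than the title returned by the search. Since `wikipedia.search()` already returns exact page titles, passing `auto_suggest=False` should load that title as-is. The helper `fetch_first_result` is hypothetical; its `search`/`page` parameters exist only so it can be exercised without network access.

```python
def fetch_first_result(query, search=None, page=None):
    """Search Wikipedia for `query` and load the top hit by its exact title.

    `search` and `page` default to wikipedia.search and wikipedia.page.
    Returns None when the search finds nothing.
    """
    if search is None or page is None:
        import wikipedia  # third-party package: pip install wikipedia

        search = search or wikipedia.search
        page = page or wikipedia.page
    results = search(query)
    if not results:
        return None
    # The search result is already a real page title, so skip the
    # auto-suggest step that may rewrite it into another title.
    return page(results[0], auto_suggest=False)
```

Usage: `fetch_first_result("megaman 64")`. An ambiguous title can still raise `wikipedia.exceptions.DisambiguationError`, which is worth catching separately.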
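My A* algorithm doesn't always find the shortest path.
In this image, the robot has to travel across the black squares; the river and the trees are obstacles. The black line is the path it took, which is clearly not the shortest path, because it should not have dipped.
Here is my code and the heuristic I'm using: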
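For comparison, a minimal reference sketch of A* on a 4-connected grid. Two general points it illustrates: the heap priority must be f = g + h (cost so far plus heuristic), and the heuristic must never overestimate. Manhattan distance is admissible only when each move changes one coordinate by 1 and costs at least 1; if diagonal moves are allowed, or steps can cost less than 1, it can overestimate and A* may return a longer path. The `passable` and `cost` callables are hypothetical stand-ins for the asker's `grid.Neighbours` and `grid.Cost`.

```python
from heapq import heappop, heappush
from itertools import count


def manhattan(a, b):
    (x1, y1), (x2, y2) = a, b
    return abs(x1 - x2) + abs(y1 - y2)


def a_star(passable, start, goal, cost=lambda a, b: 1, heuristic=manhattan):
    """A* on a 4-connected grid.

    passable(cell) -> bool tells whether a cell may be entered.
    cost(a, b) is the step cost (>= 1 keeps Manhattan admissible).
    Returns the path as a list of (x, y) tuples, or None if unreachable.
    """
    start, goal = tuple(start), tuple(goal)
    tie = count()  # tie-breaker so the heap never compares cells directly
    open_heap = [(heuristic(start, goal), next(tie), start)]
    came_from = {start: None}
    g = {start: 0}
    while open_heap:
        f, _, current = heappop(open_heap)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        # Skip stale heap entries superseded by a cheaper route.
        if f > g[current] + heuristic(current, goal):
            continue
        x, y = current
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not passable(nxt):
                continue
            new_g = g[current] + cost(current, nxt)
            if nxt not in g or new_g < g[nxt]:
                g[nxt] = new_g
                came_from[nxt] = current
                # Priority is g + h, not g alone or h alone.
                heappush(open_heap, (new_g + heuristic(nxt, goal), next(tie), nxt))
    return None
```

Here `passable` must also reject out-of-bounds cells; in the asker's setting it would presumably return False for river and tree squares.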
from heapq import heappush, heappop

def HeuristicCostEstimate(start, goal):
    (x1, y1) = start
    (x2, y2) = goal
    return abs(x1 - x2) + abs(y1 - y2)

def AStar(grid, start, goal):
    entry = 1
    openSet = []
    heappush(openSet, (1, entry, start))
    cameFrom = {}
    currentCost = {}
    cameFrom[tuple(start)] = None
    currentCost[tuple(start)] = 0
    while not openSet == []:
        current = heappop(openSet)[2]
        print(current)
        if current == goal:
            break
        for next in grid.Neighbours(current):
            newCost = currentCost[tuple(current)] + grid.Cost(current, next)
            if tuple(next) not in currentCost or newCost < …

In Python, if I import requests and do:
t = requests.get("http://www.azlyrics.com/u/urban.html")
I get this exception:
raise BadStatusLine(line)
http.client.BadStatusLine: ''
Does anyone know how to fix this?
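A `BadStatusLine: ''` means the server closed the connection without sending any HTTP status line. One common trigger, assumed here rather than confirmed for this particular site, is a server that drops requests lacking a browser-like `User-Agent` header (requests sends `python-requests/<version>` by default). A minimal sketch using a `requests.Session` with such a header; the `BROWSER_UA` string and the helper names are hypothetical.

```python
import requests

# Hypothetical browser-like User-Agent; any realistic value should do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(url, session=None, timeout=10):
    """GET `url` and return the body text, raising on 4xx/5xx responses."""
    session = session or make_session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Usage: `fetch("http://www.azlyrics.com/u/urban.html")`. If the header does not help, the site may be refusing automated clients altogether.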
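I have a CSV file with thousands of rows. I load the file into a pandas DataFrame, but then I want to split it every 12 rows and store the pieces as a list of DataFrames. How can I do this?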
import pandas as pd

EVALUATION_FILE = 'training/evaluation.csv'
data = pd.read_csv(
    EVALUATION_FILE,
    engine='python',
    index_col=None
)
This is how I load the file, but I'd like to change it so that the data is split every 12 rows and each piece is appended to a list. How do I do that?
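Two minimal sketches, with hypothetical helper names: slicing an already-loaded DataFrame with `iloc` in steps of 12, or letting `pd.read_csv` do the splitting through its `chunksize` parameter, which makes it return an iterator of DataFrames instead of a single one. In both cases the last piece holds whatever rows remain.

```python
import io

import pandas as pd

CHUNK = 12


def split_every(df, n=CHUNK):
    """Split an already-loaded DataFrame into consecutive n-row pieces."""
    return [df.iloc[i:i + n] for i in range(0, len(df), n)]


def read_in_chunks(path_or_buffer, n=CHUNK):
    """Read a CSV as a list of DataFrames of at most n rows each."""
    return list(pd.read_csv(path_or_buffer, chunksize=n))


# Demo on 30 in-memory rows instead of training/evaluation.csv.
demo_csv = io.StringIO("value\n" + "\n".join(str(i) for i in range(30)))
print([len(chunk) for chunk in read_in_chunks(demo_csv)])  # [12, 12, 6]
```

With the original file this would be `split_every(data)` after loading, or `read_in_chunks(EVALUATION_FILE)` to skip building the full DataFrame first. The pieces keep their original row index labels; call `.reset_index(drop=True)` on a piece if it should start at 0.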