I am trying to use the following spaCy module in Colab:

https://spacy.io/universe/project/neuralcoref

I install the following packages:

!pip install spacy
import spacy
!pip show spacy

!git clone https://github.com/huggingface/neuralcoref.git
import neuralcoref

After installation I get the following output:

Name: spacy
Version: 2.2.4
Summary: Industrial-strength Natural Language Processing (NLP) in Python
Home-page: https://spacy.io
Author: Explosion
Author-email: contact@explosion.ai
License: MIT
Location: /usr/local/lib/python3.6/dist-packages
Requires: thinc, murmurhash, preshed, blis, srsly, cymem, setuptools, plac, requests, tqdm, numpy, wasabi, catalogue
Required-by: fastai, en-core-web-sm
Cloning into 'neuralcoref'...
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 739 (delta 14), reused 10 (delta 1), pack-reused 691
Receiving objects: …

I would like to know whether there is a way to check for and then drop certain rows that are not unique.
My data frame looks like this:
   ID1  ID2  weight
0    2    4     0.5
1    3    7     0.8
2    4    2     0.5
3    7    3     0.8
4    8    2     0.5
5    3    8     0.5
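Edit: I added more rows to show that other unique rows, which may share the same weight, should be kept.
As far as I can tell, pandas drop_duplicates(subset=['ID1', 'ID2', 'weight'], keep=False) compares each row column by column, so it does not recognize that rows 0 and 2, and rows 1 and 3, actually hold the same values with ID1 and ID2 swapped.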
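One possible approach, as a minimal sketch: build an order-insensitive key by sorting each (ID1, ID2) pair, then call duplicated on that key instead of on the original columns. The names pair, key, and deduped are illustrative, and the data frame is rebuilt here from the table in the question. Whether one copy of each pair should survive (keep="first") or every copy should go (keep=False) is an assumption to adjust.

```python
import numpy as np
import pandas as pd

# Data frame rebuilt from the table in the question.
df = pd.DataFrame({
    "ID1": [2, 3, 4, 7, 8, 3],
    "ID2": [4, 7, 2, 3, 2, 8],
    "weight": [0.5, 0.8, 0.5, 0.8, 0.5, 0.5],
})

# Sort each (ID1, ID2) pair so that (2, 4) and (4, 2) produce the same key.
pair = np.sort(df[["ID1", "ID2"]].to_numpy(), axis=1)
key = pd.DataFrame(
    {"lo": pair[:, 0], "hi": pair[:, 1], "weight": df["weight"]},
    index=df.index,
)

# keep="first" retains one row per unordered pair;
# keep=False would instead drop every row that has a swapped twin.
deduped = df[~key.duplicated(keep="first")]
dropped_all = df[~key.duplicated(keep=False)]
print(deduped)
```

With the sample data, deduped retains rows 0, 1, 4 and 5, while dropped_all retains only rows 4 and 5.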
I would like to know whether there is a way to check if a combination of two or more items from one list occurs in another list.
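list_1 = ['apple', 'soap', 'diet coke', 'banana', 'sweets', 'mash', 'fruit', 'veggies']
# `lists` holds the input sentences; its definition is not shown in the post
for string in lists:
    strings = string.split()  # split each sentence into words
    print(strings)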
Sample output for strings:
['today', 'i', 'bought', 'banana', 'but', 'forgot', 'soap', 'and', 'veggies']  # this line should identify 'banana', 'soap' and 'veggies'
['maybe', 'there', 'are', 'more', 'sweets', 'left', 'later']  # this line should be ignored, because fewer than 2 items of the list are in it
['food', 'shopping', 'is', 'boring', 'and', 'i', 'hate', 'mash', 'with', 'veggies']  # this line should identify 'mash' and 'veggies'
I know that with this code I can at least check whether any element of the list appears in the string:
combinations = any(i in list_1 for i in strings)
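One way this could be extended, as a rough sketch: collect every item of list_1 that occurs in a sentence and keep the sentence only when at least two items match. The sentences in lists are an assumption, reconstructed from the sample output above, and items_in and matches are hypothetical names. Matching is done on the whole sentence with word boundaries rather than on the split words, because a multi-word item such as 'diet coke' would never equal a single token.

```python
import re

list_1 = ['apple', 'soap', 'diet coke', 'banana', 'sweets', 'mash', 'fruit', 'veggies']

# Assumed input: sentences reconstructed from the sample output in the question.
lists = [
    'today i bought banana but forgot soap and veggies',
    'maybe there are more sweets left later',
    'food shopping is boring and i hate mash with veggies',
]

def items_in(sentence, items):
    """Return the items that occur in the sentence as whole words or phrases."""
    return [
        item for item in items
        if re.search(r'\b' + re.escape(item) + r'\b', sentence)
    ]

# Keep only the sentences that contain at least two of the items.
matches = {}
for string in lists:
    found = items_in(string, list_1)
    if len(found) >= 2:
        matches[string] = found
        print(found)
```

The results come back in the order of list_1, not in the order the words appear in the sentence. If the threshold should be strictly more than two items, change the comparison to len(found) > 2.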