Posts by Ben*_*enP

import spacy
from spacy import displacy
import en_core_web_sm
nlp = en_core_web_sm.load()

text = "This is a text about Apple Inc based in San Fransisco. "\
        "And here is some text about Samsung Corp. "\
        "Now, here is some more text about Apple and its products for customers in Norway"

doc = nlp(text)

# Print each entity's numeric label ID, label name, and matched text
for ent in doc.ents:
    print('ID:{}\t{}\t"{}"\t'.format(ent.label, ent.label_, ent.text))


displacy.render(doc, jupyter=True, style='ent')

ID:381    ORG "Apple Inc" 
ID:382    GPE "San Fransisco" 
ID:381    ORG "Samsung Corp." 
ID:381    ORG "Apple" …

python nlp information-extraction spacy ner

Ben*_*enP

2019 04-03

5
votes

1
answer

1622
views

Rename files by reordering a pattern in bash

I have PDF files named in this format:

Author-YYYY-rest_of_text_separated_by_underscores.pdf
John-2010-some_file.pdf
Smith-2009-some_other_file.pdf

I need to rename the files so that the year comes first, e.g.

YYYY-Author-rest_of_text_separated_by_underscores.pdf
2010-John-some_file.pdf
2009-Smith-some_other_file.pdf

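In other words, the 'YYYY-' element needs to move to the start of the name.

I don't have the unix "rename" utility and have to rely on sed, awk, etc. I'm happy to rename in place.

I've been trying to adapt this answer, without much luck: "Using sed to mass rename files".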
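
The reordering described above comes down to capturing the author and the year separately and swapping them. The question asks for sed/awk, so the following is only a Python sketch of the same capture-group idea, not the accepted answer. The names `PATTERN`, `reordered_name`, and `rename_in_place` are hypothetical, and the sketch assumes the author part contains no hyphen and the year is exactly four digits.

```python
import re
from pathlib import Path

# Assumption: names look like Author-YYYY-rest.pdf, the author contains
# no hyphen, and the year is exactly four digits.
PATTERN = re.compile(r"^(?P<author>[^-]+)-(?P<year>\d{4})-(?P<rest>.+\.pdf)$")


def reordered_name(name):
    """Return the YYYY-Author-rest.pdf form of name, or None if it does not match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return f"{match['year']}-{match['author']}-{match['rest']}"


def rename_in_place(directory, dry_run=True):
    """Rename matching PDFs in directory; return the (old, new) name pairs."""
    planned = []
    for path in sorted(Path(directory).glob("*.pdf")):
        new_name = reordered_name(path.name)
        if new_name is None:
            continue  # leave files that do not follow the pattern untouched
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return planned
```

Calling `rename_in_place(".")` with the default `dry_run=True` only lists the planned renames, which makes it easy to check them before touching any file. Files that are already in year-first form do not match the pattern and are skipped. The same swap should carry over to a sed expression such as `s/^\([^-]*\)-\([0-9]\{4\}\)-/\2-\1-/`, under the same assumptions about the file names.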

bash rename

Ben*_*enP

2017 05-23

2
votes

1
answer

74
views

Pandas dataframe to adjacency matrix

I have a pandas dataframe of the following form:

index | id    | group
0     | abc   | A
1     | abc   | B
2     | abc   | B
3     | abc   | C
4     | def   | A
5     | def   | B
6     | ghi   | B
7     | ghi   | C

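I want to convert this into a weighted graph / adjacency matrix in which the nodes are the "group" values and each weight is the number of ids shared by a pair of groups.

More precisely, the weight is the count of group-pair row combinations within each id, summed over all ids, so: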

AB = 'abc' indexes (0,1),(0,2) + 'def' indexes (4,5) = 3

AC = 'abc' (0,3) = 1

BC = 'abc' (2,3), (1,3) + 'ghi' (6,7) = 3

The resulting matrix would be: …
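
One way to read the weights described above: for each id, count how many rows fall into each group, then multiply the counts of two groups and sum over ids. The sketch below does this with a cross-tabulation followed by a matrix product. It is not the accepted answer, the variable names are hypothetical, and zeroing the diagonal is an assumption, since the question only specifies weights between distinct groups.

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "id":    ["abc", "abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "group": ["A",   "B",   "B",   "C",   "A",   "B",   "B",   "C"],
})

# Rows are ids, columns are groups, cells are row counts per (id, group)
counts = pd.crosstab(df["id"], df["group"])

# Entry (X, Y) of counts.T @ counts sums count(id, X) * count(id, Y) over ids
weights = counts.to_numpy().T @ counts.to_numpy()

# Assumption: self-loops are not wanted, so the diagonal is cleared
np.fill_diagonal(weights, 0)

adjacency = pd.DataFrame(weights, index=counts.columns, columns=counts.columns)
print(adjacency)
```

For the example data this reproduces the weights worked out by hand in the question: AB = 3, AC = 1, and BC = 3.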

python matrix dataframe data-structures pandas

Ben*_*enP

2018 03-22

2
votes

1
answer

2934
views

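Wikidata SPARQL - get company entities and their headquarters location

I'm having trouble extracting location properties for a company's headquarters.

My query: find all companies or subclasses of company, and return some basic properties such as ISIN and URL, along with the headquarters location.

I tried to use this example to extend the headquarters part of the query so that it returns location information such as city, country, and the latitude and longitude coordinates. However, I'm stuck on retrieving the values or labels.

Thanks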

SELECT
  ?item ?itemLabel ?web ?isin ?hq ?hqloc ?inception

# valueLabel is only useful for properties with item-datatype
WHERE 
{
  ?item p:P31/ps:P31/wdt:P279* wd:Q783794.

  OPTIONAL{?item wdt:P856 ?web.} # get item
  OPTIONAL{?item wdt:P946 ?isin.} # get item
  OPTIONAL{?item wdt:P571 ?inception.} # get item
  OPTIONAL{?item wdt:P159 ?hq.}  

  OPTIONAL{?item p:P159 ?hqItem. # get property
           ?hqItem ps:P159 wd:Q515. # get property-statement wikidata-entity
           ?hqItem pq:P17 ?hqloc. # get …

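As a rough illustration of the goal stated in the question (city, country, latitude, longitude for each headquarters), the Python sketch below builds one possible query and sends it to the public Wikidata endpoint. It is not the accepted answer. It assumes the country (P17) and coordinates (P625) are read from the headquarters item itself rather than from qualifiers on the P159 statement. The function names, the variables `?country`, `?lat`, `?lon`, `?coordNode`, and the User-Agent string are all hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_hq_query(limit=10):
    """Return a SPARQL query for companies and details of their headquarters."""
    return """
SELECT ?item ?itemLabel ?web ?isin ?inception ?hq ?hqLabel ?country ?countryLabel ?lat ?lon
WHERE {
  ?item p:P31/ps:P31/wdt:P279* wd:Q783794 .
  OPTIONAL { ?item wdt:P856 ?web . }        # official website
  OPTIONAL { ?item wdt:P946 ?isin . }       # ISIN
  OPTIONAL { ?item wdt:P571 ?inception . }  # inception date
  OPTIONAL {
    ?item wdt:P159 ?hq .                    # headquarters location item
    # Assumption: country and coordinates live on the headquarters item
    OPTIONAL { ?hq wdt:P17 ?country . }
    OPTIONAL {
      ?hq p:P625/psv:P625 ?coordNode .
      ?coordNode wikibase:geoLatitude ?lat ;
                 wikibase:geoLongitude ?lon .
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT %d
""" % limit


def run_query(query):
    """Send the query to the Wikidata endpoint and return the result bindings."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WIKIDATA_ENDPOINT + "?" + params,
        headers={"User-Agent": "hq-query-sketch/0.1 (example)"},  # placeholder
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_hq_query(5))` needs network access, and the subclass traversal over all companies may still run into the endpoint's timeout. The label service supplies `?hqLabel` and `?countryLabel` for the item-valued variables.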

sparql wikidata

Ben*_*enP

2021 01-14

0
votes

1
answer

1483
views