Posts by Nem*_*emo

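Difference between RepeatedStratifiedKFold and StratifiedKFold in sklearn

I tried reading the documentation for RepeatedStratifiedKFold and StratifiedKFold, but I could not tell the two methods apart, except that RepeatedStratifiedKFold repeats StratifiedKFold n times with a different randomization in each repetition.

My questions are: do these two methods return the same results? Which one should I use to split an imbalanced dataset when running GridSearchCV? What is the rationale for choosing that method?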
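As a rough illustration of the difference described above, the sketch below builds both splitters on a made-up imbalanced label vector and compares the number of train/test splits they produce. The data, the fold counts, and `random_state=0` are assumptions chosen only for the example. Either object can be passed as the `cv` argument of GridSearchCV.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

# Made-up imbalanced labels for illustration: 16 negatives, 4 positives.
y = np.array([0] * 16 + [1] * 4)
X = np.zeros((len(y), 1))  # the features do not matter for the split itself

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
rskf = RepeatedStratifiedKFold(n_splits=4, n_repeats=3, random_state=0)

single = list(skf.split(X, y))     # one pass of stratified k-fold
repeated = list(rskf.split(X, y))  # the same procedure, reshuffled per repeat

print(len(single), len(repeated))

# Every test fold keeps the 4:1 class ratio of the full label vector.
positives_per_fold = [int(y[test_idx].sum()) for _, test_idx in repeated]
print(positives_per_fold)
```

With repetition, a model is scored on more splits, which tends to give a less noisy estimate at the price of proportionally more fits. Whether that trade-off is worth it depends on the dataset size and the cost of training.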

python classification machine-learning scikit-learn cross-validation

Nem*_*emo

2023 03-08

10
Votes

1
Answers

9579
Views

Using RegEx as a phrase pattern in EntityRuler

I am trying to find FRT entities using EntityRuler, as follows:

from spacy.lang.en import English
from spacy.pipeline import EntityRuler

nlp = English()
ruler = EntityRuler(nlp)
patterns = [{"label": "FRT", "pattern": [{'REGEX': "[Aa]ppl[e|es])"}]},
            {"label": "BRN", "pattern": [{"LOWER": "granny"}, {"LOWER": "smith"}]}]

ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

doc = nlp(u"Apple is red. Granny Smith apples are green.")
print([(ent.text, ent.label_) for ent in doc.ents])


Then I got this result

[('Apple', 'FRT'), ('is', 'FRT'), ('red', 'FRT'), ('.', 'FRT'), ('Granny Smith', 'BRN'), ('apples', 'FRT'), ('are', 'FRT'), ('green', 'FRT'), ('.', 'FRT')]


Could you tell me how to fix my code so that I get this result

[('Apple', 'FRT'), ('Granny Smith', 'BRN'), ('apples', 'FRT')]


Thank you in advance.
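One thing that can be checked without spaCy is the regular expression itself. The sketch below uses only Python's `re` module to show that the pattern string from the question is not a valid regex (it has a stray closing parenthesis), and that a simpler candidate, `[Aa]pples?`, picks out only the two intended words. The candidate pattern and the hand-written token list are assumptions for illustration. In a spaCy token pattern the regex would be attached to a token attribute, for example `{"TEXT": {"REGEX": "[Aa]pples?"}}`.

```python
import re

broken = "[Aa]ppl[e|es])"  # pattern string copied from the question
fixed = r"[Aa]pples?"      # assumed replacement: "Apple"/"apple", optional "s"

# The stray ")" makes the original pattern fail to compile at all.
try:
    re.compile(broken)
    broken_is_valid = True
except re.error:
    broken_is_valid = False

# Hand-written tokens standing in for the tokenized example sentence.
words = ["Apple", "is", "red", ".", "Granny", "Smith",
         "apples", "are", "green", "."]
matched = [w for w in words if re.fullmatch(fixed, w)]

print(broken_is_valid)
print(matched)
```

Here `re.fullmatch` is used so that words merely containing "apple" are not picked up. If the matcher applies the regex with search semantics instead, anchoring the pattern as `^[Aa]pples?$` would give the same effect.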

python spacy

Nem*_*emo

lucky-day

6
Votes

1
Answers

1508
Views

Updating a local folder with new content from a git repo

I cloned a git repo onto my computer by running this command in Git Bash

git clone https://github.com/xxx/yyy.git


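It created a folder named yyy on my computer.

How do I update the folder yyy on my computer with the new content from https://github.com/xxx/yyy.git (is that what is called the remote repository)?

I followed the instructions in "Update local repository with changes from a GitHub repository", in particular git pull origin master, but none of them worked; they returned the error $ git pull origin master fatal: Not a git repository (or any of the parent directories): .git.

I also tried git pull https://github.com/xxx/yyy.git, because I assumed that if git clone https://github.com/xxx/yyy.git succeeded, then git pull https://github.com/xxx/yyy.git must work as well; otherwise the git syntax is not very good.

Should I "clone" again to overwrite the existing folder on my computer? Why can't I "pull"?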

github git-bash

Nem*_*emo

2019 03-18

4
Votes

2
Answers

10k
Views

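Adding multiple EntityRulers with spaCy (ValueError: 'entity_ruler' already exists in pipeline)

The following link shows how to add custom entity rules where an entity spans more than one token. The code to do this is below: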

import spacy
from spacy.pipeline import EntityRuler
nlp = spacy.load('en_core_web_sm', parse=True, tag=True, entity=True)

animal = ["cat", "dog", "artic fox"]
ruler = EntityRuler(nlp)
for a in animal:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
nlp.add_pipe(ruler)

doc = nlp("There is no cat in the house and no artic fox in the basement")

with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])


I tried adding another custom entity ruler like this:

flower = ["rose", "tulip", "african daisy"]
ruler = EntityRuler(nlp)
for f in flower:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
nlp.add_pipe(ruler)


But I received this error:

---------------------------------------------------------------------------
ValueError …

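The error message suggests that two pipeline components ended up with the same default name. A minimal sketch of one way around that is to give each ruler its own name. It assumes spaCy v3, where `nlp.add_pipe` takes the component's string name and returns the component, which differs from the v2-style calls in the question. The names `animal_ruler` and `flower_ruler` and the example sentence are made up. A blank English pipeline is used so that no model download is needed.

```python
import spacy

nlp = spacy.blank("en")  # assumption: spaCy v3, tokenizer-only pipeline

# First ruler, registered under an explicit (hypothetical) name.
animal_ruler = nlp.add_pipe("entity_ruler", name="animal_ruler")
animal_ruler.add_patterns(
    [{"label": "animal", "pattern": a} for a in ["cat", "dog", "artic fox"]]
)

# Second ruler: a different name avoids the "already exists" clash.
flower_ruler = nlp.add_pipe("entity_ruler", name="flower_ruler")
flower_ruler.add_patterns(
    [{"label": "flower", "pattern": f} for f in ["rose", "tulip", "african daisy"]]
)

doc = nlp("There is no cat in the house and no african daisy in the garden")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(nlp.pipe_names)
print(ents)
```

An alternative that needs no second component is to add the flower patterns to the ruler that already exists, since a single ruler can hold patterns for several labels.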

python spacy

Nem*_*emo

2019 08-21

4
Votes

1
Answers

2452
Views

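Custom entity ruler with spaCy did not return a match

This link shows how to create a custom entity ruler.

I basically copied and modified the code for another custom entity ruler and used it to find a match in a doc, as follows: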

import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')
ruler = EntityRuler(nlp)

grades = ["Level 1", "Level 2", "Level 3", "Level 4"]
for item in grades:
    ruler.add_patterns([{"label": "LEVEL", "pattern": item}])

nlp.add_pipe(ruler)

doc = nlp('Level 2 employee first 12 months 1032.70')

with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])

matcher = Matcher(nlp.vocab)
pattern =[{'ENT_TYPE': {'REGEX': 'LEVEL'}}, {'ORTH': 'employee'}]
matcher.add('PAY_LEVEL', None, pattern)
matches = matcher(doc)

for match_id, start, end in matches:
    span = doc[start:end]
    print(span)


However, when I run the code (in a Jupyter notebook), nothing is returned. …

python spacy

Nem*_*emo

2019 08-17

3
Votes

1
Answers

1973
Views

spaCy's regex behaves differently from Python's regex

I have the following text

text = 'Monday to Friday 12 midnight to 5am 30% . Midnight Friday to 6am Saturday 30% . 9pm Saturday to Midnight Saturday 25% . Midnight Saturday to 6am Sunday 100% . 6am Sunday to 9pm Sunday 50%'


When I use a plain regex, I get the following

import re
regex = r'\d{1}[ap]m'  # raw string; [ap] instead of [a|p], which would also match a literal "|"
re.findall(regex, text)

# Returned:
['5am', '6am', '9pm', '6am', '6am', '9pm']


However, when I use the same regex in spaCy, I get nothing

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_lg')

matcher = Matcher(nlp.vocab)
# Raw string so that \d reaches the regex engine unchanged
pattern = [{'TEXT': {'REGEX': r'\d{1}[ap]m'}}]
matcher.add('TIME', None, pattern) …

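A likely reason for the difference is that the Matcher applies a regex to one token at a time, while `re.findall` scans the whole string. The sketch below, assuming spaCy v3 and a blank English pipeline (no model download), prints the tokens for a shortened version of the text and then matches a time as two consecutive tokens. The two-token pattern is an assumption for illustration, not necessarily the fix the original thread settled on.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: spaCy v3, tokenizer only

# Shortened version of the text from the question.
text = "Monday to Friday 12 midnight to 5am 30% . 9pm Saturday to Midnight Saturday 25%"
doc = nlp(text)
tokens = [t.text for t in doc]
print(tokens)  # inspect how "5am" and "9pm" were tokenized

matcher = Matcher(nlp.vocab)
# One dict per token: a digit token followed by an "am"/"pm" token.
pattern = [{"IS_DIGIT": True}, {"LOWER": {"IN": ["am", "pm"]}}]
# v3 signature; the v2 form used in the question is matcher.add('TIME', None, pattern)
matcher.add("TIME", [pattern])

times = [doc[start:end].text for _, start, end in matcher(doc)]
print(times)
```

If the English tokenizer splits "5am" into "5" and "am", no single token can satisfy a regex that needs the digit and the suffix together, which would explain the empty result.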

python regex spacy

Nem*_*emo

lucky-day

3
Votes

1
Answers

2762
Views

ValueError: Must have equal len keys and value when setting with an iterable

When I run this toy code

import pandas as pd

test = pd.DataFrame({'a': [1, 2, 3, 4]})
test['b'] = ''
for i in range(len(test)):
    test['b'].loc[i] = [5, 6, 7]


I get a warning

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


But if I use loc in the way the warning recommends, like this

test = pd.DataFrame({'a': [1, 2, 3, 4]})
test['b'] = ''
for i in range(len(test)):
    test.loc[i, 'b'] = [5, 6, 7]


I get an error

ValueError: Must have equal len keys and …

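With `.loc`, a list on the right-hand side is treated as a sequence of values to spread over the selected cells, so a three-element list does not fit a single cell. A minimal sketch of one common workaround is to use the scalar accessor `.at` on an object-dtype column, so that the whole list is stored as one value. Creating the column with an explicit `dtype=object` is an assumption made here so that the example does not depend on how a given pandas version infers the dtype of `''`.

```python
import pandas as pd

test = pd.DataFrame({'a': [1, 2, 3, 4]})
# Explicit object dtype so each cell can hold an arbitrary Python object.
test['b'] = pd.Series([''] * len(test), dtype=object)

for i in range(len(test)):
    # .at addresses exactly one cell, so the list is stored as a single value.
    test.at[i, 'b'] = [5, 6, 7]

print(test)

# Alternative without a loop: build the column from a list of lists.
alt = pd.DataFrame({'a': [1, 2, 3, 4]})
alt['b'] = [[5, 6, 7] for _ in range(len(alt))]
```

Storing lists in cells works, but most vectorized pandas operations will not apply to such a column, so it may be worth checking whether separate columns or a long-format table would fit the data better.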

python dataframe python-3.x pandas

Nem*_*emo

2022 03-31

1
Votes

1
Answers

20k
Views

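While min() and max() can be called on their own, a mean() function has to come from an imported package such as NumPy, i.e. np.mean(). If the notions of minimum and maximum are natural for a scale or range, shouldn't the middle of that scale or range (i.e. the mean) be considered natural as well? What is the underlying reason for this inconsistency? Please note that this is not an opinion-based question; I would really like to know why mean() was left out of the base package.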
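For reference, a mean does not strictly require NumPy: Python's standard library ships a `statistics` module, although it still has to be imported, unlike the built-ins min() and max(). The short sketch below compares the three calls on made-up values.

```python
import statistics

values = [1.5, 2.5, 5.0]  # made-up sample data

lowest = min(values)             # built-in, no import needed
highest = max(values)            # built-in, no import needed
average = statistics.mean(values)  # standard library, but needs an import

# The same result from built-ins only.
manual_average = sum(values) / len(values)

print(lowest, highest, average, manual_average)
```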