Posts by Avi*_*jit

How do I modify text that matches a particular regular expression in Python?

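I need to mark negated context in a sentence. The algorithm is as follows:

Detect a negator (not, never, contractions ending in n't, etc.)
Detect the clause-ending punctuation mark (. ; : ! ?)
Append _NEG to every word in between.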

So far, I have defined a regular expression that picks out all such occurrences:

import re

def replacenegation(text):
    # Find the first negator and everything up to the next clause-ending punctuation mark.
    match = re.search(r"((\b(never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b)|\b\w+n't\b)((?![.:;!?]).)*[.:;!?\b]", text)
    if match:
        s = match.group()
        print(s)
        news = ""
        # Split the matched span into words and drop the negator itself.
        wlist = re.split(r"[.:;!? ]", s)
        wlist = wlist[1:]
        print(wlist)
        for w in wlist:
            if w:
                news = news + " " + w + "_NEG"
        print(news)

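I can detect the matched group and rewrite it. However, I don't know how to rebuild the complete sentence after this operation. Also, when there are multiple matches, match.groups() gives the wrong output.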
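On the multiple-match point: re.search returns only the first match, and match.groups() lists the capture groups of that single match rather than all occurrences. A minimal sketch of iterating over every occurrence with re.finditer follows; the pattern here is a simplified, hypothetical stand-in for the full negator list, and the sentence is the example input from this question.

```python
import re

# Hypothetical, simplified pattern for illustration only: a negator
# ("not" or a contraction ending in n't) followed by the rest of its clause.
pattern = re.compile(r"\b(?:not|\w+n't)\b[^.:;!?]*")

text = "I don't like you at all; I should not let you know my happiest secret."

# finditer yields one match object per non-overlapping occurrence,
# each with its own start/end offsets into the original string.
matches = [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

for start, end, clause in matches:
    print("%d-%d: %s" % (start, end, clause))
```

Because each match object carries its offsets, the untouched text between matches can be copied across when reassembling the sentence.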

For example, if my input sentence is:

I don't like you at all; I should not let you know my happiest secret.

The output should be:

I don't like_NEG you_NEG at_NEG all_NEG ; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG .

How should I go about this?
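One possible approach, offered as a sketch rather than a definitive answer, is to let re.sub rebuild the sentence: it copies the text outside each match unchanged and calls a replacement function for each match. The names NEGATOR, SCOPE and mark_negation are hypothetical. The negator list is copied from the pattern above, and a clause is assumed to end at the next . : ; ! ? or at the end of the string.

```python
import re

# Negators copied from the pattern in the question; the last alternative
# catches contractions such as "don't".
NEGATOR = (r"\b(?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|"
           r"cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|"
           r"aint)\b|\b\w+n't\b")

# Group 1: the negator. Group 2: everything after it up to (not including)
# the next clause-ending punctuation mark or the end of the string.
SCOPE = re.compile(r"(" + NEGATOR + r")([^.:;!?]*)", re.IGNORECASE)


def mark_negation(text):
    """Append _NEG to every word between a negator and the end of its clause."""
    def tag(match):
        negator, scope = match.group(1), match.group(2)
        # Tag each word inside the scope; whitespace is left untouched.
        tagged = re.sub(r"[\w']+", lambda w: w.group(0) + "_NEG", scope)
        return negator + tagged

    # re.sub keeps the unmatched parts of the sentence and substitutes each match.
    return SCOPE.sub(tag, text)


sentence = "I don't like you at all; I should not let you know my happiest secret."
result = mark_negation(sentence)
print(result)
```

This sketch keeps the original spacing, so the punctuation stays attached to the preceding word instead of being separated by a space as in the desired output above. A second negator inside an already negated clause is tagged like any other word.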

python regex nlp python-2.7

Avi*_*jit

2016 01-01

7
votes

1
answers

218
views

Preserving timestamps after decomposing an xts in R

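I have an xts time series in R called hourplot, with a period of 24 (hourly data) spanning more than two weeks, indexed by timestamp objects of class POSIXlt, as follows: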

> dput(hourplot)

    structure(c(1, 1, 1, 1, 1, 1, 1.11221374045802, 1.3368, 1.18, 
1.0032, 1, 1, 1, 1, 1, 1, 1.0736, 1.2536, 1, 1.0032, 1.1856, 
1.0048, 1, 1, 1, 1, 1, 1, 1, 1, 1.04045801526718, 1.20229007633588, 
1.00229007633588, 1, 1, 1, 1, 1, 1, 1, 1.1152, 1.008, 1, 1, 1.2648, 
1.1832, 1, 1, 1, 1, 1, 1, 1, 1.0424, 1.2952, 1.6496, 1.1208, 
1.0216, 1, 1, 1, 1, 1, 1, 1.1256, 1, 1, 1, …

r time-series decomposition xts

Avi*_*jit

2017 04-04

4
votes

1
answers

2380
views