Tag: text-segmentation

How to split a paragraph into sentences

I have been trying to use:

$string="The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
preg_match_all('~.*?[?.!]~s',$string,$sentences);
print_r($sentences);


But it does not work for abbreviations such as Dr., U.S.A., etc.

Does anyone have a better suggestion?
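
One possible approach, sketched here in Python rather than PHP: treat every run of terminators followed by whitespace as a candidate boundary, then reject the candidate when the token it ends is a known abbreviation. The `ABBREVIATIONS` set and the `split_sentences` name are assumptions made for illustration, and the list would need extending for real text.

```python
import re

# Hypothetical abbreviation list (lowercase); extend it for your own text.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "u.s.a."}

def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text at runs of . ! ? unless the run ends a known abbreviation."""
    sentences = []
    start = 0
    # Candidate boundary: terminators followed by whitespace or end of text.
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        # Last whitespace-delimited token before the boundary, e.g. "Dr.".
        token = text[start:match.end()].split()[-1].lower()
        if token in abbreviations:
            continue  # not a real sentence end
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

text = "The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!"
result = split_sentences(text)
# result == ["The Dr. is here!!!",
#            "I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!"]
```

A sentence that genuinely ends with a listed abbreviation is not split by this sketch, so it trades missed boundaries for fewer false ones.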

php regex split text-segmentation

Sco*_*ler

2014 09-08

2
votes

1
answer

5276
views

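Sentence segmentation using regex

I have a few text (SMS) messages that I want to segment using the period ('.') as the delimiter. I am unable to handle messages of the following kind. How can I segment these messages with a regex in Python?

Before segmentation: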

'hyper count 16.8mmol/l.plz review b4 5pm.just to inform u.thank u'
'no of beds 8.please inform person in-charge.tq'

After segmentation:

'hyper count 16.8mmol/l' 'plz review b4 5pm' 'just to inform u' 'thank u'
'no of beds 8' 'please inform person in-charge' 'tq'

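Each line is a separate message.

Update:

I am doing natural language processing, and I think it is acceptable to treat '16.8mmol/l' and 'no of beds 8.2 cups of tea.' the same way. 80% accuracy is enough for me, but I want to reduce false positives as much as possible.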
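
A minimal sketch of one way to do this, assuming the only periods that must be protected are those with a digit on both sides (as in '16.8'). The `split_sms` name is hypothetical. Under the rule stated in the update, '8.2' in 'no of beds 8.2 cups of tea.' is also left unsplit.

```python
import re

# Split on a period unless it has a digit on BOTH sides (e.g. "16.8").
PERIOD = re.compile(r"\.(?!\d)|(?<!\d)\.")

def split_sms(message):
    """Return the non-empty segments of one message."""
    return [part.strip() for part in PERIOD.split(message) if part.strip()]

first = split_sms('hyper count 16.8mmol/l.plz review b4 5pm.just to inform u.thank u')
# first == ['hyper count 16.8mmol/l', 'plz review b4 5pm', 'just to inform u', 'thank u']
second = split_sms('no of beds 8.please inform person in-charge.tq')
# second == ['no of beds 8', 'please inform person in-charge', 'tq']
```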

python regex text-segmentation

Mag*_*gie

2014 09-08

2
votes

1
answer

1346
views

Segmenting a paragraph into sentences

I am trying to segment a paragraph into sentences. I chose '.', '?' and '!' as the delimiters. I tried:

format = r'((! )|(. )|(? ))'
delimiter = re.compile(format)
s = delimiter.split(line)


But it gives me sre_constants.error: unexpected end of pattern

I also tried

format = [r'(! )',r'(? )',r'(. )']
delimiter = re.compile(r'|'.join(format))


It also results in an error.

What is wrong with my approach?
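
A likely cause is that '.' and '?' are regex metacharacters: an unescaped '.' matches any character, and a '(' immediately followed by '?' is read as the start of an extension group such as `(?:...)`, which `(? )` is not. The sketch below reproduces the failure and shows two alternatives using a character class, where the three symbols are taken literally. The sample `line` is made up for illustration.

```python
import re

line = "Is it working? Yes! It splits. Done"

# The original pattern fails to compile because of the unescaped "(?".
try:
    re.compile(r'((! )|(. )|(? ))')
    compiled_ok = True
except re.error:
    compiled_ok = False

# Inside [...] the characters '.', '!' and '?' are literal.
delimiter = re.compile(r'[.!?] ')
parts = delimiter.split(line)
# parts == ['Is it working', 'Yes', 'It splits', 'Done']

# A lookbehind keeps the punctuation attached to each sentence.
sentences = re.split(r'(?<=[.!?]) ', line)
# sentences == ['Is it working?', 'Yes!', 'It splits.', 'Done']
```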

python regex python-2.7 text-segmentation

Chu*_*Nan

2014 09-08

2
votes

1
answer

416
views

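Converting a paragraph into sentences using Perl

I am programming in Perl. I need to read a paragraph and print each sentence on its own line.

Does anyone know how to do this?

Here is my code: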

#! /C:/Perl64/bin/perl.exe

use utf8;

if (! open(INPUT, '< text1.txt')){
die "cannot open input file: $!";
}

if (! open(OUTPUT, '> output.txt')){
die "cannot open output file: $!";
}

select OUTPUT;

while (<INPUT>){
print "$_";
}

close INPUT;
close OUTPUT;
select STDOUT;
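
The code above copies the input line by line, so a sentence that spans a line break stays broken. A rough sketch of the missing step, written in Python rather than Perl: read the whole text, collapse the whitespace, then split after '.', '!' or '?'. The `write_sentences` name is hypothetical and the boundary rule is deliberately naive (it ignores abbreviations).

```python
import io
import re

def write_sentences(source, sink):
    """Read all text from `source` and write one sentence per line to `sink`."""
    # Sentences may span line breaks, so collapse all whitespace first.
    paragraph = " ".join(source.read().split())
    # Naive boundary: '.', '!' or '?' followed by a space.
    for sentence in re.split(r"(?<=[.!?]) ", paragraph):
        if sentence:
            sink.write(sentence + "\n")

# In-memory demo; real code would pass open("text1.txt") and
# open("output.txt", "w") instead.
demo_in = io.StringIO("This is one.\nIs this\ntwo? Yes!")
demo_out = io.StringIO()
write_sentences(demo_in, demo_out)
# demo_out.getvalue() == "This is one.\nIs this two?\nYes!\n"
```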


perl text-segmentation

new*_*new

2014 09-17

1
vote

1
answer

2890
views

Splitting a sentence into words

For example, I have a sentence like this:

$text = "word, word w.d. word!..";


I need an array like this:

Array
(
    [0] => word
    [1] => word
    [2] => w.d
    [3] => word".
)


I am new to regular expressions.

Here is what I have tried:

function divide_a_sentence_into_words($text){ 
    return preg_split('/(?<=[\s])(?<!f\s)\s+/ix', $text, -1, PREG_SPLIT_NO_EMPTY); 
}


This

$text = "word word, w.d. word!..";
$split = preg_split("/[^\w]*([\s]+[^\w]*|$)/", $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($split);


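works, but I have a second problem: I want to include a list of special cases in my regular expression, "w.d" being one of them. For example, my list is "w.d", "mr.", "dr."

If the input text is: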

$text = "word, dr. word w.d. word!..";

I need the array:

Array (
  [0] => word
  [1] => dr.
  [2] => word
  [3] => w.d
  [4] => word 
)


Sorry for my bad English.
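
One way to handle a list of special cases, sketched in Python rather than PHP: match tokens instead of splitting, and try the special cases before the generic word pattern. The `SPECIAL_CASES` tuple mirrors the list in the question, and `split_words` is a hypothetical name.

```python
import re

# Special cases from the question; each is matched exactly as written.
SPECIAL_CASES = ("w.d", "mr.", "dr.")

def split_words(text, special_cases=SPECIAL_CASES):
    """Return words, keeping listed special cases (dots included) intact."""
    # Longest first, so a longer special case wins over a shorter prefix.
    ordered = sorted(special_cases, key=len, reverse=True)
    alternatives = "|".join(re.escape(item) for item in ordered)
    # Special cases must not start or end inside a longer word.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)|\w+", re.IGNORECASE)
    return pattern.findall(text)

words = split_words("word, dr. word w.d. word!..")
# words == ['word', 'dr.', 'word', 'w.d', 'word']
```

Because "w.d" is listed without a trailing dot while "dr." has one, the output keeps exactly those forms, which matches the array requested above.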

php text-segmentation

Gun*_*uno

2014 09-17

1
vote

2
answers

10k
views

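What does "compact unwind info" in the linker synthesized section mean?

When I analyzed the link map file generated by Xcode, I found an entry named "compact unwind info" in the linker synthesized section.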

compact unwind info 858.57KB    858572  Unchecked


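It takes up about 858 KB. I would like to know what data this space actually contains. Is there any way to reduce its size?

Full output of the linker synthesized section: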

compact unwind info 858.57KB
helper helper   24B
objc image info 8B
non-lazy-pointer    8B
non-lazy-pointer-to-local: dyld_stub_binder 8B
non-lazy-pointer-to-local: _vm_page_size    8B
non-lazy-pointer-to-local: _tanh    8B
non-lazy-pointer-to-local: _tan 8B
non-lazy-pointer-to-local: _strdup  8B
non-lazy-pointer-to-local: _strcmp  8B
non-lazy-pointer-to-local: _sinh    8B
non-lazy-pointer-to-local: _sin 8B
non-lazy-pointer-to-local: _realloc 8B
non-lazy-pointer-to-local: _protocol_getName    8B
non-lazy-pointer-to-local: _object_getIndexedIvars  8B
non-lazy-pointer-to-local: _objc_readClassPair  8B
non-lazy-pointer-to-local: _objc_lookUpClass    8B
non-lazy-pointer-to-local: _objc_getRequiredClass   8B
non-lazy-pointer-to-local: _objc_getProtocol    8B
non-lazy-pointer-to-local: _objc_getMetaClass   8B
non-lazy-pointer-to-local: _objc_getClass   8B
non-lazy-pointer-to-local: _objc_copyClassNamesForImage 8B
non-lazy-pointer-to-local: _objc_allocateClassPair …


xcode linker text-segmentation

Hik*_*ari

2017 05-11

1
vote

1
answer

1798
views