Splitting text into words in PHP: a tricky problem

Gra*_*it 2 php split

I am trying to split a text into words:

$delimiterList = array(" ", ".", "-", ",", ";", "_", ":",
           "!", "?", "/", "(", ")", "[", "]", "{", "}", "<", ">", "\r", "\n",
           '"');
$words = mb_split($delimiterList, $string);

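This works fine for strings, but I have run into some cases involving numbers.

For example, if I have the text "Look at this.My score is 3.14, and I am happy about it." the resulting array is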

[0]=>Look,
[1]=>at,
[2]=>this,
[3]=>My,
[4]=>score,
[5]=>is,
[6]=>3,
[7]=>14,
[8]=>and, ....

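So 3.14 gets split into 3 and 14, which should not happen in my case. What I mean is that a dot should separate two strings, but not two numbers. The result should look like this: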

[0]=>Look,
[1]=>at,
[2]=>this,
[3]=>My,
[4]=>score,
[5]=>is,
[6]=>3.14,
[7]=>and, ....

But I have no idea how to avoid this!

Does anyone know how to solve this?

Thanks, Granit

pto*_*mli 9

Or use a regular expression :)

<?php
$str = "Look at this.My score is 3.14, and I am happy about it.";

// alternative to handle Marko's example (updated)
// /([\s_;?!\/\(\)\[\]{}<>\r\n"]|\.$|(?<=\D)[:,.\-]|[:,.\-](?=\D))/

// -1 means "no limit"; passing null for $limit is deprecated as of PHP 8.1
var_dump(preg_split('/([\s\-_,:;?!\/\(\)\[\]{}<>\r\n"]|(?<!\d)\.(?!\d))/',
                    $str, -1, PREG_SPLIT_NO_EMPTY));

Output:

array(13) {
  [0]=>
  string(4) "Look"
  [1]=>
  string(2) "at"
  [2]=>
  string(4) "this"
  [3]=>
  string(2) "My"
  [4]=>
  string(5) "score"
  [5]=>
  string(2) "is"
  [6]=>
  string(4) "3.14"
  [7]=>
  string(3) "and"
  [8]=>
  string(1) "I"
  [9]=>
  string(2) "am"
  [10]=>
  string(5) "happy"
  [11]=>
  string(5) "about"
  [12]=>
  string(2) "it"
}
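To see how the two patterns in this answer differ, here is a minimal sketch that wraps them in a helper. The names `splitWords`, `PATTERN_STRICT` and `PATTERN_ALT` are hypothetical and only used for illustration; the patterns themselves are copied from the code above. The first pattern splits on a dot only when it has no digit on either side, so a sentence-ending dot right after a number stays attached to that number. The commented-out alternative also splits on a dot at the very end of the string.

```php
<?php
// Hypothetical names for illustration; both patterns come from the answer above.
const PATTERN_STRICT = '/([\s\-_,:;?!\/\(\)\[\]{}<>\r\n"]|(?<!\d)\.(?!\d))/';
const PATTERN_ALT    = '/([\s_;?!\/\(\)\[\]{}<>\r\n"]|\.$|(?<=\D)[:,.\-]|[:,.\-](?=\D))/';

function splitWords(string $text, string $pattern = PATTERN_STRICT): array
{
    // -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure; fall back to an empty list.
    return $parts === false ? [] : $parts;
}

// Dots between digits are kept, the final dot is a delimiter:
// ['Version', '2.5.1', 'is', 'out']
print_r(splitWords('Version 2.5.1 is out.'));

// A dot directly after a digit is not split by the first pattern:
// ['I', 'scored', '3.']
print_r(splitWords('I scored 3.'));

// The alternative pattern treats a dot at the end of the string as a delimiter:
// ['I', 'scored', '3']
print_r(splitWords('I scored 3.', PATTERN_ALT));
```

Which pattern fits depends on whether your input can end a sentence with a number.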


Jef*_*ber 6

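Take a look at strtok. It lets you change the parsing tokens dynamically, so you can split the string manually in a while loop, pushing each split word onto an array.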
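The answer gives no code, so the following is only one possible reading of the idea, not the answerer's implementation. It assumes a two-stage approach: strtok() splits on every delimiter from the question except the dot, and each resulting chunk is then either kept whole (when it consists of digits joined by dots) or split on dots. The function name `splitWithStrtok` is hypothetical. The inner split uses explode() because strtok() keeps a single internal state and cannot be nested.

```php
<?php
// Hypothetical helper; a sketch of the strtok idea under the assumptions stated above.
function splitWithStrtok(string $text): array
{
    // Every delimiter from the question except the dot, which is handled separately.
    $delimiters = " -,;_:!?/()[]{}<>\r\n\"";
    $words = [];

    // The first call takes the string; later calls only take the delimiters.
    $chunk = strtok($text, $delimiters);
    while ($chunk !== false) {
        if (preg_match('/^\d+(\.\d+)+$/', $chunk)) {
            // Digits joined by dots (for example 3.14) stay together as one word.
            $words[] = $chunk;
        } else {
            // Anywhere else the dot acts as an ordinary delimiter.
            foreach (explode('.', $chunk) as $piece) {
                if ($piece !== '') {
                    $words[] = $piece;
                }
            }
        }
        $chunk = strtok($delimiters);
    }

    return $words;
}

// ['Look', 'at', 'this', 'My', 'score', 'is', '3.14', 'and', ...]
print_r(splitWithStrtok('Look at this.My score is 3.14, and I am happy about it.'));
```

Because the delimiter string is passed on every call, it could also be changed between iterations, which is the flexibility the answer refers to.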