Best way to detect the number of SMS messages needed to send a text

AFT*_*AFT 9 php sms encoding ascii ucs2

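I'm looking for code or a library in PHP that I can call with a text and that will tell me:

  1. What encoding I need to use to send this text as an SMS (7, 8, or 16 bit)
  2. How many SMS messages I will use to send this text (it must be smart enough to account for "segmentation information", as in http://ozekisms.com/index.php?owpn=612)

Do you know of any existing code or library that does this for me?

Again, I'm not looking to send or convert SMS messages, just to get information about the SMS.

Update:

OK, I wrote the code below and it seems to work fine. If you have better or more optimized code, a solution, or a library, please let me know.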

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";




function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
in which case each message will start with a user data header (UDH) containing segmentation information. 
Since UDH is part of the payload, the number of available characters per segment is lower: 
153 for 7-bit encoding, 
134 for 8-bit encoding and 
67 for 16-bit encoding. 
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
// GSM 7-bit basic character set plus the extension-table characters
$gsm7bitChars = "\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text, 'UTF-8');
for ($i = 0; $i < $textlen; $i++){
    // Compare whole characters rather than bytes; === false keeps a match at position 0 from being read as "not found"
    if (mb_strpos($gsm7bitChars, mb_substr($text, $i, 1, 'UTF-8'), 0, 'UTF-8') === false){return false;}
}
return true;
}

spr*_*key 5

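I'm adding some extra information here because the previous answer isn't quite correct.

These are the issues:

  • You need to specify the current string encoding to the mb_* string functions, otherwise the length may be gathered incorrectly
  • In 7-bit GSM encoding, the extended characters of the basic character set (^ { } \ [ ~ ] | €) require 14 bits each to encode, so they count as two characters each.
  • In UCS-2 encoding, you have to be wary of emoji and other characters outside the 16-bit BMP, because...
  • GSM with UCS-2 counts 16-bit characters, so if you have a pile-of-poo character (U+1F4A9), and your carrier and phone sneakily support UTF-16 and not just UCS-2, it will be encoded as a surrogate pair of two 16-bit characters in UTF-16, and thus count as two 16-bit characters toward your string length. mb_strlen will count this as a single character only.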
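
The last point can be checked with a minimal sketch. It assumes PHP 7 or later with the mbstring extension, and `utf16_code_units` is a hypothetical helper name used only for illustration, not part of the answer's code:

```php
<?php
// Hypothetical helper (illustration only): number of 16-bit UTF-16 code units
// in a UTF-8 string, counted from the raw bytes of the converted string.
function utf16_code_units(string $utf8): int
{
    // UTF-16BE is requested explicitly so that no byte-order mark is involved
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // unpack('C*') yields one array element per byte; two bytes form one code unit
    return intdiv(count(unpack('C*', $utf16)), 2);
}

$poo = "\u{1F4A9}"; // a single character outside the BMP

var_dump(mb_strlen($poo, 'UTF-8')); // int(1): one character
var_dump(utf16_code_units($poo));   // int(2): a surrogate pair, so two units toward the SMS length
```

For text that stays inside the BMP the two counts agree, which is why the difference is easy to miss in testing.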

How to count 7-bit characters:

This is what I've come up with so far to count 7-bit characters:

// The input string must be UTF-8 encoded for this to work correctly;
// the encoding is passed explicitly to every mb_* call
function count_gsm_string($str)
{
    // Basic GSM character set (one 7-bit encoded char each)
    // Note: "$" is escaped so PHP does not try to interpolate a variable
    $gsm_7bit_basic = "@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà";

    // Extended set (requires escape code before character thus 2x7-bit encodings per)
    $gsm_7bit_extended = "^{}\\[~]|€";

    $len = 0;
    $count = mb_strlen($str, 'UTF-8');

    for ($i = 0; $i < $count; $i++) {
        // Take one character (not one byte) at a time
        $char = mb_substr($str, $i, 1, 'UTF-8');
        if (mb_strpos($gsm_7bit_basic, $char, 0, 'UTF-8') !== false) {
            $len++;
        } else if (mb_strpos($gsm_7bit_extended, $char, 0, 'UTF-8') !== false) {
            $len += 2;
        } else {
            return -1; // cannot be encoded as GSM, immediately return -1
        }
    }

    return $len;
}

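How to count 16-bit characters:

  • Convert the string to a UTF-16 representation (to preserve emoji characters) with mb_convert_encoding($str, 'UTF-16', 'UTF-8')
    • Do not convert to UCS-2, because mb_convert_encoding is lossy for that conversion
  • Count the bytes with count(unpack('C*', $utf16str)) and divide by two to get the number of UCS-2 16-bit characters that count toward the GSM multipart length

* caveat emptor, a word on counting bytes:

  • Don't use strlen to count the number of bytes. While it may work, strlen is often overloaded with a multibyte-capable version in PHP installations, and its API may also change in the future
  • Avoid mb_strlen($str, 'UCS-2'). While it currently works, and will correctly return 2 for a pile-of-poo character (because it looks like two 16-bit UCS-2 characters), its stablemate mb_convert_encoding is lossy when converting from >16-bit to UCS-2. Who's to say mb_strlen won't be lossy in the future?
  • Avoid mb_strlen($str, '8bit') / 2. It also currently works, and is recommended in a PHP docs comment as a way to count bytes. But IMO it suffers from the same issue as the UCS-2 technique above.
  • That leaves unpacking into a byte array and counting that as the safest current way (IMO).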

So, what does this look like?

// Internal encoding must be set to UTF-8,
// and the input string must be UTF-8 encoded for this to work correctly
function count_ucs2_string($str)
{
    $utf16str = mb_convert_encoding($str, 'UTF-16', 'UTF-8');
    // C* unpacks every byte as an unsigned char (one array element per byte);
    // which option you choose doesn't actually matter as long as you get one value per byte
    $byteArray = unpack('C*', $utf16str);
    // two bytes per 16-bit character
    return count($byteArray) / 2;
}

Putting it all together:

function multipart_count($str)
{
    $one_part_limit = 160; // use a constant i.e. GSM::SMS_SINGLE_7BIT
    $multi_limit = 153; // again, use a constant
    $max_parts = 3; // ... constant

    $str_length = count_gsm_string($str);
    if($str_length === -1) {
        // not encodable as GSM 7-bit, fall back to UCS-2 limits
        $one_part_limit = 70; // ... constant
        $multi_limit = 67; // ... constant
        $str_length = count_ucs2_string($str);
    }

    if($str_length <= $one_part_limit) {
        // fits in one part
        return 1;
    } else if($str_length > ($max_parts * $multi_limit)) {
        // too long
        return -1; // or throw exception, or false, etc.
    } else {
        // divide the string length by multi_limit and round up to get number of parts
        return (int) ceil($str_length / $multi_limit);
    }
}
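
To make the arithmetic concrete, here is a small self-contained sketch of the same limits (160/153 for GSM 7-bit, 70/67 for UCS-2) applied to a few boundary lengths. `segments_for_length` is a hypothetical name introduced for this example, and unlike multipart_count it applies no maximum number of parts:

```php
<?php
// Hypothetical helper (illustration only): segments needed for a message whose
// length is already known in units (septets for GSM 7-bit, 16-bit units for UCS-2).
function segments_for_length(int $units, bool $isGsm7bit): int
{
    $singleLimit = $isGsm7bit ? 160 : 70; // message sent without a UDH
    $multiLimit  = $isGsm7bit ? 153 : 67; // per segment once a UDH is present

    if ($units <= $singleLimit) {
        return 1;
    }
    return (int) ceil($units / $multiLimit);
}

$cases = [[160, true], [161, true], [307, true], [70, false], [71, false], [135, false]];
foreach ($cases as [$units, $isGsm7bit]) {
    printf(
        "%3d %s units -> %d segment(s)\n",
        $units,
        $isGsm7bit ? 'GSM-7' : 'UCS-2',
        segments_for_length($units, $isGsm7bit)
    );
}
```

The jump from 160 to 161 units is the case worth noticing: one extra character does not add one character of cost but a whole second segment, because every segment then carries the UDH.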

Turned it into a library...

https://bitbucket.org/solvam/smstools


AFT*_*AFT 4

The best solution I have so far:

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";

function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages,
in which case each message will start with a user data header (UDH) containing segmentation information.
Since UDH is part of the payload, the number of available characters per segment is lower:
153 for 7-bit encoding,
134 for 8-bit encoding and
67 for 16-bit encoding.
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message.
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum,
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information.
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
// GSM 7-bit basic character set plus the extension-table characters
$gsm7bitChars = "\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text, 'UTF-8');
for ($i = 0; $i < $textlen; $i++){
    // Compare whole characters rather than bytes; === false keeps a match at position 0 from being read as "not found"
    if (mb_strpos($gsm7bitChars, mb_substr($text, $i, 1, 'UTF-8'), 0, 'UTF-8') === false){return false;}
}
return true;
}