How do I truncate a Java String so that I know it will fit in a given number of bytes of storage once it is UTF-8 encoded?
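I'm using the Zemanta API, which accepts up to 8 KB of text per call. I'm using JavaScript to extract the text to send to Zemanta from web pages, so I'm looking for a function that will truncate my text at 8 KB.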
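Zemanta should do the truncation on its own (i.e., if you send it a larger string), but I need to shuttle this text around a bit before making the API call, so I want to keep the payload as small as possible.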
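Is it safe to assume that 8 KB of text is 8,192 characters and to truncate accordingly? (1 byte per character; 1,024 characters per KB; 8 KB = 8,192 bytes/characters.) Or is that inaccurate, or only true in certain circumstances?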
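The 1-byte-per-character assumption only holds for ASCII text. UTF-8 encodes each code point in 1 to 4 bytes, so 8,192 characters can take anywhere from 8,192 to 32,768 bytes. The sketch below is in Python (the language of the other snippet on this page, not the JavaScript from the question), and `utf8_size` is a hypothetical helper name; it prints the code-point count next to the encoded byte count for a few sample strings:

```python
def utf8_size(text: str) -> int:
    """Hypothetical helper: number of bytes `text` occupies when UTF-8 encoded."""
    return len(text.encode("utf-8"))


# ASCII, an accented Latin letter, CJK, and an emoji outside the Basic Multilingual Plane
samples = ["hello", "m\u00e9chant", "\u65e5\u672c\u8a9e", "\U0001F600"]
for sample in samples:
    # len() counts code points; utf8_size() counts encoded bytes.
    # !a prints an ASCII-escaped repr so the output is safe on any console.
    print(f"{sample!a}: {len(sample)} code points, {utf8_size(sample)} bytes")

# 8,192 copies of a 3-byte CJK character is three times an 8,192-byte budget
print(utf8_size("\u65e5" * 8192))  # 24576
```

In JavaScript itself, `String.prototype.length` counts UTF-16 code units rather than bytes, so `'\u{1F600}'.length` is 2 even though that character needs 4 bytes in UTF-8; `new TextEncoder().encode(text).length` gives the UTF-8 byte count.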
Is there a more elegant way to truncate a string based on its actual file size?
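One common approach, sketched here in Python under the assumption that the limit applies to UTF-8 bytes: encode the string, slice the bytes, and decode while discarding the incomplete sequence the slice may leave at the end. `truncate_utf8` and `ZEMANTA_LIMIT` are hypothetical names, and reading "8 KB" as 8,192 bytes is an assumption, not something the API documentation quoted here confirms:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Hypothetical helper: longest prefix of `text` whose UTF-8 form fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # A byte slice can end in the middle of a multi-byte sequence. Everything
    # before that point is still valid UTF-8, so errors="ignore" discards only
    # the incomplete trailing bytes (at most 3).
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


ZEMANTA_LIMIT = 8 * 1024  # assumed: "8 KB" taken as 8,192 bytes

payload = truncate_utf8("\u263a" * 5000, ZEMANTA_LIMIT)  # 15,000 bytes before truncation
print(len(payload), len(payload.encode("utf-8")))  # 2730 8190
```

This cuts on code-point boundaries, not grapheme boundaries, so a base letter can lose a following combining mark, or an emoji sequence joined with zero-width joiners can be split. In Java the same idea should carry over by encoding with `getBytes(StandardCharsets.UTF_8)` and decoding the truncated array with a `CharsetDecoder` configured via `onMalformedInput(CodingErrorAction.IGNORE)`.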
I'd like to shorten a string using textwrap.shorten or a function like it. The string can potentially contain non-ASCII characters. What's special here is that the maximal width applies to the bytes encoding of the string. This problem is motivated by the fact that several database column definitions and some message buses have a bytes-based max length.
For example:
```python
>>> import textwrap
>>> s = '? Ilsa, le méchant ? ? gardien ?'
# Available function that I …
```
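A minimal sketch of the kind of function being asked for, assuming UTF-8 and the standard-library `textwrap.shorten`. In UTF-8 a string of N characters encodes to at least N bytes, so no character width above the byte limit can fit; the helper starts at `width` characters and lowers the character budget until the encoded result fits. `shorten_to_bytes_width` is a hypothetical name, not a standard-library function:

```python
import textwrap


def shorten_to_bytes_width(text: str, width: int, placeholder: str = " [...]") -> str:
    """Hypothetical helper: like textwrap.shorten, but `width` caps the UTF-8 byte length."""
    # N characters encode to at least N bytes, so start at `width`
    # characters and shrink the character budget until the bytes fit.
    for char_width in range(width, 0, -1):
        try:
            candidate = textwrap.shorten(text, width=char_width, placeholder=placeholder)
        except ValueError:
            # textwrap rejects widths smaller than the placeholder; every
            # smaller width would be rejected too, so stop searching.
            break
        if len(candidate.encode("utf-8")) <= width:
            return candidate
    raise ValueError(f"no shortening of the text fits in {width} bytes")


if __name__ == "__main__":
    s = "\u263a Ilsa, le m\u00e9chant \u263a \u263a gardien \u263a"
    short = shorten_to_bytes_width(s, 27)
    print(ascii(short), len(short.encode("utf-8")))
```

Assuming the `?` characters in the example stand for a 3-byte symbol such as U+263A, `shorten_to_bytes_width(s, 27)` returns `'☺ Ilsa, le méchant [...]'`, exactly 27 bytes, whereas `textwrap.shorten(s, width=27)` keeps one more `☺` and comes to 31 bytes. The linear search calls `textwrap.shorten` up to `width` times, which is cheap for column-sized limits; like `textwrap.shorten`, it also collapses runs of whitespace.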