How to get a random unicode string

Question

abh*_*bhi 3 encoding utf-8 python-2.7 python-unicode

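I am testing a REST-based service, and one of its inputs is a text string, so I send random unicode strings from my Python code. So far the unicode strings I sent were within the ASCII range, so everything worked.

Now I am trying to send characters outside the ASCII range, and I get an encoding error. Here is my code. I have read through this link, but I still can't get around the problem.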

# coding=utf-8

import os, random, string
import json

junk_len = 512
junk =  (("%%0%dX" % junk_len) % random.getrandbits(junk_len * 8))

for i in xrange(1,5):
    if(len(junk) % 8 == 0):
        print u'decoding to hex'
        message = junk.decode("hex")

    print 'Hex chars %s' %message
    print u' '.join(message.encode("utf-8").strip())


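The first print line works without any problem, but I cannot send the string to the REST service without encoding it, so the second line tries to encode it as utf-8. That is the line that fails, with the following message.

UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 7: ordinal not in range(128)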

Answer 1

Ala*_*ack 5

As others have said, it is hard to produce valid random UTF-8 bytes directly, because the byte sequences have to be well-formed.
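To illustrate why raw random bytes are a poor starting point, here is a minimal sketch. It assumes Python 3 (the question itself targets Python 2.7), and the helper name `is_valid_utf8` is made up for this example. It checks whether a byte string decodes as UTF-8 and counts how many random 16-byte strings pass.

```python
import random

def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A lone continuation byte such as 0x81 is never valid on its own.
print(is_valid_utf8(b"\x81"))      # False
# The two-byte sequence C3 A9 is the UTF-8 encoding of U+00E9.
print(is_valid_utf8(b"\xc3\xa9"))  # True

# Count how many random 16-byte strings happen to be well-formed UTF-8.
rng = random.Random(0)
trials = 1000
valid = sum(
    is_valid_utf8(bytes(rng.getrandbits(8) for _ in range(16)))
    for _ in range(trials)
)
print(valid, "of", trials, "random 16-byte strings were valid UTF-8")
```

Only a small fraction of the random strings should pass, which is why it is easier to generate code points first and encode afterwards.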

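Since Unicode maps every character to a number between 0x0000 and 0x10FFFF, all you need to do is randomly generate a number in that range to get a Unicode code point (the surrogate range 0xD800 to 0xDFFF is the exception, because those values are not characters and are not well-formed in UTF-8). Passing the random number to unichr (or chr on Py3) returns a Unicode string containing the character at that code point.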

Then all you need to do is let Python encode the string as UTF-8 to get a valid UTF-8 sequence.
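The two steps just described can be sketched as follows. This is an illustration only, and it assumes Python 3, where `chr` covers the whole range up to 0x10FFFF. The names `random_code_point` and `random_text` are invented for this sketch. Surrogate values are skipped, since Python 3 refuses to encode them as UTF-8.

```python
import random

def random_code_point(rng=random):
    """Return one random non-surrogate code point as a 1-character str."""
    while True:
        cp = rng.randrange(0x110000)      # 0x0000 .. 0x10FFFF
        if not 0xD800 <= cp <= 0xDFFF:    # skip the surrogate range
            return chr(cp)

def random_text(length, rng=random):
    """Build a str of `length` random code points."""
    return "".join(random_code_point(rng) for _ in range(length))

text = random_text(16)
payload = text.encode("utf-8")            # always succeeds: no surrogates
assert payload.decode("utf-8") == text    # round-trips cleanly
```

Many of the characters produced this way are unassigned or have no glyph in common fonts, so the result is valid but often not displayable.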

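Because the full Unicode range contains many unassigned gaps and characters that cannot be displayed (due to font limitations), restricting yourself to the range 0000-D7FF in the Basic Multilingual Plane makes it more likely that your system can print the returned characters. When encoded as UTF-8, each character in that range produces a sequence of at most 3 bytes.

Plain random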
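As a quick check of the 3-byte claim, and one possible way to avoid unassigned and control characters, consider this sketch. It assumes Python 3. The helper names `utf8_len` and `random_printable_bmp` are hypothetical, and `str.isprintable()` is used here as a stand-in filter that the answer itself does not mention. It reflects Python's Unicode database, not whether a given font has a glyph.

```python
import random

def utf8_len(code_point):
    """Number of bytes the code point occupies in UTF-8."""
    return len(chr(code_point).encode("utf-8"))

# Boundary code points inside the 0000-D7FF range.
for cp in (0x007F, 0x0080, 0x07FF, 0x0800, 0xD7FF):
    print("U+%04X -> %d byte(s)" % (cp, utf8_len(cp)))

def random_printable_bmp(length, rng=random):
    """Random characters from 0000-D7FF that Python considers printable."""
    chars = []
    while len(chars) < length:
        ch = chr(rng.randrange(0xD800))   # 0x0000 .. 0xD7FF
        if ch.isprintable():              # drops control/unassigned/separators
            chars.append(ch)
    return "".join(chars)

print(random_printable_bmp(20))
```

The filter trades uniformity over the raw range for output that is more likely to be visible when printed.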

import random

def random_unicode(length):
    # Create a list of unicode characters within the range 0000-D7FF
    # (randrange excludes its upper bound, hence 0xD800)
    random_unicodes = [unichr(random.randrange(0xD800)) for _ in xrange(length)]
    return u"".join(random_unicodes)

my_random_unicode_str = random_unicode(length=512)
my_random_utf_8_str = my_random_unicode_str.encode('utf-8')


Unique random

import random

def unique_random_unicode(length):
    # Draw `length` distinct code points from the range 0000-D7FF
    # (xrange excludes its upper bound, hence 0xD800).
    random_ints = random.sample(xrange(0xD800), length)

    # Convert each code point into a one-character unicode string.
    random_unicodes = [unichr(x) for x in random_ints]
    # Join the characters into a single unicode string.
    return u"".join(random_unicodes)

my_random_unicode_str = unique_random_unicode(length=512)
my_random_utf_8_str = my_random_unicode_str.encode('utf-8')
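The snippets in this answer are written for Python 2 (`unichr`, `xrange`). A rough Python 3 equivalent might look like the sketch below. The JSON field name `"text"` and the constant `BMP_LIMIT` are assumptions made for illustration, not part of the original answer or of any particular REST service.

```python
import json
import random

BMP_LIMIT = 0xD800  # exclusive upper bound, i.e. code points 0x0000-0xD7FF

def random_unicode(length, rng=random):
    """`length` random characters; repeats are possible."""
    return "".join(chr(rng.randrange(BMP_LIMIT)) for _ in range(length))

def unique_random_unicode(length, rng=random):
    """`length` distinct random characters."""
    return "".join(chr(cp) for cp in rng.sample(range(BMP_LIMIT), length))

text = unique_random_unicode(512)

# Hypothetical request body: serialize first, then encode the result as UTF-8.
body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
assert json.loads(body.decode("utf-8"))["text"] == text
```

In Python 3 the `str` type is already Unicode, so the implicit ASCII decode that caused the error in the question does not occur. The only explicit step is the final `.encode("utf-8")`.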

