Related questions and solutions (0)

Removing emojis from a string in Python

I found this Python code for removing emojis, but it doesn't work. Can you help with other code, or with a fix for this one?

I have noticed that all my emojis start with \xf, but when I try to search with str.startswith("\xf") I get an invalid character error.

emoji_pattern = r'/[x{1F601}-x{1F64F}]/u'
re.sub(emoji_pattern, '', word)

Here is the error:

Traceback (most recent call last):
  File "test.py", line 52, in <module>
    re.sub(emoji_pattern,'',word)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
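
The "bad character range" message can be reproduced in isolation. The pattern uses PCRE/PHP-style delimiters and x{...} escapes, which Python's re module reads as literal characters, so inside the brackets `}-x` is taken as a range running from `}` (code 125) down to `x` (code 120). A minimal sketch, assuming Python 3 (the traceback above is from Python 2.7, where the message is the same but the exception class is printed differently):

```python
import re

# The pattern from the question, unchanged.
pattern = r'/[x{1F601}-x{1F64F}]/u'

try:
    re.compile(pattern)
    error_message = None
except re.error as exc:
    # "}-x" is a reversed range: ord("}") == 125 is greater than ord("x") == 120.
    error_message = str(exc)

print(error_message)
```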

Each item in the list can be a word: ['This', 'dog', '\xf0\x9f\x98\x82', 'https://t.co/5N86jYipOI']
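
The item '\xf0\x9f\x98\x82' in that list is a UTF-8 byte sequence rather than a decoded character, which may explain the \xf observation. The sketch below, assuming Python 3, shows that a \x escape needs exactly two hex digits (so "\xf" is rejected while "\xf0" is accepted) and that the four bytes decode to a single code point. Checking for a leading \xf0 byte is only a rough heuristic: some emoji, such as U+2764, live outside the supplementary planes and encode to three bytes starting with \xe2.

```python
# The token from the question, written as a Python 3 bytes literal.
token = b"\xf0\x9f\x98\x82"

# b"\xf0" is a complete escape (two hex digits); b"\xf" would be a syntax error.
starts_with_f0 = token.startswith(b"\xf0")

# Decoding the four bytes yields one character.
decoded = token.decode("utf-8")
code_point = hex(ord(decoded))

print(starts_with_f0, len(decoded), code_point)  # True 1 0x1f602
```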

Update: I tried this other code:

emoji_pattern=re.compile(ur" " " [\U0001F600-\U0001F64F] # emoticons \
                                 |\
                                 [\U0001F300-\U0001F5FF] # symbols & pictographs\
                                 |\
                                 [\U0001F680-\U0001F6FF] # transport & map symbols\
                                 |\
                                 [\U0001F1E0-\U0001F1FF] …
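
For comparison, here is a minimal sketch of the same idea in Python 3, where string literals are Unicode and \U escapes work without the ur prefix. It uses only the four ranges visible in the update, so it will not catch every emoji. The name strip_emoji is a hypothetical helper, not something from the question, and the byte-decoding step assumes the tokens are UTF-8.

```python
import re

# Only the four ranges visible in the question's update; not a complete emoji list.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # the fourth range shown in the update
    "]+"
)


def strip_emoji(words):
    """Hypothetical helper: decode byte tokens, remove emoji, drop emptied tokens."""
    cleaned = []
    for word in words:
        if isinstance(word, bytes):
            word = word.decode("utf-8")  # assumption: tokens are UTF-8 encoded
        word = EMOJI_PATTERN.sub("", word)
        if word:
            cleaned.append(word)
    return cleaned


words = ["This", "dog", b"\xf0\x9f\x98\x82", "https://t.co/5N86jYipOI"]
print(strip_emoji(words))  # ['This', 'dog', 'https://t.co/5N86jYipOI']
```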

python string unicode special-characters emoji

25
votes
12
answers
50k
views

What's the deal with Python 3.4, Unicode, different languages and Windows?

Happy examples:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

czech = u'Leoš Janáček'.encode("utf-8")
print(czech)

pl = u'Zdzisław Beksiński'.encode("utf-8")
print(pl)

jp = u'リング 山村 貞子'.encode("utf-8")
print(jp)

chinese = u'五行'.encode("utf-8")
print(chinese)

MIR = u'Машина для Инженерных Расчётов'.encode("utf-8")
print(MIR)

pt = u'Minha Língua Portuguesa: çáà'.encode("utf-8")
print(pt)

Unhappy output:

b'Leo\xc5\xa1 Jan\xc3\xa1\xc4\x8dek'
b'Zdzis\xc5\x82aw Beksi\xc5\x84ski'
b'\xe3\x83\xaa\xe3\x83\xb3\xe3\x82\xb0 \xe5\xb1\xb1\xe6\x9d\x91 \xe8\xb2\x9e\xe5\xad\x90'
b'\xe4\xba\x94\xe8\xa1\x8c'
b'\xd0\x9c\xd0\xb0\xd1\x88\xd0\xb8\xd0\xbd\xd0\xb0 \xd0\xb4\xd0\xbb\xd1\x8f \xd0\x98\xd0\xbd\xd0\xb6\xd0\xb5\xd0\xbd\xd0\xb5\xd1\x80\xd0\xbd\xd1\x8b\xd1\x85 \xd0\xa0\xd0\xb0\xd1\x81\xd1\x87\xd1\x91\xd1\x82\xd0\xbe\xd0\xb2'
b'Minha L\xc3\xadngua Portuguesa: \xc3\xa7\xc3\xa1\xc3\xa0'
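
The b'...' lines are what Python 3 prints for a bytes object: .encode("utf-8") turns the text into bytes, and print() then shows their repr instead of the characters. A small sketch of the round trip, using \u escapes so the snippet itself stays ASCII:

```python
name = "\u4e94\u884c"               # the same two characters as the `chinese` example

encoded = name.encode("utf-8")      # bytes; print() shows them as b'\xe4\xba\x94...'
decoded = encoded.decode("utf-8")   # back to the original str

print(encoded)
print(decoded == name)
```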

If I print them like this:

jp = u'リング 山村 貞子'
print(jp)

I get:

Traceback (most recent call last):
  File "x.py", line 5, in <module>
    print(jp)
  File …
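
The traceback is cut off here, so the exact error is unknown. Assuming it is a UnicodeEncodeError raised because the Windows console code page cannot represent the characters, one possible workaround is to escape whatever the console encoding cannot encode before printing. The sketch below is only an illustration under that assumption: console_safe is a hypothetical helper, and cp437 is used merely as an example of a limited code page.

```python
import sys


def console_safe(text, encoding=None):
    """Hypothetical helper: make text printable under a limited console encoding."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Characters the encoding cannot represent are replaced by backslash escapes.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


jp = "\u30ea\u30f3\u30b0 \u5c71\u6751 \u8c9e\u5b50"  # same text as `jp` in the question

print(console_safe(jp, "cp437"))  # every non-cp437 character shown as \uXXXX
print(console_safe(jp))           # unchanged if the console encoding can represent it
```

The escaped output is less readable than the original text, but it avoids the exception; switching the console or output stream to UTF-8 is the other common route.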

python unicode

24
votes
2
answers
20k
views

Tag statistics

python ×2

unicode ×2

emoji ×1

special-characters ×1

string ×1