A batch of tweets I'm importing shows this problem when they are read:
b'I posted a new photo to Facebook'
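I gather the b indicates that it is a bytes object. But this is proving problematic, because in the CSV files I end up writing, the b doesn't go away and it interferes with later code.
Is there a simple way to remove this b prefix from my lines of text?
Keep in mind that I seem to need the text encoded as utf-8, or tweepy has trouble pulling it from the web.
Here is the content of the link I'm analyzing: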
https://www.dropbox.com/s/sjmsbuhrghj7abt/new_tweets.txt?dl=0
new_tweets = 'content in the link'
outtweets = [[tweet.text.encode("utf-8").decode("utf-8")] for tweet in new_tweets]
print(outtweets)
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-21-6019064596bf> in <module>()
1 for screen_name in user_list:
----> 2 get_all_tweets(screen_name,"instance file")
<ipython-input-19-e473b4771186> in get_all_tweets(screen_name, mode)
99 with open(os.path.join(save_location,'%s.instance' % screen_name), 'w') as f:
100 writer = csv.writer(f)
--> 101 writer.writerows(outtweets)
102 else:
103 with open(os.path.join(save_location,'%s.csv' % screen_name), 'w') as f:
C:\Users\Stan Shunpike\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>
hir*_*ist 85
You need to decode the bytes if you want a string:
b = b'1234'
print(b.decode('utf-8'))  # prints 1234
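Applied to the CSV case in the question, a minimal sketch might look like the following. It assumes the tweet texts arrive as bytes objects; the sample data, the `to_text` helper and the output file name are made up for illustration. Opening the file with `encoding='utf-8'` rests on an assumption about the traceback above, namely that the platform default codec (cp1252) could not represent some characters in the tweets.

```python
import csv
import os
import tempfile

# Hypothetical sample data: tweet texts that arrived as UTF-8 encoded bytes.
raw_tweets = [
    b'I posted a new photo to Facebook',
    'caf\u00e9 \U0001F600'.encode('utf-8'),
]


def to_text(value):
    """Return str unchanged; decode bytes as UTF-8 (hypothetical helper)."""
    return value.decode('utf-8') if isinstance(value, bytes) else value


rows = [[to_text(t)] for t in raw_tweets]

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'tweets.csv')

# An explicit encoding avoids relying on the platform default codec;
# newline='' is what the csv module documentation recommends for csv.writer.
with open(out_path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerows(rows)

# Read the file back to confirm that no b'...' prefix was written.
with open(out_path, encoding='utf-8', newline='') as f:
    read_back = list(csv.reader(f))

print(read_back[0])
```

Note that `tweet.text.encode("utf-8").decode("utf-8")` in the question is a round trip that returns the original string, so it neither adds nor removes anything.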
Jon*_*mar 12
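It is just letting you know that the object you are printing is not a string but a bytes object, displayed as a bytes literal. People tend to explain this incompletely, so here is my take.
Consider creating a bytes object by typing a bytes literal (that is, defining the object literally, for example by typing b'', rather than calling the bytes constructor) and converting it to a string object by decoding it as utf-8. (Note that converting here means decoding.)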
byte_object= b"test" # byte object by literally typing characters
print(byte_object) # Prints b'test'
print(byte_object.decode('utf8')) # Prints "test" without quotations
As you can see, we simply apply the .decode('utf8') method.
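For contrast, here is a small sketch of how the prefix can end up in written output; the variable names are illustrative. Calling `str()` on a bytes object without an encoding does not decode it. It returns the repr, prefix and quotes included, and the csv module documents that non-string data is stringified with `str()` before being written.

```python
byte_object = b"test"

# Without an encoding, str() returns the repr of the bytes object.
as_repr = str(byte_object)

# decode() turns the bytes into the text they represent.
as_text = byte_object.decode("utf-8")

# str() with an explicit encoding decodes as well.
same_text = str(byte_object, "utf-8")

print(as_repr)
print(as_text)
print(same_text == as_text)
```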
https://docs.python.org/3.3/library/stdtypes.html#bytes
https://docs.python.org/3.3/reference/lexical_analysis.html#string-and-bytes-literals
stringliteral ::= [stringprefix](shortstring | longstring)
stringprefix ::= "r" | "u" | "R" | "U"
shortstring ::= "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring ::= "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::= shortstringchar | stringescapeseq
longstringitem ::= longstringchar | stringescapeseq
shortstringchar ::= <any source character except "\" or newline or the quote>
longstringchar ::= <any source character except "\">
stringescapeseq ::= "\" <any source character>
bytesliteral ::= bytesprefix(shortbytes | longbytes)
bytesprefix ::= "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes ::= "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes ::= "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::= shortbyteschar | bytesescapeseq
longbytesitem ::= longbyteschar | bytesescapeseq
shortbyteschar ::= <any ASCII character except "\" or newline or the quote>
longbyteschar ::= <any ASCII character except "\">
bytesescapeseq ::= "\" <any ASCII character>
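A few of the grammar rules above can be checked directly. This is only an illustrative sketch with made-up variable names: it shows that the case and order of the prefix letters do not matter, that a raw bytes literal keeps its backslash, and that non-ASCII content has to be written with escapes or produced by encoding a str.

```python
# Prefix case and order do not matter: all of these are bytes literals.
assert b'abc' == B"abc"
assert rb'\n' == Rb'\n' == bR'\n' == br'\n'

# A raw bytes literal keeps the backslash, so it holds two bytes instead of one.
plain = b'\n'
raw = rb'\n'
print(len(plain), len(raw))

# Only ASCII characters may appear literally; other bytes need escapes,
# or the bytes can be produced by encoding a str.
escaped = b'caf\xc3\xa9'
encoded = 'caf\u00e9'.encode('utf-8')
print(escaped == encoded)
```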
小智 6
How to remove the b' ' characters from a decoded string in Python:
import base64
a='cm9vdA=='
b=base64.b64decode(a).decode('utf-8')
print(b)
You need to decode it to convert it to a string. See the answer about bytes literals in Python 3.
In [1]: b'I posted a new photo to Facebook'.decode('utf-8')
Out[1]: 'I posted a new photo to Facebook'