How do I remove the b prefix from a string in Python?

Question

A bunch of tweets I'm importing have this problem; when read, they look like this:

b'I posted a new photo to Facebook'

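I gather the b indicates that it's a bytes object. But this is proving problematic, because in the CSV files I eventually write, the b doesn't go away, and it interferes with later code.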
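
The symptom can be reproduced with a minimal sketch, assuming the rows handed to csv.writer contain bytes (for example, the result of tweet.text.encode("utf-8")). The sample text comes from the question; io.StringIO is a hypothetical stand-in for the real output file.

```python
import csv
import io

# Hypothetical stand-in for one tweet after tweet.text.encode("utf-8")
text_bytes = "I posted a new photo to Facebook".encode("utf-8")

# Writing the bytes object directly: csv.writer converts non-str fields with
# str(), and str() of a bytes object is its repr, including the b'' wrapper.
buf = io.StringIO()
csv.writer(buf).writerow([text_bytes])
written = buf.getvalue()

# Decoding first hands the writer a str, so only the text is written.
buf = io.StringIO()
csv.writer(buf).writerow([text_bytes.decode("utf-8")])
fixed = buf.getvalue()

print(repr(written))  # "b'I posted a new photo to Facebook'\r\n"
print(repr(fixed))    # 'I posted a new photo to Facebook\r\n'
```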

Is there a simple way to remove this b prefix from my lines of text?

Keep in mind that I seem to need to encode the text as utf-8, or tweepy can't pull it from the web.

This is the link content I'm analyzing:

https://www.dropbox.com/s/sjmsbuhrghj7abt/new_tweets.txt?dl=0

new_tweets = 'content in the link'

Attempted code:

outtweets = [[tweet.text.encode("utf-8").decode("utf-8")] for tweet in new_tweets]
print(outtweets)

Error:

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-21-6019064596bf> in <module>()
      1 for screen_name in user_list:
----> 2     get_all_tweets(screen_name,"instance file")

<ipython-input-19-e473b4771186> in get_all_tweets(screen_name, mode)
     99             with open(os.path.join(save_location,'%s.instance' % screen_name), 'w') as f:
    100                 writer = csv.writer(f)
--> 101                 writer.writerows(outtweets)
    102         else:
    103             with open(os.path.join(save_location,'%s.csv' % screen_name), 'w') as f:

C:\Users\Stan Shunpike\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20 
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>
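
The traceback can be reproduced with a short sketch, assuming (as the cp1252.py frame suggests) that the file was opened with the Windows cp1252 default and that a tweet contains a character cp1252 cannot represent. The emoji below is a hypothetical stand-in; the actual characters at positions 64-65 of the asker's data are unknown.

```python
# Hypothetical tweet text containing a character outside cp1252
text = "I posted a new photo to Facebook \U0001F4F7"

try:
    text.encode("cp1252")  # what a cp1252 text-mode file does on write
    failure = None
except UnicodeEncodeError as exc:
    failure = exc  # keep a reference; exc is cleared after the except block
    print(exc)     # reports the position of the unencodable character

# UTF-8 can represent every code point, so this succeeds and round-trips.
encoded = text.encode("utf-8")
print(encoded.decode("utf-8") == text)
```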

Answer 1

hir*_*ist 85

You need to decode the bytes object to get the string you want:

b = b'1234'
print(b.decode('utf-8'))  # '1234'

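`.encode("utf-8").decode("utf-8")` does absolutely nothing (if it works at all)... You are using Python 3, right? Python 3 makes a strong distinction between `bytes` and `str`. Something in your code seems to use the cp1252 encoding... You could try opening the file with `open(..., mode='w', encoding='utf-8')` and writing only `str` to it; or forget about encodings entirely, open the file in binary mode with `open(..., mode='wb')` (note the `b`), and write only `bytes`. Does that help? (2 upvotes)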
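
The comment's first suggestion can be sketched as follows, assuming `outtweets` holds `str` values rather than encoded bytes. The temporary path and sample rows are hypothetical; `newline=""` follows the `csv` module documentation's advice for files passed to `csv.writer`.

```python
import csv
import os
import tempfile

# Hypothetical rows: the tweet text is kept as str, not encoded to bytes.
outtweets = [["I posted a new photo to Facebook"], ["caf\u00e9 \U0001F4F7"]]

path = os.path.join(tempfile.mkdtemp(), "example.csv")  # hypothetical location

# An explicit encoding overrides the platform default (cp1252 in the traceback).
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(outtweets)

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows == outtweets)
```

In Python 3, csv.writer writes `str` and needs a text-mode file, so the comment's binary-mode alternative (`mode='wb'`) fits code that writes `bytes` with `f.write()` directly, not code that goes through csv.writer.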

Answer 2

Jon*_*mar 12

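It's just letting you know that the object you're printing is not a string but a bytes object, displayed as a bytes literal. People explain this incompletely, so here's my take.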

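Consider creating a bytes object by typing a bytes literal (literally defining a bytes object without calling bytes(), e.g. by typing b'') and then converting it into a string object by decoding it as UTF-8. (Note that "converting" here means decoding.)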

byte_object = b"test"  # bytes object created by literally typing characters
print(byte_object)  # Prints b'test'
print(byte_object.decode('utf8'))  # Prints test, without quotation marks

As you can see, we simply apply the .decode('utf8') method.
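
To see that the b belongs only to how a bytes object is displayed, and is not part of its data, here is a short check on the same `byte_object` value:

```python
byte_object = b"test"

print(len(byte_object))   # 4: the b is not one of the stored bytes
print(byte_object[0])     # 116: indexing bytes yields integers (ord('t'))
print(repr(byte_object))  # b'test': the prefix comes from repr()

text = byte_object.decode("utf8")
print(repr(text))         # 'test': a str has no b prefix
```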

Bytes in Python:

https://docs.python.org/3.3/library/stdtypes.html#bytes

String and bytes literals are described by the following lexical definitions:

https://docs.python.org/3.3/reference/lexical_analysis.html#string-and-bytes-literals

stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "R" | "U"
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::=  shortstringchar | stringescapeseq
longstringitem  ::=  longstringchar | stringescapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
longstringchar  ::=  <any source character except "\">
stringescapeseq ::=  "\" <any source character>

bytesliteral   ::=  bytesprefix(shortbytes | longbytes)
bytesprefix    ::=  "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes     ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes      ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::=  shortbyteschar | bytesescapeseq
longbytesitem  ::=  longbyteschar | bytesescapeseq
shortbyteschar ::=  <any ASCII character except "\" or newline or the quote>
longbyteschar  ::=  <any ASCII character except "\">
bytesescapeseq ::=  "\" <any ASCII character>
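
One consequence of the grammar is that `shortbyteschar` only allows ASCII, so a bytes literal with a non-ASCII character is rejected at compile time. A small sketch that tests a few sample literals (the `classify` helper and the samples are hypothetical, chosen for illustration):

```python
def classify(src):
    """Return the value of the literal in src, or None if it does not compile."""
    try:
        return eval(compile(src, "<literal>", "eval"))
    except SyntaxError:
        return None

# Valid prefixes from bytesprefix, plus one literal with a non-ASCII character.
samples = ["b'abc'", "rb'a\\d'", "Br'x'", "b'caf\u00e9'"]
for src in samples:
    print(f"{src!r:12} -> {classify(src)!r}")
```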

Answer 3

小智 6

How to remove the b'' characters when decoding a string in Python:

import base64
a='cm9vdA=='
b=base64.b64decode(a).decode('utf-8')
print(b)
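
The answer works because base64.b64decode returns bytes, and .decode('utf-8') turns them into a str. A sketch that makes the intermediate types visible, reusing the answer's sample value; the round trip back to base64 is an illustrative addition:

```python
import base64

encoded = "cm9vdA=="
raw = base64.b64decode(encoded)  # bytes: b'root'
text = raw.decode("utf-8")       # str: 'root'

# b64encode also returns bytes, so decode again if a str is needed.
round_trip = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(raw, text, round_trip)
```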

Answer 4

sal*_*hed 5

You need to decode it to convert it into a string. See the answer about bytes literals in Python 3.

In [1]: b'I posted a new photo to Facebook'.decode('utf-8')
Out[1]: 'I posted a new photo to Facebook'
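
Applied to the question's CSV rows, a sketch assuming each row holds bytes produced by `.encode("utf-8")`; the sample rows are hypothetical. Only `bytes` cells are decoded, so cells that are already `str` pass through unchanged:

```python
# Hypothetical rows as they look after each tweet text was encoded to bytes.
raw_rows = [[b"I posted a new photo to Facebook"], [b"caf\xc3\xa9"], ["already text"]]

rows = [
    [cell.decode("utf-8") if isinstance(cell, bytes) else cell for cell in row]
    for row in raw_rows
]
print(rows)  # [['I posted a new photo to Facebook'], ['café'], ['already text']]
```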

Archived:	8 years, 9 months ago
Views:	67416
Last activity:	6 years, 10 months ago