Scraping error in Python: 'charmap' codec can't encode character / can't concat str to bytes

Ory*_*yon 2 python web-scraping python-requests

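I get the errors above when I try to scrape some text containing Finnish names from 'url'. The solutions I tried, and the error each one produced, are commented in the code below. I don't know how to fix these errors, or even what exactly the problem is. I'm a Python beginner. Any help is appreciated.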

My code:

from lxml import html
import requests

page = requests.get('url')

site = page.text  # ERROR -> 'charmap' codec can't encode character '\x84' in
                  # position {x}: character maps to <undefined>
# site = site.encode('utf-8', errors='replace')  # ERROR -> can't concat str to bytes
# site = site.encode('ascii', errors='replace')  # ERROR -> can't concat str to bytes

with open('url.txt', 'a') as file:
    try:
        file.write(site + '\n')
    except Exception as err:
        file.write('an ERROR occurred: ' + str(err) + '\n')

And the original exception:

Traceback (most recent call last):
  File "...\parse.py", line 12, in <module>
    file.write(site + '\n')
  File "...\python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x84' in position 12591: character maps to <undefined>
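For reference, the two messages are separate failures, and both can be reproduced without downloading anything. The traceback shows the first one is raised by file.write(), not by page.text: the file is opened with the Windows default cp1252, which has no mapping for U+0084. The second comes from the .encode() attempts, because str.encode() returns bytes and bytes cannot be concatenated with the str '\n'. A minimal sketch; "\x84" is the character from the traceback, while the "Nimi: " prefix and the two helper names are made up for illustration:

```python
# Reproduces both errors from the question without downloading anything.
# "\x84" is the character named in the traceback; the "Nimi: " prefix is made up.
site = "Nimi: \x84"


def cp1252_write_error(text):
    """Return the error a cp1252 text file would hit when writing text, or None."""
    try:
        text.encode("cp1252")  # roughly what file.write() does for a cp1252 file
    except UnicodeEncodeError as err:
        return err
    return None


def concat_error(text):
    """Return the error from the site.encode(...) attempts plus concatenation, or None."""
    data = text.encode("utf-8", errors="replace")  # str.encode() returns bytes
    try:
        data + "\n"  # bytes + str is not allowed in Python 3
    except TypeError as err:
        return err
    return None


print(repr(cp1252_write_error(site)))
print(repr(concat_error(site)))
```

Note that a character cp1252 does support, such as "Ä", encodes without error; only characters outside cp1252 trigger the first message.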

Regards, Dominik

Tri*_*sto 5

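Try opening the file with an explicit UTF-8 encoding. Without the encoding argument, open() falls back to the platform default, which on your machine is cp1252 (note cp1252.py in the traceback), and cp1252 has no mapping for '\x84':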

with open('url.txt', 'a', encoding='utf-8') as file:
    file.write(site + '\n')
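A slightly fuller sketch of the same fix, split into two small functions. The names fetch_text and append_line are hypothetical, not part of requests. The file is always written as UTF-8 regardless of the Windows default. The commented-out line covers a separate, unconfirmed possibility: if the names come out garbled, requests may have guessed the page's charset wrong.

```python
def fetch_text(url):
    """Download url and return page.text (needs the third-party requests package)."""
    import requests  # imported lazily so append_line() works without requests installed

    page = requests.get(url, timeout=10)
    page.raise_for_status()
    # Optional: requests falls back to ISO-8859-1 for text/* responses that declare
    # no charset. If names look garbled (e.g. "Ã¤" instead of "ä"), using the
    # content-based guess before reading .text may help:
    # page.encoding = page.apparent_encoding
    return page.text


def append_line(path, text):
    """Append text and a newline to path, always encoded as UTF-8."""
    with open(path, "a", encoding="utf-8") as file:
        file.write(text + "\n")
```

Usage: append_line('url.txt', fetch_text('url')), with 'url' replaced by the real address. Reading the file back later also needs encoding='utf-8', otherwise Windows will again decode it as cp1252.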