Python web-scraping error - TypeError: can't use a string pattern on a bytes-like object

Question

Jtw*_*twa 4 findall scraper web-scraping python-3.x

I want to build a web scraper. I'm currently learning Python, so this is very basic!

Python code

import urllib.request
import re

htmlfile = urllib.request.urlopen("http://basketball.realgm.com/")

htmltext = htmlfile.read()
title = re.findall('<title>(.*)</title>', htmltext)

print (htmltext)

Error:

  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

Answer 1

tim*_*geb 5

You have to decode the data. Since the website in question declares

charset=iso-8859-1

use that encoding. utf-8 does not work in this case.

htmltext = htmlfile.read().decode('iso-8859-1')
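
To see the decode step in isolation, here is a minimal sketch that runs without a network connection. The `raw` bytes are a hypothetical stand-in for what `htmlfile.read()` returns (the real page content is not reproduced here), and `extract_titles` is a helper name invented for this illustration. It shows the original error, the decode fix, why utf-8 fails on such data, and the alternative of matching with a bytes pattern.

```python
import re

# Hypothetical sample standing in for htmlfile.read(); not the real page.
# The byte 0xE9 is "é" in iso-8859-1.
raw = b"<html><head><title>Caf\xe9 Hoops</title></head></html>"


def extract_titles(raw_bytes, encoding="iso-8859-1"):
    """Decode the raw response bytes to str, then apply a str pattern."""
    text = raw_bytes.decode(encoding)
    # Non-greedy group so the match stops at the first closing tag.
    return re.findall(r"<title>(.*?)</title>", text)


# A str pattern applied to bytes reproduces the TypeError from the question.
try:
    re.findall("<title>(.*)</title>", raw)
    str_pattern_on_bytes_failed = False
except TypeError:
    str_pattern_on_bytes_failed = True

# Fix from the answer: decode first, then search.
titles = extract_titles(raw)

# Alternative: leave the data as bytes and use a bytes pattern (rb"...").
titles_bytes = re.findall(rb"<title>(.*?)</title>", raw)

# utf-8 rejects this sample: 0xE9 starts a multi-byte sequence in UTF-8,
# but the byte that follows is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

With a live response, `htmlfile.headers.get_content_charset()` returns the charset named in the Content-Type header, or None if the server sends none, so it may be worth checking before hard-coding an encoding. Whether this particular site declares its charset in the header or only in the page markup is not stated here, so treat that call as something to try rather than a guarantee.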
