Related troubleshooting solutions (0)

UnicodeDecodeError when reading a CSV file in Pandas with Python

I'm running a program that processes 30,000 similar files. A random number of them stop and produce this error...

   File "C:\Importer\src\dfman\importer.py", line 26, in import_chr
     data = pd.read_csv(filepath, names=fields)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
     return _read(filepath_or_buffer, kwds)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
     return parser.read()
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
     ret = self._engine.read(nrows)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
     data = self._reader.read(nrows)
   File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
   File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:6964)
   File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas\parser.c:7780)
   File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:8793)
   File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens …
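A traceback ending in `_convert_tokens` like this one usually means some of the files are not valid UTF-8. As a minimal sketch (not taken from the original answers), the hypothetical helper `detect_encoding` below tries a short list of candidate encodings and returns the first one that decodes the whole file. The candidate list is an assumption; adjust it to whatever actually produced the files.

```python
from pathlib import Path

# Assumed candidates: UTF-8 first, then two common Windows/Western encodings.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def detect_encoding(path, candidates=CANDIDATE_ENCODINGS):
    """Return the first candidate encoding that decodes the file without error."""
    raw = Path(path).read_bytes()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
        return name
    raise ValueError("none of the candidate encodings could decode %s" % path)
```

The result could then be handed to pandas through its `encoding` parameter, for example `pd.read_csv(filepath, names=fields, encoding=detect_encoding(filepath))`. Note that `latin-1` accepts any byte sequence, so reaching it only guarantees that decoding succeeds, not that the characters are the intended ones.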

python csv unicode dataframe pandas

329
Votes
13
Answers
320k
Views

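'utf8' codec can't decode byte 0x96 in Python

I am trying to check whether a certain word appears on the pages of many websites. The script runs fine for about 15 sites and then stops.

UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15344: invalid start byte

I searched Stack Overflow and found many questions about this, but I can't work out what is going wrong in my case.

I would like to either fix it or, if there is an error, skip that site. Please tell me how to do this, as I am new to this and the code below alone took me a day to write. By the way, the site the script stopped on was http://www.homestead.com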

import re
import urllib

# Python 2 script, kept as posted apart from the imports, comments and indentation.
filetocheck = open("bloglistforcommenting","r")
resultfile = open("finalfile","w")

for countofsites in filetocheck.readlines():
    sitename = countofsites.strip()
    htmlfile = urllib.urlopen(sitename)
    # This decode raises UnicodeDecodeError when the page is not valid UTF-8.
    page = htmlfile.read().decode('utf8')
    match = re.search("Enter your name", page)
    if match:
        print "match found  : " + sitename
        resultfile.write(sitename+"\n")

    else:
        print "sorry did not find the pattern " +sitename

print "Finished Operations"
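Byte 0x96 is not a valid UTF-8 start byte, but in Windows-1252 (`cp1252`) it is an en dash, so the page was probably served in that encoding. The following is a minimal Python 3 sketch, not the asker's final code, of decoding with a fallback so that one badly encoded page does not stop the whole run. The names `decode_page` and `contains_phrase` and the fallback order are assumptions made for illustration.

```python
def decode_page(raw, encodings=("utf-8", "cp1252")):
    """Decode raw page bytes, trying each encoding in turn."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


def contains_phrase(raw, phrase="Enter your name"):
    """Return True if the phrase occurs in the decoded page."""
    return phrase in decode_page(raw)
```

In the loop above, `page = decode_page(htmlfile.read())` would take the place of the hard-coded `.decode('utf8')`. To skip a site entirely instead, the decode could be wrapped in `try`/`except UnicodeDecodeError` followed by `continue`.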

Following Mark's comment, I changed the code to use BeautifulSoup:

htmlfile = urllib.urlopen("http://www.homestead.com")
page = BeautifulSoup((''.join(htmlfile)))
print page.prettify() 

Now I am getting this error:

page = BeautifulSoup((''.join(htmlfile)))
TypeError: 'module' object is not callable
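The import line is not shown in the question, so the cause can only be guessed. This message generally means the name being called refers to a module rather than a class. In BeautifulSoup 3 the module and the class share the name `BeautifulSoup`, so `import BeautifulSoup` followed by `BeautifulSoup(...)` calls the module, whereas `from BeautifulSoup import BeautifulSoup` (or `from bs4 import BeautifulSoup` with the newer package) binds the class. The sketch below reproduces the same pattern with the standard library's `datetime`, used here purely as a stand-in; the function names are hypothetical.

```python
import datetime  # binds the name "datetime" to the module, not the class


def call_module_by_mistake():
    """Call the module itself and return the resulting error message."""
    try:
        datetime(2020, 1, 1)  # wrong: a module object is not callable
    except TypeError as exc:
        return str(exc)
    return None


def call_class():
    """Reach the class through the module; the class is callable."""
    return datetime.datetime(2020, 1, 1)
```

Calling `call_module_by_mistake()` returns a message containing `'module' object is not callable`, while `call_class()` succeeds because the class is looked up explicitly.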

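I am trying the Quick Start example from http://www.crummy.com/software/BeautifulSoup/documentation.html#Quick%20Start. If I copy and paste it, the code works fine.

I finally got it working. Thanks to everyone for the help. Here is the final code.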

import urllib
import re …

python

24
Votes
2
Answers
30k
Views

Tag statistics

python ×2

csv ×1

dataframe ×1

pandas ×1

unicode ×1