I am running a program that processes 30,000 similar files. A random number of them stop and produce this error...
  File "C:\Importer\src\dfman\importer.py", line 26, in import_chr
    data = pd.read_csv(filepath, names=fields)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
    return parser.read()
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
  File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:6964)
  File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas\parser.c:7780)
  File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:8793)
  File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens …

I am trying to check whether a certain word appears on the pages of many websites. The script runs fine for 15 sites and then stops.
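UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15344: invalid start byte
I searched Stack Overflow and found many questions about this, but I can't work out what is going wrong in my case.
I would like either to fix it or, if there is an error, to skip that site. Please tell me how to do this, as I am new to this and the code below took me a day to write. By the way, the site the script stopped on was http://www.homestead.com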
import re
import urllib

# Python 2 script: check each listed site for the phrase "Enter your name".
filetocheck = open("bloglistforcommenting", "r")
resultfile = open("finalfile", "w")

for countofsites in filetocheck.readlines():
    sitename = countofsites.strip()
    htmlfile = urllib.urlopen(sitename)
    # This line raises UnicodeDecodeError when the page is not valid UTF-8.
    page = htmlfile.read().decode('utf8')
    match = re.search("Enter your name", page)
    if match:
        print "match found : " + sitename
        resultfile.write(sitename + "\n")
    else:
        print "sorry did not find the pattern " + sitename

print "Finished Operations"
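For the "fix it or skip the site" part, here is a minimal sketch of one possible approach, not the accepted answer. It assumes Python 3 byte strings and that pages which are not UTF-8 are most likely Windows-1252, where byte 0x96 is an en dash. The helper name `decode_page` and the sample bytes are hypothetical and only there for illustration.

```python
def decode_page(raw, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn.

    Returns (text, encoding_used), or (None, None) when every
    candidate fails, so the caller can skip that site.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not this encoding, try the next candidate
    return None, None


# Hypothetical page body containing the byte from the error message.
sample = b"Enter your name \x96 required"
text, used = decode_page(sample)

if text is None:
    print("skipping site: could not decode")
elif "Enter your name" in text:
    print("match found, decoded as " + used)
```

In the loop from the question, a site for which `decode_page` returns `None` could simply be passed over with `continue`. If losing a few characters is acceptable, `raw.decode("utf-8", errors="replace")` is a simpler alternative that never raises.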
Following Mark's comment, I changed the code to use BeautifulSoup
htmlfile = urllib.urlopen("http://www.homestead.com")
page = BeautifulSoup((''.join(htmlfile)))
print page.prettify()
Now I get this error
page = BeautifulSoup((''.join(htmlfile)))
TypeError: 'module' object is not callable
I am trying the Quick Start example from http://www.crummy.com/software/BeautifulSoup/documentation.html#Quick%20Start. If I copy and paste it, the code works fine.
I finally got it working. Thanks to everyone for the help. Here is the final code.
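The import line is not shown above, so this is only a guess at the cause. "'module' object is not callable" usually means a module was called where a class of the same name was intended, for example `import BeautifulSoup` instead of `from BeautifulSoup import BeautifulSoup`. The sketch below reproduces the same mistake with the standard library's `datetime`, which also has a module and a class sharing one name. The alias `DateTime` is hypothetical.

```python
import datetime

message = ""
try:
    # Calls the module itself, not the class inside it.
    datetime(2020, 1, 1)
except TypeError as exc:
    message = str(exc)
print(message)

# Importing the class from the module makes the call work.
from datetime import datetime as DateTime

stamp = DateTime(2020, 1, 1)
print(stamp.year)
```

If that guess is right, importing the class from the module should be enough to make the snippet above run.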
import urllib
import re …