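I have seen this error on the forum and read the replies, but I still do not understand what it is or how to fix it. I am scraping data from 16k links on the internet; my script extracts similar information from each link and writes it to a .csv file, and some of the data is written before this error occurs.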
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 541, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 508, in _read_next_chunk_size
return int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 558, in _readall_chunked
chunk_left = self._get_chunk_left()
File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 543, in _get_chunk_left
raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception …

I am trying to find a way to pull out some links and their associated text with Beautiful Soup. The HTML is as follows:
<tr>
<td align="left" bgcolor="#ffff99">
<font size="2">
<a href="link/I/Want.htm">
<b>Text I Want</b>
</a>
</font>
</td>
<tr>
<td align="left" bgcolor="#ffff99">
<font size="2">
<a href="link/I/Want.htm2">
<b>Text I Want2</b>
</a>
</font>
</td>
I can pull the links without any problem:
soup.find_all('a', href=re.compile('link/I/Want'))
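But I would like to be able to pull the text as well and associate it with its link: either have them back to back in one list, or put them in separate lists in the same order so that I can use the zip() function.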
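One possible way to get both layouts is sketched below. It reuses the same `find_all` call and reads the `href` attribute and the visible text from each matched `<a>` tag. The variable names (`html`, `anchors`, `pairs`, `links`, `texts`) and the choice of the built-in `"html.parser"` backend are assumptions made for illustration only; the question does not say how `soup` was built.

```python
import re

from bs4 import BeautifulSoup

# Sample markup copied from the question (assumed to be representative).
html = """
<tr>
<td align="left" bgcolor="#ffff99">
<font size="2">
<a href="link/I/Want.htm">
<b>Text I Want</b>
</a>
</font>
</td>
<tr>
<td align="left" bgcolor="#ffff99">
<font size="2">
<a href="link/I/Want.htm2">
<b>Text I Want2</b>
</a>
</font>
</td>
"""

# Assumption: the standard-library parser backend; any installed backend may be used.
soup = BeautifulSoup(html, "html.parser")

# Same search as in the question: every <a> whose href matches the pattern.
anchors = soup.find_all("a", href=re.compile("link/I/Want"))

# Layout 1: link and text back to back, as (href, text) tuples.
# get_text(strip=True) drops the whitespace around the nested <b> text.
pairs = [(a["href"], a.get_text(strip=True)) for a in anchors]

# Layout 2: two separate sequences in the same order, suitable for zip().
links, texts = zip(*pairs) if pairs else ((), ())

for link, text in zip(links, texts):
    print(link, text)
```

Because each tuple is built from a single tag, the link and its text cannot drift out of order, which may make the paired list the safer of the two layouts.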