I'm trying to use Beautiful Soup to scrape a table on an AJAX page and print it out in tabular form with the texttable library.
import BeautifulSoup
import urllib
import urllib2
import getpass
import cookielib
import texttable
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
...
def show_queue():
    url = 'https://www.animenfo.com/radio/nowplaying.php'
    values = {'ajax' : 'true', 'mod' : 'queue'}
    data = urllib.urlencode(values)
    f = opener.open(url, data)
    soup = BeautifulSoup.BeautifulSoup(f)
    stable = soup.find('table')
    table = texttable.Texttable()
    header = stable.findAll('th')
    header_text = []
    for th in header:
        header_append = th.find(text=True)
        header.append(header_append)
    table.header(header_text)
    rows = stable.find('tr')
    for tr in rows:
        cells = []
        cols = tr.find('td')
        for td in cols:
            cells_append = td.find(text=True)
            cells.append(cells_append)
        table.add_row(cells)
    s = table.draw
    print s
...
The URL of the page I'm trying to scrape is in the code above, but here is a sample of the relevant HTML:
<table cellspacing="0" cellpadding="0">
<tbody>
<tr>
<th>Artist - Title</th>
<th>Album</th>
<th>Album Type</th>
<th>Series</th>
<th>Duration</th>
<th>Type of Play</th>
<th>
<span title="...">Time to play</span>
</th>
</tr>
<tr>
<td class="row1">
<a href="..." class="songinfo">Song 1</a>
</td>
<td class="row1">
<a href="..." class="album_link">Album 1</a>
</td>
<td class="row1">...</td>
<td class="row1">
</td>
<td class="row1" style="text-align: center">
5:43
</td>
<td class="row1" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row1" style="text-align: center">
~0:00:00
</td>
</tr>
<tr>
<td class="row2">
<a href="..." class="songinfo">Song2</a>
</td>
<td class="row2">
<a href="..." class="album_link">Album 2</a>
</td>
<td class="row2">...</td>
<td class="row2">
</td>
<td class="row2" style="text-align: center">
6:16
</td>
<td class="row2" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row2" style="text-align: center">
~0:05:43
</td>
</tr>
<tr>
<td class="row1">
<a href="..." class="songinfo">Song 3</a>
</td>
<td class="row1">
<a href="..." class="album_link">Album 3</a>
</td>
<td class="row1">...</td>
<td class="row1">
</td>
<td class="row1" style="text-align: center">
4:13
</td>
<td class="row1" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row1" style="text-align: center">
~0:11:59
</td>
</tr>
<tr>
<td class="row2">
<a href="..." class="songinfo">Song 4</a>
</td>
<td class="row2">
<a href="..." class="album_link">Album 4</a>
</td>
<td class="row2">...</td>
<td class="row2">
</td>
<td class="row2" style="text-align: center">
5:34
</td>
<td class="row2" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row2" style="text-align: center">
~0:16:12
</td>
</tr>
<tr>
<td class="row1"><a href="..." class="songinfo">Song 5</a>
</td>
<td class="row1">
<a href="..." class="album_link">Album 5</a>
</td>
<td class="row1">...</td>
<td class="row1"></td>
<td class="row1" style="text-align: center">
4:23
</td>
<td class="row1" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row1" style="text-align: center">
~0:21:46
</td>
</tr>
<tr>
<td style="height: 5px;">
</td></tr>
<tr>
<td class="row2" style="font-style: italic; text-align: center;" colspan="5">There are x songs in the queue with a total length of x:y:z.</td>
</tr>
</tbody>
</table>
Whenever I try to run this script, the function aborts with TypeError: find() takes no keyword arguments on the line header_append = th.find(text=True). I'm somewhat confused, because I seem to be doing what the code examples show, and it looks like it should work, but it doesn't.
In short, what am I doing wrong, and how do I fix the code so that the TypeError goes away?
Edit: These are the articles and documentation I referenced while writing the script:
The reason you're getting TypeError: find() takes no keyword arguments is that you're actually calling find() on a string.
find is a Python string method, and it takes no keyword arguments. Example:
>>> 'hello'.find('l')
2
>>> 'hello'.find('l', foo='bar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: find() takes no keyword arguments
BeautifulSoup Tag objects also have a find method, and that is the one you want to be using.
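For contrast, here is the same kind of call on a Tag, which does accept keyword arguments (a minimal sketch using the BeautifulSoup 3 import from your script; the markup is made up for illustration):

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<tr><th>Artist - Title</th></tr>')
>>> th = soup.find('th')      # th is a Tag, not a string
>>> th.find(text=True)        # Tag.find happily takes keyword arguments
u'Artist - Title'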
At some point in your code, you end up calling string find on something you expected to be a Tag.
Python uses duck typing, which can lead to exactly this kind of confusion.
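In your script, that is exactly what happens in the header loop: header.append(header_append) pushes each extracted string back onto the header list you are iterating over, so later iterations of for th in header hand you strings instead of th tags, and the string find() then blows up on text=True. Below is a sketch of the corrected extraction, assuming the rest of your setup stays the same; it appends to header_text instead of header, switches find to findAll where you want every row and cell, and calls draw() as a method:

header = stable.findAll('th')
header_text = []
for th in header:
    header_text.append(th.find(text=True))      # append to header_text, not header

table.header(header_text)

for tr in stable.findAll('tr'):                 # findAll returns every row; find returns only the first
    cells = []
    for td in tr.findAll('td'):
        cells.append(td.find(text=True) or '')  # turn empty cells into '' rather than None
    if len(cells) == len(header_text):          # skip the header row and the spacer/summary rows
        table.add_row(cells)

s = table.draw()                                # draw is a method, so call it
print s

The length check is just one convenient way to drop the non-data rows; filtering on the td class attributes would work as well.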