BeautifulSoup 4 + Python: string returns "None"

Question


cro*_*eaf 2 python parsing beautifulsoup html-parsing

I am trying to parse some HTML with BeautifulSoup4 and Python 2.7.6, but the string comes back as "None". The HTML I am trying to parse is:

<div class="booker-booking">
    2&nbsp;rooms
    &#0183;
    USD&nbsp;0
    <!-- Commission: USD  -->
</div>


My Python snippet is:

 data = soup.find('div', class_='booker-booking').string


I have also tried the following two:

data = soup.find('div', class_='booker-booking').text
data = soup.find('div', class_='booker-booking').contents[0]


Both return:

u'\n\t\t2\xa0rooms \n\t\t\xb7\n\t\tUSD\xa00\n\t\t\n


Ultimately I want to put the first line into a variable that contains just "2 Rooms" and the third line into another variable that contains just "USD 0".

Answer 1

jfs*_*jfs 5

.string returns None because the text node is not the only child of the div (there is also a comment).
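As a minimal sketch of that behaviour (written for Python 3 and the built-in "html.parser" backend, both of which are my assumptions rather than part of the question), the following compares a div that holds only text with one that also holds a comment. The variable names are purely illustrative.

```python
from bs4 import BeautifulSoup

# A div whose children are a text node AND a comment node.
with_comment = BeautifulSoup(
    "<div>2 rooms<!-- Commission --></div>", "html.parser"
).div

# A div whose only child is a text node.
text_only = BeautifulSoup("<div>2 rooms</div>", "html.parser").div

print(len(with_comment.contents))  # 2: the text node and the comment
print(with_comment.string)         # None, since there is more than one child
print(text_only.string)            # 2 rooms
```

So .string is only useful when a tag has exactly one child; otherwise the text nodes have to be collected explicitly.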


To join the text nodes while skipping the comment:

from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup(html)
div = soup.find('div', 'booker-booking')
# remove comments
text = " ".join(div.find_all(text=lambda t: not isinstance(t, Comment)))
# -> u'\n    2\xa0rooms\n    \xb7\n    USD\xa00\n     \n'


To remove the Unicode whitespace:


text = " ".join(text.split())
# -> u'2 rooms \xb7 USD 0'
print text
# -> 2 rooms · USD 0


To get the final variables:


var1, var2 = [s.strip() for s in text.split(u"\xb7")]
# -> u'2 rooms', u'USD 0'
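For readers on Python 3, the whitespace cleanup and the split can be sketched with the standard library alone. This is an illustration rather than part of the answer: split_booking is a hypothetical helper name, and using str.partition instead of split is my own choice, so that a string with no middle dot yields an empty second value instead of failing on unpacking.

```python
def split_booking(text, sep="\u00b7"):
    """Collapse whitespace, then split once on the middle dot.

    Hypothetical helper: the name and the `sep` parameter are
    illustrative and do not come from the original answer.
    """
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the non-breaking space (\xa0) produced by &nbsp;.
    normalized = " ".join(text.split())
    left, _, right = normalized.partition(sep)
    return left.strip(), right.strip()


# The joined text from the div, as shown earlier in the answer.
raw = "\n    2\xa0rooms\n    \xb7\n    USD\xa00\n     \n"
rooms, price = split_booking(raw)
print(rooms)  # 2 rooms
print(price)  # USD 0
```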

