How to extract JSON from a script tag with BeautifulSoup in Python?

fre*_*123 4 html python json beautifulsoup web-scraping

I want to extract reviewCount from the script tag below using Beautiful Soup. I have tried several approaches, but none of them worked.

<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>
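For reference, the body of the tag is plain JSON, so once the tag's text is available the standard `json` module can handle the rest. The sketch below parses the payload from the question directly as a string, with no HTML parsing involved. Note that `reviewCount` is a JSON string (`"573"`), not a number, so it needs an explicit conversion if you want to do arithmetic with it.

```python
import json

# The text content of the <script> tag from the question, as a Python string.
payload = '{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}'

data = json.loads(payload)

# "languages" is a list of objects, and each one carries its own reviewCount.
# The values are strings in the JSON, so convert them to int explicitly.
total = next(
    int(lang["reviewCount"])
    for lang in data["languages"]
    if lang["isoCode"] == "all"
)
print(total)  # 573
```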

Jam*_*wis 5

This should work, though I'm quite sure there is a more elegant way:

import json
from bs4 import BeautifulSoup

html = '''
<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>
'''

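soup = BeautifulSoup(html, 'html.parser')
# find() returns the first <script> tag; on a full page, narrow it down, e.g.
# soup.find('script', attrs={'data-initial-state': 'review-filter'})
res = soup.find('script')
# The tag's only child is the JSON text, which json.loads turns into a dict.
json_object = json.loads(res.contents[0])

# Each entry in "languages" has its own reviewCount (stored as a string).
for language in json_object['languages']:
    print('{}: {}'.format(language['displayName'], language['reviewCount']))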

Output:

Toutes les langues: 573
français: 567
English: 6
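The answer notes that a more elegant way probably exists. One possible variant is sketched below, assuming the page contains a single `<script>` tag with `data-initial-state="review-filter"`. It selects that tag by attribute with a CSS selector (`select_one`), so other `<script>` tags on a real page are ignored, reads its text via `.string`, and returns the counts as integers keyed by `isoCode`. The helper name `review_counts` is hypothetical, not part of BeautifulSoup.

```python
import json

from bs4 import BeautifulSoup

html = '''
<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>
'''


def review_counts(page_html):
    """Hypothetical helper: map isoCode -> reviewCount (as int) from the review-filter tag."""
    soup = BeautifulSoup(page_html, "html.parser")
    # Match on the data attribute rather than taking the first <script> tag.
    tag = soup.select_one('script[data-initial-state="review-filter"]')
    if tag is None or tag.string is None:
        return {}
    # json.loads tolerates the surrounding newlines inside the tag.
    data = json.loads(tag.string)
    return {lang["isoCode"]: int(lang["reviewCount"]) for lang in data["languages"]}


print(review_counts(html))  # {'all': 573, 'fr': 567, 'en': 6}
```

Returning an empty dict when the tag is missing is a design choice for this sketch; raising an exception instead may be preferable if a missing tag means the page layout has changed.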