
Extracting the contents of a <script> tag with BeautifulSoup

1/ I'm trying to extract part of a script with BeautifulSoup, but it prints nothing. What's wrong?

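import urllib2
from bs4 import BeautifulSoup

URL = "http://www.reuters.com/video/2014/08/30/woman-who-drank-restaurants-tainted-tea?videoId=341712453"
oururl = urllib2.urlopen(URL).read()
soup = BeautifulSoup(oururl)

# Note: extract() removes each <script> tag from the parse tree,
# so the findAll("script") call below has nothing left to find.
for script in soup("script"):
    script.extract()

list_of_scripts = soup.findAll("script")
print list_of_scripts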
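In the question's code, the loop calls `extract()` on every `<script>` tag, which detaches it from the tree before `findAll("script")` runs, so there is nothing left to print. The intended step is to keep the tags and pick the one whose `type` is `application/ld+json`. With BeautifulSoup that would likely just mean dropping the `extract()` loop and using `soup.find("script", type="application/ld+json")`. The sketch below shows the same idea with Python's standard `html.parser` instead of BeautifulSoup, so it runs without third-party packages. The class `ScriptCollector`, the helper `find_scripts`, and the sample HTML are hypothetical, for illustration only.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class ScriptCollector(HTMLParser):
    """Hypothetical helper: records (attrs, text) for every <script> element
    and leaves the document untouched (nothing is removed)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.scripts = []
        self._in_script = False
        self._attrs = {}
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        # Only keep text that sits inside a <script> element
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append((self._attrs, "".join(self._chunks)))
            self._in_script = False


def find_scripts(html, script_type=None):
    """Return the text of each <script>, optionally filtered by its type attribute."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return [text for attrs, text in parser.scripts
            if script_type is None or attrs.get("type") == script_type]


# Hypothetical sample page with one external script and one JSON-LD block
sample = """<html><head>
<script src="app.js"></script>
<script type="application/ld+json">{"@type": "VideoObject"}</script>
</head><body></body></html>"""

print(find_scripts(sample))
print(find_scripts(sample, "application/ld+json"))
```

Filtering on the `type` attribute matters because a typical page has many `<script>` tags, and only the JSON-LD one holds the structured video data.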

2/ The goal is to extract the value of the "transcript" property:

<script type="application/ld+json">
{
    "@context": "http://schema.org",
    "@type": "VideoObject",
    "video": {
        "@type": "VideoObject",
        "headline": "Woman who drank restaurant&#039;s tainted tea hopes for industry...",
        "caption": "Woman who drank restaurant&#039;s tainted tea hopes for industry...",  
        "transcript": "Jan Harding is speaking out for the first time about the ordeal that changed her life.               SOUNDBITE: JAN HARDING, DRANK TAINTED TEA, SAYING:               \"Immediately my …
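Because the contents of that `<script type="application/ld+json">` tag are JSON, one reasonable approach, once the script text has been obtained, is to parse it with the standard `json` module and read `video.transcript`, rather than slicing the string by hand. The transcript contains escaped quotes (`\"`), which a regex would easily mishandle. The sketch below assumes the script text is already in a string. The helper name `get_transcript` and the shortened sample JSON are hypothetical. The headline and caption in the page show HTML entities such as `&#039;`, so the sketch also decodes entities in case the transcript contains them.

```python
import json

try:
    from html import unescape  # Python 3.4+
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def get_transcript(ld_json_text):
    """Hypothetical helper: parse the text of a JSON-LD script block and
    return video.transcript, or None if either key is missing."""
    data = json.loads(ld_json_text)
    transcript = data.get("video", {}).get("transcript")
    # Decode HTML entities such as &#039; (apostrophe), if any are present
    return unescape(transcript) if transcript is not None else None


# Hypothetical, shortened sample shaped like the JSON in the question
sample = """{
    "@context": "http://schema.org",
    "@type": "VideoObject",
    "video": {
        "@type": "VideoObject",
        "headline": "Woman who drank restaurant&#039;s tainted tea hopes for industry...",
        "transcript": "Jan Harding is speaking out for the first time about the ordeal that changed her life."
    }
}"""

print(get_transcript(sample))
```

Using `.get()` with a default keeps the helper from raising `KeyError` on pages whose JSON-LD block has no `video` object.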

python beautifulsoup python-2.7
