Posts by lai*_*o b

Extracting the contents of a <script> tag with BeautifulSoup

1/ I am trying to extract part of a script using Beautiful Soup, but it prints nothing. What's wrong?

import urllib2
from bs4 import BeautifulSoup  # assumed: BeautifulSoup 4 (the post does not show its imports)

URL = "http://www.reuters.com/video/2014/08/30/woman-who-drank-restaurants-tainted-tea?videoId=341712453"
oururl = urllib2.urlopen(URL).read()
soup = BeautifulSoup(oururl)

for script in soup("script"):
    script.extract()

list_of_scripts = soup.findAll("script")
print list_of_scripts
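
A likely reason for the empty output is that `extract()` detaches each tag from the parse tree, so no `<script>` tags are left by the time `findAll` runs. Here is a minimal sketch of that behaviour, assuming BeautifulSoup 4 on Python 3. `SAMPLE_HTML` is a hypothetical stand-in for the downloaded page, so nothing is fetched from the network.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real Reuters markup).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "VideoObject"}</script>
<script>var x = 1;</script>
</head><body><p>Hello</p></body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect the tags first, while they are still part of the tree.
scripts_before = soup.find_all("script")

# Same loop as in the question: extract() removes each tag from the tree.
for script in soup("script"):
    script.extract()

# Searching again finds nothing, because the tags are gone.
scripts_after = soup.find_all("script")

print(len(scripts_before))  # 2
print(len(scripts_after))   # 0
```

If the aim is to read the scripts rather than strip them, dropping the `extract()` loop and calling `find_all("script")` directly should be enough.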


2/ The goal is to extract the value of the "transcript" attribute:

<script type="application/ld+json">
{
    "@context": "http://schema.org",
    "@type": "VideoObject",
    "video": {
        "@type": "VideoObject",
        "headline": "Woman who drank restaurant&#039;s tainted tea hopes for industry...",
        "caption": "Woman who drank restaurant&#039;s tainted tea hopes for industry...",  
        "transcript": "Jan Harding is speaking out for the first time about the ordeal that changed her life.               SOUNDBITE: JAN HARDING, DRANK TAINTED TEA, SAYING:               \"Immediately my …
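
One possible way to get at that value is to select the `application/ld+json` script tag and parse its text as JSON. This is only a sketch, assuming BeautifulSoup 4 on Python 3 and assuming the block is valid JSON with "transcript" nested under "video", as in the excerpt above. The helper name `extract_transcript` and the sample values in `SAMPLE_HTML` are hypothetical placeholders, not the real page content.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical sample mirroring the structure of the excerpt; values are placeholders.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
    "@context": "http://schema.org",
    "@type": "VideoObject",
    "video": {
        "@type": "VideoObject",
        "headline": "Sample headline",
        "transcript": "Sample transcript text."
    }
}
</script>
</head><body></body></html>
"""


def extract_transcript(html):
    """Return the nested "transcript" value, or None if it cannot be found."""
    soup = BeautifulSoup(html, "html.parser")
    # Match only the script tag that carries the JSON-LD payload.
    tag = soup.find("script", attrs={"type": "application/ld+json"})
    if tag is None or not tag.string:
        return None
    # The tag's text is plain JSON, so the json module can parse it.
    data = json.loads(tag.string)
    return data.get("video", {}).get("transcript")


transcript = extract_transcript(SAMPLE_HTML)
print(transcript)
```

On the real page, the downloaded HTML would be passed in place of `SAMPLE_HTML`. Entities such as `&#039;` inside the JSON strings would still need to be unescaped separately, for example with `html.unescape`.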


python beautifulsoup python-2.7

lai*_*o b

2014 10-04

8
votes

2
answers

10k
views
