
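Extracting data from a script tag with BeautifulSoup in Python

I want to use BeautifulSoup in Python to extract the "SNG_TITLE" and "ART_NAME" values from the code inside a "script" tag. (The whole script is too long to paste.)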

<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_ID":"126884459","PRODUCT_TRACK_ID":"360276641","UPLOAD_ID":0,"SNG_TITLE":"Heathens","ART_ID":"647650","PROVIDER_ID":"3","ART_NAME":"Twenty One Pilots","ARTISTS":[{"ART_ID":"647650","ROLE_ID":"0","ARTISTS_SONGS_ORDER":"1","ART_NAME":"Twenty One Pilots","ART_PICTURE":"259dcf52853363d79753ec301377645d","SMARTRADIO":"1","RANK":"487762","LOCALES":[],"__TYPE__":"artist"}],"ALB_ID":"13371165","ALB_TITLE":"Heathens","TYPE":0,"MD5_ORIGIN":"5cea723b83af1ff0a62d65d334b978d4","VIDEO":false,"DURATION":"195","ALB_PICTURE":"3dfc8c9e406cf1bba8ce0695a44a9b7e","ART_PICTURE":"259dcf52853363d79753ec301377645d","RANK_SNG":"967143","SMARTRADIO":"1","FILESIZE_AAC_64":0,"FILESIZE_MP3_64":"0","FILESIZE_MP3_128":"3135946","FILESIZE_MP3_256":0,"FILESIZE_MP3_320":"7839868","FILESIZE_FLAC":"21777150","FILESIZE":"3135946","GAIN":"-12","MEDIA_VERSION":"4","DISK_NUMBER":"1","TRACK_NUMBER":"1","VERSION":"","EXPLICIT_LYRICS":"0","RIGHTS":{"STREAM_ADS_AVAILABLE":true,"STREAM_ADS":"2000-01-01","STREAM_SUB_AVAILABLE":true,"STREAM_SUB":"2000-01-01"},"ISRC":"USAT21601930","DATE_ADD":1497886149,"HIERARCHICAL_TITLE":"","SNG_CONTRIBUTORS":{"mainartist":["Twenty One Pilots"],"engineer":["Adam Hawkins"],"mixer":["Adam Hawkins"],"masterer":["Chris Gehringer"],"drums":["Josh Dun"],"producer":["Mike Elizondo","Tyler Joseph"],"programmer":["Mike Elizondo","Tyler Joseph"],"vocals":["Tyler Joseph"],"writer":["Tyler Joseph"]},"LYRICS_ID":30553991,"__TYPE__":"song"},{"SNG_ID":"99976952","PRODUCT_TRACK_ID":"171067651","UPLOAD_ID":0,"SNG_TITLE":"Stressed Out","ART_ID":"647650","PROVIDER_ID":"3","ART_NAME":"Twenty One Pilots","ARTISTS":[{"ART_ID":"647650","ROLE_ID":"0","ARTISTS_SONGS_ORDER":"1","ART_NAME":"Twenty One Pilots", ...</script>

The idea of the code is to print the user name and every song title and artist name that can be found on a given page.

import requests
from bs4 import BeautifulSoup

base_url = 'https://www.deezer.com/en/profile/1589856782/loved'

r = requests.get(base_url)

soup = BeautifulSoup(r.text, 'html.parser')

user_name = soup.find(class_='user-name')
print(user_name.text)

This prints the user name.

for script in soup.find_all('script'):
    print(script.contents) 

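If I understand correctly, the script I need contains a dictionary, so I only have to find it and read its contents. The problem is that I don't know how to find this particular "script". It has no attributes or anything else that makes it unique. So I tried a loop that finds every script on the page and prints its contents, but I don't know how to go further.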
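One way to single out the script, sketched here under assumptions: although the tag has no attributes, its text starts with `window.__DZR_APP_STATE__`, and that string can serve as the search key. The `SAMPLE_HTML` below is a shortened, hypothetical stand-in for the real page, and the sketch assumes the script contains nothing but the assignment of one JSON object.

```python
import json
import re

from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the real page source.
SAMPLE_HTML = """
<html><body>
<script>var unrelated = 1;</script>
<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[
{"SNG_TITLE":"Heathens","ART_NAME":"Twenty One Pilots"},
{"SNG_TITLE":"Stressed Out","ART_NAME":"Twenty One Pilots"}]}}}</script>
</body></html>
"""

MARKER = "window.__DZR_APP_STATE__"


def extract_loved_tracks(html):
    """Return (title, artist) pairs from the script that assigns MARKER."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the one <script> whose text mentions the state variable.
    script = soup.find("script", string=re.compile(re.escape(MARKER)))
    if script is None:
        return []
    # Keep everything after the first "=", i.e. the JSON object literal.
    _, _, payload = script.string.partition("=")
    state = json.loads(payload.strip().rstrip(";"))
    tracks = state["TAB"]["loved"]["data"]
    return [(t["SNG_TITLE"], t["ART_NAME"]) for t in tracks]


for title, artist in extract_loved_tracks(SAMPLE_HTML):
    print(f"{artist} - {title}")
```

The key path `TAB` > `loved` > `data` is taken from the excerpt in the question; if the live page nests the list differently, only that line needs to change.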

How can I find only this specific "script" on the page? Can I access these values in a different way?
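As for a different way to reach the values, a possible alternative is to skip HTML parsing and search the raw response text for the state variable, then let the standard library decode the JSON object that follows it. This is only a sketch: `parse_state` is a hypothetical helper name, and the one-line `sample` is a made-up stand-in for `r.text`.

```python
import json

MARKER = "window.__DZR_APP_STATE__"


def parse_state(page_text):
    """Decode the JSON object assigned to MARKER, or return None."""
    start = page_text.find(MARKER)
    if start == -1:
        return None
    brace = page_text.find("{", start)
    if brace == -1:
        return None
    # raw_decode parses one JSON value and ignores whatever follows it,
    # such as a trailing ";" or the closing </script> tag.
    state, _end = json.JSONDecoder().raw_decode(page_text, brace)
    return state


# Made-up stand-in for r.text, reduced to a single track.
sample = (
    '<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":['
    '{"SNG_TITLE":"Heathens","ART_NAME":"Twenty One Pilots"}]}}};</script>'
)
state = parse_state(sample)
for track in state["TAB"]["loved"]["data"]:
    print(track["SNG_TITLE"], "by", track["ART_NAME"])
```

Because `raw_decode` stops at the end of the first complete JSON value, this variant tolerates extra JavaScript after the object, which a plain `json.loads` on the whole script text would reject.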

python beautifulsoup deezer

Score: 4 · Answers: 1 · Views: 10k

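Extracting text from a script tag with BeautifulSoup in Python

Could you help me with this small problem? I want to use Beautiful Soup (Python) to extract the email, phone, and name values from the SCRIPT tag (not from the body) in the code below. I am new to Python, and blogs recommend Beautiful Soup for this kind of extraction.

I tried to fetch the page with the following code: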

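import urllib2  # Python 2 only; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup
fileDetails = BeautifulSoup(urllib2.urlopen('http://www.example.com').read(), 'html.parser')
results = fileDetails.find("email:")  # searches for a tag named "email:", so this returns None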

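This Ajax request code is not repeated anywhere else in the page. I would also like to add try/except handling so that no error is raised if it is not found on the page.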

<script type="text/javascript" language='javascript'> 
$(document).ready( function (){

   $('#message').click(function(){
       alert();
   });

    $('#addmessage').click(function(){
        $.ajax({ 
            type: "POST",
            url: 'http://www.example.com',
            data: { 
                email: 'abc@g.com', 
                phone: '9999999999', 
                name: 'XYZ'
            }
        });
    });
});
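The `data` object in this script is a JavaScript literal rather than JSON (unquoted keys, single-quoted values), so a JSON parser would reject it. A minimal sketch of one alternative is a regular expression over the script text. `SCRIPT_TEXT` below is copied from the question; with BeautifulSoup the same text would normally come from the `.string` of the matching `script` tag. The helper name `extract_fields` is hypothetical.

```python
import re

# The inline script from the question, shortened to the relevant part.
SCRIPT_TEXT = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

# A key, a colon, then a value in matching single or double quotes.
FIELD_RE = re.compile(r"\b(email|phone|name)\s*:\s*(['\"])(.*?)\2")


def extract_fields(script_text):
    """Return a dict of the fields found; missing fields are simply absent."""
    return {m.group(1): m.group(3) for m in FIELD_RE.finditer(script_text)}


fields = extract_fields(SCRIPT_TEXT)
print(fields.get("email"), fields.get("phone"), fields.get("name"))
```

When nothing matches, the function returns an empty dict and `fields.get(...)` yields `None`, so the missing-data case needs no try/except in this version.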

Once I have these values, I also want to store them in an Excel file.
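For the storage step, one simple option is a CSV file, which Excel can open directly. The sketch below uses only the standard library and swaps CSV in for a native `.xlsx` workbook. The column order, the helper name `write_contacts`, and the file name in the comment are assumptions.

```python
import csv
import io

FIELDNAMES = ["email", "phone", "name"]


def write_contacts(rows, fileobj):
    """Write one CSV row per dict in rows, preceded by a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Values taken from the script in the question.
rows = [{"email": "abc@g.com", "phone": "9999999999", "name": "XYZ"}]

# An in-memory buffer keeps the demo free of side effects; for a real file use
# open("contacts.csv", "w", newline="") (the file name is hypothetical).
buffer = io.StringIO()
write_contacts(rows, buffer)
print(buffer.getvalue())
```

Passing a file object rather than a path keeps the function easy to test and lets the caller decide where the data ends up.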

Thanks in anticipation.

python urllib2 beautifulsoup

Score: 3 · Answers: 2 · Views: 10k
