RJa*_*mes 8 html python json beautifulsoup python-3.x
I'm trying to extract campaign_hearts and postal_code from the code inside this script tag (the whole script is too long to post):
<script>
...
"campaign_hearts":4817,"social_share_total":11242,"social_share_last_update":"2020-01-17T10:51:22-06:00","location":{"city":"Los Angeles, CA","country":"US","postal_code":"90012"},"is_partner":false,"partner":{},"is_team":true,"team":{"name":"Team STEVENS NATION","team_pic_url":"https://d2g8igdw686xgo.cloudfront.net
...
I can identify the script I need with the following code:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from time import sleep
import requests
import re
import json

# Download the campaign page and parse its HTML
page = requests.get("https://www.gofundme.com/f/eric-stevens-care-trust")
soup = BeautifulSoup(page.content, 'html.parser')

# Collect every <script> tag; the first one holds the data I want
all_scripts = soup.find_all('script')
all_scripts[0]
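The wanted data may not always be in the first script. One option is to choose the script by its content instead of its position. The sketch below uses Python's standard-library `html.parser` rather than BeautifulSoup, so it runs without extra packages. `ScriptCollector` and `find_script_containing` are hypothetical helper names, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # finished script bodies, in document order
        self._in_script = False  # True while between <script> and </script>
        self._buffer = []        # text chunks of the script being read

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._buffer))
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self._buffer.append(data)


def find_script_containing(html, marker):
    """Return the first <script> body that contains `marker`, or None."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    for body in parser.scripts:
        if marker in body:
            return body
    return None


# Made-up page: the data lives in the second script, not the first
sample = (
    '<html><head><script>var x = 1;</script>'
    '<script>window.state = {"campaign_hearts":4817};</script></head></html>'
)
print(find_script_containing(sample, "campaign_hearts"))
```

On the real page, `sample` would be replaced by `page.text`.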
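However, I don't know how to extract the values I want. (I'm very new to Python.) This thread recommended the following solution for a similar problem (edited to match the HTML I'm working with):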
data = json.loads(all_scripts[0].get_text()[27:])
However, running it raises an error: JSONDecodeError: Expecting value: line 1 column 1 (char 0).
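This error means the string passed to `json.loads` does not begin with a JSON value. At character 0 the parser found something else, or nothing at all. The `[27:]` offset was tuned for another page, so it does not fit this script. The sketch below avoids a hard-coded offset. It assumes the script assigns a JSON object (something like `name = {...};`) and that the data starts at the first `{`. `json.JSONDecoder.raw_decode` parses a single value starting at a given index and returns where it stopped, so a trailing `;` is ignored. The question doesn't show how deeply `campaign_hearts` is nested in the real page, so `find_key` searches the whole structure. The helper names and the `feed`/`campaign` keys in the sample are hypothetical.

```python
import json


def extract_json_object(script_text):
    """Decode the first JSON object embedded in a JavaScript snippet.

    Assumption: the object starts at the first "{" in the text.
    raw_decode() parses one JSON value beginning at that index and
    ignores whatever follows it (for example a trailing ";").
    """
    start = script_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in script")
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


def find_key(node, key):
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Made-up script text; "feed" and "campaign" are hypothetical nesting levels
sample = (
    'window.initialState = {"feed":{"campaign":{"campaign_hearts":4817,'
    '"location":{"city":"Los Angeles, CA","country":"US","postal_code":"90012"},'
    '"is_partner":false}}};'
)
data = extract_json_object(sample)
print(find_key(data, "campaign_hearts"), find_key(data, "postal_code"))
```

With the question's code, you would pass `all_scripts[0].get_text()` (or whichever script contains the data) to `extract_json_object`.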
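Now that I've identified the right script, how can I extract the values I need? I also tried the solution listed here, but ran into problems importing the parser.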
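You don't need BeautifulSoup, NumPy or pandas for this, and every extra library adds complexity. Here is a simpler solution that uses only requests and plain string splitting: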
# Download the page source
import requests

url = "https://www.gofundme.com/f/eric-stevens-care-trust"
response = requests.get(url)
html = response.text  # page source decoded to a str

# Take everything after 'campaign_hearts":' and cut it at the next key
campaign_hearts = html.split('campaign_hearts":')[1]
campaign_hearts = campaign_hearts.split(',"social_share_total"')[0]
print(campaign_hearts)

# Same idea for the postal code, which is a quoted string
postal_code = html.split('postal_code":"')[1]
postal_code = postal_code.split('"},"is_partner')[0]
print(postal_code)
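The split approach depends on the exact key order around each value: `social_share_total` must follow `campaign_hearts`, and `is_partner` must follow the location object. A somewhat less brittle variant, sketched here with the standard `re` module, matches each value by its own key. `extract_fields` is a hypothetical helper, and the sample string is copied from the snippet in the question.

```python
import re


def extract_fields(html):
    """Pull campaign_hearts and postal_code out of raw page text.

    Each regex anchors on its own key, so the surrounding key order
    does not matter. Returns None for a field that is not found.
    """
    hearts = re.search(r'"campaign_hearts":(\d+)', html)
    postal = re.search(r'"postal_code":"([^"]*)"', html)
    return (
        int(hearts.group(1)) if hearts else None,
        postal.group(1) if postal else None,
    )


sample = (
    '"campaign_hearts":4817,"social_share_total":11242,'
    '"location":{"city":"Los Angeles, CA","country":"US","postal_code":"90012"},'
    '"is_partner":false'
)
print(extract_fields(sample))  # (4817, '90012')
```

On the live page you would call `extract_fields(requests.get(url).text)`. Like the split version, this returns the first match in the page, which assumes each key appears only once or that its first occurrence is the one you want.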