Asked by SMT*_*MTH. Tags: python, beautifulsoup, web-scraping, python-requests
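I'm trying to fetch some dynamic values from a table on a webpage. This image shows the values I want to get from that page. There should be some way to get them using requests. For the record, I looked for a hidden API in the dev tools and also searched the script tags in the page source for the values, but I couldn't find them.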
This is the site URL (the one used in the script below): https://www.dailyfx.com/sentiment
This is the expected output I'm after.
This is what I've written so far:
import requests
from bs4 import BeautifulSoup

url = "https://www.dailyfx.com/sentiment"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, "lxml")
for items in soup.select(".dfx-technicalSentimentCard__barContainer"):
    data = [item.get("data-value") for item in items.select("[data-type='long-value-info'],[data-type='short-value-info']")]
    print(data)
The script above produces empty output like the following:
['--', '--']
['--', '--']
['--', '--']
['--', '--']
['--', '--']
['--', '--']
['--', '--']
How can I get the values from that table using requests?
Since the content is loaded dynamically, you have to use Selenium to collect the required information:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Run Chrome headless with a desktop-sized window
chrome_options = Options()
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--headless")

# Selenium 4 syntax: the driver path goes through Service, and
# find_element(By...) replaces the removed find_element_by_* helpers
path_to_chromedriver = 'chromedriver'
driver = webdriver.Chrome(service=Service(path_to_chromedriver), options=chrome_options)
driver.get('https://www.dailyfx.com/sentiment')

# Scroll the page and give its scripts time to fill in the values
driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.PAGE_DOWN)
time.sleep(5)
driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.PAGE_DOWN)

# Parse the rendered DOM with the same selectors as in the question
soup = BeautifulSoup(driver.page_source, "lxml")
for items in soup.select(".dfx-technicalSentimentCard__barContainer"):
    data = [item.get("data-value") for item in items.select("[data-type='long-value-info'],[data-type='short-value-info']")]
    print(data)
driver.quit()
Running this code produces the following output:
['43', '57']
['53', '47']
['38', '62']
['56', '44']
['57', '43']
['39', '61']
['48', '52']
['77', '23']
['41', '59']
['55', '45']
['56', '44']
['74', '26']
['65', '35']
['87', '13']
['55', '45']
['32', '68']
['43', '57']
['45', '55']
['64', '36']
['56', '44']
['84', '16']
['86', '14']
['97', '3']
['90', '10']
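The difference between the two outputs can also be checked in code. The sketch below is only an illustration and not part of the original answer: it assumes that `--` is the placeholder the server sends before the page's scripts fill in the numbers (which is what the requests output in the question suggests), and `is_rendered` and `count_unrendered` are hypothetical helper names.

```python
def is_rendered(pair):
    """Return True if every value in a [long, short] pair is a number.

    Assumption: '--' (or a missing attribute, i.e. None) means the
    page's JavaScript has not filled in the value yet.
    """
    return bool(pair) and all(v is not None and v.isdigit() for v in pair)


def count_unrendered(rows):
    """Count the rows that still contain placeholders or missing values."""
    return sum(1 for pair in rows if not is_rendered(pair))


# Rows copied from the two outputs shown in this thread.
from_requests = [["--", "--"], ["--", "--"]]
from_selenium = [["43", "57"], ["53", "47"], ["97", "3"]]

print(count_unrendered(from_requests))  # 2
print(count_unrendered(from_selenium))  # 0
```

A non-zero count from a plain requests scrape is a hint that the values are filled in client-side and that a rendered page source is needed.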
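The fixed `time.sleep(5)` in the script above may be longer than needed on a fast connection and too short on a slow one. A possible alternative is to poll until the values appear. The sketch below uses only the standard library to show the idea; `wait_until` and `scrape_once` are hypothetical names, not Selenium APIs, and the snapshots stand in for repeated reads of the page. Selenium ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for repeated scrapes of the page: placeholders first, numbers later.
snapshots = iter([["--", "--"], ["--", "--"], ["43", "57"]])


def scrape_once():
    pair = next(snapshots)
    # Only a fully numeric pair counts as loaded; otherwise report "not yet".
    return pair if all(v.isdigit() for v in pair) else None


print(wait_until(scrape_once, timeout=5, interval=0.01))  # ['43', '57']
```

With a real browser session, `scrape_once` would parse `driver.page_source` with the same selectors as the script above and return the rows only once none of them is a placeholder.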