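I've written a script in Python that displays only the name visible in my SO profile. The problem is that I want to log in to the site using the requests module and, once I'm logged in, use Selenium to fetch the profile name. The bottom line: once I have the profile URL, I want Selenium to reuse it to fetch the profile name.
This working solution uses requests: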
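One way the hand-off described above might be sketched is to copy the cookies from the logged-in requests session into the browser. The helper name `to_selenium_cookies` is hypothetical, and the sketch assumes the site authenticates purely through cookies, which the question does not confirm.

```python
def to_selenium_cookies(cookie_jar):
    """Turn cookies from a requests session into dicts for driver.add_cookie().

    cookie_jar is any iterable of objects exposing name, value, domain,
    path and secure attributes, e.g. requests.Session().cookies.
    """
    converted = []
    for cookie in cookie_jar:
        entry = {
            "name": cookie.name,
            "value": cookie.value,
            "path": cookie.path or "/",
        }
        if cookie.domain:
            entry["domain"] = cookie.domain
        if cookie.secure:
            entry["secure"] = True
        converted.append(entry)
    return converted
```

With Selenium, the browser has to be on the cookie's domain before `driver.add_cookie(entry)` is called for each entry, so the usual order would be: open any page on the site, add the cookies, then load the profile URL.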
import requests
from bs4 import BeautifulSoup
url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"
req = requests.get(url)
sauce = BeautifulSoup(req.text,"lxml")
fkey = sauce.select_one("[name='fkey']")['value']
payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': my_username,
    'password': my_password,
    'oauth_version':'',
    'oauth_server':''
}
res = requests.post(url,data=payload)
soup = BeautifulSoup(res.text,"lxml")
item = soup.select_one("div[class^='gravatar-wrapper-']").get("title")
print(item)
What I now want to do is:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"
driver = webdriver.Chrome()
req = requests.get(url)
sauce = BeautifulSoup(req.text,"lxml")
fkey = sauce.select_one("[name='fkey']")['value']
payload = {
    'fkey': fkey,
    'ssrc': …
I've created a script to scrape the name and email address from a web page. When I run the script, I get the name as expected, but for the email I get aeccdcd7cfc0eedadcc783cdc1dc80cdc1c3. The string I get in place of the email changes every time I run the script.
So far I've tried:
import requests
from bs4 import BeautifulSoup
url = "https://www.seafoodsource.com/supplier-directory/Tri-Cor-Flexible-Packaging-Inc"
res = requests.get(url,headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(res.text,'lxml')
name = soup.select_one("[class$='-supplier-view-main-container'] > h1").text
email = soup.select_one("[class='__cf_email__']").get("data-cfemail")
print(f'{"Name: "}{name}\n{"Email: "}{email}')
Current output:
Name: Tri-Cor Flexible Packaging Inc
Email: aeccdcd7cfc0eedadcc783cdc1dc80cdc1c3
Expected output:
Name: Tri-Cor Flexible Packaging Inc
Email: bryan@tri-cor.com
P.S. I'm not after any solution that relies on a browser simulator such as Selenium.
How can I get that email from that page using requests?
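The `__cf_email__` class and `data-cfemail` attribute look like Cloudflare's email obfuscation. A minimal sketch of decoding it, assuming the usual scheme in which the first byte of the hex string is an XOR key for the remaining bytes (the function name `decode_cfemail` is made up for illustration):

```python
def decode_cfemail(encoded):
    """Decode a hex string taken from a data-cfemail attribute."""
    data = bytes.fromhex(encoded)
    key = data[0]  # the first byte is the XOR key
    # every following byte XORed with the key gives one character of the address
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


print(decode_cfemail("aeccdcd7cfc0eedadcc783cdc1dc80cdc1c3"))
# prints: bryan@tri-cor.com
```

Because the key is chosen per response, the hex string changing between runs is expected; the decoded address stays the same.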
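I've created a script that fetches HTML elements from a target page by sending two HTTPS requests in succession. The script does the job perfectly. However, I have to copy four values from Chrome dev tools to fill in four keys in payload in order to send the final HTTP request that reaches the target page. This is the starting link, and these are the steps to reach the target page:
1. Press the Find Hotel button (there is no need to change the dates if the check-out date is, by default, at least one day later than the check-in date).
2. Press the Book Now button near the top.
3. It will now lead you automatically to the target page, titled Enter Guest Details, from which the HTML elements are to be parsed.
I've tried (a working one):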
import requests
from bs4 import BeautifulSoup
url = 'https://booking.discoverqatar.qa/SearchHandler.aspx?'
second_url = 'https://booking.discoverqatar.qa/PassengerDetails.aspx?'
params = {
    'Module':'H','txtCity':'','hdnCity':'2947','txtHotel':'','hdnHotel':'',
    'fromDate':'05/11/2019','toDate':'07/11/2019','selZone':'','minSelPrice':'',
    'maxSelPrice':'','roomConfiguration':'2|0|','noOfRooms':'1',
    'hotelStandardArray':'63,60,54,50,52,51','CallFrom':'','DllNationality':'-1',
    'HdnNoOfRooms':'-1','SourceXid':'MTEzNzg=','mdx':''
}
payload = {
    'CallFrom':'MToxNjozOCBQTXxCMkN8MToxNjozOCBQTQ==',
    'Btype':'MToxNjozOCBQTXxBfDE6MTY6MzggUE0=',
    'PaxConfig':'MToxNjozOCBQTXwyfDB8MnwwfHwxOjE2OjM4IFBN',
    'usid':'MToxNjozOCBQTXxoZW54dmkzcWVnc3J3cXpld2lsa2ZwMm18MToxNjozOCBQTQ=='
}
with requests.Session() as s:
    r = s.get(url,params=params,headers={"User-agent":"Mozilla/5.0"})
    res = s.get(second_url,params=payload,headers={
        "User-agent":"Mozilla/5.0",
        "Referer":r.url
    })
    soup = BeautifulSoup(res.text,'lxml')
    print(soup)
In the script above, I've copied the values of CallFrom, Btype, PaxConfig and usid from dev tools and pasted them into payload. …
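The four copied values appear to be Base64 text. Decoding them suggests each one wraps a short value between two copies of a time string, which hints that they could be generated rather than copied. The sketch below only demonstrates that structure; the helper names `unwrap` and `wrap` are hypothetical, and whether the server accepts freshly built values is an assumption that would need testing.

```python
import base64


def unwrap(token):
    """Decode one payload value and split it into its '|'-separated parts."""
    return base64.b64decode(token).decode("utf-8").split("|")


def wrap(value, stamp):
    """Rebuild a token in the observed 'stamp|value|stamp' layout (assumed format)."""
    return base64.b64encode(f"{stamp}|{value}|{stamp}".encode("utf-8")).decode("ascii")


print(unwrap("MToxNjozOCBQTXxCMkN8MToxNjozOCBQTQ=="))
# prints: ['1:16:38 PM', 'B2C', '1:16:38 PM']
```

If the layout holds, the middle part of usid would be the value to look for in the cookies or HTML of the first response instead of in dev tools.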
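I've created a script in Python in combination with Selenium to scrape different app names from the Google Play Store, and they all come through when I execute the script. However, the results are being translated into my native, non-English language.
How can I change the language option in the Selenium Python bindings?
My attempt (tried to change the language option, but failed):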
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
link = 'https://play.google.com/store'
chrome_options = Options()
chrome_options.add_argument("accept-language=en-US")
with webdriver.Chrome(options=chrome_options) as driver:
    driver.get(link)
    for item in wait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,'.details a.title'))):
        print(item.text)
My output is in my native language, not English.
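Two things that might be worth trying, sketched under assumptions: asking the site for English through an `hl` query parameter (assuming Google Play honours it), and starting Chrome with an English locale through the `--lang` switch and the `intl.accept_languages` preference. Both helper names are made up for illustration, and neither approach is confirmed by the question.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_language(url, lang="en"):
    """Return url with an hl=<lang> query parameter added or replaced."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["hl"] = lang
    return urlunparse(parts._replace(query=urlencode(query)))


def english_chrome_options():
    """Build Chrome options that request an English UI and Accept-Language."""
    # Imported here so with_language() stays usable without Selenium installed.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--lang=en-US")
    options.add_experimental_option("prefs", {"intl.accept_languages": "en-US,en"})
    return options


print(with_language("https://play.google.com/store"))
# prints: https://play.google.com/store?hl=en
```

The options object would be passed as `webdriver.Chrome(options=english_chrome_options())`, and the rewritten link used in `driver.get(...)`.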
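I've written a script in Python to log in to a website and parse the username to make sure I've really been able to log in. The approach I've tried below seems to get me there. However, I've used hardcoded cookies taken from Chrome dev tools within the script to make it succeed.
I've tried: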
import requests
from bs4 import BeautifulSoup
url = 'https://secure.imdb.com/ap/signin?openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_pro_us&openid.mode=checkid_setup&siteState=eyJvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl9wcm9fdXMiLCJyZWRpcmVjdFRvIjoiaHR0cHM6Ly9wcm8uaW1kYi5jb20vIn0&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0'
signin = 'https://secure.imdb.com/ap/signin'
mainurl = 'https://pro.imdb.com/'
with requests.Session() as s:
    res = s.get(url,headers={"User-agent":"Mozilla/5.0"})
    soup = BeautifulSoup(res.text,"lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['email'] = 'some username'
    payload['password'] = 'some password'
    s.post(signin,data=payload,headers={
        "User-agent":"Mozilla/5.0",
        "Cookie": 'adblk=adblk_yes; ubid-main=130-2884709-6520735; _msuuid_518k2z41603=95C56F3B-E3C1-40E5-A47B-C4F7BAF2FF5D; _fbp=fb.1.1574621403438.97041399; pa=BCYm5GYAag-hj1CWg3cPXjfv2X6NGPUp6kLguepMku7Yf0W9-iSTjgmVNGmQLwUfJ5XJPHqlh84f%0D%0Agrd2voq0Q7TR_rdXU4T1BJw-1a-DdvCNSVuWSm50IXJDC_H4-wM_Qli_%0D%0A; uu=BCYnANeBBdnuTg3UKEVGDiO203C7KR0AQTdyE9Y_Y70vpd04N5QZ2bD3RwWdMBNMAJtdbRbPZMpG%0D%0AbPpC6vZvoMDzucwsE7pTQiKxY24Gr4_-0ONm7hGKPfPbMwvI1NYzy5ZhTIyIUqeVAQ7geCBiS5NS%0D%0A1A%0D%0A; session-id=137-0235974-9052660; session-id-time=2205351554; session-token=jsvzgJ4JY/TCgodelKegvXcqdLyAy4NTDO5/iEvk90VA8qWWEPJpiiRYAZe3V0EYVFlKq590mXU0OU9XMbAzwyKqXIzPLzKfLf3Cc3k0g/VQNTo6roAEa5IxmOGZjWrJuhkRZ1YgeF5uPZLcatWF1y5PFHqvjaDxQrf2LZbgRXF5N7vacTZ8maK0ciJmQEjh; csm-hit=tb:8HH0DWNBDVSWP881GYKG+s-8HH0DWNBDVSWP881GYKG|1574631571950&t:1574631571952&adb:adblk_yes'
    })
    r = s.get(mainurl,headers={
        "Cookie": 'adblk=adblk_yes; ubid-main=130-2884709-6520735; _msuuid_518k2z41603=95C56F3B-E3C1-40E5-A47B-C4F7BAF2FF5D; _fbp=fb.1.1574621403438.97041399; pa=BCYm5GYAag-hj1CWg3cPXjfv2X6NGPUp6kLguepMku7Yf0W9-iSTjgmVNGmQLwUfJ5XJPHqlh84f%0D%0Agrd2voq0Q7TR_rdXU4T1BJw-1a-DdvCNSVuWSm50IXJDC_H4-wM_Qli_%0D%0A; csm-hit=tb:KV47B1QVKP4DNB3QGY95+b-NM69W1Y35R7ARV0639V5|1574631544432&t:1574631544432&adb:adblk_yes; session-id=137-0235974-9052660; session-id-time=2205351554; session-token="EsIzROiSTmFDfXd5jnBPIBOpYG9jAu7tiWXDF8R52sUw5jS6OjddfOOQB+ytCmq0K3UnXs9wKBvQtkB4aVNsXieVbRcIUrKf3iPnYeJchbOlShMjg+MR+O7IQgPKkw0BKihdYQ1YIl7KQS8VeLxZjtzJ5sj5ocnY72fCKdwq/fGOjfieFYbe9Km3a8h++1GpC738JbwcVdpTG08v1pjhQKifqPQXnqhcyVKhi8CD1qk="; x-main="C1KbtQgFFBAYfwttdRSrU5CpCe@Fn6SPHnBTY6dO2ppimt@u1P1L7G0PueQMn6X3"; at-main=Atza|IwEBICfS3UKNp2mwmbyUPY1QzjXRHMcL6fjv2ND7BDXsZ1G-qDPJKsLJXeU9gJOvRpWsofSpOJCyhnap-bIOWCutU6VMIS9bn3UkNVRP8WFVqrs-CLB5opLbrEx6YxVGQlfaxx54gzuuGO4D30z-AgBpGe64_bn0K1iLOT3P3i7S3nBzvP_0AopwKlbU7SRnE5m21cVfVK7bwbtfZO4cf7DrpGcaHK4dlY5jKHPzNx_AR4ypqsEBFbHon36N1j8foty6wLJhFP1gNCvs24mVCec24TRho5ZXFDYqhLB-dw9V3XY1eq7q1QNgtAdYkDSJ6Mq1nllFu59WqIVs1Y3lLEaxDUExLtCt-VQArpS_hZtZR8C_kevhV01jEhWg8RUQaCdYTMwZHwa778MiEOrrrdGqFnR5; sess-at-main="tWwUfkZLx+mDAPqZo+J6yJlnjqBJvYJ0oVMS6/NcIKQ="; …
I'm trying to parse all the company names from this webpage. There are around 2431 companies in there. However, the approach I've tried below fetches only 1000 results.
This is what I can see about the number of results in response while going through dev tools:
hitsPerPage: 1000
index: "YCCompany_production"
nbHits: 2431 <------------------------
nbPages: 1
page: 0
How can I get the rest of the results using requests?
I've tried so far:
import requests
url = 'https://45bwzj1sgc-dsn.algolia.net/1/indexes/*/queries?'
params …
What is the right way to use http or https in the second parameter (the value) of proxies when I make a request to an https website? The proxy I've used below is just a placeholder.
When I try it like this (it works):
proxies = {
    'https': 'http://79.170.192.143:34394',
}
When I try it like this (it works as well):
proxies = {
    'https': 'https://79.170.192.143:34394',
}
Is the second http or https in proxies just a placeholder? What if I make a request to an http website?
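As far as the requests documentation goes, the dictionary key is matched against the scheme of the URL being requested, while the value is the proxy's own URL, whose scheme describes how to reach the proxy. The sketch below is a simplified, stdlib-only model of that lookup (the function `pick_proxy` is hypothetical and is not the library's implementation); it is meant only to show what happens for an http target when just an 'https' key is present.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Simplified model of choosing a proxy entry for a target URL."""
    parts = urlparse(url)
    host = parts.hostname or ""
    # Most specific key first: scheme plus host, then scheme, then catch-alls.
    for key in (f"{parts.scheme}://{host}", parts.scheme, f"all://{host}", "all"):
        if key in proxies:
            return proxies[key]
    return None  # no matching key: the request would not use a proxy


proxies = {"https": "http://79.170.192.143:34394"}
print(pick_proxy("https://example.com/", proxies))  # prints: http://79.170.192.143:34394
print(pick_proxy("http://example.com/", proxies))   # prints: None
```

Under this model, covering plain http targets as well would mean adding an 'http' key to the same dictionary.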
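I've created a script in Python to parse two fields from a web page: total revenue and its concerning date. The fields I'm after are JavaScript-encrypted. They are available in the page source within a JSON array. The following script parses both fields as expected.
However, the problem is that the date visible on that page differs from the one available in the page source.
The date in that web page is like this
The date in the page source is like this
There is clearly a difference of one day.
After visiting that web page, when you click on the Quarterly tab, you can see the results there:
I've tried: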
import re
import json
import requests
url = 'https://finance.yahoo.com/quote/GTX/financials?p=GTX'
res = requests.get(url)
data = re.findall(r'root.App.main[^{]+(.*);',res.text)[0]
jsoncontent = json.loads(data)
container = jsoncontent['context']['dispatcher']['stores']['QuoteSummaryStore']['incomeStatementHistoryQuarterly']['incomeStatementHistory']
total_revenue = container[0]['totalRevenue']['raw']
concerning_date = container[0]['endDate']['fmt']
print(total_revenue,concerning_date)
Result I'm getting (revenue in millions):
802000000 2019-06-30
Result I wish to get:
802000000 2019-06-29
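When I try with this ticker, AAPL, I get the exact date, so subtracting or adding a day is not an option.
How can I get the exact date from that site?
By the way, I know how to get them using Selenium, so I'd like to stick with requests only.
I'm trying to log in to Instagram using the requests library. I used the following script successfully, but it no longer works. The password field gets encrypted (checked in dev tools while logging in manually).
I've tried: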
import re
import requests
from bs4 import BeautifulSoup
link = 'https://www.instagram.com/accounts/login/'
login_url = 'https://www.instagram.com/accounts/login/ajax/'
payload = {
    'username': 'someusername',
    'password': 'somepassword',
    'enc_password': '',
    'queryParams': {},
    'optIntoOneTap': 'false'
}
with requests.Session() as s:
    r = s.get(link)
    csrf = re.findall(r"csrf_token\":\"(.*?)\"",r.text)[0]
    r = s.post(login_url,data=payload,headers={
        "user-agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
        "x-requested-with": "XMLHttpRequest",
        "referer": "https://www.instagram.com/accounts/login/",
        "x-csrftoken":csrf
    })
    print(r.status_code)
    print(r.url)
What I found using dev tools:
username: someusername
enc_password: #PWD_INSTAGRAM_BROWSER:10:1592421027:ARpQAAm7pp/etjy2dMjVtPRdJFRPu8FAGILBRyupINxLckJ3QO0u0RLmU5NaONYK2G0jQt+78BBDBxR9nrUsufbZgR02YvR8BLcHS4uN8Gu88O2Z2mQU9AH3C0Z2NpDPpS22uqUYhxDKcYS5cA==
queryParams: {"oneTapUsers":"[\"36990119985\"]"}
optIntoOneTap: false
How can I log in to Instagram using requests?
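The captured enc_password has the shape prefix:version:timestamp:payload. One thing that might be worth testing is whether the endpoint accepts version 0 with the plain password as the payload. This is an assumption, not something the question confirms, and it may not hold today. The helper name `plaintext_enc_password` is made up for illustration.

```python
import time


def plaintext_enc_password(password, timestamp=None):
    """Build an enc_password string in the observed prefix:version:timestamp:payload shape.

    Version 0 with an unencrypted payload is an ASSUMPTION about what the
    server accepts; only the overall layout comes from the captured request.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"#PWD_INSTAGRAM_BROWSER:0:{ts}:{password}"


print(plaintext_enc_password("somepassword", 1592421027))
# prints: #PWD_INSTAGRAM_BROWSER:0:1592421027:somepassword
```

If it works, the result would replace the empty 'enc_password' value in payload, and the plain 'password' key would probably no longer be needed.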
I'm trying to extract the tabular contents available on a graph in a webpage. The contents of those tables are only visible when someone hovers the cursor over the area. One such table is this one.
The graph containing the tables is titled EPS consensus revisions : last 18 months.
So far I've tried:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www.marketscreener.com/SUNCORP-GROUP-LTD-6491453/revisions/"
driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 10)
for items …