How to find all the JavaScript requests made from my browser when I'm accessing a site

Question


Ali*_*Ali 4 javascript python python-2.7 python-3.x python-requests

I want to scrape the contents of LinkedIn using requests and bs4, but I'm running into a problem with the JavaScript that loads the page after I sign in (I don't get the home page directly). I don't want to use Selenium.

Here is my code:

import requests
from bs4 import BeautifulSoup


class Linkedin():
    def __init__(self, url):
        self.url = url
        self.header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
                                     "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"}

    def saveRsulteToHtmlFile(self, nameOfFile=None):
        # Write the body of the most recent response to <nameOfFile>.html
        if nameOfFile is None:
            nameOfFile = "Linkedin_page"
        with open(nameOfFile + ".html", "wb") as file:
            file.write(self.response.content)

    def getSingInPage(self):
        # Open a session and read the CSRF token from the sign-in form
        self.sess = requests.Session()
        self.response = self.sess.get(self.url, headers=self.header)
        soup = BeautifulSoup(self.response.content, "html.parser")
        self.csrf = soup.find(attrs={"name": "loginCsrfParam"})["value"]

    def connecteToMyLinkdin(self):
        # Submit the login form together with the CSRF token
        self.form_data = {"session_key": "myemail@mail.com",
                          "loginCsrfParam": self.csrf,
                          "session_password": "mypassword"}
        self.url = "https://www.linkedin.com/uas/login-submit"
        self.response = self.sess.post(self.url, headers=self.header, data=self.form_data)

    def getAnyPage(self, url):
        # Fetch any page with the same (logged-in) session
        self.response = self.sess.get(url, headers=self.header)


url = "https://www.linkedin.com/"

likedin_page = Linkedin(url)
likedin_page.getSingInPage()
likedin_page.connecteToMyLinkdin()  # I'm connected, but the JavaScript is still loading
likedin_page.getAnyPage("https://www.linkedin.com/jobs/")
likedin_page.saveRsulteToHtmlFile()


I'd like help getting past the JavaScript loading without using Selenium.

Answer 1

Kra*_*rab 7

Although it is technically possible to simulate all the calls from Python, on a dynamic page like LinkedIn I think it will be quite tedious and brittle.

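In any case, you need to open the Developer Tools in your browser before you open LinkedIn, and then look at the traffic. You can filter for the requests that come from JavaScript (in Firefox, the filter is called XHR).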
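If you would rather go through that traffic offline, the Network tab in most browsers can export it as a HAR file (a JSON document). The sketch below is one possible way to list the entries that look like JavaScript-initiated calls. It assumes the standard HAR layout (`log.entries[].request` and `response.content.mimeType`); the `_resourceType` field is a Chrome-specific extension that may be missing, so the JSON mime-type check is only a heuristic fallback. The function names and the file name in the comment are made up for illustration.

```python
import json


def load_har(path):
    """Load a HAR file exported from the browser's Network tab."""
    # utf-8-sig also accepts files that start with a byte-order mark
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def list_script_requests(har):
    """Return (method, url, mime_type) for entries that look like XHR/fetch calls."""
    results = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        mime_type = response.get("content", {}).get("mimeType", "")
        # "_resourceType" is written by Chrome; other browsers may omit it
        resource_type = entry.get("_resourceType", "")
        if resource_type in ("xhr", "fetch") or "json" in mime_type:
            results.append((request.get("method"), request.get("url"), mime_type))
    return results


# Example usage (hypothetical file name):
# for method, url, mime in list_script_requests(load_har("linkedin.har")):
#     print(method, url, mime)
```

The output gives you a short list of candidate endpoints to inspect, instead of scrolling through every image and stylesheet request.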

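Then you would simulate the necessary/interesting requests in your code. The benefit is that servers usually return structured data to JavaScript, such as JSON, so you won't need to do much HTML parsing.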
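As a rough sketch of replaying one such request: the helper below takes an already logged-in session (for example the `self.sess` object from the question, a `requests.Session`) and asks an endpoint for JSON. The helper name `fetch_json`, the endpoint URL, and the extra headers are assumptions for illustration; the real URL, parameters, and any required headers have to be copied from what the Network tab shows for the request you want to reproduce.

```python
def fetch_json(session, url, params=None):
    """Replay a JavaScript-style request and return the decoded JSON body.

    `session` is expected to behave like requests.Session, so the cookies
    obtained at login are sent along automatically.
    """
    headers = {
        # Ask for JSON, as a script running in the page typically would
        "Accept": "application/json",
        # Some servers check this header on XHR calls; it is harmless otherwise
        "X-Requested-With": "XMLHttpRequest",
    }
    response = session.get(url, params=params, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()


# Example usage (hypothetical endpoint, taken from nowhere in particular):
# data = fetch_json(likedin_page.sess, "https://www.example.com/api/jobs", {"start": 0})
# print(data.keys())
```

Since the response is already structured, you can work with the resulting dict directly rather than passing it through BeautifulSoup.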

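If you find you aren't making much progress this way (it really depends on the particular site), then you will probably have to use Selenium or an alternative such as: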

https://robotframework.org/
https://miyakogi.github.io/pyppeteer/ (a Python port of Puppeteer)
