Posts by bar*_*rny

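Get the list of users who commented on or liked an Instagram post using Python

I found Instaloader, a Python library for scraping Instagram profiles. It works well, but I cannot find a way to get the list of users who commented on or liked a post.

I have gone through all of the documentation and could not find an answer. Here is the documentation: https://instaloader.github.io/as-module.html

Here is the code I have: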

import instaloader

L = instaloader.Instaloader()  # fetches things from Instagram

profile = instaloader.Profile.from_username(L.context, 'jlo')  # give me the followers of the given user
print(profile.get_posts())

for post in profile.get_posts():

    post_likes = post.get_likes()
    post_comments = post.get_comments()

    print(post_likes)  # post_likes object
    print(post_comments)  # post_comments object

    # post_likes.name  post_likes.username  post_likes.user    DOES NOT WORK
    # post_comments.name  post_comments.username  post_comments.user    DOES NOT WORK
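
One possible reading of why the attribute lookups above fail: `get_likes()` and `get_comments()` return iterators rather than single objects, so the usernames sit on the items they yield. The sketch below assumes the shapes described in the Instaloader documentation (likes yield profiles with a `username`; comments carry an `owner` profile). The helper name `likers_and_commenters` is made up for illustration, and fetching likes typically requires a logged-in session.

```python
def likers_and_commenters(post):
    """Collect usernames from a Post-like object.

    Assumes `post.get_likes()` yields profile objects with a `username`
    attribute and `post.get_comments()` yields comment objects whose
    `owner` has a `username` attribute.
    """
    likers = [profile.username for profile in post.get_likes()]
    commenters = [comment.owner.username for comment in post.get_comments()]
    return likers, commenters


# Hypothetical usage with Instaloader (placeholder credentials):
#   import instaloader
#   L = instaloader.Instaloader()
#   L.login("your_username", "your_password")
#   profile = instaloader.Profile.from_username(L.context, "jlo")
#   for post in profile.get_posts():
#       print(likers_and_commenters(post))
```

Building full lists can be slow on popular posts; iterating lazily and stopping early may be the better choice there.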

python instagram instaloader

tag*_*aga

2021 06-01

Votes: 1 | Answers: 1 | Views: 4765

No module named "BeautifulSoup"?

I am using Python 3 and installed BeautifulSoup on my Mac, but it keeps saying "No module named bs4" or "No module named BeautifulSoup". What should I do?

This is an assignment for the Py4E web scraping course on Coursera.

from bs4 import BeautifulSoup

No module named bs4

$pip install BeautifulSoup

invalid syntax

import BeautifulSoup from BeautifulSoup

No module named BeautifulSoup
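
A few things may be going on here. `pip install ...` is a shell command, so typing it at the Python prompt would give "invalid syntax". The package on PyPI is named `beautifulsoup4`, while the module it provides is imported as `bs4`. A minimal sketch for checking which case applies, assuming the goal is to install into the same interpreter that runs the script (the helper name `install_hint` is made up for illustration):

```python
import importlib.util
import sys


def install_hint(module_name, package_name):
    """Return None if the module is importable, else a pip command to try."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    # Use the running interpreter so pip targets the same environment.
    return f"{sys.executable} -m pip install {package_name}"


hint = install_hint("bs4", "beautifulsoup4")
print(hint or "bs4 is importable; use: from bs4 import BeautifulSoup")
```

If a command is printed, it is meant to be run in the terminal, not inside Python.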

python beautifulsoup

Author

2021 02-11

Votes: 1 | Answers: 1 | Views: 160

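How to check with Puppeteer whether a class exists on a page

I am trying to use Puppeteer to check whether a class exists on a web page. For example, say you want to scrape some data and you know it is stored under a certain class. To get the data, you need to look it up by class name. Here is the code I tried. It does not work.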

let pageClicked = document.querySelector('.classIAmTryingToFind')

if(pageClicked){
    console.log('False')
    await browser.close()
}else{
    console.log('True')
    await browser.close()
}

When I run the code, I get this error.

UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Target closed.

javascript puppeteer queryselector

Nei*_*ik0

2021 03-20

Votes: 1 | Answers: 1 | Views: 5687

Extract WooCommerce product images into one comma-separated column with Scrapy

I am building a data scraper with Scrapy. To extract WooCommerce product images, I use this line:

'img': response.css('figure.woocommerce-product-gallery__image a').attrib['href'],

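Product link: https://royalprint.pk/product/name-print-superhero-sweatshirt-011/

But it only extracts one image URL into the CSV.

I want to scrape all of the WooCommerce product images into a single comma-separated column.

Please help. Regards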
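
A likely cause is that `.attrib` on a Scrapy selector list only exposes the attributes of the first matched element, so only one URL comes back. A minimal sketch, assuming the same selector as in the post and that `::attr(href)` with `.getall()` returns every gallery link; the helper name `join_image_urls` is made up for illustration:

```python
def join_image_urls(urls, sep=","):
    """Join image URLs into one string, dropping blanks and duplicates in order."""
    seen = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.append(url)
    return sep.join(seen)


# Inside a Scrapy callback this might look like (selector taken from the post):
#   hrefs = response.css(
#       "figure.woocommerce-product-gallery__image a::attr(href)"
#   ).getall()
#   yield {"img": join_image_urls(hrefs)}
```

Since the joined value itself contains commas, the CSV exporter should quote the field so it stays in one column.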

python scrapy web-scraping python-3.x woocommerce

Han*_*mes

2021 02-20

Votes: 1 | Answers: 1 | Views: 733

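Puppeteer times out on a specific website when running on a headless cloud server

I wrote a Node.js web scraper that runs fine on my computer. When I deploy it to a Google Cloud VM instance running Debian, however, it returns a timeout error for one specific website. I have tried many different Puppeteer settings, but none of them seem to work. I believe the website I am trying to scrape blocks my code when it runs from the Google Cloud server but not when it runs from my computer. The scraping part works fine on my PC: Puppeteer finds the HTML tags and retrieves the information.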

const puppeteer = require('puppeteer');
const GoogleSpreadsheet = require('google-spreadsheet');
const { promisify } = require('util');
const credentials = require('./credentials.json');

async function main(){

    const scrapCopasa = await scrapCopasaFunction();

    console.log('Done!')

}



async function scrapCopasaFunction() {

    const browser = await puppeteer.launch({
        args: ['--no-sandbox'], 
    });
    const page = await browser.newPage();
    //await page.setDefaultNavigationTimeout(0);
    //await page.setViewport({width: 1366, height: 768});
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');
    await page.goto('http://www.copasa.com.br/wps/portal/internet/abastecimento-de-agua/nivel-dos-reservatorios');
    //await …

javascript node.js web-scraping google-cloud-platform puppeteer

grc*_*grc

2021 01-29

Votes: 1 | Answers: 1 | Views: 760

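Scraping data from forexfactory.com

I am a beginner in Python. In this question, they extract data from Forex Factory. At the time, the solution worked by finding the table with soup.find('table', class_="calendar__table"). However, the site structure has since changed: the HTML table was removed and converted to some JavaScript format, so this solution no longer finds anything.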

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.forexfactory.com/calendar.php?day=nov18.2016')
soup = BeautifulSoup(r.text, 'lxml')

calendar_table = soup.find('table', class_="calendar__table")

print(calendar_table)


# for row in calendar_table.find_all('tr', class_=['calendar__row calendar_row','newday']):
#     row_data = [td.get_text(strip=True) for td in row.find_all('td')]
#     print(row_data)

Since I am a beginner, I do not know what to do. How can I scrape the data? Any hint would help. Thank you very much for reading my post.

python selenium beautifulsoup web-scraping python-3.x

Phi*_*Phi

2021 12-04

Votes: 1 | Answers: 2 | Views: 3852

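How do I count the lines that contain only 1 character?

I am trying to print only the number of lines that contain exactly 1 character.

I have a file with 200k lines, and some of those lines contain only a single character (any kind of character).

Since I have no experience, I googled a lot, dug through documentation, and pieced together this mixed solution from different sources: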

awk -F^\w$ '{print NF-1}' myfile.log

I expected this to filter the lines that hold a single character, and that part seems to work:

^\w$

But I do not get the number of lines containing a single character. Instead I get something like this:
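
For comparison, the count being asked for can be sketched in a few lines of Python. This is not the awk approach from the post; it treats a line as "single character" when exactly one character remains after the line ending is removed, so a lone space counts too. The function name is made up for illustration.

```python
def count_single_char_lines(lines):
    """Count lines that hold exactly one character, ignoring the line ending."""
    return sum(1 for line in lines if len(line.rstrip("\r\n")) == 1)


# Hypothetical usage on the file named in the post:
#   with open("myfile.log", encoding="utf-8", errors="replace") as fh:
#       print(count_single_char_lines(fh))
```

Passing the open file handle keeps memory use flat, which should be fine for a 200k-line file.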

linux bash zsh

deu*_*euq

2021 04-22

Votes: 1 | Answers: 1 | Views: 55

No connection adapters were found for Python3-Requests

I am using Beautiful Soup with the requests package in Python 3 for web scraping. Here is my code.

import csv  
from datetime import datetime
import requests
import csv  
from datetime import datetime 
from bs4 import BeautifulSoup


quote_page = ['http://10.69.161.179:8080'];

data = []

page = requests.get(quote_page)

soup = BeautifulSoup(page.content,'html.parser')

name_box = soup.find('div', attrs={'class':'caption span10'})

name= name_box.text.strip() #strip() is used to remove starting and ending

print(name);

data.append(name)

    

with open('sample.csv', 'a') as csv_file:  
    writer = csv.writer(csv_file)
    writer.writerow([name])

print ("Success");

When I run the code above, I get the following error.

Traceback (most recent call last):
  File "first_try.py", line 21, in <module>
    page = requests.get(quote_page);
  File "C:\Python\lib\site-packages\requests-2.13.0-py3.6.egg\requests\api.py", line 70, …
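
One thing that stands out in the code above is that `quote_page` is a list, while `requests.get` expects a single URL string. A list would most likely be turned into text like `['http://...']`, which no longer starts with `http://`, and that would fit the "No connection adapters" message. A minimal sketch of a guard, assuming the intent is to fetch each URL in turn (the helper name `as_url_list` is made up for illustration):

```python
from urllib.parse import urlparse


def as_url_list(value):
    """Return a list of URL strings from either one URL or a list of URLs."""
    urls = [value] if isinstance(value, str) else list(value)
    for url in urls:
        # Reject anything that is not a plain http(s) URL before requesting it.
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not an http(s) URL: {url!r}")
    return urls


# Hypothetical usage with the variable from the post:
#   for url in as_url_list(quote_page):
#       page = requests.get(url)
```

With a single page, passing `quote_page[0]` or defining `quote_page` as a plain string would do the same job.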

python-3.x python-requests

Vic*_*cky

2021 02-11

Votes: 0 | Answers: 1 | Views: 20k

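How do I make a Chrome extension in Python?

I have looked through the forums, but all of the solutions are old and some of the packages are deprecated.

I want to create a Chrome extension that uses Python packages such as scipy, regular expressions, and scraping libraries.

I am a Python developer and do not know JavaScript.

Is there any way to build a Chrome extension entirely, or at least mostly, in Python?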

python google-chrome-extension google-chrome-devtools

Has*_*yub

2021 03-19

Votes: 0 | Answers: 1 | Views: 4255

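How do I remove lines from a text file that are duplicates or contain certain words?

I am trying to remove duplicate lines, and lines containing certain words, from scraped data. I have tried various snippets, but they do not work :(

Here is the code. Only the first part works, the one that removes duplicate lines: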

openFile = open("links.txt", "r") 
writeFile = open("updatedfile.txt", "w") 
#Store traversed lines
tmp = set() 
for txtLine in openFile: 
#Check new line
    if txtLine not in tmp: 
        writeFile.write(txtLine) 
#Add new traversed line to tmp 
        tmp.add(txtLine)         
openFile.close() 
writeFile.close()

sleep(5)

with open("updatedfile.txt", "r") as fp:
    lines = fp.readlines()

with open("updatedfile.txt", "w") as fp:
    for line in lines:
        if line.strip("\n") != "search":
            fp.write(line)

Here is the links.txt file:

https://twitter.com/search?q=%23BTC&src=hashtag_click
https://twitter.com/search?q=%23ADA&src=hashtag_click
https://twitter.com/search?q=%23LTC&src=hashtag_click
https://twitter.com/search?q=%23CAKE&src=hashtag_click
https://twitter.com/Marie62943337
https://twitter.com/Marie62943337
https://twitter.com/Fathur0501
https://twitter.com/Fathur0501
https://twitter.com/BogdanMar93
https://twitter.com/BogdanMar93
https://t.[spaced because body cannot contain short …
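
A possible explanation for the second part not working: `line.strip("\n") != "search"` only drops lines that are exactly `search`, whereas the URLs merely contain that word. A minimal sketch that does both steps in one pass, assuming a substring match is what is wanted (the function name and the `banned_words` parameter are made up for illustration):

```python
def clean_lines(lines, banned_words=("search",)):
    """Drop duplicate lines and lines containing any banned word, keeping order."""
    seen = set()
    kept = []
    for line in lines:
        text = line.strip()
        if not text or text in seen:
            continue
        seen.add(text)
        # Substring test: matches ".../search?q=..." as well as a bare "search".
        if any(word in text for word in banned_words):
            continue
        kept.append(text)
    return kept


# Hypothetical usage with the file names from the post:
#   with open("links.txt") as src:
#       cleaned = clean_lines(src)
#   with open("updatedfile.txt", "w") as dst:
#       dst.write("\n".join(cleaned) + "\n")
```

Doing it in one pass also removes the need to reopen `updatedfile.txt` or to sleep between the two steps.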

python twitter selenium

Cas*_*no

2021 12-15

Votes: 0 | Answers: 1 | Views: 80