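Can a Telegram bot read or access a Telegram channel in which neither I nor the bot is an administrator?
I know this was not possible until last November, but I have heard that some people have managed it; so far I have not been able to.
I would really appreciate your input and knowledge.
P.S. Any workaround would be great.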
I want to download the images of all the products from here. My spider looks like this:
from shopclues.items import ImgData
import scrapy
class multipleImages(scrapy.Spider):
    name = 'multipleImages'
    start_urls = ['http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera',]
    def parse(self, response):
        for url in response.css('div.products-grid div.grid-product'):
            yield {
                ImgData(image_urls=[url.css('img::attr(src)').extract()])
            }
And items.py:
import scrapy
from scrapy.item import Item
class ShopcluesItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass
class ImgData(Item):
    image_urls = scrapy.Field()
    images = scrapy.Field()
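One thing worth checking in the spider above (a hedged suggestion, since the error message is cut off): parse yields { ImgData(...) }, which is a Python set literal rather than an item, and extract() already returns a list, so [ ... .extract()] makes image_urls a list containing a list. Scrapy's ImagesPipeline expects image_urls to be a flat list of absolute URLs. The hypothetical helper build_image_urls below (standard library only, not part of Scrapy) sketches building such a list with urllib.parse.urljoin.

```python
from urllib.parse import urljoin


def build_image_urls(page_url, srcs):
    """Return a flat, de-duplicated list of absolute image URLs.

    `srcs` stands in for what selector.css('img::attr(src)').extract()
    returns: already a list of strings, so it is not wrapped again.
    """
    urls = []
    for src in srcs:
        src = src.strip()
        if not src:
            continue  # skip empty src attributes
        absolute = urljoin(page_url, src)  # resolve relative paths against the page URL
        if absolute not in urls:
            urls.append(absolute)  # keep first occurrence, preserve order
    return urls


if __name__ == "__main__":
    page = "http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html"
    print(build_image_urls(page, ["/images/a.jpg", "//cdn.example.com/b.jpg", "/images/a.jpg"]))
```

Inside parse, the item would then be yielded directly rather than inside braces, e.g. yield ImgData(image_urls=build_image_urls(response.url, url.css('img::attr(src)').extract())), assuming the ImagesPipeline is enabled through ITEM_PIPELINES and IMAGES_STORE in settings.py. Scrapy's own response.urljoin() performs the same relative-URL resolution.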
But I get the following error when I run the spider:
2016-09-29 11:56:19 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/robots.txt> (referer: None)
2016-09-29 11:56:20 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> (referer: None)
2016-09-29 11:56:20 [scrapy] ERROR: …
I have a pickle with cookies, which I create with the following code:
import pickle
def doLogin(driver):
    # do login stuff
    with open("cookies.pkl", "wb") as f:
        pickle.dump(driver.get_cookies(), f)
And I have sample code to load the cookies back:
import pickle
from selenium import webdriver
driver = webdriver.PhantomJS()
self.doLogin(driver)
driver.delete_all_cookies()
with open("cookies.pkl", "rb") as f:
    for cookie in pickle.load(f):
        driver.add_cookie(cookie)
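I can see that the cookies are created fine, because if I print them they look OK; it is add_cookie() that is doing something shady.
This gives the following exception: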
WebDriverException: Message: {"errorMessage":"Unable to set Cookie","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"219","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:50738","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"sessionId\": \"391db430-154a-11e6-8a0a-ef59204729f5\", \"cookie\": {\"domain\": \"secretWebsite\", \"name\": \"JSESSIONID\", \"value\": \"8332B6099FA3BBBC82893D4C7E6E918B\", \"path\": \"also a secret\", \"httponly\": false, \"secure\": true}}","url":"/cookie","urlParsed":{"anchor":"","query":"","file":"cookie","directory":"/","path":"/cookie","relative":"/cookie","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/cookie","queryKey":{},"chunks":["cookie"]},"urlOriginal":"/session/391db430-154a-11e6-8a0a-ef59204729f5/cookie"}} Screenshot: available via screen
To make it work, all I need to do is change the webdriver to Firefox.
Is this a known PhantomJS issue?
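For the PhantomJS cookie question above, a minimal sketch of the pickle save/load round trip using with blocks, so each file handle is closed before the cookies are read back. save_cookies and load_cookies are hypothetical helper names, and the sample cookie is placeholder data shaped like a driver.get_cookies() entry. Separately, WebDriver implementations generally only accept a cookie for the domain of the page currently loaded, so a commonly suggested check is to call driver.get() on the site before the add_cookie() loop; whether that is the cause here is an assumption.

```python
import os
import pickle
import tempfile


def save_cookies(cookies, path):
    """Pickle a list of cookie dicts (e.g. the result of driver.get_cookies())."""
    with open(path, "wb") as f:  # the with block closes the file after writing
        pickle.dump(cookies, f)


def load_cookies(path):
    """Load the list of cookie dicts written by save_cookies()."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical cookie in the shape Selenium returns (placeholder values)
    sample = [{"name": "JSESSIONID", "value": "abc123", "domain": "example.com",
               "path": "/", "secure": True, "httpOnly": False}]
    path = os.path.join(tempfile.mkdtemp(), "cookies.pkl")
    save_cookies(sample, path)
    print(load_cookies(path) == sample)  # True
```

On reload, the loop would then be: driver.get(<site URL>), followed by for cookie in load_cookies("cookies.pkl"): driver.add_cookie(cookie).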
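I have a UTF-8 encoded string, coming from somewhere else, that contains the characters \xc3\x85lesund (a literal backslash, a literal "x", a literal "c", and so on).
Printing it outputs the following: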
\xc3\x85lesund
I want to convert it to a bytes variable:
b'\xc3\x85lesund'
so that I can decode it to:
'Ålesund'
How can I do this? I am using Python 3.4.
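A minimal sketch for the Ålesund question, assuming the input is plain ASCII apart from the \xNN escapes: the unicode_escape codec turns each \xNN into the code point U+00NN, and encoding with latin-1 maps those code points one-to-one onto bytes, which gives the b'\xc3\x85lesund' value; decoding that as UTF-8 then yields 'Ålesund'. escaped_to_bytes is a hypothetical helper name, and the code uses nothing newer than Python 3.4.

```python
def escaped_to_bytes(text):
    """Turn a str containing literal \\xNN escapes into the bytes they spell.

    Assumes `text` is plain ASCII apart from the escape sequences.
    """
    # unicode_escape interprets each \xNN sequence as the code point U+00NN ...
    code_points = text.encode("ascii").decode("unicode_escape")
    # ... and latin-1 maps U+0000..U+00FF one-to-one back onto bytes 0..255
    return code_points.encode("latin-1")


if __name__ == "__main__":
    s = "\\xc3\\x85lesund"       # literal backslash, 'x', 'c', '3', ...
    raw = escaped_to_bytes(s)
    print(raw)                   # b'\xc3\x85lesund'
    print(raw.decode("utf-8"))   # Ålesund
```

If the input could contain other non-ASCII characters, the encode("ascii") step raises UnicodeEncodeError instead of silently producing wrong bytes, which is the reason for using it rather than encoding straight to latin-1.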