I'm trying to scrape through a Tor proxy. With a single thread everything works, but it is slow. I tried something simple:
import time
from multiprocessing import Pool

import requests
from stem import Signal
from stem.control import Controller


def get_new_ip():
    # Ask the local Tor control port for a new identity, then wait out the rate limit
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password="password")
        controller.signal(Signal.NEWNYM)
        time.sleep(controller.get_newnym_wait())


def check_ip():
    get_new_ip()
    session = requests.session()
    session.proxies = {'http': 'socks5h://localhost:9050', 'https': 'socks5h://localhost:9050'}
    r = session.get('http://httpbin.org/ip')
    print(r.text)


if __name__ == '__main__':
    with Pool(processes=3) as pool:
        for _ in range(9):
            pool.apply_async(check_ip)
        pool.close()
        pool.join()
When I run it, I see this output:
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "145.239.169.47, 145.239.169.47"}
{"origin": "145.239.169.47, 145.239.169.47"}
{"origin": "145.239.169.47, 145.239.169.47"}
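Why does this happen, and how can I give each thread its own IP? By the way, I tried libraries such as TorRequests and TorCtl, and the result is the same.
I know that Tor seems to have a delay before issuing a new IP, but why do different processes end up with the same IP?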
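One likely explanation is that all three processes talk to the same Tor daemon through the same SOCKS port, so a NEWNYM signal affects every worker at once rather than a single one. A minimal sketch of one possible workaround follows, assuming the default Tor behaviour in which streams that present different SOCKS credentials are kept on separate circuits (the `IsolateSOCKSAuth` flag of `SocksPort`). The helper name `isolated_proxies` and the `worker{n}:x` credentials are hypothetical; separate circuits usually, but not always, mean different exit IPs.

```python
def isolated_proxies(worker_id, host="localhost", port=9050):
    """Build a requests-style proxies dict with per-worker SOCKS credentials.

    Assumption: Tor accepts arbitrary SOCKS5 credentials and uses them only
    to decide which streams may share a circuit.
    """
    proxy = f"socks5h://worker{worker_id}:x@{host}:{port}"
    return {"http": proxy, "https": proxy}


# Each worker gets its own credentials, hence (by assumption) its own circuit
for worker_id in range(3):
    print(isolated_proxies(worker_id)["http"])
```

In `check_ip()` this would replace the hard-coded dictionary, for example `session.proxies = isolated_proxies(os.getpid())`, with no NEWNYM signal needed per request.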
tor multiprocessing python-3.x python-requests python-asyncio
async def rss_downloader(rss):
    global counter
    async with download_limit:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'
        }
        try:
            response = await httpx.get(rss, headers=headers, verify=False)
            if response.status_code == 200:
                r_text = response.text
                await downloaded_rss.put({'url': rss, 'feed': r_text})
            else:
                counter += 1
                print(f'№{counter} - {response.status_code} - {rss}')
        except (
            ConnectTimeout, ConnectionClosed
        ):
            not_found_rss.append(rss)
        except Exception:
            not_found_rss.append(rss)
            logging.exception(f'{rss}')


async def main():
    parser_task = asyncio.create_task(parser_queue())
    tasks = [
        asyncio.create_task(rss_downloader(item['url'])) for item in db[config['mongodb']['channels_collection']].find({'url': {'$ne': 'No RSS'}})
    ]
    …
Message-sending function:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

template = {
    'other':
        'Text.'
        'More Text.'
        'Much more text.'
}


def send_message(driver, answer):
    driver.find_element_by_xpath('XPATH').click()
    action = ActionChains(driver)
    action.send_keys(answer)
    action.send_keys(Keys.RETURN)
    action.perform()
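Depending on the message received, the required answer is taken from template and passed to send_message() as the answer argument. If you send the message as is, it arrives in WhatsApp as a single line: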
Text.More text.Much more text.
If you add \n, each line is sent as a separate message.
How can I send text with line breaks in a single message?
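A commonly used workaround is to type each line separately and press Shift+Enter between lines, which WhatsApp Web is assumed to treat as a line break rather than "send". The sketch below shows that idea; the helper name `queue_multiline` is hypothetical, and the key constants are passed in so the function works with any object that exposes `send_keys`, `key_down` and `key_up` the way Selenium's `ActionChains` does.

```python
def queue_multiline(action, text, shift_key, enter_key):
    """Queue `text` on an ActionChains-like object, using Shift+Enter for line breaks."""
    lines = text.split("\n")
    for index, line in enumerate(lines):
        action.send_keys(line)
        if index < len(lines) - 1:
            # Shift+Enter: new line inside the same message (assumed WhatsApp Web behaviour)
            action.key_down(shift_key).send_keys(enter_key).key_up(shift_key)
    return action
```

With Selenium this would be called as `queue_multiline(ActionChains(driver), answer, Keys.SHIFT, Keys.ENTER)`, followed by `send_keys(Keys.RETURN)` and `perform()` as in `send_message()`. Note that the strings in `template` are joined by implicit concatenation, so each piece needs an explicit `\n` (for example `'Text.\n'`) for there to be anything to split on.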
I have an asynchronous function that fetches data from a site:
import traceback

import aiohttp


async def get_matches_info(url):
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(url, proxy=proxy) as response:
                ...
                ...
                ...
                ...
        except Exception:
            print('ERROR GET URL: ', url)
            traceback.print_exc()  # already prints the traceback; wrapping it in print() also printed None
I have a list of about 200 links. Almost always everything works fine, but sometimes I get the following error:
Traceback (most recent call last):
File "C:\Python37\lib\site-packages\aiohttp\connector.py", line 924, in _wrap_create_connection
await self._loop.create_connection(*args, **kwargs))
File "C:\Python37\lib\asyncio\base_events.py", line 986, in create_connection
ssl_handshake_timeout=ssl_handshake_timeout)
File "C:\Python37\lib\asyncio\base_events.py", line 1014, in _create_connection_transport
await waiter
ConnectionResetError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "parser.py", line 90, …
For example, I have two lists:
a = ['podcast', 'podcasts', 'history', 'gossip', 'finance', 'business', 'kids', 'motivation', 'news', 'investing']
b = ['podcast', 'history', 'gossip', 'finance', 'kids', 'motivation', 'investing']
I want to find the items in list a that are not in list b.
I tried doing it like this:
c = []
for _ in a:
    if _ not in b:
        c.append(_)
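The loop above can also be written as a list comprehension. The sketch below keeps the order of `a` and, as an optional optimisation not present in the original, looks items up in a set built from `b`; the name `b_lookup` is introduced here for illustration.

```python
a = ['podcast', 'podcasts', 'history', 'gossip', 'finance', 'business',
     'kids', 'motivation', 'news', 'investing']
b = ['podcast', 'history', 'gossip', 'finance', 'kids', 'motivation', 'investing']

# Set membership tests are O(1) on average, which matters for long lists
b_lookup = set(b)
c = [item for item in a if item not in b_lookup]
print(c)  # ['podcasts', 'business', 'news']
```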
Initially, I have a text file with keywords:
podcast
podcasts
history
gossip
finance
For almost every keyword I also have a text file with information:
podcast.txt
podcasts.txt
history.txt
I need to find which files I am missing. I loaded the keyword list like this:
podcast
podcasts
history
gossip
finance
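For the file-based version, one possible approach is to strip whitespace from each keyword line (a trailing newline is a common reason why `not in` comparisons appear to fail) and compare against the file names without their `.txt` extension. This is a sketch under assumptions: the function name `missing_keywords` and both path parameters are hypothetical, since the original does not show how the files are laid out.

```python
from pathlib import Path


def missing_keywords(keywords_file, data_dir):
    """Return keywords from `keywords_file` that have no `<keyword>.txt` in `data_dir`."""
    text = Path(keywords_file).read_text(encoding="utf-8")
    # strip() removes the trailing newline and stray spaces; blank lines are skipped
    keywords = [line.strip() for line in text.splitlines() if line.strip()]
    # Path.stem is the file name without its extension: 'podcast.txt' -> 'podcast'
    existing = {path.stem for path in Path(data_dir).glob("*.txt")}
    return [keyword for keyword in keywords if keyword not in existing]
```

With the keywords and files listed above, a call such as `missing_keywords('keywords.txt', 'data')` would report `gossip` and `finance` as missing.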
I tried to install pyinstaller with the command pip install pyinstaller and got an error:
C:\Users\kshnk>pip install pyinstaller
Collecting pyinstaller
Downloading https://files.pythonhosted.org/packages/e2/c9/0b44b2ea87ba36395483a672fddd07e6a9cb2b8d3c4a28d7ae76c7e7e1e5/PyInstaller-3.5.tar.gz (3.5MB)
|████████████████████████████████| 3.5MB 2.2MB/s
Installing build dependencies ... error
ERROR: Command errored out with exit status 1:
command: 'c:\python37\python.exe' 'c:\python37\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\kshnk\AppData\Local\Temp\pip-build-env-11qk42u_\overlay' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel
cwd: None
Complete output (24 lines):
Traceback (most recent call last):
File "c:\python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\python37\lib\site-packages\pip\__main__.py", line 16, in …
I have a list like the following:
some_list = [
    'fullname',
    'John Kroes',
    'email',
    'johnkroes1978@example.com',
    'password',
    '3kKb5HYag'
]
How can I convert it into a dictionary like this?
some_dict = {
    'fullname': 'John Kroes',
    'email': 'johnkroes1978@example.com',
    'password': '3kKb5HYag'
}
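I tried to do this with nested loops, but I don't know how to write down the key first and then its value.
I also created a temporary variable before the loop, stored the current loop element in it, and tried to use it as the value inside a try block.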
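Since keys sit at the even positions and values at the odd positions, one way to do this without nested loops is to slice the list into two halves and pair them up with `zip`. A minimal sketch, assuming the list always has an even number of elements:

```python
some_list = [
    'fullname', 'John Kroes',
    'email', 'johnkroes1978@example.com',
    'password', '3kKb5HYag',
]

# some_list[::2] -> every second item starting at index 0 (the keys)
# some_list[1::2] -> every second item starting at index 1 (the values)
some_dict = dict(zip(some_list[::2], some_list[1::2]))
print(some_dict)
```

If the list had an odd length, `zip` would silently drop the last key, so a length check beforehand may be worthwhile.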
I'm trying to create a function with a queue trigger. Here is function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "CraigslistItemParser",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "craigslist",
      "connection": "DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY;EndpointSuffix=core.windows.net"
    }
  ]
}
When I deploy the function, the console log shows an error:
The 'CraigslistItemParser' function is in error: Microsoft.Azure.WebJobs.Host: Error indexing method 'Functions.CraigslistItemParser'. Microsoft.Azure.WebJobs.Extensions.Storage: Storage account connection string 'DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY;EndpointSuffix=core.windows.net' does not exist. Make sure that it is a defined App Setting.
What are App Settings? I can't find them anywhere.
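The error message suggests that the `connection` property is expected to hold the name of an app setting, not the connection string itself. App settings are key-value pairs configured on the Function App (and, for local runs, under `Values` in `local.settings.json`). The sketch below generates a `function.json` that refers to the setting by name. The helper `queue_trigger_function_json` is hypothetical, and `AzureWebJobsStorage` is used only as an example setting name; the actual connection string would be stored under that name in the app settings.

```python
import json


def queue_trigger_function_json(param_name, queue_name, connection_setting):
    """Build a function.json dict whose `connection` is an app-setting NAME."""
    return {
        "scriptFile": "__init__.py",
        "bindings": [
            {
                "name": param_name,
                "type": "queueTrigger",
                "direction": "in",
                "queueName": queue_name,
                # Name of the app setting that holds the connection string
                "connection": connection_setting,
            }
        ],
    }


config = queue_trigger_function_json(
    "CraigslistItemParser", "craigslist", "AzureWebJobsStorage"
)
print(json.dumps(config, indent=2))
```

Keeping the connection string out of `function.json` also keeps the account key out of source control.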