I'm trying to learn how to fetch URLs from a page automatically. In the code below, I'm trying to get the title of a web page:
import urllib.request
import re
url = "http://www.google.com"
regex = r'<title>(,+?)</title>'
pattern = re.compile(regex)
with urllib.request.urlopen(url) as response:
    html = response.read()
title = re.findall(pattern, html)
print(title)
I get this unexpected error:
Traceback (most recent call last):
  File "path\to\file\Crawler.py", line 11, in <module>
    title = re.findall(pattern, html)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
What am I doing wrong?
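The error arises because `response.read()` returns `bytes`, while `regex` is a `str` pattern, and `re` refuses to mix the two. A minimal sketch of the two usual fixes follows. It uses a hard-coded byte string as a hypothetical stand-in for the downloaded page so it runs offline, and it assumes the page is UTF-8 (for a real response, `response.headers.get_content_charset()` may report the page's declared charset). It also assumes the `,` in `(,+?)` was meant to be `.`; as written, that group matches only commas.

```python
import re

# Hypothetical stand-in for response.read(); urlopen(...).read() returns bytes.
html_bytes = b"<html><head><title>Google</title></head><body></body></html>"

# Option 1: decode the bytes to str, then search with a str pattern.
# Assumption: the page is UTF-8 encoded.
html_text = html_bytes.decode("utf-8")
title_pattern = re.compile(r"<title>(.+?)</title>")
titles = title_pattern.findall(html_text)
print(titles)  # ['Google']

# Option 2: keep the bytes and search with a bytes pattern (note the rb prefix).
title_pattern_bytes = re.compile(rb"<title>(.+?)</title>")
titles_bytes = title_pattern_bytes.findall(html_bytes)
print(titles_bytes)  # [b'Google']
```

Decoding once (option 1) is usually more convenient, because everything downstream then works with ordinary strings.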
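I want to use Python to grab the active window on the screen.
For example, a router's admin interface, where you enter admin as both the username and the password.
I'd like to use Python to capture that admin interface and fill in the username and password automatically.
What do I need to import to do this?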
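One possible approach, assuming the router's login box is an HTTP Basic Authentication prompt rather than a regular window (many home routers use one, but check yours; an HTML login form would need a POST request instead): skip capturing the window and send the credentials directly with `urllib.request` from the standard library, so nothing extra has to be installed. The address `192.168.1.1` is a hypothetical placeholder; the `admin`/`admin` credentials are the ones from the question.

```python
import urllib.request

# Hypothetical router address; substitute the one your router actually uses.
ROUTER_URL = "http://192.168.1.1/"
USERNAME = "admin"
PASSWORD = "admin"

# Store the credentials for any realm (realm=None) under ROUTER_URL.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, ROUTER_URL, USERNAME, PASSWORD)

# The handler answers the server's 401 Basic Auth challenge automatically.
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# With a reachable router you would then fetch the admin page, e.g.:
# with opener.open(ROUTER_URL, timeout=5) as response:
#     print(response.read().decode("utf-8", errors="replace"))
```

The network call is left commented out so the sketch runs without a router present.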
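I'm using a separate program (ffmpeg) to get the length of a downloaded YouTube video so that I can pick a random point in the video. However, when I try to run this code, I get this error: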
import re
import subprocess

def grabTimeOfDownloadedYoutubeVideo(youtubeVideo):
    process = subprocess.Popen(['/usr/local/bin/ffmpeg', '-i', youtubeVideo], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    stdout, stderr = process.communicate()
    matches = str(re.search(b"Duration:\s{1}(?P<hours>\d+?):(?P<minutes>\d+?):(?P<seconds>\d+\.\d+?),", stdout, re.DOTALL).groupdict()).encode()
    print(matches)
    hours = int(matches['hours'])
    minutes = int(matches['minutes'])
    seconds = int(matches['seconds'])
    total = 0
    total += 60 * 60 * hours
    total += 60 * minutes
    total += seconds
    print(total)
The matches variable prints:
b"{'minutes': b'04', 'hours': b'00', 'seconds': b'24.94'}"
So every value in the output has a 'b' in front of it. How do I remove the 'b' and get the numbers?
Full error message:
Traceback (most recent call last):
  File "bot.py", line 87, in <module>
    grabTimeOfDownloadedYoutubeVideo("videos/1.mp4")
  File "bot.py", line 77, in grabTimeOfDownloadedYoutubeVideo
    hours = int(matches['hours'])
TypeError: byte …
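Here `matches` is not a dict: `str(...groupdict()).encode()` turns the whole dict into a single `bytes` object, so `matches['hours']` indexes bytes with a string. A minimal sketch of a fix, assuming ffmpeg's output is UTF-8: decode the output to `str` once, match it with a `str` pattern, read the groups from `groupdict()` directly, and parse the seconds with `float()`, because `int('24.94')` would raise `ValueError`. The sample output line and the `parse_duration_seconds` name are hypothetical. Alternatively, passing `universal_newlines=True` (or `text=True` on Python 3.7+) to `Popen` makes `communicate()` return `str` directly.

```python
import re

# Hypothetical sample of ffmpeg's "-i" banner; the real text comes from
# process.communicate() and is bytes because Popen was not opened in text mode.
ffmpeg_output = b"  Duration: 00:04:24.94, start: 0.000000, bitrate: 1205 kb/s\n"

# str pattern; seconds may carry a fractional part such as "24.94".
DURATION_RE = re.compile(
    r"Duration:\s(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+(?:\.\d+)?),"
)


def parse_duration_seconds(output: bytes) -> float:
    """Return the total duration in seconds found in ffmpeg's output."""
    text = output.decode("utf-8", errors="replace")  # bytes -> str, once
    match = DURATION_RE.search(text)
    if match is None:
        raise ValueError("no Duration line found in ffmpeg output")
    parts = match.groupdict()  # a real dict of str values, e.g. {'hours': '00', ...}
    hours = int(parts["hours"])
    minutes = int(parts["minutes"])
    seconds = float(parts["seconds"])  # float, since '24.94' is not an int literal
    return 3600 * hours + 60 * minutes + seconds


print(parse_duration_seconds(ffmpeg_output))
```

Inside `grabTimeOfDownloadedYoutubeVideo`, the call would be `parse_duration_seconds(stdout)`, since `stderr` is merged into `stdout` there.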