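So, I've been trying to write a regular expression for something like a resistance value, which contains a certain number of digits and at most one letter, but always has a fixed total number of characters (let's use a four-character resistor code as the example).
First, I could do '\d*[RKM]\d*', but that would allow something like 'R'.
I could also do something like '[\dRKM]{4}', but that would allow things like 'RRR4', which is not what I want.
'\d{1,4}[Rr]\d{0,3} | ([RKM]\d{3}) | (\d{4})' is more specific, but it still allows '1234R567', which is not four characters.
So basically, is there a more compact way of writing '[RKM]\d\d\d | \d[RKM]\d\d | \d\d[RKM]\d | \d\d\d[RKM] | \d\d\d\d'?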
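One possible approach, offered as a sketch rather than a definitive answer: let a lookahead pin the total length to four characters, then let the main pattern allow at most one letter. The names `RESISTOR_CODE` and `is_resistor_code` are made up for illustration, and the letter set `[RKM]` is taken from the patterns above.

```python
import re

# (?=.{4}\Z)  lookahead: exactly four characters up to the end of the string
# \d*[RKM]?\d*  digits with at most one of R, K or M anywhere among them
RESISTOR_CODE = re.compile(r'(?=.{4}\Z)\d*[RKM]?\d*\Z')


def is_resistor_code(text):
    """Return True if text is four characters: digits plus at most one letter."""
    return RESISTOR_CODE.match(text) is not None


if __name__ == '__main__':
    for candidate in ['R470', '4R70', '470K', '1234', 'RRR4', 'R', '1234R567']:
        print(candidate, is_resistor_code(candidate))
```

The lookahead checks the length without consuming anything, so the rest of the pattern only has to enforce the "at most one letter" rule. Changing `{4}` to another number should adapt it to other code lengths.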
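I want to serialize and store a selenium webdriver object so that I can use it elsewhere in my code. I am trying to use pickle to do this. If there is another way to save the state of the webdriver object so that I can bring it back up later, that would be great too (I can't just reload the URL, because the site I am looking at is JavaScript-heavy and the current page depends on what I have clicked so far).
Currently, I have code like this.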
import pickle
from selenium import webdriver
d = webdriver.PhantomJS()
d.get(url)
d.find_element_by_xpath(xpath).click()
p = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
# Stuff happens here.
new_driver = pickle.loads(p)
print new_driver.page_source.encode('utf-8', 'ignore')
When I run it, I get the following error (the error occurs when I print, not before):
return self.driver.page_source.encode('utf-8', 'ignore')
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
return self._request(url, method=command_info[0], data=data)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
response = opener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File …
First of all, I have been trying to get the dropdown list from this web page: http://solutions.3m.com/wps/portal/3M/en_US/Interconnect/Home/Products/ProductCatalog/Catalog/?PC_Z7_RJH9U5230O73D0ISNF9B3C3SI1000000_nid7WiJFRF7FPCX8F9100000000
Here is my code:
import urllib2
from bs4 import BeautifulSoup
import re
from pprint import pprint
from selenium import webdriver
url = 'http://solutions.3m.com/wps/portal/3M/en_US/Interconnect/Home/Products/ProductCatalog/Catalog/?PC_Z7_RJH9U5230O73D0ISNF9B3C3SI1000000_nid=RFCNF5FK7WitWK7G49LP38glNZJXPCDXLDbl'
element_xpath = '//*[@id="Component1"]'
driver = webdriver.PhantomJS()
driver.get(url)
element = driver.find_element_by_xpath(element_xpath)
element_xpath = '/option[@value="02"]'
all_options = element.find_elements_by_tag_name("option")
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))
    option.click()
source = driver.page_source.encode('utf-8', 'ignore')
driver.quit()
source = str(source)
soup = BeautifulSoup(source, 'html.parser')
print soup
What gets printed is this:
Traceback (most recent call last):
File "../../../../test.py", line 58, in <module>
Value is: XX
main() …
I want to use top to monitor multiple processes by process name. I already know about $ top -p $(pgrep -d ',' <pattern>), but top is limited to 20 PIDs. Is there a way to allow more than 20 PIDs?
Do I have to combine ps and watch to get a similar result?
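To make the ps-plus-watch idea concrete, here is a minimal sketch that only builds the command line and does not run it. It assumes the procps versions of `ps` and `watch`. The helper name `build_watch_ps_command`, the 2-second interval and the chosen output columns are illustrative assumptions, not something from the question. As far as I know, `ps -p` does not share top's 20-PID limit, but that is worth checking on your own system.

```python
def build_watch_ps_command(pids, interval_seconds=2):
    """Build an argv list that re-runs ps on the given PIDs via watch.

    pids is any iterable of integer process IDs, for example the parsed
    output of pgrep. Nothing is executed here; the caller decides how to run it.
    """
    pid_list = ','.join(str(pid) for pid in pids)
    if not pid_list:
        raise ValueError('at least one PID is required')
    return [
        'watch', '-n', str(interval_seconds),  # refresh interval in seconds
        'ps', '-o', 'pid,comm,pcpu,pmem',      # columns: PID, name, CPU %, memory %
        '-p', pid_list,                        # comma-separated PID selection
    ]


if __name__ == '__main__':
    # Hypothetical PIDs, for illustration only.
    print(' '.join(build_watch_ps_command(range(100, 125))))
```

The resulting list could be handed to `subprocess.call`, or joined with spaces and pasted into a shell.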