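Posts by Mar*_*dir

A better way to write a regular expression

I have been trying to write a regular expression for something like a resistance value: it contains a certain number of digits and at most one letter, but always has a fixed total number of characters (let's use a four-character resistor code as the example).

First, I could use '\d*[RKM]\d*', but that would allow something like 'R'.

I could also use something like '[\dRKM]{4}', but that would allow things like 'RRR4', which is not what I want.

'\d{1,4}[Rr]\d{0,3} | ([RKM]\d{3}) | (\d{4})' is more specific, but it still allows '1234R567', which is not four characters.

So basically, is there a more compact way of writing '[RKM]\d\d\d | \d[RKM]\d\d | \d\d[RKM]\d | \d\d\d[RKM] | \d\d\d\d'?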
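One possible compact form, sketched in Python's `re` module: a lookahead pins the total length to four characters, and the body allows digits with at most one of R, K or M anywhere among them. The names `RESISTOR_CODE` and `is_resistor_code` are made up for this illustration, and the sketch assumes the regex flavor in use supports lookahead.

```python
import re

# The lookahead (?=.{4}$) requires exactly four characters in total.
# The body then permits digits with at most one letter from R, K, M.
RESISTOR_CODE = re.compile(r"(?=.{4}$)\d*[RKM]?\d*")


def is_resistor_code(text):
    """Return True if text is four characters long with at most one letter."""
    return RESISTOR_CODE.fullmatch(text) is not None


# Candidates taken from the question, plus one example per letter position.
for candidate in ["R123", "1K23", "12M3", "123R", "1234", "R", "RRR4", "1234R567"]:
    print(candidate, is_resistor_code(candidate))
```

Changing the `4` in the lookahead is the only edit needed for codes of a different length, which is where this form saves the most over spelling out every letter position.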

regex

8 votes, 1 answer, 128 views

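Pickling a Selenium WebDriver object

I want to serialize and store a Selenium WebDriver object so that I can use it elsewhere in my code. I am trying to use pickle to do this. If there is another way to save the state of the WebDriver object so that I can bring it back up later, that would be great (I can't just reload the URL, because the site I am looking at is JavaScript-heavy and the current page depends on what I have clicked so far).

Currently, I have code like this: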

import pickle
from selenium import webdriver

d = webdriver.PhantomJS()
d.get(url)
d.find_element_by_xpath(xpath).click()
p = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
# Stuff happens here.
new_driver = pickle.loads(p)
print new_driver.page_source.encode('utf-8', 'ignore')

When I run this, I get the following error (the error occurs when I print, not before):

    return self.driver.page_source.encode('utf-8', 'ignore')
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
    response = self.command_executor.execute(driver_command, params)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(url, method=command_info[0], data=data)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
    response = opener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
File …
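The traceback shows `page_source` going through `remote_connection` and `urllib2`, which suggests the driver object is a thin client for a browser session that lives outside the Python process. If that reading is right, pickle can copy the handle but not the page state behind it. The toy sketch below illustrates the idea with hypothetical classes (`ToyBrowserServer`, `ToyDriver`); it is not Selenium's API and does not reproduce the exact error above.

```python
import pickle


class ToyBrowserServer(object):
    """Hypothetical stand-in for the browser process; it owns the page state."""

    def __init__(self):
        self.sessions = {}

    def open_session(self, session_id, page):
        self.sessions[session_id] = page

    def page_source(self, session_id):
        # Raises KeyError once the session no longer exists.
        return self.sessions[session_id]


SERVER = ToyBrowserServer()


class ToyDriver(object):
    """Hypothetical stand-in for a driver; it stores only a session id."""

    def __init__(self, session_id):
        self.session_id = session_id

    @property
    def page_source(self):
        return SERVER.page_source(self.session_id)


driver = ToyDriver("abc")
SERVER.open_session("abc", "<html>state after clicking</html>")

# The round trip copies the handle (the session id), not the page itself.
restored = pickle.loads(pickle.dumps(driver, pickle.HIGHEST_PROTOCOL))
print(restored.page_source)

# Simulate the browser session going away: the restored handle stops working.
SERVER.sessions.clear()
try:
    restored.page_source
except KeyError:
    print("restored handle is useless without the live session")
```

Under this assumption, the state worth preserving is the running browser session rather than the Python object that talks to it.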

python selenium serialization pickle selenium-webdriver

5 votes, 2 answers, 2056 views

How to select an item from a drop-down list in Selenium

First, I have been trying to get the drop-down list from this web page: http://solutions.3m.com/wps/portal/3M/en_US/Interconnect/Home/Products/ProductCatalog/Catalog/?PC_Z7_RJH9U5230O73D0ISNF9B3C3SI1000000_nid7WiJFRF7FPCX8F9100000000

Here is my code:

    import urllib2
    from bs4 import BeautifulSoup
    import re
    from pprint import pprint

    from selenium import webdriver

    url = 'http://solutions.3m.com/wps/portal/3M/en_US/Interconnect/Home/Products/ProductCatalog/Catalog/?PC_Z7_RJH9U5230O73D0ISNF9B3C3SI1000000_nid=RFCNF5FK7WitWK7G49LP38glNZJXPCDXLDbl'

    element_xpath = '//*[@id="Component1"]'
    driver = webdriver.PhantomJS()
    driver.get(url)
    element = driver.find_element_by_xpath(element_xpath)
    element_xpath = '/option[@value="02"]'
    all_options = element.find_elements_by_tag_name("option")
    for option in all_options:
        print("Value is: %s" % option.get_attribute("value"))
        option.click()
    source = driver.page_source.encode('utf-8', 'ignore')
    driver.quit()

    source = str(source)

    soup = BeautifulSoup(source, 'html.parser')

    print soup

What gets printed is this:

Traceback (most recent call last):
  File "../../../../test.py", line 58, in <module>
Value is: XX
    main() …
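For picking a single option rather than clicking through all of them, Selenium ships a `Select` helper in `selenium.webdriver.support.ui`. The sketch below wraps it in a hypothetical function, `choose_option`, and assumes the element located by the XPath in the question really is a `<select>` tag; `Select` raises an exception for any other tag. The traceback above is truncated, so this may not address the actual error.

```python
def choose_option(select_element, value):
    """Select the <option> whose value attribute equals `value`.

    `select_element` is assumed to be the WebElement of a <select> tag,
    such as the one the question locates with '//*[@id="Component1"]'.
    """
    # Imported here so this module can be loaded without Selenium installed.
    from selenium.webdriver.support.ui import Select

    dropdown = Select(select_element)
    dropdown.select_by_value(value)
    # Return the visible text of the option that is now selected.
    return dropdown.first_selected_option.text
```

With the question's code this would be called as `choose_option(element, "02")`, using the option value that appears in the unused second XPath.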

python selenium select selenium-webdriver

3 votes, 1 answer, 10k views

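Linux top command with more than 20 commands

I want to use top to monitor multiple processes by process name. I already know about $ top -p $(pgrep -d ',' <pattern>), but top limits me to only 20 PIDs. Is there a way to allow more than 20 PIDs?

Do I have to combine ps and watch to get a similar result?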
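The ps-plus-polling idea can be sketched in Python as follows. It uses `pgrep` to find the PIDs and hands them to `ps -p` as a comma-separated list, refreshing on a timer in place of `watch`. The function names, the column list, and the two-second interval are choices made for this illustration, and I have not checked whether `ps` imposes a PID limit of its own.

```python
import subprocess
import time


def build_ps_command(pids):
    """Build a ps invocation that reports the given PIDs."""
    pid_list = ",".join(str(pid) for pid in pids)
    return ["ps", "-o", "pid,pcpu,pmem,comm", "-p", pid_list]


def find_pids(pattern):
    """Return the PIDs of processes whose name matches pattern, via pgrep."""
    result = subprocess.run(["pgrep", pattern], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return [int(token) for token in result.stdout.split()]


def monitor(pattern, interval=2.0, rounds=None):
    """Print a ps snapshot of matching processes every `interval` seconds.

    Runs forever when rounds is None, otherwise for `rounds` snapshots.
    """
    count = 0
    while rounds is None or count < rounds:
        pids = find_pids(pattern)
        if pids:
            subprocess.run(build_ps_command(pids))
        else:
            print("no process matches %r" % pattern)
        time.sleep(interval)
        count += 1
```

Calling `monitor("python")` would refresh the listing until interrupted. Unlike top, each snapshot is a plain ps table, so there is no interactive sorting.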

linux bash top-command

3 votes, 1 answer, 597 views