Posts by Mar*_*dir

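A better way to write a regular expression

I have been trying to write a regular expression for something like a resistance value: it contains a certain number of digits and at most one letter, but always the same total number of characters (let's use a four-character resistor code as the example).

First, I could use '\d*[RKM]\d*', but that would allow something like 'R'.

I could also use something like '[\dRKM]{4}', but that would allow things like 'RRR4', which I do not want.

'\d{1,4}[Rr]\d{0,3} | ([RKM]\d{3}) | (\d{4})' is more specific, but it still allows '1234R567', which is not four characters.

So basically, is there a more compact way to write '[RKM]\d\d\d | \d[RKM]\d\d | \d\d[RKM]\d | \d\d\d[RKM] | \d\d\d\d'?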
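One possible compact form, sketched in Python: a lookahead fixes the total length at four characters, and the rest of the pattern allows digits with at most one letter among them. The name `is_resistor_code` and the sample strings are illustrative assumptions, not part of the question.

```python
import re

# The lookahead (?=.{4}$) pins the total length to four characters.
# The remainder allows digits with at most one R/K/M letter among them.
# re.ASCII restricts \d to 0-9.
RESISTOR_CODE = re.compile(r"(?=.{4}$)\d*[RKM]?\d*", re.ASCII)


def is_resistor_code(text):
    """Return True if text is four characters: digits plus at most one letter."""
    return RESISTOR_CODE.fullmatch(text) is not None


# Sample inputs taken from the cases discussed in the question.
for candidate in ["4R70", "R470", "1234", "R", "RRR4", "1234R567"]:
    print(candidate, is_resistor_code(candidate))
```

This should accept the same strings as the five-way alternation, while the length and the letter set are each written only once.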

regex

Mar*_*dir

lucky-day

Score: 8 | Answers: 1 | Views: 128

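Pickling a Selenium WebDriver object

I want to serialize and store a Selenium webdriver object so that I can use it elsewhere in my code. I am trying to use pickle to do this. If there is another way to save the state of the webdriver object so that I can pick it up again later, that would be great too (I can't just reload the URL, because the site I am looking at is JavaScript-heavy and the current page depends on what I have clicked so far).

Currently, I have code like this.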

import pickle
from selenium import webdriver

d = webdriver.PhantomJS()
d.get(url)
d.find_element_by_xpath(xpath).click()
p = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
# Stuff happens here.
new_driver = pickle.loads(p)
print new_driver.page_source.encode('utf-8', 'ignore')


When I run it, I get the following error (the error occurs when I print, not before):

    return self.driver.page_source.encode('utf-8', 'ignore')
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
    response = self.command_executor.execute(driver_command, params)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(url, method=command_info[0], data=data)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
    response = opener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
File …
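If only the page contents are needed later, rather than a live browser, one alternative is to pickle a plain snapshot of the driver's state instead of the driver itself. The sketch below works under that assumption. `snapshot_page` and `FakeDriver` are hypothetical names introduced here. `current_url`, `page_source` and `get_cookies()` are the WebDriver members it relies on. It does not restore a session that can be clicked further.

```python
import pickle


def snapshot_page(driver):
    """Copy the picklable parts of the current page out of a driver."""
    return {
        "url": driver.current_url,
        "source": driver.page_source,
        "cookies": driver.get_cookies(),
    }


class FakeDriver:
    """Hypothetical stand-in for a webdriver so the sketch runs without a browser."""

    current_url = "http://example.com/catalog"
    page_source = "<html><body>clicked state</body></html>"

    def get_cookies(self):
        return [{"name": "session", "value": "abc123"}]


# Pickle the snapshot (plain strings, lists and dicts), not the driver.
saved = pickle.dumps(snapshot_page(FakeDriver()), pickle.HIGHEST_PROTOCOL)
# Stuff happens here.
restored = pickle.loads(saved)
print(restored["source"])
```

With a real driver, the snapshot would be taken right after the click, while the browser session is still alive.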


python selenium serialization pickle selenium-webdriver

Mar*_*dir

2014-08-09

Score: 5 | Answers: 2 | Views: 2056

How to select an item from a drop-down list in Selenium

First of all, I have been trying to get the drop-down list from this web page: http://solutions.3m.com/wps/portal/3M/en_US/Interconnect/Home/Products/ProductCatalog/Catalog/?PC_Z7_RJH9U5230O73D0ISNF9B3C3SI1000000_nid7WiJFRF7FPCX8F9100000000

Here is my code:

    import urllib2
    from bs4 import BeautifulSoup
    import re
    from pprint import pprint

    from selenium import webdriver

    url = 'http://solutions.3m.com/wps/portal/3M/en_US/Interconnect/Home/Products/ProductCatalog/Catalog/?PC_Z7_RJH9U5230O73D0ISNF9B3C3SI1000000_nid=RFCNF5FK7WitWK7G49LP38glNZJXPCDXLDbl'

    element_xpath = '//*[@id="Component1"]'
    driver = webdriver.PhantomJS()
    driver.get(url)
    element = driver.find_element_by_xpath(element_xpath)
    element_xpath = '/option[@value="02"]'
    all_options = element.find_elements_by_tag_name("option")
    for option in all_options:
        print("Value is: %s" % option.get_attribute("value"))
        option.click()
    source = driver.page_source.encode('utf-8', 'ignore')
    driver.quit()

    source = str(source)

    soup = BeautifulSoup(source, 'html.parser')

    print soup


This is what gets printed:

Traceback (most recent call last):
  File "../../../../test.py", line 58, in <module>
Value is: XX
    main() …
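The traceback above is truncated, so the actual failure is not visible here. Separately from that, the loop in the question clicks every option in turn. A minimal sketch of clicking only the option with a given value might look like the following. `click_option_by_value`, `FakeOption` and `FakeSelect` are hypothetical names for illustration. The helper relies only on the WebElement methods `find_elements`, `get_attribute` and `click`.

```python
def click_option_by_value(select_element, value):
    """Click the single <option> whose value attribute equals `value`."""
    # "tag name" is the locator string behind Selenium's By.TAG_NAME.
    for option in select_element.find_elements("tag name", "option"):
        if option.get_attribute("value") == value:
            option.click()
            return option
    raise ValueError("no option with value %r" % value)


class FakeOption:
    """Hypothetical stand-in for an <option> WebElement."""

    def __init__(self, value):
        self.value = value
        self.clicked = False

    def get_attribute(self, name):
        return self.value if name == "value" else None

    def click(self):
        self.clicked = True


class FakeSelect:
    """Hypothetical stand-in for the <select> WebElement."""

    def __init__(self, values):
        self.options = [FakeOption(v) for v in values]

    def find_elements(self, by, value):
        return self.options if (by, value) == ("tag name", "option") else []


dropdown = FakeSelect(["01", "02", "03"])
chosen = click_option_by_value(dropdown, "02")
print([(o.value, o.clicked) for o in dropdown.options])
```

With a real driver, `select_element` would be the element located by the `//*[@id="Component1"]` XPath. Selenium also ships a `Select` helper in `selenium.webdriver.support.ui`, whose `select_by_value` method covers the same case.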


python selenium select selenium-webdriver

Mar*_*dir

2014-06-12

Score: 3 | Answers: 1 | Views: 10k

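Linux top command with more than 20 commands

I want to use top to monitor multiple processes by process name. I already know about $ top -p $(pgrep -d ',' <pattern>), but top limits this to 20 PIDs. Is there a way to allow more than 20 PIDs?

Do I have to combine ps and watch to get a similar result?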
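As a rough illustration of the ps route, the sketch below builds a single ps invocation for an arbitrary list of PIDs. It assumes a procps-style ps that accepts a comma-separated list after -p and a column list after -o. The function name `build_ps_command` and the PID values are made up for the example. Whether ps has a cap comparable to top's is worth checking on the target system.

```python
def build_ps_command(pids, columns=("pid", "pcpu", "pmem", "comm")):
    """Build a ps argv that lists all given PIDs in one call."""
    if not pids:
        raise ValueError("need at least one PID")
    pid_list = ",".join(str(pid) for pid in pids)
    return ["ps", "-o", ",".join(columns), "-p", pid_list]


# Made-up PIDs; in practice they would come from pgrep, as in the question.
pids = list(range(1000, 1025))
command = build_ps_command(pids)
print(" ".join(command))
```

The resulting argv could be passed to `subprocess.run` in a loop, or the printed command line could be run under watch for a periodically refreshed view.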

linux bash top-command

Mar*_*dir

lucky-day

Score: 3 | Answers: 1 | Views: 597

Tag statistics

python ×2

selenium ×2

selenium-webdriver ×2

bash ×1

linux ×1

pickle ×1

regex ×1

select ×1

serialization ×1

top-command ×1
