Posts by par*_*rik

Get the part of a string that follows a given substring

What is the fastest way to get only the important_stuff part from a string like this:

bla-bla_delimiter_important_stuff

_delimiter_ is always present, but the rest of the string can change.

php string

Ale*_*lex

2017 01-12

Votes: 44
Answers: 5
Views: 60k

Selenium debugging: Element is not clickable at point (X, Y)

I am trying to scrape this website with Selenium.

I want to click the "next page" button, and to do that I use:

 driver.find_element_by_class_name('pagination-r').click()

It works for many pages but not for all of them. I get this error:

WebDriverException: Message: Element is not clickable at point (918, 13). Other element would receive the click: <div class="linkAuchan"></div>

It always happens on this page.

I read this question, and I tried this:

driver.implicitly_wait(10)
el = driver.find_element_by_class_name('pagination-r')
action = webdriver.common.action_chains.ActionChains(driver)
action.move_to_element_with_offset(el, 918, 13)
action.click()
action.perform()

But I got the same error.
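A workaround often suggested for this error is to scroll the element into view and trigger the click from JavaScript, so that the overlapping `<div class="linkAuchan">` cannot intercept it. A minimal sketch, assuming `driver` is a Selenium WebDriver and `element` has already been located; the helper name `click_via_script` is hypothetical and not part of Selenium:

```python
def click_via_script(driver, element):
    """Scroll `element` to the middle of the viewport, then click it from JavaScript.

    Hypothetical helper: only `driver.execute_script` is a Selenium API.
    A JavaScript click is dispatched on the element itself, so an element
    drawn on top of it does not receive the click.
    """
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    driver.execute_script("arguments[0].click();", element)
```

It could be called as `click_via_script(driver, driver.find_element_by_class_name('pagination-r'))`. A scripted click skips the visibility checks a real user click goes through, so it is a workaround, not a fix for the overlay itself.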

python selenium web-scraping selenium-firefoxdriver selenium-webdriver

par*_*rik

2018 05-09

Votes: 42
Answers: 3
Views: 30k

Unable to run/deploy custom scripts with shub-image

I have a problem running/deploying custom scripts with shub-image.

setup.py

from setuptools import setup, find_packages

setup(
    name = 'EU-Crawler',
    version = '1.0',
    packages = find_packages(),
    scripts = [
        'bin/launcher.py',
        'bin/DE_WEB_launcher.py',
        'bin/ES_WEB_launcher.py',
        'bin/FR_WEB_launcher.py',
        'bin/IT_WEB_launcher.py',
        'bin/NL_WEB_launcher.py',
        'bin/DE_MOBILE_launcher.py',
        'bin/FR_MOBILE_launcher.py'
    ],
    package_data = {
        'Crawling': ['*.ini'],
    },
    entry_points = {'scrapy': ['settings = Crawling.settings']},
    install_requires=[
        'scrapy-crawlera>=1.2.2',
        'configobj',
        'scrapy-fake-useragent',
        'xmltodict',
        'selenium==2.53.2',
        'python-dateutil',
        'pyvirtualdisplay',
        'beautifulsoup4',
        'incapsula-cracker-py3'
    ], 
    extras_require={'ScrapyElasticSearch': 'ScrapyElasticSearch[extras]'},
    zip_safe = False,
    include_package_data=True
)

In this file I have

scripts = [
       'bin/launcher.py',
       'bin/DE_WEB_launcher.py',
       'bin/ES_WEB_launcher.py',
       'bin/FR_WEB_launcher.py',
       'bin/IT_WEB_launcher.py',
       'bin/NL_WEB_launcher.py',
       'bin/DE_MOBILE_launcher.py',
       'bin/FR_MOBILE_launcher.py'
   ],

which are the different files I want to ship.

I deploy with this command:

sudo shub image …

python scrapy scrapinghub

par*_*rik

2018 06-18

Votes: 11
Answers: 1
Views: 238

How to bypass Incapsula with Python

I use Scrapy, and I am trying to scrape this website, which uses Incapsula:

<meta name="robots" content="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=719d34d31c8e3a6e6fffd425f7e032f3">
</script>

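I already asked a question about this 2 years ago, but that method (Incapsula-cracker) no longer works.

I tried to understand how Incapsula works, and I tried this to get around it: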

# Python 2 excerpt from the spider class; it needs `import re` and `from scrapy import Request`.
def start_requests(self):
    yield Request('https://courses-en-ligne.carrefour.fr', cookies={'store': 92}, dont_filter=True, callback=self.init_shop)

def init_shop(self, response):
    result_content      = response.body
    RE_ENCODED_FUNCTION = re.compile('var b="(.*?)"', re.DOTALL)
    RE_INCAPSULA        = re.compile('(_Incapsula_Resource\?SWHANEDL=.*?)"')
    INCAPSULA_URL       = 'https://courses-en-ligne.carrefour.fr/%s'
    # The page embeds a hex-encoded script in `var b="..."`; decode it two digits at a time.
    encoded_func        = RE_ENCODED_FUNCTION.search(result_content).group(1)
    decoded_func        = ''.join([chr(int(encoded_func[i:i+2], 16)) for i in xrange(0, len(encoded_func), 2)])
    # Pull the resource URL out of the decoded script and request it.
    incapsula_params    = RE_INCAPSULA.search(decoded_func).group(1)
    incap_url           = INCAPSULA_URL % incapsula_params
    yield Request(incap_url)

def parse(self, response):
    print response.body

But I am redirected to a reCAPTCHA page:

<html style="height:100%"> …
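The decoding step in `init_shop` relies on Python 2 (`xrange`, byte strings). A minimal Python 3 sketch of just that step, assuming the payload is a plain hex string as in the code above; the helper name `decode_hex_payload` and the sample value are made up for illustration:

```python
def decode_hex_payload(encoded: str) -> str:
    """Turn a hex string such as '68656c6c6f' into the text it encodes.

    Equivalent to the chr(int(pair, 16)) loop: every two hex digits become
    one character (latin-1 maps byte values 0-255 one-to-one to code points).
    """
    return bytes.fromhex(encoded).decode("latin-1")


print(decode_hex_payload("68656c6c6f"))  # hello
```

`bytes.fromhex` raises `ValueError` on an odd number of digits, which makes a truncated payload easier to spot than with the slicing loop.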

python recaptcha scrapy incapsula

par*_*rik

2018 04-17

Votes: 11
Answers: 2
Views: 6010

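How to use an SSL client certificate (p12) in Scrapy?

I need to use a client certificate file in p12 (PKCS12) format to communicate with a web server from Scrapy. Is there a way to do this?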

python client-certificates pkcs#12 scrapy

par*_*rik

2017 05-04

Votes: 8
Answers: 1
Views: 1032

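Google Chrome: "smooth" scrollIntoView() on several elements at the same time does not work

Question

In Google Chrome, element.scrollIntoView() with behavior "smooth" does not work when several containers are scrolled at the same time. If the next container is triggered in sequence, the first container stops scrolling.

In Firefox everything works.

My workaround is to use behavior "instant", but I would like to use "smooth" for a better user experience.

Example

Here is a plunker using Angular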

HTML

<p>
  In Google Chrome, element.scrollIntoView() with behavior 'smooth' doesn't work if several containers are scrolled at the same time.
  As shown in the case 'All Smooth (200ms sequence)', the container stops scrolling.
  <br>
  <br> In Firefox all works.
</p>

<div class="row mb-1">
  <div class="col">
    <button (click)="reset()" type="button" class="btn btn-secondary">Reset</button>
  </div>
</div>

<div class="row mb-1">
  <div class="col">
    <button (click)="scrollAllInstant()" type="button" class="btn btn-secondary">All Instant</button>
    <small class="text-success">Works</small>
  </div>
</div>

<div class="row mb-1">
  <div class="col">
    <button (click)="scrollAllSmooth()" type="button" …

javascript google-chrome js-scrollintoview

ali*_*000

2018 03-16

Votes: 7
Answers: 1
Views: 381

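Running multiple spiders in parallel on one website in Scrapy?

I want to crawl a website that has 2 parts, and my script is not as fast as I need.

Is it possible to launch 2 spiders, one to scrape the first part and another for the second part?

I tried having two different classes and running them: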

scrapy crawl firstSpider
scrapy crawl secondSpider

But I don't think that is smart.

I read the scrapyd documentation, but I don't know whether it would help in my case.
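As a rough sketch of the "run both at once" idea, the two `scrapy crawl` commands can be started as separate processes and awaited together. This uses only the standard library; the helper name `run_concurrently` is hypothetical, and it assumes the script is run from the Scrapy project directory with `scrapy` on the PATH:

```python
import subprocess


def run_concurrently(commands):
    """Start every command at once, wait for all of them, return their exit codes."""
    # Popen returns immediately, so all commands are running before we wait.
    processes = [subprocess.Popen(command) for command in commands]
    # wait() blocks until that process ends; order of results matches `commands`.
    return [process.wait() for process in processes]
```

With the spider names from the question, this would be called as `run_concurrently([["scrapy", "crawl", "firstSpider"], ["scrapy", "crawl", "secondSpider"]])`. Scrapy's documentation also describes running several spiders inside one process with `CrawlerProcess`, which may be a better fit when the spiders share settings.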

python web-crawler scrapy web-scraping scrapy-spider

par*_*rik

2018 07-20

Votes: 6
Answers: 2
Views: 9384

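Violation of the PH assumption

Suppose a survival analysis is run and the p-value for a variable is statistically significant, say with a positive association with the outcome. However, according to the Schoenfeld residuals, the proportional hazards (PH) assumption is violated.

After correcting the PH violation, which of the following scenarios can occur?

The p-value may no longer be significant.
The p-value is still significant, but the magnitude of the HR may change.
The p-value is still significant, but the direction of the association may change (i.e., a positive association may end up negative).

A PH violation generally means that interaction effects need to be included in the model. In simple linear regression, including a new variable may change the direction of an existing variable's coefficient because of collinearity. Can we use the same reasoning in the case above?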
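To make the interaction idea concrete, one common way to relax the PH assumption is to let the coefficient of the variable change with time. As an illustrative sketch only (the symbols $\beta$, $\gamma$ and the time function $g(t)$ are assumptions, not taken from the question):

```latex
h(t \mid x) = h_0(t)\,\exp\bigl(\beta x + \gamma\, x\, g(t)\bigr),
\qquad
\mathrm{HR}(t) = \exp\bigl(\beta + \gamma\, g(t)\bigr)
```

Here $\mathrm{HR}(t)$ is the hazard ratio for a one-unit increase in $x$. Under PH, $\gamma = 0$ and the ratio is the constant $\exp(\beta)$. With $\gamma \neq 0$ the ratio depends on $t$, and when $\beta$ and $\gamma\, g(t)$ have opposite signs it can fall on different sides of 1 at different times.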

cox-regression survival-analysis

Slo*_*uei

2018 02-12

Votes: 6
Answers: 1
Views: 495

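WPF/XAML - Binding to the TextBox PreviewTextInput event

I am trying to bind the TextBox "PreviewTextInput" event to a method in my view model. I am following this article, but my method is never called. Here is my XAML code: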

<UserControl x:Class="ConfigurationView"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
             xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
             xmlns:local="clr-namespace:OPCUAProjectModule.Views"
             xmlns:i="clr-namespace:System.Windows.Interactivity;assembly=System.Windows.Interactivity"
             mc:Ignorable="d" 
             d:DesignHeight="500" d:DesignWidth="700">
.....
.....
.....
                    <TextBox x:Name="txtServer" Text="{Binding Server, Mode=TwoWay, UpdateSourceTrigger=PropertyChanged, ValidatesOnDataErrors=True, NotifyOnValidationError=True}">
                        <i:Interaction.Triggers>
                            <i:EventTrigger EventName="PreviewTextInput" >
                                <i:InvokeCommandAction Command="{Binding IsAllowedInput}" />
                            </i:EventTrigger>
                        </i:Interaction.Triggers>
                    </TextBox>
....
....

Here is the ViewModel code:

public class ConfigurationViewModel : BindableBase, INotifyDataErrorInfo
{
....
....
    public string Server
    {
        get
        {
            return this.server;
        }

        set
        {
            this.SetProperty(ref this.server, value);
        }
    }

    private void IsAllowedInput(object sender, System.Windows.Input.TextCompositionEventArgs e)
    {
        //Never enters here.
    }

c# wpf xaml prism mvvm

Mil*_*rov

2018 05-30

Votes: 6
Answers: 1
Views: 5514

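How to fix Selenium WebDriverException: "The browser appears to have exited"

I get this exception when I try to use the Firefox webdriver:

raise WebDriverException "The browser appears to have exited" WebDriverException: Message: The browser appears to have exited before we could connect. If you specified a log_file in the FirefoxBinary constructor, check it for details.

I read this question and updated my Selenium, but I still have the same problem.

My code: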

driver = webdriver.Firefox()
time.sleep(5)
driver.get('http://www.example.com')

UPDATE

I read this question, and now I have this error:

OSError: [Errno 20] Not a directory
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x407a690>> ignored

python selenium selenium-firefoxdriver selenium-webdriver

par*_*rik

2018 06-15

Votes: 5
Answers: 2
Views: 9110