I am trying to hide the fact that my script uses CasperJS. So far I am changing the resolution, the user agent, and the language:
casper.userAgent("My UA");
casper.viewport(1600, 900);
casper.page.customHeaders = {'Accept-Language': 'fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3'};
casper.viewport() and casper.page.customHeaders do not seem to have any effect on Google Analytics. On some sites everything looks fine, but Google Analytics still sees that I am a web scraper:
My lang is "c"
Compatibility with JAVA : no
Screen resolution : 1024x768
Flash version : not set
Is there anything I can do?
(Partial) solution
Thanks to kasper pedersen, here is part of the solution:
We can override some variables when the page is initialized:
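casper.on('page.initialized', function (page) {
    page.evaluate(function () {
        // Runs in the page context as soon as the page object is initialized
        (function () {
            // Report a 1600x900 screen
            window.screen = {
                width: 1600,
                height: 900
            };
            // javaEnabled is called as a function, so the getter returns one
            window.navigator.__defineGetter__('javaEnabled', function () {
                return function () { return true; };
            });
        })();
    });
});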
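The language reported as "c" is not covered by the snippet above. As a rough sketch, and assuming the analytics script reads `navigator.language` (an assumption, not something confirmed in the question), the same getter-override idea can be written with the standard `Object.defineProperty`. `overrideGetter` and `fakeNavigator` are hypothetical names used only for illustration; the plain object stands in for `window.navigator` so the snippet runs outside a browser. Whether PhantomJS allows this property to be redefined on the real navigator object is not verified here.

```javascript
// Hypothetical helper (not part of CasperJS or PhantomJS): replace a property
// with a getter that always returns a fixed value.
function overrideGetter(target, name, value) {
    Object.defineProperty(target, name, {
        get: function () { return value; },
        configurable: true
    });
}

// Stand-in for window.navigator, starting with the value seen in the report.
var fakeNavigator = { language: "c" };

overrideGetter(fakeNavigator, "language", "fr-FR");
console.log(fakeNavigator.language);
```

Inside the `page.initialized` handler, the equivalent call would go within `page.evaluate`, with `window.navigator` as the target.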
This spoofs the screen resolution and the Java plugin. To fake the Flash plugin, we can do something similar:
casper.on('page.initialized', function (page) {
    page.evaluate(function () {
        (function() {
            window.screen = { …
I am trying to compile Pyparsing on my Windows machine, but I run into the following error:
python setup.py build_ext --inplace
running build_ext
cythoning pyparsing.pyx to pyparsing.c
Error compiling Cython file:
------------------------------------------------------------
...
If C{include} is set to true, the matched expression is also parsed (the skipped text
and matched expression are returned as a 2-element list). The C{ignore}
argument is used to define grammars (typically quoted strings and comments) that
might contain false matches.
"""
def __init__( self, other, include=False, ignore=None, failOn=None ):
^
------------------------------------------------------------
pyparsing.pyx:2764:31: Expected an identifier, found 'include'
Error compiling Cython …
Accessing https://disqus.com/profile/login/ from CasperJS keeps returning the following:
[warning] [phantom] Loading resource failed with status=fail: https://disqus.com/profile/login/
ensnare.js
var casper = require("casper").create({
    verbose: true,
    logLevel: "debug"
});
casper.options.timeout = 15000;
casper.start("https://disqus.com/profile/login/", function() {
    this.echo("YES!", "GREEN_BAR");
    this.echo(this.getTitle());
});
casper.run();
config.json
{"ignoreSslErrors": true, "cookiesFile": "biscuit", "maxDiskCacheSize": 1000, "diskCache": true}
Note that I also tried changing "ignoreSslErrors" to false, but that did not help.
Calling the script from the terminal:
./phantomjs --config=config.json casperjs/bin/bootstrap.js --casper-path=casperjs --cli ensnare.js
How can I fix this? I can access other pages without any problem.
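I am trying to update my NLTK data with nltk.download(), but I get "HTTP Error 401: Authorization Required".
When I traced the offending URL, I found it in downloader.py:
DEFAULT_URL = 'http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml'
I then copied that URL and opened it in a browser, and found that it asks me for a username and password.
Does anyone know how to solve this?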