I have a large project (let's call it A repo) that contains a subfolder which comes from B repo. When I commit in A repo, I get a warning like the one below:
warning: adding embedded git repository: extractor/annotator-server
hint: You've added another git repository inside your current repository.
hint: Clones of the outer repository will not contain the contents of
hint: the embedded repository and will not know how to obtain it.
hint: If you meant to add a submodule, use:
hint:
hint: git submodule add <url> extractor/annotator-server
hint:
hint: If you added this path by mistake, you can remove it from the …
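I don't know what this kind of proxy server is called, so feel free to edit my question title.
When I search Google for proxy servers, I find many implementations such as maproxy or a-python-proxy-in-less-than-100-lines-of-code. Those proxy servers seem to simply ask the remote server to fetch a given URL.
I want to build a proxy server that holds a proxy pool (a list of http/https proxies) and exposes only one IP address and one port to serve incoming requests. When a request arrives, it picks a proxy from the pool, performs the request through that proxy, and returns the result.
For example, I have a VPS with the IP '192.168.1.66'. I start the proxy server on this VPS with IP "127.0.0.1" and port "8080".
Then I can use this proxy as follows.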
import requests
url = 'http://www.google.com'
headers = {
    ...
}
proxies = {
    'http': 'http://192.168.1.66:8080'
}
r = requests.get(url, headers=headers, proxies=proxies)
I came across the following snippet, which falls short of what I need:
from twisted.web import proxy, http
from twisted.internet import reactor
from twisted.python import log
import sys
log.startLogging(sys.stdout)
class ProxyFactory(http.HTTPFactory):
    protocol = proxy.Proxy
reactor.listenTCP(8080, ProxyFactory())
reactor.run()
It works, but it is very simple. I don't understand how it works or how to extend this code to use a proxy pool.
From hidu/proxy-manager, which is written in Go:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ client (want visit http://www.baidu.com/) +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
| via proxy 127.0.0.1:8090
|
V
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ + proxy pool …
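The pool-selection step described in this question can be sketched on its own, without the listening server. This is a minimal sketch using only the standard library; `ProxyPool`, `opener_for`, and the upstream addresses are hypothetical names and placeholders, and round-robin is just one possible selection policy.

```python
import itertools
import urllib.request


class ProxyPool:
    """Hypothetical helper: hands out upstream proxies in round-robin order."""

    def __init__(self, proxies):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("the pool needs at least one proxy")
        self._cycle = itertools.cycle(proxies)

    def pick(self):
        # Return the next upstream proxy URL from the pool.
        return next(self._cycle)


def opener_for(proxy_url):
    # Build a urllib opener that routes http/https traffic through proxy_url.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder addresses, not real proxies.
pool = ProxyPool(["http://10.0.0.1:3128", "http://10.0.0.2:3128"])
first = pool.pick()
second = pool.pick()
third = pool.pick()  # wraps around to the first entry again

# opener.open(url) would send the request through `first`; no request is made here.
opener = opener_for(first)
```

A front-end server listening on a single port would call `pool.pick()` once per incoming request and forward the request through the resulting opener.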
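I am trying to implement a simple validation of credit card numbers. I read about the Luhn algorithm on Wikipedia:
- Counting from the check digit, which is the rightmost, and moving left, double the value of every second digit.
- Sum the digits of the products (e.g., 10: 1 + 0 = 1, 14: 1 + 4 = 5) together with the undoubled digits from the original number.
- If the total modulo 10 is equal to 0 (if the total ends in zero), then the number is valid according to the Luhn formula; otherwise it is not valid.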
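The three steps above can be written out almost literally. This is a minimal sketch, not one of the implementations the question refers to; the function name `luhn_is_valid` is made up for illustration, and non-digit characters such as spaces are simply ignored.

```python
def luhn_is_valid(number: str) -> bool:
    """Check a number string with the Luhn steps quoted above."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost (check) digit towards the left.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # every second digit, counting from the right
            doubled = digit * 2
            # Digit sum of the product: 14 -> 1 + 4 = 5, which equals 14 - 9.
            total += doubled - 9 if doubled > 9 else doubled
        else:
            total += digit
    return total % 10 == 0


print(luhn_is_valid("79927398713"))  # sample number used in the Wikipedia article
```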
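The description of the Luhn algorithm on Wikipedia is easy to understand. However, I have also seen other implementations of the Luhn algorithm on Rosetta Code and elsewhere.
Those implementations work fine, but I am confused about why they can use an array to do the job. The array they use seems to have nothing to do with the Luhn algorithm, and I cannot see how they implement the steps described on Wikipedia.
Why do they use arrays? What do the arrays mean, and how are they used to implement the algorithm as Wikipedia describes it?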
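The implementations in question are not shown here, so the following is only a guess at the pattern they use. A common trick is to precompute, for each digit 0 to 9, the digit sum of its double; the array is then a lookup table for the "double and add the digits" step. The names `DOUBLED` and `luhn_is_valid_table` are hypothetical, and the input is assumed to contain digits only.

```python
# DOUBLED[d] is the digit sum of 2 * d for d = 0..9,
# e.g. DOUBLED[7] == 5 because 2 * 7 = 14 and 1 + 4 = 5.
DOUBLED = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]


def luhn_is_valid_table(number: str) -> bool:
    """Luhn check that replaces the doubling step with a table lookup."""
    digits = [int(ch) for ch in reversed(number)]
    untouched = sum(digits[0::2])                   # check digit and every other digit
    doubled = sum(DOUBLED[d] for d in digits[1::2])  # every second digit from the right
    return (untouched + doubled) % 10 == 0


print(luhn_is_valid_table("79927398713"))
```

Seen this way, the array is the second Wikipedia step evaluated once in advance rather than something unrelated to the algorithm.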
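I have a CSV file whose structure is
CAT1,CAT2,TITLE,URL,CONTENT
CAT1, CAT2, TITLE, and CONTENT are all in Chinese.
I want to train LinearSVC or MultinomialNB with X (TITLE) and the features (CAT1, CAT2), and both give this error. Below is my code:
PS: I wrote the code below by following the scikit-learn text_analytics example.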
import numpy as np
import csv
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
label_list = []
def label_map_target(label):
    ''' map chinese feature name to integer '''
    try:
        idx = label_list.index(label)
    except ValueError:
        idx = len(label_list)
        label_list.append(label)
    return idx
c1_list = []
c2_list = []
title_list = []
with open(csv_file, 'r') as f:
    # row_from_csv is for shorting this example
    for row in …
I have tested that neither logger nor print can print messages inside a pandas_udf, in either cluster mode or client mode.
Test code:
import sys
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
import logging
logger = logging.getLogger('test')
spark = (SparkSession
         .builder
         .appName('test')
         .getOrCreate())
df = spark.createDataFrame(pd.DataFrame({
    'y': np.random.randint(1, 10, (20,)),
    'ds': np.random.randint(1000, 9999, (20,)),
    'store_id': ['a'] * 10 + ['b'] * 7 + ['q'] * 3,
    'product_id': ['c'] * 5 + ['d'] * 12 + ['e'] * 3,
}))
@pandas_udf('y int, ds int, store_id string, product_id string', …
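Today I went to change my matplotlib configuration. Searching for matplotlibrc showed that I have two:
Looking in the site-packages folder, I found many packages with a tilde in their names:
~klearn is sklearn, but there is also another sklearn. ~atplotlib is also matplotlib, and its modification date is 2018-11.
~-tplotlib has a modification date of 2019-3.15.
matplotlib has a modification date of 2019-3.28 (I did update matplotlib recently).
What are these tilde-named packages for? Can I safely delete them?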
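Before deleting anything, it can help to list exactly which entries carry the tilde prefix. The sketch below does only that and removes nothing; `find_tilde_entries` is a hypothetical helper, and the demo runs on a throwaway directory that mimics the names from the question. Such directories are often described as leftovers from an interrupted pip uninstall or upgrade, but treat that as an assumption rather than something this question confirms.

```python
import os
import tempfile


def find_tilde_entries(site_packages):
    """Return the sorted names in site_packages that start with '~'."""
    return sorted(name for name in os.listdir(site_packages) if name.startswith("~"))


# Demo on a temporary directory that mimics the listing in the question.
with tempfile.TemporaryDirectory() as demo:
    for name in ["~klearn", "sklearn", "~atplotlib", "~-tplotlib", "matplotlib"]:
        os.mkdir(os.path.join(demo, name))
    leftovers = find_tilde_entries(demo)

print(leftovers)
```

For a real environment, the path to pass in could come from `site.getsitepackages()` or from the location reported by `pip show matplotlib`.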
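Today I saw this post: Find out whether Chrome console is open.
@zswang provided a way to detect whether Chrome DevTools (console) is open. That really surprised me, and then I started wondering whether there is a way to bypass this detection technique.
There are two ways to detect whether Chrome DevTools is open (details are in the post above):
Using Object.defineProperty
I can bypass this one, because it can be reassigned to another function. I tried Object.defineProperty = null, and the detection function died (I know writing a mock function would be better; this is just an example).
Using obj.__defineGetter__ (Object.prototype.__defineGetter__)
Object.prototype.__defineGetter__ = null does not break the detection. How can I get around it?
Finally, I have to say that I don't like being monitored, and I hope there is a proper way around it.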
I use the following statements to get the HTML string:
import urllib3
url ='http://urllib3.readthedocs.org/'
http_pool = urllib3.connection_from_url(url)
r = http_pool.urlopen('GET',url)
print (r.data)
But the output is:
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "b'\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n <head>\n <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n \n\n .......................................\n</script>\n\n\n\n </body>\n</html>''
How can I get a raw HTML string?
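The `b'...'` prefix and the literal `\n` sequences in the output indicate that the printed value is a bytes object rather than a str. A minimal sketch of the conversion, using a short stand-in value instead of a live request and assuming the page is UTF-8 encoded (as the `charset=utf-8` meta tag in the output suggests):

```python
# Stand-in for r.data; the variable name `raw` is only for illustration.
raw = b'<!DOCTYPE html>\n<html>\n <head>\n  <meta charset="utf-8" />\n </head>\n</html>'

# Decoding turns the bytes into a str, so newlines print as real line breaks.
html = raw.decode("utf-8")
print(html)
```

With the code from the question, the same call would be `r.data.decode('utf-8')`.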
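I have searched for this jekyll error. It seems that a Ruby error occurs when jekyll processes a page, but I don't understand Ruby at all.
jekyll version: 1.3.1
I even reinstalled Ruby and jekyll, but the result did not change.
Update:
The error disappeared after I downgraded jekyll from 1.3.1 to 1.2.0.
Note: I created my site with jekyll 1.2.0, so can it not be built with 1.3.1? Is that the core problem?
E:\GitHub\sample> jekyll serve --trace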
Configuration file: E:/GitHub/sample/_config.yml
Source: E:/GitHub/sample
Destination: E:/GitHub/sample/_site
Generating... D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/page.rb:127:in `join': no implicit conversion of nil into String (TypeError)
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/page.rb:127:in `relative_path'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/page.rb:122:in `path'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:118:in `pagination_candidate?'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:77:in `block in template_page'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:76:in `select'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:76:in `template_page'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:17:in `generate'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:229:in `block in generate'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:228:in `each'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:228:in `generate'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:38:in `process'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/command.rb:18:in `process_site'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/commands/build.rb:23:in `build'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/commands/build.rb:7:in `process'
from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/bin/jekyll:97:in …
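From the plotly documentation:
layout > xaxis > tickvals:
Sets the values at which ticks on this axis appear. Only has an effect if tickmode is set to "array". Used with ticktext.
layout > xaxis > ticktext:
Sets the text displayed at the ticks position via tickvals. Only has an effect if tickmode is set to "array". Used with tickvals.
Example: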
import pandas as pd
import numpy as np
np.random.seed(42)
feature = pd.DataFrame({'ds': pd.date_range('20200101', periods=100*24, freq='H'),
'y': np.random.randint(0,20, 100*24) ,
'yhat': np.random.randint(0,20, 100*24) ,
'price': np.random.choice([6600, 7000, 5500, 7800], 100*24)})
import plotly.graph_objects as go
import plotly.offline as py
import plotly.express as px
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)
y = feature.set_index('ds').resample('D')['y'].sum()
fig …
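The pairing of tickvals and ticktext quoted from the documentation can be illustrated without building a figure. This sketch only constructs the axis settings as a plain dict; the 14-day spacing and the "month/day" label format are arbitrary choices for illustration, not something taken from the question.

```python
from datetime import date, timedelta

# 100 daily positions starting 2020-01-01, matching the resampled range above.
start = date(2020, 1, 1)
days = [start + timedelta(days=i) for i in range(100)]

# One tick every 14 days; tickvals gives the positions, ticktext the labels.
tick_days = days[::14]
xaxis = dict(
    tickmode="array",
    tickvals=[d.isoformat() for d in tick_days],
    ticktext=[f"{d.month}/{d.day}" for d in tick_days],
)

# With a plotly figure `fig`, these settings would be applied as:
# fig.update_layout(xaxis=xaxis)
print(xaxis["tickvals"][:2], xaxis["ticktext"][:2])
```

The two lists must have the same length, since each entry in ticktext labels the tick at the same index in tickvals.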