Posts by Mit*_*ril

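How do I make an outer repository and an embedded repository work as common/standalone repositories?

I have a big project (call it A repo) that has a subfolder which comes from B repo. When I commit in A repo, I get a warning like the one below: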

warning: adding embedded git repository: extractor/annotator-server
hint: You've added another git repository inside your current repository.
hint: Clones of the outer repository will not contain the contents of
hint: the embedded repository and will not know how to obtain it.
hint: If you meant to add a submodule, use:
hint:
hint:   git submodule add <url> extractor/annotator-server
hint:
hint: If you added this path by mistake, you can remove it from the …


git version-control github

Mit*_*ril

2017 10-30

18
votes

5
answers

30k
views

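How do I write a proxy pool server in Python (when a request comes in, pick a proxy to fetch the URL content)?

I don't know what this kind of proxy server is called, so feel free to edit my question title.

When I search Google for proxy servers, I find many implementations such as maproxy or a-python-proxy-in-less-than-100-lines-of-code. Those proxy servers seem to simply ask the remote server for a given URL.

I want to build a proxy server that holds a proxy pool (a list of http/https proxies) and exposes only one IP address and one port to serve incoming requests. When a request comes in, it picks a proxy from the pool, performs the request through it, and returns the result.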
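The selection step described above could be sketched roughly as follows. This is only a minimal sketch of the pool, not a full proxy server: the ProxyPool class, its method names, and the upstream addresses are all hypothetical, and round-robin is just one possible policy (random choice would work too). The returned dict has the shape that the requests library accepts for its proxies argument.

```python
import itertools
import threading


class ProxyPool:
    """Hypothetical pool that hands out upstream proxies in round-robin order."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("the pool needs at least one proxy")
        self._cycle = itertools.cycle(list(proxy_urls))
        # A server may call pick() from several threads at once.
        self._lock = threading.Lock()

    def pick(self):
        """Return the next upstream proxy URL."""
        with self._lock:
            return next(self._cycle)

    def proxies_for_request(self):
        """Build a mapping usable as the `proxies` argument of requests."""
        upstream = self.pick()
        return {"http": upstream, "https": upstream}


# Placeholder addresses, for illustration only.
pool = ProxyPool(["http://10.0.0.1:3128", "http://10.0.0.2:3128"])
first = pool.proxies_for_request()
second = pool.proxies_for_request()
third = pool.proxies_for_request()  # wraps around to the first proxy again
```

A server built around this would call proxies_for_request() once per incoming request and forward the request through the chosen upstream.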

For example, I have a VPS with the IP '192.168.1.66'. I start the proxy server on this VPS, with IP "127.0.0.1" and port "8080".

Then I can use this proxy as shown below.

import requests
url = 'http://www.google.com'
headers = {
    ...
}
proxies = {
    'http': 'http://192.168.1.66:8080'
}

r = requests.get(url, headers=headers, proxies=proxies)


I came across some snippets like this one:

from twisted.web import proxy, http
from twisted.internet import reactor
from twisted.python import log
import sys
log.startLogging(sys.stdout)

class ProxyFactory(http.HTTPFactory):
    protocol = proxy.Proxy

reactor.listenTCP(8080, ProxyFactory())
reactor.run()


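It works, but it is very simple, and I don't understand how it works or how to improve this code to use a proxy pool.

An example flow:

From hidu/proxy-manager, which is written in Go.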

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+ client (want visit http://www.baidu.com/)              +  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
                        |  
                        |  via proxy 127.0.0.1:8090  
                        |  
                        V  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+                       +         proxy pool …


python proxy http-proxy

Mit*_*ril

2017 02-20

17
votes

1
answer

3073
views

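Implementation of the Luhn algorithm

I am trying to implement simple validation of credit card numbers. I read about the Luhn algorithm on Wikipedia:

Counting from the check digit, which is the rightmost, and moving left, double the value of every second digit.

Sum the digits of the products (e.g., 10: 1 + 0 = 1, 14: 1 + 4 = 5) together with the undoubled digits from the original number.

If the total modulo 10 is equal to 0 (if the total ends in zero), then the number is valid according to the Luhn formula; otherwise it is not valid.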
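The three steps above can be followed almost literally. Here is a minimal sketch in Python (the question itself is about JavaScript; the function name is hypothetical, and stripping spaces and hyphens is my own assumption about the input format):

```python
def luhn_is_valid(number: str) -> bool:
    """Check a digit string by following the three quoted steps."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit (the check digit) towards the left.
    for position, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if position % 2 == 1:
            # Every second digit is doubled; for a two-digit product,
            # subtracting 9 equals summing its digits (14 -> 1 + 4 = 5).
            doubled = digit * 2
            total += doubled - 9 if doubled > 9 else doubled
        else:
            # Undoubled digits are added as they are.
            total += digit
    return total % 10 == 0


valid = luhn_is_valid("79927398713")    # the example number used on Wikipedia
invalid = luhn_is_valid("79927398710")  # same number with a wrong check digit
```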

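The description of the Luhn algorithm on Wikipedia is easy to understand. However, I have also seen other implementations of the Luhn algorithm on Rosetta Code and elsewhere.

Those implementations work well, but I am confused about why they can use an array to do the job. The array they use seems to have nothing to do with the Luhn algorithm, and I can't see how they carry out the steps described on Wikipedia.

Why do they use arrays? What do the arrays mean, and how are they used to implement the algorithm as Wikipedia describes it?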
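One plausible reading, offered as an assumption because the implementations in question are not shown here: such an array is often a ten-entry lookup table that stores, for each digit d, the digit sum of 2 * d. That merges the "double" and "sum the digits of the product" steps into a single index operation. A minimal Python sketch with hypothetical names:

```python
# DOUBLED[d] is the digit sum of 2 * d:
# 0, 2, 4, 6, 8, then 10 -> 1, 12 -> 3, 14 -> 5, 16 -> 7, 18 -> 9.
DOUBLED = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]


def luhn_with_table(number: str) -> bool:
    """Luhn check where the table replaces the doubling arithmetic."""
    total = 0
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        # Every second digit from the right goes through the table.
        total += DOUBLED[digit] if position % 2 == 1 else digit
    return total % 10 == 0


# Sanity check: each table entry equals the digit sum of the doubled digit.
table_matches_rule = all(
    DOUBLED[d] == sum(divmod(2 * d, 10)) for d in range(10)
)
```

Seen this way, the table is not unrelated to the algorithm; it is the doubling rule precomputed for all ten possible digits.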

javascript algorithm luhn

Mit*_*ril

2013 07-11

14
votes

3
answers

20k
views

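sklearn classifier gets ValueError: bad input shape

I have a CSV whose structure is CAT1,CAT2,TITLE,URL,CONTENT. CAT1, CAT2, TITLE and CONTENT are all in Chinese.

I want to train LinearSVC or MultinomialNB with X (TITLE) and features (CAT1, CAT2), and both give this error. My code is below:

PS: I wrote the code below by following this example: scikit-learn text_analytics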

import numpy as np
import csv
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

label_list = []

def label_map_target(label):
    ''' map chinese feature name to integer  '''
    try:
        idx = label_list.index(label)
    except ValueError:
        idx = len(label_list)
        label_list.append(label)

    return idx


c1_list = []
c2_list = []
title_list = []
with open(csv_file, 'r') as f:
    # row_from_csv is for shorting this example
    for row in …


python classification scikit-learn text-classification

Mit*_*ril

2015 08-03

14
votes

2
answers

60k
views

How do I log/print messages in a pyspark pandas_udf?

I have tested that neither logger nor print can print messages inside a pandas_udf, in either cluster mode or client mode.

Test code:

import sys
import numpy as np
import pandas as pd

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
import logging

logger = logging.getLogger('test')

spark = (SparkSession
.builder
.appName('test')
.getOrCreate())


df = spark.createDataFrame(pd.DataFrame({
    'y': np.random.randint(1, 10, (20,)),
    'ds': np.random.randint(1000, 9999, (20,)),
    'store_id' : ['a'] * 10 + ['b'] *7 + ['q']*3,
    'product_id' : ['c'] * 5 + ['d'] *12 + ['e']*3,
    })
)


@pandas_udf('y int, ds int, store_id string, product_id string', …


user-defined-functions pandas apache-spark pyspark

Mit*_*ril

lucky-day

12
votes

1
answer

8750
views

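Anaconda/Python site-packages subfolders with a tilde in the name - what are they?

Today I went to modify the matplotlib configuration. Searching for matplotlibrc showed that I have two:

[Screenshot of search results with two entries: C:\Anaconda3\Lib\site-packages\~-tplotlib\mpl-data and C:\Anaconda3\Lib\site-packages\matplotlib\mpl-data]

Looking at the site-packages folder, I found many packages with a tilde in their names:

~klearn is sklearn, but there is also another sklearn.
~atplotlib is also matplotlib; its modification date is 2018-11
~-tplotlib has a modification date of 2019-3.15
matplotlib has a modification date of 2019-3.28 (I did update matplotlib recently)

What are these tilde-named packages for? Can I safely delete them?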
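As far as I know, pip renames a package directory to a tilde-prefixed name while uninstalling or upgrading it, and it can leave that directory behind when the operation is interrupted. Please treat that as something to verify for your own installation. The sketch below only lists such entries so they can be inspected; the function name is hypothetical and nothing is deleted.

```python
from pathlib import Path
import sysconfig


def find_tilde_entries(site_packages: Path) -> list:
    """Return the sorted names of entries in site_packages that start with '~'."""
    return sorted(
        entry.name
        for entry in site_packages.iterdir()
        if entry.name.startswith("~")
    )


if __name__ == "__main__":
    # sysconfig reports the site-packages directory of the running interpreter.
    target = Path(sysconfig.get_paths()["purelib"])
    for name in find_tilde_entries(target):
        print(name)
```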

python pip anaconda python-packaging

Mit*_*ril

2019 10-30

8
votes

1
answer

1089
views

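Avoiding detection of "whether Chrome DevTools (console) is open"

Today I saw the post "Find out whether Chrome console is open".

@zswang provided a way to detect whether Chrome DevTools (console) is open. That really surprised me, and I started wondering whether there is a way to bypass this detection technique.

There are two ways to detect whether Chrome DevTools is open (details are in the post above):

Using Object.defineProperty

I can bypass this one, because it can be reassigned to another function. I tried Object.defineProperty = null, and the detection function died (I know writing a mock function would be better; this is just an example).
Using obj.__defineGetter__ (Object.prototype.__defineGetter__)

Object.prototype.__defineGetter__ = null does not break the detection. How can I get around it?

Finally, I have to say that I don't like being monitored, and I hope there is a proper way to get around it.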

javascript google-chrome google-chrome-devtools console.log

Mit*_*ril

2017 05-23

7
votes

3
answers

20k
views

How do I print a raw HTML string using urllib3?

I use the statements below to get the HTML string:

import urllib3

url ='http://urllib3.readthedocs.org/'
http_pool = urllib3.connection_from_url(url)
r = http_pool.urlopen('GET',url)

print (r.data)


But the output is:

b'\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n  <head>\n    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n    \n\n   .......................................\n</script>\n\n\n\n  </body>\n</html>'


How can I get a raw HTML string?
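The b'...' prefix in the output indicates a bytes object, which Python 3 prints with escape sequences such as \n. Here is a minimal sketch of turning such bytes into text, assuming the page is UTF-8 encoded (as the meta tag in the output suggests). The sample bytes stand in for r.data so that no network access is needed:

```python
# Stand-in for r.data: in Python 3 the response body arrives as bytes.
raw = b'\n<!DOCTYPE html>\n<html>\n  <head></head>\n</html>'

# Decoding produces a str, so print() shows real line breaks
# instead of the escaped b'...' representation.
html = raw.decode('utf-8')
print(html)
```

With the code from the question, the same idea would read r.data.decode('utf-8').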

python urllib3

Mit*_*ril

lucky-day

6
votes

1
answer

10k
views

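jekyll raises an error: no implicit conversion of nil into String

I have searched for this jekyll error. It seems to be a Ruby error that occurs while jekyll processes a page, but I don't understand Ruby at all.

jekyll version 1.3.1

I even reinstalled Ruby and jekyll, but the result did not change.

Update:
The error disappeared after I downgraded jekyll from 1.3.1 to 1.2.0.
Note: I created my site with jekyll 1.2.0, so can it not be built with 1.3.1? Is that the core problem?

E:\GitHub\sample> jekyll serve --trace: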

Configuration file: E:/GitHub/sample/_config.yml
            Source: E:/GitHub/sample
       Destination: E:/GitHub/sample/_site
      Generating... D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/page.rb:127:in `join': no implicit conversion of nil into String (TypeError)
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/page.rb:127:in `relative_path'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/page.rb:122:in `path'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:118:in `pagination_candidate?'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:77:in `block in template_page'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:76:in `select'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:76:in `template_page'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/generators/pagination.rb:17:in `generate'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:229:in `block in generate'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:228:in `each'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:228:in `generate'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/site.rb:38:in `process'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/command.rb:18:in `process_site'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/commands/build.rb:23:in `build'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/lib/jekyll/commands/build.rb:7:in `process'
        from D:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/jekyll-1.3.1/bin/jekyll:97:in …


ruby jekyll

Mit*_*ril

2014 01-19

6
votes

1
answer

2148
views

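Plotly: How to set custom xticks

From the plotly docs:

layout > xaxis > tickvals:

Sets the values at which ticks on this axis appear. Only has an effect if tickmode is set to "array". Used with ticktext.

layout > xaxis > ticktext:

Sets the text displayed at the tick positions given by tickvals. Only has an effect if tickmode is set to "array". Used with tickvals.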
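To make the two properties concrete, here is a minimal sketch that builds matching tickvals and ticktext lists using only the standard library. The dates, the weekly spacing, and the label format are illustrative assumptions; with plotly installed, a dict like this can be passed as fig.update_xaxes(**xaxis).

```python
from datetime import date, timedelta

start = date(2020, 1, 1)
# One tick every 7 days across the first four weeks (illustrative choice).
tick_dates = [start + timedelta(days=7 * i) for i in range(4)]

xaxis = {
    # tickmode must be "array" for the two lists below to take effect.
    "tickmode": "array",
    # Where the ticks appear on the axis (ISO date strings for a date axis).
    "tickvals": [d.isoformat() for d in tick_dates],
    # What each tick displays; the two lists are matched by position.
    "ticktext": [d.strftime("%b %d") for d in tick_dates],
}
```

The two lists must have the same length, since the label at index i is drawn at the position given by tickvals[i].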

Example:

import pandas as pd
import numpy as np

np.random.seed(42)
feature = pd.DataFrame({'ds': pd.date_range('20200101', periods=100*24, freq='H'), 
                        'y': np.random.randint(0,20, 100*24) , 
                        'yhat': np.random.randint(0,20, 100*24) , 
                        'price': np.random.choice([6600, 7000, 5500, 7800], 100*24)})


import plotly.graph_objects as go
import plotly.offline as py
import plotly.express as px
from plotly.offline import init_notebook_mode

init_notebook_mode(connected=True)


y = feature.set_index('ds').resample('D')['y'].sum()

fig …


python plotly plotly-python

Mit*_*ril

2020 05-18

6
votes

1
answer

4730
views