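When using click, I know how to define a multiple-choice option. I also know how to mark an option as required. But how can I indicate that an option B is required only when the value of option A is foo?
Here is an example: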
import click

@click.command()
@click.option('--output',
              type=click.Choice(['stdout', 'file']), default='stdout')
@click.option('--filename', type=click.STRING)
def main(output, filename):
    print("output: " + output)
    if output == 'file':
        if filename is None:
            print("filename must be provided!")
        else:
            print("filename: " + str(filename))

if __name__ == "__main__":
    main()
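If the output option is stdout, then filename is not needed. However, if the user sets output to file, then the other option, filename, must be provided. Does click support this pattern?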
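One possible way to express such a conditional requirement is a custom click.Option subclass. The sketch below is only an illustration under stated assumptions: the class name RequiredIf and its required_if / required_value keyword arguments are hypothetical and are not part of click itself. It checks only values typed on the command line, so defaults and environment variables of the controlling option are ignored.

```python
import click
from click.testing import CliRunner


class RequiredIf(click.Option):
    """Hypothetical helper: an option that becomes required when
    another option was given a particular value on the command line."""

    def __init__(self, *args, required_if=None, required_value=None, **kwargs):
        self.required_if = required_if        # name of the controlling option
        self.required_value = required_value  # value that triggers the requirement
        super().__init__(*args, **kwargs)

    def handle_parse_result(self, ctx, opts, args):
        # opts contains only what was parsed from the command line
        # (defaults have not been applied at this point).
        controlling = opts.get(self.required_if)
        if controlling == self.required_value and opts.get(self.name) is None:
            raise click.UsageError(
                "--%s is required when --%s is %r"
                % (self.name, self.required_if, self.required_value)
            )
        return super().handle_parse_result(ctx, opts, args)


@click.command()
@click.option('--output', type=click.Choice(['stdout', 'file']), default='stdout')
@click.option('--filename', cls=RequiredIf,
              required_if='output', required_value='file')
def main(output, filename):
    click.echo("output: " + output)
    if filename is not None:
        click.echo("filename: " + filename)


# Exercise the command in-process instead of calling main() directly.
runner = CliRunner()
missing = runner.invoke(main, ['--output', 'file'])
complete = runner.invoke(main, ['--output', 'file', '--filename', 'out.txt'])
default = runner.invoke(main, [])
print(missing.exit_code, complete.exit_code, default.exit_code)
```

Raising click.UsageError rather than ValueError lets click print its usual error message and exit with a usage-error status instead of showing a traceback.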
At the start of the function, I could add something like this:
if output == 'file' and filename is None:
    raise ValueError('When output is "file", a …

My data frame contains 10,000,000 rows! After grouping, there are still about 9,000,000 sub-frames to loop over.
The code is:
data = pd.read_csv('big.csv')
for id, new_df in data.groupby(level=0):  # look at mini df and do some analysis
    # some code for each of the small data frames
    pass
This is very inefficient; the code has now been running for more than 10 hours.
Is there a way to speed it up?
Full code:
d = pd.DataFrame()  # new df to populate
print('Start of the loop')
for id, new_df in data.groupby(level=0):
    c = [new_df.iloc[i:] for i in range(len(new_df.index))]
    x = pd.concat(c, keys=new_df.index).reset_index(level=(2,3), drop=True).reset_index()
    x = x.set_index(['level_0','level_1', x.groupby(['level_0','level_1']).cumcount()])
    d = pd.concat([d, x])
To get the data:
data = pd.read_csv('https://raw.githubusercontent.com/skiler07/data/master/so_data.csv', index_col=0).set_index(['id','date'])
Note:
Most ids have only 1 date, which means only 1 visit. For ids with more visits, I would like to arrange them in a 3D format, e.g. storing all visits along the second dimension. The output is (id, visit, features).
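
I am using Click inside a virtualenv and use the entry_points directive in setuptools to map the root command to a function called dispatch.
My tool exposes two subcommands, serve and config, and I use an option on the top-level group to make sure the user always passes a --path directive. However, the resulting usage looks like this:
mycommand --path=/tmp serve
Both the serve and config subcommands need to ensure that the user always passes a path, and ideally I would like to present the CLI as:
mycommand serve /tmp
mycommand config validate /tmp
The current Click-based implementation is as follows: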
# cli root
@click.group()
@click.option('--path', type=click.Path(writable=True))
@click.version_option(__version__)
@click.pass_context
def dispatch(ctx, path):
    """My project description"""
    ctx.obj = Project(path="config.yaml")

# serve
@dispatch.command()
@pass_project
def serve(project):
    """Starts WSGI server using the configuration"""
    print("hello")

# config
@dispatch.group()
@pass_project
def config(project):
    """Validate or initialise a configuration file"""
    pass

@config.command("validate")
@pass_project
def config_validate(project):
    """Reports on the validity of …

I am using an IF statement to group some numbers together. In column A I have both numeric and text values.
     A    B
1    s    =IF(A1>200,2345,"ad")
If I do this, B1 returns 2345.
How does Excel compare a string value with a numeric value?

I have data like this in a pandas.DataFrame:
Date, Team1, Team2, Team1 Score, Team2 Score, Event
8/2/17, Juventus, Milan, 2, 1, Friendly match
6/2/17, Milan, Napoli, 3, 0, Friendly match
5/1/17, Milan, Sampdoria, 1, 0, Friendly match
25/12/16, Parma, Milan, 0, 5, Friendly match
How can I list the number of goals Milan scored in each match?
The output should look like this:
[1, 3, 1, 5]
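A minimal sketch of one way to get that list, assuming the column names shown in the table above and that every score column is numeric. The sample rows are rebuilt inline so the snippet runs on its own; the variable names played and milan_goals are illustrative only.

```python
import io

import pandas as pd

# Rebuild the sample table from the question.
csv_text = """Date,Team1,Team2,Team1 Score,Team2 Score,Event
8/2/17,Juventus,Milan,2,1,Friendly match
6/2/17,Milan,Napoli,3,0,Friendly match
5/1/17,Milan,Sampdoria,1,0,Friendly match
25/12/16,Parma,Milan,0,5,Friendly match
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the matches in which Milan took part.
played = df[(df["Team1"] == "Milan") | (df["Team2"] == "Milan")]

# Take Team1's score where Milan is Team1, otherwise Team2's score.
milan_goals = played["Team1 Score"].where(
    played["Team1"] == "Milan", played["Team2 Score"]
).tolist()

print(milan_goals)  # [1, 3, 1, 5]
```

Series.where keeps the original value where the condition holds and takes the replacement elsewhere, which avoids an explicit Python loop over the rows.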

I have multiple zip files, each containing different types of txt files, like this:
zip1
- file1.txt
- file2.txt
- file3.txt
How can I read each file with pandas without extracting them?
I know that if each zip contained only 1 file, I could use the compression argument of read_csv, like this:
df = pd.read_csv('textfile.zip', compression='zip')
Any help on how to do this would be great.
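One way this could be approached is with the standard-library zipfile module, passing each archive member to pd.read_csv as a file object. The sketch below assumes the members are comma-separated text with a header row. It builds a small archive in memory purely so that it is self-contained; the member contents and the frames dictionary are made up for illustration.

```python
import io
import zipfile

import pandas as pd

# Build a tiny in-memory archive standing in for zip1 (illustrative data only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n3,4\n")
    zf.writestr("file2.txt", "a,b\n5,6\n")
buffer.seek(0)

# Read every member straight from the archive, without extracting to disk.
frames = {}
with zipfile.ZipFile(buffer) as zf:
    for name in zf.namelist():
        with zf.open(name) as member:
            frames[name] = pd.read_csv(member)

for name, frame in frames.items():
    print(name, frame.shape)
```

With a real archive, zipfile.ZipFile('zip1.zip') would replace the in-memory buffer, and read_csv arguments such as sep can be chosen per file type.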
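
I am trying to run several fits with SciPy's curve_fit at the same time, to avoid a loop and thereby improve speed.
This is very similar to this question, which has been solved. However, the fact that the function is piecewise (discontinuous) makes that solution inapplicable here.
Consider this example: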
import numpy as np
from numpy import random as rng
from scipy.optimize import curve_fit

rng.seed(0)
N=20
X=np.logspace(-1,1,N)
Y = np.zeros((4, N))
for i in range(0,4):
    b = i+1
    a = b
    print(a,b)
    Y[i] = (X/b)**(-a) #+ 0.01 * rng.randn(6)
    Y[i, X>b] = 1
This produces these arrays:
You can see that they are discontinuous at X==b. I can retrieve the original values of a and b by using curve_fit iteratively:
def plaw(r, a, b):
    """ Theoretical power law for the shape of the normalized conditional density """
    import numpy as …

I am currently reading "Hands-On Machine Learning with Scikit-Learn & TensorFlow". I get an error when I try to recreate the Transformation Pipelines code. How can I fix this?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([('imputer', Imputer(strategy = "median")),
                         ('attribs_adder', CombinedAttributesAdder()),
                         ('std_scaler', StandardScaler()),
                         ])
housing_num_tr = num_pipeline.fit_transform(housing_num)

from sklearn.pipeline import FeatureUnion

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]
num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy = "median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
    ])
cat_pipeline = Pipeline([('selector', DataFrameSelector(cat_attribs)),
                         ('label_binarizer', LabelBinarizer()),
                         ])
full_pipeline = FeatureUnion(transformer_list = [("num_pipeline", num_pipeline),
                                                 ("cat_pipeline", cat_pipeline),
                                                 ])
# And we can now run the whole pipeline simply:
housing_prepared = full_pipeline.fit_transform(housing) …

From a dataset like this:
import pandas as pd
import numpy as np
import statsmodels.api as sm
# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x'])
df = df.set_index(rng)
...and a linear regression model like this:
x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()
...you can easily retrieve some of the model coefficients this way:
print(model.params)
But I cannot find out how to retrieve all the other parameters from the model summary:
print(str(model.summary()))
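As stated in the question, I am particularly interested in R-squared.
From the post "How to extract a particular value from the OLS-summary in Pandas?" I learned that you can do print(model.r2) there. But that does not seem to work with statsmodels.
Any suggestions?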

I would like to know how to use a list comprehension to replace the values of a list. For example:
theList = [[1,2,3],[4,5,6],[7,8,9]]
newList = [[1,2,3],[4,5,6],[7,8,9]]
for i in range(len(theList)):
    for j in range(len(theList[i])):
        if theList[i][j] % 2 == 0:
            newList[i][j] = 'hey'
I would like to know how to convert this into list comprehension form.
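A minimal sketch of the equivalent nested list comprehension: the outer comprehension walks the rows and the inner one uses a conditional expression to decide, per value, whether to keep it or substitute 'hey'.

```python
theList = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Inner comprehension: replace even values, keep odd ones unchanged.
# Outer comprehension: build one new row per original row.
newList = [['hey' if value % 2 == 0 else value for value in row]
           for row in theList]

print(newList)  # [[1, 'hey', 3], ['hey', 5, 'hey'], [7, 'hey', 9]]
```

This builds a new list and leaves theList untouched, so no pre-populated copy is needed.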