I have some hierarchical data that bottoms out into time series data, which looks like this:
df = pandas.DataFrame(
{'value_a': values_a, 'value_b': values_b},
index=[states, cities, dates])
df.index.names = ['State', 'City', 'Date']
df
value_a value_b
State City Date
Georgia Atlanta 2012-01-01 0 10
2012-01-02 1 11
2012-01-03 2 12
2012-01-04 3 13
Savanna 2012-01-01 4 14
2012-01-02 5 15
2012-01-03 6 16
2012-01-04 7 17
Alabama Mobile 2012-01-01 8 18
2012-01-02 9 19
2012-01-03 10 20
2012-01-04 11 21
Montgomery 2012-01-01 12 22
2012-01-02 13 23
2012-01-03 14 24
2012-01-04 15 25
I want to perform time resampling per city, so something like
df.resample("2D", how="sum")
would output …
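One way to get that per-city resample (a sketch, assuming a recent pandas; note the `how=` keyword from the question has since been removed in favour of calling `.sum()` directly) is to group on the outer index levels with `pd.Grouper` and give the `Date` level a frequency:

```python
import pandas as pd

# Rebuild a small frame shaped like the question's (one state shown).
dates = pd.date_range("2012-01-01", periods=4)
index = pd.MultiIndex.from_product(
    [["Georgia"], ["Atlanta", "Savanna"], dates],
    names=["State", "City", "Date"])
df = pd.DataFrame({"value_a": range(8),
                   "value_b": range(10, 18)}, index=index)

# Group on State/City, and bucket the Date level into 2-day windows.
out = df.groupby([pd.Grouper(level="State"),
                  pd.Grouper(level="City"),
                  pd.Grouper(level="Date", freq="2D")]).sum()
```

Each (State, City) pair then collapses to one row per 2-day window, summed within the window.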
I assumed that each shard in Elasticsearch is an index. But I read somewhere that each segment is a Lucene index.
What exactly is a segment? How does it affect search performance? My indices reach a size of about 450GB per day (I create a new one every day) with the default Elasticsearch settings.
When I execute curl -XPOST "http://localhost:9200/logstash-2013.03.0$i/_optimize?max_num_segments=1", I get
num_committed_segments=11 and num_search_segments=11.
Shouldn't the values above be 1? Maybe it is because of the index.merge.policy.segments_per_tier value? What is a tier?
I am wondering why I can't do something like {number_format($row['my_number'])} inside a heredoc. Is there a way around this without having to resort to defining the variable $myNumber beforehand, as below?
I looked at http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.nowdoc but found nothing.
Code
foreach ($dbh -> query($sql) as $row):
$myNumber = number_format($row['my_number']);
$table .= <<<EOT
<tr>
<td>{$row['my_number']}</td> // WORKS
<td>$myNumber</td> // WORKS
<td>{number_format($row['my_number'])}</td> // DOES NOT WORK!
</tr>
EOT;
endforeach;
I built a microservice with the Connexion framework on top of Flask. I want to write tests for my application with py.test.
The pytest-flask documentation says to create an app fixture in conftest.py, creating the application like this:
conftest.py:
import pytest
from api.main import create_app
@pytest.fixture
def app():
app = create_app()
return app
In my tests I am using the client fixture like this:
test_api.py:
def test_api_ping(client):
res = client.get('/status')
assert res.status == 200
But when I run py.test, I get the following error message:
==================================== ERRORS ====================================
_______________________ ERROR at setup of test_api_ping ________________________
request = <SubRequest '_monkeypatch_response_class' for <Function 'test_api_ping'>>
monkeypatch = <_pytest.monkeypatch.MonkeyPatch instance at 0x7f9f76b76518>
@pytest.fixture(autouse=True)
def _monkeypatch_response_class(request, monkeypatch):
"""Set custom response class before test suite and restore the original …

Feeding 1MM+ rows into a Wide and Deep learning model throws ValueError: GraphDef cannot be larger than 2GB:
Traceback (most recent call last):
File "search_click.py", line 207, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "search_click.py", line 204, in main
train_and_eval()
File "search_click.py", line 181, in train_and_eval
m.fit(input_fn=lambda: input_fn(df_train), steps=FLAGS.train_steps)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 182, in fit
monitors=monitors)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 458, in _train_model
summary_writer=graph_actions.get_summary_writer(self._model_dir))
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/graph_actions.py", line 76, in get_summary_writer
graph=ops.get_default_graph())
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/summary_io.py", line 113, in __init__
self.add_graph(graph=graph, graph_def=graph_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/summary_io.py", line 204, …

I came across a simple Java program with two for loops. The question is whether these for loops take the same time to execute, or whether the one that executes first will be faster than the second.
Here is the program:
public static void main(String[] args) {
Long t1 = System.currentTimeMillis();
for (int i = 999; i > 0; i--) {
System.out.println(i);
}
t1 = System.currentTimeMillis() - t1;
Long t2 = System.currentTimeMillis();
for (int j = 0; j < 999; j++) {
System.out.println(j);
}
t2 = System.currentTimeMillis() - t2;
System.out.println("for loop1 time : " + t1);
System.out.println("for loop2 time : " + t2);
}
After executing this, I found that the first for loop takes more time than the second. But after swapping their positions, the result was the same as before: the loop written first always takes more time than the other. I was quite surprised by the result. Could someone please explain how the above program behaves?
I am trying to automatically include jars on my PySpark classpath. Right now I can type the following command and it works:
$ pyspark --jars /path/to/my.jar
I would like to include that jar by default, so that I can just type pyspark and also use it in IPython Notebook.
I have read that I can include the argument by setting PYSPARK_SUBMIT_ARGS in the environment:
export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar"
Unfortunately, the above doesn't work. I get the runtime error Failed to load class for data source.
I am running Spark 1.3.1.
Edit
My workaround when using IPython Notebook is the following:
$ IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar
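For a more permanent setup, one possible alternative (my assumption, not something from the question) is to list the jar in conf/spark-defaults.conf, which both spark-submit and pyspark read at startup:

```
# conf/spark-defaults.conf -- comma-separate multiple jars
spark.jars    /path/to/my.jar
```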
$ IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar

I have a column Parameters of type map:
>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> d = [{'Parameters': {'foo': '1', 'bar': '2', 'baz': 'aaa'}}]
>>> df = sqlContext.createDataFrame(d)
>>> df.collect()
[Row(Parameters={'foo': '1', 'bar': '2', 'baz': 'aaa'})]
I want to reshape it in PySpark so that all the keys (foo, bar, etc.) each become their own column:
[Row(foo='1', bar='2', baz='aaa')]
Using withColumn works:
(df
.withColumn('foo', df.Parameters['foo'])
.withColumn('bar', df.Parameters['bar'])
.withColumn('baz', df.Parameters['baz'])
.drop('Parameters')
).collect()
But I need a solution that does not explicitly mention the column names, since I have dozens of them.
>>> df.printSchema()
root
|-- Parameters: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull …

How can I add labels with the values above the bars in a bar chart:
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'Users': [ 'Bob', 'Jim', 'Ted', 'Jesus', 'James'],
'Score': [10,2,5,6,7],})
df = df.set_index('Users')
df.plot(kind='bar', title='Scores')
plt.show()
Run Code Online (Sandbox Code Playgroud) 可以为移动设备使用WYSIWYG文本编辑器(尤其是TinyMCE)还是不支持?会不会得到支持?