lmplot in seaborn fits regression models with an intercept. However, sometimes I want to fit a regression model without the intercept, i.e. regression through the origin.
For example:
In [1]: import numpy as np
...: import pandas as pd
...: import seaborn as sns
...: import matplotlib.pyplot as plt
...: import statsmodels.formula.api as sfa
...:
In [2]: %matplotlib inline
In [3]: np.random.seed(2016)
In [4]: x = np.linspace(0, 10, 32)
In [5]: y = 0.3 * x + np.random.randn(len(x))
In [6]: df = pd.DataFrame({'x': x, 'y': y})
In [7]: r = sfa.ols('y ~ x + 0', data=df).fit()
In [8]: sns.lmplot(x='x', y='y', data=df, fit_reg=True) …
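One possible workaround, continuing the session above (a sketch, not a built-in lmplot option; grid and xs_line are just illustrative names): let lmplot draw only the scatter and overlay the origin-constrained statsmodels fit by hand.
In [9]: grid = sns.lmplot(x='x', y='y', data=df, fit_reg=False)
In [10]: xs_line = np.linspace(0, 10, 100)
In [11]: grid.ax.plot(xs_line, r.params['x'] * xs_line, 'r-')  # slope from the 'y ~ x + 0' fit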

I have a struct array created by MATLAB and stored in a mat-file in the v7.3 format:
struArray = struct('name', {'one', 'two', 'three'}, ...
                   'id', {1, 2, 3}, ...
                   'data', {[1:10], [3:9], [0]})
save('test.mat', 'struArray', '-v7.3')
Now I want to read this file in Python via h5py:
import h5py

data = h5py.File('test.mat')
struArray = data['/struArray']
I don't know how to fetch the struct data from struArray one element at a time:
for index in range(<the size of struArray>):
elem = <the index th struct in struArray>
name = <the name of elem>
id = <the id of elem>
data = <the data of elem>
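One way that often works for v7.3 files (a sketch only; the exact dataset shapes should be checked first, e.g. with struArray.keys() and the datasets' .shape): each struct field under /struArray is stored as a dataset of HDF5 object references, and MATLAB char arrays come back as uint16 code points that need decoding.
import numpy as np
import h5py

with h5py.File('test.mat', 'r') as f:
    stru = f['/struArray']
    name_refs = np.ravel(stru['name'][()])   # one reference per struct element
    id_refs = np.ravel(stru['id'][()])
    data_refs = np.ravel(stru['data'][()])
    for name_ref, id_ref, data_ref in zip(name_refs, id_refs, data_refs):
        name = ''.join(chr(c) for c in np.ravel(f[name_ref][()]))  # decode char codes
        elem_id = np.ravel(f[id_ref][()])[0]
        elem_data = np.ravel(f[data_ref][()])
        print(name, elem_id, elem_data)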

I want to process stock level-2 data in pandas. Suppose for simplicity there are four kinds of data in each row: millis (an int64 timestamp), last_price (a float64), and ask_queue and bid_queue (each a fixed-size int32 array of length 200, as in the dtype below). These can easily be defined as a structured dtype in numpy:
import numpy as np

dtype = np.dtype([
    ('millis', 'int64'),
    ('last_price', 'float64'),
    ('ask_queue', ('int32', 200)),
    ('bid_queue', ('int32', 200))
]) …
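For illustration, a small usage sketch (the array length and sample values here are arbitrary): allocate a record array with this dtype and address the fields by name.
arr = np.zeros(5, dtype=dtype)           # 5 records, each carrying the 4 fields above
arr['millis'] = np.arange(5) * 1000
arr['last_price'] = 10.0
arr['ask_queue'][0, :3] = [100, 80, 60]  # 'ask_queue' is a (5, 200) int32 view
print(arr.dtype.names)                   # ('millis', 'last_price', 'ask_queue', 'bid_queue')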

I want to use log4j without any configuration file. What I want to do is:
logger = (Logger) LogManager.getLogger(this.getClass());
String pattern = "[%level] %m%n";
// do something to make this logger output to a local file "/xxx/yyy/zzz.log"
I found this answer: Configuring Log4j Loggers Programmatically.
But the documentation of Logger#addAppender says:
This method is not exposed through the public API and is used primarily for unit testing.
I am not sure whether it is correct to use this method in my code, or whether there is some better solution to my problem.

I want to write an implicit conversion from Tuple2[A,B] to Seq[C], where C is a common supertype of A and B. My first attempt is as follows:
implicit def t2seq[A,B,C](t: (A,B))(implicit env: (A,B) <:< (C,C)): Seq[C] = {
  val (a,b) = env(t)
  Seq(a,b)
}
But it doesn't work:
scala> (1,2): Seq[Int]
<console>:7: error: type mismatch;
found : (Int, Int)
required: Seq[Int]
(1,2): Seq[Int]
^
While this one does work:
class Tuple2W[A,B](t: (A,B)) {
  def toSeq[C](implicit env: (A,B) <:< (C,C)): Seq[C] = {
    val (a,b) = env(t)
    Seq(a,b)
  }
}
implicit def t2tw[A,B](t: (A,B)): Tuple2W[A,B] = new Tuple2W(t)
Use case:
scala> (1,2).toSeq
res0: Seq[Int] = List(1, 2)
I have no idea why the first solution does not work as expected. Scala version 2.8.0.r22634-b20100728020027 (Java HotSpot(TM) Client VM, Java 1.6.0_20).

For example, I want to plot the function
f(x,y) = sin(x^2+y^2)/(x^2+y^2),  x^2+y^2 <= 4π
In Mathematica, I can do it like this:
Plot3D[Sin[x^2 + y^2]/(x^2 + y^2), {x, -4, 4}, {y, -4, 4},
  RegionFunction -> (#1^2 + #2^2 <= 4 Pi &)]
where RegionFunction specifies the region of x, y over which to plot.
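The surrounding text does not say which plotting library the question targets; assuming the goal is the same restricted surface in matplotlib, one common trick (a sketch, not from the original post) is to mask points outside the region with NaN so they are simply not drawn:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection

x = np.linspace(-4, 4, 201)
y = np.linspace(-4, 4, 201)
X, Y = np.meshgrid(x, y)
R2 = X**2 + Y**2
with np.errstate(invalid='ignore', divide='ignore'):
    Z = np.sin(R2) / R2
Z[R2 == 0] = 1.0            # limit of sin(r^2)/r^2 as r -> 0
Z[R2 > 4 * np.pi] = np.nan  # hide everything outside x^2 + y^2 <= 4*pi

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, rstride=2, cstride=2)
plt.show()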

I have a matrix called xs:
array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
       [2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
Now I want to replace the zeros with the nearest previous non-zero element in the same row (assume the first column is always non-zero). A rough solution is as follows:
In [55]: row, col = xs.shape
In [56]: for r in xrange(row):
   ....:     for c in xrange(col):
   ....:         if xs[r, c] == 0:
   ....:             xs[r, c] = xs[r, c-1]
   ....:
In [57]: xs
Out[57]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1],
       [2, 1, 1, 1, 1, 1, 2, 1, 1, …
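A vectorized alternative to the double loop (a sketch, not part of the original post; it assumes numpy is available as np and xs still holds the original zeros): for every position record the column index of the most recent non-zero entry in its row, then gather those entries with fancy indexing.
idx = np.where(xs != 0, np.arange(xs.shape[1]), 0)    # own column index where non-zero, else 0
np.maximum.accumulate(idx, axis=1, out=idx)           # column of the most recent non-zero entry
xs_filled = xs[np.arange(xs.shape[0])[:, None], idx]  # same result as the loop, without mutating xs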

In Python 3.4.3 and Pandas 0.16, how can I specify that the dtype of the index is str? The following code is what I tried:
In [1]: from io import StringIO
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), index_col=0, dtype={'date': np.str_, 'close': np.float})
In [5]: fra.index
Out[5]: Int64Index([20140101, 20140102], dtype='int64')
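One possible workaround (a sketch, not from the original post): read date as an ordinary column with a str converter and promote it to the index afterwards, so the values stay strings.
In [6]: fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), converters={'date': str}).set_index('date')
In [7]: fra.index.dtype   # object, i.e. the dates are kept as strings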

I want an action on an RDD that behaves like reduce but does not require the operator to be commutative, i.e. I want the result below to always be "123456789".
scala> val rdd = sc.parallelize(1 to 9 map (_.toString))
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[24] at parallelize at <console>:24
scala> val result = rdd.someAction{ _+_ }
First, I found fold. The documentation of RDD#fold says:
def fold(zeroValue: T)(op: (T, T) ⇒ T): T
Aggregate the elements of each partition, and then the results for all the partitions, using a given associative function and a neutral "zero value".
Note that commutativity is not required in the doc. However, the result is not as expected:
scala> rdd.fold(""){ _+_ }
res10: String = 312456879
Edit: I tried what @dk14 mentioned, with no luck:
scala> val rdd = sc.parallelize(1 to 9 map (_.toString))
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[48] at parallelize at <console>:24
scala> rdd.zipWithIndex.map(x => (x._2, x._1)).sortByKey().map(_._2).fold(""){ _+_ …

Compiling the following code fails, but it passes if I remove the @specialized annotations in the dot method.
Scala code runner version 2.12.0-RC2 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.
abstract class Op[@specialized Left, @specialized Right] {
  @specialized
  type Result
  def r: Numeric[Result]
  def times(left: Left, right: Right): Result
}

object Op {
  implicit object IntDoubleOp extends Op[Int, Double] {
    type Result = Double
    val r = implicitly[Numeric[Double]]
    def times(left: Int, right: Double): Double = left * right
  }
}

object calc {
  def dot[@specialized Left, @specialized Right](xs: Array[Left], ys: Array[Right])
         (implicit op: Op[Left, Right]): op.Result = {
    var total = op.r.zero …