I want to install seaborn with Anaconda on Ubuntu Linux:
conda install -c anaconda seaborn=0.7.1
I get the following error message:
Fetching package metadata ...
/home/moritz/Python/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for conda.anaconda.org has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
SubjectAltNameWarning
/home/moritz/Python/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for conda.anaconda.org has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
SubjectAltNameWarning …

I have a hard time understanding the difference between PeriodIndex and DatetimeIndex, and when to use which. In particular, it always seemed more natural to me to use Periods rather than Timestamps, but recently I found that Timestamps seem to provide the same indexing functionality, can be used with TimeGrouper, and also work better with Matplotlib's date features. So I wonder: is there any reason to use Periods (PeriodIndex) at all?
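For what it's worth, a small sketch of the conceptual difference: a DatetimeIndex labels points in time, while a PeriodIndex labels spans that know their own extent.

```python
import pandas as pd

# A DatetimeIndex holds instants; a PeriodIndex holds spans (here: whole months).
dti = pd.date_range('2017-01-01', periods=3, freq='MS')  # month-start timestamps
pi = pd.period_range('2017-01', periods=3, freq='M')     # month-long periods

# A Period carries its own start and end, which a Timestamp cannot.
p = pd.Period('2017-01', freq='M')
print(p.start_time, p.end_time)  # the full extent of January 2017

# The two representations convert into each other losslessly here.
assert dti.to_period('M').equals(pi)
assert pi.to_timestamp().equals(dti)
```

In practice the span-awareness (e.g. `p.end_time`, period arithmetic) is the main thing Periods add; for plotting and grouping, timestamps often suffice.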
I am trying to run sklearn's cross_val_score with a split that I provide myself. The sklearn documentation gives the following example for PredefinedSplit:
>>> import numpy as np
>>> from sklearn.model_selection import PredefinedSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> test_fold = [0, 1, -1, 1]
>>> ps = PredefinedSplit(test_fold)
>>> ps.get_n_splits()
2
>>> print(ps)
PredefinedSplit(test_fold=array([ 0, 1, -1, 1]))
>>> for train_index, test_index in ps.split():
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 2 3] …

I have a dataframe generated by the following code:
import pandas as pd

l_dates = ['2017-01-01 19:53:36',
'2017-01-01 19:54:36',
'2017-01-03 18:15:13',
'2017-01-03 18:18:11',
'2017-01-03 18:44:35',
'2017-01-07 12:50:48']
l_ids = list(range(len(l_dates)))
l_values = [x*1000-1 for x in l_ids]
l_data = list(zip(l_dates, l_ids, l_values))
df1_ = pd.DataFrame(data = l_data, columns = ['timeStamp', 'usageid', 'values'])
which looks as follows:
timeStamp usageid values
0 2017-01-01 19:53:36 0 -1
1 2017-01-01 19:54:36 1 999
2 2017-01-03 18:15:13 2 1999
3 2017-01-03 18:18:11 3 2999
4 2017-01-03 18:44:35 4 3999
5 2017-01-07 12:50:48 5 4999
I would like to form groups based on observations that are close together in time. For example, all observations within a 15-minute interval should be grouped together.
I know that I can identify such observations in a pairwise fashion, as in
df_user10241['timeStamp'] < pd.Timedelta(minutes=15) …

When setting up a Scala project in IntelliJ, I get the following error log:
Error: Error while importing SBT project:
...
[error] at sbt.MainLoop$.$anonfun$runWithNewLog$1(MainLoop.scala:107)
[error] at sbt.io.Using.apply(Using.scala:22)
[error] at sbt.MainLoop$.runWithNewLog(MainLoop.scala:101)
[error] at sbt.MainLoop$.runAndClearLast(MainLoop.scala:57)
[error] at sbt.MainLoop$.runLoggedLoop(MainLoop.scala:42)
[error] at sbt.MainLoop$.runLogged(MainLoop.scala:34)
[error] at sbt.StandardMain$.runManaged(Main.scala:113)
[error] at sbt.xMain.run(Main.scala:76)
[error] at xsbt.boot.Launch$$anonfun$run$1.apply(Launch.scala:109)
[error] at xsbt.boot.Launch$.withContextLoader(Launch.scala:128)
[error] at xsbt.boot.Launch$.run(Launch.scala:109)
[error] at xsbt.boot.Launch$$anonfun$apply$1.apply(Launch.scala:35)
[error] at xsbt.boot.Launch$.launch(Launch.scala:117)
[error] at xsbt.boot.Launch$.apply(Launch.scala:18)
[error] at xsbt.boot.Boot$.runImpl(Boot.scala:41)
[error] at xsbt.boot.Boot$.main(Boot.scala:17)
[error] at xsbt.boot.Boot.main(Boot.scala)
[error] java.lang.ClassNotFoundException: org.jetbrains.sbt.CreateTasks$
[error] Use 'last' for the full log.
[info] shutting down server

See the complete log in file:/home/xxxx/.IdeaIC2017.2/system/log/sbt.last.log
My build.sbt looks as follows:
name := "someProjectName"
version …

I have a few basic questions about Dask:
As an edit: my use case is that I want to parallelize a for loop, either on my local machine or on a cluster (i.e. it should also work on a cluster).
As a second edit: I think I am also not quite clear about the relationship between futures and delayed computations.
Thanks
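For the for-loop case, a minimal sketch of the pattern commonly used with `dask.delayed` (the `process` function below is a made-up stand-in for the loop body, not anything from the question):

```python
import dask
from dask import delayed

# Hypothetical per-item work; stands in for the body of the for loop.
def process(x):
    return x * x

# Wrapping the call in delayed() builds a lazy task graph instead of
# executing the loop eagerly.
tasks = [delayed(process)(i) for i in range(10)]

# compute() runs the graph; the scheduler can be the local thread/process
# pool or a distributed cluster, without changing the loop itself.
results = dask.compute(*tasks)
print(results)  # (0, 1, 4, ..., 81)
```

On the futures/delayed relationship: delayed objects stay lazy until `compute()` is called, while futures (e.g. from a distributed `Client`) start executing in the background as soon as they are created; that eagerness is the main difference.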
I came across the following while working through the book "Functional Programming in Scala" by Paul Chiusano and Rúnar Bjarnason (Chapter 7, Purely Functional Parallelism).
package fpinscala.parallelism
import java.util.concurrent._
import language.implicitConversions
object Par {
  type Par[A] = ExecutorService => Future[A]

  def run[A](s: ExecutorService)(a: Par[A]): Future[A] = a(s)

  def unit[A](a: A): Par[A] = (es: ExecutorService) => UnitFuture(a) // `unit` is represented as a function that returns a `UnitFuture`, which is a simple implementation of `Future` that just wraps a constant value. It doesn't use the `ExecutorService` at all. It's always done and can't be cancelled. Its `get` method simply returns the value …

java parallel-processing scala java.util.concurrent scala-repl
When converting a Parquet file to a dataframe, I am running into a problem with the column types.
I do:
bucket = 's3://some_bucket/test/usages'
import pyarrow.parquet as pq
import s3fs
s3 = s3fs.S3FileSystem()
read_pq = pq.ParquetDataset(bucket, filesystem=s3).read_pandas()
When I inspect read_pq, I get:
pyarrow.Table
_COL_0: decimal(9, 0)
_COL_1: decimal(9, 0)
_COL_2: decimal(9, 0)
_COL_3: decimal(9, 0)
When I do df = read_pq.to_pandas(); df.dtypes, I get:
_COL_0 object
_COL_1 object
_COL_2 object
_COL_3 object
dtype: object
The original data are all integers. When I operate on the object columns in the pandas dataframe, the operations are very slow.
Is there something like pd.to_numeric for the decimal(9, 0) columns? Or is it best to do the conversion directly on the pandas dataframe?
I have tried read_pq.column('_COL_0').cast('int32'), which throws an error like
No cast implemented from decimal(9, 0) to int32
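One pandas-side sketch of the conversion, assuming the values fit a native numeric dtype; the tiny frame below is made up to stand in for what `to_pandas()` returns:

```python
import decimal

import pandas as pd

# Stand-in for the to_pandas() result: decimal.Decimal objects stored in an
# object-dtype column, which is what makes later operations slow.
df = pd.DataFrame({"_COL_0": [decimal.Decimal("1"), decimal.Decimal("999")]})
print(df["_COL_0"].dtype)  # object

# pd.to_numeric coerces the Decimal objects to a fast native numeric dtype.
df["_COL_0"] = pd.to_numeric(df["_COL_0"])
print(df["_COL_0"].dtype)
```

This converts after the fact rather than at read time, so whether it beats casting on the Arrow side depends on the data size; it does remove the slow object dtype.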
I have a dataframe like this:
import pandas as pd

data_ = list(range(106))
index_ = pd.period_range('3/1/2004', '12/1/2012', freq='M')
df2_ = pd.DataFrame(data = data_, index = index_, columns = ['data'])
I would like to plot this dataframe. Currently, I am using:
df2_.plot()
Now I would like to control the labels (and possibly the ticks) on the x-axis. In particular, I would like monthly ticks on the axis, and perhaps a label every other month or every quarter. I would also like to have vertical grid lines.
I started working through this example, but I already fail at constructing the timedelta.
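A sketch of one common approach (an assumption about the goal, not necessarily what the linked example does): convert the PeriodIndex to timestamps and let matplotlib's date locators place the ticks.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

data_ = list(range(106))
index_ = pd.period_range('3/1/2004', '12/1/2012', freq='M')
df2_ = pd.DataFrame(data=data_, index=index_, columns=['data'])

fig, ax = plt.subplots()
# Convert periods to timestamps so matplotlib's date machinery applies.
ax.plot(df2_.index.to_timestamp(), df2_['data'])

ax.xaxis.set_minor_locator(mdates.MonthLocator())            # a tick every month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # a label every quarter
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.grid(axis='x', which='major')                             # vertical grid lines
fig.autofmt_xdate()
```

Plotting through matplotlib directly (rather than `df2_.plot()`) sidesteps pandas' own period axis, which does not always cooperate with `matplotlib.dates` locators.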
I have a data.frame with variables var1 and var2 (two strings) and variables x, y, and z. I want to normalize the variables x, y, and z by dividing each of them by its respective first element.
I tried:
df_ %>%
mutate_at(c("x", "y", "z"), funs(./.[1])) %>% head()
However, this sets the entire column to 1. How can I get it to divide by the first element instead?
Second, what is the best way to add the normalized variables to the dataframe as x_norm, y_norm, z_norm?
Many thanks, and let me know if you need more information.
python ×6
pandas ×3
scala ×2
anaconda ×1
apache-arrow ×1
conda ×1
dask ×1
dplyr ×1
java ×1
matplotlib ×1
mutate ×1
parquet ×1
pyarrow ×1
python-3.x ×1
r ×1
sbt ×1
scala-repl ×1
scikit-learn ×1