Posts by clo*_*g14

Problems updating Anaconda and installing new packages

I want to install seaborn in Anaconda on Ubuntu Linux.

conda install -c anaconda seaborn=0.7.1

I get the following error message:

Fetching package metadata .../home/moritz/Python/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for conda.anaconda.org has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/home/moritz/Python/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for conda.anaconda.org has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning …


python anaconda conda

clo*_*g14

2017 01-21

9
votes

2
answers

4040
views

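Python - Pandas - Difference between Timestamps and Period ranges

I have a hard time understanding the difference between PeriodIndex and DatetimeIndex, and when to use which. In particular, using Periods rather than Timestamps always seemed more natural to me, but I recently discovered that Timestamps seem to provide the same indexing capabilities, can be used with the timegrouper, and also work better with Matplotlib's date functionality. So I am wondering whether there is any reason to use Periods (PeriodIndex) at all?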
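
To make the distinction concrete, here is a minimal sketch (the dates and variable names are made up for illustration, not taken from the question): a Timestamp marks a single instant, a Period represents a whole span, and the two index types can be converted into each other.

```python
import pandas as pd

# A DatetimeIndex holds points in time (here: the first instant of each month).
stamps = pd.date_range("2017-01-01", periods=3, freq="MS")

# A PeriodIndex holds spans of time (here: whole calendar months).
periods = pd.period_range("2017-01", periods=3, freq="M")

# Each Period knows where its span starts and ends.
first = periods[0]
span_start, span_end = first.start_time, first.end_time
print(first, span_start, span_end)

# The two representations convert into each other.
as_stamps = periods.to_timestamp()   # start of each period
as_periods = stamps.to_period("M")   # month containing each timestamp
print(as_stamps)
print(as_periods)
```

Because the conversion is cheap in both directions, one possible approach is to keep a DatetimeIndex for grouping and plotting and switch to periods only where span semantics matter.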

python pandas

clo*_*g14

lucky-day

9
votes

1
answers

2445
views

PredefinedSplit function in sklearn

I am trying to run cross_val_score from sklearn with a split that I provide myself. The sklearn documentation gives the following example:

>>> import numpy as np
>>> from sklearn.model_selection import PredefinedSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> test_fold = [0, 1, -1, 1]
>>> ps = PredefinedSplit(test_fold)
>>> ps.get_n_splits()
2
>>> print(ps)       
PredefinedSplit(test_fold=array([ 0,  1, -1,  1]))
>>> for train_index, test_index in ps.split():
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 2 3] …
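
As a minimal sketch of how such a split might be handed to cross_val_score: the `cv` argument accepts a splitter object, so the PredefinedSplit instance can be passed directly. The choice of LogisticRegression is an assumption made only for illustration; the question does not name an estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, cross_val_score

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])

# test_fold[i] is the index of the test fold that sample i belongs to;
# -1 keeps a sample in the training set of every split.
test_fold = [0, 1, -1, 1]
ps = PredefinedSplit(test_fold)

# The splitter object is passed as cv; the estimator is a placeholder.
scores = cross_val_score(LogisticRegression(), X, y, cv=ps)
print(scores)  # one score per predefined fold
```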


python scikit-learn

clo*_*g14

2017 05-14

8
votes

1
answers

6392
views

Pandas: forming groups based on the closeness of timestamps

I have a data frame generated by the following code:

import pandas as pd

l_dates = ['2017-01-01 19:53:36',
           '2017-01-01 19:54:36',
           '2017-01-03 18:15:13',
           '2017-01-03 18:18:11',
           '2017-01-03 18:44:35',
           '2017-01-07 12:50:48']

l_ids = list(range(len(l_dates)))

l_values = [x*1000-1 for x in l_ids]

l_data = list(zip(l_dates, l_ids, l_values))

df1_ = pd.DataFrame(data = l_data, columns = ['timeStamp', 'usageid', 'values'])

It looks as follows:

             timeStamp  usageid  values
0  2017-01-01 19:53:36        0      -1
1  2017-01-01 19:54:36        1     999
2  2017-01-03 18:15:13        2    1999
3  2017-01-03 18:18:11        3    2999
4  2017-01-03 18:44:35        4    3999
5  2017-01-07 12:50:48        5    4999

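I want to form groups of observations that are close together in time. For example, all observations within a 15-minute interval should be grouped together.

I know that I can identify such observations in a pairwise fashion, as follows: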

df_user10241['timeStamp']  < pd.Timedelta(minutes=15) …
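
One possible way to turn the pairwise comparison into group labels is sketched below. It assumes that "close" means the gap to the previous observation is at most 15 minutes (the question leaves the exact rule open), and the column name `group` is introduced only for illustration.

```python
import pandas as pd

l_dates = ['2017-01-01 19:53:36',
           '2017-01-01 19:54:36',
           '2017-01-03 18:15:13',
           '2017-01-03 18:18:11',
           '2017-01-03 18:44:35',
           '2017-01-07 12:50:48']

df = pd.DataFrame({'timeStamp': pd.to_datetime(l_dates)})
df = df.sort_values('timeStamp').reset_index(drop=True)

# Gap between each observation and the one before it (NaT for the first row).
gap = df['timeStamp'].diff()

# A new group starts whenever the gap exceeds 15 minutes;
# the cumulative sum of those "breaks" gives a running group id.
df['group'] = (gap > pd.Timedelta(minutes=15)).cumsum()
print(df)
```

The resulting column can then be passed to `df.groupby('group')` for per-group aggregation.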


python python-3.x pandas pandas-groupby

clo*_*g14

lucky-day

6
votes

1
answers

766
views

Scala project does not work in IntelliJ

When setting up a Scala project in IntelliJ, I get the following error log:

Error:Error while importing SBT project:<br/>...<br/><pre>[error]   at 
sbt.MainLoop$.$anonfun$runWithNewLog$1(MainLoop.scala:107)
[error]     at sbt.io.Using.apply(Using.scala:22)
[error]     at sbt.MainLoop$.runWithNewLog(MainLoop.scala:101)
[error]     at sbt.MainLoop$.runAndClearLast(MainLoop.scala:57)
[error]     at sbt.MainLoop$.runLoggedLoop(MainLoop.scala:42)
[error]     at sbt.MainLoop$.runLogged(MainLoop.scala:34)
[error]     at sbt.StandardMain$.runManaged(Main.scala:113)
[error]     at sbt.xMain.run(Main.scala:76)
[error]     at xsbt.boot.Launch$$anonfun$run$1.apply(Launch.scala:109)
[error]     at xsbt.boot.Launch$.withContextLoader(Launch.scala:128)
[error]     at xsbt.boot.Launch$.run(Launch.scala:109)
[error]     at xsbt.boot.Launch$$anonfun$apply$1.apply(Launch.scala:35) 
[error]     at xsbt.boot.Launch$.launch(Launch.scala:117)
[error]     at xsbt.boot.Launch$.apply(Launch.scala:18)
[error]     at xsbt.boot.Boot$.runImpl(Boot.scala:41)
[error]     at xsbt.boot.Boot$.main(Boot.scala:17)
[error]     at xsbt.boot.Boot.main(Boot.scala)
[error] java.lang.ClassNotFoundException: org.jetbrains.sbt.CreateTasks$
[error] Use 'last' for the full log.
[info] shutting down server</pre><br/>See complete log in <a href="file:/home/moritz/.IdeaIC2017.2/system/log/sbt.last.log">file:/home/xxxx/.IdeaIC2017.2/system/log/sbt.last.log</a>

My build.sbt looks as follows:

name := "someProjectName"

version …


scala intellij-idea sbt

clo*_*g14

2018 01-10

5
votes

2
answers

3647
views

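Dask: delayed vs. futures and task graph generation

I have a few basic questions about Dask:

Is it correct that I have to use futures when I want to use Dask for distributed computations (i.e. on a cluster)?
In that case, i.e. when working with futures, are task graphs still the way to reason about computations? If so, how do I create them?
How can I generally get hold of the dictionary associated with a task graph?

As an edit: my application is that I want to parallelize a for loop, either on my local machine or on a cluster (i.e. it should work on a cluster).

As a second edit: I think I am also somewhat unclear about the relationship between futures and delayed computations.

Thanks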
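
As a rough sketch of the for-loop case, the loop body can be wrapped with `dask.delayed`, which builds a task graph without running anything. The function `square` and the loop range are hypothetical stand-ins for the real work. The graph can be viewed as a plain mapping through the collection's `__dask_graph__()` method.

```python
from dask import delayed


def square(x):
    """Hypothetical stand-in for the body of the for loop."""
    return x * x


# Wrap each loop iteration lazily; nothing is computed yet.
tasks = [delayed(square)(i) for i in range(5)]
total = delayed(sum)(tasks)

# The task graph behind a dask collection is a mapping from keys to tasks.
graph = dict(total.__dask_graph__())
print(len(graph), "tasks in the graph")

# Execute the graph with the default local scheduler.
result = total.compute()
print(result)
```

When a `dask.distributed` Client has been created beforehand, it becomes the default scheduler, so the same `compute()` call is sent to the cluster. Futures (for example via `client.submit`) are the eager alternative, where work starts as soon as it is submitted.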

python distributed-computing dask

clo*_*g14

2019 01-17

5
votes

1
answers

310
views

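Deadlock with java.util.concurrent._ in Scala in the REPL

I came across the following situation while studying the book "Functional Programming in Scala" by Paul Chiusano and Rúnar Bjarnason (Chapter 7, Purely functional parallelism).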

    package fpinscala.parallelism

    import java.util.concurrent._
    import language.implicitConversions


    object Par {
      type Par[A] = ExecutorService => Future[A]

      def run[A](s: ExecutorService)(a: Par[A]): Future[A] = a(s)

      def unit[A](a: A): Par[A] = (es: ExecutorService) => UnitFuture(a) // `unit` is represented as a function that returns a `UnitFuture`, which is a simple implementation of `Future` that just wraps a constant value. It doesn't use the `ExecutorService` at all. It's always done and can't be cancelled. Its `get` method simply returns the value …


java parallel-processing scala java.util.concurrent scala-repl

clo*_*g14

2019 02-02

5
votes

1
answers

148
views

Data type issue when converting Parquet data to a pandas dataframe

I run into a type problem when converting a Parquet file to a dataframe.

I do:

bucket = 's3://some_bucket/test/usages'

import pyarrow.parquet as pq
import s3fs
s3 = s3fs.S3FileSystem()

read_pq = pq.ParquetDataset(bucket, filesystem=s3).read_pandas()

When I evaluate read_pq, I get

pyarrow.Table
_COL_0: decimal(9, 0)
_COL_1: decimal(9, 0)
_COL_2: decimal(9, 0)
_COL_3: decimal(9, 0)

When I run df = read_pq.to_pandas(); df.dtypes, I get

_COL_0    object
_COL_1    object
_COL_2    object
_COL_3    object
dtype: object

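The original data are all integers. When I operate on the objects in the pandas dataframe, the operations are very slow.

How can I convert the Parquet columns to a format that pandas reads as int or float?
Or is it better to operate on the pandas dataframe as above and use pd.to_numeric or something similar?
Or is there a problem with the original data format decimal(9, 0)?

Or is it better to do the conversion directly on the pandas dataframe?

I tried read_pq.column('_COL_0').cast('int32'), which throws an error like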

No cast implemented from decimal(9, 0) to int32
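
A minimal sketch of the pandas-side conversion follows. It assumes that the decimal(9, 0) columns arrive as Python `Decimal` objects, which is what the `object` dtype suggests. The small frame below is a hand-made stand-in for the result of to_pandas(), not data from the question.

```python
from decimal import Decimal

import pandas as pd

# Stand-in for the converted table: Decimal values give dtype "object".
df = pd.DataFrame({
    "_COL_0": [Decimal("1"), Decimal("2"), Decimal("3")],
    "_COL_1": [Decimal("10"), Decimal("20"), Decimal("30")],
})
print(df.dtypes)

# Cast every column to a native integer dtype so that later
# operations run on NumPy arrays instead of Python objects.
converted = df.astype("int64")
print(converted.dtypes)
```

This only works as written when the columns contain no missing values; columns with gaps would need a float or nullable integer dtype instead.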


pandas parquet apache-arrow pyarrow

clo*_*g14

lucky-day

5
votes

1
answers

703
views

Matplotlib with dates - changing labels and ticks for monthly data

I have a data frame like this:

import pandas as pd

data_ = list(range(106))
index_ =  pd.period_range('3/1/2004', '12/1/2012', freq='M')
df2_ = pd.DataFrame(data = data_, index = index_, columns = ['data'])

I want to plot this data frame. At the moment, I am using:

df2_.plot()

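Now I would like to control the labels (and possibly the ticks) on the x-axis. In particular, I would like monthly ticks on the axis, with a label perhaps every other month or every quarter. I would also like vertical grid lines.

I started working from this example, but I already fail at constructing the timedelta.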
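
One way this might be approached is sketched below. The assumption is that converting the PeriodIndex to timestamps and plotting through Matplotlib directly is acceptable, since the locators in `matplotlib.dates` work on datetime values. The quarterly label spacing and the `%Y-%m` format are illustrative choices, not requirements from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

index_ = pd.period_range('3/1/2004', '12/1/2012', freq='M')
df2_ = pd.DataFrame({'data': range(len(index_))}, index=index_.to_timestamp())

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df2_.index, df2_['data'])

# Labelled major ticks once per quarter, unlabelled minor ticks every month.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())

# Vertical grid lines at the labelled (major) ticks.
ax.grid(True, which='major', axis='x')
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```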

python matplotlib

clo*_*g14

2017 02-01

2
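votes

1
answers

10k
views

Normalizing a selection of data frame columns with dplyr

I have a data.frame with variables var1 and var2 (both strings) and variables x, y, and z. I want to normalize the variables x, y, and z by dividing each of them by its respective first element.

I tried: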

df_ %>% 
  mutate_at(c("x", "y", "z"), funs(./.[1])) %>% head()

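However, this sets the entire columns to 1. How can I make it divide by the first element instead?

Second, what is the best way to add the normalized values to the data frame as variables x_norm, y_norm, z_norm?

Thanks a lot, and let me know if you need more information.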

r dplyr mutate

clo*_*g14

2018 02-26

2
votes

1
answers

1105
views

Tag statistics

python ×6

pandas ×3

scala ×2

anaconda ×1

apache-arrow ×1

conda ×1

dask ×1

distributed-computing ×1

dplyr ×1

intellij-idea ×1

java ×1

java.util.concurrent ×1

matplotlib ×1

mutate ×1

pandas-groupby ×1

parallel-processing ×1

parquet ×1

pyarrow ×1

python-3.x ×1

r ×1

sbt ×1

scala-repl ×1

scikit-learn ×1
