Posts by Pra*_*n V

Why is R's data.table so much faster than pandas?

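I have a 12-million-row dataset in which 3 columns serve as unique identifiers and another 2 columns hold values. I am trying to do a fairly simple task:
- Group by the three identifiers. This yields about 2.6 million unique combinations
- Task 1: compute the median of column Val1
- Task 2: compute the mean of column Val2 given some condition on Val1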
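
The grouping and the two tasks above can be sketched in pandas roughly as follows. This is only a minimal illustration, not the poster's code: the third key name `Grp3`, the helper name `summarize`, and the condition `Val1 > threshold` are assumptions, since the post does not spell them out. The sketch uses the built-in `median` and `mean` groupby reductions and applies the condition as a row filter before grouping, rather than inside a per-group Python function.

```python
import pandas as pd

# Assumed key columns: Grp1 and Grp2 appear in the post, Grp3 is a guess.
KEYS = ["Grp1", "Grp2", "Grp3"]

def summarize(df, threshold=0.0):
    """Hypothetical helper: per-group median of Val1 and conditional mean of Val2."""
    # Task 1: median of Val1 per group, using the built-in reduction.
    med = df.groupby(KEYS, sort=False)["Val1"].median().rename("Val1_median")

    # Task 2: keep only rows meeting the (assumed) condition, then take the mean of Val2.
    kept = df.loc[df["Val1"] > threshold]
    cond_mean = kept.groupby(KEYS, sort=False)["Val2"].mean().rename("Val2_mean_cond")

    # Left join keeps every group; groups with no qualifying rows get NaN.
    return med.to_frame().join(cond_mean).reset_index()
```

Calling `summarize(df, threshold=...)` on a frame with these five columns returns one row per group. Whether this closes the gap to data.table on the full 12-million-row data is not something the sketch establishes.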

Here are my results using pandas and data.table (both at their latest versions at the time of writing, on the same machine):

+-----------------+-----------------+------------+
|                 |      pandas     | data.table |
+-----------------+-----------------+------------+
| TASK 1          | 150 seconds     | 4 seconds  |
| TASK 1 + TASK 2 |  doesn't finish | 5 seconds  |
+-----------------+-----------------+------------+


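I think I may be doing something wrong with pandas: converting Grp1 and Grp2 to categoricals did not help much, nor did switching between .agg and .apply. Any ideas?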

Below is the reproducible code.
DataFrame generation:

import numpy as np
import pandas as pd
from collections import OrderedDict
import time

np.random.seed(123)
list1 = list(pd.util.testing.rands_array(10, 750)) …


r pandas data.table

Bog*_*anC

2018 04-02

21
Votes

1
Answers

2354
Views

Difference between kernel-mode and user-mode caching in IIS 8.0

What is the difference between kernel-mode caching and user-mode caching, and how can I track them?

asp.net caching http.sys iis-8

Pra*_*n V

lucky-day

10
Votes

1
Answers

5913
Views

PowerShell ISE: how to run a newer PowerShell version

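How can I make PowerShell ISE work with PowerShell 6.0? Currently it uses 4.0.

This server has PowerShell 4.0 installed, and I installed PowerShell 6.0 with PowerShell-6.1.0-win-x64.msi from the following link: https://github.com/PowerShell/PowerShell/releases. The files are now located in C:\Program Files\PowerShell\6.

However, ISE still shows 4.0, and I need it to run 6.0.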

$PSVersionTable.psversion

Major  Minor  Build  Revision
4      0      -1     -1

powershell version powershell-ise powershell-core

Con*_* S.

2019 08-29

9
Votes

2
Answers

9130
Views