Related questions (0)

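Memory-efficient way to drop columns in pandas?

What is the best way to drop columns in pandas without running out of memory?

I have a large dataset, and after some variable manipulation I need to drop about half of the variables. I tried df.drop(vars, axis=1, inplace=True), but my memory usage spiked considerably. The same thing happens without the inplace argument.

This is the exact topic discussed in this old pandas issue thread, but it was closed without an answer. There are many similar questions on SO, but I haven't found one that specifically answers how to avoid a large memory increase when dropping many variables from a large dataframe. Thanks!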
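
One way to investigate the behavior described above is to compare the frame's reported size before and after the drop. The following is only a minimal sketch under stated assumptions: the data, the column names (`v0` ... `v19`), and the `to_drop` list are all hypothetical, and `memory_usage` reports the steady-state size of a frame, not the transient peak during the operation, so it cannot by itself confirm or rule out the spike.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows and 20 float columns.
df = pd.DataFrame(
    np.random.rand(10_000, 20),
    columns=[f"v{i}" for i in range(20)],
)
to_drop = [f"v{i}" for i in range(10)]  # hypothetical: half of the columns

before = df.memory_usage(deep=True).sum()

# Rebind the name so that no reference to the original frame remains
# and its memory becomes eligible for release.
df = df.drop(columns=to_drop)

after = df.memory_usage(deep=True).sum()
print(before, after)
```

Whether rebinding the name behaves differently from `inplace=True` with respect to peak memory may depend on the pandas version, so it is worth measuring on the real data rather than assuming either form is cheaper.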

python dataframe pandas

Mau*_*cio

lucky-day

6
Votes

1
Answers

1105
Views

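Pandas: manipulating a DataFrame in place vs. not in place (inplace=True vs. False)

I'm wondering whether there is a significant reduction in memory usage when we manipulate a dataframe in place (as opposed to not in place).

I did some searching on Stack Overflow and found this post, where the answer states that if an operation is not done in place, a copy of the dataframe is returned (I guess that's somewhat obvious when there is an optional parameter called "inplace" :P).

If I don't need to keep the original dataframe, then it would be beneficial (and logical) to just modify the dataframe in place, right?

Context:

I'm trying to get the top element when sorting by a particular "column" in the dataframe. I'd like to know which of these two is more efficient: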

In place:

# DataFrame.sort was removed in pandas 0.20; sort_values replaces it.
df.sort_values('some_column', ascending=False, inplace=True)
top = df.iloc[0]

vs.

Copy:

top = df.sort_values('some_column', ascending=False).iloc[0]

For the 'copy' case, it still allocates memory in making the copy when sorting even though I'm not assigning the copy to a variable right? If so, how long does it take to deallocate that copy from memory?

Thanks for any insights in advance!
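
As a side note on the context above, fetching only the top row does not strictly require sorting the whole frame. The sketch below compares the copy-based sort with a different technique, a lookup via `idxmax`; it is not what the asker wrote. The sample data and the `other` column are hypothetical, and `sort_values` is used because `DataFrame.sort` no longer exists in current pandas. The two approaches can return different rows when the maximum value is tied.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"some_column": [3, 9, 4], "other": ["a", "b", "c"]})

# Copy-based approach: sort a temporary frame, then take its first row.
top_sorted = df.sort_values("some_column", ascending=False).iloc[0]

# Alternative: look up the row holding the maximum value, without sorting.
top_direct = df.loc[df["some_column"].idxmax()]

print(top_sorted["other"], top_direct["other"])  # prints "b b"
```

In the copy-based form, the temporary sorted frame has no remaining references once `.iloc[0]` has been evaluated, so CPython can reclaim it through reference counting; when that memory is handed back to the operating system depends on the allocator.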

python memory pandas

Ell*_*est

lucky-day

5
Votes

1
Answers

2498
Views

Can't drop NaN with dropna in pandas

I imported pandas as pd, ran the code below, and got the following result.

Code:

import pandas as pd

traindataset = pd.read_csv('/Users/train.csv')
print(traindataset.dtypes)
print(traindataset.shape)
print(traindataset.iloc[25, 3])
# dropna is called here without assigning its return value
traindataset.dropna(how='any')
print(traindataset.iloc[25, 3])
print(traindataset.shape)

Output:

TripType                   int64  
VisitNumber                int64  
Weekday                   object  
Upc                      float64  
ScanCount                  int64  
DepartmentDescription     object  
FinelineNumber           float64  
dtype: object

(647054, 7)

nan  
nan

(647054, 7) 
[Finished in 2.2s]

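Judging from the result, the dropna line doesn't work: the number of rows hasn't changed, and there are still NaNs in the dataframe. How come? I'm going crazy right now.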
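
The output above is consistent with how `dropna` behaves by default: it returns a new DataFrame and leaves the original untouched unless `inplace=True` is passed. A minimal sketch with hypothetical data (the column names are borrowed from the question, the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"Upc": [1.0, np.nan, 3.0], "ScanCount": [1, 2, 3]})

df.dropna(how="any")            # return value discarded, df is unchanged
print(df.shape)                 # (3, 2)

cleaned = df.dropna(how="any")  # keep the returned frame instead
print(cleaned.shape)            # (2, 2)

in_place = df.copy()
in_place.dropna(how="any", inplace=True)  # or modify the frame itself
print(in_place.shape)           # (2, 2)
```

Either assigning the result or passing `inplace=True` changes the row count; calling `dropna` on its own line does neither.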

python missing-data dataframe pandas

fan*_*ngh

2018 12-20

4
Votes

4
Answers

20k
Views

Asyncio Pandas with inplace

I just read this introduction, but I'm having trouble implementing one of the examples (the commented-out code is the second example):

import asyncio
import pandas as pd
from openpyxl import load_workbook

async def loop_dfs(dfs):
    async def clean_df(df):
        df.drop(["column_1"], axis=1, inplace=True)
        # ... a bunch of other inplace=True functions ...
        return "Done"

    # tasks = [clean_df(df) for (table, dfs) in dfs.items()]
    # await asyncio.gather(*tasks)

    tasks = [clean_df(df) for (table, df) in dfs.items()]
    completed, pending = await asyncio.wait(tasks)


def main():
    dfs = {
        sn: pd.read_excel("excel.xlsx", sheet_name=sn)
        for sn in load_workbook("excel.xlsx").sheetnames
    }

    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(loop_dfs(dfs))

    loop = asyncio.get_event_loop()
    try: …

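
The question is truncated, so the exact failure is not shown. For reference, here is a minimal sketch of the `asyncio.gather` variant, assuming small in-memory frames in place of the Excel sheets. The sheet names, the `clean_all` helper, and the choice to return new frames rather than mutate in place are all assumptions for illustration. Note that since Python 3.11, `asyncio.wait` no longer accepts bare coroutine objects, whereas `asyncio.gather` does.

```python
import asyncio

import pandas as pd


async def clean_df(df):
    # Hypothetical cleaning step: return a new frame without "column_1".
    return df.drop(columns=["column_1"])


async def clean_all(dfs):
    # gather accepts coroutines directly and returns results in input order.
    results = await asyncio.gather(*(clean_df(df) for df in dfs.values()))
    return dict(zip(dfs.keys(), results))


# Hypothetical stand-ins for the sheets read from the workbook.
dfs = {
    "sheet_a": pd.DataFrame({"column_1": [1], "x": [2]}),
    "sheet_b": pd.DataFrame({"column_1": [3], "y": [4]}),
}

cleaned = asyncio.run(clean_all(dfs))
print({name: list(df.columns) for name, df in cleaned.items()})
```

Because pandas calls are synchronous and never await anything, the coroutines here still run one after another; asyncio on its own does not make the cleaning steps run in parallel.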

python python-3.x pandas python-asyncio

Ton*_*ony

2018 09-15

4
Votes

1
Answers

4692
Views