Posts by pef*_*ath

Python Pandas merge causes memory overflow

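I'm new to Pandas and I'm trying to merge a few subsets of data. I'm giving a specific case here, but the question is general: how and why does this happen, and how can I work around it?

The data I load is around 85 MB, but I often watch my Python session climb to nearly 10 GB of memory usage and then fail with a memory error.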

I have no idea why this happens, and it is blocking me: I can't even start looking at the data the way I want to.
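A first diagnostic step might be to ask pandas how much memory a frame really occupies, since the size on disk and the size in memory can differ a lot. The sketch below uses a small made-up frame (the `Name` column and all values are hypothetical, not taken from the STAR files) and compares the shallow and deep results of `memory_usage`. Object (string) columns are typically where the two numbers diverge.

```python
import pandas as pd

# Hypothetical stand-in for a loaded table; not the real STAR data.
df = pd.DataFrame({
    "Subgroup ID": [1, 2, 3, 4] * 250,
    # Force object dtype so the strings are stored as Python objects.
    "Name": pd.Series(["some fairly long text value"] * 1000, dtype=object),
})

# Shallow usage counts only the array buffers (pointers for object columns).
shallow = int(df.memory_usage(deep=False).sum())
# Deep usage also counts the Python string objects those pointers refer to.
deep = int(df.memory_usage(deep=True).sum())

print("shallow bytes:", shallow)
print("deep bytes:", deep)
```

Running the same two calls on each loaded table, and on the merge result if it completes, would show which object is actually growing.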

Here is what I did:

Import the main data

import io
import zipfile

import numpy as np
import pandas as pd
import requests

STAR2013url = "http://www3.cde.ca.gov/starresearchfiles/2013/p3/ca2013_all_csv_v3.zip"
STAR2013fileName = 'ca2013_all_csv_v3.txt'

# Download the zip archive into memory and read the CSV file inside it.
# io.BytesIO replaces the Python 2-only StringIO.StringIO (r.content is bytes).
r = requests.get(STAR2013url)
z = zipfile.ZipFile(io.BytesIO(r.content))

STAR2013 = pd.read_csv(z.open(STAR2013fileName))

Import some cross-reference tables

# Entity list
STARentityList2013url = "http://www3.cde.ca.gov/starresearchfiles/2013/p3/ca2013entities_csv.zip"
STARentityList2013fileName = "ca2013entities_csv.txt"
r = requests.get(STARentityList2013url)
z = zipfile.ZipFile(io.BytesIO(r.content))
STARentityList2013 = pd.read_csv(z.open(STARentityList2013fileName))

# Test ID lookup table
STARlookUpTestID2013url = "http://www3.cde.ca.gov/starresearchfiles/2013/p3/tests.zip"
STARlookUpTestID2013fileName = "Tests.txt"
r = requests.get(STARlookUpTestID2013url)
z = zipfile.ZipFile(io.BytesIO(r.content))
STARlookUpTestID2013 = pd.read_csv(z.open(STARlookUpTestID2013fileName))

# Subgroup ID lookup table
STARlookUpSubgroupID2013url = "http://www3.cde.ca.gov/starresearchfiles/2013/p3/subgroups.zip"
STARlookUpSubgroupID2013fileName = "Subgroups.txt"
r = requests.get(STARlookUpSubgroupID2013url)
z = zipfile.ZipFile(io.BytesIO(r.content))
STARlookUpSubgroupID2013 = pd.read_csv(z.open(STARlookUpSubgroupID2013fileName))

Rename the ID column so that the merge is possible

STARlookUpSubgroupID2013 = STARlookUpSubgroupID2013.rename(columns={'001':'Subgroup ID'})
STARlookUpSubgroupID2013

Successful merge

merged = pd.merge(STAR2013,STARlookUpSubgroupID2013, on='Subgroup …
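One common reason a merge balloons in memory is a key that repeats on both sides: every matching pair of rows is emitted, so row counts multiply per key. Whether that is what happens with the STAR tables is not established here. The sketch below only illustrates the mechanism on two tiny made-up frames (`left`, `right`, and their values are hypothetical) and shows how the result size can be estimated before merging.

```python
import pandas as pd

# Hypothetical miniature tables; the real STAR files are far larger.
left = pd.DataFrame({"Subgroup ID": [1, 1, 1, 2], "score": [10, 20, 30, 40]})
right = pd.DataFrame({"Subgroup ID": [1, 1, 2], "label": ["a", "b", "c"]})

# Count how often each key value occurs on each side.
left_counts = left["Subgroup ID"].value_counts()
right_counts = right["Subgroup ID"].value_counts()

# For an inner merge, each shared key contributes
# (occurrences on the left) * (occurrences on the right) rows.
expected_rows = int((left_counts * right_counts).dropna().sum())

merged = pd.merge(left, right, on="Subgroup ID")
print("expected rows:", expected_rows, "actual rows:", len(merged))

# validate= asks pandas to raise instead of silently multiplying rows
# when the right-hand key is not unique.
try:
    pd.merge(left, right, on="Subgroup ID", validate="many_to_one")
    rejected = False
except pd.errors.MergeError as exc:
    rejected = True
    print("merge rejected:", exc)
```

If the estimate for the real tables is much larger than either input, a lookup table with non-unique keys would be worth checking before attempting the merge.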

python memory merge out-of-memory pandas

10
score
1
answer
6066
views

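Seaborn boxplots at desired distances along the x-axis

Is there any way to place seaborn boxplots at desired distances along the x-axis?

I have a dataframe with a hierarchical column index (levels Type, max, and Assignment) and a row index of student names.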

+------------+----------+---------+----------+---------------+
| Type       | Homework | Quiz    | Homework | Presentations |
|            | max 100  | max 100 | max 100  | max 100       |
+------------+----------+---------+----------+---------------+
| Assignment | 1        | 2       | 3        | 4             |
+------------+----------+---------+----------+---------------+
| Student 1  | 88       | 98      | 100      | 85            |
+------------+----------+---------+----------+---------------+
| Student 2  | 96       | 79      | 100      | 97            |
+------------+----------+---------+----------+---------------+
| Student 3  | 87       | 79      | 72       | 78 …
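For illustration, one possible approach is to fall back on matplotlib, which seaborn draws on: `Axes.boxplot` accepts a `positions` argument that fixes where each box sits on the x-axis. This is a swapped-in technique rather than a seaborn call, and the `positions` values below are hypothetical. The scores are only the three student rows visible in the table above, which is truncated.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Scores from the three visible student rows, one list per assignment.
scores = [
    [88, 96, 87],    # assignment 1 (Homework)
    [98, 79, 79],    # assignment 2 (Quiz)
    [100, 100, 72],  # assignment 3 (Homework)
    [85, 97, 78],    # assignment 4 (Presentations)
]

# Hypothetical x-axis locations, e.g. the day each assignment was due.
positions = [1, 3, 4, 8]

fig, ax = plt.subplots()
# positions= places each box at the given x coordinate instead of 1, 2, 3, ...
result = ax.boxplot(scores, positions=positions, widths=0.6)
ax.set_xlim(0, 9)
ax.set_xlabel("Position along x-axis")
ax.set_ylabel("Score")
```

Because the plot lives on an ordinary matplotlib `Axes`, further seaborn or matplotlib styling can be applied to `ax` afterwards.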

python pandas seaborn

6
score
2
answers
2930
views
