我试图在pandas中读取一个大的csv文件(aprox.6 GB),我收到以下内存错误:
MemoryError Traceback (most recent call last)
<ipython-input-58-67a72687871b> in <module>()
----> 1 data=pd.read_csv('aphro.csv',sep=';')
C:\Python27\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format)
450 infer_datetime_format=infer_datetime_format)
451
--> 452 return _read(filepath_or_buffer, kwds)
453
454 parser_f.__name__ …Run Code Online (Sandbox Code Playgroud) 我有两个具有相同列的数据框:
数据框 1:
attr_1 attr_77 ... attr_8
userID
John 1.2501 2.4196 ... 1.7610
Charles 0.0000 1.0618 ... 1.4813
Genarito 2.7037 4.6707 ... 5.3583
Mark 9.2775 6.7638 ... 6.0071
Run Code Online (Sandbox Code Playgroud)
数据框 2:
attr_1 attr_77 ... attr_8
petID
Firulais 1.2501 2.4196 ... 1.7610
Connie 0.0000 1.0618 ... 1.4813
PopCorn 2.7037 4.6707 ... 5.3583
Run Code Online (Sandbox Code Playgroud)
我想生成所有可能组合的相关性和 p 值数据框,结果如下:
userId petID Correlation p-value
0 John Firulais 0.091447 1.222927e-02
1 John Connie 0.101687 5.313359e-03
2 John PopCorn 0.178965 8.103919e-07
3 Charles Firulais -0.078460 3.167896e-02
Run Code Online (Sandbox Code Playgroud)
问题是笛卡尔积生成了超过 …