mav*_*avi 1 python csv dataset pandas
I am working with playerStats.csv, which has 8 columns, but I only need 2 of them. So I am trying to create a new DataFrame that contains only those 2 columns.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset = pd.read_csv("HLTVData/playerStats.csv")
dataset.head(20)
I only need ADR and Rating.
So I first created a matrix from the dataset.
mat = dataset.as_matrix()
#4 is the ADR and 6 is the Rating
newDataSet = pd.DataFrame(dataset, index=indexMatrix,columns=(mat[:,4],mat[:,6]) )
NameError Traceback (most recent call last)
<ipython-input-10-1f975cc2514a> in <module>()
1 #4 is the ADR and 6 is the Rating
----> 2 newDataSet = pd.DataFrame(dataset, index=indexMatrix,columns=(mat[:,4],mat[:,6]) )
NameError: name 'indexMatrix' is not defined
I also tried using the dataset directly.
newDataSet = pd.DataFrame(dataset, index=np.array(range(dataset.shape[0])), columns=dataset['ADR'])
/home/tensor/miniconda3/envs/tensorflow35openvc/lib/python3.5/site-packages/pandas/core/internals.py in _make_na_block(self, placement, fill_value)
3984
3985 dtype, fill_value = infer_dtype_from_scalar(fill_value)
-> 3986 block_values = np.empty(block_shape, dtype=dtype)
3987 block_values.fill(fill_value)
3988 return make_block(block_values, placement=placement)
MemoryError:
I think you need the usecols parameter of read_csv:
dataset = pd.read_csv("HLTVData/playerStats.csv", usecols=['ADR','Rating'])
Or:
dataset = pd.read_csv("HLTVData/playerStats.csv", usecols=[4,6])
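To try both usecols forms without the real file, here is a minimal sketch that reads a small in-memory CSV. The sample data is invented: only the ADR and Rating column names and their positions (4 and 6, 0-based) come from the question, and every other column name and value is a placeholder. It also shows how to pick the two columns from a DataFrame that has already been loaded in full, in case re-reading the file is not convenient.

```python
import io

import pandas as pd

# Hypothetical stand-in for playerStats.csv: 8 columns, with ADR at
# position 4 and Rating at position 6 (0-based), as in the question.
# All other column names and all values are made up for illustration.
csv_text = (
    "Name,Team,Maps,KD,ADR,KAST,Rating,Country\n"
    "p1,t1,10,1.10,80.5,70.1,1.15,x\n"
    "p2,t2,12,0.95,72.3,68.4,0.98,y\n"
)

# Option 1: read only the needed columns, selected by name.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["ADR", "Rating"])

# Option 2: read only the needed columns, selected by 0-based position.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[4, 6])

# Option 3: the full DataFrame is already in memory; take a two-column copy.
full = pd.read_csv(io.StringIO(csv_text))
subset = full[["ADR", "Rating"]].copy()
subset_by_position = full.iloc[:, [4, 6]].copy()

print(by_name)
```

With usecols the unwanted columns are never loaded, which is usually the better choice for a large file. Selecting with `full[[...]]` or `iloc` is handy when the other columns are still needed elsewhere.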