Data separation for ML

314*_*141 5 python numpy python-3.x pandas

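I imported a dataset for a Machine Learning project. I need each "neuron" in my first input layer to hold one numeric value from the data. However, I have not been able to do this. Here is my code: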

import math
import numpy as np
import pandas as pd
v = pd.read_csv('atestred.csv', error_bad_lines=False).values
rw = 1
print(v)
for x in range(0,10):
    rw += 1
    s = (v[rw])
list(s)
#s is one row of the dataset 
print(s)#Just a debug.
myvar = s
class l1neuron(object):
    def gi():
        for n in range(0, len(s)):
            x = (s[n])
            print(x)#Just another debug 
n11 = l1neuron
n11.gi()

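Ideally I'd like a variation of this in which the code creates a new variable for each new row extracted from the data (what I tried to do in the first loop), and a new variable for each value extracted from each row (what I tried to do in the class and the second loop).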

If I'm completely off track with my code, feel free to point me in the right direction for a complete rewrite.

Here are the first few rows of my dataset:

fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5

Thanks in advance.

use*_*402 2

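If I understand your question correctly, you want to turn each row of the csv table into a separate variable that in turn holds all the values of that row. Below is an example of how you could approach this. There are many ways to do it, and others may be more efficient, faster, more Pythonic, fancier, and so on. The code below is written to help you see how tabular data can be stored in named variables.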

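Two remarks:

  1. If reading the data is the only thing you need pandas for, you may want to look for a less complex solution.
  2. The L1Neuron class is not very transparent: its members cannot be read from the code, but are created at runtime from the list of variables in attr. You may want to look at namedtuples (collections.namedtuple) for better readability.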
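As a rough illustration of both remarks, here is a minimal sketch that reads the same semicolon-separated data with the standard-library csv module and stores each row in a namedtuple. The name WineRow and the inline sample data are assumptions made for this example. With a real file you would open 'atestred.csv' instead of using StringIO.

```python
import csv
from collections import namedtuple
from io import StringIO

# Sample data in the same layout as the question's file (assumption:
# only the first three data rows, supplied inline instead of from disk).
raw = StringIO(
    "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;"
    "free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;"
    "alcohol;quality\n"
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
    "7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5\n"
)

reader = csv.reader(raw, delimiter=";")
header = next(reader)  # first line holds the column names

# WineRow is a hypothetical name. Spaces in column names are replaced by
# underscores so that every field is a valid Python identifier.
WineRow = namedtuple("WineRow", [name.replace(" ", "_") for name in header])

# One namedtuple per data line; every value is converted to float.
rows = [WineRow(*(float(value) for value in line)) for line in reader if line]

for row in rows:
    print(row.free_sulfur_dioxide, "of", row.total_sulfur_dioxide,
          "total sulfur dioxide")
```

Unlike attributes created with setattr, the fields of a namedtuple are fixed when the type is defined and can be listed via WineRow._fields, which makes the structure easier to inspect.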


import pandas as pd 
from io import StringIO
import numbers


# example data:
atestred = StringIO("""fixed acidity;volatile acidity;citric acid;\
residual sugar;chlorides;free sulfur dioxide;total sulfur dioxide;\
density;pH;sulphates;alcohol;quality
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
""")



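# read example data into dataframe 'data'; extract values and column names.
# on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated
# in pandas 1.3 and removed in pandas 2.0:
data     = pd.read_csv(atestred, on_bad_lines='skip', sep=';') 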
colNames = list(data)



class L1Neuron(object):
    "neuron class that holds the variables of one data line"

    def __init__(self, **attr):
        """
        attr is a dict (like {'alcohol': 12, 'pH':7.4});
        every pair in attr will result in a member variable 
        of this object with that name and value"""
        for name, value in attr.items():
            setattr(self, name.replace(" ", "_"), value)

    def gi(self):
        "print all numeric member variables whose names don't start with an underscore:"
        for v in sorted(dir(self)):
            if not v.startswith('_'):
                value = getattr(self, v) 
                if isinstance(value, numbers.Number): 
                    print("%-20s = %5.2f" % (v, value))
        print('-'*50)


# read csv into variables (one for each line):        
neuronVariables = []        
for s in data.values:
    variables   = dict(zip(colNames, s))
    neuron      = L1Neuron(**variables)
    neuronVariables.append(neuron)

# now the variables in neuronVariables are ready to be used:     
for n11 in neuronVariables:
    print("free sulphur dioxide in  this variable:", n11.free_sulfur_dioxide, end = " of ")
    print(n11.total_sulfur_dioxide,  "total sulphur dioxide" )
    n11.gi()
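If the goal is only to feed one number into each input neuron, a plain NumPy array per row may be enough. The following is a minimal sketch under two assumptions that the question does not state: that 'quality' is the target column, and that the remaining eleven columns are the inputs. The names features and labels are made up for this example.

```python
from io import StringIO

import pandas as pd

# Inline sample data (assumption: the first three rows of the question's file).
raw = StringIO(
    "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;"
    "free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;"
    "alcohol;quality\n"
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
    "7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5\n"
)

# sep=';' is required here; with the default ',' each line is read
# as a single string column.
data = pd.read_csv(raw, sep=";")

# Assumption: 'quality' is the label, every other column is an input.
features = data.drop(columns="quality").to_numpy(dtype=float)
labels = data["quality"].to_numpy()

for row, label in zip(features, labels):
    # row is a 1-D array; row[i] would be the value for input neuron i.
    print(len(row), "inputs, label", label)
```

Each row of features then has one entry per input neuron, so no separate variable per value is needed.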