How to convert a string to a float value in a DataFrame

Ash*_*nha 1 python azure pandas azure-machine-learning-studio

We run into an error when a column has the string data type and holds values like these:

col1  col2
1     .89

So, when we use

def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here
    print('Input pandas.DataFrame #1:')
    import pandas as pd
    import numpy as np
    from sklearn.kernel_approximation import RBFSampler
    # Feature columns passed to the sampler
    x = dataframe1.iloc[:, 2:1080]
    print(x)
    df1 = dataframe1[['colname']]

    # Flatten the single selected column into a 1-D array
    change = np.array(df1)
    b = change.ravel()
    print(b)
    rbf_feature = RBFSampler(gamma=1, n_components=100, random_state=1)
    print(rbf_feature)
    print("test")
    X_features = rbf_feature.fit_transform(x)

After this we get an error saying that a non-int cannot be converted to type float.

EdC*_*ica 5

Use astype(float), for example:

df['col'] = df['col'].astype(float)

Or convert_objects:

df = df.convert_objects(convert_numeric=True)

Example:

In [379]:

df = pd.DataFrame({'a':['1.23', '0.123']})
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
a    2 non-null object
dtypes: object(1)
memory usage: 32.0+ bytes
In [380]:

df['a'].astype(float)
Out[380]:
0    1.230
1    0.123
Name: a, dtype: float64

In [382]:

df = df.convert_objects(convert_numeric=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
a    2 non-null float64
dtypes: float64(1)
memory usage: 32.0 bytes
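
The example above converts a single column. The question selects a whole block of columns, so here is a minimal sketch of the same astype(float) call applied to several columns at once. The frame and the names feature_cols and features are made up for illustration; they are not taken from the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: every value is a string.
df = pd.DataFrame({"col1": ["1", "2"], "col2": [".89", "0.5"]})

# Illustrative list of the columns that should become numeric features.
feature_cols = ["col1", "col2"]

# DataFrame.astype(float) converts every selected column in one call.
features = df[feature_cols].astype(float)

print(features.dtypes)
```

The converted frame, rather than the original string columns, is what would then be passed on to a numeric routine such as fit_transform.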

UPDATE

If you are running version 0.17.0 or later, convert_objects has been deprecated in favor of the functions to_numeric, to_datetime, and to_timedelta, so instead of:

df['col'] = df['col'].astype(float)

you can do:

df['col'] = pd.to_numeric(df['col'])
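
One difference between the two calls is worth seeing in a small sketch: pd.to_numeric picks the numeric dtype from the data, while astype(float) always produces floats. The series names below are made up for illustration.

```python
import pandas as pd

# Hypothetical columns of numeric strings.
whole = pd.Series(["1", "2", "3"])
fractional = pd.Series(["1.23", ".89"])

# to_numeric infers an integer dtype when every value is a whole number...
whole_numeric = pd.to_numeric(whole)
# ...and a float dtype when any value has a fractional part.
fractional_numeric = pd.to_numeric(fractional)

# astype(float) gives float64 regardless of the values.
whole_float = whole.astype(float)

print(whole_numeric.dtype, fractional_numeric.dtype, whole_float.dtype)
```

If downstream code needs floats specifically, astype(float) (or a further .astype(float) on the to_numeric result) keeps the dtype predictable.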

Note that by default any value that cannot be converted raises an error. If you want such values coerced to NaN instead, do the following:

df['col'] = pd.to_numeric(df['col'], errors='coerce')
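
A short sketch of the two behaviours side by side, assuming a column that contains one unparseable entry. The series and the bad value "abc" are invented for illustration.

```python
import pandas as pd

# Hypothetical column: two numeric strings and one value that cannot be parsed.
s = pd.Series(["1.23", ".89", "abc"])

# Default behaviour: the unparseable value raises a ValueError.
try:
    pd.to_numeric(s)
except ValueError as exc:
    print("default raised:", exc)

# errors='coerce' replaces unparseable values with NaN instead of raising.
converted = pd.to_numeric(s, errors="coerce")
print(converted)

# Counting the NaNs shows how many values failed to convert.
print("values coerced to NaN:", converted.isna().sum())
```

Checking the NaN count after coercing is a cheap way to notice when a column holds more bad values than expected before the data is used further.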