Soh*_*ate 5 python typeerror pandas scikit-learn
I am practicing linear regression. Here I pass dates as the input x and expect a float output y:
x = df[('Date')].values
x = x.reshape(-1, 1)
y= df[('MeanTemp')].values #MeanTemp column has float values
y = y.reshape(-1, 1)
When I print x, the output is:
array([['1942-07-01T00:00:00.000000000'],
['1942-07-02T00:00:00.000000000'],
['1942-07-03T00:00:00.000000000'],
['1942-07-04T00:00:00.000000000'],
['1942-07-05T00:00:00.000000000'],
['1942-07-06T00:00:00.000000000'],
['1942-07-07T00:00:00.000000000'],
['1942-07-08T00:00:00.000000000'],
['1942-07-09T00:00:00.000000000'],
['1942-07-10T00:00:00.000000000']], dtype='datetime64[ns]')
Now, when I use linear regression:
linlin = LinearRegression()
linlin.fit(x, y)
it does not raise any error, but when I write
linlin.predict(x)
TypeError: The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.
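the TypeError above is raised. How can I convert this data type to float so that the predict function works?
You can use `timedelta64` from numpy to express each date as the number of days elapsed since the earliest (`min`) date, like this: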
>>> import numpy as np
>>> df['date_delta'] = (df['Date'] - df['Date'].min()) / np.timedelta64(1,'D')
>>> x = df['date_delta'].values
Alternatively, you can use the following function to convert the dates to a floating-point (fractional-year) representation:
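Putting the days-since-minimum approach together end to end, a minimal sketch might look like the following. The sample `MeanTemp` values, the `origin` variable, and the `new_dates` being predicted are assumptions made up for illustration; they are not from the question. The point to note is that new dates have to be converted with the same reference date that was used for training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame({
    "Date": pd.date_range("1942-07-01", periods=10, freq="D"),
    "MeanTemp": [25.5, 26.1, 26.0, 26.7, 26.4, 27.0, 27.3, 27.1, 27.8, 28.0],
})

# Keep the reference date so later predictions use the same scale.
origin = df["Date"].min()

# Days elapsed since the first date, as a float64 column vector.
x = ((df["Date"] - origin) / np.timedelta64(1, "D")).to_numpy().reshape(-1, 1)
y = df["MeanTemp"].to_numpy()

model = LinearRegression()
model.fit(x, y)
pred = model.predict(x)  # x is float64 now, so predict no longer sees datetime64

# New dates (hypothetical) must be converted relative to the same origin.
new_dates = pd.to_datetime(["1942-07-11", "1942-07-12"])
x_new = ((new_dates - origin) / np.timedelta64(1, "D")).to_numpy().reshape(-1, 1)
pred_new = model.predict(x_new)
```

Storing `origin` alongside the fitted model is one simple way to keep training and prediction inputs consistent.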
>>> import numpy as np
>>> import pandas as pd
>>> def dt64_to_float(dt64):
...     year = dt64.astype('M8[Y]')  # truncate each date to the start of its year
...     days = (dt64 - year).astype('timedelta64[D]')  # whole days elapsed within that year
...     year_next = year + np.timedelta64(1, 'Y')
...     days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')).astype('timedelta64[D]')  # 365 or 366
...     dt_float = 1970 + year.astype(float) + days / (days_of_year)  # M8[Y] counts years since 1970
...     return dt_float
>>> df['date_float'] = dt64_to_float(df['Date'].to_numpy())
>>> x = df['date_float'].values
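As a quick sanity check of the fractional-year idea, here is a self-contained sketch that applies the same logic to a few hand-picked dates. The sample dates are assumptions chosen for illustration, and the year offset is cast with `np.int64` rather than `float`, which is a small variation on the function above.

```python
import numpy as np

def dt64_to_float(dt64):
    """Convert a datetime64 array to fractional years (e.g. mid-1942 is about 1942.5)."""
    year = dt64.astype("M8[Y]")                    # start of each date's year
    days = (dt64 - year).astype("timedelta64[D]")  # whole days elapsed in that year
    year_next = year + np.timedelta64(1, "Y")
    # Length of the year in days: 365, or 366 in a leap year.
    days_of_year = year_next.astype("M8[D]") - year.astype("M8[D]")
    # M8[Y] values count years since 1970, hence the 1970 offset.
    return 1970 + year.astype(np.int64) + days / days_of_year

# Hypothetical sample dates: mid-year, start of a leap year, end of a leap year.
dates = np.array(["1942-07-01", "1944-01-01", "1944-12-31"], dtype="datetime64[ns]")
years = dt64_to_float(dates)
```

With this representation, one day is worth 1/365 of a unit in a common year and 1/366 in a leap year, so the spacing between consecutive days is not exactly uniform across years. The days-since-minimum conversion does not have that property.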