How can I standardize/normalize dates using pandas / numpy?

use*_*451 1 python numpy pandas

With the following code snippet

import pandas as pd
data = pd.read_csv('train.csv', parse_dates=['dates'])
print(data['dates'])

I load and print the data.

My question is: how can I standardize/normalize data['dates'] so that all elements lie between -1 and 1 (linear or Gaussian)?

bak*_*kal 6

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import time

def convert_to_timestamp(x):
    """Convert date objects to integers"""
    # pd.Timestamp subclasses datetime, so timetuple() is available directly
    return time.mktime(x.timetuple())


def normalize(df):
    """Normalize the DF using min/max"""
    scaler = MinMaxScaler(feature_range=(-1, 1))
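    # MinMaxScaler expects 2-D input, so select the column as a DataFrame
    dates_scaled = scaler.fit_transform(df[['dates']])

    # Flatten the (n, 1) result back to a 1-D array
    return dates_scaled.ravel()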

if __name__ == '__main__':
    # Create a random series of dates
    df = pd.DataFrame({
        'dates':
            ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
             '1981-01-21', '1991-02-21', '1991-03-23']
    })

    # Convert to date objects
    df['dates'] = pd.to_datetime(df['dates'])

    # df now holds datetime objects, as in the question; convert them to UNIX timestamps
    df['dates'] = df['dates'].apply(convert_to_timestamp)

    # Call normalization function
    dates_scaled = normalize(df)
    print(dates_scaled)

Sample:

The date objects that we convert with convert_to_timestamp:

       dates
0 1980-01-01
1 1980-02-02
2 1980-03-02
3 1980-01-21
4 1981-01-21
5 1991-02-21
6 1991-03-23

The UNIX timestamps, which can be normalized using MinMaxScaler from sklearn:

       dates
0  315507600
1  318272400
2  320778000
3  317235600
4  348858000
5  667069200
6  669661200
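Note that time.mktime interprets each date in the machine's local time zone, so the integers above can differ from machine to machine. A minimal sketch, assuming UTC epoch seconds are what is wanted, does the conversion with vectorized timedelta arithmetic instead of convert_to_timestamp. The variable names here are illustrative and not part of the answer above.

```python
import pandas as pd

# Same sample dates as in the answer
dates = pd.to_datetime(pd.Series([
    '1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
    '1981-01-21', '1991-02-21', '1991-03-23',
]))

# Whole seconds elapsed since the UNIX epoch; naive dates are treated as UTC
epoch_seconds = (dates - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

print(epoch_seconds)
```

Because min/max scaling is unaffected by a constant offset, the normalized values should match as long as the local UTC offset is the same for every date.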

Normalized to (-1, 1), the final result:

[-1.         -0.98438644 -0.97023664 -0.99024152 -0.81166138  0.98536228
  1.        ]


ste*_*boc 4

A pandas-only solution

import pandas as pd

df = pd.DataFrame({
        'A':
            ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
             '1981-01-21', '1991-02-21', '1991-03-23'] })
# Convert the dates to integer epoch values
df['A'] = pd.to_datetime(df['A']).astype('int64')
max_a = df.A.max()
min_a = df.A.min()
# Target range for the linear rescaling
min_norm = -1
max_norm = 1
df['NORMA'] = (df.A - min_a) * (max_norm - min_norm) / (max_a - min_a) + min_norm
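Both answers cover the linear (min/max) case. For the "Gaussian" option mentioned in the question, one common reading is z-score standardization: subtract the mean and divide by the standard deviation. The following is only a sketch under that assumption, with illustrative variable names. Unlike min/max scaling, z-scores are not confined to [-1, 1].

```python
import pandas as pd

dates = pd.to_datetime(pd.Series([
    '1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
    '1981-01-21', '1991-02-21', '1991-03-23',
]))

# Whole seconds since the UNIX epoch, as a numeric Series
seconds = (dates - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

# z-score: zero mean and unit (population) standard deviation
z = (seconds - seconds.mean()) / seconds.std(ddof=0)

print(z)
```

If the values must stay within [-1, 1], the min/max approach above is the one that guarantees it.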