Create a DataFrame and set a column to datetime at construction?

Veg*_*ega 4 python datetime pandas

I want to create a pandas DataFrame df, for example:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
)

I want the "date" column to have dtype=datetime64[ns] at the moment the DataFrame is created, not converted afterwards.

So not like this:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
)
df["date"] = pd.to_datetime(df["date"])

But like this:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            dtype=np.datetime64,
        ),
        "value": [10, 20, 16, 31, 56],
    }
)

But this raises an error:

ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.

When I do this instead:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            dtype=np.datetime64[ns],
        ),
        "value": [10, 20, 16, 31, 56],
    }
) 

I get this error:

NameError: name 'ns' is not defined

So how can I set a specific column to dtype "datetime64[ns]"?

Sea*_*ean 5

You can call pd.to_datetime inside the dictionary, like this:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.to_datetime(["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"]),
        "value": [10, 20, 16, 31, 56],
    }
)

The date column now has dtype datetime64[ns], as df.info() shows:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   group   5 non-null      object        
 1   date    5 non-null      datetime64[ns]
 2   value   5 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 248.0+ bytes
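The NameError in the question comes from writing `np.datetime64[ns]` as a Python expression, where `ns` is looked up as a variable. As a minimal sketch of an alternative that stays close to the asker's `pd.Series(..., dtype=...)` attempt, the dtype can be passed as the string `"datetime64[ns]"`. The variable name `dates` is introduced here only for illustration, and the exact parsing behavior may vary between pandas versions.

```python
import pandas as pd

# Hypothetical helper variable holding the ISO-formatted date strings.
dates = ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"]

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        # Pass the dtype as a string, including the unit in brackets.
        "date": pd.Series(dates, dtype="datetime64[ns]"),
        "value": [10, 20, 16, 31, 56],
    }
)

# Inspect the resulting column dtypes.
print(df.dtypes)
```

Compared with `pd.to_datetime`, this form states the target dtype explicitly, but it offers no `format` or `errors` arguments, so `pd.to_datetime` is likely the better choice when the input strings are not clean ISO dates.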

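  • @jezrael My reading was that it is fine as long as the end result has the desired dtype, rather than requiring a step that sets the dtype *during* construction. Sometimes it is hard to guess what the OP wants. Either way, thanks for the heads-up. Have a nice day :-) (2 upvotes)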