Parsing datetimes when loading a DataFrame with pd.read_clipboard

Question

Ale*_*lex 6 python datetime date dataframe pandas

I see a lot of DataFrames posted on StackOverflow that look like:

          a                  dt         b
0 -0.713356 2015-10-01 00:00:00 -0.159170
1 -1.636397 2015-10-01 00:30:00 -1.038110
2 -1.390117 2015-10-01 01:00:00 -1.124016

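I still haven't found a good way to copy these into my interpreter using .read_clipboard (its arguments are listed in the .read_table docs).

I thought the key was the parse_dates parameter: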

parse_dates : boolean or list of ints or names or list of lists or dict, default False
* boolean. If True -> try parsing the index.
* list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
* list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
* dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

pd.read_clipboard(parse_dates={'dt': [1, 2]}) raises the exception NotImplementedError: file structure not yet supported.

When I try skipping the first row, pd.read_clipboard(parse_dates=[[1, 2]], names=['a', 'dt1', 'dt2', 'b'], skiprows=1, header=None), I get the same exception.

How do others do this?

Answer 1

cs9*_*s95 8

Here's what I do. First, make sure your columns are separated by at least two spaces:

          a                  dt         b
0 -0.713356  2015-10-01 00:00:00  -0.159170
1 -1.636397  2015-10-01 00:30:00  -1.038110
2 -1.390117  2015-10-01 01:00:00  -1.124016

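Note that the datetime column has only a single space between the date and the time. This is important, because it keeps the two parts together as one field. Next, I load it with something like this: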

import pandas as pd

# Raw string so the regex backslash is not treated as a string escape
df = pd.read_clipboard(sep=r'\s{2,}', parse_dates=[1], engine='python')
df

             a                  dt         b
0  0 -0.713356 2015-10-01 00:00:00 -0.159170
1  1 -1.636397 2015-10-01 00:30:00 -1.038110
2  2 -1.390117 2015-10-01 01:00:00 -1.124016

df.dtypes

a             object
dt    datetime64[ns]
b            float64
dtype: object
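
Note that a comes out as object above. This is most likely because the index and the first column are separated by only one space, so "0 -0.713356" is read as a single field. A minimal sketch of the same idea with the index also separated by two spaces; it assumes io.StringIO as a stand-in for the clipboard and uses pd.read_csv, to which pd.read_clipboard forwards its arguments:

```python
import io

import pandas as pd

# Stand-in for the clipboard contents (an assumption for reproducibility).
# Every field, including the index, is separated by two spaces, while the
# date and the time keep a single space between them.
text = (
    "a  dt  b\n"
    "0  -0.713356  2015-10-01 00:00:00  -0.159170\n"
    "1  -1.636397  2015-10-01 00:30:00  -1.038110\n"
    "2  -1.390117  2015-10-01 01:00:00  -1.124016\n"
)

# Rows have one more field than the header, so the first field becomes the index.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s{2,}",       # split only on runs of two or more whitespace characters
    engine="python",     # regex separators need the Python engine
    parse_dates=["dt"],
)
print(df.dtypes)
```

With real clipboard data, the same keyword arguments should carry over to pd.read_clipboard.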

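Yes, this isn't a fully automated process, but as long as you're copying small DataFrames it isn't that bad. That said, I'd be glad to see better alternatives.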
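
One possible alternative that avoids editing the pasted text, sketched under assumptions: split on any whitespace, read the date and the time as two separate columns, and combine them afterwards. The column names idx, date and time are hypothetical, and io.StringIO again stands in for the clipboard:

```python
import io

import pandas as pd

# The frame exactly as it is usually pasted, single spaces and all.
text = (
    "          a                  dt         b\n"
    "0 -0.713356 2015-10-01 00:00:00 -0.159170\n"
    "1 -1.636397 2015-10-01 00:30:00 -1.038110\n"
    "2 -1.390117 2015-10-01 01:00:00 -1.124016\n"
)

df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",          # any run of whitespace separates fields
    skiprows=1,          # skip the original three-name header
    header=None,
    names=["idx", "a", "date", "time", "b"],  # hypothetical names, one per field
    index_col="idx",
)

# Join the two string columns and parse them as a single datetime column.
df["dt"] = pd.to_datetime(df["date"] + " " + df["time"])
df = df.drop(columns=["date", "time"])[["a", "dt", "b"]]
print(df.dtypes)
```

This trades the manual spacing step for explicit column names, so it only fits when the layout of the pasted frame is known in advance.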
