Parsing DD MM YY HH MM SS columns from a TXT file with Python's pandas

Question

Thanks in advance for your time, everyone. I have many space-delimited text files in the following format:

    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

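That is, DD MM YY HH MM SS followed by my result value. I am trying to read the txt files with Python's pandas. I did quite a bit of research before posting this question, so hopefully I am not covering old ground.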

Based on trial and error and some research, I came up with:

    import pandas as pd
    from cStringIO import StringIO
    def parse_all_fields(day_col, month_col, year_col, hour_col, minute_col, second_col):
        day_col = _maybe_cast(day_col)
        month_col = _maybe_cast(month_col)
        year_col = _maybe_cast(year_col)
        hour_col = _maybe_cast(hour_col)
        minute_col = _maybe_cast(minute_col)
        second_col = _maybe_cast(second_col)
        return lib.try_parse_datetime_components(day_col, month_col, year_col, hour_col, minute_col, second_col)
    ##Read the .txt file
    data1 = pd.read_table('0132_3.TXT', sep='\s+', names=['Day','Month','Year','Hour','Min','Sec','Value'])
    data1[:10]

    Out[21]: 

    Day,Month,Year,Hour, Min, Sec, Value
    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

    data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-8ee408dc19c3> in <module>()
    ----> 1 data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':   ['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError: expected read buffer, DataFrame found

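At this point I am stuck. First, the "expected read buffer" error confuses me. Do I need to pre-process the .txt file further to get the dates into a readable format? Note: the date parsing built into read_table does not work on its own with this date format.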

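I am a beginner and trying hard to learn. Sorry if the code is wrong, basic, or confusing. I would be very grateful if anyone could help. Many thanks in advance.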
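
A note on the TypeError, offered as a sketch and not a definitive fix: `StringIO` wraps a text string so that it can be read like a file, whereas `data1` is already a DataFrame, which is presumably why the second `read_table` call rejects it. Since the columns have already been parsed, a second read may not be needed at all; the date parts can be assembled in place. The names `raw` and `parts` are made up for illustration, the rows are inlined from the sample above instead of being read from `0132_3.TXT`, and the code assumes Python 3 and that two-digit years mean 20YY.

```python
import io

import pandas as pd

# Sample rows copied from the question; real data would come from the TXT file.
raw = """\
29 04 13 18 15 00    7.667
29 04 13 18 30 00    7.000
29 04 13 18 45 00    7.000
29 04 13 19 00 00    7.333
29 04 13 19 15 00    7.000
"""

names = ["Day", "Month", "Year", "Hour", "Min", "Sec", "Value"]

# StringIO is given a *string* here, which is what it expects.
data1 = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=names)

# data1 is already a DataFrame, so its columns can be combined directly.
# pd.to_datetime accepts a DataFrame whose columns are named
# year, month, day, hour, minute, second.
parts = pd.DataFrame({
    "year": 2000 + data1["Year"],  # assumption: two-digit years mean 20YY
    "month": data1["Month"],
    "day": data1["Day"],
    "hour": data1["Hour"],
    "minute": data1["Min"],
    "second": data1["Sec"],
})
data1["datetime"] = pd.to_datetime(parts)

result = data1[["datetime", "Value"]]
print(result)
```

With this approach the custom `parse_all_fields` helper and the second `read_table` call are not required.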

Answer 1

And*_*den 5

I think it would be easier to parse the dates while reading the csv:

    In [1]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]])

    In [2]: df
    Out[2]:
                        0      1
    0 2013-04-29 00:00:00  7.667
    1 2013-04-29 00:00:00  7.000
    2 2013-04-29 00:00:00  7.000
    3 2013-04-29 00:00:00  7.333
    4 2013-04-29 00:00:00  7.000

Since you are using an uncommon date format, you also need to specify a date parser:

    In [11]: def date_parser(ss):
                 day, month, year, hour, min, sec = ss.split()
                 return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))

    In [12]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

    In [13]: df
    Out[13]:
                        0      1
    0 2013-04-29 18:15:00  7.667
    1 2013-04-29 18:30:00  7.000
    2 2013-04-29 18:45:00  7.000
    3 2013-04-29 19:00:00  7.333
    4 2013-04-29 19:15:00  7.000

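
As far as I know, the `date_parser` argument is deprecated in more recent pandas releases (2.0 and later), so here is a hedged alternative that avoids it: read the six date fields as strings, join them back together, and hand an explicit format string to `pd.to_datetime`. This is only a sketch. The names `raw`, `date_cols`, and `stamp` are illustrative, the rows are inlined from the question instead of being read from `0132_3.TXT`, and Python 3 is assumed.

```python
import io

import pandas as pd

# Sample rows copied from the question; real data would come from the TXT file.
raw = """\
29 04 13 18 15 00    7.667
29 04 13 18 30 00    7.000
29 04 13 18 45 00    7.000
29 04 13 19 00 00    7.333
29 04 13 19 15 00    7.000
"""

names = ["Day", "Month", "Year", "Hour", "Min", "Sec", "Value"]
date_cols = names[:6]

# Read the date fields as strings so that leading zeros ("04", "00") survive.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    names=names,
    dtype={col: str for col in date_cols},
)

# Join the six fields into one string per row, e.g. "29 04 13 18 15 00".
stamp = df[date_cols].apply(" ".join, axis=1)

# %y interprets the two-digit year "13" as 2013.
df["datetime"] = pd.to_datetime(stamp, format="%d %m %y %H %M %S")

# Keep only the timestamp (as the index) and the measured value.
df = df.drop(columns=date_cols).set_index("datetime")
print(df)
```

Spelling the format out also removes any day-first versus month-first ambiguity, which matters for dates such as 04 05 13.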
