How to index a datetime column from an imported csv file - pandas

And*_*cio 2 csv indexing datetime concatenation pandas

I am trying to merge and append different time series, importing them from csv files. I tried the following basic code:

import pandas as pd
import numpy as np
import glob
import csv
import os

path = r'./A08_csv'     # use your path
#all_files = glob.glob(os.path.join(path, "A08_B1_T5.csv"))

df5 = pd.read_csv('./A08_csv/A08_B1_T5.csv', parse_dates={'Date Time'})
df6 = pd.read_csv('./A08_csv/A08_B1_T6.csv', parse_dates={'Date Time'})

print len(df5)
print len(df6)

df = pd.concat([df5],[df6], join='outer')
print len(df)

The result is:

12755 (df5)
24770 (df6)
12755 (df)

Shouldn't df be as long as the longer of the two files (they have many rows in common, in terms of the values in the ['Date Time'] column)?

I tried to index the data by datetime by adding this line:

#df5.set_index(pd.DatetimeIndex(df5['Date Time']))

But I got the error:

KeyError: 'Date Time'

Any clue as to why this happens?

jez*_*ael 7

I think you need:

df5.set_index(['Date Time'], inplace=True)
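
As a minimal, self-contained sketch of that set_index call: the sample data below is made up to stand in for A08_B1_T5.csv, and the header clean-up step is only an assumption about one possible cause of the KeyError (a header that differs slightly from 'Date Time', for example through stray whitespace).

```python
import io
import pandas as pd

# Hypothetical sample data standing in for A08_B1_T5.csv
csv_text = """Date Time,a
2010-01-27 16:00:00,2.0
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""

df5 = pd.read_csv(io.StringIO(csv_text))

# Assumption: normalise header names in case the real file has
# leading/trailing whitespace around 'Date Time'
df5.columns = df5.columns.str.strip()

# Convert the column to datetimes, then use it as the index
df5['Date Time'] = pd.to_datetime(df5['Date Time'])
df5 = df5.set_index('Date Time')

print(df5.index)
```

Note that set_index returns a new DataFrame unless inplace=True is passed, so the result has to be assigned back as done here.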

Or better, add the index_col parameter to read_csv:

import pandas as pd
import io

temp=u"""Date Time,a
2010-01-27 16:00:00,2.0
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""

df = pd.read_csv(io.StringIO(temp), index_col=['Date Time'], parse_dates=['Date Time'])
print (df)
                       a
Date Time               
2010-01-27 16:00:00  2.0
2010-01-27 16:10:00  2.2
2010-01-27 16:30:00  1.7

print (df.index)
DatetimeIndex(['2010-01-27 16:00:00', '2010-01-27 16:10:00',
               '2010-01-27 16:30:00'],
              dtype='datetime64[ns]', name='Date Time', freq=None)

Another solution is to pass column positions instead of names: if the Date Time column is first, pass 0 to index_col and parse_dates (Python counts from 0):

import pandas as pd
import io


temp=u"""Date Time,a
2010-01-27 16:00:00,2.0
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""

df = pd.read_csv(io.StringIO(temp), index_col=0, parse_dates=[0])
print (df)
                       a
Date Time               
2010-01-27 16:00:00  2.0
2010-01-27 16:10:00  2.2
2010-01-27 16:30:00  1.7

print (df.index)
DatetimeIndex(['2010-01-27 16:00:00', '2010-01-27 16:10:00',
               '2010-01-27 16:30:00'],
              dtype='datetime64[ns]', name='Date Time', freq=None)
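
On the length of the concatenated frame: pd.concat expects the frames to combine in a single list as its first argument. In the question's call, [df6] sits outside that list, which would be consistent with df having the same length as df5. The following is only a sketch of how the two frames might be combined once both have a datetime index; the sample data and the column names a and b are made up, and whether the real files should be placed side by side or stacked is an assumption.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the two csv files
t5 = """Date Time,a
2010-01-27 16:00:00,2.0
2010-01-27 16:10:00,2.2"""

t6 = """Date Time,b
2010-01-27 16:00:00,5.0
2010-01-27 16:10:00,5.5
2010-01-27 16:30:00,5.1"""

df5 = pd.read_csv(io.StringIO(t5), index_col='Date Time', parse_dates=['Date Time'])
df6 = pd.read_csv(io.StringIO(t6), index_col='Date Time', parse_dates=['Date Time'])

# Side by side: rows are aligned on the datetime index, and the outer
# join keeps every timestamp that appears in either frame
wide = pd.concat([df5, df6], axis=1, join='outer')

# Stacked: rows of df6 are appended below the rows of df5
stacked = pd.concat([df5, df6])

print(len(df5), len(df6), len(wide), len(stacked))
```

With axis=1 the result has one row per distinct timestamp, so it is at least as long as the longer input; with the default axis=0 its length is the sum of the two inputs.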