Organizing data read from Excel into a pandas DataFrame

Question

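My goals with this script are:

1. Read time series data, along with the headers (labels, units), from an Excel file (> 100,000k rows).
2. Convert the Excel numeric dates to the best datetime object for a pandas DataFrame.
3. Be able to use timestamps to reference rows and series labels to reference columns.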

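So far, I use xlrd to read the Excel data into lists, make a pandas Series from each list with the time list as the index, combine the Series with the series headers to build a Python dictionary, and pass the dictionary to a pandas DataFrame. Despite my efforts, df.index seems to be set to the column headers, and I'm not sure at which point to convert the dates into datetime objects.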

I just started using Python 3 days ago, so any advice would be great! Here is my code:

    import xlrd
    import pandas as pd
    from pandas import Series

    #Open excel workbook and first sheet
    wb = xlrd.open_workbook(r"C:\GreenCSV\Calgary\CWater.xlsx")
    sh = wb.sheet_by_index(0)

    #Read rows containing labels and units
    Labels = sh.row_values(1, start_colx=0, end_colx=None)
    Units = sh.row_values(2, start_colx=0, end_colx=None)

    #Initialize list to hold data
    Data = [None] * (sh.ncols)

    #read column by column and store in list
    for colnum in range(sh.ncols):
        Data[colnum] = sh.col_values(colnum, start_rowx=5, end_rowx=None)

    #Delete unnecessary rows and columns
    del Labels[3],Labels[0:2], Units[3], Units[0:2], Data[3], Data[0:2]

    #Create Pandas Series
    s = [None] * (sh.ncols - 4)
    for colnum in range(sh.ncols - 4):
        s[colnum] = Series(Data[colnum+1], index=Data[0])

    #Create Dictionary of Series
    dictionary = {}
    for i in range(sh.ncols-4):
        dictionary[i]= {Labels[i] : s[i]}

    #Pass Dictionary to Pandas DataFrame
    df = pd.DataFrame.from_dict(dictionary)

Answer 1

And*_*den 10

You can use pandas directly here. I usually like to create a dictionary of DataFrames (keyed by sheet name):

In [11]: xl = pd.ExcelFile(r"C:\GreenCSV\Calgary\CWater.xlsx")

In [12]: xl.sheet_names  # in your example it may be different
Out[12]: [u'Sheet1', u'Sheet2', u'Sheet3']

In [13]: dfs = {sheet: xl.parse(sheet) for sheet in xl.sheet_names}

In [14]: dfs['Sheet1'] # access DataFrame by sheet name

You can check the docs for parse, which offers more options (e.g. skiprows); these let you parse individual sheets with more control...
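As a rough sketch of how those options might fit the layout described in the question, and of the date conversion it asks about: the row positions below (labels in row 1, units in row 2, data from row 5, all 0-indexed) are read off the question's xlrd calls, and the function names, column labels and sample values are made up for illustration. The serial conversion assumes the workbook uses Excel's default 1900 date system, where day counts are measured from 1899-12-30.

```python
import pandas as pd


def parse_sheet(path, sheet=0):
    """Hypothetical helper: parse one sheet, assuming labels sit in row 1
    and data starts at row 5 (0-indexed), as in the question's xlrd code."""
    xl = pd.ExcelFile(path)
    # header=1 takes the labels row as column names; skiprows drops the
    # units row and the two rows after it, so only data rows remain.
    return xl.parse(sheet, header=1, skiprows=[2, 3, 4])


def excel_serial_to_datetime(serials):
    """Convert Excel serial day numbers (1900 date system) to a DatetimeIndex."""
    return pd.to_datetime(list(serials), unit="D", origin="1899-12-30")


# Made-up values standing in for the columns read from the sheet.
serial_times = [41275.0, 41275.5, 41276.0]
columns = {
    "Flow": [1.2, 1.5, 1.1],
    "Pressure": [30.0, 31.5, 29.8],
}

index = excel_serial_to_datetime(serial_times)

# A flat mapping of label -> values gives one column per label,
# with the timestamps as the row index.
df = pd.DataFrame(columns, index=index)
df.index.name = "Time"

# Exact lookup by timestamp and column label.
print(df.loc[pd.Timestamp("2013-01-01 12:00"), "Flow"])

# Partial string indexing: every row that falls on that day.
print(df.loc["2013-01-01"])
```

Note that the dictionary passed to DataFrame here is flat (label to values). Nesting each Series inside its own one-entry dictionary, as in the question's loop, makes pandas treat the inner keys (the labels) as the row index, which would explain df.index showing the column headers.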

Archived: 12 years, 1 month ago
Views: 14,186
Last recorded: 12 years, 1 month ago