From a password-protected Excel file to a pandas DataFrame

dmv*_*nna 8 python excel pandas

I can open a password-protected Excel file with the following:

import sys
import win32com.client

xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1)  # counts from 1, not from 0
print(xlws.Name)
print(xlws.Cells(1, 1))  # that's A1

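I'm not sure how to get this information into a pandas DataFrame. Do I need to read the cells one by one, or is there a more convenient way to do it?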

Suh*_*ote 17

Simple solution

import io
import pandas as pd
import msoffcrypto

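passwd = 'xyz'

# Decrypt the workbook into an in-memory buffer
# (path_to_your_file is a placeholder for the path to the encrypted workbook)
decrypted_workbook = io.BytesIO()
with open(path_to_your_file, 'rb') as file:
    office_file = msoffcrypto.OfficeFile(file)
    office_file.load_key(password=passwd)
    office_file.decrypt(decrypted_workbook)

# Read the decrypted bytes as a regular Excel file
df = pd.read_excel(decrypted_workbook, sheet_name='abc')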

Install msoffcrypto-tool first:

pip install --user msoffcrypto-tool
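A workbook encrypted with an open password is not stored as a plain ZIP archive like a normal .xlsx file. It is wrapped in an OLE compound file, which is why it has to be decrypted before pd.read_excel can parse it. As a rough sketch, the first bytes of a file can be checked to see which container it uses. The helper name container_kind is made up for illustration, and the check is only a heuristic, since legacy .xls files are OLE containers too.

```python
import io

# Well-known file signatures
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file (encrypted .xlsx, legacy .xls)
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP archive (regular, unencrypted .xlsx)


def container_kind(stream):
    """Return 'ole', 'zip' or 'unknown' based on the first bytes of a binary stream.

    Hypothetical helper: the stream position is restored afterwards so the
    same stream can still be handed to a decryption or parsing library.
    """
    start = stream.tell()
    head = stream.read(8)
    stream.seek(start)
    if head.startswith(OLE_MAGIC):
        return "ole"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


# Demo on in-memory bytes instead of real files
print(container_kind(io.BytesIO(ZIP_MAGIC + b"rest of archive")))
print(container_kind(io.BytesIO(OLE_MAGIC + b"rest of container")))
```

With a real file, the stream would come from open(path, 'rb'). A result of 'zip' suggests the workbook can be passed to pd.read_excel directly, without the decryption step.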

Export all sheets of every Excel file in a directory and its subdirectories to separate CSV files

import os
from glob import glob

# io, pd, msoffcrypto and passwd come from the snippet above
PATH = "Active Cons data"

# Scan all the Excel files in the directory and its sub-directories
excel_files = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.xlsx'))]

for i in excel_files:
    print(i)
    decrypted_workbook = io.BytesIO()
    with open(i, 'rb') as file:
        office_file = msoffcrypto.OfficeFile(file)
        office_file.load_key(password=passwd)
        office_file.decrypt(decrypted_workbook)

    # sheet_name=None returns a dict mapping each sheet name to its DataFrame
    sheets = pd.read_excel(decrypted_workbook, sheet_name=None)
    print(list(sheets))  # list of sheet names
    for sheet, sheet_df in sheets.items():
        # Sheets with the same name in different workbooks overwrite each other
        new_file = f"D:\\all_csv\\{sheet}.csv"
        sheet_df.to_csv(new_file, index=False)
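The loop above names each CSV file after its sheet only, so two workbooks that both contain a sheet with the same name would write to the same file. One possible way around this, sketched here with a hypothetical helper csv_path_for and a made-up "workbook__sheet" naming scheme, is to include the workbook name in the output path and replace characters that Windows does not allow in file names.

```python
import re
from pathlib import Path


def csv_path_for(workbook_path, sheet_name, out_dir):
    """Build an output path of the form <out_dir>/<workbook stem>__<sheet>.csv.

    Hypothetical helper: the naming scheme is an illustrative choice,
    not something the answer above prescribes.
    """
    stem = Path(workbook_path).stem
    # Replace characters that are invalid in Windows file names
    safe_sheet = re.sub(r'[\\/:*?"<>|]', "_", sheet_name)
    return Path(out_dir) / f"{stem}__{safe_sheet}.csv"


print(csv_path_for("Active Cons data/report.xlsx", "abc", "all_csv").name)
```

Inside the loop, the result would take the place of new_file. Workbooks with the same file name in different subdirectories would still collide, so the scheme may need part of the directory path as well.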

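  • This was very helpful, although I'd suggest renaming the variable "i" in the simple solution to something that shows it is a file path, such as "file_path", since "i" is usually used as an iterator. It took me a minute to figure out... (2 upvotes)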

小智 6

Assuming the start cell is given as (StartRow, StartCol) and the end cell as (EndRow, EndCol), I found that the following worked for me:

import pandas

# Get the content in the rectangular selection region
# content is a tuple of tuples
content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value

# Transfer content to a pandas DataFrame
dataframe = pandas.DataFrame(list(content))
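The DataFrame built this way has integer column labels, and any header row in the sheet ends up as the first data row. A minimal sketch of separating the header from the data before building the DataFrame follows. It assumes the range spans more than one cell, so that the value is a tuple of row tuples, and that empty cells come back as None. The helper name split_header and the sample data are made up for illustration.

```python
def split_header(content):
    """Split a tuple of row tuples into (column_names, data_rows).

    Hypothetical helper: the first row is treated as the header, and
    empty header cells (None) get a placeholder name such as 'col2'.
    """
    rows = [list(row) for row in content]
    if not rows:
        return [], []
    header, data = rows[0], rows[1:]
    columns = [
        name if name is not None else f"col{idx}"
        for idx, name in enumerate(header, start=1)
    ]
    return columns, data


# Sample data shaped like a multi-cell range value
sample = (("name", "qty"), ("apple", 3.0), ("pear", 5.0))
columns, data = split_header(sample)
print(columns)
print(data)
```

The two results can then be passed on as pandas.DataFrame(data, columns=columns).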

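Note: Excel cell B5 is addressed in win32com as row 5, column 2. Also, list(...) is needed to convert the tuple of tuples into a list of tuples, because older pandas versions have no pandas.DataFrame constructor for a tuple of tuples.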
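Since Cells(row, col) takes 1-based numbers rather than an address like "B5", a small conversion helper can be handy when computing StartRow, StartCol, EndRow and EndCol. The following is a sketch only. The function name a1_to_row_col is hypothetical, and it handles plain addresses without sheet names or "$" markers.

```python
import re


def a1_to_row_col(address):
    """Convert an A1-style address such as 'B5' to a 1-based (row, col) pair."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", address)
    if match is None:
        raise ValueError(f"not a plain A1-style address: {address!r}")
    letters, digits = match.groups()
    col = 0
    # Column letters form a base-26 number with digits A=1 ... Z=26
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col


print(a1_to_row_col("B5"))  # (5, 2)
```

For a region such as A1:C4, the two corner addresses can be converted separately and passed to xlws.Cells as in the snippet above.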


小智 5

From David Hamann's site (all credit goes to him): https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/

With xlwings, opening the file first launches the Excel application so that you can enter the password.

import pandas as pd
import xlwings as xw

PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']

df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df