Python: find the highest row in a given column

Question

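I'm new to Stack Overflow and recently learned some basic Python. This is my first time using openpyxl. Before this I used xlrd and xlsxwriter, and I did manage to make some useful programs with them. But now I need an .xlsx reader and writer.

I need to read and edit a file using data already stored in the code. Suppose the .xlsx has five columns of data: A, B, C, D, E. In column A I have over 1000 rows of data. In column D I have 150 rows of data.

Basically, I want the program to find the last row that contains data in a given column (say D). Then it should write the stored variable data to the next available row in column D (last row + 1).

The problem is that I can't use ws.get_highest_row(), because it returns row 1000, which comes from column A.

Basically, this is what I have so far: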

data = 'xxx'
from openpyxl import load_workbook
wb = load_workbook('book.xlsx', use_iterators=True)
ws = wb.get_sheet_by_name('Sheet1')
last_row = ws.get_highest_row()

Obviously this doesn't work at all: last_row returns 1000.

Answer 1

Lon*_*Rob 1

Here is a way to do it using Pandas.

In Pandas it is easy to get the last non-null row with last_valid_index.
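As a minimal sketch of what last_valid_index returns, the snippet below uses a small made-up DataFrame (the column names and values are hypothetical stand-ins for a sheet where column A is longer than column D). It assumes the default 0-based index that pd.read_excel produces and a single header row in the sheet.

```python
import pandas as pd

# Hypothetical data standing in for a worksheet:
# column A is full, column D stops after its third row.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "D": [10.0, 20.0, 30.0, None, None],
})

# Index label of the last non-null value in column D
last_label = df["D"].last_valid_index()

# Assumption: default 0-based index and one header row in the sheet,
# so index label n corresponds to Excel row n + 2.
last_excel_row = last_label + 2
next_free_excel_row = last_excel_row + 1

print(last_label, last_excel_row, next_free_excel_row)
```

Note that last_valid_index returns None when the column contains no valid values at all, so a fully empty column needs separate handling.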

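There may be a better way to write the resulting DataFrame to your xlsx file but, according to the docs, this very dumb way is actually how it's done in openpyxl.

Suppose you start with this simple worksheet:

[Image: original worksheet]

Suppose we want to put xxx into column C: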

import openpyxl as xl
import pandas as pd

wb = xl.load_workbook('deleteme.xlsx')
ws = wb['Sheet1']  # get_sheet_by_name() is deprecated in newer openpyxl releases
df = pd.read_excel('deleteme.xlsx')

def replace_first_null(df, col_name, value):
    """
    Replace the first null value in DataFrame df.`col_name`
    with `value`.
    """
    return_df = df.copy()
    idx = list(df.index)
    last_valid = df[col_name].last_valid_index()
    last_valid_row_number = idx.index(last_valid)
    # This next line has mixed number and string indexing
    # but it should be ok, since df is coming from an
    # Excel sheet and should have a consecutive index
    return_df.loc[last_valid_row_number + 1, col_name] = value
    return return_df

def write_df_to_worksheet(ws, df):
    """
    Write the values in df to the worksheet ws in place
    """
    # Iterate over the df argument rather than a global variable
    for i, col in enumerate(df):
        for j, val in enumerate(df[col]):
            if not pd.isnull(val):
                # Python is zero indexed, so add one
                # (plus an extra one to take account
                #  of the header row!)
                ws.cell(row=j + 2, column=i + 1).value = val

# Here's the actual replacing happening
replaced = replace_first_null(df, 'C', 'xxx')
# Write the modified copy, not the original df
write_df_to_worksheet(ws, replaced)
wb.save('changed.xlsx')

The result is:

[Image: edited Excel file]
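For comparison, here is a minimal openpyxl-only sketch. It is not the approach of the answer above: it swaps in a plain backwards scan of the column. The function names (last_row_with_data, append_to_column, update_workbook) are hypothetical helpers introduced for illustration. It assumes the workbook is loaded in openpyxl's normal mode, since read-only worksheets cannot be edited, and that an empty cell has the value None.

```python
def last_row_with_data(ws, column):
    """
    Return the 1-based number of the last row in `column`
    (1 = A, 4 = D) whose cell is not empty, or 0 if the
    whole column is empty.
    """
    # ws.max_row is the highest row used by ANY column, so start
    # there and walk upwards until this column has a value.
    for row in range(ws.max_row, 0, -1):
        if ws.cell(row=row, column=column).value is not None:
            return row
    return 0


def append_to_column(ws, column, value):
    """
    Write `value` into the row just below the last non-empty
    cell of `column` and return that row number.
    """
    target_row = last_row_with_data(ws, column) + 1
    ws.cell(row=target_row, column=column).value = value
    return target_row


def update_workbook(src_path, dst_path, sheet_name, column, value):
    """
    Hypothetical convenience wrapper: load, append, save.
    """
    # Imported here so the two helpers above work with any
    # worksheet-like object that has max_row and cell().
    from openpyxl import load_workbook

    wb = load_workbook(src_path)  # normal mode, not read-only
    row = append_to_column(wb[sheet_name], column, value)
    wb.save(dst_path)
    return row
```

With the workbook from the question, a call such as update_workbook('book.xlsx', 'changed.xlsx', 'Sheet1', 4, 'xxx') would be expected to write xxx into the first free cell of column D, regardless of how many rows column A uses. A cell holding an empty string counts as data in this sketch, because only None is treated as empty.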
