Creating charts in an Excel worksheet using R

eli*_*avs 5 excel r

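I am using the openxlsx package to generate Excel files from my R output.
I cannot find a way to add Excel charts to an Excel workbook.
I see that Python has a module for creating Excel files, and that module has a class for adding Excel charts.
Is there a way to do this with R?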

Adi*_*rid 5

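Here is a solution using the XLConnect package. One caveat: it relies on a chart template that you need to create in advance, and it generates a new file rather than appending a worksheet or chart to an existing file.

It consists of two stages:

  1. Prepare an Excel template for the type of chart you want to use.
  2. Update the template file with data from R each time you need it.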

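Step 1: Prepare a template in Excel for the chart type you need. You can keep all your templates in one file (on different worksheets) or in several files. When preparing the template, include the chart type you need in the worksheet, but use "named ranges" instead of references to specific cells. See the example. You can also use the example file I created. Note the use of named ranges both in the file and in the chart's data references (Sheet1!bar_names and Sheet1!values rather than Sheet1!$A$2:$A$4 and Sheet1!$B$2:$B$4).

A side note on named ranges in Excel. A named range means that you give a name to the data you want to use in the chart and then tell the chart to use that name instead of an absolute position. You can reach the Name Manager from the Formulas menu in Excel. We use named ranges because XLConnect can control them, so the chart updates dynamically when we modify a named range.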

Step 2: Adapt the following code to your needs, mainly by using your own data frame and updating the references in the createName calls.

library(XLConnect)  # load the library

# Load the template workbook that contains the chart and the named ranges
wb1 <- loadWorkbook(filename = "edit_chart_via_R_to_excel.xlsx")

# Sample data
new.df <- data.frame(Type = c("Ford", "Hyundai", "BMW", "Other"),
                     Number = c(45, 35, 25, 15))

# Write the data with a header row, so the values occupy rows 2 to 5
writeWorksheet(wb1, data = new.df, sheet = "Sheet1",
               startRow = 1, startCol = 1, header = TRUE)

# Update the named ranges used by the chart.
# Note that "Sheet1!$A$2:$A$5" and "Sheet1!$B$2:$B$5"
# should change according to the data you are updating.
createName(wb1, "bar_names", "Sheet1!$A$2:$A$5", overwrite = TRUE)
createName(wb1, "values", "Sheet1!$B$2:$B$5", overwrite = TRUE)

# Save back to the file that was loaded
saveWorkbook(wb1)

This should do the trick.
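
The only part of the R code that depends on the size of the data is the pair of range strings passed to createName. As a rough illustration of how those strings follow from the row count, here is a small standalone Python sketch. The helper name `absolute_column_range` and its parameters are hypothetical and not part of XLConnect. It assumes the data is written with a single header row starting at row 1, as in the writeWorksheet call above. The same string could be assembled in R with paste0 or sprintf.

```python
def absolute_column_range(sheet, column_letter, n_data_rows, header_rows=1):
    """Build an absolute single-column reference such as 'Sheet1!$A$2:$A$5'.

    Hypothetical helper for illustration only. Assumes the header occupies
    the first `header_rows` rows and the data follows immediately below it.
    """
    if n_data_rows < 1:
        raise ValueError("n_data_rows must be at least 1")
    first_row = header_rows + 1            # first row that holds data
    last_row = header_rows + n_data_rows   # last row that holds data
    return f"{sheet}!${column_letter}${first_row}:${column_letter}${last_row}"


if __name__ == "__main__":
    # Four data rows under one header row, as in the sample data frame
    print(absolute_column_range("Sheet1", "A", 4))
    print(absolute_column_range("Sheet1", "B", 4))
```

With four data rows this reproduces the two references used in the createName calls; a longer data frame only changes the final row number.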

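Note that if you want to deliver the result as a new file (and keep the original template without overwriting it), you can copy and save the template before you start modifying it.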
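
A minimal sketch of that copy step, written in Python to match the other code example added to this answer; in R, file.copy serves the same purpose. The function name `copy_template` and the file names in the demo are hypothetical. The idea is simply to duplicate the template first and then load and modify the copy, so the template itself is never written to.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(template_path, output_path):
    """Copy the chart template to a new file and return the new path.

    Hypothetical helper for illustration only. It refuses to copy a file
    onto itself so the template is never touched.
    """
    template = Path(template_path)
    output = Path(output_path)
    if not template.is_file():
        raise FileNotFoundError(f"template not found: {template}")
    if output.resolve() == template.resolve():
        raise ValueError("output path must differ from the template path")
    shutil.copyfile(template, output)
    return output


if __name__ == "__main__":
    # Demo with a placeholder file in a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp) / "template.xlsx"
        template.write_bytes(b"placeholder content")
        report = copy_template(template, Path(tmp) / "report.xlsx")
        print(report.name, report.exists())
```

The workbook would then be loaded from the copied file rather than from the template.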


jsa*_*avn 5

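I looked into using reticulate to write an .xlsx file from scratch with a native Excel chart based on the data, which avoids having to make a template. The scripts below generate some data, save it to an .xlsx file, and then build a line chart beneath the data. For other chart types, see the documentation at https://xlsxwriter.readthedocs.io/chart.html

Also note that reticulate should prompt you to install Python if it cannot find an existing installation.

The code is also available as a gist: https://gist.github.com/jsavn/cbea4b35d73cea6841489e72a221c4e9

Python script: write_xlsx_and_chart_to_file.py

(This file name is used in the source_python() call in the R script further down.)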

import pandas as pd
import xlsxwriter as xw

# The skeleton of below function based on example from: https://xlsxwriter.readthedocs.io/example_pandas_chart.html#ex-pandas-chart
# We pass the function a pandas dataframe;
# The dataframe is inserted in an .xlsx spreadsheet
# We take note of the number of rows and columns, and use those to position the chart below the data
# We then iterate over the rows of the data and insert each row as a separate line (series) in the line chart

def save_time_series_as_xlsx_with_chart(pandas_df, filename):
  if not filename.endswith('.xlsx'):
    print("Warning: added .xlsx to filename")
    filename = filename + '.xlsx'
  # Create a Pandas dataframe from the data.
  # pandas_df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

  ## get dimensions of data frame to use for positioning the chart later
  pandas_df_nrow, pandas_df_ncol = pandas_df.shape

  # Create a Pandas Excel writer using XlsxWriter as the engine.
  writer = pd.ExcelWriter(filename, engine='xlsxwriter')

  # Convert the dataframe to an XlsxWriter Excel object.
  pandas_df.to_excel(writer, sheet_name='Sheet1', index=False)

  # Get the xlsxwriter workbook and worksheet objects.
  workbook  = writer.book
  worksheet = writer.sheets['Sheet1']

  # Create a chart object.
  chart = workbook.add_chart({'type': 'line'})

  # Configure the series of the chart from the dataframe data.
  # The coordinates of each series in the line chart are the positions of the data in the Excel file.
  # Note that data starts at row 2, column 1, so the row/col values need to be adjusted accordingly.
  # However, Python counts rows and columns from 0.
  for row_in_data in range(0,pandas_df_nrow):
    row_in_sheet = row_in_data+1  # data starts on 2nd row
    last_col_in_sheet = pandas_df_ncol-1 # number of columns minus one in 0-notation
    first_col_with_data = 1  # 2nd column in 0-notation
    range_of_series = xw.utility.xl_range(
      first_row=row_in_sheet,  # read from the current row in loop only
      first_col=first_col_with_data, # data starts in 2nd column, i.e. 1 in 0-notation
      last_row=row_in_sheet,
      last_col=last_col_in_sheet
      )
    range_of_categories = xw.utility.xl_range(
      first_row=0, # read from 1st row only - header
      first_col=first_col_with_data,  # read from 2nd column for month headers
      last_row=0, 
      last_col=last_col_in_sheet
      )
    formula_for_series = '=Sheet1!' + range_of_series
    col_with_series_name = 0  # first column
    name_of_series = '=Sheet1!' + xw.utility.xl_rowcol_to_cell(row=row_in_sheet, col=col_with_series_name)
    formula_for_categories = '=Sheet1!' + range_of_categories  # leading '=' added for consistency with the other formulas
    chart.add_series({'values': formula_for_series, 'name': name_of_series, 'categories': formula_for_categories})

  # Insert the chart into the worksheet.
  worksheet.insert_chart(pandas_df_nrow+2, 2, chart)

  # Close the Pandas Excel writer and output the Excel file.
  # (writer.save() was removed in pandas 2.0; close() also works on older versions.)
  writer.close()
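
The index arithmetic in the loop is easier to check in isolation. The sketch below re-implements, in plain Python, the zero-based to A1-notation conversion that the script delegates to xw.utility.xl_rowcol_to_cell and xw.utility.xl_range, and then builds the same kind of series dictionaries that the loop passes to chart.add_series. The helper names (`col_to_letters`, `rowcol_to_cell`, `cell_range`, `series_specs`) are hypothetical and written for illustration only; this is not XlsxWriter's implementation, and the script itself should keep using the library functions.

```python
def col_to_letters(col):
    """Convert a zero-based column index to Excel letters (0 is A, 26 is AA)."""
    letters = ""
    col += 1  # switch to one-based so the base-26 arithmetic below works
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def rowcol_to_cell(row, col):
    """Zero-based (row, col) to an A1-style cell reference."""
    return f"{col_to_letters(col)}{row + 1}"


def cell_range(first_row, first_col, last_row, last_col):
    """Zero-based corner coordinates to an A1-style range."""
    return f"{rowcol_to_cell(first_row, first_col)}:{rowcol_to_cell(last_row, last_col)}"


def series_specs(n_rows, n_cols, sheet="Sheet1"):
    """One spec per data row: sheet row 0 is the header, column 0 holds the series name."""
    last_col = n_cols - 1
    categories = f"={sheet}!{cell_range(0, 1, 0, last_col)}"
    specs = []
    for row_in_data in range(n_rows):
        row_in_sheet = row_in_data + 1  # data starts on the second sheet row
        specs.append({
            "name": f"={sheet}!{rowcol_to_cell(row_in_sheet, 0)}",
            "categories": categories,
            "values": f"={sheet}!{cell_range(row_in_sheet, 1, row_in_sheet, last_col)}",
        })
    return specs


if __name__ == "__main__":
    # 4 rows (years) by 13 columns (Year plus twelve months)
    for spec in series_specs(n_rows=4, n_cols=13):
        print(spec)
```

For a 4 by 13 wide table (a Year column plus twelve month columns), the first series takes its name from A2, its categories from B1:M1 and its values from B2:M2.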

R script

library(tidyverse)
library(reticulate)

set.seed(19)  # random seed fixed

# check if packages are available, otherwise install
for (package in c("pandas","xlsxwriter")) {
  if (py_module_available(package)) {
    message(package, " already installed! Proceeding...")
  } else {
    py_install(packages = package)  
  }
}

## generate some time series data for month & year
tbl <- expand_grid(Year=2017:2020, Month=month.name) %>% mutate(N=sample(1:100, size=nrow(.), replace=TRUE))

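## ggplot2 plot of the data so we know what to expect
## (Month is turned into a factor so the x axis follows calendar order rather than alphabetical order)
fig <- 
  ggplot(data=tbl) +
  geom_line(aes(x=factor(Month, levels=month.name), y=N, group=Year, colour=factor(Year)), size=1) +
  labs(x="Month", colour="Year") +
  theme_minimal() +
  NULL
print(fig)  # see a ggplot2 version of same plot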

# convert data to wide format to put in excel
tbl_wide_format <- tbl %>%
  pivot_wider(names_from=Month, values_from=N)

# convert wide format data to pandas dataframe, to pass to python script
tbl_pandas <- r_to_py(tbl_wide_format)

## import python script
source_python("write_xlsx_and_chart_to_file.py")

## save chart using python script
save_time_series_as_xlsx_with_chart(tbl_pandas, "reticulate_pandas_writexlsx_excel_line_chart.xlsx")