pandas DataFrame as a LaTeX or HTML table with nbconvert

J G*_*rif 6 latex ipython pdflatex ipython-notebook

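Is it possible to get a nicely formatted table from a pandas DataFrame in an IPython notebook when using nbconvert to LaTeX and PDF?

The default seems to be just a left-aligned block of numbers, which looks shoddy.

I would like something more like the HTML display of a DataFrame in the notebook, or a LaTeX table. Saving and displaying a .png image of the HTML-rendered DataFrame would also be fine, but exactly how to do that has proved elusive.

At a minimum, I would like a simple center-aligned table in a nice font.

I have had no luck using the .to_latex() method to get LaTeX tables from pandas DataFrames, either within the notebook or in nbconvert output. I have also tried (after reading an IPython dev list discussion and following the custom display logic notebook example) making a custom class with _repr_html_ and _repr_latex_ methods that return the results of to_html() and to_latex() respectively. I think a main problem with the nbconvert conversion is that pdflatex is not happy with either the {'s or the //'s in the DataFrame to_latex() output. But I don't want to start fiddling with that before checking that I haven't missed something.

Thanks.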

log*_*ogc 8

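A simpler approach is discussed in this GitHub issue. Basically, you have to add a _repr_latex_ method to the DataFrame class, a procedure that pandas documents in its official documentation.

I did this in a notebook like this: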

import pandas as pd

pd.set_option('display.notebook_repr_html', True)

def _repr_latex_(self):
    return r"\centering{%s}" % self.to_latex()  # raw string keeps the backslash literal

pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame

The following code:

d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df

turns into an HTML table when evaluated live in the notebook, and is converted to a (centered) table in the PDF output:

$ ipython nbconvert --to latex --post PDF notebook.ipynb

  • I get "No module named PDF" (3 upvotes)
  • This works; however, everything after the first table ends up centered in my PDF. (2 upvotes)
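
On the centering side effect noted in the comment above: `\centering` is a LaTeX declaration rather than a command that takes an argument, so `\centering{...}` stays in effect after the closing brace. One way to limit its scope, sketched below, is to wrap the table in a `center` environment instead. The helper name `center_latex` is hypothetical, and the tabular string is a hand-written stand-in for `df.to_latex()` output.

```python
def center_latex(tabular: str) -> str:
    """Wrap a LaTeX tabular string in a center environment.

    Unlike a bare \\centering, the environment ends at \\end{center},
    so content after the table keeps its normal alignment.
    """
    return "\\begin{center}\n" + tabular.strip() + "\n\\end{center}\n"


# Hand-written stand-in for the string returned by df.to_latex().
sample = r"""
\begin{tabular}{lrr}
 & one & two \\
0 & 1.0 & 4.0 \\
\end{tabular}
"""

wrapped = center_latex(sample)
print(wrapped)
```

With the monkey patch above, `_repr_latex_` could return `center_latex(self.to_latex())` instead of the `\centering{...}` string.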

Pus*_*kar 8

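The simplest way now is to display the DataFrame as a Markdown table. You may need to install tabulate for this.

In your code cell, when displaying the DataFrame, use the following: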

from IPython.display import Markdown, display
display(Markdown(df.to_markdown()))

Since it is a Markdown table, nbconvert can easily convert it to LaTeX.
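
For reference, the Markdown that nbconvert receives here is an ordinary pipe table. The sketch below builds one by hand with centered columns (`:---:`), which may help if tabulate cannot be installed. `rows_to_markdown` is a hypothetical helper written for illustration; it is not part of pandas or IPython.

```python
def rows_to_markdown(headers, rows):
    """Build a Markdown pipe table with center-aligned columns."""
    def line(cells):
        return "| " + " | ".join(str(c) for c in cells) + " |"

    # Header row, then the alignment row, then one line per data row.
    out = [line(headers), line([":---:"] * len(headers))]
    out.extend(line(r) for r in rows)
    return "\n".join(out)


# A small two-column example with made-up values.
table = rows_to_markdown(["one", "two"], [[1.0, 4.0], [2.0, 3.0]])
print(table)
```

In a notebook the string can be shown with `display(Markdown(table))`; for a DataFrame, `df.columns` and `df.values.tolist()` would supply the headers and rows.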


ely*_*ely 5

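I wrote my own mako-based templating scheme for this. I think it is actually a pretty easy workflow if you commit to setting it up for yourself once. After that, you start to see that templating the metadata of your desired format, so that it can be factored out of the code (and does not represent a third-party dependency), is a very nice way to solve it.

Here is the workflow I came up with.

  1. Write a .mako template that accepts your DataFrame as an argument (and possibly other args) and converts it to the TeX format you want (example below).

  2. Make a wrapper class (I call it to_tex) that exposes the API you want (e.g. you can pass it your data objects and it handles the call to mako's render commands internally).

  3. Within the wrapper class, decide how you want the output. Print the TeX code to the screen? Use a subprocess to actually compile it to a PDF?

In my case, I was generating preliminary results for a research paper and needed to format tables into a complex double-sorted structure with nested column names, etc. Here is an example of what one of the tables looks like:

[Image: example output from the templated TeX tool]

Here is the mako template (warning, rough):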

<%page args="df, table_title, group_var, sort_var"/>
<%
"""
Template for country/industry two-panel double sorts TeX table.
Inputs: 
-------
df: pandas DataFrame
    Must be 17 x 12 and have rows and columns that positionally
    correspond to the entries of the table.

table_title: string
    String used for the title of the table.

group_var: string
    String naming the grouping variable for the horizontal sorts.
    Should be 'Country' or 'Industry'.

sort_var: string (raw)
    String naming the variable that is being sorted, e.g.
    "beta" or "ivol". Note that if you want the symbol to
    be rendered as a TeX symbol, then pass a raw Python
    string as the arg and include the needed TeX markup in
    the passed string. If the string isn't raw, some of the
    TeX markup might be interpreted as special characters.

Returns:
--------
When used with mako.template.Template.render, will produce
a raw TeX string that can be rendered into a PDF containing
the specified data.

Author:
-------
Ely M. Spears, 05/21/2013

"""
# Python imports and helper function definitions.
import numpy as np  
def format_helper(x):
    return str(np.round(x,2))
%>


<%text>
\documentclass[10pt]{article}
\usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
\usepackage{array}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\setlength{\parskip}{1em}
\setlength{\parindent}{0in}
\renewcommand*\arraystretch{1.5}
\author{Ely Spears}


\begin{document}
\begin{table} \caption{</%text>${table_title}<%text>}
\begin{center}
    \begin{tabular}{ | p{2.5cm}  c c c c c p{1cm} c c c c c c p{1cm} |}
    \hline
    & \multicolumn{6}{c}{CAPM $\beta$} & \multicolumn{6}{c}{CAPM $\alpha$ (\%p.a.)} & \\
    \cline{2-7} \cline{9-14}
    & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \\
    Stock </%text>${sort_var}<%text> is: & Low & 2 & 3 & 4 & High & Low - High & & Low & 2 & 3 & 4 & High & Low - High \\ 
    \hline
    \multicolumn{4}{|l}{Panel A. Point estimates} & & & & & & & & & & \\ 
    \hline
    Low            & </%text>${' & '.join(df.iloc[0].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[0].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.iloc[1].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[1].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.iloc[2].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[2].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.iloc[3].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[3].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.iloc[4].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[4].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.iloc[5].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[5].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}
        & </%text>${format_helper(df.iloc[6, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[6, 11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})}
        & </%text>${format_helper(df.iloc[7, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[7, 11])}<%text> \\


    \multicolumn{13}{|l}{Total effect} & </%text>${format_helper(df.iloc[8, 11])}<%text>  \\
    \hline
    \multicolumn{4}{|l}{Panel B. t-statistics} & & & & & & & & & & \\
    \hline
    Low            & </%text>${' & '.join(df.iloc[9].map(format_helper).values[0:6])}<%text>  & & </%text>${' & '.join(df.iloc[9].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.iloc[10].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[10].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.iloc[11].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[11].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.iloc[12].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[12].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.iloc[13].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[13].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.iloc[14].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[14].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}
        & </%text>${format_helper(df.iloc[15, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[15, 11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})}
        & </%text>${format_helper(df.iloc[16, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[16, 11])}<%text> \\
    \hline
    \end{tabular}
\end{center}
\end{table}
\end{document}
</%text>
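
The docstring's advice about passing a raw string for sort_var can be checked in plain Python. In an ordinary string literal `\b` is the backspace character, so `"$\beta$"` silently loses its backslash, while a raw string keeps the TeX markup intact. A short illustration (the variable names are only for this example):

```python
plain = "$\beta$"   # "\b" is parsed as a single backspace character
raw = r"$\beta$"    # raw string: the backslash is kept literally

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

Only the raw version reaches the template as the TeX command `\beta`.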

我的包装器to_tex.py看起来像这样(在if __name__ == "__main__"节中有示例用法):

"""
to_tex.py

Class for handling strings of TeX code and producing the
rendered PDF via PDF LaTeX. Assumes ability to call PDFLaTeX
via the operating system.
"""
class to_tex(object):
    """
    Publishes a TeX string to a PDF rendering with pdflatex.
    """
    def __init__(self, tex_string, tex_file, display=False):
        """
        Publish a string to a .tex file, which will be
        rendered into a .pdf file via pdflatex.
        """
        self.tex_string    = tex_string
        self.tex_file      = tex_file
        self.__to_tex_file()
        self.__to_pdf_file(display)
        print("Render status:", self.render_status)

    def __to_tex_file(self):
        """
        Writes a tex string to a file.
        """
        with open(self.tex_file, 'w') as t_file:
            t_file.write(self.tex_string)

    def __to_pdf_file(self, display=False):
        """
        Compile a tex file to a pdf file with the
        same file path and name.
        """
        try:
            import os
            from subprocess import Popen
            proc = Popen(["pdflatex", "-output-directory", os.path.dirname(self.tex_file), self.tex_file])
            proc.communicate()
            self.render_status = "success"
        except Exception as e:
            self.render_status = str(e)

        # Launch a display of the pdf if requested.
        if (self.render_status == "success") and display:
            try:
                proc = Popen(["evince", self.tex_file.replace(".tex", ".pdf")])
                proc.communicate()
            except:
                pass

if __name__ == "__main__":
    from mako.template import Template
    template_file = "path/to/template.mako"
    t = Template(filename=template_file)
    tex_str = t.render(arg1="arg1", ...)
    tex_wrapper = to_tex(tex_str, )

My choice was to pump the TeX string directly into pdflatex and leave displaying the result as an option.
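
One caveat about the wrapper above: it reports "success" whenever pdflatex could be launched, even if compilation itself failed. A hedged Python 3 sketch of the same step that also checks the exit code follows. `pdflatex_command` and `compile_tex` are hypothetical names, and `report.tex` is a made-up file name; the flags used (`-interaction=nonstopmode`, `-output-directory`) are standard pdflatex options.

```python
import os
import subprocess


def pdflatex_command(tex_file):
    """Return the pdflatex argument list for compiling tex_file in place."""
    # abspath avoids passing an empty directory for a bare file name.
    out_dir = os.path.dirname(os.path.abspath(tex_file))
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # do not stop and prompt on TeX errors
        "-output-directory", out_dir,
        tex_file,
    ]


def compile_tex(tex_file):
    """Run pdflatex; return True only if it exits with status 0."""
    try:
        result = subprocess.run(pdflatex_command(tex_file),
                                capture_output=True)
    except FileNotFoundError:
        # pdflatex is not installed or not on PATH.
        return False
    return result.returncode == 0


print(pdflatex_command("report.tex"))
```

`compile_tex` returns False rather than raising when pdflatex is missing, so the caller can decide how to report the failure.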

A small snippet of code that actually uses this with a DataFrame is here:

# Assume calculation work is done prior to this ...
all_beta  = pandas.concat([beta_df,  beta_tstat_df], axis=0)
all_alpha = pandas.concat([alpha_df, alpha_tstat_df], axis=0)
all_df = pandas.concat([all_beta, all_alpha], axis=1)

# Render result in TeX
tex_mako  = "/my_project/templates/mako/two_panel_double_sort_table.mako"
tex_file = "/my_project/some_tex_file_name.tex"

from mako.template import Template
t = Template(filename=tex_mako)
tex_str = t.render(all_df, table_title, group_var, tex_risk_name)

import my_project.to_tex as to_tex
tex_obj = to_tex.to_tex(tex_str, tex_file)

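  • Very nice. Any chance you could make this into a package (or possibly https://gist.github.com/takluyver/5098835)? We really should have a table package for IPython that spits out multiple formats! (2 upvotes)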