Programmatically convert pandas dataframe to markdown table

Ole*_*Vik 32 python markdown pandas

I have a Pandas Dataframe generated from a database, which contains data with mixed encodings. For example:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

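As shown, there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database, I think). I need to get the contents of this Dataframe output as a Markdown table, but without running into encoding problems. I followed this answer (from the question on generating Markdown tables) and got the following: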

import sys, sqlite3
import pandas as pd

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
    items = (row['date'], 
             row['path'], 
             row['language'], 
             row['shortest_sentence'],
             row['longest_sentence'], 
             row['number_words'], 
             row['readability_consensus'])
    rows.append(items)

headings = ['Date', 
            'Path', 
            'Language',
            'Shortest Sentence', 
            'Longest Sentence',
            'Words',
            'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
         ('^','^'), ('^','^')]

# table() is the helper function defined in the linked answer
table(sys.stdout, rows, fields, headings, align)

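However, this produces the error UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128). How can I output the Dataframe as a Markdown table, so that the output can be stored in a file and used when writing a Markdown document? I need the output to look like this: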

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |

kpy*_*ykc 30

Improving the answer further, for use in an IPython Notebook:

import pandas as pd

def pandas_df_to_markdown_table(df):
    from IPython.display import Markdown, display
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    df_formatted = pd.concat([df_fmt, df])
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(infodf)
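If you want the Markdown text itself (for example, to write it to a file) instead of rendering it in the notebook, the same "prepend a `---` row, then `to_csv(sep='|')`" trick can be sketched without IPython. This is a minimal sketch, not part of the answer above: the function name `df_to_markdown_string` is hypothetical, and it assumes no cell contains `|` or a line break.

```python
import pandas as pd


def df_to_markdown_string(df):
    """Return df as pipe-table text using the '---' separator-row trick."""
    # One '---' cell per column becomes the Markdown header separator row.
    sep_row = pd.DataFrame([["---"] * len(df.columns)], columns=df.columns)
    # Prepend the separator row; to_csv writes the column names above it.
    framed = pd.concat([sep_row, df], ignore_index=True)
    # Returns a str because no path is given; lines end with os.linesep.
    return framed.to_csv(sep="|", index=False)


df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
print(df_to_markdown_string(df))
```

The result has no leading or trailing pipes on each line; GitHub-flavored Markdown treats those outer pipes as optional.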

Or use tabulate:

pip install tabulate

Usage examples are in the documentation.

  • To use tabulate, I used `print(tabulate.tabulate(df.values, df.columns, tablefmt="pipe"))` (2 upvotes)
  • `df.to_markdown()` is now available in pandas, which uses `tabulate` under the hood. (2 upvotes)

tim*_*ink 27

Pandas 1.0 was released on January 29, 2020 and supports Markdown conversion, so you can now do this directly!

Example taken from the documentation:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())
|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |

Or without the index:

print(df.to_markdown(index=False))  # use showindex=False for pandas < 1.1
|   A |   B |
|----:|----:|
|   1 |   1 |
|   2 |   2 |
|   3 |   3 |

  • Yes: `df.to_markdown(showindex=False)`. Pandas uses [tabulate](https://github.com/astanin/python-tabulate), so you can pass tabulate arguments through to_markdown(). (5 upvotes)

Seb*_*nki 23

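I recommend generating ASCII tables with the python-tabulate library. The library also supports pandas.DataFrame. Until now the library had no Markdown output. I have submitted a pull request that introduces this format; maybe it will soon be merged into master (and eventually released to PyPI).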

Here's how to use it:

from pandas import DataFrame
from tabulate import tabulate

df = DataFrame({
    "weekday": ["monday", "thursday", "wednesday"],
    "temperature": [20, 30, 25],
    "precipitation": [100, 200, 150],
}).set_index("weekday")

print(tabulate(df, tablefmt="pipe", headers="keys"))

Output:

| weekday   |   temperature |   precipitation |
|:----------|--------------:|----------------:|
| monday    |            20 |             100 |
| thursday  |            30 |             200 |
| wednesday |            25 |             150 |

  • You can do this with `tablefmt="pipe"`. The PR was rejected, and there is no `tablefmt="markdown"`. (7 upvotes)

Roh*_*hit 9

Try this. I got it working.

See the screenshot at the end of this answer, which shows the Markdown file converted to HTML.

import pandas as pd

# You don't need these two lines
# as you already have your DataFrame in memory
df = pd.read_csv("nor.txt", sep="|")
df = df.drop(df.columns[-1], axis=1)  # drop() returns a new DataFrame

# Get column names
cols = df.columns

# Create a new DataFrame with just the markdown
# strings
df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)

#Create a new concatenated DataFrame
df3 = pd.concat([df2, df])

#Save as markdown
df3.to_csv("nor.md", sep="|", index=False)

I rendered the output as HTML by converting the Markdown to HTML.


Dan*_*ein 7

Export a DataFrame to Markdown

I created the following function for exporting a pandas.DataFrame to Markdown in Python:

def df_to_markdown(df, float_format='%.2g'):
    """
    Export a pandas.DataFrame to markdown-formatted text.
    DataFrame should not contain any `|` characters.
    """
    from os import linesep
    return linesep.join([
        '|'.join(map(str, df.columns)),
        '|'.join(4 * '-' for i in df.columns),
        df.to_csv(sep='|', index=False, header=False, float_format=float_format)
    ]).replace('|', ' | ')

This function may not automatically fix the OP's encoding problem, but that is a separate issue from converting pandas to Markdown.
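The docstring above warns that a `|` inside a cell breaks the table. One possible workaround, offered as an assumption and not as part of this answer, is to build the rows by hand, escaping pipes as `\|` (GitHub-flavored Markdown renders that as a literal pipe) and flattening line breaks. The helper name `df_to_markdown_escaped` is hypothetical, and unlike `df_to_markdown` it formats values with plain `str()`, so there is no `float_format`.

```python
import pandas as pd


def df_to_markdown_escaped(df):
    """Sketch: pipe-table text with '|' escaped and newlines flattened."""
    def cell(value):
        text = str(value)
        # Escape the column delimiter and keep each record on one line.
        return text.replace("|", "\\|").replace("\r", "").replace("\n", " ")

    header = "| " + " | ".join(cell(c) for c in df.columns) + " |"
    separator = "|" + "|".join("---" for _ in df.columns) + "|"
    body = [
        "| " + " | ".join(cell(v) for v in row) + " |"
        # name=None yields plain tuples of the row values, index excluded.
        for row in df.itertuples(index=False, name=None)
    ]
    return "\n".join([header, separator, *body])


df = pd.DataFrame({"cmd": ["a|b", "x\ny"], "n": [1, 2]})
print(df_to_markdown_escaped(df))
```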


dub*_*dan 5

I tried several of the solutions in this post and found this one worked best.

To convert a pandas dataframe to a Markdown table, I suggest using pytablewriter. Using the data provided in this post:

import pandas as pd
import pytablewriter
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

c = StringIO("""ID, path,language, date,longest_sentence, shortest_sentence, number_words , readability_consensus 
0, data/Eng/Sagitarius.txt , Eng, 2015-09-17 , With administrative experience in the prepa... , I am able to relocate internationally on short not..., 306, 11th and 12th grade
31 , data/Nor/Høylandet.txt  , Nor, 2015-07-22 , Høgskolen i Østfold er et eksempel..., Som skuespiller har jeg både..., 253, 15th and 16th grade
""")
df = pd.read_csv(c,sep=',',index_col=['ID'])

writer = pytablewriter.MarkdownTableWriter()
writer.table_name = "example_table"
writer.headers = list(df.columns.values)  # older pytablewriter releases called this header_list
writer.value_matrix = df.values.tolist()
writer.write_table()

This results in:

# example_table
ID |           path           |language|    date    |                longest_sentence                |                   shortest_sentence                  | number_words | readability_consensus 
--:|--------------------------|--------|------------|------------------------------------------------|------------------------------------------------------|-------------:|-----------------------
  0| data/Eng/Sagitarius.txt  | Eng    | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...|           306| 11th and 12th grade   
 31| data/Nor/Høylandet.txt  | Nor    | 2015-07-22 | Høgskolen i Østfold er et eksempel...        | Som skuespiller har jeg både...                      |           253| 15th and 16th grade   

Here is a screenshot of the rendered Markdown.



Ole*_*Vik 1

Right, so I borrowed from the question Rohit suggested (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:

# Enforce UTF-8 encoding (Python 2 only: reload(sys) and
# sys.setdefaultencoding do not exist in Python 3)
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')

# SQLite3 database
import sqlite3
# Pandas: Data structures and data analysis tools
import pandas as pd

# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']

# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)

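An important prerequisite is that the shortest_sentence and longest_sentence columns contain no unnecessary line breaks; I removed them by applying .replace('\n', ' ').replace('\r', '') before committing the data to the SQLite database. It seems the solution is not to enforce a language-specific encoding (ISO-8859-1 for Norwegian), but to use UTF-8 instead of the default ASCII.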

I ran this in my IPython notebook (Python 2.7.10) and got a table like the following (fixed-width spacing added here for appearance):

| Path                    | Language | Date       | Shortest Sentence                                                                            | Longest Sentence                                                                                                                                                                                                                                         | Words | Readability Consensus |
|-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
| data/Eng/Something1.txt | Eng      | 2015-09-17 | I am able to relocate to London on short notice.                                             | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant.       | 306   | 11th and 12th grade   |
| data/Nor/NoeNorrønt.txt | Nor      | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe.                                     | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205   | 18th and 19th grade   |
| data/Nor/Ørret.txt.txt  | Nor      | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160   | 18th and 19th grade   |

So the Markdown table has no encoding problems.
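The script above relies on Python 2: `reload(sys)` and `sys.setdefaultencoding` are unavailable in Python 3, where `str` is already Unicode. A minimal Python 3 sketch of the same two steps follows. It assumes a small in-memory DataFrame with hypothetical sample values in place of the `Applications.db` query, and uses `table.md` as an example file name. The line breaks are flattened with pandas string methods instead of before the database insert, and `to_csv` is given `encoding="utf-8"` explicitly.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from SQLite.
df = pd.DataFrame({
    "Path": ["data/Nor/Høylandet.txt"],
    "Shortest Sentence": ["Som skuespiller\nhar jeg både..."],
})

# Flatten line breaks so every record stays on a single table row.
for col in ["Shortest Sentence"]:
    df[col] = (df[col].str.replace("\r", "", regex=False)
                      .str.replace("\n", " ", regex=False))

# Same separator-row trick as above, written out as UTF-8.
sep_row = pd.DataFrame([["---"] * len(df.columns)], columns=df.columns)
pd.concat([sep_row, df]).to_csv("table.md", sep="|", index=False,
                                encoding="utf-8")
```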
