Python Pandas: highlight matching text and row

Ste*_*DEU 7 python pandas

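I'm trying to change the font color to red for any string in df1 that matches a value in df3, and to highlight that row. I couldn't find any information about changing the font color. The datasets are: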

df1 = ['i like to shop at store a.', 'he likes to shop at the store b.', 'she is happy to shop at store c.', 'we want to shop at the store d.']
df2 = ['store a', 'store b', 'store c', 'store d']
df3 = ['like to', 'likes to shop', 'at store']

I'm using the following:

import pandas as pd

myDataSet = list(zip(df1, df2))
df = pd.DataFrame(data=myDataSet, columns=['df1', 'df2'])

The output should look like this:

[Image: expected output]

Please help!


Mat*_*t07 5

As @Ywapom suggested, this can be done in a Jupyter Notebook with HTML formatting. Please check his answer too.

import re
from IPython.display import HTML, display

def display_highlighted_words(df, keywords):
    # Table header: one <th> per column
    head = """
    <table>
        <thead>
            <tr>""" + \
            "".join(["<th> %s </th>" % c for c in df.columns]) \
            + """</tr>
        </thead>
    <tbody>"""

    for i, r in df.iterrows():
        row = "<tr>"
        for c in df.columns:
            cell = str(r[c])
            matches = []
            for k in keywords:
                # re.escape: match each keyword literally, not as a regex
                for match in re.finditer(re.escape(k), cell):
                    matches.append(match)

            # Reverse sorting by start position: splicing from right to left
            # keeps the offsets of the remaining matches valid.
            # (Overlapping matches, e.g. one keyword inside another, are not handled.)
            matches = sorted(matches, key=lambda x: x.start(), reverse=True)

            # Building the HTML cell
            for match in matches:
                cell = cell[:match.start()] + \
                    "<span style='color:red;'>%s</span>" % cell[match.start():match.end()] + \
                    cell[match.end():]
            row += "<td> %s </td>" % cell

        row += "</tr>"  # close the row once, after all of its cells
        head += row

    head += "</tbody></table>"
    display(HTML(head))

Then, with a sample DataFrame like this:

import pandas as pd

df = pd.DataFrame([["Franco color Franco", 1],
                   ["Franco Franco Ciccio Franco", 2],
                   ["Ciccio span", 3]], columns=["A", "B"])
display_highlighted_words(df, ["Franco", "Ciccio"])

the result is as follows:

[Image: example result]

The code above can easily be extended to take the keyword vector from a column of the dataset, as the original question asks.
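A rough sketch of that extension for the question's own data, assuming plain Python lists shaped like df1 and df3. The helper name `highlight_table` is hypothetical (not a pandas or IPython API). Besides coloring matched text red, it gives each row with at least one match a yellow background, as the question asks. One deliberate swap compared with the answer above: instead of one `re.finditer` per keyword, it uses a single regex alternation (longest keyword first), so matches can never overlap.

```python
import html
import re

# Data from the question
df1 = ['i like to shop at store a.', 'he likes to shop at the store b.',
       'she is happy to shop at store c.', 'we want to shop at the store d.']
df3 = ['like to', 'likes to shop', 'at store']


def highlight_table(sentences, keywords, color="red", row_bg="yellow"):
    """Hypothetical helper: return an HTML table (one sentence per row) where
    each keyword match is wrapped in a colored <span>, and each row with at
    least one match gets a background color."""
    keywords = [k for k in keywords if k]  # an empty keyword would match everywhere
    pattern = None
    if keywords:
        # Longest first, so the longer phrase wins when two start at the same
        # position; a single alternation never yields overlapping matches.
        ordered = sorted(keywords, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in ordered))

    rows = []
    for text in sentences:
        parts, last, hit = [], 0, False
        for m in (pattern.finditer(text) if pattern else ()):
            hit = True
            parts.append(html.escape(text[last:m.start()]))  # text before the match
            parts.append("<span style='color:%s;'>%s</span>"
                         % (color, html.escape(m.group())))
            last = m.end()
        parts.append(html.escape(text[last:]))  # text after the last match
        style = " style='background-color:%s;'" % row_bg if hit else ""
        rows.append("<tr%s><td>%s</td></tr>" % (style, "".join(parts)))
    return "<table><tbody>" + "".join(rows) + "</tbody></table>"


html_table = highlight_table(df1, df3)
print(html_table)
```

In a notebook, `display(HTML(highlight_table(df['df1'], df3)))` should render it from the question's DataFrame, since iterating over a Series yields its string values.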


Tur*_*uro 4

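You can conditionally format the rows or cells that contain this text, for example as shown below. I don't think you'll be able to color only part of the text red (well, unless you want to do a major hack that re-parses the whole HTML, and I'm not even sure that's possible). See the Styler documentation.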

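import pandas as pd

df1 = ['i like to shop at store a.', 'he likes to shop at the store b.', 'she is happy to shop at store c.', 'we want to shop at the store d.']
df2 = ['store a', 'store b', 'store c', 'store d']
df3 = ['like to', 'likes to shop', 'at store']

myDataSet = list(zip(df1, df2))
df = pd.DataFrame(data=myDataSet, columns=['df1', 'df2'])

def in_statements(val):
    # Yellow background if the cell contains any of the df3 statements;
    # an empty string leaves non-matching cells unstyled (a black background
    # would hide the default black text)
    if any(statement in str(val) for statement in df3):
        return 'background-color: yellow'
    return ''

# Keep the Styler in its own variable so `df` stays a DataFrame.
# Styler.map needs pandas >= 2.1; on older versions use df.style.applymap.
styled = df.style.map(in_statements)

styled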
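The styling above colors individual cells, while the question asks for the whole row to be highlighted. A minimal sketch of a row-level variant, assuming the same df3 list (redefined so the block runs on its own); `highlight_row` is a hypothetical name. It follows the contract of `Styler.apply` with `axis=1`: the function receives one row and returns one CSS string per cell.

```python
# Statements from the question (redefined so this block runs on its own)
df3 = ['like to', 'likes to shop', 'at store']


def highlight_row(row):
    """Return one CSS string per cell: every cell gets a yellow background if
    any cell in the row contains any df3 statement, otherwise no style."""
    hit = any(statement in str(value) for value in row for statement in df3)
    return ['background-color: yellow' if hit else ''] * len(row)


# Plain tuples stand in for DataFrame rows here; a pandas Series supports
# the same iteration over its values and len()
print(highlight_row(('i like to shop at store a.', 'store a')))
# ['background-color: yellow', 'background-color: yellow']
print(highlight_row(('we want to shop at the store d.', 'store d')))
# ['', '']
```

Applied to the DataFrame, `df.style.apply(highlight_row, axis=1)` should then highlight whole rows rather than single cells.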

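Why deal with hairy styling issues at all? :) Wouldn't it be better to add an extra column that extracts the text you're interested in (blank if none is present)?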

Edit: as requested, here is a way to reach the goal without the styling limitations, by adding an extra column:

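# Start from a plain DataFrame (not a Styler)
df = pd.DataFrame({'df1': df1, 'df2': df2})

def check(text):
    # Comma-separated list of the df3 statements contained in this sentence
    # (an empty string when none match)
    return ", ".join(x for x in df3 if x in text)

df["Statements"] = df["df1"].apply(check)
df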
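A vectorized alternative for the extra column, sketched with pandas' `Series.str.findall` and `Series.str.join` and assuming the same df1/df2/df3 lists. Unlike the `in` check, it lists matches in the order they appear in the sentence, repeats a statement that occurs twice, and, because of the longest-first alternation, reports only one of two statements that overlap in the text.

```python
import re

import pandas as pd

# Data from the question
df1 = ['i like to shop at store a.', 'he likes to shop at the store b.',
       'she is happy to shop at store c.', 'we want to shop at the store d.']
df2 = ['store a', 'store b', 'store c', 'store d']
df3 = ['like to', 'likes to shop', 'at store']

df = pd.DataFrame({'df1': df1, 'df2': df2})

# One regex alternation of the literal statements, longest first, so a longer
# phrase is preferred over a shorter one starting at the same position
pattern = "|".join(re.escape(s) for s in sorted(df3, key=len, reverse=True))

# str.findall gives a list of matches per row; str.join turns each list into
# one comma-separated string (an empty list becomes "", i.e. blank)
df['Statements'] = df['df1'].str.findall(pattern).str.join(', ')
print(df)
```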