Pandas DataFrame: count the unique words in one column and return the count in another column

Question

Sam*_*mie 2 python text dataframe pandas

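I have a dataframe with the following columns:

df['Album'] (the album names of artist X)
df['Tracks'] (the tracks on artist X's albums)
df['Lyrics'] (the lyrics of each track)

I am trying to count the words in df['Lyrics'] and return the result as a new column named df['wordcount'], and also to count the unique words in df['Lyrics'] and return that as a new column named df['uniquewordcount'].

I have been able to get df['wordcount'] by counting everything in each df['Lyrics'] string except the whitespace.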

totalscore = df.Lyrics.str.count('[^\s]')  # count every word in a track
df['wordcount'] = totalscore
df
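A note on the pattern: `[^\s]` matches one non-whitespace character at a time, so `str.count` with it returns a character count, while `\S+` matches a whole run of non-whitespace characters, i.e. one word. The sketch below compares the two on the example lyrics from this question; the names `char_count` and `word_count` are introduced here only for illustration.

```python
import pandas as pd

lyrics = pd.Series(["Ball is life and Ball is key",
                    "Pass me the hookah Pass me the"])

# One match per non-whitespace character
char_count = lyrics.str.count(r"[^\s]")

# One match per run of non-whitespace characters (one word)
word_count = lyrics.str.count(r"\S+")

print(pd.DataFrame({"chars": char_count, "words": word_count}))
```

With `\S+`, both example tracks count as 7 words, which matches the wordcount column in the expected output.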

I have also been able to count the unique words in df['Lyrics']:

import collections
from collections import Counter

results = Counter()
count_unique = df.Lyrics.str.lower().str.split().apply(results.update)
unique_counts = sum((results).values())
df['uniquewordcount'] = unique_counts

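This gives me the number of unique words across all of df['Lyrics'], which is what the code is written to do, but I want the unique words in each track's lyrics. My Python isn't very good at the moment, so the solution may be obvious to everyone else but not to me. I'm hoping someone can point me in the right direction on how to get the unique word count for each track.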

Expected output:

Album    Tracks    Lyrics                      wordcount  uniquewordcount
 A         Ball   Ball is life and Ball is key       7           5
           Pass   Pass me the hookah Pass me the     7           4

What I get:

Album    Tracks    Lyrics                    wordcount  uniquewordcount
  A     Ball   Ball is life and Ball is key       7           9
        Pass   Pass me the hookah Pass me the     7           9
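One likely reading of this result: `results` is a single Counter shared by every row, so it accumulates the words of the whole column, and assigning one scalar to a column broadcasts it to all rows. The sketch below reproduces that on the two example tracks. Note that `len(results)` is the number of distinct words in the column, whereas `sum(results.values())` is the total number of words; `distinct_words` and `total_words` are names introduced here for illustration.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "Tracks": ["Ball", "Pass"],
    "Lyrics": ["Ball is life and Ball is key",
               "Pass me the hookah Pass me the"],
})

# A single Counter updated by every row: it accumulates across the column
results = Counter()
df["Lyrics"].str.lower().str.split().apply(results.update)

distinct_words = len(results)         # distinct words in the whole column
total_words = sum(results.values())   # all words in the whole column

# Assigning a scalar repeats the same value on every row
df["uniquewordcount"] = distinct_words
print(df[["Tracks", "uniquewordcount"]])
```

Both rows end up with 9, the column-wide figure, rather than a per-track count.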

Answer 1

Ant*_*vBR 6

Here is an alternative solution:

import pandas as pd

df = pd.DataFrame({'Lyrics': ['This is some life some collection of words',
                              'Lyrics abound lyrics here there eveywhere',
                              'Come fly come fly away']})

# Lowercase each lyric and split it into a list of words
lyrics = df['Lyrics'].str.lower().str.split()

# Number of unique words per row
df['LyricsCounter'] = lyrics.apply(set).apply(len)

# Total number of words per row
df['LyricsWords'] = lyrics.apply(len)

print(df)

Output:

                                       Lyrics  LyricsCounter  LyricsWords
0  This is some life some collection of words              7            8
1   Lyrics abound lyrics here there eveywhere              5            6
2                      Come fly come fly away              3            5
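For reference, the same approach can be applied to the column names used in the question. This is a minimal sketch, assuming the two example tracks from the question as data and assuming both belong to album "A" (the second row's album is blank in the question's table).

```python
import pandas as pd

df = pd.DataFrame({
    "Album": ["A", "A"],  # assumption: both tracks are on album A
    "Tracks": ["Ball", "Pass"],
    "Lyrics": ["Ball is life and Ball is key",
               "Pass me the hookah Pass me the"],
})

# Lowercase so that "Ball" and "ball" count as the same word, then split
words = df["Lyrics"].str.lower().str.split()

# Per-row totals: length of each word list
df["wordcount"] = words.str.len()

# Per-row unique counts: size of the set built from each word list
df["uniquewordcount"] = words.apply(lambda w: len(set(w)))

print(df[["Tracks", "wordcount", "uniquewordcount"]])
```

This gives 7 and 5 for "Ball" and 7 and 4 for "Pass", as in the expected output. Rows with missing lyrics would need handling first (for example with `fillna('')`), since `set()` cannot be built from a NaN value.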

Archived:	7 years, 11 months ago
Views:	2619
Last recorded:	7 years, 11 months ago