Pandas Dataframe: count the unique words in a column and return the count in another column

Sam*_*mie 2 python text dataframe pandas

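I have a dataframe with the following columns:

  1. df['Album'] (contains the names of artist X's albums)
  2. df['Tracks'] (contains the tracks on artist X's albums)
  3. df['Lyrics'] (contains the lyrics of each track)

I am trying to count the words in df['Lyrics'] and return the result in a new column called df['wordcount'], and also to count the unique words in df['Lyrics'] and return the result in a new column called df['uniquewordcount'].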

I have been able to get df['wordcount'] by counting the runs of non-whitespace characters in each string of df['Lyrics']:

totalscore = df.Lyrics.str.count(r'\S+')  # count every word in a track
df['wordcount'] = totalscore
df

I have also been able to count the unique words in df['Lyrics']:

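from collections import Counter

results = Counter()
# Every track's lowercased words are fed into one shared Counter
count_unique = df.Lyrics.str.lower().str.split().apply(results.update)
# Number of distinct words across all tracks combined
unique_counts = len(results)
# A single scalar is assigned, so every row receives the same value
df['uniquewordcount'] = unique_counts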

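This gives me the number of unique words across all of df['Lyrics'], which is what the code is written to do, but I want the unique words in each track's lyrics. My Python isn't great at the moment, so the solution may be obvious to everyone else but not to me. I'm hoping someone can point me in the right direction on how to get the unique word count for each track.

Expected output: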

Album    Tracks    Lyrics                      wordcount  uniquewordcount
 A         Ball   Ball is life and Ball is key       7           5
           Pass   Pass me the hookah Pass me the     7           4

What I get:

Album    Tracks    Lyrics                    wordcount  uniquewordcount
  A     Ball   Ball is life and Ball is key       7           9
        Pass   Pass me the hookah Pass me the     7           9

Ant*_*vBR 6

Here is an alternative solution:

import pandas as pd

df = pd.DataFrame({'Lyrics': ['This is some life some collection of words',
                              'Lyrics abound lyrics here there eveywhere',
                              'Come fly come fly away']})

# Split list into new series
lyrics = df['Lyrics'].str.lower().str.split()

# Get amount of unique words
df['LyricsCounter'] = lyrics.apply(set).apply(len)

# Get amount of words
df['LyricsWords'] = lyrics.apply(len)

print(df)

Output:

                                       Lyrics  LyricsCounter  LyricsWords
0  This is some life some collection of words              7            8
1   Lyrics abound lyrics here there eveywhere              5            6
2                      Come fly come fly away              3            5
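
To tie this back to the question's own column names, here is a minimal sketch of the same idea. It assumes the blank Album cell in the second row of the question's table means the same album 'A'. The two sample rows are copied from the expected output, and the `words` variable name is only illustrative. The key difference from the Counter attempt is that the set is built per row, so each track gets its own count instead of one number shared by every row.

```python
import pandas as pd

# Sample rows copied from the question's expected output
# (assumption: the blank Album cell in row 2 means album 'A')
df = pd.DataFrame({
    'Album': ['A', 'A'],
    'Tracks': ['Ball', 'Pass'],
    'Lyrics': ['Ball is life and Ball is key',
               'Pass me the hookah Pass me the'],
})

# Lowercase and split each track's lyrics on whitespace -> one list per row
words = df['Lyrics'].str.lower().str.split()

# Total words per track: length of each row's list
df['wordcount'] = words.str.len()

# Unique words per track: size of the set built from each row's own list
df['uniquewordcount'] = words.apply(lambda w: len(set(w)))

print(df[['Tracks', 'wordcount', 'uniquewordcount']])
```

On these two rows this should give wordcount 7 and 7, and uniquewordcount 5 and 4, matching the expected output in the question. Note that splitting on whitespace leaves punctuation attached to words, so real lyrics may need extra cleaning first.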