How do I get the first word of each string in a DataFrame using Python?

Question

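I have a Pandas DataFrame named "data" with 2 columns and 50 rows, each row filled with one or two lines of text, imported from a .tsv file. Besides strings, some of the questions may contain integers and floats. I am trying to extract the first word of each sentence (in both columns), but I keep getting this error: AttributeError: 'DataFrame' object has no attribute 'str'.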

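At first I thought the error was because I was using "data.str.split" incorrectly, but every change I could find by googling failed. Then I thought the file might not consist entirely of strings, so I tried "data.astype(str)" on it, but the same error persists. Any suggestions? Thanks a lot!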

Here is my code:

import pandas as pd
questions = "questions.tsv"
data = pd.read_csv(questions, usecols = [3], nrows = 50, header=1, sep="\t")
data = data.astype(str)
first_words = data.str.split(None, 1)[0]

Answer 1

jez*_*ael 5

Use:

first_words = data.apply(lambda x: x.str.split().str[0])

Or:

first_words = data.applymap(lambda x: x.split()[0])
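
For context, the AttributeError in the question comes from the fact that the .str accessor is defined on a Series (a single column), not on a whole DataFrame, which is why both snippets above work column by column or cell by cell. If only one column is of interest, a minimal sketch is to select it first. The column name 'a' and the values below are illustrative assumptions, not the asker's data:

```python
import pandas as pd

# Illustrative data; the column name 'a' is an assumption for this sketch.
data = pd.DataFrame({'a': ['aa ss ss', 'ee rre', 1, 'r']})

# Selecting one column yields a Series, which does have the .str accessor.
column = data['a'].astype(str)

# Split on whitespace, then take the first piece of each split result.
first_words = column.str.split().str[0]
print(first_words.tolist())
```

With positional access, data.iloc[:, 0] would select the first column in the same way when the column name is not known in advance.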

Sample:

data = pd.DataFrame({'a':['aa ss ss','ee rre', 1, 'r'],
                   'b':[4,'rrt ee', 'ee www ee', 6]})
print (data)
          a          b
0  aa ss ss          4
1    ee rre     rrt ee
2         1  ee www ee
3         r          6

data = data.astype(str)
first_words = data.apply(lambda x: x.str.split().str[0])
print (first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6

first_words = data.applymap(lambda x: x.split()[0])
print (first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6
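
Two caveats may be worth keeping in mind with the cell-by-cell variant. First, x.split()[0] raises an IndexError on an empty or whitespace-only cell, because split() then returns an empty list. Second, DataFrame.applymap is deprecated as of pandas 2.1 in favour of DataFrame.map. The following is only a sketch under those assumptions; the helper first_word and the sample values are hypothetical and not part of the original answer:

```python
import pandas as pd

def first_word(value):
    """Return the first whitespace-separated token of value, or '' if none."""
    parts = str(value).split()  # str() also covers ints and floats
    return parts[0] if parts else ""

# Illustrative data including an empty and a whitespace-only cell.
data = pd.DataFrame({'a': ['aa ss ss', '', 1],
                     'b': [4.5, 'rrt ee', '   ']})

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
if hasattr(pd.DataFrame, "map"):
    first_words = data.map(first_word)
else:
    first_words = data.applymap(first_word)

print(first_words)
```

Because the helper calls str() itself, a separate data.astype(str) step is not needed here. Note that missing values (NaN) would come out as the text 'nan' under this approach, so they may need to be handled beforehand.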
