Convert a string to a DataFrame, separated by colon

Rog*_*ger 0 python pandas

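I have a string that comes from an article with a few hundred sentences. I want to convert the string to a DataFrame, with each sentence as a row. For example,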

data = 'This is a book, to which I found exciting. I bought it for my cousin. He likes it.'

I would like it to become:

This is a book, to which I found exciting.
I bought it for my cousin.
He likes it.

As a Python newbie, this is what I tried:

from io import StringIO

import pandas as pd

data_csv = StringIO(data)
data_df = pd.read_csv(data_csv, sep=".")

With the code above, all the sentences become column names. I actually want them as rows in a single column.

Dee*_*ace 5

Don't use read_csv. Just split on '.' and use the standard pd.DataFrame:

import pandas as pd

data = 'This is a book, to which I found exciting. I bought it for my cousin. He likes it.'
# Split on '.', drop empty pieces, and strip the space left after each period
data_df = pd.DataFrame([sentence.strip() for sentence in data.split('.') if sentence.strip()],
                       columns=['sentences'])
print(data_df)

#                                     sentences
#  0  This is a book, to which I found exciting
#  1                  I bought it for my cousin
#  2                                He likes it

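Keep in mind that this will break if any of the sentences contain floating-point numbers, because the decimal point is also a '.'. In that case you will need to change the format of the string (for example, use '\n' instead of '.' to separate the sentences).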
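
If changing the input format is not an option, one possible workaround is to split only where sentence-ending punctuation is followed by whitespace, so a decimal such as 3.5 is left alone. The following is a minimal sketch using a regular expression rather than str.split; the helper name split_sentences and the sample text are made up for illustration, and unlike the code above it keeps the trailing period on each sentence.

```python
import re

import pandas as pd


def split_sentences(text):
    # Hypothetical helper: split at whitespace that directly follows '.', '!' or '?'.
    # A '.' inside a number (e.g. 3.5) is followed by a digit, so it is not a split point.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty pieces (e.g. when the input string is empty)
    return [part for part in parts if part]


# Sample text (an assumption for this sketch) containing a floating-point number
data = 'The book costs 3.5 dollars. I bought it for my cousin. He likes it.'
data_df = pd.DataFrame(split_sentences(data), columns=['sentences'])
print(data_df)
```

This is still only a heuristic: abbreviations followed by a space (such as "e.g. this") would be split as well, so real articles may need a proper sentence tokenizer.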