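I have a string taken from an article with several hundred sentences. I want to convert the string into a DataFrame, with each sentence as a row. For example,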
data = 'This is a book, to which I found exciting. I bought it for my cousin. He likes it.'
I want it to become:
This is a book, to which I found exciting.
I bought it for my cousin.
He likes it.
As a Python newbie, this is what I tried:
import pandas as pd
from io import StringIO

data_csv = StringIO(data)
data_df = pd.read_csv(data_csv, sep=".")
With the code above, all the sentences become column names. I actually want them as rows in a single column.
Don't use read_csv. Just split on '.' and use a standard pd.DataFrame:
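import pandas as pd

data = 'This is a book, to which I found exciting. I bought it for my cousin. He likes it.'
# Split on '.', strip the space left at the start of each piece, and drop empty pieces
data_df = pd.DataFrame([sentence.strip() for sentence in data.split('.') if sentence.strip()],
                       columns=['sentences'])
print(data_df)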
# sentences
# 0 This is a book, to which I found exciting
# 1 I bought it for my cousin
# 2 He likes it
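Keep in mind that this will break if any sentence contains a floating-point number such as 3.14, and it won't split sentences that end in '?' or '!'. In that case, you'll need to change the format of the string (for example, separate sentences with '\n' instead of '.').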
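If you can't change the input format, a regular-expression split is one way around the decimal problem. The sketch below assumes that sentences are separated by whitespace following '.', '!' or '?'; it uses a lookbehind so each sentence keeps its final punctuation (as in the desired output in the question), and a decimal like 3.14 stays intact because no whitespace follows its '.'. Abbreviations such as "Mr. Smith" or "e.g. this" will still be split incorrectly under this assumption.

```python
import re

import pandas as pd

data = 'This is a book, to which I found exciting. I bought it for my cousin. He likes it.'

# Split only at whitespace that directly follows '.', '!' or '?'.
# The lookbehind (?<=...) matches the punctuation without consuming it,
# so every sentence keeps its terminal mark; '3.14' is not split because
# the '.' inside it is followed by a digit, not whitespace.
sentences = re.split(r'(?<=[.!?])\s+', data.strip())

data_df = pd.DataFrame({'sentences': sentences})
print(data_df)
```

Because the pattern splits on the whitespace rather than on the '.', no empty trailing piece is produced, so no filtering step is needed.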