Make sure every line ends with a punctuation mark

use*_*415 0 python unix nltk

I pulled a text corpus from nltk and now want to process it so that every line in the file ends with a punctuation mark.

Her mother
had died too long ago for her to
remember her caresses; and her place had been supplied
by an excellent woman as governess, who had fallen little short
of a mother in affection.

should become:

Her mother had died too long ago for her to remember her caresses; 
and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection.

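I tried matching lines that have no punctuation at the end, but I can't figure out how to pull the next line up onto the current one. Any help is much appreciated!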

fed*_*qui 5

What if you combine paste and sed like this?

paste -s joins all the lines of the file into a single line, here separated by spaces:

$ paste -s -d' ' file
Her mother had died too long ago for her to remember her caresses; and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection.

sed then replaces the space after each . or ; with a newline:

$ paste -s -d' ' file | sed -r 's/(\.|\;) /\1\n/g'
Her mother had died too long ago for her to remember her caresses;
and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection.
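Since the question is tagged python, the same two steps (join everything, then break after punctuation) could be sketched in Python roughly as follows. The function name rejoin_lines and its enders parameter are made up for this illustration, and the sample text is the one from the question. This is only a rough equivalent of the pipeline above, not a drop-in replacement: it collapses runs of whitespace, and like the sed version it would also break after abbreviations such as "Mr. ".

```python
import re


def rejoin_lines(text, enders=".;"):
    """Join all lines into one, then start a new line after each mark in `enders`.

    `rejoin_lines` and `enders` are hypothetical names used for this sketch.
    """
    # Rough equivalent of `paste -s -d' '`: put everything on one line.
    # Unlike paste, split() also collapses repeated whitespace.
    joined = " ".join(text.split())
    # Rough equivalent of the sed step: replace the space that follows
    # one of the ending marks with a newline.
    pattern = "([" + re.escape(enders) + "]) "
    return re.sub(pattern, r"\1\n", joined)


sample = """Her mother
had died too long ago for her to
remember her caresses; and her place had been supplied
by an excellent woman as governess, who had fallen little short
of a mother in affection."""

print(rejoin_lines(sample))
```

To treat other marks as line endings as well, pass them in enders, for example rejoin_lines(sample, enders=".;!?").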