Python 如何将带连字符的单词与换行符合并？

Question

Python 如何将带连字符的单词与换行符合并？

I want to say that Napp Granade
serves in the spirit of a town in our dis-
trict of Georgia called Andersonville.

Run Code Online (Sandbox Code Playgroud)

我有数千个包含上述数据的文本文件，并且单词已使用连字符和换行符进行包装。

我想要做的是删除连字符并将换行符放在单词的末尾。如果可能的话，我不想删除所有带连字符的单词，只删除那些位于行尾的单词。

            with open(filename, encoding="utf8") as f:
              file_str = f.read()


            re.sub("\s*-\s*", "", file_str)

            with open(filename, "w", encoding="utf8") as f:
              f.write(file_str)

Run Code Online (Sandbox Code Playgroud)

上面的代码不起作用，我尝试了几种不同的方法。

我想浏览整个文本文件并删除所有表示换行符的连字符。如：

I want to say that Napp Granade
serves in the spirit of a town in our district
of Georgia called Andersonville.

Run Code Online (Sandbox Code Playgroud)

任何帮助，将不胜感激。

Answer 1

Thi*_*lle 5

您不需要使用正则表达式：

filename = 'test.txt'

# I want to say that Napp Granade
# serves in the spirit of a town in our dis-
# trict of Georgia called Anderson-
# ville.

with open(filename, encoding="utf8") as f:
    lines = [line.strip('\n') for line in f]
    for num, line in enumerate(lines):
        if line.endswith('-'):
            # the end of the word is at the start of next line
            end = lines[num+1].split()[0]
            # we remove the - and append the end of the word
            lines[num] = line[:-1] + end
            # and remove the end of the word and possibly the 
            # following space from the next line
            lines[num+1] = lines[num+1][len(end)+1:]

    text = '\n'.join(lines)

with open(filename, "w", encoding="utf8") as f:
    f.write(text)


# I want to say that Napp Granade
# serves in the spirit of a town in our district
# of Georgia called Andersonville.

Run Code Online (Sandbox Code Playgroud)

但是你当然可以，而且它更短：

with open(filename, encoding="utf8") as f:
    text = f.read()

text = re.sub(r'-\n(\w+ *)', r'\1\n', text)

with open(filename, "w", encoding="utf8") as f:
        f.write(text)

Run Code Online (Sandbox Code Playgroud)

我们寻找 a-后跟\n，并捕获下面的单词，这是拆分单词的结尾。
我们用捕获的单词后跟换行符替换所有这些。

不要忘记使用原始字符串进行替换，以便\1正确解释。

归档时间：	8 年，9 月前
查看次数：	963 次
最近记录：	8 年，9 月前