Git merge within a line

ake*_*het 51 git version-control latex

Preamble

I'm using git as the version control system for a paper my lab is writing in LaTeX. Several people are collaborating on it.

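I'm running into git being stubborn about how it merges. Suppose two people each change a single word on the same line and then try to merge. Although git diff --word-diff seems able to show the differences between branches word by word, git merge does not seem able to perform the merge word by word and asks for a manual merge instead.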
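
A minimal sketch that reproduces the problem in a throwaway repository. The file name `paper.tex`, the branch name `alice`, the identity settings and the sentence itself are invented for illustration, and it assumes `git` is on the PATH.

```shell
#!/bin/bash
# Throwaway repository; every name below is made up for the demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "The quick brown fox jumps over the lazy dog." > paper.tex
git add paper.tex
git commit -q -m "base"
base=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

# First collaborator changes one word.
git checkout -q -b alice
echo "The fast brown fox jumps over the lazy dog." > paper.tex
git commit -q -am "alice: quick -> fast"

# Second collaborator changes a different word on the same line.
git checkout -q "$base"
echo "The quick brown fox jumps over the sleepy dog." > paper.tex
git commit -q -am "bob: lazy -> sleepy"

# Different words, same line: the line-based merge cannot combine them.
if git merge --no-edit alice >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"
fi
echo "merge result: $merge_result"
```

After the failed merge, `paper.tex` contains the usual conflict markers around both versions of the line.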

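This is particularly annoying with LaTeX documents, because a common habit when writing LaTeX is to put a whole paragraph on one line and let the text editor handle word wrapping for display. As a workaround we are now adding a newline after every sentence, so that git can at least merge changes to different sentences within a paragraph. It still gets confused by multiple changes within one sentence, though, and of course the text no longer wraps nicely.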

Question

Is there a way to have git merge two files "word by word" rather than "line by line"?

ach*_*000 14

Here is a solution in the same vein as sehe's, with a few changes that hopefully address your comments:

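  • This solution merges by sentence rather than by word, just as you were doing by hand, only now the user sees one line per paragraph while git sees each paragraph broken into sentences. This seems more logical: adding or removing a sentence is likely to be compatible with other changes in the same paragraph, whereas a manual merge is probably preferable when two commits edit the same sentence. It also has the benefit that the "clean" snapshots are still somewhat human readable (and LaTeX-compilable!).
  • The filters are one-line commands, which makes them easier to port to collaborators.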

As in sehe's solution, create (or append to) a .gitattributes file.

    *.tex filter=sentencebreak

Now implement the clean and smudge filters:

    git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
    git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""

I created a test file with the following contents. Note the single-line paragraphs.

    \chapter{Tumbling Tumbleweeds. Intro}
    A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski.  At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. This Lebowski, he called himself the Dude. Now, Dude, that's a name no one would self-apply where I come from.  But then, there was a lot about the Dude that didn't make a whole lot of sense to me.  And a lot about where he lived, like- wise.  But then again, maybe that's why I found the place s'durned innarestin'.

    This line has two sentences. But it also ends with a comment. % here

After committing it to the local repo, we can look at the raw contents.

    $ git show HEAD:test.tex

    \chapter{Tumbling Tumbleweeds. Intro}
    A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski. %NL%
     At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. %NL%
    This Lebowski, he called himself the Dude. %NL%
    Now, Dude, that's a name no one would self-apply where I come from. %NL%
     But then, there was a lot about the Dude that didn't make a whole lot of sense to me. %NL%
     And a lot about where he lived, like- wise. %NL%
     But then again, maybe that's why I found the place s'durned innarestin'.

    This line has two sentences. But it also ends with a comment. % here

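So the rule of the clean filter is: wherever it finds text ending in ., ?, ! or '' (the LaTeX way of closing double quotes) followed by a space, it adds %NL% and a newline character. It ignores lines that start with \ (LaTeX commands) and lines that contain a comment anywhere, so that a comment can never become part of the main text.

The smudge filter removes %NL% and the newline that follows it.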

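Diffs and merges are performed on the "clean" files, so changes to a paragraph are merged sentence by sentence. This is the desired behavior.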

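The nice thing is that the LaTeX file should compile in either the clean or the smudged state, so there is some hope that collaborators won't need to do anything. Finally, you can put the git config commands in a shell script that is part of the repo, so that a collaborator only has to run it in the root of the repo to get configured.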

    #!/bin/bash

    git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
    git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""

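    # Smudge the .tex files that are already checked out, one by one.
    # NUL-delimited so that file names containing spaces survive.
    find . -iname "*.tex" -print0 | while IFS= read -r -d '' texFile; do
        perl -pe "s/%NL%\n//gm" < "$texFile" > temp && mv temp "$texFile"
    done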

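That last bit is a hack: when the script is first run, the branch is already checked out (in its clean form) and does not get smudged automatically.

You can add this script and the .gitattributes file to the repo; new users then only need to clone and run the script in the root of the repo.

I think this script would even work with git on Windows, if run in git bash.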

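Drawbacks:

  • It does not handle lines with comments cleverly; it simply ignores them.
  • %NL% is a bit ugly.
  • The filter might mess up some equations (not sure about this).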

  • “Edits must be at least 6 characters”: OK, so the typo in the file name stays. That really is a very st^H^H ... unfortunate rule on stackoverflow. (2 upvotes)

seh*_*ehe 8

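You could try this:

Rather than swapping out the merge engine, you can do some kind of 'normalization' (canonicalization, if you will). I don't speak LaTeX, but let me illustrate as follows:

Say your input, test.raw, looks like this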

curve ball well received {misfit} whatever
proprietary format extinction {benefit}.

You want it to be diffed and merged word by word. Add the following .gitattributes file

*.raw     filter=wordbyword

Then

git config --global filter.wordbyword.clean /home/username/bin/wordbyword.clean
git config --global filter.wordbyword.smudge /home/username/bin/wordbyword.smudge

A minimalist implementation of the filters would be

/home/username/bin/wordbyword.clean

#!/usr/bin/perl
use strict;
use warnings;

while (<>)
{
    print "$_\n" foreach (m/(.*?\s+)/go);
    print '#@#DELIM#@#' . "\n";
}

/home/username/bin/wordbyword.smudge

#!/usr/bin/perl
use strict;
use warnings;

while (<>)
{
    chomp; '#@#DELIM#@#' eq $_ and print "\n" or print;
}

After committing the file, inspect the raw contents of the committed blob with `git show HEAD:test.raw`:

curve 
ball 
well 
received 
{misfit} 
whatever

#@#DELIM#@#
proprietary 
format 
extinction 
{benefit}.

#@#DELIM#@#

Change the contents of test.raw to

curve ball welled repreived {misfit} whatever
proprietary extinction format {benefit}.

git diff --patch-with-stat should then output what you probably want:

 test.raw |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/test.raw b/test.raw
index b0b0b88..ed8c393 100644
--- a/test.raw
+++ b/test.raw
@@ -1,14 +1,14 @@
 curve 
 ball 
-well 
-received 
+welled 
+repreived 
 {misfit} 
 whatever

 #@#DELIM#@#
 proprietary 
-format 
 extinction 
+format 
 {benefit}.

 #@#DELIM#@#

You can see how this would magically carry over to merges, resulting in word-by-word diffing and merging. QED

(I hope you like my creative use of .gitattributes. If not, I enjoyed doing this little exercise anyway.)

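  • I do like the creative use of .gitattributes. It looks like this solution has some problematic side effects, though... 1) The file will be stored as one word per line in git's commit snapshots, right? So all collaborators need to have the scripts (and a working perl install, or whatever scripting engine we use), or else they will see what is basically gibberish in their working trees? 2) Commit diffs also become cumbersome, since they too are one word per line. (2 upvotes)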