Where is a 3-way Git merge driver for .PO (gettext) files?

Question

Mik*_*nen 15 git merge localization conflict gettext

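I already have the following in .gitattributes:

[attr]POFILE merge=merge-po-files

locale/*.po POFILE

I would like merging of branches to work correctly when the same localization file (e.g. locale/en.po) has been modified in parallel branches. I am currently using the following merge driver: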

#!/bin/bash
# git merge driver for .PO files (gettext localizations)
# Install:
# git config merge.merge-po-files.driver "./bin/merge-po-files %A %O %B"

LOCAL="${1}._LOCAL_"
BASE="${2}._BASE_"
REMOTE="${3}._REMOTE_"

# rename to bit more meaningful filenames to get better conflict results
cp "${1}" "$LOCAL"
cp "${2}" "$BASE"
cp "${3}" "$REMOTE"

# merge files and overwrite local file with the result
msgcat "$LOCAL" "$BASE" "$REMOTE" -o "${1}" || exit 1

# cleanup
rm -f "$LOCAL" "$BASE" "$REMOTE"

# check if merge has conflicts
fgrep -q '#-#-#-#-#' "${1}" && exit 1

# if we get here, merge is successful
exit 0


However, msgcat is too dumb for this: it does not perform a true three-way merge. For example, if I have

BASE version
```
msgid "foo"
msgstr "foo"
```
LOCAL version
```
msgid "foo"
msgstr "bar"
```
REMOTE version
```
msgid "foo"
msgstr "foo"
```

I end up with a conflict. However, a true three-way merge driver would output the correct merge:

msgid "foo"
msgstr "bar"


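Note that I cannot simply add --use-first to msgcat, because REMOTE may contain updated translations. In addition, if BASE, LOCAL and REMOTE are all different, I still want a conflict, because that really is a conflict.

What do I need to change to make this work? Bonus points for conflict markers less insane than '#-#-#-#-#', if possible.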
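
The behaviour asked for can be stated per entry: if only one side differs from BASE, take that side; if both sides ended up with the same text, take it; if both changed it differently, report a conflict. The following is only a minimal sketch of that rule for a single msgstr value in plain bash. The function name `three_way_pick` is hypothetical and is not part of git or gettext.

```shell
#!/bin/bash
# Hypothetical helper: three-way decision for ONE value.
# Prints the merged value and returns 0, or returns 1 on a real conflict.
three_way_pick() {
    local base="$1" local_v="$2" remote="$3"
    if [ "$local_v" = "$remote" ]; then
        printf '%s\n' "$local_v"     # both sides agree (unchanged, or changed identically)
    elif [ "$local_v" = "$base" ]; then
        printf '%s\n' "$remote"      # only REMOTE changed
    elif [ "$remote" = "$base" ]; then
        printf '%s\n' "$local_v"     # only LOCAL changed
    else
        return 1                     # both changed, and differently
    fi
}

# The case from the question: BASE=foo, LOCAL=bar, REMOTE=foo
three_way_pick "foo" "bar" "foo"
# All three differ: this should stay a conflict
three_way_pick "foo" "bar" "baz" || echo "conflict"
```

A real driver has to apply this rule to every msgid (including plural forms and the header), which is what makes the problem harder than this sketch suggests.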

Answer 1

Mik*_*nen 6

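Here is a somewhat complicated example driver that seems to output a correct merge, although the result may contain some translations that should have been deleted by the local or remote version.
Nothing should be lost, so in some cases this driver just adds some extra clutter.

This version uses gettext's native conflict markers, which look like #-#-#-#-# combined with the fuzzy flag, instead of normal Git conflict markers.
The driver is a bit ugly because it has to work around bugs (or features) in msgcat and msguniq: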

#!/bin/bash
# git merge driver for .PO files
# Copyright (c) Mikko Rantalainen <mikko.rantalainen@peda.net>, 2013
# License: MIT

ORIG_HASH=$(git hash-object "${1}")
WORKFILE=$(git ls-tree -r HEAD | fgrep "$ORIG_HASH" | cut -b54-)
echo "Using custom merge driver for $WORKFILE..."

LOCAL="${1}._LOCAL_"
BASE="${2}._BASE_"
REMOTE="${3}._REMOTE_"

LOCAL_ONELINE="$LOCAL""ONELINE_"
BASE_ONELINE="$BASE""ONELINE_"
REMOTE_ONELINE="$REMOTE""ONELINE_"

OUTPUT="$LOCAL""OUTPUT_"
MERGED="$LOCAL""MERGED_"
MERGED2="$LOCAL""MERGED2_"

TEMPLATE1="$LOCAL""TEMPLATE1_"
TEMPLATE2="$LOCAL""TEMPLATE2_"
FALLBACK_OBSOLETE="$LOCAL""FALLBACK_OBSOLETE_"

# standardize the input files for regexping
# default to UTF-8 in case charset is still the placeholder "CHARSET"
cat "${1}" | perl -npe 's!(^"Content-Type: text/plain; charset=)(CHARSET)(\\n"$)!$1UTF-8$3!' | msgcat --no-wrap --sort-output - > "$LOCAL"
cat "${2}" | perl -npe 's!(^"Content-Type: text/plain; charset=)(CHARSET)(\\n"$)!$1UTF-8$3!' | msgcat --no-wrap --sort-output - > "$BASE"
cat "${3}" | perl -npe 's!(^"Content-Type: text/plain; charset=)(CHARSET)(\\n"$)!$1UTF-8$3!' | msgcat --no-wrap --sort-output - > "$REMOTE"

# convert each definition to single line presentation
# extra fill is required to make sure that git separates each conflict 
perl -npe 'BEGIN {$/ = "\n\n"}; s/#\n$/\n/s; s/#/##/sg; s/\n/#n/sg; s/#n$/\n/sg; s/#n$/\n/sg; $_.="#fill#\n" x 4' "$LOCAL" > "$LOCAL_ONELINE"
perl -npe 'BEGIN {$/ = "\n\n"}; s/#\n$/\n/s; s/#/##/sg; s/\n/#n/sg; s/#n$/\n/sg; s/#n$/\n/sg; $_.="#fill#\n" x 4' "$BASE"  > "$BASE_ONELINE"
perl -npe 'BEGIN {$/ = "\n\n"}; s/#\n$/\n/s; s/#/##/sg; s/\n/#n/sg; s/#n$/\n/sg; s/#n$/\n/sg; $_.="#fill#\n" x 4' "$REMOTE"  > "$REMOTE_ONELINE"

# merge files using normal git merge machinery
git merge-file -p --union -L "Current (working directory)" -L "Base (common ancestor)" -L "Incoming (applied changeset)" "$LOCAL_ONELINE" "$BASE_ONELINE" "$REMOTE_ONELINE" > "$MERGED"
MERGESTATUS=$?

# remove possibly duplicated headers (workaround msguniq bug http://comments.gmane.org/gmane.comp.gnu.gettext.bugs/96)
cat "$MERGED" | perl -npe 'BEGIN {$/ = "\n\n"}; s/^([^\n]+#nmsgid ""#nmsgstr ""#n.*?\n)([^\n]+#nmsgid ""#nmsgstr ""#n.*?\n)+/$1/gs' > "$MERGED2"

# remove lines that have totally empty msgstr
# and convert back to normal PO file representation
cat "$MERGED2" | grep -v '#nmsgstr ""$' | grep -v '^#fill#$' | perl -npe 's/#n/\n/g; s/##/#/g' > "$MERGED"

# run the output through msguniq to merge conflicts gettext style
# msguniq seems to have a bug that causes empty output if zero msgids
# are found after the header. Expected output would be the header...
# Workaround the bug by adding an empty obsolete fallback msgid
# that will be automatically removed by msguniq

cat > "$FALLBACK_OBSOLETE" << 'EOF'

#~ msgid "obsolete fallback"
#~ msgstr ""

EOF
cat "$MERGED" "$FALLBACK_OBSOLETE" | msguniq --no-wrap --sort-output > "$MERGED2"


# create a hacked template from default merge between 3 versions
# we do this to try to preserve original file ordering
msgcat --use-first "$LOCAL" "$REMOTE" "$BASE" > "$TEMPLATE1"
msghack --empty "$TEMPLATE1" > "$TEMPLATE2"
msgmerge --silent --no-wrap --no-fuzzy-matching "$MERGED2" "$TEMPLATE2" > "$OUTPUT"

# show some results to stdout
if grep -q '#-#-#-#-#' "$OUTPUT"
then
    FUZZY=$(cat "$OUTPUT" | msgattrib --only-fuzzy --no-obsolete --color | perl -npe 'BEGIN{ undef $/; }; s/^.*?msgid "".*?\n\n//s')
    if test -n "$FUZZY"
    then
        echo "-------------------------------"
        echo "Fuzzy translations after merge:"
        echo "-------------------------------"
        echo "$FUZZY"
        echo "-------------------------------"
    fi
fi

# git merge driver must overwrite the first parameter with output
mv "$OUTPUT" "${1}"

# cleanup
rm -f "$LOCAL" "$BASE" "$REMOTE" "$LOCAL_ONELINE" "$BASE_ONELINE" "$REMOTE_ONELINE" "$MERGED" "$MERGED2" "$TEMPLATE1" "$TEMPLATE2" "$FALLBACK_OBSOLETE"

# return conflict if merge has conflicts according to msgcat/msguniq
grep -q '#-#-#-#-#' "${1}" && exit 1

# otherwise, return git merge status
exit $MERGESTATUS

# Steps to install this driver:
# (1) Edit ".git/config" in your repository directory
# (2) Add following section:
#
# [merge "merge-po-files"]
#   name = merge po-files driver
#   driver = ./bin/merge-po-files %A %O %B
#   recursive = binary
#
# or
#
# git config merge.merge-po-files.driver "./bin/merge-po-files %A %O %B"
#
# The file ".gitattributes" will point git to use this merge driver.


Short explanation of this driver

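It converts the regular PO file format into a single-line format in which each line is one translation entry.
It then uses regular git merge-file --union to do the merge, and after the merge the resulting single-line format is converted back to the regular PO file format.
The actual conflict resolution is done after that by msguniq.
Finally, it merges the resulting file with a template generated by running regular msgcat over the original input files, to restore metadata that may have been lost.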
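
The single-line conversion described above can be tried in isolation. The sketch below is not the driver's code: it uses awk instead of the perl one-liners, omits the `#fill#` padding lines, and the function names `flatten_po` and `unflatten_po` are hypothetical. It keeps the same escaping idea (`#` becomes `##`, a newline inside an entry becomes `#n`) and decodes strictly left to right, so an escaped `#` followed by a literal `n` is not mistaken for a newline.

```shell
#!/bin/bash
# Hypothetical helper: one PO entry (blank-line separated paragraph) per output line.
flatten_po() {
    awk 'BEGIN { RS = "" } { gsub(/#/, "##"); gsub(/\n/, "#n"); print }'
}

# Hypothetical helper: inverse of flatten_po; entries are separated by one blank line.
unflatten_po() {
    awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "#") {
                i++                              # look at the escaped character
                nxt = substr($0, i, 1)
                out = out (nxt == "n" ? "\n" : "#")
            } else {
                out = out c
            }
        }
        if (NR > 1) print ""                     # blank line between entries
        print out
    }'
}

# Round trip on a tiny two-entry example
printf '%s\n' '# a comment' 'msgid "foo"' 'msgstr "bar"' '' 'msgid "x"' 'msgstr "y"' \
    | flatten_po | unflatten_po
```

Once every entry is a single line, a line-based tool such as git merge-file can add, remove or conflict whole entries instead of cutting through the middle of one.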

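Warning: this driver runs msgcat --no-wrap on the .PO files and forces UTF-8 encoding if the actual encoding is not specified.
If you want to use this merge driver but always inspect the results, change the final exit $MERGESTATUS to exit 1.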
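
Git treats a non-zero exit status from a merge driver as "conflicts were left in the file", which is why forcing exit 1 makes every merged .PO file show up for review. A minimal sketch of the marker check that decides the status is shown below. The helper name `po_has_conflicts` is hypothetical, and the sample entry is illustrative only; the exact layout of the markers is whatever msgcat or msguniq wrote.

```shell
#!/bin/bash
# Hypothetical helper: status 0 if the file contains gettext-style conflict markers.
po_has_conflicts() {
    grep -q -F -e '#-#-#-#-#' -- "$1"
}

# Illustrative input only, assumed to resemble a gettext conflict entry.
sample="$(mktemp)"
cat > "$sample" << 'EOF'
#, fuzzy
msgid "foo"
msgstr ""
"#-#-#-#-#  local  #-#-#-#-#\n"
"bar\n"
"#-#-#-#-#  other  #-#-#-#-#\n"
"baz\n"
EOF

if po_has_conflicts "$sample"; then
    echo "conflict: a merge driver would exit 1 here"
else
    echo "clean: a merge driver would exit 0 here"
fi
rm -f "$sample"
```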

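After getting a merge conflict from this driver, the best way to fix it is to open the conflicting file with virtaal and select Navigation: Incomplete.
I find this UI a pretty nice tool for fixing conflicts.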

Answer 2

Mik*_*nen 4

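Here is another answer, for the year 2021. I am now using the following merge driver, which seems to work correctly for all the cases I have tested. I have stored it as ./bin/merge-po-files in our repository.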

#!/bin/bash
#
# Three-way merge driver for PO files, runs on multiple CPUs where possible
#
# Copyright 2015-2016 Marco Ciampa
# Copyright 2021 Mikko Rantalainen <mikko.rantalainen@iki.fi>
# License: MIT (https://opensource.org/licenses/MIT)
#
# Original source:
# /sf/answers/2067497351/
# https://github.com/mezis/git-whistles/blob/master/libexec/git-merge-po.sh
#
# Install with
# git config merge.merge-po-files.driver "./bin/merge-po-files %A %O %B %P"
#
# Note that you also need file `.gitattributes` with following lines:
#
# [attr]POFILE merge=merge-po-files
# locale/*.po POFILE
#
##########################################################################
# CONFIG:

# Formatting flags to be used to produce merged .po files
# This can be set to match project needs for the .po files.
# NOTE: $MSGCAT_FINAL_FLAGS will be passed to msgcat without quotation
MSGCAT_FINAL_FLAGS="--no-wrap --sort-output"

# Verbosity level:
# 0: Silent except for real errors
# 1: Show simple header for each file processed
# 2: Also show all conflicts in merge result (both new and existing)
# 3: Also show all status messages with timestamps
VERBOSITY="${VERBOSITY:=2}"

##########################################################################
# Implementation:

# Use logical names for arguments:
LOCAL="$1"
BASE="$2"
OTHER="$3"
FILENAME="$4"
OUTPUT="$LOCAL"

# The temporary directory for all files we need - note that most files are
# created without extensions to emit nicer conflict messages where gettext
# likes to embed the basename of the file in the conflict message so we
# use names like "local" and "other" instead of e.g. "local.G2wZ.po".
TEMP="$(mktemp -d /tmp/merge-po.XXXXXX)"


# abort on any error and report the details if possible
set -E
set -e
on_error()
{
    local parent_lineno="$1"
    local message="$3"
    local code="$2"
    if [[ -n "$message" ]] ; then
        printf "### $0: error near line %d: status %d: %s\n" "${parent_lineno}" "${code}" "${message}" 1>&2
    else
        printf "### $0: error near line %d: status %d\n" "${parent_lineno}" "${code}" 1>&2
    fi
    exit 255
}
trap 'on_error ${LINENO} $?' ERR


# Maybe print message(s) to stdout with timestamps
function status()
{
    if test "$VERBOSITY" -ge 3
    then
        printf "%s %s\n" "$(date '+%Y-%m-%d %H:%M:%S.%3N')" "$@"
    fi
}

# Quietly take translations from $1 and apply those according to template $2
# (and do not use fuzzy-matching, always generate output)
# also supports all flags to msgmerge
function apply_po_template()
{
    msgmerge --force-po --quiet --no-fuzzy-matching "$@"
}

# Take stdin, remove the "graveyard strings" and emit the result to stdout
function strip_graveyard()
{
    msgattrib --no-obsolete
}

# Take stdin, keep only conflict lines and emit the result to stdout
function only_conflicts()
{
    msggrep --msgstr -F -e '#-#-#-#-#' -
    # alternative slightly worse implementation: msgattrib --only-fuzzy
}

# Take stdin, discard conflict lines and emit the result to stdout
function without_conflicts()
{
    msggrep -v --msgstr -F -e '#-#-#-#-#' -
    # alternative slightly worse implementation: msgattrib --no-fuzzy
}

# Select messages from $1 that are also in $2 but whose contents have changed
# and emit results to stdout
function extract_changes()
{
    # Extract conflicting changes and discard any changes to graveyard area only
    msgcat -o - "$1" "$2" \
    | only_conflicts \
    | apply_po_template -o - "$1" - \
    | strip_graveyard
}

# Emit only the header of $1, supports flags of msggrep
function extract_header()
{
    # Unfortunately gettext really doesn't support extracting just header
    # so we have to get creative: extract only strings that originate
    # from file called "//" which should result to header only
     msggrep --force-po -N // "$@"

    # Logically msggrep --force-po -v -K -E -e '.' should return the header
    # only but msggrep seems be buggy with msgids with line feeds and output
    # those, too
}

# Take file in $1 and show conflicts with colors in the file to stdout
function show_conflicts()
{
    OUTPUT="$1"
    shift
    # Count number of lines to remove from the output and output conflict lines without the header
    CONFLICT_HEADER_LINES=$(cat "$OUTPUT" | msggrep --force-po --color=never --msgstr -F -e '#-#-#-#-#' - | extract_header - | wc -l)
    # tail wants line number of the first displayed line so we want +1 here:
    CONFLICTS=$(cat "$OUTPUT" | msggrep --force-po --color --msgstr -F -e '#-#-#-#-#' - | tail -n "+$((CONFLICT_HEADER_LINES+1))")
    if test -n "$CONFLICTS"
    then
        #echo "----------------------------"
        #echo "Conflicts after merge:"
        echo "----------------------------"
        printf "%s\n" "$CONFLICTS"
        echo "----------------------------"
    fi
}

# Sanity check that we have a sensible temporary directory
test -n "$TEMP" || exit 125
test -d "$TEMP" || exit 126
test -w "$TEMP" || exit 127

if test "$VERBOSITY" -ge 1
then
    printf "Using gettext .PO merge driver: %s ...\n" "$FILENAME"
fi

# Extract the PO header from the current branch (top of file until first empty line)
extract_header -o "${TEMP}/header" "$LOCAL"

##########################################################################
# Following parts can be run partially in parallel and "wait" is used to synchronize processing


# Clean input files and use logical filenames for possible conflict markers:
status "Canonicalizing input files ..."
msguniq --force-po -o "${TEMP}/base" --unique "${BASE}" &
msguniq --force-po -o "${TEMP}/local" --unique "${LOCAL}" &
msguniq --force-po -o "${TEMP}/other" --unique "${OTHER}" &
wait

status "Computing local-changes, other-changes and unchanged ..."
msgcat --force-po -o - "${TEMP}/base" "${TEMP}/local" "${TEMP}/other" | without_conflicts > "${TEMP}/unchanged" &
extract_changes "${TEMP}/local" "${TEMP}/base" > "${TEMP}/local-changes" &
extract_changes "${TEMP}/other" "${TEMP}/base" > "${TEMP}/other-changes" &
wait

# Messages changed on both local and other (conflicts):
status "Computing conflicts ..."
msgcat --force-po -o - "${TEMP}/other-changes" "${TEMP}/local-changes" | only_conflicts > "${TEMP}/conflicts"

# Messages changed on local, not on other; and vice-versa:
status "Computing local-only and other-only changes ..."
msgcat --force-po -o "${TEMP}/local-only"  --unique "${TEMP}/local-changes"  "${TEMP}/conflicts" &
msgcat --force-po -o "${TEMP}/other-only" --unique "${TEMP}/other-changes" "${TEMP}/conflicts" &
wait

# Note: following steps require sequential processing and cannot be run in parallel

status "Computing initial merge without template ..."
# Note that we may end up with some extra so we have to apply template later
msgcat --force-po -o "${TEMP}/merge1" "${TEMP}/unchanged" "${TEMP}/conflicts" "${TEMP}/local-only" "${TEMP}/other-only"

# Create a template to only output messages that are actually needed (union of messages on local and other create the template!)
status "Computing template and applying it to merge result ..."
msgcat --force-po -o - "${TEMP}/local" "${TEMP}/other" | apply_po_template -o "${TEMP}/merge2" "${TEMP}/merge1" -

# Final merge result is merge2 with original header
status "Fixing the header after merge ..."
msgcat --force-po $MSGCAT_FINAL_FLAGS -o "${TEMP}/merge3" --use-first "${TEMP}/header" "${TEMP}/merge2"

# Produce output file (overwrites input LOCAL file because git expects that for the results)
status "Saving output ..."
mv "${TEMP}/merge3" "$OUTPUT"

status "Cleaning up ..."

rm "${TEMP}"/*
rmdir "${TEMP}"

status "Checking for conflicts in the result ..."

# Check for conflicts in the final merge
if grep -q '#-#-#-#-#' "$OUTPUT"
then
    if test "$VERBOSITY" -ge 1
    then
        printf "### Conflict(s) detected ###\n"
    fi

    if test "$VERBOSITY" -ge 2
    then
        # Verbose diagnostics
        show_conflicts "$OUTPUT"
    fi

    status "Automatic merge failed, exiting with status 1."
    exit 1
fi

status "Automatic merge completed successfully, exiting with status 0."
exit 0


This variant is based on the version in the answer by @mezis to the same question, but it has the following improvements:

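Runs in parallel on multiple CPUs where possible. The work is distributed by running multiple pipelines in the background with & and then using wait to synchronize all the parallel pipelines. The final merge requires sequential code, so it runs on one CPU core only. The merge speed seems to be about 1 MB/s for the given .PO input.
Adds lots of documentation.
Adds a configurable variable at the start to define the final gettext file formatting. In the example above, the default configuration is --no-wrap --sort-output.
Uses logical names without file extensions for all temporary files, to make gettext merge conflicts easier to understand.
Uses the newer git option %P in the merge driver to pass the correct filename as an argument. This is required when the content of the merged file matches another file in the project; in that case the old code, which matched on the SHA-1 of the file content, could print the wrong filename. Note that %P must be used in the git config (see the documentation at the start of the file).
Avoids using perl, awk or sed to modify or even read gettext files; only gettext tools do that. The optional part uses grep, tail and wc only to show verbose conflicts on stdout, and does not touch the actual data in the output file.
Correctly merges cases where different plural forms have been changed (the merge results in a conflict in the translation, but nothing should be lost).
Note that if the graveyard (lines starting with #~) has merge conflicts, those conflicts are silently dropped instead of being merged. Non-conflicting graveyard data is kept.
Note that this does not try to do any fuzzy matching before or after the merge. Sometimes that could improve the results, but it depends on heuristics, and this merge driver tries to be deterministic.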
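
The & plus wait pattern mentioned in the first item can be sketched on its own. One detail worth knowing: in bash, a bare wait with no arguments returns zero regardless of how the background jobs ended, so a variant that wants to notice a failed background pipeline could wait for each PID separately. The sketch below assumes that goal. `step_ok`, `step_fail` and `run_parallel` are hypothetical stand-ins, not part of the driver above.

```shell
#!/bin/bash
# Hypothetical stand-ins for the real gettext pipelines.
step_ok()   { echo "ok: $1"; }
step_fail() { echo "failed: $1" >&2; return 3; }

# Start each argument ("name arg") as a background job, then wait for every
# PID separately so that an individual failure is not lost.
run_parallel() {
    local pids=() pid cmd status=0
    for cmd in "$@"; do
        $cmd &                 # unquoted on purpose: split "name arg" into words
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || status=$?
    done
    return "$status"
}

run_parallel "step_ok base" "step_ok local" "step_ok other" && echo "all steps succeeded"
run_parallel "step_ok base" "step_fail local" || echo "at least one step failed (status $?)"
```

The order in which the background jobs print is not deterministic, which is one reason the driver writes each intermediate result to its own temporary file.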
