.PO(gettext)文件的3路Git合并驱动程序在哪里?

Mik*_*nen 15 git merge localization conflict gettext

我已经有了以下内容

[attr]POFILE merge=merge-po-files

locale/*.po POFILE
Run Code Online (Sandbox Code Playgroud)

.gitattributes同一个本地化文件(例如locale/en.po)已经在并行分支中修改时,我希望合并分支以正常工作.我目前正在使用以下合并驱动程序:

#!/bin/bash
# git merge driver for .PO files (gettext localizations)
# Install:
# git config merge.merge-po-files.driver "./bin/merge-po-files %A %O %B"

LOCAL="${1}._LOCAL_"
BASE="${2}._BASE_"
REMOTE="${3}._REMOTE_"

# rename to bit more meaningful filenames to get better conflict results
cp "${1}" "$LOCAL"
cp "${2}" "$BASE"
cp "${3}" "$REMOTE"

# merge files and overwrite local file with the result
msgcat "$LOCAL" "$BASE" "$REMOTE" -o "${1}" || exit 1

# cleanup
rm -f "$LOCAL" "$BASE" "$REMOTE"

# check if merge has conflicts
fgrep -q '#-#-#-#-#' "${1}" && exit 1

# if we get here, merge is successful
exit 0
Run Code Online (Sandbox Code Playgroud)

然而,这msgcat太愚蠢了,这不是真正的三方合并.例如,如果我有

  1. BASE版本

    msgid "foo"
    msgstr "foo"
    
    Run Code Online (Sandbox Code Playgroud)
  2. 本地版本

    msgid "foo"
    msgstr "bar"
    
    Run Code Online (Sandbox Code Playgroud)
  3. REMOTE版本

    msgid "foo"
    msgstr "foo"
    
    Run Code Online (Sandbox Code Playgroud)

我最终会发生冲突.但是,真正的三向合并驱动程序将输出正确的合并:

msgid "foo"
msgstr "bar"
Run Code Online (Sandbox Code Playgroud)

请注意,我不能简单地添加--use-first,msgcat因为REMOTE可能包含更新的翻译.另外,如果BASE,LOCAL和REMOTE都是唯一的,我仍然想要一个冲突,因为这确实是一个冲突.

为了使这项工作,我需要改变什么?如果可能的话,奖励积分比'# - # - # - # - #'更少疯狂的冲突标记.

Mik*_*nen 6

这是一个有点复杂的示例驱动程序,似乎输出正确的合并,可能包含本地或远程版本应删除的一些翻译.
什么都不应该丢失,所以这个驱动程序在某些情况下只会增加一些额外的混乱.

此版本使用gettext原生冲突的标记,看起来像#-#-#-#-#联合fuzzy标志,而不是普通的Git冲突标记.
驱动程序有点难以解决错误(或功能)msgcatmsguniq:

#!/bin/bash
# git merge driver for .PO files
# Copyright (c) Mikko Rantalainen <mikko.rantalainen@peda.net>, 2013
# License: MIT

ORIG_HASH=$(git hash-object "${1}")
WORKFILE=$(git ls-tree -r HEAD | fgrep "$ORIG_HASH" | cut -b54-)
echo "Using custom merge driver for $WORKFILE..."

LOCAL="${1}._LOCAL_"
BASE="${2}._BASE_"
REMOTE="${3}._REMOTE_"

LOCAL_ONELINE="$LOCAL""ONELINE_"
BASE_ONELINE="$BASE""ONELINE_"
REMOTE_ONELINE="$REMOTE""ONELINE_"

OUTPUT="$LOCAL""OUTPUT_"
MERGED="$LOCAL""MERGED_"
MERGED2="$LOCAL""MERGED2_"

TEMPLATE1="$LOCAL""TEMPLATE1_"
TEMPLATE2="$LOCAL""TEMPLATE2_"
FALLBACK_OBSOLETE="$LOCAL""FALLBACK_OBSOLETE_"

# standardize the input files for regexping
# default to UTF-8 in case charset is still the placeholder "CHARSET"
cat "${1}" | perl -npe 's!(^"Content-Type: text/plain; charset=)(CHARSET)(\\n"$)!$1UTF-8$3!' | msgcat --no-wrap --sort-output - > "$LOCAL"
cat "${2}" | perl -npe 's!(^"Content-Type: text/plain; charset=)(CHARSET)(\\n"$)!$1UTF-8$3!' | msgcat --no-wrap --sort-output - > "$BASE"
cat "${3}" | perl -npe 's!(^"Content-Type: text/plain; charset=)(CHARSET)(\\n"$)!$1UTF-8$3!' | msgcat --no-wrap --sort-output - > "$REMOTE"

# convert each definition to single line presentation
# extra fill is required to make sure that git separates each conflict 
perl -npe 'BEGIN {$/ = "\n\n"}; s/#\n$/\n/s; s/#/##/sg; s/\n/#n/sg; s/#n$/\n/sg; s/#n$/\n/sg; $_.="#fill#\n" x 4' "$LOCAL" > "$LOCAL_ONELINE"
perl -npe 'BEGIN {$/ = "\n\n"}; s/#\n$/\n/s; s/#/##/sg; s/\n/#n/sg; s/#n$/\n/sg; s/#n$/\n/sg; $_.="#fill#\n" x 4' "$BASE"  > "$BASE_ONELINE"
perl -npe 'BEGIN {$/ = "\n\n"}; s/#\n$/\n/s; s/#/##/sg; s/\n/#n/sg; s/#n$/\n/sg; s/#n$/\n/sg; $_.="#fill#\n" x 4' "$REMOTE"  > "$REMOTE_ONELINE"

# merge files using normal git merge machinery
git merge-file -p --union -L "Current (working directory)" -L "Base (common ancestor)" -L "Incoming (applied changeset)" "$LOCAL_ONELINE" "$BASE_ONELINE" "$REMOTE_ONELINE" > "$MERGED"
MERGESTATUS=$?

# remove possibly duplicated headers (workaround msguniq bug http://comments.gmane.org/gmane.comp.gnu.gettext.bugs/96)
cat "$MERGED" | perl -npe 'BEGIN {$/ = "\n\n"}; s/^([^\n]+#nmsgid ""#nmsgstr ""#n.*?\n)([^\n]+#nmsgid ""#nmsgstr ""#n.*?\n)+/$1/gs' > "$MERGED2"

# remove lines that have totally empty msgstr
# and convert back to normal PO file representation
cat "$MERGED2" | grep -v '#nmsgstr ""$' | grep -v '^#fill#$' | perl -npe 's/#n/\n/g; s/##/#/g' > "$MERGED"

# run the output through msguniq to merge conflicts gettext style
# msguniq seems to have a bug that causes empty output if zero msgids
# are found after the header. Expected output would be the header...
# Workaround the bug by adding an empty obsolete fallback msgid
# that will be automatically removed by msguniq

cat > "$FALLBACK_OBSOLETE" << 'EOF'

#~ msgid "obsolete fallback"
#~ msgstr ""

EOF
cat "$MERGED" "$FALLBACK_OBSOLETE" | msguniq --no-wrap --sort-output > "$MERGED2"


# create a hacked template from default merge between 3 versions
# we do this to try to preserve original file ordering
msgcat --use-first "$LOCAL" "$REMOTE" "$BASE" > "$TEMPLATE1"
msghack --empty "$TEMPLATE1" > "$TEMPLATE2"
msgmerge --silent --no-wrap --no-fuzzy-matching "$MERGED2" "$TEMPLATE2" > "$OUTPUT"

# show some results to stdout
if grep -q '#-#-#-#-#' "$OUTPUT"
then
    FUZZY=$(cat "$OUTPUT" | msgattrib --only-fuzzy --no-obsolete --color | perl -npe 'BEGIN{ undef $/; }; s/^.*?msgid "".*?\n\n//s')
    if test -n "$FUZZY"
    then
        echo "-------------------------------"
        echo "Fuzzy translations after merge:"
        echo "-------------------------------"
        echo "$FUZZY"
        echo "-------------------------------"
    fi
fi

# git merge driver must overwrite the first parameter with output
mv "$OUTPUT" "${1}"

# cleanup
rm -f "$LOCAL" "$BASE" "$REMOTE" "$LOCAL_ONELINE" "$BASE_ONELINE" "$REMOTE_ONELINE" "$MERGED" "$MERGED2" "$TEMPLATE1" "$TEMPLATE2" "$FALLBACK_OBSOLETE"

# return conflict if merge has conflicts according to msgcat/msguniq
grep -q '#-#-#-#-#' "${1}" && exit 1

# otherwise, return git merge status
exit $MERGESTATUS

# Steps to install this driver:
# (1) Edit ".git/config" in your repository directory
# (2) Add following section:
#
# [merge "merge-po-files"]
#   name = merge po-files driver
#   driver = ./bin/merge-po-files %A %O %B
#   recursive = binary
#
# or
#
# git config merge.merge-po-files.driver "./bin/merge-po-files %A %O %B"
#
# The file ".gitattributes" will point git to use this merge driver.
Run Code Online (Sandbox Code Playgroud)

关于此驱动程序的简短说明

  • 它将常规PO文件格式转换为单行格式,其中每一行都是一个翻译条目.
  • 然后它使用常规git merge-file --union进行合并,并且在合并之后,生成的单行格式被转换回常规PO文件格式.
    实际的冲突解决是在此之后完成的msguniq,
  • 然后它最终将生成的文件与通过定期msgcat组合原始输入文件生成的模板合并,以恢复可能丢失的元数据.

警告:此驱动程序将msgcat --no-wrap.PO文件上使用,UTF-8如果未指定实际编码,将强制进行编码.
如果您想使用此合并的驱动程序,但总是检查结果,改变最终exit $MERGESTATUS的样子exit 1.

从此驱动程序获得合并冲突后,解决冲突的最佳方法是使用virtaal和选择打开冲突文件Navigation: Incomplete.
我发现这个UI是一个非常好的修复冲突的工具.


Mik*_*nen 4

这是 2021 年的另一个答案。我现在正在使用以下合并驱动程序,这似乎适用于我测试过的所有情况。我已将其存储./bin/merge-po-files在我们的存储库中。

#!/bin/bash
#
# Three-way merge driver for PO files, runs on multiple CPUs where possible
#
# Copyright 2015-2016 Marco Ciampa
# Copyright 2021 Mikko Rantalainen <mikko.rantalainen@iki.fi>
# License: MIT (https://opensource.org/licenses/MIT)
#
# Original source:
# /sf/answers/2067497351/
# https://github.com/mezis/git-whistles/blob/master/libexec/git-merge-po.sh
#
# Install with
# git config merge.merge-po-files.driver "./bin/merge-po-files %A %O %B %P"
#
# Note that you also need file `.gitattributes` with following lines:
#
# [attr]POFILE merge=merge-po-files
# locale/*.po POFILE
#
##########################################################################
# CONFIG:

# Formatting flags to be be used to produce merged .po files
# This can be set to match project needs for the .po files.
# NOTE: $MSGCAT_FINAL_FLAGS will be passed to msgcat without quotation
MSGCAT_FINAL_FLAGS="--no-wrap --sort-output"

# Verbosity level:
# 0: Silent except for real errors
# 1: Show simple header for each file processed
# 2: Also show all conflicts in merge result (both new and existing)
# 3: Also show all status messages with timestamps
VERBOSITY="${VERBOSITY:=2}"

##########################################################################
# Implementation:

# Use logical names for arguments:
LOCAL="$1"
BASE="$2"
OTHER="$3"
FILENAME="$4"
OUTPUT="$LOCAL"

# The temporary directory for all files we need - note that most files are
# created without extensions to emit nicer conflict messages where gettext
# likes to embed the basename of the file in the conflict message so we
# use names like "local" and "other" instead of e.g. "local.G2wZ.po".
TEMP="$(mktemp -d /tmp/merge-po.XXXXXX)"


# abort on any error and report the details if possible
set -E
set -e
on_error()
{
    local parent_lineno="$1"
    local message="$3"
    local code="$2"
    if [[ -n "$message" ]] ; then
        printf "### $0: error near line %d: status %d: %s\n" "${parent_lineno}" "${code}" "${message}" 1>&2
    else
        printf "### $0: error near line %d: status %d\n" "${parent_lineno}" "${code}" 1>&2
    fi
    exit 255
}
trap 'on_error ${LINENO} $?' ERR


# Maybe print message(s) to stdout with timestamps
function status()
{
    if test "$VERBOSITY" -ge 3
    then
        printf "%s %s\n" "$(date '+%Y-%m-%d %H:%M:%S.%3N')" "$@"
    fi
}

# Quietly take translations from $1 and apply those according to template $2
# (and do not use fuzzy-matching, always generate output)
# also supports all flags to msgmerge
function apply_po_template()
{
    msgmerge --force-po --quiet --no-fuzzy-matching "$@"
}

# Take stdin, remove the "graveyard strings" and emit the result to stdout
function strip_graveyard()
{
    msgattrib --no-obsolete
}

# Take stdin, keep only confict lines and emit the result to stdout
function only_conflicts()
{
    msggrep --msgstr -F -e '#-#-#-#-#' -
    # alternative slightly worse implementation: msgattrib --only-fuzzy
}

# Take stdin, discard confict lines and emit the result to stdout
function without_conflicts()
{
    msggrep -v --msgstr -F -e '#-#-#-#-#' -
    # alternative slightly worse implementation: msgattrib --no-fuzzy
}

# Select messages from $1 that are also in $2 but whose contents have changed
# and emit results to stdout
function extract_changes()
{
    # Extract conflicting changes and discard any changes to graveyard area only
    msgcat -o - "$1" "$2" \
    | only_conflicts \
    | apply_po_template -o - "$1" - \
    | strip_graveyard
}

# Emit only the header of $1, supports flags of msggrep
function extract_header()
{
    # Unfortunately gettext really doesn't support extracting just header
    # so we have to get creative: extract only strings that originate
    # from file called "//" which should result to header only
     msggrep --force-po -N // "$@"

    # Logically msggrep --force-po -v -K -E -e '.' should return the header
    # only but msggrep seems be buggy with msgids with line feeds and output
    # those, too
}

# Take file in $1 and show conflicts with colors in the file to stdout
function show_conflicts()
{
    OUTPUT="$1"
    shift
    # Count number of lines to remove from the output and output conflict lines without the header
    CONFLICT_HEADER_LINES=$(cat "$OUTPUT" | msggrep --force-po --color=never --msgstr -F -e '#-#-#-#-#' - | extract_header - | wc -l)
    # tail wants line number of the first displayed line so we want +1 here:
    CONFLICTS=$(cat "$OUTPUT" | msggrep --force-po --color --msgstr -F -e '#-#-#-#-#' - | tail -n "+$((CONFLICT_HEADER_LINES+1))")
    if test -n "$CONFLICTS"
    then
        #echo "----------------------------"
        #echo "Conflicts after merge:"
        echo "----------------------------"
        printf "%s\n" "$CONFLICTS"
        echo "----------------------------"
    fi
}

# Sanity check that we have a sensible temporary directory
test -n "$TEMP" || exit 125
test -d "$TEMP" || exit 126
test -w "$TEMP" || exit 127

if test "$VERBOSITY" -ge 1
then
    printf "Using gettext .PO merge driver: %s ...\n" "$FILENAME"
fi

# Extract the PO header from the current branch (top of file until first empty line)
extract_header -o "${TEMP}/header" "$LOCAL"

##########################################################################
# Following parts can be run partially parallel and "wait" is used to syncronize processing


# Clean input files and use logical filenames for possible conflict markers:
status "Canonicalizing input files ..."
msguniq --force-po -o "${TEMP}/base" --unique "${BASE}" &
msguniq --force-po -o "${TEMP}/local" --unique "${LOCAL}" &
msguniq --force-po -o "${TEMP}/other" --unique "${OTHER}" &
wait

status "Computing local-changes, other-changes and unchanged ..."
msgcat --force-po -o - "${TEMP}/base" "${TEMP}/local" "${TEMP}/other" | without_conflicts > "${TEMP}/unchanged" &
extract_changes "${TEMP}/local" "${TEMP}/base" > "${TEMP}/local-changes" &
extract_changes "${TEMP}/other" "${TEMP}/base" > "${TEMP}/other-changes" &
wait

# Messages changed on both local and other (conflicts):
status "Computing conflicts ..."
msgcat --force-po -o - "${TEMP}/other-changes" "${TEMP}/local-changes" | only_conflicts > "${TEMP}/conflicts"

# Messages changed on local, not on other; and vice-versa:
status "Computing local-only and other-only changes ..."
msgcat --force-po -o "${TEMP}/local-only"  --unique "${TEMP}/local-changes"  "${TEMP}/conflicts" &
msgcat --force-po -o "${TEMP}/other-only" --unique "${TEMP}/other-changes" "${TEMP}/conflicts" &
wait

# Note: following steps require sequential processing and cannot be run in parallel

status "Computing initial merge without template ..."
# Note that we may end up with some extra so we have to apply template later
msgcat --force-po -o "${TEMP}/merge1" "${TEMP}/unchanged" "${TEMP}/conflicts" "${TEMP}/local-only" "${TEMP}/other-only"

# Create a template to only output messages that are actually needed (union of messages on local and other create the template!)
status "Computing template and applying it to merge result ..."
msgcat --force-po -o - "${TEMP}/local" "${TEMP}/other" | apply_po_template -o "${TEMP}/merge2" "${TEMP}/merge1" -

# Final merge result is merge2 with original header
status "Fixing the header after merge ..."
msgcat --force-po $MSGCAT_FINAL_FLAGS -o "${TEMP}/merge3" --use-first "${TEMP}/header" "${TEMP}/merge2"

# Produce output file (overwrites input LOCAL file because git expects that for the results)
status "Saving output ..."
mv "${TEMP}/merge3" "$OUTPUT"

status "Cleaning up ..."

rm "${TEMP}"/*
rmdir "${TEMP}"

status "Checking for conflicts in the result ..."

# Check for conflicts in the final merge
if grep -q '#-#-#-#-#' "$OUTPUT"
then
    if test "$VERBOSITY" -ge 1
    then
        printf "### Conflict(s) detected ###\n"
    fi

    if test "$VERBOSITY" -ge 2
    then
        # Verbose diagnostics
        show_conflicts "$OUTPUT"
    fi

    status "Automatic merge failed, exiting with status 1."
    exit 1
fi

status "Automatic merge completed successfully, exiting with status 0."
exit 0
Run Code Online (Sandbox Code Playgroud)

此变体基于 @mezis 在同一问题中的答案中的版本,但它有以下改进:

  • 尽可能在多个 CPU 上并行运行(分发到多个 CPU 是通过在后台运行多个管道,然后使用 同步&所有并行管道来完成的wait。最终合并需要顺序代码,因此仅在一个 CPU 核心上运行。合并速度给出的 .PO 输入似乎约为 1 MB/s。
  • 添加大量文档。
  • 在开始处添加可配置变量来定义最终的 gettext 文件格式。在上面的示例中,默认配置是--no-wrap --sort-output.
  • 对所有临时文件使用不带文件扩展名的逻辑名称,以便更容易理解 gettext 合并冲突。
  • 使用合并驱动程序中的新git选项%P将正确的文件名作为参数传递。当合并的文件内容与项目中的另一个文件匹配时,这是必需的 - 在这种情况下,与文件内容 SHA-1 匹配的旧代码可能会打印错误的文件名。请注意,%P必须在 git config 中使用(请参阅文件开头的文档)。
  • 避免使用perl,awksed来修改甚至读取 gettext 文件 - 只是 gettext 工具。可选部分使用grep,tailwc仅显示与标准输出的详细冲突,但不处理输出文件中的实际数据。
  • 正确合并不同复数形式发生变化的情况(合并将导致翻译中发生冲突,但不应丢失任何内容)。
  • 请注意,如果墓地中有合并冲突(以#~这些冲突开头的行将被默默删除,而不是尝试合并此类情况)。不冲突的墓地数据将被保留。
  • 请注意,这不会尝试在合并之前或之后进行任何模糊匹配。有时这可以改善结果,但这取决于启发式方法,并且此合并驱动程序试图具有确定性。