Get all nested text elements in a Google Doc using the selection's RangeElements

hrz*_*fer 2 google-docs google-apps-script

In a document like the one above, I can get all the paragraphs with the following code:

var paras = body.getParagraphs();

Note that the code above returns not only the top-level paragraphs but also all child paragraphs inside ListItems, Tables, and so on.

How can I do the same within a selected range? The following code returns only the top-level elements.

const selection = DocumentApp.getActiveDocument().getSelection();
var rangeElements = selection.getRangeElements();

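For example, the table above contains 9 non-empty paragraphs, and if they are in the selection I want to process them one by one.

What I am trying to achieve is something like translating the text in the selection while preserving as much of the formatting, tables, list items, etc. as possible.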

Chr*_*ris 8

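.getRangeElements() returns an array of RangeElements. A RangeElement is a wrapper object that helps us deal with partial selections. We can call .getElement() on each item of this array to get the Element object, which is a very generic object that can represent almost any part of a Google Doc. Elements have a .getType() method that returns an ElementType enum value, and there are a lot of them.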


Let's use what we know so far to see which types can turn up in a Google Doc (I created a document similar to yours as an example):

function selectionHasWhichTypes() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements();

  rangeElems.forEach(function(rangeElem){
    var elem = rangeElem.getElement(); //Unwrap the RangeElement to get the underlying Element.

    Logger.log(elem.getType());
  });
}

//Logger OUTPUT:
PARAGRAPH
PARAGRAPH
PARAGRAPH
PARAGRAPH
PARAGRAPH
LIST_ITEM
LIST_ITEM
LIST_ITEM
PARAGRAPH
PARAGRAPH
PARAGRAPH
TABLE
PARAGRAPH

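Aha! It looks like we only need to deal with the PARAGRAPH, LIST_ITEM, and TABLE ElementTypes for now, but let's keep their children in mind too (we will find that these are 3 of the 5 types that can have children). This sounds like a job for a recursive function that keeps diving into child elements until we have found and dealt with all of them.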


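So let's try it. The next part may look confusing, but essentially it finds an element, checks whether it has children, then checks whether those have children, and so on. We also need to check whether we run into new ElementTypes that have to be handled...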

function selectionHasWhichTypes() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements();

  rangeElems.forEach(function(rangeElem){
    var elem = rangeElem.getElement(); //Unwrap the RangeElement to get the underlying Element.

    elemsHaveWhatChildElems(elem, elem.getType());

  });
}

function elemsHaveWhatChildElems(elem, typeChain){
  var elemType = elem.getType();
  if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH"){ //Let's see if the element is one of our basic 3. If so, it could have children.
    var numChildren = elem.getNumChildren(); //How many children are there?
    if(numChildren > 0){
      for(var i = 0; i < numChildren; i++){ //Let's go through them.
        var child = elem.getChild(i);
        elemsHaveWhatChildElems(child, typeChain + "." + child.getType()); //Recursion step to look for more children.
      }
    }else{
       Logger.log(typeChain); //Let's log the chain of Parent to Child elements.
    }
  }else{
    Logger.log("*" + typeChain); //Let's mark the new elemTypeChains we have not seen.
  }
}

//Logger OUTPUT:
*PARAGRAPH.TEXT
PARAGRAPH
*PARAGRAPH.HORIZONTAL_RULE
PARAGRAPH
*PARAGRAPH.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
PARAGRAPH
*PARAGRAPH.TEXT
PARAGRAPH
*TABLE.TABLE_ROW
*TABLE.TABLE_ROW
PARAGRAPH

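OK, so each line of the log is a chain of Elements and their children. We have some new ElementTypes (HORIZONTAL_RULE, TABLE_ROW, and TEXT). A chain that consists of only a Paragraph with no children is logged as "PARAGRAPH". We can ignore it because it is an empty line. We can also ignore HORIZONTAL_RULE, since it obviously contains no text.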

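If we reach a TEXT element, that means we can perform our operation on it (for the OP, that would be a translation), which is what happens in the LIST_ITEM and PARAGRAPH chains. However, we still need to deal with TableRow objects (logged as TABLE.TABLE_ROW). These behave like our main 3 elements in that they can have children, so we change if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH") to if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW").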

This gives us another new element in the chain, TableCell (logged as TABLE.TABLE_ROW.TABLE_CELL), which we can again add to our if statement: if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW" || elemType == "TABLE_CELL")
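As the list of container types grows, the chained comparison gets harder to read. One option, sketched here as an assumption rather than as part of the answer itself, is to keep the type names in a lookup object. `CONTAINER_TYPES` and `canHaveChildren` are hypothetical names, and the check relies on the type's string form matching the enum name, which is the same assumption the `==` comparisons above already make.

```javascript
// Hypothetical lookup: the element types this walkthrough treats as containers.
var CONTAINER_TYPES = {
  TABLE: true,
  TABLE_ROW: true,
  TABLE_CELL: true,
  LIST_ITEM: true,
  PARAGRAPH: true
};

// Returns true when the element's type is one of the container types above.
// String(...) is assumed to yield the enum name, e.g. "TABLE".
function canHaveChildren(elem) {
  return Object.prototype.hasOwnProperty.call(CONTAINER_TYPES, String(elem.getType()));
}
```

With this helper the condition would read if(canHaveChildren(elem)), and supporting another container type means adding one key to the lookup.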


Time to see what happens when we handle the Table ElementTypes:

function selectionHasWhichtypeChains() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements();

  rangeElems.forEach(function(rangeElem){
    var elem = rangeElem.getElement(); //Unwrap the RangeElement to get the underlying Element.

    elemsHaveWhatChildElems(elem, elem.getType());

  });
}

function elemsHaveWhatChildElems(elem, typeChain){
  var elemType = elem.getType();
  if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW" || elemType == "TABLE_CELL"){ //Let's see if the element is one of our basic 5. If so, it could have children.
    var numChildren = elem.getNumChildren(); //How many children are there?
    if(numChildren > 0){
      for(var i = 0; i < numChildren; i++){ //Let's go through them.
        var child = elem.getChild(i);
        elemsHaveWhatChildElems(child, typeChain + "." + child.getType()); //Recursion step to look for more children.
      }
    }else{
       Logger.log(typeChain); //Let's log the chain of Parent to Child elements.
    }
  }else{
    Logger.log("*" + typeChain); //Let's mark the new elemTypeChains we have not seen.
  }
}

//Logger OUTPUT:
*PARAGRAPH.TEXT
PARAGRAPH
*PARAGRAPH.HORIZONTAL_RULE
PARAGRAPH
*PARAGRAPH.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
PARAGRAPH
*PARAGRAPH.TEXT
PARAGRAPH
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.HORIZONTAL_RULE
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
PARAGRAPH

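This is great! We have drilled down to the bottom of every parent element and reached either a Text element or an empty Paragraph. From here we can modify our code slightly to add the operation we want to perform while keeping the document's structure: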

function myFunction() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements(); //Get main Elements of selection

  rangeElems.forEach(function(rangeElem){ //Let's run through each to find ALL of their children.
    var elem = rangeElem.getElement(); //We have a RangeElement. Let's get the underlying Element.
    getNestedTextElements(elem, elem.getType()); //Time to go down the rabbit hole.
  });
}

function getNestedTextElements(elem, typeChain){
  var elemType = elem.getType();
  if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW" || elemType == "TABLE_CELL"){ //Let's see if the element is one of our basic 5. If so, it could have children.
    var numChildren = elem.getNumChildren(); //How many children are there?
    if(numChildren > 0){
      for(var i = 0; i < numChildren; i++){ //Let's go through them.
        var child = elem.getChild(i);
        getNestedTextElements(child, typeChain + "." + child.getType()); //Recursion step to look for more children.
      }
    }
  }else if(elemType == "TEXT"){
    //THIS IS WHERE WE CAN PERFORM OUR OPERATIONS ON THE TEXT ELEMENT
    var text = elem.getText();


  }else{
    Logger.log("*" + typeChain); //Let's log any element type we don't handle yet, for future-proofing.
  }
}

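Boom! Done. I know this was a long post, but I broke the solution down into steps to help new Apps Script coders understand the structure of a selection (and, I would guess, of the document body) and how to work through it when that structure gets very complex (many nested elements). I really hope this helps. If anyone sees something that could be improved, please let me know.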
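If it is more convenient to gather the Text elements first and process them afterwards, the same recursion can return an array. The following is a minimal sketch under that assumption: `collectTextElements` is a hypothetical name, not part of the answer above, and it compares type names as strings just as the answer's code does.

```javascript
// Hypothetical variant of the recursive walk: collect Text elements into an
// array instead of processing them in place.
function collectTextElements(elem, found) {
  found = found || [];
  var type = String(elem.getType()); // Assumed to yield the enum name, e.g. "TEXT".
  var containers = ["TABLE", "TABLE_ROW", "TABLE_CELL", "LIST_ITEM", "PARAGRAPH"];

  if (type === "TEXT") {
    found.push(elem); // A leaf we care about.
  } else if (containers.indexOf(type) !== -1) {
    var numChildren = elem.getNumChildren();
    for (var i = 0; i < numChildren; i++) {
      collectTextElements(elem.getChild(i), found); // Recurse into each child.
    }
  }
  // Anything else (e.g. HORIZONTAL_RULE) is skipped.
  return found;
}
```

In the forEach loop, each unwrapped element could be passed to collectTextElements(elem) and the returned Text elements processed in document order, for example by reading getText() on each one.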


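As a note to the OP: be aware that this does not necessarily handle a partial selection of an Element, but that can be handled easily by slightly modifying the first function to check isPartial() on the RangeElement.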
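A rough sketch of that isPartial() check follows. It assumes a partial RangeElement wraps a Text element, so that getStartOffset() and getEndOffsetInclusive() are character positions within getText(). `selectedText` is a hypothetical helper name, and the behaviour should be verified against a real selection before relying on it.

```javascript
// Hypothetical helper: return only the selected characters of a partially
// selected element, or null when the whole element is selected.
function selectedText(rangeElem) {
  if (!rangeElem.isPartial()) {
    return null; // Whole element selected: walk it with getNestedTextElements instead.
  }
  var elem = rangeElem.getElement(); // Assumed to be a Text element here.
  var start = rangeElem.getStartOffset();
  var end = rangeElem.getEndOffsetInclusive(); // Inclusive, hence the +1 below.
  return elem.getText().substring(start, end + 1);
}
```

In myFunction, the loop could call selectedText(rangeElem) first and fall back to getNestedTextElements(elem, elem.getType()) when it returns null.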