How to extract the URLs of hyperlinks from a docx file using Python

Question

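I have been trying to figure out how to get the URLs out of a docx file using Python, but have not found anything. I tried python-docx and python-docx2txt: python-docx seems to extract only the text, while python-docx2txt can extract the text of a hyperlink but not the URL itself.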

Answer 1

小智 7

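I am a Python beginner and had a task to change every hyperlink in a .docx document using Python. Thanks to Kiran's code, which gave me hints to work from, I got it working after some guessing and trial and error. Here is the code I ended up with, which I want to share with other beginners.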

# Python script to change the URL hyperlinks in a .docx file:
### see: /sf/ask/2833303021/

from docx import Document
from docx.opc.constants import RELATIONSHIP_TYPE as RT

print(" This program changes the hyperlinks detected in a Word .docx file \n")

docx_file = input(" Please input the docx filename (without .docx): ")

document = Document(docx_file + ".docx")

# Hyperlink URLs are stored as relationships of the main document part,
# keyed by relationship id.
rels = document.part.rels

for rel in rels:
    if rels[rel].reltype == RT.HYPERLINK:
        print("\n Original link id -", rel, "with detected URL: ", rels[rel]._target)
        new_url = input(" Please input new URL: ")
        # _target is a private attribute, i.e. an implementation detail of python-docx
        rels[rel]._target = new_url

out_file = docx_file + "-out.docx"

document.save(out_file)

print("\n File saved to: ", out_file)

Thank you, Lapyiu Ho
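
For readers who only need to read the URLs, as the question asks, and would rather not depend on python-docx, the same relationship data can be read with the standard library alone. The following is a minimal sketch, not the method used in the answer above: it assumes the hyperlinks live in the main document part, whose relationships are stored in word/_rels/document.xml.rels inside the .docx zip archive. The function name extract_hyperlink_urls is made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Location of the main document part's relationships inside the .docx archive
RELS_PATH = "word/_rels/document.xml.rels"
# XML namespace of the <Relationship> elements
PKG_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Relationship type that marks a hyperlink
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def extract_hyperlink_urls(docx_path):
    """Return {relationship id: URL} for hyperlinks in the main document part.

    extract_hyperlink_urls is a hypothetical helper name, not a library API.
    docx_path may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as archive:
        if RELS_PATH not in archive.namelist():
            return {}  # no relationships file, so no hyperlinks to report
        root = ET.fromstring(archive.read(RELS_PATH))

    urls = {}
    for rel in root.iter(PKG_NS + "Relationship"):
        if rel.get("Type") == HYPERLINK_TYPE:
            urls[rel.get("Id")] = rel.get("Target")
    return urls
```

Calling extract_hyperlink_urls("report.docx"), where report.docx stands in for your own file, returns a dictionary mapping each relationship id to its URL. Hyperlinks in headers, footers, or footnotes are recorded in separate .rels files and are not covered by this sketch.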

Answer 2

小智 1

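from docx.opc.constants import RELATIONSHIP_TYPE as RT

def iter_hyperlink_rels(rels):
    # Look each relationship up by its id and yield only the hyperlinks
    for rel in rels:
        if rels[rel].reltype == RT.HYPERLINK:
            yield rels[rel]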

This will eliminate the error.
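
As a rough illustration of how such a generator might be used to answer the original question, the sketch below collects the URLs from the relationships it yields. It assumes the relationship objects expose reltype, is_external, and target_ref, as python-docx relationship objects do. The helper names hyperlink_urls and urls_in_docx are made up for this example, and the HYPERLINK constant is spelled out locally (it is the value of RT.HYPERLINK) so that the filtering logic can be read without python-docx installed.

```python
# Value of docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK, written out here
# so the filtering logic does not need python-docx to be importable.
HYPERLINK = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def iter_hyperlink_rels(rels):
    """Yield the hyperlink relationships from a mapping of id -> relationship."""
    for rel in rels.values():
        if rel.reltype == HYPERLINK:
            yield rel


def hyperlink_urls(rels):
    """Return the target URLs of all external hyperlink relationships.

    hyperlink_urls is a hypothetical helper name used for illustration.
    """
    return [rel.target_ref for rel in iter_hyperlink_rels(rels) if rel.is_external]


def urls_in_docx(path):
    """Open a .docx with python-docx and list its hyperlink URLs.

    Hypothetical convenience wrapper; python-docx is imported lazily so the
    helpers above stay usable without it.
    """
    from docx import Document

    return hyperlink_urls(Document(path).part.rels)
```

With python-docx installed, urls_in_docx("report.docx"), where report.docx is a placeholder for your own file, would return the list of URLs found in the main document part.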

Archived:	9 years, 7 months ago
Views:	8665
Last activity:	5 years, 4 months ago