I am using the python-pptx module to automatically update values in a PowerPoint file. I can extract all the text in the file with the following code:
from pptx import Presentation

prs = Presentation(path_to_presentation)

# text_runs will be populated with a list of strings,
# one for each text run in presentation
text_runs = []

for slide in prs.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                text_runs.append(run.text)
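This code extracts all the text in the file except the text inside tables, and those are the values I want to update. I tried to adapt some code from this question: "Reading text values in a PowerPoint table using pptx?" but could not get it to work. Any ideas? Thanks.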
This worked for me:

def access_table():
    slide = prs.slides[0]  # first slide
    table = slide.shapes[2].table  # shape index may be anywhere in 0..n
    for r in table.rows:
        s = ""
        for c in r.cells:
            s += c.text_frame.text + " | "
            # to write:
            # c.text_frame.text = "example"
        print(s)
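The hard-coded `slide.shapes[2]` only works when the table happens to be the third shape on the slide. A minimal sketch of a lookup that does not depend on the shape index, assuming `prs` is an object returned by `pptx.Presentation()`; the helper names `find_tables` and `replace_cell_text` are made up for illustration and are not part of python-pptx:

```python
def find_tables(prs):
    """Yield (slide_index, table) for every table in the presentation."""
    for slide_index, slide in enumerate(prs.slides):
        for shape in slide.shapes:
            # has_table is True only for graphic frames that hold a table
            if shape.has_table:
                yield slide_index, shape.table


def replace_cell_text(prs, old_text, new_text):
    """Overwrite every cell whose stripped text equals old_text.

    Returns the number of cells that were changed.
    """
    changed = 0
    for _, table in find_tables(prs):
        for row in table.rows:
            for cell in row.cells:
                if cell.text.strip() == old_text:
                    cell.text = new_text  # replaces the cell's whole content
                    changed += 1
    return changed
```

A typical call would be `replace_cell_text(prs, "old value", "new value")` followed by `prs.save(...)` to a path of your choosing. Assigning to `cell.text` replaces everything in the cell, so run-level formatting inside the cell may not survive.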
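The following code extracts text from the tables in a slide presentation. Text outside the tables is omitted, but you can modify my code to capture text in non-table objects as well.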
from pptx import Presentation


def get_tables_from_presentation(pres):
    """
    The input parameter `pres` should receive
    an object returned by `pptx.Presentation()`

    EXAMPLE:
    ```
    import pptx
    p = "C:\\Users\\user\\Desktop\\power_point_pres.pptx"
    pres = pptx.Presentation(p)
    tables = get_tables_from_presentation(pres)
    ```
    """
    tables = list()
    for slide in pres.slides:
        for shp in slide.shapes:
            if shp.has_table:
                tables.append(shp.table)
    return tables


def iter_to_nonempty_table_cells(tbl):
    """
    :param tbl: 'pptx.table.Table'
                 input table is NOT modified
    :return: iterator over the text of the non-empty cells
    """
    for ridx in range(len(tbl.rows)):
        for cidx in range(len(tbl.columns)):
            cell = tbl.cell(ridx, cidx)
            txt = cell.text.strip()
            if txt:  # keep any non-empty cell, including single-character ones
                yield txt


# establish read path
in_file_path = "C:\\Users\\user\\Desktop\\power_point_pres.pptx"

# Open slide-show presentation
pres = Presentation(in_file_path)

# extract tables from slide-show presentation
tables = get_tables_from_presentation(pres)

for tbl in tables:
    it = iter_to_nonempty_table_cells(tbl)
    # separate the cell texts so they do not run together
    print(" | ".join(it))
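The answer above mentions that its code can be adapted to capture text outside tables too. One way that adaptation might look, as a rough sketch: the generator name `iter_slide_text` and the `"text"` / `"table"` labels are invented here, and `slide` is assumed to be one item from `pres.slides`.

```python
def iter_slide_text(slide):
    """Yield (kind, text) pairs for one slide; kind is "text" or "table"."""
    for shape in slide.shapes:
        if shape.has_text_frame:
            # ordinary text boxes, titles, placeholders with text
            text = shape.text_frame.text.strip()
            if text:
                yield "text", text
        elif shape.has_table:
            # graphic frames that contain a table
            for row in shape.table.rows:
                for cell in row.cells:
                    text = cell.text.strip()
                    if text:
                        yield "table", text
```

Looping `for kind, text in iter_slide_text(slide)` over every slide in `pres.slides` then gives both kinds of text in shape order. Shapes nested inside groups are not visited by this sketch.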
Someone else posted a semi-useful answer to this question, written in pseudocode. They wrote the following:

For r = 1 to tbl.rows.count
    For c = 1 to tbl.columns.count
        tbl.cell(r,c).Shape.Textframe.Text
The problem is that this is not Python.

In Python, syntax such as `For r = 1 to 10` is illegal. We can write this instead:

for r in range(1, 11):
    print(r)

or this:

from itertools import count, takewhile

for r in takewhile(lambda k: k <= 10, count(1)):
    print(r)
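In addition:

Row indexes start at `r = 0`, not `r = 1`.

The top-left cell of the table is `tbl.cell(0, 0)`, not `tbl.cell(1, 1)`.

There is no `.count` attribute on the row or column collections. `For r = 1 to tbl.rows.count` makes no sense because there is no such thing as `tbl.rows.count`.

`tbl.cell(r,c).Shape` does not work because objects instantiated from the class `pptx.table._Cell` have no attribute named `Shape`.

A cell object has the following attributes: `fill`, `is_merge_origin`, `is_spanned`, `margin_bottom`, `margin_left`, `margin_right`, `margin_top`, `merge`, `part`, `span_height`, `span_width`, `split`, `text`, `text_frame`, `vertical_anchor`.

The fix looks like this: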
# ----------------------------------------
# BEGIN SYNTACTICALLY INCORRECT CODE
# ----------------------------------------
# For r = 1 to tbl.rows.count
#     For c = 1 to tbl.columns.count
#         tbl.cell(r,c).Shape.Textframe.Text
# ----------------------------------------
# END SYNTACTICALLY INCORRECT CODE
# ----------------------------------------
# BEGIN SYNTACTICALLY CORRECT CODE
# ----------------------------------------
for r in range(len(tbl.rows)):
    for c in range(len(tbl.columns)):
        print(tbl.cell(r, c).text)
# ----------------------------------------
# END SYNTACTICALLY CORRECT CODE
# ----------------------------------------
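If the cell text is needed as data rather than printed, the same zero-based loop can build a list of rows. A minimal sketch, assuming `tbl` is a python-pptx table object; the helper name `table_to_rows` is hypothetical:

```python
def table_to_rows(tbl):
    """Return the table as a list of rows, each a list of cell strings."""
    n_rows, n_cols = len(tbl.rows), len(tbl.columns)
    return [
        [tbl.cell(r, c).text for c in range(n_cols)]  # indices are 0-based
        for r in range(n_rows)
    ]
```

The result can be indexed the same way as the table itself, so `table_to_rows(tbl)[0][0]` is the text of the top-left cell.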
About `continue`: in your original source code, you have the following for loop:

for shape in slide.shapes:
    if not shape.has_text_frame:
        continue
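As quoted, that for loop does nothing.

The `continue` keyword simply means "move on to the next iteration and jump back to the top of the loop". In the snippet above, however, there is no code after the `continue` and before the end of the loop body. In other words, the loop would carry on to the next shape anyway, without you writing `continue`, because it is already at the end of the loop body. (If the paragraph loop from your original code is nested inside the shape loop, after the `continue`, then the `continue` does matter: it skips that inner loop for shapes without a text frame.)

To learn more about `continue`, consider the following example: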
for k in [1, 2, 3, 4, 5]:
    print("For k ==", k, "we have k % 2 ==", k % 2)
    if not k % 2 == 0:
        continue
    print("For k ==", k, "we got past the `continue`")
The output is:
For k == 1 we have k % 2 == 1
For k == 2 we have k % 2 == 0
For k == 2 we got past the `continue`
For k == 3 we have k % 2 == 1
For k == 4 we have k % 2 == 0
For k == 4 we got past the `continue`
For k == 5 we have k % 2 == 1
A `continue` placed at the very end of a loop body changes nothing: the loop prints exactly the same messages whether or not the keyword is there.
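The effect of `continue` depends on where it sits in the loop body. A small self-contained sketch (the three function names are invented for this illustration) that compares a trailing `continue`, no `continue`, and an early `continue`:

```python
def with_trailing_continue(values):
    seen = []
    for v in values:
        seen.append(v)
        continue  # last statement in the body: nothing is left to skip
    return seen


def without_continue(values):
    seen = []
    for v in values:
        seen.append(v)
    return seen


def with_early_continue(values):
    seen = []
    for v in values:
        if v % 2:
            continue  # skips the append below for odd values
        seen.append(v)
    return seen


print(with_trailing_continue([1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5]
print(without_continue([1, 2, 3, 4, 5]))        # [1, 2, 3, 4, 5]
print(with_early_continue([1, 2, 3, 4, 5]))     # [2, 4]
```

The first two functions behave identically; only the third, where statements follow the `continue`, actually skips work.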
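Your code will miss more text than just the tables; for example, it will not see text in shapes that are part of a group.

For tables, you need to do a couple of things:

Test each shape to see whether its .HasTable property is true (in python-pptx the attribute is spelled `has_table`). If it is, you can use the shape's .Table object (`table` in python-pptx) to extract the text. Conceptually, and this is very much air code: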
For r = 1 to tbl.rows.count
    For c = 1 to tbl.columns.count
        tbl.cell(r,c).Shape.Textframe.Text ' is what you're after
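Both points in this answer, grouped shapes and tables, can be handled in one pass. The following is only a sketch under an assumption that should be checked against your python-pptx version: it treats any shape exposing a `.shapes` collection as a group (a stricter test would compare `shape.shape_type` with `MSO_SHAPE_TYPE.GROUP` from `pptx.enum.shapes`). The names `iter_leaf_shapes` and `iter_all_text` are hypothetical.

```python
def iter_leaf_shapes(shapes):
    """Yield every non-group shape, descending into group shapes."""
    for shape in shapes:
        # Assumption: only group shapes expose a `.shapes` collection.
        sub_shapes = getattr(shape, "shapes", None)
        if sub_shapes is not None:
            yield from iter_leaf_shapes(sub_shapes)
        else:
            yield shape


def iter_all_text(prs):
    """Yield text from text frames and table cells, including grouped shapes."""
    for slide in prs.slides:
        for shape in iter_leaf_shapes(slide.shapes):
            if shape.has_text_frame:
                yield shape.text_frame.text
            elif shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        yield cell.text
```

Calling `list(iter_all_text(prs))` on a presentation opened with `pptx.Presentation()` would then return the text in slide and shape order, with nested groups flattened.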