Shortening large stack traces when using libraries

Lon*_*Rob 6 python stack-trace python-3.x

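I often work with large libraries (e.g. pandas, matplotlib).

This means that exceptions often produce long stack traces.

Since the error is rarely in the library and usually in my own code, in most cases I don't need to see the library's internals.

A couple of common examples:

Pandas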

>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[1,2,3]))
>>> df['b'] # Hint: there _is_ no 'b'

Here I tried to access an unknown key. This simple mistake produces a 28-line stack trace:

Traceback (most recent call last):
  File "an_arbitrary_python\lib\site-packages\pandas\core\indexes\base.py", line 2393, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 'b'

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "an_arbitrary_python\lib\site-packages\pandas\core\frame.py", line 2062, in __getitem__
        return self._getitem_column(key)
      File "an_arbitrary_python\lib\site-packages\pandas\core\frame.py", line 2069, in _getitem_column
        return self._get_item_cache(key)
      File "an_arbitrary_python\lib\site-packages\pandas\core\generic.py", line 1534, in _get_item_cache
        values = self._data.get(item)
      File "an_arbitrary_python\lib\site-packages\pandas\core\internals.py", line 3590, in get
        loc = self.items.get_loc(item)
      File "an_arbitrary_python\lib\site-packages\pandas\core\indexes\base.py", line 2395, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
      File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
      File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
      File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
    KeyError: 'b'

Knowing that I ended up in hashtable_class_helper.pxi is of almost no help to me. I need to know where in my own code I messed up.

Matplotlib

>>> import matplotlib.pyplot as plt
>>> import matplotlib.cm as cm
>>> def foo():
...     plt.plot([1,2,3], cbap=cm.Blues) # cbap is a typo for cmap
...
>>> def bar():
...     foo()
...
>>> bar()

This time there is a typo in my keyword argument, but I still have to read a 25-line stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in bar
  File "<stdin>", line 2, in foo
  File "an_arbitrary_python\lib\site-packages\matplotlib\pyplot.py", line 3317, in plot
    ret = ax.plot(*args, **kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\__init__.py", line 1897, in inner
    return func(ax, *args, **kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_axes.py", line 1406, in plot
    for line in self._get_lines(*args, **kwargs):
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 407, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 395, in _plot_args
    seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 302, in _makeline
    seg = mlines.Line2D(x, y, **kw)
  File "an_arbitrary_python\lib\site-packages\matplotlib\lines.py", line 431, in __init__
    self.update(kwargs)
  File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 885, in update
    for k, v in props.items()]
  File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 885, in <listcomp>
    for k, v in props.items()]
  File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 878, in _update_property
    raise AttributeError('Unknown property %s' % k)
AttributeError: Unknown property cbap

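Here I find out that I ended up on a line in artist.py that raises an AttributeError, and then see directly below it that the AttributeError was indeed raised. In terms of information, that adds very little.

In these trivial interactive examples you might simply say "look at the top of the stack trace, not the bottom", but my silly typo usually happens inside a function, so the line I'm interested in sits somewhere in the middle of these library-cluttered stack traces.

Is there any way to make these stack traces less verbose and help me find the source of the problem, which almost always lies in my own code and not in the libraries I happen to be using?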

Jon*_*tts 4

You can use the traceback module to get better control over how exceptions are printed. For example:

import pandas as pd
import traceback

try:
    df = pd.DataFrame(dict(a=[1,2,3]))
    df['b']

except Exception:
    traceback.print_exc(limit=1)
    exit(1)

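This triggers the standard exception printing mechanism, but shows only the first frame of the stack trace (which is the one you care about in your example). For me this produces: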

Traceback (most recent call last):
  File "t.py", line 6, in <module>
    df['b']
KeyError: 'b'
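To see how `limit` counts frames without needing pandas installed, here is a minimal sketch that uses the standard library's `json` module as a stand-in for the library. The names `load_config` and `main` are hypothetical and exist only for this illustration. A positive `limit` keeps that many entries counted from the outermost call, so it only helps when your own frames come first in the traceback.

```python
import json
import traceback


def load_config(text):
    # "My" code: the mistake is handing malformed JSON to the library.
    return json.loads(text)


def main():
    return load_config("{not valid json")


try:
    main()
except Exception:
    # Full report: my own frames plus the frames inside the json package.
    full = traceback.format_exc()
    # Positive limit: keep only the first 3 entries, counted from the
    # outermost call (the try block, main, load_config).
    short = traceback.format_exc(limit=3)

print(short)
```

On Python 3.5 and later, a negative `limit` keeps the last entries instead, which is the opposite of what is wanted here.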

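Obviously you lose context this way, and that context matters when you are debugging your own code. If we want to get fancier, we can try to devise a test that decides how far down the traceback should go. For example: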

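def find_depth(tb, continue_test):
    """Count the leading traceback frames whose filename passes continue_test."""
    depth = 0

    while tb is not None:
        filename = tb.tb_frame.f_code.co_filename

        # Run the test we're given against the filename
        if not continue_test(filename):
            return depth

        tb = tb.tb_next
        depth += 1

    # Every frame passed the test, so the whole traceback is kept
    return depth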

I don't know how you organize and run your code, but perhaps you could do something like this:

import pandas as pd
import traceback
import sys

def find_depth(tb, continue_test):
    ...  # code from above here

try:
    df = pd.DataFrame(dict(a=[1, 2, 3]))
    df['b']

except Exception:
    traceback.print_exc(limit=find_depth(
        sys.exc_info()[2],
        # The test for which frames we should include
        lambda filename: filename.startswith('my_module')
    ))
    exit(1)
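A depth limit keeps only the leading frames, so it cannot help when your own frames sit in the middle of the traceback. A different technique, not the one used in this answer, is to filter frames by filename with `traceback.extract_tb` and `traceback.format_list`. The sketch below again uses the standard library's `json` module as a stand-in for the library. `format_own_frames`, `load_config`, `main`, and `OWN_FILE` are hypothetical names, and treating "my code" as "this very file" is an assumption made only for the demo.

```python
import json
import traceback


def format_own_frames(exc, is_own_file):
    """Format exc, keeping only frames whose filename passes is_own_file.

    Chained exceptions (__cause__ / __context__) are not followed here.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    kept = [frame for frame in frames if is_own_file(frame.filename)]
    lines = ["Traceback (most recent call last, library frames hidden):\n"]
    lines += traceback.format_list(kept)
    lines += traceback.format_exception_only(type(exc), exc)
    return "".join(lines)


def load_config(text):
    # Stand-in for user code that calls into a library (json here).
    return json.loads(text)


def main():
    return load_config("{not valid json")


# Assumption for this demo: "my code" is whatever lives in this very file.
OWN_FILE = main.__code__.co_filename

try:
    main()
except Exception as exc:
    report = format_own_frames(exc, lambda filename: filename == OWN_FILE)

print(report)
```

In a real project the predicate would more likely test whether the filename lies under the project directory. The same function could presumably also be called from a replacement `sys.excepthook`, so that uncaught exceptions are shortened without wrapping code in try/except.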