Mar*_*tin 7 python numpy pandas
The pandas.DataFrame.to_numpy method has a copy argument with the following documentation:
copy : bool, default False
Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensure that a copy is made, even if not strictly necessary.
Playing around a bit, it seems that calling to_numpy on data that is contiguous in memory and of a single dtype returns a view. But how do I check whether the resulting numpy array shares memory with the data frame it was created from, without modifying the data?
Example of memory sharing:
import pandas as pd
import numpy as np
# some data frame that I expect not to be copied
frame = pd.DataFrame(np.arange(144).reshape(12,12))
array = frame.to_numpy()
array[:] = 0
print(frame)
# Prints:
# 0 1 2 3 4 5 6 7 8 9 10 11
# 0 0 0 0 0 0 0 0 0 0 0 0 0
# 1 0 0 0 0 0 0 0 0 0 0 0 0
# 2 0 0 0 0 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0 0 0 0 0 0
# 8 0 0 0 0 0 0 0 0 0 0 0 0
# 9 0 0 0 0 0 0 0 0 0 0 0 0
# 10 0 0 0 0 0 0 0 0 0 0 0 0
# 11 0 0 0 0 0 0 0 0 0 0 0 0
Example not sharing memory:
import pandas as pd
import numpy as np
# some data frame that I expect to be copied
types = [int, str, float]
frame = pd.DataFrame({
    i: [types[i % len(types)](value) for value in col]
    for i, col in enumerate(np.arange(144).reshape(12, 12).T)
})
array = frame.to_numpy()
array[:] = 0
print(frame)
# Prints:
# 0 1 2 3 4 5 6 7 8 9 10 11
# 0 0 12 24.0 36 48 60.0 72 84 96.0 108 120 132.0
# 1 1 13 25.0 37 49 61.0 73 85 97.0 109 121 133.0
# 2 2 14 26.0 38 50 62.0 74 86 98.0 110 122 134.0
# 3 3 15 27.0 39 51 63.0 75 87 99.0 111 123 135.0
# 4 4 16 28.0 40 52 64.0 76 88 100.0 112 124 136.0
# 5 5 17 29.0 41 53 65.0 77 89 101.0 113 125 137.0
# 6 6 18 30.0 42 54 66.0 78 90 102.0 114 126 138.0
# 7 7 19 31.0 43 55 67.0 79 91 103.0 115 127 139.0
# 8 8 20 32.0 44 56 68.0 80 92 104.0 116 128 140.0
# 9 9 21 33.0 45 57 69.0 81 93 105.0 117 129 141.0
# 10 10 22 34.0 46 58 70.0 82 94 106.0 118 130 142.0
# 11 11 23 35.0 47 59 71.0 83 95 107.0 119 131 143.0
You can use numpy.shares_memory:
# Your first example
print(np.shares_memory(array, frame)) # True, they are sharing memory
# Your second example
print(np.shares_memory(array2, frame2)) # False, they are not sharing memory
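There is also numpy.may_share_memory, which is faster but can only be used to confirm that things do not share memory (it only checks whether the memory bounds overlap), so strictly speaking it does not answer the question. Read this to learn about the differences.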
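To make the difference concrete, here is a small sketch using plain numpy arrays (the names base, evens and odds are made up for illustration). The two strided views interleave in memory, so their address ranges overlap even though they have no element in common:

```python
import numpy as np

base = np.arange(10)
evens = base[::2]   # elements at positions 0, 2, 4, 6, 8
odds = base[1::2]   # elements at positions 1, 3, 5, 7, 9

# Bounds-only check: the address ranges overlap, so this reports True.
print(np.may_share_memory(evens, odds))

# Exact check: no single element belongs to both views, so this reports False.
print(np.shares_memory(evens, odds))
```

So a False from may_share_memory is conclusive, while a True only means "possibly".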
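Be careful when using these numpy functions with pandas data structures:
np.shares_memory(frame, frame) returns True for the first example but False for the second, probably because the data frame's __array__ method in the second example creates a copy behind the scenes.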
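One consequence, as far as I can tell, is that passing a whole mixed-dtype frame to np.shares_memory can hide sharing that does exist at the column level. The sketch below assumes that Series.to_numpy() returns a view for a plain integer column (which is what I would expect, but it is not guaranteed by the documentation); the names mixed and col are made up for illustration:

```python
import numpy as np
import pandas as pd

# Two dtypes, so converting the whole frame to one array has to copy.
mixed = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})

# Assumption: for a plain int column this is a view on the frame's data.
col = mixed["a"].to_numpy()

# Comparing against the column itself detects the shared buffer.
print(np.shares_memory(col, mixed["a"]))  # True

# Comparing against the whole frame goes through a freshly built copy.
print(np.shares_memory(col, mixed))       # False
```

If that matters for your use case, comparing against the individual columns seems safer than comparing against the frame as a whole.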