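I want to perform a self-join on a Pandas DataFrame so that certain rows are appended to the original rows. Each row has a marker 'i' indicating which row should be appended to it on the right.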
import pandas as pd
d = pd.DataFrame(['A', 'B', 'C'], columns=['some_col'])
d['i'] = [2, 1, 1]
In [17]: d
Out[17]:
  some_col  i
0        A  2
1        B  1
2        C  1
Desired output:
  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B
That is, row 2 is appended to row 0, row 1 to row 1, and row 1 to row 2 (as indicated by i).
My idea for how to do this was
pd.merge(d, d, left_index = True, right_on = 'i', how = 'left')
but it produces something else entirely. How do I do this correctly?
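One possible approach, offered as a sketch rather than the definitive answer: the attempt above matches the left frame's index against the right frame's 'i' column, which looks like the reverse of what is wanted. Using `DataFrame.join` with `on='i'` looks up each row's 'i' value in the index of the right-hand frame instead. The variable name `result` is only illustrative, and this assumes the default integer index so that the values in 'i' are valid index labels.

```python
import pandas as pd

d = pd.DataFrame(['A', 'B', 'C'], columns=['some_col'])
d['i'] = [2, 1, 1]

# Look up each row's 'i' value in the index of the right-hand frame.
# Only 'some_col' is kept on the right so that 'i' is not duplicated;
# rsuffix renames the overlapping right-hand column to 'some_col_y'.
result = d.join(d[['some_col']], on='i', rsuffix='_y')
print(result)
```

If only the single extra column is needed, `d['i'].map(d['some_col'])` should give the same values without a join.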
Is there a better way to count how many times a given row occurs in a NumPy 2D array?
import numpy as np

def get_count(array_2d, row):
    count = 0
    # iterate over rows, compare each one element-wise with the target row
    for r in array_2d:
        if np.equal(r, row).all():
            count += 1
    return count

# let's make sure it works
array_2d = np.array([[1, 2], [3, 4]])
row = np.array([1, 2])
count = get_count(array_2d, row)
assert count == 1
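A vectorized alternative, as a minimal sketch: broadcasting can compare the target row against every row at once, so the Python loop is not needed. The function name `count_rows` is made up for illustration, and the sketch assumes `row` has the same length as the rows of `array_2d` and that exact equality is wanted (for floats, `np.isclose` may be more appropriate).

```python
import numpy as np

def count_rows(array_2d, row):
    # array_2d == row broadcasts `row` across all rows;
    # all(axis=1) is True only where every element of a row matches.
    matches = (array_2d == row).all(axis=1)
    return int(np.count_nonzero(matches))

array_2d = np.array([[1, 2], [3, 4], [1, 2]])
row = np.array([1, 2])
count = count_rows(array_2d, row)
print(count)
```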
I want to translate arbitrary integers in a NumPy array to a contiguous range 0...n, like this:
source: [2 3 4 5 4 3]
translating [2 3 4 5] -> [0 1 2 3]
target: [0 1 2 3 2 1]
There must be a better way than the following:
import numpy as np

def translate_ids(source, source_ids, target_ids):
    "translate arbitrary integers in the source array to contiguous range 0...n"
    target = source.copy()
    for i in range(len(source_ids)):
        x = source_ids[i]
        # boolean mask of the positions holding the i-th source id
        x_i = source == x
        target[x_i] = target_ids[i]
    return target

source = np.array([2, 3, 4, 5, 4, 3])
source_ids …
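One way this might be done without the loop, sketched under the assumption that the new ids should follow the sorted order of the distinct source values: `np.unique` with `return_inverse=True` returns, for each element, its position in the sorted array of unique values. The names `unique_ids` and `target` below are illustrative.

```python
import numpy as np

source = np.array([2, 3, 4, 5, 4, 3])

# unique_ids holds the sorted distinct values; target holds, for each
# element of source, the index of its value within unique_ids.
unique_ids, target = np.unique(source, return_inverse=True)
print(unique_ids)
print(target)
```

Note that the ids are assigned by sorted value, not by order of first appearance; the two coincide for the example above but would differ for a source such as [5, 2, 5].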