Consider the dataframe containing N columns as shown below. Each entry is an 8-bit integer.
|----------|----------|----------|
| Column 1 | Column 2 | Column N |
|----------|----------|----------|
| 4        | 8        | 13       |
|----------|----------|----------|
| 0        | 32       | 16       |
|----------|----------|----------|
I'd like to create a new column with 8-bit entries in each row by randomly sampling each bit of data from the remaining columns. So, the resulting dataframe would look like:
|-----------|---------------|--------------|---------------|
| Column 1  | Column 2      | Column N     | Sampled       |
|-----------|---------------|--------------|---------------|
| 4 = (100) | 8 = (1000)    | 13 = (1101)  | 5 = (0101)    |
|-----------|---------------|--------------|---------------|
| 0 = (0)   | 32 = (100000) | 16 = (10000) | 48 = (110000) |
|-----------|---------------|--------------|---------------|
The first entry in the "Sampled" column was created by picking, for each bit position, one bit from the bits at that position across all columns. For example, the LSB=1 in the first entry was chosen from {0 (LSB from col 1), 0 (LSB from col 2), 1 (LSB from col N)}, and so on for the other positions.
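To make the requirement concrete, here is a minimal single-row sketch in plain Python. The function name `sample_bits` and its parameters are hypothetical, chosen only for illustration; it is meant as a naive reference for what "sampling each bit" means, not as the efficient solution being asked for.

```python
import random

def sample_bits(values, n_bits=8, rng=random):
    """Build one n_bits-wide integer by taking, for each bit position,
    the bit of one randomly chosen value from `values`."""
    result = 0
    for pos in range(n_bits):
        source = rng.choice(values)              # pick a column for this bit position
        result |= ((source >> pos) & 1) << pos   # copy that column's bit at `pos`
    return result

row = [4, 8, 13]  # first row of the example table
sampled = sample_bits(row)
print(sampled, format(sampled, "08b"))
```

Every bit of the result can only be set if at least one column has that bit set, so for this row the result never has bits outside 13 = (1101).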
This is similar to this question but instead of each entry being sampled, each bit needs to be sampled.
What is an efficient way of achieving this, considering the dataframe has a large number of rows and columns? From the similar question, I assume we need a lookup + sample to choose the entry and another sample to choose the bits?
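The logic is the same as in the earlier sampling example, but here I convert between decimal and binary twice, unnest the bits, and then join the result back.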
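import numpy as np
import pandas as pd
# One list of 8 bit characters per cell (on pandas >= 2.1, DataFrame.map replaces applymap)
df1 = df.applymap(lambda x: list('{0:08b}'.format(x)))
# One row per bit position; unnesting() is the helper defined below
df1 = unnesting(df1, df1.columns.tolist())
# For every bit row, pick a random source column, then rebuild each 8-character bit string
s = np.random.randint(0, df1.shape[1], df1.shape[0])
yourcol = pd.Series(df1.values[np.arange(len(df1)), s]).groupby(df1.index).apply(''.join)
df['Sampled'] = yourcol.map(lambda x: int(x, 2))
df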
Out[268]:
   c1  c2  cn  Sampled
0   4   8  13       12
1   0  32  16       16
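def unnesting(df, explode):
    # Repeat each index label once per list element (8 bits per value here)
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column into one long column
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the columns that were not exploded (keyword form works on current pandas)
    return df1.join(df.drop(columns=explode), how='left')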
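For a frame with many rows and columns, the string round trip above may become the bottleneck. As an alternative (not part of the original answer), here is a sketch that stays in integer arithmetic: for each of the 8 bit positions it draws one random column per row and masks out that bit. The function name `sample_bits_per_row` and the `seed` parameter are hypothetical, and the sketch assumes every column holds values that fit in an unsigned 8-bit integer.

```python
import numpy as np
import pandas as pd

def sample_bits_per_row(df, seed=None):
    """Return a uint8 Series whose bits are each taken from a randomly
    chosen column of the same row (assumes all values fit in uint8)."""
    rng = np.random.default_rng(seed)
    vals = df.to_numpy(dtype=np.uint8)
    n_rows, n_cols = vals.shape
    rows = np.arange(n_rows)
    out = np.zeros(n_rows, dtype=np.uint8)
    for pos in range(8):                                # only 8 passes, one per bit
        cols = rng.integers(0, n_cols, size=n_rows)     # random source column per row
        out |= vals[rows, cols] & np.uint8(1 << pos)    # keep just bit `pos` of that value
    return pd.Series(out, index=df.index, name="Sampled")

df = pd.DataFrame({"c1": [4, 0], "c2": [8, 32], "cn": [13, 16]})
df["Sampled"] = sample_bits_per_row(df, seed=0)
print(df)
```

The loop runs over bit positions rather than rows, so the work per pass is a single fancy-indexing lookup over all rows, and the extra memory is proportional to the number of rows rather than rows × columns × 8.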