Adding new column with most popular string value in each row in Pandas DataFrame

Ger*_*rry 2 string dataframe python-3.x pandas

I have a very large pandas DataFrame df (15 million rows); a sample is given below:

import pandas as pd
df = pd.DataFrame({'a':['ar', 're' ,'rw', 'rew', 'are'], 'b':['gh', 're', 'ww', 'rew', 'all'], 'c':['ar', 're', 'ww', '', 'different']})
df
     a    b          c
0   ar   gh         ar
1   re   re         re
2   rw   ww         ww
3  rew  rew         
4  are  all  different

I want to add another column d holding the most common value among the other columns (a, b and c in this case; the actual DataFrame could have 4 or 5 columns). If every value in a row is different, d should be empty. The output should look like this:

     a    b          c     d
0   ar   gh         ar    ar
1   re   re         re    re
2   rw   ww         ww    ww
3  rew  rew              rew
4  are  all  different    

What is the most efficient way to do this without a lambda function? With 15 million rows, the lambda approach is too slow (45 minutes to an hour).
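The question does not show the lambda it tried. As an assumption, a typical row-wise version looks like the sketch below; it is mainly useful as a slow but easy-to-check reference for the faster answers. The helper name most_common_or_blank is hypothetical, and it returns an empty string (not NaN) for rows where every value is different, to match the expected output above.

```python
import pandas as pd
from collections import Counter

df = pd.DataFrame({'a': ['ar', 're', 'rw', 'rew', 'are'],
                   'b': ['gh', 're', 'ww', 'rew', 'all'],
                   'c': ['ar', 're', 'ww', '', 'different']})

def most_common_or_blank(row):
    # Hypothetical helper (not from the question).
    # most_common(1) returns the top (value, count) pair for the row.
    value, count = Counter(row).most_common(1)[0]
    # A top count of 1 means no value repeats: leave the cell empty.
    return value if count > 1 else ''

# One Python call per row: simple, but this per-row overhead is what
# makes the approach slow on millions of rows.
df['d'] = df[['a', 'b', 'c']].apply(most_common_or_blank, axis=1)
print(df)
```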

ank*_*_91 5

If I understand correctly, you need:

m = df.mode(axis=1).iloc[:, 0]                   # first (smallest) mode of each row
df['d'] = m.mask(df.nunique(1).eq(df.shape[1]))  # NaN where all values in the row differ
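It helps to look at the intermediate result. DataFrame.mode(axis=1) returns the modes of each row in sorted order, one per column, so a row with a tie produces several columns and the other rows are padded with NaN; .iloc[:, 0] keeps only the first. This sketch uses only the calls from the answer above, on the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': ['ar', 're', 'rw', 'rew', 'are'],
                   'b': ['gh', 're', 'ww', 'rew', 'all'],
                   'c': ['ar', 're', 'ww', '', 'different']})

# One row per input row; tied modes spill into extra columns (0, 1, 2, ...)
# in sorted order, with NaN where a row has fewer modes.
modes = df.mode(axis=1)
print(modes)

# Column 0 holds the mode, or the smallest tied value when there is a tie.
first_mode = modes.iloc[:, 0]

# If the number of distinct values equals the number of columns,
# no value repeats in that row, so blank it out.
all_different = df.nunique(axis=1).eq(df.shape[1])
d = first_mode.mask(all_different)
print(d)
```

With three columns a tie can only occur when all three values differ, and those rows are masked anyway. With 4 or 5 columns, a row such as x, x, y, y is not masked, and the answer returns the lexicographically smallest tied value, 'x'.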

For a faster alternative:

import numpy as np
df['d'] = np.where(df.nunique(1).eq(df.shape[1]), np.nan, df.mode(axis=1).iloc[:, 0])
     a    b          c    d
0   ar   gh         ar   ar
1   re   re         re   re
2   rw   ww         ww   ww
3  rew  rew             rew
4  are  all  different  NaN
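In the pandas versions I am familiar with, DataFrame.mode(axis=1) is itself built on a row-wise apply, so on 15 million rows it may not beat the lambda by much. A different technique, not taken from the answer, is to count for each column how many columns in the same row hold an equal value, using whole-array NumPy comparisons. With only 3 to 5 columns, that is a handful of vectorized passes rather than one Python call per row. This is an unbenchmarked sketch: row_mode_or_nan is a hypothetical helper name, and ties are broken by column order (leftmost wins), not by the sorted order that mode uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['ar', 're', 'rw', 'rew', 'are'],
                   'b': ['gh', 're', 'ww', 'rew', 'all'],
                   'c': ['ar', 're', 'ww', '', 'different']})

def row_mode_or_nan(frame, cols):
    """Hypothetical helper: most frequent value per row across `cols`.

    Returns NaN when no value repeats. Ties go to the leftmost column,
    unlike DataFrame.mode, which lists tied values in sorted order.
    """
    arr = frame[cols].to_numpy()                  # shape (n_rows, k), object dtype
    # counts[r, j] = how many columns in row r equal the value in column j.
    # k is small (3 to 5), so int8 is enough and keeps memory down.
    counts = np.zeros(arr.shape, dtype=np.int8)
    for i in range(arr.shape[1]):                 # loops over columns, not rows
        counts += arr == arr[:, [i]]              # compare column i against all columns
    best = counts.argmax(axis=1)                  # leftmost column with the top count
    winner = arr[np.arange(len(arr)), best]
    # A top count of 1 means every value in the row is different.
    return pd.Series(np.where(counts.max(axis=1) > 1, winner, np.nan),
                     index=frame.index)

df['d'] = row_mode_or_nan(df, ['a', 'b', 'c'])
print(df)
```

Passing the column list explicitly also means d is not counted if the function is run again after d exists. The number of comparisons grows with the square of the number of columns, which is cheap for 3 to 5 columns but would not scale to wide frames.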