Python Scatter Plot - with smooth Line

Vai*_*ngh 0 python regression scatter-plot dataframe pandas

I have a dataframe as below (obtained after lot of preprocessing)

Please find dataframe

d = {'token': {361: '180816_031', 119: '180816_031', 101: '180816_031', 135: '180816_031', 292: '180816_031',
           133: '180816_031', 99: '180816_031', 270: '180816_031', 19: '180816_031', 382: '180816_031',
           414: '180816_031', 267: '180816_031', 218: '180816_031', 398: '180816_031', 287: '180816_031',
           155: '180816_031', 392: '180816_031', 265: '180816_031', 239: '180816_031', 237: '180816_031'},
 'station': {361: 'deneb', 119: 'callisto', 101: 'callisto', 135: 'callisto', 292: 'callisto', 133: 'deneb',
             99: 'callisto', 270: 'callisto', 19: 'deneb', 382: 'callisto', 414: 'deneb', 267: 'callisto',
             218: 'deneb', 398: 'callisto', 287: 'deneb', 155: 'deneb', 392: 'deneb', 265: 'callisto',
             239: 'callisto', 237: 'callisto'},
 'cycle_number': {361: 'cycle09', 119: 'cycle06', 101: 'cycle04', 135: 'cycle01', 292: 'cycle04', 133: 'cycle05',
                  99: 'cycle06', 270: 'cycle07', 19: 'cycle04', 382: 'cycle08', 414: 'cycle04', 267: 'cycle10',
                  218: 'cycle07', 398: 'cycle08', 287: 'cycle09', 155: 'cycle08', 392: 'cycle06', 265: 'cycle02',
                  239: 'cycle09', 237: 'cycle07'},
 'variable': {361: 'adj_high_quality_reads', 119: 'short_pass', 101: 'short_pass', 135: 'cell_mask_bilayers_sum',
              292: 'adj_active_polymerase', 133: 'cell_mask_bilayers_sum', 99: 'short_pass',
              270: 'adj_active_polymerase', 19: 'Unnamed: 0', 382: 'adj_high_quality_reads',
              414: 'num_align_high_quality_reads', 267: 'adj_active_polymerase', 218: 'adj_single_pores',
              398: 'num_align_high_quality_reads', 287: 'adj_active_polymerase', 155: 'cell_mask_bilayers_sum',
              392: 'num_align_high_quality_reads', 265: 'adj_active_polymerase', 239: 'adj_single_pores',
              237: 'adj_single_pores'},
 'value': {361: 99704.0, 119: 2072785.0, 101: 2061059.0, 135: 1682208.0, 292: 675306.0, 133: 1714292.0,
           99: 2072785.0, 270: 687988.0, 19: 19.0, 382: np.nan, 414: 285176.0, 267: 86914.0, 218: 948971.0,
           398: 405196.0, 287: 137926.0, 155: 1830032.0, 392: 480081.0, 265: 951689.0, 239: 681452.0,
           237: 882671.0}}
Run Code Online (Sandbox Code Playgroud)

Data:

          token   station cycle_number                      variable  \
19   180816_031     deneb      cycle04                    Unnamed: 0   
99   180816_031  callisto      cycle06                    short_pass   
101  180816_031  callisto      cycle04                    short_pass   
119  180816_031  callisto      cycle06                    short_pass   
133  180816_031     deneb      cycle05        cell_mask_bilayers_sum   
135  180816_031  callisto      cycle01        cell_mask_bilayers_sum   
155  180816_031     deneb      cycle08        cell_mask_bilayers_sum   
218  180816_031     deneb      cycle07              adj_single_pores   
237  180816_031  callisto      cycle07              adj_single_pores   
239  180816_031  callisto      cycle09              adj_single_pores   
265  180816_031  callisto      cycle02         adj_active_polymerase   
267  180816_031  callisto      cycle10         adj_active_polymerase   
270  180816_031  callisto      cycle07         adj_active_polymerase   
287  180816_031     deneb      cycle09         adj_active_polymerase   
292  180816_031  callisto      cycle04         adj_active_polymerase   
361  180816_031     deneb      cycle09        adj_high_quality_reads   
382  180816_031  callisto      cycle08        adj_high_quality_reads   
392  180816_031     deneb      cycle06  num_align_high_quality_reads   
398  180816_031  callisto      cycle08  num_align_high_quality_reads   
414  180816_031     deneb      cycle04  num_align_high_quality_reads   

         value  
19        19.0  
99   2072785.0  
101  2061059.0  
119  2072785.0  
133  1714292.0  
135  1682208.0  
155  1830032.0  
218   948971.0  
237   882671.0  
239   681452.0  
265   951689.0  
267    86914.0  
270   687988.0  
287   137926.0  
292   675306.0  
361    99704.0  
382        NaN  
392   480081.0  
398   405196.0  
414   285176.0  
Run Code Online (Sandbox Code Playgroud)

I am trying to create scatterplot with smooth line

fig,ax = plt.subplots()
fig.set_size_inches(16,4)

#to get different colors for each of the `variable` value assign the variable to hue
g2=sns.lmplot(x='cycle_number',y='value',data=df, hue='variable',  size=4, aspect=5)
Run Code Online (Sandbox Code Playgroud)

This code gives plots only one value for scatter plot, however my expected output is as below

Expected Output:

在此处输入图片说明

TRIES RESULT

TRY 1

I tried to create bar plot (with some help) & i was successful but with scatter plot I could not do it

Below code to convert same to bar

df1 = df.groupby(['token','variable']).agg({'value': 'mean'})
df1.reset_index(inplace=True)
df1.sort_values('value',inplace=True,ascending=False)
fig,ax = plt.subplots()
fig.set_size_inches(16,8)
#to get different colors for each of the variable assign the variable to hue
g=sns.barplot(x='token',y='value',data=df1, hue='variable',ax=ax)
#Code for to put legend outside the plot
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# Adding respective values to the top of each bar
for p in ax.patches: 
    ax.annotate("%d" % p.get_height(), (p.get_x() + p.get_width() / 2, p.get_height()),
                ha='center', va='center', fontsize=11, color='black', xytext=(0, 10), 
                textcoords='offset points',fontweight='bold')
plt.show()
Run Code Online (Sandbox Code Playgroud)

TRY 2

g2=sns.lmplot(x='cycle_number',y='value',data=df), this gives error 
ValueError: could not convert string to float: 'cycle10'
Run Code Online (Sandbox Code Playgroud)

I know what the error means here but I am feeling helpless trying to replicate to output code

TRY 3:

sns.lmplot('cycle_number', 'value', data=df, hue='variable', fit_reg=False)
Run Code Online (Sandbox Code Playgroud)

OUTPUT GENERATED: Blank GRID

Nih*_*hal 6

用:

sns.pointplot('cycle_number', 'value', data=df, hue='variable')
Run Code Online (Sandbox Code Playgroud)

在此处输入图片说明

文档: https ://seaborn.pydata.org/generated/seaborn.pointplot.html

使用这个 VS 预期输出生成的输出

在此处输入图片说明

预期输出

尝试这个:

df = pd.DataFrame(d)

df['cycle_number'] = df['cycle_number'].str.replace('cycle', '')
df['cycle_number'] = df['cycle_number'].apply(pd.to_numeric)
print(df)

fig, ax = plt.subplots()
fig.set_size_inches(16, 4)
# sns.pointplot('cycle_number', 'value', data=df, hue='variable', err_style="bars", ci=68)
sns.lmplot('cycle_number', 'value', data=df, hue='variable', ci=None, order=2, truncate=True)
# use order = 5 to see more curve
Run Code Online (Sandbox Code Playgroud)

输出 order=2

在此处输入图片说明

根据共享的最新代码输出( for order=2

Final Output - 图例与图形区域重叠

输出 4(用于order=5)*: USING ORDER =5

除了图例与绘图区域重叠之外,图形曲线非常好。