zac*_*ach 31 python matplotlib pandas
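One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and apply color associated with a specific column, and I would love to be able to do this with python/pandas/matplotlib. Are there any convenience functions that people use to map colors to values using pandas dataframes and Matplotlib?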
##ggplot scatterplot example with R dataframe, `df`, colored by col3
ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()
##ideal situation with pandas dataframe, 'df', where colors are chosen by col3
df.plot(x=col1,y=col2,color=col3)
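EDIT: Thank you for your responses, but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I am thinking of will assign colors based on this value.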
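import numpy as np
import pandas as pd
# size=10 draws one value per row; without it every row would share a single value
df = pd.DataFrame({'Height': np.random.normal(10, size=10),
                   'Weight': np.random.normal(10, size=10),
                   'Gender': ["Male","Male","Male","Male","Male",
                              "Female","Female","Female","Female","Female"]})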
Pau*_*l H 52
Seaborn handles this use case splendidly:
import numpy
import pandas
from matplotlib import pyplot
import seaborn
seaborn.set(style='ticks')
numpy.random.seed(0)
N = 37
_genders = ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})
fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
fg.map(pyplot.scatter, 'Weight (kg)', 'Height (cm)').add_legend()
Which immediately outputs:
In this case, I would use matplotlib directly.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
    fig, ax = plt.subplots()
    categories = np.unique(df[catcol])
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))

    df["Color"] = df[catcol].apply(lambda x: colordict[x])
    ax.scatter(df[xcol], df[ycol], c=df.Color)
    return fig

if 1:
    df = pd.DataFrame({'Height': np.random.normal(size=10),
                       'Weight': np.random.normal(size=10),
                       'Gender': ["Male","Male","Unknown","Male","Male",
                                  "Female","Did not respond","Unknown","Female","Female"]})
    fig = dfScatter(df)
    fig.savefig('fig1.png')
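This gives me:
As far as I know, that color column can be any matplotlib-compatible color (RGBA tuples, HTML names, hex values, etc.).
I'm having trouble getting anything but numerical values to work with the colormaps.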
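As a rough sketch of the point about matplotlib-compatible colors, one option is to map each category to an explicit color and call scatter once per group, which also yields a legend. The `palette` dict and its color choices are hypothetical, picked only for illustration, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Height': rng.normal(size=10),
    'Weight': rng.normal(size=10),
    'Gender': ["Male", "Male", "Unknown", "Male", "Male",
               "Female", "Did not respond", "Unknown", "Female", "Female"],
})

# Hypothetical palette: a named color, a hex value, and an RGBA tuple
# are all accepted as matplotlib color specifications.
palette = {
    'Male': 'tab:blue',
    'Female': '#d62728',
    'Unknown': (0.5, 0.5, 0.5, 1.0),
    'Did not respond': 'gold',
}

fig, ax = plt.subplots()
# One scatter call per category so each group gets its own legend entry.
for name, group in df.groupby('Gender'):
    ax.scatter(group['Height'], group['Weight'], color=palette[name], label=name)
ax.set(xlabel='Height', ylabel='Weight')
ax.legend(title='Gender')
```

Calling `fig.savefig(...)` afterwards writes the figure to disk, as in the function above. Passing `color=` rather than `c=` avoids any attempt to run the values through a colormap.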
Actually, you can use ggplot for Python:
from ggplot import *
import numpy as np
import pandas as pd
df = pd.DataFrame({'Height': np.random.randn(10),
                   'Weight': np.random.randn(10),
                   'Gender': ["Male","Male","Male","Male","Male",
                              "Female","Female","Female","Female","Female"]})
ggplot(aes(x='Height', y='Weight', color='Gender'), data=df) + geom_point()
https://seaborn.pydata.org/generated/seaborn.scatterplot.html
import numpy
import pandas
import seaborn as sns
numpy.random.seed(0)
N = 37
_genders = ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})
sns.scatterplot(data=df, x='Height (cm)', y='Weight (kg)', hue='Gender')
小智 6
You can use the color parameter of the plot method to define the color you want for each column. For example:
from pandas import DataFrame
data = DataFrame({'a':range(5),'b':range(1,6),'c':range(2,7)})
colors = ['yellowgreen','cyan','magenta']
data.plot(color=colors)

You can use a color name or a hex color code, such as '#000000' for black. All defined color names are listed in matplotlib's colors.py file. Below is a link to that file in matplotlib's GitHub repository.
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/colors.py
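Rather than reading the source file, the same names can be inspected from Python. This is a small sketch using helpers from `matplotlib.colors`; the string `'not-a-color'` is a made-up invalid value for illustration.

```python
from matplotlib import colors as mcolors

# Named colors resolve to hex codes; to_hex returns a lowercase '#rrggbb' string.
print(mcolors.to_hex('black'))
print(mcolors.to_hex('yellowgreen'))

# CSS4_COLORS maps each CSS/HTML color name to its hex code.
names = sorted(mcolors.CSS4_COLORS)
print(len(names), names[:5])

# is_color_like reports whether a value is a valid matplotlib color spec.
print(mcolors.is_color_like('#000000'), mcolors.is_color_like('not-a-color'))
```

`is_color_like` is a convenient guard before passing user-supplied color strings to `data.plot(color=...)`.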
[...] datetime dtype.
pandas.DataFrame.plot and matplotlib.pyplot.scatter can take a c or color parameter, which must be a color, a sequence of colors, or a sequence of numbers.
Versions: python 3.8, pandas 1.3.1, and matplotlib 3.4.2.
See the matplotlib colormap documentation for other cmap options.
'Date' is already a datetime64[ns] dtype from DataReader.
Install with conda install -c anaconda pandas-datareader or pip install pandas-datareader, depending on your environment.
import pandas as pd
import matplotlib.pyplot as plt
import pandas_datareader as web # for data; not part of pandas
ticker = 'amzn'  # name must match the variable used in the lines below
df = web.DataReader(ticker, data_source='yahoo', start='2018-01-01', end='2021-01-01').reset_index()
df['ticker'] = ticker
        Date        High          Low         Open        Close   Volume    Adj Close ticker
0 2018-01-02  1190.00000  1170.510010  1172.000000  1189.010010  2694500  1189.010010   amzn
1 2018-01-03  1205.48999  1188.300049  1188.300049  1204.199951  3108800  1204.199951   amzn
c as a number
pandas.DataFrame.plot
df.Date.dt.month creates a pandas.Series of month numbers
ax = df.plot(kind='scatter', x='Date', y='High', c=df.Date.dt.month, cmap='Set3', figsize=(11, 4), title='c parameter as a month number')
plt.show()
matplotlib.pyplot.scatter
fig, ax = plt.subplots(figsize=(11, 4))
ax.scatter(data=df, x='Date', y='High', c=df.Date.dt.month, cmap='Set3')
ax.set(title='c parameter as a month number', xlabel='Date', ylabel='High')
plt.show()
c as a datetime dtype
pandas.DataFrame.plot
ax = df.plot(kind='scatter', x='Date', y='High', c='Date', cmap='winter', figsize=(11, 4), title='c parameter as a datetime dtype')
plt.show()
matplotlib.pyplot.scatter
fig, ax = plt.subplots(figsize=(11, 4))
ax.scatter(data=df, x='Date', y='High', c='Date', cmap='winter')
ax.set(title='c parameter as a datetime dtype', xlabel='Date', ylabel='High')
plt.show()
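The examples above depend on downloading price data. For trying the idea offline, here is a minimal sketch that swaps in synthetic data (an assumption, not real prices) and converts the dates to numbers explicitly with `matplotlib.dates.date2num`, so that c is plainly a sequence of numbers. The Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (illustrative only).
rng = np.random.default_rng(0)
dates = pd.date_range('2018-01-01', '2020-12-31', freq='D')
df = pd.DataFrame({'Date': dates,
                   'High': 1000 + rng.normal(size=len(dates)).cumsum()})

# Convert datetimes to floats (days since matplotlib's epoch) so that
# c is an ordinary numeric sequence that the colormap can scale.
date_nums = mdates.date2num(df['Date'].to_numpy())

fig, ax = plt.subplots(figsize=(11, 4))
sc = ax.scatter(df['Date'], df['High'], c=date_nums, cmap='winter', s=5)
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label('Date (matplotlib day number)')
ax.set(title='c parameter as a numeric date', xlabel='Date', ylabel='High')
```

Replacing `date_nums` with `df['Date'].dt.month` gives the month-number coloring shown earlier.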
Run Code Online (Sandbox Code Playgroud)