I need to count the unique ID values in each domain.
I have this data:
ID, domain
123, 'vk.com'
123, 'vk.com'
123, 'twitter.com'
456, 'vk.com'
456, 'facebook.com'
456, 'vk.com'
456, 'google.com'
789, 'twitter.com'
789, 'vk.com'
I tried df.groupby(['domain', 'ID']).count()
but I want to get
domain, count
vk.com 3
twitter.com 2
facebook.com 1
google.com 1
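One way to get this result, offered as a minimal sketch rather than the only approach: group by domain alone and count the distinct ID values with nunique(). The DataFrame below is rebuilt by hand from the sample data in the question; the variable name counts is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": [
        "vk.com", "vk.com", "twitter.com",
        "vk.com", "facebook.com", "vk.com", "google.com",
        "twitter.com", "vk.com",
    ],
})

# Group by domain only, then count the distinct IDs inside each group.
# Grouping by both 'domain' and 'ID' instead counts rows per (domain, ID) pair.
counts = df.groupby("domain")["ID"].nunique().sort_values(ascending=False)
print(counts)
```

Grouping on both columns, as in the attempt above, yields one row per (domain, ID) pair, which is why it does not collapse to a single number per domain.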
I have this data:
data id url size domain subdomain
13/Jun/2016:06:27:26 30055 https://api.weather.com/v1/geocode/55.740002/37.610001/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3929 weather.com api.weather.com
13/Jun/2016:06:27:26 30055 https://api.weather.com/v1/geocode/54.720001/20.469999/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3845 weather.com api.weather.com
13/Jun/2016:06:27:27 3845 https://api.weather.com/v1/geocode/54.970001/73.370003/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 30055 weather.com api.weather.com
13/Jun/2016:06:27:27 30055 https://api.weather.com/v1/geocode/59.919998/30.219999/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3914 weather.com api.weather.com
13/Jun/2016:06:27:28 30055 https://facebook.com 4005 facebook.com facebook.com
I need to group it into 5-minute intervals. Desired output:
data id url size domain subdomain
13/Jun/2016:06:27:26 30055 https://api.weather.com/v1/geocode/55.740002/37.610001/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3929 weather.com api.weather.com
13/Jun/2016:06:27:27 3845 https://api.weather.com/v1/geocode/54.970001/73.370003/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 30055 weather.com api.weather.com
13/Jun/2016:06:27:28 30055 https://facebook.com 4005 facebook.com facebook.com
I need to group by id and subdomain in 5-minute intervals.
I tried:
print df.groupby([df['data'],pd.TimeGrouper(freq='Min')])
This was meant to group by minute first, but it returns TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but …
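A possible fix, sketched under a few assumptions: a time-based grouper with no key works on the index, so it fails when the index is not datetime-like. Parsing the data column into datetimes and passing it as the key to pd.Grouper (the replacement for pd.TimeGrouper, which recent pandas versions no longer ship) avoids that. The format string is inferred from the sample timestamps, the url and domain columns are left out for brevity, and taking head(1) per group is a guess at how the desired output was produced.

```python
import pandas as pd

# Sample rows from the question (url and domain columns omitted for brevity).
df = pd.DataFrame({
    "data": [
        "13/Jun/2016:06:27:26", "13/Jun/2016:06:27:26",
        "13/Jun/2016:06:27:27", "13/Jun/2016:06:27:27",
        "13/Jun/2016:06:27:28",
    ],
    "id": [30055, 30055, 3845, 30055, 30055],
    "size": [3929, 3845, 30055, 3914, 4005],
    "subdomain": [
        "api.weather.com", "api.weather.com", "api.weather.com",
        "api.weather.com", "facebook.com",
    ],
})

# Parse the log-style timestamps; the format is inferred from the sample.
df["data"] = pd.to_datetime(df["data"], format="%d/%b/%Y:%H:%M:%S")

# Group by id, subdomain and 5-minute bins of the 'data' column.
grouped = df.groupby(["id", "subdomain", pd.Grouper(key="data", freq="5min")])

# Assumption: the desired output keeps the first row of every group.
first_rows = grouped.head(1)
print(first_rows)
```

If an aggregate per bin is wanted instead of the first row, grouped.size() or grouped["size"].sum() can be used on the same grouped object.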
I have a plot like this:
fig = plt.figure()
desire_salary = (df[(df['inc'] <= int(salary_people))])
print desire_salary
# Create the pivot_table
result = desire_salary.pivot_table('city', 'cult', aggfunc='count')
# plot it in a separate step. this returns the matplotlib axes
ax = result.plot(kind='bar', alpha=0.75, rot=0, label="Presence / Absence of cultural centre")
ax.set_xlabel("Cultural centre")
ax.set_ylabel("Frequency")
ax.set_title('The relationship between the wage level and the presence of the cultural center')
plt.show()
I want to add this to a subplot. I tried
fig, ax = plt.subplots(2, 3)
...
ax = result.add_subplot()
But it returns AttributeError: 'Series' object has no attribute …
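A sketch of one way around this: add_subplot is a method of a matplotlib Figure, not of the pandas object returned by pivot_table, so it is not available on result. The pandas plot method accepts an ax argument, so one of the axes created by plt.subplots can be passed to it. The DataFrame contents and the salary_people value below are made up for illustration, and axes[0, 0] is an arbitrary choice of target cell.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; the real df from the question is not shown.
df = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F"],
    "cult": [0, 1, 1, 0, 1, 1],
    "inc": [30000, 45000, 60000, 20000, 50000, 40000],
})
salary_people = 50000  # assumed threshold

desire_salary = df[df["inc"] <= int(salary_people)]

# Count cities per value of 'cult', as in the question.
result = desire_salary.pivot_table("city", "cult", aggfunc="count")

# Create the 2x3 grid, then draw the bar chart on one specific axes.
fig, axes = plt.subplots(2, 3)
ax = axes[0, 0]
result.plot(kind="bar", alpha=0.75, rot=0, ax=ax)
ax.set_xlabel("Cultural centre")
ax.set_ylabel("Frequency")
ax.set_title("Wage level vs. cultural centre")
fig.tight_layout()
```

Calling plt.show() afterwards displays the figure; the remaining five axes stay empty until something is plotted on them in the same way.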