I'm sure this is simple, but I can't get the syntax for df.loc right.
import pandas as pd
import numpy as np

d = {'data': [4, 2, 7, np.nan, 7, 6, 5, np.nan, 6, 3, np.nan, 2],
     'a': [4, 2, 7, 9, 7, 6, 5, 4, 6, 3, np.nan, 2],
     'b': [4, 2, 7, 11, 7, 6, 5, 2, 6, 3, 3, 2]}
df2 = pd.DataFrame(d)
# My attempt, which does not work: NaN never compares equal to itself,
# and this line builds a tuple instead of assigning anything
df2.loc[df2.data == np.nan], min(['a', 'b'])
print(df2)
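I want to replace every np.nan in 'data' with the minimum of the values in columns 'a' and 'b'. Note that sometimes one of those values is missing (np.nan) as well.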
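A minimal sketch of one way to do this, assuming the frame is built exactly as in the question. The names `row_min` and `mask` are illustrative only. The idea is to select the missing rows with `isna()` rather than `== np.nan`, and to take the row-wise minimum with `DataFrame.min(axis=1)`, which skips NaN by default.

```python
import numpy as np
import pandas as pd

d = {'data': [4, 2, 7, np.nan, 7, 6, 5, np.nan, 6, 3, np.nan, 2],
     'a': [4, 2, 7, 9, 7, 6, 5, 4, 6, 3, np.nan, 2],
     'b': [4, 2, 7, 11, 7, 6, 5, 2, 6, 3, 3, 2]}
df2 = pd.DataFrame(d)

# Row-wise minimum of 'a' and 'b'; NaN in one column is ignored (skipna=True)
row_min = df2[['a', 'b']].min(axis=1)

# NaN != NaN, so build the boolean mask with isna() instead of '== np.nan'
mask = df2['data'].isna()

# Assign only into the rows where 'data' is missing
df2.loc[mask, 'data'] = row_min[mask]
print(df2)
```

An equivalent one-liner would be `df2['data'] = df2['data'].fillna(row_min)`. If both 'a' and 'b' were missing in a row, either form would leave that row as NaN.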
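The result should be:
     a   b  data
0    4   4     4 …

Following on from my previous question, I am now trying to scrape multiple pages of a URL (each page contains games from a particular season). I am also trying to scrape multiple parent URLs (seasons):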
from selenium import webdriver
import pandas as pd
import time

url = ['http://www.oddsportal.com/hockey/austria/ebel-2014-2015/results/#/page/',
       'http://www.oddsportal.com/hockey/austria/ebel-2013-2014/results/#/page/']
data = []
for i in url:              # one parent URL per season
    for j in range(1, 8):  # result pages 1 to 7 of that season
        print(i + str(j))
        driver = webdriver.PhantomJS()
        driver.implicitly_wait(10)
        driver.get(i + str(j))
        for match in driver.find_elements_by_css_selector("div#tournamentTable tr.deactivate"):
            home, away = match.find_element_by_class_name("table-participant").text.split(" - ")
            # The date header is the nearest preceding th with class 'first2'
            date = match.find_element_by_xpath(".//preceding::th[contains(@class, 'first2')][1]").text
            if " - " in date:
                date, event = date.split(" - ")
            else:
                event = "Not specified"
            data.append({
                "home": home.strip(),
                "away": away.strip(),
                "date": date.strip(),
                "event": event.strip()
            })
        driver.close()
        time.sleep(3)
print …

I would like an expanding mean that gives the result excluding the current item, i.e. the mean of the items that came before it. Here is what I am looking for:
import numpy as np
import pandas as pd

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'], 'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan], 'hw': [0, 1, 0, 1, 0, 1, np.nan]}
df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2.index = range(1, len(df2) + 1)
# pd.expanding_mean was removed from pandas; .expanding().mean() replaces it
df2['homewin_at_home'] = df2.groupby('home')['hw'].transform(lambda s: s.expanding().mean())
print(df2)
Result:
  home away  hw  aw  homewin_at_home
1    A    B   0   1         0.000000
2    B    A   1   0         1.000000
3    B    A   0   0         0.500000
4    A    B   1   0         0.500000
5    B    A   0   1         0.333333 …
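One possible way to exclude the current row, sketched on the same sample data: shift each group's values down by one before taking the expanding mean, so that every row only sees the games that came before it. The column name `prior_homewin_at_home` is made up for illustration and is not part of the question.

```python
import numpy as np
import pandas as pd

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2.index = range(1, len(df2) + 1)

# Within each home team, shift() drops the current game from the window,
# then expanding().mean() averages everything that remains before it
df2['prior_homewin_at_home'] = (
    df2.groupby('home')['hw']
       .transform(lambda s: s.shift().expanding().mean())
)
print(df2)
```

With this approach the first home game of each team gets NaN, because there are no earlier games to average. The last row still gets a value even though its own `hw` is missing, since only the preceding games enter the mean.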