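I'm a Python programmer learning C from the K&R book. This seems like a very trivial question, but I'm still stumped. Below is a code snippet from K&R (RIP Ritchie!) that implements the atoi() function.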
int atoi(char s[]) /* convert s to integer */
{
    int i, n, sign;

    for (i = 0; s[i] == ' ' || s[i] == '\n' || s[i] == '\t'; i++)
        ; /* skip whitespace */
    sign = 1;
    if (s[i] == '+' || s[i] == '-') /* sign */
        sign = (s[i++] == '+') ? 1 : -1;
    for (n = 0; s[i] >= '0' && s[i] <= '9'; i++)
        n = 10 * n + s[i] - '0';
    return sign * n;
}
My questions:
1) Apart from counting the number of valid characters, does the first 'for' loop serve any purpose? …
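Regarding question 1: as far as the code shows, the first loop does not count anything. Its only effect is to advance `i` past leading blanks, newlines and tabs, so that the sign check and the digit loop start at the first non-whitespace character. A Python transliteration makes the role of `i` explicit; the name `kr_atoi` is hypothetical, and because Python strings have no terminating `'\0'`, each loop also checks `i < len(s)`.

```python
def kr_atoi(s):
    """Python transliteration of the K&R atoi() (hypothetical name, for illustration)."""
    i = 0
    # First loop: no counting happens here. It only moves i past leading
    # blanks, newlines and tabs; its useful result is the final value of i.
    # (C stops at the terminating '\0'; Python needs the explicit length check.)
    while i < len(s) and s[i] in " \n\t":
        i += 1
    sign = 1
    # Optional sign: consume it and remember whether it was '-'.
    if i < len(s) and s[i] in "+-":
        sign = 1 if s[i] == "+" else -1
        i += 1
    n = 0
    # Accumulate digits until the first non-digit, as the C loop does.
    while i < len(s) and "0" <= s[i] <= "9":
        n = 10 * n + (ord(s[i]) - ord("0"))
        i += 1
    return sign * n
```

For example, `kr_atoi("  \t-42abc")` skips the three whitespace characters, reads the `-`, and returns `-42`.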
Say my data looks like this:
date,name,id,dept,sale1,sale2,sale3,total_sale
1/1/17,John,50,Sales,50.0,60.0,70.0,180.0
1/1/17,Mike,21,Engg,43.0,55.0,2.0,100.0
1/1/17,Jane,99,Tech,90.0,80.0,70.0,240.0
1/2/17,John,50,Sales,60.0,70.0,80.0,210.0
1/2/17,Mike,21,Engg,53.0,65.0,12.0,130.0
1/2/17,Jane,99,Tech,100.0,90.0,80.0,270.0
1/3/17,John,50,Sales,40.0,50.0,60.0,150.0
1/3/17,Mike,21,Engg,53.0,55.0,12.0,120.0
1/3/17,Jane,99,Tech,80.0,70.0,60.0,210.0
I want a new column, average, holding the mean of total_sale for each (name, id, dept) tuple.
I tried
df.groupby(['name', 'id', 'dept'])['total_sale'].mean()
This does return a Series of means:
name id dept
Jane 99 Tech 240.000000
John 50 Sales 180.000000
Mike 21 Engg 116.666667
Name: total_sale, dtype: float64
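But how do I refer to the data? The Series is one-dimensional, with shape (3,). Ideally I'd like to put it back into a DataFrame with the proper columns so that I can refer to name/id/dept correctly.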
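In pandas, `groupby(...)[...].transform('mean')` returns a Series aligned with the original rows, so it can be assigned directly as a new column: `df['average'] = df.groupby(['name', 'id', 'dept'])['total_sale'].transform('mean')`. Alternatively, calling `.reset_index()` on the Series returned by `mean()` turns the group keys back into ordinary `name`/`id`/`dept` columns. The standard-library sketch below (the helper name `add_group_average` is hypothetical) shows what `transform` does conceptually: compute each group's mean, then broadcast it back onto every row of that group.

```python
from collections import defaultdict


def add_group_average(rows, key_fields, value_field, new_field="average"):
    """Return copies of rows (dicts) with new_field set to the mean of
    value_field over all rows that share the same key_fields values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the sum and count for each group key.
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        totals[key] += float(row[value_field])
        counts[key] += 1
    # Second pass: broadcast each group's mean back onto every row of that group.
    result = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        new_row = dict(row)  # copy, so the input rows are left untouched
        new_row[new_field] = totals[key] / counts[key]
        result.append(new_row)
    return result
```

With the sample data, every John row gets 180.0, every Jane row 240.0 and every Mike row about 116.67, matching the Series above but keeping one value per original row.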
I'm playing around with BeautifulSoup, and I'm looking for a way to get a specific JSON string inside a JS element.
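Once BeautifulSoup has located the right `<script>` tag (for example by looping over `soup.find_all('script')` and checking whether a tag's `.string` contains `'window.Rent.data'`), the JSON value can be handed to the `json` module rather than cut out with string slicing. A minimal sketch, assuming the script text is already a Python string: `json.JSONDecoder().raw_decode` parses one JSON value from the start of a string and ignores whatever follows, so the trailing `;` and later statements do no harm. The helper name `extract_js_assignment` is hypothetical, and the approach only works when the assigned value is strict JSON (double-quoted keys, no trailing commas), as it appears to be in the script shown in the question.

```python
import json
import re


def extract_js_assignment(script_text, target):
    """Return the JSON value assigned to `target` (e.g. 'window.Rent.data')
    in a chunk of JavaScript source. Raises KeyError if it is not found."""
    # Find "target =" with optional whitespace around the '='.
    match = re.search(re.escape(target) + r"\s*=\s*", script_text)
    if match is None:
        raise KeyError(target)
    # raw_decode parses a single JSON value at the start of the string and
    # returns (value, end_index); anything after that value is ignored.
    value, _end = json.JSONDecoder().raw_decode(script_text[match.end():])
    return value
```

Applied to the script below, `extract_js_assignment(script_text, 'window.Rent.data')['floorplans']` would give a dict keyed by floorplan id.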
Here is the JS:
<script>window.pinball = window.pinball || [];
window.pinball.push(['add', {"srp_cleanup":"inactive","book_visit":"inactive","my_visits":"inactive"}]);
window.Rent = window.Rent || {};
window.Rent.zutron = {"error_div":".js-generic-error","host":"rent","user_type":null,"zid":null,"origin":null,"provider":null};
window.Rent.book_visit = {"book_visit_host":"http://bookavisit.prod.services.rentpath.com"}
window.Rent.tagging = {"tealium":{"env":"prod","profile":"tealium.rent.com","account":"rentpath"}};
window.Rent.realm = "rent";
window.Rent.data = {"floorplans":{"1159255":{"availability":"1 Unit Available","availability_class":"floorplan-available-now","unitstyle":"aa1- 1 Bed/1 Bath","deposit":"","floorplan_id":1159255,"bed":"1 bed","listing_id":"571535","bath":"1 bath","sqft":"763 sqft","rent":"$1950 - $2322 /mo","propertyname":"Reading Commons","fp3dunfurnished":"http://image.rent.com/imgr/52ad5930427b3e739676240c01b7d6cc/650-","fp3dfurnished":"http://image1.rent.com/imgr/07733fbd8c8a6a9134d5e0af77d52cb2/650-","floorplanimage":"http://image.rent.com/imgr/44c2395728fa733c2682506d96ec68f5/650-"},"1159257":{"availability":"2 Units Available","availability_class":"floorplan-available-now","unitstyle":"aa3- 1 Bed/1 Bath","deposit":"","floorplan_id":1159257,"bed":"1 bed","listing_id":"571535","bath":"1 bath","sqft":"893 sqft","rent":"$1995 - $2531 /mo","propertyname":"Reading Commons","fp3dunfurnished":"http://image.rent.com/imgr/187753b2e7e6beb5aaf8602514361d89/650-","fp3dfurnished":"http://image.rent.com/imgr/55673aa4253387f0d06aa02495ccf2bc/650-","floorplanimage":"http://image.rent.com/imgr/389adb5ac1fa61c56aa04c88fe97c02f/650-"},"1159259":{"availability":"UNAVAILABLE","availability_class":"floorplan-available-later","unitstyle":"aa5- 1 Bed/1 Bath","deposit":"","floorplan_id":1159259,"bed":"1 bed","listing_id":"571535","bath":"1 bath","sqft":"899 sqft","rent":"Contact for Pricing","propertyname":"Reading Commons","floorplanimage":"http://image.rent.com/imgr/24059a4611740bd58436236758d65e20/650-"},"1159256":{"availability":"UNAVAILABLE","availability_class":"floorplan-available-later","unitstyle":"aa2- 1 Bed/1 Bath","deposit":"","floorplan_id":1159256,"bed":"1 bed","listing_id":"571535","bath":"1 bath","sqft":"880 sqft","rent":"Contact for Pricing","propertyname":"Reading Commons","floorplanimage":"http://image1.rent.com/imgr/0854a95e69c0b75ee0b13c41db2f31f1/650-"},"1159258":{"availability":"UNAVAILABLE","availability_class":"floorplan-available-later","unitstyle":"aa4- 1 Bed/1 
Bath","deposit":"","floorplan_id":1159258,"bed":"1 bed","listing_id":"571535","bath":"1 …
Suppose I have a main program (test.py) and a small utility module (test_utils.py) containing helper functions that the main program calls. I want to turn on debug statements in the code by passing a boolean debug_flag that is read via argparse.
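Now I want the functions in test_utils.py to print debug statements according to the value of debug_flag as well. I could always add debug_flag as a parameter to every function definition in test_utils.py and pass it in whenever I call a function, but is there a better way, such as making debug_flag a global variable? And if I declare debug_flag as a global in test.py, how would I import it into test_utils.py?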
What is the most elegant/Pythonic way to do this?
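One common alternative to threading a flag through every function is the standard `logging` module: each module asks for its own logger with `logging.getLogger(...)`, and the main program sets the level once from the command-line option. That way `test_utils.py` never has to import anything from `test.py`, which also avoids the circular import in the code below. This is a minimal single-file sketch: the two commented sections stand in for `test_utils.py` and `test.py`, and using `action='store_true'` instead of comparing against the string `'True'` is a substitution, not part of the original script.

```python
import argparse
import logging

# --- would live in test_utils.py ---
# The module asks for its own logger; it never imports a flag from test.py.
logger = logging.getLogger("test_utils")


def summation(x, y, z):
    logger.debug("summing %r, %r, %r", x, y, z)  # only emitted when DEBUG is enabled
    return x + y + z


# --- would live in test.py ---
def get_args(argv=None):
    parser = argparse.ArgumentParser(description="Test program")
    # store_true yields a real boolean instead of the string 'True'/'False'.
    parser.add_argument("-d", "--debug", action="store_true", help="enable debug output")
    return parser.parse_args(argv)


def main(argv=None):
    args = get_args(argv)
    logging.basicConfig(format="%(name)s: %(message)s")
    # Set the level on the root logger; module loggers with no level of their own inherit it.
    logging.getLogger().setLevel(logging.DEBUG if args.debug else logging.WARNING)
    return summation(5, 6, 7)


if __name__ == "__main__":
    print(main())
```

Running `python test.py -d` would then print the debug line from `summation` to stderr, while plain `python test.py` would not.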
test.py:
import argparse
from test_utils import summation

def main():
    args = get_args()
    debug_flag = args['debug'] == 'True'  # key must be the string 'debug'
    print(summation(5, 6, 7))

def get_args():
    parser = argparse.ArgumentParser(description='Test program')
    parser.add_argument('-d', '--debug', help='Debug True/False', default=False)
    args = vars(parser.parse_args())
    return args

if __name__ == '__main__':
    main()
test_utils.py:
from test import debug_flag

def summation(x, y, z):
    if debug_flag:
        print('I am going to …
I wrote several modules in Python for generating factorials, and I want to test their running time. I found an example of profiling here and used that template to profile my modules:
import profile #fact

def main():
    x = raw_input("Enter number: ")
    profile.run('fact(int(x)); print')
    profile.run('factMemoized(int(x)); print')

def fact(x):
    if x == 0: return 1
    elif x < 2: return x
    else:
        return x * fact(x-1)

def factMemoized(x):
    if x == 0: return 1
    elif x < 2: return x
    dict1 = dict()
    dict1[0] = 1
    dict1[1] = 1
    for i in range(0, x+1):
        if dict1.has_key(i): pass
        else: dict1[i] = i * dict1[i-1]
    return dict1[x]
if __name__ == "__main__": …
Let's say I have a table like the following:
Date Sales
09/01/2017 9000
09/02/2017 12000
09/03/2017 0
09/04/2017 11000
09/05/2017 14400
09/06/2017 0
09/07/2017 0
09/08/2017 21000
09/09/2017 15000
09/10/2017 23100
09/11/2017 0
09/12/2017 32000
09/13/2017 8000
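The values in the table are estimated by an R program that I don't have access to (it's a black box for now). Some days have 0 values, which tend to creep in because of problems in our ingestion/ETL process. I need to estimate the values for the dates with 0 data.
Our approach is:
Now, if data is missing for only one day between two good days, a straightforward mean works. If data is missing for two or more consecutive days, a simple mean doesn't work, so I'm trying to come up with a way to estimate values for multiple data points.
Is this approach feasible in R? I'm a total n00b in R, so I'm not sure whether it's possible.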
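Linear interpolation between the nearest good days generalizes the "mean of the neighbours" idea: for a single missing day it gives exactly that mean, and for a longer gap it spaces the estimates evenly between the two good values. In R this is what base `approx()` does, or `zoo::na.approx()` after recoding the zeros as `NA`. The sketch below shows the arithmetic in Python; the helper name `interpolate_zeros`, treating every 0 as missing, and leaving zeros at the very start or end untouched are all assumptions.

```python
def interpolate_zeros(values):
    """Replace 0 entries with values linearly interpolated between the nearest
    non-zero neighbours. Zeros before the first or after the last non-zero
    value have no neighbour on one side and are left unchanged."""
    result = list(values)
    good = [i for i, v in enumerate(values) if v != 0]
    # Walk over consecutive pairs of good indices; fill any gap between them.
    for left, right in zip(good, good[1:]):
        gap = right - left
        for k in range(1, gap):
            # Evenly spaced step from values[left] towards values[right].
            result[left + k] = values[left] + (values[right] - values[left]) * k / gap
    return result
```

On the sales column above, 09/03 becomes 11500.0 (the mean of its neighbours), 09/06 and 09/07 become 16600.0 and 18800.0, and 09/11 becomes 27550.0.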
Suppose my table looks like this:
Name,Subject,Score
Jon,English,80
Amy,Geography,70
Matt,English,90
Jon,Math,100
Jon,History,60
Amy,French,90
Is there a way to use collect_list to get a query result like this:
Jon: English:80; Math:100; History:60
Amy: Geography:70; French:90
Matt: English:90
Edit:
The complication here is that the collect_list UDF accepts only one argument, i.e. a single column. Something like
SELECT name, collect_list(subject), collect_list(score) from mytable group by name
gives
Jon | [English,Math,History] | [80,100,60]
Amy | [Geography,French] | [70,90]
Matt | [English] | [90]
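One way around the one-column limit in HiveQL is to build a single string per row before aggregating, and then join the collected array, e.g. `SELECT name, concat_ws('; ', collect_list(concat_ws(':', subject, cast(score AS string)))) FROM mytable GROUP BY name`. The order of elements produced by `collect_list` is not guaranteed in general, so the subject order may differ from the input. The Python sketch below only mirrors that three-step logic on the sample rows to show the shape of the result; the function name is hypothetical.

```python
def collect_subject_scores(rows):
    """rows: iterable of (name, subject, score).
    Returns {name: "subject:score; subject:score; ..."}."""
    grouped = {}
    for name, subject, score in rows:
        # Step 1: combine the two columns into one string per row
        # (the inner concat_ws in the HiveQL above).
        pair = "%s:%s" % (subject, score)
        # Step 2: collect those strings per name (like collect_list).
        grouped.setdefault(name, []).append(pair)
    # Step 3: join each collected list with "; " (the outer concat_ws).
    return {name: "; ".join(pairs) for name, pairs in grouped.items()}
```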
Suppose I have a MySQL table that I access through MySQLdb. I have a standard SELECT statement:
sql = "SELECT * FROM EMPLOYEE \
WHERE INCOME > '%d'" % (1000)
Then I execute it with a cursor and pull out the columns as follows:
cursor.execute(sql)
results = cursor.fetchall()
for row in results:
    fname = row[0]
    lname = row[1]
    age = row[2]
    sex = row[3]
    income = row[4]
Is it possible to assign all the columns to variables in a single statement? Something like:
for row in results:
    fname, lname, age, sex, income = unpack(row)
I could always do this:
fname, lname, age, sex, income = row[0], row[1], row[2], row[3], row[4]
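But my table has more than 30 columns, so that gets painful. Note that although I'm using MySQL right now, I'd like this to be as database-agnostic as possible; our benevolent overlords might decide to port everything to another database at any time.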
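Each row returned by `fetchall()` is a sequence, so plain tuple unpacking already works with no helper: `fname, lname, age, sex, income = row`, as long as the number of names matches the number of selected columns. For 30+ columns, a DB-API-generic option is to read the column names from `cursor.description`, which PEP 249 defines as one 7-item sequence per column with the name first, and build a `namedtuple` from them. The sketch below uses the standard-library `sqlite3` driver only so it runs anywhere; the table contents, column names and helper names are hypothetical, and placeholder syntax differs between drivers (`?` for sqlite3, `%s` for MySQLdb).

```python
import sqlite3
from collections import namedtuple


def fetch_named_rows(cursor, sql, params=()):
    """Execute sql and return the rows as namedtuples keyed by column name."""
    cursor.execute(sql, params)
    # PEP 249: cursor.description holds one 7-item sequence per column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    # rename=True replaces names that are not valid identifiers with positional ones like _2.
    Row = namedtuple("Row", columns, rename=True)
    return [Row(*values) for values in cursor.fetchall()]


def demo():
    # Hypothetical in-memory table standing in for EMPLOYEE.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE EMPLOYEE (FIRST_NAME TEXT, LAST_NAME TEXT, "
                "AGE INTEGER, SEX TEXT, INCOME REAL)")
    cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
                    [("Ann", "Lee", 30, "F", 2500.0), ("Bob", "Ray", 41, "M", 800.0)])
    # Let the driver substitute the parameter instead of using % string formatting.
    rows = fetch_named_rows(cur, "SELECT * FROM EMPLOYEE WHERE INCOME > ?", (1000,))
    for row in rows:
        # Plain tuple unpacking works on any row, named or not.
        fname, lname, age, sex, income = row
        print(fname, lname, age, sex, income, row.INCOME)
    conn.close()
    return rows


if __name__ == "__main__":
    demo()
```

Because only `cursor.execute`, `cursor.description` and `fetchall` are used, the helper should work unchanged with any PEP 249 driver, apart from the placeholder style in the SQL string.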
New to pandas, so please bear with me.
My dataframe has this format:
date,name,country,tag,cat,score
2017-05-21,X,US,free,4,0.0573
2017-05-22,X,US,free,4,0.0626
2017-05-23,X,US,free,4,0.0584
2017-05-24,X,US,free,4,0.0563
2017-05-21,X,MX,free,4,0.0537
2017-05-22,X,MX,free,4,0.0640
2017-05-23,X,MX,free,4,0.0648
2017-05-24,X,MX,free,4,0.0668
I'm trying to find a way to compute an X-day moving average within each country/tag/cat group, so I need:
date,name,country,tag,cat,score,moving_average
2017-05-21,X,US,free,4,0.0573,0
2017-05-22,X,US,free,4,0.0626,0.0605
2017-05-23,X,US,free,4,0.0584,0.0594
2017-05-24,X,US,free,4,0.0563,and so on
...
2017-05-21,X,MX,free,4,0.0537,and so on
2017-05-22,X,MX,free,4,0.0640,and so on
2017-05-23,X,MX,free,4,0.0648,and so on
2017-05-24,X,MX,free,4,0.0668,and so on
I tried grouping by the columns I need and then applying pd.rolling_mean, but I end up with a bunch of NaNs:
df.groupby(['date', 'name', 'country', 'tag'])['score'].apply(pd.rolling_mean, 2, min_periods=2) # window size 2
How would I do this correctly?
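The NaNs most likely come from including `date` in the group keys: every (date, name, country, tag) combination occurs only once, so each group holds a single row and a window of 2 can never fill. The grouping should use only the columns that identify a series (here `name`, `country`, `tag`, `cat`), with rows sorted by date inside each group. In current pandas, where `pd.rolling_mean` has been removed in favour of the `.rolling()` method, that is roughly `df.groupby(['name', 'country', 'tag', 'cat'])['score'].transform(lambda s: s.rolling(2, min_periods=2).mean())`. The standard-library sketch below (hypothetical helper name) spells out the same per-group sliding window; like pandas with `min_periods` equal to the window size, it produces no value (`None`) until a group's window is full.

```python
from collections import defaultdict, deque


def grouped_moving_average(rows, key_fields, value_field, window):
    """rows: list of dicts, already sorted by date within each group.
    Returns copies of the rows with a 'moving_average' key that is the mean of
    the last `window` values of the row's group, or None until the window is full."""
    # One bounded deque per group; appending beyond maxlen drops the oldest value.
    windows = defaultdict(lambda: deque(maxlen=window))
    out = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)  # note: the date is NOT part of the key
        win = windows[key]
        win.append(row[value_field])
        avg = sum(win) / window if len(win) == window else None
        new_row = dict(row)
        new_row["moving_average"] = avg
        out.append(new_row)
    return out
```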
Suppose I have the following data...
date Score category
2017-01-01 50.0 1
2017-01-01 590.0 2
2017-01-02 30.0 1
2017-01-02 210.4 2
2017-01-03 11.0 1
2017-01-03 50.3 2
So each day I have multiple categories, and each category has a score. Here is my code so far...
import pandas as pd

vals = [{'date': '2017-01-01', 'category': 1, 'Score': 50},
        {'date': '2017-01-01', 'category': 2, 'Score': 590},
        {'date': '2017-01-02', 'category': 1, 'Score': 30},
        {'date': '2017-01-02', 'category': 2, 'Score': 210.4},
        {'date': '2017-01-03', 'category': 1, 'Score': 11},
        {'date': '2017-01-03', 'category': 2, 'Score': 50.3}]
df = pd.DataFrame(vals)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df.set_index('date', inplace=True)
I want multiple lines, one per category, with the dates on the X axis. How do I do that?
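With `date` already set as the index, pandas can reshape the frame to one column per category and plot them all at once, e.g. `df.pivot(columns='category', values='Score').plot()`, which draws one line per column against the index. The sketch below does the same reshaping by hand and then draws one matplotlib line per category. The helper names are hypothetical, and matplotlib is imported only inside `plot_series`, so the reshaping step runs without it.

```python
from collections import defaultdict
from datetime import datetime


def split_by_category(records):
    """Turn [{'date': 'YYYY-MM-DD', 'category': c, 'Score': s}, ...] into
    {category: ([dates], [scores])}, with each category's points in date order."""
    series = defaultdict(lambda: ([], []))
    # ISO date strings sort chronologically, so a plain string sort is enough here.
    for rec in sorted(records, key=lambda r: r["date"]):
        dates, scores = series[rec["category"]]
        dates.append(datetime.strptime(rec["date"], "%Y-%m-%d").date())
        scores.append(rec["Score"])
    return dict(series)


def plot_series(series):
    """Draw one line per category against the dates; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported here so split_by_category works without it

    fig, ax = plt.subplots()
    for category, (dates, scores) in sorted(series.items()):
        ax.plot(dates, scores, marker="o", label="category %s" % category)
    ax.set_xlabel("date")
    ax.set_ylabel("Score")
    ax.legend()
    fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
    return fig
```

Usage would be `plot_series(split_by_category(vals))` followed by `plt.show()`.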