I have a data frame like the one below.
import pandas as pd
import numpy as np
raw_data = {'student':['A','B','C','D','E'],
'score': [100, 96, 80, 105,156],
'height': [7, 4,9,5,3],
'trigger1' : [84,95,15,78,16],
'trigger2' : [99,110,30,93,31],
'trigger3' : [114,125,45,108,46]}
df2 = pd.DataFrame(raw_data, columns = ['student','score', 'height','trigger1','trigger2','trigger3'])
print(df2)
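I need to derive a Flag column based on multiple conditions.
I need to compare the score and height columns against the trigger1 to trigger3 columns.
Flag column:
If score is greater than or equal to trigger1 and height is less than 8, then Red
If score is greater than or equal to trigger2 and height is less than 8, then Yellow
If score is greater than or equal to trigger3 and height is less than 8, then Orange
If height is greater than 8, leave it blank
How do I write if-else conditions in a pandas data frame and derive the column?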
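One way to express rules like these is `numpy.select`, which takes a list of boolean conditions and a matching list of values, and picks the first condition that holds for each row. The sketch below is only one possible reading of the question: it assumes the most demanding trigger should win (so trigger3 is checked first, which agrees with the Yellow and Red rows in the expected output), and it assumes any row whose height is 8 or more stays blank, since the question does not say what happens at exactly 8.

```python
import numpy as np
import pandas as pd

raw_data = {'student': ['A', 'B', 'C', 'D', 'E'],
            'score': [100, 96, 80, 105, 156],
            'height': [7, 4, 9, 5, 3],
            'trigger1': [84, 95, 15, 78, 16],
            'trigger2': [99, 110, 30, 93, 31],
            'trigger3': [114, 125, 45, 108, 46]}
df2 = pd.DataFrame(raw_data)

# Assumption: a height of 8 or more leaves the flag blank.
short = df2['height'] < 8

# Conditions are tested in order, so the highest trigger goes first.
conditions = [
    short & (df2['score'] >= df2['trigger3']),
    short & (df2['score'] >= df2['trigger2']),
    short & (df2['score'] >= df2['trigger1']),
]
choices = ['Orange', 'Yellow', 'Red']

# Rows that match no condition get the default, an empty string.
df2['Flag'] = np.select(conditions, choices, default='')
print(df2)
```

If the rules are meant to be applied in a different priority order, only the order of `conditions` and `choices` needs to change.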
Expected output
  student  score  height  trigger1  trigger2  trigger3    Flag
0       A    100       7        84        99       114  Yellow
1       B     96       4        95       110       125     Red
2       C     80       9        15        30        45 …

I have a data frame like the one shown below.
import pandas as pd
import numpy as np
raw_data = {'Emp_ID':[144,220,155,200],
'Mgr_ID': [200, 144,200,500],
'Type': ['O','I','I','I'],
'Location' : ['India','UK','UK','US']
}
df2 = pd.DataFrame(raw_data, columns = ['Emp_ID','Mgr_ID', 'Type','Location'])
print(df2)
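I want to get each manager ID together with the IDs of all employees who report to that manager, directly or indirectly. For example, employees 144 and 155 report directly to manager ID 200, and employee 220 reports to it indirectly, so I want 3 separate records for manager 200, as in the output below. The same applies to every other manager ID.
I would like output like the following.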
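The expected output is not shown, so the following is only a sketch under an assumed layout: one row per (manager, employee) pair, with a hypothetical `Relation` column that says whether the report is direct or indirect. It walks down the reporting chain from each manager, breadth first.

```python
import pandas as pd

df2 = pd.DataFrame({'Emp_ID': [144, 220, 155, 200],
                    'Mgr_ID': [200, 144, 200, 500],
                    'Type': ['O', 'I', 'I', 'I'],
                    'Location': ['India', 'UK', 'UK', 'US']})

# Map each manager to the employees who report to them directly.
direct = df2.groupby('Mgr_ID')['Emp_ID'].apply(list).to_dict()

rows = []
for mgr in direct:
    # Depth 1 means a direct report, anything deeper is indirect.
    queue = [(emp, 1) for emp in direct[mgr]]
    seen = {mgr}  # guards against cycles in the data
    while queue:
        emp, depth = queue.pop(0)
        if emp in seen:
            continue
        seen.add(emp)
        # 'Relation' is a hypothetical column name, not from the question.
        rows.append({'Mgr_ID': mgr, 'Emp_ID': emp,
                     'Relation': 'direct' if depth == 1 else 'indirect'})
        queue.extend((e, depth + 1) for e in direct.get(emp, []))

reports = pd.DataFrame(rows)
print(reports)
```

With the sample data this gives three rows for manager 200 (144 and 155 direct, 220 indirect), matching the count described in the question.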
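
I want to list all the unique values in every column of a pandas data frame and store them in another data frame. I tried the code below, but it appends the values row-wise and I want them column-wise. How can I do that?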
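import pandas as pd

raw_data = {'student_name': ['Miller', 'Miller', 'Ali', 'Miller'],
            'test_score': [76, 75, 74, 76]}
df2 = pd.DataFrame(raw_data, columns=['student_name', 'test_score'])
newDF = pd.DataFrame()
for column in df2.columns:
    # Unique values of this column, kept as a one-column frame
    dat = df2[column].drop_duplicates()
    df3 = pd.DataFrame(dat)
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks rows the same way
    newDF = pd.concat([newDF, df3])
print(newDF)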
Expected Output:
student_name  test_score
Ali           74
Miller        75
              76
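A possible way to get the column-wise layout is to build one Series of unique values per column and let pandas align them side by side. This is a minimal sketch, assuming the values should be sorted (as they appear in the expected output) and that shorter columns may be padded with empty strings.

```python
import pandas as pd

raw_data = {'student_name': ['Miller', 'Miller', 'Ali', 'Miller'],
            'test_score': [76, 75, 74, 76]}
df2 = pd.DataFrame(raw_data)

# One Series of sorted unique values per column; each gets a fresh 0..n-1
# index, so the DataFrame constructor lines them up row by row.
unique_cols = {col: pd.Series(sorted(df2[col].unique())) for col in df2.columns}
newDF = pd.DataFrame(unique_cols)

# Shorter columns are padded with NaN; show them as blanks for display only.
print(newDF.fillna(''))
```

`newDF` itself keeps the NaN padding, which is usually easier to work with than empty strings.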

I want to remove outliers per group, based on the 99th percentile value.
import pandas as pd
df = pd.DataFrame({'Group': ['A','A','A','B','B','B','B'], 'count': [1.1,11.2,1.1,3.3,3.40,3.3,100.0]})
In the output, I want to remove 11.2 from group A and 100 from group B, so that only 5 observations remain in the final dataset.
wantdf = pd.DataFrame({'Group': ['A','A','B','B','B'], 'count': [1.1,1.1,3.3,3.40,3.3]})
I tried this, but I did not get the desired result.
df[df.groupby("Group")['count'].transform(lambda x : (x<x.quantile(0.99))&(x>(x.quantile(0.01)))).eq(1)]
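In the attempt above, the strict lower bound `x > x.quantile(0.01)` also discards each group's smallest values, because in these small groups the 1st percentile equals the minimum. A sketch that keeps only the upper cut-off, assuming that "outlier" here means a value above the group's 99th percentile:

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})

# Broadcast each group's 99th percentile back onto its rows.
upper = df.groupby('Group')['count'].transform(lambda x: x.quantile(0.99))

# Keep rows at or below their group's cut-off.
result = df[df['count'] <= upper]
print(result)
```

Using `<=` rather than `<` means a group whose values are all identical is kept in full instead of being dropped.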

I have a data frame and a large function, shown below. I want to apply the norm_group function to the data frame's columns, but it takes too long with apply. Is there any way to reduce the run time of this code? It currently takes 24.4 s per loop.
import pandas as pd
import numpy as np
np.random.seed(1234)
n = 1500000
df = pd.DataFrame()
df['group'] = np.random.randint(1700, size=n)
df['ID'] = np.random.randint(5, size=n)
df['s_count'] = np.random.randint(5, size=n)
df['p_count'] = np.random.randint(5, size=n)
df['d_count'] …

I have a df like this
user = pd.DataFrame({'User':['101','101','101','102','102','101','101','102','102','102'],'Country':['India','Japan','India','Brazil','Japan','UK','Austria','Japan','Singapore','UK'],'Count':[50,1,2,5,6,89,10.9,10,5,6]})
and I export each user's data to a separate CSV file, as follows
user_101 = user[user['User'] == '101']
user_102 = user[user['User'] == '102']
user_101.to_csv('user_101.csv',sep=',')
user_102.to_csv('user_102.csv',sep=',')
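How can I automate this so that, instead of passing the user IDs by hand, the values are picked up from the User column automatically and each user's rows are exported to a file named after that user? Thanks.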
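Iterating over `groupby('User')` yields one (user ID, sub-frame) pair per distinct value, which removes the need to list the IDs by hand. A minimal sketch; the `out_dir` folder is an assumption made here so the example does not clutter the working directory, and any path can be used instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

user = pd.DataFrame({
    'User': ['101', '101', '101', '102', '102', '101', '101', '102', '102', '102'],
    'Country': ['India', 'Japan', 'India', 'Brazil', 'Japan',
                'UK', 'Austria', 'Japan', 'Singapore', 'UK'],
    'Count': [50, 1, 2, 5, 6, 89, 10.9, 10, 5, 6]})

# Assumption: a throwaway output folder; replace with any directory you like.
out_dir = Path(tempfile.mkdtemp())

# One file per distinct value in the 'User' column, e.g. user_101.csv.
for user_id, group in user.groupby('User'):
    group.to_csv(out_dir / f'user_{user_id}.csv', sep=',')
```

New user IDs that appear in the data later are picked up without any change to the loop.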

I have a data frame like this
df = pd.DataFrame({
'User':['101','101','102','102','102','101','102','103','103','103','101'],
'Product':['x','xy','y','z','z','x','y','z','x','y',''],
'Country':['India','India','India','Brazil','India','UK','UK','Brazil','India','UK','USA']})
I need to get the unique products and users per country, like below.
Thanks.
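The desired output is not shown, so this is only a sketch under an assumed layout: one row per country, with the unique products and unique users collected into sorted lists. The column names `Products` and `Users` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['101', '101', '102', '102', '102', '101', '102', '103', '103', '103', '101'],
    'Product': ['x', 'xy', 'y', 'z', 'z', 'x', 'y', 'z', 'x', 'y', ''],
    'Country': ['India', 'India', 'India', 'Brazil', 'India', 'UK',
                'UK', 'Brazil', 'India', 'UK', 'USA']})

# Named aggregation: each output column is built from one input column.
# 'Products' and 'Users' are illustrative names, not from the question.
summary = df.groupby('Country').agg(
    Products=('Product', lambda s: sorted(s.unique())),
    Users=('User', lambda s: sorted(s.unique())),
)
print(summary)
```

If only the counts are needed, `df.groupby('Country')[['Product', 'User']].nunique()` is a shorter alternative. Note that the empty product string in the USA row is treated as a value like any other here.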

How can I pass variables as parameters dynamically?
order = 10100
status = 'Shipped'
df1 = pd.read_sql_query("SELECT * from orders where orderNumber =""" +
str(10100) + """ and status = """ + 'status' +""" order by orderNumber """,cnx)
TypeError: must be str, not int
I get the error above even though I converted the value to a string. Any ideas?
Is there another way to pass the parameters?
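One alternative to string concatenation is the `params` argument of `pd.read_sql_query`, which hands the values to the database driver separately from the SQL text, so no `str()` conversion or manual quoting is needed. The sketch below assumes an in-memory SQLite database as a stand-in for the real `cnx`, with a made-up two-column `orders` table. The placeholder marker depends on the driver: `sqlite3` uses `?`, while other drivers may expect something else, such as `%s`.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the real `cnx`.
cnx = sqlite3.connect(':memory:')
cnx.execute("CREATE TABLE orders (orderNumber INTEGER, status TEXT)")
cnx.executemany("INSERT INTO orders VALUES (?, ?)",
                [(10100, 'Shipped'), (10101, 'Shipped'), (10102, 'Cancelled')])

order = 10100
status = 'Shipped'

# Placeholders keep the values out of the SQL text; the driver does the quoting.
query = ("SELECT * FROM orders "
         "WHERE orderNumber = ? AND status = ? "
         "ORDER BY orderNumber")
df1 = pd.read_sql_query(query, cnx, params=(order, status))
print(df1)
```

Note also that the snippet in the question concatenates the literal string `'status'` rather than the `status` variable, and does not put SQL quotes around the value, so it would need fixing even without the type error.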