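I am trying to deduplicate records in a database using a PARTITION BY clause. This is the query I am running to do the deduplication. It ranks records by how many of their fields are populated and keeps the top-ranked record.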
WITH cteDupes AS
(
    --
    -- Partition based on contact.owner and email
    SELECT ROW_NUMBER() OVER (
               PARTITION BY contactowner, email
               ORDER BY
                   -- ranking by populated field
                   CASE WHEN otherstreet IS NOT NULL THEN 1 ELSE 0 END +
                   CASE WHEN othercity IS NOT NULL THEN 1 ELSE 0 END
           ) AS RND, *
    FROM scontact
    WHERE (contact_owner_name__c IS NOT NULL AND contact_owner_name__c <> '')
      AND (email IS NOT NULL AND email <> '')
)
--Rank data and place it into …

I have data like this:
                  Date      LoanOfficer     User_Name  Loan_Number
0  2017-11-30 00:00:00       Mark Evans  underwriterx   1100000293
1  2017-11-30 00:00:00   Kimberly White  underwritery   1100004947
2  2017-11-30 00:00:00  DClair Phillips  underwriterz   1100007224
I have created a pivot table from the df like this:
pd.pivot_table(df,index=["User_Name","LoanOfficer"],
values=["Loan_Number"],
aggfunc='count',fill_value=0,
columns=["Date"]
)
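However, I need the Date columns to be grouped by date and month. I looked at other solutions that resample the DataFrame and then apply the pivot, but they only work by month and day. Any help would be appreciated.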
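One possible approach, sketched here under assumptions rather than as a confirmed answer: collapse each timestamp to a monthly period before pivoting, so the pivot columns are months instead of individual dates. The helper column name `YearMonth`, the December rows, and their loan numbers are made up for illustration. The sketch also assumes `Date` is already a datetime column (or can be parsed with `pd.to_datetime`).

```python
import pandas as pd

# Sample rows modeled on the question's data. The two December rows and
# their loan numbers are hypothetical, added so more than one month appears.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-11-30", "2017-11-30", "2017-11-30", "2017-12-01", "2017-12-15",
    ]),
    "LoanOfficer": ["Mark Evans", "Kimberly White", "DClair Phillips",
                    "Mark Evans", "Mark Evans"],
    "User_Name": ["underwriterx", "underwritery", "underwriterz",
                  "underwriterx", "underwriterx"],
    "Loan_Number": [1100000293, 1100004947, 1100007224,
                    1100009999, 1100010000],
})

# Collapse each timestamp to its calendar month (a pandas Period such as 2017-11).
# "YearMonth" is a hypothetical helper column, not part of the original data.
df["YearMonth"] = df["Date"].dt.to_period("M")

# Same pivot as in the question, but with the monthly period as the columns.
monthly = pd.pivot_table(
    df,
    index=["User_Name", "LoanOfficer"],
    columns="YearMonth",
    values="Loan_Number",
    aggfunc="count",
    fill_value=0,
)
print(monthly)
```

If separate year and month levels are preferred over a single period, passing `columns=[df["Date"].dt.year, df["Date"].dt.month]` should give a two-level column index instead; which layout fits better depends on how the table is consumed afterwards.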