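I have a varchar field in SQL Server 2005 that stores time values in "hh:mm:ss.mmmm" format.
What I really want to do is average those time values with the built-in aggregate function. However, this: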
SELECT AVG(TimeField) FROM TableWithTimeValues
doesn't work, because (of course) SQL won't average varchars. However, this:
SELECT AVG(CAST(TimeField as datetime)) FROM TableWithTimeValues
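doesn't work either. As far as I can tell, SQL doesn't know how to convert a value that has only a time and no date into a datetime field. I've tried all sorts of things to get SQL to cast the field to a datetime, but so far, no luck.
Can anyone suggest a better way?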
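The arithmetic behind averaging such values can be sketched outside the database: convert each string to seconds since midnight, average the numbers, and format the result back. This is a minimal Python illustration, not T-SQL; the function names are hypothetical, and it assumes every value is a well-formed "hh:mm:ss.mmmm" string.

```python
def to_seconds(text):
    """Convert an 'hh:mm:ss.mmmm' string to seconds since midnight."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def format_seconds(total):
    """Format seconds since midnight back to 'hh:mm:ss.mmmm'."""
    # Work in whole ten-thousandths of a second so rounding cannot yield '60.0000'.
    ticks = round(total * 10_000)
    hours, rest = divmod(ticks, 36_000_000)
    minutes, rest = divmod(rest, 600_000)
    seconds, fraction = divmod(rest, 10_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{fraction:04d}"


def average_time(values):
    """Average a non-empty list of time strings and return a time string."""
    total = sum(to_seconds(v) for v in values)
    return format_seconds(total / len(values))
```

The same idea (cast to a number, average, convert back) is what a database-side solution would need to express in its own functions.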
Which is more efficient?
//Option 1
foreach (var q in baseQuery)
{
    m_TotalCashDeposit += q.deposit.Cash;
    m_TotalCheckDeposit += q.deposit.Check;
    m_TotalCashWithdrawal += q.withdraw.Cash;
    m_TotalCheckWithdrawal += q.withdraw.Check;
}
//Option 2
m_TotalCashDeposit = baseQuery.Sum(q => q.deposit.Cash);
m_TotalCheckDeposit = baseQuery.Sum(q => q.deposit.Check);
m_TotalCashWithdrawal = baseQuery.Sum(q => q.withdraw.Cash);
m_TotalCheckWithdrawal = baseQuery.Sum(q => q.withdraw.Check);
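I guess what I'm asking is: does calling Sum essentially enumerate the list? So if I call Sum four times, isn't that enumerating the list four times? Wouldn't it be more efficient to do a single foreach, so that I only have to enumerate the list once?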
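The enumeration count can be made visible with a small Python analogue. This is a sketch, not C# or LINQ: `CountingSource`, `Row`, and `Amounts` are hypothetical stand-ins for `baseQuery` and its elements, and Python's `sum` over a generator plays the role of `Sum`.

```python
from dataclasses import dataclass


@dataclass
class Amounts:
    cash: float
    check: float


@dataclass
class Row:
    deposit: Amounts
    withdraw: Amounts


class CountingSource:
    """Iterable that records how many times it has been enumerated."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.enumerations = 0

    def __iter__(self):
        self.enumerations += 1
        return iter(self._rows)


def totals_single_pass(source):
    """Option 1: one loop, four running totals."""
    cash_dep = check_dep = cash_wd = check_wd = 0
    for q in source:
        cash_dep += q.deposit.cash
        check_dep += q.deposit.check
        cash_wd += q.withdraw.cash
        check_wd += q.withdraw.check
    return cash_dep, check_dep, cash_wd, check_wd


def totals_four_passes(source):
    """Option 2: four separate sums, each walking the source again."""
    return (
        sum(q.deposit.cash for q in source),
        sum(q.deposit.check for q in source),
        sum(q.withdraw.cash for q in source),
        sum(q.withdraw.check for q in source),
    )
```

After `totals_single_pass` the counter has advanced by one; after `totals_four_passes` it has advanced by four. Whether that difference matters in practice depends on how expensive each enumeration of the real source is.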
(I'm using Postgres.)
Are there any aggregate functions that work on strings?
I'd like to write a query along these lines:
select table1.name, join(' - ', unique(table2.horse)) as all_horses
from table1 inner join table2 on table1.id = table2.fk
group by table1.name
Given these two tables:
| table1          |     | table2                    |
| id (pk) | name  |     | id (pk) | horse   | fk    |
+---------+-------+     +---------+---------+-------+
| 1       | john  |     | 1       | redrum  | 1     |
| 2       | frank |     | 2       | chaser  | 1     |
                        | 3       | cigar   | 2     |
The query should return: …
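The intended semantics of that query (one row per name, with that name's distinct horses joined by ' - ') can be sketched in Python. This is only an illustration of the grouping logic; `join_horses` is a hypothetical name, and the tables are assumed to be lists of tuples in the column order shown above. On the database side, PostgreSQL 9.0 and later ship a `string_agg` aggregate that covers this kind of case.

```python
from collections import defaultdict


def join_horses(table1, table2, sep=" - "):
    """Group table2.horse by table1.name and join the distinct values."""
    # table1 rows are (id, name); table2 rows are (id, horse, fk).
    names = {row_id: name for row_id, name in table1}
    horses = defaultdict(list)
    for _row_id, horse, fk in table2:
        if fk not in names:
            continue  # inner join: skip horses without a matching table1 row
        bucket = horses[names[fk]]
        if horse not in bucket:  # keep each horse once, in first-seen order
            bucket.append(horse)
    return {name: sep.join(values) for name, values in horses.items()}
```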
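I have a query that is more complex than the example here, but it needs to return only the rows where a certain field does not appear more than once in the data set.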
ACTIVITY_SK STUDY_ACTIVITY_SK
100 200
101 201
102 200
100 203
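In this example, I don't want any records with ACTIVITY_SK 100 returned, because that ACTIVITY_SK appears twice in the data set.
The data is a mapping table and is used in many joins, but multiple records like this indicate a data quality problem, so I simply need to remove them from the results rather than have them cause bad joins elsewhere.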
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
I tried something like this:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
AND A.ACTIVITY_SK NOT IN
(
SELECT
A.ACTIVITY_SK
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
GROUP BY A.ACTIVITY_SK
HAVING COUNT(*) > 1
)
But there must be a cheaper way to do this...
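The filtering rule itself (count how often each key occurs, then keep only the rows whose key occurs once) can be sketched in Python. This is not Oracle SQL; `drop_repeated_keys` is a hypothetical helper, and rows are assumed to be tuples with ACTIVITY_SK in the first position.

```python
from collections import Counter


def drop_repeated_keys(rows, key_index=0):
    """Return only the rows whose key value occurs exactly once in rows."""
    # First pass: count occurrences of each key.
    counts = Counter(row[key_index] for row in rows)
    # Second pass: keep rows whose key is unique.
    return [row for row in rows if counts[row[key_index]] == 1]
```

With the four sample rows above, both rows for ACTIVITY_SK 100 are dropped and the rows for 101 and 102 remain. The data is read twice here but never joined to itself, which is the property a cheaper query would also want.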
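I have a DataFrame with three columns: Time, Value, and Category.
df.groupby('Category') groups by the category values. At each time instance, two values are recorded: one with category "True" and one with category "False".
Within each category group, I want to compute a number for each time and store it in a Result column. The result is the percentage of values between time t-60 and t that lie between 1 and 3.
The easiest way to achieve this is probably to count the total number of values in that time interval with rolling_count, and then run rolling_apply to count only the values in that interval that lie between 1 and 3.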
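The quantity being asked for can be pinned down with a plain-Python sketch that does not depend on any pandas version. It assumes the samples of one category are given as (time, value) pairs sorted by time, that the window is the closed interval [t-60, t], and that "between 1 and 3" means strictly between; `rolling_share_in_range` is a hypothetical name.

```python
from collections import deque


def rolling_share_in_range(samples, window=60, low=1, high=3):
    """For each (time, value), return (time, percent of values in window with low < v < high)."""
    recent = deque()  # samples currently inside [t - window, t]
    in_range = 0      # how many of those satisfy low < value < high
    results = []
    for t, value in samples:
        recent.append((t, value))
        if low < value < high:
            in_range += 1
        # Evict samples that have fallen out of the window.
        while recent[0][0] < t - window:
            _old_t, old_value = recent.popleft()
            if low < old_value < high:
                in_range -= 1
        results.append((t, 100.0 * in_range / len(recent)))
    return results
```

The function would be called once per category group, and its output is what the Result column is meant to hold.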
Here is my code so far:
groups = df.groupby(['Category'])
for key, grp in groups:
    grp = grp.reindex(grp['Time'])  # reindex by time so we can count with rolling windows
    grp['total'] = pd.rolling_count(grp['Value'], window=60)  # count number of values in the last 60 seconds
    grp['in_interval'] = ?  ## Need to count number of values where 1<v<3 in the last 60 seconds
    grp['Result'] = grp['in_interval'] …
I would like to know how to translate an aggregate on a data.frame into a data.table operation.
For example, I have a data.table called mydt:
Date , Time , Value
1899-01-01 , 4:00:00 , 1
1899-01-01 , 4:01:00 , 2
1899-01-01 , 4:02:00 , 3
1899-01-01 , 4:03:00 , 4
1899-01-01 , 4:04:00 , 5
1900-08-22 , 22:00:00 , 101
1900-08-22 , 22:01:00 , 102
2013-08-29 , 4:00:00 , 1000
2013-02-29 , 4:02:00 , 1001
2013-02-29 , 4:03:00 , 1002
I want to group by Date to produce a data.table in the following format:
Date , Vector(variable length)
1899-02-28, c(1,2,3,4,5)
1900-08-22, …
I am new to using data tables and would like some help aggregating some data.
Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
859 04/02/2014 07:55 05/02/2014 15:37 10000 10000 2014 2 0 1
859 07/02/2014 03:16 07/02/2014 03:51 8960.755 8960.755 2014 2 0 2
859 11/02/2014 12:41 13/02/2014 11:56 13635.178 13606.901 2014 2 0 3
859 11/02/2014 13:34 11/02/2014 15:34 13635.178 13635.178 2014 2 13635.178 4
859 12/02/2014 13:46 14/02/2014 09:59 13660.246 13649.278 2014 2 13635.178 5
859 13/02/2014 15:33 13/02/2014 15:42 13606.901 13606.901 2014 2 13660.246 6
859 25/03/2014 14:52 26/03/2014 …
Is it possible to aggregate with a custom function that takes two columns and returns one column?
Suppose I have a data frame:
x <- c(2,4,3,1,5,7)
y <- c(3,2,6,3,4,6)
group <- c("A","A","A","A","B","B")
data <- data.frame(group, x, y)
data
# group x y
# 1 A 2 3
# 2 A 4 2
# 3 A 3 6
# 4 A 1 3
# 5 B 5 4
# 6 B 7 6
I have a function that I want to apply to the two columns (x and y):
pathlength <- function(xy) {
  out <- as.matrix(dist(xy))
  sum(out[row(out) - col(out) == 1])
}
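What pathlength computes, and what applying it per group would mean, can be shown with a small Python sketch: the sub-diagonal of the distance matrix holds the distances between consecutive rows, so the sum is the length of the path through the points in row order. This is an illustration in Python rather than R; `path_length_by_group` is a hypothetical helper and rows are assumed to be (group, x, y) tuples.

```python
import math
from collections import defaultdict


def path_length(points):
    """Sum of Euclidean distances between consecutive (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def path_length_by_group(rows):
    """Apply path_length to the (x, y) points of each group, keeping row order."""
    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))
    return {group: path_length(points) for group, points in groups.items()}
```

The key point is that the function needs both columns of a group at once, as a set of points, rather than each column separately.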
I tried the following with aggregate:
out <- aggregate(cbind(x, y) ~ group, data, FUN = pathlength)
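out <- aggregate(cbind(x, y) …
Sorry, I'm quite new to R, but I have a data frame containing game logs for multiple players. I'm trying to get the slope coefficient of each player's points across all of their games. I've seen that aggregate can use operators such as sum and average, and getting the coefficient out of a linear regression is pretty simple as well. How do I combine these?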
a <- c("player1","player1","player1","player2","player2","player2")
b <- c(1,2,3,4,5,6)
c <- c(15,12,13,4,15,9)
gamelogs <- data.frame(name=a, game=b, pts=c)
I would like this to become:
name pts slope
player1 -.4286
player2 .08242
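The combination being asked about (a regression slope computed separately for each player) can be sketched in Python. This is not R; it assumes a simple least-squares fit of pts on game, rows given as (name, game, pts) tuples, and at least two distinct game numbers per player. The helper names are hypothetical.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slope_by_player(gamelogs):
    """Group (name, game, pts) rows by name and fit one slope per player."""
    games = defaultdict(list)
    points = defaultdict(list)
    for name, game, pts in gamelogs:
        games[name].append(game)
        points[name].append(pts)
    return {name: ols_slope(games[name], points[name]) for name in games}
```

The structure mirrors a group-wise aggregate: split the rows by name, then reduce each group to a single number with the regression.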
I know this question has been asked a lot, but even after working through the error message and using a HAVING clause, I am still getting the dreaded:
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list,
and the column being aggregated is an outer reference.
What am I doing wrong here?
SELECT
mr.ClubKeyNumber,
COUNT(mr.MonthlyReportID),
SUM(CONVERT(int,mr.Submitted))
FROM MonthlyReport mr
WHERE mr.ReportYear = 2014
AND COUNT(mr.MonthlyReportID) = 12
GROUP BY mr.ClubKeyNumber
HAVING (SUM(CONVERT(int,mr.Submitted))) > 11
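The general rule that the error message points at (row filters go in WHERE, filters on aggregates go in HAVING) can be tried out with Python's built-in sqlite3 module. This is a sketch against SQLite, not SQL Server: `CAST(... AS INTEGER)` stands in for `CONVERT(int, ...)`, and the table layout and sample data are assumptions made up for the demonstration.

```python
import sqlite3

# Same shape as the query above, with the COUNT condition moved from WHERE to HAVING.
QUERY = """
SELECT mr.ClubKeyNumber,
       COUNT(mr.MonthlyReportID),
       SUM(CAST(mr.Submitted AS INTEGER))
FROM MonthlyReport mr
WHERE mr.ReportYear = 2014
GROUP BY mr.ClubKeyNumber
HAVING COUNT(mr.MonthlyReportID) = 12
   AND SUM(CAST(mr.Submitted AS INTEGER)) > 11
"""


def demo_connection():
    """Build an in-memory table with three hypothetical clubs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MonthlyReport ("
        "MonthlyReportID INTEGER PRIMARY KEY, ClubKeyNumber INTEGER, "
        "ReportYear INTEGER, Submitted INTEGER)"
    )
    rows = []
    for month in range(12):
        rows.append((1, 2014, 1))                       # club 1: 12 reports, all submitted
        rows.append((2, 2014, 1 if month < 10 else 0))  # club 2: 12 reports, 10 submitted
    for _month in range(5):
        rows.append((3, 2014, 1))                       # club 3: only 5 reports
    conn.executemany(
        "INSERT INTO MonthlyReport (ClubKeyNumber, ReportYear, Submitted) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def clubs_with_full_year(conn):
    """Run the query and return its rows."""
    return conn.execute(QUERY).fetchall()
```

With this sample data only club 1 satisfies both aggregate conditions, so it is the only row that comes back.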