require(raster)
## Function to aggregate
fun.patch <- function(x) {
  if (max(table(x)) >= 0.9 * length(x)) {
    return(as.vector(which.max(table(x))))
  }
  else
    return(NA)
}
r.lc <- raster(nrows = 100, ncols = 100)
r.lc[] <- 1:6
aggregate(r.lc, fact = c(5,5), fun.patch)
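Error in FUN(newX[, i], ...) : unused argument (na.rm = TRUE)
I'm trying to step out of my comfort zone of the typical N-tier architecture of repositories/services/presentation and have started looking into DDD and aggregates. I have to admit I'm a bit confused, and I hope someone can clarify the following example:
If I have entities named News, NewsImage and Customer, all of which are EF-persistable objects, as follows: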
public class Customer
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}
public class NewsImage
{
    public virtual int Id { get; set; }
    public virtual byte[] Data { get; set; }
    public virtual News News { get; set; }
}
public class News
{
    public virtual int Id { get; set; }
    public virtual string Name { get; …
I want to check whether all (float) values of a group of records are equal. Something like
SELECT ..., equal(my_field) FROM my_table WHERE ... GROUP BY ...
where equal(my_field) returns true if all values of my_field are equal.
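One common way to express such a check is to compare the group's minimum with its maximum. The sketch below is only an illustration: it runs the query against an in-memory SQLite database from Python, and the column name `grp` and the sample rows are made up for the example (only `my_table` and `my_field` come from the question). NULL values are ignored by MIN and MAX, so a group containing only NULLs would come out as not equal here.

```python
import sqlite3


def groups_with_equal_values(rows):
    """Return {group: bool} telling whether all my_field values in a group are equal."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: 'grp' stands in for whatever the GROUP BY columns are.
    conn.execute("CREATE TABLE my_table (grp TEXT, my_field REAL)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    query = """
        SELECT grp, MIN(my_field) = MAX(my_field) AS all_equal
        FROM my_table
        GROUP BY grp
        ORDER BY grp
    """
    # SQLite returns 1/0 (or NULL) for the comparison; convert to bool.
    result = {grp: bool(flag) for grp, flag in conn.execute(query)}
    conn.close()
    return result


equal_flags = groups_with_equal_values(
    [("a", 1.5), ("a", 1.5), ("b", 1.0), ("b", 2.0)]
)
print(equal_flags)
```

With float columns, exact equality may be stricter than intended; comparing `MAX(my_field) - MIN(my_field)` against a small tolerance is a possible variation.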
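Problem: I'm trying a groupby on a simple data frame (downloadable csv) followed by agg to return aggregate values of a column (size, sum, mean, standard deviation). A seemingly simple problem produces an unexpectedly challenging error.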
Top15.groupby('Continent')['Pop Est'].agg(np.mean, np.std...etc)
# returns
ValueError: No axis named <function std at 0x7f16841512f0> for object type <class 'pandas.core.series.Series'>
What I want to get is a df whose index is set to the continents, with the columns ['size', 'sum', 'mean', 'std']
Sample code
import pandas as pd
import numpy as np
# Create df
df = pd.DataFrame({'Country':['Australia','China','America','Germany'],'Pop Est':['123','234','345','456'],'Continent':['Asia','Asia','North America','Europe']})
# group and agg
df = df.groupby('Continent')['Pop Est'].agg('size','sum','np.mean','np.std')
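One possible fix, sketched under the assumption that the 'Pop Est' strings are meant to be numbers: convert the column first, then pass the aggregations to `agg` as a single list. In `.agg(np.mean, np.std)` the extra positional arguments appear to be forwarded to the first function instead of being treated as further aggregations, which would explain the "No axis named" error. The variable name `summary` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Australia', 'China', 'America', 'Germany'],
    'Pop Est': ['123', '234', '345', '456'],
    'Continent': ['Asia', 'Asia', 'North America', 'Europe'],
})

# The sample stores populations as strings; make them numeric before aggregating.
df['Pop Est'] = pd.to_numeric(df['Pop Est'])

# A single list of aggregation names yields one output column per aggregation,
# indexed by continent.
summary = df.groupby('Continent')['Pop Est'].agg(['size', 'sum', 'mean', 'std'])
print(summary)
```

Continents with a single country get NaN for `std`, because the sample standard deviation is undefined for one observation.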
For the T-SQL (SQL Server 2016) bit type, is there some way to achieve the aggregate equivalent of logical AND and logical OR? For example, with this table:
CREATE TABLE #Example (id int, category int, isRed bit, isBlue bit)
INSERT INTO #Example VALUES ( 1, 1, 1, 1)
INSERT INTO #Example VALUES ( 2, 1, 0, 0)
INSERT INTO #Example VALUES ( 3, 1, 1, 0)
INSERT INTO #Example VALUES ( 4, 2, 0, 1)
INSERT INTO #Example VALUES ( 5, 2, 0, 1)
INSERT INTO #Example VALUES ( 6, 2, 0, 1)
I want to create a query that lists, for each category, whether any of the isRed …
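A frequently used idea is that, for 0/1 values, MAX behaves like a logical OR and MIN like a logical AND. The sketch below only illustrates that idea: it runs in an in-memory SQLite database from Python, so the temp-table name `#Example` is replaced by `Example`, and the output column names (`anyRed`, `allRed`, `anyBlue`, `allBlue`) are made up. In SQL Server, MIN and MAX are not accepted on bit directly, which is why the columns are cast to an integer type first.

```python
import sqlite3

rows = [
    (1, 1, 1, 1),
    (2, 1, 0, 0),
    (3, 1, 1, 0),
    (4, 2, 0, 1),
    (5, 2, 0, 1),
    (6, 2, 0, 1),
]

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the #Example temp table from the question.
conn.execute("CREATE TABLE Example (id int, category int, isRed bit, isBlue bit)")
conn.executemany("INSERT INTO Example VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT category,
           MAX(CAST(isRed  AS INTEGER)) AS anyRed,   -- logical OR over the group
           MIN(CAST(isRed  AS INTEGER)) AS allRed,   -- logical AND over the group
           MAX(CAST(isBlue AS INTEGER)) AS anyBlue,
           MIN(CAST(isBlue AS INTEGER)) AS allBlue
    FROM Example
    GROUP BY category
    ORDER BY category
"""
bit_aggregates = conn.execute(query).fetchall()
conn.close()
print(bit_aggregates)
```

In T-SQL the results could be cast back with `CAST(... AS bit)` if a bit result is required.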
I'm trying to normalize end-of-day stock prices in PostgreSQL.
Suppose I have a stock table defined like this:
create table eod (
date date not null,
stock_id int not null,
split decimal(16,8) not null,
close decimal(12,6) not null,
constraint pk_eod primary key (date, stock_id)
);
The data in this table might look like this:
"date","stock_id","eod_split","close"
"2014-06-13",14010920,"1.00000000","182.560000"
"2014-06-13",14010911,"1.00000000","91.280000"
"2014-06-13",14010923,"1.00000000","41.230000"
"2014-06-12",14010911,"1.00000000","92.290000"
"2014-06-12",14010920,"1.00000000","181.220000"
"2014-06-12",14010923,"1.00000000","40.580000"
"2014-06-11",14010920,"1.00000000","182.250000"
"2014-06-11",14010911,"1.00000000","93.860000"
"2014-06-11",14010923,"1.00000000","40.860000"
"2014-06-10",14010911,"1.00000000","94.250000"
"2014-06-10",14010923,"1.00000000","41.110000"
"2014-06-10",14010920,"1.00000000","184.290000"
"2014-06-09",14010920,"1.00000000","186.220000"
"2014-06-09",14010911,"7.00000000","93.700000"
"2014-06-09",14010923,"1.00000000","41.270000"
"2014-06-06",14010923,"1.00000000","41.480000"
"2014-06-06",14010911,"1.00000000","645.570000"
"2014-06-06",14010920,"1.00000000","186.370000"
"2014-06-05",14010920,"1.00000000","185.980000"
"2014-06-05",14010911,"1.00000000","647.350000"
"2014-06-05",14010923,"1.00000000","41.210000"
...
"2005-03-04",14010920,"1.00000000","92.370000"
"2005-03-04",14010911,"1.00000000","42.810000"
"2005-03-04",14010923,"1.00000000","25.170000"
"2005-03-03",14010923,"1.00000000","25.170000"
"2005-03-03",14010911,"1.00000000","41.790000"
"2005-03-03",14010920,"1.00000000","92.410000"
"2005-03-02",14010920,"1.00000000","92.920000"
"2005-03-02",14010923,"1.00000000","25.260000"
"2005-03-02",14010911,"1.00000000","44.121000"
"2005-03-01",14010920,"1.00000000","93.300000"
"2005-03-01",14010923,"1.00000000","25.280000"
"2005-03-01",14010911,"1.00000000","44.500000"
"2005-02-28",14010923,"1.00000000","25.160000"
"2005-02-28",14010911,"2.00000000","44.860000"
"2005-02-28",14010920,"1.00000000","92.580000"
"2005-02-25",14010923,"1.00000000","25.250000"
"2005-02-25",14010920,"1.00000000","92.800000"
"2005-02-25",14010911,"1.00000000","88.990000"
"2005-02-24",14010923,"1.00000000","25.370000"
"2005-02-24",14010920,"1.00000000","92.640000"
"2005-02-24",14010911,"1.00000000","88.930000"
"2005-02-23",14010923,"1.00000000","25.200000"
"2005-02-23",14010911,"1.00000000","88.230000"
"2005-02-23",14010920,"1.00000000","92.100000"
... …
aggregrated_table = df_input.groupBy('city', 'income_bracket') \
.agg(
count('suburb').alias('suburb'),
sum('population').alias('population'),
sum('gross_income').alias('gross_income'),
sum('no_households').alias('no_households'))
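I want to group by city and income bracket, but within each city some suburbs have different income brackets. How do I group by the income bracket that occurs most often in each city?
For example: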
aggregrated_table = df_input.groupBy('city', 'income_bracket') \
.agg(
count('suburb').alias('suburb'),
sum('population').alias('population'),
sum('gross_income').alias('gross_income'),
sum('no_households').alias('no_households'))
would be grouped by income_bracket_10
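The logic can be split into two steps: find the most frequent bracket per city, then aggregate per city and attach that bracket. The sketch below illustrates this in pandas rather than PySpark, with made-up sample rows (cities `X` and `Y`, suburbs `s1` to `s5`); it assumes every suburb of a city should be counted under the city's most frequent bracket, and it breaks ties by taking the first mode.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df_input = pd.DataFrame({
    'city': ['X', 'X', 'X', 'Y', 'Y'],
    'suburb': ['s1', 's2', 's3', 's4', 's5'],
    'income_bracket': ['income_bracket_10', 'income_bracket_10',
                       'income_bracket_5', 'income_bracket_3',
                       'income_bracket_3'],
    'population': [100, 200, 300, 400, 500],
    'gross_income': [10, 20, 30, 40, 50],
    'no_households': [1, 2, 3, 4, 5],
})

# Step 1: most frequent income bracket per city (first mode on ties).
top_bracket = (
    df_input.groupby('city')['income_bracket']
    .agg(lambda s: s.mode().iloc[0])
    .rename('income_bracket')
)

# Step 2: aggregate per city, then attach the bracket found in step 1.
totals = df_input.groupby('city').agg(
    suburb=('suburb', 'count'),
    population=('population', 'sum'),
    gross_income=('gross_income', 'sum'),
    no_households=('no_households', 'sum'),
)
aggregated = totals.join(top_bracket).reset_index()
print(aggregated)
```

In PySpark the same two steps would typically be a count per (city, income_bracket), a window ranking within each city, and a join back onto the per-city totals.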
I have a large data.frame. The data.frame contains a lot of values.
For example:
df <- data.frame(Company = c('A', 'A', 'B', 'C', 'A', 'B', 'B', 'C', 'C'),
Name = c("Wayne", "Duane", "William", "Rafael", "John", "Eric", "James", "Pablo", "Tammy"),
Age = c(26, 27, 28, 32, 28, 24, 34, 30, 25),
Wages = c(50000, 70000, 70000, 60000, 50000, 70000, 65000, 50000, 50000),
Education.University = c(1, 1, 1, 0, 0, 1, 1, 0, 1),
Productivity = c(100, 120, 120, 95, 88, 115, 100, 90, 120))
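How can I summarize my data.frame? I want to analyze the values for each company. It has to look like:
Age -> average age of all employees of the company
Wages -> average wage of all employees of the company
Education.University …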
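For simplicity, I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relationship between A and B is n to n, while the relationship between B and Category is n to 1.)
I need the ORM to group the Relation records by Category and A, then calculate the Sum of Intensity for each (Category, A) pair (this seems simple so far), and then I want to annotate the maximum of the calculated Sum in each Category.
My code is something like: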
A.objects.values('B_id').annotate(AcSum=Sum(Intensity)).annotate(Max(AcSum))
Which throws the error:
django.core.exceptions.FieldError: Cannot compute Max('AcSum'): 'AcSum' is an aggregate
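The Django-group-by package gives the same error.
For more information, see this stackoverflow question.
I'm using Django 2 and PostgreSQL.
Is there a way to achieve this using the ORM, and if not, what would be the solution using a raw SQL expression?
After a lot of struggling, I found that what I wrote was indeed an aggregate, but what I want is to find the maximum AcSum of each A in each category. So I suppose I have to group the result once more after the AcSum calculation. Based on this insight, I found a Stack Overflow question that asks about the same concept (it was asked 1 year, 2 months ago and has no accepted answer). Chaining another values('id') to the queryset works neither as a group-by nor as a filter on the output attributes; it just removes AcSum from the set. Adding AcSum to values() is not an option either, because it changes how the result set is grouped. I think what I am trying to do is to re-group the grouped query based on a field inside a column (i.e. id). Any thoughts?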
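On the raw SQL side, the re-grouping described above can be written as a subquery: the inner query sums Intensity per (Category, A), and the outer query takes the maximum of those sums per Category. The sketch below runs this in an in-memory SQLite database from Python. The table and column names (`relation`, `b`, `a_id`, `b_id`, `category_id`, `intensity`) and the sample rows are assumptions made for the illustration, since the question does not show the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: b rows belong to one category; relation links a to b.
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER)")
conn.execute("CREATE TABLE relation (a_id INTEGER, b_id INTEGER, intensity INTEGER)")
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 7), (2, 1, 4), (2, 3, 9), (1, 3, 2)],
)

query = """
    SELECT category_id, MAX(ac_sum) AS max_ac_sum
    FROM (
        -- First grouping: total intensity per (category, A).
        SELECT b.category_id AS category_id,
               r.a_id        AS a_id,
               SUM(r.intensity) AS ac_sum
        FROM relation AS r
        JOIN b ON b.id = r.b_id
        GROUP BY b.category_id, r.a_id
    ) AS sums
    -- Second grouping: largest of those totals per category.
    GROUP BY category_id
    ORDER BY category_id
"""
max_sum_per_category = conn.execute(query).fetchall()
conn.close()
print(max_sum_per_category)
```

The same statement shape should also be valid in PostgreSQL; in Django it could be issued through a raw query if the ORM cannot express the nested aggregate.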
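I want to group by three columns and then find the mean of a fourth, numeric column over all rows that share the same values in the first three columns. I can achieve this with the following function: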
df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False)['col4'].mean()
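The problem is that I also want a fifth column that aggregates all the rows grouped together by the groupby function, and I don't know how to do this on top of the previous function. For example: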
df
index col1 col2 col3 col4 col5
0 Week_1 James John 1 when and why?
1 Week_1 James John 3 How?
2 Week_2 James John 2 Do you know when?
3 Week_2 Mark Jim 3 What time?
4 Week_2 Andrew Simon 1 How far is it?
5 Week_2 Andrew Simon 2 Are you going?
CURRENT(with above function):
index col1 col2 col3 col4
0 Week_1 James John 2
1 Week_2 James …
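One way to aggregate two columns differently in the same groupby is to pass `agg` a dictionary that maps each column to its own function. The sketch below rebuilds the sample frame from the question and assumes that "aggregating" col5 means joining the grouped strings with a space; the desired output is cut off in the question, so that choice is a guess.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['Week_1', 'Week_1', 'Week_2', 'Week_2', 'Week_2', 'Week_2'],
    'col2': ['James', 'James', 'James', 'Mark', 'Andrew', 'Andrew'],
    'col3': ['John', 'John', 'John', 'Jim', 'Simon', 'Simon'],
    'col4': [1, 3, 2, 3, 1, 2],
    'col5': ['when and why?', 'How?', 'Do you know when?',
             'What time?', 'How far is it?', 'Are you going?'],
})

# Mean of col4 and space-joined col5 for each (col1, col2, col3) group.
df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False).agg(
    {'col4': 'mean', 'col5': ' '.join}
)
print(df2)
```

Replacing `' '.join` with `list` would keep the grouped col5 values as a list instead of one string.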