我每月都有时间序列数据,我想对每个ID进行求和,按月分组,然后将月份名称作为列而不是行.
+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1 | abc | jan | 10 |
| 1 | abc | feb | 20 |
| 2 | def | jan | 10 |
| 2 | def | feb | 5 |
| 1 | abc | jan | 15 |
| 3 | ghi | mar | 15 |
Run Code Online (Sandbox Code Playgroud)
+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar …Run Code Online (Sandbox Code Playgroud) 我是 PostgreSQL 的新手,正在使用 9.4 版。我有一个表,其中收集的测量值作为字符串,需要使用始终是最新的内容(例如 VIEW)将其转换为一种 PIVOT 表。
此外,一些值需要转换,例如乘以 1000,如下面的“sensor3”示例所示。
源表:
CREATE TABLE source (
id bigint NOT NULL,
name character varying(255),
"timestamp" timestamp without time zone,
value character varying(32672),
CONSTRAINT source_pkey PRIMARY KEY (id)
);
INSERT INTO source VALUES
(15,'sensor2','2015-01-03 22:02:05.872','88.4')
, (16,'foo27' ,'2015-01-03 22:02:10.887','-3.755')
, (17,'sensor1','2015-01-03 22:02:10.887','1.1704')
, (18,'foo27' ,'2015-01-03 22:02:50.825','-1.4')
, (19,'bar_18' ,'2015-01-03 22:02:50.833','545.43')
, (20,'foo27' ,'2015-01-03 22:02:50.935','-2.87')
, (21,'sensor3','2015-01-03 22:02:51.044','6.56');
Run Code Online (Sandbox Code Playgroud)
源表结果:
| id | name | timestamp | value |
|----+-----------+---------------------------+----------|
| 15 | …Run Code Online (Sandbox Code Playgroud) 我正在使用numpy 0.10.1和pandas 0.17.0的Python 3.4.1。我有一个大型的数据框,其中列出了单个动物的种类和性别。这是一个真实的数据集,不可避免地存在NaN表示的缺失值。数据的简化版本可以生成为:
import numpy as np
import pandas as pd
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
'species': ["dog","dog",np.nan,"dog","dog","cat","cat","cat","dog","cat","cat","dog","dog","dog","dog",np.nan,"cat","cat","dog","dog"],
'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"]})
Run Code Online (Sandbox Code Playgroud)
打印数据框将得到:
gender id species
0 male 1 dog
1 female 2 dog
2 female 3 NaN
3 male 4 dog
4 male 5 dog
5 female 6 cat
6 female 7 cat
7 NaN 8 cat
8 male 9 dog
9 male 10 cat
10 female 11 cat
11 male 12 dog
12 female 13 dog
13 female 14 …Run Code Online (Sandbox Code Playgroud) 我有一个看起来像这样的数据集:
我想将所有co值汇总在一行中,因此最终结果如下所示:
看起来很容易,对吧?只需crosstab按照本答案中的建议使用 编写查询。问题是这要求我CREATE EXTENSION tablefunc;和我没有对我的数据库的写访问权限。
谁能推荐一个替代方案?
我想制作一个变量与 data.frame 中所有其他变量的交叉表。
library(tidyverse)
library(janitor)
humans <- starwars %>%
filter(species == "Human")
humans %>%
janitor::tabyl(gender, eye_color)
gender blue blue-gray brown dark hazel yellow
female 3 0 5 0 1 0
male 9 1 12 1 1 2
humans %>%
dplyr::select_if(is.character) %>%
dplyr::select(-name, -gender) %>%
purrr::map(.f = ~janitor::tabyl(dat = humans, gender, .x))
Error: Unknown columns `blond`, `none`, `brown`, `brown, grey`, `brown` and ...
Call `rlang::last_error()` to see a backtrace
Run Code Online (Sandbox Code Playgroud) 我很难为PDF 编织的 R Markdown文档创建一个简单漂亮的交叉表。我有一个类似于这个例子的数据集:
library(tidyverse)
fakeData <- tibble(id = c(1,2,3,4,5,6,7,8,9,10),
bmi = c("normal", "overweighted", "underweighted", "normal", "normal", "overweighted",
"normal", "overweighted", "underweighted","normal"),
gender = c("M", "F", "M", "M", "F", "F", "M", "F", "F", "F"))
Run Code Online (Sandbox Code Playgroud)
我想得到这样的输出:
有没有人有一个技巧/一个已知的好包来做到这一点?非常感谢!
statement我在 Postgres 中有下表:
stock | year | statement | amount
ACME | 2003 | Q1 Earnings | 100
ACME | 2003 | Q2 Earnings | 200
ACME | 2004 | Q2 Earnings | 300
Run Code Online (Sandbox Code Playgroud)
如何制作一个新表,将一年的 4 个季度全部排在一行?以及缺失语句的空值。
stock | year | Q1 Earnings | Q2 Earnings | Q3 Earnings | Q4 Earnings
ACME | 2003 | 100 | 200 | Null | Null
ACME | 2004 | NULL | 300 | Null | Null
Run Code Online (Sandbox Code Playgroud)
找到了这个答案:Postgres - Transpose Rows …
我有两个因素,我正在制作它的交叉表.
x<-table(clean$VMail.Plan,clean$International.Plan)
Run Code Online (Sandbox Code Playgroud)
我实际上想以图形方式表示它:
mosaicplot(x,data=clean,color=2:4,legend=TRUE,las = 1)
Run Code Online (Sandbox Code Playgroud)
正常的镶嵌图不是很吸引人,我无法使用ggplot2包使这更具吸引力.要么像热图还是泡泡?
实际的列联表:
No Yes
No 27 239
Yes 243 2159
Run Code Online (Sandbox Code Playgroud) 我正在尝试计算表上的交叉表(恰好是一个简单的物化视图,但这应该无关紧要):
user=# select * from data;
region | date | sum
--------+------------+-----
East | 2010-06-30 | 22
East | 2010-01-31 | 32
East | 2010-02-25 | 12
North | 2010-01-31 | 34
North | 2010-02-25 | 88
South | 2010-01-31 | 52
South | 2010-02-25 | 54
South | 2010-06-30 | 11
West | 2010-06-30 | 15
West | 2010-02-25 | 37
West | 2010-01-31 | 11
(11 rows)
Run Code Online (Sandbox Code Playgroud)
当我尝试使用以下表达式计算数据交叉表时,出现错误:
user=# SELECT * FROM
crosstab('select region, date, sum from x …Run Code Online (Sandbox Code Playgroud) 我们假设我们有这样的数据帧:
df = pd.DataFrame({'key' : ['one', 'two', 'three', 'four'] * 3,
'col' : ['A', 'B', 'C'] * 4,
'val1' : np.random.randn(12),
'val2' : np.random.randn(12),
'val3' : np.random.randn(12)})
Run Code Online (Sandbox Code Playgroud)
key + col 是唯一的关键
我想让col值成为拆分列或交叉列表,最后看起来像这样:
第一个天真的方法pd.crosstab(df.key,df.col)在这里不起作用:
此代码pd.crosstab(df.key,df.col,values = df[['val1', 'val2', 'val3']], aggfunc = np.max)无法运行ValueError: Wrong number of items passed 3, placement implies 1
怎么运作?