I have the following kind of data frame:
Country <- rep(c("USA", "AUS", "GRC"),2)
Year <- 2001:2006
Level <- c("rich","middle","poor",rep(NA,3))
df <- data.frame(Country, Year,Level)
df
Country Year Level
1 USA 2001 rich
2 AUS 2002 middle
3 GRC 2003 poor
4 USA 2004 <NA>
5 AUS 2005 <NA>
6 GRC 2006 <NA>
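I want to fill the missing values with the correct Level label, i.e. the last known value in that column for the same Country.
So the expected result should look like this: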
Country Year Level
1 USA 2001 rich
2 AUS 2002 middle
3 GRC 2003 poor
4 USA 2004 rich
5 AUS 2005 middle
6 GRC 2006 poor
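For comparison only, here is a pandas analogue of the operation. It is not an R answer; in R the same idea is usually expressed with `dplyr::group_by()` plus `tidyr::fill()`. The sketch groups by `Country` and forward-fills `Level`, so each missing value takes the most recent non-missing label for that country. It assumes that "last" means "most recent by `Year`", which is why the rows are sorted first.

```python
import numpy as np
import pandas as pd

# Same toy data as the R example above
df = pd.DataFrame({
    "Country": ["USA", "AUS", "GRC"] * 2,
    "Year": range(2001, 2007),
    "Level": ["rich", "middle", "poor", np.nan, np.nan, np.nan],
})

# Sort by year so "last" means "most recent", then forward-fill within each country
df = df.sort_values("Year")
df["Level"] = df.groupby("Country")["Level"].ffill()
print(df)
```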

I want to export the results of multiple regression models to an Excel file in a very specific format.
MWE:
data("mtcars")
str(mtcars)
m1<-lm(hp ~ disp, data = mtcars)
m2<-lm(hp ~ disp + wt, data = mtcars)
I find this format the most suitable:
library(texreg)
screenreg(list(m1, m2))
===================================
             Model 1     Model 2
-----------------------------------
(Intercept)  45.73 **    68.84 *
             (16.13)     (31.80)
disp          0.44 ***    0.54 ***
             (0.06)      (0.14)
wt                      -14.45
                        (17.10)
-----------------------------------
R^2           0.63        0.63
Adj. R^2      0.61        0.61
Num. obs.    32          32
RMSE         42.65       42.85
===================================
*** p < 0.001, ** p < 0.01, * p < 0.05
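I would like to convert the above into a data frame or something similar, so that I can export it to Excel while keeping its format.
I'd also welcome other solutions that can produce a similar table and export it to …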

I have the following array of dates:
df$Date
[1] "2001-07-31" "2001-08-31" "2001-09-30" "2001-10-31" "2001-11-30" "2001-12-31" "2002-01-31" "2002-02-28"
[9] "2002-03-31" "2002-04-30" "2002-05-31" "2002-06-30" "2002-07-31" "2002-08-31" "2002-09-30" "2002-10-31"
[17] "2002-11-30" "2002-12-31" "2003-01-31" "2003-02-28" "2003-03-31" "2003-04-30" "2003-05-31" "2003-06-30"
[25] "2003-07-31" "2003-08-31" "2003-09-30" "2003-10-31" "2003-11-30" "2003-12-31" "2004-01-31" "2004-02-29"
[33] "2004-03-31" "2004-04-30" "2004-05-31" "2004-06-30" "2004-07-31" "2004-08-31" "2004-09-30" "2004-10-31"
[41] "2004-11-30" "2004-12-31" "2005-01-31" "2005-02-28" "2005-03-31" "2005-04-30" "2005-05-31" "2005-06-30"
[49] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31" "2006-01-31" "2006-02-28"
[57] "2006-03-31" "2006-04-30" "2006-05-31" "2006-06-30" "2006-07-31" "2006-08-31" "2006-09-30" "2006-10-31"
[65] "2006-11-30" "2006-12-31" "2007-01-31" "2007-02-28" "2007-03-31" …

How can I implement R's case_when function in Python?
This is R's case_when function:
https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/case_when
As a minimal working example, suppose we have the following data frame (Python code follows):
import pandas as pd
import numpy as np
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'age': [42, 52, 36, 24, 73],
'preTestScore': [4, 24, 31, 2, 3],
'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
df
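One common pandas counterpart of `case_when` is `numpy.select`: it takes a list of boolean conditions and a matching list of values, and for each row uses the value of the first condition that is true, falling back to `default` when none is. A minimal sketch using the `name` and `age` columns from the data above and the age bands listed below. Because the last rule is cut off in the question, the catch-all label `"other"` is a placeholder assumption, not the asker's category.

```python
import numpy as np
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'age': [42, 52, 36, 24, 73]}
df = pd.DataFrame(data)

# Conditions are checked in order; the first True one wins, as in case_when
conditions = [
    df['age'] < 10,
    (df['age'] >= 10) & (df['age'] < 20),
    (df['age'] >= 20) & (df['age'] < 30),
]
choices = ['baby', 'kid', 'young']

# "other" is a placeholder for the remaining (truncated) rules
df['elderly'] = np.select(conditions, choices, default='other')
print(df)
```

Because the first matching condition wins, the lower bounds are redundant here; `df['age'] < 20` alone would behave the same in the second position. When every rule is a numeric interval, `pd.cut` with explicit bins and labels is another option.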
Suppose we want to create a new column called "elderly" that looks at the "age" column and does the following:
if age < 10 then baby
if age >= 10 and age < 20 then kid
if age >= 20 and age < 30 then young
if age >= 30 and age …

I want to produce publication-quality statistical tables in Python.
In Stata, this can be done with the community-contributed estout family of commands:
sysuse auto, clear
regress mpg weight
estimates store A
regress mpg weight price
estimates store B
regress mpg weight price length
estimates store C
regress mpg weight price length displacement
estimates store D
esttab A B C D, se r2 nonumber mtitle("Model 1" "Model 2" "Model 3" "Model 4")
----------------------------------------------------------------------------
            Model 1       Model 2       Model 3       Model 4
----------------------------------------------------------------------------
weight      -0.00601***   -0.00582***   -0.00304      -0.00354
            (0.000518)    (0.000618)    (0.00177)     (0.00212)
price                     -0.0000935 …

I have a bar chart, and I would also like to add some lines showing the percentage differences between the bars, as in the figure below:
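The lines in the figure are only meant to illustrate what I would ideally like.
Can someone help me with this?
Here is the data frame to reproduce the plot: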
structure(list(shares = c(0.39, 3.04, 9.32, 22.29, 64.97, 0.01,
0.11, 5.83, 21.4, 72.64), quantile = structure(c(4L, 1L, 2L,
3L, 5L, 4L, 1L, 2L, 3L, 5L), .Label = c("2nd Quantile", "3rd Quantile",
"4nd Quantile", "Poorest 20%", "Richest 20%"), class = "factor"),
case = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L
), .Label = c("No Debt", "With Debt"), class = "factor")), row.names = c(NA,
-10L), class = "data.frame")
Here is the code I used to make the bar chart:
ggplot(df_cum, aes(fill = case, quantile, shares)) …

I have the following data frame:
a <- c(rep("CGR", 6), rep("AUS", 6), rep("ROW", 6) )
b <- c("AUT", "CH", "ROW", "ROW", "ROW", "ROW", "AUT", "CH", "ROW", "ROW", "ROW", "ROW", "AUT", "CH", "ROW", "ROW", "ROW", "ROW" )
v <- 1:18
category <- c("a", "b", "a", "a", "b", "b", "a", "b", "a", "a", "b", "b", "a", "b", "a", "a", "b", "b")
data.frame(a,b,v,category)
a b v category
1 CGR AUT 1 a
2 CGR CH 2 b
3 CGR ROW 3 a
4 CGR ROW 4 a
5 …

I want to filter out of a list all tuples that do not contain specific elements. Specifically, suppose:
mylist = [("sagap", "apple", "orange"), ("apple", "orange", "crazy"), ("crazy", "orange"), ("orange", "banana", "do", "does"), ("apple", "do", "does", "something", "response")]
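I want to exclude/remove every tuple in the list that does not contain both "apple" and "orange".
The expected result should be a new list containing the following tuples: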
mylist_new = [("sagap", "apple", "orange"), ("apple", "orange", "crazy") ]
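I'd appreciate your help. Please keep in mind that in my actual project the list has about 10,000 tuples.
Ideally, I would like something like this:
list_of_items = ["apple", "orange"]
# search mylist for the tuples that contain all of list_of_items and keep only those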
Please note that the number of items is not necessarily two; it can be any number of items, possibly a large one.
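A minimal sketch, assuming that "contains" means every required item appears somewhere in the tuple, in any order: build a set of the required items once, then keep each tuple for which that set is a subset of the tuple's elements. `set.issubset` accepts any iterable, so the same line works for any number of items, and each check costs roughly time linear in the tuple's length, which should be comfortable for about 10,000 tuples.

```python
mylist = [("sagap", "apple", "orange"), ("apple", "orange", "crazy"),
          ("crazy", "orange"), ("orange", "banana", "do", "does"),
          ("apple", "do", "does", "something", "response")]

list_of_items = ["apple", "orange"]

# Build the set of required items once, outside the loop
required = set(list_of_items)

# Keep only the tuples that contain every required item
mylist_new = [t for t in mylist if required.issubset(t)]
print(mylist_new)
```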