Posts by msh*_*855

Filling in missing levels

I have a data frame of the following kind:

Country <- rep(c("USA", "AUS", "GRC"),2)
Year    <- 2001:2006
Level   <- c("rich","middle","poor",rep(NA,3))
df <- data.frame(Country, Year,Level)

df 
Country Year  Level
1     USA 2001   rich
2     AUS 2002 middle
3     GRC 2003   poor
4     USA 2004   <NA>
5     AUS 2005   <NA>
6     GRC 2006   <NA>

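I want to fill in the missing values in the Level column with the correct label, taken from the earlier row for the same country.

So the expected result should look like this: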

Country Year  Level
1     USA 2001   rich
2     AUS 2002 middle
3     GRC 2003   poor
4     USA 2004   rich
5     AUS 2005 middle
6     GRC 2006   poor

r missing-data

Score: 12 | Answers: 4 | Views: 986

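Exporting multiple regression results to Excel

I want to export the results of multiple regressions to an Excel file in a very specific format.

MWE: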

data("mtcars")
str(mtcars)
m1<-lm(hp ~ disp, data = mtcars)
m2<-lm(hp ~ disp + wt, data = mtcars)

I find this format the most suitable:

library(texreg)
screenreg(list(m1, m2))

===================================
             Model 1     Model 2   
-----------------------------------
(Intercept)   45.73 **    68.84 *  
             (16.13)     (31.80)   
disp           0.44 ***    0.54 ***
              (0.06)      (0.14)   
wt                       -14.45    
                         (17.10)   
-----------------------------------
R^2            0.63        0.63    
Adj. R^2       0.61        0.61    
Num. obs.     32          32       
RMSE          42.65       42.85    
===================================
*** p < 0.001, ** p < 0.01, * p < 0.05
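
I would like to convert the output above into a data frame or something similar, so that I can export it to Excel while preserving its format.

Other approaches that can generate a similar table and export it to …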

regression export r lm

Score: 5 | Answers: 1 | Views: 1825

Making dates start on the same day in R

I have the following array of dates:

df$Date
  [1] "2001-07-31" "2001-08-31" "2001-09-30" "2001-10-31" "2001-11-30" "2001-12-31" "2002-01-31" "2002-02-28"
  [9] "2002-03-31" "2002-04-30" "2002-05-31" "2002-06-30" "2002-07-31" "2002-08-31" "2002-09-30" "2002-10-31"
 [17] "2002-11-30" "2002-12-31" "2003-01-31" "2003-02-28" "2003-03-31" "2003-04-30" "2003-05-31" "2003-06-30"
 [25] "2003-07-31" "2003-08-31" "2003-09-30" "2003-10-31" "2003-11-30" "2003-12-31" "2004-01-31" "2004-02-29"
 [33] "2004-03-31" "2004-04-30" "2004-05-31" "2004-06-30" "2004-07-31" "2004-08-31" "2004-09-30" "2004-10-31"
 [41] "2004-11-30" "2004-12-31" "2005-01-31" "2005-02-28" "2005-03-31" "2005-04-30" "2005-05-31" "2005-06-30"
 [49] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31" "2006-01-31" "2006-02-28"
 [57] "2006-03-31" "2006-04-30" "2006-05-31" "2006-06-30" "2006-07-31" "2006-08-31" "2006-09-30" "2006-10-31"
 [65] "2006-11-30" "2006-12-31" "2007-01-31" "2007-02-28" "2007-03-31" …

r date type-conversion

Score: 5 | Answers: 1 | Views: 47

R's case_when function in Python

How can I implement R's case_when function in Python code?

This is R's case_when function:

https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/case_when

As a minimal working example, suppose we have the following data frame (the Python code follows):

import pandas as pd
import numpy as np

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
df

Suppose we want to create a new column called "elderly" that looks at the "age" column and does the following:

if age < 10 then baby
if age >= 10 and age < 20 then kid
if age >= 20 and age < 30 then young
if age >= 30 and age …
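One possible way to express rules like these in pandas is `numpy.select`, which checks a list of boolean conditions in order and takes the label of the first one that matches, much as case_when does. The sketch below is only an illustration: it covers just the three rules that are fully visible above, and the `"other"` default is a hypothetical placeholder for the ages covered by the truncated rules, not part of the original question.

```python
import numpy as np
import pandas as pd

# Reduced version of the example data frame (only the columns needed here).
df = pd.DataFrame({
    "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "age": [42, 52, 36, 24, 73],
})

# Conditions are evaluated in order; the first matching one wins.
conditions = [
    df["age"] < 10,
    (df["age"] >= 10) & (df["age"] < 20),
    (df["age"] >= 20) & (df["age"] < 30),
]
labels = ["baby", "kid", "young"]

# "other" is a placeholder for ages not covered by the visible rules.
df["elderly"] = np.select(conditions, labels, default="other")
print(df)
```

Further rules would be added by appending one more condition and one more label at the same position in both lists.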

python data-analysis dataframe pandas

Score: 4 | Answers: 2 | Views: 476

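Generating statistical tables in Python and exporting them to Excel

I want to produce publication-quality statistical tables in Python.

In Stata, this can be done with the community-contributed estout family of commands: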

sysuse auto, clear

regress mpg weight
estimates store A

regress mpg weight price 
estimates store B

regress mpg weight price length
estimates store C

regress mpg weight price length displacement
estimates store D

esttab A B C D, se r2 nonumber mtitle("Model 1" "Model 2" "Model 3" "Model 4")

----------------------------------------------------------------------------
                  Model 1         Model 2         Model 3         Model 4   
----------------------------------------------------------------------------
weight           -0.00601***     -0.00582***     -0.00304        -0.00354   
               (0.000518)      (0.000618)       (0.00177)       (0.00212)   

price                          -0.0000935 …

python regression stata pandas statsmodels

Score: 4 | Answers: 1 | Views: 1477

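Drawing percentage lines between bars in ggplot2

I have a bar chart, and I would also like to include some lines showing the percentage difference between the bars, as in the figure below:

[image]

The lines in the figure are only there to illustrate what I would ideally like.

Can someone help me with this?

Here is the data frame to reproduce the plot: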

structure(list(shares = c(0.39, 3.04, 9.32, 22.29, 64.97, 0.01, 
0.11, 5.83, 21.4, 72.64), quantile = structure(c(4L, 1L, 2L, 
3L, 5L, 4L, 1L, 2L, 3L, 5L), .Label = c("2nd Quantile", "3rd Quantile", 
"4nd Quantile", "Poorest 20%", "Richest 20%"), class = "factor"), 
    case = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L
    ), .Label = c("No Debt", "With Debt"), class = "factor")), row.names = c(NA, 
-10L), class = "data.frame")

Here is the code I use to make the bar chart:

ggplot(df_cum, aes(fill = case , quantile, shares)) …

r ggplot2

Score: 1 | Answers: 1 | Views: 419

Dynamic aggregation by category in R

I have the following data frame:

a        <- c(rep("CGR", 6), rep("AUS", 6), rep("ROW", 6) )
b        <- c("AUT", "CH", "ROW", "ROW", "ROW", "ROW", "AUT", "CH", "ROW", "ROW", "ROW", "ROW", "AUT", "CH", "ROW", "ROW", "ROW", "ROW" )
v        <- 1:18
category <- c("a", "b", "a", "a", "b", "b", "a", "b", "a", "a", "b", "b", "a", "b", "a", "a", "b", "b")

data.frame(a,b,v,category)


     a   b  v category
1  CGR AUT  1        a
2  CGR  CH  2        b
3  CGR ROW  3        a
4  CGR ROW  4        a
5 …

r summarize

Score: 0 | Answers: 1 | Views: 68

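Removing tuples from a list based on a condition

I want to filter out of a list all tuples that do not contain certain elements. Specifically, suppose: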

mylist = [("sagap", "apple", "orange"), ("apple", "orange", "crazy"), ("crazy", "orange"), ("orange", "banana", "do", "does"), ("apple", "do", "does", "something", "response")]

I want to exclude/remove from the list all tuples that do not contain both "apple" and "orange".

The expected result should be a new list of tuples, as follows:

mylist_new = [("sagap", "apple", "orange"), ("apple", "orange", "crazy") ] 

I would appreciate your help. Please keep in mind that in my actual project the list contains about 10,000 tuples.

Ideally, I would like something like this:

list_of_items = ["apple", "orange"]

# search mylist for the tuples that contain everything in list_of_items
# and keep only those tuples in the list

Please note that the number of items is not necessarily just two; it can be any large number of items to consider.
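A minimal sketch of one way this could be done, assuming the order of items inside a tuple does not matter: turn the required items into a set once and keep each tuple for which that set is a subset of the tuple's elements. The name `required` is introduced here only for illustration.

```python
mylist = [
    ("sagap", "apple", "orange"),
    ("apple", "orange", "crazy"),
    ("crazy", "orange"),
    ("orange", "banana", "do", "does"),
    ("apple", "do", "does", "something", "response"),
]
list_of_items = ["apple", "orange"]

# Build the set of required items once; it can hold any number of items.
required = set(list_of_items)

# Keep only the tuples that contain every required item.
mylist_new = [t for t in mylist if required.issubset(t)]
print(mylist_new)
# [('sagap', 'apple', 'orange'), ('apple', 'orange', 'crazy')]
```

This makes a single pass over the list, so a list of about 10,000 tuples should not pose a problem.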

python tuples list

Score: 0 | Answers: 1 | Views: 38