Posts by msh*_*855

Fill in missing levels

I have a data frame of the following type:

Country <- rep(c("USA", "AUS", "GRC"),2)
Year    <- 2001:2006
Level   <- c("rich","middle","poor",rep(NA,3))
df <- data.frame(Country, Year,Level)

df 
Country Year  Level
1     USA 2001   rich
2     AUS 2002 middle
3     GRC 2003   poor
4     USA 2004   <NA>
5     AUS 2005   <NA>
6     GRC 2006   <NA>


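I want to fill the missing values in the last column (Level) with the correct level label for each country.

So the expected result should look like this: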

Country Year  Level
1     USA 2001   rich
2     AUS 2002 middle
3     GRC 2003   poor
4     USA 2004   rich
5     AUS 2005 middle
6     GRC 2006   poor


r missing-data

msh*_*855

2017 12-22

12
votes

4
answers

986
views

Multiple regression results to Excel

I want to export the results of multiple regressions to an Excel file in a very specific format.

MWE:

data("mtcars")
str(mtcars)
m1 <- lm(hp ~ disp, data = mtcars)
m2 <- lm(hp ~ disp + wt, data = mtcars)


I find this format the most suitable:

library(texreg)
screenreg(list(m1, m2))

===================================
             Model 1     Model 2   
-----------------------------------
(Intercept)   45.73 **    68.84 *  
             (16.13)     (31.80)   
disp           0.44 ***    0.54 ***
              (0.06)      (0.14)   
wt                       -14.45    
                         (17.10)   
-----------------------------------
R^2            0.63        0.63    
Adj. R^2       0.61        0.61    
Num. obs.     32          32       
RMSE          42.65       42.85    
===================================
*** p < 0.001, ** p < 0.01, * p < 0.05


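I would like to convert the output above into a data frame or something similar, so that I can export it to Excel while preserving its formatting.

Other solutions that can generate a similar table and export it to …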

regression export r lm

msh*_*855

2018 05-24

5
votes

1
answers

1825
views

Make dates start on the same day in R

I have the following array of dates:

df$Date
  [1] "2001-07-31" "2001-08-31" "2001-09-30" "2001-10-31" "2001-11-30" "2001-12-31" "2002-01-31" "2002-02-28"
  [9] "2002-03-31" "2002-04-30" "2002-05-31" "2002-06-30" "2002-07-31" "2002-08-31" "2002-09-30" "2002-10-31"
 [17] "2002-11-30" "2002-12-31" "2003-01-31" "2003-02-28" "2003-03-31" "2003-04-30" "2003-05-31" "2003-06-30"
 [25] "2003-07-31" "2003-08-31" "2003-09-30" "2003-10-31" "2003-11-30" "2003-12-31" "2004-01-31" "2004-02-29"
 [33] "2004-03-31" "2004-04-30" "2004-05-31" "2004-06-30" "2004-07-31" "2004-08-31" "2004-09-30" "2004-10-31"
 [41] "2004-11-30" "2004-12-31" "2005-01-31" "2005-02-28" "2005-03-31" "2005-04-30" "2005-05-31" "2005-06-30"
 [49] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31" "2006-01-31" "2006-02-28"
 [57] "2006-03-31" "2006-04-30" "2006-05-31" "2006-06-30" "2006-07-31" "2006-08-31" "2006-09-30" "2006-10-31"
 [65] "2006-11-30" "2006-12-31" "2007-01-31" "2007-02-28" "2007-03-31" …


r date type-conversion

msh*_*855

2019 04-13

5
votes

1
answers

47
views

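R's case_when function in Python

How can I implement R's case_when function in Python code?

Here is R's case_when function:

https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/case_when

As a minimal working example, suppose we have the following data frame (Python code follows):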

import pandas as pd
import numpy as np

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
df


Suppose we want to create a new column called "elderly" that looks at the "age" column and does the following:

if age < 10 then baby
if age >= 10 and age < 20 then kid
if age >= 20 and age < 30 then young
if age >= 30 and age …

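One possible way to express these rules in pandas is NumPy's `np.select`, which takes a list of boolean conditions and a matching list of values, and uses the first condition that is true for each row, much like case_when. This is only a sketch: it covers the three rules that are fully visible above, and the `default="other"` label is an assumption standing in for the rules that are cut off.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "age": [42, 52, 36, 24, 73],
})

# Conditions are evaluated in order; the first one that is True wins.
conditions = [
    df["age"] < 10,
    (df["age"] >= 10) & (df["age"] < 20),
    (df["age"] >= 20) & (df["age"] < 30),
]
labels = ["baby", "kid", "young"]

# "other" is a placeholder (assumption) for the rules truncated in the question.
df["elderly"] = np.select(conditions, labels, default="other")
print(df)
```

With the sample data, only Jake (age 24) matches one of the three visible rules; every other row falls through to the placeholder default.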

python data-analysis dataframe pandas

msh*_*855

2019 02-13

4
votes

2
answers

476
views

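Generate statistical tables in Python and export them to Excel

I want to generate publication-quality statistical tables in Python.

In Stata, this can be done with the community-contributed family of commands estout: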

sysuse auto, clear

regress mpg weight
estimates store A

regress mpg weight price 
estimates store B

regress mpg weight price length
estimates store C

regress mpg weight price length displacement
estimates store D

esttab A B C D, se r2 nonumber mtitle("Model 1" "Model 2" "Model 3" "Model 4")

----------------------------------------------------------------------------
                  Model 1         Model 2         Model 3         Model 4   
----------------------------------------------------------------------------
weight           -0.00601***     -0.00582***     -0.00304        -0.00354   
               (0.000518)      (0.000618)       (0.00177)       (0.00212)   

price                          -0.0000935 …


python regression stata pandas statsmodels

msh*_*855

2019 02-27

4
votes

1
answers

1477
views

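Draw percentage lines between bars in ggplot2

I have a bar chart, and I would also like to include some lines showing the percentage difference between the bars, as in the figure below:

The lines in the figure are only there to illustrate what I would ideally like.

Can someone help me with this?

Here is the data frame to reproduce the figure: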

structure(list(shares = c(0.39, 3.04, 9.32, 22.29, 64.97, 0.01, 
0.11, 5.83, 21.4, 72.64), quantile = structure(c(4L, 1L, 2L, 
3L, 5L, 4L, 1L, 2L, 3L, 5L), .Label = c("2nd Quantile", "3rd Quantile", 
"4nd Quantile", "Poorest 20%", "Richest 20%"), class = "factor"), 
    case = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L
    ), .Label = c("No Debt", "With Debt"), class = "factor")), row.names = c(NA, 
-10L), class = "data.frame")


Here is the code I use to make the bar chart:

ggplot(df_cum, aes(fill = case , quantile, shares)) …


r ggplot2

msh*_*855

lucky-day

1
votes

1
answers

419
views

Dynamic aggregation by category in R

I have the following data frame:

a        <- c(rep("CGR", 6), rep("AUS", 6), rep("ROW", 6) )
b        <- c("AUT", "CH", "ROW", "ROW", "ROW", "ROW", "AUT", "CH", "ROW", "ROW", "ROW", "ROW", "AUT", "CH", "ROW", "ROW", "ROW", "ROW" )
v        <- 1:18
category <- c("a", "b", "a", "a", "b", "b", "a", "b", "a", "a", "b", "b", "a", "b", "a", "a", "b", "b")

data.frame(a,b,v,category)


     a   b  v category
1  CGR AUT  1        a
2  CGR  CH  2        b
3  CGR ROW  3        a
4  CGR ROW  4        a
5 …


r summarize

msh*_*855

lucky-day

0
votes

1
answers

68
views

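Remove tuples from a list based on a condition

I want to filter out of a list all tuples that do not contain specific elements. Specifically, suppose: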

mylist = [("sagap", "apple", "orange"), ("apple", "orange", "crazy"), ("crazy", "orange"), ("orange", "banana", "do", "does"), ("apple", "do", "does", "something", "response")]


I want to exclude/remove every tuple in the list that does not contain both "apple" and "orange".

The expected result should be a new list of tuples, as follows:

mylist_new = [("sagap", "apple", "orange"), ("apple", "orange", "crazy") ]


I would appreciate your help. Please bear in mind that in my actual project the list has about 10000 tuples.

Ideally, I would like something like this:

list_of_items = ["apple", "orange"]

# search mylist for the tuples that contain every entry of list_of_items and keep only those


Please note that the number of items is not necessarily just two; it can be any large number of items.
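A minimal sketch of one way to do this, assuming the required items are hashable: put them in a set and keep each tuple for which `set.issubset` is true. The helper name `keep_tuples_with_all` is hypothetical and used only for illustration.

```python
def keep_tuples_with_all(tuples, required_items):
    """Return the tuples that contain every entry of required_items."""
    required = set(required_items)
    # issubset accepts any iterable, so each tuple can be passed directly.
    return [t for t in tuples if required.issubset(t)]


mylist = [
    ("sagap", "apple", "orange"),
    ("apple", "orange", "crazy"),
    ("crazy", "orange"),
    ("orange", "banana", "do", "does"),
    ("apple", "do", "does", "something", "response"),
]
list_of_items = ["apple", "orange"]

mylist_new = keep_tuples_with_all(mylist, list_of_items)
print(mylist_new)
```

The same call works unchanged when `list_of_items` holds more than two entries, and a single pass over a list of about 10000 tuples should not be a concern.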

python tuples list

msh*_*855

2021 04-27

0
votes

1
answers

38
views

Tag statistics

r ×5

python ×3

pandas ×2

regression ×2

data-analysis ×1

dataframe ×1

date ×1

export ×1

ggplot2 ×1

list ×1

lm ×1

missing-data ×1

stata ×1

statsmodels ×1

summarize ×1

tuples ×1

type-conversion ×1
