Generate a comprehensive report format in R

Mik*_*ike 6 r dplyr tidyverse

I have fetched some information from a MySQL server into R. It looks like this in my R data frame:

barcode_no   Inspection_date        current_profile      score    Tag_log   prod_log
12345678     2020-01-15 14:34:13    Large                10       C1        WIP
12345678     2020-01-15 18:33:11    Medium               20       C2        Hold
12345678     2020-01-15 13:23:24    Medium               50       C3        Hold
12345678     2020-01-15 12:12:23    Medium               70                 Shipped
12345678     2020-01-15 11:12:45    Medium               120      C1        Shipped
12345678     2020-01-15 12:22:32    Small                150      C2        Shipped
12345678     2020-01-15 15:23:23    Small                10       C3        WIP
12345678     2020-01-15 16:34:08    Small                20       C2        Hold
12345678     2020-01-15 17:07:13    Small                130      C1        Hold
12345678     2020-01-15 17:09:05    Small                40                 Hold 

The requirement is to fit the contents of the above data frame into a comprehensive report structure, both date-wise and month-wise.

comprehensive_df (Date): uses the latest date according to the system date. If some or all of the records are not available for that date, the corresponding cells of the comprehensive report data frame are filled with 0.

Current_profile     # of records  % of records C1 C2 C3 [Null] # of records  % of records C1 C2 C3 [Null] # of records  % of records C1 C2 C3 [Null] Total    % Total
**Large               01            16.67        1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        10.00**
Shipped             0             0.0          0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0.0
Hold                0             0.0          0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0.0
WIP                 01             1.0         1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        100.00
**Small               03            50.00        0  1  1  1      0             0            0   0  0    0     02             66.67        1  1  0   0     5        50.00**
Shipped             0             0            0  0  0  0      0             0            0   0  0    0     01             50.00        0  1  0   0     1        20.00
Hold                02            66.67        0  1  0  1      0             0            0   0  0    0      1             100.00       1  0  0   0     3        60.00
WIP                 01            33.33        1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        20.00
**Medium              02            33.33        0  1  1  0      1             100.00       0   0  0    1      1             33.33        1  0  0   0     4        40.00**
Shipped             0              0           0  0  0  0      1             100.00       0   0  0    1      1             100.00       0  0  0   0     2        50.00
Hold                2            100.00        0  1  1  0      0             0            0   0  0    0      0             0            0  0  0   0     2        50.00
WIP                 0            0             0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0
Total               06            0.10         1  0  0  0      1             0            0   0  0    0      3             0            0  0  0   0     1        0.10
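
The zero-fill requirement (every profile/status row present even when no records exist for the report date) could be sketched with `tidyr::complete()`. This is only an illustration under assumptions: the three-row `df` is toy data, and `report_date`, `profiles`, `statuses`, and the `n_records` column name are hypothetical names chosen here, not part of the original data.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for the MySQL extract (illustrative values only).
df <- tibble::tibble(
  Inspection_date = as.POSIXct(
    c("2020-01-15 14:34:13", "2020-01-15 18:33:11", "2020-01-14 09:00:00"),
    tz = "UTC"
  ),
  current_profile = c("Large", "Medium", "Small"),
  prod_log        = c("WIP", "Hold", "Shipped")
)

# Hypothetical report date; a live report would use Sys.Date() instead.
report_date <- as.Date("2020-01-15")

# Assumed full set of categories that must always appear in the report.
profiles <- c("Large", "Medium", "Small")
statuses <- c("Shipped", "Hold", "WIP")

daily_counts <- df %>%
  filter(as.Date(Inspection_date) == report_date) %>%
  count(current_profile, prod_log, name = "n_records") %>%
  # Add every profile/status pair that has no rows, with a count of 0.
  complete(current_profile = profiles, prod_log = statuses,
           fill = list(n_records = 0L))

daily_counts
```

Listing the categories explicitly, rather than relying on the values that happen to be present, is what keeps rows such as Small/Shipped in the output on days when nothing was inspected for them.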

I have divided the comprehensive data frame into sections: columns 2 to 7 hold the counts of records with a score from 0 to 50 (inclusive), columns 8 to 13 hold the counts of records with a score above 50 and up to 100, and columns 14 to 20 hold the counts of records with a score above 100.
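
One way to express these three score bands is base R's `cut()`. The sketch below uses the ten scores from the sample data. The band labels are names invented for this illustration.

```r
# Scores from the sample data above, in row order.
scores <- c(10, 20, 50, 70, 120, 150, 10, 20, 130, 40)

# right = TRUE closes each interval on the right: (0, 50], (50, 100], (100, Inf).
# include.lowest = TRUE additionally admits a score of exactly 0 in the first band.
score_band <- cut(
  scores,
  breaks = c(0, 50, 100, Inf),
  labels = c("0-50", ">50-100", ">100"),
  right = TRUE,
  include.lowest = TRUE
)

# Number of records per band.
table(score_band)
```

For the sample data this puts 6 records in the first band, 1 in the second, and 3 in the third, which agrees with the "# of records" entries in the Total row of the desired report.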

The code that I'm trying:

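library(dplyr)

# Inspection_date carries a time component, so it is converted with as.Date();
# lubridate::ymd() expects date-only input and can fail to parse
# values such as "2020-01-15 14:34:13".

# Number of records per month, stored under the name current_profile
df1 <- df %>%
  mutate(Month = format(as.Date(Inspection_date), '%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`current_profile` = n())

# Number of records per month, stored under the name Tag_log
df2 <- df %>%
  mutate(Month = format(as.Date(Inspection_date), '%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`Tag_log` = n())

# Number of records per month, stored under the name prod_log
df3 <- df %>%
  mutate(Month = format(as.Date(Inspection_date), '%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`prod_log` = n())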

And so on for every variable. I then try to full_join all the data frames by Date for the date-wise comprehensive format and by Month for the month-wise comprehensive format.

comprehensive_df <- df1 %>%
                      full_join(df2, by = 'Month') %>%
                      full_join(df3, by = 'Month')
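
The per-variable summaries and joins can probably be collapsed into a single grouped count that also yields the percentage columns. A minimal sketch, assuming the ten sample rows above. The output column names `n_records` and `pct_of_profile` are invented for this illustration.

```r
library(dplyr)

# The ten sample rows from the question (only the columns needed here).
df <- tibble::tibble(
  Inspection_date = as.POSIXct(
    c("2020-01-15 14:34:13", "2020-01-15 18:33:11", "2020-01-15 13:23:24",
      "2020-01-15 12:12:23", "2020-01-15 11:12:45", "2020-01-15 12:22:32",
      "2020-01-15 15:23:23", "2020-01-15 16:34:08", "2020-01-15 17:07:13",
      "2020-01-15 17:09:05"),
    tz = "UTC"
  ),
  current_profile = c("Large", rep("Medium", 4), rep("Small", 5)),
  prod_log = c("WIP", "Hold", "Hold", "Shipped", "Shipped",
               "Shipped", "WIP", "Hold", "Hold", "Hold")
)

monthly <- df %>%
  mutate(Month = format(as.Date(Inspection_date), "%b-%Y")) %>%
  # One row per month / profile / status combination, with its record count.
  count(Month, current_profile, prod_log, name = "n_records") %>%
  # Share of each status within its profile for that month.
  group_by(Month, current_profile) %>%
  mutate(pct_of_profile = round(100 * n_records / sum(n_records), 2)) %>%
  ungroup()

monthly
```

Grouping by several columns at once avoids building one data frame per variable and joining them afterwards. Replacing `Month` with `as.Date(Inspection_date)` would give the date-wise variant.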

Dav*_*Mas 0

I'm not sure I understand what you need, but maybe something like this?

library(magrittr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

dat <- tibble::tribble(
  ~barcode_no, ~Inspection_date, ~current_profile, ~score, ~Tag_log, ~prod_log,
    12345678L,     "15/01/2020",          "Large",    10L,     "C1",     "WIP",
    12345678L,     "15/01/2020",         "Medium",    20L,     "C2",    "Hold",
    12345678L,     "15/01/2020",         "Medium",    50L,     "C3",    "Hold",
    12345678L,     "15/01/2020",         "Medium",    70L,       NA, "Shipped",
    12345678L,     "15/01/2020",         "Medium",   120L,     "C1", "Shipped",
    12345678L,     "15/01/2020",          "Small",   150L,     "C2", "Shipped",
    12345678L,     "15/01/2020",          "Small",    10L,     "C3",     "WIP",
    12345678L,     "15/01/2020",          "Small",    20L,     "C2",    "Hold",
    12345678L,     "15/01/2020",          "Small",   130L,     "C1",    "Hold",
    12345678L,     "15/01/2020",          "Small",    40L,       NA,    "Hold"
  )



dat$Inspection_date = as.Date(dat$Inspection_date,format = "%d/%m/%Y")

today = Sys.Date()

param_date = as.Date("15/01/2020",format = "%d/%m/%Y")

dat$month = format(ymd(dat$Inspection_date),'%b-%Y')

dat$score_group = dplyr::case_when(
  dat$score <= 50 ~ "low",
  dat$score <= 100 ~ "med",  # the question's middle band is ">50 to 100", so 100 is included
  TRUE ~ "high"
)

dat %>% dplyr::filter(Inspection_date >= param_date) %>%
  dplyr::group_by(current_profile, month, score_group, Tag_log,prod_log) %>% 
  dplyr::summarise(count = dplyr::n()) %>% 
  tidyr::pivot_wider(names_from = c("score_group","Tag_log"),
                     values_from = count,
                     values_fill  = list(count = 0)) -> res_dat


knitr::kable(res_dat,format = "markdown")
|current_profile |month    |prod_log | low_C1| high_C1| low_C2| low_C3| med_NA| high_C2| low_NA|
|:---------------|:--------|:--------|------:|-------:|------:|------:|------:|-------:|------:|
|Large           |Jan-2020 |WIP      |      1|       0|      0|      0|      0|       0|      0|
|Medium          |Jan-2020 |Shipped  |      0|       1|      0|      0|      1|       0|      0|
|Medium          |Jan-2020 |Hold     |      0|       0|      1|      1|      0|       0|      0|
|Small           |Jan-2020 |Hold     |      0|       1|      1|      0|      0|       0|      1|
|Small           |Jan-2020 |Shipped  |      0|       0|      0|      0|      0|       1|      0|
|Small           |Jan-2020 |WIP      |      0|       0|      0|      1|      0|       0|      0|