
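I have a data frame in which each row represents a single event that occurred in a city. The data frame holds the city name and the date of the event, like this: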

df <- data.frame(city = c("Seattle", "Seattle", "Seattle", "Seattle", "Seattle", "NYC", "NYC", "NYC", "Chicago",
                         "Chicago", "Chicago", "Chicago", "Chicago"),
                     date_of_event = c("01/13/2011", "01/17/2011", "03/15/2011", "05/21/2011", "05/23/2011",
                                      "01/20/2011", "01/22/2011", "03/23/2011", "01/18/2011", "02/24/2011",
                                       "02/26/2011", "04/30/2011", "06/18/2011"),
                     stringsAsFactors = FALSE)
df$date_of_event <- as.Date(df$date_of_event, "%m/%d/%Y")

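The above is just an example; my real data lives in a CSV with thousands of rows, many cities, many dates, and so on. What I want is a new data frame with one row for every city and every month/year represented in the data set, plus a count column showing how many events occurred in each city in each month in the original data frame. The second data frame would look like this: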

df2 <- data.frame(city = c("Seattle", "Seattle", "Seattle", "Seattle", "Seattle", "Seattle", "NYC", "NYC", "NYC", "NYC",
                           "NYC", "NYC", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago"),
                     month_year = c("01/01/2011", "02/01/2011", "03/01/2011", "04/01/2011", "05/01/2011", "06/01/2011",
                                    "01/01/2011", "02/01/2011", "03/01/2011", "04/01/2011", "05/01/2011", "06/01/2011",
                                    "01/01/2011", "02/01/2011", "03/01/2011", "04/01/2011", "05/01/2011", "06/01/2011"),
                  count = c(2, 0, 1, 0, 2, 0, 2, 0, 1, 0, 0, 0, 1, 2, 0, 1, 0, 1),
                     stringsAsFactors = FALSE)
df2$month_year <- as.Date(df2$month_year, "%m/%d/%Y")

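I know you can use count from dplyr, and that dates can be rounded down to the first day of each month, but I have tried and failed to group and count correctly to produce the second data frame I want. Can someone help me? Thanks very much in advance.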

Latest replies
  • 6 months ago
    1 #

    You can try the following approach:

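    library(tidyverse)
    library(lubridate)
    # round each event date down to the first day of its month
    df3 <- df %>% mutate(new_date = floor_date(date_of_event, "month"))
    # cross-tabulate city by month (dropping the original date column); absent combinations get Freq 0
    tt <- as.data.frame(table(df3[-2]))
    # list cities in reverse alphabetical order, months ascending
    tt[order(desc(tt$city), tt$new_date),]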
          city   new_date Freq
       Seattle 2011-01-01    2
       Seattle 2011-02-01    0
       Seattle 2011-03-01    1
       Seattle 2011-04-01    0
       Seattle 2011-05-01    2
       Seattle 2011-06-01    0
           NYC 2011-01-01    2
           NYC 2011-02-01    0
           NYC 2011-03-01    1
           NYC 2011-04-01    0
           NYC 2011-05-01    0
           NYC 2011-06-01    0
       Chicago 2011-01-01    1
       Chicago 2011-02-01    2
       Chicago 2011-03-01    0
       Chicago 2011-04-01    1
       Chicago 2011-05-01    0
       Chicago 2011-06-01    1
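
    Since the question mentions dplyr's count, here is a minimal alternative sketch, not the method used above. It assumes tidyr is available and uses its complete() to add the missing city/month combinations with a zero count. The names counts, month_year and count are chosen for illustration. Like the table() approach, it only covers months that occur somewhere in the data.

    ```r
    library(dplyr)
    library(tidyr)
    library(lubridate)

    # Sample data from the question, rebuilt so the block runs on its own
    df <- data.frame(
      city = c(rep("Seattle", 5), rep("NYC", 3), rep("Chicago", 5)),
      date_of_event = as.Date(
        c("01/13/2011", "01/17/2011", "03/15/2011", "05/21/2011", "05/23/2011",
          "01/20/2011", "01/22/2011", "03/23/2011", "01/18/2011", "02/24/2011",
          "02/26/2011", "04/30/2011", "06/18/2011"),
        "%m/%d/%Y"),
      stringsAsFactors = FALSE
    )

    counts <- df %>%
      # round each date down to the first day of its month
      mutate(month_year = floor_date(date_of_event, "month")) %>%
      # one row per city/month that actually has events
      count(city, month_year, name = "count") %>%
      # add every city/month combination seen in the data, filling gaps with 0
      complete(city, month_year, fill = list(count = 0L))

    print(counts, n = Inf)
    ```

    Here month_year stays a Date column rather than becoming a factor, which may be easier to sort or plot later.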
    

    To include an extended period with zero counts, you can try the following:

    # assign a name to the output obtained previously
    df4 <- tt[order(desc(tt$city), tt$new_date),]
    a <- mdy("01/01/11") # start of the period (2011-01-01)
    b <- a + months(0:92)  # sequence of 93 month starts
    # every city/month combination over the extended period
    df5 <- expand.grid(city = c("Chicago", "Seattle", "NYC"), new_date = as.factor(b))
    # keep only the combinations not already counted in df4
    df6 <- setdiff(df5, df4[-3])
    df6$Freq <- 0 # assign zero count
    df7 <- rbind(df4, df6)
    df8 <- df7[order(df7$city, df7$new_date), ]
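
    The same extended range can probably be produced in one pipeline. This is a hedged sketch rather than the approach above: it assumes tidyr's complete() and passes it an explicit vector of month starts. The names all_months and extended are illustrative, and the 93-month span simply mirrors months(0:92).

    ```r
    library(dplyr)
    library(tidyr)
    library(lubridate)

    # Sample data from the question, rebuilt so the block runs on its own
    df <- data.frame(
      city = c(rep("Seattle", 5), rep("NYC", 3), rep("Chicago", 5)),
      date_of_event = as.Date(
        c("01/13/2011", "01/17/2011", "03/15/2011", "05/21/2011", "05/23/2011",
          "01/20/2011", "01/22/2011", "03/23/2011", "01/18/2011", "02/24/2011",
          "02/26/2011", "04/30/2011", "06/18/2011"),
        "%m/%d/%Y"),
      stringsAsFactors = FALSE
    )

    # 93 consecutive month starts beginning at 2011-01-01 (assumed range)
    all_months <- seq(as.Date("2011-01-01"), by = "month", length.out = 93)

    extended <- df %>%
      mutate(new_date = floor_date(date_of_event, "month")) %>%
      count(city, new_date, name = "Freq") %>%
      # expand to every city for every month in all_months, zero where no events
      complete(city, new_date = all_months, fill = list(Freq = 0L))
    ```

    With real data, all_months could instead be built from the minimum and maximum of the rounded dates, so the range follows the data.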
    
    
