Several related issues with fill() #1057

Closed
@moodymudskipper

fill() doesn't work as expected on a lazy table; see the reprex below.

library(tidyverse)
mydb <- DBI::dbConnect(RSQLite::SQLite(), "")
mydata <- tibble(id = 1:3, val = c(NA, "hello", NA))
DBI::dbWriteTable(mydb, "mydata", mydata)

# The message is a bit cryptic, it's not obvious that `arrange()` will do some
# magic since id is already sorted
# also, typo "determinstic"
dplyr::tbl(mydb, "mydata") %>% 
  tidyr::fill(val, .direction = "down")
#> Error in `tidyr::fill()`:
#> ✖ `.data` does not have explicit order.
#> ℹ Please use `arrange()` or `window_order()` to make determinstic.

# If I do use arrange(), I get a warning that doesn't make sense to an R user
dplyr::tbl(mydb, "mydata") %>% 
  arrange(id) %>% 
  tidyr::fill(val, .direction = "down")
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     1 <NA> 
#> 2     2 hello
#> 3     3 hello

# window_order() isn't found: it's a {dbplyr} function, which isn't obvious to a {dplyr} user who never uses {dbplyr} explicitly
dplyr::tbl(mydb, "mydata") %>% 
  window_order(id) %>% 
  tidyr::fill(val, .direction = "up")
#> Error in window_order(., id): could not find function "window_order"

# this works perfectly
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "down") 
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     1 <NA> 
#> 2     2 hello
#> 3     3 hello

# however if I fill "up" the order is reversed
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "up") 
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     3 <NA> 
#> 2     2 hello
#> 3     1 hello

# and updown and downup are not supported
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "updown") 
#> Error in `tidyr::fill()`:
#> ! `.direction` must be one of "down" or "up", not "updown".
#> ℹ Did you mean "down"?

dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "downup") 
#> Error in `tidyr::fill()`:
#> ! `.direction` must be one of "down" or "up", not "downup".
#> ℹ Did you mean "down"?

# though the same result is achievable by chaining two fill() calls
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "up") %>% 
  tidyr::fill(val, .direction = "down") 
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     1 hello
#> 2     2 hello
#> 3     3 hello

Created on 2022-11-30 with reprex v2.0.2
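
To see where the ordering comes from when filling "up", it can help to look at the SQL that gets generated. This is only a minimal sketch, assuming the same toy table as in the reprex (here in an in-memory SQLite database, with a hypothetical connection name `con`). The exact SQL depends on the dbplyr version, so no output is shown.

```r
library(dplyr)

# Same toy table as in the reprex, stored in an in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mydata", tibble(id = 1:3, val = c(NA, "hello", NA)))

filled_up <- tbl(con, "mydata") %>%
  dbplyr::window_order(id) %>%
  tidyr::fill(val, .direction = "up")

# Render the lazy query as SQL without executing it
sql_up <- dbplyr::sql_render(filled_up)
print(sql_up)

DBI::dbDisconnect(con)
```

Comparing this with the SQL rendered for .direction = "down" should show how the window ordering differs between the two directions.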

I think we need:

  • A better error message: drop the reference to arrange() and say something like:

Please call dbplyr::window_order() before fill() to set an explicit row order.

  • A fix for the row order after .direction = "up"
  • Support for "downup" and "updown"
  • A note in the tidyr docs, or a link to a help file for the tbl_lazy method, documenting the row order issue
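
Until something like this is in place, a workaround along the lines of the reprex seems to cover both the missing "updown" direction and the reversed row order. This is a sketch rather than a proposed implementation: it re-states window_order() before the second fill() as a precaution (the reprex suggests the order is carried over, but I'm not relying on it), and it adds a final arrange() so the collected rows come back sorted by id.

```r
library(dplyr)

# Same toy table as in the reprex, stored in an in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mydata", tibble(id = 1:3, val = c(NA, "hello", NA)))

result <- tbl(con, "mydata") %>%
  dbplyr::window_order(id) %>%
  tidyr::fill(val, .direction = "up") %>%
  dbplyr::window_order(id) %>%  # re-state the order in case it is not carried over
  tidyr::fill(val, .direction = "down") %>%
  arrange(id) %>%               # sort the final result explicitly
  collect()

DBI::dbDisconnect(con)

result
```

With this data every row of val should end up as "hello", matching the last output in the reprex, and the rows come back in id order because of the explicit arrange() at the end.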

Labels: feature (a feature request or enhancement), verb trans 🤖 (Translation of dplyr verbs to SQL)
