But it is documented

R
{fuj}
Published

May 20, 2024

Note

This interesting behavior was related to {fuj} v0.2.0 lst() behavior. An issue was created to track the fix: jmbarbone/fuj#60.

Short answer: Yes. Feel free to read through the journey of discovery.

The original post showed the behavior with substitute(), which made this feel a little more confusing that it really is.

When trying to compare two values, R makes an attempt to find a common data type. This shouldn’t be new:

Code
# double and integer
0.0 == 0L
#> [1] TRUE

# double and logical
0 == FALSE
#> [1] TRUE

# double and character
0 == "0"
#> [1] TRUE

# date and character
as.Date("2024-05-08") == "2024-05-08"
#> [1] TRUE

In the above examples, all the values are coerced to the same type before the comparison is made. These all resolve in the values being equal. However, there is a special behavior invoked when we compare lists.

Here’s the wat simplified:

Code
# NA resolves to NA
NA == ""
#> [1] NA
NA_character_ == ""
#> [1] NA

# but not in a list
list(NA) == ""
#> [1] FALSE

# unless it's a character
list(NA_character_) == ""
#> [1] NA

Let’s rewrite this as direct character conversions:

Code
as.character(NA)
#> [1] NA
as.character(NA_character_)
#> [1] NA
as.character(list(NA))
#> [1] "NA"
as.character(list(NA_character_))
#> [1] NA

There it is: list(NA) turns into "NA". wat? Well, these two help files provide the relevant information on what is happening here:

Language objects such as symbols and calls are deparsed to character strings before comparison.
?base::Comparison, Details

For lists and pairlists (including language objects such as calls) it deparses the elements individually, except that it extracts the first element of length-one character vectors.
?base::character, Value

The extraction of length 1 character vectors is the described behavior that I was missing. When calling deparse() on a non-list vector, we get a different result for NA and NA_character_: So, here’s what happens:

  1. R detects that one comparison is a list
  2. R converts the list vector to a character vector
  3. For each element in the list, R converts checks if it is a single length character vector or not
  4. If it is a single length character vector, R simply returns the (first element of the) value
  5. If it is not a single length character vector, R deparses the value

We can replicate this with a custom function:

Code
my_convert <- function(x) {
  do <- function(i) {
    if (is.character(i) && length(i) == 1L) {
      # extracts the first element of length-one character vectors
      return(i[1L])
    }
    
    # deparses elements individually
    deparse1(i, control = NULL)
  }
  
  # determine routine for each element
  vapply(x, do, "", USE.NAMES = FALSE)
}

x <- list(1:3, NA, NA_real_, NA_character_)
waldo::compare(as.character(x), sapply(x, deparse, control = NULL))
#> `old`: "1:3" "NA" "NA" NA  
#> `new`: "1:3" "NA" "NA" "NA"
waldo::compare(as.character(x), my_convert(x))
#> ✔ No differences
Tip

There is a default control of "keepNA", which retains additional information about our NA values. The list to character conversion doesn’t seem to use this as all the values are resolved to "NA" rather than "NA_character_", or the like.

Code
sapply(x, deparse, control = "keepNA")
#> [1] "1:3"           "NA"            "NA_real_"      "NA_character_"