Code
# double and integer
0.0 == 0L
#> [1] TRUE
# double and logical
0 == FALSE
#> [1] TRUE
# double and character
0 == "0"
#> [1] TRUE
# date and character
as.Date("2024-05-08") == "2024-05-08"
#> [1] TRUE
May 20, 2024
This interesting behavior was related to {fuj} v0.2.0 lst()
behavior. An issue was created to track the fix: jmbarbone/fuj#60.
Short answer: Yes. Feel free to read through the journey of discovery.
The original post showed the behavior with substitute()
, which made this feel a little more confusing that it really is.
When trying to compare two values, R makes an attempt to find a common data type. This shouldn’t be new:
In the above examples, all the values are coerced to the same type before the comparison is made. These all resolve in the values being equal. However, there is a special behavior invoked when we compare lists.
Here’s the wat simplified:
Let’s rewrite this as direct character conversions:
There it is: list(NA)
turns into "NA"
. wat? Well, these two help files provide the relevant information on what is happening here:
Language objects such as symbols and calls are deparsed to character strings before comparison.
?base::Comparison
, Details
For lists and pairlists (including language objects such as calls) it deparses the elements individually, except that it extracts the first element of length-one character vectors.
?base::character
, Value
The extraction of length 1 character vectors is the described behavior that I was missing. When calling deparse()
on a non-list vector, we get a different result for NA
and NA_character_
: So, here’s what happens:
We can replicate this with a custom function:
my_convert <- function(x) {
do <- function(i) {
if (is.character(i) && length(i) == 1L) {
# extracts the first element of length-one character vectors
return(i[1L])
}
# deparses elements individually
deparse1(i, control = NULL)
}
# determine routine for each element
vapply(x, do, "", USE.NAMES = FALSE)
}
x <- list(1:3, NA, NA_real_, NA_character_)
waldo::compare(as.character(x), sapply(x, deparse, control = NULL))
#> `old`: "1:3" "NA" "NA" NA
#> `new`: "1:3" "NA" "NA" "NA"
waldo::compare(as.character(x), my_convert(x))
#> ✔ No differences
There is a default control
of "keepNA"
, which retains additional information about our NA
values. The list
to character
conversion doesn’t seem to use this as all the values are resolved to "NA"
rather than "NA_character_"
, or the like.
---
title: "wat? Character NAs"
subtitle: "But it is documented"
date: 2024-05-20
categories: ["R", "{fuj}", "wat"]
draft: false
---
::: {.callout-note}
This interesting behavior was related to [`{fuj} v0.2.0 lst()`](https://github.com/jmbarbone/fuj/blob/v0.2.0/R/list.R) behavior.
An issue was created to track the fix: [jmbarbone/fuj#60](https://github.com/jmbarbone/fuj/issues/60).
:::
<iframe src="https://fosstodon.org/@barbone/112401680792851241/embed" class="mastodon-embed" style="max-width: 100%; border: 0" width="800" allowfullscreen="allowfullscreen"></iframe><script src="https://fosstodon.org/embed.js" async="async"></script>
Short answer: Yes.
Feel free to read through the journey of discovery.
The original post showed the behavior with [`substitute()`](https://rdrr.io/r/base/substitute.html), which made this feel a little more confusing that it really is.
When trying to compare two values, **R** makes an attempt to find a common data type.
This shouldn't be new:
```{r}
# double and integer
0.0 == 0L
# double and logical
0 == FALSE
# double and character
0 == "0"
# date and character
as.Date("2024-05-08") == "2024-05-08"
```
In the above examples, all the values are coerced to the same type before the comparison is made.
These all resolve in the values being _equal_.
However, there is a special behavior invoked when we compare lists.
Here's the _wat_ simplified:
```{r}
# NA resolves to NA
NA == ""
NA_character_ == ""
# but not in a list
list(NA) == ""
# unless it's a character
list(NA_character_) == ""
```
Let's rewrite this as direct character conversions:
```{r}
as.character(NA)
as.character(NA_character_)
as.character(list(NA))
as.character(list(NA_character_))
```
There it is: `list(NA)` turns into `"NA"`.
_wat_?
Well, these two help files provide the relevant information on what is happening here:
> Language objects such as symbols and calls are deparsed to character strings before comparison.
[`?base::Comparison`, Details](https://rdrr.io/r/base/Comparison.html)
> For lists and pairlists (including language objects such as calls) it deparses the elements individually, except that it extracts the first element of length-one character vectors.
[`?base::character`, Value](https://rdrr.io/r/base/character.html)
The extraction of length 1 character vectors is the described behavior that I was missing.
When calling [`deparse()`](https://rdrr.io/r/base/deparse.html) on a non-list vector, we get a different result for `NA` and `NA_character_`:
So, here's what happens:
1. **R** detects that one comparison is a list
2. **R** converts the list vector to a character vector
3. For each element in the list, **R** converts checks if it is a single length character vector or not
4. If it is a single length character vector, **R** simply returns the (first element of the) value
5. If it is not a single length character vector, **R** _deparses_ the value
We can replicate this with a custom function:
```{r}
my_convert <- function(x) {
do <- function(i) {
if (is.character(i) && length(i) == 1L) {
# extracts the first element of length-one character vectors
return(i[1L])
}
# deparses elements individually
deparse1(i, control = NULL)
}
# determine routine for each element
vapply(x, do, "", USE.NAMES = FALSE)
}
x <- list(1:3, NA, NA_real_, NA_character_)
waldo::compare(as.character(x), sapply(x, deparse, control = NULL))
waldo::compare(as.character(x), my_convert(x))
```
::: {.callout-tip}
There is a default `control` of `"keepNA"`, which retains additional information about our `NA` values.
The `list` to `character` conversion doesn't seem to use this as all the values are resolved to `"NA"` rather than `"NA_character_"`, or the like.
```{r}
sapply(x, deparse, control = "keepNA")
```
:::