library(scribe)
ca <- command_args(string = "-a 1 -b 0")
ca$add_argument("-a", default = 0L)
ca$add_argument("-b", default = 0L)
args <- ca$parse()
args$a + args$b
#> [1] 1
March 13, 2023
I’m excited to be finalizing release preparations of {scribe}. This package supports writing your own Rscript files and executing them through a terminal.
We’ll start with a simple example. For most of these, I’ll be using the direct R interface. However, this package is best used with a shebang script 1.
1 I’m pretty sure this is pronounced like sha-bang because it’s a hash (#) and bang (!). But I used to think she-bang, which conjures The Stone Roses’ She Bangs the Drums, a pleasant ear-worm. I think it also works to just shout octothorpe!
Keep in mind that command_args() doesn’t need an explicit input, and when used with Rscript will automatically capture command line arguments.
That’s a little easy, so maybe we can make something a bit more interesting.
First we’ll make ourselves a little modeling function. This is not meant for completeness, but simply provides a few examples for creativity.
my_model <- function(
data = c("penguins", "mtcars", "sat.act"),
y,
x = NA,
family = "gaussian",
correlation = FALSE
) {
data <- match.arg(data)
data <- switch(
data,
penguins = palmerpenguins::penguins,
mtcars = datasets::mtcars,
sat.act = transform(
psych::sat.act,
gender = as.integer(gender == 1)
)
)
if (isTRUE(is.na(x))) {
x <- setdiff(colnames(data), y)
}
data <- data[, c(y, x)]
form <- stats::DF2formula(data)
mod <- stats::glm(form, data = data, family = family)
summary(mod, correlation = correlation)
}
Now that we have that, we can set up the command args to parse what our string inputs are.
# we'll pass arguments after
ca <- command_args()
ca$add_description("run a quick model")
ca$add_argument(
"data",
default = "penguins",
info = "a dataset to view"
)
ca$add_argument("y", info = "value to predict")
ca$add_argument("x", default = NA, info = "variables")
ca$add_argument(
"--family",
default = "gaussian",
info = "error distribution, link function"
)
ca$add_argument(
"--correlation",
action = "flag",
info = "when set, returns the correlation matrix"
)
ca$add_example("my-model.R penguins body_mass_g")
ca$add_example("my-model.R mtcars mpg --correlation")
There’s a default help arg added to the scribeCommandArgs object. When --help is found in the command line arguments, the script will try to exit, returning only the help information.
options(scribe.interactive = TRUE)
ca$set_input("--help")
ca$parse()
#> {scribe} command_args
#>
#> file : /home/jordan/github/quarto-cli/src/resources/rmd/rmd.R
#>
#> DESCRIPTION
#> run a quick model
#>
#> USAGE
#> rmd.R [--help | --version]
#> rmd.R [data [ARG]] [y [ARG]] [x [ARG]] [--family [ARG]] [--correlation, --no-correlation]
#>
#> ARGUMENTS
#> --help : prints this and quietly exits
#> --version : prints the version of {scribe} and quietly exits
#> data [ARG] : a dataset to view
#> y [ARG] : value to predict
#> x [ARG] : variables
#> --family [ARG] : error distribution, link function
#> --correlation, --no-correlation : when set, returns the correlation matrix
#>
#> EXAMPLES
#> $ my-model.R penguins body_mass_g
#> $ my-model.R mtcars mpg --correlation
Let’s simulate a few examples:
ca$set_input(c("penguins", "body_mass_g"))
do.call(my_model, ca$parse())
#>
#> Call:
#> stats::glm(formula = form, family = family, data = data)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -809.70 -180.87 -6.25 176.76 864.22
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 84087.945 41912.019 2.006 0.04566 *
#> speciesChinstrap -282.539 88.790 -3.182 0.00160 **
#> speciesGentoo 890.958 144.563 6.163 2.12e-09 ***
#> islandDream -21.180 58.390 -0.363 0.71704
#> islandTorgersen -58.777 60.852 -0.966 0.33482
#> bill_length_mm 18.964 7.112 2.667 0.00805 **
#> bill_depth_mm 60.798 20.002 3.040 0.00256 **
#> flipper_length_mm 18.504 3.128 5.915 8.46e-09 ***
#> sexmale 378.977 48.074 7.883 4.95e-14 ***
#> year -42.785 20.949 -2.042 0.04194 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 82096.03)
#>
#> Null deviance: 215259666 on 332 degrees of freedom
#> Residual deviance: 26517018 on 323 degrees of freedom
#> (11 observations deleted due to missingness)
#> AIC: 4725
#>
#> Number of Fisher Scoring iterations: 2
ca$set_input(c("mtcars", "mpg", "--correlation"))
do.call(my_model, ca$parse())
#>
#> Call:
#> stats::glm(formula = form, family = family, data = data)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -3.4506 -1.6044 -0.1196 1.2193 4.6271
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 12.30337 18.71788 0.657 0.5181
#> cyl -0.11144 1.04502 -0.107 0.9161
#> disp 0.01334 0.01786 0.747 0.4635
#> hp -0.02148 0.02177 -0.987 0.3350
#> drat 0.78711 1.63537 0.481 0.6353
#> wt -3.71530 1.89441 -1.961 0.0633 .
#> qsec 0.82104 0.73084 1.123 0.2739
#> vs 0.31776 2.10451 0.151 0.8814
#> am 2.52023 2.05665 1.225 0.2340
#> gear 0.65541 1.49326 0.439 0.6652
#> carb -0.19942 0.82875 -0.241 0.8122
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 7.023544)
#>
#> Null deviance: 1126.05 on 31 degrees of freedom
#> Residual deviance: 147.49 on 21 degrees of freedom
#> AIC: 163.71
#>
#> Number of Fisher Scoring iterations: 2
#>
#> Correlation of Coefficients:
#> (Intercept) cyl disp hp drat wt qsec vs am gear
#> cyl -0.67
#> disp -0.02 -0.27
#> hp -0.07 -0.18 -0.52
#> drat -0.42 0.28 -0.12 0.09
#> wt 0.09 0.11 -0.77 0.24 0.17
#> qsec -0.77 0.27 0.29 0.11 0.04 -0.51
#> vs 0.09 0.32 0.10 -0.27 -0.03 0.08 -0.37
#> am -0.23 0.26 0.03 -0.05 -0.16 0.09 0.27 0.21
#> gear -0.41 0.35 -0.08 -0.09 -0.07 0.18 0.08 -0.04 -0.31
#> carb 0.12 -0.23 0.67 -0.53 -0.21 -0.70 0.27 0.09 0.06 -0.42
ca$set_input(c("sat.act", "gender", "--family", "binomial", "--correlation"))
do.call(my_model, ca$parse())
#>
#> Call:
#> stats::glm(formula = form, family = family, data = data)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.6500 -0.9385 -0.7658 1.2356 2.0129
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -1.804944 0.587445 -3.073 0.00212 **
#> education -0.220411 0.069023 -3.193 0.00141 **
#> age 0.024923 0.010339 2.411 0.01593 *
#> ACT -0.019895 0.022941 -0.867 0.38582
#> SATV -0.002496 0.001026 -2.434 0.01493 *
#> SATQ 0.005462 0.001069 5.110 3.22e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 895.09 on 686 degrees of freedom
#> Residual deviance: 854.46 on 681 degrees of freedom
#> (13 observations deleted due to missingness)
#> AIC: 866.46
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Correlation of Coefficients:
#> (Intercept) education age ACT SATV
#> education -0.01
#> age -0.28 -0.55
#> ACT -0.28 -0.09 -0.11
#> SATV -0.25 0.01 0.07 -0.30
#> SATQ -0.24 -0.01 0.06 -0.38 -0.46
If I needed this, maybe it would make sense to be able to read the data from a file path, then execute something like:
my-model.R data/example.csv response
For a more realistic example, I’ll use a trimmed down version of a {pak} CLI utility I’ve been using a lot. I really like using Python’s pip and wanted to have something similar for R. {pak} is fantastic and highly recommended.
So, to make our own little command line utility, we only need a few small pieces:
#!/usr/bin/env -S Rscript --vanilla
library(scribe)
ca <- command_args()
ca$add_argument("pkg", action = "dots", default = "local::.")
ca$add_argument("-d", "--dependencies", action = "list", default = TRUE)
args <- ca$parse()
do.call(pak::pak, args)
Now, I can install packages nicely in a terminal:
pak github::jmbarbone/mark -d
pak dplyr dbplyr dtplyr
---
title: "`{scribe}` release"
subtitle: "A package to support `Rscript` files"
date: "2023-03-13"
categories: ["R", "{scribe}"]
---
I'm excited to be finalizing release preparations of [`{scribe}`](https://jmbarbone.github.io/scribe/).
This package supports writing your own **Rscript** files and executing them through a terminal.
We'll start with a simple example.
For most of these, I'll be using the direct **R** interface.
However, this package is best used with a [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) script ^[I'm pretty sure this is pronounced like _sha-bang_ because it's a hash (`#`) and bang (`!`). But I used to think _she-bang_, which conjures [The Stone Roses' _She Bangs the Drums_](https://www.youtube.com/watch?v=wD6Pq0bSMPo), a pleasant ear-worm. I think it also works to just shout [_octothorpe_](https://en.wikipedia.org/wiki/Number_sign#Octothorp)!].
Keep in mind that `command_args()` doesn't need an explicit input, and when used with **Rscript** will automatically capture command line arguments.
```{r}
library(scribe)
ca <- command_args(string = "-a 1 -b 0")
ca$add_argument("-a", default = 0L)
ca$add_argument("-b", default = 0L)
args <- ca$parse()
args$a + args$b
```
```{r}
ca$set_input(c("-a 10 -b 10"))
args <- ca$parse()
args$a + args$b
```
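As a rough sketch of the shebang form mentioned above, the same addition could live in a script file. The file name `add.R` and the `total` variable are made up for illustration; the shebang line is the one used for the `pak` script later in this post, and the only `{scribe}` calls are the ones already shown.

```r
#!/usr/bin/env -S Rscript --vanilla
# add.R (hypothetical file name): add two integers passed on the command line

library(scribe)

# no explicit input: under Rscript, command_args() captures the
# command line arguments on its own
ca <- command_args()
ca$add_argument("-a", default = 0L)
ca$add_argument("-b", default = 0L)
args <- ca$parse()

# with no arguments supplied, both defaults (0L) are used
total <- args$a + args$b
cat(total, "\n")
```

Assuming the file has been made executable and is on your path, calling `add.R -a 1 -b 2` should print `3`.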
That's a little easy, so maybe we can make something a bit more interesting.
First we'll make ourselves a little modeling function.
This is not meant for completeness, but simply provides a few examples for creativity.
```{r}
my_model <- function(
data = c("penguins", "mtcars", "sat.act"),
y,
x = NA,
family = "gaussian",
correlation = FALSE
) {
data <- match.arg(data)
data <- switch(
data,
penguins = palmerpenguins::penguins,
mtcars = datasets::mtcars,
sat.act = transform(
psych::sat.act,
gender = as.integer(gender == 1)
)
)
if (isTRUE(is.na(x))) {
x <- setdiff(colnames(data), y)
}
data <- data[, c(y, x)]
form <- stats::DF2formula(data)
mod <- stats::glm(form, data = data, family = family)
summary(mod, correlation = correlation)
}
```
Now that we have that, we can set up the command args to parse what our string inputs are.
```{r}
# we'll pass arguments after
ca <- command_args()
ca$add_description("run a quick model")
ca$add_argument(
"data",
default = "penguins",
info = "a dataset to view"
)
ca$add_argument("y", info = "value to predict")
ca$add_argument("x", default = NA, info = "variables")
ca$add_argument(
"--family",
default = "gaussian",
info = "error distribution, link function"
)
ca$add_argument(
"--correlation",
action = "flag",
info = "when set, returns the correlation matrix"
)
ca$add_example("my-model.R penguins body_mass_g")
ca$add_example("my-model.R mtcars mpg --correlation")
```
There's a default _help_ arg added to the `scribeCommandArgs` object.
When `--help` is found in the command line arguments, the script will try to exit, returning only the help information.
```{r}
options(scribe.interactive = TRUE)
ca$set_input("--help")
ca$parse()
```
Let's simulate a few examples:
```bash
my-model.R penguins body_mass_g
```
```{r}
ca$set_input(c("penguins", "body_mass_g"))
do.call(my_model, ca$parse())
```
```bash
my-model.R mtcars mpg --correlation
```
```{r}
ca$set_input(c("mtcars", "mpg", "--correlation"))
do.call(my_model, ca$parse())
```
```bash
my-model.R sat.act gender --family binomial --correlation
```
```{r}
ca$set_input(c("sat.act", "gender", "--family", "binomial", "--correlation"))
do.call(my_model, ca$parse())
```
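The `do.call()` pattern above works because the parsed result is a named list whose names line up with the arguments of `my_model()`. Here is a small base R sketch of that matching. The `parsed` list is a hand-written stand-in for what `ca$parse()` is assumed to return, and `describe_call()` is a hypothetical function that only shares argument names with `my_model()`.

```r
# Hand-written stand-in for the parsed arguments of:
#   my-model.R mtcars mpg --correlation
parsed <- list(
  data = "mtcars",
  y = "mpg",
  x = NA,
  family = "gaussian",
  correlation = TRUE
)

# Hypothetical function with the same argument names as my_model()
describe_call <- function(
    data,
    y,
    x = NA,
    family = "gaussian",
    correlation = FALSE
) {
  sprintf(
    "data=%s y=%s family=%s correlation=%s",
    data, y, family, correlation
  )
}

# do.call() matches list names to argument names, so even a reversed list
# is passed correctly
msg <- do.call(describe_call, rev(parsed))
msg
```

Since the matching is by name, renaming an argument in the function without renaming it in `add_argument()` would break the call, so the two are worth keeping in sync.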
If I needed this, maybe it would make sense to be able to read the data from a file path, then execute something like:
```sh
my-model.R data/example.csv response
```
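One way that might look, as a minimal sketch rather than anything `my_model()` currently does: a hypothetical `load_data()` helper that treats its input as a CSV path when such a file exists and otherwise falls back to the built-in dataset names. Reading with `utils::read.csv()` is my assumption; other formats would need their own readers.

```r
# Hypothetical helper: resolve the `data` argument to a data frame
load_data <- function(data) {
  # an existing file (not a directory) is read as CSV
  if (file.exists(data) && !dir.exists(data)) {
    return(utils::read.csv(data))
  }

  # otherwise fall back to the built-in names; only the matching branch
  # is evaluated, so unused packages are never loaded
  switch(
    data,
    penguins = palmerpenguins::penguins,
    mtcars = datasets::mtcars,
    sat.act = psych::sat.act,
    stop("unknown dataset or file: ", data, call. = FALSE)
  )
}

# usage: write a temporary CSV, then load it back by path
path <- tempfile(fileext = ".csv")
utils::write.csv(datasets::mtcars, path, row.names = FALSE)
dat <- load_data(path)
dim(dat)
```

Inside `my_model()`, a helper like this would replace the `match.arg()` and `switch()` steps, since `match.arg()` would reject anything that is not one of the three built-in names.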
For a more realistic example, I'll use a trimmed down version of a [`{pak}`](https://pak.r-lib.org/) [CLI utility](https://github.com/jmbarbone/jmb/blob/main/bin/pak) I've been using a lot.
I really like using Python's [`pip`](https://pypi.org/project/pip/) and wanted to have something similar for **R**.
[`{pak}`](https://pak.r-lib.org/) is fantastic and highly recommended.
So, to make our own little command line utility, we only need a few small pieces:
```r
#!/usr/bin/env -S Rscript --vanilla
library(scribe)
ca <- command_args()
ca$add_argument("pkg", action = "dots", default = "local::.")
ca$add_argument("-d", "--dependencies", action = "list", default = TRUE)
args <- ca$parse()
do.call(pak::pak, args)
```
Now, I can install packages nicely in a terminal:
```bash
pak github::jmbarbone/mark -d
pak dplyr dbplyr dtplyr
```