
Read a bib file into a data.frame

Usage

read_bib(file, skip = 0L, max_lines = NULL, encoding = "UTF-8")

Arguments

file

File or connection

skip

The number of lines to skip

max_lines

The maximum number of lines to read

encoding

Assumed encoding of file (passed to readLines())
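
As a rough illustration of how these arguments could interact, the sketch below reads a window of lines with base R only. The helper name read_lines_sketch() is hypothetical, and the assumption that max_lines counts lines after the skipped ones is ours; the documentation above only confirms that encoding is passed to readLines().

```r
# Hypothetical helper, not part of the package.
# Assumption: `max_lines` counts the lines kept after `skip` lines are dropped.
read_lines_sketch <- function(file, skip = 0L, max_lines = NULL, encoding = "UTF-8") {
  # readLines() reads everything when n is negative
  n <- if (is.null(max_lines)) -1L else skip + max_lines
  x <- readLines(file, n = n, encoding = encoding, warn = FALSE)
  # Drop the leading lines that were asked to be skipped
  if (skip > 0L) {
    x <- x[-seq_len(skip)]
  }
  x
}

# Made-up input written to a temporary file
tmp <- tempfile(fileext = ".bib")
writeLines(c("% a comment line", "@misc{a,", "  year = {2000}", "}"), tmp)
window <- read_lines_sketch(tmp, skip = 1L, max_lines = 2L)
```

Here window holds the second and third lines of the temporary file; the real function may treat the two arguments differently.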

Value

A data.frame with each row as a bib entry and each column as a field

Details

Inspired by, and partially credited to, bib2df::bib2df(), although this function has no dependencies outside of base functions and is much quicker: about four times faster in the benchmark below (median 1.83ms versus 7.35ms). The speed seems to come from removing stringr functions and simplifying a few *apply calls. It also includes as many categories (fields) as possible from each entry.
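
To give a feel for what a base-only approach can look like, here is a minimal sketch that turns BibTeX lines into a data.frame using grep(), sub(), and lapply() rather than stringr. It is not the package's implementation: parse_bib_sketch() is a hypothetical name, the sample entries are made up, and the sketch assumes each field sits on a single line.

```r
# Hypothetical helper, not part of the package.
# Assumption: every "name = value" field fits on one line.
parse_bib_sketch <- function(lines) {
  lines <- trimws(lines)
  starts <- grep("^@", lines)                  # each entry begins with "@type{key,"
  ends <- c(starts[-1L] - 1L, length(lines))   # and runs until the next entry
  entries <- Map(function(s, e) lines[s:e], starts, ends)

  rows <- lapply(entries, function(x) {
    # hdr is c(full match, entry type, citation key)
    hdr <- regmatches(x[1L], regexec("^@([^{]+)\\{([^,]*),?", x[1L]))[[1L]]
    body <- grep("=", x[-1L], value = TRUE, fixed = TRUE)
    nm <- tolower(trimws(sub("=.*$", "", body)))
    val <- trimws(sub("^[^=]*=", "", body))
    val <- gsub("^[{\"]+|[}\",]+$", "", val)   # drop wrapping braces, quotes, commas
    c(key = hdr[3L], field = tolower(hdr[2L]), stats::setNames(val, nm))
  })

  # Use the union of all field names; fields missing from an entry become NA
  cols <- unique(unlist(lapply(rows, names)))
  out <- lapply(rows, function(r) unname(r[cols]))
  out <- as.data.frame(do.call(rbind, out), stringsAsFactors = FALSE)
  names(out) <- cols
  out
}

# Made-up entries for illustration
bib <- c(
  "@article{smith2020,",
  "  author = {Smith, Jane},",
  "  year = {2020},",
  "  month = aug",
  "}",
  "@book{doe2010,",
  "  title = {A Title},",
  "  year = {2010}",
  "}"
)
sketch_df <- parse_bib_sketch(bib)
```

Each entry becomes one row, and the columns are the union of the field names seen across entries, which mirrors the "as many categories as possible" behaviour described above. A fuller parser would also need to handle multi-line values and nested braces.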

See also

Examples

file <- "https://raw.githubusercontent.com/jmbarbone/bib-references/master/references.bib"
bibdf <- read_bib(file, max_lines = 51L)

if (package_available("tibble")) {
  tibble::as_tibble(bibdf)
} else {
  head(bibdf)
}
#> # A tibble: 5 × 13
#>   key     field author journal title year  issn  month number pages volume doi  
#>   <chr>   <chr> <chr>  <chr>   <chr> <chr> <chr> <chr> <chr>  <chr> <chr>  <chr>
#> 1 ames20… arti… Ames,… Journa… The … 2006  0092… aug   4      440-… 40     10.1…
#> 2 anders… arti… Ander… Psycho… Effe… 2001  NA    NA    5      353-… 12     10.1…
#> 3 ayduk2… arti… Ayduk… Journa… Regu… 2000  NA    NA    5      776   79     10.1…
#> 4 baker2… arti… Baker… Nature… 1,50… 2016  NA    NA    7604   452   533    10.1…
#> 5 begley… arti… NA     NA      NA    NA    NA    NA    NA     NA    NA     NA   
#> # ℹ 1 more variable: publisher <chr>

if (package_available("bib2df") && package_available("bench")) {
  file <- system.file("extdata", "bib2df_testfile_3.bib", package = "bib2df")

  # Doesn't include the 'tidying' up
  foo <- function(file) {
    bib <- ("bib2df" %colons% "bib2df_read")(file)
    ("bib2df" %colons% "bib2df_gather")(bib)
  }

# \donttest{
  bench::mark(
    read_bib = read_bib(file),
    bib2df = bib2df::bib2df(file),
    foo = foo(file),
    check = FALSE
  )[1:9]
# }
}
#> Warning: `as_data_frame()` was deprecated in tibble 2.0.0.
#>  Please use `as_tibble()` (with slightly different semantics) to convert to a
#>   tibble, or `as.data.frame()` to convert to a data frame.
#>  The deprecated feature was likely used in the bib2df package.
#>   Please report the issue to the authors.
#> # A tibble: 3 × 9
#>   expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 read_bib     1.77ms 1.83ms      539.    9.41KB     6.60   245     3      454ms
#> 2 bib2df       7.14ms 7.35ms      133.    4.69MB     6.86    58     3      438ms
#> 3 foo          2.66ms 2.74ms      363.  130.72KB     6.65   164     3      451ms