
One aim of this package is to help deal with Pandoc changes over time. Pandoc often introduces breaking changes between versions. However, tools like R Markdown aim to work with any Pandoc version and to hide those changes from the user by ensuring backward compatibility and adapting to new features.

Dealing with this can require comparing versions and knowing when a change occurred.
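As a small illustration of why version comparison needs care, the sketch below uses only base R (`numeric_version()`), not this package. It assumes nothing beyond two Pandoc release numbers, and shows that comparing version labels as plain strings gives a misleading order.

```r
# Compared as strings, the labels are ordered character by character,
# so "2.9.2" wrongly looks newer than "2.10"
as_strings <- "2.10" > "2.9.2"
as_strings
#> [1] FALSE

# Compared as versions, each component is compared numerically
as_versions <- numeric_version("2.10") > numeric_version("2.9.2")
as_versions
#> [1] TRUE
```

This is the reason the analysis further down converts version labels with `as.numeric_version()` before ordering them.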

Use Case 1: Finding which version introduced a change

Let’s take the example of a change in Pandoc regarding the gfm format: at some point, raw HTML <span></span> started being removed during conversion. When did this happen? Let’s take a look.

For that, we want to convert "This is a <span>Span</span>." to gfm format and look for the latest version that keeps the raw HTML <span> in its Markdown output.

As a prerequisite to the analysis, all available Pandoc versions need to be installed.

purrr::walk(pandoc_available_releases(), purrr::safely(pandoc_install))
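Because `purrr::safely()` swallows errors, a failed installation goes unnoticed with `walk()`. The sketch below shows one way to keep track of failures by using `map()` instead. `fake_install()` and the version "0.0.1" are hypothetical stand-ins so the example runs without downloading anything; in a real session the wrapped function would be `pandoc_install`.

```r
library(purrr)

# Hypothetical stand-in for pandoc_install(): fails for one made-up version
fake_install <- function(version) {
  if (version == "0.0.1") stop("no binary available for ", version)
  invisible(version)
}

versions <- c("2.10", "2.10.1", "0.0.1")

# safely() returns a list with `result` and `error` for each call
attempts <- map(set_names(versions), safely(fake_install))

# Names of the versions whose installation raised an error
failed <- names(keep(attempts, ~ !is.null(.x$error)))
failed
#> [1] "0.0.1"
```

Versions listed in `failed` can then be left out of the analysis or retried.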

Once this is done, it is easy to iterate over versions to test a hypothesis.

First, we use each Pandoc version to convert "This is a <span>Span</span>." to GitHub Flavored Markdown (gfm).

# Get the available versions
versions <- pandoc_installed_versions()
versions <- purrr::set_names(versions)

# Do conversion for each version
res <- purrr::map(
  versions,
  ~ pandoc_convert(text = "This is a <span>Span</span>.", to = "gfm", version = .x)
)

With the result, we can build a tibble and filter it to find the maximum version that preserves the raw <span>.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tab <- res %>%
  purrr::map_chr(as.character) %>%
  tibble::enframe("ver", "string")

tab %>%
  # Exclude the nightly build, which has no version number
  filter(ver != "nightly") %>%
  # Convert to numeric version for correct ordering
  mutate(num_ver = as.numeric_version(ver)) %>%
  # Keep the versions whose output still contains <span>
  filter(grepl("<span>", string, fixed = TRUE)) %>%
  # Order by version
  arrange(num_ver) %>%
  # Keep the highest one
  slice_tail(n = 1)
#> # A tibble: 1 × 3
#>   ver   string                       num_ver   
#>   <chr> <chr>                        <nmrc_vrs>
#> 1 2.10  This is a <span>Span</span>. 2.10
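The same lookup can be sketched in base R, without dplyr. `last_version_matching()` is a hypothetical helper, not part of this package, and `toy` is a small stand-in for `res` that reuses only the 2.10 and 2.10.1 outputs reported in this article.

```r
# Hypothetical helper: latest release whose output contains `pattern`
last_version_matching <- function(outputs, pattern) {
  # Drop labels that are not version numbers, such as "nightly"
  is_release <- grepl("^[0-9]+(\\.[0-9]+)*$", names(outputs))
  outputs <- outputs[is_release]
  # Keep the versions whose output contains the literal pattern
  keeps <- grepl(pattern, outputs, fixed = TRUE)
  if (!any(keeps)) return(NA_character_)
  # max() on numeric_version compares component by component
  as.character(max(numeric_version(names(outputs)[keeps])))
}

# Stand-in for `res`: a named character vector, one element per version
toy <- c(
  "2.10.1" = "This is a Span.",
  "2.10"   = "This is a <span>Span</span>."
)

last_kept <- last_version_matching(toy, "<span>")
last_kept
#> [1] "2.10"
```

With the real results, something like `last_version_matching(purrr::map_chr(res, as.character), "<span>")` should give the same answer as the pipeline above, assuming each element of `res` converts to a single string.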

Comparing 2.10 and 2.10.1 confirms that <span> is indeed no longer preserved in 2.10.1.

tab %>%
  filter(ver %in% c("2.10", "2.10.1"))
#> # A tibble: 2 × 2
#>   ver    string                      
#>   <chr>  <chr>                       
#> 1 2.10.1 This is a Span.             
#> 2 2.10   This is a <span>Span</span>.

From there, it is easier to know how to adapt, by first looking into the changes in Pandoc’s release notes.

pandoc_browse_release("2.10.1")
#>  Open URL
#> https://github.com/jgm/pandoc/releases/2.10.1

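Another question is which output formats support a given extension, such as smart, and which enable it by default. To answer this question, it is useful to iterate over all available formats for a specific version. Let’s take version “2.17” as an example.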

# Activating Pandoc version to use
pandoc_activate("2.17", rmarkdown = FALSE)
#>  Version '2.17' is now the active one.

library(dplyr)
library(tidyr)
format_extensions <- pandoc_list_formats("output") %>%
  filter(!formats %in% c("pdf")) %>%
  group_by(formats) %>%
  mutate(exts = list(pandoc_list_extensions(formats))) %>%
  ungroup()

has_smart <- format_extensions %>%
  hoist(exts, "extensions", "default") %>%
  select(-exts) %>%
  unnest_longer(c("extensions", "default")) %>%
  filter(extensions == "smart")

Which formats have it enabled by default?

has_smart %>%
  filter(default)
#> # A tibble: 7 × 4
#>   type   formats      extensions default
#>   <chr>  <chr>        <chr>      <lgl>  
#> 1 output beamer       smart      TRUE   
#> 2 output commonmark_x smart      TRUE   
#> 3 output context      smart      TRUE   
#> 4 output latex        smart      TRUE   
#> 5 output markdown     smart      TRUE   
#> 6 output opml         smart      TRUE   
#> 7 output textile      smart      TRUE

Which do not?

has_smart %>%
  filter(!default)
#> # A tibble: 17 × 4
#>    type   formats           extensions default
#>    <chr>  <chr>             <chr>      <lgl>  
#>  1 output commonmark        smart      FALSE  
#>  2 output epub              smart      FALSE  
#>  3 output epub2             smart      FALSE  
#>  4 output epub3             smart      FALSE  
#>  5 output gfm               smart      FALSE  
#>  6 output html              smart      FALSE  
#>  7 output html4             smart      FALSE  
#>  8 output html5             smart      FALSE  
#>  9 output ipynb             smart      FALSE  
#> 10 output markdown_github   smart      FALSE  
#> 11 output markdown_mmd      smart      FALSE  
#> 12 output markdown_phpextra smart      FALSE  
#> 13 output markdown_strict   smart      FALSE  
#> 14 output mediawiki         smart      FALSE  
#> 15 output org               smart      FALSE  
#> 16 output plain             smart      FALSE  
#> 17 output rst               smart      FALSE
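Both views can also be obtained in one step by grouping the format names on the `default` flag. The sketch below uses base R `split()` on two plain vectors that copy four rows from the tables above; with the real data, the inputs would be `has_smart$formats` and `has_smart$default`.

```r
# Four rows copied from the tables above, as plain vectors
formats <- c("latex", "markdown", "gfm", "html")
default <- c(TRUE, TRUE, FALSE, FALSE)

# One list element per value of `default`, named "FALSE" and "TRUE"
by_default <- split(formats, default)

by_default[["TRUE"]]
#> [1] "latex"    "markdown"
by_default[["FALSE"]]
#> [1] "gfm"  "html"
```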