pandoc is currently an experimental R package, primarily developed to help maintainers of the R Markdown ecosystem.
Indeed, the R Markdown ecosystem is highly dependent on changes in Pandoc (https://pandoc.org/), and it is designed to be as version-independent as possible. R Markdown is best used with the latest Pandoc version, but any rmarkdown package version should work with previous versions of Pandoc, and new changes in Pandoc should not break any rmarkdown features.
This explains the need for more focused tooling to:
- Install and manage several Pandoc versions. This is useful for testing versions and comparing between them.
- Call Pandoc’s command directly without the layers added by rmarkdown. This is useful for debugging or quickly iterating and finding where a bug comes from.
- Retrieve information from Pandoc directly. Each version comes with changes, some of which are included in the binary. Being able to retrieve this information and compare it between versions is important to help maintain the user-facing tooling.
This package can also be useful to advanced developers working with Pandoc, whether through rmarkdown or not.
Installation
Install from CRAN:
install.packages("pandoc")
The development version can be installed from GitHub with:
# install.packages("pak")
pak::pak("cderv/pandoc")
Install pandoc
The main usage is to install the latest released version of Pandoc. This requires the gh package, as it fetches information from GitHub and downloads the bundle from there.
pandoc_install()
#> ℹ Using cached version 'github-cache.rds' in instead of fetching GH
#> ✔ Pandoc 3.1.6.2 already installed.
#> Use 'force = TRUE' to overwrite.
If a specific older Pandoc version is needed (e.g. for testing differences between versions), a version can be specified.
pandoc_install("2.11.4")
#> ✔ Pandoc 2.11.4 already installed.
#> Use 'force = TRUE' to overwrite.
Information fetched from GitHub is cached for the duration of the session.
Sometimes, the development version of Pandoc is required. The Pandoc team builds a binary every day, called nightly, in their CI.
# install the nightly version (overwrites previous one)
pandoc_install_nightly() # or pandoc_install("nightly")
#> ℹ Retrieving last available nightly informations...
#> ℹ Removing old Pandoc nightly version 54b9eeb6a72f1c6f0ae3675cb9e7c29fa3183316
#> ℹ Installing last available nightly...
#> Downloading bundle...
#> ✔ Last Pandoc nightly installed: 3cb6130d1a6e16f34f98f02ebfdee3f30cdc20c0
All those versions can live together and are installed in an isolated directory in the user's data folder.
# Which versions are currently installed?
pandoc_installed_versions()
#> [1] "nightly" "3.1.6.2" "3.1.6.1" "3.1.6" "3.1.5" "3.1.4"
#> [7] "3.1.3" "3.1.2" "3.1.1" "3.1" "3.0.1" "3.0"
#> [13] "2.19.2" "2.19.1" "2.19" "2.18" "2.17.1.1" "2.17.1"
#> [19] "2.17.0.1" "2.17" "2.16.2" "2.16.1" "2.16" "2.15"
#> [25] "2.14.2" "2.14.1" "2.14.0.3" "2.14.0.2" "2.14.0.1" "2.14"
#> [31] "2.13" "2.12" "2.11.4" "2.11.3.2" "2.11.3.1" "2.11.3"
#> [37] "2.11.2" "2.11.1.1" "2.11.1" "2.11.0.4" "2.11.0.2" "2.11.0.1"
#> [43] "2.11" "2.10.1" "2.10" "2.9.2.1" "2.9.2" "2.9.1.1"
#> [49] "2.9.1" "2.9" "2.8.1" "2.8.0.1" "2.8" "2.7.3"
#> [55] "2.7.2" "2.7.1" "2.7" "2.6" "2.5" "2.4"
#> [61] "2.3.1" "2.3" "2.2.3.2" "2.2.3.1" "2.2.2.1" "2.2.2"
#> [67] "2.2.1" "2.2" "2.1.3" "2.1.2" "2.1.1" "2.1"
#> [73] "2.0.6" "2.0.5" "2.0.4" "2.0.3"
# Which is the latest version installed (nightly excluded)?
pandoc_installed_latest()
#> [1] "3.1.6.2"
# Is a specific version installed?
pandoc_is_installed("2.11.4")
#> [1] TRUE
pandoc_is_installed("2.7.3")
#> [1] TRUE
Downloaded bundles are also cached to speed up further installations. This is useful in tests to quickly install and uninstall a Pandoc version for a specific test.
pandoc_install("2.7.3")
#> ✔ Pandoc 2.7.3 already installed.
#> Use 'force = TRUE' to overwrite.
pandoc_uninstall("2.7.3")
pandoc_install("2.7.3")
#> ℹ Installing Pandoc release 2.7.3
#> Downloading bundle...
#> ✔ Pandoc version 2.7.3 installed.
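In a test suite, the install-if-missing pattern above could be wrapped in a small helper. The sketch below defines a hypothetical `ensure_pandoc()` function (it is not part of the package) that relies only on `pandoc_is_installed()`, `pandoc_install()` and `pandoc_locate()` as shown earlier. It assumes the pandoc package is installed and, when a version is missing, that network access is available.

```r
# Hypothetical test helper (not exported by the pandoc package):
# install `version` only when it is missing, then return its folder.
ensure_pandoc <- function(version) {
  if (!pandoc::pandoc_is_installed(version)) {
    pandoc::pandoc_install(version)
  }
  pandoc::pandoc_locate(version)
}

# Only call the helper when the pandoc package is available and the
# version is already installed, so this example never downloads anything.
if (requireNamespace("pandoc", quietly = TRUE) &&
    pandoc::pandoc_is_installed("2.7.3")) {
  ensure_pandoc("2.7.3")
}
```

Because downloaded bundles are cached, calling such a helper repeatedly across test files should only pay the download cost once.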
To quickly install the latest available release, run pandoc_update() (an alias of pandoc_install(), which already defaults to the latest version).
Find where a pandoc binary is located
For any version installed with this package, pandoc_locate() returns the folder where it was installed.
pandoc_locate("2.11.4")
#> [1] "~/.local/share/r-pandoc/2.11.4"
pandoc_locate("nightly")
#> [1] "~/.local/share/r-pandoc/nightly"
For example purposes in this vignette, the path above is in a temp directory. The actual location is in the user's data directory, computed with rappdirs::user_data_dir() (e.g. on Windows, C:/Users/chris/AppData/Local/r-pandoc/r-pandoc).
To get the path to a pandoc binary, pandoc_bin() can be used:
pandoc_bin("2.11.4")
#> ~/.local/share/r-pandoc/2.11.4/pandoc
pandoc_bin("nightly")
#> ~/.local/share/r-pandoc/nightly/pandoc
This function also supports external Pandoc versions, such as:
- the one shipped with the RStudio IDE (version = "rstudio"): pandoc::pandoc_bin("rstudio") or its alias pandoc::pandoc_rstudio_bin()
- the one set by default on the system, found in PATH (version = "system"): pandoc::pandoc_bin("system") or its alias pandoc::pandoc_system_bin()
Activate a Pandoc version
As multiple versions can be installed, a default active Pandoc version is used by any of the functions (version = "default").
A specific version can be made active using pandoc_activate():
# Default to latest version installed
pandoc_activate()
#> ✔ Version '3.1.6.2' is now the active one.
#> ℹ Pandoc version also activated for rmarkdown functions.
pandoc_locate()
#> [1] "~/.local/share/r-pandoc/3.1.6.2"
pandoc_bin()
#> ~/.local/share/r-pandoc/3.1.6.2/pandoc
# Activate specific version
pandoc_activate("2.11.4")
#> ✔ Version '2.11.4' is now the active one.
#> ℹ Pandoc version also activated for rmarkdown functions.
pandoc_locate()
#> [1] "~/.local/share/r-pandoc/2.11.4"
pandoc_bin()
#> ~/.local/share/r-pandoc/2.11.4/pandoc
# including nightly
pandoc_activate("nightly")
#> ✔ Version 'nightly' is now the active one.
#> ℹ Pandoc version also activated for rmarkdown functions.
pandoc_locate()
#> [1] "~/.local/share/r-pandoc/nightly"
pandoc_bin()
#> ~/.local/share/r-pandoc/nightly/pandoc
# Activate system version
pandoc_activate("system")
#> ✔ Version 'system' is now the active one.
#> ℹ Pandoc version also activated for rmarkdown functions.
pandoc_bin()
#> /usr/bin/pandoc
A default active version is set when the package is loaded (i.e. using onLoad), following this search order:
- Latest version installed by this package (i.e. pandoc_installed_latest())
- Version shipped with the RStudio IDE (found when run inside the RStudio IDE)
- pandoc binary found in the system PATH (i.e. Sys.which("pandoc"))
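The fallback order above can be pictured with a small base R sketch. The function `pick_default_pandoc()` and its argument names are hypothetical and only illustrate the precedence described here; this is not how the package implements its load hook.

```r
# Hypothetical illustration of the search order (not package code).
# Each argument stands for the result of one lookup step; NULL or ""
# means that step found nothing.
pick_default_pandoc <- function(installed_latest = NULL,
                                rstudio_bin = NULL,
                                system_bin = unname(Sys.which("pandoc"))) {
  # 1. Latest version installed by the package
  if (!is.null(installed_latest)) {
    return(list(source = "installed", value = installed_latest))
  }
  # 2. Binary shipped with the RStudio IDE
  if (!is.null(rstudio_bin) && nzchar(rstudio_bin)) {
    return(list(source = "rstudio", value = rstudio_bin))
  }
  # 3. Binary found on the system PATH
  if (nzchar(system_bin)) {
    return(list(source = "system", value = system_bin))
  }
  # Nothing found: no default version can be activated
  NULL
}

pick_default_pandoc(installed_latest = "3.1.6.2")$source
#> [1] "installed"
```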
pandoc_is_active() makes it easy to check whether a specific version is active.
pandoc_is_active("system")
#> [1] TRUE
pandoc_is_active("2.7.3")
#> [1] FALSE
pandoc_activate("2.7.3")
#> ✔ Version '2.7.3' is now the active one.
#> ℹ Pandoc version also activated for rmarkdown functions.
pandoc_is_active("2.7.3")
#> [1] TRUE
Working with rmarkdown functions related to Pandoc
By default, if rmarkdown is installed, pandoc_activate() will also set the active version for all rmarkdown functions (using rmarkdown::find_pandoc()). This makes it easy to use this package to test rmarkdown with different versions of Pandoc.
pandoc_activate("2.7.3")
#> ✔ Version '2.7.3' is now the active one.
#> ℹ Pandoc version also activated for rmarkdown functions.
rmarkdown::pandoc_available()
#> [1] TRUE
rmarkdown::pandoc_version()
#> [1] '2.7.3'
rmarkdown::find_pandoc()
#> $version
#> [1] '2.7.3'
#>
#> $dir
#> [1] "/home/runner/.local/share/r-pandoc/2.7.3"
These calls are equivalent:
pandoc_activate("2.7.3", rmarkdown = TRUE)
#> ✔ Version '2.7.3' is now the active one.
#> ℹ Pandoc version also activated for rmarkdown functions.
rmarkdown::find_pandoc(cache = FALSE, dir = pandoc::pandoc_locate("2.7.3"))
#> $version
#> [1] '2.7.3'
#>
#> $dir
#> [1] "/home/runner/.local/share/r-pandoc/2.7.3"
If setting the default Pandoc version for rmarkdown is not desired, run with rmarkdown = FALSE:
pandoc::pandoc_activate("2.11.4", rmarkdown = FALSE)
#> ✔ Version '2.11.4' is now the active one.
rmarkdown::pandoc_version()
#> [1] '2.7.3'
During testing, it is also useful to run specific code with a specific version. with_pandoc_version() and local_pandoc_version() allow this by running pandoc_activate() for the expression only (helpers in the style of withr).
# with pandoc package functions
with_pandoc_version("2.11.4", {
  pandoc::pandoc_version()
})
#> [1] '2.11.4'
# with rmarkdown package functions
rmarkdown::pandoc_version()
#> [1] '2.11.4'
# It will also activate version for rmarkdown
with_pandoc_version("2.11.4", {
  rmarkdown::pandoc_version()
})
#> [1] '2.11.4'
# rmarkdown = FALSE can be set if not desired
with_pandoc_version("2.11.4", rmarkdown = FALSE, {
  rmarkdown::pandoc_version()
})
#> [1] '2.11.4'
Default behavior for local_pandoc_version() and with_pandoc_version() is determined by the option pandoc.activate_rmarkdown.
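The examples above use with_pandoc_version(); its local_ counterpart could be used inside a function, so that the version stays active until the function returns. The following is a minimal sketch: `convert_with_old_pandoc()` is a hypothetical wrapper, and the call is guarded so that it only runs when the pandoc package and Pandoc 2.11.4 are installed.

```r
# Hypothetical wrapper (not part of the pandoc package): activate
# Pandoc 2.11.4 for the duration of this function call only.
convert_with_old_pandoc <- function(text) {
  pandoc::local_pandoc_version("2.11.4")
  pandoc::pandoc_convert(text = text, to = "html")
}

# Guarded call: skipped when the package or that version is missing.
if (requireNamespace("pandoc", quietly = TRUE) &&
    pandoc::pandoc_is_installed("2.11.4")) {
  convert_with_old_pandoc("# A header")
}
```

This follows the usual withr convention where with_*() functions scope a change to an expression and local_*() functions scope it to the calling function.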
Check if a pandoc version is available
Is a Pandoc version available to use (i.e. is a version active), and if so, what is the full path?
if (pandoc_available()) pandoc_bin()
#> ~/.local/share/r-pandoc/2.11.4/pandoc
Does the active Pandoc version meet some requirements?
# Is the active version at least 2.10.1?
pandoc_available(min = "2.10.1")
#> [1] TRUE
# Is the active version at most 2.11?
pandoc_available(max = "2.11")
#> [1] FALSE
# Is the active version between 2.10.1 and 2.11, both sides included?
pandoc_available(min = "2.10.1", max = "2.11")
#> [1] FALSE
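To see why the active version 2.11.4 passes the min check but fails the max check, the comparison can be reproduced with base R version objects. The helper `version_in_range()` below is hypothetical and only mirrors the inclusive bounds described in the comments above; it is not the package's implementation.

```r
# Hypothetical helper (not part of the pandoc package): inclusive
# min/max check using base R version comparison.
version_in_range <- function(version, min = NULL, max = NULL) {
  v <- numeric_version(version)
  ok_min <- is.null(min) || v >= numeric_version(min)
  ok_max <- is.null(max) || v <= numeric_version(max)
  ok_min && ok_max
}

# 2.11.4 is newer than 2.10.1 ...
version_in_range("2.11.4", min = "2.10.1")
#> [1] TRUE
# ... but 2.11.4 is also newer than 2.11, so the upper bound fails
version_in_range("2.11.4", max = "2.11")
#> [1] FALSE
# a version equal to a bound is accepted
version_in_range("2.11", min = "2.10.1", max = "2.11")
#> [1] TRUE
```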
The Pandoc version can also be retrieved easily, including for external binaries:
# Get version from current active one
pandoc_version()
#> [1] '2.11.4'
# Get version for a specific version
pandoc_version("nightly")
#> [1] '3.1.6.2.9999'
# Get version for the system binary
pandoc_version("system") # equivalent to pandoc_system_version()
#> [1] '2.19.2'
Run Pandoc CLI from R
Low level call to Pandoc
pandoc_run() is the function to call the pandoc binary with some arguments. By default, it uses the active version (version = "default", see ?pandoc_activate).
pandoc_run("--version")
#> [1] "pandoc 2.11.4"
#> [2] "Compiled with pandoc-types 1.22, texmath 0.12.1, skylighting 0.10.2,"
#> [3] "citeproc 0.3.0.5, ipynb 0.1.0.1"
#> [4] "User data directory: /home/runner/.local/share/pandoc or /home/runner/.pandoc"
#> [5] "Copyright (C) 2006-2021 John MacFarlane. Web: https://pandoc.org"
#> [6] "This is free software; see the source for copying conditions. There is no"
#> [7] "warranty, not even for merchantability or fitness for a particular purpose."
This is equivalent to calling pandoc --version with the correct binary.
Using the version= argument allows running a specific version:
pandoc_run("--version", version = "system")
This will execute the pandoc command with the pandoc binary found on PATH.
Convert a document
This function is highly experimental, and the probability of API changes is high.
The main usage of Pandoc is to convert a document. pandoc::pandoc_convert() is currently a thinner wrapper than rmarkdown::pandoc_convert(). Both can convert a file, but the former can also convert from text and not just from a file.
# convert from text directly
pandoc_convert(text = "# A header", to = "html")
#> <h1 id="a-header">A header</h1>
pandoc_convert(text = "# A header", to = "html", version = "system")
#> <h1 id="a-header">A header</h1>
# convert from file
tmp <- tempfile(fileext = ".md")
writeLines("**bold** word!", tmp)
pandoc_convert(tmp, to = "html")
#> <p><strong>bold</strong> word!</p>
# write to file
out <- tempfile(fileext = ".html")
outfile <- pandoc_convert(tmp, to = "html", output = out, standalone = TRUE, version = "system")
readLines(outfile, n = 5)
#> [1] "<!DOCTYPE html>"
#> [2] "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"\" xml:lang=\"\">"
#> [3] "<head>"
#> [4] " <meta charset=\"utf-8\" />"
#> [5] " <meta name=\"generator\" content=\"pandoc\" />"
Various wrapper functions around the Pandoc CLI
All other included functions that run pandoc wrap pandoc_run() with some command flags from the Pandoc MANUAL. Each of these functions can take the version= argument to run with a specific version of Pandoc instead of the currently active one.
Some of those functions can only be used with specific Pandoc versions, and an error will be thrown if the version requirement is not met.
List supported extensions for a format
pandoc_list_extensions()
#> # A tibble: 68 × 3
#> format extensions default
#> <chr> <chr> <lgl>
#> 1 markdown abbreviations FALSE
#> 2 markdown all_symbols_escapable TRUE
#> 3 markdown angle_brackets_escapable FALSE
#> 4 markdown ascii_identifiers FALSE
#> 5 markdown auto_identifiers TRUE
#> 6 markdown autolink_bare_uris FALSE
#> 7 markdown backtick_code_blocks TRUE
#> 8 markdown blank_before_blockquote TRUE
#> 9 markdown blank_before_header TRUE
#> 10 markdown bracketed_spans TRUE
#> # ℹ 58 more rows
pandoc_list_extensions(format = "gfm")
#> # A tibble: 26 × 3
#> format extensions default
#> <chr> <chr> <lgl>
#> 1 gfm ascii_identifiers FALSE
#> 2 gfm autolink_bare_uris TRUE
#> 3 gfm bracketed_spans FALSE
#> 4 gfm definition_lists FALSE
#> 5 gfm east_asian_line_breaks FALSE
#> 6 gfm emoji TRUE
#> 7 gfm fancy_lists FALSE
#> 8 gfm fenced_code_attributes FALSE
#> 9 gfm fenced_divs FALSE
#> 10 gfm footnotes FALSE
#> # ℹ 16 more rows
pandoc_list_extensions(format = "html", version = "nightly")
#> # A tibble: 17 × 3
#> format extensions default
#> <chr> <chr> <lgl>
#> 1 html ascii_identifiers FALSE
#> 2 html auto_identifiers TRUE
#> 3 html east_asian_line_breaks FALSE
#> 4 html empty_paragraphs FALSE
#> 5 html epub_html_exts FALSE
#> 6 html gfm_auto_identifiers FALSE
#> 7 html line_blocks TRUE
#> 8 html literate_haskell FALSE
#> 9 html native_divs TRUE
#> 10 html native_spans TRUE
#> 11 html raw_html FALSE
#> 12 html raw_tex FALSE
#> 13 html smart FALSE
#> 14 html task_lists FALSE
#> 15 html tex_math_dollars FALSE
#> 16 html tex_math_double_backslash FALSE
#> 17 html tex_math_single_backslash FALSE
List available input or output formats
pandoc_list_formats("input")
#> # A tibble: 38 × 2
#> type formats
#> <chr> <chr>
#> 1 input biblatex
#> 2 input bibtex
#> 3 input commonmark
#> 4 input commonmark_x
#> 5 input creole
#> 6 input csljson
#> 7 input csv
#> 8 input docbook
#> 9 input docx
#> 10 input dokuwiki
#> # ℹ 28 more rows
pandoc_list_formats("output")
#> # A tibble: 61 × 2
#> type formats
#> <chr> <chr>
#> 1 output asciidoc
#> 2 output asciidoctor
#> 3 output beamer
#> 4 output biblatex
#> 5 output bibtex
#> 6 output commonmark
#> 7 output commonmark_x
#> 8 output context
#> 9 output csljson
#> 10 output docbook
#> # ℹ 51 more rows
pandoc_list_formats("output", version = "nightly")
#> # A tibble: 65 × 2
#> type formats
#> <chr> <chr>
#> 1 output asciidoc
#> 2 output asciidoc_legacy
#> 3 output asciidoctor
#> 4 output beamer
#> 5 output biblatex
#> 6 output bibtex
#> 7 output chunkedhtml
#> 8 output commonmark
#> 9 output commonmark_x
#> 10 output context
#> # ℹ 55 more rows
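Since both calls return a table with a formats column, the two listings can be compared to find what a newer version adds. The sketch below uses a hypothetical helper, `added_formats()`, first on toy data copied from the first rows printed above, then (guarded) on real output. The guarded part assumes the pandoc package, an active version and the nightly build are all available.

```r
# Hypothetical helper (not part of the pandoc package): formats that
# appear in `new` but not in `old`. Both arguments are data frames
# with a `formats` column, like the tibbles printed above.
added_formats <- function(old, new) {
  setdiff(new$formats, old$formats)
}

# Toy data copied from the first rows of the two listings above
old <- data.frame(type = "output",
                  formats = c("asciidoc", "asciidoctor", "beamer"))
new <- data.frame(type = "output",
                  formats = c("asciidoc", "asciidoc_legacy",
                              "asciidoctor", "beamer"))
added_formats(old, new)
#> [1] "asciidoc_legacy"

# Same comparison on real output, only when everything needed is there
if (requireNamespace("pandoc", quietly = TRUE) &&
    pandoc::pandoc_available() &&
    "nightly" %in% pandoc::pandoc_installed_versions()) {
  added_formats(
    pandoc::pandoc_list_formats("output"),
    pandoc::pandoc_list_formats("output", version = "nightly")
  )
}
```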
List available highlight styles
pandoc_list_highlight_style()
#> [1] "pygments" "tango" "espresso" "zenburn" "kate"
#> [6] "monochrome" "breezedark" "haddock"
List supported highlight languages
pandoc_list_highlight_languages()
#> [1] "abc" "actionscript" "ada" "agda"
#> [5] "apache" "asn1" "asp" "ats"
#> [9] "awk" "bash" "bibtex" "boo"
#> [13] "c" "changelog" "clojure" "cmake"
#> [17] "coffee" "coldfusion" "comments" "commonlisp"
#> [21] "cpp" "cs" "css" "curry"
#> [25] "d" "default" "diff" "djangotemplate"
#> [29] "dockerfile" "dot" "doxygen" "doxygenlua"
#> [33] "dtd" "eiffel" "elixir" "elm"
#> [37] "email" "erlang" "fasm" "fortranfixed"
#> [41] "fortranfree" "fsharp" "gcc" "glsl"
#> [45] "gnuassembler" "go" "graphql" "groovy"
#> [49] "hamlet" "haskell" "haxe" "html"
#> [53] "idris" "ini" "isocpp" "j"
#> [57] "java" "javadoc" "javascript" "javascriptreact"
#> [61] "json" "jsp" "julia" "kotlin"
#> [65] "latex" "lex" "lilypond" "literatecurry"
#> [69] "literatehaskell" "llvm" "lua" "m4"
#> [73] "makefile" "mandoc" "markdown" "mathematica"
#> [77] "matlab" "maxima" "mediawiki" "metafont"
#> [81] "mips" "modelines" "modula2" "modula3"
#> [85] "monobasic" "mustache" "nasm" "nim"
#> [89] "noweb" "objectivec" "objectivecpp" "ocaml"
#> [93] "octave" "opencl" "pascal" "perl"
#> [97] "php" "pike" "postscript" "povray"
#> [101] "powershell" "prolog" "protobuf" "pure"
#> [105] "purebasic" "python" "qml" "r"
#> [109] "relaxng" "relaxngcompact" "rest" "rhtml"
#> [113] "roff" "ruby" "rust" "scala"
#> [117] "scheme" "sci" "sed" "sgml"
#> [121] "sml" "spdxcomments" "sql" "sqlmysql"
#> [125] "sqlpostgresql" "stata" "tcl" "tcsh"
#> [129] "texinfo" "toml" "typescript" "verilog"
#> [133] "vhdl" "xml" "xorg" "xslt"
#> [137] "xul" "yacc" "yaml" "zsh"
Export a data file
outfile <- pandoc_export_data_file(file = "styles.html")
#> ✔ Template written to styles.html
outfile
#> [1] "styles.html"
readLines(outfile, n = 5)
#> [1] "$if(document-css)$"
#> [2] "html {"
#> [3] " line-height: $if(linestretch)$$linestretch$$else$1.5$endif$;"
#> [4] " font-family: $if(mainfont)$$mainfont$$else$Georgia, serif$endif$;"
#> [5] " font-size: $if(fontsize)$$fontsize$$else$20px$endif$;"
Export a highlight style JSON file
outfile <- pandoc_export_highlight_theme(style = "zenburn")
#> ✔ Style written to zenburn.theme
outfile
#> zenburn.theme
readLines(outfile, n = 5)
#> [1] "{"
#> [2] " \"text-color\": \"#cccccc\","
#> [3] " \"background-color\": \"#303030\","
#> [4] " \"line-number-color\": null,"
#> [5] " \"line-number-background-color\": null,"
Export a DOCX or PPTX reference doc
ref_docx <- pandoc_export_reference_doc(type = "docx")
#> ✔ Template written to reference.docx
ref_docx
#> reference.docx
ref_pptx <- pandoc_export_reference_doc(type = "pptx")
#> ✔ Template written to reference.pptx
ref_pptx
#> reference.pptx
Export a template for a format
pandoc_export_template(format = "jira")
#> $for(include-before)$
#> $include-before$
#>
#> $endfor$
#> $body$
#> $for(include-after)$
#>
#> $include-after$
#> $endfor$
outfile <- pandoc_export_template(format = "latex", output = "default.latex")
#> ✔ Template written to default.latex
outfile
#> [1] "default.latex"
readLines(outfile, n = 5)
#> [1] "% Options for packages loaded elsewhere"
#> [2] "\\PassOptionsToPackage{unicode$for(hyperrefoptions)$,$hyperrefoptions$$endfor$}{hyperref}"
#> [3] "\\PassOptionsToPackage{hyphens}{url}"
#> [4] "$if(colorlinks)$"
#> [5] "\\PassOptionsToPackage{dvipsnames,svgnames*,x11names*}{xcolor}"
Helpers to easily browse Pandoc’s online resources
pandoc_browse_*() helpers are included to quickly open an online document such as the Pandoc MANUAL (pandoc_browse_manual()) or the documentation for an extension (pandoc_browse_extension("smart")). See the reference doc for more.