In part 1 of Converting LaTeX to Word, I explained how I used Pandoc to convert from LaTeX to Word (doc, docx, RTF). There were problems getting figure reference numbers to show up, because by default Pandoc cannot handle automatic numbering and referencing of figures the way LaTeX can. The [pandoc-reference-filter] package was written to solve this problem.
Dec 04, 2017: Word to Markdown using Pandoc.
[Note: Unfortunately I couldn’t get Pandoc to recognize the filters package – probably due to my inability to install Python packages in Windows correctly (doh!) – so I stopped trying because LaTeX2RTF works well for me, for now.]
How to get figure and table references to show up with the Pandoc LaTeX to Word conversion scripts
- You need Python and of course Pandoc
- Install [pandoc-reference-filter] and [pandocfilters version 1.2.3]
- And then finally get your main ingredients together: see the previous post here
- Follow the markup and usage examples in [pandoc-reference-filter], compile with Pandoc, and your figures should be numbered and referenced correctly
Pros and Cons of using Pandoc to convert from LaTeX to Word
- Pandoc is a simple but useful tool for converting documents from one type to another. You can download it, or install it from the command line if you prefer.
- Pandoc parses the LaTeX, but it is not a full TeX interpreter. By design, it cannot support all packages and document classes. The elsarticle class requires a custom method to specify metadata, and that custom method is not supported by Pandoc.
- Feb 25, 2019: Pandoc handles LaTeX equations nicely. All the equations are converted into Word equation editor objects, so MathType is not required.
+ Pros
- You can use the 7000+ (as of this date) style files already available in the Zotero and CSL repositories. And BTW, csl style files are much easier to edit than bst files! This, IMO, is a huge benefit to using Pandoc for your conversions.
- You can convert to many other formats besides doc/rtf (e.g. HTML)
- Can easily define (hardcode) the name of the output file in the conversion script. This is handy because you might want to call the first draft “filename_v1.doc”, and after a revision, call it “filename_v2.doc”. For each revision, you just have to change the output filename in the script and every time you run the script, it will give it the name you predefined.
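The hardcoded output name can be sketched as a small Python wrapper around Pandoc. This is only an illustration under assumptions: the file names `paper.tex` and `filename_v1.docx` and the helpers `build_command` and `convert` are hypothetical, Pandoc is assumed to be on the PATH, and the output is assumed to be docx.

```python
import subprocess

# Hypothetical file names; edit OUTPUT for each revision (v1, v2, ...).
SOURCE = "paper.tex"
OUTPUT = "filename_v1.docx"


def build_command(source=SOURCE, output=OUTPUT, bibliography=None, csl=None):
    """Return the Pandoc argument list for a LaTeX-to-Word conversion."""
    cmd = ["pandoc", source, "-o", output]
    if bibliography:
        # --citeproc assumes Pandoc 2.11 or later; older releases used the
        # separate pandoc-citeproc filter instead.
        cmd += ["--citeproc", "--bibliography", bibliography]
    if csl:
        # A CSL style file, e.g. one taken from the Zotero style repository.
        cmd += ["--csl", csl]
    return cmd


def convert(**kwargs):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_command(**kwargs), check=True)
```

Calling `convert()` writes `filename_v1.docx`; for the next revision only the `OUTPUT` constant needs to change.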
– Cons
- You need Python (hopefully this isn’t a dealbreaker for most people, though I couldn’t get it to work myself)
- Page breaks still don’t work, but there might be a solution to that, somewhere, in some corner of the interwebs…
- Some special LaTeX commands may not work (check the FAQ and the mailing lists for further help)
- To solve this last issue, it’s been suggested to write directly in Markdown rather than LaTeX – though this defeats the purpose of writing in LaTeX.
Other posts in the LaTeX to RTF conversion series
- Using Scrivener (AKA Converting LaTeX to Word – part 4) — coming soon
Markdown has become the de-facto standard for writing software documentation. This post documents my experience using Pandoc to convert Word documents (docx) to markdown.
To follow along, install Pandoc, if you haven’t done so already. Word documents need to be in the docx format. Legacy binary doc files are not supported.
Pandoc supports several flavors of markdown such as the popular GitHub flavored Markdown (GFM). To produce a standalone GFM document from docx, run
The --extract-media option tells Pandoc to extract media to a ./media folder.
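A docx-to-GFM conversion along these lines could be scripted as follows. This is a minimal sketch, not necessarily the exact command used for this post: the helper names `docx_to_gfm_command` and `convert` and the file names are hypothetical, and Pandoc is assumed to be on the PATH.

```python
import subprocess


def docx_to_gfm_command(docx_path, md_path, media_dir="."):
    """Build the Pandoc call that converts a docx file to standalone GFM."""
    return [
        "pandoc",
        "--from", "docx",
        "--to", "gfm",
        "--standalone",
        # Images from the docx are written below this directory.
        f"--extract-media={media_dir}",
        "--output", md_path,
        docx_path,
    ]


def convert(docx_path, md_path):
    """Run the conversion; raises CalledProcessError if Pandoc fails."""
    subprocess.run(docx_to_gfm_command(docx_path, md_path), check=True)
```

For example, `convert("report.docx", "report.md")` would write `report.md` next to the extracted images.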
Creating a PDF
To create a PDF, run
Pandoc requires LaTeX to produce the PDF. Remove the --toc option if you don’t want Pandoc to create a table of contents (TOC). Remove the -N option if you don’t want it to number sections automatically.
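The PDF step with its two optional flags might look like this in a script. It is a sketch under assumptions: `pdf_command` and `make_pdf` are hypothetical helper names, and a LaTeX installation plus Pandoc are assumed to be available.

```python
import subprocess


def pdf_command(md_path, pdf_path, toc=True, number_sections=True):
    """Build the Pandoc call that renders markdown to PDF."""
    cmd = ["pandoc", md_path, "-o", pdf_path]
    if toc:
        cmd.append("--toc")  # generate a table of contents
    if number_sections:
        cmd.append("-N")  # number section headings automatically
    return cmd


def make_pdf(md_path, pdf_path, **options):
    """Run Pandoc; raises CalledProcessError if rendering fails."""
    subprocess.run(pdf_command(md_path, pdf_path, **options), check=True)
```

Passing `toc=False` or `number_sections=False` drops the corresponding flag.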
Markdown Editor
You’ll need a text editor to edit a markdown file. I use vscode. It has built-in support for editing and previewing markdown files. I use a few additional plugins to make editing markdown files more productive.
HTML in Markdown
GFM allows HTML blocks in markdown. These get rendered when previewed in vscode, GitHub, or GitLab. Pandoc suppresses raw HTML output to PDF format and hence HTML blocks get rendered as plain text. For example, <sup>1</sup> gets rendered as (1) instead of (^1). You can use ^text^ in Pandoc’s markdown syntax to render superscript.
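If a converted file contains many `<sup>` tags, the rewrite to `^text^` can be automated. The snippet below is a rough sketch; `html_sup_to_pandoc` is a hypothetical helper, and it assumes the tags are not nested.

```python
import re

# Matches <sup>...</sup>, non-greedy so adjacent tags stay separate.
SUP_TAG = re.compile(r"<sup>(.*?)</sup>", re.IGNORECASE | re.DOTALL)


def html_sup_to_pandoc(text):
    """Replace HTML superscript tags with Pandoc's ^text^ syntax."""
    def replace(match):
        # Pandoc requires spaces inside a superscript to be backslash-escaped.
        inner = match.group(1).replace(" ", "\\ ")
        return "^" + inner + "^"

    return SUP_TAG.sub(replace, text)
```

For instance, `html_sup_to_pandoc("x<sup>2</sup>")` returns `x^2^`.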
You can use HTML character entities to write out characters and symbols not available on the keyboard.
Tables
Pandoc converts docx tables whose cells contain a single line of text each, to the pipe table syntax. Column text alignment is not rendered—you can add that back using colons. Relative column widths can be specified using dashes. Pipe table cells with long text or images, may stretch beyond the page.
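To make the colon and dash conventions concrete, here is a small sketch that generates a pipe table. The helpers `separator_cell` and `pipe_table` are hypothetical; as far as I understand Pandoc's behavior, the dash counts only act as relative widths when a source line is longer than the configured column width.

```python
def separator_cell(align, dashes):
    """Build one cell of the row that separates the header from the body.

    align is "left", "right", "center", or None (default alignment).
    dashes is the dash count, which Pandoc can treat as a relative width.
    """
    body = "-" * max(dashes, 3)
    if align == "left":
        return ":" + body
    if align == "right":
        return body + ":"
    if align == "center":
        return ":" + body + ":"
    return body


def pipe_table(headers, rows, aligns, dashes):
    """Return a pipe table as a markdown string."""
    cells = [separator_cell(a, d) for a, d in zip(aligns, dashes)]
    lines = ["| " + " | ".join(headers) + " |", "|" + "|".join(cells) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


print(pipe_table(["Name", "Qty"], [["apple", "3"]], ["left", "right"], [6, 3]))
```

Here the first column is left-aligned and gets twice as many dashes as the right-aligned second column.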
Tables in docx that have complex data in cells such as lists and multiple lines, are converted to HTML table syntax. That is highly unfortunate because Pandoc renders HTML tables to PDF as plain text.
It is not unusual for docx tables, with complex layouts such as merged cells, to be missing columns or rows. I suggest simplifying such tables, in the original docx, before conversion.
Review all tables very carefully!
I’ve obtained nice results with Pandoc’s grid table syntax, but these tables cannot be previewed in vscode, GitHub, or GitLab.
Table of Contents
Pandoc converts the TOC in a docx into a sequence of lines, where each line corresponds to a topic or section. Section headings are generated without numbering. I suggest deleting the TOC, and using the command line options discussed earlier to number sections and to render the TOC.
If you have cross-references in docx that use section numbers, you can generate a hyperlinked TOC using the Markdown TOC plugin of vscode. The plugin can also add, update, or remove section numbers.
I suggest avoiding section numbers for cross-referencing and using hyperlinked section references instead.
Images
Images are exported to their native format and size. They are rendered in GFM using the ![[caption]](path) syntax. Image sizes cannot be customized in GFM syntax, but Pandoc’s markdown syntax allows setting image attributes such as width using the ![[caption]](path){key1=value1 key2=value2} syntax.
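Adding a width attribute to every exported image can be done with a regular expression. This is a sketch only: `add_image_width` is a hypothetical helper, it assumes image paths contain no spaces or parentheses, and the result is Pandoc markdown rather than GFM.

```python
import re

# An inline image that is not already followed by an attribute block.
IMAGE = re.compile(r"(!\[[^\]]*\]\([^)\s]+\))(?!\{)")


def add_image_width(markdown, width="50%"):
    """Append a Pandoc {width=...} attribute to images that lack attributes."""
    return IMAGE.sub(lambda m: f"{m.group(1)}{{width={width}}}", markdown)
```

For example, `add_image_width("![Logo](media/image1.png)")` returns `![Logo](media/image1.png){width=50%}`, while images that already carry attributes are left alone.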
Figures
Pandoc does not convert vector diagrams created using Word’s figures and shapes. You’ll need to screen grab, or copy and paste, the image rendered by Word.
You can use mermaid.js syntax to recreate diagrams such as flowcharts and message sequence charts. mermaid.js syntax can be embedded in markdown, and converted using mermaid-filter.
GitHub doesn’t yet allow you to preview mermaid.js diagrams, but GitLab does. vscode is able to preview them using the Markdown Preview Mermaid Support plugin.
Captions
Pandoc converts captions in the docx as plain text positioned after an image or table. I suggest using Pandoc’s native markdown syntax for captions.
Cross-references
GFM does not natively support linking to figures and tables, and HTML anchors are not a viable option with Pandoc. Link to the section containing a figure or table when referencing it from other parts of the document.
Figure and table numbers in docx may sometimes go missing from cross-references.
I suggest reviewing captions and cross-references very carefully!
Large Documents
Pandoc can handle large documents that have hundreds of pages. You may want to maintain large documents in separate markdown files. This makes concurrent editing productive and allows for reuse. It also allows for faster previews on GitHub or GitLab. In fact, previewing may entirely fail to work for complex documents. You may want to pre-render such documents to HTML using Pandoc.
Pandoc is capable of converting multiple markdown files in a single run.
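Pandoc accepts several input files on one command line and concatenates them in the order given. A minimal sketch of a build helper follows; `combine_command` and `build` are hypothetical names, and the sketch assumes the chapter files are named so that sorting gives the reading order.

```python
import subprocess


def combine_command(chapters, output):
    """Build a Pandoc call that renders several markdown files as one document."""
    # Sorting relies on names such as 01-intro.md, 02-methods.md, ...
    return ["pandoc", "--standalone", *sorted(chapters), "-o", output]


def build(chapters, output):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(combine_command(chapters, output), check=True)
```

For example, `build(["02-methods.md", "01-intro.md"], "book.html")` pre-renders both files into a single HTML page.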
Regular Expressions
Using regular expressions significantly speeds up searching and replacing text. Some examples follow:
Empty heading
^#+\s*$
Line with trailing spaces
\s+$
Repeated whitespace between words
\b\s\s+\b
Whitespace before , ; or .
\s+[,;.]
Paragraph starts with a lowercase letter
\n\n[a-z]
Word figure not followed by a number
figure\s+(?!([\d]){1,})
Word section not followed by a number
section\s+(?!(\d+.*\d*?){1,})
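These patterns can also be run outside the editor. The sketch below applies them with Python's `re` module; `find_issues` and the issue labels are hypothetical, and the line-oriented patterns are checked one line at a time because Python's `$` and `\s` treat newlines differently from an editor's search box.

```python
import re

# Patterns that make sense on a single line.
LINE_CHECKS = {
    "empty heading": r"^#+\s*$",
    "trailing whitespace": r"\s+$",
    "repeated whitespace": r"\b\s\s+\b",
    "whitespace before punctuation": r"\s+[,;.]",
    "figure without number": r"figure\s+(?!([\d]){1,})",
    "section without number": r"section\s+(?!(\d+.*\d*?){1,})",
}

# This pattern spans a blank line, so it runs on the whole text.
PARAGRAPH_CHECK = r"\n\n[a-z]"


def find_issues(text):
    """Return (line_number, issue) pairs; line 0 marks whole-text checks."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in LINE_CHECKS.items():
            if re.search(pattern, line):
                issues.append((number, name))
    if re.search(PARAGRAPH_CHECK, text):
        issues.append((0, "paragraph starts in lower case"))
    return issues
```

Running `find_issues` on a converted file gives a quick list of lines to review by hand.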