One of the oft-cited issues with switching to Linux is file compatibility. You’ll invariably send files to users of other operating systems, and they won’t look the same when opened in applications like Word. While you can install fonts or use VMs or emulators to keep the look consistent, another approach is to do your work in a plain text format, then convert it when you’re done.
One tool you can use to convert between formats is pandoc, an essential tool in any Linux user’s toolbox.
Basic Pandoc Installation and Usage
Installing pandoc on most Linux distributions is a matter of a simple trip to the repositories. On Ubuntu-based systems, the following command installs it for you:
sudo apt-get install pandoc
Once installed, you can start using the command line program to convert files. Pandoc excels at handling Markdown and other lightweight markup languages. If you have a .md file lying around, you can convert it to HTML with the following:
pandoc -o myfile.html myfile.md
The -o flag gives the name of the output file you want. In this case pandoc also infers the output format (HTML) from the filename extension. You can use the -r (for read) and -w (for write) flags to state the input and output formats explicitly. Suppose you’re used to writing in Markdown, but need to post something to a Mediawiki-based page:
pandoc -r markdown -w mediawiki -o markdown.wiki markdown.md
In its earlier versions, pandoc was focused on “upgrading” files, in the sense that it could convert simpler formats (such as Markdown) to more complex ones (e.g. ODT or Microsoft’s DOCX). But it will now read these more complicated formats as well. This means that if you’re accustomed to a word processor but are tempted by all the reasons to use a smaller and more portable plain text format, making the switch has become a lot easier.
Given a directory full of Word files, the following command will convert each of them to Markdown:
for file in *.docx; do pandoc -r docx -w markdown -o "$file".md "$file"; done
Note that this will leave you with files named filename.docx.md, so you’ll need to run a quick rename command (or better yet, add it to the above as a shell script).
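One way to fold the rename into the loop is sketched below. It relies on the shell’s ${file%.docx} expansion, which strips a trailing .docx from the name. The helper name md_name is made up for this example, and the loop assumes every Word file in the current directory ends in .docx.

```shell
#!/bin/sh
# Sketch: convert every .docx file in the current directory to Markdown,
# writing report.docx to report.md instead of report.docx.md.

# md_name is a hypothetical helper: it swaps a trailing .docx for .md.
md_name() {
    printf '%s\n' "${1%.docx}.md"
}

for file in *.docx; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$file" ] || continue
    pandoc -r docx -w markdown -o "$(md_name "$file")" "$file"
done
```

Quoting "$file" throughout keeps the loop working for names that contain spaces.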
Pandoc Command Line Options
Now that you’ve got the basics, we’ll look at some of pandoc’s more advanced command line options.
ODT/DOCX Reference Files
Suppose you’ve converted all your old, bulky word processor files to Markdown. While you’re reveling in the joy of authoring in plain text, at some point you’ll need to share these with someone. And that someone may not be as enlightened as you. You can simply reverse the read and write flags to convert your file back to Word format:
pandoc -r markdown -w docx -o wordfile.docx wordfile.md
But some folks like their Word files with particular fonts, numbered headings, etc. Pandoc’s ODT and DOCX back-ends support template files, called reference files, for just such an occasion. These are ODT or DOCX files you’ve set up with all the styling you need. Pandoc applies these styles during conversion if you pass it the reference file at the command line. Pandoc 2.0 and later use --reference-doc for both formats; older releases used --reference-odt and --reference-docx:
pandoc -r markdown -w odt --reference-doc=/home/user/path/to/ref-file.odt -o lowriter.odt lowriter.md
The fonts configured in the reference file (Arial Black for Heading 1, etc.) carry over to the converted file. You can create as many of these reference files as you need (for example, one per client). Then ignore formatting entirely while you’re writing, and apply the styling in one step as you convert.
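The one-reference-file-per-client idea could be scripted along the following lines. This is only a sketch: it assumes pandoc 2.0 or later (where the flag is spelled --reference-doc), and the clients/ directory layout and the helper names ref_for_client and convert_for_client are invented for illustration.

```shell
#!/bin/sh
# Sketch: keep one reference file per client and pick it at conversion time.
# Assumes pandoc 2.0 or later; the clients/ layout is hypothetical.

# ref_for_client CLIENT prints the path of that client's reference file.
ref_for_client() {
    printf 'clients/%s/reference.docx\n' "$1"
}

# convert_for_client CLIENT INPUT.md writes INPUT.docx styled for CLIENT.
convert_for_client() {
    client=$1
    input=$2
    pandoc -r markdown -w docx \
        --reference-doc="$(ref_for_client "$client")" \
        -o "${input%.md}.docx" "$input"
}

# To start a new reference file from pandoc's built-in default, then
# restyle it in your word processor:
#   mkdir -p clients/acme
#   pandoc --print-default-data-file reference.docx > clients/acme/reference.docx
# To convert a document with that client's styling:
#   convert_for_client acme proposal.md
```

Starting from pandoc’s default reference file is a safe choice because it already contains the style names pandoc looks for.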
PDF Rendering Back-Ends
Creating PDFs is also a simple exercise, once you install some necessary packages. A lightweight way to get PDF-writing capability is to install the wkhtmltopdf package, a command line tool to convert HTML to PDF. Pandoc supports this natively: if you set the write flag to HTML but give the output file a .pdf extension, it takes this as your intent to use wkhtmltopdf and calls it for you.
pandoc -r markdown -w html -o nicepub.pdf nicepub.md
Alternatively, you can go for the full-featured option by using the LaTeX typesetting system (packaged on current distributions as TeX Live, the successor to teTeX). Take advantage of the fact that these packages are Suggested Installs for the pandoc package by re-installing with the following command:
sudo apt-get install --install-suggests pandoc
Then, sit back while a lot (really, a lot) of packages install. Once they’re complete, you can convert your file to PDF by setting the write flag to LaTeX and giving the output file a .pdf extension (pdf itself is not a valid value for the write flag):
pandoc -r markdown -w latex -o nicepub-tetex.pdf nicepub.md
While the wkhtmltopdf option requires the install of only one package, you can get some more print-friendly results with LaTeX. Namely, serif fonts are used by default, and the pages are automatically numbered.
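If you go the LaTeX route, you can also tell pandoc which LaTeX engine to run. The sketch below assumes pandoc 2.0 or later, where the option is named --pdf-engine. The helper names pick_latex_engine and md_to_pdf are made up, and the one-inch margin is just an example setting.

```shell
#!/bin/sh
# Sketch: build a PDF through LaTeX with whichever engine is installed.
# Assumes pandoc 2.0 or later, where the option is named --pdf-engine.

# pick_latex_engine is a hypothetical helper: it prints the first LaTeX
# engine found on PATH, or fails if none is installed.
pick_latex_engine() {
    for engine in xelatex lualatex pdflatex; do
        if command -v "$engine" >/dev/null 2>&1; then
            printf '%s\n' "$engine"
            return 0
        fi
    done
    return 1
}

# md_to_pdf INPUT.md writes INPUT.pdf with one-inch margins.
md_to_pdf() {
    engine=$(pick_latex_engine) || {
        echo "no LaTeX engine found; install the LaTeX packages first" >&2
        return 1
    }
    pandoc -r markdown -w latex --pdf-engine="$engine" \
        -V geometry:margin=1in \
        -o "${1%.md}.pdf" "$1"
}
```

Calling md_to_pdf nicepub.md would then produce nicepub.pdf using the first engine found.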
Finally, pandoc can convert your files to ebooks suitable for reading on a phone or e-reader. The epub and epub3 back-ends will give you a properly formatted ebook:
pandoc -r markdown -w epub -o mybook.epub mybook.md
The advantages of pandoc go beyond its power as a command line utility. For example, it includes support for an improved version of Markdown, and it can easily be integrated with graphical applications.
Pandoc’s Markdown Flavor
In addition to being a conversion tool, pandoc supports a slightly enhanced flavor of Markdown. By using pandoc instead of the standard markdown command, you have some additional features available, including the following:
- Metadata — Pandoc’s flavor of Markdown allows you to include information in the header of your document such as author, date, email address, etc.
- Text Decorations — You can apply text decorations such as strikethrough or super/subscript that aren’t supported in standard Markdown through pandoc.
- Tables — This alone makes pandoc worthwhile compared to “vanilla” Markdown. Using the pipe character to separate table cells, you can create a table that ranges from really ugly to human-readable in plain text as well as rendered format.
- Fancy Lists — Pandoc allows you to format lists with outline-style levels, e.g. “1.,” then “A.,” then “i.,” etc. You can also specify a starting number for lists, where lists in plain Markdown start from “1.”
- Code Syntax Highlighting — You can have highlighting applied to your code blocks by telling pandoc what the language is.
The above are only a selection of pandoc Markdown’s features. Visit the manual page on pandoc.org for a full list of the extras this flavor of Markdown provides.
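To see several of these features side by side, the sketch below writes a small sample file and renders it to HTML if pandoc is installed. The file name sample.md and everything in it are made up for illustration. The code block inside the sample uses tilde fences, which pandoc accepts alongside backtick fences.

```shell
#!/bin/sh
# Sketch: write a small Pandoc Markdown sample, then render it if pandoc
# is available. The file name and its contents are placeholders.
cat > sample.md <<'EOF'
---
title: Sample Document
author: A. N. Author
date: January 1, 2017
---

Text decorations: ~~struck out~~, H~2~O, and x^2^.

| Format   | Extension |
|----------|-----------|
| Markdown | .md       |
| HTML     | .html     |

5. A list that starts at five
6. and keeps counting
    A.  with a lettered sub-level
        i. and a roman-numeral level below that

~~~python
print("highlighted when converted")
~~~
EOF

# -s (standalone) produces a complete HTML page that carries the title,
# author, and date from the metadata block.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -s -r markdown -w html -o sample.html sample.md
fi
```

Note the two spaces after “A.” in the sub-list: pandoc asks for them after a capital letter and a period so that initials at the start of a line are not mistaken for list markers.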
Use a GUI With pandoc
While pandoc is effective as a command-line tool, it does have a lot of options. If you’re new to Linux, you may prefer to use pandoc through a GUI. While it doesn’t include a graphical interface by default, you can install PanDocElectron to convert your docs with point-and-click. Download the install script from the app’s website, then run it to install all the necessary packages and the program itself.
Once installed, the npm start command in the PanDocElectron directory will launch the application. With dropdown lists for formats and the ability to choose the input file with a dialog, this will help you get used to the “ins and outs” of pandoc, as it were.
If you’re comfortable with pandoc’s myriad options and flags but just want a way to easily call it, you can integrate it with your GUI text editor. For example, the Atom editor has a number of packages, such as pandoc-convert, that can save the current file out to different formats using pandoc.
Another option is to run pandoc commands using an editor’s built-in functions, such as the build command. Atom’s build-tools package gives you the ability to specify custom commands.
Then, you can call the build command on your pandoc-compatible files, just as you would on source code.
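A build command of this kind could point at a small wrapper such as the sketch below. The function names out_name and build_doc, and the choice of HTML as the default target, are assumptions. Nothing here is specific to Atom or build-tools: the editor only needs to pass in the path of the current file.

```shell
#!/bin/sh
# Sketch: a wrapper an editor's build command could call on the current file.

# out_name FILE FORMAT prints FILE with its extension replaced by FORMAT.
out_name() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

# build_doc FILE [FORMAT] converts FILE with pandoc. FORMAT defaults to
# html and becomes the output extension, from which pandoc infers the
# output format.
build_doc() {
    format=${2:-html}
    pandoc -s -o "$(out_name "$1" "$format")" "$1"
}
```

For example, build_doc notes.md would write notes.html, and build_doc notes.md docx would write notes.docx.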
Pandoc Takes Some of the Stress Out of Switching
With pandoc in your toolkit, you can rest easier knowing you can always get your documents to other people in the format they need. At the same time, you can take advantage of some of the great features of Linux (consider giving one of the terminal-based text editors like vim a try).
Do you often find yourself converting files back and forth between formats? If you’re running into compatibility problems, let us know in the comments, and we’ll see if we can use pandoc to sort you out!