Converting Markdown to PDF with pandoc


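LaTeX environment
A first LaTeX document
A brief introduction to LaTeX format
- Structure of a source file
Getting help with texdoc
Chinese support in LaTeX
- Listing available fonts
- Installing new fonts
- Specifying Chinese fonts in pandoc
- Chinese line breaking in pandoc
- Why unordered lists lose their bullets
The PDF template in pandoc
More on fonts
- Setting headings in a Hei typeface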

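This article shows how to generate a PDF file from Markdown. pandoc first converts the Markdown source to LaTeX, and a LaTeX engine then compiles that into the PDF.

The pipeline therefore depends on LaTeX, and LaTeX has to be configured with Chinese fonts. Otherwise the Chinese text will not appear in the generated PDF.

So we start with a brief introduction to LaTeX and its Chinese support.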

LaTeX environment

Ubuntu 15.10

xuyang@ubuntu15:~/blog$ uname -a
Linux ubuntu15 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

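Simplified Chinese was selected as the language during installation. The installation itself is described in a separate post on installing Ubuntu 15.10 and the packages on it.

LaTeX

LaTeX was installed through the texlive-full package. This is a rather brute-force approach, and the installation takes a long time (roughly an hour).

A first LaTeX document

Before introducing LaTeX, let us look at an example. This is about the simplest LaTeX file that contains Chinese: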

\documentclass[12pt]{article}
\usepackage{CJK}
\begin{document}
\begin{CJK}{UTF8}{gbsn}
你好世界
\end{CJK}
\begin{CJK}{UTF8}{gkai}
你好世界
\end{CJK}
\end{document}

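Running latex aa.tex; dvipdfm aa.dvi produces aa.pdf, in which the two lines are set in the gbsn and gkai fonts respectively.

A brief introduction to LaTeX format

Using the simple document above, here is a very short introduction to LaTeX format. With a basic understanding of it, you can comfortably modify pandoc's template to get the output you want.

The reference is The Not So Short Introduction to LaTeX2ε.

Structure of a source file

Every document must begin with \documentclass[options]{class}, which selects the document class, such as book or article.

For example: \documentclass[11pt,twoside,a4paper]{article}
After this opening statement you can load packages that control various aspects of typesetting, using \usepackage[options]{package}. Once a package is loaded, the macros it defines can be used in the body to control the document's content and formatting.

Many packages ship with LaTeX. To look up the documentation of a given package, use the texdoc command; for example, texdoc fontenc opens the fontenc documentation. Run texdoc texdoc to read about texdoc itself.
Page style: \pagestyle{style}, where style is one of plain, headings, or empty. plain prints the page number in the footer; headings prints the chapter or section title and the page number in the header and leaves the footer blank; empty leaves both header and footer blank.
Document start: \begin{document}
Document body. It can include the contents of other tex files, which is useful in large projects such as a book: put each chapter in its own tex file and include the chapters from the book's main tex file. The syntax is \include{filename}.
Document end: \end{document}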
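
As a rough sketch, the pieces above can be put together in one file. The shell snippet below writes a small skeleton to skeleton.tex; the file name, the chosen options and the commented-out chapter1 are illustrative assumptions, not taken from a real project.

```shell
# Write a minimal LaTeX skeleton to skeleton.tex (file name is an assumption).
cat > skeleton.tex <<'EOF'
\documentclass[11pt,a4paper]{article}  % document class and options
\usepackage{CJK}                       % packages are loaded in the preamble
\pagestyle{plain}                      % page number in the footer
\begin{document}
\begin{CJK}{UTF8}{gbsn}
你好世界
\end{CJK}
% \include{chapter1}                   % would pull in chapter1.tex in a larger project
\end{document}
EOF
```

It can be compiled the same way as aa.tex above.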

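Getting help with texdoc

The introduction above covers only the bare minimum, which is far from enough in practice, so we need to know how to consult the TeX documentation.

Basic command: texdoc <package-name> opens the documentation of the package <package-name>. More than one name can be given.
Modes

texdoc offers several modes for browsing the LaTeX documentation.
- view mode: open the documentation of the given package
- list mode: list the related documents and let you pick one to open
- mix mode: a mixture of the two; if there is only one related document it is opened, otherwise the related documents are listed
- showall mode: list even more related documents than list mode
To browse in a given mode, run texdoc [mode] <name>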

The mode options are:

texdoc mode options
Mode	Option	Long option
view	-w	--view
mix	-m	--mix
list	-l	--list
showall	-s	--showall

For example, to look for LaTeX documentation related to title, run texdoc -l title. The output is as follows:

xuyang@ubuntu15:~/mybook$ texdoc -l title
 1 /usr/share/texmf/doc/context/third/title/title-doc.pdf
 2 /usr/share/texlive/texmf-dist/doc/latex/titleref/titleref.pdf
   = Package documentation
 3 /usr/share/texmf/doc/context/third/title/README
 4 /usr/share/texlive/texmf-dist/doc/latex/tufte-latex/graphics/be-title.pdf
 5 /usr/share/texlive/texmf-dist/doc/latex/tufte-latex/graphics/ei-title.pdf
 6 /usr/share/texlive/texmf-dist/doc/bibtex/economic/itaxpf-ex-title.pdf
 7 /usr/share/texlive/texmf-dist/doc/latex/tufte-latex/graphics/vdqi-title.pdf
 8 /usr/share/texlive/texmf-dist/doc/latex/tufte-latex/graphics/ve-title.pdf
Please enter the number of the file to view, anything else to skip:

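In -s mode the same query lists more than 60 choices.

To search across all the TeX documentation, use texdoc -l <keyword>. This lists every document whose file name contains <keyword>, and you can pick one to read.

Chinese support in LaTeX

Listing available fonts

Use the fc-list command.

To list all Chinese fonts, run fc-list -f "%{family}\n" :lang=zh. Below is the output on my machine after installing a few fonts: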

AR PL UMing CN
AR PL UKai TW MBE
黑体,SimHei
AR PL UKai HK
Droid Sans Fallback
AR PL UKai CN
楷体_GB2312,KaiTi_GB2312
AR PL UKai TW
AR PL UMing HK
仿宋_GB2312,FangSong_GB2312
AR PL UMing TW
华文行楷,STXingkai
AR PL UMing TW MBE

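The name after a comma is an alias of the font.

Installing new fonts

You can copy some Chinese fonts over from Windows. The steps for making them usable are as follows:

sudo mkdir -p /usr/share/fonts/myfonts creates a new directory myfonts under the fonts directory (writing under /usr/share requires root)
Copy the font files into it, for example SIMFANG.TTF, SimHei.sfd, SIMHEI.TTF and so on
Change into the font directory, /usr/share/fonts/myfonts, and run the following commands:
1. sudo mkfontscale (create an index of scalable font files for X). This creates a fonts.scale file in the current directory
2. sudo mkfontdir (create an index of X font files in a directory). This creates a fonts.dir file in the current directory
3. sudo fc-cache -fv (build font information cache files)

Reboot the computer and run fc-list. The newly added fonts should appear in the output, for example: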

/usr/share/fonts/myfonts/SIMFANG.TTF: 仿宋_GB2312,FangSong_GB2312:style=Regular
/usr/share/fonts/myfonts/SIMHEI.TTF: 黑体,SimHei:style=Regular
/usr/share/fonts/myfonts/SIMKAI.TTF: 楷体_GB2312,KaiTi_GB2312:style=Regular

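The new fonts are now ready to use.

Specifying Chinese fonts in pandoc

This document is written in Markdown, and pandoc can convert Markdown to PDF directly, so let us try the plain command first: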

pandoc -f markdown -t latex --latex-engine=xelatex -o ubuntu_latex.pdf ubuntu_latex.md

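Note that the --latex-engine=xelatex option is required here (pandoc 2.0 and later call it --pdf-engine); without it LaTeX complains about UTF-8 characters it does not recognize. The command above does produce a PDF, but it contains only the English text and none of the Chinese.

Looking closely at the LaTeX template printed by pandoc -D latex, I found these lines: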

$if(mainfont)$\setmainfont{$mainfont$}
$endif$
$if(sansfont)$\setsansfont{$sansfont$}
$endif$
$if(monofont)$\setmonofont[Mapping=tex-ansi]{$monofont$}
$endif$
$if(mathfont)$\setmathfont(Digits,Latin,Greek){$mathfont$}
$endif$

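So the pandoc template allows fonts to be specified, and pandoc's -M option sets metadata fields, which the template can read as variables. Hence the following command line: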

pandoc -f markdown -t latex  -M mainfont:KaiTi_GB2312 -M sansfont:FangSong_GB2312 -M monofont:SimHei --latex-engine=xelatex -o aa.pdf ubuntu_latex.md

The fonts named here are the Chinese fonts listed earlier by fc-list. Chinese text now displays correctly in the generated PDF.
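
Because a misspelled font name is easy to miss, it can help to check a name against the fc-list output before handing it to pandoc. The helper below is a minimal sketch: font_in_list is a hypothetical name, and the sample list stands in for real fc-list output.

```shell
# Succeed if the name in $1 matches a family name or alias on stdin,
# where each input line looks like "name" or "name,alias".
# font_in_list is a hypothetical helper, not part of fontconfig or pandoc.
font_in_list() {
  awk -F',' -v want="$1" '
    { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
    END { exit !found }
  '
}

# Sample data in the format printed by: fc-list -f "%{family}\n" :lang=zh
sample='AR PL UMing CN
黑体,SimHei
楷体_GB2312,KaiTi_GB2312'

for font in KaiTi_GB2312 SimHei NoSuchFont; do
  if printf '%s\n' "$sample" | font_in_list "$font"; then
    echo "$font: found"
  else
    echo "$font: not found"
  fi
done
```

With real data, pipe the fc-list command into font_in_list instead of the sample.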

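The remaining problem is that overlong lines of Chinese are not wrapped automatically, so the text past the right margin is lost. English text wraps without any problem.

Chinese line breaking in pandoc

The problem is that Chinese lines are not being broken. According to Tzeng Yuxio's template, Chinese line breaking is enabled with \XeTeXlinebreaklocale "zh", a setting that pandoc's default template lacks. So the plan is to modify the default LaTeX template:

Export the default template: pandoc -D latex > template.tex
Add the line \XeTeXlinebreaklocale "zh" right below this part of the template: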
```
$if(mathfont)$\setmathfont(Digits,Latin,Greek){$mathfont$}
$endif$
```
Point the command line above at the new template.
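
The template edit can also be scripted. The following is a minimal sketch that assumes the template still contains the $if(mathfont)$ ... $endif$ block shown above; demo-template.tex is a small stand-in file created only for illustration.

```shell
# Stand-in for the relevant part of an exported template. A real file
# would come from pandoc -D latex; this one exists only for the demo.
cat > demo-template.tex <<'EOF'
$if(monofont)$\setmonofont[Mapping=tex-ansi]{$monofont$}
$endif$
$if(mathfont)$\setmathfont(Digits,Latin,Greek){$mathfont$}
$endif$
\begin{document}
EOF

# Copy the template, adding the line-breaking setting right after the
# $endif$ that closes the $if(mathfont)$ block.
awk '
  index($0, "$if(mathfont)$") == 1 { in_math = 1 }
  { print }
  in_math && $0 == "$endif$" {
    print "\\XeTeXlinebreaklocale \"zh\""
    in_math = 0
  }
' demo-template.tex > demo-template.patched.tex
```

On a real template, replace demo-template.tex with the exported file and inspect the result before using it.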

The new command line is:

pandoc -f markdown -t latex -s -M mainfont:KaiTi_GB2312 -M sansfont:FangSong_GB2312 -M monofont:SimHei --latex-engine=xelatex -o aa.pdf  -M geometry:"top=1in, inner=1in,outer=1in,bottom=1in, headheight=3ex, headsep=2ex" --template=./template.tex ubuntu_latex.md

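Note that besides the template, this command also sets geometry: even with Chinese line breaking in place, some code still ran off the page, so I added the geometry parameter. With this command line the generated PDF looks reasonably good. The font choices still seem less than ideal, and the bullets in front of unordered list items are missing, so more tuning is needed. The unordered lists come first.

Why unordered lists lose their bullets

With the command line above, the bullets of unordered lists never show up. After some searching, it turned out that the font files are to blame: the font in use has no bullet glyph, so the bullets disappear. One fix is to use newer font files. A simpler one is to add this line to the preamble of the LaTeX file to define the bullet explicitly: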

\renewcommand\labelitemi{\ensuremath{\bullet}}

In my case it goes right after the \XeTeXlinebreaklocale "zh" line.

After that, unordered lists render correctly.
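
A possible alternative, not tested in this article, is to leave the template untouched, keep both preamble lines in a small file, and pass that file with pandoc's -H (--include-in-header) option, which fills the $header-includes$ slot of the default template. The file name cjk-header.tex is an assumption.

```shell
# Collect the two preamble tweaks in a small file (name is an assumption).
cat > cjk-header.tex <<'EOF'
\XeTeXlinebreaklocale "zh"
\renewcommand\labelitemi{\ensuremath{\bullet}}
EOF

# A possible invocation, untested here:
#   pandoc -f markdown -H cjk-header.tex --latex-engine=xelatex -o aa.pdf ubuntu_latex.md
```

The font and geometry options from the earlier command line would be added in the same way as before.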

The PDF template in pandoc

The default template that pandoc uses for LaTeX output is as follows:

pandoc -D latex
#> \documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$babel-lang$,$endif$$if(papersize)$$papersize$paper,$endif$$for(classoption)$$classoption$$sep$,$endfor$]{$documentclass$}
#> $if(fontfamily)$
#> \usepackage[$for(fontfamilyoptions)$$fontfamilyoptions$$sep$,$endfor$]{$fontfamily$}
#> $else$
#> \usepackage{lmodern}
#> $endif$
#> $if(linestretch)$
#> \usepackage{setspace}
#> \setstretch{$linestretch$}
#> $endif$
#> \usepackage{amssymb,amsmath}
#> \usepackage{ifxetex,ifluatex}
#> \usepackage{fixltx2e} % provides \textsubscript
#> \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
#>   \usepackage[$if(fontenc)$$fontenc$$else$T1$endif$]{fontenc}
#>   \usepackage[utf8]{inputenc}
#> $if(euro)$
#>   \usepackage{eurosym}
#> $endif$
#> \else % if luatex or xelatex
#>   \ifxetex
#>     \usepackage{mathspec}
#>   \else
#>     \usepackage{fontspec}
#>   \fi
#>   \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
#> $if(euro)$
#>   \newcommand{\euro}{€}
#> $endif$
#> $if(mainfont)$
#>     \setmainfont[$for(mainfontoptions)$$mainfontoptions$$sep$,$endfor$]{$mainfont$}
#> $endif$
#> $if(sansfont)$
#>     \setsansfont[$for(sansfontoptions)$$sansfontoptions$$sep$,$endfor$]{$sansfont$}
#> $endif$
#> $if(monofont)$
#>     \setmonofont[Mapping=tex-ansi$if(monofontoptions)$,$for(monofontoptions)$$monofontoptions$$sep$,$endfor$$endif$]{$monofont$}
#> $endif$
#> $if(mathfont)$
#>     \setmathfont(Digits,Latin,Greek)[$for(mathfontoptions)$$mathfontoptions$$sep$,$endfor$]{$mathfont$}
#> $endif$
#> $if(CJKmainfont)$
#>     \usepackage{xeCJK}
#>     \setCJKmainfont[$for(CJKoptions)$$CJKoptions$$sep$,$endfor$]{$CJKmainfont$}
#> $endif$
#> \fi
#> % use upquote if available, for straight quotes in verbatim environments
#> \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
#> % use microtype if available
#> \IfFileExists{microtype.sty}{%
#> \usepackage{microtype}
#> \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
#> }{}
#> $if(geometry)$
#> \usepackage[$for(geometry)$$geometry$$sep$,$endfor$]{geometry}
#> $endif$
#> \usepackage{hyperref}
#> $if(colorlinks)$
#> \PassOptionsToPackage{usenames,dvipsnames}{color} % color is loaded by hyperref
#> $endif$
#> \hypersetup{unicode=true,
#> $if(title-meta)$
#>             pdftitle={$title-meta$},
#> $endif$
#> $if(author-meta)$
#>             pdfauthor={$author-meta$},
#> $endif$
#> $if(keywords)$
#>             pdfkeywords={$for(keywords)$$keywords$$sep$; $endfor$},
#> $endif$
#> $if(colorlinks)$
#>             colorlinks=true,
#>             linkcolor=$if(linkcolor)$$linkcolor$$else$Maroon$endif$,
#>             citecolor=$if(citecolor)$$citecolor$$else$Blue$endif$,
#>             urlcolor=$if(urlcolor)$$urlcolor$$else$Blue$endif$,
#> $else$
#>             pdfborder={0 0 0},
#> $endif$
#>             breaklinks=true}
#> \urlstyle{same}  % don't use monospace font for urls
#> $if(lang)$
#> \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
#>   \usepackage[shorthands=off,$for(babel-otherlangs)$$babel-otherlangs$,$endfor$main=$babel-lang$]{babel}
#> $if(babel-newcommands)$
#>   $babel-newcommands$
#> $endif$
#> \else
#>   \usepackage{polyglossia}
#>   \setmainlanguage[$polyglossia-lang.options$]{$polyglossia-lang.name$}
#> $for(polyglossia-otherlangs)$
#>   \setotherlanguage[$polyglossia-otherlangs.options$]{$polyglossia-otherlangs.name$}
#> $endfor$
#> \fi
#> $endif$
#> $if(natbib)$
#> \usepackage{natbib}
#> \bibliographystyle{$if(biblio-style)$$biblio-style$$else$plainnat$endif$}
#> $endif$
#> $if(biblatex)$
#> \usepackage$if(biblio-style)$[style=$biblio-style$]$endif${biblatex}
#> $if(biblatexoptions)$\ExecuteBibliographyOptions{$for(biblatexoptions)$$biblatexoptions$$sep$,$endfor$}$endif$
#> $for(bibliography)$
#> \addbibresource{$bibliography$}
#> $endfor$
#> $endif$
#> $if(listings)$
#> \usepackage{listings}
#> $endif$
#> $if(lhs)$
#> \lstnewenvironment{code}{\lstset{language=Haskell,basicstyle=\small\ttfamily}}{}
#> $endif$
#> $if(highlighting-macros)$
#> $highlighting-macros$
#> $endif$
#> $if(verbatim-in-note)$
#> \usepackage{fancyvrb}
#> \VerbatimFootnotes % allows verbatim text in footnotes
#> $endif$
#> $if(tables)$
#> \usepackage{longtable,booktabs}
#> $endif$
#> $if(graphics)$
#> \usepackage{graphicx,grffile}
#> \makeatletter
#> \def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
#> \def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
#> \makeatother
#> % Scale images if necessary, so that they will not overflow the page
#> % margins by default, and it is still possible to overwrite the defaults
#> % using explicit options in \includegraphics[width, height, ...]{}
#> \setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
#> $endif$
#> $if(links-as-notes)$
#> % Make links footnotes instead of hotlinks:
#> \renewcommand{\href}[2]{#2\footnote{\url{#1}}}
#> $endif$
#> $if(strikeout)$
#> \usepackage[normalem]{ulem}
#> % avoid problems with \sout in headers with hyperref:
#> \pdfstringdefDisableCommands{\renewcommand{\sout}{}}
#> $endif$
#> $if(indent)$
#> $else$
#> \IfFileExists{parskip.sty}{%
#> \usepackage{parskip}
#> }{% else
#> \setlength{\parindent}{0pt}
#> \setlength{\parskip}{6pt plus 2pt minus 1pt}
#> }
#> $endif$
#> \setlength{\emergencystretch}{3em}  % prevent overfull lines
#> \providecommand{\tightlist}{%
#>   \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
#> $if(numbersections)$
#> \setcounter{secnumdepth}{5}
#> $else$
#> \setcounter{secnumdepth}{0}
#> $endif$
#> $if(subparagraph)$
#> $else$
#> % Redefines (sub)paragraphs to behave more like sections
#> \ifx\paragraph\undefined\else
#> \let\oldparagraph\paragraph
#> \renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
#> \fi
#> \ifx\subparagraph\undefined\else
#> \let\oldsubparagraph\subparagraph
#> \renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
#> \fi
#> $endif$
#> $if(dir)$
#> \ifxetex
#>   % load bidi as late as possible as it modifies e.g. graphicx
#>   $if(latex-dir-rtl)$
#>   \usepackage[RTLdocument]{bidi}
#>   $else$
#>   \usepackage{bidi}
#>   $endif$
#> \fi
#> \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
#>   \TeXXeTstate=1
#>   \newcommand{\RL}[1]{\beginR #1\endR}
#>   \newcommand{\LR}[1]{\beginL #1\endL}
#>   \newenvironment{RTL}{\beginR}{\endR}
#>   \newenvironment{LTR}{\beginL}{\endL}
#> \fi
#> $endif$
#> $for(header-includes)$
#> $header-includes$
#> $endfor$
#> 
#> $if(title)$
#> \title{$title$$if(thanks)$\thanks{$thanks$}$endif$}
#> $endif$
#> $if(subtitle)$
#> \providecommand{\subtitle}[1]{}
#> \subtitle{$subtitle$}
#> $endif$
#> $if(author)$
#> \author{$for(author)$$author$$sep$ \and $endfor$}
#> $endif$
#> $if(institute)$
#> \institute{$for(institute)$$institute$$sep$ \and $endfor$}
#> $endif$
#> \date{$date$}
#> 
#> \begin{document}
#> $if(title)$
#> \maketitle
#> $endif$
#> $if(abstract)$
#> \begin{abstract}
#> $abstract$
#> \end{abstract}
#> $endif$
#> 
#> $for(include-before)$
#> $include-before$
#> 
#> $endfor$
#> $if(toc)$
#> {
#> $if(colorlinks)$
#> \hypersetup{linkcolor=$if(toccolor)$$toccolor$$else$black$endif$}
#> $endif$
#> \setcounter{tocdepth}{$toc-depth$}
#> \tableofcontents
#> }
#> $endif$
#> $if(lot)$
#> \listoftables
#> $endif$
#> $if(lof)$
#> \listoffigures
#> $endif$
#> $body$
#> 
#> $if(natbib)$
#> $if(bibliography)$
#> $if(biblio-title)$
#> $if(book-class)$
#> \renewcommand\bibname{$biblio-title$}
#> $else$
#> \renewcommand\refname{$biblio-title$}
#> $endif$
#> $endif$
#> \bibliography{$for(bibliography)$$bibliography$$sep$,$endfor$}
#> 
#> $endif$
#> $endif$
#> $if(biblatex)$
#> \printbibliography$if(biblio-title)$[title=$biblio-title$]$endif$
#> 
#> $endif$
#> $for(include-after)$
#> $include-after$
#> 
#> $endfor$
#> \end{document}

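More on fonts

We saw several font settings earlier, such as setmainfont, setsansfont and setmonofont. mainfont is the roman (serif) family for body text, sansfont the sans-serif family, and monofont the monospaced family. These settings come from the fontspec package. There is also a package called xeCJK, whose documentation you can read with texdoc xecjk. It is maintained by Chinese developers and its documentation is written in Chinese, including instructions on how to find and use Chinese fonts.

Western typefaces fall into two broad classes, serif and sans serif. Serif means that strokes carry small decorative finishing marks at their ends, while sans serif typefaces have none. By analogy, serif corresponds roughly to KaiTi in Chinese and sans serif to HeiTi. The difference is shown below.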