Generating PDFs using pandoc in GitHub Actions

Introduction

Today, I wanted to fix a bug in my TOML-to-YAML conversion script and try using GitHub Actions to generate PDFs for my website.

Fixing the bug in the TOML-to-YAML script

My previous code for converting TOML front matter to YAML front matter simply processed the top 20 lines of each file. To be honest, that wasn’t optimal at all, and it caused bugs. For example, some of my code snippets contained =, which was converted to :, and that is not acceptable. So I first reverted the parts that my code had affected, then changed the code as follows:

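from pathlib import Path


def main():
    content_dir = Path("site/content")

    for path in content_dir.glob("**/*.md"):
        print(path)

        # Number of +++ delimiters seen so far:
        # 0 = before the front matter, 1 = inside it, 2 = past it.
        state = 0

        lines = path.read_text().split("\n")
        for i, line in enumerate(lines):
            if state == 1:
                # Inside the front matter: rewrite only the first " = ",
                # so a value that itself contains " = " is left alone.
                line = line.replace(" = ", ": ", 1)

            # Treat a line as a delimiter only when it is exactly +++.
            if line.strip() == "+++":
                line = "---"
                state += 1

            lines[i] = line

            if state == 2:
                # Closing delimiter reached; the body stays untouched.
                break

        path.write_text("\n".join(lines))


if __name__ == '__main__':
    main()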

The code now keeps a variable called state, which increments by 1 every time a +++ line is seen. Lines are rewritten only while state is 1, so only the part enclosed by the two +++ delimiters changes. After that, I ran the code again and fixed the parts that had been affected.
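
The state-counting idea can also be sketched as a function over a string, which makes it easy to check that code snippets in the body are left alone. The function name and the sample text are only illustrative. This sketch treats a line as a delimiter only when it is exactly +++ and rewrites only the first " = " on each line; both are assumptions about how the front matter is laid out.

```python
def toml_to_yaml_front_matter(text: str) -> str:
    """Turn a leading +++ TOML block into a --- YAML block; leave the body as-is."""
    lines = text.split("\n")
    state = 0  # number of +++ delimiters seen so far

    for i, line in enumerate(lines):
        if state == 1:
            # Inside the front matter: rewrite the key/value separator once.
            line = line.replace(" = ", ": ", 1)

        if line.strip() == "+++":
            line = "---"
            state += 1

        lines[i] = line

        if state == 2:
            break  # everything after the closing delimiter is untouched

    return "\n".join(lines)


# Hypothetical sample: one front matter key and one "code" line in the body.
sample = '+++\ntitle = "Post"\n+++\n\nx = 1\n'
print(toml_to_yaml_front_matter(sample))
```

Here the title line becomes title: "Post", while x = 1 in the body keeps its equals sign because the loop stops at the second delimiter.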

Test GitHub workflow with act

Now that I have a way to generate PDFs, I thought it would be a good idea to write a GitHub workflow for it, and I wanted to test that workflow in my local environment first. After some research, I found a tool called act that can run GitHub workflows locally. I installed it with the command below:

brew install act

To test all the workflows, I can use this command:

act

If I want to reuse the container that has already been built, instead of installing all the packages again, I can use the command below:

act --reuse

Also, for a specific workflow, we can use the -W option, like below:

act -W .github/workflows/workflow.yml --reuse
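
If these invocations are going to be scripted, the options above can be assembled in one place. This is only a small sketch: the function name is hypothetical, it uses just the -W and --reuse options shown above, and it builds the command without running it.

```python
import shlex


def act_command(workflow=None, reuse=True):
    """Build the argument list for an act run (nothing is executed here)."""
    args = ["act"]
    if workflow is not None:
        args += ["-W", workflow]  # limit the run to one workflow file
    if reuse:
        args.append("--reuse")  # keep the container between runs
    return args


print(shlex.join(act_command(".github/workflows/workflow.yml")))
```

The resulting list can be passed to subprocess.run when the command should actually be executed.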

Write a separate GitHub Workflow

At first, I wrote a separate workflow file to generate the PDFs. My idea was to generate them directly into the public part of my site, so that this workflow and the one that builds the site could run separately. The workflow looked like this:

name: Build PDFs with Pandoc

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install Pandoc
        run: |
          sudo apt-get update
          sudo apt-get install -y \
          pandoc \
          texlive-latex-base \
          texlive-fonts-recommended \
          texlive-fonts-extra \
          texlive-latex-extra \
          texlive-xetex \
          librsvg2-bin
          sudo apt-get install -y imagemagick

      - name: Convert each md to pdf
        run: |
          bash convert_to_pdf_pandoc/convert_each_md_to_pdf.sh

First, the workflow installs everything needed for pandoc and LaTeX. Then it runs the script (convert_each_md_to_pdf.sh), which I modified to work with GitHub Actions. Here is the content of convert_each_md_to_pdf.sh:

#!/bin/bash


current_dir=$(dirname "$(realpath "$0")")

dir_path=$(dirname "${current_dir}")/site/content

header_path="$current_dir"/header.tex
date_format_path="$current_dir"/date-format.lua


find "$dir_path" -name "index.md" -print0 | while IFS= read -r -d $'\0' file; do

    should_delete=("pandoc.md")

    parent_file=$(dirname "$file")

    destination_path=$(echo "$parent_file" | sed "s:/site/content/:/site/public/pdf/:g" )
    if [ ! -d "$destination_path" ]; then
      mkdir -p "$destination_path"
    fi

    echo "$parent_file"
    cd "$parent_file" || exit

    shopt -s nullglob

    for img in *.webp; do
        convert "$img" "${img%.webp}.png"
        should_delete+=("${img%.webp}.png")
    done

    for img in *.gif; do
        convert "${img}[0]" "${img%.gif}.png"
        should_delete+=("${img%.gif}.png")
    done

    shopt -u nullglob

    sed "s:\.webp:\.png:g" index.md > pandoc.md
    sed -i "s:\.gif:\.png:g" pandoc.md

    pandoc \
        pandoc.md \
        -o "$destination_path"/output.pdf \
        --pdf-engine=xelatex \
        --metadata author="Ramin Zarebidoky (LiterallyTheOne)" \
        --include-in-header="$header_path" \
        --highlight-style=tango \
        --lua-filter "$date_format_path"

    for x in "${should_delete[@]}"; do
      rm "$x"
    done

done

I made some changes to this file. The most important ones are:

  • I used "$0" instead of .
    • so that the script can be run from any directory
  • I changed magick to convert
    • because the latest ImageMagick version that I could install in Docker was 6.9
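
The path and file-name rewriting that the script does with sed can be sketched in Python to make the mapping explicit. The function names and the example paths are hypothetical; the sketch only mirrors the string replacements in the script and does not touch the file system.

```python
from pathlib import PurePosixPath


def destination_for(index_md: str) -> PurePosixPath:
    """Map .../site/content/<post>/index.md to .../site/public/pdf/<post>."""
    parent = PurePosixPath(index_md).parent
    # Same substitution as the sed call on the parent directory.
    return PurePosixPath(str(parent).replace("/site/content/", "/site/public/pdf/"))


def use_png_images(markdown: str) -> str:
    """Point .webp and .gif references at the converted .png files."""
    # Like the sed calls, this replaces every occurrence, including in prose.
    return markdown.replace(".webp", ".png").replace(".gif", ".png")


print(destination_for("/repo/site/content/blog/post/index.md"))
print(use_png_images("![a](a.webp) ![b](b.gif)"))
```

One thing the sketch makes visible is that the extension replacement is purely textual, so a post that mentions .webp in its prose would also be rewritten in the temporary pandoc.md.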

Unfortunately, my idea that these two could work separately didn’t hold up. I first changed destination_path to generate the PDFs in static instead of public. Then I observed that when the public folder is uploaded for GitHub Pages, the PDFs can’t be found. On top of that, installing LaTeX in the Ubuntu container takes too long.

Make a container for pandoc

To avoid installing LaTeX on every run, I thought it would be a good idea to pre-build a Docker image and upload it, so the workflow only pulls that image instead of downloading and installing LaTeX each time. After some research, I came up with the Dockerfile below:

FROM ubuntu:latest

RUN apt-get update && \
    apt-get install -y \
        pandoc \
        texlive-latex-base \
        texlive-fonts-recommended \
        texlive-fonts-extra \
        texlive-latex-extra \
        texlive-xetex \
        librsvg2-bin \
        imagemagick && \
    rm -rf /var/lib/apt/lists/*

CMD ["tail", "-f", "/dev/null"]

This Dockerfile installs everything I need. Now I only need to build the image and upload it somewhere I can use it. GHCR (GitHub Container Registry) looked like the best place for that, so I built my image with the command below:

docker build -t ghcr.io/literallytheone/pandoc-builder:latest .

Then I logged in and pushed my Docker image using the commands below:

echo "$GITHUB_TOKEN" | docker login ghcr.io -u literallytheone --password-stdin
docker push ghcr.io/literallytheone/pandoc-builder:latest

Now I need to change my workflow to use this container. I can do this as follows:

runs-on: ubuntu-latest
container:
  image: ghcr.io/literallytheone/pandoc-builder:latest

The problem right now is that GitHub Actions can pull the image, but reports that the container is not running correctly. I don’t have this problem when running it with act on my MacBook, so I will debug it tomorrow.

Final thoughts

GitHub Actions is a super useful tool for CI/CD. With act, we can test a workflow locally before running it on GitHub Actions, which catches many bugs and saves a lot of time. Installing so many packages in the ubuntu:latest container was taking too long, so I decided to use a pre-built container. That led to some issues in GitHub Actions, which I’m going to resolve as soon as possible.

© 2025 LiterallyTheOne