Generating PDFs using Pandoc in GitHub Actions
Introduction
Today, I wanted to fix the bug in my script that converts TOML frontmatter to YAML, and
try using GitHub Actions to generate PDFs for my website.
Fixing the bug in the TOML-to-YAML conversion
In my previous code for converting TOML frontmatter to YAML frontmatter,
I only processed the first 20 lines of each file.
To be honest, that wasn't a good approach at all, and it caused some bugs.
For example, some of my code snippets contained =, which was converted to :, and that is not acceptable at all.
So, first, I reverted the files that were affected by my code.
Then I changed the code as follows:
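from pathlib import Path
def main():
    p = Path("site/content")
    for path in p.glob("**/*.md"):
        print(path)
        # state counts the "+++" delimiters seen so far:
        # 0 = before the frontmatter, 1 = inside it, 2 = after it
        state = 0
        with open(path, encoding="utf-8") as f:
            content = f.read()
        lines = content.split("\n")
        for i in range(len(lines)):
            # Only rewrite "key = value" pairs inside the TOML frontmatter
            if state == 1:
                lines[i] = lines[i].replace(" = ", ": ")
            # A delimiter line (exactly "+++"): switch it to YAML's "---".
            # Matching the whole line avoids touching e.g. "+++ b/file" in diff snippets.
            if lines[i].strip() == "+++":
                lines[i] = "---"
                state += 1
            # Stop after the closing delimiter so the body is never modified
            if state == 2:
                break
        result = "\n".join(lines)
        with open(path, "w", encoding="utf-8") as f:
            f.write(result)
if __name__ == '__main__':
    main()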
Now, I have a variable called state.
Every time I see a +++ delimiter, state is incremented by 1.
This way, only the lines between the two +++ delimiters (the frontmatter) are changed, and the loop stops right after the closing one.
After that, I ran my code again and fixed the parts that were affected.
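To check the delimiter logic without touching real files, the same state idea can be pulled into a pure function that works on a string. This is a minimal sketch, not the code I actually ran: the name convert_frontmatter is hypothetical, it assumes TOML frontmatter always starts on the first line of the file, and it only rewrites the first " = " on each frontmatter line so values such as title = "a = b" keep their inner =.

```python
def convert_frontmatter(text: str) -> str:
    """Convert a TOML (+++) frontmatter block to YAML (---), leaving the body alone.

    Hypothetical helper: same state-counter idea as main(), but as a pure function.
    """
    lines = text.split("\n")
    # Assumption: only files whose first line is "+++" have TOML frontmatter
    if lines[0].strip() != "+++":
        return text
    state = 0  # 0 = before the frontmatter, 1 = inside it, 2 = after it
    for i, line in enumerate(lines):
        if state == 1 and line.strip() != "+++":
            # Naive rewrite: only the first "key = value" separator is changed
            lines[i] = line.replace(" = ", ": ", 1)
        if line.strip() == "+++":
            lines[i] = "---"
            state += 1
            if state == 2:
                break  # closing delimiter reached; the body stays untouched
    return "\n".join(lines)
```

With this helper, the per-file loop in main() would only need to read the file, call convert_frontmatter, and write the result back.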
Testing GitHub workflows with act
Now that I have a way to generate PDFs, I thought it would be a good idea to write a GitHub workflow
for it.
I also wanted to test my GitHub workflows in my local environment.
After some research, I found a tool called act that we can use to run
our GitHub workflows locally.
So, I installed it with Homebrew:
brew install act
To run all the workflows, I can use this command:
act
If I want to reuse the container from a previous run instead of installing all the packages again, I can add the --reuse option:
act --reuse
To run a specific workflow, we can use the -W option, like below:
act -W .github/workflows/workflow.yml --reuse
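To run every workflow on its own, one option is a small Python helper that builds one act -W <file> --reuse command per workflow file. This is a sketch under assumptions: act_commands and run_all are hypothetical names, workflows are assumed to live in .github/workflows with a .yml or .yaml extension, and act must be on the PATH for run_all to work.

```python
import subprocess
from pathlib import Path


def act_commands(workflow_dir: str = ".github/workflows") -> list[list[str]]:
    """Build one `act -W <file> --reuse` command per workflow file (hypothetical helper)."""
    directory = Path(workflow_dir)
    files = sorted([*directory.glob("*.yml"), *directory.glob("*.yaml")])
    return [["act", "-W", str(path), "--reuse"] for path in files]


def run_all(workflow_dir: str = ".github/workflows") -> None:
    """Run each workflow with act, one at a time, stopping at the first failure."""
    for command in act_commands(workflow_dir):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling run_all() from the repository root then runs the workflows one by one, which makes it easier to see which one fails.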
Writing a separate GitHub workflow
At first, I wrote a separate workflow file to generate the PDFs. My idea was to generate them into the public part of my site, so the two workflows could run separately. The workflow looked like this:
name: Build PDFs with Pandoc
on:
  push:
    branches:
      - main
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Install Pandoc
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            pandoc \
            texlive-latex-base \
            texlive-fonts-recommended \
            texlive-fonts-extra \
            texlive-latex-extra \
            texlive-xetex \
            librsvg2-bin
          sudo apt-get install -y imagemagick
      - name: Convert each md to pdf
        run: |
          bash convert_to_pdf_pandoc/convert_each_md_to_pdf.sh
First, the workflow installs everything needed for Pandoc and LaTeX.
After that, it runs the script (convert_each_md_to_pdf.sh), which I modified to work with GitHub Actions.
Here is the content of convert_each_md_to_pdf.sh:
#!/bin/bash
# Resolve paths relative to this script, so it can be run from any directory
current_dir=$(dirname "$(realpath "$0")")
dir_path=$(dirname "${current_dir}")/site/content
header_path="$current_dir"/header.tex
date_format_path="$current_dir"/date-format.lua
# Visit every index.md under site/content (null-separated to handle unusual file names)
find "$dir_path" -name "index.md" -print0 | while IFS= read -r -d $'\0' file; do
    # Temporary files to clean up after this page is converted
    should_delete=("pandoc.md")
    parent_file=$(dirname "$file")
    # Mirror site/content/<page> to site/public/pdf/<page>
    destination_path=$(echo "$parent_file" | sed "s:/site/content/:/site/public/pdf/:g" )
    if [ ! -d "$destination_path" ]; then
        mkdir -p "$destination_path"
    fi
    echo "$parent_file"
    cd "$parent_file" || exit
    # Convert WebP and GIF images (first frame only) to PNG so the PDF engine can include them
    shopt -s nullglob
    for img in *.webp; do
        convert "$img" "${img%.webp}.png"
        should_delete+=("${img%.webp}.png")
    done
    for img in *.gif; do
        convert "${img}[0]" "${img%.gif}.png"
        should_delete+=("${img%.gif}.png")
    done
    shopt -u nullglob
    # Point image references at the PNG copies in a temporary markdown file
    sed "s:\.webp:\.png:g" index.md > pandoc.md
    sed -i "s:\.gif:\.png:g" pandoc.md
    pandoc \
        pandoc.md \
        -o "$destination_path"/output.pdf \
        --pdf-engine=xelatex \
        --metadata author="Ramin Zarebidoky (LiterallyTheOne)" \
        --include-in-header="$header_path" \
        --highlight-style=tango \
        --lua-filter "$date_format_path"
    # Remove the temporary markdown and PNG files
    for x in "${should_delete[@]}"; do
        rm "$x"
    done
done
I made some changes to this file. The most important ones are:
- I used "$0" (through realpath) instead of . as the base for relative paths, so the script can be run from any path.
- I changed magick to convert, because the latest ImageMagick version I could install on Docker was 6.9, and the magick command only exists in ImageMagick 7.
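The core of the script (mapping each content folder to its PDF folder, switching image references to PNG, and assembling the pandoc call) can also be written as a few pure Python functions, which makes the path logic easy to check. This is a sketch, not a replacement for the Bash script: all function names are hypothetical, and it only builds the pandoc argument list with the same flags as the script instead of running pandoc or ImageMagick.

```python
from pathlib import Path

AUTHOR = "Ramin Zarebidoky (LiterallyTheOne)"


def destination_for(parent: Path) -> Path:
    """Mirror the sed call: replace /site/content/ with /site/public/pdf/ in the path."""
    return Path(str(parent).replace("/site/content/", "/site/public/pdf/"))


def png_name(image: str) -> str:
    """Name of the PNG that replaces a .webp or .gif image, like ${img%.webp}.png."""
    for ext in (".webp", ".gif"):
        if image.endswith(ext):
            return image[: -len(ext)] + ".png"
    return image


def rewrite_image_refs(markdown: str) -> str:
    """Mirror the two sed calls that point .webp/.gif references at .png files."""
    return markdown.replace(".webp", ".png").replace(".gif", ".png")


def pandoc_command(source: Path, destination: Path, script_dir: Path) -> list[str]:
    """Build the same pandoc invocation as the script, as an argument list."""
    return [
        "pandoc",
        str(source),
        "-o", str(destination / "output.pdf"),
        "--pdf-engine=xelatex",
        "--metadata", f"author={AUTHOR}",
        f"--include-in-header={script_dir / 'header.tex'}",
        "--highlight-style=tango",
        "--lua-filter", str(script_dir / "date-format.lua"),
    ]
```

In a real script, script_dir would come from Path(__file__).resolve().parent, the Python counterpart of dirname "$(realpath "$0")", and the list could be passed to subprocess.run.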
Unfortunately, my idea of running these two workflows separately didn't work.
So, first, I changed destination_path to generate the PDFs in the static folder instead of public.
Then I noticed that when the public folder was uploaded to GitHub Pages, the PDFs couldn't be found.
Also, installing LaTeX on Ubuntu in every run takes too long.
Making a container for Pandoc
To avoid installing LaTeX every time, I thought it would be a good idea to
pre-build a Docker image and upload it.
That way, the workflow only pulls that image instead of downloading and installing LaTeX on every run.
After some research, I came up with the Dockerfile below:
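FROM ubuntu:latest
# Avoid interactive prompts from apt during the build
ARG DEBIAN_FRONTEND=noninteractive
# Install Pandoc, the LaTeX packages used by XeLaTeX, and the image tools,
# then remove the apt lists to keep the image smaller
RUN apt-get update && \
    apt-get install -y \
        pandoc \
        texlive-latex-base \
        texlive-fonts-recommended \
        texlive-fonts-extra \
        texlive-latex-extra \
        texlive-xetex \
        librsvg2-bin \
        imagemagick && \
    rm -rf /var/lib/apt/lists/*
# Keep the container alive when it is started without a command
CMD ["tail", "-f", "/dev/null"]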
This Dockerfile installs everything I need.
Now, I only need to build it and upload it somewhere I can pull it from.
I found that GHCR (GitHub Container Registry) was the best place for that.
So, I built my image with the command below:
docker build -t ghcr.io/literallytheone/pandoc-builder:latest .
Then I logged in to GHCR with a token stored in GITHUB_TOKEN and pushed my Docker image:
echo "$GITHUB_TOKEN" | docker login ghcr.io -u literallytheone --password-stdin
docker push ghcr.io/literallytheone/pandoc-builder:latest
Now, I need to change my workflow to use this container, which I can do as follows:
runs-on: ubuntu-latest
container:
  image: ghcr.io/literallytheone/pandoc-builder:latest
The problem I have right now is that GitHub Actions can pull the image, but
it reports that the container is not running correctly.
I don't have this problem when running it with act on my MacBook.
So, I will debug it tomorrow.
Final thoughts
GitHub Actions is a super useful tool for CI/CD.
With act, we can test our workflows locally before running them on GitHub Actions, which catches many bugs early
and saves a lot of time.
I was installing many packages in the ubuntu:latest container, which took a long time.
So, I decided to use a pre-built container instead, and I ran into a bug on GitHub Actions that I'm going to resolve
as soon as possible.
