Knowledge Base

Preserving for the future: Shell scripts, AoC, and more

Migration plan from Wordpress to nikola

Overview

In August 2021, I started implementing a new website powered by the nikola static site generator to replace my wordpress site, which I have run for over 5 years. My new site is hosted at https://bgstack15.ddns.net/blog/.

Architecture

Systems involved:

  • server1a: nikola site generator logic
  • doc7-01a: isso comment server and serves the output of nikola

Server1a runs the nikola logic and then sends the contents up to doc7-01a, which is the droplet. I exported the wordpress contents; nikola then imported the posts, and isso imported the comments.

Building the Blog

Prepare server1

Create user blog and initialize the nikola site generator.

sudo useradd blog -s /bin/bash
sudo usermod -a -G nginx blog
sudo su blog
cd
python3 -m venv nikola
cd nikola
source bin/activate
bin/python -m pip install -U pip setuptools wheel
bin/python -m pip install -U "Nikola[extras]"
bin/python -m pip install -U "html2text" # important for wordpress html-to-markdown

Modify ~blog/.bashrc with these contents at the end:

source ~/nikola/bin/activate
cd /mnt/public/Support/Programs/nikola/kb2
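
If you script this step, it helps to make the append idempotent so that running it twice does not duplicate the lines. This is only a minimal sketch: the helper name append_once is hypothetical, and the demonstration writes to a temporary file rather than the real ~blog/.bashrc.

```shell
#!/bin/sh
# append_once LINE FILE: add LINE to the end of FILE only if it is not already present.
# (hypothetical helper, not part of the original setup)
append_once() {
   grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demonstration against a temporary file instead of the real ~blog/.bashrc.
demo_rc="$( mktemp )"
append_once 'source ~/nikola/bin/activate' "${demo_rc}"
append_once 'cd /mnt/public/Support/Programs/nikola/kb2' "${demo_rc}"
append_once 'source ~/nikola/bin/activate' "${demo_rc}"   # repeated call changes nothing
rc_lines="$( grep -c '' "${demo_rc}" )"
cat "${demo_rc}"
rm -f "${demo_rc}"
```

On the real system the second argument would be ~blog/.bashrc.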

Prepare an ssh key for connecting to the service account blog on the web server.

ssh-keygen

These installation directions are taken directly from the Getting started page of the Nikola handbook.

Export Wordpress contents

To populate the blog with my existing content, we need to extract it from Wordpress.

Export everything except media library

In Wordpress wp-admin, visit Tools -> Export (https://bgstack15.wordpress.com/wp-admin/export.php). Select "All content" and then "Download Export file". It emails a link, and the output file is /mnt/bgstack15/Backups/Blog/bgstack15.wordpress.com-2021-08-29-18_22_23-yx1jpksw0pvvlunqoxnly4k3zsy7mtly.zip.

Export media library

For some reason it is incredibly difficult in Wordpress to export the media library. The page is https://wordpress.com/export/bgstack15.wordpress.com. Select Export media library -> Download. The file is saved as /mnt/bgstack15/Backups/Blog/media-export-77328224-from-0-to-2775_2021-08-26.tar.gz.

Import wordpress contents to nikola

Make a directory on server1 that will store all the source files for the blog.

mkdir /mnt/public/Support/Programs/nikola/kb2
chmod o=rwX /mnt/public/Support/Programs/nikola/kb2

Extract the xml file for import.

7za x bgstack15.wordpress.com-2021-08-29-18_22_23-yx1jpksw0pvvlunqoxnly4k3zsy7mtly.zip
mv bgstack15.wordpress.com-2021-08-29-18_22_19/*xml .
rmdir bgstack15.wordpress.com-2021-08-29-18_22_19/

Switch to user blog and then import the file.

sudo su blog
# the user profile should already run `source ~/nikola/bin/activate`
# the user profile should already cd /mnt/public/Support/Programs/nikola/
time nikola import_wordpress -o kb2 --squash-newlines --html2text --export-comments --one-file knowledgebase.wordpress.2021-08-29.001.xml

Manually modify conf.py, which is stored in /mnt/public/Support/Programs/nikola/initial/conf.py.

One of the changes in the file is the theme. I have customized the bootblog-jinja theme and made it my own, knowledgebase. Copy in /mnt/public/Support/Programs/nikola/initial/themes to the kb2/ path.

cp -pr /mnt/public/Support/Programs/nikola/initial/themes /mnt/public/Support/Programs/nikola/kb2/

Prepare the output directory.

OUTDIR=/mnt/public/Support/Programs/nikola/blog/
mkdir -p "${OUTDIR}" ; sudo chown blog:admins "${OUTDIR}" ; chmod 0775 "${OUTDIR}"

Tweak the markdown contents to fix some broken links (due to newline characters in long links).

Update local links within post contents. Note how /blog was not needed here! Nikola interprets the relative links as relative to the top-level dir in conf.py.

time sed -i -r -e 's@https://bgstack15.wordpress.com/@/posts/@g;' $( grep --include '*md' -l -riIE 'https://bgstack15.wordpress.com/' posts )

Fix references to a custom wordpress domain for the files.

sed -i -r -e 's@https://bgstack15.files.wordpress.com/@/@g;' $( grep -l --exclude-dir 'cache' -riIE 'https://bgstack15\.files\.wordpress\.com' posts )
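
To see what these two substitutions do before running them with -i across the whole posts directory, you can pipe a sample line through the same expressions. This is only an illustrative sketch: the sample text is made up, and GNU sed is assumed for the -r flag.

```shell
#!/bin/sh
# Demonstration only: run the two URL rewrites on a throwaway sample line.
# The sample text is hypothetical and not taken from the real blog.
sample='See [my post](https://bgstack15.wordpress.com/2020/01/02/example-post/) and ![img](https://bgstack15.files.wordpress.com/2020/01/shot.png)'

# First expression: post links become /posts/...
# Second expression: media links become site-relative paths.
rewritten="$( printf '%s\n' "${sample}" | sed -r \
   -e 's@https://bgstack15\.wordpress\.com/@/posts/@g;' \
   -e 's@https://bgstack15\.files\.wordpress\.com/@/@g;' )"

printf '%s\n' "${rewritten}"
```

The post link comes out as /posts/2020/01/02/example-post/ and the media link as /2020/01/shot.png. The first pattern does not touch the files.wordpress.com address, so the order of the two commands does not matter.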

Fix links that the html2text step of the nikola wordpress import broke across multiple lines.

time sed -i -r -e ':a;/\]\([^\)]+-$/{N;s/\n//;ba;}' $( grep --include '*md' -l -riIE '\]\([^\)]+-$' posts )

Fix more broken links.

sed -i -r -e ':a;/]\(/{/-$/{N;s/\n//;ba;}}' $( grep -l --exclude-dir 'cache' -riIE ']\(.*-$' posts )
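
The line-joining technique is easier to follow on a small sample. The sketch below assumes GNU sed and uses a made-up link on example.com. It defines label a, and while the line ends in a hyphen inside an unclosed link target it appends the next line (N), deletes the embedded newline, and branches back to a (the ba command) to check again.

```shell
#!/bin/sh
# Demonstration only: a hypothetical link that was wrapped after hyphens.
workdir="$( mktemp -d )"
printf '%s\n' \
   'See [the docs](https://example.com/a-' \
   'very-' \
   'long-path) for details.' \
   'An unrelated line.' > "${workdir}/sample.md"

# :a     define label a
# N      append the next input line to the pattern space
# s/\n// remove the newline that N introduced
# ba     branch back to label a and test the joined line again
joined="$( sed -r -e ':a;/\]\([^)]+-$/{N;s/\n//;ba;}' "${workdir}/sample.md" )"
printf '%s\n' "${joined}"
rm -rf "${workdir}"
```

The three wrapped lines come out as one line containing the full link, and the unrelated line is left alone.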

Run the initial buildout.

sudo su blog # which should already source the venv, and cd to kb2/
nikola build

The contents are ready to be deployed, but the destination web server needs configuration.

Prepare doc7-01a

On the web server doc7-01a, establish a user, blog.

Set up the account and isso. We have to use a python virtual environment for isso because that is the only way to make isso work on CentOS 7; the native python seems to not operate correctly with isso.

sudo useradd -s /bin/bash blog
sudo su blog
# as user blog:
python3 -m venv isso
cd isso
source bin/activate
bin/python -m pip install -U isso

Load in the public ssh key for service account blog from server1.
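
One way to load the key is to append it to authorized_keys with the permissions sshd expects. This is a sketch, not the exact commands I ran: the function name install_pubkey, the sample key text, and the temporary directory are all hypothetical stand-ins.

```shell
#!/bin/sh
# install_pubkey KEYFILE HOMEDIR (hypothetical helper)
# Append a public key to HOMEDIR/.ssh/authorized_keys unless it is already there.
install_pubkey() {
   keyfile="$1" ; homedir="$2"
   mkdir -p "${homedir}/.ssh"
   chmod 0700 "${homedir}/.ssh"
   touch "${homedir}/.ssh/authorized_keys"
   chmod 0600 "${homedir}/.ssh/authorized_keys"
   # -x whole line, -F fixed string, -f read the pattern from the key file
   grep -qxFf "${keyfile}" "${homedir}/.ssh/authorized_keys" || \
      cat "${keyfile}" >> "${homedir}/.ssh/authorized_keys"
}

# Demonstration in a temporary directory with a fake key.
demo_home="$( mktemp -d )"
printf '%s\n' 'ssh-ed25519 AAAAexamplekeydata blog@server1' > "${demo_home}/blog.pub"
install_pubkey "${demo_home}/blog.pub" "${demo_home}"
install_pubkey "${demo_home}/blog.pub" "${demo_home}"   # second call is a no-op
key_count="$( grep -c 'blog@server1' "${demo_home}/.ssh/authorized_keys" )"
printf 'keys installed: %s\n' "${key_count}"
rm -rf "${demo_home}"
```

On the web server you would run the function as user blog, passing the copied public key file and "${HOME}".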

Fix some localization within isso files:

# still as user blog:
cd ~/isso
sed -i -r -e 's/One Comment/1 Comment/g;' $( grep -l -riE 'One Comment' )

Establish the isso config file /home/blog/isso.kb2.conf:

[general]
dbpath = /home/blog/isso.kb2.db
host = https://bgstack15.ddns.net/blog/
[server]
listen = http://127.0.0.1:8080/
public-endpoint = https://bgstack15.ddns.net/isso/

Establish file /etc/systemd/system/isso.service from reference 7.

[Unit]
Description=Isso Comment Server
After=network.service
Before=network.target
# Ref: /mnt/public/Support/Platforms/vps/vps.action

[Service]
Type=simple
User=blog
WorkingDirectory=/home/blog
ExecStart=/home/blog/isso/bin/isso -c /home/blog/isso.kb2.conf
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Import wordpress comments to isso

Copy the wordpress .xml file up to the web server (not shown here). Import the wordpress comments to the isso database.

sudo su blog -s /bin/bash
~/isso/bin/isso -c /home/blog/isso.kb2.conf import /tmp/knowledgebase.wordpress.2021-08-29.001.xml

My initial import had 137 threads and 281 comments.

Modify the threads' URLs to work within the new blog address. I wrote a custom python script for this operation, scripts/fix-isso-comments.py. Run the following commands as user blog.

Fix comments for the migration from wordpress.com to the nikola site at /blog/.

/usr/local/bin/fix-isso-comments.py -a -v --dbfile isso.kb2.db -m "" -N "/blog/posts"

Fix pingback website entries.

/usr/local/bin/fix-isso-comments.py -a -v --dbfile isso.kb2.db -m "https://bgstack15.wordpress.com/" -N "/blog/posts/" --action pingbacks

Isso is ready to run, with corrected content, so test it to ensure correctness.

sudo systemctl daemon-reload
sudo systemctl start isso

Ensure that port 8080 is listening, and that you can pull isso contents. Isso only responds to hostname requests that match what is configured, so you have to use the --resolve parameter so curl sends that hostname.

sudo netstat -tplnu | grep 8080
curl -L --resolve bgstack15.ddns.net:8080:127.0.0.1 http://bgstack15.ddns.net/isso/js/embed.min.js

Modify nginx

Nginx is used on doc7-01a primarily for this blog, but it does have other uses. This post might not contain all the nginx settings in use. Nginx was modified by certbot to use https; that change is not described in detail here. The CentOS 7 default nginx.conf includes all files from glob default.d/*.conf.

Establish file /etc/nginx/default.d/isso.conf:

location / {
   root /var/www;
}

location /isso {
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_set_header X-Script-Name /isso;
   proxy_set_header Host $host;
   proxy_set_header X-Forwarded-Proto $scheme;
   proxy_pass http://127.0.0.1:8080;
}

The path /var/www/blog will contain the contents of the blog directory from server1.

Additional assets deployed at initialization time.

Files on server1a:

Files on doc7-01a:

The generate-tag-cloud.sh script produces the tagcloud.html and .css files used in the sidebar of every page. The deploy-part2 script calls this one.

Operations

These tasks are expected to be run in the future and/or on a recurring basis.

Writing a new post

Nikola has its own command for making a new post, where it builds a mostly empty file with a few metadata fields.

nikola new_post

Or you can manually make a new markdown file in the kb2/posts/YYYY/MM/DD directory. Just inspect any extant markdown file for the metadata fields to use.
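
A manual post can be scaffolded with a few lines of shell. The sketch below is hedged: the function name new_post_skeleton is hypothetical, and the metadata fields (title, slug, date, tags, type inside an HTML comment) are my assumption of Nikola's default comment-style metadata for markdown. Compare against an existing post before relying on them.

```shell
#!/bin/sh
# new_post_skeleton POSTS_DIR TITLE SLUG (hypothetical helper)
# Create POSTS_DIR/YYYY/MM/DD/SLUG.md with a metadata header and print its path.
new_post_skeleton() {
   posts_dir="$1" ; title="$2" ; slug="$3"
   day_dir="${posts_dir}/$( date -u '+%Y/%m/%d' )"
   post_file="${day_dir}/${slug}.md"
   mkdir -p "${day_dir}"
   {
      # Assumed field names; verify against an extant markdown post.
      printf '%s\n' '<!--'
      printf '.. title: %s\n' "${title}"
      printf '.. slug: %s\n' "${slug}"
      printf '.. date: %s\n' "$( date -u '+%Y-%m-%d %H:%M:%S UTC' )"
      printf '%s\n' '.. tags: ' '.. type: text' '-->' '' 'Write your post here.'
   } > "${post_file}"
   printf '%s\n' "${post_file}"
}

# Demonstration in a temporary directory instead of the real kb2/posts.
demo_site="$( mktemp -d )"
created="$( new_post_skeleton "${demo_site}/posts" 'Example post' 'example-post' )"
cat "${created}"
```

For the real site, the first argument would be the kb2/posts directory.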

Adding media

Place files underneath directory kb2/blog/files/, normally in a YYYY/MM/DD directory that matches the post that first uses the image.

Refer to these images with this markdown snippet, but html can also be used directly.

[![](/2021/08/30/bgstack15_logo.png)](/2021/08/30/bgstack15_logo.png)
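
Copying the file and writing the snippet can be combined in a small helper. This is only a sketch with hypothetical function names (media_snippet, add_media), demonstrated against a temporary directory and an empty stand-in file rather than the real files directory.

```shell
#!/bin/sh
# media_snippet URL_PATH: print the markdown for an image that links to itself.
media_snippet() {
   printf '[![](%s)](%s)\n' "$1" "$1"
}

# add_media FILES_DIR SRC DATE_PATH (hypothetical helper)
# Copy SRC into FILES_DIR/DATE_PATH and print the matching markdown snippet.
add_media() {
   files_dir="$1" ; src="$2" ; date_path="$3"
   mkdir -p "${files_dir}/${date_path}"
   cp -p "${src}" "${files_dir}/${date_path}/"
   media_snippet "/${date_path}/$( basename "${src}" )"
}

# Demonstration with an empty stand-in for a real image.
demo="$( mktemp -d )"
: > "${demo}/bgstack15_logo.png"
snippet="$( add_media "${demo}/files" "${demo}/bgstack15_logo.png" '2021/08/30' )"
printf '%s\n' "${snippet}"
```

The printed line can be pasted straight into the post body.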

Deploying site

On any system on the internal network, run this script as your regular login user. It will connect to server1 and run any necessary commands.

/mnt/public/Support/Programs/nikola/scripts/build.sh

And now deploy to the web server.

/mnt/public/Support/Programs/nikola/scripts/deploy.sh

References

  1. [internal file]
  2. [internal file]
  3. original blog
  4. [internal file]
  5. Isso comment server
  6. Nikola static site generator
  7. systemd service file for isso

Some other netizens migrated to nikola from Wordpress, or use nikola:

  1. https://anteru.net/blog/2016/moving-from-wordpress-to-nikola/index.html
  2. https://baptiste-wicht.com/posts/2014/03/migrated-from-wordpress-to-nikola.html
  3. Martin Wimpress of Ubuntu fame
  4. unit193, a Xubuntu dev who has interacted with me personally on irc. He uses nikola and that's how I discovered it.

Alternatives

  1. My original site, which is the wordpress one.
  2. List of comment solutions for static sites

Additional posts in this series

  1. /posts/2021/09/07/fix-isso-comments-from-old-url-to-new-url
  2. /posts/2021/09/11/automatic-build-script-for-my-static-site
  3. /posts/2021/09/15/deploy-my-nikola-site
  4. /posts/2021/09/19/scheduling-my-blog-updates
  5. /posts/2021/09/23/deploy-my-nikola-site-part-2
  6. /posts/2021/09/27/generate-tag-cloud-for-static-site
