Knowledge Base

Preserving for the future: Shell scripts, AoC, and more

Fix Isso comments from old url to new url

While isso has migration steps for Wordpress comments which work well, my url changed. I had to perform some transformations of the comment urls (i.e., which comments show up on which blog post) and comment author links. To facilitate this process, I wrote a python script.

fix-isso-comments.py script

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
#!/usr/bin/env python3
# File: fix-isso-comments.py
# Location: https://gitlab.com/bgstack15/former-gists
# Author: bgstack15
# Startdate: 2021-08-26
# SPDX-License-Identifier: GPL-3.0
# Title: Fix URLs in Isso for Importing Wordpress Blog
# Purpose: Make it possible for a oneliner to fix the virtual paths for comments that isso imported from wordpress
# History:
# Usage:
#    ./fix-isso-comments.py --help
#    # Fix comments for migration from wordpress.com to nikola site at /blog/
#    ./fix-isso-comments.py -a -v --dbfile isso.db -m "" -N "/blog/posts"
#    # Fix pingback website entries
#    ./fix-isso-comments.py -a -v --dbfile isso.db -m "https://bgstack15.wordpress.com/" -N "/blog/posts/" --action pingbacks
# Reference:
# Improve:
#    Write a new function that fixes comments posted by other posts on this blog (which is called "pingback" in Wordpress)
# Documentation:
#    This is partially a general library for sqlite, but only for the one function that changes the values within a single column.
import sqlite3, argparse

# Functions
def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

def open_sqlite_as_dict(dbfile,table):
   con = sqlite3.connect(dbfile)
   response = []
   # load contents as a list of dicts
   con.row_factory = dict_factory
   cur = con.cursor()
   cur.execute(f"select * from {table}")
   a = 0
   while a is not None:
      a = cur.fetchone()
      if a is not None:
         response.append(a)
   return response, con

def replace_value_in_column_in_sqlite(dbfile,verbose=False,dryrun=True,table="",pk="id",column="",matchvalue="",newvalue=""):
   mylist, con = open_sqlite_as_dict(dbfile,table)
   for item in mylist:
      try:
         item[column]=item[column].replace(matchvalue,newvalue,1)
      except:
         pass
      substr = ""
      for key in item.keys():
         #if key != pk: # only excluding primary key is not enough
         if key == column:
            substr = substr + f", {key} = '{item[key]}'"
      substr=substr.lstrip(", ") # strip leading comma
      command = f"UPDATE {table} SET {substr} WHERE {pk} = {item[pk]} OR {pk} = '{item[pk]}'"
      if verbose:
         print(command)
      if not dryrun:
         con.execute(command)
   if not dryrun:
      print("Applying changes.")
      con.commit()

def fix_thread_urls(dbfile,verbose=False,dryrun=True,matchvalue="",newvalue=""):
   return replace_value_in_column_in_sqlite(
      dbfile=dbfile,
      verbose=verbose,
      dryrun=dryrun,
      table="threads",
      pk="id",
      column="uri",
      matchvalue=matchvalue,
      newvalue=newvalue
   )

def fix_pingbacks(dbfile,verbose=False,dryrun=True,matchvalue="",newvalue=""):
   return replace_value_in_column_in_sqlite(
      dbfile=dbfile,
      verbose=verbose,
      dryrun=dryrun,
      table="comments",
      pk="id",
      column="website",
      matchvalue=matchvalue,
      newvalue=newvalue
   )

# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("--dbfile","-d", help="sqlite3 database file")
parser.add_argument("--verbose","-v", action="store_true", help="Verbose.")
dryrun_group = parser.add_mutually_exclusive_group()
dryrun_group.add_argument("--dryrun","--dry-run","-n", action='store_true', help="Dry run only. Default.")
dryrun_group.add_argument("--apply","-a", action='store_true', help="Apply changes.")
parser.add_argument("--match","-m", required=True,help="String within value to be replaced.")
parser.add_argument("--newvalue","--new-value","-N", required=True,help="String to be inserted.")
parser.add_argument("--action","-A",required=False,choices=["threads","pingbacks"],default="threads")

args = parser.parse_args()
print(args)
dbfile = args.dbfile
verbose = args.verbose
dryrun = not args.apply # yes, dryrun = not args.apply does what we want.
matchvalue = args.match
newvalue = args.newvalue
action = args.action

# Main
if "threads" == action:
   fix_thread_urls(
      dbfile=dbfile,
      verbose=verbose,
      dryrun=dryrun,
      matchvalue=matchvalue,
      newvalue=newvalue
   )
elif "pingbacks" == action:
   fix_pingbacks(
      dbfile=dbfile,
      verbose=verbose,
      dryrun=dryrun,
      matchvalue=matchvalue,
      newvalue=newvalue
   )
else:
   print(f"Invalid action: {action}. Aborted.")

The documentation in the script explains how to use it, as well as the previous blog post.

Step 1 is to fix the comment thread urls, so the right comments show up on the blog posts where they belong. Because my previous wordpress instance was served at virtual path / but my new site is served at /blog/, we need to fix every comment url.

/usr/local/bin/fix-isso-comments.py -a -v --dbfile isso.kb2.db -m "" -N "/blog/posts"

Observe how here we add the "/blog/posts" to Isso contents. Any relative links stored in the source files for nikola itself get parsed by nikola and prepended with the baseurl that nikola is configured with. But isso doesn't read that and doesn't care; it needs the full virtual path to each page.

Fix pingback website entries. In Wordpress, a pingback is where a blog post comments on any link that is a Wordpress post. That way, the author of the wordpress blog that was linked to has an option to show the reverse link. It can get spammy, and I assume it was an option, but I had left it on. And so any links I have in the comments need to be updated to the new urls.

/usr/local/bin/fix-isso-comments.py -a -v --dbfile isso.kb2.db -m "https://bgstack15.wordpress.com/" -N "/blog/posts/" --action pingbacks

Comments