My DVD-ripping solution

bgstack15

2021-08-12 09:14

This is a work in progress, but its current status is as follows.

Ripping the DVDs

To rip my DVD collection, I use my HandBrake wrapper script. My ripping machine has two disc drives. Run the script once per drive, and save all output to a temporary file.

{ INPUT=/dev/sr0 handbrake.sh ; INPUT=/dev/sr1 handbrake.sh ; } > ~/handbrake1

Inspect the generated commands in the temporary file for correctness. Mine look like:

HandBrakeCLI --markers --format mkv --min-duration 125 --input /dev/sr0 --output /mnt/public/Video/temp/TNGS2D3_V1.mkv --encoder x264 --rate 30 --native-language eng --title 1 --audio 1,2, --subtitle 1,2,scan --native-language eng --mixdown 6ch --aencoder ffaac
HandBrakeCLI --markers --format mkv --min-duration 125 --input /dev/sr0 --output /mnt/public/Video/temp/TNGS2D3_V2.mkv --encoder x264 --rate 30 --native-language eng --title 2 --audio 1,2, --subtitle 1,2,scan --native-language eng --mixdown 6ch --aencoder ffaac

The handbrake.sh script can be found at the bottom of this post and is explained further there. Run the temp file's commands. I like to do this in a GNU screen session.

time sh -x ~/handbrake1

Adding TV show metadata

Prepare a csv with s, sep, ep, airdate, title, and filename columns.

have,s,ep,sep,title,airdate,filename
1,1,1,01-e02,Encounter at Farpoint,1987-09-28,s01e01-e02 - Encounter at Farpoint
1,1,3,3,The Naked Now,1987-10-05,s01e03 - The Naked Now
1,1,4,4,Code of Honor,1987-10-12,s01e04 - Code of Honor
1,1,5,5,The Last Outpost,1987-10-19,s01e05 - The Last Outpost
1,1,6,6,Where No One Has Gone Before,1987-10-26,s01e06 - Where No One Has Gone Before

This filename output is supposed to be what Jellyfin wants for TV shows. I used the information from Wikipedia and some LibreOffice Calc functions to make this csv file, and some hand-munging for the double-long episodes.

="s" & TEXT(A2,"00") & "e" & TEXT(C2,"00") & " - " & REGEX($D2,"[,!']","")

I'm a bit obsessive about removing commas, exclamation marks, question marks, and quote marks from my filenames. Inspect the ripped files and rename each one to the filename string from the csv for the matching episode, keeping the .mkv extension. To add the mkv metadata, including the airdate and the next and previous episodes, run tv-mkv-helper.py (listed at the bottom of this post) on the directory. It does not currently recurse into subdirectories.

time tv-mkv-helper.py --inputcsv "/mnt/public/Video/TV/Star Trek The Next Generation (1987)/STTNG.csv" -d /mnt/public/Video/temp/
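If you would rather script the filename column than build it in a spreadsheet, here is a rough Python equivalent of the Calc formula above. The function name episode_filename and its parameters are made up for illustration. The set of stripped characters follows my list in the prose (so it also drops question marks and double quotes), which is an assumption rather than a copy of the regular expression in the formula.

```python
import re

# Characters stripped from titles: commas, exclamation marks, question
# marks, and quote marks (assumption: taken from the prose, not the formula).
UNWANTED = re.compile(r"[,!?'\"]")


def episode_filename(season, episode, title):
    """Return a name like 's01e03 - The Naked Now' (hypothetical helper)."""
    clean_title = UNWANTED.sub("", title)
    # zero-pad season and episode to two digits, like TEXT(x,"00") in Calc
    return f"s{int(season):02d}e{int(episode):02d} - {clean_title}"


if __name__ == "__main__":
    print(episode_filename(1, 3, "The Naked Now"))
    print(episode_filename(1, 6, "Where No One Has Gone Before"))
```

Double-length episodes such as s01e01-e02 would still need the same hand-munging as in the spreadsheet.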

Alternatives

Automatic Ripping Machine is just too fully-featured for me.

References

Jellyfin TV show naming scheme

Scripts

handbrake.sh

My handbrake.sh wrapper reads the disc contents to learn the disc title and the number of video tracks longer than 125 seconds.

#!/bin/sh
# startdate: 2021-03-10 11:36
# Alternatives:
#   handbrake preset: /mnt/public/Support/Programs/DVDs/1920x1080.json

test -z "${INPUT}" && INPUT=/dev/sr0 ; export INPUT
test -z "${OUTPUT}" && OUTPUT=/mnt/public/Video/temp ; export OUTPUT

raw="$( HandBrakeCLI --markers --format mkv --min-duration 125 --scan --input "${INPUT}" --output "${OUTPUT}"  --encoder x264 --rate 30 --native-language eng --title 0 2>&1 )"
disctitle="$( echo "${raw}" | sed -n -r -e '/DVD Title:/{s/.*DVD Title: //;p}' )"
array="$( echo "${raw}" | grep -E '^\s*\+' | awk 'BEGIN{x=0} /title [0-9]/{gsub(":","",$NF);x++;title[x]=$NF} /kbps/{audio[x]=audio[x]""$2} !/subtitle tracks:/ && !/\+ [0-9]+,/ {us=0} /subtitle tracks:/{us=1} us == 1 && $2 ~ /[0-9]+,/{subtitle[x]=subtitle[x]""$2} END {for(i in title) {{print title[i],audio[i],subtitle[i]}}}' )"
commands="$( echo "${array}" | awk -v "disctitle=${disctitle}" -v "input=${INPUT}" -v "output=${OUTPUT}" '{a=$1;b=$2;c=$3;print "HandBrakeCLI --markers --format mkv --min-duration 125 --input "input" --output "output"/"disctitle"_V"a".mkv --encoder x264 --rate 30 --native-language eng --title "a" --audio "b" --subtitle "c"scan --native-language eng --mixdown 6ch --aencoder ffaac"}' )" 
echo "${commands}"

My hard-coded values include mkv output, x264 video encoding, a frame rate of 30 fps, and English as the native language. I also tell it to use the specific audio encoder ffaac, which was the first one I tried when trying to get my 6-channel mixdown (that's 5.1 surround sound) option to work. I'm not a media bitrate snob; I'm just trying to preserve some semblance of fancier file value than I can even consume, should I ever upgrade my viewing equipment in the next 40 years.
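The last awk call in the script is really just filling in a command template once per title. As a minimal sketch of that step in Python, assuming the title number and track lists have already been parsed out of the scan output (the function name handbrake_command and its parameters are hypothetical, and it assumes at least one audio track):

```python
import shlex


def handbrake_command(disc_title, title, audio_tracks, subtitle_tracks,
                      input_dev="/dev/sr0",
                      output_dir="/mnt/public/Video/temp"):
    """Build one HandBrakeCLI argument list using the hard-coded values above."""
    audio = ",".join(str(t) for t in audio_tracks)
    # "scan" is appended to the subtitle list, as in the shell script
    subtitles = ",".join([str(t) for t in subtitle_tracks] + ["scan"])
    return [
        "HandBrakeCLI", "--markers", "--format", "mkv",
        "--min-duration", "125",
        "--input", input_dev,
        "--output", f"{output_dir}/{disc_title}_V{title}.mkv",
        "--encoder", "x264", "--rate", "30",
        "--native-language", "eng",
        "--title", str(title),
        "--audio", audio,
        "--subtitle", subtitles,
        "--mixdown", "6ch", "--aencoder", "ffaac",
    ]


if __name__ == "__main__":
    # shlex.join (Python 3.8+) quotes arguments so the line is safe to paste into sh
    print(shlex.join(handbrake_command("TNGS2D3", 1, [1, 2], [1, 2])))
```

Unlike the shell version, this does not repeat --native-language or leave a trailing comma after the audio list, and building a list keeps a disc title with spaces from splitting into several arguments.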

tv-mkv-helper.py

#!/usr/bin/env python3
# Startdate: 2021-08-07
# Purpose: given INCSV="/mnt/public/Video/TV/Star Trek The Next Generation (1987)/STTNG.csv"
#    with columns s, ep, sep, title, airdate, filename,
#    update files in provided directory, whose name matches the filename column, with the mkv properties
# Usage:
#    time tv-mkv-helper.py --inputcsv "/mnt/public/Video/TV/Star Trek The Next Generation (1987)/STTNG.csv" -d /mnt/public/Video/temp/
# References:
#    /mnt/public/Support/Programs/DVDs/handbrake-internal.sh
# Dependencies:
#    STTNG.csv sorted ascending, with columns s, sep, title, airdate, filename
#    mkvpropedit

import csv, os, subprocess

###############################################################
def load_file_to_array(inputcsv):
   episode_array = []
   with open(inputcsv,"r") as o:
      csv_reader = csv.reader(o)
      headers = next(csv_reader)
      error_string = ""

      # sanity check for the columns we want
      if "s" not in headers:
         error_string = "Inputcsv needs column named 's' for season number."
      #elif "ep" not in headers:
      #   error_string = "Inputcsv needs column named 'ep' for episode number."
      elif "sep" not in headers:
         error_string = "Inputcsv needs column named 'sep' for season episode number."
      elif "airdate" not in headers:
         error_string = "Inputcsv needs column named 'airdate'."
      elif "filename" not in headers:
         error_string = "Inputcsv needs column named 'filename'."
      if error_string != "":
         print(error_string)
         return -1

      for row in csv_reader:
         # map each header name to the value in this row
         ep = dict(zip(headers,row))
         episode_array.append(ep)
   return episode_array

############################################################
def fix_directory_contents(directory, inputcsv):
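   # load the csv before changing directory, so a relative --inputcsv path still resolves
   episodes = load_file_to_array(inputcsv)
   if episodes == -1:
      print("Aborting.")
      return -1
   # work inside the directory so that mkvpropedit can be given bare filenames
   os.chdir(directory)
   # simple one-level down method. This might be insufficient someday.
   for filename in os.listdir("."):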
      if filename.endswith(".mkv"):
         print("----------------------")
         print(f"Checking file {filename}")
         filename_sans_ext, _ = os.path.splitext(filename)
         # loop through episodes to find this file
         # need enumerate, per https://stackoverflow.com/questions/1011938/loop-that-also-accesses-previous-and-next-values/1011962#1011962
         l = len(episodes)
         for index, ep in enumerate(episodes):
            previous = next_ = None
            if ep['filename'] == filename_sans_ext:
               if index > 0:
                  previous = episodes[index-1]['filename']
               if index < ( l - 1 ):
                  next_ = episodes[index+1]['filename']
               fix_file(
                  filename=filename,
                  title=ep['filename'],
                  season=ep['s'],
                  seasonep=ep['sep'],
                  next_name=next_,
                  prev_name=previous,
                  airdate=ep['airdate']
               )

############################################
def fix_file(filename,title,season,seasonep,airdate,next_name, prev_name):
   print(f"Please fix {filename} to have")
   print(f"title: {title}")
   print(f"season: {season}")
   print(f"seasonep: {seasonep}")
   print(f"airdate: {airdate}")
   print(f"next_name: {next_name}")
   print(f"prev_name: {prev_name}")
   if next_name is not None and not next_name.endswith(".mkv"):
      next_name = next_name + ".mkv"
   if prev_name is not None and not prev_name.endswith(".mkv"):
      prev_name = prev_name + ".mkv"
   date_string = str(airdate) + "T12:00:00Z" # just set it to noon UTC, so the EST/EDT time will still be the same day.
   #cmd_string = f"mkvpropedit --set 'title={title}' --set 'date={date_string}' --set 'segment-filename={filename}.mkv' --set 'prev-filename={prev_name}' --set 'next-filename={next_name}'"
   cmd_array = ["mkvpropedit","--set",f'title={title}',"--set",f'date={date_string}',"--set",f'segment-filename={filename}',"--set",f'prev-filename={prev_name}',"--set",f'next-filename={next_name}',filename]
   result = subprocess.run(cmd_array)
   print(result)
   # remove the placeholder "None" values written above when there is no neighboring episode.
   # each property to delete needs its own --delete option
   aux_cmd_array=["mkvpropedit",filename]
   if next_name is None or next_name == "":
      aux_cmd_array.extend(["--delete","next-filename"])
   if prev_name is None or prev_name == "":
      aux_cmd_array.extend(["--delete","prev-filename"])
   # if we added anything to the auxiliary command array, then run it
   if len(aux_cmd_array) > 2:
      result = subprocess.run(aux_cmd_array)
      print(result)

# argparse stuff
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-i","--inputcsv",required=True,help="Episode CSV, with columns s, sep, title, filename, airdate")
parser.add_argument("-d","--directory",required=True,help="Path to directory where files should be given metadata")
args = parser.parse_args()

inputcsv = args.inputcsv
directory = args.directory

if __name__ == "__main__":
   print(f"running with inputcsv=\"{inputcsv}\", directory={directory}")
   fix_directory_contents(directory,inputcsv)

In a departure from my normal style where I put all the functions in a library and call it, I wanted a self-contained script with the argument parsing. It was quick and dirty. All I'm doing is adding the 3 or 4 mkv attributes to the mkv files whose filename matches an entry in the csv. I do all this using a custom csv, because I'm too lazy to learn how to use some fancy TheTVDB API or whatever. I'm obsessive enough to be fine with manually scraping metadata from Wikipedia TV show episode lists.
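The metadata step boils down to building one mkvpropedit command per file. As a minimal sketch, assuming the same properties the script touches (title, date, segment-filename, prev-filename, next-filename), the hypothetical helper below sets a neighbor property when a neighbor exists and deletes it otherwise, all in a single invocation. The --edit info selector is spelled out explicitly here. I have not verified how mkvpropedit reacts to deleting a property that is not present, so treat that part as an assumption.

```python
def mkvpropedit_args(filename, title, airdate, prev_name=None, next_name=None):
    """Build an mkvpropedit argument list for one episode (hypothetical helper)."""
    args = [
        "mkvpropedit", filename, "--edit", "info",
        "--set", f"title={title}",
        # noon UTC, so the EST/EDT time still falls on the same day
        "--set", f"date={airdate}T12:00:00Z",
        "--set", f"segment-filename={filename}",
    ]
    for prop, value in (("prev-filename", prev_name),
                        ("next-filename", next_name)):
        if value:
            args += ["--set", f"{prop}={value}"]
        else:
            # first or last episode: clear the property rather than set it
            args += ["--delete", prop]
    return args


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to apply it.
    print(mkvpropedit_args(
        "s01e03 - The Naked Now.mkv",
        "s01e03 - The Naked Now",
        "1987-10-05",
        prev_name="s01e01-e02 - Encounter at Farpoint.mkv",
        next_name="s01e04 - Code of Honor.mkv",
    ))
```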
