Knowledge Base

Preserving for the future: Shell scripts, AoC, and more

TheMovieDb: Generate my Custom CSV for a show

I've discussed before how I use a CSV to help apply metadata to media files. One of my problems with manually munging the Wikipedia list of a TV show's episodes is that TheMovieDB organizes episodes a little differently: sometimes a two-parter is listed as a single episode, and so on. These are all minor things, but my episode numbers can get out of sync, which affects my filenames and metadata.
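To make the drift concrete: the absolute episode number is just a running count over the regular (non-special) seasons, so a single merged two-parter shifts every later number by one. Here is a minimal sketch with made-up per-season episode counts. The `absolute_numbers` helper and the data are hypothetical, not taken from TMDb or Wikipedia.

```python
def absolute_numbers(episodes_per_season):
    """Map (season, episode) -> absolute episode number.

    episodes_per_season: dict of season number -> episode count.
    Season 0 (specials) is skipped, so specials never advance the count.
    """
    numbers = {}
    abs_epnum = 0
    for snum in sorted(episodes_per_season):
        if snum == 0:
            continue  # specials are not counted as absolute episodes
        for sep in range(1, episodes_per_season[snum] + 1):
            abs_epnum += 1
            numbers[(snum, sep)] = abs_epnum
    return numbers

# Hypothetical show: one source splits a season-1 two-parter (20 episodes),
# the other lists it as a single episode (19 episodes).
split = absolute_numbers({0: 2, 1: 20, 2: 22})
merged = absolute_numbers({0: 2, 1: 19, 2: 22})
print(split[(2, 1)], merged[(2, 1)])  # 21 20
```

Every episode from season 2 onward is off by one between the two sources, which is exactly what breaks the filenames.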

So I decided to investigate using TheMovieDB to generate my CSV/spreadsheet. I found a great Python library, tmdbv3api. I think there are multiple Python libraries for this, but they all get to the same thing: the contents of the show details.
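As far as I can tell, these libraries are thin wrappers over TheMovieDB's v3 REST API. The following minimal sketch builds the three kinds of requests my script ends up making: a search, the show details, and the per-season details. It assumes the endpoint paths `/search/tv`, `/tv/{id}` and `/tv/{id}/season/{n}` under `https://api.themoviedb.org/3`, with the key passed as the `api_key` query parameter. The `tmdb_url` helper is hypothetical, the key is a placeholder, and nothing here touches the network.

```python
from urllib.parse import urlencode

BASE = "https://api.themoviedb.org/3"  # assumed TMDb v3 base URL

def tmdb_url(path, api_key, **params):
    """Build a TMDb v3 request URL; api_key is sent as a query parameter."""
    query = urlencode({"api_key": api_key, **params})
    return f"{BASE}{path}?{query}"

key = "YOUR_API_KEY"  # placeholder, not a real key
search = tmdb_url("/search/tv", key, query="Star Trek Deep Space Nine")
details = tmdb_url("/tv/1855", key, language="en")       # id from the docstring example
season1 = tmdb_url("/tv/1855/season/1", key, language="en")
for url in (search, details, season1):
    print(url)
```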

Here's my little wrapper script, files/2023/04/listings/eplib.py:

#!/usr/bin/env python3
# File: eplib.py
# Location: /mnt/public/Support/Programs/themoviedb
# Author: bgstack15
# Startdate: 2023-04-19-4 13:51
# Title: Episode CSV Generator
# Purpose: facilitate my csv list of episodes of a TV show
# History:
# Usage:
#    source ~/venv1/bin/activate ; cd /mnt/public/Support/Programs/themoviedb/
#    python3
#    >>> import importlib, eplib
# References:
#    1. https://pypi.org/project/tmdbv3api/
#    2. https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string
# Improve:
# Dependencies:
#    One time:
#       python3 -m venv ~/venv1
#       source ~/venv1/bin/activate
#       pip3 install tmdbv3api
from tmdbv3api import TMDb, TV, Season
import os
tmdb = TMDb()
tmdb.language = "en"
tmdb.debug = True
tv = TV()
def set_api_key(filename = None):
   """ Given a filename (or use hardcoded default file if None), load api key from file. """
   if filename is None:
      filename = "/mnt/public/Support/Programs/themoviedb/apikey"
   try:
      with open(filename,"r") as o:
         tmdb.api_key = o.read().rstrip('\n')
   except OSError:
      # Missing or unreadable key file
      return -1
   return 0
### Ripped from ref 2
from unicodedata import combining, normalize
LATIN = "ä  æ  ǽ  đ ð ƒ ħ ı ł ø ǿ ö  œ  ß  ŧ ü "
ASCII = "ae ae ae d d f h i l o o oe oe ss t ue"
def remove_diacritics(s, outliers=str.maketrans(dict(zip(LATIN.split(), ASCII.split())))):
    return "".join(c for c in normalize("NFD", s.translate(outliers)) if not combining(c))
### end ref 2
# bgstack15 function
def filename_safifier(s):
   """ Cruddy detox for myself. Should consider using my customized detoxrc """
   return remove_diacritics(s.replace("'","").replace('"',"").replace("...","").replace(",","").replace(":","_").replace("!","").replace(".","").replace("?","").replace("&","and").replace("/","_"))
def get_tv_details(show_id):
   """ Uses web connection to get details of the item, either id number or show name """
   show_id = str(show_id)  # accept an integer id as well as a string
   if show_id.isdigit():
      details = tv.details(show_id)
   else:
      show_temp = get_tv_search(show_id)
      try:
         details = tv.details(show_temp[0]["id"])
      except Exception:
         print("Invalid TV show name. Found these names though:")
         for i in show_temp:
            print(f"id {i['id']}: {i['name']}")
         return -1
   return details
def get_tv_search(show_id):
   """ Uses web connection to search for item, probably the name of the show """
   return tv.search(show_id)
def get_episodes_for_show(show_id = "", include_specials = False):
   """
   Given a show id (can be show_id integer, show name string, or a TV().details object), print a CSV listing of the episodes. If given show_id is a TV().details object, use that to skip the web call for the show details. Example:
      a = TV().details(1855)
      get_episodes_for_show(a)
      OR
      a = get_tv_details("Star Trek Deep Space Nine")
      get_episodes_for_show(a)
   If include_specials, then also list episode titles and airdates for the specials (season 0), but they still are not counted as absolute episode numbers.
   """
   if type(show_id).__name__ == "AsObj":
      # Already a tmdbv3api details object; no lookup needed
      show = show_id
   else:
      # get_tv_details handles both an id and a show name (with search fallback)
      show = get_tv_details(str(show_id))
      if isinstance(show, int):
         return -1
   name = f"{show['name']} ({show['first_air_date'][0:4]})"
   season = Season()
   print(f"Name: {name}")
   show_id = show["id"]
   abs_epnum = 0
   # Print the CSV header once, before any rows
   print("s,ep,sep,epname,airdate,filename,")
   for s in show["seasons"]:
      snum = s["season_number"]
      season_eps = season.details(show_id, snum).episodes
      for e in season_eps:
         # Specials (season 0) do not advance the absolute episode number
         if snum != 0:
            abs_epnum += 1
         if include_specials or snum != 0:
            sep = e["episode_number"]
            # Turn multi-part suffixes like "(1)" into "Part 1"
            epname = e["name"].replace("(1)","Part 1").replace("(2)","Part 2").replace("(3)","Part 3")
            airdate = e["air_date"]
            filename = filename_safifier(f"s{snum:02d}e{sep:02d} - {epname}")
            # CSV-quote titles containing a comma or double quote, doubling embedded quotes (RFC 4180)
            if ',' in epname or '"' in epname:
               epname = '"' + epname.replace('"', '""') + '"'
            print(f"{snum:02d},{abs_epnum:03d},{sep:02d},{epname},{airdate},{filename},")
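# Run this regardless of __main__.
# $TMDB_APIKEY, if set, is the path of a file holding the key; otherwise the default file is used.
set_api_key(os.environ.get("TMDB_APIKEY"))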

So now I can get the output I want!
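The script builds each CSV row by hand, which is why it has to special-case commas and quotes in episode titles. One alternative is to let Python's `csv` module handle the quoting. The sketch below assumes episode dicts shaped like the ones TMDb season details return (`episode_number`, `name`, `air_date`). The `episode_rows` helper and the sample episodes are hypothetical, and the filename column is left out to keep it short.

```python
import csv
import io

def episode_rows(season_number, episodes, start_abs=0):
    """Yield CSV rows (s, ep, sep, epname, airdate) for one season.

    start_abs is the absolute episode count before this season; season 0
    rows reuse it instead of incrementing, like specials in eplib.py.
    """
    abs_epnum = start_abs
    for e in episodes:
        if season_number != 0:
            abs_epnum += 1
        yield [f"{season_number:02d}", f"{abs_epnum:03d}",
               f"{e['episode_number']:02d}", e["name"], e["air_date"]]

# Hypothetical episodes; the second title contains a comma and double quotes.
eps = [
    {"episode_number": 1, "name": "Pilot", "air_date": "1993-01-03"},
    {"episode_number": 2, "name": 'Past, "Present"', "air_date": "1993-01-10"},
]
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")  # default QUOTE_MINIMAL quoting
writer.writerow(["s", "ep", "sep", "epname", "airdate"])
writer.writerows(episode_rows(1, eps))
print(buf.getvalue())
```

The writer only quotes fields that need it and doubles embedded quotes, so reading the text back with `csv.reader` recovers `Past, "Present"` as a single field.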
