Knowledge Base

Preserving for the future: Shell scripts, AoC, and more

Sync files and use hardlinks

I had a small project for myself: synchronize a subset of my locally stored music to my mobile phone. I already use syncthing, which is great. The biggest problem is that my music collection is far larger than the storage on my phone. So I wrote some functions that randomly select a given number of megabytes of my collection and then place those directories in a separate directory for syncing. I thought rsync would do what I want, but I couldn't get it to actually create hardlinks, even with --link-dest=/src/dir/. Eventually I found cp -l, which does what I want!
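To make the hardlink behavior concrete, here is a minimal sketch that runs cp -l on a throwaway directory and compares inode numbers. The temp paths and the file name track01.flac are made up for illustration, and it assumes GNU coreutils cp with source and destination on the same filesystem.

```sh
#!/bin/sh
# Sketch: show that cp -l creates hardlinks instead of copying data.
# All paths are throwaway temp directories, not the real INDIR/OUTDIR.
demo_root="$( mktemp -d )"
mkdir -p "${demo_root}/src/album" "${demo_root}/out"
printf 'fake audio data\n' > "${demo_root}/src/album/track01.flac"

# -p preserve attributes, -r recurse, -l hardlink files instead of copying
cp -prl "${demo_root}/src/album" "${demo_root}/out/album"

# ls -i prints the inode number first; equal inodes mean one file with two names
src_inode="$( ls -i "${demo_root}/src/album/track01.flac" | awk '{print $1}' )"
out_inode="$( ls -i "${demo_root}/out/album/track01.flac" | awk '{print $1}' )"
if test "${src_inode}" = "${out_inode}" ; then
   echo "hardlinked"
else
   echo "copied"
fi

rm -rf "${demo_root}"
```

One possible explanation for the rsync result, not verified against my setup: --link-dest is matched against the destination hierarchy, so with a source of ${INDIR}/${td} (no trailing slash) the files land under ./${td}/ and rsync may look for them under ${INDIR}/${td}/${td}/ in the link directory, find nothing, and copy instead.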

#!/bin/sh
# File: /mnt/public/Support/Programs/syncthing/device18_music-sync.sh
# License: CC-BY-SA 4.0
# Author: bgstack15
# Startdate: 2021-06-26 09:19
# Title: Music Subset Selection for Syncing to device18
# Purpose: Place hardlinks to randomly-selected music directories in a spot for syncthing
# History:
# Usage:
#    This should be run from server1 itself, because of the hardlinks involved.
#    Can use this in cron.
# Reference:
# Improve:
#    make excludes work!
# Documentation: 
#    Rsync
#    Workflow:
#       generate list of X number of inputdirs from INDIR
#       for any outdirs in OUTDIR that are not in the list, unlink all files and then rmdir these expired outdirs.
#       create directories in OUTDIR that match these inputdirs
#       hardlink contents in inputdirs to each outputdir.

INDIR=/var/server1/shares/bgstack15/Music
OUTDIR=/var/server1/shares/public/Music/syncthing/device18_music
EXCLUDE_PATTERNS="*.png;*.jpg;*.txt;*.m3u"
MAXDISKSPACE="2048" # in MB
LOGFILE="/var/server1/shares/public/Support/Systems/server1/var/log/sync-device18_music/$( date "+%F" ).log"
# accept variables DEBUG APPLY

generate_list() {
   # return to stdout a list of directory names underneath INDIR
   # call:
   #    NOT FULLY IMPLEMENTED: generate_list "${INDIR}" "${COUNT}" "${MAXDISKSPACE}"
   full_list="$( cd "${INDIR}" ; find . -maxdepth 1 -mindepth 1 ! -type f -exec du -sxBM {} \; )"
   # shuffle, then keep directories until the running total reaches MAXDISKSPACE (the last one kept may overshoot)
   partial_list="$( echo "${full_list}" | sort --random-sort | awk -v "maxsize=${MAXDISKSPACE}" '{a=a+$1; print; if(a>=maxsize){exit;};}' )"
   # du separates size and name with a tab: drop the size column and the leading ./
   echo "${partial_list}" | cut -f2- | sed -e 's|^\./||' | sort
}

sync_selected_folders() {
   # read from stdin the directory names underneath $INDIR
   ssf_input="$( cat )"
   mkdir -p "${OUTDIR}"
   # bail out if we cannot enter OUTDIR, so the mv and rm below never run somewhere else
   cd "${OUTDIR}" || return 1
   rsync_excludes="$( echo "${EXCLUDE_PATTERNS}" | sed -r -e 's/^|;/  --exclude=/g;' )"

   # step 1: remove anything here not in the list
   # which we accomplish by moving every old thing out of the way to a new dir
   mkdir -p "old" ; mv * old/ 2>/dev/null ;

   # step 2: make new dirs
   echo "${ssf_input}" | while IFS= read -r td ;
   do
      # skip blank lines, so an empty list cannot match old/ itself
      test -z "${td}" && continue
      if test -e "old/${td}" ;
      then
         test -n "${DEBUG}" && printf '%s\n' "Already exists: ${td}"
         test -n "${APPLY}" && mv "old/${td}" .
      else
         test -n "${DEBUG}" && printf '%s\n' "Copy in: ${td}"
         test -n "${APPLY}" && {
            cp -prl "${INDIR}/${td}" "./${td}"
         }
         # 2021-07-01 12:09 rsync just isn't making hardlinks like I want
         #test -n "${APPLY}" && rsync -rpltgoD -t -H ${DEBUG:+-v} --link-dest="${INDIR}/${td}/" ${EXCLUDE_PATTERNS:+${rsync_excludes}} "${INDIR}/${td}" ./
      fi
   done

   # yes, inverse. We needed everything in ./old/ for the checks above, so now we need to move them back, if we were not applying anything.
   test -z "${APPLY}" && { ls -1 old/* 1>/dev/null 2>&1 && mv "old/"* . ; } ;
   # so now, anything still in the old/ is not on today's list. So time to delete it.
   test -n "${DEBUG}" && { find "./old/" -maxdepth 1 -mindepth 1 -type d -printf '%f\n' 2>/dev/null | sed -r -e "s/^/Removing /;" ; }
   test -n "${APPLY}" && rm -rf "./old/"

   cd "${OLDPWD}"
}

{
   lecho "START device18_music-sync"
   echo "INDIR=${INDIR}"
   echo "OUTDIR=${OUTDIR}"
   echo "DEBUG=${DEBUG}"
   echo "APPLY=\"${APPLY}\""
   echo "LOGFILE=${LOGFILE}"
   list="$( generate_list )"
   echo "${list}" | sync_selected_folders
   lecho "END device18_music-sync"
} 2>&1 | tee -a "${LOGFILE}"
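
The selection step is the part most worth testing on its own, so here is a small sketch of it with fixed input instead of a random shuffle. pick_within_budget is a hypothetical helper, not part of the script above, and it is a deliberate variant: it skips any directory that would push the total over the budget, so the result never overshoots. The sample sizes and artist names are made up.

```sh
#!/bin/sh
# Sketch of budget-limited selection. pick_within_budget is a hypothetical
# helper name, not a function from the script above.
# stdin: lines of "<size>M<TAB><name>", as printed by du -sBM
# $1:    budget in MB
pick_within_budget() {
   awk -F '\t' -v "maxsize=${1}" '
      {
         size = $1 + 0                          # "300M" evaluates to 300 as a number
         if ( total + size > maxsize ) { next } # skip anything that would overflow
         total += size
         sub(/^\.\//, "", $2)                   # drop the leading ./ that find adds
         print $2
      }'
}

# Made-up sample data in the same shape as the du output
sample="$( printf '300M\t./Artist A\n900M\t./Artist B\n1200M\t./Artist C\n500M\t./Artist D\n' )"
picked="$( printf '%s\n' "${sample}" | pick_within_budget 2048 )"
printf '%s\n' "${picked}"
```

With a 2048 MB budget this keeps Artist A, Artist B, and Artist D (1700 MB in total) and skips Artist C. For a random subset, the input would go through sort --random-sort first, as generate_list does.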

Yes, lecho is my recently revamped script. It merely wraps plecho, which really just uses the ts utility from moreutils:

TZ=UTC ts "[%FT%TZ]${USER}@${HOSTNAME}:${@+ $@}"
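
For machines without moreutils, a rough stand-in can be sketched with date alone. lecho_fallback is a hypothetical name, not my real lecho or plecho, and it differs from ts in one respect: the timestamp is taken once per call rather than once per line.

```sh
#!/bin/sh
# Sketch: a timestamp prefix without moreutils. lecho_fallback is a
# hypothetical name; the real lecho wraps plecho and ts.
lecho_fallback() {
   # uname -n stands in for ${HOSTNAME}, which plain sh may not set.
   # The timestamp is computed once per call, unlike ts (once per line).
   lf_prefix="[$( date -u "+%FT%TZ" )]${USER:-$( id -un )}@$( uname -n ):"
   if test "$#" -gt 0 ; then
      # message passed as arguments
      printf '%s %s\n' "${lf_prefix}" "$*"
   else
      # otherwise prefix every line read from stdin
      while IFS= read -r lf_line ; do
         printf '%s %s\n' "${lf_prefix}" "${lf_line}"
      done
   fi
}

stamped="$( lecho_fallback "START device18_music-sync" )"
printf '%s\n' "${stamped}"
```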

This script sounds very similar to one I wrote last year for symlinks. Who knows how many times I've redone my work?
