Overview
GitLab is the current source of truth for all repos, except for the secure data
stored locally at /mnt/public/packages. To prepare my on-prem git
repos, I need to be able to synchronize my repositories from the
sources of truth. This process sets up the bare repos on the main git and web
server, and sets up the sync-location git repos on any VM.
Preparing list assets
We need the csv lists that map each repository's old (GitLab) location to its
new on-prem location.
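Each finished row carries three comma-separated fields: the repo path, the
GitLab origin URL, and the on-prem destination URL. For example, one row from
the finished repos.csv looks like this:
ansible01,https://gitlab.com/bgstack15/ansible01,https://www.example.com/git/ansible01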
Generating gitlab token
Visit gitlab.com in a web browser, sign in, and open "Edit profile" from the
user icon menu. Open "Access Tokens" in the left-side menu, and create a
personal access token that is read-only. Save the token, which will resemble
the string MnrEnTVfA-7kujMarjsG, to file
/mnt/public/work/gitlab/gitlab.com.personal_access_token.
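To confirm the token works before running the scripts, a quick manual call
against the same API endpoint that generate-list.sh uses (a sketch, assuming
the file holds just the bare token string) should return JSON rather than an
authentication error:
TOKEN="$( cat /mnt/public/work/gitlab/gitlab.com.personal_access_token )"
curl --silent --header "PRIVATE-TOKEN: ${TOKEN}" "https://gitlab.com/api/v4/users/bgstack15/projects?per_page=1" | head -c 200 ; echo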
Generating list of all relevant projects
Use generate-list.sh
to pull the list of all my personal projects from
GitLab. The list is not complete: the "Your projects" page on gitlab.com shows
70 projects in total, but the API returns only 60. So after running that
script, manually add any additional entries that should be synced.
./generate-list.sh > list.csv
# manually add any additional repos to list.csv.
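A quick count makes it easy to see how many entries still need to be added by
hand (a minimal check; subtract one if the header row is present):
wc -l < list.csv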
To add the destination links, run add-dest.sh
with redirection in and out.
< list.csv ./add-dest.sh > repos.csv
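As a sanity check, every data row in repos.csv should now have three
comma-separated fields; this quick filter (a sketch) prints any rows that do
not:
awk -F',' 'NR>1 && NF!=3' repos.csv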
Preparing main git destinations
With all the lists created, now prepare the final destinations of the
repositories.
Preparing blank repositories on git server
You need to make the blank repositories ahead of time due to how git-over-http
works: it only accepts pushes to projects that already exist. For this task,
run make-blank.sh.
time < repos.csv sh -x ./make-blank.sh
On main web server, fix the permissions of these new git repos.
sudo chgrp apache -R /mnt/public/www/git ; sudo chmod g+rwX -R /mnt/public/www/git ;
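To spot-check that the blank repositories exist where expected (a sketch,
assuming the default GIT_TOP_DIR from make-blank.sh), list a few of the bare
repos' HEAD files:
find /mnt/public/www/git -maxdepth 3 -name HEAD | head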
Setting up sync locations and synchronizing
Now that the destinations are prepared, use a temporary (or at least
alternate) location to pull and then push the repositories.
Initialize sync-location git repos
VM d2-03a is the main sync-location. This is the system that performs the bulk
of the work: pulling all git repos and their contents down, and pushing them
up to the server.
time OUTDIR=~/dev/sync-git INFILE=/mnt/public/Support/Programs/cgit/populate/repos.csv /mnt/public/Support/Programs/cgit/populate/populate-git-remotes.sh
The above command will need APPLY=1 as a variable when ready for real
execution.
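For the real run, the same command with APPLY=1 set looks like this:
time APPLY=1 OUTDIR=~/dev/sync-git INFILE=/mnt/public/Support/Programs/cgit/populate/repos.csv /mnt/public/Support/Programs/cgit/populate/populate-git-remotes.sh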
Perform synchronization of all git repos
This is the main operation of this whole process. It could take some time to
execute.
INDIR=~/dev/sync-git /mnt/public/Support/Programs/cgit/populate/sync-all.sh
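A single repository can also be synchronized by hand with sync.sh, which
operates on the current directory (see its listing in Appendix A); for
example, from a checkout of ansible01 under the sync directory:
cd ~/dev/sync-git/ansible01 && /mnt/public/Support/Programs/cgit/populate/sync.sh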
Build initial permissions list
To restrict push access to all the new repos, run this command, and save its
output inside /etc/git_access.conf
.
find /var/www/git -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | awk '{print "Use Project "$0" \"user bgstack15\" \"all granted\""}' | sort
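Each top-level repository directory becomes one line of output; for example,
the ansible01 repo produces:
Use Project ansible01 "user bgstack15" "all granted"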
And of course, run an httpd -t and then reload the web server.
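On a systemd host that would look roughly like this (a sketch; adjust the
service name to match the local httpd setup):
sudo httpd -t && sudo systemctl reload httpd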
Appendix A: File listings
generate-list.sh
#!/bin/sh
# Startdate: 2021-04-16 20:16
# Goal: generate csv to stdout of directory,origin,dest
# STEP 1
# Notes:
# the gitlab api doesn't show the "contributed" projects in the API at all, so the output from this is incomplete. The output file will need to be curated with additional entries for projects not from my userspace.
# Dependencies:
# Gitlab token at /mnt/public/work/gitlab/gitlab.com.personal_access_token
# jq
# Reference:
# https://stackoverflow.com/questions/57242240/jq-object-cannot-be-csv-formatted-only-array
# https://stackoverflow.com/questions/32960857/how-to-convert-arbitrary-simple-json-to-csv-using-jq/32965227#32965227
test -z "${TOKEN_FILE}" && TOKEN_FILE="/mnt/public/work/gitlab/gitlab.com.personal_access_token"
test ! -r "${TOKEN_FILE}" && { echo "Fatal! Cannot find token file ${TOKEN_FILE}. Aborted." 1>&2 ; exit 1 ; }
TOKEN="$( cat "${TOKEN_FILE}" )"
echo "${TOKEN}" | grep -qE "token:" && { TOKEN="$( echo "${TOKEN}" | awk '/^token:/{print $NF}' )" ; }
GUSER=bgstack15
# Functions
handle_pagination() {
# call: handle_pagination "https://gitlab.com/api/v4/users/${GUSER}/projects"
# return: stdout: a single json list of of all returned objects from all pages
# GLOBALS USED: TOKEN
___hp_url="${1}"
___hp_next_link="dummy value"
___hp_json=""
___hp_MAX=30 # safety valve
x=0
___hp_thisurl="${___hp_url}"
while test -n "${___hp_next_link}" && test ${x} -lt ${___hp_MAX} ;
do
x=$((x+1))
raw="$( curl --include --header "PRIVATE-TOKEN: ${TOKEN}" "${___hp_thisurl}" )"
set +x ; links="$( echo "${raw}" | awk '/^link:/' )" ; set -x
___hp_next_link="$( echo "${links}" | tr ',' '\n' | sed -n -r -e '/rel="next"/{s/.*<//;s/>;.*$//;p;}' )" # extract the URL between < and > on the rel="next" entry; will be blank if there is no next link, that is, if this is the last page.
___hp_thisurl="${___hp_next_link}"
set +x ; ___hp_json="${___hp_json}$( echo "${raw}" | awk '/^\[/' )" ; set -x
done
# combine all json lists into one
# ref: https://stackoverflow.com/a/34477713/3569534
echo "${___hp_json}" | jq --compact-output --null-input 'reduce inputs as $in (null; . + $in)'
}
# MAIN
json="$( handle_pagination "https://gitlab.com/api/v4/users/${GUSER}/projects" )"
echo "${json}" | jq '.[] | [{ web_url, path }]' | jq -r '(map(keys) | add | unique) as $cols | map(. as $row|$cols|map($row[.])) as $rows | $cols, $rows[] | @csv' | awk 'NR == 1 {print} NR >1 && !/"web_url"/{print}'
list.csv
Just a snippet:
"ansible01","https://gitlab.com/bgstack15/ansible01"
"ansible-ssh-tunnel-for-proxy","https://gitlab.com/bgstack15/ansible-ssh-tunnel-for-proxy"
el7-gnupg2-debmirror/gnupg2,https://gitlab.com/el7-gnupg2-debmirror/gnupg2
el7-gnupg2-debmirror/libksba,https://gitlab.com/el7-gnupg2-debmirror/libksba
el7-gnupg2-debmirror/libassuan,https://gitlab.com/el7-gnupg2-debmirror/libassuan
add-dest.sh
#!/bin/sh
# Startdate: 2021-04-17
# STEP 2
# Goal: fix column names, and also parse to add dest column.
test -z "${GIT_URL_BASE}" && GIT_URL_BASE="https://www.example.com/git"
{
# fix column name, and then add the dest link
sed -r -e '1s/web_url/origin/;' | tr -d '"' | \
awk -v "topurl=${GIT_URL_BASE}" -F',' 'BEGIN{OFS=","} NR==1 {print $0",dest"} NR>1 {$NF=$NF","topurl"/"$1;print}'
}
repos.csv
The final list asset. Just a snippet:
ansible01,https://gitlab.com/bgstack15/ansible01,https://www.example.com/git/ansible01
ansible-ssh-tunnel-for-proxy,https://gitlab.com/bgstack15/ansible-ssh-tunnel-for-proxy,https://www.example.com/git/ansible-ssh-tunnel-for-proxy
el7-gnupg2-debmirror/gnupg2,https://gitlab.com/el7-gnupg2-debmirror/gnupg2,https://www.example.com/git/el7-gnupg2-debmirror/gnupg2
el7-gnupg2-debmirror/libksba,https://gitlab.com/el7-gnupg2-debmirror/libksba,https://www.example.com/git/el7-gnupg2-debmirror/libksba
el7-gnupg2-debmirror/libassuan,https://gitlab.com/el7-gnupg2-debmirror/libassuan,https://www.example.com/git/el7-gnupg2-debmirror/libassuan
make-blank.sh
#!/bin/sh
# Startdate: 2021-04-17 16:13
# Goal: given the repos.csv output from STEP 2 add-dest.sh script, make the blank git repos for each of those on the final destination server
# STEP 3
test -z "${GIT_URL_BASE}" && GIT_URL_BASE="https://www.example.com/git"
test -z "${GIT_TOP_DIR}" && GIT_TOP_DIR="/mnt/public/www/git"
cd "${GIT_TOP_DIR}"
# this awk will read stdin, and skip the first line which is the headers for the columns
for word in $( awk -F',' -v "topurl=${GIT_URL_BASE%%/}/" 'NR>1 {gsub(topurl,"",$3);print $3}' ) ;
do
# if OVERWRITE and the dir already exists, then delete it
test -d "${word}" && test -n "${OVERWRITE}" && rm -r "${word}"
# If inside a namespace, then perform a few extra steps.
echo "${word}" | grep -qE "\/" && {
# make any subdirs between here and there
mkdir -p "${word}"
# DISABLED; can just use section-from-path=1 in main cgitrc
## if in a subdir, add a cgitrc file for this repo that indicates its section.
# section="$( echo "${word}" | awk -F'/' 'BEGIN{OFS="/"} {$NF="";print}' )"
# if ! grep -qE "section=.+" "${word}/cgitrc" 2>/dev/null ;
# then
# echo "section=${section%%/}" >> "${word}/cgitrc"
# fi
}
# actually make the blank git repo
git init --bare "${word}" &
done
wait
sync-all.sh
#!/bin/sh
# STEP 5 repeating
# Startdate: 2020-05-21
# Goal: download every single git repository in full from bitbucket cloud for migration to bitbucket on-prem.
# History:
# 2021-04-17 forked from gituser.tgz to Support/Programs/cgit/populate project
# Usage:
# INDIR=~/dev/sync-git
# References:
# git-sync-all.ps1
# Dependencies:
SYNCSCRIPT=/mnt/public/Support/Programs/cgit/populate/sync.sh
if test -z "${INDIR}" || ! test -r "${INDIR}" ;
then
echo "Fatal! Invalid INDIR ${INDIR} which is either absent or unreadable. Aborted" 1>&2
exit 1
fi
cd "${INDIR}"
x=0
for dir in $( find . -maxdepth 2 -mindepth 1 -type d -name '.git' -printf '%h\n' ) ;
do
x=$(( x + 1 ))
lecho "Starting repo $x ${dir}" # lecho is a local logging helper; substitute echo if it is unavailable
cd "${INDIR}/${dir}"
"${SYNCSCRIPT}"
done
sync.sh
#!/bin/sh
# STEP 5 or manual
# Startdate: 2020-05-21
# Goal: sync just this one directory git repo.
# History:
# 2021-04-17 forked from gituser.tgz to Support/Programs/cgit/populate project
# References:
# How to actually pull all branches from the remote https://stackoverflow.com/questions/67699/how-to-clone-all-remote-branches-in-git/16563327#16563327
# Dependencies:
# $PWD is the git repo in question.
git pull --all
{
git branch -a | sed -n "/\/HEAD /d; /remotes\/origin/p;" | xargs -L1 git checkout -t
} 2>&1 | grep -vE 'fatal:.* already exists'
git push dest --all