#### Metadata
Startdate: 2020-05-30 15:51
References:
Everything on this page, for jq filtering. https://stedolan.github.io/jq/manual/#Basicfilters


# Flow

1. Use gitlablib to list all issues. Drop the "build", "buildmodify" and similar CI/CD issues, and keep the web URLs of the rest.

       . gitlablib.sh
       list_all_issues | tee output/issues.all
       <output/issues.all jq -r '.[] | select(.title | test("build-?(a(ll)?|mod(ify)?|add|del)?$") | not) | .web_url' > output/issues.all.web_url

   Manually munge `output/issues.all.web_url` to put the devuan/devuan-project/issues/20 URL on top.

2. Use fetch-issue-webpages.py to fetch all of those web pages.

       ln -s issues.all.web_url output/files-to-fetch.txt
       ./fetch-issue-webpages.py

3. Munge the downloaded HTML.
   All of the following is performed by `flow-part2.sh`.

   * Fix newlines by turning literal `\n` sequences into real line breaks.

         sed -i -r -e 's/\\n/\n/g;' /mnt/public/www/issues/*.html

   * Find `<time>` tags that have a `data-original-title` attribute and replace each tag's contents with the value of that attribute. fix-timestamps.py also pretty-prints the HTML with BeautifulSoup, which some of the following commands need in order to work correctly.

         ls -1 /mnt/public/www/issues/*.html > output/files-for-timestamps.txt
         ./fix-timestamps.py

   * Download all relevant images, then fix the image references in the HTML.

         ./fetch-images.sh
         sed -i -f fix-images-in-html.sed /mnt/public/www/issues/*.html

   * Download all stylesheets, then fix the stylesheet references in the HTML.

         mkdir -p /mnt/public/www/issues/css
         ./fetch-css.sh
         sed -i -f fix-css-in-html.sed /mnt/public/www/issues/*.html

   * Fix some encoding oddities.

         sed -i -f remove-useless.sed /mnt/public/www/issues/*.html

   * Remove HTML components that are not necessary.

         ./remove-useless.py

   * Fix links that point to the defunct domain without-systemd.org.

         sed -i -r -f fix-without-systemd-links.sed /mnt/public/www/issues/*.html

   * Build some sort of index?
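
The jq filter in step 1 drops every issue whose title matches `build-?(a(ll)?|mod(ify)?|add|del)?$` and keeps the `web_url` of the rest. Here is a minimal Python sketch of the same filter, plus an automated version of the manual "put issue 20 on top" step. The names `filter_issue_urls`, `move_to_top` and `write_url_list` are hypothetical. `write_url_list` assumes `output/issues.all` holds a single JSON array of issue objects with `title` and `web_url` fields. jq's `.[]` would also accept a stream of several arrays, but `json.load` would not.

```python
import json
import re

# Same regex as the jq filter. re.search, like jq's test(), looks for a
# match anywhere in the title; only the trailing $ anchors the pattern.
BUILD_TITLE = re.compile(r"build-?(a(ll)?|mod(ify)?|add|del)?$")


def filter_issue_urls(issues):
    """Return the web_url of every issue whose title is not a CI/CD build title."""
    return [issue["web_url"] for issue in issues
            if not BUILD_TITLE.search(issue["title"])]


def move_to_top(urls, suffix):
    """Move the first URL ending with `suffix` to the front; keep the rest in order."""
    for i, url in enumerate(urls):
        if url.endswith(suffix):
            return [url] + urls[:i] + urls[i + 1:]
    return list(urls)


def write_url_list(in_path="output/issues.all",
                   out_path="output/issues.all.web_url"):
    """Hypothetical driver: assumes in_path holds one JSON array of issues."""
    with open(in_path, encoding="utf-8") as f:
        issues = json.load(f)
    urls = move_to_top(filter_issue_urls(issues),
                       "devuan/devuan-project/issues/20")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
```

Using `re.search` rather than `re.match` matters. A title such as `rebuild` is dropped by the jq filter and by this sketch alike.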
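
fetch-issue-webpages.py itself is not shown in these notes. To illustrate step 2 only, the sketch below reads the URL list through the `output/files-to-fetch.txt` symlink and saves each page under `/mnt/public/www/issues/` using the standard library's `urllib.request`. The `local_name` naming scheme and the `fetch_all` function are hypothetical, and the real script may behave differently.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name(url):
    """Hypothetical naming scheme: URL path with '/' replaced by '_', plus .html."""
    path = urlparse(url).path.strip("/")
    return path.replace("/", "_") + ".html"


def fetch_all(list_path="output/files-to-fetch.txt",
              out_dir="/mnt/public/www/issues"):
    """Download every URL listed (one per line) in list_path into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    with open(list_path, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
        # Write the raw bytes so the page keeps whatever encoding the server sent.
        with open(os.path.join(out_dir, local_name(url)), "wb") as out:
            out.write(body)
```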
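
The first two munging steps in step 3 can be approximated with the Python standard library alone. This sketch is not fix-timestamps.py. It swaps BeautifulSoup for a regular expression, so it does not pretty-print the HTML. It assumes each `<time>` element carries a double-quoted `data-original-title` attribute, and it globs the directory directly instead of reading `output/files-for-timestamps.txt`. The function names are hypothetical.

```python
import glob
import re

# Group 1: the opening <time ...> tag, group 2: the data-original-title value,
# group 3: the current contents, group 4: the closing </time> tag.
TIME_TAG = re.compile(
    r'(<time\b[^>]*?\bdata-original-title="([^"]*)"[^>]*>)(.*?)(</time>)',
    re.DOTALL,
)


def fix_newlines(html):
    """Turn literal backslash-n sequences into real newlines, like the sed one-liner."""
    return html.replace("\\n", "\n")


def fix_timestamps(html):
    """Replace each <time> tag's contents with its data-original-title value."""
    # A function replacement avoids backslashes in the value being read as escapes.
    return TIME_TAG.sub(lambda m: m.group(1) + m.group(2) + m.group(4), html)


def munge_all(pattern="/mnt/public/www/issues/*.html"):
    """Apply both fixes in place to every downloaded page matching pattern."""
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            html = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(fix_timestamps(fix_newlines(html)))
```

`<time>` tags without a `data-original-title` attribute are left as they are.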
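
fetch-images.sh and fetch-css.sh are not shown, and the notes do not say which images count as relevant. As a hedged illustration, the sketch below uses the standard library's `html.parser.HTMLParser` to list every `<img src>` and every `<link rel="stylesheet" href>` in a page. This is one possible way to build the download lists for the image and stylesheet steps. The names `AssetCollector` and `collect_assets` are hypothetical.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect image and stylesheet URLs referenced by one HTML page."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        values = dict(attrs)
        if tag == "img" and values.get("src"):
            self.images.append(values["src"])
        elif tag == "link" and values.get("href"):
            rel = (values.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(values["href"])


def collect_assets(html):
    """Return (image URLs, stylesheet URLs) in document order."""
    parser = AssetCollector()
    parser.feed(html)
    parser.close()
    return parser.images, parser.stylesheets
```

Relative URLs in the results still need to be resolved against the issue page URL with `urllib.parse.urljoin` before they can be downloaded.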