author | B Stack <bgstack15@gmail.com> | 2020-06-16 13:40:39 -0400 |
---|---|---|
committer | B Stack <bgstack15@gmail.com> | 2020-06-16 13:40:39 -0400 |
commit | 0ca931489f98b65e1025a4c4f00ae9eb8484dc27 (patch) | |
tree | 231f9fa86e40a62ef59ca0cc1ba81d52086b3597 /flow.md | |
parent | initial commit (diff) | |
add minor fixes and major image fixes
perform better unicode removal/conversion
fix image retrieval, including svg and minor graphics assets
remove even more html elements not necessary for archival display
Diffstat (limited to 'flow.md')
-rw-r--r-- | flow.md | 22 |
1 files changed, 15 insertions, 7 deletions
@@ -24,27 +24,27 @@ Everything on this page, for jq filtering. https://stedolan.github.io/jq/manual/
 
 * fix newlines
 
-    sed -i -r -e 's/\\n/\n/g;' /mnt/public/www/issues/*.html
+    sed -i -r -e 's/\\n/\n/g;' /mnt/public/www/gitlab-issues/*.html
 
 * find data-original-titles and replace the <time> tag contents with the value of its data-original-title. Also, this will BeautifulSoup pretty-print the html so some of the following commands work correctly.
 
-    ls -1 /mnt/public/www/issues/*.html > output/files-for-timestamps.txt
+    ls -1 /mnt/public/www/gitlab-issues/*.html > output/files-for-timestamps.txt
     ./fix-timestamps.py
 
 * download all relevant images, and then fix them.
 
     ./fetch-images.sh
-    sed -i -f fix-images-in-html.sed /mnt/public/www/issues/*.html
+    sed -i -f fix-images-in-html.sed /mnt/public/www/gitlab-issues/*.html
 
 * download all stylesheets and then fix them.
 
-    mkdir -p /mnt/public/www/issues/css
+    mkdir -p /mnt/public/www/gitlab-issues/css
     ./fetch-css.sh
-    sed -i -f fix-css-in-html.sed /mnt/public/www/issues/*.html
+    sed -i -f fix-css-in-html.sed /mnt/public/www/gitlab-issues/*.html
 
 * fix some encoding oddities
 
-    sed -i -f remove-useless.sed /mnt/public/www/issues/*.html
+    sed -i -f remove-useless.sed /mnt/public/www/gitlab-issues/*.html
 
 * remove html components that are not necessary
 
@@ -52,6 +52,14 @@ Everything on this page, for jq filtering. https://stedolan.github.io/jq/manual/
 
 * Fix links that point to defunct domain without-systemd.org.
 
-    sed -i -r -f fix-without-systemd-links.sed /mnt/public/www/issues/*.html
+    sed -i -r -f fix-without-systemd-links.sed /mnt/public/www/gitlab-issues/*.html
+
+* Perform final encoding conversion to remove any remaining broken characters
+
+    ./conversion.sh /mnt/public/www/gitlab-issues/*.html
+
+* Fix some images that have a src="data:" that do not load, but the data-src property is the proper link
+
+    ./use-datasrc-instead-src.py
 
 * build some sort of index?
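
The last step added in this diff (images whose src="data:" placeholder does not load while data-src holds the real link) can be illustrated with a rough sketch. The contents of use-datasrc-instead-src.py are not shown in this commit, so the following is only an assumption about the general idea, not the author's script. The function name use_data_src and the regular expressions are hypothetical, and it uses the standard library rather than BeautifulSoup to stay self-contained.

```python
import re

# Hypothetical sketch: NOT the actual use-datasrc-instead-src.py from this repo.
# Assumes attributes are double-quoted and that no attribute value contains '>'.
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
# A src attribute holding an inline data: URI (the leading \s keeps this
# from matching the "src" inside "data-src").
PLACEHOLDER_SRC = re.compile(r'\ssrc="data:[^"]*"', re.IGNORECASE)
# The data-src attribute that carries the real image link.
DATA_SRC = re.compile(r'\sdata-src="([^"]*)"', re.IGNORECASE)


def use_data_src(html: str) -> str:
    """Copy data-src into src for every <img> whose src is a data: URI."""

    def fix_tag(match: re.Match) -> str:
        tag = match.group(0)
        real_link = DATA_SRC.search(tag)
        # Leave the tag alone unless it has both a data: src and a data-src.
        if real_link is None or PLACEHOLDER_SRC.search(tag) is None:
            return tag
        # A function replacement avoids backslash handling in the link text.
        return PLACEHOLDER_SRC.sub(
            lambda _m: ' src="%s"' % real_link.group(1), tag, count=1
        )

    return IMG_TAG.sub(fix_tag, html)


if __name__ == "__main__":
    sample = '<img class="lazy" src="data:image/gif;base64,R0lGOD" data-src="/uploads/a.png">'
    print(use_data_src(sample))
```

Tags without a data-src attribute, or whose src is already a normal link, pass through unchanged, so running the function twice on the same file should be harmless. A real script would more likely parse the HTML properly, for example with BeautifulSoup as the timestamp step does.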