author B Stack <bgstack15@gmail.com> 2020-06-16 13:40:39 -0400
committer B Stack <bgstack15@gmail.com> 2020-06-16 13:40:39 -0400
commit 0ca931489f98b65e1025a4c4f00ae9eb8484dc27
tree 231f9fa86e40a62ef59ca0cc1ba81d52086b3597 /flow.md
parent initial commit
add minor fixes and major image fixes
perform better unicode removal/conversion
fix image retrieval, including svg and minor graphics assets
remove even more html elements not necessary for archival display
Diffstat (limited to 'flow.md')
-rw-r--r-- flow.md 22
1 files changed, 15 insertions, 7 deletions
diff --git a/flow.md b/flow.md
index 5c81d5e..ca81d52 100644
--- a/flow.md
+++ b/flow.md
@@ -24,27 +24,27 @@ Everything on this page, for jq filtering. https://stedolan.github.io/jq/manual/
* fix newlines
- sed -i -r -e 's/\\n/\n/g;' /mnt/public/www/issues/*.html
+ sed -i -r -e 's/\\n/\n/g;' /mnt/public/www/gitlab-issues/*.html
* find data-original-titles and replace the <time> tag contents with the value of its data-original-title. Also, this will BeautifulSoup pretty-print the html so some of the following commands work correctly.
- ls -1 /mnt/public/www/issues/*.html > output/files-for-timestamps.txt
+ ls -1 /mnt/public/www/gitlab-issues/*.html > output/files-for-timestamps.txt
./fix-timestamps.py
* download all relevant images, and then fix them.
./fetch-images.sh
- sed -i -f fix-images-in-html.sed /mnt/public/www/issues/*.html
+ sed -i -f fix-images-in-html.sed /mnt/public/www/gitlab-issues/*.html
* download all stylesheets and then fix them.
- mkdir -p /mnt/public/www/issues/css
+ mkdir -p /mnt/public/www/gitlab-issues/css
./fetch-css.sh
- sed -i -f fix-css-in-html.sed /mnt/public/www/issues/*.html
+ sed -i -f fix-css-in-html.sed /mnt/public/www/gitlab-issues/*.html
* fix some encoding oddities
- sed -i -f remove-useless.sed /mnt/public/www/issues/*.html
+ sed -i -f remove-useless.sed /mnt/public/www/gitlab-issues/*.html
* remove html components that are not necessary
@@ -52,6 +52,14 @@ Everything on this page, for jq filtering. https://stedolan.github.io/jq/manual/
* Fix links that point to defunct domain without-systemd.org.
- sed -i -r -f fix-without-systemd-links.sed /mnt/public/www/issues/*.html
+ sed -i -r -f fix-without-systemd-links.sed /mnt/public/www/gitlab-issues/*.html
+
+ * Perform final encoding conversion to remove any remaining broken characters
+
+ ./conversion.sh /mnt/public/www/gitlab-issues/*.html
+
+ * Fix some images that have a src="data:" that do not load, but the data-src property is the proper link
+
+ ./use-datasrc-instead-src.py
* build some sort of index?
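
The `use-datasrc-instead-src.py` script added in this commit is not shown in the diff. As a rough illustration of the step it describes (images whose `src="data:"` placeholder does not load while `data-src` holds the real link), here is a minimal sketch using only the standard library. The function name `use_datasrc` and the regex approach are assumptions for illustration; the real script may well use BeautifulSoup, as `fix-timestamps.py` does.

```python
import re

# Assumption: an <img> tag contains no ">" inside its attribute values.
# That holds for base64 placeholders but not for every inline SVG data URI.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# A src attribute holding a data: URI. The lookbehind keeps this from
# matching the "src" inside "data-src".
SRC_ATTR = re.compile(r'(?<![\w-])src="data:[^"]*"', re.IGNORECASE)
# The data-src attribute that carries the real image link.
DATA_SRC_ATTR = re.compile(r'\sdata-src="([^"]*)"', re.IGNORECASE)


def use_datasrc(html: str) -> str:
    """Replace src="data:..." with the data-src value on each <img> tag."""

    def fix(match):
        tag = match.group(0)
        data_src = DATA_SRC_ATTR.search(tag)
        # Leave the tag alone unless it has both a data: src and a data-src.
        if data_src is None or SRC_ATTR.search(tag) is None:
            return tag
        new_src = 'src="%s"' % data_src.group(1)
        # A function replacement avoids backslash-escape surprises in re.sub.
        return SRC_ATTR.sub(lambda _m: new_src, tag, count=1)

    return IMG_TAG.sub(fix, html)


if __name__ == "__main__":
    sample = '<img src="data:image/gif;base64,R0lG" data-src="/uploads/a.png">'
    print(use_datasrc(sample))
```

Tags that already have a normal `src`, or that lack a `data-src`, pass through unchanged, so running the function twice over the same file should be harmless.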