Add docs into the repo #756

Open
wants to merge 14 commits into
base: master

jtojnar
Contributor

@jtojnar jtojnar commented Oct 23, 2022

I downloaded the HTML pages from the live site and created a script (markdownify.py) that converts them to Markdown. This PR attempts to make everything work. We will remove/update the outdated content in a follow-up PR.

To regenerate the markdown files, run python3 markdownify.py in the SimplePie branch after installing python3-beautifulsoup4, python3-pypandoc and prettier. Or if you have Nix, you can just run nix-shell -I 'nixpkgs=channel:nixos-unstable' -p 'python3.withPackages (ps: with ps; [ beautifulsoup4 pypandoc ])' nodePackages.prettier --run 'python3 markdownify.py'.

Currently I am using Zola as the generator, as I am most familiar with it. A different SSG can be used if preferred.

To preview, run zola serve in the docs subdirectory.

TODO

Fixes: #543

@jtojnar jtojnar force-pushed the static-docs branch 2 times, most recently from 5ab8ab6 to 27f5f39 Compare October 23, 2022 20:44
@Art4
Contributor

Art4 commented Oct 24, 2022

Great work so far. 👍 What do you think about putting the website on an orphan gh-pages branch, to keep the website and deployment tools independent of the master branch?

We should also put the name change of Sam on the todo list (#543 (comment))

@jtojnar jtojnar mentioned this pull request Oct 24, 2022
6 tasks
@jtojnar
Contributor Author

jtojnar commented Oct 24, 2022

Using a different repo would be even cleaner. But then it is harder to do coordinated changes (e.g. updating a tutorial to a new API), as Git & GitHub do not really support this. Plus “out of sight, out of mind” applies. Since the goal of this effort is to make updating the docs simpler, I think using the same branch is probably the best choice here. Zola is essentially zero-config (it only really requires setting the site URL in config.toml and a few templates), so it should not bloat the repo much.

Opened #757 for the todos.

@jtojnar jtojnar force-pushed the static-docs branch 3 times, most recently from 8094df0 to ef48853 Compare October 30, 2022 22:16
@jtojnar
Contributor Author

jtojnar commented Oct 31, 2022

@mblaney The last remaining question content-wise is what to do with the demo:

  • Upload the static website to the FTP server (instead of GitHub pages) using GitHub actions, keep the demo as a PHP script.
  • Move the demo to a demo.simplepie.org subdomain, occasionally updated manually, and point the website to that.
  • Move only the backend part of the demo to a subdomain, and convert the frontend to use AJAX.
  • Remove the demo completely.

We should also decide on the host we want to use:

  • Use the existing web host
    • ➕ Allows running demo using PHP.
    • ➕ Supports redirects using .htaccess.
    • ➕ No need to fiddle with DNS.
    • ➕ Trivial setup.
    • ➖ No preview support.
  • Netlify
    • ➕ Supports _redirects.
    • ➕ Will create a subdomain for each pull request as a preview.
    • ➖ DNS changes necessary.
    • ➕ Easy setup.
    • ➖ Free plan only allows a single user to access administration.
    • ➕ though most things can be configured in a config file in the repository.
  • Cloudflare Pages
    • ➕ Supports _redirects file.
    • ➕ Will create a subdomain for each pull request as a preview.
    • ➖ DNS changes necessary.
    • ❓ Not sure how hard to set up, I have not used this service yet.
  • GitHub pages
    • ➖ No support for redirects.
    • ➖ No preview support.
    • ➖ DNS changes necessary.
    • ➕ Easy setup.

@mblaney
Member

mblaney commented Dec 2, 2022

Thanks @jtojnar. I wouldn't go for the subdomain option because that involves finding someone who can make DNS changes. It looks like the domain is owned by Automattic; if someone here wants to help make that happen, hopefully they will jump in. Otherwise I would say continue with the other options.

By process of elimination, that means staying with the current host and either removing the demo or possibly using the GitHub Action you've suggested. Happy for you to decide on that.

@jtojnar
Contributor Author

jtojnar commented Dec 2, 2022

@mblaney If we stay with the current host, keeping the demo working is not that hard.

Do you have FTP or SSH credentials for the host? Depending on that, we will need to choose either https://github.com/marketplace/actions/ftp-deploy or https://github.com/marketplace/actions/web-deploy-anything.

And either way, we will need to set up credentials on GitHub: https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository

@mblaney
Member

mblaney commented Dec 13, 2022

I have FTP credentials but I'm not a project owner, so not sure how far I will get. Let me know when you're ready and I can try adding the credentials.

@jtojnar jtojnar force-pushed the static-docs branch 2 times, most recently from 9559027 to 3ed0fc1 Compare January 3, 2023 02:28
@jtojnar jtojnar marked this pull request as ready for review January 3, 2023 02:40
@jtojnar
Contributor Author

jtojnar commented Jan 3, 2023

I have tested this on my repo, seems to work well, including the demo: http://simplepie.ogion.cz/ (use test as username and password)

So it should be ready now.

@mblaney Now you should:

  1. Back up the contents of the FTP server.
  2. Add the following repository secrets on https://github.com/simplepie/simplepie/settings/secrets/actions:

FTP_PASSWORD
FTP_SERVER
FTP_USERNAME

  3. Merge this PR. Please do not squash the commits so that the HTML files are preserved in the git history to allow us to fix Markdown conversion issues if we notice them later.
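
For illustration, the three secrets listed above would typically reach a deploy step as environment variables. The following is only a minimal sketch of such a step, not the action this PR uses: the function names, the `/public_html` remote root, and the use of FTP over TLS are all assumptions, and the host may need plain FTP or a different target directory.

```python
import os
from ftplib import FTP_TLS
from pathlib import Path, PurePosixPath


def read_credentials(env=os.environ):
    """Return (server, username, password) or raise if a secret is missing."""
    names = ("FTP_SERVER", "FTP_USERNAME", "FTP_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return tuple(env[name] for name in names)


def plan_upload(local_root, remote_root="/public_html"):
    """Map every file under local_root to its remote path (hypothetical layout)."""
    local_root = Path(local_root)
    plan = []
    for path in sorted(local_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(local_root).as_posix()
            plan.append((path, str(PurePosixPath(remote_root) / relative)))
    return plan


def upload(local_root, remote_root="/public_html"):
    """Upload the generated site; assumes the remote directories already exist."""
    server, username, password = read_credentials()
    with FTP_TLS(server) as ftp:
        ftp.login(username, password)
        ftp.prot_p()  # encrypt the data channel as well as the login
        for local_path, remote_path in plan_upload(local_root, remote_root):
            with open(local_path, "rb") as handle:
                ftp.storbinary("STOR " + remote_path, handle)
```

Keeping `plan_upload` separate from `upload` makes it possible to inspect what would be transferred without touching the server.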

@mblaney
Member

mblaney commented Jan 12, 2023

Nice one @jtojnar, your demo site looks great! I get a 404 for that Actions URL though.

@jtojnar
Contributor Author

jtojnar commented Jan 12, 2023

@mblaney This is what I see in my fork:

Actions secrets page

Or maybe we need someone with member status on the repo?

@Art4
Contributor

Art4 commented Jan 12, 2023

@jtojnar I noticed some styling errors on the API Docs page. Take a look at the left sidebar:

http://simplepie.ogion.cz/api/

[screenshot]

This is what it looks like atm: http://simplepie.org/api/

[screenshot]

@jtojnar jtojnar force-pushed the static-docs branch 4 times, most recently from 2ba363f to 5feb86f Compare January 12, 2023 13:56
@jtojnar
Contributor Author

jtojnar commented Jan 12, 2023

@Art4 Tweaked the style, should be fixed now.

@Art4
Contributor

Art4 commented Jan 12, 2023

Thank you @jtojnar. I also noted some other things:

  1. The source code view has no indentation: http://simplepie.ogion.cz/api/source-src.SimplePie/#690-695
    [screenshot]
  2. The blog posts are randomly shuffled and don't show the date anywhere. There is also no pagination, but imho that's not important. http://simplepie.ogion.cz/blog/

@jtojnar jtojnar force-pushed the static-docs branch 3 times, most recently from bb9a312 to 9818cd8 Compare January 12, 2023 16:05
@jtojnar
Contributor Author

jtojnar commented May 2, 2023

One concern would be increased repo size:

  1. master branch: 8.5 MiB (4.61 MiB compressed)
  2. this PR: 14.7 MiB (7.25 MiB compressed)
  3. this PR without the original HTML files: 13.9 MiB (6.52 MiB compressed)
Methodology
  1. Cloned the repo with git clone git@github.com:simplepie/simplepie.git
  2. In a copy of ①, I fetched the PR: gh co 756
  3. In a copy of ②, I squashed the Remove original website source commit into Convert website into a static site and fetched the branch into a copy of ① with git fetch ../simplepie2 static-docs-clean and git checkout FETCH_HEAD

Then I ran git fsck; git prune; git gc

For the compressed sizes, I ran git clone -v file://$PWD/simplepie3 $(mktemp -d)

This is not that drastic but it would be a permanent cost going forward so perhaps we should store the backup somewhere else.
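
To put the three measurements above side by side, here is a small worked calculation. It uses only the sizes reported in this comment; the dictionary keys and the `delta` helper are names made up for the sketch.

```python
# Sizes in MiB as reported above: (unpacked, compressed).
SIZES = {
    "master": (8.5, 4.61),
    "pr": (14.7, 7.25),
    "pr_without_html": (13.9, 6.52),
}


def delta(a, b):
    """Return the (unpacked, compressed) growth from b to a, rounded to 2 places."""
    return tuple(round(x - y, 2) for x, y in zip(SIZES[a], SIZES[b]))


total_growth = delta("pr", "master")               # (6.2, 2.64)
html_backup_cost = delta("pr", "pr_without_html")  # (0.8, 0.73)
```

By these numbers, keeping the original HTML files in history accounts for about 0.73 MiB of the 2.64 MiB compressed growth, a little over a quarter.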

Also I noticed just wiki/reference takes 1.1 MiB in the repo unpacked. Maybe we will not want to include it and move the content to PHPDoc comments.

@jtojnar
Contributor Author

jtojnar commented May 3, 2023

@mblaney Actually, looks like WordPress supports export without the need to access the database. Are you able to log into the administration and get the export from http://simplepie.org/blog/wp-admin/export.php? And are you able to download the contents of the FTP server and upload it as an archive here?

@skyzyx
Member

skyzyx commented May 6, 2023

Working on this now. Sorry for the delay.

@skyzyx
Member

skyzyx commented May 6, 2023

@mblaney, @jtojnar:

Added these secrets:

  • SFTP_USERNAME
  • SFTP_PASSWORD

Added these variables:

  • SFTP_SERVER
  • SFTP_PORT

@skyzyx
Member

skyzyx commented May 6, 2023

Backing up the public_html directory. So far, it's several GB. I'll update when the backup, tarballing, and uploading are complete.

@jtojnar
Contributor Author

jtojnar commented May 11, 2023

@skyzyx You can also try uploading the following PHP script and running it to create an archive on the server. It should be much faster than downloading individual files:

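<?php
// Archiving several GB can take a while, so lift the time limit.
set_time_limit(0);
error_reporting(E_ALL);

$archivePath = __DIR__ . '/simplepie_website_backup.zip';
$zip = new ZipArchive();
// open() returns true on success and an integer error code on failure.
if ($zip->open($archivePath, ZipArchive::CREATE) !== true) {
    exit('Could not create zip archive');
}
echo 'Creating zip archive<br>';
$directory = new \RecursiveDirectoryIterator(
    // Or change the directory path.
    __DIR__,
    FilesystemIterator::KEY_AS_PATHNAME | FilesystemIterator::CURRENT_AS_FILEINFO | FilesystemIterator::SKIP_DOTS
);
$iterator = new \RecursiveIteratorIterator($directory);
foreach ($iterator as $info) {
    // Skip the archive itself in case it is left over from an earlier run.
    if ($info->getPathname() === $archivePath) {
        continue;
    }
    echo 'Adding ' . $info->getPathname() . '<br>';
    $zip->addFile($info->getPathname());
}
$zip->close();
echo 'Finished<br>';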

@skyzyx
Member

skyzyx commented May 13, 2023

It took some time to finish the download, but it finally completed at 7.4 GiB. I removed the cache files, tarred the directory, and gzipped it with -9. The resulting archive is just under 50 MB. I uploaded it to the root of the SFTP server.

/public_html_2023-05-11T18-20-00Z.tar.gz
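
The steps described above (drop the cache files, tar the directory, gzip at level 9) could be sketched as follows. This is not the command that was actually run; the function name is hypothetical, and so is the assumption that the cache files live in directories literally named `cache`.

```python
import tarfile
from pathlib import Path


def archive_site(source_dir, archive_path, skip_dirs=("cache",)):
    """Write source_dir to a gzip-compressed tarball, skipping cache directories."""
    source_dir = Path(source_dir)

    def drop_cache(member):
        # Returning None tells tarfile to leave this member (and, for a
        # directory, everything below it) out of the archive.
        if any(part in skip_dirs for part in Path(member.name).parts):
            return None
        return member

    # compresslevel=9 corresponds to `gzip -9`.
    with tarfile.open(archive_path, "w:gz", compresslevel=9) as archive:
        archive.add(source_dir, arcname=source_dir.name, filter=drop_cache)
    return archive_path
```

Filtering while archiving avoids having to delete anything from the downloaded copy first.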

@mblaney
Member

mblaney commented May 18, 2023

thanks @skyzyx that's great. @jtojnar this PR just needs updating for the SFTP change?

@jtojnar
Contributor Author

jtojnar commented May 18, 2023

@mblaney I wanted to use the files from the backup as a base for generating, since the scraping is not perfect (the wget I used for mirroring returns a different set of pages each time, and I noticed a few places where the wiki software produces messed-up HTML).

But since I do not have access to the FTP server, I cannot access the backup. Could you please re-upload it somewhere publicly available?

Also, do you have access to the WordPress administration? The export would be helpful for a similar reason.
It should be available on the following URL if you are able to sign in:
http://simplepie.org/blog/wp-admin/export.php

Otherwise, if the WordPress installation is too broken, could you try getting a database dump, e.g. by uploading a tool like Adminer and exporting the database using the credentials from the wp-config.php file?
https://www.adminer.org/en/

@mblaney
Member

mblaney commented May 18, 2023

Hi @jtojnar, the backup is just the WordPress install, so there is nothing usable like that in it, is there? It appears to be too broken to log in, and I don't want to re-upload it because it contains login credentials (even though I can't use them).

I can try the database dump if you like, but not sure that will provide anything better than scraping?

@jtojnar
Contributor Author

jtojnar commented May 18, 2023

@mblaney IIRC the wiki system stores the content in the directory, so that is the main thing I am after. The issue with scraping is that it is incomplete: there are some pages missing or returning error 500. I managed to get some of them out of the Internet Archive, but a DB dump would be preferred since we can never be certain that wget got everything.
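
One way to check a scrape for completeness, once the wiki's page files are available, is to compare page names on both sides. The sketch below is an assumption-laden illustration: it presumes the wiki keeps one `.txt` file per page (as DokuWiki does under its pages directory), that the scrape holds one `.html` file per page, and that both trees use the same relative layout. The function names are hypothetical.

```python
from pathlib import Path


def page_ids(root, suffix):
    """Return page paths under root, relative to root and without the suffix."""
    root = Path(root)
    return {
        path.relative_to(root).with_suffix("").as_posix()
        for path in root.rglob("*" + suffix)
        if path.is_file()
    }


def missing_pages(backup_pages, scraped_pages):
    """List pages present in the wiki's page files but absent from the scrape."""
    return sorted(page_ids(backup_pages, ".txt") - page_ids(scraped_pages, ".html"))
```

Pages renamed after scraping (for example start pages turned into `_index.html`) would show up as missing, so a comparison like this is best run against the raw mirror.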

jtojnar added 12 commits May 22, 2023 01:29
This was changed in simplepie#745
but without any rationale. The only HTML file is in tests and that should not be manually edited at all.
Ran the following within `nix-shell -I 'nixpkgs=channel:nixos-unstable' -p zola`
to create the website tree:

    zola init docs

Filled in the website URL and disabled everything for now.

Then created templates based on the successive commits.
There are only two markdown files and both use 2 spaces.
Ran the following within `nix-shell -I 'nixpkgs=channel:nixos-unstable' -p wget2 yq-go dos2unix 'python3.withPackages (ps: with ps; [ beautifulsoup4 pypandoc ])' nodePackages.prettier`

```sh
# Download the website contents from the web, and the pages that fail with error 500 from Internet Archive.
wget2 --user-agent 'Mozilla/5.0 (X11; Linux x86_64; rv:106.0) Gecko/20100101 Firefox/106.0' --mirror --force-directories --no-robots --retry-on-http-error=403 --http2-request-window=1 --random-wait --exclude-directories=/wiki/lib/exe/ http://simplepie.org
rm simplepie.org/blog/2006/03/06/forums-powered-by-punbb/index.html
wget2 https://web.archive.org/web/20190404091911/simplepie.org/blog/2006/03/06/forums-powered-by-punbb/ --directory-prefix=simplepie.org/blog/2006/03/06/forums-powered-by-punbb
rm simplepie.org/blog/2012/10/30/simplepie-1-3-1-is-now-available/index.html
sed -i 's/\xbb//' docs/content/blog/2006-03-06-forums-powered-by-punbb.html # fix encoding
wget2 https://web.archive.org/web/20210812123158/https://simplepie.org/blog/2012/10/30/simplepie-1-3-1-is-now-available/ --directory-prefix=simplepie.org/blog/2012/10/30/simplepie-1-3-1-is-now-available

# Copy the downloaded contents into the website tree.
cp -r simplepie.org/* docs/content
mkdir -p docs/static
mv docs/content/{scripts,favicon.ico,images,css,robots.txt} docs/static

# Standardize line endings.
dos2unix docs/**

# Drop API docs, we will generate them later.
rm -r docs/content/api

# Drop mint (analytics), it is abandoned.
rm -r docs/content/mint

# Drop dynamically generated demo pages.
rm -r docs/content/demo/newsblocks

# Drop downloads – we will just link GitHub.
rm docs/content/downloads/*\?* docs/content/downloads/*.zip

# Drop ancient scripts, no more font replacement using flash, or tricks to make PNGs transparent in IE (Sleight).
rm -r docs/static/css/sIFR-* docs/static/scripts/

# Download headers explicitly since they are currently rotated by PHP
# and wget was not able to find them.
rm docs/static/images/headers/rotate-old.php
wget http://simplepie.org/images/headers/rotate-xspf.xml --directory-prefix docs/static/images/headers/
cat docs/static/images/headers/rotate-xspf.xml | yq -p=xml '"http://simplepie.org" + .playlist.trackList.track[].location' | xargs wget --directory-prefix docs/static/images/headers/

# Drop Wordpress plug-in clutter.
rm -r docs/content/blog/wp-{content,includes,json}

# Drop feeds.
rm docs/content/blog/**/feed/index.html
rmdir docs/content/blog/**/feed

# Drop wiki noise.
rm docs/content/wiki/lib/exe/css.php?*
mv docs/content/wiki/lib/tpl/simplepie/wikistyles.css docs/static/css/
mv docs/content/wiki/lib/images/smileys/icon_exclaim.gif docs/static/images/
rm -r docs/content/wiki/lib
rm docs/content/wiki/feed.php*
rm -r docs/content/wiki/{_detail,_export}
find docs/content/wiki/ -name '*\?idx=*' -exec rm '{}' \;
find docs/content/wiki/ -name '*\?do=*' -exec rm '{}' \;
rm docs/content/wiki/_media/wiki/dokuwiki-128.png docs/content/wiki/wiki/dokuwiki
rmdir docs/content/wiki/wiki
mv 'docs/content/wiki/_media/tutorial/update_simplepie_cache.jpg?cache=' 'docs/content/wiki/_media/tutorial/update_simplepie_cache.jpg'
rm docs/content/wiki/_media/tutorial/update_simplepie_cache.jpg\?*

# Add extension to wiki pages.
find docs/content/wiki -type f ! -name '*.jpg' ! -name '*.html' -print0 | xargs -0 -I '{}' mv '{}' '{}.html'

# Remove duplicate wiki page
rm docs/content/wiki/faq/Supported_Character_Encodings.html
echo /wiki/faq/Supported_Character_Encodings /wiki/faq/supported_character_encodings >> docs/static/_redirects
rm docs/content/wiki/plugins/wordpress/simplepie_plugin_for_wordpress.1.html
# Rename start files (used as directory index in DokuWiki) to _index.html used by Zola.
find docs/content/wiki/ -name start.html | sed -E 's#(docs/content/(.*))/start.html#mv "\0" "\1/_index.html"; echo "/\2/start /\2/" >> docs/static/_redirects#g' | sh -

# Simplify blog structure.
rm -r docs/content/blog/page docs/content/blog/index.html
find docs/content/blog/2* -name index.html | sed -E 's#docs/content/blog/(....)/(..)/(..)/(.*)/index.html#mv "\0" "docs/content/blog/\1-\2-\3-\4.html"#g' | sh -
rmdir docs/content/blog/*/*/*/*
rmdir docs/content/blog/*/*/*
rmdir docs/content/blog/*/*
rmdir docs/content/blog/????
ls docs/content/blog/*.html | sed -E 's#docs/content/blog/(....)-(..)-(..)-(.+)\.html#/blog/\1/\2/\3/\4/ /blog/\4/#g' >> docs/static/_redirects

# Prepare redirects for Apache
sed -i 's/^/Redirect 302 /' docs/static/_redirects
mv docs/static/{_redirects,.htaccess}

# Manually extracted main template into templates/.
```
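
The blog redirect step in the script above (the `ls … | sed` pipeline followed by the `Redirect 302` prefix) can be restated in Python to make the mapping easier to read. This is a sketch rather than part of the commit; `blog_redirect` is a hypothetical name, and the pattern is slightly stricter than the original `sed` expression because it only accepts digits in the date.

```python
import re

# Flattened post path: docs/content/blog/YYYY-MM-DD-slug.html
BLOG_FILE = re.compile(r"docs/content/blog/(\d{4})-(\d{2})-(\d{2})-(.+)\.html")


def blog_redirect(path):
    """Return the Apache redirect line for one flattened blog post, or None."""
    match = BLOG_FILE.fullmatch(path)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    # Old WordPress permalink on the left, new Zola URL on the right.
    return f"Redirect 302 /blog/{year}/{month}/{day}/{slug}/ /blog/{slug}/"


line = blog_redirect("docs/content/blog/2006-03-06-forums-powered-by-punbb.html")
# line == "Redirect 302 /blog/2006/03/06/forums-powered-by-punbb/ /blog/forums-powered-by-punbb/"
```

Files that do not follow the date-prefixed naming scheme yield `None` and would simply be skipped.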
Produced by markdownify.py
They are big and many are outdated.
So that Zola does not complain about being broken once we remove the wiki.
Successfully merging this pull request may close these issues.

Update your website/release notes/docs/etc.
4 participants