Migrate git-scm.com to a static site, generated via Hugo, served via GitHub Pages #1804
In the current effort to migrate https://git-scm.com/ to a static Hugo site (see git#1804), we saw a bogus tag that would confuse Hugo. We also saw a now-unused banner that we probably do not want to bother migrating to Hugo. So let's drop both. Signed-off-by: Johannes Schindelin <[email protected]>
🎉 This is great! Thank you so much for picking this up! The demo site looks great!
👋 Sneaking in here with some thoughts from the search side! On first interactions, the search has some notable issues compared to the production rails search, for a few reasons on both sides of the fence.
(Amazing work migrating this to Hugo! ❤️)
Oh wow, Mr Pagefind himself! I'm honored to meet you, @bglw!
I kind of wanted to be able to find stuff in old versions that is no longer present in current versions. That's why I added dscho@e9fa963.
Excellent!
Heh, thank you for that!
Right, I had not worked on that because I hoped that the sorting by relevance would be "good enough"...
About Heroku
That is true, but there has been an update since that 2022 mail. https://lore.kernel.org/git/ZRHTWaPthX%[email protected]/
It does seem like the PLC is still in favor of moving to a static solution, though. https://lore.kernel.org/git/[email protected]/
About the preview:
Search
That is true. And in both the search results page as well as the little preview (
Minor issues
There are some broken links in the preview on https://dscho.github.io/git-scm.com/docs/ that lead to https://dscho.github.io/docs/ <topic>
There's a broken link on https://dscho.github.io/git-scm.com/about/free-and-open-source/ to https://dscho.github.io/git-scm.com/trademark. On the live site that redirects from https://git-scm.com/trademark to https://git-scm.com/about/trademark (dscho#1)
The "Setup and Config" headline on https://dscho.github.io/git-scm.com/docs/ is blue in the preview, but not in the live site. This is not happening for me in local testing.
There's some redirect that swallows anchors. https://dscho.github.io/git-scm.com/docs/ links to https://dscho.github.io/git-scm.com/docs/git#_git_commands , which redirects to https://dscho.github.io/git-scm.com/docs/git/ instead of https://dscho.github.io/git-scm.com/docs/git/#_git_commands
https://dscho.github.io/git-scm.com/downloads/mac/ has an odd grammar issue that https://git-scm.com/download/mac doesn't. (dscho#2) It says
https://git-scm.com/download/mac correctly says
Also note the slight URL change there from download to downloads. There is a redirect for that, though, so that should be fine.
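To make the anchor-swallowing redirect noted above concrete, here is a minimal sketch of how a client-side redirect could carry the fragment over. It assumes the redirect is performed in the browser; how the preview site actually redirects is not stated in this thread, and `redirectTarget` is a hypothetical helper name.

```javascript
// Hypothetical helper: compute where a redirect should land, carrying over
// the fragment (and query string) of the URL the visitor originally requested.
// Assumption: the redirect happens client-side, where the fragment is visible.
function redirectTarget(requestedUrl, targetUrl) {
  const requested = new URL(requestedUrl);
  // Resolve the target relative to the requested URL so relative targets work.
  const target = new URL(targetUrl, requested);
  // Only fill in what the target does not specify itself.
  if (!target.hash) target.hash = requested.hash;
  if (!target.search) target.search = requested.search;
  return target.href;
}

console.log(
  redirectTarget(
    "https://dscho.github.io/git-scm.com/docs/git#_git_commands",
    "https://dscho.github.io/git-scm.com/docs/git/"
  )
);
```

In a page, the result would be handed to something like `location.replace(...)`; that part is left out so the sketch stays runnable outside a browser.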
One additional note: There is a commit about porting the old 404 page, 18a3ac2, but I've only seen the generic GitHub Pages 404 page on the preview in my testing.
Switching to Pagefind also changed search behaviour in another way. The Rails site always searches the English content. Pagefind defaults to what they call multilingual search, i.e. searching only pages in the same language as the one you're searching from. That's theoretically a usability improvement, but with the partial nature of our non-English content (availability of any given language can vary from man page to man page, the book exists in languages that don't have any man pages, everything else only exists in English), we might need a fallback to English here. Pagefind offers an option to force all pages to be indexed as English, but I think we can slightly abuse
Partial review. Only looked at the first 47 commits
This addresses that part of git#1804 (comment): There are some broken links in the preview on https://dscho.github.io/git-scm.com/docs/ that lead to https://dscho.github.io/docs/ <topic> Signed-off-by: Johannes Schindelin <[email protected]>
I managed to fix it via 2d0f6c8
Hmm. The more I think about it, the more I get convinced that the older versions of the manual pages should be excluded from the search. I thought it was a feature, but it looks as if it incurs more downsides than upsides.
this was a major effort @dscho, thank you very much! sorry for the silence, but i've been busy with other stuff. in the meantime, and to ensure this effort won't be wasted, can you summarize what you need to make this merge-ready? what do you still need to tackle? where do you need help from other people? :)
@pedrorijo91 Yes.
The big blocker is the "live search" one.
Oh, and there's a ton of work still needed to address @rimrul's excellent feedback.
There is not (yet) any official Lychee version that supports `--fallback-extensions`. Once that's available, this needs to be changed to use `lycheeverse/lychee-action@v1` instead. Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Updated via the `update-book.yml` GitHub workflow.
WIP deploy: check for broken links

TODO: There is not (yet) any official Lychee version that supports `--fallback-extensions`. Once that's available, this needs to drop the `lycheeVersion: nightly` line.

Broken links are quite an annoyance for readers, and at least for links that point within the same static website, there are tools to help identify those. One such tool is called `lychee`. It already has a number of very useful options and even sports a GitHub Action for easy integration into GitHub workflows.

Lychee was taught the trick needed to support checking links in a static website with "pretty URLs" (i.e. URLs lacking the `.html` file extension even though the files backing those URLs do have that extension, something GitHub Pages supports). With this mode, the automation that deploys https://git-scm.com/ can make use of that link checker.

Seeing as broken links often originate from repositories outside of https://github.com/git/git-scm.com's control, rather than failing deployment when broken links are detected, let's follow the "best effort" strategy and open a ticket about the broken links while still letting the deployments complete.

Signed-off-by: Johannes Schindelin <[email protected]>
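As an illustration of the "pretty URLs" handling described in the commit message above, the following sketch lists the files a link checker might try for a given link. This is not Lychee's actual implementation; `candidateFiles` and its rules are assumptions made for the example.

```javascript
// Hypothetical sketch: which files could back a "pretty URL" on a static site?
// Simplification: any dot in the last path segment is treated as an extension.
function candidateFiles(urlPath, fallbackExtensions = ["html"]) {
  // Query string and fragment do not affect which file backs the URL.
  const clean = urlPath.split(/[?#]/)[0];
  // A trailing slash means "the index page of this directory".
  if (clean.endsWith("/")) return [`${clean}index.html`];
  const lastSegment = clean.slice(clean.lastIndexOf("/") + 1);
  // The link already names a file with an extension: nothing to guess.
  if (lastSegment.includes(".")) return [clean];
  // Otherwise try the bare path, each fallback extension, then a directory index.
  return [
    clean,
    ...fallbackExtensions.map((ext) => `${clean}.${ext}`),
    `${clean}/index.html`,
  ];
}

console.log(candidateFiles("/docs/git#_git_commands"));
```

A checker would report the link as broken only if none of the candidates exists in the generated site.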
Changes
This Pull Request adjusts the existing files such that the site is no longer served via a Rails App, but by GitHub Pages instead. A preview can be seen here: https://dscho.github.io/git-scm.com/ (which is generated and deployed from this Pull Request's branch, and will be updated via automation whenever that branch changes).
It is the culmination of a very long, and large, effort that started in February 2017 with the first attempt to migrate the site to Jekyll. Several years, and a substantial effort by @spraints, @vdye and myself, later, here is the result: No longer a Jekyll site but a Hugo site (because of render times: 20 minutes vs 30 seconds), with search implemented using Pagefind.
The main themes of the subsequent migration from the Rails App to a Hugo-generated static site are:
We move the original Rails App files that contain Rails code mixed into HTML to content/, where the files defining the pages live in the Hugo world, then modify them to drop the Rails code and replace it with Hugo constructs. More often than not, we separate the commits that move the files from the commits that adjust the contents, to help Git realize that there has been a move (as opposed to a delete/add). This allows for noticing upstream changes that need to be reflected in moved & modified files when rebasing to upstream.
In Hugo setups, the files live in the following locations:
hugo.yml
This is the central configuration file that tells Hugo how to render the site.
content/
This defines the content of the pages that are served. Only a subset of Hugo's functionality is available here (the idea is to leave the complicated stuff to the layout used to render the pages).
Most, but not all, of the files living in this directory tree are HTML files that are generated (and then committed) using external repositories, e.g. the ProGit book and its translations.
layouts/
This is where the "boiler plate" is defined that ties the site together, i.e. the header, the footer and the sidebar as well as the main scaffolding in which the pages' content is to be rendered.
This is the location where most of Hugo's functionality is available and complex stuff can happen such as looping or accessing site parameters.
layouts/partials/
This directory contains recurring templates, i.e. reusable partial layouts that are used to structure the elements of the site. This includes the side bar, how videos are rendered, etc.
layouts/shortcodes/
This directory contains so-called "shortcodes", i.e. reusable elements similar to partial layouts. The major difference is that shortcodes can be used within content/ while partial layouts can only be used from within layouts/.
See https://gohugo.io/content-management/shortcodes/ for more information on this topic.
static/
These files are not processed by Hugo, but copied as-is. Good for images, for example.
assets/
These files are processed in specific ways. That is where the SASS-based style sheets live, for example.
data/
These files define metadata that can be used in Hugo's functions. For example, it contains the list of documentation categories that are rendered in various ways.
In contrast to most Hugo-managed sites, we will refrain from using a Hugo theme, and instead stick with the existing style sheets.
Likewise, we refrain from using Markdown at all: The existing site did not use it, therefore it makes little sense to start using it now.
In addition to Hugo's directories, we also have these:
script/
This directory contains scripts to perform recurring tasks such as rendering Git's manual pages into HTML that are then stored inside content/docs/.
For historical reasons, these are Ruby scripts for the most part, as it is easier to follow the development when that functionality is extracted from the current Rails App and turned into Ruby scripts that can be run stand-alone.
.github/workflows/ and .github/actions/
The latter directory contains a file that defines a custom GitHub Action that compensates for the lack of Hugo support in GitHub Pages: By default, only Jekyll sites are supported out of the box, while Hugo sites require a custom GitHub workflow to deploy the site.
The former directory contains files that define GitHub workflows that are typically run on a schedule, updating the various parts that are generated from external sources: the Git version, the ProGit Book, manual pages, etc. These workflows essentially keep the rendered HTML files in content/ up to date with the respective external repositories.
These workflows can be seen in action (pun intended) here: https://github.com/dscho/git-scm.com/actions
_generated-asciidoc/
This directory serves as a cache of "expanded AsciiDoc": many of Git's manual pages include content from other files, and therefore it is non-trivial to determine whether or not a manual page has changed and needs to be re-rendered (essentially, the only way is to expand them by inlining the included files). Caching this content speeds up updating the manual pages drastically.
Most of the core logic lives in layouts/. Hugo distinguishes between logic that is allowed in layouts/ and logic that is allowed in content/; the latter can only access so-called "shortcodes" (https://gohugo.io/content-management/shortcodes/). These shortcodes are free to use the entire set of Hugo's functionality.
tl;dr whenever we need to do something complicated that is confined to only a few pages, we have to implement it in layouts/shortcodes/ and insert the corresponding {{< shortcode-name >}} in the page itself. Whenever we need to do something complicated that is used in more places, it is implemented elsewhere in layouts/.
Some of the logic that cannot be performed statically (such as telling the user how long ago the latest macOS installer was released, or adjusting the Windows downloads to reflect the CPU architecture indicated by the current user agent) is implemented using JavaScript instead.
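The two client-side examples mentioned above could look roughly like the following sketch. The function names, the wording of the output and the user-agent patterns are assumptions for illustration, not the code used in this Pull Request.

```javascript
// Hypothetical helper: describe how long ago a release happened.
// The static HTML would carry the release date; the browser fills in the rest.
function timeAgo(releaseDate, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const days = Math.floor((now - releaseDate) / msPerDay);
  if (days < 1) return "today";
  if (days < 30) return days === 1 ? "1 day ago" : `${days} days ago`;
  if (days < 365) {
    // Approximate months as 30 days; cap at 11 so we never say "12 months".
    const months = Math.min(11, Math.floor(days / 30));
    return months === 1 ? "about 1 month ago" : `about ${months} months ago`;
  }
  const years = Math.floor(days / 365);
  return years === 1 ? "about 1 year ago" : `about ${years} years ago`;
}

// Hypothetical helper: guess the Windows CPU architecture from a user agent.
// User agents are unreliable (ARM64 devices may report themselves as x64),
// so the result should only pre-select a download, never hide the others.
function guessWindowsArch(userAgent) {
  if (/\b(ARM64|aarch64)\b/i.test(userAgent)) return "arm64";
  if (/\b(Win64|WOW64|x64|x86_64|amd64)\b/i.test(userAgent)) return "x64";
  return "x86";
}

console.log(timeAgo(new Date("2024-01-01T00:00:00Z"), new Date("2024-03-01T00:00:00Z")));
console.log(guessWindowsArch("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
```

Keeping both helpers free of DOM access means they can be exercised outside the browser; the page-specific part then reduces to reading a date or `navigator.userAgent` and updating an element.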
The site search needs to move to the client side, as there is no longer a server that can perform that functionality. Luckily, Pagefind (https://pagefind.app/) has matured in the meantime: it is a very performant client-side search solution implemented in JavaScript that relies on a search index that is generated at build time and that is served incrementally, as needed, via static files. This is what we use, then.
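For readers unfamiliar with Pagefind, here is a minimal sketch of querying it from JavaScript. The `topResults` helper is hypothetical, and the module is passed in as a parameter so that the function can be tried with a stub; in a browser it would typically be loaded via `await import("/pagefind/pagefind.js")`, where the path is an assumption that depends on where the index is published.

```javascript
// Sketch: run a query against a Pagefind-like module and load the top results.
// `pagefind` is expected to offer `search(query)`, resolving to an object whose
// `results` entries each expose an async `data()` loader.
async function topResults(pagefind, query, limit = 5) {
  const search = await pagefind.search(query);
  // Results are loaded lazily, so only fetch the data for the ones we show.
  const top = search.results.slice(0, limit);
  const loaded = await Promise.all(top.map((result) => result.data()));
  return loaded.map((data) => ({
    url: data.url,
    title: data.meta ? data.meta.title : undefined,
    excerpt: data.excerpt,
  }));
}

// Stub standing in for the real module, for illustration only.
const stubPagefind = {
  search: async (query) => ({
    results: [1, 2, 3].map((i) => ({
      id: String(i),
      data: async () => ({
        url: `/docs/${query}-${i}/`,
        excerpt: `match ${i}`,
        meta: { title: `Result ${i}` },
      }),
    })),
  }),
};

topResults(stubPagefind, "git", 2).then((results) => console.log(results));
```

Limiting the number of loaded results matters here because each result's data lives in a separate static fragment that is fetched on demand.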
Context
Changes required to finalize the migration in addition to this Pull Request
This Pull Request is not actually meant to be merged, not to the main branch at least, but to the (not-yet-existing) gh-pages branch.
To successfully deploy to GitHub Pages, the Pages configuration needs to be switched from "Deploy from a branch" to "GitHub Actions".
Once everything is golden in this Pull Request and the decision to move to GitHub Pages is final, git-scm.com needs to be pointed to GitHub Pages (read: CNAME needs to be configured to make use of the GitHub Pages-deployed site).
The Pull Request branch could actually be pushed to gh-pages already way before closing this Pull Request, as https://git-scm.github.io/ would be serving a different site than https://git-scm.com/ before the CNAME entry is adjusted.
Why make these changes?
hugo serve -w, then editing the files to your heart's content.