Create mirrors.cicku.me #1030

Open
wants to merge 5 commits into
base: master

Conversation

cicku

@cicku cicku commented May 10, 2024

Add a new mirror

@jonathanspw
Member

Please follow the geolocation example at the bottom of the page at https://wiki.almalinux.org/Mirrors.html

The system uses this to serve your mirror to users geographically close to your mirror.

@jonathanspw
Member

Just noticed the filename doesn't end in .yml. Please rename the file to mirrors.cicku.me.yml

@cicku
Author

cicku commented May 10, 2024

My mirror is behind a CDN, so which country should I fill in? It is available everywhere. Even cloud_type is not sufficient.

@jonathanspw
Member

I see. Are the files pre-cached basically everywhere or only cached on each edge based on need? I see CF Magic Transit is what's behind it.

This is certainly an interesting situation that we've not encountered yet.

Where is the actual server that sits behind the CDN in this case?

@cicku
Author

cicku commented May 10, 2024

There are 4 bare metal servers powering the mirror:

  1. US West is the main one, with ~160 TB of storage.
  2. US East is the fallback, with ~120 TB of storage.
  3. UK load-balances with US West but is not used much at the moment due to some hardware issues. It has ~80 TB of storage.
  4. Singapore has a ~40 TB server that acts as the fallback for US West when there is a cable issue. It does not hold all mirror files.

I also have additional VPSs that will act as the warm-up site.

Files will not always be stored on each bare metal server, but if more users visit the same edge, the requested files will be cached in the same PoP (referred to as hot cache). If a file is not found in the hot cache, Cache Reserve is used (cold cache). Tiered Cache is also enabled for fast fetching from both hot and cold caches, and all cached files are stored for a certain time period. I have rules written for each project I support, so I can customize them (for example, ISO files do not need to be updated often, so they can stay in the cold cache for a month or two without changes).
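
The hot/cold lookup order described above can be sketched as a toy model. This is only an illustration of the general idea (hot cache first, then cold cache, then the origin server); the class and method names are hypothetical, and it does not model Cloudflare's actual behavior, TTLs, or eviction rules.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TieredCache:
    """Toy model: per-PoP hot cache, shared cold cache, then the origin server."""
    origin: Dict[str, bytes]
    hot: Dict[str, bytes] = field(default_factory=dict)
    cold: Dict[str, bytes] = field(default_factory=dict)

    def fetch(self, path: str) -> Tuple[Optional[bytes], str]:
        """Return (content, tier), where tier names the layer that answered."""
        if path in self.hot:
            return self.hot[path], "hot"
        if path in self.cold:
            # Promote to the hot cache so the next request at this PoP is fast.
            self.hot[path] = self.cold[path]
            return self.cold[path], "cold"
        if path in self.origin:
            # Missed both caches: pull from the origin and fill both tiers.
            data = self.origin[path]
            self.cold[path] = data
            self.hot[path] = data
            return data, "origin"
        return None, "miss"
```

In this model the first request for a file is the slow one, because it has to travel to the origin; that first-request penalty is the cost a cold edge imposes on a user.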


Magic Transit is not directly used by the mirror site (my homepage is the same across different subdomains 💡), though it is behind the scenes. I do have an rsync service behind Spectrum, but it is a private one, and due to bandwidth concerns I do not plan to announce it.

@jonathanspw
Member

CDNs are a bit tricky for the mirror system because they go against what it was designed to do. The mirror system distributes traffic to local mirrors, and having potentially cold caches where files don't currently exist would degrade the user experience.

The best thing will be to set the geolocation on this to the primary location where the files will always be, with a DNS entry tied only to that location, and not one that'd fall back to other locations, which, even if hot, would result in a sub-par user experience by potentially serving users from across the country or world.

@cicku
Author

cicku commented May 31, 2024

My understanding of a modern mirror ecosystem is that a CDN can co-exist with local mirrors, because a CDN may not technically be the best or fastest option; it is for load balancing global traffic instead. A package manager should regularly perform a latency check or speed test and select the best mirror, like fastestmirror in dnf. Since the CDN does not have a bandwidth issue (I can do 0.5 PB in a single day based on load testing), latency will be the only concern.
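
As a rough illustration of the client-side latency check mentioned above, the sketch below times a TCP connection to each candidate host and picks the quickest one. The function names are hypothetical and this is not how dnf implements fastestmirror; it only shows the general idea of choosing by measured latency.

```python
import socket
import time
from typing import Dict, Optional


def connect_latency(host: str, port: int = 443, timeout: float = 2.0) -> Optional[float]:
    """Seconds taken to open a TCP connection, or None if the attempt fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def fastest(latencies: Dict[str, Optional[float]]) -> Optional[str]:
    """Return the host with the lowest measured latency, ignoring unreachable ones."""
    reachable = {host: t for host, t in latencies.items() if t is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)
```

A caller would build the dictionary with something like `{h: connect_latency(h) for h in hosts}` and pass it to `fastest`.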

> The best thing will be to set the geolocation on this to the primary location where the files will always be, with a DNS entry tied only to that location

I have a long list of subdomains like jp.mirrors.cicku.me that only serve traffic around Japan. If you need them, I can also provide them that way. We can just try a few for testing before adding all of them to the list.

@cicku
Author

cicku commented May 31, 2024

I do have one question about the YAML format: should I put all subdomains in a single file, or create a file for each one?

@jonathanspw
Member

> My understanding of a modern mirror ecosystem is that a CDN can co-exist with local mirrors, because a CDN may not technically be the best or fastest option; it is for load balancing global traffic instead. A package manager should regularly perform a latency check or speed test and select the best mirror, like fastestmirror in dnf. Since the CDN does not have a bandwidth issue (I can do 0.5 PB in a single day based on load testing), latency will be the only concern.

We do not rely on fastestmirror. Instead, our mirror system does the logic to try to serve the best mirror to you, and that's why CDNs don't play nicely with it. Since we expect one mirror to represent one location for our geolocation logic, a CDN that can serve from any number of places poses a problem.

Furthermore, when said CDN has an endpoint that dies and it redirects traffic to another endpoint, that is great, but it gives the user sub-par performance if the traffic is redirected to one far away. We can do a better job of removing problematic mirrors from the list and redirecting users to other mirrors that are close to them, rather than the CDN doing potentially less-than-ideal things.
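
To make the "one mirror, one location" assumption concrete, here is a minimal sketch of server-side selection: drop unhealthy mirrors, then pick the one nearest to the user by great-circle distance. The `Mirror` type, the `healthy` flag, and the use of haversine distance are assumptions for illustration; the thread does not show the mirror system's actual algorithm.

```python
import math
from typing import Iterable, NamedTuple, Optional


class Mirror(NamedTuple):
    name: str
    lat: float
    lon: float
    healthy: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_mirror(mirrors: Iterable[Mirror], user_lat: float, user_lon: float) -> Optional[Mirror]:
    """Return the nearest healthy mirror, or None if none are healthy."""
    candidates = [m for m in mirrors if m.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda m: haversine_km(user_lat, user_lon, m.lat, m.lon))
```

Selection of this kind needs exactly one coordinate pair per mirror, and a CDN hostname that answers from many PoPs has no single pair to give.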

Having said all that - if you have records for each of your locations, and it sounds like you do, the solution here is to create a mirror entry in mirrors.d for each location, with DNS that goes directly to it and doesn't hit CDN/fallback logic on your end. Then you can also provide accurate location data for each mirror and we can serve it to users accordingly. Preferably these locations have direct storage of the files and don't rely on a caching architecture where files could get removed - again, it's all about providing the best experience to end users (translation: fast dnf transactions/downloads).
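
A per-location layout like this could be prepared with a small helper that emits one entry per hostname and refuses duplicates. This is only a sketch: the field names (`name`, `address`, `geolocation`), the `/almalinux/` path, and the example hostnames are assumptions for illustration, not the project's confirmed schema. The authoritative format is the example on the wiki page linked earlier in the thread.

```python
from typing import Dict, List


def build_entries(locations: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Build one mirror entry per location; each location needs its own hostname."""
    entries: List[Dict[str, object]] = []
    seen = set()
    for loc in locations:
        host = loc["host"]
        if host in seen:
            # Two locations sharing a hostname would defeat per-location geolocation.
            raise ValueError(f"duplicate host: {host}")
        seen.add(host)
        entries.append({
            "filename": f"{host}.yml",  # one file per mirror, ending in .yml
            "name": host,
            "address": {"https": f"https://{host}/almalinux/"},  # hypothetical path
            "geolocation": {"country": loc["country"], "city": loc["city"]},
        })
    return entries
```

For example, calling it with two locations such as `{"host": "us-west.mirror.example.org", "country": "US", "city": "Los Angeles"}` and `{"host": "sg.mirror.example.org", "country": "SG", "city": "Singapore"}` yields two entries, each with its own filename and location data.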

If you must use a hot/cold caching architecture, then there are some TTLs we could provide that would result in good UX, but you'd have to bypass any rules that'd remove things based on infrequent access... but again, direct storage is much preferred.

Thanks for working on setting up mirroring, it is very much appreciated :) Let me know what you think about my comments.

2 participants