Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host Redirector plugin #1796

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

Mis1eader-dev
Copy link
Member

#1783

Needs path redirection

I'm aware the pull request includes a small edit to SlashRemover, I believe it doesn't need its own PR for a small change.

@Mis1eader-dev
Copy link
Member Author

Mis1eader-dev commented Sep 26, 2023

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

  1. PreRedirectHandler
  2. PathRedirectHandler
  3. PostRedirectHandler
  4. PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

@Mis1eader-dev
Copy link
Member Author

I noticed the formatter doesn't move & ampersands next to the identifiers.

const string& host = ...;

Will not get formatted to:

const string &host = ...;

Is it perhaps a bug, or did formatting rules get changed to allow both styles?

@an-tao
Copy link
Member

an-tao commented Sep 26, 2023

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

  1. PreRedirectHandler
  2. PathRedirectHandler
  3. PostRedirectHandler
  4. PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

@an-tao
Copy link
Member

an-tao commented Sep 26, 2023

I noticed the formatter doesn't move & ampersands next to the identifiers.

const string& host = ...;

Will not get formatted to:

const string &host = ...;

Is it perhaps a bug, or did formatting rules get changed to allow both styles?

Both styles seem to be present in the current code, you can try the configuration of clang-format and choose the second style.

@Mis1eader-dev
Copy link
Member Author

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

  1. PreRedirectHandler
  2. PathRedirectHandler
  3. PostRedirectHandler
  4. PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

Yeah, and we don't have another choice. The only way to not introduce another group is to integrate this plugin into Redirector itself.

I have added the 3rd group and renamed the handlers for better clarity, should I revert it?

We can think of it the same as AOP callbacks, we can say they also are complex in a way without reading the documentation and seeing the picture visualization.

@rbugajewski
Copy link
Collaborator

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

  1. PreRedirectHandler
  2. PathRedirectHandler
  3. PostRedirectHandler
  4. PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

I also think that it is kind of over-engineered for most use cases and makes the API harder to understand for new users. I’m in favor of keeping things simple.

@Mis1eader-dev Mis1eader-dev marked this pull request as ready for review September 27, 2023 08:26
@Mis1eader-dev Mis1eader-dev marked this pull request as draft September 27, 2023 08:40
@Mis1eader-dev
Copy link
Member Author

Mis1eader-dev commented Oct 1, 2023

This is a rough idea on implementing it using multiple nested unordered maps, it is complex and I don't plan on implementing it that way, it generates a map with common characters split into nested nodes. In case anybody is interested in changing the implementation in the future. The data structure for this is called a Trie.
image

It will be easy to use:

// Some nodes need to store information like how many folders are common within that node,
// for example "/common/places" has a common depth of 2, once the path substring lookup
// gets to "/common/places/open", it will substring with a depth of 2 and looks up
// ["/common/places"] within the unordered map of that node, then substring further up until "/open".
const auto &ruleTo = rulesFrom["www.example.com"]["/maps"]["/common/places"]["/open"];

A vector of strings may be more desirable and easier to implement for this plugin. I will go for this route.

Regarding introducing a new group for Redirector, we can discuss it once this plugin's implementation is finished.

Edit: Using vectors is a hassle, there still is a need to know common points within the rules, using unordered maps is the way to go.

@Mis1eader-dev
Copy link
Member Author

Mis1eader-dev commented Oct 2, 2023

This is still a draft
image

The shown diagram redirects in the following fashion:
10.10.10.1 -> files.example.com
10.10.10.1/any -> www.example.com/any

10.10.10.1/files -> files.example.com (Doesn't have a strict match rule, so it redirects even if there is no subpath)
10.10.10.1/files/video -> files.example.com/video
10.10.10.1/files/video/asia.mp4 -> files.example.com/video/asia.mp4

10.10.10.1/files/map -> www.example.com/maps
10.10.10.1/files/map/asia -> www.example.com/maps/asia
10.10.10.1/files/maps -> www.example.com/maps
10.10.10.1/files/maps/asia -> Does not redirect, the rule is strict, not wildcard

10.10.10.1/files/docs -> files.example.com/docs
10.10.10.1/files/docs/drogon.docx -> files.example.com/docs/drogon.docx

10.10.10.1/files/img -> files.example.com/imgs
10.10.10.1/files/img/asia.png -> Does not redirect, the rule is strict, not wildcard
10.10.10.1/files/imgs -> files.example.com/imgs
10.10.10.1/files/imgs/asia.png -> Does not redirect, the rule is strict, not wildcard


The code of said draft

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using std::cout;
using std::string;

struct RedirectTo
{
	string host, path;
};

struct RedirectPath
{
	std::unordered_map<string, RedirectPath*> paths;
	size_t minPathLen = SIZE_MAX;
	size_t minPathDepth = SIZE_MAX;
	RedirectTo* wildcard;
	RedirectTo* to;
};

static void recurse(const RedirectPath* parent, size_t tabs = 1)
{
	if(!parent)
		return;

	const size_t moreTab = tabs + 1;
	for(const auto& [childPathStr, childPathObj] : parent->paths)
	{
		bool newline = false;
		if(childPathObj->to)
		{
			for(size_t t = tabs; t; --t)
				cout << '\t';

			cout << childPathStr;

			cout << " -> " << childPathObj->to->host << childPathObj->to->path;

			newline = true;
		}

		if(childPathObj->wildcard)
		{
			if(newline)
				cout << '\n';

			for(size_t t = tabs; t; --t)
				cout << '\t';

			cout << childPathStr;

			cout << " (*)";
			cout << " -> " << childPathObj->wildcard->host << childPathObj->wildcard->path << "/*";

			newline = true;
		}

		if(newline)
			cout << '\n';
		else
		{
			for(size_t t = tabs; t; --t)
				cout << '\t';
			cout << childPathStr << '\n';
		}

		recurse(childPathObj, moreTab);
	}
}

int main()
{
	std::unordered_map<string, RedirectPath> rulesFrom;
	std::vector<RedirectTo> rulesTo;
	rulesTo.push_back({"www.example.com"});
	rulesTo.push_back({"files.example.com"});
	rulesTo.push_back({"files.example.com", "/imgs"});
	rulesTo.push_back({"files.example.com", "/docs"});
	rulesTo.push_back({"www.example.com", "/maps"});

	{
		auto rp0 = rulesFrom["10.10.10.1"].paths["/"] = new RedirectPath;
		rp0->wildcard = &rulesTo[0];
		rp0->to = &rulesTo[1];
	}

	{
		auto rp0 = rulesFrom["10.10.10.1"].paths["/files"] = new RedirectPath;
		rp0->wildcard = &rulesTo[1];
		{
			{
				auto rp1 = rp0->paths["/img"] = new RedirectPath;
				rp1->wildcard = &rulesTo[2];
				{
					auto rp2 = rp1->paths[""] = new RedirectPath;
					rp2->to = &rulesTo[2];
				}
				{
					auto rp2 = rp1->paths["s"] = new RedirectPath;
					rp2->to = &rulesTo[2];
				}
			}
			{
				auto rp1 = rp0->paths["/map"] = new RedirectPath;
				{
					auto rp2 = rp1->paths[""] = new RedirectPath;
					rp2->wildcard = &rulesTo[4];
				}
				{
					auto rp2 = rp1->paths["s"] = new RedirectPath;
					rp2->to = &rulesTo[4];
				}
			}
		}
		{
			auto rp1 = rp0->paths["/docs"] = new RedirectPath;
			rp1->wildcard = &rulesTo[3];
		}
	}

	for(const auto& [host, redirects] : rulesFrom)
	{
		cout << host << ":\n";
		recurse(&redirects);
	}
}

The final implementation should do some checks to determine common points and make the data structure compact, and cache some information for quick execution during runtime after the initial mapping from startup.

If anybody has a better idea, feel free to share.

Edit: Change of algorithm, the maps can be populated using inequality boundaries and length exhaustion.

// 10.10.10.1/user/file
// 10.10.10.1/user/files/*
// 10.10.10.1/user/files/audio
// 10.10.10.1/user/files/video/*
//
// 10.10.10.1/global/*
//
// 10.10.10.1/user/crocodile

rulesFrom = {
	"10.10.10.1": {
		maxPathLen = 6,
		"/user/": {
			maxPathLen = 4,
			"file": {
				to = ...,
				maxPathLen = 1,
				"s": {
					wildcard = ...,
					maxPathLen = 6,
					"/audio": {
						to = ...,
						maxPathLen = 0
					},
					"/video": {
						wildcard = ...,
						maxPathLen = 0
					}
				}
			},
			"croc": {
				maxPathLen = 5,
				"odile": {
					to = ...,
					maxPathLen = 0
				}
			}
		},
		"/globa": {
			maxPathLen = 1,
			"l": {
				wildcard = ...,
				maxPathLen = 0
			}
		}
	}
}

@Mis1eader-dev
Copy link
Member Author

Mis1eader-dev commented Oct 3, 2023

Through experimenting, I came up with two five methods of populating the nodes

1. Nearest slash separator

This splits out nodes based on the closest slash in a specific window.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 5,
        "/user": {
            maxPathLen = 5,
            "/file": {
                maxPathLen = 1,
                "s": {
                    to = ...,
                    maxPathLen = 0
                }
            },
            "/apps": {
                to = ...,
                maxPathLen = 0
            }
        },
        "/usr/": {
            maxPathLen = 3,
            "lib": {
                to = ...,
                maxPathLen = 0
            },
            "inc": {
                maxPathLen = 4,
                "lude": {
                    to = ...,
                    maxPathLen = 0
                }
            }
        }
    }
}

Pros:

  • Less tree depth, saves some memory.
  • Less CPU usage, from doing less map lookups and hash calculations.
  • Less overall memory usage due to lesser tree depths, meaning less maps.

Cons:

  • Slightly more memory usage for data, as it may store duplicate prefixes.
  • Complex to implement.

2. Common letter separator

Splits out nodes based on intersecting letters in a specific window.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 2,
        "/us": {
            maxPathLen = 2,
            "er": {
                maxPathLen = 1,
                "/": {
                    maxPathLen = 4,
                    "file": {
                        maxPathLen = 1,
                        "s": {
                            to = ...,
                            maxPathLen = 0
                        }
                    },
                    "apps": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            },
            "r/": {
                maxPathLen = 3,
                "lib": {
                    to = ...,
                    maxPathLen = 0
                },
                "inc": {
                    maxPathLen = 4,
                    "lude": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            }
        }
    }
}

Pros:

  • Slightly less memory usage for data, it doesn't store any duplicates.

Cons:

  • More tree depth, in cases where there are too many similar looking paths.
  • More CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
  • More overall memory usage due to increased tree depth.
  • Complex to implement.

I choose the first implementation. I listed both in case others favor the 2nd implementation.

Edit:

3. Local length boundary separator

Splits out nodes on local length boundaries.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 8,
        "/user/fi": {
            maxPathLen = 3,
            "les": {
                to = ...,
                maxPathLen = 0
            }
        },
        "/user/ap": {
            maxPathLen = 2,
            "ps": {
                to = ...,
                maxPathLen = 0
            }
        }
        "/usr/lib": {
            to = ...,
            maxPathLen = 0
        },
        "/usr/inc": {
            maxPathLen = 4,
            "lude": {
                to = ...,
                maxPathLen = 0
            }
        }
    }
}

Pros:

  • Lesser tree depth than 1st method.
  • Lesser CPU usage, from doing less map lookups and hash calculations.
  • Lesser overall memory usage due to lesser tree depths, meaning less maps.

Cons:

  • More memory usage for data, it can store multiple duplicate prefixes.
  • Less complex to implement. However, still requires recursive population.

4. Global length boundary separator

Splits out nodes on length boundaries across all rules.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 8,
        "/user/fi": {
            maxPathLen = 2,
            "le": {
                maxPathLen = 1,
                "s": {
                    to = ...,
                    maxPathLen = 0
                }
            }
        },
        "/user/ap": {
            maxPathLen = 2,
            "ps": {
                to = ...,
                maxPathLen = 0
            }
        }
        "/usr/lib": {
            to = ...,
            maxPathLen = 0
        },
        "/usr/inc": {
            maxPathLen = 2,
            "lu": {
                maxPathLen = 1,
                "d": {
                    maxPathLen = 1,
                    "e": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            }
        }
    }
}

Pros:

  • Easy to implement.

Cons:

  • More tree depth, in cases where there are gradual length differences (staircase).
  • Slightly more CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
  • More overall memory usage due to increased tree depth.

5. Sorted vector and binary search

Inserts each strict rule into an unordered map with path length as key, and has an unordered map within each one with the path as key, and its to pointer as value.
Inserts each wildcard rule into a vector in a sorted manner, and during runtime it performs binary search for length, and checks those below the request path length to see which one matches.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/maps
10.10.10.1/usr/include
10.10.10.1/usr/include/*
10.10.10.1/usr/bin/*

rulesFrom = {
    "10.10.10.1": {
        8: {
            "/usr/lib": ...
        },
        10: {
            "/user/apps": ...,
            "/user/maps": ...
        },
        11: {
            "/user/files": ...
        },
        12: {
            "/usr/include": ...
        }
    ]
}
rulesFromWildcard = {
    "10.10.10.1": [
        {
            "/usr/bin", (8)
            wildcard = ...
        },
        {
            "/usr/include", (12)
            wildcard = ...
        }
    ]
}

Pros:

  • Easiest to implement.

Cons:

  • More memory usage for data, it stores multiple duplicate prefixes.
  • Two roundtrips in the worst case scenario, as it first looks in the strict rules map, if it didn't find the path, then it looks within the wildcard map. If the request path is longer than every wildcard rule's path, it ends up going through all wildcard rules to check against the request path.

Uncertainties:

  • CPU usage can't be determined, the cost is the same as performing binary search on a sorted string container. Can't be compared to previous implementations, since they use unordered maps primarily.

I'm thinking of method 1, 4, or 5. Will update on which one I decide to implement.

Update: Decided on method 3, it seems the most efficient in terms of CPU usage, memory, and runtime speed.

@Mis1eader-dev
Copy link
Member Author

The implementation of the third method is complete. See the standalone source code and let me know if there is a need for any changes before implementing it within this drogon plugin.

@Mis1eader-dev Mis1eader-dev marked this pull request as ready for review October 9, 2023 06:33
@Mis1eader-dev
Copy link
Member Author

TODO: Add support for dynamic addition and removal of rules.
This final feature will allow this drogon plugin to create applications like Cloudflare Pages.

We can start discussing where this plugin ends up in the drogon ecosystem.

@Mis1eader-dev
Copy link
Member Author

Currently the plugin has support for normal redirects and mandated splat, if we go off based on Cloudflare Pages redirects
https://developers.cloudflare.com/pages/platform/redirects/#advanced-redirects

I will add another TODO: Allow splatting to be optional, by having an asterisk added to the To rules, meaning:
/home/* -> /, will discard everything after /home and redirect to / always
/home/* -> /*, will flatten everything after /home and redirect to /{flattened}

I would argue this can replace drogon's controller routers if it uses for loops or regexes to check for matches one by one, however, we have to add support for additional things before doing so, as regex controllers must be kept, and normal controllers can rely on this trie implementation to quickly route requests, and potentially redirect if its configuration becomes part of the drogon's "app" config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants