Host Redirector plugin #1796

Mis1eader-dev · 2023-09-25T08:34:04Z

Needs path redirection

I'm aware the pull request includes a small edit to SlashRemover, I believe it doesn't need its own PR for a small change.

Mis1eader-dev · 2023-09-26T06:09:36Z

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler
PathRedirectHandler
PostRedirectHandler
PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

Mis1eader-dev · 2023-09-26T08:16:54Z

I noticed the formatter doesn't move & ampersands next to the identifiers.

const string& host = ...;

Will not get formatted to:

const string &host = ...;

Is it perhaps a bug, or did formatting rules get changed to allow both styles?

an-tao · 2023-09-26T08:51:13Z

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler

PathRedirectHandler

PostRedirectHandler

PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

an-tao · 2023-09-26T09:02:24Z

I noticed the formatter doesn't move & ampersands next to the identifiers.
const string& host = ...;
Will not get formatted to:
const string &host = ...;
Is it perhaps a bug, or did formatting rules get changed to allow both styles?

Both styles seem to be present in the current code, you can try the configuration of clang-format and choose the second style.

Mis1eader-dev · 2023-09-26T09:12:56Z

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler

PathRedirectHandler

PostRedirectHandler

PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

Yeah, and we don't have another choice. The only way to not introduce another group is to integrate this plugin into Redirector itself.

I have added the 3rd group and renamed the handlers for better clarity, should I revert it?

We can think of it the same as AOP callbacks, we can say they also are complex in a way without reading the documentation and seeing the picture visualization.

rbugajewski · 2023-09-27T07:39:58Z

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler

PathRedirectHandler

PostRedirectHandler

PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

I also think that it is kind of over-engineered for most use cases and makes the API harder to understand for new users. I’m in favor of keeping things simple.

Mis1eader-dev · 2023-10-01T07:59:38Z

This is a rough idea on implementing it using multiple nested unordered maps, it is complex ~~and I don't plan on implementing it that way~~, it generates a map with common characters split into nested nodes. ~~In case anybody is interested in changing the implementation in the future~~. The data structure for this is called a Trie.

It will be easy to use:

// Some nodes need to store information like how many folders are common within that node,
// for example "/common/places" has a common depth of 2, once the path substring lookup
// gets to "/common/places/open", it will substring with a depth of 2 and looks up
// ["/common/places"] within the unordered map of that node, then substring further up until "/open".
const auto &ruleTo = rulesFrom["www.example.com"]["/maps"]["/common/places"]["/open"];

~~A vector of strings may be more desirable and easier to implement for this plugin. I will go for this route.~~

Regarding introducing a new group for Redirector, we can discuss it once this plugin's implementation is finished.

Edit: Using vectors is a hassle, there still is a need to know common points within the rules, using unordered maps is the way to go.

Mis1eader-dev · 2023-10-02T08:51:05Z

This is still a draft

The shown diagram redirects in the following fashion:
10.10.10.1 -> files.example.com
10.10.10.1/any -> www.example.com/any

10.10.10.1/files -> files.example.com (Doesn't have a strict match rule, so it redirects even if there is no subpath)
10.10.10.1/files/video -> files.example.com/video
10.10.10.1/files/video/asia.mp4 -> files.example.com/video/asia.mp4

10.10.10.1/files/map -> www.example.com/maps
10.10.10.1/files/map/asia -> www.example.com/maps/asia
10.10.10.1/files/maps -> www.example.com/maps
10.10.10.1/files/maps/asia -> Does not redirect, the rule is strict, not wildcard

10.10.10.1/files/docs -> files.example.com/docs
10.10.10.1/files/docs/drogon.docx -> files.example.com/docs/drogon.docx

10.10.10.1/files/img -> files.example.com/imgs
10.10.10.1/files/img/asia.png -> Does not redirect, the rule is strict, not wildcard
10.10.10.1/files/imgs -> files.example.com/imgs
10.10.10.1/files/imgs/asia.png -> Does not redirect, the rule is strict, not wildcard

The code of said draft

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using std::cout;
using std::string;

struct RedirectTo
{
	string host, path;
};

struct RedirectPath
{
	std::unordered_map<string, RedirectPath*> paths;
	size_t minPathLen = SIZE_MAX;
	size_t minPathDepth = SIZE_MAX;
	RedirectTo* wildcard;
	RedirectTo* to;
};

static void recurse(const RedirectPath* parent, size_t tabs = 1)
{
	if(!parent)
		return;

	const size_t moreTab = tabs + 1;
	for(const auto& [childPathStr, childPathObj] : parent->paths)
	{
		bool newline = false;
		if(childPathObj->to)
		{
			for(size_t t = tabs; t; --t)
				cout << '\t';

			cout << childPathStr;

			cout << " -> " << childPathObj->to->host << childPathObj->to->path;

			newline = true;
		}

		if(childPathObj->wildcard)
		{
			if(newline)
				cout << '\n';

			for(size_t t = tabs; t; --t)
				cout << '\t';

			cout << childPathStr;

			cout << " (*)";
			cout << " -> " << childPathObj->wildcard->host << childPathObj->wildcard->path << "/*";

			newline = true;
		}

		if(newline)
			cout << '\n';
		else
		{
			for(size_t t = tabs; t; --t)
				cout << '\t';
			cout << childPathStr << '\n';
		}

		recurse(childPathObj, moreTab);
	}
}

int main()
{
	std::unordered_map<string, RedirectPath> rulesFrom;
	std::vector<RedirectTo> rulesTo;
	rulesTo.push_back({"www.example.com"});
	rulesTo.push_back({"files.example.com"});
	rulesTo.push_back({"files.example.com", "/imgs"});
	rulesTo.push_back({"files.example.com", "/docs"});
	rulesTo.push_back({"www.example.com", "/maps"});

	{
		auto rp0 = rulesFrom["10.10.10.1"].paths["/"] = new RedirectPath;
		rp0->wildcard = &rulesTo[0];
		rp0->to = &rulesTo[1];
	}

	{
		auto rp0 = rulesFrom["10.10.10.1"].paths["/files"] = new RedirectPath;
		rp0->wildcard = &rulesTo[1];
		{
			{
				auto rp1 = rp0->paths["/img"] = new RedirectPath;
				rp1->wildcard = &rulesTo[2];
				{
					auto rp2 = rp1->paths[""] = new RedirectPath;
					rp2->to = &rulesTo[2];
				}
				{
					auto rp2 = rp1->paths["s"] = new RedirectPath;
					rp2->to = &rulesTo[2];
				}
			}
			{
				auto rp1 = rp0->paths["/map"] = new RedirectPath;
				{
					auto rp2 = rp1->paths[""] = new RedirectPath;
					rp2->wildcard = &rulesTo[4];
				}
				{
					auto rp2 = rp1->paths["s"] = new RedirectPath;
					rp2->to = &rulesTo[4];
				}
			}
		}
		{
			auto rp1 = rp0->paths["/docs"] = new RedirectPath;
			rp1->wildcard = &rulesTo[3];
		}
	}

	for(const auto& [host, redirects] : rulesFrom)
	{
		cout << host << ":\n";
		recurse(&redirects);
	}
}

The final implementation should do some checks to determine common points and make the data structure compact, and cache some information for quick execution during runtime after the initial mapping from startup.

If anybody has a better idea, feel free to share.

Edit: Change of algorithm, the maps can be populated using inequality boundaries and length exhaustion.

// 10.10.10.1/user/file
// 10.10.10.1/user/files/*
// 10.10.10.1/user/files/audio
// 10.10.10.1/user/files/video/*
//
// 10.10.10.1/global/*
//
// 10.10.10.1/user/crocodile

rulesFrom = {
	"10.10.10.1": {
		maxPathLen = 6,
		"/user/": {
			maxPathLen = 4,
			"file": {
				to = ...,
				maxPathLen = 1,
				"s": {
					wildcard = ...,
					maxPathLen = 6,
					"/audio": {
						to = ...,
						maxPathLen = 0
					},
					"/video": {
						wildcard = ...,
						maxPathLen = 0
					}
				}
			},
			"croc": {
				maxPathLen = 5,
				"odile": {
					to = ...,
					maxPathLen = 0
				}
			}
		},
		"/globa": {
			maxPathLen = 1,
			"l": {
				wildcard = ...,
				maxPathLen = 0
			}
		}
	}
}

Mis1eader-dev · 2023-10-03T08:03:04Z

Through experimenting, I came up with ~~two~~ five methods of populating the nodes

1. Nearest slash separator

This splits out nodes based on the closest slash in a specific window.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 5,
        "/user": {
            maxPathLen = 5,
            "/file": {
                maxPathLen = 1,
                "s": {
                    to = ...,
                    maxPathLen = 0
                }
            },
            "/apps": {
                to = ...,
                maxPathLen = 0
            }
        },
        "/usr/": {
            maxPathLen = 3,
            "lib": {
                to = ...,
                maxPathLen = 0
            },
            "inc": {
                maxPathLen = 4,
                "lude": {
                    to = ...,
                    maxPathLen = 0
                }
            }
        }
    }
}

Pros:

Less tree depth, saves some memory.
Less CPU usage, from doing less map lookups and hash calculations.
Less overall memory usage due to lesser tree depths, meaning less maps.

Cons:

Slightly more memory usage for data, as it may store duplicate prefixes.
Complex to implement.

2. Common letter separator

Splits out nodes based on intersecting letters in a specific window.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 2,
        "/us": {
            maxPathLen = 2,
            "er": {
                maxPathLen = 1,
                "/": {
                    maxPathLen = 4,
                    "file": {
                        maxPathLen = 1,
                        "s": {
                            to = ...,
                            maxPathLen = 0
                        }
                    },
                    "apps": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            },
            "r/": {
                maxPathLen = 3,
                "lib": {
                    to = ...,
                    maxPathLen = 0
                },
                "inc": {
                    maxPathLen = 4,
                    "lude": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            }
        }
    }
}

Pros:

Slightly less memory usage for data, it doesn't store any duplicates.

Cons:

More tree depth, in cases where there are too many similar looking paths.
More CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
More overall memory usage due to increased tree depth.
Complex to implement.

~~I choose the first implementation. I listed both in case others favor the 2nd implementation.~~

Edit:

3. Local length boundary separator

Splits out nodes on local length boundaries.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 8,
        "/user/fi": {
            maxPathLen = 3,
            "les": {
                to = ...,
                maxPathLen = 0
            }
        },
        "/user/ap": {
            maxPathLen = 2,
            "ps": {
                to = ...,
                maxPathLen = 0
            }
        }
        "/usr/lib": {
            to = ...,
            maxPathLen = 0
        },
        "/usr/inc": {
            maxPathLen = 4,
            "lude": {
                to = ...,
                maxPathLen = 0
            }
        }
    }
}

Pros:

Lesser tree depth than 1st method.
Lesser CPU usage, from doing less map lookups and hash calculations.
Lesser overall memory usage due to lesser tree depths, meaning less maps.

Cons:

More memory usage for data, it can store multiple duplicate prefixes.
Less complex to implement. However, still requires recursive population.

4. Global length boundary separator

Splits out nodes on length boundaries across all rules.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 8,
        "/user/fi": {
            maxPathLen = 2,
            "le": {
                maxPathLen = 1,
                "s": {
                    to = ...,
                    maxPathLen = 0
                }
            }
        },
        "/user/ap": {
            maxPathLen = 2,
            "ps": {
                to = ...,
                maxPathLen = 0
            }
        }
        "/usr/lib": {
            to = ...,
            maxPathLen = 0
        },
        "/usr/inc": {
            maxPathLen = 2,
            "lu": {
                maxPathLen = 1,
                "d": {
                    maxPathLen = 1,
                    "e": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            }
        }
    }
}

Pros:

Easy to implement.

Cons:

More tree depth, in cases where there are gradual length differences (staircase).
Slightly more CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
More overall memory usage due to increased tree depth.

5. Sorted vector and binary search

Inserts each strict rule into an unordered map with path length as key, and has an unordered map within each one with the path as key, and its to pointer as value.
Inserts each wildcard rule into a vector in a sorted manner, and during runtime it performs binary search for length, and checks those below the request path length to see which one matches.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/maps
10.10.10.1/usr/include
10.10.10.1/usr/include/*
10.10.10.1/usr/bin/*

rulesFrom = {
    "10.10.10.1": {
        8: {
            "/usr/lib": ...
        },
        10: {
            "/user/apps": ...,
            "/user/maps": ...
        },
        11: {
            "/user/files": ...
        },
        12: {
            "/usr/include": ...
        }
    ]
}
rulesFromWildcard = {
    "10.10.10.1": [
        {
            "/usr/bin", (8)
            wildcard = ...
        },
        {
            "/usr/include", (12)
            wildcard = ...
        }
    ]
}

Pros:

Easiest to implement.

Cons:

More memory usage for data, it stores multiple duplicate prefixes.
Two roundtrips in the worst case scenario, as it first looks in the strict rules map, if it didn't find the path, then it looks within the wildcard map. If the request path is longer than every wildcard rule's path, it ends up going through all wildcard rules to check against the request path.

Uncertainties:

CPU usage can't be determined, the cost is the same as performing binary search on a sorted string container. Can't be compared to previous implementations, since they use unordered maps primarily.

I'm thinking of method 1, 4, or 5. Will update on which one I decide to implement.

Update: Decided on method 3, it seems the most efficient in terms of CPU usage, memory, and runtime speed.

Mis1eader-dev · 2023-10-07T13:34:39Z

The implementation of the third method is complete. See the standalone source code and let me know if there is a need for any changes before implementing it within this drogon plugin.

Mis1eader-dev · 2023-11-27T08:56:55Z

TODO: Add support for dynamic addition and removal of rules.
This final feature will allow this drogon plugin to create applications like Cloudflare Pages.

We can start discussing where this plugin ends up in the drogon ecosystem.

Mis1eader-dev · 2023-12-10T07:01:11Z

Currently the plugin has support for normal redirects and mandated splat, if we go off based on Cloudflare Pages redirects
https://developers.cloudflare.com/pages/platform/redirects/#advanced-redirects

I will add another TODO: Allow splatting to be optional, by having an asterisk added to the To rules, meaning:
/home/* -> /, will discard everything after /home and redirect to / always
/home/* -> /*, will flatten everything after /home and redirect to /{flattened}

I would argue this can replace drogon's controller routers if it uses for loops or regexes to check for matches one by one, however, we have to add support for additional things before doing so, as regex controllers must be kept, and normal controllers can rely on this trie implementation to quickly route requests, and potentially redirect if its configuration becomes part of the drogon's "app" config.

Mis1eader-dev force-pushed the host_redirector_plugin branch from 24ba2ac to 5702c1c Compare September 26, 2023 08:01

Mis1eader-dev marked this pull request as ready for review September 27, 2023 08:26

Mis1eader-dev marked this pull request as draft September 27, 2023 08:40

Mis1eader-dev force-pushed the host_redirector_plugin branch from f2743a3 to 46c5c73 Compare October 9, 2023 06:32

Mis1eader-dev marked this pull request as ready for review October 9, 2023 06:33

Mis1eader-dev force-pushed the host_redirector_plugin branch from 89f73f0 to e5c2c40 Compare October 12, 2023 12:58

Mis1eader-dev force-pushed the host_redirector_plugin branch from 40b9104 to 078a395 Compare February 11, 2024 06:38

Mis1eader-dev force-pushed the host_redirector_plugin branch from 078a395 to f8af65e Compare April 5, 2024 14:29

Mis1eader-dev added 10 commits May 8, 2024 13:10

Rebase

4e3d4df

Fix cmake indentation

42adf2f

Add path redirection, rename handlers

49e6a98

Use path variable

7a662e9

Implementation complete

3080807

Fix config parsing behavior

dc6150a

format

e1af0b9

Free manually allocated memory in shutdown

fbe8013

Fix Windows build

b2608b3

Update

0a420f8

Mis1eader-dev added 9 commits May 8, 2024 13:10

Add parse checks, and allow for extension in the future

e4c7568

Move redirectFromValue above std::move

4aa8613

Memory optimization of lookup function

39ea1cf

Optimization and safety

3e4a2ee

Format

bb18421

Fix potential error and rebase

c86962d

Rebase and optimize

59f847e

Further optimize

e10fb67

Small micro-optimizations and rebase

886f40c

Mis1eader-dev force-pushed the host_redirector_plugin branch from 65b3d69 to 886f40c Compare May 8, 2024 10:19

Mis1eader-dev added 3 commits May 8, 2024 13:32

Revert two emplace_backs

8bc9130

Optimize a vector away and unnecessary strings

c7b0450

Remove unused assert header

3567949

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Host Redirector plugin #1796

Host Redirector plugin #1796

Mis1eader-dev commented Sep 25, 2023

Mis1eader-dev commented Sep 26, 2023 •

edited

Mis1eader-dev commented Sep 26, 2023

an-tao commented Sep 26, 2023

an-tao commented Sep 26, 2023

Mis1eader-dev commented Sep 26, 2023

rbugajewski commented Sep 27, 2023

Mis1eader-dev commented Oct 1, 2023 •

edited

Mis1eader-dev commented Oct 2, 2023 •

edited

Mis1eader-dev commented Oct 3, 2023 •

edited

Mis1eader-dev commented Oct 7, 2023

Mis1eader-dev commented Nov 27, 2023

Mis1eader-dev commented Dec 10, 2023

Host Redirector plugin #1796

Are you sure you want to change the base?

Host Redirector plugin #1796

Conversation

Mis1eader-dev commented Sep 25, 2023

Mis1eader-dev commented Sep 26, 2023 • edited

Mis1eader-dev commented Sep 26, 2023

an-tao commented Sep 26, 2023

an-tao commented Sep 26, 2023

Mis1eader-dev commented Sep 26, 2023

rbugajewski commented Sep 27, 2023

Mis1eader-dev commented Oct 1, 2023 • edited

Mis1eader-dev commented Oct 2, 2023 • edited

Mis1eader-dev commented Oct 3, 2023 • edited

1. Nearest slash separator

Example

Pros:

Cons:

2. Common letter separator

Example

Pros:

Cons:

3. Local length boundary separator

Example

Pros:

Cons:

4. Global length boundary separator

Example

Pros:

Cons:

5. Sorted vector and binary search

Example

Pros:

Cons:

Uncertainties:

Mis1eader-dev commented Oct 7, 2023

Mis1eader-dev commented Nov 27, 2023

Mis1eader-dev commented Dec 10, 2023

Mis1eader-dev commented Sep 26, 2023 •

edited

Mis1eader-dev commented Oct 1, 2023 •

edited

Mis1eader-dev commented Oct 2, 2023 •

edited

Mis1eader-dev commented Oct 3, 2023 •

edited