Three dimensional modeling (from Brice) #46

Open
rolkra opened this issue Mar 8, 2024 · 0 comments
Labels
enhancement New feature or request

rolkra commented Mar 8, 2024

I propose that a similar process model be included in your explore package related to three-dimensional modeling. So, the way this would work is very similar to how you essentially captured the essence of an entire R package (xgboost) with one function!

There is an R package called svgViewR that allows a user to model multivariate data in three dimensions using HTML-based interactions. Using a concept called MDS (multidimensional scaling), a single value is generated for each observation, colorized by gradient, and then plotted.

While this algorithm is remarkable, it would be even more compelling if it were captured in a single model and resulting plot using ONE function. The algorithm in toto can be found on page 15 of the current version of svgViewR, an R package currently available on CRAN.

To facilitate this effort, I worked with a colleague of mine to replicate what is referred to as the pair distance, called pdist in the svgViewR documentation, converting it to a separate function. This function, called pairDist, can be found in the quickcode package, also stored on CRAN. Since including this function would create a dependency in your code, it's up to you whether you want to use this function or use the pair distance mathematics provided in svgViewR.
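For reference, the distance described here fits in a few lines of base R. This is a minimal sketch, assuming the value is the Euclidean distance from each row to the column means (as in the breakdown of the svgViewR example further down). The name centroid_dist is hypothetical, and nothing here assumes the signature of pairDist in quickcode.

```r
# Hypothetical helper (not part of svgViewR or quickcode):
# Euclidean distance from each row of x to the centroid (column means).
centroid_dist <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x))
  centered <- sweep(x, 2, colMeans(x))  # subtract the column means from every row
  sqrt(rowSums(centered^2))
}

# Example data shaped like the svgViewR example: 300 points, sd 3, 2 and 1
set.seed(1)
pts <- cbind(rnorm(300, sd = 3), rnorm(300, sd = 2), rnorm(300, sd = 1))
d <- centroid_dist(pts)
```

Keeping this in base R would avoid the extra dependency, at the cost of maintaining the few lines inside the package.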

I envision the output as a list object containing the following artifacts:

  1. A data frame containing the original variables used in the analysis, to which the following additional variables are appended:

    • pdist
    • colHex
    • color
    • cluster
  2. If possible, the HTML plot as a ggplot object (this may not be feasible).

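To produce these four additional data elements, the following pseudo-code is provided:
library(DescTools)  # provides HexToCol()
library(xlsx)       # provides write.xlsx()

# points3d, pdist and col are the objects created in the svgViewR example
points3d <- as.data.frame(points3d)
points3d$pdist <- pdist                    # distance of each point from the mean point
points3d$colHex <- col                     # hex color assigned from the gradient
points3d$col <- HexToCol(points3d$colHex)  # nearest named color for each hex code
points3d$col <- as.factor(points3d$col)
points3d$cluster <- as.integer(points3d$col)  # integer code of each color level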

Optionally, you could write the 'points3d' data object to an Excel file, or include an argument that controls whether this happens:
write.xlsx(x = points3d, file = "filepath/points3d.xlsx", row.names = FALSE, sheetName = "points3D")
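To make the proposed output shape concrete, here is a rough sketch of how the list and the optional export argument might fit together. Everything in it is an assumption for illustration: the function name build_output, its arguments, and the use of base write.csv in place of write.xlsx (swapped in only so the sketch runs without extra packages). The color column is left out because it relies on HexToCol from DescTools.

```r
# Hypothetical assembler; names and return shape are illustrative only.
build_output <- function(points, pdist, col_hex, file = NULL) {
  out <- as.data.frame(points)
  out$pdist <- pdist
  out$colHex <- col_hex
  out$cluster <- as.integer(factor(col_hex))  # one cluster id per distinct color

  # Optional export, controlled by the 'file' argument (CSV stands in for Excel)
  if (!is.null(file)) write.csv(out, file, row.names = FALSE)

  list(data = out)
}

# Toy input, not real results
pts <- data.frame(x = c(0, 1, 4), y = c(0, 1, 4), z = c(0, 1, 4))
res <- build_output(pts, pdist = c(1, 2, 3),
                    col_hex = c("#FF0000", "#0000FF", "#FF0000"))
```

Returning a list leaves room to add the plot object later without changing how callers access the data frame.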

To make clear what is being proposed, the following explanation breaks down the code on page 15 into bullet points. I have studied the svgViewR package and its associated code thoroughly, and these are my remarks:

  1. Library Inclusion:

    • The code includes the svgViewR library, which is likely used for creating interactive 3D scatter plots in SVG (Scalable Vector Graphics) format.
  2. Data Generation:

    • Generates a matrix points3d with 300 rows and 3 columns.
    • Each column is populated with random numbers generated from normal distributions with different standard deviations (3, 2, and 1).
  3. SVG Initialization:

    • Opens a new output file named 'plot_static_points.html' for writing.
  4. Distance Calculation:

    • Computes the Euclidean distance from each point in points3d to the mean point of all points.
    • The distances are stored in the variable pdist.
  5. Color Mapping:

    • Defines a color gradient from red to blue using colorRampPalette.
    • col_grad holds the gradient with 50 colors.
  6. Color Assignment:

    • Calculates colors for each point based on their distance using linear interpolation.
    • The colors are assigned to the variable col.
  7. SVG Plotting:

    • Plots the 3D points in the SVG file using svg.points.
    • The color of each point is determined by the previously calculated col.
  8. SVG Frame Initialization:

    • Draws a coordinate frame around the 3D points using svg.frame.
  9. SVG File Closing:

    • Closes the SVG file with svg.close().

In summary, this code generates a 3D scatter plot with 300 points, each having random coordinates. The color of each point is determined by its distance from the mean point, and the plot is saved in an HTML file named 'plot_static_points.html'. Because it is built with the svgViewR library, the resulting file is interactive, allowing users to manipulate and explore the 3D plot.
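Steps 2 and 4 through 6 above can be sketched in base R without opening an output file. This is an approximation of the logic as described in the bullet points, not a copy of the svgViewR example: the seed is arbitrary, and rounding the gradient index is an assumption made here so that every index is a whole number.

```r
set.seed(42)

# Step 2: 300 points whose columns have standard deviations 3, 2 and 1
points3d <- cbind(rnorm(300, sd = 3), rnorm(300, sd = 2), rnorm(300, sd = 1))

# Step 4: Euclidean distance of each point from the mean point
pdist <- sqrt(rowSums(sweep(points3d, 2, colMeans(points3d))^2))

# Step 5: gradient of 50 colors running from red to blue
col_grad <- colorRampPalette(c("red", "blue"))(50)

# Step 6: rescale distances linearly onto positions 1..50 of the gradient
idx <- round((length(col_grad) - 1) * (pdist - min(pdist)) / diff(range(pdist))) + 1
col <- col_grad[idx]
```

With this mapping the point closest to the mean gets the first gradient color (red) and the farthest point gets the last one (blue).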

There is one more remarkable aspect to this function if you were to accept this idea - the html generated is a single self-contained file. This makes it incredibly easy to distribute!

I know there is a lot here, Roland, but I believe that a single function that can model any number of numeric variables in three dimensions would be worth the effort to create and add to the explore package. This function would do for three-dimensional modeling what your explain_xgboost function did for feature engineering.

I can answer any questions you may have regarding this proposal.

Warmest regards,

Brice

@rolkra rolkra added the enhancement New feature or request label Mar 8, 2024