Gridding of XYC categorical data #261

Open
mycarta opened this issue May 22, 2020 · 2 comments
Labels
enhancement Idea or request for a new feature question Further information is requested

Comments

mycarta commented May 22, 2020

Description of the desired feature

Being able to grid XYC categorical data (where C is the categorical feature) would be very useful.
I can think of at least two use cases:

  1. LABELED DATA
    1.1. Gridding of categorical geological data: for example, a well performance classification expressed as a class (high, medium, low) rather than as a production number, or a fracture intensity or other rock quality classification. Cross-validation would be important. For well performance, it would be nice to have the option of block cross-validation, since wells are often drilled in clusters where the reservoir is relatively uniform within a cluster (intra-area) but not necessarily homogeneous between clusters (inter-area).
    1.2. Gridding of geological facies. This is often done in the context of 3D geocellular modelling, but having a 2D implementation in Python would be great, with options both for cross-validation and for using weights (if facies probabilities are available).

  2. UNLABELED DATA
    I am thinking here of numerical categories, such as the output of clustering done with a Gaussian Mixture Model. Data would be in XYCP format, where P is the probability output, and it would be great to be able to grid it using the probability as a weight. In this case cross-validation would not be possible because there is no label to use as ground truth.
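
To make the XYC / XYCP request concrete, here is a minimal sketch of what I have in mind, using NumPy only. It is not an existing Verde feature: `grid_categories` is a hypothetical helper, and the weighted vote among the `k` nearest points is just one possible interpolation rule chosen for illustration. With `p=None` every point gets the same weight (plain XYC data).

```python
import numpy as np


def grid_categories(x, y, c, p=None, shape=(20, 20), k=3):
    """Hypothetical helper: grid categories by a weighted k-nearest vote."""
    x, y, c = np.asarray(x, float), np.asarray(y, float), np.asarray(c)
    # Equal weights when no probabilities are supplied (plain XYC data)
    weights = np.ones(x.size) if p is None else np.asarray(p, float)
    # Map each category to an integer code so votes can be accumulated
    classes, codes = np.unique(c, return_inverse=True)
    # Regular grid covering the bounding box of the data
    east = np.linspace(x.min(), x.max(), shape[1])
    north = np.linspace(y.min(), y.max(), shape[0])
    grid_x, grid_y = np.meshgrid(east, north)
    # Squared distance between every grid node and every data point
    dist2 = (grid_x.ravel()[:, None] - x) ** 2 + (grid_y.ravel()[:, None] - y) ** 2
    nearest = np.argsort(dist2, axis=1)[:, :k]
    # Sum the weights of the k nearest points per class, then pick the winner
    votes = np.zeros((grid_x.size, classes.size))
    rows = np.repeat(np.arange(grid_x.size), nearest.shape[1])
    np.add.at(votes, (rows, codes[nearest].ravel()), weights[nearest].ravel())
    return grid_x, grid_y, classes[votes.argmax(axis=1)].reshape(shape)


# Made-up XYCP data: two small clusters with different classes
x = [0.0, 0.1, 0.0, 1.0, 0.9, 1.0]
y = [0.0, 0.0, 0.1, 1.0, 1.0, 0.9]
c = ["low", "low", "low", "high", "high", "high"]
p = [0.9, 0.8, 0.7, 0.9, 0.6, 0.8]
grid_x, grid_y, grid_c = grid_categories(x, y, c, p, shape=(5, 5), k=3)
```

The probability only acts as a vote weight here, so a single confident point can outvote two uncertain neighbours of another class. Ties go to whichever class sorts first.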

Are you willing to help implement and maintain this feature? Yes/No

No. In the sense that I would not be available for coding; but I would definitely be interested and available as a tester.


welcome bot commented May 22, 2020

👋 Thanks for opening your first issue here! Please make sure you filled out the template with as much detail as possible.

You might also want to take a look at our Contributing Guide and Code of Conduct.

Member

leouieda commented Jun 4, 2020

@mycarta that's an interesting use case. This might be a bit challenging because we're then getting into spatial prediction of things that aren't well represented by a surface under a load. So it's likely that the best predictors wouldn't be the coordinates of the points. Instead, you'd likely want to use other features. This is related to #188 by @fmaussion. I understand the use case better now and might be able to form ideas on a possible implementation.

So what we would need is a way to wrap a scikit-learn estimator into a Verde gridder. This shouldn't be too hard. The assumption would be that the feature matrix is a column stack of the given "coordinates". See #268. I think that could be a general solution for this.
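
A rough sketch of the wrapping idea, assuming the usual gridder convention of `fit(coordinates, data, weights=None)` and `predict(coordinates)`. `EstimatorGridder` is a hypothetical name and does not subclass anything from Verde; it only shows the column-stack step. The data and the choice of `KNeighborsClassifier` are made up for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.neighbors import KNeighborsClassifier


class EstimatorGridder:
    """Hypothetical wrapper (not part of Verde or scikit-learn)."""

    def __init__(self, estimator):
        self.estimator = estimator

    @staticmethod
    def _features(coordinates):
        # Feature matrix = column stack of the flattened coordinate arrays
        return np.column_stack([np.ravel(c) for c in coordinates])

    def fit(self, coordinates, data, weights=None):
        # Fit a fresh copy so the estimator passed in is left untouched
        self.estimator_ = clone(self.estimator)
        # Weights are forwarded as sample_weight, so they only work with
        # estimators whose fit method accepts that argument
        kwargs = {} if weights is None else {"sample_weight": np.ravel(weights)}
        self.estimator_.fit(self._features(coordinates), np.ravel(data), **kwargs)
        return self

    def predict(self, coordinates):
        # Return predictions with the same shape as the input coordinates
        shape = np.shape(coordinates[0])
        return self.estimator_.predict(self._features(coordinates)).reshape(shape)


# Synthetic categorical data: "high" in the east, "low" in the west
rng = np.random.default_rng(0)
easting = rng.uniform(0, 10, 200)
northing = rng.uniform(0, 10, 200)
category = np.where(easting > 5, "high", "low")

gridder = EstimatorGridder(KNeighborsClassifier(n_neighbors=5))
gridder.fit((easting, northing), category)
grid_coords = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 11))
grid = gridder.predict(grid_coords)
```

Because the wrapper only needs `fit` and `predict` on the inner estimator, swapping the classifier for a regressor or a pipeline would not change the gridder code.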

Having the estimator wrapped by a gridder would allow use of any of our cross-validation tools.
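
For the clustered-wells case, blocked cross-validation could look roughly like the sketch below. It uses scikit-learn's `GroupKFold` as a stand-in, with each well cluster treated as one group, so a cluster is never split between training and testing. The cluster centers, labels, and classifier are invented for the example and this is not how Verde's own cross-validation tools are implemented.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
# Six made-up well clusters, 30 wells each, scattered around fixed centers
centers = np.array(
    [[10, 10], [20, 80], [45, 40], [60, 15], [80, 70], [90, 30]], dtype=float
)
cluster = np.repeat(np.arange(6), 30)
coords = centers[cluster] + rng.normal(scale=3, size=(180, 2))
# Synthetic labels: every well in a cluster shares the same class
labels = np.where(centers[cluster, 0] > 50, "high", "low")

# Each fold holds out whole clusters, never individual wells
cv = GroupKFold(n_splits=3)
scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), coords, labels, groups=cluster, cv=cv
)
```

Scores from this kind of split are usually lower than from a random split, since the model is evaluated on areas it has not seen.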

@leouieda leouieda added the enhancement Idea or request for a new feature label Jun 4, 2020
@leouieda leouieda added the question Further information is requested label Oct 3, 2021