Gridding of XYC categorical data #261

mycarta · 2020-05-22T18:01:38Z

Description of the desired feature

Being able to grid XYC categorical data (where C is the categorical feature) would be very useful.
I can think of at least two use cases:

LABELED DATA
1.1. Gridding of categorical geological data: for example, a well performance classification that is not a production number (high, medium, low), a fracture intensity, or another rock quality classification. Cross-validation would be important. In the case of well performance, it would be nice to have the option of block cross-validation, since wells are often drilled in clusters where the reservoir is relatively uniform within a cluster (intra-area) but not necessarily homogeneous among clusters (inter-area).
1.2. Gridding of geological facies. This is often done in the context of 3D geocellular modelling, but having a 2D implementation in Python would be great, with options for both cross-validation and weights (if facies probabilities are available).
UNLABELED DATA
I am thinking here of numerical categories, such as the output of clustering done with a Gaussian Mixture Model. Data would be in XYCP format, where P is the probability output, and it would be great to be able to grid it using the probability as a weight. In this case cross-validation would not be possible because there is no label to use as ground truth.
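
To make the XYCP idea concrete, here is a minimal pure-Python sketch of one possible reading: each grid node takes the category that wins a vote among its k nearest points, with each vote weighted by the point's probability divided by its distance. The function name `grid_categories`, the weighting formula, and the sample points are all assumptions made for illustration; none of this is an existing Verde feature.

```python
import math
from collections import defaultdict


def grid_categories(points, grid_x, grid_y, k=3):
    """Assign a category to each grid node by a weighted vote of the k nearest points.

    points: list of (x, y, category, probability) tuples (the XYCP format).
    Each neighbour votes with weight probability / distance, which is an
    illustrative choice rather than an established method.
    """
    eps = 1e-12  # avoids division by zero when a node coincides with a point
    grid = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            nearest = sorted(
                points, key=lambda p: math.hypot(p[0] - gx, p[1] - gy)
            )[:k]
            votes = defaultdict(float)
            for x, y, category, prob in nearest:
                votes[category] += prob / (math.hypot(x - gx, y - gy) + eps)
            row.append(max(votes, key=votes.get))
        grid.append(row)
    return grid


# Made-up cluster labels (0 or 1) with made-up membership probabilities
points = [
    (0.0, 0.0, 0, 0.9),
    (1.0, 0.0, 0, 0.8),
    (0.0, 1.0, 1, 0.6),
    (4.0, 4.0, 1, 0.95),
    (5.0, 4.0, 1, 0.9),
    (4.0, 5.0, 0, 0.55),
]
grid = grid_categories(points, grid_x=[0.5, 4.5], grid_y=[0.5, 4.5], k=3)
```

Each row of `grid` corresponds to one value of `grid_y`. Low-probability points still vote, but they are easily outvoted by confident neighbours at a similar distance.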

Are you willing to help implement and maintain this feature? Yes/No

No. In the sense that I would not be available for coding; but I would definitely be interested and available as a tester.

welcome · 2020-05-22T18:01:39Z

👋 Thanks for opening your first issue here! Please make sure you filled out the template with as much detail as possible.

You might also want to take a look at our Contributing Guide and Code of Conduct.

leouieda · 2020-06-04T15:52:05Z

@mycarta that's an interesting use case. This might be a bit challenging because we're then getting into spatial prediction of things that aren't well represented by a surface under a load. So it's likely that the best predictors wouldn't be the coordinates of the points. Instead, you'd likely want to use other features. This is related to #188 by @fmaussion. I understand the use case better now and might be able to form ideas on a possible implementation.

So what we would need is a way to wrap a scikit-learn estimator into a Verde gridder. This shouldn't be too hard. The assumption would be that the feature matrix is a column stack of the given "coordinates". See #268. I think that could be a general solution for this.
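
A rough sketch of what such a wrapper might look like, assuming only that the wrapped object exposes `fit(X, y)` and `predict(X)` in the scikit-learn style. The class name `EstimatorGridder` is hypothetical, and `NearestNeighbourClassifier` is a tiny stand-in used so the example runs without dependencies; a real scikit-learn classifier could presumably be passed in its place. This is not Verde's actual implementation.

```python
import math


class EstimatorGridder:
    """Hypothetical wrapper exposing fit(coordinates, data) and predict(coordinates)."""

    def __init__(self, estimator):
        self.estimator = estimator

    @staticmethod
    def _feature_matrix(coordinates):
        # Column stack: one row per point, one column per coordinate
        return [list(row) for row in zip(*coordinates)]

    def fit(self, coordinates, data):
        self.estimator.fit(self._feature_matrix(coordinates), list(data))
        return self

    def predict(self, coordinates):
        return list(self.estimator.predict(self._feature_matrix(coordinates)))


class NearestNeighbourClassifier:
    """Tiny stand-in for a scikit-learn style classifier (illustration only)."""

    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        # Label of the closest training point for every query row
        return [
            self.y_[min(range(len(self.X_)), key=lambda i: math.dist(self.X_[i], row))]
            for row in X
        ]


# Made-up facies observations at four locations
easting = [0.0, 1.0, 4.0, 5.0]
northing = [0.0, 1.0, 4.0, 5.0]
facies = ["sand", "sand", "shale", "shale"]

gridder = EstimatorGridder(NearestNeighbourClassifier()).fit((easting, northing), facies)
predicted = gridder.predict(([0.5, 4.5], [0.2, 4.8]))
```

Because the wrapper only reshapes the coordinates into a feature matrix, any extra predictors would have to be passed as additional "coordinate" arrays under this assumption.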

Having the estimator wrapped by a gridder would allow use of any of our cross-validation tools.
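
For the clustered-wells case raised in the original request, a blocked split keeps all points from one spatial block in the same fold, so a model is never tested on wells from a cluster it was trained on. The sketch below is a simplified illustration in plain Python, not Verde's own cross-validation code; the function name `block_folds`, the round-robin assignment of blocks to folds, and the sample coordinates are assumptions.

```python
import math
from collections import defaultdict


def block_folds(x, y, spacing, n_folds):
    """Yield (train_idx, test_idx), keeping every spatial block within a single fold."""
    # Group point indices by the square block (of side `spacing`) they fall in
    blocks = defaultdict(list)
    for i, (xi, yi) in enumerate(zip(x, y)):
        blocks[(math.floor(xi / spacing), math.floor(yi / spacing))].append(i)
    # Deal blocks round-robin into folds (an arbitrary, illustrative choice)
    folds = [[] for _ in range(n_folds)]
    for n, key in enumerate(sorted(blocks)):
        folds[n % n_folds].extend(blocks[key])
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        yield train, sorted(test)


# Three made-up well clusters
x = [0.1, 0.4, 0.2, 5.1, 5.3, 9.8, 9.9]
y = [0.2, 0.3, 0.1, 5.2, 5.4, 0.1, 0.3]
splits = list(block_folds(x, y, spacing=2.0, n_folds=3))
```

With this block size each cluster lands in its own block, so every fold holds out exactly one cluster. Class imbalance between blocks is not handled here.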

leouieda mentioned this issue Jun 4, 2020

Class to wrap a scikit-learn estimator in a Verde gridder #268

Open

leouieda added the enhancement Idea or request for a new feature label Jun 4, 2020

leouieda mentioned this issue Jun 4, 2020

Deal with class imbalance in blocked cross-validation #262

Open

leouieda added the question Further information is requested label Oct 3, 2021
