Adds a semi-supervised (specifically a combination of supervised and weakly-supervised data) version of weak algorithms #268

RobinVogel · 2019-11-28T18:01:55Z

Closes #233

For now I have only implemented what I believe #233 expects, and only for the RCA algorithm.
It is a simple modification of the supervised version of RCA. The test is very basic as well.

It simply concatenates the weakly supervised information provided by the user with the weakly supervised information generated by transforming the labeled (strongly supervised) data.

It is convenient but increases the volume of code and documentation.
There is a random_state parameter passed to the fit function in RCA; it is marked
as deprecated, and it increases the number of tests needed for the semi-supervised algorithms.
I will check whether random_state is present in other algorithms, to understand its relevance.

I will do the other algorithms and better tests if we agree on this structure.

terrytangyuan · 2019-12-04T14:05:48Z

test/metric_learn_test.py

+ chunks = cons.chunks(num_chunks=20)
+ rca_semisupervised.fit(X[:n], y[:n],
+ X[n:], chunks)
+ rca_semisupervised.fit(X[:n], y[:n],


It would probably be good to add more tests checking what rca_semisupervised looks like after fitting.

bellet · 2019-12-05T09:58:52Z

Just a quick reminder: "solves" is not part of the keywords that GitHub recognizes to automatically close issues ;-)

bellet · 2019-12-05T10:13:56Z

I think this creates a major API problem: fit takes four arguments (X, y, X_u, chunks), where X and y do not generally have the same number of rows as X_u and chunks. This likely breaks compatibility with model selection routines from sklearn.
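To make the row-count mismatch concrete, here is a minimal sketch. It is only an illustration, not a reproduction of a specific failure in this PR: the array sizes are made up, `lengths_agree` is a hypothetical wrapper, and `check_consistent_length` is used simply because scikit-learn utilities generally expect all sample-aligned inputs to share the same number of rows.

```python
import numpy as np
from sklearn.utils import check_consistent_length

rng = np.random.RandomState(0)

# Labeled part: 10 points with class labels (sizes are arbitrary).
X, y = rng.randn(10, 3), rng.randint(0, 2, size=10)

# Weakly supervised part: 25 points with chunk ids (-1 means "no chunk").
X_u = rng.randn(25, 3)
chunks = rng.randint(-1, 4, size=25)


def lengths_agree(*arrays):
    """Hypothetical wrapper: True if all arrays have the same number of rows."""
    try:
        check_consistent_length(*arrays)
        return True
    except ValueError:
        return False


print(lengths_agree(X, y))               # each pair is consistent on its own
print(lengths_agree(X_u, chunks))
print(lengths_agree(X, y, X_u, chunks))  # but not when passed together
```

A single set of row indices cannot slice all four arrays at once, which is the assumption that index-based splitting relies on.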

Furthermore, this strong supervision + weak supervision is not a major use-case in practice. So indeed the overhead induced by introducing new classes, having to test and document them etc, is probably too large compared to the benefits.

I would favor a solution based on helper functions which combine pairs/quadruplets/chunks provided by the user with those generated from labeled data so that users can then easily fit RCA with the output of this helper function. So essentially something similar to what you wrote for RCA but without creating a new class. We can then add a short paragraph to mention the existence of such helper functions in the doc and we're good.
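For chunks, such a helper could look roughly like the sketch below. This is an assumption-laden illustration, not code from this PR or from metric-learn: the name `combine_labels_and_chunks` is hypothetical, and it takes the simplest possible route of turning each class into a single chunk, whereas the supervised RCA wrapper samples chunks from the labels.

```python
import numpy as np


def combine_labels_and_chunks(X_labeled, y, X_weak, chunks_weak):
    """Hypothetical helper: merge labeled points with user-provided chunks.

    Returns one data matrix and one chunk array of matching length.
    """
    X_labeled = np.asarray(X_labeled)
    X_weak = np.asarray(X_weak)
    y = np.asarray(y).ravel()
    chunks_weak = np.asarray(chunks_weak, dtype=int)

    # Assumption: every class becomes one chunk, numbered 0..n_classes-1.
    classes, chunks_labeled = np.unique(y, return_inverse=True)
    chunks_labeled = chunks_labeled.ravel()

    # Shift the user-provided chunk ids so they cannot collide with the
    # class-derived ones; -1 (point belongs to no chunk) stays untouched.
    shifted = np.where(chunks_weak >= 0, chunks_weak + len(classes), -1)

    X_all = np.vstack([X_labeled, X_weak])
    chunks_all = np.concatenate([chunks_labeled, shifted])
    return X_all, chunks_all
```

The output has the same shape as the usual weakly supervised input, so it could then be passed to the existing estimator, for example `RCA().fit(X_all, chunks_all)`, without introducing a new class or a four-argument fit.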

Note: as pointed out by @hansen7 on #233, semi-supervised is probably not the right term to describe this. This is more a combination of supervised and weakly supervised.

bellet · 2019-12-05T10:14:31Z

Of course I am happy to hear whether @terrytangyuan @perimosocordiae @wdevazelhes have a different opinion

terrytangyuan · 2019-12-05T14:12:45Z

I agree. In this case API compatibility is more important, especially now that we are in scikit-learn-contrib. We can start with the helper function, and if it becomes popular with users we can then reconsider this.

RobinVogel added 9 commits November 14, 2019 17:58

maj

6b789d3

Merge branch 'master' of http://github.com/RobinVogel/metric-learn in…

7d52b1c

…to issue_255

added fit checks

275c69a

maj

1c28b56

Added checks that the function was fitted.

76ffccb

Wrote a semi-supervised-rca.

340ac69

added a very simple test

36694f6

typos

77fb53a

test cov correction

b3445c5

terrytangyuan reviewed Dec 4, 2019

RobinVogel changed the title ~~Adds a semi-supervised version of weak algorithms~~ Adds a semi-supervised (specifically a combination of supervised and weakly-supervised data) version of weak algorithms Dec 10, 2019

RobinVogel closed this by deleting the head repository Jun 15, 2024
