Add of Online Hierarchical Clustering #1218

kchardon · 2023-04-11T09:41:16Z

No description provided.

MaxHalford · 2023-04-30T16:31:29Z

Hey there! I hope it's OK that I'm only answering now.

Am I correct in assuming the algorithm stores all the data points it sees in memory (i.e. the X attribute)?

kchardon · 2023-05-22T15:11:54Z

> Hey there! I hope it's OK that I'm only answering now.
>
> Am I correct in assuming the algorithm stores all the data points it sees in memory (i.e. the X attribute)?

Hello, sorry for the late reply.
I use the window_size attribute: when there are more data points than allowed, the oldest one is deleted.
If window_size < 1, then it stores all the data points.
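
For readers following along, the behaviour described above can be sketched roughly as follows. This is only an illustration: the class name WindowedPoints and its add method are hypothetical and are not taken from the code in this PR.

```python
class WindowedPoints:
    """Hypothetical container illustrating the window_size rule described above."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.points = []

    def add(self, x):
        self.points.append(x)
        # Evict the oldest point once the window is exceeded.
        # With window_size < 1, nothing is ever evicted (unbounded memory).
        if self.window_size >= 1 and len(self.points) > self.window_size:
            self.points.pop(0)


bounded = WindowedPoints(window_size=2)
unbounded = WindowedPoints(window_size=0)
for i in range(3):
    bounded.add({"x": i})
    unbounded.add({"x": i})
```

After three insertions, the bounded container holds only the two most recent points, while the unbounded one keeps all three, which is the memory concern raised in the question.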

MaxHalford · 2023-05-22T15:55:13Z

> If window_size < 1, then it stores all the data points

Ok I see, fair. But I don't think we'll ever want that behavior. Could you remove it?

kchardon · 2023-05-22T19:53:58Z

> > If window_size < 1, then it stores all the data points
>
> Ok I see, fair. But I don't think we'll ever want that behavior. Could you remove it?

Yes, I can. Should I raise an error if the value of window_size is not an integer > 0?

MaxHalford · 2023-05-23T07:11:37Z

Nope, no need to check for an error. An exception will be raised at some point anyway. In general, we don't do input validation. Instead, we document well.
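
As a rough illustration of that point (not the PR's implementation), a bounded buffer built on the standard library's collections.deque needs no explicit check: an invalid size surfaces as an exception on its own. The helper name make_window is hypothetical.

```python
from collections import deque


def make_window(window_size):
    """Hypothetical helper: buffer that keeps only the newest points.

    window_size is expected to be a positive integer. This is documented
    here rather than validated.
    """
    return deque(maxlen=window_size)


window = make_window(3)
for i in range(5):
    window.append({"x": i})  # the oldest point is dropped automatically

recent = [p["x"] for p in window]

# No validation code is needed: deque itself rejects a negative size.
try:
    make_window(-1)
except ValueError as exc:
    error_message = str(exc)
```

One caveat of this sketch: a window_size of 0 is accepted silently and stores nothing, so the docstring still has to state the expected range.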

kchardon · 2023-05-23T19:59:04Z

Okay! I will delete it.

hoanganhngo610 · 2023-09-13T18:53:11Z

@MaxHalford I think I'm quite happy with the current state of the code! My only small concern is that the unit tests keep failing with an attribute error saying that HierarchicalClustering does not have the attribute distance_func, although I believe there is no longer any distance_func anywhere in my code. Can you think of any potential reason for this?

hoanganhngo610 · 2023-09-25T18:52:47Z

@MaxHalford @kchardon If possible, I would really appreciate it if both of you could take a final look at the current state of the implementation and see whether there are any changes you want to make. If neither of you objects, I would like to merge this into the main branch of River.

MaxHalford · 2023-09-25T20:07:33Z

I will review next week :)

MaxHalford · 2023-10-03T10:42:45Z

I (finally) took a look at this. There are a lot of nits to fix, but the main issue I see is that the code relies on numpy: it converts input dictionaries to numpy arrays. Would it be possible to use dictionaries only instead? I don't see any good reason to rely on numpy. Indeed, it's not in the spirit of River to rely on numpy when it can be avoided.
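
To make the suggestion concrete, here is a minimal sketch of a Euclidean distance computed directly on feature dictionaries, with no conversion to numpy arrays. The function name dict_euclidean_distance is hypothetical, and treating a feature missing from one dictionary as 0 is an assumption, not something stated in this thread.

```python
import math


def dict_euclidean_distance(a, b):
    """Euclidean distance between two feature dicts, using only the standard library.

    Assumption: a feature absent from one dict counts as 0.
    """
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


x = {"height": 3.0, "width": 0.0}
y = {"height": 0.0, "width": 4.0}
d = dict_euclidean_distance(x, y)

# Dicts with different key sets also work under the missing-is-zero assumption.
d_sparse = dict_euclidean_distance({"a": 1.0}, {})
```

Working on dictionaries keeps the input format unchanged from what the model receives, and it tolerates feature sets that differ between points.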

add of Hierarchical Clustering

ccc4864

kchardon requested review from MaxHalford and smastelini as code owners April 11, 2023 09:41

kchardon added 6 commits April 11, 2023 12:10

Fixed issues

a61677b

Fix issues

c2e15dd

Fixed issues

3fcaca1

Fix issues

d69c8e4

fixed trailing spaces

c44b32e

Fixed black and isort

fd2d699

kchardon marked this pull request as draft April 11, 2023 11:55

kchardon added 2 commits April 11, 2023 14:05

reverting to the version passing the 'build river ubuntu'

6ccca75

fixed isort and black

2fd212d

kchardon marked this pull request as ready for review April 11, 2023 12:26

kchardon and others added 11 commits May 23, 2023 22:11

Deleted the possibility to use all the data points

914435d

Merge branch 'main' into main

978d11c

correction for ruff hook

ddefcba

correction for ruff hook

88ad11a

correction for ruff hook

09c7413

Modify Euclidean distance calculation using np.linalg.norm for better computational speed.

ead6603

Refactor elements related to the distance function.

7b92fca

Remove data types of attributes

62c9419

Refactor inter subtree similarity function.

43f3722

Refactor intra subtree similarity function

d03de5c

Refactor leaf finding function.

41b368c

hoanganhngo610 requested review from hoanganhngo610 and Dennis1989 as code owners September 11, 2023 10:13

hoanganhngo610 added 23 commits September 11, 2023 17:58

Remove unnecessary comments.

4d3432f

Refactor description of the algorithm within Docstring.

90a7858

Refactor tests in Docstring.

d00b186

Refactor merge_nodes function.

d99ffbb

Refactor comments in merge_nodes

0cfe7c9

Rename predict_otd.

d0f9f49

Simplify comments.

bef5f29

Modify __str__ printed output by adding "Printed Hierarchical Clustering Tree." at the end of the tree.

fd3f9b9

Rename predict_otd.

05f9cf4

Split comments and rename printTree to print_tree.

d8da76b

Modify self.X to self.x_clusters.

98eee88

Lexical changes.

d3d9398

Remove unnecessary comments.

f5cebe7

Refactor Docstring

9ef224d

Refactor comment.

aa597da

Make find_path() a static method.

66927cd

Refactor Docstring.

87cf7fd

Make print_tree static method.

d12f243

Refactor code to account for failing tests.

43884f0

Refactor distance function used in Hierarchical Clustering class.

2a80d3c

Delete euclidean_distance function (due to being unnecessary).

74055db

Code refactoring to align with other algorithms available in River.

c356828

Modify Docstring description for dist_func.

851b710

Delete least common ancestor finding function (since this function is not used within the algorithm).

14a09af
