Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(geomean): add geomean function #6223

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

prabhathc
Copy link

@prabhathc prabhathc commented Jun 13, 2022

Add geomean function, closes #6152!

@prabhathc prabhathc changed the title feature(geomean): added geomean function + upkeep feature(geomean): added geomean function Jun 13, 2022
@prabhathc prabhathc changed the title feature(geomean): added geomean function feature(geomean): add geomean function Jun 13, 2022
@prabhathc prabhathc marked this pull request as draft June 13, 2022 00:49
@acxz acxz force-pushed the feature/geomean branch 2 times, most recently from e2c41a9 to cf2dc9a Compare June 13, 2022 01:09
@prabhathc prabhathc marked this pull request as ready for review June 13, 2022 01:13
src/plots/plots.js Outdated Show resolved Hide resolved
draftlogs/6223_add.md Outdated Show resolved Hide resolved
@nicolaskruchten
Copy link
Member

@alexcjohnson should this be supported in other places like histfunc etc?

@acxz
Copy link

acxz commented Jun 14, 2022

@nicolaskruchten is that referring to just geomean or the entire categoryorder attribute?

@nicolaskruchten
Copy link
Member

geomean: all the other aggregation functions you can sort by (sum, mean, median) are also supported in a few other places like histfunc, so it would be nice to keep that parallelism going if possible.

@acxz
Copy link

acxz commented Jun 14, 2022

I'd be down to tackle that in another PR if you can list where else to add the geomean statistic.

@archmoj
Copy link
Contributor

archmoj commented Jun 14, 2022

Wondering geomean may not be the best naming as it could be point to our geo subplots.
How do you feel about geometric mean or g-mean alternatives?
Also there is harmonic mean which one may want to add. Right?
In that case something like h-mean works for me; but harmean does not :)

@archmoj
Copy link
Contributor

archmoj commented Jun 14, 2022

On another note, we are wondering why the automatic CircleCI tests for this PR are not triggered?!
Do you use any special flags when pushing?

@acxz
Copy link

acxz commented Jun 14, 2022

yea, harmean doesn't sound so nice.
I don't particularly like g-mean either since it may not be obvious. I'd actually rather pref geometric_mean over g-mean.

edit: a quick goog/bing search shows the wikipedia article of Geometric mean with a search query of g-mean. I suppose, it is descriptive enough.

@acxz
Copy link

acxz commented Jun 14, 2022

On another note, we are wondering why the automatic CircleCI tests for this PR are not triggered?!

I was gonna ask you guys the same thing lol.
There were commits where the CI was triggered, but I believe this stopped when we made this PR a draft and then undid that action.

@acxz
Copy link

acxz commented Jun 14, 2022

I can try rebasing these commits on master and force pushing to see if that triggers the CI?

@archmoj
Copy link
Contributor

archmoj commented Jun 14, 2022

I can try rebasing these commits on master and force pushing to see if that triggers the CI?

Force pushing is discouraged on open pull requests as it could confuse reviewers.
And your branch is only 11 commits behind upstream master. So that shouldn't be the reason.
Let's see next time you push a commit into this PR, that may trigger the tests to run on CircleCI.

@archmoj
Copy link
Contributor

archmoj commented Jun 14, 2022

On another note, we are wondering why the automatic CircleCI tests for this PR are not triggered?!

I was gonna ask you guys the same thing lol. There were commits where the CI was triggered, but I believe this stopped when we made this PR a draft and then undid that action.

@antoinerg any idea how we could trigger CI runs on this pull?

@alexcjohnson
Copy link
Collaborator

Wondering geomean may not be the best naming as it could be point to our geo subplots.

I see your point, though I think in context it's pretty clear. Looking around I see gmean used in scipy.stats and various npm packages, geomean used in MATLAB and Excel, and geometric_mean used in the Python stdlib statistics package. I'm comfortable leaving it as geomean but @archmoj if you feel strongly about it I'd be OK with either gmean or geometric mean (with a space, not a dash or underscore, so the full values would be geometric mean ascending or geometric mean descending)

Harmonic mean commonly refers to the inverse of the mean of inverses (see eg npm and python stdlib) so let's stay away from that. But given that we all agree harmean is unpleasant, so if we ever did add it we'd need to use hmean or harmonic mean, perhaps that argues against geomean for consistency? In that case I'm leaning toward geometric mean, clarity over terseness.

should this be supported in other places like histfunc etc?

Yes, that would be nice. I only see two more places we could add this, and one of them (the aggregate transform) is deprecated so I'm happy to ignore that. So histfunc (attribute values, bin functions, which get applied here) is really the only place this needs to be added. @nicolaskruchten am I missing anything?

Two frustrating things though:

(1) we weren't consistent about the naming - for histfunc and the aggregate transform we said avg rather than mean. I would suggest that we add mean as a synonym for avg and maybe eventually deprecate avg, then add geometric mean alongside this.

(2) The implementation for histfunc is different from that used in category ordering, in that it loops over the array only once, calculating all the quantities it needs for all the bins simultaneously (ie sums and counts), then in a separate step it loops over all the bins to complete the calculation (divide each sum by each count).

src/lib/stats.js Outdated Show resolved Hide resolved
@archmoj
Copy link
Contributor

archmoj commented Jun 14, 2022

I'd vote for geometric mean too, when there is an election here :)

@nicolaskruchten
Copy link
Member

Two frustrating things though

Ah yeah I remember these. Well, probably reason enough not to do extra work here then :)

@acxz
Copy link

acxz commented Jun 18, 2022

Let's see next time you push a commit into this PR, that may trigger the tests to run on CircleCI.

I have already tried that.

geometric mean

done.

@acxz
Copy link

acxz commented Jun 18, 2022

how we could trigger CI runs on this pull?

For right now, I'll just make a duplicate pull to see the tests and be able to iterate on them if they are incorrect.

duplicate pr does not start the workflow either

@archmoj
Copy link
Contributor

archmoj commented Jun 20, 2022

I fetched your branch and pushed it to plotly.js repository at the geomeanbranch.
See https://github.com/plotly/plotly.js/compare/geomean
The tests are now running at https://app.circleci.com/pipelines/github/plotly/plotly.js/6863/workflows/f4c7b53b-1569-4ba3-b492-e46418c1b39d.

@archmoj
Copy link
Contributor

archmoj commented Jun 20, 2022

@archmoj
Copy link
Contributor

archmoj commented Jul 6, 2022

With the changes npm run test-jasmine calcdata should pass.
Would you please have a look at the failing tests here?
https://app.circleci.com/pipelines/github/plotly/plotly.js/6866/workflows/7690f800-a50d-4321-97bb-1e19d7814ec8/jobs/156536/parallel-runs/6?filterBy=FAILED

Also if you fetch upstream/master and merge it into this branch the "export test" would be successful.

@acxz
Copy link

acxz commented Jul 6, 2022

Ah okay sweet.

@acxz
Copy link

acxz commented Jul 8, 2022

@archmoj I'm not sure how to fix the following error: https://app.circleci.com/pipelines/github/plotly/plotly.js/6866/workflows/7690f800-a50d-4321-97bb-1e19d7814ec8/jobs/156536/parallel-runs/6?filterBy=FAILED&invite=true#step-104-1026

9) takes the geometric mean of all values per category across traces of type ohlc
     calculated data and points category ordering by value
     Error: Expected $[1][1] = 2.82842712474619 to equal 2.8284271247461903.
    at <Jasmine>
    at /tmp/test/jasmine/tests/calcdata_test.js:1057:1 <- /tmp/8028cfc1362ed8eb2b52841f495d61eb.browserify.js:283624:41

10) takes the geometric mean of all values per category across traces of type candlestick
     calculated data and points category ordering by value
     Error: Expected $[1][1] = 2.82842712474619 to equal 2.8284271247461903.
    at <Jasmine>
    at /tmp/test/jasmine/tests/calcdata_test.js:1057:1 <- /tmp/8028cfc1362ed8eb2b52841f495d61eb.browserify.js:283624:41

Also if you could tell me how to run just a specific test locally that would be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Add "geometric mean" to categorical ordering
5 participants