Add DodgedScatter glyph #11382

allefeld · 2021-03-26T17:26:21Z

allefeld
Mar 26, 2021

I propose to have a function, similar to the jitter transform, which moves circle glyphs in a one-dimensional scatter plot so that they do not overlap.

Example:

It is similar to jitter insofar as it is meant to avoid overlap, though not randomly: it places circles so that they almost touch each other.

I created this example image via an implementation in Python. The function circlestack(plot, renderer) takes both the plot and the circle GlyphRenderer as arguments, and adjusts the offset in the ('level', offset) representation of categorical values in the data source.

I'm not sure it is 100% correct, but it works decently well. However, it is more of a hack, and I think it would be great if something like this could be included in the JS side of Bokeh itself.

bryevdv · 2021-03-26T17:39:29Z

bryevdv
Mar 26, 2021
Maintainer

The main challenge with something like this is that it is necessarily glyph-specific and requires different work for different glyphs, or may not be applicable to many glyphs at all. This is unlike jitter which is completely generic and can be used to jitter any coordinate or number spec value at all without having to know anything about what it is jittering.

This specificity is not, in and of itself, "bad". But it does make the task of figuring out how to fit something into the existing API somewhat harder. I.e. we can't just have a transform like Jitter because transforms can apply to any numeric property, and this new thing definitely cannot. It could only apply to the coordinates of circles and any other subset of glyphs that are taught how to behave this way. That restriction needs to be obvious and/or enforced in the API to keep users in the pit of success.

@allefeld Do you have any concrete thoughts on how this might be "spelled" in the API? It would be really helpful if you could provide proposed code samples that you would like to be able to write.

bryevdv · 2021-03-26T17:43:33Z

bryevdv
Mar 26, 2021
Maintainer

I guess my offhand initial thought is that things like this might be better suited for higher level tools on top of Bokeh, e.g. Holoviews or Pandas-Bokeh or Chartify, because they operate at a better level of context to coordinate things like this.

cc @bokeh/dev

allefeld · 2021-03-26T17:49:37Z

allefeld
Mar 26, 2021
Author

I first posted an incomplete version, sorry, see completed version now.

I agree, this only really makes sense for circle glyphs, and only for the combination of a quantitative axis (DataRange1d) and a categorical axis, or no other axis (a single 'category').

I'm only getting into the Bokeh API right now, so I don't have a deep understanding and can't really comment, especially not on the JS side.

How I could imagine it, on the Python side: It could be a method on GlyphRenderer, which throws an exception if the conditions are not met (see the first few lines in my Python version). Assuming the RangeEvent we discussed is implemented, this method could be used in a callback, so that the stacking is automatically updated if the ranges are changed, and therefore the relation between circle size and axis scales.

p-himik · 2021-03-26T17:51:17Z

p-himik
Mar 26, 2021
Collaborator

I agree with Bryan here. The number of such tweaks is practically infinite and often only a handful of people need a particular one of them. Bokeh already has everything a user needs to implement it by themselves via a custom Transform subclass.

bryevdv · 2021-03-26T18:03:43Z

bryevdv
Mar 26, 2021
Maintainer

Something that might be more reasonable at the Bokeh level: A Python-only function that takes the concrete coordinates for a bunch of circles, and returns "splutzed coordinates" [1]

def splutz(x, y, size):
    ''' Return a new set of non-colliding coordinates for markers, given an input set 
    of coordinates and size. 
    '''

x0 = [...] 
y0 = [...]

x, y = splutz(x0, y0, 10)

# use the splutzed coordinates with circle
p.circle(x=x, y=y, size=10)

I'd be OK with that. But it would have limitations to be aware of, namely that this is a one-time, up-front operation. If it is necessary to accommodate dynamic, changing data (or sizes), then that would only be an option in Bokeh server apps where splutz could just be called again.

[1] need a name for this, obviously, but I am not sure what it would be

p-himik · 2021-03-26T18:13:41Z

p-himik
Mar 26, 2021
Collaborator

FWIW, I don't think it's worth it to introduce new things that alienate Bokeh and BokehJS even more.
Plus, as soon as you create such a function and people start using it, you will undoubtedly see new feature requests targeted at Bokeh to somehow expand this API or invent a similar function or make it available on the JS side. All while they would have always been able to solve all such problems with custom Transform and Expression sub-classes.

I think the effort would be better spent in creating a small and well documented library of such sub-classes that would also be used in the gallery, just to show what's possible. Another step would be to make the NPM dependency for such sub-classes optional when the JS code is written in a form that's usable as-is. Then it would be perfect, IMO.

bryevdv · 2021-03-26T18:52:38Z

bryevdv
Mar 26, 2021
Maintainer

FWIW, I don't think it's worth it to introduce new things that alienate Bokeh and BokehJS even more.

I don't have especially strong feelings about this particular ask, but I will say in general that I think the best lens to view things through is that of "BokehJS is ground truth and everything else is merely a language binding". For any language (Python, R, Scala, whatever) there should be close parity at the models level. But I also think any language should absolutely build more idiomatic, convenience or sophisticated API on top of that as appropriate. We abandoned bokeh.charts because it was over-ambitious and we didn't have resources, not because it was a bad idea, per se.

I'm generally sympathetic to the idea of offering convenience free-functions for pure data preparation, e.g. when the data preparation is 1000x simpler to do in Pandas. These are just tools in an adjacent toolbox, to reach for if they happen to be useful: isolated black-box functions that are simple to test (because they have no interaction with models or BokehJS). What I am 👎 on is new models that don't / can't play nice with existing patterns, that add to the existing web of model testing, or model-manipulating API that is easy to accidentally or unintentionally misuse.

A good example of this is hexbin. I don't think HexTile would get much or any use except that we provide hexbin as a convenience. Similarly all the CDS data-prep conveniences around Pandas and Groupbys. Neither of those has direct BokehJS analogues but it is good that they exist.

bryevdv · 2021-03-26T19:20:54Z

bryevdv
Mar 26, 2021
Maintainer

I should also say: I think the proposal to better demonstrate Transform and Expression is very worthy (but should go in a separate issue)

jbednar · 2021-03-26T19:34:05Z

jbednar
Mar 26, 2021
Collaborator

Sounds useful! I don't have much opinion about whether this functionality belongs in BokehJS, Bokeh Python, or outside of it, but would like to challenge the idea that it's restricted to circle glyphs. I'd say that instead it's only likely to be useful for spatially compact glyphs, i.e. shapes for which a circular bounding region is a good approximation to the shape itself. For circles that criterion is trivially true, but I think it's also true for all of the typical markers in Bokeh:

If I fit a bounding circle to each of these and ensure there is no overlap between those bounding circles, I think that's a useful transformation regardless of marker shape, at least for the typical shapes of markers as above. With that in mind, even for circle glyphs one might want the bounding (keep-away) region not necessarily to be identical to the circle radius itself, to enforce a small whitespace border even between circles, at which point there's even less reason to distinguish between circles and other marker shapes.

bryevdv · 2021-03-26T19:39:46Z

bryevdv
Mar 26, 2021
Maintainer

@jbednar my example code above refers to markers in general :)

allefeld · 2021-03-26T20:13:54Z

allefeld
Mar 26, 2021
Author

Thanks everybody for considering my proposal. I can see that you don't consider this a good fit for the core of Bokeh, and I understand.

@bryevdv: splutz

Maybe I misunderstand, but the problem with that one-time approach is that the scales are needed. I can always preprocess the data I pass to Bokeh, but in order to achieve the effect of not-overlapping-but-touching I need both the scale ranges and the size of the plot box, and they are not available before the plot has been displayed.
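
To make the dependence on the scales concrete, here is a minimal sketch assuming a linear scale. The helper `data_to_pixels` is hypothetical (it is not a Bokeh function), and the range and frame lengths are made-up numbers. It only illustrates that the same two data values can overlap or not depending on the size of the plot area:

```python
def data_to_pixels(value, range_start, range_end, frame_length):
    """Map a data value to a pixel offset along a linear axis.

    Hypothetical helper for illustration only, not part of Bokeh.
    frame_length is the length of the plot area along this axis, in pixels.
    """
    return (value - range_start) / (range_end - range_start) * frame_length


# Two points 0.5 data units apart, drawn with markers 10 px in diameter.
marker_size = 10
for frame_length in (128, 512):
    gap = data_to_pixels(3.5, 0, 8, frame_length) - data_to_pixels(3.0, 0, 8, frame_length)
    status = "overlap" if gap < marker_size else "no overlap"
    print(f"frame {frame_length} px: gap {gap} px -> {status}")
```

The same calculation would have to be redone whenever the range or the frame size changes, which is why a one-time preprocessing step is not enough here.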

@jbednar: to enforce a small whitespace border even between circles

Yes, I have done that in my Python example implementation. It's also true that this can be extended to every marker that is roughly spherical, not just circles, but I think the effect is less convincing.

Regarding Transform sub-classes, that sounds to me like the right approach. But I have to say, it is very hard to figure out how to do things, because the API is extremely complex, and the reference documentation tends to be cryptic if one doesn't already know one's way around:

That leaves me completely clueless on how to implement and use a Transform subclass. A demonstration would definitely be helpful.

What would also be helpful would be more documentation of the architecture, how the different pieces fit together. https://docs.bokeh.org/en/latest/docs/reference/models.html goes into that direction, though it is very short. – But I realize that's a lot of work.

Easier to realize would be improved navigation. For example, if I look at the Reference page for renderers, https://docs.bokeh.org/en/latest/docs/reference/models/renderers.html, it is quite long, and there is no in-page navigation. If I scroll down, I get lost in a long list of methods and properties, many of which repeat. Every class has js_event_callbacks, js_property_callbacks, subscribed_events, etc., and the information that name does not come with uniqueness guarantees is repeated 12 times on the page, every time with a big colored box and the same example code. On the other hand, finding the actual classes is hard when scrolling though.

Some ideas to improve that:
– An in-page navigation panel on the right side, like it already exists in the User Guide.
– Colored boxes for the class signatures, not (or less prominently) for notes.
– Class details are collapsed by default and can be expanded if needed, like in the holoviews API documentation.

bryevdv · 2021-03-27T23:01:10Z

bryevdv
Mar 27, 2021
Maintainer

On the other hand, finding the actual classes is hard when scrolling though.

@allefeld I tend to agree with you but please know that other users have complained loudly (vociferously, even) in the past that it cannot possibly be a real reference guide, that it is worthless in fact, unless it includes absolutely every minute detail up to any point of repetition. So we no longer "factor out" base class properties or methods. I guess there is no pleasing everyone. The next release uses an updated Sphinx theme that affords a better right hand page navigation menu, though the reference guide may need some active editing to make the best use of it.

js_event_callbacks, js_property_callbacks, subscribed_events

Just FYI these should really all be hidden/private but could not be made so, for uninteresting technical reasons.

cc @tcmetzger re: docs ideas (separate issues should be branched off as appropriate)

bryevdv · 2021-03-27T23:19:20Z

bryevdv
Mar 27, 2021
Maintainer

@allefeld Information about creating custom extensions (subclasses) is in the "Extending Bokeh" chapter of the user's guide:

https://docs.bokeh.org/en/latest/docs/user_guide/extensions.html

I have to be frank and state that creating custom extensions is an advanced topic to begin with, and I think even among those, this would be on the more difficult end of the spectrum.

To be honest, I don't actually see an especially clean path to this with an expression or transform. The cleanest, most isolated way to implement this would probably be a new scatter marker type. In the past with ~20 individual marker models that would have been a nightmare. But now that all markers are really just one Scatter model, it would not be quite so bad to add a single DodgedScatter or whatever. I think that would address @p-himik's concern about python/JS parity as well.

allefeld · 2021-03-28T16:28:39Z

allefeld
Mar 28, 2021
Author

@bryevdv, it goes without saying, but maybe should be said anyway, that this criticism doesn't take away from the amazing work you guys have done, and compared to many other complex projects, your documentation is extensive. Would it be helpful if I create another issue for the documentation-related proposals?

I like the idea of a DodgedScatter, but I'm not sure I'm up to the task to implement this on the JS side. I've worked a bit with JS, but I'm not by far as comfortable with it as I am with Python.

bryevdv · 2021-03-28T17:06:56Z

bryevdv
Mar 28, 2021
Maintainer

@allefeld please don't read too much into my previous comment :) I think your docs ideas are good, but yes, for better task management I think they ought to go into dedicated issues.

bryevdv · 2021-03-29T17:38:09Z

bryevdv
Mar 29, 2021
Maintainer

@p-himik do you have any comments about adding a new specialized scatter marker?

BTW I also wanted to mention another motivating case for my position above: contour plots. I would very, very much like it to be possible to create real contour plots with Bokeh. But this will always only be a "python side data prep" kind of thing, because of the extensive code and compute required. But even though it is "technically possible" now for anyone to create contour plots using MultiPolygon, realistically absolutely no one ever will without some helper API to make it much, much easier. I don't think adding compute-only helpers alienates Bokeh and BokehJS (as long as the helpers do not deal with models directly).

p-himik · 2021-03-29T18:35:09Z

p-himik
Mar 29, 2021
Collaborator

I am very predisposed against piling up features. Features are endless. And Bokeh has a scope. Things in this area aren't invented that fast - how often do you see a new plot type being created, or a new web standard being widely accepted that would be greatly beneficial for Bokeh?
Sometimes libraries become complete, and that's a great thing and a great goal to aim for. It's not an unattainable ideal, it's an achievable goal.
This is a nice and a very short read on the topic: https://drewdevault.com/2021/01/04/A-culture-of-stability-and-reliability.html

As far as I can tell, Bokeh gives everything someone needs to create such a marker themselves. I haven't checked in ages, but perhaps the documentation in that department can be improved, but that's about it. Creating a new marker is a trivial task.

The counter-argument to that might be that there are Bokeh users that don't know JS and that would still like to have a custom marker or any other piece of functionality that requires custom JS code. To that I can only offer using support forums or hiring someone with enough skill to do it or to figure out how to do it.

And if someone creates such a marker/expression/transform/whatever, and publishes it with a nice permissive license, and tests it, and shows that there's at least some interest from users other than the maintainer themselves - only then would I consider adding such an extension to the core library.

p-himik · 2021-03-29T18:43:28Z

p-himik
Mar 29, 2021
Collaborator

On contour plots - I would add this feature to the core library. Only because contour plots have been around virtually forever and are still actively used. Whereas this particular feature request is just three days old. And I don't recall seeing such plots all that often (I might be completely wrong, but then someone needs to show it).

But before adding contour plots, I would make sure that I've completely exhausted all viable possibilities of making most of the stuff happen on the BokehJS side. As an example, if I'm not mistaken, a similar argument has been made about {v,h}stack quite some time ago now - the code ended up being only on the Python side. Well, about a year ago I found out that it's not that hard to implement it on the JS side as well.

bryevdv · 2021-03-29T19:09:35Z

bryevdv
Mar 29, 2021
Maintainer

And I don't recall seeing such plots all that often (I might be completely wrong, but then someone needs to show it).

This is one variation of a dot plot, and this kind of dodging is described explicitly in The Grammar of Graphics (1999) and probably earlier. I've definitely encountered this sort of plot in more statistically-inclined settings over the years with some regularity (e.g. it's built into ggplot's geom_dotplot afaik).

I am very predisposed against piling up features. Features are endless.

I am also similarly predisposed, by-and-large, trust me :) But I am also not opposed to applying experience and judgment to ascertain the amount of risk and effort involved and input those factors into the calculus as well. In this case, for instance, all changes would be isolated to new code in a single new class that does not need to interact with anything else or (most importantly) require changes elsewhere. That observation goes a good distance for me in this specific case.

The counter-argument to that might be that there are Bokeh users that don't know JS and that would still like to have a custom marker or any other piece of functionality that requires custom JS code. To that I can only offer using support forums or hiring someone with enough skill to do it or to figure out how to do it.

That is very specifically the argument, and I would claim that if the burden is "users have to write custom JS extensions" then 99% of the time that means Bokeh just won't get used. But also getting back to scope: the scope of Bokeh includes explicitly affording browser plotting to Python folks without having to know JS (at least to rough order).

Well, about a year ago I found out that it's not that hard to implement it on the JS side as well.

I'd love for you to elaborate on this (in a new separate issue). AFAIK the "issue" with vbar_stack is not so much that it is python-only, i.e. it would be trivial to implement vbar_stack in BokehJS. But rather that the stacking relations, once set up, are a pain to reconfigure (from either Python or JS).

Regarding contours, we are already looking at optional C or C++ packages to support it. The best I could imagine doing in JS is a Marching Squares implementation. But it would probably be too slow, and worse, Marching Squares is a purely local algo that generates a pile of disconnected segments rather than actual iso-lines that could drive MultiPolygons. But this discussion should go in #8360 probably.

p-himik · 2021-03-29T19:29:55Z

p-himik
Mar 29, 2021
Collaborator

this kind of dodging is described explicitly in The Grammar of Graphics. I've definitely encountered this sort of plot in more statistically-inclined settings over the years with some regularity.

Then I yield to your expertise. :)

In this case, for instance, all changes would be isolated to new code in a single new class that does not need to interact with anything else or (most importantly) require changes elsewhere

Sure, that's a good thing. But it still results in extra code that maintainers need to support. Markers have been rewritten before. The more markers there are, the more time each such global change will require, and the greater the chance of introducing new bugs.

99% of the time that means Bokeh just won't get used

And that's perfectly fine! If there's a better tool for the job, it should be chosen instead of Bokeh.
To be fair, this very same argument is true for every single library out there. Every single one of them lacks a feature that someone needs. The difference that really matters is that some libraries offer good extendability mechanisms, and some don't.

the scope of Bokeh includes explicitly affording browser plotting to Python folks without having to know JS

If it involves a widely used feature that's somehow not yet in Bokeh then I agree, because the situation is exactly the same as I've described above with the contour plots.
My whole point is that if a FR is a random "I want X and Bokeh doesn't have X but plotting library Y has X", then it should be left for an extension rather than the core. This particular FR seemed that way to me - after all, it is a bit strange to see such a thing not having been implemented after so many years, given how simple it is. But your experience shows otherwise, so I concede. In this case it's not "an extra feature" then but rather "a missing feature".

I'd love for you to elaborate on this (in a new separate issue). AFAIK the "issue" with vbar_stack is not so much that it is python-only, i.e. it would be trivial to implement vbar_stack in BokehJS. But rather that the stacking relations, once set up, are a pain to reconfigure (from either Python or JS).

When and if I'm involved with that code again that uses vbar_stack, I'll definitely create an issue and a PR. IIRC it would have to be a new marker type exactly so changes of the CDS can be propagated properly.

allefeld · 2021-03-29T19:38:15Z

allefeld
Mar 29, 2021
Author

The question whether this has a place in Bokeh is obviously up to you guys. But, rejecting it because you "don't recall seeing such plots all that often" is a bad argument. People necessarily stick to what plotting packages offer them, however good or bad that is. Random jitter is much easier to implement, therefore it's everywhere – I'd say it's a poor man's dodging. Both are intended to solve the problem of overplotting, but jitter doesn't do it properly.

By contrast, while contour plots in Bokeh would be nice, I don't think it's that important. One can use measure.find_contours from scikit-image, and then simply plot the lines with Bokeh. Similarly, I don't think histogram computation or kernel density estimation belong into a plotting library, because they are statistical data analysis procedures which can be separated from the plotting. Stacking circles cannot be separated from the plotting.

p-himik · 2021-03-29T19:44:34Z

p-himik
Mar 29, 2021
Collaborator

rejecting it because you "don't recall seeing such plots all that often" is a bad argument

Of course, that's why I'm not rejecting it and keeping on discussing it.
If I were the only dev member here, I would simply expect or ask for a justification that's not just "this is something that I need" but rather a proper description of the method, how widespread it is, what value it has. All proper enhancement proposal stuff.

jbednar · 2021-03-29T20:16:56Z

jbednar
Mar 29, 2021
Collaborator

By contrast, while contour plots in Bokeh would be nice, I don't think it's that important. One can use measure.find_contours from scikit-image, and then simply plot the lines with Bokeh.

Sure, you can get contour lines like that, but often what people are after for contour plots are labeled contour lines (holoviz/holoviews#4494, https://gitter.im/pyviz/pyviz?at=605cdd99563232374c43874e), which do require JS-level support or some serious hacking.

bryevdv · 2021-03-29T20:17:25Z

bryevdv
Mar 29, 2021
Maintainer

@allefeld

One can use measure.find_contours from scikit-image, and then simply plot the lines with Bokeh.

I am afraid it is not quite so simple:

Output contours are not guaranteed to be closed:

MultiPolygons expects closed polygons, and you want to use MultiPolygons rather than lines:

– in order to be able to fill iso-bands with colors or hatching
– to be able to treat an entire iso-band as a logical unit, e.g. for selection/highlighting (even if disconnected)

Regarding:

But, rejecting it because you "don't recall seeing such plots all that often" is a bad argument.

I mean, it sort of is. There is always more work than people to do it, which means prioritization is impossible to avoid. A reasonable prioritization that many projects employ is "what will have the most impact per effort for users?" [1] In this case, I think it is reasonably common and impactful but was avoided until now because the existence of ~20 individual marker glyphs made the effort too high. Now that there is just one Scatter I think that calculus has changed.

[1] devs are also human beings, so absent dedicated funding, "This is just something I would enjoy spending my limited personal free time on" is another perfectly valid criterion.

allefeld · 2021-03-30T13:22:15Z

allefeld
Mar 30, 2021
Author

another perfectly valid criterion.

Of course. I was only protesting the suggestion that this is a weird thing that nobody ever uses. I think it would be appreciated by people with a stats background like myself.

bryevdv · 2021-03-30T21:53:35Z

bryevdv
Mar 30, 2021
Maintainer

I've updated the title to reflect the recent thoughts. I'm leaning towards marking as feature but we need a little more information to flesh out a proposal. First: What is the full range of applicability for this glyph? The use case above of distributions on categorical axes is certainly a common one. But cases like this also seem applicable (these are also explicitly described in GoG):

Can one glyph cover both these cases? (I think so, but what are any specific considerations?)

Second: what is the actual API / set of properties that a dodged scatter has? E.g. I have to presume there are at least a few tunable parameters to control how the dodging is done; what are these? What does proposed usage look like?

@allefeld This is where you can help greatly by working up a proposal for the code you would like to be able to write.

allefeld · 2021-03-31T19:44:23Z

allefeld
Mar 31, 2021
Author

Range of applicability

The idea behind my proposal is that we have a scatter plot (to see the actual data, unlike a histogram or a KDE), but we want to avoid overplotting. If the data to be scatter-plotted consist of two numerical values (standard 2d scatter plot), then avoiding overplotting by dodging means one has to modify the data to be displayed, which goes contrary to seeing the actual data. (In that case I would rather go for the Datashader approach.)

If one dimension is categorical, then adding offsets to the categorical data does not change the category. That's why I restricted my implementation to the case where there is one categorical scale and one numerical scale. (Actually, I restricted to linear scale, but there is no reason not to also allow logarithmic etc.)

A possible extension of that approach that I see is to allow two categorical scales. In that case, the standard positions would be on the intersections of a 2d lattice, and dodging to avoid overplotting could be applied in both dimensions without changing category information, leading to clusters around the lattice intersections. I'm not sure how useful that would be, because there would be no information loss in just counting the number of data points per combination of categories; but I don't see anything wrong with it either.

The GoG plot

I don't think the plot from GoG is a dodged scatter plot, though it looks similar, because the vertical axis is labeled 'count' (though the tick labels are not counts?). This means that this plot is actually a variant of a histogram, with the difference that stacks of dots are used instead of the usual bars. The advantage could be that the discrete nature of counts is visually apparent, I like the plot for that.

In this interpretation, there is no dodging involved. The displayed data is a list of tuples (mpg, count), e.g. [(10, 2), (13, 1), (14, 1), (15, 5), …], as apparent from the axis labels, and each data point is visualized with zero or more circles.

Of course the same effect could be achieved with a dodged scatter plot, by feeding in only the mpg values in multiples, e.g. [10, 10, 13, 14, 15, 15, 15, 15, 15, …]. But that would be a case of numerical combined with categorical, where there is only one category. And there would be no 'counts' axis, but only offsets within this single category.
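
The relation between the two representations can be sketched in a few lines. This uses only the first four pairs quoted above, and the variable names are just for illustration:

```python
from collections import Counter

# (mpg, count) pairs, as in the histogram-like reading of the plot.
counted = [(10, 2), (13, 1), (14, 1), (15, 5)]

# Expand to one entry per observation, which is what a dodged scatter
# with a single category would consume.
expanded = [mpg for mpg, count in counted for _ in range(count)]

# Counting the observations again recovers the original pairs.
recovered = sorted(Counter(expanded).items())

print(expanded)
print(recovered == counted)
```

Both forms carry the same information; the difference is only whether the plot has a 'count' axis or dodge offsets within one category.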

allefeld · 2021-03-31T19:48:41Z

allefeld
Mar 31, 2021
Author

Tunable parameters

I assume that we're modeling on the Scatter glyph, which can be created from the plotting interface via the scatter method. I think we need two parameters in addition:

dodge_direction: direction towards which dodging occurs. Possible values are 'centered' (like in my implementation), 'increase' to only use dodging values >= 0, 'decrease' to only use values <= 0. Defaults to 'centered'.

This is what 'increase' looks like:

To make it more user-friendly, 'left' & 'right' and 'top' & 'bottom' could be aliases depending on whether the categorical axis is horizontal or vertical.

If we allow two categorical axes, I'm not sure anything other than 'centered' makes sense. A possibility I see is that this parameter could be either 'centered' or an angle (0 – 2 pi). Then 'left', 'right', 'top', 'bottom' would be aliases for pi, 0, 1/2 pi, 3/2 pi.

marker_sep: space left between adjacent markers, in screen units. Defaults to 1 pixel.

For a circle marker, where the size property is the diameter in pixels, the algorithm would position markers such that the distance between the centers of any two markers is >= size + marker_sep.

If as @jbednar proposed other markers are allowed, then it needs to be clear that for them size is the diameter of the smallest circle enclosing the marker, or size needs to be adjusted depending on the marker type.

If we allow markers of different sizes, then the minimum distance becomes size1 / 2 + size2 / 2 + marker_sep.
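
A rough sketch of how these two parameters might be interpreted, in plain Python. All names here (`DIRECTION_ALIASES`, `resolve_direction`, `min_center_distance`, `overlaps`) are hypothetical and only mirror the proposal above; none of this is an existing Bokeh API:

```python
import math

# Hypothetical alias table for the proposed dodge_direction parameter
# (angles in radians, as suggested above; not an existing Bokeh API).
DIRECTION_ALIASES = {
    "right": 0.0,
    "top": math.pi / 2,
    "left": math.pi,
    "bottom": 3 * math.pi / 2,
}


def resolve_direction(direction):
    """Return None for 'centered', otherwise an angle in [0, 2*pi)."""
    if direction == "centered":
        return None
    if isinstance(direction, str):
        return DIRECTION_ALIASES[direction]
    return float(direction) % (2 * math.pi)


def min_center_distance(size1, size2, marker_sep=1.0):
    """Smallest allowed distance between two marker centers, in pixels.

    size1 and size2 are marker diameters; marker_sep is the gap to keep.
    """
    return size1 / 2 + size2 / 2 + marker_sep


def overlaps(p1, p2, size1, size2, marker_sep=1.0):
    """True if two markers at pixel positions p1 and p2 are too close."""
    return math.dist(p1, p2) < min_center_distance(size1, size2, marker_sep)
```

For two markers of equal size this reduces to the size + marker_sep rule stated above.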

allefeld · 2021-03-31T19:52:18Z

allefeld
Mar 31, 2021
Author

Algorithm

My implementation positions (dodges) markers in the sequence they appear in the data. It is based on a search over predefined possible dodge values (in pixels):

# possible offsets in pixels
# of both signs, in ascending order of absolute value
xcs = np.arange(np.floor((fr_scaling - size - 2) / 2)) + 1
xcs = np.hstack((0, np.vstack((xcs, -xcs)).reshape(-1, order='F')))

Here fr_scaling is the distance between adjacent categories in pixels. I hard-coded marker_sep to be 1, i.e. the -2 here would be -2 * marker_sep in general. The array xcs has the form [0, 1, -1, 2, -2, …].

For each new marker to be positioned, it calculates the squared distance between already positioned markers and the possible offsets for the new one, and then the minimum across already positioned points:

# squared distances between actual positions of other points
# and possible positions of current point
d2 = (xo[:, None] - xcs[None, :]) ** 2 + (yo[:, None] - y[i]) ** 2
# minimum squared distance to other points for each possible offset
d2 = np.min(d2, axis=0)

Due to some surrounding logic, at this point the dimension along which dodging is applied is x and the numerical data dimension is y. xo and yo are the coordinates of the already positioned points in pixels. xcs are the possible x-coordinates for the new marker, and y[i] is its given y-coordinate.

The admissible x-coordinates are those for which d2 is >= threshold = (size + marker_sep) ** 2:

# admissible offsets
xc = xcs[d2 >= threshold]

The optimal choice is the first entry in the resulting array:

# take the first admissible offset
# because it is closest to the center
x[i] = xc[0]

The same logic could be used for the 'left' and 'right' options of dodge_direction by simply preparing the xcs array differently.
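
As an illustration of "preparing the xcs array differently", here is a sketch that builds the candidate offsets for each direction. The function name `candidate_offsets` is made up, the direction names follow the dodge_direction proposal, and keeping the one-sided variants within the same half-spacing limit as the centered case is an assumption on my part:

```python
import numpy as np


def candidate_offsets(fr_scaling, size, marker_sep=1, direction="centered"):
    """Candidate dodge offsets in pixels, nearest to the center first.

    fr_scaling is the pixel distance between adjacent categories and size
    the marker diameter. The direction names follow the dodge_direction
    proposal in this thread; they are not an existing Bokeh option.
    """
    # number of whole-pixel steps available on each side of the center
    n = int(np.floor((fr_scaling - size - 2 * marker_sep) / 2))
    steps = np.arange(1, n + 1)
    if direction == "centered":
        # interleave positive and negative steps: 0, 1, -1, 2, -2, ...
        both = np.vstack((steps, -steps)).reshape(-1, order="F")
        return np.hstack((0, both))
    if direction == "increase":
        # only non-negative offsets: 0, 1, 2, ...
        return np.hstack((0, steps))
    if direction == "decrease":
        # only non-positive offsets: 0, -1, -2, ...
        return np.hstack((0, -steps))
    raise ValueError(f"unknown direction: {direction!r}")


print(candidate_offsets(20, 8))
print(candidate_offsets(20, 8, direction="increase"))
```

Because the search always takes the first admissible entry, the ordering of this array is what determines the direction of dodging.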

A problem occurs if there are so many markers that at some point there are no admissible offsets anymore, i.e. xc is empty. Here is my approach:

if len(xc) > 0:
    # take the first admissible offset
    # because it is closest to the center
    x[i] = xc[0]
else:
    # no admissible offsets? minimize overlap
    x[i] = xcs[np.argmax(d2)]
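
Put together, the excerpts above amount to roughly the following loop. This is a condensed sketch, not the original implementation: the function name `dodge_offsets` is made up, the inputs are assumed to be in pixels already, and the special case for the first marker stands in for the "surrounding logic" mentioned earlier:

```python
import numpy as np


def dodge_offsets(y, xcs, size, marker_sep=1):
    """Greedy dodge offsets (pixels) for markers at pixel positions y.

    Markers are placed in data order. xcs holds the candidate offsets,
    ordered by preference. Returns one offset per marker.
    """
    y = np.asarray(y, dtype=float)
    xcs = np.asarray(xcs, dtype=float)
    threshold = (size + marker_sep) ** 2
    x = np.zeros(len(y))
    for i in range(len(y)):
        if i == 0:
            # nothing placed yet, so the most preferred offset is free
            x[i] = xcs[0]
            continue
        xo, yo = x[:i], y[:i]
        # squared distances between placed markers and candidate positions
        d2 = (xo[:, None] - xcs[None, :]) ** 2 + (yo[:, None] - y[i]) ** 2
        # minimum squared distance to placed markers for each candidate
        d2 = np.min(d2, axis=0)
        xc = xcs[d2 >= threshold]
        if len(xc) > 0:
            # first admissible offset in preference order
            x[i] = xc[0]
        else:
            # no admissible offset: minimize overlap instead
            x[i] = xcs[np.argmax(d2)]
    return x


# Candidate offsets 0, 1, -1, ..., 12, -12 and four markers of size 8.
steps = np.arange(1, 13)
xcs = np.hstack((0, np.vstack((steps, -steps)).reshape(-1, order="F")))
offsets = dodge_offsets([100, 100, 100, 160], xcs, size=8)
print(offsets)
```

In this example the three coincident markers end up 9 pixels apart (size + marker_sep), while the fourth is far enough away along y to stay at offset 0.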

Limitations

1) My implementation uses a pre-defined array of possible offsets xcs. I did it this way because it allows the positioning to be implemented straightforwardly using NumPy array operations, which also makes it relatively fast (~ 60 ms for 1000 points). The drawback is that it can't easily use sub-pixel positions, and that it is not clear how this would be generalized for the two-dimensional case (two categorical axes). I'm also not sure how this would map onto JavaScript.

2) My implementation positions markers one-by-one in the sequence of the data. This can lead to strange artifacts. For example, if the data are sorted and many differences between subsequent values are lower than the marker size, this leads to 'tendrils':

Better would certainly be a global optimization approach, but that would be very costly. I have some ideas on how to improve this using cheaper tricks, if you are interested. On the other hand, the tendrils as an indication of being-sorted could also be seen as a feature.
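
Regarding the first limitation, one conceivable (untested in practice) way to carry the candidate search over to two categorical axes is to enumerate integer pixel offsets around a lattice point in order of distance. The following is only a sketch under that assumption; `lattice_offsets`, `dodge_2d` and the `max_radius` parameter are hypothetical names, and the tie-breaking by angle is an arbitrary choice:

```python
import math


def lattice_offsets(max_radius):
    """Integer (dx, dy) pixel offsets within max_radius, nearest first."""
    r = int(max_radius)
    candidates = [
        (dx, dy)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        if dx * dx + dy * dy <= max_radius ** 2
    ]
    # sort by distance from the lattice point; ties are broken by angle
    candidates.sort(key=lambda p: (p[0] ** 2 + p[1] ** 2, math.atan2(p[1], p[0])))
    return candidates


def dodge_2d(n_points, size, marker_sep=1, max_radius=30):
    """Offsets for n_points equal-sized markers sharing one lattice point."""
    threshold = (size + marker_sep) ** 2
    candidates = lattice_offsets(max_radius)
    placed = []

    def min_d2(c):
        # squared distance from candidate c to the nearest placed marker
        return min(
            ((c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2 for p in placed),
            default=math.inf,
        )

    for _ in range(n_points):
        # first admissible candidate, i.e. the one closest to the center
        choice = next((c for c in candidates if min_d2(c) >= threshold), None)
        if choice is None:
            # no admissible candidate: minimize overlap instead
            choice = max(candidates, key=min_d2)
        placed.append(choice)
    return placed


print(dodge_2d(5, size=8))
```

This keeps the same greedy, order-dependent behavior as the one-dimensional version, so it would inherit the second limitation as well.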

bryevdv · 2021-06-24T17:59:33Z

bryevdv
Jun 24, 2021
Maintainer

Going to move this to a GitHub discussion for further discussion, and a new issue can be opened if/when we reach agreement on a concrete implementation/proposal.
