Summary bucketing to columns #29
Nice example. I need to read through your proposals in more detail. However, my initial thought is that what you mention in point 2 is the key. Using
A couple references on long/narrow and wide data formats:
Real quick before bed: after posting this, I discovered that datamash exists, and that it offers cross-tabulation separately, so there's at least one toolset in the same domain treating it as its own tool.
Idly, I wonder whether there's room for both concepts to coexist?
Ahaha, it keeps happening! ;) This time, in a much longer form.
I'm not sure what to call this precisely, so I'll describe the problem and see what you think:
Say we have as input a list of actions and their outcome:
The goal is something like this:
The above was done with the following incantation:
This is pretty nasty: two subshells (with bonus bashisms), two scans of the input file, annoying and error-prone to edit... you can probably see why I'd like to improve this one.
Two ideas come to mind for how this might work:
tsv-summarize --group-by 1 --bucket 2
For the above input, I'd expect this to output:
i.e. within each group, each unique value of field 2 is counted and given its own bucket (output column).
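To pin down what the proposed `--bucket` option would compute, here is a minimal Python sketch. `--bucket` does not exist in tsv-summarize; it is only proposed here, and this is not how tsv-utils would implement it. The sketch assumes headerless TSV input and 1-based field numbers. The function name `bucket_counts`, the `group` header label and the sample rows are made up for illustration and are not the input from this issue. It groups and counts in a single pass over the input, which is the main win over the two-scan incantation above.

```python
from collections import Counter, defaultdict


def bucket_counts(lines, group_field=1, bucket_field=2):
    """Sketch of the proposed `tsv-summarize --group-by 1 --bucket 2` output.

    Hypothetical: fields are 1-based and the input has no header line.
    Returns a header row followed by one row per group.
    """
    counts = defaultdict(Counter)  # group key -> Counter of bucket values
    buckets = {}  # dict used as an ordered set of bucket values (first-seen order)
    for line in lines:  # single pass over the input
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        key, value = fields[group_field - 1], fields[bucket_field - 1]
        buckets.setdefault(value, None)
        counts[key][value] += 1
    header = ["group", *buckets]  # "group" label is a made-up placeholder
    # Counter returns 0 for bucket values a group never saw.
    rows = [[key, *(str(c[b]) for b in buckets)] for key, c in counts.items()]
    return [header] + rows


if __name__ == "__main__":
    sample = ["deploy\tok", "deploy\tfail", "build\tok", "deploy\tok"]
    for row in bucket_counts(sample):
        print("\t".join(row))
```

For the made-up sample this prints a `group ok fail` header, then `deploy 2 1` and `build 1 0` (tab-separated). Bucket columns come out in first-seen order; a real option might sort them instead, which is an open design choice.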
Advantages:
Disadvantages:
In this case, there needs to be something to bridge the gap. Maybe something like this?
tsv-pivot --column 2 --fact sum:3
Output... probably the same as before.
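A similar sketch for the hypothetical `tsv-pivot --column 2 --fact sum:3`. The proposal doesn't say which field supplies the row key, so this assumes field 1; field 2 supplies the column headings and field 3 is summed. `pivot_sum`, the `row` header label and the sample rows are invented for illustration. With a constant 1 in field 3 the sum degenerates into a count, which is one way the output could come out "the same as before".

```python
from collections import defaultdict


def _fmt(value):
    """Print whole sums without a trailing '.0'."""
    return str(int(value)) if value.is_integer() else repr(value)


def pivot_sum(lines, row_field=1, pivot_field=2, fact_field=3):
    """Sketch of a hypothetical `tsv-pivot --column 2 --fact sum:3`.

    Assumes 1-based fields, no header line, and that row_field supplies
    the row key (the proposal does not say which field does).
    """
    table = defaultdict(lambda: defaultdict(float))  # row key -> column value -> running sum
    columns = {}  # ordered set of pivot-column values (first-seen order)
    for line in lines:  # single pass over the input
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        row, col = fields[row_field - 1], fields[pivot_field - 1]
        columns.setdefault(col, None)
        table[row][col] += float(fields[fact_field - 1])
    header = ["row", *columns]  # "row" label is a made-up placeholder
    # Missing (row, column) cells read as 0.0 from the inner defaultdict.
    body = [[row, *(_fmt(cells[c]) for c in columns)] for row, cells in table.items()]
    return [header] + body


if __name__ == "__main__":
    # A constant 1 in field 3 turns sum into a count.
    sample = ["deploy\tok\t1", "deploy\tfail\t1", "build\tok\t1", "deploy\tok\t1"]
    for row in pivot_sum(sample):
        print("\t".join(row))
```

Keeping the pivot separate like this means any summary operator (sum, mean, max, ...) could in principle fill the cells, at the cost of needing a counting column when all you want is counts.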
Advantages:
Disadvantages: