Tabulate by Batch, and readiness for Tabulate By X #816

artoonie · 2024-03-29T23:39:20Z

Request for comment on language.
As a noun, I call it "a tabulate-by field"
As a verb, I call it "Tabulate By Field"

I can imagine preferences for other names, such as "Tabulate By Pivot" or "Tabulate By Metadata Field" or something else, but "Tabulate By Field" seems sufficiently clear to me. Thoughts? Any other words that may reduce ambiguity?

Closes #807

artoonie · 2024-04-13T22:08:07Z

@yezr this has been lightly tested but I could use some help gathering test data, test configurations, and checking edge cases.

tarheel

This seems great. My feedback is mainly around the language. The verb phrase "tabulate by field" is OK, but could be clearer. There and elsewhere, my concern is mostly about how generic the word "field" is.

tarheel · 2024-05-19T20:54:56Z

src/main/java/network/brightspots/rcv/ContestConfig.java

- Logger.severe("tabulateByPrecinct may not be used with CDF files.");
+ for (TabulateByField tabulateByField : enabledFields()) {
+ validationErrors.add(ValidationError.CVR_CDF_TABULATE_BY_DISAGREEMENT);
+ Logger.severe("%s may not be used with CDF files.", tabulateByField);


We should maybe note that this is just because we haven't chosen to (fully) implement it.

tarheel · 2024-05-19T20:56:48Z

src/main/java/network/brightspots/rcv/ContestConfig.java

@@ -1291,5 +1332,21 @@ public String getInternalLabel() {
 }
 }

+ enum TabulateByField {


I find this name kind of confusing because it sounds like a verb phrase (as in "I want to tabulate by field"). Maybe TabulationBucketingField or TabulationGroupingField or TabulationSliceField?

artoonie · 2024-05-20

I like slice, and I think it will make sense to s/field/slice throughout this PR.

tarheel · 2024-05-19T20:57:40Z

src/main/java/network/brightspots/rcv/ResultsWriter.java

@@ -9,7 +9,7 @@

 /*
 * Purpose: Ingests tabulation results and generates various summary report files.
- * Design: Generates per-precinct files if specified.
+ * Design: Generates per-field files if specified.


I would clarify what "per-field" means in this comment. Ideally we can choose a global term for these subgroups, like one of the terms I suggested above or something similar: bucket, group, subgroup, slice, subset. "Pivot" and "metadata" feel maybe a bit too technical. Anyway, there are a bunch of places in this PR where we're just saying "field" or "fieldId", and I find that confusingly generic. I'd prefer for all of those to say something more along the lines of "bucketType"/"bucketId" or whatever, to convey that that's the purpose of the fields and the IDs.

tarheel · 2024-05-19T21:00:06Z

src/main/java/network/brightspots/rcv/ResultsWriter.java

@@ -63,8 +66,8 @@ class ResultsWriter {

 // number of rounds needed to elect winner(s)
 private int numRounds;
- // all precinct Ids which may appear in the output cvrs
- private Set<String> precinctIds;
+ // all Field Ids which may appear in the output cvrs


s/which/that

tarheel · 2024-05-19T21:03:22Z

src/main/java/network/brightspots/rcv/ResultsWriter.java

@@ -565,7 +573,8 @@ String writeRctabCvrCsv(
 outputFile.getAbsolutePath());
 CSVPrinter csvPrinter;
 BufferedWriter writer = Files.newBufferedWriter(outputFile.toPath());
- csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT);
+ CSVFormat format = CSVFormat.DEFAULT.builder().setNullString("").build();


tarheel · 2024-05-19T21:09:41Z

src/main/java/network/brightspots/rcv/Tabulator.java

- private final Map<String, Map<Integer, RoundTally>> precinctRoundTallies = new HashMap<>();
+ private final RoundTallies roundTallies = new RoundTallies();
+ // roundTalliesByField is a map from a tabulate-by field to roundTallies for that field
+ private final BreakdownByField<RoundTallies> roundTalliesByField = new BreakdownByField<>();


Am I misreading the code, or does this mean that if you're using more than one breakdown (with the current code, that would be both batch and precinct), there will be a collision if there's any overlap in the IDs? E.g. if there's a batch 1 and also a precinct 1, they'll be writing to and reading from the same tallies?

Ah, no, now I see that roundTalliesByField is a map from field to fieldId to round to vote count. Maybe roundTalliesByFields (plural) would be more clear?
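To make the keying concrete, here is a minimal sketch of the nesting described above (field type, then field ID, then round, then vote count). All names here (`SliceType`, `TalliesBySlice`, `addVotes`, `getVotes`) are hypothetical stand-ins and not the PR's actual classes; the point is only that batch "1" and precinct "1" land in different inner maps.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Demo entry point: the same ID under two different slice types stays separate.
class TalliesBySliceDemo {
  public static void main(String[] args) {
    TalliesBySlice tallies = new TalliesBySlice();
    tallies.addVotes(SliceType.BATCH, "1", 1, 10);
    tallies.addVotes(SliceType.PRECINCT, "1", 1, 25);
    System.out.println(tallies.getVotes(SliceType.BATCH, "1", 1)); // prints 10
    System.out.println(tallies.getVotes(SliceType.PRECINCT, "1", 1)); // prints 25
  }
}

// Hypothetical stand-in for the PR's TabulateByField enum.
enum SliceType {
  BATCH,
  PRECINCT
}

// Hypothetical sketch: slice type -> slice ID -> round -> vote count.
class TalliesBySlice {
  private final Map<SliceType, Map<String, Map<Integer, Integer>>> tallies =
      new EnumMap<>(SliceType.class);

  // Adds votes to the tally for one slice in one round.
  void addVotes(SliceType type, String sliceId, int round, int votes) {
    tallies
        .computeIfAbsent(type, t -> new HashMap<>())
        .computeIfAbsent(sliceId, id -> new HashMap<>())
        .merge(round, votes, Integer::sum);
  }

  // Returns 0 when nothing has been recorded for that slice and round.
  int getVotes(SliceType type, String sliceId, int round) {
    return tallies
        .getOrDefault(type, Map.of())
        .getOrDefault(sliceId, Map.of())
        .getOrDefault(round, 0);
  }
}
```

Because the outer key is the type, an overlap in IDs across types cannot collide; a collision would only be possible if the map were keyed by ID alone.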

tarheel · 2024-05-19T21:12:00Z

src/main/java/network/brightspots/rcv/Tabulator.java

- String precinctId = cvr.getPrecinct();
- if (precinctId != null) {
- precinctIds.add(precinctId);
+ for (TabulateByField field : config.enabledFields()) {


Good example of the ambiguity here: without context, I would never guess that config.enabledFields() is giving me a list of enabled breakdown fields.

tarheel · 2024-05-19T21:15:28Z

src/main/java/network/brightspots/rcv/Tabulator.java

@@ -789,8 +798,8 @@ void generateSummaryFiles(String timestamp) throws IOException {
 }
 }

- Set<String> getPrecinctIds() throws IOException {
- return precinctIds;
+ FieldIdSet getFieldIds() {


Should fieldIds/getFieldIds clarify that it's referring to the enabled ones? And why do we need the fieldIds instance variable instead of just using config.enabledFields()? Aren't we throwing an exception if there's a discrepancy between the config and the CVRs in this regard anyway?
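One possible reading, not confirmed in this thread: the earlier hunk replaces `precinctIds.add(precinctId)` with a loop over `config.enabledFields()`, which suggests the config supplies only the enabled field types while the IDs themselves are gathered from the CVRs. A rough sketch of that distinction follows. `SliceKind`, `Ballot`, and `SliceIdCollector` are invented for illustration and do not correspond to `FieldIdSet` or any other class in the PR.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Demo entry point: only BATCH is enabled, so precinct IDs are never collected.
class SliceIdDemo {
  public static void main(String[] args) {
    Set<SliceKind> enabled = EnumSet.of(SliceKind.BATCH);
    List<Ballot> ballots = List.of(new Ballot("B1", "P1"), new Ballot("B2", "P1"));
    System.out.println(SliceIdCollector.collect(enabled, ballots)); // prints {BATCH=[B1, B2]}
  }
}

// Hypothetical stand-in for the enabled field types listed in the config.
enum SliceKind {
  BATCH,
  PRECINCT
}

// Hypothetical stand-in for a cast vote record; either ID may be null.
final class Ballot {
  private final String batchId;
  private final String precinctId;

  Ballot(String batchId, String precinctId) {
    this.batchId = batchId;
    this.precinctId = precinctId;
  }

  String idFor(SliceKind kind) {
    return kind == SliceKind.BATCH ? batchId : precinctId;
  }
}

final class SliceIdCollector {
  // The config says which kinds are enabled; the ballots say which IDs exist.
  static Map<SliceKind, Set<String>> collect(Set<SliceKind> enabled, List<Ballot> ballots) {
    Map<SliceKind, Set<String>> ids = new EnumMap<>(SliceKind.class);
    for (Ballot ballot : ballots) {
      for (SliceKind kind : enabled) {
        String id = ballot.idFor(kind);
        if (id != null) {
          ids.computeIfAbsent(kind, k -> new TreeSet<>()).add(id);
        }
      }
    }
    return ids;
  }
}
```

Under that reading, the enabled types alone cannot tell the writer which per-ID files to produce, so some collected ID set is still needed, whatever it ends up being called.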

tarheel · 2024-05-19T21:24:45Z

src/main/java/network/brightspots/rcv/Tabulator.java

+ * Breaks down the templated type T by a specific field.
+ * The field type is the outer map (TabulateByField), and the field ID is the inner map (String).
+ */
+ static class BreakdownByField<T> {


OK, so you made this generic so that you could reuse it for both tallies and tally transfers.
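For readers following along, a minimal sketch of why a generic container helps here: one class can hold per-slice tallies and per-slice transfers without duplicating the nested-map plumbing. This is an assumption-laden illustration, not the PR's `BreakdownByField`. The `Supplier` constructor argument, the `Slice` enum, and the payload types are all hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Demo entry point: the same container type holds two unrelated payloads.
class BreakdownDemo {
  public static void main(String[] args) {
    // Payload 1: round -> vote count.
    Breakdown<Map<Integer, Integer>> tallies = new Breakdown<>(HashMap::new);
    tallies.get(Slice.BATCH, "7").merge(1, 10, Integer::sum);

    // Payload 2: a list of transfer descriptions.
    Breakdown<List<String>> transfers = new Breakdown<>(ArrayList::new);
    transfers.get(Slice.BATCH, "7").add("A -> B: 3");

    System.out.println(tallies.get(Slice.BATCH, "7")); // prints {1=10}
    System.out.println(transfers.get(Slice.BATCH, "7")); // prints [A -> B: 3]
  }
}

// Hypothetical stand-in for the field-type enum.
enum Slice {
  BATCH,
  PRECINCT
}

// Hypothetical generic container: slice type -> slice ID -> payload of type T.
class Breakdown<T> {
  private final Map<Slice, Map<String, T>> bySlice = new EnumMap<>(Slice.class);
  private final Supplier<T> factory;

  // The factory creates an empty payload the first time a slice is seen.
  Breakdown(Supplier<T> factory) {
    this.factory = factory;
  }

  // Returns the payload for the slice, creating it on first access.
  T get(Slice slice, String sliceId) {
    return bySlice
        .computeIfAbsent(slice, s -> new HashMap<>())
        .computeIfAbsent(sliceId, id -> factory.get());
  }
}
```

The trade-off is a slightly more abstract type signature in exchange for a single place where the type-then-ID lookup is implemented.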

artoonie · 2024-05-26T18:52:34Z

@tarheel all comments are addressed!

WIP

ab81e47

artoonie added the WIP label Mar 29, 2024

complete initial implementation; stub of ES&S Batch ID implementation

d97838b

artoonie force-pushed the feature/issue-807_tabulate-by branch from 1fd1860 to d97838b Compare March 29, 2024 23:39

Armin Samii added 4 commits March 30, 2024 13:43

lint

67915cf

split change into its own PR

bd027bd

complete ES&S batch support

3ffe6df

add safety checks

2faa642

artoonie requested a review from yezr April 13, 2024 22:07

artoonie marked this pull request as ready for review April 13, 2024 22:07

artoonie removed the WIP label Apr 13, 2024

artoonie added the WIP label Apr 15, 2024

artoonie removed the WIP label Apr 30, 2024

artoonie changed the title ~~WIP: Tabulate by Batch, and readiness for Tabulate By X~~ Tabulate by Batch, and readiness for Tabulate By X Apr 30, 2024

Armin Samii added 2 commits April 30, 2024 19:58

Merge remote-tracking branch 'origin/develop' into feature/issue-807_…

66d5da7

…tabulate-by

fix argument order

982fe84

tarheel reviewed May 19, 2024

Armin Samii added 2 commits May 26, 2024 14:11

s/field/slice

186df22

address additional comments

6a014a5
