Naming rules for join keys #78

smmaurer · 2018-12-14T00:50:12Z

It occurred to me that if we follow some simple naming rules for join keys, we can substantially improve usability and data validation. cc @janowicz @mxndrwgrdnr

This idea is related to issue #67 in that both are about column names, but the two are largely independent.

Rules

1. Each table has a primary key/index of one or more columns (already true).
2. Foreign keys have the same name as the primary key they're associated with (already true 95% of the time).
3. Columns cannot have the same name as another table's primary key unless they're meant to be associated with it (hopefully already true).
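
To illustrate how join relationships could be read off the column names under these rules, here is a minimal sketch in plain Python. The table names, columns, and the `infer_joins` helper are all hypothetical and not part of Orca or the templates.

```python
# Hypothetical schema: table name -> primary key columns and all column names.
SCHEMA = {
    "parcels": {"key": ["parcel_id"], "columns": ["parcel_id", "zone_id", "acres"]},
    "buildings": {"key": ["building_id"], "columns": ["building_id", "parcel_id", "sqft"]},
    "households": {"key": ["household_id"], "columns": ["household_id", "building_id", "income"]},
    "zones": {"key": ["zone_id"], "columns": ["zone_id", "name"]},
}


def infer_joins(schema):
    """Return sorted (child, parent, key columns) tuples implied by column names.

    A table is treated as a child of another table when it contains every
    column of that table's primary key (rule 2). Tables that share the same
    primary key are skipped here; that case needs separate handling.
    """
    joins = []
    for child, child_spec in schema.items():
        for parent, parent_spec in schema.items():
            if child == parent or child_spec["key"] == parent_spec["key"]:
                continue
            key = parent_spec["key"]
            if all(col in child_spec["columns"] for col in key):
                joins.append((child, parent, tuple(key)))
    return sorted(joins)


JOINS = infer_joins(SCHEMA)
```

With this schema, the sketch finds three relationships: buildings to parcels, households to buildings, and parcels to zones. No broadcast definitions are needed.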

Advantages

If we follow these rules, we don't need "broadcasts". Join relationships are known in advance from the column names. This is easier for users and avoids bugs associated with bad broadcast definitions.

It also allows us to validate table relationships at any time. I've been reluctant to validate broadcasts this way, because sometimes they're provided in advance but not meant to be used until later in a simulation when source tables are present.
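
As a rough sketch of what validating at any time could look like: check only the tables that are currently loaded, and skip any relationship whose parent table isn't present yet. The `check_foreign_keys` helper and the row-dict table representation are assumptions for illustration, limited to single-column keys.

```python
def check_foreign_keys(tables, keys):
    """Report foreign key values that are missing from the parent table.

    tables: name -> list of row dicts, for the tables loaded so far.
    keys:   name -> primary key column name, for all known tables.
    Returns a list of (child, parent, sorted missing values).
    """
    problems = []
    for parent, key in keys.items():
        if parent not in tables:
            continue  # parent not loaded yet, so nothing to validate against
        valid = {row[key] for row in tables[parent]}
        for child, rows in tables.items():
            if child == parent or keys.get(child) == key:
                continue  # skip self and tables sharing the same primary key
            if not rows or key not in rows[0]:
                continue  # this table has no foreign key to the parent
            missing = sorted({row[key] for row in rows} - valid)
            if missing:
                problems.append((child, parent, missing))
    return problems


# Hypothetical data: building 11 points at a parcel that doesn't exist.
example_tables = {
    "parcels": [{"parcel_id": 1}, {"parcel_id": 2}],
    "buildings": [
        {"building_id": 10, "parcel_id": 1},
        {"building_id": 11, "parcel_id": 3},
    ],
}
example_keys = {
    "parcels": "parcel_id",
    "buildings": "building_id",
    "households": "household_id",  # known, but not loaded yet
}
problems = check_foreign_keys(example_tables, example_keys)
```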

Tricky cases

This should work fine for multi-column keys, which is a nice bonus because Orca broadcasts don't support them. (ChoiceModels implements interaction term merges this way.)
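
For what it's worth, a name-based merge extends to multi-column keys with no extra machinery. Below is a minimal sketch using plain row dicts. The `merge_on_key` helper and the trips/skims tables are made up for illustration, and this is not how ChoiceModels or Orca implement it.

```python
def merge_on_key(child_rows, parent_rows, key):
    """Left-join parent columns onto child rows using one or more key columns.

    Child values win if a non-key column name appears in both tables.
    Child rows without a match keep only their own columns.
    """
    lookup = {tuple(row[c] for c in key): row for row in parent_rows}
    merged = []
    for row in child_rows:
        parent = lookup.get(tuple(row[c] for c in key), {})
        merged.append({**parent, **row})
    return merged


# Hypothetical tables: skims has a two-column primary key.
trips = [
    {"trip_id": 1, "origin_id": "A", "destination_id": "B"},
    {"trip_id": 2, "origin_id": "B", "destination_id": "C"},
]
skims = [{"origin_id": "A", "destination_id": "B", "travel_time": 12}]

merged_trips = merge_on_key(trips, skims, ["origin_id", "destination_id"])
```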

Sometimes two tables share the same primary key, with one holding a subset of the other's IDs (e.g. a master list of nodes and a smaller list representing a transit network). I don't see any problems supporting this as long as we're expecting it.
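
One way to expect this case, sketched under assumptions: group tables by primary key name, then check that the smaller table's IDs all appear in the master table. Both helpers and the node tables are hypothetical.

```python
def tables_sharing_keys(keys):
    """Group table names by primary key, keeping keys used by several tables.

    keys: table name -> list of primary key column names.
    """
    groups = {}
    for table, key in keys.items():
        groups.setdefault(tuple(key), []).append(table)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def extra_ids(subset_ids, master_ids):
    """Return IDs in the subset table that are absent from the master table."""
    return sorted(set(subset_ids) - set(master_ids))


shared = tables_sharing_keys({
    "nodes": ["node_id"],
    "transit_nodes": ["node_id"],
    "parcels": ["parcel_id"],
})
unexpected = extra_ids(subset_ids=[2, 3, 9], master_ids=[1, 2, 3, 4])
```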

I only see one place in the current cloud platform data spec that violates these rules: building parcel_id maps to parcel primary_id.

Implementation

It would be helpful to implement support for auto-specified merges at the same time as the data loading (issue #66). Two possible approaches:

a. Templates automatically generate Orca broadcasts? I suspect this would be tricky, because Orca doesn't allow over-determined broadcasts. (If a is linked to b and c, and b is also linked to c, you can't orca-merge the three of them. Not sure if this is a bug or intentional.)

b. Templates first try an Orca merge, and if the broadcasts aren't there they fall back to their own merge logic. Once that's working smoothly we can add it to Orca.
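
To make the fallback idea in (b) a bit more concrete, here is a minimal sketch of name-based merge logic that merges each related table at most once, so the a/b/c case from (a) doesn't cause trouble. The `merge_all` helper, the row-dict tables, and the example data are all assumptions. It does not call Orca or reflect how Orca resolves merges.

```python
def merge_all(tables, keys, target):
    """Merge every reachable table onto `target`, matching on key column names.

    tables: name -> list of row dicts.
    keys:   name -> tuple of primary key column names.
    Each table is merged at most once, even if several paths lead to it.
    Existing values win over incoming ones when column names collide.
    Only the first merged row is inspected when looking for key columns.
    """
    merged = [dict(row) for row in tables[target]]
    done = {target}
    progress = True
    while progress:
        progress = False
        for name, key in keys.items():
            if name in done or name not in tables:
                continue
            if merged and all(col in merged[0] for col in key):
                lookup = {tuple(r[c] for c in key): r for r in tables[name]}
                merged = [
                    {**lookup.get(tuple(r[c] for c in key), {}), **r}
                    for r in merged
                ]
                done.add(name)
                progress = True
    return merged


# Hypothetical over-determined case: buildings link to parcels and zones,
# and parcels also link to zones.
demo_tables = {
    "buildings": [{"building_id": 1, "parcel_id": 10, "zone_id": 100}],
    "parcels": [{"parcel_id": 10, "zone_id": 100, "acres": 2.5}],
    "zones": [{"zone_id": 100, "zone_name": "Downtown"}],
}
demo_keys = {
    "buildings": ("building_id",),
    "parcels": ("parcel_id",),
    "zones": ("zone_id",),
}
demo_merged = merge_all(demo_tables, demo_keys, "buildings")
```

A real implementation would also need to decide what to do when the two paths disagree, e.g. when a building's zone_id differs from its parcel's zone_id. This sketch simply keeps the value already on the row.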

Diagram

This was referenced Feb 14, 2019:
Spec for loading tables #66 (Closed)
[0.2.dev0] Template for loading data #93 (Merged)
Templates for data i/o #94 (Open)

This was referenced Feb 27, 2019:
Templates for derived variables #98 (Open)
Utilities for merging tables #100 (Closed)

smmaurer mentioned this issue Mar 6, 2019:
[0.2.dev4] Utilities for merging tables #102 (Merged, 8 tasks)
