Naming rules for join keys #78
It occurred to me that if we follow some simple naming rules for join keys, we can substantially improve usability and data validation. cc @janowicz @mxndrwgrdnr
This idea is related to issue #67 in that it's also about column names, but they're pretty separate.
Rules
Advantages
If we follow these rules, we don't need "broadcasts". Join relationships are known in advance from the column names. This is easier for users and avoids bugs associated with bad broadcast definitions.
It also allows us to validate table relationships at any time. I've been reluctant to validate broadcasts this way, because sometimes they're provided in advance but not meant to be used until later in a simulation when source tables are present.
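To make the idea concrete, here is a rough sketch of inferring and validating join relationships from column names alone. It assumes the naming rule is "a foreign-key column has the same name as the primary key of the table it points to", which is my reading of the proposal rather than a confirmed spec. The `Table` class and the `infer_joins` and `missing_keys` helpers are hypothetical names made up for illustration; they are not part of Orca or the templates.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical stand-in for a registered table (not an Orca class)."""
    name: str
    key: tuple      # names of the primary-key columns, e.g. ("parcel_id",)
    columns: dict   # column name -> list of values, key columns included


def infer_joins(tables):
    """Return (child, parent, key) triples implied by matching column names.

    Assumed rule: a table joins to `parent` when it contains every column of
    the parent's primary key and does not use that same key as its own.
    """
    joins = []
    for parent in tables:
        for child in tables:
            if child is parent or child.key == parent.key:
                continue
            if all(k in child.columns for k in parent.key):
                joins.append((child.name, parent.name, parent.key))
    return joins


def missing_keys(child, parent):
    """Key values used by `child` that have no matching row in `parent`."""
    parent_keys = set(zip(*(parent.columns[k] for k in parent.key)))
    child_keys = set(zip(*(child.columns[k] for k in parent.key)))
    return child_keys - parent_keys


parcels = Table("parcels", ("parcel_id",),
                {"parcel_id": [1, 2, 3], "zone_id": [10, 10, 20]})
buildings = Table("buildings", ("building_id",),
                  {"building_id": [100, 101, 102], "parcel_id": [1, 3, 4]})

joins = infer_joins([parcels, buildings])
# → [('buildings', 'parcels', ('parcel_id',))]

orphans = missing_keys(buildings, parcels)
# → {(4,)}  (a building points at a parcel that doesn't exist)
```

Because keys are handled as tuples of column names, the same check would apply to multi-column keys. Validation like `missing_keys` can run whenever both tables are present, without anything being declared in advance.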
Tricky cases
Should work fine for multi-column keys, which is a nice bonus because Orca broadcasts don't support them. (ChoiceModels implements interaction term merges this way.)
Sometimes two tables share the same primary key, with one holding a subset of the ids (e.g. a master list of nodes and a smaller list representing a transit network). I don't see any problems supporting this as long as we're expecting it.
I only see one place in the current cloud platform data spec that violates these rules: building parcel_id maps to parcel primary_id.
Implementation
It would be helpful to implement support for auto-specified merges at the same time as the data loading (issue #66). Two possible approaches:
a. Templates automatically generate Orca broadcasts? I suspect this would be tricky, because Orca doesn't allow over-determined broadcasts. (If a is linked to b and c, and b is also linked to c, you can't orca-merge the three of them. Not sure if this is a bug or intentional.)
b. Templates first try an Orca merge, and if the broadcasts aren't there they fall back to their own merge logic. Once it's working smoothly we can add it to Orca.
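For option (b), the fallback merge logic might look something like the sketch below. It uses plain pandas and assumes each table's index is named after its primary key and that foreign-key columns reuse that name. `auto_merge` is a hypothetical helper, not an existing Orca or templates function, and the "try Orca merge first" step is left out. When a column appears in more than one table, this version keeps the copy that was merged first, which is just one possible choice.

```python
import pandas as pd


def auto_merge(target, others):
    """Left-join `others` onto `target` using index names as join keys.

    A table is joined as soon as all of its index names appear as columns in
    the running result, so chains (a -> b -> c) resolve over several passes.
    """
    merged = target.copy()
    pending = list(others)
    while pending:
        remaining = []
        for df in pending:
            keys = list(df.index.names)
            if all(k in merged.columns for k in keys):
                # Skip columns we already have, so over-determined links
                # (a -> b, a -> c, b -> c) don't produce duplicates.
                new_cols = [c for c in df.columns if c not in merged.columns]
                merged = merged.join(df[new_cols], on=keys)
            else:
                remaining.append(df)
        if len(remaining) == len(pending):
            raise ValueError("no join key found for some tables")
        pending = remaining
    return merged


zones = pd.DataFrame({"zone_name": ["north", "south"]},
                     index=pd.Index([10, 20], name="zone_id"))
parcels = pd.DataFrame({"zone_id": [10, 20], "acres": [1.5, 2.0]},
                       index=pd.Index([1, 2], name="parcel_id"))
# buildings links to both parcels and zones, and parcels also links to zones
buildings = pd.DataFrame({"parcel_id": [1, 1, 2], "zone_id": [10, 10, 20]},
                         index=pd.Index([100, 101, 102], name="building_id"))

merged = auto_merge(buildings, [zones, parcels])
```

Here the three tables form the over-determined triangle described in (a), and the merge still goes through because each table is joined once on its own key.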
Diagram