Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graphman copy indexing improvements #5425

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Conversation

zorancv
Copy link
Contributor

@zorancv zorancv commented May 17, 2024

Addresses the #5140 issue. The tests are still pending.

@zorancv zorancv marked this pull request as draft May 17, 2024 12:34
@zorancv zorancv changed the base branch from master to zoran/graphman-copy-index May 17, 2024 12:36
@zorancv zorancv force-pushed the zoran/graphman-copy-same branch 2 times, most recently from 4450622 to a656ea2 Compare May 17, 2024 14:11
@zorancv zorancv force-pushed the zoran/graphman-copy-index branch from 6287b01 to 1c4f291 Compare May 17, 2024 14:13
@zorancv zorancv changed the base branch from zoran/graphman-copy-index to zoran/graphman-copy-init May 17, 2024 14:14
@zorancv zorancv changed the base branch from zoran/graphman-copy-init to zoran/graphman-copy-index May 17, 2024 14:14
@zorancv zorancv changed the base branch from zoran/graphman-copy-index to master May 17, 2024 14:18
@zorancv zorancv changed the base branch from master to zoran/graphman-copy-index May 17, 2024 14:19
@zorancv zorancv changed the base branch from zoran/graphman-copy-index to tiago/receipts-in-mappings May 17, 2024 14:20
@zorancv zorancv changed the base branch from tiago/receipts-in-mappings to zoran/graphman-copy-index May 17, 2024 14:20
@zorancv zorancv changed the base branch from zoran/graphman-copy-index to zoran/graphman-copy-init May 17, 2024 14:20
@zorancv zorancv changed the base branch from zoran/graphman-copy-init to master May 17, 2024 14:21
@zorancv zorancv force-pushed the zoran/graphman-copy-same branch 4 times, most recently from cbd2732 to 2204c6e Compare May 22, 2024 16:26
@zorancv zorancv changed the title Grahman copy indexing improvements Graphman copy indexing improvements May 23, 2024
@zorancv zorancv force-pushed the zoran/graphman-copy-same branch 10 times, most recently from 844d9e2 to 35dbb91 Compare May 29, 2024 09:38
@zorancv zorancv changed the base branch from master to zoran/graphman-copy-only May 29, 2024 11:35
@zorancv zorancv changed the base branch from zoran/graphman-copy-only to master May 29, 2024 13:11
@zorancv zorancv force-pushed the zoran/graphman-copy-same branch 6 times, most recently from 4df57e5 to e4b8b6c Compare May 30, 2024 14:38
@zorancv zorancv force-pushed the zoran/graphman-copy-same branch 4 times, most recently from 4881b62 to 668b7c4 Compare May 31, 2024 13:54
@zorancv zorancv marked this pull request as ready for review May 31, 2024 15:01
@@ -155,31 +155,31 @@ fn test_manual_index_creation_ddl() {
#[test]
fn generate_ddl() {
let layout = test_layout(THING_GQL);
let sql = layout.as_ddl().expect("Failed to generate DDL");
let sql = layout.as_ddl(false).expect("Failed to generate DDL");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for one of these, there should also be a test that passes true to demonstrate which indexes are created in that case.

@@ -180,6 +180,7 @@ impl DeploymentStore {
graft_base: Option<Arc<Layout>>,
replace: bool,
on_sync: OnSync,
is_copy_op: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be clearer to use a separate enum here rather than passing a boolean flag, maybe something like enum IndexMode { Immediate, Delay }. That enum could then also encapsulate some of the logic around when to create an index and when not, like mode.should_create(column, index_method) -> bool

writeln!(out)
}

fn colums_to_index(&self) -> impl Iterator<Item = &Column> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in columns_to_index

// the copy/graft operations.
// First recreate the indexes that existed in the original subgraph.
let conn = self.conn.deref_mut();
let namespace = self.dst.site.namespace.as_str().to_string();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very minor, but why not pass the Namespace around and only convert that to astring when it's absolutely needed?

@@ -828,7 +871,7 @@ impl Connection {
/// lower(v1.block_range) => v2.vid > v1.vid` and we can therefore stop
/// the copying of each table as soon as we hit `max_vid = max { v.vid |
/// lower(v.block_range) <= target_block.number }`.
pub fn copy_data(&mut self) -> Result<Status, StoreError> {
pub fn copy_data(&mut self, index_list: IndexList) -> Result<Status, StoreError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a comment here that says that index_list is the list of indexes that currently exist on the source

}
}
false
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just columns.iter().any(|c| c == column_name)? I am all for leaving this a helper for readability, though

false
}

fn some_column_contained(expresion: &String, columns: &Vec<String>) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in expression

.columns
.iter()
.map(|i| i.name.to_string())
.collect::<Vec<String>>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very minor here, and can stay like this, since this code isn't run very often, but in general it is better to avoid allocations like that; if this code was called more often, the helpers above should take a Table and iterate over its columns without allocating

.collect::<Vec<String>>();

match self {
CreateIndex::Unknown { defn: _ } => (),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct that we return true for an index that we couldn't parse? It would be clearer if this arm had a return in it.

dst.as_ddl(schema, catalog, &mut query)?;
// In case of pruning we don't do delayed creation of indexes,
// as the asumption is that there is not that much data inserted.
dst.as_ddl(schema, catalog, None, &mut query)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants