Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYSTEMDS-3405] Write matrices and frames at site for federated write #1665

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

kev-inn
Copy link
Contributor

@kev-inn kev-inn commented Jul 15, 2022

Perform federated write at site of workers and locally write a MTD file containing the addresses. Frames work too, but testcases are still missing.

@Baunsgaard
Copy link
Contributor

yes please !

@kev-inn kev-inn changed the title [WIP] Write matrices and frames at site for federated write [SYSTEMDS-3405] Write matrices and frames at site for federated write Jul 17, 2022
Comment on lines +108 to +111
public void setFilepath(String filepath) {
_filepath = filepath;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to keep FederatedData constant and create a copy with a different filepath instead. I am open for discussion.

@kev-inn kev-inn marked this pull request as ready for review July 17, 2022 18:22
@kev-inn kev-inn marked this pull request as draft July 30, 2022 14:09
Instead of having a separate format `format=federated`, we write federated if the object is federated. We also do not create a JSON file with the federated addresses and ranges, instead we add this information to the MTD file. Note that we now also aligned the specification of addresses and ranges with the usage in our `federated()` function, such that the syntax is similar.
@kev-inn kev-inn force-pushed the fed_write_local_and_at_site branch from c1a130f to 4d1cb50 Compare July 31, 2022 14:51
@kev-inn kev-inn force-pushed the fed_write_local_and_at_site branch from ef6ca80 to bca9bd4 Compare July 31, 2022 18:33
@kev-inn
Copy link
Contributor Author

kev-inn commented Jul 31, 2022

There is a minor problem I don't know how to best fix. I need to select a path at the sites for the workers to write their partition, I am currently choosing to create a (most likely) unique filename and write into the LOCAL_TEMP_DIR (defined by the configuration). This usually works, but the federated python testcases have it set to /tmp/systemds which is apparently not available for our git runners. @Baunsgaard maybe we could change the configuration for the tests?

The problems mostly impact our testcases, but I also don't really have a favorite from a user side.
Other choices for where to write and downsides:

  • current working directory (cwd): the cwd of our testcases is the root, therefore we would clutter it full of partitions
  • the scratch space: will be deleted when the worker is killed and in our case, because we just start the new workers from the same JVM, also when the testcase finishes. Therefore, we can't check the results.
  • A new configuration directory: pretty clean solution IMO, but would only be used by workers and I don't want to add it, except if we agree it is necessary

The other aspects are finished and this PR is ready for review.

@kev-inn kev-inn marked this pull request as ready for review July 31, 2022 18:41
@kev-inn
Copy link
Contributor Author

kev-inn commented Jul 31, 2022

There is a minor problem I don't know how to best fix. I need to select a path at the sites for the workers to write their partition, I am currently choosing to create a (most likely) unique filename and write into the LOCAL_TEMP_DIR (defined by the configuration). This usually works, but the federated python testcases have it set to /tmp/systemds which is apparently not available for our git runners. @Baunsgaard maybe we could change the configuration for the tests?

The problems mostly impact our testcases, but I also don't really have a favorite from a user side.
Other choices for where to write and downsides:

  • current working directory (cwd): the cwd of our testcases is the root, therefore we would clutter it full of partitions
  • the scratch space: will be deleted when the worker is killed and in our case, because we just start the new workers from the same JVM, also when the testcase finishes. Therefore, we can't check the results.
  • A new configuration directory: pretty clean solution IMO, but would only be used by workers and I don't want to add it, except if we agree it is necessary

The other aspects are finished and this PR is ready for review.

@Baunsgaard
Copy link
Contributor

There is a minor problem I don't know how to best fix. I need to select a path at the sites for the workers to write their partition, I am currently choosing to create a (most likely) unique filename and write into the LOCAL_TEMP_DIR (defined by the configuration). This usually works, but the federated python testcases have it set to /tmp/systemds which is apparently not available for our git runners. @Baunsgaard maybe we could change the configuration for the tests?

The problems mostly impact our testcases, but I also don't really have a favorite from a user side. Other choices for where to write and downsides:

* current working directory (cwd): the cwd of our testcases is the root, therefore we would clutter it full of partitions

* the scratch space: will be deleted when the worker is killed and in our case, because we just start the new workers from the same JVM, also when the testcase finishes. Therefore, we can't check the results.

* A new configuration directory: pretty clean solution IMO, but would only be used by workers and I don't want to add it, except if we agree it is necessary

The other aspects are finished and this PR is ready for review.

hmm, good question what the best option is.

I find the last option with a new configuration tempting.
It could be made such that the federated workers use a configured path to make a directory, furthermore we can use their process ID to make a unique sub directory, and maintain a static counter inside to guarantee incrementing folder/file names, while appending the user specified file name in the end.

@Baunsgaard
Copy link
Contributor

This PR also change the write(M) (if M is federated) to not collect the matrix and write it locally.
Is it because it is always expected to be written to federated site if the output is federated?

@j143
Copy link
Contributor

j143 commented Jan 26, 2023

Hi @kev-inn , this PR seems to be ready.

about your question:

There is a minor problem I don't know how to best fix. I need to select a path at the sites for the workers to write their partition, I am currently choosing to create a (most likely) unique filename and write into the LOCAL_TEMP_DIR (defined by the configuration). This usually works, but the federated python testcases have it set to /tmp/systemds which is apparently not available for our git runners. @Baunsgaard maybe we could change the configuration for the tests?

I guess @Baunsgaard is in favor of your suggestion:

A new configuration directory: pretty clean solution IMO, but would only be used by workers and I don't want to add it, except if we agree it is necessary

shall we wrap this up?

Regards

@kev-inn
Copy link
Contributor Author

kev-inn commented Jan 27, 2023

I will need some time to refresh my memory and get it in a merge ready state next month.

@j143
Copy link
Contributor

j143 commented Jan 28, 2023

I will need some time to refresh my memory and get it in a merge ready state next month.

sure thing, @kev-inn - thanks

@j143 j143 added this to the next-release milestone Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
3 participants