
Pipeline execution on CONP datasets #63

Open
glatard opened this issue Jun 13, 2019 · 13 comments
@glatard (Contributor) commented Jun 13, 2019

We should streamline the processing of CONP datasets with CONP pipelines, possibly by reviving https://github.com/CONP-PCNO/conp-pipeline

@paiva (Contributor) commented Jun 19, 2019

I agree with this suggestion. How would you like to proceed?

@glatard (Contributor, Author) commented Jun 28, 2019

This has two aspects:

  1. Following a discussion with @shots47s: from the portal, the frontend should be developed to launch a specific pipeline on a specific dataset through CBRAIN's REST API. The new CBRAIN GUI will soon provide widgets to facilitate this.
  2. From the command line, this is already possible using Boutiques+DataLad. I don't think we need to add anything specific there.
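The command-line route mentioned in point 2 boils down to two commands: fetch the dataset with DataLad, then launch the pipeline with Boutiques' `bosh` CLI. A minimal sketch of those commands, built as argument lists (the dataset URL, descriptor, and invocation file names below are hypothetical examples, not real CONP artifacts):

```python
# Sketch of the Boutiques + DataLad command-line route: install the dataset
# with DataLad, then launch the pipeline with Boutiques' `bosh` CLI.
# The dataset URL, descriptor, and invocation file below are hypothetical.

def datalad_install_cmd(dataset_url, target_dir):
    """Command that clones a dataset (and its subdatasets) with DataLad."""
    return ["datalad", "install", "-r", "-s", dataset_url, target_dir]

def bosh_launch_cmd(descriptor, invocation):
    """Command that runs a Boutiques descriptor with the given invocation."""
    return ["bosh", "exec", "launch", descriptor, invocation]

install = datalad_install_cmd("https://github.com/CONP-PCNO/example-dataset",
                              "example-dataset")
launch = bosh_launch_cmd("descriptor.json", "invocation.json")
print(" ".join(install))
print(" ".join(launch))
```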

@shots47s commented Jul 3, 2019

I think we should do the CBRAIN execution in two stages:

  1. As a first pass, redirect to the CBRAIN portal (i.e., send CBRAIN the information about which dataset and pipeline to execute) and let users run it from there.
  2. Then move to providing modular UI components from our new interface and run the jobs through the CBRAIN API. It may not even be necessary to code an explicit connection to the API, because the React components will have it baked in.

@cmadjar (Collaborator) commented May 14, 2020

@glatard should this issue be closed?

@cmadjar (Collaborator) commented May 14, 2020

Actually, I will close it. Feel free to reopen it if you think there is still work to do on this issue.

@cmadjar cmadjar closed this as completed May 14, 2020
@glatard (Contributor, Author) commented May 14, 2020

On the CBRAIN front, it would be useful to have a tighter integration than just redirecting to the login page. We should check with the CBRAIN team whether point 2 in @shots47s's list above is doable.

@natacha-beck (Contributor) commented:

We should discuss the new interface in the coming weeks; I will bring point 2 to that discussion.

@cmadjar cmadjar reopened this May 14, 2020
@cmadjar cmadjar changed the title Pipeline execution on CONP datasets Pipeline CBRAIN execution on CONP datasets May 20, 2020
@cmadjar cmadjar added this to Other issues on the repo in Fall 2020 roadmap Sep 23, 2020
@cmadjar cmadjar moved this from Other issues on the repo to Deadline: end of December in Fall 2020 roadmap Sep 30, 2020
@cmadjar (Collaborator) commented Sep 30, 2020

Discussed briefly at the CONP dev call of September 30th, 2020.

We will focus on this issue and split it into smaller tasks at the next CONP dev call (October 7th).

@glatard should we invite people from the CBRAIN team to the next CONP dev call to discuss the plan? If so, who should be invited?

@glatard glatard changed the title Pipeline CBRAIN execution on CONP datasets Pipeline execution on CONP datasets Oct 7, 2020
@glatard (Contributor, Author) commented Oct 7, 2020

Here are a few possible actions regarding this issue, organized in the four Goals summarized below. All goals can be worked on in parallel, except Goal 3, which depends on Goals 1 and 2.

(Screenshot, 2020-10-07: overview diagram of the four goals)

Goal 1: Run CONP pipelines in CBRAIN

Tasks

  1. Make sure that all CONP pipelines that are available in CBRAIN appear as such in the CONP portal.
  2. When a user clicks the CBRAIN button on a CONP pipeline, redirect to the pipeline launch page instead of the generic CBRAIN login page.

How

Point 2 most likely requires storing a CBRAIN tool config id for each pipeline, preferably in a config file also available on GitHub for easier updates. This design would also solve point 1, since a pipeline can be assumed to be installed in CBRAIN if and only if it has a valid tool config id. When registering config ids, one should make sure that they match the exact same pipeline (Boutiques descriptor) as the one registered in CONP.
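The lookup logic this design implies is simple: the portal reads a JSON mapping and treats any pipeline with a valid id as installed in CBRAIN. A minimal sketch, with made-up pipeline names and ids:

```python
import json

# Hypothetical portal config mapping pipeline names to CBRAIN tool config ids;
# in the design above, this JSON would live in the portal's GitHub repo.
# The names and ids below are made-up examples.
CONFIG_JSON = """
{
  "fsl_bet": 721,
  "recon-all": 1093,
  "new-pipeline-not-yet-in-cbrain": null
}
"""

TOOL_CONFIG_IDS = json.loads(CONFIG_JSON)

def is_installed_in_cbrain(pipeline_name):
    """A pipeline counts as installed in CBRAIN iff it has a valid tool config id."""
    config_id = TOOL_CONFIG_IDS.get(pipeline_name)
    return isinstance(config_id, int) and config_id > 0

print(is_installed_in_cbrain("fsl_bet"))                         # True
print(is_installed_in_cbrain("new-pipeline-not-yet-in-cbrain"))  # False
```

Pipelines absent from the file, or mapped to `null`, simply fall back to "not installed", so the same file answers point 1 (which pipelines get the CBRAIN button) with no extra bookkeeping.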

Who

CONP developers (@cmadjar, @mandana-mazaheri), liaise with @natacha-beck to get tool config ids.

Goal 2: Process CONP datasets in CBRAIN

Tasks

  • Create a single CBRAIN data provider for the whole CONP dataset and access individual datasets through it, or create a CBRAIN data provider for each CONP dataset.
  • Store a CBRAIN data provider id for each dataset.
  • In each dataset page, add a link to redirect to the CBRAIN dataset page in the CBRAIN portal.

How

The ideal solution would be to use CBRAIN's DataLad data provider. Otherwise, install and download the datasets on a server (suggestion: Beluga, to facilitate processing) and register this location as a regular CBRAIN data provider. Make sure that simple pipelines (e.g., Diagnostics) can be run on the files. In any case, new datasets should be registered automatically (either by creating a new data provider or by adding new files to an existing one).

The CBRAIN data provider id should be stored using a mechanism similar to the one used to store CBRAIN tool config ids (see previous point). Suggestion: JSON file available in the portal config on GitHub.
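Following that suggestion, both mappings could live in one JSON file in the portal config on GitHub. A possible shape (all names and ids below are hypothetical, not real CBRAIN identifiers):

```json
{
  "cbrain_tool_config_ids": {
    "fsl_bet": 721,
    "recon-all": 1093
  },
  "cbrain_data_provider_ids": {
    "example-dataset-1": 45,
    "example-dataset-2": 46
  }
}
```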

Who

This is on the CBRAIN roadmap. We need to make sure that the CBRAIN DataLad provider works as expected. Liaise with CONP developers for DataLad expertise.

Notes

Something specific has to be done for datasets that require authentication. The CBRAIN team will manually configure permissions.

Goal 3: Process CONP datasets in CBRAIN using CONP pipelines

Tasks

  • In the CONP portal, create an interface to select a pipeline from a dataset, and/or to select a dataset from a pipeline
  • From this interface, redirect to a pre-populated CBRAIN launch form

How

Needs discussion; it might be a bit tricky, as fine-grained file selection within the dataset might be necessary.
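One possible shape for the "pre-populated launch form" redirect is to pass the stored ids as query parameters. The base URL and parameter names below are assumptions for illustration, not a documented CBRAIN API; the real parameters would have to be agreed on with the CBRAIN team:

```python
from urllib.parse import urlencode

def cbrain_launch_url(tool_config_id, data_provider_id,
                      base="https://portal.cbrain.mcgill.ca/tasks/new"):
    """Build a pre-populated CBRAIN launch URL from the stored ids.

    The query parameter names (and the launch-form path) are hypothetical.
    """
    query = urlencode({"tool_config_id": tool_config_id,
                       "data_provider_id": data_provider_id})
    return f"{base}?{query}"

print(cbrain_launch_url(721, 45))
# https://portal.cbrain.mcgill.ca/tasks/new?tool_config_id=721&data_provider_id=45
```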

Who

CONP portal developers: @liamocn, @xlecours

Goal 4: Analytics on pipeline execution

Task

  • Create a dashboard of CONP pipeline executions on CONP datasets. This dashboard would track executions done both inside and outside of CBRAIN.

How

  • Regularly upload Boutiques provenance from CBRAIN and any other execution platform.
  • Pull Boutiques provenance records and present them in graphs.
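The aggregation step could start as simply as counting pulled provenance records per pipeline. The record shape below is a guess at the relevant fields; the actual Boutiques provenance schema should be checked against Boutiques itself before building the dashboard:

```python
from collections import Counter

# Made-up provenance records: the real Boutiques records carry more fields,
# and their exact schema is an assumption here.
records = [
    {"pipeline": "fsl_bet", "dataset": "example-dataset-1", "platform": "cbrain"},
    {"pipeline": "fsl_bet", "dataset": "example-dataset-1", "platform": "beluga"},
    {"pipeline": "recon-all", "dataset": "example-dataset-2", "platform": "cbrain"},
]

def executions_per_pipeline(records):
    """Count executions per pipeline, e.g. to feed a bar chart in the dashboard."""
    return Counter(r["pipeline"] for r in records)

print(executions_per_pipeline(records))
```

Grouping by the `platform` field instead would give the in-CBRAIN vs. outside-CBRAIN split the dashboard is meant to track.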

Who

@mandana-mazaheri for the provenance dashboard, liaise with @nbeck for provenance upload from CBRAIN.

@cmadjar (Collaborator) commented Oct 7, 2020

@cmadjar cmadjar added this to Deadline: end of December in Winter 2021 Dec 11, 2020
@cmadjar cmadjar moved this from Leftovers from the fall to Deadline: end of February in Winter 2021 Jan 13, 2021
@cmadjar cmadjar removed this from Deadline: end of February in Winter 2021 Jan 13, 2021
@cmadjar cmadjar added this to Deadline: end of May in Spring 2021 Apr 15, 2021
@cmadjar cmadjar moved this from Deadline: end of May to Leftover issues from winter roadmap in Spring 2021 Apr 15, 2021
@cmadjar cmadjar moved this from Leftover issues from winter roadmap to Other issues on repos in Spring 2021 Apr 15, 2021
@cmadjar cmadjar removed this from Other issues on repos in Spring 2021 Apr 15, 2021
@cmadjar cmadjar closed this as completed Apr 30, 2021
Data Portal Developments automation moved this from To do to Done Apr 30, 2021
@cmadjar (Collaborator) commented Apr 30, 2021

Oops, closed the wrong issue.

@cmadjar cmadjar reopened this Apr 30, 2021
Data Portal Developments automation moved this from Done to In progress Apr 30, 2021
@github-actions bot commented:

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

@github-actions github-actions bot added the Stale label Sep 28, 2021
@github-actions bot commented:

This issue was closed because it has been stalled for 3 months with no activity.

Data Portal Developments automation moved this from In progress to Done Dec 27, 2021
@cmadjar cmadjar reopened this Jan 4, 2022
Data Portal Developments automation moved this from Done to In progress Jan 4, 2022
@cmadjar cmadjar removed the Stale label Jan 4, 2022