This repository serves as a comprehensive resource for exploring and understanding our manuscript, Global Organelle Profiling Defines a Sub-Cellular Map of the Human Proteome. Inside, you'll find detailed Jupyter notebooks and scripts that were pivotal in our data analysis process and in generating the figures.
Our aim is to provide an in-depth, transparent view into our research methods and findings. Dive into our notebooks to see how we transformed raw data into meaningful insights, or explore our scripts to understand the technical underpinnings of our figure generation. We hope this repository will be a useful tool in your own research and learning journey.
git clone https://github.com/czbiohub-sf/Organelle_IP_analyses_and_figures.git
cd Organelle_IP_analyses_and_figures
To run the notebooks (except supplementary figure 3 panel E) use the provided conda environment:
conda env create -n OrgIP -f environment.yml
conda activate OrgIP
For a distinct run, the run timestamp has to be set first by running the 0.Set_timestamp.ipynb
(and, for figure 5, 0.Set_fig5_timestamp.ipynb
) notebook. The default setting is to look for the frozen imputed tables used to generate our figures.
If reproducing figures from frozen imputed tables, skip the enrichment
notebooks and start with Fig1
and proceed in order.
Supplementary figure 4 panel E uses a separate conda environment specification.
conda env create -n xgb2 -f notebooks/Supplementary_figures/Suppl_fig4/panel_B/environment.yml
conda activate xgb2
This directory contains various external and processed datasets required to make the figures. Note that some of these datasets are from external sources; these are found in the data/external/
subdirectory. The remaining datasets are all original datasets generated by or derived from this project. Note that some datasets, including the MaxQuant output is too large to host on GitHub. Please email the corresponding authors to obtain larget datasets. We are in the process of making large datasets available on FigShare.
This directory includes a series of Jupyter notebooks that detail the analytical processes and methodologies used in the creation of each figure. These notebooks serve as the principal guide for understanding the application of the scripts and Python modules located in the scripts/ directory, specifically focusing on their roles in analysis and figure generation.
We provide a structured tree diagram (below) representing the organization of the notebooks directory. This diagram delineates the specific notebooks responsible for generating each panel of the figures.
It is important to execute the notebooks sequentially, following the top-to-bottom order presented in the tree diagram. This ensures a smooth workflow, as many notebooks requires the output of their predecessors.
notebooks
|
├── 0.Set_timestamp.ipynb
├── enrichment
| ├── 1.QC_filter_and_impute.ipynb
| ├── 2.Batch_selection.ipynb
| ├── 3.correlation_filter.ipynb
| ├── 4.NOC_processing.ipynb
| └── output
| ├── correlation_tables
| | └── ..
| ├── enrichment_and_volcano_tables
| | └── ..
| └── preprocessing
| └── ..
├── Fig1
│ └── panel_L
│ ├── Fig1_L_heatmap.ipynb
│ └── output
| └── ..
├── Fig2
│ ├── panel_B
│ │ ├── Fig2_B_heatmap.ipynb
│ │ └── output
| | └── ..
│ ├── panel_C
│ │ ├── Fig2_C_consensus_annotation.ipynb
│ │ └── output
| | └── ..
│ └── panel_D
│ ├── Fig2_D_umap.ipynb
│ └── output
| └── ..
├── Fig3
│ ├── panels_A_B_F
│ │ ├── Fig3_A_B_F_local_k-NN_network.ipynb
│ │ └── output
| | └── ..
│ └── panels_C_D
│ ├── Fig3_C_D_cluster_connectivity_and_Jaccard_coeff.ipynb
│ └── output
| └── ..
├── Fig4
│ └── panel_D
│ └── Please_read.txt
├── Fig5
| ├── 0.Set_fig5_timestamp.ipynb
│ ├── panel_A
│ │ ├── 1.infected_enrichment
| | | ├── 1.QC_filter_and_impute.ipynb
| | | ├── 2.Batch_selection.ipynb
| | | ├── 3.correlation_filter.ipynb
| | | ├── 4.NOC_processing.ipynb
| | | └── output
| | | └── ..
│ │ ├── 2.control_enrichment
| | | ├── 1.QC_filter_and_impute.ipynb
| | | ├── 2.Batch_selection.ipynb
| | | ├── 3.correlation_filter.ipynb
| | | ├── 4.NOC_processing.ipynb
| | | └── output
| | | └── ..
│ │ └── 3.aligned_umap
│ │ ├── Fig5_A_aligned_umap.ipynb
│ │ └── output
| | └── ..
│ ├── panel_B
│ │ ├── Fig5_B_remodeling_score.ipynb
│ │ └── output
| | └── ..
│ ├── panel_C
│ │ ├── Fig5_C_umap_with_leiden_labels.ipynb
│ │ └── output
| | └── ..
│ ├── panel_D
│ │ ├── Fig5_D_trajectory.ipynb
│ │ └── output
| | └── ..
│ └── panel_E
│ ├── Fig5_E_Sankey_plot.ipynb
│ └── output
| └── ..
├── Fig6
│ └── ..
|
└── Supplementary_figures
├── Suppl_fig1
| └── ..
├── Suppl_fig2
| ├── Suppl_fig2.ipynb
│ └── output
| └── ..
├── Suppl_fig3
| ├── panel_A
| | ├── Suppl_fig3_A_IP_correlation_vs_interaction_stoi.ipynb
| | └── output
| | └── ..
| └── panel_B
| ├── Suppl_fig3_B_tenary_plots.ipynb
| └── output
| └── ..
├── Suppl_fig4
| └── panel_B
| ├── Suppl_fig4_B_XGBoost_classifier.ipynb
| └── output
| └── ..
These are Python modules that contain the bulk of the code used for data analysis and figure generation. They are used directly by the Jupyter notebooks discussed above. Please note that these scripts are explicitly written for, and specific to this project and/or the OpenCell project. They are not intended to form a stand-alone or general-purpose Python package.
This project is licensed under the BSD 3-Clause license - see the LICENSE file for details.