Modification of n_read_per_site #131

Open
DayeaPark opened this issue Aug 16, 2023 · 3 comments

DayeaPark commented Aug 16, 2023

Hi.

I am currently running m6Anet with the pre-trained model (Hct116_RNA002). Is there a way to change n_reads_per_site from 20 to 10? I changed n_reads_per_site in the model TOML file (prod_pooling.toml), but I now get the error below. Any help with modifying the read threshold would be appreciated. Thank you.

ValueError: Length of values (86428) does not match length of index (43214)

My modified TOML file looks like this:

model = "prod_sigmoid_pooling"

[[block]]
block_type = "DeaggregateNanopolish"
num_neighboring_features = 1

[[block]]
block_type = "KmerMultipleEmbedding"
input_channel = 66
output_channel = 2
num_neighboring_features = 1

[[block]]
block_type = "ConcatenateFeatures"

[[block]]
block_type = "Linear"
input_channel = 15
output_channel = 150
activation = "relu"
batch_norm = true

[[block]]
block_type = "Linear"
input_channel = 150
output_channel = 32
activation = "relu"
batch_norm = false

[[block]]
block_type = "SigmoidProdPooling"
input_channel = 32
n_reads_per_site = 10
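
For readers hitting the same message: the wording of the ValueError above matches what pandas raises when a column is assigned a different number of values than the table has rows, and 86428 is exactly twice 43214. The sketch below only reproduces that class of error in isolation. It is an assumption that m6Anet fails for the same reason, and `assign_scores`, `site`, and `probability` are hypothetical names, not m6Anet's.

```python
import pandas as pd


def assign_scores(index_size: int, n_values: int) -> str:
    """Try to attach n_values scores to a table with index_size rows.

    Returns "ok" on success, otherwise the text of the ValueError.
    """
    table = pd.DataFrame({"site": range(index_size)})
    try:
        # pandas requires one value per existing row here.
        table["probability"] = [0.5] * n_values
    except ValueError as err:
        return str(err)
    return "ok"


# Twice as many values as rows, mirroring the 86428 vs 43214 ratio.
message = assign_scores(index_size=2, n_values=4)
print(message)
```

If the model and the data preparation disagree on how many reads are kept per site, a mismatch of this kind is one plausible outcome, which fits the advice to change both settings together.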

kristinrma (Collaborator) commented

Hi @DayeaPark,
Both the min_reads variable in the training config file and the n_reads_per_site variable in the model config should be changed. Sorry for the potential confusion in the documentation. Let us know if that solves your issue.
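
A quick way to check that the two settings were changed together is to parse both files and compare them. This is a minimal sketch, assuming the two values are meant to be equal and that the keys sit where they appear in the configs posted in this thread (`n_reads_per_site` inside a `[[block]]` entry, `min_reads` under `[dataset]`). The function names are hypothetical and the TOML is inlined as strings so the example is self-contained.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# Trimmed copies of the two configs from this thread.
MODEL_TOML = """
model = "prod_sigmoid_pooling"

[[block]]
block_type = "SigmoidProdPooling"
input_channel = 32
n_reads_per_site = 10
"""

TRAIN_TOML = """
[dataset]
min_reads = 10
num_neighboring_features = 1
"""


def read_thresholds(model_text, train_text):
    """Return (list of n_reads_per_site values, min_reads)."""
    model_cfg = tomllib.loads(model_text)
    train_cfg = tomllib.loads(train_text)
    n_reads = [
        block["n_reads_per_site"]
        for block in model_cfg.get("block", [])
        if "n_reads_per_site" in block
    ]
    return n_reads, train_cfg["dataset"]["min_reads"]


def thresholds_agree(model_text, train_text):
    """True when every n_reads_per_site equals min_reads."""
    n_reads, min_reads = read_thresholds(model_text, train_text)
    return bool(n_reads) and all(n == min_reads for n in n_reads)


print(thresholds_agree(MODEL_TOML, TRAIN_TOML))
```

To check real files instead, read them with `open(path, "rb")` and `tomllib.load`.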

DayeaPark (Author) commented Sep 8, 2023

Thanks for the response, @kristinrma. I installed m6Anet in a conda environment. Could you let me know where I can find both the training config and the model config? I created a model config and passed it when running m6Anet inference, but I cannot find where to change the training config.

I guess I could retrain the model by providing a training config to m6Anet-train, but I ran into a problem at that step.
In the training config, I need to edit root_dir and norm_path. Since I am using your preset data, I used your norm_path file (norm_factors_hct116.joblib), but I cannot find the labelled file for root_dir. According to your description, root_dir must contain a data.info.labelled file and a data.json file. Could you let me know where I can find those files?

I just want to use your pre-trained model with the n_read threshold set to 1. If you know of a way to do this without retraining the model, please let me know. Thank you.

My training config looks like this:
[loss_function]
loss_function_type = "binary_cross_entropy_loss"

[dataset]
root_dir = "/path/to/m6anet/m6anet/tests/data/"
min_reads = 10
norm_path = "/path/to/m6anet/m6anet/model/norm_factors/norm_factors_hct116.joblib"
num_neighboring_features = 1

[dataloader]
[dataloader.train]
batch_size = 256
sampler = "ImbalanceOverSampler"

[dataloader.val]
batch_size = 256
shuffle = false

[dataloader.test]
batch_size = 256
shuffle = false

Thanks for your help!

kristinrma (Collaborator) commented

Hi @DayeaPark,

Glad you were able to find the sample training config and modify it. To replicate the pre-trained model with the minimum number of reads set to 10, I would suggest running m6Anet dataprep on the SGNex Hct116 Rep2 Run1 dataset, as this was the original dataset used to train m6Anet. A tutorial on how to retrieve files from the SGNex AWS S3 bucket can be found here: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/AWS_data_access_tutorial.md. After you generate the dataprep files, you can follow the m6Anet training documentation to create data.info.labelled from your data.info file, then set root_dir to your dataprep folder. Hope this helps.
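
Before starting training, it may help to confirm that root_dir actually holds the two files mentioned in this thread. The following is a minimal sketch, not part of m6Anet: `missing_training_files` is a hypothetical helper, and the temporary folder merely stands in for a dataprep output directory in which data.info.labelled has not been created yet.

```python
import tempfile
from pathlib import Path

# File names taken from this thread; training expects both in root_dir.
REQUIRED_FILES = ("data.info.labelled", "data.json")


def missing_training_files(root_dir):
    """Return the required file names that are absent from root_dir."""
    root = Path(root_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Stand-in for a dataprep folder before the labelling step has been run.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.json").write_text("")
    (Path(tmp) / "data.info").write_text("")
    print(missing_training_files(tmp))
```

An empty list means both files are present and root_dir can be used in the `[dataset]` section of the training config.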
