Modification of n_read_per_site #131

DayeaPark · 2023-08-16T16:07:11Z

Hi.

I am currently running m6Anet with the pre-trained model (Hct116_RNA002). Is there any way to change n_reads_per_site from 20 to 10? I changed n_reads_per_site in the model TOML file (prod_pooling.toml), but I get the error below. Any help with modifying the read threshold would be appreciated. Thank you.

ValueError: Length of values (86428) does not match length of index (43214)

My modified TOML file looks like this.

model = "prod_sigmoid_pooling"

[[block]]
block_type = "DeaggregateNanopolish"
num_neighboring_features = 1

[[block]]
block_type = "KmerMultipleEmbedding"
input_channel = 66
output_channel = 2
num_neighboring_features = 1

[[block]]
block_type = "ConcatenateFeatures"

[[block]]
block_type = "Linear"
input_channel = 15
output_channel = 150
activation = "relu"
batch_norm = true

[[block]]
block_type = "Linear"
input_channel = 150
output_channel = 32
activation = "relu"
batch_norm = false

[[block]]
block_type = "SigmoidProdPooling"
input_channel = 32
n_reads_per_site = 10

kristinrma · 2023-09-08T05:22:56Z

Hi @DayeaPark,
Both the min_reads variable in the training config file and the n_reads_per_site variable in the model config should be changed. Sorry for the potential confusion in the documentation. Let us know if that solves your issue.

DayeaPark · 2023-09-08T12:48:10Z

Thanks for the response, @kristinrma. I installed m6Anet in a conda environment. Could you let me know where I can find both the training config and the model config? I created a model config and provided the file when running m6Anet inference, but I cannot find where to change the training config.

I guess I can train the model again by providing a training config to m6Anet-train. However, I run into a problem at that step.
In the training config, I need to edit root_dir and norm_path. Since I am using your preset data, I used your norm_path data (norm_factors_hct116.joblib), but I cannot find the labelled file for root_dir. According to your description, I need to provide a root_dir that includes the data.info.labelled file and the data.json file. Could you let me know where I can find those files?

I just want to use your pre-trained model with the n_read threshold set to 1. If you have any idea how to solve this problem without re-training the model, please let me know. Thank you.

My training config looks like this.
[loss_function]
loss_function_type = "binary_cross_entropy_loss"

[dataset]
root_dir = "/path/to/m6anet/m6anet/tests/data/"
min_reads = 10
norm_path = "/path/to/m6anet/m6anet/model/norm_factors/norm_factors_hct116.joblib"
num_neighboring_features = 1

[dataloader]
[dataloader.train]
batch_size = 256
sampler = "ImbalanceOverSampler"

[dataloader.val]
batch_size = 256
shuffle = false

[dataloader.test]
batch_size = 256
shuffle = false

Thanks for your help!

kristinrma · 2023-09-20T06:14:25Z

Hi @DayeaPark,

Glad you were able to find the sample training config and modify it. To replicate the pre-trained model with the minimum number of reads at 10, I would suggest running m6Anet dataprep using the SGNex Hct116 Rep2 Run1 dataset as this was the original dataset used to train m6Anet. A tutorial on how to retrieve files from the SGNex AWS S3 bucket can be found here https://github.com/GoekeLab/sg-nex-data/blob/master/docs/AWS_data_access_tutorial.md. After you generate the data prep files, you can follow the m6Anet training documentation to create data.info.labelled from your data.info set; then set root_dir to your dataprep folder. Hope this helps.
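
Before starting training, it may help to confirm that the folder used as root_dir contains the two files named earlier in this thread, data.info.labelled and data.json. The sketch below is a standalone check and not part of m6Anet; the function name missing_training_files and the example path are hypothetical.

```python
from pathlib import Path

# File names taken from this thread; root_dir is expected to hold both.
REQUIRED_FILES = ("data.info.labelled", "data.json")


def missing_training_files(root_dir: str) -> list[str]:
    """Return the required file names that are not present in root_dir."""
    root = Path(root_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path; replace with your own dataprep output folder.
    missing = missing_training_files("/path/to/dataprep_output")
    if missing:
        print("Missing from root_dir:", ", ".join(missing))
    else:
        print("root_dir contains all required files.")
```

An empty list means both files are present and the folder can be used as root_dir in the training config.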
