
Inference JSONDecoderError #146

Open

sangyu opened this issue Jan 5, 2024 · 2 comments

Hello,

I'm trying to use m6anet to detect methylation in RNA002 datasets. I managed to run dataprep, which generated eventalign.index, data.log, data.json and data.info without error.

When I ran m6anet inference, however, I got a JSONDecodeError. Specifically:

(m6anet37) C:\Users\xusy>m6anet inference --input_dir F:/2023-10-11/ --out_dir F:/2023-10-11/ --n_processes 4 --num_iterations 1000
Traceback (most recent call last):
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\xusy\.conda\envs\m6anet37\Scripts\m6anet.exe\__main__.py", line 7, in <module>
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\__init__.py", line 30, in main
    args.func(args)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\scripts\inference.py", line 100, in main
    ds = NanopolishDS(input_dir[0], DEFAULT_MIN_READS, args.norm_path, mode='Inference')
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 100, in __init__
    self.set_feature_indices()
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 109, in set_feature_indices
    self.total_neighboring_features = self.get_total_neighboring_features()
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 149, in get_total_neighboring_features
    kmer, _ = self._load_data(self.data_fpath, tx_id, tx_pos, start_pos, end_pos)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 185, in _load_data
    pos_info = json.loads(json_str)[tx_id][str(tx_pos)]
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 38152)

Please help. Thanks!
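
Context on the error: "Extra data" from json.decoder means the parser found a second JSON value after the first one, i.e. two records ended up in the span that inference tried to decode. A minimal sketch to locate such lines in data.json (assuming dataprep writes one JSON record per line; the path is the one from the command above):

import json

# Print every line of data.json that does not parse as a single JSON value.
with open('F:/2023-10-11/data.json') as f:
    for lineno, line in enumerate(f, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            print(f'line {lineno}: {err}')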

@yuukiiwa (Collaborator)

Hi @sangyu,

The json file has a corrupted line. Can you try the following script, after creating a directory for the output (mkdir fixed) in the folder that contains your data.json:

import ujson

# Write repaired copies of data.json and data.info into the fixed/ directory.
# newline='\n' prevents '\r\n' translation on Windows, which could make the
# tell() byte offsets recorded in data.info drift from the file contents.
out_json = open('fixed/data.json', 'w', newline='\n')
out_info = open('fixed/data.info', 'w')
out_info.write('transcript_id,transcript_position,start,end,n_reads' + '\n')

def write_record(ln, d):
    # Append one JSON record to fixed/data.json and log its byte offsets
    # and read count to fixed/data.info.
    pos_start = out_json.tell()
    out_json.write(ln.strip() + '\n')
    pos_end = out_json.tell()
    tx_id = list(d.keys())[0]
    tx_pos = list(d[tx_id].keys())[0]
    kmer = list(d[tx_id][tx_pos].keys())[0]
    n_reads = str(len(d[tx_id][tx_pos][kmer]))
    out_info.write(','.join([tx_id, tx_pos, str(pos_start), str(pos_end), n_reads]) + '\n')

for ln in open('data.json', 'r'):
    if len(ln.split('ENST')) > 1:  # skip lines without a transcript record
        try:
            write_record(ln, ujson.loads(ln))
        except ValueError:  # ujson raises ValueError (JSONDecodeError in newer versions)
            # Corrupted line: several records fused into one. Split on the
            # 'ENST' transcript prefix and re-emit one record per line.
            lns = ln.split('ENST')
            for i in range(1, len(lns)):
                if i == 1:
                    fixed_ln = ''.join([lns[0], 'ENST', lns[1].strip('{"')])
                else:
                    fixed_ln = '{"ENST' + lns[i].strip('{"')
                write_record(fixed_ln, ujson.loads(fixed_ln))

out_json.close()
out_info.close()
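
After running this from the directory that contains data.json, fixed/ should hold a repaired data.json plus a data.info whose start/end columns are the byte offsets used to seek into the json file. As a quick sanity check before rerunning m6anet inference with --input_dir pointing at fixed/, you could re-read every record through those offsets, mirroring how _load_data slices the file in the traceback above (a sketch, not part of the fix itself):

import json

# Re-parse each record via the byte offsets recorded in fixed/data.info and
# fail loudly if any slice is still malformed.
with open('fixed/data.json', 'rb') as data, open('fixed/data.info') as info:
    next(info)  # skip the CSV header
    for row in info:
        tx_id, tx_pos, start, end, _ = row.strip().split(',')
        data.seek(int(start))
        record = json.loads(data.read(int(end) - int(start)))
        assert tx_pos in record[tx_id]
print('all records parse cleanly')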

Thanks!

Best wishes,
Yuk Kei

@sangyu (Author) commented Jan 30, 2024

Dear Yuk Kei,

Thank you for your response, and for this very nice package! I will try the steps to see if they solve the problems with my files. These files were generated on a Windows PC; maybe that was the problem in the first place.

I also worked out that when I generated the same data.json on a Mac, it had a much smaller file size, and those files then went through downstream processing normally. My MacBook, though, is too small to process some of the large nanopolish eventalign files. Do you have plans to upgrade the package to support the M2 Mac (which has no Python 3.7 or PyTorch 1.6.0)?

Sangyu
