
Inference JSONDecoderError #146

Open

sangyu opened this issue Jan 5, 2024 · 2 comments

Hello,

I'm trying to use m6anet to detect methylation in RNA002 datasets. I managed to run dataprep, which generated eventalign.index, data.log, data.json and data.info without error.

When I ran m6anet inference, however, I got a JSONDecodeError. Specifically:

(m6anet37) C:\Users\xusy>m6anet inference --input_dir F:/2023-10-11/ --out_dir F:/2023-10-11/ --n_processes 4 --num_iterations 1000
Traceback (most recent call last):
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\xusy\.conda\envs\m6anet37\Scripts\m6anet.exe\__main__.py", line 7, in <module>
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\__init__.py", line 30, in main
    args.func(args)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\scripts\inference.py", line 100, in main
    ds = NanopolishDS(input_dir[0], DEFAULT_MIN_READS, args.norm_path, mode='Inference')
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 100, in __init__
    self.set_feature_indices()
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 109, in set_feature_indices
    self.total_neighboring_features = self.get_total_neighboring_features()
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 149, in get_total_neighboring_features
    kmer, _ = self._load_data(self.data_fpath, tx_id, tx_pos, start_pos, end_pos)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\site-packages\m6anet\utils\data_utils.py", line 185, in _load_data
    pos_info = json.loads(json_str)[tx_id][str(tx_pos)]
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\xusy\.conda\envs\m6anet37\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 38152)

Please help. Thanks!
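
Context on the error: "Extra data" from json.decoder means the parser found a second JSON value after the first one, i.e. two records ended up in the span that inference tried to decode. A minimal sketch to locate such lines in data.json (assuming dataprep writes one JSON record per line; the path is the one from the command above):

import json

# Print every line of data.json that does not parse as a single JSON value.
with open('F:/2023-10-11/data.json') as f:
    for lineno, line in enumerate(f, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            print(f'line {lineno}: {err}')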

@yuukiiwa (Collaborator)

Hi @sangyu,

The json file has a corrupted line. Can you try the following script, after creating a directory for the output (mkdir fixed) in the folder that contains your data.json:

import ujson

# Write repaired copies of data.json and data.info into the fixed/ directory.
# newline='\n' prevents '\r\n' translation on Windows, which could make the
# tell() byte offsets recorded in data.info drift from the file contents.
out_json = open('fixed/data.json', 'w', newline='\n')
out_info = open('fixed/data.info', 'w')
out_info.write('transcript_id,transcript_position,start,end,n_reads' + '\n')

def write_record(ln, d):
    # Append one JSON record to fixed/data.json and log its byte offsets
    # and read count to fixed/data.info.
    pos_start = out_json.tell()
    out_json.write(ln.strip() + '\n')
    pos_end = out_json.tell()
    tx_id = list(d.keys())[0]
    tx_pos = list(d[tx_id].keys())[0]
    kmer = list(d[tx_id][tx_pos].keys())[0]
    n_reads = str(len(d[tx_id][tx_pos][kmer]))
    out_info.write(','.join([tx_id, tx_pos, str(pos_start), str(pos_end), n_reads]) + '\n')

for ln in open('data.json', 'r'):
    if len(ln.split('ENST')) > 1:  # skip lines without a transcript record
        try:
            write_record(ln, ujson.loads(ln))
        except ValueError:  # ujson raises ValueError (JSONDecodeError in newer versions)
            # Corrupted line: several records fused into one. Split on the
            # 'ENST' transcript prefix and re-emit one record per line.
            lns = ln.split('ENST')
            for i in range(1, len(lns)):
                if i == 1:
                    fixed_ln = ''.join([lns[0], 'ENST', lns[1].strip('{"')])
                else:
                    fixed_ln = '{"ENST' + lns[i].strip('{"')
                write_record(fixed_ln, ujson.loads(fixed_ln))

out_json.close()
out_info.close()
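
After running this from the directory that contains data.json, fixed/ should hold a repaired data.json plus a data.info whose start/end columns are the byte offsets used to seek into the json file. As a quick sanity check before rerunning m6anet inference with --input_dir pointing at fixed/, you could re-read every record through those offsets, mirroring how _load_data slices the file in the traceback above (a sketch, not part of the fix itself):

import json

# Re-parse each record via the byte offsets recorded in fixed/data.info and
# fail loudly if any slice is still malformed.
with open('fixed/data.json', 'rb') as data, open('fixed/data.info') as info:
    next(info)  # skip the CSV header
    for row in info:
        tx_id, tx_pos, start, end, _ = row.strip().split(',')
        data.seek(int(start))
        record = json.loads(data.read(int(end) - int(start)))
        assert tx_pos in record[tx_id]
print('all records parse cleanly')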

Thanks!

Best wishes,
Yuk Kei

@sangyu (Author) commented Jan 30, 2024

Dear Yuk Kei,

Thank you for your response, and for this very nice package! I will try the steps to see if they solve the problems with my files. These files were generated on a Windows PC; maybe that was the problem in the first place.

I also worked out that when I generated the same data.json on a Mac, it had a much smaller file size, and those files then went through downstream processing normally. My MacBook, though, is too small to process some of the large nanopolish eventalign files. Do you have plans to upgrade the package to support the M2 Mac (which has no Python 3.7 or PyTorch 1.6.0)?

Sangyu
