Error during training #2

Open
zzhbb2002 opened this issue Oct 17, 2022 · 5 comments

@zzhbb2002

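I am training on Ubuntu, and during training I get an error saying that CUDA does not support ComplexHalf computation. Is this a problem with my CUDA installation?
Error log: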
python train.py -c configs/biaobei_base.json -m biaobei_base
[INFO] {'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 200, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 4, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_filelist.txt.cleaned', 'validation_files': 'filelists/val_filelist.txt.cleaned', 'text_cleaners': ['chinese_cleaners1'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs/biaobei_base'}
[WARNING] /home/zzh/下载/vits-mandarin-biaobei-main is not a git repository, therefore hash value comparison will be ignored.
/home/zzh/.local/lib/python3.8/site-packages/torch/functional.py:572: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:659.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Traceback (most recent call last):
File "train.py", line 295, in <module>
main()
File "train.py", line 55, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/zzh/下载/vits-mandarin-biaobei-main/train.py", line 122, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/home/zzh/下载/vits-mandarin-biaobei-main/train.py", line 195, in train_and_evaluate
scaler.scale(loss_gen_all).backward()
File "/home/zzh/.local/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/zzh/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'

@AlexandaJerry
Owner

Hello. Yes, this is because CUDA cannot run that operation.

@zzhbb2002
Author

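Thank you for the reply. However, I ran ./bandwidthTest and it reports PASS.
I also ran mnistCUDNN and it reports that the test passed.
To run Visual Profiler I installed a Java environment, and every tool installed along with CUDA starts up correctly.
torch.cuda.is_available() returns True
torch 1.10.2+cu111
torchaudio 0.10.2+cu111
torchvision 0.11.3+cu111
The CUDA version is 11.1. Is this a version problem, or something else?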

@AlexandaJerry
Owner

According to the issues linked below, the cause is that the version is too new:
pytorch/pytorch#67324
jaywalnut310/vits#15
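
To illustrate the version point: this thread reports the error on torch 1.10.2+cu111, and a later comment reports torch 1.9 with CUDA 11.1 working. Below is a minimal sketch of checking a version string against that baseline. The helper names `parse_torch_version` and `is_newer_than_known_good`, the `(1, 9)` threshold, and the `1.9.0+cu111` sample string are assumptions made for illustration, not part of the repository or of PyTorch. Versions are parsed into integer tuples because a plain string comparison would rank "1.10" below "1.9".

```python
import re

# Assumption: (1, 9) comes from the "works for me" report in this thread,
# not from any official compatibility table.
KNOWN_GOOD = (1, 9)


def parse_torch_version(version: str) -> tuple:
    """Return (major, minor) from a string such as '1.10.2+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_newer_than_known_good(version: str, known_good: tuple = KNOWN_GOOD) -> bool:
    """True if the (major, minor) version is above the known-good baseline."""
    return parse_torch_version(version) > known_good


if __name__ == "__main__":
    # On a machine with PyTorch installed, pass torch.__version__ instead.
    # The second string is a hypothetical example of a 1.9 build.
    for v in ("1.10.2+cu111", "1.9.0+cu111"):
        print(v, is_newer_than_known_good(v))
```

If the check returns True for the installed version, installing the older release mentioned in the thread is the route the linked reports point to.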

@zzhbb2002
Author

Could you tell me how to run inference (prediction) with this model?

@ttkrpink

jaywalnut310/vits#15 (comment)
Works for me with torch 1.9 + CUDA 11.1.
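
If downgrading PyTorch is not an option, turning off mixed precision may be worth trying, since the logged config has 'fp16_run': True and the failing dtype is ComplexHalf. This is an assumption, not something confirmed in this thread. A minimal sketch follows, where `disable_fp16` is a hypothetical helper that operates on a config dict shaped like the one printed in the log:

```python
import copy
import json


def disable_fp16(config: dict) -> dict:
    """Return a copy of a VITS-style config with train.fp16_run set to False."""
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    patched.setdefault("train", {})["fp16_run"] = False
    return patched


if __name__ == "__main__":
    # Trimmed stand-in for configs/biaobei_base.json; only keys seen in the log.
    config = {"train": {"fp16_run": True, "batch_size": 4}}
    print(json.dumps(disable_fp16(config), indent=2))
```

Full-precision training generally uses more GPU memory and runs slower, so the batch size may need adjusting.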
