Unclear meaning of api_address in the xfyun config file, and exception handling #167

Open
freeNestor opened this issue Mar 31, 2021 · 11 comments

@freeNestor

Using a proxy

proxych autosub -i /data/docs/video-work/2021-03-30_22-27-26.mp4 -S cmn-hans-cn
Translation destination language not provided. Only performing speech recognition.

Convert source file to "/tmp/tmpqme141s5.wav" to detect audio regions.
/usr/bin/ffmpeg -hide_banner -y -i "/data/docs/video-work/2021-03-30_22-27-26.mp4" -vn -ac 1 -ar 48000 -loglevel error "/tmp/tmpqme141s5.wav"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpqme141s5.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpqme141s5.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:02:15.658667
size=12.419996 Mibyte
bit_rate=768.004000 Kbit/s
probe_score=99
TAG:encoder=Lavf58.45.100
[/FORMAT]

Conversion completed.
Use Auditok to detect speech regions.
Auditok detection completed.
"/tmp/tmpqme141s5.wav" has been deleted.

Converting speech regions to short-term fragments.
Converting: 100% |#########################################################################| Time:  0:00:00

Sending short-term fragments to Google Speech V2 API and getting result.
Speech-to-Text: 100% |#####################################################################| Time:  0:07:10
Speech language subtitles file created at "/data/docs/video-work/2021-03-30_22-27-26.cmn-hans-cn.srt".

All works done.

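Expected behavior
Generate a subtitle file from the recorded video.

Screenshot
2021-03-31_09-09

Environment (please provide the following complete information):

  • Operating system: Manjaro
  • Python version: 3.9
  • Autosub version: 0.5.7-alpha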
@BingLingGroup
Owner

BingLingGroup commented Mar 31, 2021

Is the video's volume above -10 dB? Is there significant background noise?

@freeNestor
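Author

“Is the video's volume above -10 dB? Is there significant background noise?”

Let me check; it should be above -10 dB. I recorded the screen with OBS and added a noise suppression filter to the audio.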

@freeNestor
Author

@BingLingGroup
Judging by this result, is it because the mean volume is below -10 dB?

ffmpeg -i ./2021-03-30_22-27-26.mp4 -filter_complex volumedetect -c:v copy -f null /dev/null 
ffmpeg version n4.3.2 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 10.2.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librav1e --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-shared --enable-version3
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './2021-03-30_22-27-26.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:02:15.68, start: 0.000000, bitrate: 2295 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709), 2560x1440, 2161 kb/s, 30.01 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 138 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
[Parsed_volumedetect_0 @ 0x563273d59d80] n_samples: 0
Stream mapping:
  Stream #0:1 (aac) -> volumedetect
  volumedetect -> Stream #0:0 (pcm_s16le)
  Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
Output #0, null, to '/dev/null':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
    Stream #0:0: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s (default)
    Metadata:
      encoder         : Lavc58.91.100 pcm_s16le
    Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709), 2560x1440, q=2-31, 2161 kb/s, 30.01 fps, 30 tbr, 15360 tbn, 15360 tbc (default)
    Metadata:
      handler_name    : VideoHandler
frame= 4046 fps=0.0 q=-1.0 Lsize=N/A time=00:02:15.65 bitrate=N/A speed=1.22e+03x
video:35574kB audio:12718kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_volumedetect_0 @ 0x563273d4a880] n_samples: 6511616
[Parsed_volumedetect_0 @ 0x563273d4a880] mean_volume: -16.7 dB
[Parsed_volumedetect_0 @ 0x563273d4a880] max_volume: -0.4 dB
[Parsed_volumedetect_0 @ 0x563273d4a880] histogram_0db: 308
[Parsed_volumedetect_0 @ 0x563273d4a880] histogram_1db: 343
[Parsed_volumedetect_0 @ 0x563273d4a880] histogram_2db: 438
[Parsed_volumedetect_0 @ 0x563273d4a880] histogram_3db: 440
[Parsed_volumedetect_0 @ 0x563273d4a880] histogram_4db: 398
[Parsed_volumedetect_0 @ 0x563273d4a880] histogram_5db: 586
[Parsed_volumedetect_0 @ 0x563273d4a880] histogram_6db: 14970
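
The check above can also be scripted. The following is a minimal sketch that parses the `mean_volume` and `max_volume` lines from volumedetect output and compares the mean against -10 dB. The function name and the threshold constant are hypothetical, and the thread does not say whether autosub compares the mean or the peak volume, so treat the comparison as an illustration only.

```python
import re

# Hypothetical threshold, taken from the question asked earlier in this thread.
VOLUME_THRESHOLD_DB = -10.0


def parse_volumedetect(log_text):
    """Return (mean_volume, max_volume) in dB parsed from volumedetect output."""
    values = {}
    for key in ("mean_volume", "max_volume"):
        match = re.search(key + r":\s*(-?\d+(?:\.\d+)?)\s*dB", log_text)
        if match is None:
            raise ValueError("no %s line found in the log" % key)
        values[key] = float(match.group(1))
    return values["mean_volume"], values["max_volume"]


# Three lines copied from the ffmpeg log above.
sample_log = """
[Parsed_volumedetect_0 @ 0x563273d4a880] n_samples: 6511616
[Parsed_volumedetect_0 @ 0x563273d4a880] mean_volume: -16.7 dB
[Parsed_volumedetect_0 @ 0x563273d4a880] max_volume: -0.4 dB
"""

mean_db, max_db = parse_volumedetect(sample_log)
print("mean:", mean_db, "dB, max:", max_db, "dB")
print("mean below threshold:", mean_db < VOLUME_THRESHOLD_DB)
```

For this log the mean is below -10 dB while the peak is close to 0 dB, which is why the question of which figure matters comes up.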

@freeNestor
Author

I increased the volume, but the result is the same: no subtitles were generated.

> ffmpeg -i ./untitled.wav -filter_complex volumedetect -c:v copy -f null /dev/null
ffmpeg version n4.3.2 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 10.2.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librav1e --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-shared --enable-version3
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from './untitled.wav':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:02:15.70, bitrate: 1536 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
[Parsed_volumedetect_0 @ 0x558c63af9800] n_samples: 0
Stream mapping:
  Stream #0:0 (pcm_s16le) -> volumedetect
  volumedetect -> Stream #0:0 (pcm_s16le)
Press [q] to stop, [?] for help
Output #0, null, to '/dev/null':
  Metadata:
    encoder         : Lavf58.45.100
    Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      encoder         : Lavc58.91.100 pcm_s16le
size=N/A time=00:02:15.70 bitrate=N/A speed=2.67e+03x    
video:0kB audio:25444kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_volumedetect_0 @ 0x558c63b3f600] n_samples: 13027200
[Parsed_volumedetect_0 @ 0x558c63b3f600] mean_volume: -4.9 dB
[Parsed_volumedetect_0 @ 0x558c63b3f600] max_volume: -0.0 dB
[Parsed_volumedetect_0 @ 0x558c63b3f600] histogram_0db: 3408384

> proxych autosub -i untitled.wav -S cmn-hans-cn                                   
Translation destination language not provided. Only performing speech recognition.

Convert source file to "/tmp/tmpo1y3ifcc.wav" to detect audio regions.
/usr/bin/ffmpeg -hide_banner -y -i "untitled.wav" -vn -ac 1 -ar 48000 -loglevel error "/tmp/tmpo1y3ifcc.wav"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpo1y3ifcc.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpo1y3ifcc.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:02:15.700000
size=12.423780 Mibyte
bit_rate=768.004000 Kbit/s
probe_score=99
TAG:encoder=Lavf58.45.100
[/FORMAT]

Conversion completed.
Use Auditok to detect speech regions.
Auditok detection completed.
"/tmp/tmpo1y3ifcc.wav" has been deleted.

Converting speech regions to short-term fragments.
Converting: 100% |#########################################################################| Time:  0:00:00

Sending short-term fragments to Google Speech V2 API and getting result.
Speech-to-Text: 100% |#####################################################################| Time:  0:00:20
Speech language subtitles file created at "untitled.cmn-hans-cn.srt".

All works done.

@BingLingGroup
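Owner

I just tested with exactly the same parameters, also -S cmn-hans-cn, and the subtitles were generated without any problem.
You need to check your proxy server; Google may be rejecting requests sent from it. This is quite common. For example, this project's Actions workflow used to include a piece of speech recognition code to test whether the program ran correctly, but the test server apparently could not get a response from the API, so recognition always failed, and that code was later removed.
As for this API, I'm not sure whether you have read agermanidis#111 in the original project. Because it is a free API that someone reverse-engineered, this project takes no responsibility for its availability. If Google shuts the API down one day, or the API blocks certain IPs, this project will not handle those cases.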

@BingLingGroup
Owner

Of course, you can also use autosub's built-in option that sets the proxy environment variables. That is equivalent to setting them yourself in the terminal. Relevant code: https://github.com/BingLingGroup/autosub/blob/dev/autosub/__init__.py#L47-L57
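
As a rough illustration of what setting proxy environment variables from Python can look like, here is a minimal sketch. It is not the linked autosub code: the helper name is hypothetical, the proxy address is a placeholder, and `http_proxy` / `https_proxy` are simply the conventional variable names that many HTTP clients read.

```python
import os


def set_proxy_env(http_proxy=None, https_proxy=None, environ=os.environ):
    """Set the conventional proxy variables (hypothetical helper, not autosub's code)."""
    if http_proxy:
        environ["http_proxy"] = http_proxy
    if https_proxy:
        environ["https_proxy"] = https_proxy
    return environ


# A plain dict is passed here so the example does not touch the real environment.
env = set_proxy_env(
    http_proxy="http://127.0.0.1:8080",   # placeholder address
    https_proxy="http://127.0.0.1:8080",  # placeholder address
    environ={},
)
print(env)
```

Calling the helper without the `environ` argument would modify the current process environment, which is the effect exporting the variables in the terminal has on programs started from that shell.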

@freeNestor
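Author

@BingLingGroup
I later switched to the xfyun API to test (I registered a new account, and the xfyun config looks fine), and the result is the same. It seems autosub never called the API, yet it reported no error. I don't know where the problem is. Is there anywhere I can look at logs?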

autosub -i /docs/video-work/2021-03-30_22-27-26.mp4 -sapi xfyun -sconf /docs/video-work/xfyun-config
Convert source file to "/tmp/tmpyb2u0kdm.wav" to detect audio regions.
/usr/bin/ffmpeg -hide_banner -y -i "/docs/video-work/2021-03-30_22-27-26.mp4" -vn -ac 1 -ar 48000 -loglevel error "/tmp/tmpyb2u0kdm.wav"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpyb2u0kdm.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpyb2u0kdm.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:02:15.658667
size=12.419996 Mibyte
bit_rate=768.004000 Kbit/s
probe_score=99
TAG:encoder=Lavf58.45.100
[/FORMAT]

Conversion completed.
Use Auditok to detect speech regions.
Auditok detection completed.
"/tmp/tmpyb2u0kdm.wav" has been deleted.

Converting speech regions to short-term fragments.
Converting: 100% |#########################################################################| Time:  0:00:00

Sending short-term fragments to Xun Fei Yun WebSocket API and getting result.
Speech-to-Text: 100% |#####################################################################| Time:  0:00:00
Speech language subtitles file created at "/docs/video-work/2021-03-30_22-27-26.zh_cn.srt".

All works done.

@BingLingGroup
Owner

Then it is a problem with your network. It cannot be that the API was never called. I can't reproduce this.

@BingLingGroup
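Owner

#168: this user clearly has no trouble with Xun Fei, which shows it is not a problem with the program. I also used the Xun Fei recognition code currently in the repo at least half a year ago without problems, so the code itself can be considered fine.

You could consider setting up a development environment and single-stepping through it yourself.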

@freeNestor
Author

freeNestor commented Apr 2, 2021

Solved. I looked at the api_xfyun.py source, and the problem was api_address. I assumed the api_address set in the config was the full URI, but it is only the domain name, and the code concatenates the rest. -_-!!

request_location = "/v2/iat"
result_url = "wss://" + api_address + request_location

However, none of this raised an error. I suggest the code give some kind of error message.
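
One way such an error message could look is sketched below. This is an assumption about a possible fix, not what api_xfyun.py does: the function name and the host `api.example.com` are placeholders, and only the `"/v2/iat"` path and the `"wss://"` prefix come from the snippet quoted above.

```python
REQUEST_LOCATION = "/v2/iat"  # path taken from the snippet quoted above


def build_result_url(api_address, request_location=REQUEST_LOCATION):
    """Build the WebSocket URL, rejecting values that are not a bare host name."""
    host = api_address.strip()
    if not host:
        raise ValueError("api_address is empty")
    if "://" in host or "/" in host:
        # A full URI would otherwise be concatenated into a malformed URL.
        raise ValueError(
            "api_address must be a host name only, without scheme or path: %r" % host
        )
    return "wss://" + host + request_location


print(build_result_url("api.example.com"))  # placeholder host

try:
    build_result_url("wss://api.example.com/v2/iat")
except ValueError as error:
    print("rejected:", error)
```

Failing early on a full URI would have surfaced the misconfiguration in this thread before any request was sent.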

@BingLingGroup
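Owner

All right. You didn't fill in the issue according to the template, so I had no way to locate your error.

- If you used a config file, please attach it, but be sure to remove any account-related information.

This is really a readme problem. I looked at the code again: the reason only the domain name is used, without the path after it, is mainly that authentication uses the domain name, and in fact the only thing that differs among the three request addresses is the domain name. As for error reporting, I used Xun Fei's official code directly, so where its code has no exception checks, mine has none either.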

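As for improvements, first, renaming api_address to host would be better, and I will update the readme and the code together. I also forgot to mention in the readme that if you leave api_address out, a default value is used. As I recall, I originally forgot to mention api_address at all, and when I added it later I forgot to mention the default. I believe the default works in most cases. https://github.com/BingLingGroup/autosub/blob/dev/autosub/core.py#L499-L502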
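
The rename plus default could be handled along these lines. This is a sketch under assumptions, not the code in core.py: the function name is hypothetical, `DEFAULT_HOST` is a placeholder rather than the project's real default, and accepting both keys during a transition is only one possible design.

```python
DEFAULT_HOST = "default.example.com"  # placeholder; the real default is in core.py


def resolve_host(config):
    """Prefer the proposed "host" key, then legacy "api_address", then the default."""
    for key in ("host", "api_address"):
        value = config.get(key)
        if value:
            return value
    return DEFAULT_HOST


print(resolve_host({}))                                  # falls back to the default
print(resolve_host({"api_address": "old.example.com"}))  # legacy key still honored
print(resolve_host({"host": "new.example.com", "api_address": "old.example.com"}))
```

Reading the old key as a fallback would keep existing config files working after the rename.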

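As for exceptions, you can in fact use -of full-src to see the full response returned by the API. Of course, I will consider checking for this directly in the code in the future.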
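
A check done directly in the code might look like the sketch below. The field names `code` and `message`, the convention that a nonzero code means failure, and the exception type are all assumptions made for illustration; the sample messages are invented and are not real API output.

```python
import json


class RecognitionError(Exception):
    """Raised when a response reports a failure (hypothetical exception type)."""


def check_response(raw_message):
    """Parse one JSON message and raise if its status code is nonzero."""
    payload = json.loads(raw_message)
    code = payload.get("code", 0)  # assumed field name
    if code != 0:
        raise RecognitionError(
            "API returned code %s: %s" % (code, payload.get("message", ""))
        )
    return payload


# Invented sample messages, for illustration only.
ok_message = '{"code": 0, "message": "success"}'
bad_message = '{"code": 12345, "message": "example failure"}'

print(check_response(ok_message)["message"])
try:
    check_response(bad_message)
except RecognitionError as error:
    print("recognition failed:", error)
```

Raising here, instead of silently writing an empty subtitle line, would make a failed request visible without inspecting the full response by hand.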

@BingLingGroup BingLingGroup reopened this Apr 3, 2021
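@BingLingGroup BingLingGroup changed the title "The command runs successfully, but the generated subtitle file has only timestamps and no subtitle text" to "Unclear meaning of api_address in the xfyun config file, and exception handling" Apr 3, 2021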