Output format #187

Open
famda opened this issue May 16, 2024 · 3 comments

Comments

@famda

famda commented May 16, 2024

Hey!
Awesome work on this!

Is it possible to transcribe/diarize and get JSON output as a result file?
That would be a nice feature to have.

@MahmoudAshraf97
Owner

Thanks! Yes, it's possible. There's an example in one of the branches if you want to try it. I haven't added it to the main branch because, when it comes to JSON, everyone has their own schema and a universal one won't cut it. Happy to hear your suggestions, though.

@famda
Author

famda commented May 16, 2024

I understand. I think it is just a matter of having structure in the response, something that can be deserialized.
I was also testing this, which is kind of a wrapper API around Whisper.
That API lets you choose the format you want to receive (text, JSON, ...), with the possibility of passing an argument like --output_format [json, srt, text, or whatever].
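
For illustration, a flag like that could be wired up with Python's argparse roughly as follows. This is only a sketch: the `--output_format` name comes from the message above, while the `build_parser` helper, the `OUTPUT_FORMATS` tuple, and the default value are assumptions, not part of the project's current CLI.

```python
import argparse

# Hypothetical set of formats; json, srt, and text are the ones named above.
OUTPUT_FORMATS = ("json", "srt", "text")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a hypothetical --output_format flag."""
    parser = argparse.ArgumentParser(description="Output format sketch")
    parser.add_argument(
        "--output_format",
        choices=OUTPUT_FORMATS,  # argparse rejects any value outside this tuple
        default="text",          # assumed default, not taken from the project
        help="format of the result file",
    )
    return parser
```

Calling `build_parser().parse_args(["--output_format", "json"])` would then yield a namespace whose `output_format` attribute is `"json"`.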

My idea was to have something like this (just a suggestion if it makes sense):

{
    "text": "Hi, my name is Test.",
    "speaker": "Speaker 0",
    "segments": [
        {
            "id": 0,
            "seek": 0,
            "start": 0.0,
            "end": 5.4,
            "text": "Hi, my name is Test.",
            "tokens": [ 
                  double array
            ],
            "temperature": 0.0,
            "avg_logprob": -0.19734466075897217,
            "compression_ratio": 1.7903780068728523,
            "no_speech_prob": 0.1006949171423912,
            "words": [
                {
                    "word": " Hi,",
                    "start": 0.0,
                    "end": 0.64,
                    "probability": 0.7109836935997009
                },
                {
                    "word": " my",
                    "start": 0.88,
                    "end": 1.08,
                    "probability": 0.9681467413902283
                },
                {
                    "word": " name",
                    "start": 1.08,
                    "end": 1.22,
                    "probability": 0.9989060163497925
                },
                {
                    "word": " is",
                    "start": 1.22,
                    "end": 1.38,
                    "probability": 0.9960727691650391
                },
                {
                    "word": " Test.",
                    "start": 1.38,
                    "end": 1.62,
                    "probability": 0.8055099844932556
                }
            ]
        }
    ],
    "language": "en"
}

What do you think of this?
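
To make the idea more concrete, here is a minimal sketch of how a result shaped like the JSON above could be assembled and serialized in Python. The `build_result` helper and its parameters are hypothetical names used only for illustration, and the sample data is a shortened copy of the example above; nothing here reflects the project's actual code.

```python
import json


def build_result(segments, speaker, language):
    """Assemble a dict shaped like the suggested JSON from per-segment dicts."""
    return {
        # Top-level text is the segment texts joined with single spaces.
        "text": " ".join(seg["text"].strip() for seg in segments),
        "speaker": speaker,
        "segments": segments,
        "language": language,
    }


# Sample data copied from the suggestion (only two words kept for brevity).
segments = [
    {
        "id": 0,
        "start": 0.0,
        "end": 5.4,
        "text": "Hi, my name is Test.",
        "words": [
            {"word": " Hi,", "start": 0.0, "end": 0.64, "probability": 0.7109836935997009},
            {"word": " my", "start": 0.88, "end": 1.08, "probability": 0.9681467413902283},
        ],
    }
]

result = build_result(segments, speaker="Speaker 0", language="en")
# ensure_ascii=False keeps any non-ASCII transcript text readable in the file.
serialized = json.dumps(result, indent=4, ensure_ascii=False)
```

Since the output is plain `json.dumps` text, it can be read back with `json.loads` or any other JSON deserializer.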

@MahmoudAshraf97
Owner

Sounds reasonable. I'll work on it when I have the time, or maybe open a PR if possible 😁
