.Net: Add AssemblyAI connector #5392

Open
4 of 6 tasks
Swimburger opened this issue Mar 8, 2024 · 5 comments
Labels
.NET Issue or Pull requests regarding .NET code

Comments

@Swimburger

Swimburger commented Mar 8, 2024

Motivation and Context

AssemblyAI is a speech AI company offering AI models through APIs.
Adding a connector will help users integrate AssemblyAI easily with Semantic Kernel.

Description

This issue tracks the implementation progress of the AssemblyAI connector.
Current implementation: ASSEMBLYAI BRANCH

TODO

  1. AudioToTextService

Potential additions

  • Add real-time speech-to-text

markwallace-microsoft added the .NET and triage labels Mar 8, 2024
github-actions bot changed the title from ".NET: Add AssemblyAI connector" to ".Net: Add AssemblyAI connector" Mar 8, 2024

@Swimburger
Author

I noticed that the IAudioToTextService.GetTextContentsAsync method returns multiple TextContent objects.
We have one API that returns the transcript as sentences and another that returns it as paragraphs.
Would it make sense to add an option to AssemblyAIAudioToTextExecutionSettings that controls whether the transcript is returned as a single TextContent, one TextContent per sentence, or one TextContent per paragraph?
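
A rough sketch of what such an option could look like. Everything here is an assumption for illustration: `TranscriptGranularity`, `SketchAudioToTextExecutionSettings`, and `TranscriptShaper` are hypothetical names that do not exist in Semantic Kernel or the connector, and plain strings stand in for `TextContent` so the snippet stays self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical option; not part of Semantic Kernel or the AssemblyAI connector.
public enum TranscriptGranularity
{
    Single,     // one item containing the whole transcript
    Sentence,   // one item per sentence
    Paragraph   // one item per paragraph
}

// Hypothetical stand-in for the execution settings class discussed above.
public sealed class SketchAudioToTextExecutionSettings
{
    public TranscriptGranularity Granularity { get; set; } = TranscriptGranularity.Single;
}

public static class TranscriptShaper
{
    // Chooses which already-split form of the transcript to return.
    // Strings stand in for TextContent in this sketch.
    public static IReadOnlyList<string> Shape(
        string fullText,
        IReadOnlyList<string> sentences,
        IReadOnlyList<string> paragraphs,
        SketchAudioToTextExecutionSettings settings)
    {
        return settings.Granularity switch
        {
            TranscriptGranularity.Sentence => sentences,
            TranscriptGranularity.Paragraph => paragraphs,
            _ => new[] { fullText }
        };
    }
}
```

Defaulting to `Single` would keep the current behavior for callers who never set the option.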

@Krzysztof318
Contributor

I would also add full realtime transcription to the TODO list: you send AudioContent or AudioStreamContent and get back an IAsyncEnumerable<StreamingTextContent>.

@Swimburger
Author

> I would also add full realtime transcription to the TODO list: you send AudioContent or AudioStreamContent and get back an IAsyncEnumerable<StreamingTextContent>.

I want to add realtime, but I want to finalize and release non-realtime transcription first.

Our realtime solution uses a WebSocket connection, expects raw audio bytes to be sent continuously, and responds with partial and final transcript objects. This is mostly consistent with other realtime transcription services.
I'd be happy to work with y'all in figuring out how to create a good abstraction that'll work for us and other realtime services.
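
As a starting point for that discussion, here is a minimal sketch of one possible shape for such an abstraction. It is not an existing API: `TranscriptUpdate`, `IRealtimeTranscriber`, and `FinalTextAsync` are hypothetical names, and the WebSocket transport is deliberately left out so that only the contract is shown (audio chunks in, partial and final transcripts out).

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes; none of these types exist in Semantic Kernel or the AssemblyAI SDK.
public sealed record TranscriptUpdate(string Text, bool IsFinal);

public interface IRealtimeTranscriber
{
    // Consumes raw audio chunks as they arrive and yields partial and final transcripts.
    IAsyncEnumerable<TranscriptUpdate> TranscribeAsync(
        IAsyncEnumerable<byte[]> audioChunks,
        CancellationToken cancellationToken = default);
}

public static class RealtimeTranscriberExtensions
{
    // Filters out partial results for callers who only want final transcript text.
    public static async IAsyncEnumerable<string> FinalTextAsync(
        this IAsyncEnumerable<TranscriptUpdate> updates,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var update in updates.WithCancellation(cancellationToken))
        {
            if (update.IsFinal)
            {
                yield return update.Text;
            }
        }
    }
}
```

Keeping partial and final results in one stream with an `IsFinal` flag lets a service-specific implementation map its own message types onto the same contract.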

@Swimburger
Author

Instead of using the AudioStreamContent, I'm introducing an AssemblyAI file service for users to upload their files to AssemblyAI. #5964

In the future, we can use a streaming audio content class for Streaming STT.

@Swimburger
Author

Now that we have the AssemblyAIAudioToTextService and AssemblyAIFileService in, I think we can release the initial version of this connector. What would the next steps be?

Projects
Status: Sprint: Done