Filestructure #41

oenu · 2022-10-02T10:48:05Z

oenu
Oct 2, 2022
Collaborator

I'm working on handling files from the script. I think it would be best if we used internal file paths for our data and then offered an export. This (hopefully) keeps the chance of a user going in and messing with loosely typed data (strings in VTT, for example) low.

https://github.com/electron/electron/blob/main/docs/api/app.md#appgetpathname

There is a set of standard locations that can be used. I'm thinking we use the appData/StageWhisper directory, which resolves to:

%APPDATA%/StageWhisper on Windows
$XDG_CONFIG_HOME/StageWhisper or ~/.config/StageWhisper on Linux
~/Library/Application Support/StageWhisper on macOS

.
└── store/
    ├── app_preferences.json { darkmode, language, ... }
    ├── app_config.json { version, repo, ... }
    └── data/
        ├── {uuidv4}/
        │   ├── entry.json { inQueue, title, model, etc }
        │   ├── audio/
        │   │   ├── file.mp3 | file.opus | file.wav
        │   │   └── audio.json { addedOn, language, type }
        │   └── transcriptions/
        │       └── {uuid}/
        │           ├── transcription.json { language, transcribedOn, model, ... }
        │           ├── transcript.vtt (Use VTT as source of truth and compile to txt)
        │           └── transcript.txt
        └── 1bfb7987-da1d-4a02-87a9-e841c5dd4e29/
            ├── entry.json
            ├── audio/
            │   ├── fancy_twice_sample.opus
            │   └── audio.json { addedOn: 30-12-9999, language: Korean, type: Opus }
            └── transcriptions/
                ├── 3a8b37f4-3a41-4522-b3cc-1c6e28a3ab75/
                │   ├── transcription.json { language: English, transcribedOn: 30-12-9999, model: baseEn, ... }
                │   ├── transcript.vtt
                │   └── transcript.txt
                └── 7fcf511c-f9d5-4bdc-a427-19a28f6e8ca1/
                    ├── transcription.json { language: Korean, transcribedOn: 30-12-9999, model: large, ... }
                    ├── transcript.vtt
                    └── transcript.txt
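
As a rough sketch of how the layout above could be addressed from code, the helpers below build the paths for one entry and one transcription. The names `entryPaths`, `transcriptionPaths` and `storeRoot` are hypothetical; `storeRoot` is assumed to be the `store/` folder inside whatever directory Electron's `app.getPath` returns for the app.

```typescript
import * as path from "path";

// Hypothetical shape: the files and folders that belong to a single entry.
interface EntryPaths {
  entryDir: string;
  entryJson: string;
  audioDir: string;
  audioJson: string;
  transcriptionsDir: string;
}

// Build the paths for one entry under <storeRoot>/data/<entryId>/.
function entryPaths(storeRoot: string, entryId: string): EntryPaths {
  const entryDir = path.join(storeRoot, "data", entryId);
  const audioDir = path.join(entryDir, "audio");
  return {
    entryDir,
    entryJson: path.join(entryDir, "entry.json"),
    audioDir,
    audioJson: path.join(audioDir, "audio.json"),
    transcriptionsDir: path.join(entryDir, "transcriptions"),
  };
}

// Build the paths for one transcription of an entry.
function transcriptionPaths(storeRoot: string, entryId: string, transcriptionId: string) {
  const dir = path.join(entryPaths(storeRoot, entryId).transcriptionsDir, transcriptionId);
  return {
    dir,
    transcriptionJson: path.join(dir, "transcription.json"),
    vtt: path.join(dir, "transcript.vtt"),
    txt: path.join(dir, "transcript.txt"),
  };
}
```

Keeping path construction in one place would mean a later change to the layout only touches these helpers.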

Let me know if you have any thoughts

In my view, keeping transcripts in WebVTT is our best bet for offering stable audio-sync capabilities, since copying direct text edits back across to a VTT file would be a nightmare.
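
To illustrate the "VTT as source of truth, compile to txt" idea, here is a minimal sketch of that compile step. The function name `vttToText` is made up for this example, and the parsing is deliberately simple: it drops the header, NOTE/STYLE/REGION blocks, cue identifiers and timing lines, and strips inline tags. It is not a full WebVTT parser.

```typescript
// Compile the cue text of a WebVTT document into plain text, one line per cue line.
function vttToText(vtt: string): string {
  // Normalise line endings, then split into blocks separated by blank lines.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  const output: string[] = [];

  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim() !== "");
    if (lines.length === 0) continue;

    // Skip the file header and comment/style/region blocks.
    if (/^(WEBVTT|NOTE|STYLE|REGION)\b/.test(lines[0])) continue;

    // A cue has a timing line containing "-->"; anything before it is a cue identifier.
    let timingIndex = -1;
    for (let i = 0; i < lines.length; i++) {
      if (lines[i].indexOf("-->") !== -1) {
        timingIndex = i;
        break;
      }
    }
    if (timingIndex === -1) continue; // Not a cue, ignore it.

    // Everything after the timing line is cue text; strip inline tags such as <b> or <v Name>.
    for (const text of lines.slice(timingIndex + 1)) {
      output.push(text.replace(/<[^>]+>/g, "").trim());
    }
  }
  return output.join("\n");
}
```

Because the text file is always regenerated from the VTT, edits only ever need to be applied in one place.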

When a user wants their transcript, they can export it to an output directory. This also makes more sense for queuing multiple files: a user may direct output at an external/mounted drive which, given the long transcription times, could be unplugged/unmounted before Whisper is done.

@Stage-Whisper/developers

oenu · 2022-10-02T11:09:14Z

oenu
Oct 2, 2022
Collaborator Author

Updated to reflect changes in the file structure; included parameter files for audio and transcriptions to allow for state resumption.

1 reply

oenu Oct 2, 2022
Collaborator Author

@sawhney17

oenu · 2022-10-04T18:46:51Z

oenu
Oct 4, 2022
Collaborator Author

Updated transcription/parameters.json -> transcription/transcription.json

0 replies

petersterne · 2022-10-05T16:44:23Z

petersterne
Oct 5, 2022
Maintainer

Maintaining an internal database of files introduces complexity and requires significant storage, since all the audio files are effectively copied from their original locations. So what are the advantages to this approach? Is the idea that users will be able to easily see and manage transcription entries in the app, and even potentially play back the audio or edit the transcription text inside the app?

0 replies

harrislapiroff · 2022-10-05T16:51:02Z

harrislapiroff
Oct 5, 2022
Maintainer

This might deserve some user testing/research. This is something Tabula does (and I'm betting Tabula has a similar audience) but if I'm honest, I find it more annoying than useful. It's rare that I'm trying to extract data from the same file more than once, so really I just end up going through and clearing out this list from time to time.

On the other hand, transcripts are something someone might be more likely to revisit than a CSV. In combination with other interface niceties that couldn't be reproduced in a text editor (say, being able to click on a sentence in the transcription to hear it in the audio file), I could imagine users revisiting or spending significant time with transcripts in our interface.

2 replies

oenu Oct 5, 2022
Collaborator Author

+1 on the "click the transcription, hear the audio" idea. I really like it.

petersterne Oct 5, 2022
Maintainer

As far as user expectations, it's worth noting that many of the cloud-based transcription services that journalists routinely use (e.g. Otter and Trint) work on the database model, storing copies of your audio and transcriptions so you can listen to, review and edit previous transcriptions.

oenu · 2022-10-05T16:52:40Z

oenu
Oct 5, 2022
Collaborator Author

The main reason for us to do this is to build stability into the app. We could instead just keep the path to the audio file (it would be trivial to change this), but we would then have to build in resiliency against a user moving or editing the file. My thought was that by keeping the audio file inside the application we can ensure that we have it and know where it is.

I could add a "space saver" setting that changes the import behaviour.

The functionality benefit would be playing the exact audio file back and showing transcriptions underneath, with the option to edit the transcriptions inside the app.

I guess the question is, do we want a wrapper for Whisper or a full app that can manage transcriptions? (or one then the other?)

4 replies

oenu Oct 5, 2022
Collaborator Author

As a note on this, I wasn't considering transcription audio to be that large. Aren't songs just a couple of megabytes each?

petersterne Oct 5, 2022
Maintainer

Sure, but I imagine journalists will be using the app to routinely transcribe hour-long interviews (each of which is probably 50-100MB). If someone starts using the app to transcribe all of their interviews, you'll quickly get up to multiple gigs.

oenu Oct 5, 2022
Collaborator Author

That makes sense. We can also just offer to delete the recordings when they export the transcription. My main concern is that if we rely on users keeping the audio file in the exact same place (and named the same thing), it could lead to a bad user experience, i.e. "Why can't I listen back to the file? I put it in your app and then got rid of it on my hard drive to save space."

crazy4pi314 Oct 5, 2022
Maintainer

I think in the cases where the user moved the file or it's not where the original path indicated, it is totally fine to just issue a warning that we can't find it because it has been moved or deleted. Yeah, if it was on a shared drive where I am not the only one editing, it could be hard to sort out what happened, but most of the time when I see that for, say, a video file in an editor, I'm like "doh, I moved that file, let me change the path". I would personally much rather deal with that than duplicate the gigs of recordings I have.

Another model I see for creative editing tools is to designate a directory on the computer and say "my media lives here. It may move around in there, but anything you are looking for lives here." That's a bit more flexible for user managing, and you can still display in the app things in the "library" without owning.
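
A minimal sketch of both ideas above, under stated assumptions: `locateAudio`, `storedPath` and `libraryDir` are hypothetical names, and the library lookup only checks for a file with the same name directly inside the designated folder (no recursive search). If nothing is found, the caller gets a warning string it can show to the user.

```typescript
import * as fs from "fs";
import * as path from "path";

// Result of looking for an audio file that the app only references by path.
type AudioLookup =
  | { found: true; path: string }
  | { found: false; warning: string };

// True only if the path exists and is a regular file.
function isFile(candidate: string): boolean {
  try {
    return fs.statSync(candidate).isFile();
  } catch {
    return false;
  }
}

// Try the stored path first, then (optionally) a designated media library folder.
function locateAudio(storedPath: string, libraryDir?: string): AudioLookup {
  if (isFile(storedPath)) {
    return { found: true, path: storedPath };
  }
  if (libraryDir !== undefined) {
    const candidate = path.join(libraryDir, path.basename(storedPath));
    if (isFile(candidate)) {
      return { found: true, path: candidate };
    }
  }
  return {
    found: false,
    warning: `Could not find "${storedPath}". It may have been moved or deleted.`,
  };
}
```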

sawhney17 · 2022-10-05T17:02:26Z

sawhney17
Oct 5, 2022
Collaborator

Many PKM systems involve directly importing files, but I feel that what's more important is not the files themselves but rather the transcripts. We can soft link to the files on the hard drive, I guess. Space saver as a setting is a good compromise. About the songs, I believe a more useful use case would be interviews and things like that, which can span from about 5 minutes to over an hour. I spoke to my English teacher this morning and he mentioned the immense utility something like this could have for quickly reviewing past papers and assessments, which are often at least 15 minutes. Plus, if the files are uncompressed (should we compress them first?), they can take even larger amounts of space (well over a gigabyte per hour).


5 replies

oenu Oct 5, 2022
Collaborator Author

I'm concerned about issues with symlinks.

petersterne Oct 5, 2022
Maintainer

If the issue is just making sure we know where the file is, then can we zip or compress the audio files to cut down on space and only uncompress them when we need to play them?

oenu Oct 5, 2022
Collaborator Author

Zipping wouldn't work for us, as we would be introducing significant latency to all audio interactions, and I'm not sure we would even get any size benefit. It might be possible to transcode the audio to another format such as 128 kbps, but that would then be lossy (and might affect the transcription). I really don't think that many users will have a vast library of previously transcribed clips taking up gigs on their systems (I figure any computer powerful enough to even run the Whisper model will have storage to spare).

I'll do a couple of audio compression tests using my M1 MacBook Air. If it only takes a few seconds to compress, then I'll see about implementing it.

oenu Oct 5, 2022
Collaborator Author

After running some tests on my machine (I think the Macs get a media encoder in the silicon, so this might not be realistic for Windows machines), encoding from 300MB of FLAC to 5.1MB of MP3 took 7.6 seconds. So maybe the compression thing is realistic. @petersterne Which formats do journalists commonly use?

petersterne Oct 5, 2022
Maintainer

It varies, but I think it's fair to say that print and web journalists generally use lossy files like MP3 or AAC files — whatever a voice recorder or the Voice Memos app records natively in.

Podcasters and radio journalists are more likely to use uncompressed audio or lossless compressed audio (FLAC) for obvious reasons.

But MP3 files can still take up a lot of space. My "Recordings" folder filled with MP3 files of interviews is now over 6 gigs.

oenu · 2022-10-05T17:05:39Z

oenu
Oct 5, 2022
Collaborator Author

Another advantage to maintaining files in our directory (see #25 (comment)) is being able to use our directory as an RSS output, meaning we could have jobs added automatically, with the option to run them through transcription.

2 replies

crazy4pi314 Oct 5, 2022
Maintainer

I think it's an interesting model/workflow (esp. RSS feed automation) for content creator/consumer users to have some database features. If there is more interaction with the content in the app itself (reviewing audio/video with subtitles added, or something), then I think it makes sense to manage files internally so the user doesn't accidentally change processed files. I just don't know how existing products handle this (Otter?) or what the scope of our current roadmap is for the near/long term.

petersterne Oct 5, 2022
Maintainer

I think the cloud-based transcription services just let you upload files and then save copies in the cloud.

sawhney17 · 2022-10-05T17:44:43Z

sawhney17
Oct 5, 2022
Collaborator

Looking over this again, I think we can actually work around it. The main issue is storage, and I think that with the proper UI, and as long as we plan for those interactions well, it's not that big of a deal. We just need to give the user the ability to monitor the file size of the database, so to speak, and to see which recordings are taking up the most space. I think Telegram is a useful app whose UI we can model ours on, with a few affordances for things like the fact that we should never automatically delete.

I think a small banner, or something in the bottom left, to show usage against a predefined size limit is a good idea.
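
A small sketch of what the monitoring could be built on, assuming the `data/` folder holds one sub-folder per entry as in the layout at the top of the thread. The names `directorySize`, `usageByEntry` and `isOverLimit` are invented for this example, and the size limit is whatever value the UI decides on.

```typescript
import * as fs from "fs";
import * as path from "path";

interface EntryUsage {
  entryId: string;
  bytes: number;
}

// Recursively sum the sizes of all regular files under a directory.
function directorySize(dir: string): number {
  let total = 0;
  for (const item of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, item.name);
    if (item.isDirectory()) {
      total += directorySize(full);
    } else if (item.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Size of each entry folder inside the data directory, largest first.
function usageByEntry(dataDir: string): EntryUsage[] {
  const usage: EntryUsage[] = [];
  for (const item of fs.readdirSync(dataDir, { withFileTypes: true })) {
    if (item.isDirectory()) {
      usage.push({ entryId: item.name, bytes: directorySize(path.join(dataDir, item.name)) });
    }
  }
  return usage.sort((a, b) => b.bytes - a.bytes);
}

// True when the combined size exceeds the limit, e.g. to decide whether to show a banner.
function isOverLimit(usage: EntryUsage[], limitBytes: number): boolean {
  const total = usage.reduce((sum, entry) => sum + entry.bytes, 0);
  return total > limitBytes;
}
```

Nothing here deletes anything; it only reports sizes, in line with the point that the app should never delete automatically.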

0 replies

petersterne · 2022-10-11T04:46:39Z

petersterne
Oct 11, 2022
Maintainer

@oenu Can you give us an update on how the filesystem currently works, what the advantages and disadvantages of the current approach are, and possible alternative approaches?

2 replies

oenu Oct 11, 2022
Collaborator Author

How the file system currently works:

Config files are stored in folders, folders are named with UUIDs, transcriptions are stored in subfolders, etc. We use Electron main process functions to access files and complete CRUD operations. The results are passed to Redux (the frontend state management system) and then shown in React.

CRUD:
Create - Done
Read - Partially done, currently can only read all files at once and fully overwrite redux
Update - Just implemented a bad version of this for a single line transcription. Would like to start again and take it from 200 lines to 20 (take out a bunch of code that assumes the user might be sending bad data and move it to the front end)
Delete - not working

Advantages:
Built, information is stable as truth is always on disk

Disadvantages:
Incredibly opaque, very hard to work with, no built in types/error checking (all has to be built for each thing), makes synchronous work very hard

Proposal
https://www.npmjs.com/package/lowdb
Move config files, entry information, transcription information and audio file paths to a lowdb database. This would allow us to access data with code like

db.data.entries.push('example')
db.data.entries[0]

instead of having to build it out ourselves.
On the disk this would look like a json file containing all of our information in one place, much much easier to work with

CRUD:
Create - Very easy to adapt existing code (would just change a fsWriteFile to a db.push)
Read - Extremely easy to adapt (again fsReadFile to db.read())
Update - easy again, would allow us to just pull changes
Delete - Would actually work vs current system

Advantages:
Easy to do, standard format, database is always better than a bunch of files,

Disadvantages:
6-12 hours of development work that would stall all other development on transcription editing
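
To make the single-JSON-file idea concrete, here is a hand-rolled stand-in written with only Node's `fs` module. It is not lowdb's API; the `JsonStore` class, the `Entry` fields and the helper functions are all assumptions made for illustration. It only mirrors the general shape of the proposal: load one JSON file into a `data` object, change it in memory, write it back.

```typescript
import * as fs from "fs";

// Hypothetical record shape; the real entry fields would come from entry.json.
interface Entry {
  id: string;
  title: string;
  audioPath: string;
}

interface Database {
  entries: Entry[];
}

// Minimal single-file JSON store (a stand-in, not lowdb).
class JsonStore {
  file: string;
  data: Database = { entries: [] };

  constructor(file: string) {
    this.file = file;
  }

  // Load the file if it exists, otherwise keep the empty default.
  read(): void {
    if (fs.existsSync(this.file)) {
      this.data = JSON.parse(fs.readFileSync(this.file, "utf8")) as Database;
    }
  }

  // Persist the whole in-memory state back to disk.
  write(): void {
    fs.writeFileSync(this.file, JSON.stringify(this.data, null, 2));
  }
}

function createEntry(store: JsonStore, entry: Entry): void {
  store.data.entries.push(entry);
  store.write();
}

function readEntry(store: JsonStore, id: string): Entry | undefined {
  return store.data.entries.filter((entry) => entry.id === id)[0];
}

function updateEntry(store: JsonStore, id: string, changes: Partial<Entry>): void {
  store.data.entries = store.data.entries.map((entry) =>
    entry.id === id ? { ...entry, ...changes } : entry
  );
  store.write();
}

function deleteEntry(store: JsonStore, id: string): void {
  store.data.entries = store.data.entries.filter((entry) => entry.id !== id);
  store.write();
}
```

Each CRUD operation becomes a few lines on an in-memory array followed by one write, which is the simplification the proposal is after.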

petersterne Oct 11, 2022
Maintainer

So if we use lowdb, then the edits aren't automatically written to disk. They're just stored in a JSON database, and it's only when the user exports the transcript that you'll actually create a file (txt, vtt, srt, or even doc) that contains the edits. In other words, it's nondestructive editing.

The current transcription editing interface is nice (thanks @sawhney17!) but the feature itself isn't really functional. Being unable to delete entries is a huge problem that blocks testing and would probably take hours of work to solve anyway. So if you think that using lowdb will make it easier to provide the functionality we need, then go ahead and do it.

For reference, the functionality we want from a transcription database is:

User can import audio file(s), which will be transcribed through Whisper and produce a transcription that is stored
User can (re)name transcription
User can find previously imported audio recordings by searching their names
User can read transcription and listen to audio synced with transcription
User can edit transcription text in-line
User can search the text of the transcription
User can export transcription text (time-stamped or not) to a file so that they can use it with another app or share it
User can delete recording/transcription that they no longer want to work with
