converting a longer pdf - of say 7 - 8 pages #297

Open
vivekna opened this issue May 25, 2021 · 3 comments

vivekna commented May 25, 2021

It works great for smaller chunks of text. But if I try to convert a PDF of, say, 7 pages, it always fails after a very long wait: it runs for hours and then fails. Is this meant only for smaller texts? What's the alternative?

The error is usually a connection error, but that's expected if it runs for a few hours to convert a seven-page PDF, right?

gtts.tts.gTTSError: 500 (Internal Server Error) from TTS API. Probable cause: Uptream API error. Try again later.

My code is very simple and straightforward:

import pdftotext
from gtts import gTTS
from os.path import splitext

filelocation = "C:\\Users\\vna\\Downloads\\catch22.pdf"
with open(filelocation, "rb") as f:  # open the PDF in binary read mode
    pdf = pdftotext.PDF(f)  # extract the text; pdf holds one string per page

# join the text of all pages into a single string
string_of_text = "".join(pdf)

final_file = gTTS(text=string_of_text, lang='en')  # prepare TTS for the whole text at once
outname = splitext(filelocation)[0] + '.mp3'
final_file.save(outname)  # request the audio and save the MP3 next to the PDF

@pndurette
Owner

Hi there!

Hmm, for a 7-page PDF, I'd say the cause is indeed that you're requesting a lot, and eventually the server shuts you down after too many quick requests.

Unfortunately, there isn't any way to tell gTTS to 'slow down' the requests, so I'll add this as an enhancement.

In the meantime, if that's what's actually happening, you'll have to make separate requests with a slight sleep between them.
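
A minimal sketch of that workaround, assuming the text is already split into pieces (for example, one string per PDF page) and that a fixed pause between pieces is enough to avoid the server-side cutoff. The names `gtts_writer` and `convert_in_pieces` and the 2-second default are hypothetical choices for illustration, not part of gTTS; only `gTTS(...)` and its `write_to_fp` method come from the library.

```python
import time
from typing import BinaryIO, Callable, Iterable


def gtts_writer(text: str, fp: BinaryIO) -> None:
    """Synthesize one piece of text with gTTS and append the MP3 bytes to fp."""
    from gtts import gTTS  # imported lazily so the loop below can be tried without gTTS

    gTTS(text=text, lang="en").write_to_fp(fp)


def convert_in_pieces(
    pieces: Iterable[str],
    out_fp: BinaryIO,
    write_piece: Callable[[str, BinaryIO], None] = gtts_writer,
    pause_seconds: float = 2.0,  # assumed value; tune it to what the server tolerates
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Convert each non-empty piece separately, pausing between pieces.

    Returns the number of pieces that were converted.
    """
    converted = 0
    for piece in pieces:
        if not piece.strip():
            continue  # skip blank pages; there is nothing to speak
        if converted:
            sleep(pause_seconds)  # wait before every request after the first
        write_piece(piece, out_fp)
        converted += 1
    return converted
```

With the script above, this could be called as `convert_in_pieces(pdf, out)` inside `with open(outname, "wb") as out:`, since iterating over `pdf` yields one string per page. All pieces are appended to the same file, which most players accept as a single MP3, although that is not guaranteed for every player.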

Author

vivekna commented May 25, 2021

Yes, that's what is happening here. Thanks for the enhancement!
Do you know how we can make separate requests? Probably split by page? And will that in turn create separate MP3 files? I can do something to join them all together later, but is there a known approach? Have you tried to convert such documents?

Also, I wish to add that the output of gTTS is way better than pyttsx3. Thanks for your effort! :)
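
One possible shape for the per-page approach asked about here, offered as a sketch rather than a known gTTS recipe: save each page to its own MP3 file, then join the files by appending their bytes in order. The helper names and the `_pageNNN` naming scheme are hypothetical, and byte-level joining assumes the player tolerates concatenated MP3 streams.

```python
import shutil
from pathlib import Path
from typing import Iterable


def page_mp3_path(pdf_path: str, page_number: int) -> Path:
    """Hypothetical naming scheme: catch22.pdf -> catch22_page003.mp3."""
    source = Path(pdf_path)
    return source.with_name(f"{source.stem}_page{page_number:03d}.mp3")


def join_mp3_files(parts: Iterable[Path], out_path: Path) -> None:
    """Append the bytes of each part to out_path, in the order given."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

Each page could be written with `gTTS(text=page, lang='en').save(str(page_mp3_path(filelocation, i)))`, with a pause between pages, and the resulting files passed to `join_mp3_files` afterwards. Keeping the per-page files also means a failed page can be retried on its own without redoing the pages that already succeeded.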

ickam commented Jul 21, 2021

A related question, @pndurette: would using my own API key, created in Google Cloud Shell, allow me to process longer files?
If so, how do I feed it to the app if I installed it through pip?
