made training call more robust. #370

Open
wants to merge 1 commit into main

Conversation

@thisismygitrepo commented Apr 17, 2024

If you have an extremely large number of SQL statements / docs / plan items / examples etc. (typically above a thousand), the probability of hitting this error becomes very high, practically inevitable:

HTTPSConnectionPool(host='ask.vanna.ai', port=443): Max retries exceeded with url: /rpc (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)')))

I solved this problem by making the call robust to transient failures. There are libraries for this, but instead of making the project more complex with dependencies, I added my own implementation.

Note: I only made the fix for train(plan=plan), but the same needs to be done for all other sections of the train method (i.e. wherever there is a loop of API calls); see the sketch below.
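Roughly, the idea is a retry wrapper with exponential backoff around each API call. This is an illustrative sketch, not the exact diff; `robust_call` and `submit_to_api` are stand-in names:

```python
import time
import requests

def robust_call(func, *args, max_retries=5, base_delay=1.0, **kwargs):
    """Retry `func` with exponential backoff on transient connection errors."""
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except (requests.exceptions.SSLError,
                requests.exceptions.ConnectionError):
            if attempt == max_retries - 1:
                raise  # out of retries; surface the original error
            time.sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...

# Hypothetical usage inside the train(plan=plan) loop:
# for item in plan:
#     robust_call(submit_to_api, item)
```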

@thisismygitrepo (Author)

While at it with the for loops, one should consider adding a progress visualizer.
With thousands or more items, the user has no clue whether the app is hanging or making progress in training.
I recommend tqdm unless there is something simpler; a sketch is below.
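Something as small as this would do (illustrative only; `training_items` and `train_one` are stand-ins for whatever the loop actually iterates over and calls):

```python
from tqdm import tqdm

# Wrap the loop's iterable so the user sees live progress
# instead of a silent, possibly-hung process.
for item in tqdm(training_items, desc="Training"):
    train_one(item)  # hypothetical per-item API call
```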

@zainhoda (Contributor)

Thanks for this -- I think you're the first user to experience this. I'd be curious what your experience was like after you trained. In most other cases we recommend that people "start small" with a specific subset of data and then expand gradually as the accuracy improves.

@thisismygitrepo (Author) commented Apr 21, 2024

I came to the same conclusion as you; it means I'm the first one to try this out on a massive SQL database.
To add context, I have a department-of-health (state-wide) database with 4k tables that is a spaghetti monster, and the provided train methods all fail with the max-retries error due to the large number of calls.

To your question: it worked on simple queries, but for seriously complex stuff that involves a significant amount of corporate knowledge (e.g. how many patients with dxg insulin results exceeded that level, provided they went to service x over the past three months in facility y), that is when it starts to crack (using GPT-4 Turbo). I'm thinking a larger context window would improve it, judging by the simple errors it's making (like "column doesn't exist").

I'm not sure if you are hinting that more data may reduce accuracy.
