Incremental trade download and candle building #5578

Open
aplainzetakind opened this issue Sep 15, 2021 · 7 comments
Labels
Data download Issues related to fetching historical data Enhancement Enhancements to the bot. Get lower priority than bugs by default.

Comments

@aplainzetakind

Describe your environment

  • Operating system, Python, CCXT: Running in Docker on Linux
  • Freqtrade Version: 2021.8

Your question

I am downloading trades from Kraken. It seems that even if I only fetch a very short time span of trades, all the past trades are loaded into memory in a dataframe, and then new files are produced from there. I have been fetching trades a couple of months at a time starting at the beginning of 2017, and as I approached the present, I started to run out of RAM with 55 million BTC/EUR trades. The process requires roughly 30GB of memory, and because I ran out I had to create a swap file and use that, which of course made everything even slower. Together with producing the candles (I'm creating all up to 1h), it has been running for about 5 hours now. This is quite untenable and I don't see myself updating my historic data any further. My questions are:

  • Is this normal or is something wrong on my side?
  • If this is indeed normal, is there a design reason for not doing things with streaming file I/O in constant memory? Also, candle conversion seems to be making a pass for each target timeframe; I imagine there'd be room to improve it to produce all target candles in one pass.
  • If there are no reasons such approaches are unacceptable, would you be open to patches attempting to improve those points?
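
The single-pass idea from the second question can be sketched as follows. This is only an illustration, not freqtrade's implementation: the trade layout `(timestamp_ms, price, amount)`, the candle layout and the function name `build_candles` are assumptions made for the example. It reads time-ordered trades once and keeps just one open candle per timeframe in memory.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical trade layout (an assumption): (timestamp_ms, price, amount), sorted by time.
Trade = Tuple[int, float, float]
# Candle layout used here: [open_time_ms, open, high, low, close, volume]
Candle = List[float]


def build_candles(trades: Iterable[Trade],
                  timeframes_ms: Dict[str, int]) -> Iterator[Tuple[str, Candle]]:
    """Yield (timeframe, candle) pairs as candles close, in one pass over trades."""
    current: Dict[str, Candle] = {}  # one open candle per timeframe
    for ts, price, amount in trades:
        for name, width in timeframes_ms.items():
            bucket = ts - ts % width  # open time of the candle this trade falls in
            candle = current.get(name)
            if candle is not None and candle[0] != bucket:
                yield name, candle  # trade belongs to a later candle: emit the old one
                candle = None
            if candle is None:
                current[name] = [bucket, price, price, price, price, amount]
            else:
                candle[2] = max(candle[2], price)  # high
                candle[3] = min(candle[3], price)  # low
                candle[4] = price                  # close
                candle[5] += amount                # volume
    # Flush the last (possibly incomplete) candle of each timeframe.
    for name, candle in current.items():
        yield name, candle
```

Because `trades` can be any iterator (for example a generator reading a file in chunks), memory use depends on the number of timeframes rather than the number of trades. Note that this sketch emits no candles for periods without trades.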
@aplainzetakind aplainzetakind added the Question Questions - will be closed after some period of inactivity. label Sep 15, 2021
@xmatthias xmatthias added Data download Issues related to fetching historical data Enhancement Enhancements to the bot. Get lower priority than bugs by default. and removed Question Questions - will be closed after some period of inactivity. labels Sep 16, 2021
@xmatthias
Member

Downloading (and converting) trades data into candles (OHLCV) is not a core feature - it was developed as a necessity to work around shortcomings in the Kraken API (no historic candles).

There's certainly room for improvement, so if you're up to it, feel free to improve this part. It's a question of time for me, and as it's only a "Kraken problem", I don't see it as high priority.

As you're already pointing out (file streaming, ...), it's probably not something that's easily or quickly done, and to avoid overlaps or mistakes in the data, good tests will certainly need to be in place.

That said ... I could also imagine a separate "convert" command, which splits the conversion logic out (so you don't have to download data "up to now" only because you want additional 1h data on top of the existing 5m data).
I think that would also simplify testing (no need to wait for downloading ...) and should probably be the starting point (obviously NOT with 30GB of data ... you'd go crazy testing with that).


The only thing I'm not sure about is converting trades into different timeframes at the same time.
While it might sound like a good idea, I'm not sure it is:
You'll need to batch the data (say, load weekly/monthly batches) to avoid loading GBs of trades into memory at the same time, and then convert each batch to candles.

It will most likely not make sense to persist the candles per batch.
While that would reduce memory load, it would increase disk I/O a lot, especially on higher timeframes and when using JSON to store the data (which must write the whole file again every time), so you'd end up reading and writing the complete candles file over and over again.
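
One way to read the point above is to batch only the trades, keep the (much smaller) candle list in memory, and write the candle file a single time at the end. The following is a minimal sketch under that assumption; the trade layout and the names `candles_from_batches` and `write_candles_once` are hypothetical and not part of freqtrade. The open candle is carried across batch boundaries so that a candle split between two batches is not emitted twice.

```python
import json
from typing import Iterable, List, Optional, Tuple

# Hypothetical trade layout (an assumption): (timestamp_ms, price, amount)
Trade = Tuple[int, float, float]


def candles_from_batches(batches: Iterable[List[Trade]], width_ms: int) -> List[list]:
    """Convert time-ordered trade batches to candles, keeping only candles in memory."""
    candles: List[list] = []
    open_candle: Optional[list] = None  # survives across batch boundaries
    for batch in batches:
        for ts, price, amount in batch:
            bucket = ts - ts % width_ms  # open time of the candle for this trade
            if open_candle is not None and open_candle[0] != bucket:
                candles.append(open_candle)
                open_candle = None
            if open_candle is None:
                open_candle = [bucket, price, price, price, price, amount]
            else:
                open_candle[2] = max(open_candle[2], price)  # high
                open_candle[3] = min(open_candle[3], price)  # low
                open_candle[4] = price                       # close
                open_candle[5] += amount                     # volume
    if open_candle is not None:
        candles.append(open_candle)
    return candles


def write_candles_once(candles: List[list], path: str) -> None:
    """Write the candle file a single time instead of rewriting it per batch."""
    with open(path, "w") as fh:
        json.dump(candles, fh)
```

With this shape, peak memory is roughly one trade batch plus all candles of one timeframe, and the JSON file is never rewritten.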

While you can work around that with different formats (hdf5 allows appending), please keep in mind that json is still the default data format and must therefore be supported as well.

I think this can be left for later improvements (if it's still seen as causing problems).

@aplainzetakind
Author

I missed the fact that hdf5 is a data storage option. I'm not really familiar with it but it sounds like it should be suitable to read/write without hogging memory. In that case my plan would be to hackishly convert what I already downloaded with a local script and write a proper conditional case for the candle making and append-download functions when both the trades and candles are set to hdf5. Assuming I can find the time, that is.

@xmatthias
Member

"hackishly convert"

You could also simply use freqtrade convert-trade-data (with a few more arguments ...) instead of "hacking something together" ... ;)

@aplainzetakind
Author

Oh, thanks. Sorry to not RTFM properly.

@niondir

niondir commented Aug 13, 2022

Since I'm also facing some issues with long download times and missing any progress while downloading, I have some more thoughts.

I see that appending to JSON is hard, but the download could use multiple JSON files (e.g. in a subdirectory), so we could have e.g. 1 file per day and, if needed, 1 subdirectory per year, e.g. /user_data/kraken/dl/2022/ETH_EUR-trades-2022-01-01.json.gz etc ...
This leads to ~365 files per directory, makes it possible to dump the data for a single day as soon as it is done, and maybe even to show some progress in the log.

To stay compatible with the rest of the application, when the download finishes, it can just combine all the downloaded files fairly easily in a streaming mode.
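
The per-day layout and the streaming combine step could look roughly like the sketch below. It assumes each daily file holds a JSON array of trades; the helper names `dump_day` and `combine_days` are hypothetical and do not exist in freqtrade. The file naming follows the example path given above.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, List


def dump_day(base_dir: Path, pair: str, day: str, trades: List[list]) -> Path:
    """Write one day's trades to <base_dir>/<year>/<pair>-trades-<day>.json.gz."""
    target = base_dir / day[:4] / f"{pair}-trades-{day}.json.gz"
    target.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(target, "wt", encoding="utf-8") as fh:
        json.dump(trades, fh)
    return target


def combine_days(day_files: Iterable[Path], out_path: Path) -> int:
    """Stream daily files into one JSON array, holding one day in memory at a time."""
    count = 0
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        out.write("[")
        # Year directories and ISO dates in the file names sort chronologically.
        for path in sorted(day_files):
            with gzip.open(path, "rt", encoding="utf-8") as fh:
                for trade in json.load(fh):
                    out.write(("," if count else "") + json.dumps(trade))
                    count += 1
        out.write("]")
    return count
```

Each finished day can be logged as progress when `dump_day` returns, and an interrupted download only loses the day that was in flight. The sorting relies on all files belonging to the same pair.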

@robog-two

Might this bug be related?
Seems like it may be a CCXT issue, not a freqtrade issue (at least for the "missing any progress" part, which I have encountered as well). It seems like it just "forgets" what you've downloaded, which is especially frustrating in the event of a network error.
ccxt/ccxt#15827

@xmatthias
Member

@robog-two the two are not related, no.
I'm not even sure that ccxt issue is still relevant, as Kraken's behavior seems to have changed: this "id" and the actual timestamp are now identical, just in a slightly different precision (which is, however, irrelevant for pagination).


If you encounter a new issue with freqtrade, then I'd encourage you to open a new issue instead, describing your problem.
With ccxt, it's actually similar / identical. I strongly doubt that you're running into that particular ccxt issue though, as I'm not sure that (rather old) issue is still relevant, either.

4 participants