[Help]: Memory issues when creating interferograms with large stacks #94
Comments
There is a special function …
Thanks for the additional heads-up about the features of ASF.download; I'll look into that further. I've been running some tests since you directed me to this, and with … I would love to see one of your notebook examples on Patreon or GitHub relate to large-stack processing. Is that something you are planning for the future?
Here, I've shared some tips on how to configure a Dask cluster for processing large datasets: https://www.patreon.com/posts/new-pygmtsar-91285612
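For readers without access to the linked post, a minimal Dask cluster setup of this general shape might look like the sketch below. The specific numbers (2 workers, 8 threads each, a 12 GB per-worker cap) are illustrative assumptions to tune for the actual host, not values taken from the post:

```python
# Hypothetical Dask LocalCluster configuration sketch for chunked InSAR
# processing. All numbers here are assumptions, not recommended values.
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=2,            # a few heavyweight workers...
    threads_per_worker=8,   # ...each multithreaded for chunked array math
    memory_limit='12GB',    # per-worker cap; Dask spills to disk near it
)
client = Client(cluster)
print(client.dashboard_link)  # live view of task and memory usage
```

The dashboard link printed at the end is useful for watching per-worker memory while an interferogram stack is being computed.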
Thank you for those tips. Given that my processor has 20 threads, I've increased the number of threads per worker to 8, and I'm planning on having two workers. My understanding is that if I do not specifically call a … I realise that I am pushing the limits with 3 subswaths being processed at once; however, I'm still surprised by the amount of memory being used for a single iteration of interferogram generation when using …
Probably your Dask cluster is configured in the wrong way. You might start from the default configuration provided in the example notebooks, which works well both on 2-CPU-core, 12 GB RAM Google Colab instances and on 8-CPU-core, 53 GB RAM Google Colab Pro instances.
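As a rough sketch of how one might split a host's resources among workers for the two Colab configurations mentioned above (the 4 GB-per-worker floor is an assumption drawn from advice later in this thread, and `plan_workers` is a hypothetical helper, not a PyGMTSAR API):

```python
# Rule-of-thumb sketch: split a host's cores and RAM into Dask workers
# so that each worker keeps at least a minimum RAM budget.
def plan_workers(total_cores, total_ram_gb, min_ram_per_worker_gb=4):
    """Pick a worker count bounded by cores and by the per-worker RAM floor."""
    workers = max(1, min(total_cores, total_ram_gb // min_ram_per_worker_gb))
    return {
        "n_workers": workers,
        "threads_per_worker": max(1, total_cores // workers),
        "memory_limit_gb": total_ram_gb // workers,
    }

# The 2-core/12 GB Colab vs. the 8-core/53 GB Colab Pro instance:
print(plan_workers(2, 12))   # 2 workers, 1 thread each, 6 GB each
print(plan_workers(8, 53))   # 8 workers, 1 thread each, 6 GB each
```

Both example hosts end up near the same per-worker memory budget, which is consistent with the notebooks' defaults working on either instance type.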
Running further tests using the default Dask config (i.e. …), I've also removed the chunksize=1024 setting to determine whether the defaults would be suitable. This doesn't seem to have had an impact on the memory usage.
The system should utilize all the available memory and CPU cores to achieve better performance. Why worry when the application uses the available resources? The problem is when it runs slowly and/or uses a lot of swap.
Sorry, I should be clearer. Working with a queue of 1 is currently the only way I can process the interferograms. Ideally I'd like to run the default of 16 jobs at once so that interferogram generation does not take significantly longer than it used to. However, unless I specify a queue of 1, processing fails because memory usage reaches 100% and the process is killed.
New PyGMTSAR (Python InSAR) Docker Images Available Now: https://www.patreon.com/posts/new-pygmtsar-now-98688871 |
I've been experimenting with this over the past couple of weeks using your Yamchi Dam notebook as an example. Everything now works perfectly with regard to interferogram generation with larger stacks; I'm limiting myself to one subswath and reframing so that I'm only working with one burst. I still think the new methods for interferogram generation are more memory-intensive, though. However, I simply cannot generate a stack past a certain size given my memory constraints. This stack limit (approximately 40 dates) is less than half of what it used to be (and that was with 3 full subswaths of information). The bottleneck appears to be the … Do you have any suggestions for how to deal with this?
A stack is chunked at 2048x2048 pixels for 2D processing and 512x512 pixels for 3D operations. Using 4 GB RAM per worker, we can process a 2000-interferogram stack.
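A quick back-of-the-envelope calculation shows why those chunk sizes fit comfortably in a 4 GB-per-worker budget. The 8-bytes-per-pixel figure below assumes complex64 interferogram samples; actual dtypes vary by processing stage:

```python
# Back-of-the-envelope chunk memory footprint.
# Assumes 8 bytes per pixel (complex64); the dtype is an assumption.
BYTES_PER_PIXEL = 8

def chunk_mb(chunksize, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in MiB of one square chunk of the given edge length."""
    return chunksize * chunksize * bytes_per_pixel / 2**20

print(f"2048x2048 2D chunk: {chunk_mb(2048):.0f} MiB")  # 32 MiB
print(f"512x512 3D chunk:   {chunk_mb(512):.0f} MiB")   # 2 MiB
```

At 32 MiB per 2D chunk, a worker can hold dozens of chunks plus intermediates well inside 4 GB, which is why the stack length (2000 interferograms) is not itself the limiting factor.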
I had been setting … Either way, I can process the larger stacks now, so thank you!
sbas.sync_cube() effectively stores large chunked grids as a NetCDF file, computing and saving them chunk by chunk, and then opens the file as a lazy Dask data cube. If your code is not too complicated, you can process huge data cubes on low-memory hosts; just provide 4+ GB RAM per worker. Small values like chunksize=1024 (or even 512) theoretically allow only 2 GB (or 1 GB) RAM per worker, but in practice Dask cannot handle 2x2=4 (or 4x4=16) times more tasks, and it doesn't work on large grids. The default chunk size seems to be optimal for many configurations, while large chunks like 4096 can be beneficial for hosts with 32+ GB RAM.
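The task-count blow-up described above can be made concrete: halving the chunk edge quadruples the number of chunks needed to tile a 2D grid. The grid size in this sketch is an arbitrary illustration, not a PyGMTSAR default:

```python
import math

# Number of square chunks needed to tile a 2D grid (illustrative sizes).
def n_chunks(grid_edge, chunksize):
    per_axis = math.ceil(grid_edge / chunksize)
    return per_axis * per_axis

grid_edge = 16384  # hypothetical grid edge in pixels
for cs in (2048, 1024, 512):
    print(cs, n_chunks(grid_edge, cs))
# 2048 -> 64 chunks; 1024 -> 256 (4x more); 512 -> 1024 (16x more)
```

So although chunksize=512 shrinks each chunk's memory footprint 16-fold, it also hands the scheduler 16 times as many tasks per grid, and the scheduling and intermediate-result overhead is what makes very small chunks impractical on large grids.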
Describe the problem you met
Hi Alexey, I'm having trouble converting my existing code to use some of your great new features. I seem to be failing to even generate the stack of interferograms. I'm trying on 3 subswaths with a 20-scene stack of SLC images as a test, but I'm planning to test with a larger time series in the future. I've tried both of the methods listed in your notebooks (namely the Dask persist approach and the standard way of simply calling the command on a decimated output). I've also subscribed to your Patreon and your GPT service in the hope that they could answer my questions without the need for a GitHub ticket, but so far I've not been able to resolve it. Previously I could limit the number of jobs for interferogram generation, unwrapping, etc., which allowed me to generate a ~100-scene SLC stack (3 subswaths) relatively easily on my system; with anything more, the SBAS command (sbas.sbas_parallel) would become my main bottleneck, again with memory issues.
I would have thought that iterating through these interferograms or splitting the jobs would be the solution; however, I'm concerned that because the output won't be a single data cube, this would cause issues with later commands or might not be possible currently. Is there any possibility of re-including the n_jobs feature?
The main suggestion from the GPT is to create a spill folder for Dask (which I have done, and it still fails).
OS and software version
If applicable, what operating system and software version are you using?
OS: Linux Mint 21.1 Cinnamon
docker: mobigroup/pygmtsar:latest
System specs
processor: 12th Gen Intel® Core™ i7-12700H × 14
ram: 32GB
2TB storage
Log file
memory_issue.txt
Code example
pygmtsar_tests.txt