
[Help]: Memory issues when creating interferograms with large stacks #94

benwright-davra opened this issue Feb 6, 2024 · 13 comments


@benwright-davra

Describe the problem you encountered
Hi Alexey, I'm having trouble converting my existing code so I can use some of your great new features. I seem to be failing even to generate the stack of interferograms. I'm testing on 3 subswaths with a 20-scene stack of SLC images, but I'm planning to test with a larger time series in the future. I've tried both of the methods listed in your notebooks (namely the Dask persist approach and the standard way of simply calling the command on a decimated output). I've also subscribed to your Patreon and your GPT service in the hope that they could answer my questions without the need for a GitHub ticket, but so far I've not been able to resolve it.

Previously I was able to limit the number of jobs for interferogram generation/unwrapping etc., which allowed me to generate a ~100-scene SLC stack (3 subswaths) relatively easily on my system; with anything more, the SBAS command (sbas.sbas_parallel) would become my main bottleneck, again with memory issues.

I would have thought that iterating through these interferograms and splitting the jobs would be the solution; however, I'm concerned that because the output won't be a single data cube, this will cause issues with later commands or may not currently be possible. Is there any possibility of re-including the n_jobs feature?

The main suggestion from the GPT was to create a spill folder for Dask (which I have done, and it still fails).

OS and software version
If applicable, what operating system and software version are you using?
OS: Linux Mint 21.1 Cinnamon
docker: mobigroup/pygmtsar:latest

System specs
processor: 12th Gen Intel® Core™ i7-12700H × 14
ram: 32GB
storage: 2TB

Log file
memory_issue.txt

Code example
pygmtsar_tests.txt

@AlexeyPechnikov
Owner

There is a special function, Stack.compute_interferogram_multilook(), designed to compute large stacks efficiently. It splits a stack into subsets and processes them separately, saving the results to disk. You can open the saved results with Stack.open_stack(name).
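As a rough sketch, the batched pattern might look like the following; it assumes an initialized Stack object `sbas` and a `pairs` table already exist, and the argument names (`pairs`, the `'intf_mlook'` output name, `wavelength`, `resolution`, `queue`) are taken from this thread and the example notebooks, so they may differ between PyGMTSAR versions:

```python
# Hypothetical sketch: compute a large interferogram stack in batches,
# saving results to disk instead of holding everything in memory.
# `sbas` is an initialized pygmtsar Stack, `pairs` a baseline pairs table.
sbas.compute_interferogram_multilook(
    pairs, 'intf_mlook',            # pairs to process and output name on disk
    wavelength=200, resolution=60,  # filter wavelength [m] and output resolution [m]
    queue=1,                        # pairs processed per batch (reduce to lower peak RAM)
)

# Reopen the saved results later as a lazy (Dask-backed) stack
ds = sbas.open_stack('intf_mlook')
print(ds)
```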
By the way, the ASF.download() function automatically checks whether all the scenes and orbits have been downloaded and fetches the missing ones. It even has an option, skip_exist=False, to force re-downloading of orbits, and it checks the file sizes of all the scenes to fix incomplete downloads. And, of course, it checks whether the data directory exists and creates it if necessary. There's no need to create wrappers around this function to check for the existence of the data directory or for downloaded scenes and orbits.

@benwright-davra
Author

benwright-davra commented Feb 7, 2024

Thanks for the additional heads-up about the features of ASF.download; I will look into that further.

I've been running some tests since you directed me to this, and with queue=1 and everything else set to default, I'm using approximately 15.5 GB of RAM. Does this sound right to you? I also seem to be getting an error message occurring with some of the jobs, seemingly at random; I've attached a log file with the console messages.
interferrogram_bug.txt

I would love to see one of your notebook examples on Patreon or GitHub cover large-stack processing. Is that something you are planning for the future?

@AlexeyPechnikov
Owner

Here, I've shared some tips on how to configure a Dask cluster for processing large datasets: https://www.patreon.com/posts/new-pygmtsar-91285612

@benwright-davra
Author

Thank you for those tips. Given that my processor has 20 threads, I've increased the number of threads per worker to 8, and I'm planning on having two workers. My understanding is that if I do not explicitly set --memory-limit, the memory allocation is done automatically as TOTAL_MEMORY * min(1, nthreads / total_nthreads), meaning that I should have approximately 12.8 GB of memory available per worker.
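In other words, an explicit configuration along those lines would be roughly the following (a sketch using dask.distributed defaults; the spill-directory path is just an example):

```python
import dask
from dask.distributed import Client, LocalCluster

# Optional: point Dask spill files at a disk with plenty of free space
# (the path here is hypothetical)
dask.config.set({'temporary-directory': '/mnt/data/dask_spill'})

# 2 workers x 8 threads = 16 threads total. Without an explicit memory_limit,
# each worker gets TOTAL_MEMORY * min(1, nthreads / total_nthreads),
# i.e. 32 GB * min(1, 8/20) ≈ 12.8 GB here; setting it explicitly avoids surprises.
cluster = LocalCluster(n_workers=2, threads_per_worker=8, memory_limit='12GB')
client = Client(cluster)
print(client)
```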

I realise that I am pushing the limits by processing 3 subswaths at once, but I'm still surprised by the amount of memory used for a single iteration of interferogram generation with Stack.compute_interferogram_multilook(). It still uses approximately 13-15 GB for a single interferogram, meaning that I have to iterate through each pair individually, which as you can imagine takes a large amount of time. When I specify a wavelength of 200 and a resolution of 60, RAM usage builds up over time and I can only get about 20% of the way through processing 4 interferograms at once before the job is killed. Is this normal? As I say, I could previously handle 8 interferogram pairs (3 subswaths).

@AlexeyPechnikov
Owner

Your Dask cluster is probably configured incorrectly. You might start from the default configuration provided in the example notebooks, which works well both on 2-CPU-core, 12 GB RAM Google Colab instances and on 8-CPU-core, 53 GB RAM Google Colab Pro instances.

@benwright-davra
Author

benwright-davra commented Feb 14, 2024

Running further tests using the default Dask config (i.e. client = Client() with no further Dask settings), memory utilisation on my system is now even higher, with approximately 17 GB used by Python processes when calculating a queue of 1 interferogram (3 subswaths). A reminder that I'm not using Google Colab or Jupyter Notebooks: I'm running this as a standard Python script from a Linux terminal (within a Docker container), but I can't imagine that should have an impact?

I've also removed the chunksize=1024 setting to determine whether the defaults would be suitable. This doesn't seem to have had an impact on memory usage.

@AlexeyPechnikov
Owner

The system should utilize all the available memory and CPU cores to achieve better performance. Why worry when the application uses the available resources? The problem is when it works slowly and/or uses a lot of swap.

@benwright-davra
Author

Sorry, I should be clearer. Working with a queue of 1 is currently the only way I can process the interferograms. Ideally I'd like to have the default of 16 jobs running at once so that interferogram generation doesn't take significantly longer than it used to. However, unless I specify a queue of 1, processing fails because memory reaches 100% and the process is killed.

@AlexeyPechnikov
Owner

New PyGMTSAR (Python InSAR) Docker Images Available Now: https://www.patreon.com/posts/new-pygmtsar-now-98688871

@benwright-davra
Author

I've been experimenting with this over the past couple of weeks, using your Yamchi Dam notebook as an example. Everything now works perfectly with regard to interferogram generation with larger stacks; I'm limiting myself to one subswath and reframing so that I'm only working with one burst. I still think the new methods for interferogram generation are more memory intensive, though.

However, I simply cannot generate a stack past a certain size given my memory constraints. This stack limit (approximately 40 dates) is less than half of what it used to be (and that was with 3 full subswaths of information). The bottleneck appears to be the los_displacement_mm(sbas.lstsq) step, where I use disp_sbas = sbas.los_displacement_mm(sbas.lstsq(unwrap_sbas.phase - trend_sbas, corr_sbas)) (where trend, unwrap, and corr have all undergone a stdev filtering process to carry forward only the best pairs); the step is shown below for reference. The problem, at least as I see it, is that I won't be able to split this processing into batches to limit memory usage because it requires the full 3D stack (so I don't think something like #87 will work).
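For clarity, the step in question is:

```python
# unwrap_sbas, trend_sbas and corr_sbas are the (stdev-filtered) lazy grids
# mentioned above, already reduced to the best pairs.
disp_sbas = sbas.los_displacement_mm(
    sbas.lstsq(unwrap_sbas.phase - trend_sbas, corr_sbas)
)
# The least-squares inversion runs along the full pair dimension, so it
# appears to need the whole 3D stack rather than date-wise batches.
```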

Do you have any suggestions for how to deal with this?

@AlexeyPechnikov
Owner

A stack is chunked as 2048x2048 pixels for 2D processing and 512x512 pixels for 3D operations. Using 4 GB RAM per worker, we can process a 2000-interferogram stack.

@benwright-davra
Author

I had still been setting datagrid.chunksize = 1024 from your suggestion for my work with the original PyGMTSAR, so I have removed that, and now I think I'm able to process much more easily. Will using sbas.sync_cube() also help keep too many objects from being stored in memory, or is it simply there to let you save each step to a folder?

Either way, I can process the larger stacks now so thank you!

@AlexeyPechnikov
Owner

sbas.sync_cube() effectively stores large chunked grids as a NetCDF file, computing and saving them chunk by chunk, and opens the file as a lazy Dask data cube. If your code is not too complicated, you can process huge data cubes on low-memory hosts; just provide 4+ GB RAM per worker. Small values like chunksize=1024 (or even 512) theoretically allow only 2 GB (or 1 GB) RAM per worker, but in practice Dask cannot handle 2x2=4 (or 4x4=16) times more tasks, and it doesn't work on large grids. The default chunk size seems to be optimal for many configurations, while large chunks like 4096 can be beneficial for 32+ GB RAM hosts.
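A minimal sketch of that save-and-reopen pattern, assuming sync_cube() accepts a lazy grid plus an output name (the exact signature may differ between PyGMTSAR versions):

```python
# Hypothetical sketch of the chunk-by-chunk save/reopen pattern described above.
# The sync_cube() arguments (lazy grid, output name) are assumed; check your
# PyGMTSAR version for the exact signature.

# Build the displacement cube lazily (nothing is computed yet)
disp_lazy = sbas.los_displacement_mm(
    sbas.lstsq(unwrap_sbas.phase - trend_sbas, corr_sbas)
)

# Compute and store it chunk by chunk to NetCDF, then get it back as a lazy
# Dask data cube, so peak RAM stays close to the per-chunk size per worker.
disp_sbas = sbas.sync_cube(disp_lazy, 'disp_sbas')
```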
