@cwindolf along with everyone else
As discussed in the NPX discord, I've settled on some dredge parameters that I'm hopeful will give me good results, after extensive testing on data snippets ranging from 3 to 10 minutes.
Now I am attempting a run on my whole experiment file, which is 12868 seconds (about 3.5 hours) long.
I am getting a lot of OOM errors in dredge. I'm working through them and would appreciate some help. A small one was in the laplacian function, at the step `lap = np.zeros((n, n))`: I OOM'd while needing only about a MB, so I added a garbage collection call, which worked and has never errored since. To me that's a normal OOM, since scipy/Python can hold on to memory in large processing loops, explicit garbage collection is sometimes required, and you never know where it will come up until you run long files. This is not an error that concerns me, for myself or for other users.
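For reference, the workaround is just an explicit `gc.collect()` before the allocation. A minimal sketch of the idea (`make_laplacian` is a hypothetical stand-in, not dredge's actual function, which I haven't reproduced here):

```python
import gc

import numpy as np

def make_laplacian(n):
    # Hypothetical stand-in for dredge's laplacian step; the real
    # function also allocates a dense (n, n) operator.
    gc.collect()  # reclaim garbage from earlier loop iterations first
    lap = np.zeros((n, n))
    np.fill_diagonal(lap, 2.0)
    lap -= np.eye(n, k=1) + np.eye(n, k=-1)
    return lap
```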
My main issue, and what I anticipate will cause problems for many other users, is the large arrays Cs, Ds, Us, and Ss. These are not memory-leak issues but huge arrays that cannot reasonably be loaded into memory.
Ds and Cs are defined with `Ds = np.zeros((B, T0, T1), dtype=np.float32)`, where, for me, both T0 and T1 are 12868 (one bin per second of recording) and B is my total number of spatial windows, 148. For float32 this works out to about 98 GB per array, whether in RAM or on my hard drive.
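To make the arithmetic explicit, here is the footprint calculation for one such `(B, T, T)` array (my numbers plugged in; the helper name is mine, not dredge's):

```python
import numpy as np

def pairwise_array_gb(B, T, dtype=np.float32):
    """Footprint in GB of one (B, T, T) array of the given dtype."""
    return B * T * T * np.dtype(dtype).itemsize / 1e9

# 148 spatial windows, 12868 one-second time bins:
print(pairwise_array_gb(148, 12868))              # ~98 GB at float32
print(pairwise_array_gb(148, 12868, np.float64))  # ~196 GB at float64
```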
Later, Ss is a modified copy of Cs and Us is a modified copy of Ss, so for my file that is 98*4, nearly 400 GB, of required RAM. But I believe it is possible to keep no more than three active copies at a time, since Cs can be deleted after Ss is calculated and Ss can be deleted after Us is calculated. I don't think this is implemented yet, and all four copies are in memory at some point.
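The lifetime management I have in mind looks something like this (a sketch with toy shapes; `transform` stands in for the real Cs-to-Ss and Ss-to-Us steps, which I haven't traced in detail):

```python
import gc

import numpy as np

def transform(a):
    # Hypothetical stand-in for the real derived-copy computation.
    return a * 0.5

Cs = np.ones((2, 100, 100), dtype=np.float32)
Ss = transform(Cs)
del Cs
gc.collect()   # Cs is no longer needed once Ss exists
Us = transform(Ss)
del Ss
gc.collect()   # Ss is no longer needed once Us exists
# Peak usage: at most two of the derived arrays alive at any moment.
```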
In the final thomas_solve step, both Us and Ds are used and are upcast to float64, which requires something like 196 GB of memory per array. That is not reasonable for a single machine, even a professional-grade one. And I've definitely run longer experiments before, 5 to 6 hours, which would require something like 1.1 TB of memory just for these two arrays.
My current workaround, which I am hacking into the code, is to use memmaps and write these large arrays to disk. I am still getting errors and do not have a functional version yet, but this should work "in theory" and simply requires a large, fast drive with sufficient free space.
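The pattern I'm hacking in is roughly the following, shown here with toy shapes (path and shapes are mine; the real arrays would be the full `(148, 12868, 12868)` ones):

```python
import os
import tempfile

import numpy as np

B, T = 4, 100  # tiny stand-ins for 148 and 12868
path = os.path.join(tempfile.mkdtemp(), "Ds.dat")

# Allocate on disk instead of in RAM; only touched pages stay resident.
Ds = np.memmap(path, dtype=np.float32, mode="w+", shape=(B, T, T))
Ds[0, :, :] = 1.0   # writes go through the page cache to disk
Ds.flush()

# A later step can reopen the file read-only without loading it whole.
Ds_ro = np.memmap(path, dtype=np.float32, mode="r", shape=(B, T, T))
```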
However, my intuition is that there is a better solution for long experiments whose arrays do not fit into memory. The current code is likely speed-optimal, but speed-optimal is not useful if it fails completely on the dataset I need to run. And I think that, in general, most NPX users work with much longer files that will reliably OOM under the current implementation. Assuming a user has only 64 GB of RAM, then in the final thomas_solve step (two float64 arrays, Us and Ds, of shape 148 x T x T) they can only run a file of about 85 minutes (by my math, assuming no memory usage beyond these two variables).
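For transparency, here is that back-of-the-envelope calculation (helper name and defaults are mine):

```python
def max_duration_minutes(ram_gb=64.0, B=148, n_arrays=2, itemsize=8):
    """Longest recording (minutes, 1 s time bins) whose n_arrays
    (B, T, T) float64 arrays fit in ram_gb, ignoring everything else."""
    T = (ram_gb * 1e9 / (n_arrays * B * itemsize)) ** 0.5
    return T / 60

print(max_duration_minutes())  # roughly 85-87 minutes
```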
Are there ways the speed-optimal code could be modified to do better than a memmap while still running on long files? In particular, I am interested in why the entire B x T x T array needs to be in memory at one time. Since I have reduced my time horizon to only about 4 seconds, I would imagine the code could be restructured to calculate motion along a sliding time window, something like B x TH x TH instead. Or perhaps the steps from Cs to Ss to Us could be calculated "on the fly" for each B segment of data? Of course these implementations would be slower, since they involve repeated calculations that are then discarded, but for me (and I assume many others) the file-length limit is a deal-breaker: speed on shorter files is irrelevant if longer files fail completely.
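To illustrate why the time horizon matters so much: if pairs farther apart than the horizon TH contribute nothing, only a band of half-width TH around the diagonal of each T x T slice is ever needed, so a `(B, T, 2*TH+1)` band could replace the dense `(B, T, T)` array. A rough sizing comparison, under that assumption (I have not verified that dredge's matrices are actually banded this way):

```python
def dense_storage_gb(B, T, itemsize=4):
    # Full (B, T, T) array, e.g. float32 Ds.
    return B * T * T * itemsize / 1e9

def banded_storage_gb(B, T, TH, itemsize=4):
    # Band of half-width TH: only entries with |t0 - t1| <= TH kept.
    return B * T * (2 * TH + 1) * itemsize / 1e9

# With a 4 s time horizon over my 12868 s recording:
print(dense_storage_gb(148, 12868))       # ~98 GB dense
print(banded_storage_gb(148, 12868, 4))   # well under 0.1 GB banded
```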