The current Cuckatoo C31 mean solvers use 8GB of RAM (or more). These solvers split the edges into buckets and then run a lean phase on each bucket separately. The solvers I’ve seen use 4 bytes per edge in a bucket, so the total memory usage is 2^31 * 4 = 8GB.
However, there is a better way to store the edges in a bucket. If the edges in a bucket are sorted, the difference between two consecutive edges is usually small, and the distribution of the differences depends only on the number of buckets. If you write and read the edges sequentially, you can store just the difference bits: with an efficient bitpacking scheme you need fewer than 15 bits per individual diff. See https://en.wikipedia.org/wiki/Arithmetic_coding for an example of such a coding scheme.
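To make the idea concrete, here is a small round-trip sketch of delta-encoding a sorted bucket with Rice coding (unary quotient plus a k-bit remainder). This is my own illustration, not the encoding from the PoC below, and `BitWriter`/`BitReader`/`k` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Append/read individual bits, LSB-first within each byte.
struct BitWriter {
  std::vector<uint8_t> bytes;
  int bitpos = 0;
  void put(uint64_t bits, int n) {  // append the n low bits of `bits`
    for (int i = 0; i < n; ++i, ++bitpos) {
      if (bitpos % 8 == 0) bytes.push_back(0);
      if ((bits >> i) & 1) bytes.back() |= uint8_t(1 << (bitpos % 8));
    }
  }
};
struct BitReader {
  const std::vector<uint8_t>& bytes;
  int bitpos = 0;
  explicit BitReader(const std::vector<uint8_t>& b) : bytes(b) {}
  uint64_t get(int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; ++i, ++bitpos)
      v |= uint64_t((bytes[bitpos / 8] >> (bitpos % 8)) & 1) << i;
    return v;
  }
};

// Sort the bucket, then Rice-code each delta: quotient of d/2^k in
// unary (q ones, then a zero), followed by the k-bit remainder.
std::vector<uint8_t> encode(std::vector<uint32_t> edges, int k) {
  std::sort(edges.begin(), edges.end());
  BitWriter w;
  uint32_t prev = 0;
  for (uint32_t e : edges) {
    uint32_t d = e - prev;
    prev = e;
    for (uint32_t q = d >> k; q > 0; --q) w.put(1, 1);  // unary quotient
    w.put(0, 1);                                        // terminator
    w.put(d & ((1u << k) - 1), k);                      // remainder
  }
  return w.bytes;
}

// Invert encode(): recover the n sorted edge values.
std::vector<uint32_t> decode(const std::vector<uint8_t>& bytes, size_t n, int k) {
  BitReader r(bytes);
  std::vector<uint32_t> edges;
  uint32_t prev = 0;
  for (size_t i = 0; i < n; ++i) {
    uint32_t q = 0;
    while (r.get(1)) ++q;                 // count unary ones
    prev += (q << k) | uint32_t(r.get(k));
    edges.push_back(prev);
  }
  return edges;
}
```

With a sensible k for the given bucket count, most diffs cost k+1 or k+2 bits, which is where a per-edge budget well under 4 bytes comes from.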
I’ve implemented a simple encoding as a proof of concept: https://github.com/dzetkulic/ct/blob/master/a.cc#L38
The code above only does Phase 1 of edge trimming. It’s not a full solver, but it can do Phase 1 with 3.7GB of RAM, i.e. 0.47x of what’s usually needed.
Interestingly, this is orthogonal to Slean mining (https://www.grin-forum.org/t/slean-mining-mining-cuckatoo3x-using-low-memory-gpus/), and the two approaches can be combined for even lower memory usage. E.g. if you do Slean with K=4 and bitpacking, then you only need 2*256MB + (12GB/4)*0.47 = 1.91GB.
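A back-of-envelope check of that combined figure, using only the numbers from this post (K is the Slean reduction factor, 0.47 the bitpacking ratio, 12GB the bucket-phase memory, 2*256MB the bitmaps):

```cpp
#include <cassert>

// Combined Slean + bitpacking memory estimate, in GB.
// All constants are the ones quoted in the post; the function
// name is mine.
double combined_gb(int K) {
  const double bitmaps_gb = 2 * 256.0 / 1024.0;  // 2 x 256MB of bitmaps
  const double buckets_gb = (12.0 / K) * 0.47;   // bitpacked bucket storage
  return bitmaps_gb + buckets_gb;
}
```

For K=4 this gives 0.5GB + 1.41GB = 1.91GB, matching the figure above.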
The bad news is that I don’t see a good way to implement this efficiently on GPUs, and on CPUs it’s quite slow. The good news is that I think it would work just fine on an ASIC, where you can afford to do bit manipulations efficiently.
In the future we may see ASICs that use 2GB of DRAM with Slean+bitpacking instead of utilizing huge SRAM.