Should mention, the coolest part is that I never sent over "all" the memory used...

zozbot234 · on June 21, 2024

> Instead, I was clever with virtual memory, and when a page of memory was needed that wasn't loaded by the recipient Pi, it would request and lazy-load just that page from the provider Pi, and with some careful bookkeeping mark that the page was owned by the recipient Pi.

I wonder if that "trick" can be extended to a full implementation of distributed shared memory, i.e. multiple nodes running separate tasks in a single address space and implementing cache coherence over the network. Probably needs quite a bit of extra compiler/runtime support so it wouldn't really apply to standard binaries, but it might still be useful nonetheless.
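
To make the quoted bookkeeping concrete, here is a minimal sketch of lazy page migration with ownership transfer. It is not the original poster's implementation: it simulates the idea with plain Python objects instead of real page-fault handling, and the names `Node`, `Directory`, `place`, and `access` are hypothetical. It also assumes a single owner per page, so it shows migration only, not the multi-reader cache coherence a full DSM would need.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class Node:
    """A simulated machine holding only the pages it currently owns."""

    def __init__(self, name):
        self.name = name
        self.pages = {}   # page number -> bytearray
        self.faults = 0   # how many pages had to be fetched remotely


class Directory:
    """Hypothetical bookkeeping: which node owns each page right now."""

    def __init__(self):
        self.owner = {}   # page number -> Node

    def place(self, page_no, node, data):
        # Initial placement of a page on a node.
        node.pages[page_no] = bytearray(data)
        self.owner[page_no] = node

    def access(self, node, page_no):
        # Fast path: the page is already local, no transfer needed.
        if page_no in node.pages:
            return node.pages[page_no]
        # "Page fault": fetch just this page from its current owner,
        # then record that ownership has moved to the requesting node.
        node.faults += 1
        provider = self.owner[page_no]
        node.pages[page_no] = provider.pages.pop(page_no)
        self.owner[page_no] = node
        return node.pages[page_no]


provider = Node("provider")
recipient = Node("recipient")
directory = Directory()
for n in range(4):
    directory.place(n, provider, bytes([n]) * PAGE_SIZE)

# The recipient touches only page 2, so only page 2 ever moves.
page = directory.access(recipient, 2)
directory.access(recipient, 2)  # second access is local, no new fault
```

After this runs, the recipient owns page 2 and the provider still holds the other three pages, which is the "never sent over all the memory" property. In a real system the `access` path would be triggered by the MMU and a fault handler rather than an explicit call.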

nradclif · on June 21, 2024

Partitioned Global Address Space (PGAS) compilers/runtimes do something similar to that. Unified Parallel C (UPC, https://upc.lbl.gov/) and Coarray Fortran/Coarray C++ (https://docs.nersc.gov/development/programming-models/coarra...) are good examples commonly used in HPC. Fabric Attached Memory (OpenFAM, https://openfam.github.io/) is another example.

pklausler · on June 22, 2024

“commonly used in HPC” is a bit of a stretch if you’re talking about production applications.

arjvik · on June 22, 2024

That's actually basically what I was doing! I was able to run programs compiled for a "normal" OS in a single unified distributed virtual address space!

alksjdalkj · on June 22, 2024

Sounds like a cool class project! If I understand your approach correctly, this is how live virtual machine migration typically works (e.g., https://none.cs.umass.edu/~shenoy/courses/spring18/readings/...). It also sounds similar to this "remote-fork" concept: https://www.usenix.org/system/files/osdi23-wei-rdma.pdf.