DMTCP (Distributed MultiThreaded Checkpointing) transparently checkpoints a single-host or distributed computation in user space, with no modifications to user code or to the operating system. It works with most Linux applications, including Python, MATLAB, R, GUI desktops, and MPI. It is robust and widely used (on SourceForge since 2007). For the newest releases of DMTCP, please go to: https://github.com/dmtcp/dmtcp
Supported applications include MPI (various implementations), OpenMP, MATLAB, Python, Perl, R, and many other programming and shell scripting languages. With TightVNC, DMTCP can also checkpoint and restart X Window applications. The OpenGL library for 3D graphics is supported through a special plugin. DMTCP also has strong support for HPC (High Performance Computing) environments, including MPI and SLURM, through the MANA project on GitHub: https://github.com/mpickpt/mana
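As a rough illustration of what "no modifications to user code" means in practice, the sketch below builds the command lines for a typical launch, checkpoint, and restart cycle. It only constructs and prints the commands; it does not run DMTCP. The helper names (launch_cmd, checkpoint_cmd, restart_cmd), the script name job.py, and the image name ckpt_example.dmtcp are hypothetical. The dmtcp_launch, dmtcp_command, and dmtcp_restart tools and the --interval and --checkpoint options reflect the DMTCP command-line interface as commonly documented, but should be checked against each tool's --help output for the installed release.

```python
import shlex


def launch_cmd(app_argv, interval_s=None):
    """Build a dmtcp_launch command that wraps an unmodified application.

    app_argv is the application's normal command line. interval_s, if given,
    requests periodic checkpoints (assumed option: --interval SECONDS).
    """
    cmd = ["dmtcp_launch"]
    if interval_s is not None:
        cmd += ["--interval", str(interval_s)]
    return cmd + list(app_argv)


def checkpoint_cmd():
    """Build a command asking the coordinator for a checkpoint right now."""
    return ["dmtcp_command", "--checkpoint"]


def restart_cmd(images):
    """Build a command that restarts from one or more checkpoint images."""
    return ["dmtcp_restart"] + list(images)


if __name__ == "__main__":
    # job.py and ckpt_example.dmtcp are placeholder names for illustration.
    steps = [
        launch_cmd(["python3", "job.py"], interval_s=60),
        checkpoint_cmd(),
        restart_cmd(["ckpt_example.dmtcp"]),
    ]
    for argv in steps:
        print(shlex.join(argv))
```

The application's own command line is passed through unchanged; only the wrapper in front of it differs, which is the point of a transparent, user-space approach.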
Features
- checkpoint-restart
- distributed computation
- user-space
- HPC (High Performance Computing)
- MPI
- SLURM