
Distributed system for scientific and engineering computations with problem containerization and prioritization

Aleksandr Sokolov, Andrey Larionov, Amir Mukhtarov

08:40, 26 Sep. 2023 (20 min)

A key challenge in computer modeling is obtaining numerical results on large input datasets. This problem arises, for instance, when researchers need to visualize an economic or physical process, or when they compute characteristics of complex mathematical models using the Monte Carlo method. Each such problem requires running the same program on a large set of inputs, which can take anywhere from several hours to days or weeks. The computational algorithms may be implemented in different programming languages and may require specific tools such as GNU Octave, NS-3, or OMNeT++. This article describes a distributed system architecture that can speed up obtaining results for such problems. The system comprises a backend server, a control service (supervisor), worker nodes, and a database. To abstract away from the particular languages and tools that the computational algorithms require, the algorithms are executed in Docker containers. The system supports several problem prioritization strategies so that it can operate efficiently under heavy load from multiple users. To use the system, a user only needs to build a Docker image that encapsulates the algorithm, describe the input dataset in a JSON file, and upload both via the web interface. The system can be deployed in any public cloud. In this article, we describe the system architecture and present numerical results of computations on various cloud and local platforms. We show that the computation time for CPU-bound problems varies greatly across public clouds. Finally, we show how different prioritization strategies affect the duration of computations under a moderate workload.
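To illustrate why the choice of prioritization strategy matters when several users share the workers, here is a minimal sketch under stated assumptions. The abstract does not name the strategies the system implements, so the two used here (first-in-first-out and round-robin fair share across users) are illustrative choices, not the authors' method. The JSON layout `{"inputs": [...]}`, the `Task` type, and all function names are hypothetical. Each element of a problem's JSON dataset becomes one task (one container run), all tasks are assumed to take equal time, and the workers are identical.

```python
import json
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    user: str     # who submitted the problem
    problem: str  # hypothetical problem id (e.g. the uploaded Docker image name)
    index: int    # position of this input in the problem's JSON dataset


def tasks_from_json(user, problem, spec_json):
    """Expand a hypothetical JSON dataset description {"inputs": [...]}
    into one task per input."""
    inputs = json.loads(spec_json)["inputs"]
    return [Task(user, problem, i) for i in range(len(inputs))]


def fifo_order(tasks):
    """First in, first out: tasks run in submission order."""
    return list(tasks)


def fair_share_order(tasks):
    """Round-robin across users: each user with pending tasks gets one task per turn."""
    queues = {}
    for t in tasks:
        queues.setdefault(t.user, deque()).append(t)
    order = []
    while queues:
        for user in list(queues):  # iterate over a copy so we can delete emptied queues
            order.append(queues[user].popleft())
            if not queues[user]:
                del queues[user]
    return order


def completion_times(order, workers, task_duration=1.0):
    """With identical workers and equal task durations, the task at position
    `slot` runs in wave slot // workers. Return each user's last finish time."""
    finish = {}
    for slot, task in enumerate(order):
        end = (slot // workers + 1) * task_duration
        finish[task.user] = max(finish.get(task.user, 0.0), end)
    return finish


# Alice submits a problem with 8 inputs; Bob then submits one with 2 inputs.
alice = tasks_from_json("alice", "sim-a", json.dumps({"inputs": [{"seed": s} for s in range(8)]}))
bob = tasks_from_json("bob", "sim-b", json.dumps({"inputs": [{"seed": s} for s in range(2)]}))
queue = alice + bob

print(completion_times(fifo_order(queue), workers=2))        # {'alice': 4.0, 'bob': 5.0}
print(completion_times(fair_share_order(queue), workers=2))  # {'alice': 5.0, 'bob': 2.0}
```

In this toy run, FIFO makes Bob's small problem wait behind all of Alice's inputs, while the fair-share order finishes Bob's problem much sooner at the cost of delaying Alice's last result by one wave. The real system's strategies and workload will differ.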