14. Multicore parallelization

FORCESPRO supports two levels of multicore parallelism:

  • Internal parallelism (see section Internal parallelism): the work of a single solver is distributed over multiple cores

  • External parallelism (see section External parallelism): a solver is run multiple times on multiple cores with different inputs

For combining both levels of parallelism, see section Combining external and internal parallelism.

14.1. Internal parallelism

FORCESPRO can distribute the computation of a single solver over multiple cores. This is particularly useful for large problems and long horizons, because the workload is split along the horizon across the cores. The feature is implemented with OpenMP and can be switched on by setting

codeoptions.parallel = 1;

By default multicore computation is switched off.
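To illustrate what splitting the workload along the horizon can mean, the following minimal sketch assigns contiguous blocks of horizon stages to threads. It is only an illustration of a static partition, not FORCESPRO's actual scheduling; the type stage_range and the function stages_for_thread are hypothetical names introduced here.

```c
/* Hypothetical illustration, not part of the FORCESPRO API:
 * half-open range [begin, end) of horizon stages handled by one thread. */
typedef struct { int begin; int end; } stage_range;

/* Split N stages over num_threads threads in contiguous blocks.
 * The first (N % num_threads) threads receive one extra stage. */
static stage_range stages_for_thread(int N, int num_threads, int tid)
{
    int base = N / num_threads;
    int extra = N % num_threads;
    stage_range r;
    r.begin = tid * base + (tid < extra ? tid : extra);
    r.end = r.begin + base + (tid < extra ? 1 : 0);
    return r;
}
```

With a horizon of 10 stages and 4 threads, this partition yields the ranges [0,3), [3,6), [6,8) and [8,10), which shows why longer horizons give each core more work to amortize the threading overhead.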

When the parallel option is set to 1 (codeoptions.parallel = 1), the maximum number of threads is set to the maximum number of threads available to OpenMP (max_number_of_threads = omp_get_max_threads()). Additionally, a runtime parameter num_of_threads is created to control the number of threads at runtime. The allowed range of values for this parameter is [1, max_number_of_threads]. Leaving the parameter unset or setting a value outside the allowed range results in execution with the maximum number of threads (max_number_of_threads).
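The rule for the runtime parameter can be summarized in a small sketch. The helper effective_num_threads is hypothetical and only mirrors the behavior described above; it is not a FORCESPRO function, and representing an unset parameter by 0 is an assumption made for this illustration.

```c
/* Hypothetical helper, not part of the FORCESPRO API.
 * Returns the thread count that would be used for a requested value of
 * num_of_threads. Assumption: an unset parameter is represented by 0. */
static int effective_num_threads(int requested, int max_number_of_threads)
{
    /* values inside [1, max_number_of_threads] are used as given */
    if (requested >= 1 && requested <= max_number_of_threads)
    {
        return requested;
    }
    /* unset or out-of-range values fall back to the maximum */
    return max_number_of_threads;
}
```

For example, with max_number_of_threads equal to 8, a request of 2 runs on 2 threads, while requests of 0 or 16 both run on 8 threads.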

The maximum number of threads can also be set manually during code generation by setting:

% <max_number_of_threads> larger than 1
codeoptions.parallel = <max_number_of_threads>;

14.2. External parallelism

External parallelism is available only at the level of the C interface (see High-level interface and C interface: memory allocations). To execute multiple calls to the same generated solver in parallel, the solver must be thread-safe. Thread safety can be ensured by setting the option

codeoptions.threadSafeStorage = 1;

The solver can be called in parallel by assigning an independent memory buffer to each thread as in the following code snippet:

/* each of the NUM_THREADS threads must be assigned its own memory buffer */
char * mem[NUM_THREADS];
FORCESNLPsolver_mem * mem_handle[NUM_THREADS];

/* input & output for each of the NUM_SOLVERS solvers */
FORCESNLPsolver_params params[NUM_SOLVERS];
FORCESNLPsolver_info info[NUM_SOLVERS];
FORCESNLPsolver_output output[NUM_SOLVERS];
int exit_code[NUM_SOLVERS];

/* create memory buffer for each thread */
for (i=0; i<NUM_THREADS; i++)
{
    mem[i] = malloc(mem_size);
    mem_handle[i] = FORCESNLPsolver_external_mem(mem[i], i, mem_size);
}

/* parallel call to the solver using OpenMP */
#pragma omp parallel for
for (i_solver=0; i_solver<NUM_SOLVERS; i_solver++)
{
    int i_thread = omp_get_thread_num();
    exit_code[i_solver] = FORCESNLPsolver_solve(&params[i_solver], &output[i_solver], &info[i_solver], mem_handle[i_thread], ...);
}

/* free user-allocated memory */
for (i = 0; i < NUM_THREADS; i++)
{
    free(mem[i]);
}

If you run multiple solvers concurrently and are interested in only one solution, you can use the real-time parameter solver_exit_external to terminate all other solvers from the thread that converges fastest. See section Early-terminate solver for more information.

Important

Special care has to be taken when using solvemethod = 'SQP_NLP' with problem.reinitialize = 0 or solvemethod = 'QP_FAST' with problem.warmstart = 1 in parallel because the memory buffer stores the current state of the solver between consecutive calls. Therefore, the memory buffers have to be allocated per solver and not per thread, so that there are NUM_SOLVERS buffers.
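The following sketch illustrates why stateful solvers need one buffer per solver. It uses a hypothetical stand-in (mock_state, mock_solve, run_rounds) instead of a generated FORCESPRO solver: each buffer keeps the iterate of the previous call, so the buffer is indexed by i_solver rather than by the thread number, which may differ between consecutive rounds.

```c
#define NUM_SOLVERS 4

/* Hypothetical stand-in for a solver memory buffer that carries state
 * between consecutive calls (as with a warm-started solver). */
typedef struct { double last_solution; int num_calls; } mock_state;

/* Hypothetical stand-in for a solve call: moves the stored iterate
 * halfway toward the target and counts the call. */
static void mock_solve(mock_state *state, double target)
{
    state->last_solution += 0.5 * (target - state->last_solution);
    state->num_calls++;
}

/* Runs num_rounds consecutive parallel solves over all solvers. */
static void run_rounds(mock_state *state, const double *target, int num_rounds)
{
    int round, i_solver;
    for (round = 0; round < num_rounds; round++)
    {
        /* the pragma is ignored if the code is compiled without OpenMP */
        #pragma omp parallel for
        for (i_solver = 0; i_solver < NUM_SOLVERS; i_solver++)
        {
            /* buffer chosen per solver, NOT per thread: the state of
             * solver i_solver must survive until its next call */
            mock_solve(&state[i_solver], target[i_solver]);
        }
    }
}
```

Indexing the buffers by thread number instead would let a solver pick up the state left behind by a different solver whenever the loop iterations are scheduled differently between rounds.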

Important

When using the code-generated integrators (see section Code-generated integrators) within a multithreaded environment, you have to specify the maximum number of threads on which the solver will run in parallel via the option nlp.max_num_threads. For instance, to run the solver on a maximum of 5 threads in parallel, set

codeoptions.nlp.max_num_threads = 5;

Note

Solvers with binary variables (see Binary constraints) or integer variables (see Mixed-integer nonlinear solver) are not yet thread-safe.

Alternatively, the internal memory interface (see Internal memory) also supports thread safety, but with less flexibility and with a hard limit on the number of memory buffers (see Table 13.1). This functionality is not covered here but you can get started by consulting the example BasicExample_internal_mem_multithreading.c in the examples\StandaloneExecution\C folder that comes with your client.

14.3. Combining external and internal parallelism

In order to combine external and internal parallelism on m internal threads and n external threads (so that m*n threads are employed in total), you need to set the following code options:

codeoptions.parallel = m;
codeoptions.max_num_mem = n; % only for internal memory interface
codeoptions.nlp.max_num_threads = m*n; % only for code-generated integrators

If these options are set inconsistently with the number of threads, the solver will exit with exit flag -101 (for insufficient max_num_mem) or -102 (for insufficient max_num_threads).
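As a minimal sketch, the consistency requirement can be checked before code generation or before the first solver call. The function check_parallel_options is hypothetical and not part of FORCESPRO; it assumes that both the internal memory interface and the code-generated integrators are in use, and it returns 0 (not a FORCESPRO exit flag) when the settings are consistent.

```c
/* Hypothetical pre-flight check, not part of the FORCESPRO API.
 * m: internal threads, n: external threads.
 * Assumes both max_num_mem and nlp.max_num_threads are relevant. */
static int check_parallel_options(int m, int n,
                                  int max_num_mem, int max_num_threads)
{
    if (max_num_mem < n)
    {
        return -101; /* mirrors the exit flag for insufficient max_num_mem */
    }
    if (max_num_threads < m * n)
    {
        return -102; /* mirrors the exit flag for insufficient max_num_threads */
    }
    return 0; /* settings are consistent (0 is this sketch's own convention) */
}
```

For m = 2 and n = 3, the settings max_num_mem = 3 and nlp.max_num_threads = 6 pass the check, whereas max_num_mem = 2 or nlp.max_num_threads = 5 would not.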

Nested parallelism must also be enabled in OpenMP. Depending on the compiler and OpenMP version, one or both of the following library calls are required before calling the solver (note that omp_set_nested is deprecated as of OpenMP 5.0 in favor of omp_set_max_active_levels):

omp_set_nested(1);
omp_set_max_active_levels(2);

Additionally, dynamic adjustment of the number of threads needs to be disabled by calling

omp_set_dynamic(0);
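The OpenMP runtime settings for nested parallelism can be collected in one setup routine that is called once before the first solver call. This is a minimal sketch: the function name configure_nested_openmp is hypothetical, and it uses only omp_set_max_active_levels to enable nesting, so older toolchains may additionally need omp_set_nested(1).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical setup routine, not part of the FORCESPRO API.
 * Returns the number of active parallel levels after configuration,
 * or 1 if the code was compiled without OpenMP support. */
static int configure_nested_openmp(void)
{
#ifdef _OPENMP
    omp_set_dynamic(0);            /* keep the requested thread counts fixed */
    omp_set_max_active_levels(2);  /* external level + internal level */
    return omp_get_max_active_levels();
#else
    return 1;
#endif
}
```

Checking the return value against 2 at startup is a cheap way to detect an OpenMP runtime that does not support the two nested levels.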