14. Multicore parallelization
FORCESPRO supports two levels of multicore parallelism:
Internal parallelism: the work for a single solver call is distributed over multiple cores
External parallelism: a solver is run multiple times on multiple cores with different inputs
For combining both levels of parallelism, see section Combining external and internal parallelism.
14.1. Internal parallelism
FORCESPRO supports computation on multiple cores, which is particularly useful for large problems and long horizons (the workload is split along the horizon across multiple cores). This is implemented using OpenMP and can be switched on with
MATLAB: codeoptions.parallel = 1;
Python: codeoptions.parallel = 1
By default multicore computation is switched off.
When the parallel option is enabled with codeoptions.parallel = 1, the maximum number of threads to be used is set to the maximum number of threads available to OpenMP (max_number_of_threads = omp_get_max_threads()). Additionally, a runtime parameter num_of_threads is created to control the number of threads at runtime. The allowed range of values for this runtime parameter is [1, max_number_of_threads]. Leaving the parameter unset, or setting a value outside the allowed range, will lead to execution with the maximum number of threads (max_number_of_threads).
The maximum number of threads can also be set manually during code generation by setting:
MATLAB:
% <max_number_of_threads> larger than 1
codeoptions.parallel = <max_number_of_threads>;

Python:
# <max_number_of_threads> larger than 1
codeoptions.parallel = <max_number_of_threads>
14.2. External parallelism
External parallelism can be enabled only at the level of the C interface (see High-level interface and C interface: memory allocations). In order to execute multiple calls to the same generated solver in parallel, the solver is required to be thread-safe. Thread-safety can be ensured by setting the option
MATLAB: codeoptions.threadSafeStorage = 1;
Python: codeoptions.threadSafeStorage = 1
The solver can be called in parallel by assigning an independent memory buffer to each thread as in the following code snippet:
/* each of the NUM_THREADS threads must be assigned its own memory buffer */
char * mem[NUM_THREADS];
FORCESNLPsolver_mem * mem_handle[NUM_THREADS];

/* input & output for each of the NUM_SOLVERS solvers */
FORCESNLPsolver_params params[NUM_SOLVERS];
FORCESNLPsolver_info info[NUM_SOLVERS];
FORCESNLPsolver_output output[NUM_SOLVERS];
int exit_code[NUM_SOLVERS];

/* create memory buffer for each thread */
for (i = 0; i < NUM_THREADS; i++)
{
    mem[i] = malloc(mem_size);
    mem_handle[i] = FORCESNLPsolver_external_mem(mem[i], i, mem_size);
}

/* parallel call to the solver using OpenMP */
#pragma omp parallel for
for (i_solver = 0; i_solver < NUM_SOLVERS; i_solver++)
{
    int i_thread = omp_get_thread_num();
    exit_code[i_solver] = FORCESNLPsolver_solve(&params[i_solver], &output[i_solver], &info[i_solver], mem_handle[i_thread], ...);
}

/* free user-allocated memory */
for (i = 0; i < NUM_THREADS; i++)
{
    free(mem[i]);
}
If you run multiple concurrent solvers and you are interested in only one solution, you can use the real-time parameter solver_exit_external to terminate all other solvers from the thread that converges fastest. See section Early-terminate solver for more information.
Important
Special care has to be taken when using solvemethod = 'SQP_NLP' with problem.reinitialize = 0, or solvemethod = 'QP_FAST' with problem.warmstart = 1, in parallel, because the memory buffer stores the current state of the solver between consecutive calls. Therefore, the memory buffers have to be allocated per solver and not per thread, so that there are NUM_SOLVERS buffers.
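In that case, the allocation and call pattern from the snippet above changes so that each solver instance, rather than each thread, owns its own buffer. A sketch, reusing the names (FORCESNLPsolver_*, mem_size, NUM_SOLVERS) from the earlier example:

```c
/* one buffer per solver instance, so that warm-start state persists
 * correctly across consecutive calls (SQP_NLP with reinitialize = 0,
 * or QP_FAST with warmstart = 1) */
char * mem[NUM_SOLVERS];
FORCESNLPsolver_mem * mem_handle[NUM_SOLVERS];

for (i = 0; i < NUM_SOLVERS; i++)
{
    mem[i] = malloc(mem_size);
    mem_handle[i] = FORCESNLPsolver_external_mem(mem[i], i, mem_size);
}

/* in the parallel loop, index the memory by solver, not by thread */
#pragma omp parallel for
for (i_solver = 0; i_solver < NUM_SOLVERS; i_solver++)
{
    exit_code[i_solver] = FORCESNLPsolver_solve(&params[i_solver], &output[i_solver], &info[i_solver], mem_handle[i_solver], ...);
}
```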
Important
When using the code-generated integrators (see section Code-generated integrators) within a multithreaded environment, you have to specify, via the option nlp.max_num_threads, the maximum number of threads on which you wish to run the solver in parallel. For instance, when running the solver on at most 5 threads in parallel, one would set
MATLAB: codeoptions.nlp.max_num_threads = 5;
Python: codeoptions.nlp.max_num_threads = 5
Note
Solvers with binary variables (see Binary constraints) or integer variables (see Mixed-integer nonlinear solver) are not yet thread-safe.
Alternatively, the internal memory interface (see Internal memory) also supports thread safety, but with less flexibility and with a hard limit on the number of memory buffers (see Table 13.1). This functionality is not covered here, but you can get started by consulting the example BasicExample_internal_mem_multithreading.c in the examples\StandaloneExecution\C folder that comes with your client.
14.3. Combining external and internal parallelism
In order to combine external and internal parallelism on m internal threads and n external threads (so that m*n threads are employed in total), you need to set the following code options:
MATLAB:
codeoptions.parallel = m;
codeoptions.max_num_mem = n; % only for internal memory interface
codeoptions.nlp.max_num_threads = m*n; % only for code-generated integrators

Python:
codeoptions.parallel = m
codeoptions.max_num_mem = n # only for internal memory interface
codeoptions.nlp.max_num_threads = m*n # only for code-generated integrators
If these options are set inconsistently with the number of threads, the solver will exit with exitflag -101 (for insufficient max_num_mem) or -102 (for insufficient max_num_threads).
In OpenMP, nested parallelism needs to be enabled. Depending on the compiler and OpenMP version, one or both of the following library calls are required before calling the solver:
omp_set_nested(1);
omp_set_max_active_levels(2);
Additionally, dynamic adjustment of the number of threads needs to be disabled by
omp_set_dynamic(0);
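Putting these calls together, a minimal setup sketch in C is shown below. The wrapper function name is illustrative only, and the block is guarded with _OPENMP so that it also compiles when OpenMP is not enabled; the solver call itself is elided.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Illustrative helper: configure OpenMP for nested (external over
 * internal) parallelism before calling the generated solver. */
static void setup_nested_openmp(void)
{
#ifdef _OPENMP
    omp_set_nested(1);            /* required by some compilers/OpenMP versions */
    omp_set_max_active_levels(2); /* two levels: external over internal parallelism */
    omp_set_dynamic(0);           /* disable dynamic adjustment of thread counts */
#endif
}

int main(void)
{
    setup_nested_openmp();
    /* ... set up parameters and call the generated solver here ... */
    return 0;
}
```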