parallel programming with threads and tasks

The simplest way to run computations in parallel is to use parallelApply. This works like apply(BasicList,Function), except that it uses all your cores, and always returns a List.

i1 : parallelApply(1..10, n -> n!)

o1 = {1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800}

o1 : List
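
As a small check of the remark about the return type, the sketch below compares apply and parallelApply on the same sequence. The names s and p are chosen here for illustration only.

```macaulay2
-- apply keeps the class of its input, so a Sequence goes in and a Sequence comes out
s = apply(1..10, n -> n!);
-- parallelApply returns a List, even though the input is a Sequence
p = parallelApply(1..10, n -> n!);
-- the entries agree once the sequence is converted to a list
toList s === p
```

If code downstream expects a Sequence, the result of parallelApply can be converted with toSequence.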

There is some overhead to parallelism, so this only speeds things up when the computation is large enough. If the list is long, it is split into chunks, one per core, which reduces the overhead. The speedup is still limited, however, because the threads compete for memory, including CPU caches; it is like running Macaulay2 on a computer that is running other big programs at the same time. We can see this using elapsedTime.

i2 : L = random toList (1..10000);

i3 : elapsedTime         apply(1..100, n -> sort L);
 -- 0.594508 seconds elapsed

i4 : elapsedTime parallelApply(1..100, n -> sort L);
 -- 0.200211 seconds elapsed

You will have to try it on your examples to see how much they speed up.

Warning: Threads computing in parallel can give wrong answers if their code is not "thread safe", meaning that they modify memory without ensuring that the modifications are safely communicated to other threads. (Ensuring thread safety can slow computations somewhat.) Currently, modifications to Macaulay2 variables and mutable hash tables are thread safe, but changes inside mutable lists are not. Also, access to external libraries, such as Singular, may not currently be thread safe.
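
One way to stay clear of the problem with mutable lists is to avoid shared state altogether: let each call return its own value and combine the values afterwards in the main thread. The following is a minimal sketch of that pattern; the names squares and total are hypothetical.

```macaulay2
-- each call works only on its own argument and returns a fresh value;
-- no mutable list is shared between the threads
squares = parallelApply(toList(1..20), n -> n^2);
-- combine the results afterwards, outside the parallel part
total = sum squares
```

This is only one possible design. Where shared state is really needed, the warning above suggests that a mutable hash table is currently a safer choice than a mutable list.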

The rest of this document describes how to control parallel tasks more directly.

The task system schedules functions and inputs to run on a preset number of threads. The number of threads to be used is given by the variable allowableThreads, and may be examined and changed as follows. (allowableThreads is temporarily increased if necessary inside parallelApply.)

i5 : allowableThreads

o5 = 5

i6 : allowableThreads = maxAllowableThreads

o6 = 13
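
If the higher thread count is wanted only for part of a session, the earlier value can be saved and restored. This is a sketch of that idea, not a required idiom; the name saved is hypothetical.

```macaulay2
saved = allowableThreads;                 -- remember the current setting
allowableThreads = maxAllowableThreads;   -- allow as many threads as possible
-- ... run the parallel part of the computation here ...
allowableThreads = saved;                 -- restore the earlier setting
```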

To run a function in another thread use schedule, as in the following example.

i7 : R = ZZ/101[x,y,z];

i8 : I = (ideal vars R)^2

             2             2        2
o8 = ideal (x , x*y, x*z, y , y*z, z )

o8 : Ideal of R

i9 : dogb = I -> () -> res quotient module I

o9 = dogb

o9 : FunctionClosure

i10 : f = dogb I

o10 = f

o10 : FunctionClosure

i11 : t = schedule f

o11 = <<task, created>>

o11 : Task

Note that schedule returns a task, not the result of the computation, which will be accessible only after the task has completed the computation.

i12 : t

o12 = <<task, running>>

o12 : Task

Use isReady to check whether the result is available yet.

i13 : isReady t

o13 = false

To wait for the result and then retrieve it, use taskResult.

i14 : taskResult t

       1      6      8      3
o14 = R  <-- R  <-- R  <-- R  <-- 0
                                   
      0      1      2      3      4

o14 : ChainComplex

i15 : assert instance(oo,ChainComplex)
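
The same functions can be combined to run several tasks at once and collect their results. The sketch below assumes a toy workload (squaring small integers); the names inputs, tasks and results are hypothetical.

```macaulay2
inputs = toList(1..4);
-- build one closure per input and schedule it immediately
tasks = apply(inputs, k -> schedule(() -> k^2));
-- taskResult waits for each task in turn, so the results come back in input order
results = apply(tasks, taskResult)
```

Each task's result is retrieved exactly once here. At most allowableThreads tasks run at the same time, and the rest wait in the scheduler's queue.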

It is possible to make a task without starting it running, using createTask.

i16 : t' = createTask f

o16 = <<task, created>>

o16 : Task

i17 : t'

o17 = <<task, created>>

o17 : Task

Start it running with schedule.

i18 : schedule t';

i19 : t'

o19 = <<task, running>>

o19 : Task

i20 : taskResult t'

       1      6      8      3
o20 = R  <-- R  <-- R  <-- R  <-- 0
                                   
      0      1      2      3      4

o20 : ChainComplex

One may use addStartTask to specify that one task is to be started after another one finishes. In the following example, G will start after F finishes.

i21 : F = createTask(() -> "result of F")

o21 = <<task, created>>

o21 : Task

i22 : G = createTask(() -> "result of G")

o22 = <<task, created>>

o22 : Task

i23 : addStartTask(F,G)

i24 : schedule F

o24 = <<task, created>>

o24 : Task

i25 : taskResult F

o25 = result of F

i26 : taskResult G

o26 = result of G
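
The same mechanism extends to longer chains. The following sketch assumes that addStartTask can be applied repeatedly in the way shown above for F and G; the task names s1, s2 and s3 are hypothetical.

```macaulay2
s1 = createTask(() -> "stage 1");
s2 = createTask(() -> "stage 2");
s3 = createTask(() -> "stage 3");
addStartTask(s1, s2);   -- s2 is started when s1 finishes
addStartTask(s2, s3);   -- s3 is started when s2 finishes
schedule s1;            -- only the first task is scheduled by hand
stages = apply({s1, s2, s3}, taskResult)
```

The results are requested in the order in which the tasks run, as in the example with F and G.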

Use addCancelTask to specify that the completion of one task triggers the cancellation of another, by means of an interrupt exception.

Use addDependencyTask to schedule a task, but to ensure that it will not run until one or more other tasks finish running.

Using the functions above, essentially any parallel functionality needed can be created.

Low-level C API functionality using the same scheduler also exists in the Macaulay2/system directory. It works in essentially the same way as the Macaulay2 interface.
