Julia: Parallel multiple for loop

Created on 6 Feb 2015 · 17 comments · Source: JuliaLang/julia

I wanted to run a parallel nested for loop. The docs show how you can write nested for loops as a single loop expression:

for i = 1:2, j = 3:4
  println((i, j))
end

and an example of a parallel for loop:

nheads = @parallel (+) for i=1:200000000
  int(rand(Bool))
end

How can I combine these concepts? I tried something along the lines of:

result = @parallel (hcat) for i=1:10,j=1:10
    [i^2,j^2]
end

but I get:

ERROR: syntax: invalid assignment location

I also tried with the Iterators package:

julia> result = @parallel (hcat) for p in product(0:0.1:1,0:0.1:1)
         [p[1]^2, p[2]^2]
       end
exception on 1: ERROR: MethodError: `getindex` has no method matching getindex(::Iterators.Product, ::Int64)
Closest candidates are:
  getindex(::(Any...,), ::Int64)
  getindex(::(Any...,), ::Real)
  getindex(::FloatRange{T}, ::Integer)
  ...

 in anonymous at no file:1679
 in anonymous at multi.jl:1528
 in run_work_thunk at multi.jl:603
 in run_work_thunk at multi.jl:612
 in anonymous at task.jl:6
MethodError(getindex,(Iterators.Product(Any[0.0:0.1:1.0,0.0:0.1:1.0]),1))

I am on:

julia> versioninfo()
Julia Version 0.4.0-dev+2684
Commit 8938e3a (2015-01-13 22:01 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Label: parallel

All 17 comments

This would certainly be nice to have, and shouldn't be too difficult. We could start by just parallelizing the outermost loop, but in cases where there isn't enough parallelism there, we should also be able to parallelize across loop nest levels.

Cc: @amitmurthy

See the recent work in #9871 for something that's pretty close to what you need.

Relevant (though not sure how useful): OpenMP has a collapse keyword for, well, collapsing nested parallel loops.

@timholy I missed #9871. Thanks!

I am a bit confused about what @simd actually does. Can it be used to do what I want in my original post?

@simd is for exploiting vector arithmetic units, not multiple threads. It's currently limited to a single loop, though with PR #9876 you can use it on Cartesian ranges; even then it just vectorizes the fastest-varying index rather than collapsing/flattening the range.

On modern processors, best results are often obtained by multi-threading the outer loop(s) and vectorizing the innermost loop.
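
In modern Julia this pattern can be written directly. A minimal sketch, assuming Julia ≥ 1.0 started with multiple threads (column_sums! is a hypothetical helper, not from this thread):

using Base.Threads

# Sketch: multi-thread the outer loop, vectorize the innermost loop.
function column_sums!(out::AbstractVector, A::AbstractMatrix)
    @threads for j in axes(A, 2)              # outer loop across threads
        s = zero(eltype(A))
        @inbounds @simd for i in axes(A, 1)   # innermost loop vectorized
            s += A[i, j]
        end
        out[j] = s
    end
    return out
end

Called as, e.g., column_sums!(zeros(size(A, 2)), A).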

Any news on this? It appears that @parallel still doesn't accept single-line nested loops. Or is there a way to achieve this by now?

You can do this manually today. Just do it over the outer loop only, or use ind2sub to convert between an overall linear index and the multidimensional/cartesian index you want.

Sorry, but I'm not sure I understand. Would your first suggestion be to convert

@parallel for i=1:10, j=1:10, k=1:10
    ...
end

into

@parallel for i=1:10
    for j=1:10, k=1:10
        ...
    end
end

And I'm afraid that even after reading the relevant entry in the docs, I have no idea what ind2sub does exactly.

@timholy Would eachindex also be a reasonable alternative?

@nilshg The ind2sub approach:

@parallel for i in 1:length(A)
    indices = ind2sub(size(A), i)
    @show A[indices...]
end

(Since I'm using @show this will be a mess, but I hope it gets the point across.)
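
(In Julia ≥ 0.7 ind2sub was removed; CartesianIndices plays the same role, and @parallel became @distributed in the Distributed stdlib. A sketch of the equivalent loop under those assumptions:)

using Distributed

A = randn(4, 5)

# Sketch: CartesianIndices maps the overall linear index k back
# to the (i, j) index, just as ind2sub did.
@sync @distributed for k in 1:length(A)
    idx = CartesianIndices(A)[k]
    @show A[idx]
end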

Any movement on this? I just ran into this today and it would be very nice to have this feature.

I am looking into the same problem. I have two nested for loops, and I want to parallelize the first loop over workers and the second loop over threads. Is this possible?
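
For reference, one way to express that worker/thread split is sketched below, assuming Julia ≥ 1.5 so workers can be started with the --threads flag:

using Distributed
addprocs(2; exeflags = "--threads=4")  # 2 workers, 4 threads each

# Sketch: outer loop split across workers, inner loop
# multi-threaded within each worker.
@sync @distributed for i in 1:10
    Threads.@threads for j in 1:10
        # ... work on (i, j) ...
    end
end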

A possible workaround:

@parallel for (i,j) in collect(Iterators.product(1:10, 1:20))
    ....
end
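
For what it's worth, the reducing form from the original post can be flattened the same way. A sketch, assuming Julia ≥ 0.7 (where @parallel became @distributed and product lives in Base.Iterators):

using Distributed

# Sketch: the hcat reduction from the original post, with the two
# ranges flattened through Iterators.product.
result = @distributed (hcat) for (i, j) in collect(Iterators.product(1:10, 1:10))
    [i^2, j^2]
end
# result is a 2×100 matrix, one column per (i, j) pair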

I think the example in the docs on using shared arrays in parallel computing gives some demonstrations of how to go about parallelizing multiple iterators.

@TAJD where exactly?

I think I misunderstood the problem. But I found the notes on using SharedArrays helpful for parallelizing my own simulations.
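
(The SharedArrays pattern referred to here looks roughly like the following sketch, assuming Julia ≥ 1.0; squares is a made-up example array:)

using Distributed, SharedArrays
addprocs(2)

# Sketch: workers write into a SharedArray, so the nested
# indices can be filled in directly without a reducer.
squares = SharedArray{Float64}(10, 10)
@sync @distributed for j in 1:10
    for i in 1:10
        squares[i, j] = i^2 + j^2
    end
end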

Maybe the workaround I suggested earlier:

@distributed for (i,j) in collect(Iterators.product(1:10, 1:20))
    ....
end

could be the default behavior of @distributed when it encounters a multi-dimensional loop? Unfortunately I don't know enough about macros to prepare a PR myself with this approach.

Also it is not clear what the reducer would do in this case.
