Cudf: [BUG] Make detail APIs order MR parameter after stream

Created on 7 May 2020  路  12Comments  路  Source: rapidsai/cudf

Describe the bug
Internal (detail namespace) APIs that mirror public APIs often include both memory resource and stream parameters with defaulted values, and usually in this order:

cudf::detail::example(...other parameters...,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource(),
  cudaStream_t stream = 0)

However, placing mr before stream is somewhat inconvenient because:

  1. The mr parameter will often use the default because the caller-specified memory resource should only be used for allocations that are returned to the caller, and not for intermediate temporary allocations.
  2. If there is no stream in the calling scope, the public API can usually be used instead.
  3. It's less onerous to explicitly specify 0 or nullptr than rmm::mr::get_default_resource().

Expected behavior
Re-order the stream and mr parameters of detail APIs to make it easier to specify a stream while keeping the implicitly defaulted memory resource. Also, the default value can be removed from the stream parameter to prevent forgetting to hook it up when using the default value for mr.

cudf::detail::example(...other parameters...,
  cudaStream_t stream, // no longer defaults to 0
  rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

Additionally, the TRANSITION.md guide could be updated to explain the reasoning for this ordering.

libcudf tech debt

All 12 comments

Point number 1 is quite subtle. The common case here depends on whether or not the called function is just the detail implementation of the calling function. For example

cudf::fill(..., mr) calls cudf::detail::fill(..., mr, stream=0).

In this case, you always want to pass the mr from the public API into the detail API.

But in other cases, a decision needs to be made. Example (using proposed ordering, not current ordering)

std::unique_ptr<column> cudf::api_A(..., mr) {
    return cudf::detail::api_A(..., 0, mr); // passes MR, stream set to 0 *****
}

std::unique_ptr<column> cudf::detail::api_A(..., mr, stream=0) {
    ...
    auto temp = cudf::detail::api_B(..., stream); // don't pass non-default MR because output is temp
    ...
    return cudf::detail::api_C(..., stream, mr); // pass non-def MR because output is returned
}

Note the reason we got these switched around is marked by * above. A common use of detail APIs is just to implement the public API, and in this case we always pass 0 for stream, so we often ended up putting the detail's stream param last to allow using a default of 0.

Inside detail APIs, we should always be passing stream. And since fixing the present issue would make sure stream is always before mr in the parameter list, we should probably not have a default value for stream parameters.

Good points. Also, in your * case I noticed that an explicit , 0 is frequently used anyway even where it could just be using the implicit default. For example:
https://github.com/rapidsai/cudf/blob/17ad431ac81c8f47d5b249cb8a07d643cf53eb4b/cpp/src/copying/slice.cpp#L65-L70
I suppose that's even more reason to remove the default value for stream. I'll add a note about this in the description.

Side note: this discussion made me realize that with talk of using PTDS, we should probably stop using a "raw" zero for the default stream and instead use a function that conditionally returns cudaStreamDefault or cudaStreamPerThread.

std::unique_ptr<column> cudf::detail::api_A(..., mr, stream= default_stream()) {

https://github.com/thrust/thrust/pull/717/files

@jrhemstad what do you think about replacing cudaStream_t stream with something like this? This would prevent using 0 or nullptr, but it would allow {} as a shorthand for default_stream(), as well as implictly casting to/from cudaStream_t.

class cudf_stream {
  cudaStream_t _ptr = default_stream();

 public:
  cudf_stream() = default; // {} gives default_stream()
  cudf_stream(int) = delete; // prevent cast from 0
  cudf_stream(std::nullptr_t) = delete; // prevent cast from nullptr
  cudf_stream(cudaStream_t ptr): _ptr{ptr} {} // implict cast from cudaStream_t

  operator cudaStream_t() { return _ptr; } // implicit cast to cudaStream_t
};

Edit: here's an example https://wandbox.org/permlink/BGZ5nJMkAhsrCyN2

It could even be worth considering something similar for rmm::mr::device_memory_resource* as that would make it so that rmm::mr::get_default_resource() only needs to be defined in one place and all the defaults could just be mr = {}.

I'm always a fan of strong typing. I think that makes sense in this case.

I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

Great minds!

I just ran into a situation where RMM needs a strongly typed stream object:

  explicit device_uvector(std::size_t size,
                          cudaStream_t stream                 = 0,
                          rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

  device_uvector(std::size_t size,
                 value_type const& v,
                 cudaStream_t stream                 = 0,
                 rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource()

When you try and construct device_uvector(100,0) It's ambiguous which of the two ctors should be selected.

I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

@harrism so are you envisioning an _owning_ stream object? I.e., creates the stream when the object is created, destroyes when the object is destroyed?

So APIs would become:

foo(.., rmm::stream const& s);

The class would have to be moveable, but not copyable (can't have multiple objects trying to destroy the same stream handle), so it should probably just be a wrapper around a unique_ptr<cudaStream_t>.

Yeah, ultimately I was thinking about a stream pool owned by RMM. The problem I'm concerned about is an allocator that maintains a free list of blocks associated with a stream: currently that stream could get destroyed without the allocator knowing, and then that list is orphaned. Normally to "steal" those blocks from the stream you cudaStreamSynchronize(), which is UB if the stream has been destroyed.

Moreover, since cudaStream_t is just a pointer, that pointer could potentially get reused for another stream created without the RMM allocator knowing anything about it. Synchronizing this new stream to steal those blocks is actually unnecessary. So you have this free list of blocks incorrectly associated with this new stream.

If RMM owns stream creation and destruction, we can provide a way to ensure this is handled correctly and safely.

And yes, I think making an owning stream object makes sense. But it might just get/return streams from/to the pool rather than calling cudaStreamCreate(). (That can be future work of course)

Ok, so for now I can disallow construction of the rmm::stream from a cudaStream_t (while still allowing it to be cast the other way around) and just have a default constructor/factory that returns the default stream. Also, it''ll be passed around by reference as Jake suggested. I'll split this off into another PR.

Fixed by #6744

Was this page helpful?
0 / 5 - 0 ratings