Spot in the code https://github.com/gfx-rs/gfx/blob/416d9ff65cd4559dcda5e640d7baf79e606be4e8/src/backend/gl/src/device.rs#L1153
Reported in https://github.com/gfx-rs/wgpu-rs/pull/10#issuecomment-499659509
cc @kyren @grovesNL
I haven't looked deep into the details, but my first thought was - why not make it do the same thing as Metal?
Backend::Memory is allocated, create a GL buffer backing itBacked::Buffer objects placed at this memory just become logical sub-ranges of this bufferThis way, we can map the memory before (regardless of) anything is bound to it. See the corresponding Metal backend code for the reference.
As soon as I'm ready to work on graphics again (within the next two days), I'm going to take a look at fixing this, since this will break what I'm doing once I rebase on gfx / wgpu / wgpu-rs master.
If somebody else wants to do it before then that's also fine, but otherwise I'm gonna take a stab at fixing this in the near future.
actual Backed::Buffer objects placed at this memory just become logical sub-ranges of this buffer
(As discussed briefly on Gitter)...
This might work if the platform allows buffer role changes, but it definitely can't work if the platform doesn't allow role changes.
This wouldn't be so bad, except that currently webgl at least ought to have the buffer_role_change capability set to false, because webgl does not allow changing the target types of element array buffers (see issue #2811). Since you can't mix element array buffer types with other types in webgl at all, this proposed technique precludes wasm32 entirely.
I don't know what the best solution here is. The only real idea I have is to maybe have !buffer_role_change imply emulate_map, and make emulate_map work such that mapping and unmapping memory can happen independently of binding a buffer to that memory. If that sounds sensible I can see how feasible that is to do? Does anybody else have any simpler ideas, or have a sense of whether this is even likely to work?
Wait, why don't we just bake this limitation into the physical device geometry that we expose? Normally, Vulkan requires there to be a way to create a buffer that can be used for anything. We already have a few limitations in GL backend, and it's not even an official target of Vulkan Portability Initiative, so let's consider the following:
Have one memory type dedicated to storing "element array buffers". When the user creates a buffer and then asks for memory requirements:
INDEX plus any transfers - we return only this dedicated memory type to be supported in the maskINDEX - we return the regular memory type as supportedINDEX is mixed with other types that require role changes - we return zero supported memory typesThat last bit would be a restriction, a deviation from Vulkan, but I think it's in a very sweet spot here for being super easy to implement for us, and at the same time fitting a lot of existing applications.
if INDEX is mixed with other types that require role changes - we return zero supported memory types
Yeah okay, that makes a lot of sense!
I don't know what other platforms' "buffer role change" restrictions really look like, I only really know anything about webgl from having run into it just recently. Are other platforms that restrict buffer role changes just limited to index buffers or is it more general than that? Would not having buffer role change capability mean returning strictly a different memory type for every buffer type? If we're mostly concerned about wasm32, does the fact that wasm32 definitely does require mapping emulation change things a bit? (Edit: never mind, the way we would probably want to do it on wasm32 would still be to create a giant gl buffer on memory allocation, the mapping emulation would be unrelated)
I don't even have a good way to test a platform with opengl and no buffer role changes other than webgl, so I really don't know what they're like.
I might need a bit more mentoring on fixing this than I thought.
I now have a gl Buffer type similar to metal's, with a RawBuffer field as well as a Range<u64> sub-range of the RawBuffer.
What should I do for things like this in gl/src/command.rs? I can either change every n::RawBuffer type in Command to be a RawBuffer paired with a range (or just offset?), or I can add extra data to Cache to keep track of the buffer sub-ranges for every RawBuffer command (similar to other data like index_type). Also, it seems only natural that this change would remove the requirement that provided buffer offsets be 0 (such as here), since we will now have to use a sub-range of the actual opengl buffer no matter what.
Am I thinking about this correctly? Sorry for needing a bit of hand holding here.
This is by far non-trivial, now that I see the problems you are facing a bit clearer...
You'd need to change every appearance of a buffer in Command enum to be accompanied by a range (we need a full range to call gl.bind_buffer_range for uniform buffers).
Also, it seems only natural that this change would remove the requirement that provided buffer offsets be 0 (such as here), since we will now have to use a sub-range of the actual opengl buffer no matter what.
The requirement wasn't because we didn't know the offset. It was there because we didn't know what to do with it :) Unfortunately, bind_buffer_range isn't an option for vertex/index buffers. However, not all is lost: there is way out by taking the offset and adjusting other values to account for it.
For example, index buffers. When there is an offset X, we can just remember it in the state. When the user tries to render indices A..B of type u16, we can shift them by X/2, effectively applying the offset. We can easily guarantee that these offsets are divisible by 4 by returning the alignment from get_buffer_requirements. There is a bit of complication with indirect draws though... for now we should just assert_eq!(offset, 0) on indirect indexed draw calls.
For vertex buffers, they are bound per attribute (vertex_attrib_pointer()), and an offset is already required for each call. We just need to pass in "attribute_offset + buffer_offset" in there.
For example, index buffers. When there is an offset X, we can just remember it in the state. When the user tries to render indices A..B of type u16, we can shift them by X/2, effectively applying the offset. We can easily guarantee that these offsets are divisible by 4 by returning the alignment from get_buffer_requirements. There is a bit of complication with indirect draws though... for now we should just assert_eq!(offset, 0) on indirect indexed draw calls.
For vertex buffers, they are bound per attribute (vertex_attrib_pointer()), and an offset is already required for each call. We just need to pass in "attribute_offset + buffer_offset" in there.
Okay, that's actually great to hear because that's already where I was headed. I hadn't thought about the alignment requirements yet, so that's a good idea to return that as part of get_buffer_requirements.
Mostly my questions were how specifically to pass the data through. Just repeating what you said to make sure I understand it: it seems like every n::RawBuffer in Command should be paired with a range like you said, except BindIndexBuffer (the range will be set along with the index type in index_type field, maybe with a rename), and BindAttribute, where the offset will be combined with the offset inside n::AttributeDesc. Does that sound right? (Edit: I guess the range also has to be stored with vertex_buffers in Cache before the range gets combined into the attributes)
The index / vertex parts actually made the most sense to me, but for everything else... passing the ranges of everything else through new fields in Cache felt wrong, so I decided to make sure. I also hadn't yet figured out what to do about indirect drawing, so I'm happy with just asserting that the offset is 0 for now.
Thank you!
Edit 2: Okay, actually having a separate range for each command probably isn't required either.. maybe, eh I'll figure it out eventually.
it seems like every n::RawBuffer in Command should be paired with a range like you said, except BindIndexBuffer
BindIndexBuffer also needs to carry the range into the native GL buffer. When the queue executes it, it needs to remember the range in its state, to apply on the next DrawIndexed commands, in addition to the index_buffer_offset specified there.
With BindAttribute it's an implementor (yours!) choice whether to keep the AttributeDesc as is and provide a range together with the buffer, or to apply the range into AttributeDesc at the time the command is created.
Similar choice is yours on other commands where the buffer offset can be applied early, like CopyBufferToBuffer that has offsets in BufferCopy structure.
BindIndexBuffer also needs to carry the range into the native GL buffer. When the queue executes it, it needs to remember the range in its state, to apply on the next DrawIndexed commands, in addition to the index_buffer_offset specified there.
I wasn't being super clear there, I kept the range state next to index_type in Cache so I could apply it on the next DrawIndexed command, it's possible I did that wrong though and it should go in the queue state.
I have a draft PR up though so we can discuss it, but it doesn't fix the actual wgpu-rs bug yet.