It's doable, but I have no immediate plans of implementing it.
Note: You can achieve this result yourself by using a user shader (see: post-shaders).
great
VLC just started supporting 360 videos, any chance of taking their implementation and using it in mpv?
http://people.videolan.org/~jb/Builds/360/
I wrote a shader to do this, but I can't seem to find any way to change the viewing angle after loading the script - is there any way to change variables in a shader after it's been loaded (through a bind or otherwise)?
@tesu There's no current way to do that, unfortunately.
However, since you seem to have written the logic for it, it's possible we could get this merged into vo_opengl itself. What I'm unhappy about is that this seems to be only for one type of 360° - whereas ffmpeg at least defines several types (including cubemapped, equirectangular projection, and equirectangular tiled projection).
That said, we could still merge this to solve the common case, and figure out something else for the cubemap case if such a case becomes common.
Fair enough, there was a specific equirectangular video I wanted to watch, so I didn't consider other formats for 360 video at all when writing my shader. That being said, it appears that equirectangular projections are the traditional and currently most popular form of 360 video, so I think it would be worthwhile to merge.
If the cubemap case becomes common, it would just require a rewrite of the conversion from theta/phi to x/y (but it would be a lot messier).
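For reference, the theta/phi → x/y conversion for the equirectangular case boils down to something like this (a simplified sketch, not the exact code from my shader):

```glsl
// Sketch: map a viewing direction given as polar angles
// (theta = azimuth in [-pi, pi], phi = elevation in [-pi/2, pi/2])
// to equirectangular texture coordinates in [0, 1].
vec2 equirect_coords(float theta, float phi) {
    const float PI = 3.14159265358979;
    float u = theta / (2.0 * PI) + 0.5; // longitude -> horizontal axis
    float v = 0.5 - phi / PI;           // latitude  -> vertical axis
    return vec2(u, v);
}
```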
For cubemaps we'd actually want to upload the video texture as a GL_TEXTURE_CUBE_MAP or something in order to sample “around” edges. Incidentally, we may want to do that for equirectangular projection as well, in order to have the horizontal axis wrap around. Which is... annoying, to say the least. So maybe not for now.
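Just to illustrate what I mean by sampling “around” edges - with a real cubemap texture the hardware picks the right face for us (and, with seamless cubemap filtering enabled, filters across face edges). A sketch, assuming the six faces were already uploaded as a cubemap, which we don't do today:

```glsl
uniform samplerCube cube; // hypothetical: the six faces as a cubemap

// Sample the cube by direction; face selection (and edge filtering,
// if seamless filtering is enabled) is handled by the GL, which is
// exactly what a flat 2D layout can't give us.
vec4 sample_cubemap(float theta, float phi) {
    vec3 dir = vec3(cos(phi) * sin(theta),
                    sin(phi),
                    cos(phi) * cos(theta));
    return texture(cube, dir);
}
```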
Anyway, I'd be happy to merge this logic (probably with some changes to reduce the calculation to a matrix multiplication or something). Do you want to do the rewrite into opengl/video.c (for authorship info), or is it okay if I do that?
Another thing that's been going through my head is how to do “high quality” sampling. I mean the way you've written it works, but I think the best way to do this coordinate adjustment would be during the main scaling pass, since that way we can reconstruct the exact pixel position at the correct location, meaning we get no aliasing from the bilinear sampling. But because equirectangular projection doesn't preserve areas, we would end up sampling more or less of some regions. Maybe we should stretch the filter kernel as well to compensate?
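To illustrate the kernel-stretching idea (purely a sketch, assuming equirectangular input, where the horizontal sample density grows by 1/cos(latitude) towards the poles):

```glsl
// Sketch: per-sample stretch factor for the filter kernel when the
// source is an equirectangular projection. phi is the latitude of the
// sampled point in radians.
vec2 kernel_stretch(float phi) {
    // One output pixel covers roughly 1/cos(phi) times more source
    // texels horizontally near the poles, so widen the kernel there.
    float horiz = 1.0 / max(cos(phi), 1e-3); // avoid blow-up at the poles
    return vec2(horiz, 1.0);                 // vertical density is uniform
}
```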
Sure, go ahead and merge in the logic with whatever optimizations necessary. I'd love to do it myself, but I took a look at the code and couldn't help but feel overwhelmed, so I'd think it would be best if you were to do it.
Regarding the high quality sampling, I definitely agree that the way I've done it is not optimal. Your suggestion for getting exact pixel positions sounds good to me, but since there isn't a 1:1 mapping of pixels, it might be worthwhile to interpolate pixels that don't closely align with any on the source frame? Not sure how big the performance hit for that would be, though.
I haven't worked with filter kernels before, so I have no comment on the matter.
There's also the question of how we want the video to behave in relation to --video-unscaled=yes/no, aspect ratio handling, computation of src/dst rects, and so on.
I mean, in principle, spherical video doesn't really have a concept of “dst rect”, since it covers an arbitrarily large field of view? What about the scale calculations? I guess that for video-unscaled, we want the pixel distance to be equal for both the source and the display, but what pixel distance do we consider here? Center of the screen at the center of the texture? How does that scale translate? Also, this assumption doesn't seem to hold when using EQUIRECTANGULAR_TILE, since that only covers a cut-out of the entire sphere; so does this mean we should take that into account and compute an appropriate bounding box for video-unscaled=no?
Also, what does the concept of an “aspect ratio” even mean for spherical video? Doesn't equirectangular projection always get “squished” onto the entire sphere, regardless of the video dimensions? Should we ignore aspect ratios and just always map it 1:1, or should we assume aspect ratios will apply an additional stretch factor on the output of the image somehow?
What does video-zoom, video-pan etc. do? I guess we could map video-pan to the yaw/pitch in some sense, and video-zoom to the field of view for the projection? What should the initial field of view be?
The thing about 360 degree video is that there are specific aspect ratios for each kind; e.g. 2:1 for equirectangular (360 horizontal fov, 180 vertical fov), 3:2 for cubemaps (the 6 square faces need to be positioned somehow), etc. The problem then becomes that those aspect ratios aren't really good for viewing the video - just because the source video has a 2:1 aspect ratio doesn't mean that the same aspect ratio should be used for watching. Optimally, the user should be able to specify an arbitrary aspect ratio, which when combined with a choice of a fov would basically allow the user to choose any size "viewport."
Regarding --video-unscaled, I think the most reasonable way to approach it would be to consider the center pixel of the viewport. That pixel is the only one that isn't distorted by the filter, while the others have all been stretched out at varying lengths dependent on their distance from the center. However, the size of the rectangle in this case would be meaningless, or at least not draw many parallels to the traditional concept, because it would depend on the size of the chosen viewport rather than anything from the video itself. Which makes sense, I think, because with 360 video you can choose to watch it through any size viewport you want so no rectangle is more valid than another.
video-zoom changing the fov seems pretty intuitive to me, and since video-pan already works in 2 dimensions it would be a good fit for adjusting theta/phi. A sane default for fov of 90 degrees seems good to me, as it's the midpoint between the 2 boundaries and also aligns with some quick research I did.
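Something like this is the mapping I have in mind (purely illustrative; the uniform names are made up, and the exact scale factors are up for debate):

```glsl
// Hypothetical uniforms mirroring the existing options:
uniform float video_zoom; // log2 scale factor, like --video-zoom
uniform vec2  video_pan;  // like --video-pan-x / --video-pan-y

void view_params(out float fov, out float yaw, out float pitch) {
    const float PI = 3.14159265358979;
    fov   = radians(90.0) * exp2(-video_zoom); // zooming in narrows the fov
    yaw   = video_pan.x * PI;                  // horizontal pan -> yaw
    pitch = video_pan.y * PI * 0.5;            // vertical pan -> pitch
}
```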
Heh, we can actually map --video-rotate to roll.
Also, I gave it a bit of a try, but it still seems to “distort” near the edges, and look a bit weird in general. Still seems like I'm looking at the surface of a sphere.
I mean what we're doing is basically raytracing, so shouldn't we figure out how to calculate the right “ray angle” for each pixel, and then intersect that with the sphere / turn it into polar coordinates so we can sample from the texture?
Actually, I think that's pretty much what the shader does, and maybe what it instead needs to do is compensate for barrel distortion, similar to the way a rectilinear lens does for photography?
I mean basically the expected result here is that the result of a 360° video with a certain angle should look identical to the output of a traditional camera pointed in that same angle. So we need to recreate the effects of a traditional camera lens.
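Concretely, if the current shader maps screen position linearly to angles, the difference would be to build the ray the way a pinhole (rectilinear) camera does, roughly like this (a sketch, not actual mpv code; fov is the vertical field of view in radians):

```glsl
// Sketch: per-pixel ray for a rectilinear "camera", then convert the
// ray to polar coordinates for equirectangular sampling.
// ndc is the output pixel position remapped to [-1, 1].
vec2 camera_ray_to_equirect(vec2 ndc, float fov, float aspect) {
    const float PI = 3.14159265358979;
    float t = tan(fov * 0.5);
    vec3 ray = normalize(vec3(ndc.x * t * aspect, ndc.y * t, 1.0));
    float theta = atan(ray.x, ray.z);            // azimuth
    float phi   = asin(clamp(ray.y, -1.0, 1.0)); // elevation
    return vec2(theta / (2.0 * PI) + 0.5, 0.5 - phi / PI);
}
```

The straight-line property falls out of this automatically, since straight lines in the scene stay straight under a perspective (gnomonic) projection.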
Also I was thinking that instead of this complicated raytracing business, we could instead generate a real polygonal model of a 3D sphere and map it to the right texture coordinates using equirectangular projection, then use a “traditional” camera projection matrix to map this onto the plane. (Maybe we could use a geometry shader for this? :p)
At least that could be a starting point; optimizing away the actual geometry could be done as a separate step, but that would rely on figuring out how to “reverse” this calculation.
Hm pretty interesting. Wouldn't be too hard to hook up an option and feed it to a uniform to make it runtime changeable.
Using --video-rotate for yaw is fine, but there would have to be an additional flag for pitch, otherwise the movement would only be possible in one direction instead of two.
Regarding barrel distortion, I'm not sure if that's the proper fix for the issue. I did a little bit of research but it seems like those filters are used between taking the picture and converting it into the 360 degree image, not between conversion of equirectangular to rectilinear projections.
Thinking more about it, maybe it'd be wrong to reuse --video-rotate here. I'd probably rather have a separate option, which would work more like panscan and the other scaler settings. (These are closest to "cheap" updates to how video is rendered on the screen.)
@tesu do you have any interest in attempting to implement more complete support for this, built into the opengl renderer? I could provide the option handling side of it.
@wm4 Sorry, I'm not comfortable enough with C or the existing mpv codebase to implement this - I don't know where to start.
Almost all related video rendering code is in these files:
https://github.com/mpv-player/mpv/blob/master/video/out/opengl/video.c
https://github.com/mpv-player/mpv/blob/master/video/out/opengl/video_shaders.c
But I admit this might be a bit too much.
If you want to try anyway, here are some explanations. A good example might be to trace how pass_sample_unsharp works. It's a rather simple pass that is done completely separate from other filtering. The most important thing is to know how our gl_shader_cache thing works. On every frame, _all_ shaders are dynamically built by string appending. The GLSL macro appends the text within the macro to the current shader. The GLSLF macro does the same thing, but takes a C string literal and works like printf. At the end of each render pass, the shader is compiled and a rectangle is rendered using the shader (to make this fast, the shader is actually cached, instead of being recompiled every time).
The unsharp code uses the hook infrastructure, which automatically takes care of allocating a temporary texture and rendering to it. For some reason, most of the hooks are set up only once (in gl_video_setup_hooks), but they are called every time a frame is rendered.
The hook implementations can access struct gl_video, which has most of the surrounding state and the video parameters. You probably wouldn't go as far as hooking up spherical video parameters or keyboard input. But maybe this provides enough information for you to understand how this could be wired up, and helps us come up with a proper implementation.
@wm4 FWIW, I don't think hooks are very relevant. The “Right Way(tm)” to implement any sort of “coordinate transformation” like this would be as part of the main scaling pass, as a modification of the “vec2 pos”. So it would probably be part of the function pass_sample, as an extra calculation that updates the position after sampler_prelude. (*)
I'd also probably go ahead and make it an explicit part of the function that also calculates the src/dst rects (since for 360° video, you need a different dst rect and probably also a different src rect). Basically I'd extend the panscan etc. functions to also calculate the 3D rotation and output a series of vectors, which would get cached in opengl/video.c the same way the mp_rect src/dst are; and passed to pass_sample by the main scaling code pass_scale_main.
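For the rotation part, the math would be something along these lines (written as GLSL here for brevity, though in practice opengl/video.c would precompute it on the C side and pass it down, the same way the rects are cached; names and the multiplication order are illustrative):

```glsl
// Sketch: combine yaw/pitch/roll into a single rotation matrix that
// the sampling code applies to the per-pixel ray.
mat3 view_rotation(float yaw, float pitch, float roll) {
    mat3 ry = mat3( cos(yaw), 0.0, -sin(yaw),   // rotate about Y (yaw)
                    0.0,      1.0,  0.0,
                    sin(yaw), 0.0,  cos(yaw));
    mat3 rx = mat3(1.0,  0.0,        0.0,       // rotate about X (pitch)
                   0.0,  cos(pitch), sin(pitch),
                   0.0, -sin(pitch), cos(pitch));
    mat3 rz = mat3( cos(roll), sin(roll), 0.0,  // rotate about Z (roll)
                   -sin(roll), cos(roll), 0.0,
                    0.0,       0.0,       1.0);
    return ry * rx * rz;
}
```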
(*) This is actually nontrivial for separated scaling, because as far as I can tell, the 360° coordinate transformation is non-separable. So what do we do for 360° video when using separated samplers? (This is a rather important question because of the opengl-hq default)
I thought for a start we'd just do it the naive way.
I don't think separated scaling can deal with rotation/projection anyway.
So far, all I managed to do was re-implement my user shader in video_shaders.c in https://github.com/tesu/mpv/commit/ba2888ea317edc7376d7313fd1ec652313c9ea46. This naive approach is probably enough to get rudimentary 360° video support, after adding options to incorporate camera angle/fov.
I'm not sure how to extend upon this while still using hooks, though, as the underlying problems discussed earlier (aspect ratios, etc.) can't be solved through this approach.
Here's an example of the problems I mentioned: https://0x0.st/nQj.jpg the video seems to get “warped”, especially near the edges. Watching it in motion makes me feel like I'm drunk.
It certainly doesn't seem to match what I would expect.
Note that this is not an effect of the wide field of view: even this very extreme wide angle photo shot on a real camera exhibits straight lines near the extremes of the field of view:
https://0x0.st/nQf.jpg (pink lines added by me)
@lachs0r contributed this test sample which you can use to verify correctness:

This also matches a similar test file I found on the internet:

This is an equirectangular projection of a cube. All of the lines should appear straight. This is what I get using your shader and a 90° FoV:

And using a 120° FoV:

This makes it quite obvious to me that something is not quite right.
@haasn Curious where you found those? Searching for SMPTE and VR doesn't really turn up much without spending hundreds if not thousands to buy, and while there is no shortage of "VR" content, there is an abundance of _bad_ VR content: poor renders, reencodings, deceptive labeling (3D is not VR), etc.
+1
Oh! My! God! I asked it in 2015!
The answer is apparently no, unless someone suddenly has an interest in implementing it.
See the v360 filter in FFmpeg; it's CPU-only, but it works.