TensorFlow: TensorFlow Lite GPU delegate inference using OpenGL and SSBO on Android

Created on 3 Mar 2019 · 102 comments · Source: tensorflow/tensorflow

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes, modified inference code from tflite gpu delegate android sample with additional code from https://www.tensorflow.org/lite/performance/gpu_advanced#android_2.
  • OS Platform and Distribution : Android 8.0.0
  • Mobile device: OnePlus 3
  • TensorFlow version: 12.0

Describe the current behavior
The TensorFlow Lite GPU delegate documentation provides sample code for running TFLite inference efficiently on Android, avoiding CPU-GPU memory copies with the help of OpenGL and SSBO in an EGL context. However, this method does not give any performance gain; rather, it degraded inference speed. The documentation mentions a method, 'interpreter.runInference(null, outputArray)', for running inference in this case. Is this method the same as the basic run method, i.e. 'interpreter.run(inputTensor, outputTensor)'? (There seems to be no method called 'runInference' in the current API.) Is the suggested method currently supported in the experimental GPU delegate API, i.e. feeding the input image to inference directly from an OpenGL SSBO? How can we ensure that the model takes its input from this SSBO in GPU memory?

Expected behavior
Tflite inference using an OpenGL SSBO should be faster than the basic GPU delegate inference, where data is copied every time from CPU to GPU.

Other info / logs
We measured the time for the 'tflite.run' method in Android Studio. The input was in the recommended ByteBuffer format.

Error: Cannot resolve method runInference(null, ?)

lite bug

Most helpful comment

Not officially announced yet, but FYI: GPU code is now visible at:

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu

if you need the code for better insight what is happening.

All 102 comments

@anilsathyan7

Thanks for trying out the GPU delegate.

Can you provide a little bit more context in terms of timing, i.e. how many milliseconds/seconds was it before and after?

What kind of network are you using? Specifically, are all ops supported?

Have you written custom shader code to copy the camera texture into an SSBO, or are you just dumping CPU memory into the SSBO yourself? If it's the former, you're doing things right and it should get faster. If it's the latter, it's only going to get slower.

Model: similar to the official TFLite segmentation model (model inference graph attached as an image). The last three additional nodes are not supported by the GPU delegate, it seems. The input image size is 129x129.

Phone: OnePlus 3, GPU: Adreno 530

Timings:-
CPU Inference: 60-70 ms
GPU Inference: 40-50 ms
GPU Inference (SSBO): 80-90 ms

i.e. the time for executing the 'interpreter.run()' method.

Here is the method that we used to copy camera texture into SSBO:-

//Initialise SSBO
public int[] initializeShaderBuffer(){
    android.opengl.EGLContext eglContext = eglGetCurrentContext();
    int[] id = new int[1];
    GLES31.glGenBuffers(id.length, id, 0);
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, id[0]);
    GLES31.glBufferData(GLES31.GL_SHADER_STORAGE_BUFFER, mWidth * mHeight, null, GLES31.GL_STREAM_COPY);
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, 0);// unbind
    return id;
}
int inputSsboId = initializeShaderBuffer()[0];

// After that, every time a frame is available, or in onDrawFrame(), call:
fillSsboWithCameraImageTexture(inputSsboId, data);

// (Note: data is nothing but the camera frame ByteBuffer)

// Fill SSBO with the camera image texture
private int fillSsboWithCameraImageTexture(int inputSsboId, ByteBuffer cameraFrame) {
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, inputSsboId); // bind before uploading
    GLES31.glBufferData(GLES31.GL_SHADER_STORAGE_BUFFER, mWidth * mHeight, cameraFrame, GLES31.GL_STREAM_COPY);
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, 0); // unbind
    return inputSsboId;
}

(model inference graph image: 129_80k_dm05)

Can the same 'Interpreter.run()' method handle both normal CPU input and SSBO input? Or is there another option/function for running inference in this case?

@anilsathyan7

Apologies for the delayed response. For some reason, I just got this in my inbox >_<

Quick question re: your code:

Doesn't it have to be

GLES31.glBufferData(GLES31.GL_SHADER_STORAGE_BUFFER, 3 * mWidth * mHeight, null, GLES31.GL_STREAM_COPY);

?

Also, do you have the luxury to make the input SSBO of shape 1x129x129x4 ? Then you could eliminate one hidden memcpy inside.
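Incidentally, the byte count can be sanity-checked with a small helper (an editor's sketch, not from the thread; `floatBufferBytes` is a hypothetical name, and a tightly packed float32 buffer is assumed, hence the extra factor of 4 bytes per float that the later 257x257 snippet in this thread also includes):

```java
// Hypothetical helper: bytes needed for a tightly packed float32 H x W x C buffer.
public class SsboSize {
    public static int floatBufferBytes(int height, int width, int channels) {
        return height * width * channels * 4; // 4 bytes per float32
    }

    public static void main(String[] args) {
        // For the 129x129x3 input discussed above:
        System.out.println(floatBufferBytes(129, 129, 3)); // 199692 bytes
    }
}
```

Passing this value to glBufferData instead of `mWidth * mHeight` avoids under-allocating the SSBO.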

From the graph you shared (btw, nice visualization; appreciated), it indeed looks like everything would be handled until the last ResizeBilinear. Its shape (129x129x2) is also not too bad in terms of channel count etc., so I wouldn't expect any slowdown.

Did you properly call BindGlBufferToTensor before ModifyGraphWithDelegate? Can you share the shader code that converts your texture to SSBO? I was doing something like:

   #version 310 es
   layout(local_size_x = 16, local_size_y = 16) in;
   layout(binding = 0) uniform sampler2D input_texture;
   layout(std430) buffer;
   layout(binding = 1) buffer Output { float elements[]; } output_data;
   void main() {
     ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
     if (gid.x >= 224 || gid.y >= 224) return;
     vec3 pixel = texelFetch(input_texture, gid, 0).xyz;
     int linear_index = 3 * (gid.y * 224 + gid.x);
     output_data.elements[linear_index + 0] = pixel.x;
     output_data.elements[linear_index + 1] = pixel.y;
     output_data.elements[linear_index + 2] = pixel.z;
   }

for MobileNet. Might not be directly applicable, but you roughly get the idea...
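The addressing the shader performs can be reproduced in plain Java to sanity-check the buffer layout (an editor's sketch; `linearIndex` is a hypothetical name):

```java
// Float offset of channel c at pixel (x, y) in a tightly packed H x W x C (NHWC) buffer.
public class NhwcIndex {
    public static int linearIndex(int x, int y, int width, int channels, int c) {
        return channels * (y * width + x) + c; // same math as the shader's linear_index
    }

    public static void main(String[] args) {
        // Last float of a 224x224x3 buffer (224*224*3 = 150528 floats total):
        System.out.println(linearIndex(223, 223, 224, 3, 2)); // 150527
    }
}
```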

Not officially announced yet, but FYI: GPU code is now visible at:

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu

if you need the code for better insight what is happening.

Hi @impjdi ,
Can you share a sample classification app using SSBO, or at least the OpenGL-related code?
We used the following shader code based on your inputs, but we encountered some errors related to the shader version, which we could not resolve, being OpenGL beginners.

#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D u_Texture0;
layout(std430) buffer;
layout(binding = 1) buffer Output { float elements[]; } output_data;
void main() {
    ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
    if (gid.x >= 257 || gid.y >= 257) return;
    vec3 pixel = texelFetch(u_Texture0, gid, 0).xyz;
    int linear_index = 3 * (gid.y * 257 + gid.x);
    output_data.elements[linear_index + 0] = pixel.x;
    output_data.elements[linear_index + 1] = pixel.y;
    output_data.elements[linear_index + 2] = pixel.z;
}
mTextureUniformHandle0 = GLES31.glGetUniformLocation(mProgramHandle, "u_Texture0");

// Set the active texture unit to texture unit 0.
GLES31.glActiveTexture(GLES31.GL_TEXTURE0);

// Bind the texture to this unit.
GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, mTextureDataHandle0);

// Tell the texture uniform sampler to use this texture in the shader by
// binding to texture unit 0.
GLES31.glUniform1i(mTextureUniformHandle0, 0);

public int[] initializeShaderBuffer(){
    android.opengl.EGLContext eglContext = eglGetCurrentContext();
    int[] id = new int[1];
    GLES31.glGenBuffers(id.length, id, 0);
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, id[0]);
    GLES31.glBufferData(GLES31.GL_SHADER_STORAGE_BUFFER, 257*257*3*4, null, GLES31.GL_STREAM_COPY);
    GLES31.glBindBufferBase(GLES31.GL_SHADER_STORAGE_BUFFER, 1, id[0]);
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, 0); // unbind
    return id;
}

@anilsathyan7

I am out of office on vacation this week with limited network access and there's a good chance I'll forget about this. Could you please nudge me again next week?

Sure porygon ... 😉

Hi @impjdi ,
Can you help us with the SSBO tflite inference issue? We could not run tflite inference using SSBO on Android. Can you share a sample classification app using SSBO, or at least the OpenGL-related code? How much speedup can we expect in this scenario?

Hi @impjdi ,
I'll second a request for a demo illustrating SSBO inference.

Maybe I should open a separate issue... We're attempting to use a GLSurfaceView in our app alongside the tflite GPUDelegate. Our renderer works fine until interpreter.modifyGraphWithDelegate(delegate); is called, which results in a black screen. No glErrors are produced. It's difficult to understand how commenting/uncommenting the above line changes the behaviour, even after looking at the newly released GPU delegate source.

A working example might clear things up...

Thank you!

@anilsathyan7

Heh, I missed the porygon part earlier :)

The below is in C++, but should be similar in Java too.

    glActiveTexture(GL_TEXTURE0 + 0);
    glBindTexture(GL_TEXTURE_2D, /*your gl texture that has the image*/);
    glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, /*your ssbo*/, 0, /*size in bytes*/);
    glUseProgram(/*the program above*/);
    glDispatchCompute(width / 16, height / 16, 1);  // these are work group sizes
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);  // unbind
    glBindTexture(GL_TEXTURE_2D, 0);  // unbind
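One caveat worth flagging: `width / 16` in the dispatch above truncates, so if a dimension is not a multiple of the workgroup size (e.g. the 257x257 input elsewhere in this thread), the last partial row/column of pixels is never processed. A round-up version (an editor's sketch):

```java
// Workgroup count that covers `size` invocations with workgroups of `local` each.
public class Dispatch {
    public static int groups(int size, int local) {
        return (size + local - 1) / local; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(groups(224, 16)); // 14 (exact fit)
        System.out.println(groups(257, 16)); // 17; plain 257 / 16 gives 16 and misses the edge pixels
    }
}
```

The shader's `if (gid.x >= W || gid.y >= H) return;` guard then discards the overhanging invocations.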

@ktgordon

Hm, the only official example code is the TFLite demo app that is in the TF repository. As an Android app consists of a lot more than just a single Java file, that'd be difficult unless I start up a whole new git repo with the files. Unfortunately, on top of that, I'm not a real mobile app developer; I do most of my stuff in Android C++ without cameras. I'll see whether I can cook up a C++ binary that can do all this in a single C++ file =/ That discussion aside...

modifyGraphWithDelegate hanging sounds like you have an issue somewhere else. Make sure that your TfLiteGpuDelegateBindBufferToTensor is called before modifyGraphWithDelegate, and that your SSBO is already created. The flow of the program with modifyGraphWithDelegate is as follows:

Interpreter.modifyGraphWithDelegate (Java)
Interpreter::ModifyGraphWithDelegate (C++)
tflite::gpu::gl::(anonymous)::DelegatePrepare (C++)
tflite::gpu::gl::(anonymous)::Delegate::Prepare (C++)

You can probably trace back what is causing the hanging.

@anilsathyan7

Did things work out? Can this issue be closed?

The code is working fine, but we are not able to get correct output using the SSBO as input. The output seems to be black (i.e. all zeroes). We are not able to verify whether data is correctly copied into the SSBO or whether it is correctly accessed by TensorFlow, even though it runs without errors. It seems there is no way to debug and inspect shader code (GLSL) on Android.

Attached is the log file containing the errors we got when trying to use SSBO with the tflite model.
The code runs on phones with Adreno GPUs without any errors, but no output is visualized. On phones with Mali GPUs, there are issues even before the model comes into the picture.

The errors vary between Mali devices, whereas on Adreno devices the output is simply not visualized.
The devices used in the testing below are:

Mali (Error logs are attached with the issue: mali-gpu-ssbo-errorlog.txt)
_Samsung A8+_
_Honor Play_
_Moto C plus_

Adreno (Error Logs are attached: adreno-gpu-ssbo-errorlog.txt)
_Poco F1_

mali-gpu-ssbo-errorlog.txt

adreno-gpu-ssbo-errorlog.txt

@impjdi Could you have a look at it? It would be better if you could share working app code for reference.

@impjdi Any updates on SSBO?

Hi @impjdi ,
I'll second a request for a demo illustrating SSBO inference.

Maybe I should open a separate issue... We're attempting to use a GLSurfaceView in our app, alongside the tflite GPUDelegate. Our renderer works fine until interpreter.modifyGraphWithDelegate(delegate); is called, which results in a black screen. No glErrors are produced. Its difficult to understand how commenting/uncommenting the above line changes the behaviour, even after looking at the newly released GPU delegates source.

A working example might clear things up...

Thank you!

@ktgordon Have you found a resolution/workaround for this issue? I am experiencing exactly the same problem. After calling modifyGraphWithDelegate(), all glDraw calls result in black. We don't even need to associate an SSBO buffer with the TFLite tensors. This is strange. Taking a deeper look as well.

We did find a workaround. I'm assuming you're using the Java API and bringing in gpu delegates via
implementation 'org.tensorflow:tensorflow-lite:0.0.1-gpu-experimental'

What I think is happening is that modifyGraphWithDelegate() modifies the current context so that our display surface is no longer current, which wouldn't be a problem if we had access to our original state variables. However, since we originally used GLSurfaceView, we didn't have access to any of these variables. In effect, modifyGraphWithDelegate made changes to the GL state that we couldn't recover from.

Switching from GLSurfaceView to TextureView gave us more control at the cost of more complexity. We created a dummy context, initialized our interpreter and called modifyGraphWithDelegate(), then created a new shared context with the dummy context. This way we could make our display surface current and render to it.

Managing the egl context was handled by reusing code from Grafika.

This got us past the black screen problem, anyway...

I am doing exactly what you said here as I based on TFLite demo (which uses TextureView). Mainly the following:

  1. Create gl context, set gl viewport, etc. Stores eglDisplay, eglSurface, eglContext.
  2. Make call to modifyGraphWithDelegate().
  3. Set the eglContext, eglSurface, eglDisplay as current using eglMakeCurrent

The draws using glDrawArrays result in black. Interestingly, if steps 1 and 2 are swapped in sequence, everything works.

The Grafika code was also referenced as well.

Will try to setup a dummy context next...

Hi @ktgordon , @gnsmrky ,
Are you suggesting that the SSBO method would not work with a normal GLSurfaceView? What about something like GLTextureView (link1, link2)?

Finally, are you able to achieve any speedup compared to normal GPU inference? If so, can you share a basic working demo app? Just to clear things up ...

@ktgordon Just got it working! Indeed, the dummy shared context is the key to make it work. I guess the GLES context setting/switching can be a lot more complicated than one can imagine...

@anilsathyan7 I based my work on the TFLite demo, which is the main sample project the TFLite GPU delegate page provides. This sample project uses TextureView. I don't know if SSBO works with other surface types, but I would imagine it should, as eglCreateWindowSurface() takes a SurfaceView, SurfaceTexture, SurfaceHolder or a Surface, according to the Android eglSurface doc. GLTextureView from your link extends SurfaceTexture, so it should work as well.

The performance gain is significant. I was trying a 448x448 image. (Trying a larger image to amplify the copy time). The time it takes w/o SSBO/Image2D copy shader is around 900ms on a Snapdragon 808. Using copy shader the time comes down to < 20ms!

@gnsmrky Could you share your repo, so that it could be a better thing for everyone to start exploring ssbo with that.

@gnsmrky Could you share your repo, so that it could be a better thing for everyone to start exploring ssbo with that.

@SanthoshRajendiran Trying to find the time to do that. The code is very messy now and unreadable. Will get it cleaned up as soon as I get spare cycles.

@gnsmrky @impjdi Any updates on the repo? Can you provide some code fragments showing where the changes need to be incorporated in the mobile app?

@gnsmrky thank you so much for your efforts. let us know when you are adding sample code here.

@SanthoshRajendiran @soham24 Plan to publish the repo over the weekend. Still doing some tweaks. :)

@SanthoshRajendiran

I bubbled up this request in last couple of meetings. The example will be added to the TFLite demo app, but I have some deadline coming up, so it will be a couple months until I can get to it :(

@SanthoshRajendiran @soham24 @impjdi
I just put up the repo at tensorflow-lite-ssbo Android classifier demo. Just open the project in Android Studio to build and run. Once in the app on the phone, select GPU and mobilenet v1 float to see the time it takes to copy a frame to SSBO.

The code is still very rough. But should serve the purpose to get started playing around SSBO in TFLite GPU delegate.

On my LG G4 (Android M, Snapdragon 808), the time it takes to copy a 224x224 pixel buffer is reduced significantly: from 180-200 ms (Java ByteBuffer putFloat() copy) down to 1 ms (shader + SSBO). As the LG G4 is a relatively old phone (> 5 years now), the savings on a more recent phone may not be as significant. But really, if a G4 can do a frame copy in < 1 ms, surely any other Android phone can do better. :)
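For reference, the slow ByteBuffer putFloat() path mentioned above is the usual per-pixel Java loop that unpacks ARGB ints and writes floats one at a time (a generic sketch of that pattern, not the repo's exact code; `toFloatBuffer` and the 0-1 normalization are illustrative assumptions):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class CpuCopy {
    // Unpack ARGB_8888 pixels into a float32 RGB ByteBuffer, one putFloat at a time.
    public static ByteBuffer toFloatBuffer(int[] pixels, int width, int height) {
        ByteBuffer buf = ByteBuffer.allocateDirect(width * height * 3 * 4)
                                   .order(ByteOrder.nativeOrder());
        for (int p : pixels) {
            buf.putFloat(((p >> 16) & 0xFF) / 255.0f); // R
            buf.putFloat(((p >> 8) & 0xFF) / 255.0f);  // G
            buf.putFloat((p & 0xFF) / 255.0f);         // B
        }
        buf.rewind();
        return buf;
    }

    public static void main(String[] args) {
        int[] pixels = {0xFFFF0000}; // one opaque red pixel
        ByteBuffer buf = toFloatBuffer(pixels, 1, 1);
        System.out.println(buf.getFloat()); // 1.0 (red channel)
    }
}
```

It is this per-pixel loop, running on the CPU for every frame, that the texture-to-SSBO compute shader replaces.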

Basically what it does is the following:

  1. Initialize GLES Context A (eglContext).
  2. Create a surface texture for camera.
  3. Create SSBO
  4. Create compute shader needed to copy surface texture to SSBO.
  5. Initialize GLES Context B (gpuContext) with Context A being shared.
  6. Call modifyGraphWithDelegate()
  7. Do proper context switching using eglMakeCurrent()

    • Switch to Context A when camera -> surface texture -> SSBO.

    • Switch to Context B when calling TFLite Interpreter.run().

Note: I didn't create a separate thread to simplify the process. Usually Context A & B should be in 2 separate threads, so eglMakeCurrent() is called only once in a thread.

Haven't got the time to put up a readme. Just take a look into the commit. Should be fairly straightforward to figure out what's in there. Hope this helps to clarify a few things about TFLite + GPU delegate + SSBO.

Let me know if it works out for you guys...

@gnsmrky Congrats and thanks for the amazing work on SSBO. We tried out the application in some of our mobile phones. The working methodology on various phones is discussed below:

1) OnePlus 3 - Model running time is around 40 ms, the same as without SSBO. Copy time is around 0-1 ms in all cases.
2) Poco F1 - Model running time is around 25 ms, but we are not able to get the actual output from the app.
3) Samsung A8+, Honor Play - The apps crash with a linkage error about exceeding the maximum number of work group invocations. We reduced the work group size to 8 and obtained a model running time of 5 ms, but we were not able to get proper output from the model.

@gnsmrky thank you so much. I will let you know about working after implementation.

@gnsmrky Great work. thank you so much! Did you try deeplab segmentation model?

@gnsmrky Congrats and thanks for the amazing work on SSBO. We tried out the application in some of our mobile phones. The working methodology on various phones is discussed below:

  1. Oneplus 3 - Model running time is around 40ms, the same as without SSBO. Copy time is around 0 or 1 in all cases

So it is only working, with proper output, on the OnePlus 3 among these phones? Let me see if I can get hold of a Snapdragon 845 phone.

@gnsmrky Great work. thank you so much! Did you try deeplab segmentation model?

@junhwanjang I haven't tried deeplab yet. But I did try the output SSBO with other models, which works correctly as well. Does deeplab fully work with the GPU delegate yet, do you know?

@SanthoshRajendiran I just updated the repo. It seems the compute shader needs a real on-display surface on some devices. I added a 1dp x 1dp view to associate it with the GLES surface. Can you give the updated repo another try on your phones?

Here is the latest commit.

BTW, the Cam -> SSBO copy does not take the transformation from updateTexImage() into account. You may need to rotate your phone counter-clockwise (i.e. bottom of the phone pointing to the right) to get correct inference results.

@gnsmrky Thanks for the update. With the POCO F1 (Adreno 630, Snapdragon 845), the output now comes in at around 20-30 ms, and copy time is around 0-1 ms.
The problem still persists on Mali GPU devices (tested on the Honor Play).
Attached below is the error log with Honor Play:

mali-ssbo-android-errorlog.txt

@SanthoshRajendiran Have you tried setting the work group size to 8, or even 4, for Mali devices? Here are the 2 lines where you should change 16 to 8 or 4:
local_size in the compute shader @ L1092
glDispatchCompute @ L1162
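For context on why smaller local sizes help on Mali: the driver caps the product of the local sizes at GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS (query it with glGetIntegerv at runtime; the limit varies by GPU). A tiny helper for the arithmetic (an editor's sketch):

```java
// Total invocations per workgroup; this product must stay within the
// driver-reported GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS limit.
public class WorkgroupCheck {
    public static int invocations(int localX, int localY) {
        return localX * localY;
    }

    public static void main(String[] args) {
        System.out.println(invocations(16, 16)); // 256 - rejected by some Mali drivers
        System.out.println(invocations(8, 8));   // 64
    }
}
```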

@gnsmrky We tried setting the work group size to both 4 and 8 on the Samsung A8+; the model does not run properly even in landscape mode. When we change the work group size, the app crashes after some time with a GL out-of-memory error.

E/AndroidRuntime: FATAL EXCEPTION: CameraBackground
Process: android.example.com.tflitecamerademo, PID: 23378
java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: Next operations are not supported by GPU delegate:
SQUEEZE: Operation is not supported.
First 29 operations will run on the GPU, and the remaining 2 on the CPU.TfLiteGpuDelegate Invoke: [GL_OUT_OF_MEMORY]: There is not enough memory left to execute the command.Node number 31 (TfLiteGpuDelegate) failed to invoke.

    at org.tensorflow.lite.NativeInterpreterWrapper.run(Native Method)
    at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:149)
    at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:275)
    at org.tensorflow.lite.Interpreter.run(Interpreter.java:249)
    at com.example.android.tflitecamerademo.ImageClassifierFloatMobileNet.runInference(ImageClassifierFloatMobileNet.java:101)
    at com.example.android.tflitecamerademo.ImageClassifier.classifyFrameSSBO(ImageClassifier.java:167)
    at com.example.android.tflitecamerademo.Camera2BasicFragment.classifyFrameSSBO(Camera2BasicFragment.java:967)
    at com.example.android.tflitecamerademo.Camera2BasicFragment.access$1200(Camera2BasicFragment.java:91)
    at com.example.android.tflitecamerademo.Camera2BasicFragment$8.run(Camera2BasicFragment.java:785)
    at android.os.Handler.handleCallback(Handler.java:873)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loop(Looper.java:214)
    at android.os.HandlerThread.run(HandlerThread.java:65)

I/Process: Sending signal. PID: 23378 SIG: 9

@gnsmrky By default, for models like deeplab (models not fully runnable on the GPU), the GPU delegate falls back from GPU to CPU. Does this behavior change with SSBO, and how do we get the data if it falls back to the CPU?

@gnsmrky Thanks. This worked like a charm on low-end devices.
One question: it's true that we have to rotate the phone counter-clockwise. Can I add the rotation logic in the shader?
ref link: https://stackoverflow.com/questions/28074977/rotating-a-texture-on-a-fragment-shader-in-glsl-es

@gnsmrky Could you give insight on what changes have to be done in your application code if I want to get an image output from the tflite model, with respect to SSBO.

@gnsmrky We tried setting work groups as both 4 and 8 in Samsung A8+, the model is not running properly even when we tried it in landscape mode. When we change the work groups, the app crashes within some time due to GL out of memory error.

@SanthoshRajendiran I just updated the repo with a few tweaks. It should lower the memory requirement a bit.

  1. Use FP16 precision.
  2. Use 8 as workgroup size.
  3. Add a check for SSBO buffer size upon creation.

The SQUEEZE error you are seeing may be due to a failure when creating the SSBO buffer. Are you running the repo as-is?

Let me know if the updated repo works out for you.

@gnsmrky By default in models like deeplab (models not fully capable of running in GPU), there is a fallback happening in GPU Delegate from GPU to CPU. Does this behavior change in SSBO and how do we get the data if it is falling back to CPU?

@SanthoshRajendiran The SSBO in the repo is only for the input buffer. Nothing is changed for the output buffer, so the code for getting the output data should be the same as with the CPU (i.e. ByteBuffer).

I haven't got my hands on deeplab yet. Do you know which op is causing the CPU fallback?

@gnsmrky thanks. this worked like charm on low-end devices.
One question, It's true that we have to rotate the phone counter-clockwise. Can I add the rotating logic in the shader.
ref link: https://stackoverflow.com/questions/28074977/rotating-a-texture-on-a-fragment-shader-in-glsl-es

@soham24 The transformation happens when you use the regular glViewport, glDraw, etc. with corresponding vertex/fragment shaders. The SSBO code in the repo is a simple float memory copy and does not involve any vertex/fragment shader. If we did the transformation on a per-float basis, it would most likely slow things down.

The best way to do it is to "draw" the camera texture to another texture, with the desired transformation, and then do texture -> SSBO copy. That would take some efforts. Will need to find more time to do that.

@gnsmrky Could you give insight on what changes have to be done in your application code if I want to get an image output from the tflite model, with respect to SSBO.

@SanthoshRajendiran What do you want to do with the image output? Creating an SSBO and binding it to the TFLite GPU delegate is as easy as creating one and calling bindGlBufferToTensor() on the output tensor getOutputTensor(), as described in the GPU delegate document.

Thanks @gnsmrky . It will be great if you update the sample with desired transformation.

@gnsmrky We figured out the issue with the Squeeze operation not being supported: by default, the Squeeze operation does not run on the GPU on Mali devices (verified with the benchmark tool). We will open a separate issue for that, or, since @impjdi is linked in this thread, he can handle it. Other than that, the repo works as-is. In our case, we are handling a fully GPU-supported model and getting an image output to be rendered onto the surface, so we are going ahead with SSBO output too.

@SanthoshRajendiran I have a doubt. Are you resizing the input texture before passing it to tflite?
The output will be resized. How will you render it directly via texture?

@soham24 We resize the input to the model to make sure the model runs. We resize the output of the model to the desired size we need to render.

@soham24 Input to the model we are resizing in order to make sure the model is running.. The output of the model we will resize it to the desired size that we will need to render.

@SanthoshRajendiran It sounds odd to me as well. What I was trying to say is that if the SSBO size is not correct, the GPU delegate will report that SQUEEZE has a problem, even though that is not the case.

Thanks @gnsmrky . It will be great if you update the sample with desired transformation.

Work in progress, albeit very slowly...

The current version of the app is developed with an EGL surface. We tried using GLSurfaceView, but it is not working. Is there any workaround to render the SSBO output directly on a GLSurfaceView?

@gnsmrky We tried figuring out output SSBO, but we are unable to get it working correctly. Could you tell us exactly where we need to make changes to get it working?

Basically, we made these modifications:
1) Initialized the tflite instance with setAllowBufferHandleOutput(true), as per the tflite GPU documentation.
2) Bound the model output to an SSBO using gpuDelegate.bindGlBufferToTensor(outputTensor, outputSsboId);
3) Rendered the output on the mobile screen.

Could you check whether SSBO output works in your case, or whether some change like rotating the screen, as before, is needed now to visualize the output on screen?

Attached is the tflite model we used for testing output SSBO; it does nothing but resize an image from 197 to 257 using a ResizeBilinear operation.

just_resize_ssbo.tflite.zip

Could you check if SSBO output is working in your case.. Or some changes that were done previously like rotating the screen or something is needed now too to visualize the output on screen..

@SanthoshRajendiran I did not do anything for output SSBO in the repo I posted here, but output SSBO does work. So it may be something in your shader code that moves data from the SSBO to the texture buffer for drawing on screen.

What I would suggest is to try an op that does not change any shapes, e.g. the sqrt op, a unary op that does not change the tensor shape. Fill in predictable values, say 100; the result should be 10. That was how I worked on both input/output SSBO at the beginning.

Most problems I ran into were not in the TFLite GPU delegate part of the code, but in OpenGL ES on Android. I just needed to dissect the code piece by piece to get it working correctly from SSBO to screen.

Hope this helps...

BTW, try not to use ResizeBilinear with non-integral scaling at first. Try something like 2 as the scaling factor, so 157 would be resized to 314. It may help...

Hello, @gnsmrky:
I tested your code on two different devices, and it seems to use the GPU only randomly. Most of the time in GPU mode it does nothing. I tried smaller work groups (8 or 4), but it doesn't make any difference...

Do you have any idea about why this is happening?

Thanks in advance.

I test your code in two different devices and it seems to use GPU only randomly. Most of the time while in GPU mode it does nothing. I tried less working groups (8 or 4) but doesn't make any difference...

@jsolves Can you elaborate more? Did you mean Camera --> SSBO does not work, or GPU delegate? How did you observe whether it works or not?

I wish I could tell you more, but every mode in the app works correctly until it goes to the GPU. Most of the time it classifies everything as 0% or near 0%, and device GPU utilization doesn't go up. Only on a few occasions does GPU classification go well (and GPU utilization goes up accordingly).

I tried other "GPU apps" and they worked as intended. I don't know how to determine whether the problem is camera -> SSBO, the GPU delegate, or something related to shaders. Do you know anything I can try to narrow it down?

Thanks for your answer.

I tried other "gpu apps" and they worked as intended. I don't know how to determine if the problem is Camera-->SSBO or GPU delegate or something related with shaders. Do you know anything I can try to see if the problem is one of those things?

@jsolves What you can definitely try is the original TensorFlow Lite Android repo, which already has GPU support. My SSBO repo only adds the camera -> SSBO path on top of it. You can check whether GPU in the TFLite Android repo gives you faster inference time.

Ah, sorry, I'm a little tired of this problem. Yes, the GPU delegate works correctly in the original repo and in my own custom apps. It gives faster inference time than CPU inference.

So the problem is in the camera->SSBO part, then?

> So the problem is in the camera->SSBO part, then?

@jsolves The main purpose of the SSBO is to reduce the pixel copy time from the camera to the input SSBO for TFLite. GPU inference time should not be affected at all.

Do you see "copy time" when running the app in GPU mode?

Also, how did you check GPU utilization? Are you getting the expected inference output when GPU utilization is low?

Yes, I know. In GPU mode (with SSBO) the copy time is very low (0 - 2) but there is no correct classification (it gives random values or all 0s) most of the time.

In the device settings there is an option like "show GPU utilization", and the few times the app works fine (with GPU), that GPU indicator goes up.

It's as if the camera image doesn't always reach the SSBO, or there is some initialization trouble. But my Android-Fu isn't strong enough to figure it out... :(

@jsolves When I tested GPU inference with a basic 3-channel input model, I couldn't get correct results.
However, when I changed the model's 3-channel input into a fake 4-channel input (using new Input and strided_slice ops), I finally got correct results :)

https://www.tensorflow.org/lite/performance/gpu_advanced#tips_and_tricks

Intriguing. How do you make that "fake 4-channel" input, by setting the "fake" alpha to 1 in every pixel?

Thanks in advance.

import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

input_shape = (224, 224, 4)                    # fake 4-channel input
inputs = Input(input_shape, dtype=np.float32)
x = Lambda(lambda t: t[:, :, :, :3])(inputs)   # strided_slice back to 3 channels
model_pre = Model(inputs, x)
model_pre.summary()
sess_fake = K.get_session()                    # TF 1.x-style session
graph_def_fake = sess_fake.graph_def
nodes_fake = [n for n in graph_def_fake.node]

I converted the model as follows.

  1. Create fake inputs (including a strided_slice operation)
  2. Replace the previous 3-channel input with the fake inputs above in the graph (be aware of the previous input names if possible)
  3. Convert TFLite model
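On the app side, the input buffer then has to carry the extra channel too. A minimal sketch in plain Python (the `pad_rgb_to_rgba` helper and the pad value are made up for illustration; the model's strided_slice discards the fourth channel anyway):

```python
# Hypothetical helper: pad a flat HxWx3 float image to HxWx4 so it matches
# a "fake 4-channel" model input. The strided_slice inside the model drops
# the padded channel again, so its value does not matter.
def pad_rgb_to_rgba(pixels_rgb, height, width, pad_value=0.0):
    out = []
    for i in range(height * width):
        out.extend(pixels_rgb[i * 3:i * 3 + 3])  # copy R, G, B
        out.append(pad_value)                    # dummy 4th channel
    return out

rgb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # a 1x2 RGB image, flattened
rgba = pad_rgb_to_rgba(rgb, 1, 2)
print(rgba)  # [0.1, 0.2, 0.3, 0.0, 0.4, 0.5, 0.6, 0.0]
```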

I think https://mediapipe.dev does this.

@soham24 I went through MediaPipe, but could not understand how this works. The tflite provided by the MediaPipe team has ops that are not supported by TFLite-GPU, nor were they even TensorFlow operations as-is. Can anyone provide suggestions on how to train the segmentation model based on the MediaPipe architecture?

@SanthoshRajendiran Even I am trying to figure out the pipeline by looking at the MediaPipe code.
It will be great if the folks at TF help us.

@SanthoshRajendiran @soham24

Yes, MediaPipe probably uses all features of the GPU delegate and is a good place to start (I used to work on MediaPipe a couple of years ago :D). I agree that the GPU path is not super easy to read, but it is still a decent place to start. If you look at the TfLiteInferenceCalculator, first of all you will see tons of RunInGlContext calls, which ensure you stay in the same GL context. Then, all it really does is copy the input SSBO, run inference, and copy the output SSBO. I think there is still room for improvement, which is going to happen very soon(tm). Well, that's on my plate for the next 3 months :P
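The TfLiteInferenceCalculator flow described above can be sketched schematically (plain Python with stand-in functions; none of these names are the real MediaPipe or TFLite API, and the real code is C++):

```python
# Hypothetical sketch of the copy-in / invoke / copy-out flow.
def run_in_gl_context(fn):
    """Stand-in for RunInGlContext: ensure work happens in one GL context."""
    return fn()

def infer(copy_in, invoke, copy_out, frame):
    # 1. copy the camera frame into the input SSBO
    input_ssbo = run_in_gl_context(lambda: copy_in(frame))
    # 2. run GPU inference on the bound buffers
    output_ssbo = run_in_gl_context(lambda: invoke(input_ssbo))
    # 3. copy the output SSBO out for downstream consumers
    return run_in_gl_context(lambda: copy_out(output_ssbo))

# toy stand-ins that just record the call order
result = infer(lambda f: f + ["copied-in"],
               lambda s: s + ["invoked"],
               lambda s: s + ["copied-out"],
               ["frame"])
print(result)  # ['frame', 'copied-in', 'invoked', 'copied-out']
```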

For the segmentation model, you want to check in the MediaPipe github page and ping those guys.

Can we have update on this?

Ummm, can you elaborate what kind of update you expect? Do you want us to walk through another open source software?

I've tried to associate my custom tflite model with SSBO on Android as @gnsmrky did, but I couldn't make it work so far.
(By the way, the latest tflite seems not to support bindGlBufferToTensor, but the official tflite GPU delegate document still introduces bindGlBufferToTensor for using SSBO.)
Anyway, I've built TensorFlow from https://github.com/gnsmrky/tensorflow-lite-ssbo and managed to run the image classification demo with SSBO. Even if it shows different results compared to the CPU version and the official GPU version without SSBO, it's at least working: it has prediction values and the copy time has been reduced.
But when I changed the provided mobilenet model to my custom model (I've tried even a very simple model with an add operation only), it looks like it is working but the output is all zeroes, or it sometimes produces an error that a tensor is not bound to a buffer handle, depending on the model used.
Since I've tried models with the same 224x224x3 input as the original demo and changed nothing except the model path, I'd like to know if there is any other modification I should take care of when I change or make a model.
Below are some examples of the simple models I've tried (visualized by Netron).
[image: Netron graph of a simple test model]
[image: Netron graph of a simple test model]
It would be great if TensorFlow offered an official SSBO demo with the latest tflite.

    tensorflow-lite-ssbo/tensorflow/lite/java/demo/app/src/main/java/com/example/android/tflitecamerademo/ImageClassifier.java:212: error: cannot find symbol
        gpuDelegate.bindGlBufferToTensor(inputTensor, inputSsboId);
                    ^
      symbol:   method bindGlBufferToTensor(Tensor,int)
      location: variable gpuDelegate of type GpuDelegate
@jmhodges @gnsmrky

I don't work in Java land, and thus I don't know which delegate APIs Java is using, but bindGlBufferToTensor got renamed in the deprecated GL delegate, and removed in the new GPU delegate. Check out //tf/lite/delegates/gpu/gl_delegate & //tf/lite/delegates/gpu/gpu_delegate.

@impjdi @jmhodges @gnsmrky
My model's input pixel values are floats in the range 0.0 - 1.0 (1.0/255). Can I use SSBO?

How do I dump an SSBO buffer to the CPU for evaluation?

You're looking for glMapBufferRange

[image: compute shader code screenshot]
Why does transformedData print out zero values after glDispatchCompute is invoked?
Where is it wrong?
I need to inspect the SSBO contents after copyCamtextToSsbo; is there a better method?
@impjdi @svenstaro @bmabey

Not super familiar with Java ByteBuffer and FloatBuffer, but aren't you missing a glFinish before you start reading from the memory location?

@impjdi @svenstaro @ktgordon @SanthoshRajendiran @gnsmrky
You can think of a FloatBuffer as a float* pointer (buffer) in C++.
I have tried with glFinish, but got the same result.
However, if I add the following code:
GLES31.glBufferData(GL_SHADER_STORAGE_BUFFER, ssboSize, ssboData, GL_STREAM_COPY);
I can get the ssboData contents with glMapBufferRange. Why?
[image: code screenshot]
[image: code screenshot]
[image: code screenshot]
I want to verify that the out_data.elements contents are correct after glDispatchCompute is done.
I have googled for two days but have not found a solution.

> GLES31.glBufferData(GL_SHADER_STORAGE_BUFFER, ssboSize, ssboData, GL_STREAM_COPY);
> I can get ssboData content with glMapBufferRange why?

Not talking about the "why" part, but isn't your problem solved if you can access ssboData?

I also remember that I couldn't find enough examples on the web to make reasonable progress. What you're asking right now seems slightly out of scope for TFLite GPU support, as you're asking pure OpenGL ES compute shader questions. I suggest asking on the Khronos forums and/or following the code paths inside TFLite GPU and MediaPipe; these two frameworks use SSBOs and textures a lot. I'm sure you will find your use case there.

@impjdi @svenstaro @ktgordon @SanthoshRajendiran @gnsmrky

I'm a newbie at OpenGL ES.

I have looked through MediaPipe and TFLite GPU to try to solve it, but failed.

I'm curious how you debug compute shaders on Android.

There is little material on the web about SSBO.

The code is provided by https://github.com/gnsmrky/tensorflow-lite-ssbo

Sorry for my English if you do not understand.

Stop highlighting me.

@svenstaro Very sorry for disturbing you

@impjdi
I finally located why glMapBufferRange returns all zeros:
GL_OES_EGL_image_external_essl3 does not work on some Android devices.
https://community.arm.com/developer/tools-software/graphics/f/discussions/9432/is-extension-gl_oes_egl_image_external_essl3-not-working-properly-in-compute-shader-on-mali-g71-gpu

Ah, thanks for the update and sharing!

I followed the official documentation for android for the GPU delegate and got stuck at the bindBuffer step, too.

> I don't work in Java lands, and thus I don't know which delegate Java APIs are using, but bindGlBufferToTensor got renamed in the deprecated GL delegate, and removed in the new GPU delegate. Check out //tf/lite/delegates/gpu/gl_delegate & //tf/lite/delegates/gpu/gpu_delegate.

I checked out the current master and there is no gpu_delegate(.cc?), only a gpu_delegate_jni(.cc). Did you mean that?

Anyway, I found that TfLiteGpuDelegateBindBufferToTensor seems to be an exported symbol of the library, and we can get the native handle of the delegate, so we might be able to call that method directly from Java.

Sorry, the last file should have been //tf/lite/delegates/gpu/delegate.cc. We were internally trying to use bindBuffer (without the delegate API, but with GPU-internal functions directly) and saw that the new API is a bit broken, so it's not usable for this. Someone is working on fixing that. For now, if you want to use bindBuffer, I guess you are stuck with the old API, i.e. gl_delegate.

@impjdi Thanks for the update. Does that mean the SSBO route is currently only available with the C bindings or not at all?

I haven't checked Java, but if Java has migrated to the new API (delegate.cc), your assessment is correct.

For C++, it's only available in v1 (gl_delegate.cc), but not in v2 (delegate.cc).

@impjdi is the SSBO bindBuffer issue in v2 delegate resolved?

The current plan is not to support bindBuffer in delegate v2.

@impjdi we have our image frame in GPU memory. Should we move it to the CPU just to start inference, which will move it to the GPU again? The time spent doing this would waste the benefits of GPU inference in many cases.

@impjdi Could you share any information about why bindBuffer will not be supported in delegate v2? I believe it improves GPU end-to-end inference time by eliminating memcpy operations. Did the tflite team run into some unresolvable issues, or was the decision made only by product requirements?

There are many advanced usages of mobile GPU inference, and for each of those, the GPU delegate needs helper functions like bindBuffer because they don't fit in the delegate framework. After adding a bunch of support for extended usages, either through the helper functions or options, we decided it's no longer maintainable with the combinatoric growth and gives an inconsistent look even within the GPU delegates (OpenCL, OpenGL, Metal, etc.). Note that we have to wrap it all up with a Java API. With the majority of users wanting the GPU delegate as just a quick blackbox accelerator, we made the final decision that the delegate API will stay simple and clean. For advanced usages that support a streamlined GPU execution pipeline, we will still have example code through, e.g., MediaPipe's TfLiteInferenceCalculator. Note that it's not there yet, as it still uses the v1 delegate and thus has access to bindBuffer.

@impjdi This information is helpful. Another question: when will the MediaPipe delegate v2 integration be released? Thank you.

Someone's working on it :)

Has anyone managed to bind the buffer with the v2 delegate?

It seems to me that mediapipe is already using it; see mediapipe/tflite_gpu_runner.h. This runner is used in the calculator mentioned by impjdi under the use_advanced_gpu_api_ flag. It replaces the interpreter/delegate flow and uses low-level components instead.

This is very unfriendly for those who want to have the SSBO utility without maintaining their own interpreter, but going deeper, the bind logic is in mediapipe/tflite_gpu_runner.cc and simply calls InferenceRunner::SetInputObject.

The v2 delegate owns an InferenceRunner itself, so maybe a small patch to the v2 delegate could add the required SetInputObject (or output) call. But I haven't tested it; setting this up would be hard for me at the moment.

@impjdi , any word of guidance would be helpful here. Is this correct? Can we simply patch the v2 delegate with a InferenceRunner::SetInputObject call, and invoke it instead of the v1 bindBuffer? I don't think I'm on the right track, but I do think it would be very useful to the community if we could achieve a patch file and share it here.

@natario1 I think @impjdi explained that the bindBuffer APIs don't fit in the v2 delegation design. The key difference between the v1 & v2 delegates is that v2 supports both OpenCL and OpenGL backends while v1 only supports OpenGL. This affects how TFLite handles data ownership exchange. Moreover, many devices on the market don't fully support OpenCL-OpenGL interoperability. I've also tried the use_advanced_gpu_api_ flag in MediaPipe; the app crashes when I turn it on. So I don't think it's a trivial patch for the v2 delegate to support the bindBuffer features. If you need this feature, I think the simplest solution is to stick with MediaPipe on the OpenGL backend.

Thanks for your comment @brucechou1983. A simpler solution for me is to stick with the v1 delegate, but to be honest it doesn't seem like the mediapipe runner is doing anything complex/fancy, other than calling InferenceRunner::SetInputObject and InferenceRunner::SetInputObjectDef when preparing. I understand that it might not be ready yet, though, as it is under a flag.

The v2 delegate also does the same object/objectdef calls, but the difference is that it uses ObjectType::CPU_MEMORY instead of ObjectType::OPENGL_SSBO like mediapipe does.

I don't know what the support for OpenCL is like on Android, but OpenGL works just fine, so we could have a flag in the v2 delegate options that tells the delegate not to try OpenCL and go with OpenGL. It's something the TF team could add to ease the v1-v2 transition, I think, since people who were using v1 likely have an SSBO set up.

@natario1 If a flag for only using OpenGL is what you need, it's already there, though it's still experimental: you can set the flag to TFLITE_GPU_EXPERIMENTAL_FLAGS_GL_ONLY.

However, when you need realtime (>>30fps) semantic segmentation and/or face mesh running on a $200 phone, choosing the right GPU backend in the tflite runtime for efficient execution is really not a trivial problem. I do see the value of using OpenCL for some Mali GPU devices: the invoke() execution is 2x-3x faster than OpenGL ES. Although I have to copy the data to/from the tensors, the overall performance is still better. I think the tflite team is trying to design the v2 delegate as a blackbox accelerator that is general purpose, works on arbitrary IoT devices, and is easy to use, while creating interfaces for other frameworks like MediaPipe to optimize for specific usages like streamlined GPU execution on mobile/desktop.
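To make the tradeoff concrete with rough, entirely made-up numbers (real timings depend heavily on the device and model):

```python
# Hypothetical timings only, illustrating why OpenCL can win end to end
# even though it pays for CPU<->GPU copies while OpenGL+SSBO is zero-copy.
gl_invoke_ms = 30.0   # OpenGL ES invoke, zero-copy via SSBO
cl_invoke_ms = 12.0   # OpenCL invoke, assumed 2-3x faster
cl_copy_ms = 6.0      # assumed extra upload + download cost for OpenCL

gl_total_ms = gl_invoke_ms
cl_total_ms = cl_invoke_ms + cl_copy_ms
print(gl_total_ms, cl_total_ms)  # 30.0 18.0: OpenCL still faster overall
```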

@natario1 I see you did your homework there, good job 👍

You might have noticed, but TFLite is adding a bunch of delegates for various accelerators and APIs. Each of them having custom helper functions didn't help usage; it made things more confusing for the 99% of users who want to use the TFLite GPU delegate just as a magic box doing GPU-accelerated inference. So the final decision we made was to keep the TFLite GPU delegate as simple as possible, but leave the door open for advanced users who want to do really performant things.

The teams that deliver TFLite GPU and MediaPipe are sister teams sharing one manager. Having said that, TFLite GPU won't break MediaPipe, and that's a guarantee. In that sense, going deeper and using advanced internal APIs like InferenceRunner::SetInputObject the way MediaPipe uses them is safe. Of course, because it's not the public API but an advanced internal one, there might be API changes that break you every once in a while, but you will always have MediaPipe's reference implementation.

I understand the situation @impjdi. Would you consider something like V2Delegate::GetInferenceRunner()? So that we can call InferenceRunner::SetInputObject or whatever else from _outside_ the delegate. This makes all the difference, because we'd still have to do our homework for integration and maintenance, but at least we wouldn't have to fork TensorFlow or use a bazel patch, which is honestly a big burden, although MediaPipe helps.

You say that the SetInput/OutputObject and SetInput/OutputObjectDef APIs are "advanced", and they are to some extent, but at the same time it makes perfect sense that to bind a tensor to "something", one has to specify its data layout, size, object type and so on. They're actually very elegant and easy to understand compared to BindGlBufferToTensor, which from my point of view was just doing some obscure magic under the hood that I couldn't really grasp.

These APIs would also be hidden behind the GetInferenceRunner() API, which you could document as a "use at your own risk" function, and keep the black-box surface clean. I think this approach would really "leave the room open" as you say. (maybe it would be more work for you than just adding a getter for the inference runner, but you get the point - being able to control the delegate objects from outside)

Apart from this, I'll try to use these low-level APIs this weekend and see if I manage to get v2 working. Thanks for helping!

Edit: After spending the weekend on it I realized this suggestion was not possible, but I hope you can consider something like what I ended up doing which is clean and keeps the delegate header untouched.

@impjdi any suggestions on how to fix this error? It seems to be an issue with the BHWC -> BHWC4 conversion, but I have no clue how to address it. It happens in ToTensorConverter.

E/tflite:
    TfLiteGpuDelegate Invoke: Missing output in converter
    Node number 1 (TfLiteGpuDelegateV2) failed to invoke.

I create the object def and tensor object as follows:

// object def
tflite::gpu::ObjectDef object_def;
object_def.data_type = tflite::gpu::DataType::FLOAT32;
object_def.data_layout = tflite::gpu::DataLayout::BHWC;
object_def.object_type = tflite::gpu::ObjectType::OPENGL_SSBO;
object_def.user_provided = true;

// tensor object
tflite::gpu::OpenGlBuffer tensor_object;
tensor_object.id = ssbo;

Then pass both to the delegate before ModifyGraphWithDelegate. They are correctly passed to the inference runner and the runner builder, however I get that converter error.

The TF version is 2.2.0 and the model I am using is extremely simple: it takes a 400x400x1 image and calculates the average intensity, returning a single float. I am trying to use an SSBO object for the input only.

Also, I'm running the OpenGL backend; OpenCL is not available on my phone.

After many hours, I think I hit a bug that is still present in 2.2.0, but was fixed in master by these commits: https://github.com/tensorflow/tensorflow/commit/4000a5c75cdbe49d77bcac93a7f21070a31c4cce https://github.com/tensorflow/tensorflow/commit/dffe6a0e810f4c3d9968ddb56fd58c8f405eb846

For those who are interested: in short, the fact that I'm using BHWC with 1 color channel (instead of 4) requires the GL engine to do a conversion, and this conversion (before https://github.com/tensorflow/tensorflow/commit/4000a5c75cdbe49d77bcac93a7f21070a31c4cce and https://github.com/tensorflow/tensorflow/commit/dffe6a0e810f4c3d9968ddb56fd58c8f405eb846) is completely broken, because user_provided is hardcoded to true (https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/lite/delegates/gpu/gl/api2.cc#L595), but when user_provided is true the engine will not bother to create the output GL buffer (https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/lite/delegates/gpu/gl/api2.cc#L199-L202), so the C->C4 conversion can't happen.
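For intuition, the C->C4 conversion is essentially zero-padding of the channel dimension up to a multiple of 4, something like this simplified Python picture (not the actual TFLite GPU code, just a sketch of the layout):

```python
def bhwc_to_bhwc4(data, b, h, w, c):
    """Pad the channel dim of a flat BHWC tensor with zeros up to a
    multiple of 4 (a simplified picture of the GPU-friendly layout)."""
    c4 = ((c + 3) // 4) * 4
    out = []
    for i in range(b * h * w):
        pixel = data[i * c:(i + 1) * c]
        out.extend(pixel + [0.0] * (c4 - c))  # zero-pad each pixel
    return out

# a 1x2x2x1 tensor: every 1-channel pixel becomes a 4-channel slice
print(bhwc_to_bhwc4([1.0, 2.0, 3.0, 4.0], 1, 2, 2, 1))
# [1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
```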

By cherry-picking https://github.com/tensorflow/tensorflow/commit/4000a5c75cdbe49d77bcac93a7f21070a31c4cce and https://github.com/tensorflow/tensorflow/commit/dffe6a0e810f4c3d9968ddb56fd58c8f405eb846 into v2.2.0 and exposing the necessary APIs, I'm able to do SSBO I/O with the v2 delegate. These commits are pretty old, so I hope they can make it into the next release.

These are the changes I had to make to expose the necessary APIs: https://github.com/natario1/tensorflow/commit/7401fbb4fa0c94004865c089d8c89bdd566ad747 . I don't know C++ so there might be errors, but the point is to create an interface that the V2 delegate extends. This interface can be retrieved from the delegate using a separate C++ header (delegate_core.h) so the high-level delegate is still a black box.
