Arduino-esp32: esp32 camera web server streaming mjpeg feature is not working properly

Created on 24 Dec 2019 · 5 Comments · Source: espressif/arduino-esp32

Hardware:

Board: AI Thinker ESP32-CAM
Core Installation version: 1.0.4
IDE name: Arduino IDE
Flash Frequency:
PSRAM enabled:
Upload Speed: 115200
Computer OS: Ubuntu

Description:

The CameraWebServer.ino sample code for Arduino behaves strangely.
I'm testing on an AI Thinker ESP32-CAM board, and the camera preview stream works perfectly when viewed in Chrome or Firefox.
The problem is with the MJPEG stream itself.
The MJPEG stream (e.g. http://192.168.0.67:81/stream) cannot be opened with VLC or with OpenCV code.
I tried debugging with the Chrome debugger and found one odd difference compared with an ordinary MJPEG stream from the Linux motion program.
The working MJPEG stream always shows response code 200 when I view it in a web browser,
but the ESP32 camera running the CameraWebServer.ino code does not show any status code when I view its MJPEG stream.
Strangely, the web browser still has no problem showing the preview image from the ESP32 camera even though no status code is shown.
As far as I can tell, the difference between an ordinary MJPEG stream and the ESP32-CAM's MJPEG stream is the absence of a returned HTTP status code.

Does anyone have a solution for this?

All 5 comments

I dug around the app_httpd.cpp code included in the example directory and compared its HTTP response headers with those returned by a properly working MJPEG stream (Linux motion).

MJPEG header from the Linux motion daemon:

HTTP/1.0 200 OK
Server: Motion/4.1.1
Connection: close
Max-Age: 0
Expires: 0
Cache-Control: no-cache, private
Pragma: no-cache
Content-Type: multipart/x-mixed-replace; boundary=BoundaryString

MJPEG header from the ESP32-CAM, followed by the part header of the first frame:

HTTP/1.1 200 OK
Content-Type: multipart/x-mixed-replace;boundary=123456789000000000000987654321
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *

Content-Type: image/jpeg
Content-Length: 18130

My claim that no HTTP status code is returned turned out to be wrong.
The new suspect seems to be the "Transfer-Encoding: chunked" part.
I'm trying to debug this by changing the header and the content type, but I still have no clue.
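
To make the comparison above easier to repeat, here is a minimal sketch that parses the two response heads quoted in this comment and lists the header fields that appear on only one side. The function name `parse_headers` and the variable names are made up for this illustration; the header text is copied from the two dumps above.

```python
def parse_headers(raw):
    """Parse a raw HTTP response head into (status_line, {lowercased name: value})."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status_line, fields = lines[0], {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return status_line, fields


# Response heads copied from the two dumps quoted in this comment.
MOTION_HEAD = """
HTTP/1.0 200 OK
Server: Motion/4.1.1
Connection: close
Max-Age: 0
Expires: 0
Cache-Control: no-cache, private
Pragma: no-cache
Content-Type: multipart/x-mixed-replace; boundary=BoundaryString
"""

ESP32_HEAD = """
HTTP/1.1 200 OK
Content-Type: multipart/x-mixed-replace;boundary=123456789000000000000987654321
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
"""

motion_status, motion = parse_headers(MOTION_HEAD)
esp_status, esp = parse_headers(ESP32_HEAD)

# Header fields present on one side only.
only_in_esp32 = sorted(set(esp) - set(motion))
only_in_motion = sorted(set(motion) - set(esp))

print(motion_status, "|", esp_status)
print("only in ESP32-CAM:", only_in_esp32)
print("only in motion:", only_in_motion)
```

Both servers answer with a 200 status line. Apart from the CORS header, the field that only the ESP32-CAM sends is Transfer-Encoding, which goes together with its HTTP/1.1 status line (motion answers with HTTP/1.0 and Connection: close).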

It turns out that the OpenCV VideoCapture() API and stock video players like VLC don't support parsing MJPEG over HTTP chunked transfer.

When I parsed the MJPEG stream manually instead of using the VideoCapture() API, I finally succeeded in getting image frames from the MJPEG stream over HTTP.
The code, in brief, is as follows:

import cv2
from urllib.request import Request, urlopen
import numpy as np

stream = urlopen('http://192.168.0.67:81/stream')
bytes = b''
while True:
bytes += stream.read(1024)
a = bytes.find(b'\xff\xd8')
b = bytes.find(b'\xff\xd9')
if a != -1 and b != -1:
jpg = bytes[a:b+2]
bytes = bytes[b+2:]
i = cv2.imdecode(np.fromstring(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
cv2.imshow('i', i)
if cv2.waitKey(1) & 0xFF == ord('q'):
break

When I used cv2.VideoCapture(...) instead of parsing the stream from urlopen, I couldn't get a proper MJPEG stream.
I still can't read the MJPEG stream from the ESP32-CAM with stock media players like VLC, but at least I can fetch frames in my own code, so I'm closing this issue.
I'm still not sure why media players don't support chunked transfer, by the way.
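
For readers wondering what "Transfer-Encoding: chunked" changes on the wire, here is a minimal sketch of a decoder for that framing. It is not taken from the ESP32 code or from any player; `decode_chunked` and the sample bytes are made up for illustration. In chunked transfer, the body arrives as a series of chunks, each prefixed with its size in hexadecimal, so a client that reads the raw socket without decoding sees those size lines mixed into the multipart data.

```python
def decode_chunked(data):
    """Decode an HTTP/1.1 chunked body.

    Each chunk is: hex size, CRLF, payload, CRLF. A chunk of size 0 ends the body.
    """
    body, pos = b"", 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ';extension' parameters, which are ignored here.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:           # zero-length chunk terminates the body
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows each payload


# Hypothetical wire bytes: a 4-byte chunk, a 3-byte chunk, then the terminator.
wire = b"4\r\n\xff\xd8\xff\xe0\r\n3\r\nabc\r\n0\r\n\r\n"
payload = decode_chunked(wire)
print(payload)
```

Python's http.client, which urlopen uses, performs this decoding transparently. That is consistent with the manual urlopen approach above working on this stream.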

Hey @kdsoo can you format this code please? I'm getting EOL errors on bytes = b" & some confusion on the hierarchy of the if statement?

Also an explanation of your code would be immensely helpful please! Thank you!

@andre-fu


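import cv2
import numpy as np
from urllib.request import urlopen

stream = urlopen('http://192.168.0.67:81/stream')
buf = b''  # renamed from `bytes` to avoid shadowing the built-in type
while True:
    chunk = stream.read(1024)
    if not chunk:  # the camera closed the connection
        break
    buf += chunk
    a = buf.find(b'\xff\xd8')         # SOI: start of a JPEG frame
    b = buf.find(b'\xff\xd9', a + 2)  # EOI: searched only after the SOI
    if a != -1 and b != -1:
        jpg = buf[a:b+2]
        buf = buf[b+2:]
        # np.frombuffer replaces np.fromstring, which is deprecated for binary data
        i = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
        if i is None:  # skip frames that fail to decode
            continue
        cv2.imshow('i', i)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cv2.destroyAllWindows()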

Ordinarily, VideoCapture() is a hassle-free way to read any kind of video stream from various sources such as v4l2 devices, HTTP live streams, etc.

Strangely, it doesn't work properly with the ESP32 camera. The problem seems to be that the VideoCapture() API has trouble reading an HTTP chunked transfer.

So the workaround I wrote simply reads the stream manually with urlopen and read(), and finds the start and end of each JPEG.

Please refer to the JPEG specification: FFD8 is the SOI (Start of Image) marker and FFD9 is the EOI (End of Image) marker.
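
The marker scan can also be tried without a camera or OpenCV. Below is a minimal sketch of the same idea as a pure function; `split_frames` and the sample bytes are made up for illustration. It returns every complete SOI..EOI span in a buffer, plus the leftover bytes to keep until more data arrives.

```python
SOI = b"\xff\xd8"  # JPEG Start Of Image marker
EOI = b"\xff\xd9"  # JPEG End Of Image marker


def split_frames(buf):
    """Return (complete_frames, leftover_bytes) found in buf."""
    frames = []
    while True:
        start = buf.find(SOI)
        if start == -1:
            return frames, buf           # no frame has started yet
        end = buf.find(EOI, start + 2)   # look for EOI only after the SOI
        if end == -1:
            return frames, buf[start:]   # partial frame: keep it for the next read
        frames.append(buf[start:end + 2])
        buf = buf[end + 2:]


# Hypothetical data: one complete frame, then the beginning of a second one.
data = b"--boundary\r\n" + SOI + b"AAA" + EOI + b"\r\n--boundary\r\n" + SOI + b"BB"
frames, rest = split_frames(data)
print(frames, rest)

# Feeding the leftover plus the next read completes the second frame.
more_frames, rest2 = split_frames(rest + b"B" + EOI)
print(more_frames, rest2)
```

One caveat: the byte pairs FFD8 and FFD9 can in principle appear inside a JPEG that embeds a thumbnail, so this scan is a heuristic. The ESP32-CAM part headers shown earlier include a Content-Length, which a stricter parser could use to cut each frame exactly.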

Guys, I have been working on this issue and was able to successfully stream to VLC and browsers.
Also implemented multi-client streaming.
Please check these repos:
https://github.com/arkhipenko/esp32-cam-mjpeg (single client)
https://github.com/arkhipenko/esp32-cam-mjpeg-multiclient (up to 10 clients)
https://github.com/arkhipenko/esp32-mjpeg-multiclient-espcam-drivers (up to 10 clients using latest ESP32-cam drivers from espressif).
