How to achieve dual streaming using OpenCV in computer vision applications?

Computer vision applications like smart surveillance, video conferencing, social distancing, and smart traffic systems are becoming more popular these days. But they don’t come easy since these applications require real-time streaming and AI processing – both of which have their challenges.

For instance, real-time AI processing commonly causes an FPS drop that, in turn, can affect live streaming. This is because computer vision involves processing every frame of the real-time stream, which is time-consuming, so many applications end up skipping frames and losing data.

Hence, to overcome this problem, e-con Systems™ has designed e-CAM83_USB – a 4K HDR USB camera with dual node streaming. e-CAM83_USB is equipped with two video nodes so that one node can be used for live streaming without interruption while the other node handles the detection process. As a result, real-time streaming computer vision applications can leverage their multiprocessing or multithreading technique to continuously stream without FPS drop. We will learn more about the features of the product in a later section.

In this blog, let’s see how you can access two video nodes in a video conferencing application (using e-CAM83_USB or other camera modules). We will also learn how to stream live video while detecting the speaker’s face, using OpenCV and sample source code.

Enabling dual streaming for smart video conferencing

To learn how dual streaming can be achieved, we will take smart video conferencing as a use case. With the help of OpenCV and Dlib’s facial landmark detector, you can detect the speaker in a live video conferencing setup and display the detected person in a separate window. Using dual-stream video nodes with multiprocessing, you can identify the person talking without any FPS drop.

OpenCV and installation

OpenCV (Open Source Computer Vision Library) is a library of programming functions aimed at developing real-time computer vision applications. It is a great tool for image processing and can perform tasks like face detection, object tracking, landmark detection, and more. OpenCV supports multiple languages including Python, Java, C++, etc.

In this sample application, you will be seeing how it works with Python. The Python library has hundreds of useful functions and algorithms – all freely available. Some of these functions are used in almost every computer vision task!

For this application, let’s use the camera’s two nodes with the H.264 and MJPEG formats. To stream H.264, OpenCV must be built from source with GStreamer enabled.

Here are the instructions to build OpenCV with GStreamer on Ubuntu 18.04 with Python 3.6:

sudo apt-get install gstreamer1.0*

sudo apt install ubuntu-restricted-extras

sudo apt install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev

sudo apt-get install build-essential

sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev

sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libdc1394-22-dev

sudo apt-get install python3-pip python3-numpy

git clone https://github.com/opencv/opencv.git

cd opencv/

git clone https://github.com/opencv/opencv_contrib.git

mkdir build

cd build

cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D OPENCV_GENERATE_PKGCONFIG=ON \
-D OPENCV_EXTRA_MODULES_PATH=~/opencv/opencv_contrib/modules \
-D INSTALL_PYTHON_EXAMPLES=ON \
-D INSTALL_C_EXAMPLES=OFF \
-D PYTHON_EXECUTABLE=$(which python3) \
-D BUILD_opencv_python2=OFF \
-D PYTHON3_EXECUTABLE=$(which python3) \
-D PYTHON3_INCLUDE_DIR=$(python3 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
-D PYTHON3_PACKAGES_PATH=$(python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
-D WITH_GSTREAMER=ON \
-D WITH_GSTREAMER_0_10=ON \
-D BUILD_EXAMPLES=ON ..

sudo make -j4

sudo make install

sudo ldconfig
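
After the installation, it may be worth confirming that the Python bindings were really built with GStreamer. The following is a minimal sketch, assuming the build summary returned by cv2.getBuildInformation() contains a line that starts with "GStreamer:" followed by YES or NO. The helper names gstreamer_enabled and check_installed_opencv are ours for illustration and are not part of OpenCV.

```python
def gstreamer_enabled(build_info):
    """Return True if an OpenCV build summary reports GStreamer as enabled."""
    for line in build_info.splitlines():
        stripped = line.strip()
        if stripped.startswith("GStreamer:"):
            # Typical line: "GStreamer:                   YES (1.14.5)"
            return stripped[len("GStreamer:"):].strip().startswith("YES")
    return False  # no GStreamer line at all: treat as not enabled


def check_installed_opencv():
    """Inspect the OpenCV build that Python actually imports."""
    import cv2  # imported here so the helper above also works without OpenCV
    return gstreamer_enabled(cv2.getBuildInformation())
```

If check_installed_opencv() returns False, Python is probably importing a different OpenCV build (for example, one installed through pip) rather than the one compiled above.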

Dlib for face detection and recognition

The dlib library is one of the most popular packages for face recognition. A Python package named face_recognition wraps dlib’s face recognition functions in a convenient API. Dlib’s facial landmark detector uses a pre-trained model to estimate the locations of 68 (x, y) coordinates that map the facial points on a human face. The following figure depicts these coordinates:

Figure 1 – facial points in Dlib’s facial recognition package

How the application works

Building a dual streaming solution for a video conferencing system to identify the speaker and display him/her on a separate window involves the following steps:

  1. Streaming two nodes in OpenCV
  2. Detecting the face
  3. Detecting the speaker
  4. Displaying the detected person

Let us look at each of them in detail.

Streaming two nodes in OpenCV

The camera exposes two nodes: the H.264 node continuously delivers the video stream, while the MJPEG node is used for processing. Running each node in its own process keeps the detection work from slowing down the live stream.
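
The two-node setup described above could be sketched as follows. This is not the downloadable sample application, only an illustration under a few assumptions: the H.264 node is /dev/video0 and the MJPEG node is /dev/video2 (check your own device paths), the H.264 node runs at 1920x1080 at 30 fps, and the GStreamer avdec_h264 decoder is installed. The function names are ours.

```python
import multiprocessing as mp


def h264_pipeline(device, width, height, fps):
    """Build a GStreamer pipeline string that decodes an H.264 camera node to BGR frames."""
    return (
        "v4l2src device={dev} ! "
        "video/x-h264,width={w},height={h},framerate={fps}/1 ! "
        "h264parse ! avdec_h264 ! videoconvert ! video/x-raw,format=BGR ! "
        "appsink drop=true max-buffers=1"
    ).format(dev=device, w=width, h=height, fps=fps)


def preview_worker(device):
    """Process 1: show the H.264 node continuously, with no AI work in the loop."""
    import cv2  # imported inside the worker so each process loads OpenCV itself
    cap = cv2.VideoCapture(h264_pipeline(device, 1920, 1080, 30), cv2.CAP_GSTREAMER)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("Live stream (H.264)", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()


def processing_worker(device):
    """Process 2: read the MJPEG node; detection would run on these frames."""
    import cv2
    cap = cv2.VideoCapture(device, cv2.CAP_V4L2)
    cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Placeholder: face and speaker detection on `frame` would go here.
    cap.release()


def run_dual_stream(h264_device="/dev/video0", mjpeg_device="/dev/video2"):
    """Start one process per video node and stop when the preview window closes."""
    preview = mp.Process(target=preview_worker, args=(h264_device,))
    worker = mp.Process(target=processing_worker, args=(mjpeg_device,), daemon=True)
    preview.start()
    worker.start()
    preview.join()
    worker.terminate()
```

Call run_dual_stream() from under an if __name__ == "__main__": guard. Because the preview process never waits for detection, a slow detection step only lowers the processing rate on the MJPEG node and leaves the displayed stream untouched.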

Detecting the face

You can use Haar Cascades to identify the face from the MJPEG stream and pass it to Dlib for getting the 68 facial landmarks.
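
A rough sketch of this step is shown below. It assumes you have a Haar cascade XML file (such as haarcascade_frontalface_default.xml, shipped with OpenCV) and dlib’s 68-point model file (commonly distributed as shape_predictor_68_face_landmarks.dat) available locally. The paths you pass in, the detectMultiScale parameters, and the helper names are illustrative choices rather than the sample application’s actual code.

```python
def haar_box_to_corners(box):
    """Convert a Haar (x, y, w, h) box to (left, top, right, bottom) corners."""
    x, y, w, h = (int(v) for v in box)
    return x, y, x + w, y + h


def load_models(cascade_path, predictor_path):
    """Load the Haar cascade XML and the dlib 68-point model from the given files."""
    import cv2
    import dlib
    return cv2.CascadeClassifier(cascade_path), dlib.shape_predictor(predictor_path)


def detect_landmarks(frame, cascade, predictor):
    """Return one list of 68 (x, y) landmark points per detected face."""
    import cv2
    import dlib
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Haar cascade gives face boxes as (x, y, w, h).
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    all_points = []
    for box in faces:
        # dlib expects corner coordinates, so convert before predicting landmarks.
        rect = dlib.rectangle(*haar_box_to_corners(box))
        shape = predictor(gray, rect)
        all_points.append(
            [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
        )
    return all_points
```

Each returned list holds the 68 points in dlib’s order, so the 20 mouth points are the slice points[48:68].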

Detecting the speaker

After getting the 68 facial landmarks, you can easily detect whether a person is talking or not by calculating the Mouth Aspect Ratio (MAR).

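To calculate MAR, first compute the Euclidean distances between the two pairs of vertical mouth landmarks, and then the distance between the horizontal pair. Here, mouth holds the 20 mouth points, and the trailing comments give the landmark numbers from Figure 1:

from scipy.spatial import distance as dist  # assumed source of dist.euclidean

# compute the euclidean distances between the two sets of vertical mouth landmarks (x, y)-coordinates
A = dist.euclidean(mouth[2], mouth[10]) # 51, 59
B = dist.euclidean(mouth[4], mouth[8]) # 53, 57

# compute the euclidean distance between the horizontal mouth landmark (x, y)-coordinates
C = dist.euclidean(mouth[0], mouth[6]) # 49, 55

Once you derive the values for A, B, and C, MAR can be calculated as follows:

# compute the mouth aspect ratio
MAR = (A + B) / (2.0 * C)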
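
To make the formula concrete, here is a small self-contained version with a worked example. It uses only the Python standard library (math.hypot in place of SciPy), and the synthetic mouth coordinates are made up purely to show the arithmetic.

```python
import math


def mouth_aspect_ratio(mouth):
    """Compute MAR from the 20 mouth points (landmarks 49 to 68, in order)."""
    def distance(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    a = distance(mouth[2], mouth[10])  # vertical pair: landmarks 51 and 59
    b = distance(mouth[4], mouth[8])   # vertical pair: landmarks 53 and 57
    c = distance(mouth[0], mouth[6])   # horizontal pair: landmarks 49 and 55
    return (a + b) / (2.0 * c)


# Synthetic mouth: 4 px wide and 2 px open at both vertical pairs.
example = [(0, 0)] * 20
example[0], example[6] = (0, 0), (4, 0)
example[2], example[10] = (1, -1), (1, 1)
example[4], example[8] = (3, -1), (3, 1)
print(mouth_aspect_ratio(example))  # 0.5
```

Here A = 2, B = 2, and C = 4, so MAR = (2 + 2) / (2.0 * 4) = 0.5. A closed mouth brings A and B toward zero, so MAR falls, while an open mouth raises it.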

Displaying the detected person

Once MAR is calculated, it is compared against a fixed threshold to detect the person who is talking, and the coordinates of that person are identified. Then, the image can be cropped and resized for display by using multiprocessing.
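
One possible way to put this last step into code is sketched below. The threshold of 0.5, the 40-pixel margin, and the 320x320 output size are placeholder values that you would need to tune for your own camera and scene, and the function names are ours.

```python
def select_speaker(faces, threshold=0.5):
    """Pick the face with the largest MAR above the threshold, or None.

    faces is a list of (landmarks, mar) pairs; the 0.5 default is illustrative.
    """
    talking = [face for face in faces if face[1] > threshold]
    if not talking:
        return None
    return max(talking, key=lambda face: face[1])[0]


def crop_box(landmarks, frame_width, frame_height, margin=40):
    """Bounding box around the landmarks, padded by margin and clamped to the frame."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin, frame_width)
    bottom = min(max(ys) + margin, frame_height)
    return left, top, right, bottom


def show_speaker(frame, landmarks, size=(320, 320)):
    """Crop the speaker out of a BGR frame, resize it, and show it in its own window."""
    import cv2
    height, width = frame.shape[:2]
    left, top, right, bottom = crop_box(landmarks, width, height)
    speaker = cv2.resize(frame[top:bottom, left:right], size)
    cv2.imshow("Speaker", speaker)
    cv2.waitKey(1)
```

In a multiprocessing setup, the detection process could push the frame and the selected landmarks onto a queue, with show_speaker running in a separate display process, so that drawing the speaker window never blocks detection.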

Source code for dual streaming configuration

To download the source code for the sample application discussed above, please visit this link.

e-CAM83_USB – Enabling new-age computer vision experiences with dual streaming

e-CAM83_USB from e-con Systems™ is a SONY-based 4K HDR USB high-speed camera with dual-stream support to simultaneously receive two streams with different resolutions. In addition, this camera’s HDR capability helps capture superior images without data loss – even in the most challenging lighting conditions.

Based on the Sony IMX317, a 1/2.5″ sensor, e-CAM83_USB has a powerful high-performance ISP – providing high-quality images with Auto White Balance (AWB), Auto Gain Control, and Auto Exposure functions.

Please watch the below video to learn more about the features and applications of e-CAM83_USB.

Need more information on this best-fit video conferencing camera? Check out our FAQs.

If you need help selecting an ideal video conferencing camera solution – no matter your application, please write to us at camerasolutions@e-consystems.com. You can also visit our Camera Selector to get a full view of e-con Systems’ camera portfolio.
