Recently, we built NeetoRecord, a Loom alternative. The desktop application was built using Electron. In this series of blogs, we capture how we built the desktop application and the challenges we ran into. This blog is part 3 of the series. You can also read part 1, part 2, part 4, part 5, part 6, part 7, part 8, and part 9.
Modern tools like Zoom and Google Meet allow us to blur or completely replace our background in real-time video, creating a polished and distraction-free environment regardless of where we are.
This is possible because of advancements in machine learning. In this blog, we'll explore how to achieve real-time background blurring and replacement using TensorFlow's body segmentation capabilities.
TensorFlow body segmentation is a computer vision technique that involves dividing an image into distinct regions corresponding to different parts of a human body. It typically employs deep learning models, such as convolutional neural networks (CNNs), to analyze an image and predict pixel-level labels. These labels indicate whether each pixel belongs to a specific body part, like the head, torso, arms, or legs.
The segmentation process often starts with a pre-trained model, which has been trained on large datasets. The model processes the input image through multiple layers of convolutions and pooling, gradually refining the segmentation map. The final output is a precise mask that outlines each body part, allowing for applications in areas like augmented reality, fitness tracking, and virtual try-ons.
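To make the idea of a pixel-level label map concrete, here is a tiny, purely illustrative sketch. The array values and helper below are made up for explanation and are not part of any TensorFlow API.
// A hypothetical 3x3 label map: 1 marks a "person" pixel, 0 marks background.
const labelMap = [
  [0, 1, 0],
  [1, 1, 1],
  [0, 1, 0],
];

// A renderer could use such a map to decide, per pixel, whether to blur it
// (background) or keep it sharp (person).
const isPersonPixel = (x, y) => labelMap[y][x] === 1;

console.log(isPersonPixel(1, 1)); // true  -> keep sharp
console.log(isPersonPixel(0, 0)); // false -> blur or replace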
To learn more about TensorFlow and body segmentation, check out the resources below.
We'll create a simple React app that streams video from the webcam.
import React, { useRef, useEffect } from "react";
const App = () => {
const videoRef = useRef(null);
useEffect(() => {
const getVideo = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({
video: true,
});
if (videoRef.current) {
videoRef.current.srcObject = stream;
}
} catch (err) {
console.error("Error accessing webcam: ", err);
}
}
getVideo();
return () => {
if (videoRef.current && videoRef.current.srcObject) {
videoRef.current.srcObject.getTracks().forEach(track => track.stop());
}
};
}, []);
return (
<div>
<video ref={videoRef} autoPlay width="640" height="480" style={transform: 'scaleX(-1)'}/>
</div>
);
}
export default App;
In the code above, we render a <video>
element, and once the app is mounted,
we obtain the video stream from the user's webcam using
navigator.mediaDevices.getUserMedia
. This call will prompt the user to grant
permission to access their camera. Once the user grants permission, the video
stream is captured and rendered in the <video>
element.
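As a side note, getUserMedia also accepts constraints, so we can hint the browser toward a resolution that matches the canvas we'll draw on later. A minimal sketch of the same call with constraints (the browser may still deliver a slightly different resolution depending on the device):
// Hint the browser toward frames that match the 640x480 canvas; the actual
// resolution it delivers may still differ per device.
const stream = await navigator.mediaDevices.getUserMedia({
  video: { width: { ideal: 640 }, height: { ideal: 480 } },
});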
Next, let's add the necessary TensorFlow packages.
yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter @tensorflow-models/body-segmentation @mediapipe/selfie_segmentation
@tensorflow/tfjs-core is the core JavaScript package for TensorFlow, @tensorflow/tfjs-converter lets TensorFlow.js load models converted from TensorFlow, @tensorflow-models/body-segmentation contains all the functions we need for body segmentation, and @mediapipe/selfie_segmentation is our pre-trained model.
The TensorFlow body segmentation package provides a pre-trained
MediaPipeSelfieSegmentation
model for segmenting the human body in images and
videos. This model is specifically designed for the upper body. If our
requirement involves the entire body, we may want to consider other models like
BodyPix.
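For reference, the same @tensorflow-models/body-segmentation package also exposes BodyPix through a similar createSegmenter call. The sketch below is only a rough illustration; the exact config options (architecture, outputStride, multiplier, quantBytes) and the need for a TF.js backend such as @tensorflow/tfjs-backend-webgl should be verified against the package docs.
import * as bodySegmentation from "@tensorflow-models/body-segmentation";
// Assumption: BodyPix runs on the tfjs runtime and therefore needs a backend
// package such as @tensorflow/tfjs-backend-webgl installed and imported.

const createBodyPixSegmenter = async () => {
  const model = bodySegmentation.SupportedModels.BodyPix;
  // Illustrative values; tune them per the BodyPix documentation.
  const segmenterConfig = {
    architecture: "MobileNetV1",
    outputStride: 16,
    multiplier: 0.75,
    quantBytes: 2,
  };
  return bodySegmentation.createSegmenter(model, segmenterConfig);
};
For this blog, though, we'll stick with MediaPipeSelfieSegmentation.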
We need to load the MediaPipeSelfieSegmentation model to create a segmenter:
import * as bodySegmentation from "@tensorflow-models/body-segmentation";

const createSegmenter = async () => {
  const model = bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation;
  const segmenterConfig = {
    runtime: "mediapipe",
    solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation",
    modelType: "general",
  };
  return bodySegmentation.createSegmenter(model, segmenterConfig);
};
We load the model from a CDN, configure the runtime as mediapipe
, and set the
modelType to general
. Then, we create the segmenter
using the
bodySegmentation.createSegmenter
method.
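As an aside, the MediaPipe selfie segmentation solution also ships a lighter landscape variant. If the docs for your package version confirm it, switching is just a config change; the snippet below is an assumption to verify, not something the blog's code relies on.
// Assumption: "landscape" is an accepted modelType in this package version;
// verify in the body-segmentation docs. It trades some accuracy for speed.
const fasterConfig = {
  runtime: "mediapipe",
  solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation",
  modelType: "landscape",
};
With that noted, let's wrap the segmenter creation in a small videoBackground.js module.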
// ./videoBackground.js
import * as bodySegmentation from "@tensorflow-models/body-segmentation";

const createSegmenter = async () => {
  const model = bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation;
  const segmenterConfig = {
    runtime: "mediapipe",
    solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation",
    modelType: "general",
  };
  return bodySegmentation.createSegmenter(model, segmenterConfig);
};

class VideoBackground {
  #segmenter;

  getSegmenter = async () => {
    if (!this.#segmenter) {
      this.#segmenter = await createSegmenter();
    }
    return this.#segmenter;
  };
}

const videoBackground = new VideoBackground();

export default videoBackground;
Here, we define a VideoBackground
class and create an instance of it. Inside
the class, the getSegmenter
function ensures that the segmenter
is created
only once, so we don't have to recreate it each time.
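For example, repeated calls resolve to the same cached instance, so the model is downloaded and initialized only on the first call:
// Both calls resolve to the same cached segmenter instance.
const first = await videoBackground.getSegmenter();
const second = await videoBackground.getSegmenter();
console.log(first === second); // true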
Before we continue further, let's update our demo app. Since we are going to modify the video, we need a <canvas/> to display the modified output. Let's add that to the app.
// rest of the code...
const App = () => {
  const canvasRef = useRef();
  // rest of the code...

  return (
    <div>
      <video
        ref={videoRef}
        autoPlay
        width="640"
        height="480"
        style={{ display: "none" }}
      />
      <canvas
        ref={canvasRef}
        width="640"
        height="480"
        style={{ transform: "scaleX(-1)" }}
      />
    </div>
  );
};
Also, hide the <video>
element by setting display: "none"
since we don't
want to display the raw video.
Next, create a function within the VideoBackground
class to blur the video.
// rest of the code...
class VideoBackground {
  // rest of the code...
  #animationId;

  stop = () => {
    cancelAnimationFrame(this.#animationId);
  };

  blur = async (canvas, video) => {
    const foregroundThreshold = 0.5;
    const edgeBlurAmount = 15;
    const flipHorizontal = false;
    const blurAmount = 5;
    const segmenter = await this.getSegmenter();

    const processFrame = async () => {
      const segmentation = await segmenter.segmentPeople(video);
      await bodySegmentation.drawBokehEffect(
        canvas,
        video,
        segmentation,
        foregroundThreshold,
        blurAmount,
        edgeBlurAmount,
        flipHorizontal
      );
      this.#animationId = requestAnimationFrame(processFrame);
    };

    this.#animationId = requestAnimationFrame(processFrame);
  };
}
The blur
function takes video
and canvas
references. It uses
requestAnimationFrame
to continuously draw the resulting image onto the
canvas
. First, it creates a body segmentation using the
segmenter.segmentPeople
function by passing the video reference. This allows
us to identify which pixels belong to the background and foreground.
To achieve the blurred effect, we use the bodySegmentation.drawBokehEffect
function, which applies a blur to the background pixels. This function accepts
additional configurations like foregroundThreshold
, blurAmount
, and
edgeBlurAmount
, which we can adjust to customize the effect.
We've also added a stop
function to halt video processing by canceling the
recursive requestAnimationFrame
calls.
import React, { useRef, useEffect, useState } from "react";
function App() {
const [cameraReady, setCameraReady] = useState(false);
// rest of the code...
<video
// rest of the code...
onLoadedMetadata={() => setCameraReady(true)}
/>;
// rest of the code...
}
Before calling the blur
function, ensure the video is loaded by waiting for
the onLoadedMetadata
event to be triggered.
All set; let's blur the video background.
import React, { useRef, useEffect, useState } from "react";
import videoBackground from "./videoBackground";
function App() {
const [cameraReady, setCameraReady] = useState(false);
const videoRef = useRef(null);
const canvasRef = useRef();
useEffect(() => {
async function getVideo() {
try {
const stream = await navigator.mediaDevices.getUserMedia({
video: true,
});
if (videoRef.current) {
videoRef.current.srcObject = stream;
}
} catch (err) {
console.error("Error accessing webcam: ", err);
}
}
getVideo();
return () => {
if (videoRef.current && videoRef.current.srcObject) {
videoRef.current.srcObject.getTracks().forEach(track => track.stop());
}
};
}, []);
useEffect(() => {
if (!cameraReady) return;
videoBackground.blur(canvasRef.current, videoRef.current);
return () => {
videoBackground.stop();
};
}, [cameraReady]);
return (
<div className="App">
<video
ref={videoRef}
autoPlay
width="640"
height="480"
style={{ display: "none" }}
onLoadedMetadata={() => setCameraReady(true)}
/>
<canvas ref={canvasRef} width="640" height="480" />
</div>
);
}
export default App;
Here, we added another useEffect
that triggers when cameraReady
is true
.
Inside this useEffect
, we call the videoBackground.blur
function, passing
the canvas
and video
refs. When the component unmounts, we stop the video
processing by calling the videoBackground.stop()
function.
If we feel that just blurring is not enough and want to completely replace the
background, we need to remove the background from the video and place an
<img/>
behind the <canvas/>
. To remove the background, we can utilize the
bodySegmentation.toBinaryMask
function. This function returns an ImageData whose alpha channel is 255 for background pixels and 0 for foreground pixels. We can use this information to locate the background pixels in the original frame data and set their alpha to 0, making them fully transparent.
// rest of the code...
class VideoBackground {
  // rest of the code...
  remove = async (canvas, video) => {
    const context = canvas.getContext("2d");
    const segmenter = await this.getSegmenter();

    const processFrame = async () => {
      context.drawImage(video, 0, 0);
      const segmentation = await segmenter.segmentPeople(video);
      const binaryMask = await bodySegmentation.toBinaryMask(segmentation);
      const imageData = context.getImageData(
        0,
        0,
        video.videoWidth,
        video.videoHeight
      );
      // imageData format: [R, G, B, A, R, G, B, A, ...]
      // The loop below visits only the alpha channel of each pixel.
      for (let i = 3; i < imageData.data.length; i += 4) {
        // In the binary mask, background pixels have an alpha of 255.
        if (binaryMask.data[i] === 255) {
          imageData.data[i] = 0; // Make the background pixel fully transparent.
        }
      }
      await bodySegmentation.drawMask(canvas, imageData);
      this.#animationId = requestAnimationFrame(processFrame);
    };

    this.#animationId = requestAnimationFrame(processFrame);
  };
}
Similar to the blurring process, inside processFrame
, we first create the
segmentation using segmenter.segmentPeople
and convert it to a binary mask
using bodySegmentation.toBinaryMask
. We then obtain the original image data
with context.getImageData
. Next, we loop through the image data to make the
background pixels transparent. Finally, we draw the result on the canvas using
bodySegmentation.drawMask
.
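As an aside, since we have already edited imageData directly, the standard Canvas 2D API could paint it back as well. This is just an alternative sketch, not what the code above uses:
// Alternative inside processFrame: draw the edited pixels straight onto the
// canvas with the Canvas 2D API instead of bodySegmentation.drawMask.
context.putImageData(imageData, 0, 0);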
Before calling this function, let's modify our demo app. Rather than replacing the blur feature, we'll add an option to switch between none, blur, and image effects. We'll also include a background image.
const BACKGROUND_OPTIONS = ["none", "blur", "image"];

function App() {
  const [backgroundType, setBackgroundType] = useState(BACKGROUND_OPTIONS[0]);
  // rest of the code...

  return (
    <div>
      {/* rest of the code... */}
      {backgroundType === "image" && (
        <img
          style={{
            position: "absolute",
            top: 0,
            bottom: 0,
            width: "640px",
            height: "480px",
          }}
          src="/bgImage.png"
          alt="Virtual background"
        />
      )}
      {/* rest of the code... */}
      <div>
        <select
          value={backgroundType}
          onChange={e => setBackgroundType(e.target.value)}
        >
          {BACKGROUND_OPTIONS.map(option => (
            <option value={option} key={option}>
              {option}
            </option>
          ))}
        </select>
      </div>
    </div>
  );
}
Here, we added a <select>
element to choose between none
, blur
, and
image
, and an <img>
element to display the background image, which will
serve as our virtual background.
All set. Now, let's update the useEffect
.
useEffect(() => {
  if (!cameraReady || backgroundType === "none") return;
  const bgFn =
    backgroundType === "blur" ? videoBackground.blur : videoBackground.remove;
  bgFn(canvasRef.current, videoRef.current);

  return () => {
    videoBackground.stop();
  };
}, [cameraReady, backgroundType]);
Based on the selection, we will call either videoBackground.blur
or
videoBackground.remove
.
The full working example can be found in this GitHub repo.
If this blog was helpful, check out our full blog archive.