Introduction to PoseNet with three.js

Google released the tensorflow.js version of PoseNet a while ago.

PoseNet is essentially a pre-trained convolutional neural network (CNN) that is able to detect a human pose in any image. This paper provides the technical details on PoseNet.

This tensorflow.js version of PoseNet is actually really simple and straightforward to use if you have some experience with Javascript.

All you need to do is load PoseNet from a CDN, use the browser to access the webcam, and feed the webcam’s images into the trained PoseNet model; it will spit out the locations of your body parts - eyes, ears, and so on.

You can then use the x and y coordinates of each part of your body for anything.
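
To give a sense of what that data looks like, here is a rough sketch of the pose object you get back from PoseNet's estimateSinglePose call (which we will use later) - the values here are made up:

// Rough shape of what net.estimateSinglePose() resolves to - the numbers are illustrative
const examplePose = {
    score: 0.92,                            // overall confidence that a person was detected
    keypoints: [
        {
            part: 'nose',                   // one of 17 named body parts
            score: 0.98,                    // confidence for this particular keypoint
            position: { x: 123.4, y: 87.6 } // pixel coordinates in the input image
        },
        // ...16 more keypoints (eyes, ears, shoulders, elbows, wrists, hips, knees, ankles)
    ]
};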

I’m fairly amazed at the ease of use, and the potential applications of PoseNet.

Before this, one had to buy an Xbox Kinect, jump through some hoops to get the drivers working, and then figure out the data coming out of the Kinect. Now that I have played around with PoseNet, I can see why Microsoft discontinued the Kinect. And I can certainly think about selling off my Kinect.

Actually, all you need to do to use PoseNet is go to the tensorflow.js GitHub site and copy and paste the code of the entire demo there.

But I thought it would be good to provide an even gentler introduction to PoseNet, so I’ve simplified the main code from the official demo on the tensorflow.js GitHub site considerably. And instead of just rendering the position of each body part on an HTML canvas (which is what the official demo does), I decided to use three.js spheres to track the body parts. I figured it would be more fun to feel like you are moving a three-dimensional object through virtual space.

Before we start, take a look at the demo here.

You will have to grant access to your webcam. Don’t worry about that - as all the computation happens in your browser, I’m not saving your images to any server anywhere. In this case, what happens in your browser stays in your browser (which is another nice side benefit of doing this kind of thing directly in the browser). I’ve also blurred and de-saturated the feed from the webcam.

So first off, in our HTML file, we insert the CDN links to the necessary libraries.

<script src="https://cdnjs.cloudflare.com/ajax/libs/three.js/97/three.js"></script>
<script src="https://unpkg.com/@tensorflow/tfjs"></script>
<script src="https://unpkg.com/@tensorflow-models/posenet"></script>

Then we set up the divs for the webcam video, the canvas the video is drawn to, and the three.js canvas.

<div class='flex'>
    <div id='main' style='display:none'>
        <video id="video" playsinline style=" -moz-transform: scaleX(-1);
        -o-transform: scaleX(-1);
        -webkit-transform: scaleX(-1);
        transform: scaleX(-1);
        display: none;
        ">
        </video>
        <canvas id="output" />
    </div>  
</div>

<div id='threeContainer' class='flex'>
</div>
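
One small note: the main() function near the end of this post writes an error message into an element with the id info if the webcam cannot be accessed. That element isn't part of the markup above, so if you are following along you may want to add a minimal placeholder for it, something like:

<div id='info' style='display:none'></div>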

Now we move on to the main part - sketch.js.

We first set up the three.js scene.

// three.js setup

const width = 250;
const height = 250;

// Setup scene
const scene = new THREE.Scene();

//  We use an orthographic camera here instead of a perspective one for easy mapping
//  Bounded from 0 to width and 0 to height
// Near clipping plane of 0.1; far clipping plane of 1000
const camera = new THREE.OrthographicCamera(0,width,0,height, 0.1, 1000);
camera.position.z = 500;

// Setting up the renderer
const renderer = new THREE.WebGLRenderer( { antialias: true } );
renderer.setPixelRatio( window.devicePixelRatio );
renderer.setSize( width, height );
renderer.setClearColor( 0xDE3C4B, 1 );

// Attach the threejs animation to the div with id of threeContainer
const container = document.getElementById( 'threeContainer' );
container.appendChild( renderer.domElement );

// Scene lighting
const hemiLight     = new THREE.HemisphereLight('#EFF6EE', '#EFF6EE', 0 );
hemiLight.position.set( 0, 0, 0 );
scene.add( hemiLight );

const group = new THREE.Group();

I create a Tracker class to help me create and move spheres on the screen.

// Creating Tracker class
function Tracker(){
    this.position = new THREE.Vector3();

    const geometry = new THREE.SphereGeometry(10,7,7);
    const material = new THREE.MeshToonMaterial({ color: 0xEFF6EE, 
                                                opacity:0.5, 
                                                transparent:true, 
                                                wireframe:true, 
                                                emissive: 0xEFF6EE,
                                                emissiveIntensity:1})

    const sphere = new THREE.Mesh(geometry, material);
    group.add(sphere);

    this.initialise = function() {
        this.position.x = -10;
        this.position.y = -10;
        this.position.z = 0;
    }

    this.update = function(x,y,z){
        this.position.x = x;
        this.position.y = y;
        this.position.z = z;
    }

    this.display = function() {
        sphere.position.x = this.position.x;
        sphere.position.y = this.position.y;
        sphere.position.z = this.position.z;

        // console.log(sphere.position);
    }
}

And add them to the scene.

scene.add( group );
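
Just to make the intent of the class concrete, here is how a single tracker would be driven (a throwaway example, not part of the actual sketch):

// Illustrative usage of the Tracker class
const exampleTracker = new Tracker();  // creates a sphere and adds it to the group
exampleTracker.initialise();           // park the sphere off-screen at (-10, -10, 0)
exampleTracker.update(120, 80, 0);     // set a new position, e.g. from a PoseNet keypoint
exampleTracker.display();              // copy the position onto the three.js sphere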

Now we move on to the parts that load the PoseNet model, feed in the webcam images, and get at the positions of the body parts. I will just go into the essential parts here.

First, the functions to set up the camera and load the video.

// Load camera
async function setupCamera() {
    if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
        throw new Error(
            'Browser API navigator.mediaDevices.getUserMedia not available');
    }

    const video = document.getElementById('video');
    video.width = width;
    video.height = height;

    // isMobile() is a small helper from the official demo that checks the user agent;
    // include it (or simply pass width and height directly) if you are following along
    const mobile = isMobile();
    const stream = await navigator.mediaDevices.getUserMedia({
        'audio': false,
        'video': {
            facingMode: 'user',
            width: mobile ? undefined : width,
            height: mobile ? undefined : height,
        },
    });
    video.srcObject = stream;

    return new Promise((resolve) => {
        video.onloadedmetadata = () => {
            resolve(video);
        };
    });
}

async function loadVideo() {
    const video = await setupCamera();
    video.play();

    return video;
}

I then initialise my 17 trackers (corresponding to the 17 body parts tracked by PoseNet).

// Initialise trackers to attach to body parts recognised by posenet model

let trackers = [];

for (let i=0; i<17; i++){
    let tracker = new Tracker();
    tracker.initialise();
    tracker.display();

    trackers.push(tracker);
}
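
The keypoints always come back in the same order, so trackers[0] follows the nose, trackers[1] the left eye, and so on. For reference, the 17 part names are listed below (this array is just for reference and isn't used elsewhere in the code):

// The 17 body parts reported by PoseNet, in keypoint order
const partNames = ['nose', 'leftEye', 'rightEye', 'leftEar', 'rightEar',
                   'leftShoulder', 'rightShoulder', 'leftElbow', 'rightElbow',
                   'leftWrist', 'rightWrist', 'leftHip', 'rightHip',
                   'leftKnee', 'rightKnee', 'leftAnkle', 'rightAnkle'];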

Then we set up the main render and detect functions.

// Main animation loop
function render(video, net) {
    const canvas = document.getElementById('output');
    const ctx = canvas.getContext('2d');

    // Flip the webcam image to get it right
    const flipHorizontal = true;

    canvas.width = width;
    canvas.height = height;

    async function detect() {

    // The posenet model (net) is loaded once in main() and passed into render(),
    // so there is no need to reload it on every frame here

    // Scale factor for the input image. The smaller it is, the faster the inference
    const imageScaleFactor = 0.75;

    // Output stride. The larger it is, the coarser the output, and the faster the inference
    const outputStride = 32;

    // Store all the poses
    let poses = [];
    let minPoseConfidence;
    let minPartConfidence;

    const pose = await net.estimateSinglePose(video, 
                                                imageScaleFactor, 
                                                flipHorizontal, 
                                                outputStride);
    poses.push(pose);

    // Show a pose (i.e. a person) only if probability more than 0.1
    minPoseConfidence = 0.1;
    // Show a body part only if probability more than 0.3
    minPartConfidence = 0.3;

    ctx.clearRect(0, 0, width, height);

    const showVideo = true;

    if (showVideo) {
        ctx.save();
        ctx.scale(-1, 1);
        ctx.translate(-width, 0);
        // ctx.filter = 'blur(5px)';
        ctx.filter = 'opacity(50%) blur(3px) grayscale(100%)';
        ctx.drawImage(video, 0, 0, width, height);
        ctx.restore();
    }

    poses.forEach(({score, keypoints}) => {
        if (score >= minPoseConfidence) {
            keypoints.forEach((d, i) => {
                if (d.score > minPartConfidence) {
                    // console.log(d.part);
                    // Positions need some scaling
                    trackers[i].update(d.position.x * 0.5, d.position.y * 0.5 - height / 4, 0);
                    trackers[i].display();
                }
                // Move out of screen if body part not detected
                else if (d.score < minPartConfidence) {
                    trackers[i].update(-10, -10, 0);
                    trackers[i].display();
                }
            })
        }
    });

    renderer.render( scene, camera );
    requestAnimationFrame(detect);
    }

    detect();

}

I’ve provided comments on the key parts of the code above, so you should not have much difficulty understanding these lines. This part of the code in the official demo is quite a bit more complicated and offers more options, but I’ve rewritten it and taken out a lot of the extra lines to make it easier to understand how to load and use PoseNet.

Now we just run all the functions in the main function.

async function main() {
    // Load posenet
    const net = await posenet.load(0.75);

    document.getElementById('main').style.display = 'block';
    let video;

    try {
        video = await loadVideo();
    } catch (e) {
        let info = document.getElementById('info');
        info.textContent = 'this browser does not support video capture, ' +
            'or this device does not have a camera';
        info.style.display = 'block';
        throw e;
    }

    render(video, net);
}

// Fall back to vendor-prefixed versions of getUserMedia for older browsers
navigator.getUserMedia = navigator.getUserMedia ||
    navigator.webkitGetUserMedia || navigator.mozGetUserMedia;


main();

And that’s it. The full code is available here.

