Demo
Note: I am using a free tier option on Heroku to host my API as a service. This means the container will shut itself down when idle, so the first attempt will be slow while it starts back up.
Note also: If your face is not detected, it may be because part of your face is covered by your hair. The models used in this API rely on dlib's frontal face detector, which means that if you are not facing the camera your face may not be detected. I also restricted detection to a single face to save computation.
More important note: This will not work on mobile devices, as the JavaScript needs updating for full cross-browser support.
Load demo
Inspiration
The inspiration for this article came from MachineBox, who offer their products free for open source projects. MachineBox are really making waves with their on-premises containerisation of machine learning capabilities. This post was particularly inspired by their FaceBox product, which is incredibly powerful and simple to run. However, I wanted to add some extra features to my API, such as landmark detection and pose estimation.
You can see the whole project over on my GitHub repo
High Level Overview
Building an Environment
As usual we will be using Docker to create a reproducible environment. By doing this from the outset of development we ensure that the API is horizontally scalable: if we were to somehow get over 500 requests per minute, we could just spin up another container. We also want to keep the container as light as possible, so we start from an Alpine Docker image and build it up from there.
Note: If this were to be productionised it could be made even lighter by reducing the number of layers and removing any unnecessary libraries.
FROM python:3-alpine3.6

ENV CC="/usr/bin/clang" CXX="/usr/bin/clang++" OPENCV_VERSION="3.3.0"

RUN echo -e '@edgunity http://nl.alpinelinux.org/alpine/edge/community\n\
@edge http://nl.alpinelinux.org/alpine/edge/main\n\
@testing http://nl.alpinelinux.org/alpine/edge/testing\n\
@community http://dl-cdn.alpinelinux.org/alpine/edge/community'\
    >> /etc/apk/repositories

RUN apk add --update --no-cache \
    # --virtual .build-deps \
    build-base \
    openblas-dev \
    unzip \
    wget \
    cmake \
    # Intel® TBB, a widely used C++ template library for task parallelism
    libtbb@testing \
    libtbb-dev@testing \
    # Wrapper for libjpeg-turbo
    libjpeg \
    # accelerated baseline JPEG compression and decompression library
    libjpeg-turbo-dev \
    # Portable Network Graphics library
    libpng-dev \
    # A software-based implementation of the codec specified in the emerging JPEG-2000 Part-1 standard (development files)
    jasper-dev \
    # Provides support for the Tag Image File Format or TIFF (development files)
    # tiff-dev \
    # Libraries for working with WebP images (development files)
    # libwebp-dev \
    # A C language family front-end for LLVM (development files)
    clang-dev \
    linux-headers \
    # Additional python packages
    && pip install numpy imutils requests flask

RUN mkdir /opt && cd /opt && \
    wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip && \
    unzip ${OPENCV_VERSION}.zip && \
    rm -rf ${OPENCV_VERSION}.zip

RUN mkdir -p /opt/opencv-${OPENCV_VERSION}/build && \
    cd /opt/opencv-${OPENCV_VERSION}/build && \
    cmake \
        -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=/usr/local \
        -D WITH_FFMPEG=NO \
        -D WITH_IPP=ON \
        -D WITH_OPENEXR=NO \
        -D WITH_TBB=YES \
        -D BUILD_EXAMPLES=NO \
        -D BUILD_ANDROID_EXAMPLES=NO \
        -D INSTALL_PYTHON_EXAMPLES=NO \
        -D BUILD_DOCS=NO \
        -D BUILD_opencv_python2=NO \
        -D BUILD_opencv_python3=ON \
        -D PYTHON3_EXECUTABLE=/usr/local/bin/python \
        -D PYTHON3_INCLUDE_DIR=/usr/local/include/python3.6m/ \
        -D PYTHON3_LIBRARY=/usr/local/lib/libpython3.so \
        -D PYTHON_LIBRARY=/usr/local/lib/libpython3.so \
        -D PYTHON3_PACKAGES_PATH=/usr/local/lib/python3.6/site-packages/ \
        -D PYTHON3_NUMPY_INCLUDE_DIRS=/usr/local/lib/python3.6/site-packages/numpy/core/include/ \
        .. && \
    make VERBOSE=1 && \
    make && \
    make install && \
    rm -rf /opt/opencv-${OPENCV_VERSION}

# Making an app directory
RUN mkdir -p /app/data/models && \
    mkdir /app/src && \
    mkdir /app/data/input && \
    mkdir /app/data/output

# Facial landmark detection model
RUN cd /app/data/models && \
    wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2 && \
    bzip2 -d shape_predictor_68_face_landmarks.dat.bz2

# Installing dlib
RUN apk add --no-cache git && \
    git clone https://github.com/davisking/dlib.git && \
    cd dlib/examples && mkdir build && cd build && \
    cmake .. -DUSE_AVX_INSTRUCTIONS=ON && cmake --build . --config Release && \
    cd ../.. && python setup.py install

# Baking code into container
ADD src/*.py /app/src/

# Adding alias for the client
RUN echo 'alias client="clear && python /app/src/client.py"' >> ~/.profile && \
    source ~/.profile

RUN pip install --no-cache-dir flask-cors Flask-Uploads pytest pytest-xdist pytest-sugar

# Running our API as the entrypoint
WORKDIR /app/src
ENTRYPOINT ["python"]
CMD ["server.py"]
For convenience we can also create a docker-compose file, so the container can be started with the right configuration without the user needing any knowledge of Docker.
version: "2"
services:
  opencv:
    build: .
    image: challisa/opencv
    container_name: opencv-wordpress
    ports:
      - "8686:8686"
    volumes:
      - ./src:/app/src
    stdin_open: true
    tty: true
    environment:
      - PORT=8686
Creating a Flask API
This should be a relatively simple Flask implementation: we always expect to receive an image, and we send back a payload describing it. We will separate this out into a file called server.py, which imports the Facial Analysis object that we create in the next section.
Note we also import two other files: settings.py and helpers.py. The helper function abort_on_fail is a wrapper that sends an abort message (400 error) if the Facial Analysis fails at any point. We also pull a variable through from settings.py that controls whether the image is upsampled when running the bounding box detection.
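Neither helpers.py nor settings.py is reproduced in full in this post, so here is a rough sketch of what the wrapper and the relevant settings might look like; the names come from the code below, but the implementations are my assumption rather than the repo's exact code.

# helpers.py (sketch): turn any failure inside the facial analysis into a 400 response
from functools import wraps
from flask import abort

def abort_on_fail(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        try:
            return view(*args, **kwargs)
        except Exception as err:
            abort(400, description=str(err))
    return wrapper

# settings.py (sketch)
upscale_bb = 0  # set to 1 to upsample the image before bounding box detection
facial_landmarks_model = '/app/data/models/shape_predictor_68_face_landmarks.dat'
payload_scores_dp = 2  # decimal places used when rounding pose angles in the payload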
from flask import Flask, request, Response
from flask_cors import CORS
import os

import helpers
import settings
from oo_face import FaceAPI as FaceAPIv1

# Initialize the Flask application
app = Flask(__name__)
CORS(app, resources=r'/api/*')


@helpers.abort_on_fail
@app.route('/api/v1/face', methods=['POST'])
def check_image():
    face = FaceAPIv1(blob=request.files['image'],
                     upsample_bb=settings.upscale_bb)
    payload = face.get_payload(verbose=False)
    return Response(response=payload, status=200, mimetype="application/json")


# start flask app (tried processes=8, threaded was much better)
app.run(host="0.0.0.0", port=int(os.getenv('PORT')), debug=True, threaded=True)
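With the container running you can sanity-check the endpoint without any front end, for example with the requests library that is already installed in the image. The file name here is just a placeholder, and the port comes from the compose file above.

import requests

# Quick manual test against a locally running container
with open('selfie.jpg', 'rb') as f:
    resp = requests.post('http://localhost:8686/api/v1/face', files={'image': f})

print(resp.status_code)
print(resp.json())  # Success, Reason, BoundingBox, landmark points, pose angles, ...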
Developing a Facial Analysis Object
Since we may be handling multiple requests at a time and want to use multi-threading to improve the performance of our API, it makes sense to create an instance with attributes and methods for each of the algorithms we will be using. Therefore we will create a Facial Analysis object incorporating three main methods: bounding box estimation, facial landmark detection and pose estimation.
Note that we have some dependencies to manage and hence have to split the multi-threading into different sections: pose estimation depends on the facial landmarks, which in turn depend on the bounding box estimation.
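As an illustration of that dependency chain, here is a minimal sketch of what the main() method (called at the end of __init__ below) could look like; the repo's actual implementation may organise this differently. The multi-threading itself comes from running Flask with threaded=True, so each request gets its own FaceAPI instance.

# Sketch only: the dependency chain forces this ordering
def main(self):
    self.get_bounding_box()          # must run first
    if self.Success:
        self.get_facial_landmarks()  # needs the bounding box
        self.get_pose()              # needs the landmarks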
Before we can get to any of these algorithms we have to be able to load the image sent to our API. We will do this, along with some other convenience operations, when we initialise the object.
from time import time

import cv2

import helpers


class FaceAPI(object):
    '''This class will analyze a given photo (tested formats: jpg) and return
    a payload of information it found regarding the image supplied.

    Example:
    ```python3
    face = FaceAPI(blob=request.files['image'])
    pl = face.get_payload(verbose=True)
    ```

    Profiling: ~10ms per image
    '''

    def __init__(self, blob, upsample_bb=0):
        '''
        Set upsample_bb=1 to upsample the image during face detection in the
        bounding box method; note that it comes with a time cost
        '''
        self.start_time = time()
        self.upsample_bb = upsample_bb

        # Loads FileStorage object from flask or path
        self.blob = helpers.load_blob(blob)
        self.file_name = self.blob.filename

        # Load the image from byte string, get attributes and greyscale
        self.original_image = helpers.load_image(self.blob)
        self.grey_image = cv2.cvtColor(self.original_image, cv2.COLOR_BGR2GRAY)
        self.height, self.width, _ = self.original_image.shape

        # Initial settings for vars
        self.BoundingBoxContained = False
        self.Reason = ''

        self.main()
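load_blob and load_image live in helpers.py and are not shown in this post. A minimal sketch of load_image, assuming the upload arrives as a Flask/Werkzeug FileStorage object:

# Hypothetical helpers.py loader: decode the uploaded bytes into an OpenCV BGR image
import cv2
import numpy as np

def load_image(blob):
    data = np.frombuffer(blob.read(), dtype=np.uint8)
    return cv2.imdecode(data, cv2.IMREAD_COLOR)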
Bounding Box Estimation
Theory – see pyimagesearch
import dlib

import helpers

detector = dlib.get_frontal_face_detector()


class FaceAPI(object):
    # ... see above section for init

    def get_bounding_box(self):
        '''
        Detects all faces within the supplied image; if there is only 1 face
        detected then it will add a bounding box to the data using DLIB's
        face detector

        Example: http://dlib.net/face_detector.py.html
        '''
        rects = detector(self.grey_image, self.upsample_bb)
        self.FacesCount = len(rects)

        if self.FacesCount < 1:
            self.Reason += 'No faces detected'
        elif self.FacesCount > 1:
            self.Reason += 'Detected {} faces'.format(self.FacesCount)
        else:
            self._bounding_box = rects[0]
            self.BoundingBox = helpers.prettify_bb(rects[0])
            self.BoundingBoxContained = self.BoundingBox['Left'] > 0 and \
                self.BoundingBox['Left'] + self.BoundingBox['Width'] < self.width and \
                self.BoundingBox['Top'] > 0 and \
                self.BoundingBox['Top'] + self.BoundingBox['Height'] < self.height
            self.Reason += "Bounding box wasn't contained" if not self.BoundingBoxContained else ''

        self.Success = bool(self.FacesCount) and self.BoundingBoxContained
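prettify_bb is another small helper from helpers.py that is not shown here; given the keys used above, it presumably converts the dlib rectangle into a plain dict, something along these lines:

# Hypothetical helpers.py function: convert a dlib.rectangle into a JSON-friendly dict
def prettify_bb(rect):
    return {
        'Left': rect.left(),
        'Top': rect.top(),
        'Width': rect.width(),
        'Height': rect.height(),
    }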
Facial Landmark Detection
Theory – see learnopencv
import dlib

import helpers
import settings

predictor = dlib.shape_predictor(settings.facial_landmarks_model)


class FaceAPI(object):
    # ... see above section for init + bounding box

    def get_facial_landmarks(self):
        '''
        Adds facial landmarks to the data using DLIB's facial landmark predictor

        Example: http://dlib.net/face_landmark_detection.py.html
        '''
        self.facial_landmarks = helpers.shape_to_np(
            predictor(self.grey_image, self._bounding_box))

        self.PointChin = self.facial_landmarks[8]
        self.PointNose = self.facial_landmarks[30]
        self.PointLeftEyeLeft = self.facial_landmarks[36]
        self.PointRightEyeRight = self.facial_landmarks[45]
        self.PointMouthLeft = self.facial_landmarks[48]
        self.PointMouthRight = self.facial_landmarks[54]
        self.PointCheekLeft = self.facial_landmarks[0]
        self.PointCheekRight = self.facial_landmarks[16]
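shape_to_np is in the same spirit as the well-known imutils helper (imutils is installed in the Dockerfile): it converts dlib's full_object_detection into a (68, 2) NumPy array so individual landmarks can be indexed as above. A sketch of what it might look like:

# Hypothetical helpers.py function: dlib shape -> (num_parts, 2) array of (x, y) points
import numpy as np

def shape_to_np(shape, dtype='int'):
    coords = np.zeros((shape.num_parts, 2), dtype=dtype)
    for i in range(shape.num_parts):
        coords[i] = (shape.part(i).x, shape.part(i).y)
    return coords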
Pose Estimation
Theory – see learnopencv
import math

import cv2
import numpy as np

import settings


class FaceAPI(object):
    # ... see above section for init + facial landmarks

    def get_pose(self):
        '''
        Uses the facial landmarks to make an approximation of the person's pose;
        obviously we need to make some assumptions on the camera angle, position,
        focal length etc.

        We also use an approximated 3D facial model found from:
        (https://www.learnopencv.com/head-pose-estimation-using-opencv-and-dlib/)

        The projections found come from:
        (https://github.com/jerryhouuu/Face-Yaw-Roll-Pitch-from-Pose-Estimation-using-OpenCV)
        '''
        # 2D image points.
        image_points = np.array([
            self.PointNose,
            self.PointChin,
            self.PointLeftEyeLeft,
            self.PointRightEyeRight,
            self.PointMouthLeft,
            self.PointMouthRight
        ], dtype='double')

        # 3D model points.
        model_points = np.array([
            (0.0, 0.0, 0.0),           # Nose tip
            (0.0, -330.0, -65.0),      # Chin
            (-225.0, 170.0, -135.0),   # Left eye left corner
            (225.0, 170.0, -135.0),    # Right eye right corner
            (-150.0, -150.0, -125.0),  # Left mouth corner
            (150.0, -150.0, -125.0)    # Right mouth corner
        ])

        # Camera internals
        center = (self.width / 2, self.height / 2)
        focal_length = center[0] / np.tan(60 / 2 * np.pi / 180)
        camera_matrix = np.array(
            [[focal_length, 0, center[0]],
             [0, focal_length, center[1]],
             [0, 0, 1]], dtype='double'
        )
        dist_coeffs = np.zeros((4, 1))  # Assuming no lens distortion

        (success, rotation_vector, translation_vector) = cv2.solvePnP(
            model_points, image_points, camera_matrix, dist_coeffs,
            flags=cv2.SOLVEPNP_ITERATIVE)

        axis = np.float32([[500, 0, 0], [0, 500, 0], [0, 0, 500]])
        imgpts, jac = cv2.projectPoints(axis, rotation_vector, translation_vector,
                                        camera_matrix, dist_coeffs)
        modelpts, jac2 = cv2.projectPoints(model_points, rotation_vector,
                                           translation_vector, camera_matrix,
                                           dist_coeffs)

        rvec_matrix = cv2.Rodrigues(rotation_vector)[0]
        proj_matrix = np.hstack((rvec_matrix, translation_vector))
        eulerAngles = cv2.decomposeProjectionMatrix(proj_matrix)[6]

        pitch, yaw, roll = [math.radians(theta) for theta in eulerAngles]

        self.Roll = np.round(-math.degrees(roll), settings.payload_scores_dp)
        self.RollPFN = [int(x) for x in np.round(imgpts[0].ravel())]
        self.Pitch = np.round(math.degrees(pitch), settings.payload_scores_dp)
        self.PitchPFN = [int(x) for x in np.round(imgpts[1].ravel())]
        self.Yaw = np.round(math.degrees(yaw), settings.payload_scores_dp)
        self.YawPFN = [int(x) for x in np.round(imgpts[2].ravel())]
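The get_payload method called from server.py is not shown in this post. A plausible sketch, assuming it simply bundles the capitalised public attributes (Success, Reason, BoundingBox, the landmark points, Roll/Pitch/Yaw, ...) into a JSON string; the repo's version may differ.

# Sketch of get_payload, not the repo's exact implementation
import json
from time import time

import numpy as np

def get_payload(self, verbose=False):
    fields = {k: v for k, v in self.__dict__.items()
              if not k.startswith('_') and k[0].isupper()}
    if verbose:
        fields['TimeTaken'] = round(time() - self.start_time, 3)
    # numpy arrays and scalars are not JSON serialisable, so convert them on the way out
    return json.dumps(fields, default=lambda o: np.asarray(o).tolist())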
Designing a Front End
I’m not going to claim to have spent much time designing the front end! I started off with Dan Markov’s “Take a selfie with javascript” jsfiddle. Once that is set up we simply have to add an API call when the photo is taken. We can do this with an AJAX call: once the photo has been taken it is stored in a canvas object, which we can convert to a blob and send to our API.
Note: drawResponse is a function that draws the bounding box, pose and facial landmarks on top of our canvas.
Note: showResponse is a function that displays the response from the API using renderjson.js below the image.
Note: We also define a modal using the tingle.js library.
function sendToAPI() {
  var canvas = document.querySelector('canvas');
  var FACEAPI = 'http://0.0.0.0:8686/api/v1/face';
  canvas.toBlob(function (blob) {
    var formData = new FormData();
    formData.append('image', blob, 'webcam_' + (new Date).getTime().toString() + '.jpg');
    $.ajax(FACEAPI, {
      method: 'POST',
      data: formData,
      processData: false,
      contentType: false,
      success: function (response) {
        showResponse(response);
        if (response.Success) {
          drawResponse(response, canvas);
        } else {
          modal.setContent(response.Reason);
          modal.open();
        }
      },
      error: function (msg) {
        console.log(msg);
      }
    });
  });
}