Do you know Cozmo? The friendly robot from Anki? Well...here he is...
Cozmo is a programmable robot that has many features...and one of those is a camera...so you can have Cozmo take a picture of something...and then do something with that picture...
To code for Cozmo you need to use Python...actually...Python 3 -;)
For this blog, we're going to need a couple of things...so let's install them...
```
pip3 install 'cozmo[camera]'
```
This will install the Cozmo SDK...and you will need to install the Cozmo app on your phone as well...
If you already have the SDK installed, you may want to upgrade it, because it might not work if you don't have the latest version...
```
pip3 install --upgrade cozmo
```
Now, we need a couple of extra things...
```
sudo apt-get install python-pygame
pip3 install pillow
pip3 install numpy
```
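pygame is a games framework.
pillow is a maintained fork of PIL (the Python Imaging Library) and it's used to manage images.
numpy provides fast multi-dimensional arrays and numerical functions for Python...and OpenCV's Python bindings use numpy arrays to represent images.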
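Before moving on, you can double-check that these packages are actually visible to Python 3. This is a minimal sketch using only the standard library; it assumes the usual import names (`cozmo`, `pygame`, `PIL` for pillow, `numpy`), and `missing_modules` is just a helper name made up for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that Python 3 cannot find (top-level names only)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages installed above (pillow is imported as PIL)
    required = ["cozmo", "pygame", "PIL", "numpy"]
    missing = missing_modules(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All set!")
```

Once OpenCV is installed later on, you can add `"cv2"` to the list.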
That was the easy part...now we need to install OpenCV...which allows us to manipulate images and video...
This one is a little bit tricky, so if you get stuck...search on Google or just drop me a message...
First, make sure that OpenCV is not installed by removing it...unless you are sure it's working properly for you...
```
sudo apt-get remove libopencv-dev
```
Then, install the following prerequisites...
```
sudo apt-get install build-essential cmake pkg-config yasm python-numpy
sudo apt-get install libjpeg-dev libjpeg8-dev libtiff5-dev libjasper-dev libpng12-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libdc1394-22-dev
sudo apt-get install libxvidcore-dev libx264-dev libxine-dev libfaac-dev
sudo apt-get install libgtk-3-dev libtbb-dev libqt4-dev libmp3lame-dev
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libxvidcore-dev x264 v4l-utils
```
If by any chance, something is not available on your system, simply remove it from the list and try again...unless you're like me and want to spend hours trying to get everything...
Now, we need to download the OpenCV source code so we can build it...from the source...
```
wget -O opencv-3.4.0.zip https://github.com/opencv/opencv/archive/3.4.0.zip
unzip opencv-3.4.0.zip   # This should produce the folder opencv-3.4.0
```
Then, we need to download the contrib modules, because some things are not bundled with OpenCV by default...and you might need them for other projects too...
```
wget -O opencv_contrib-3.4.0.zip https://github.com/opencv/opencv_contrib/archive/3.4.0.zip
unzip opencv_contrib-3.4.0.zip   # This should produce the folder opencv_contrib-3.4.0
```
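If you're not sure which folder an archive will produce (and therefore which path you'll pass to cmake later), you can peek at its top-level entries before unzipping. A small sketch with the standard `zipfile` module; `top_level_dirs` is a hypothetical helper name, and the file name in the usage line is just the one from the commands above.

```python
import zipfile


def top_level_dirs(zip_file):
    """Return the sorted top-level folder names inside a zip archive (path or file object)."""
    with zipfile.ZipFile(zip_file) as archive:
        # Entries look like "opencv_contrib-3.4.0/modules/..."; keep the first path part
        return sorted({name.split("/", 1)[0] for name in archive.namelist() if "/" in name})


# Usage: print(top_level_dirs("opencv_contrib-3.4.0.zip"))
```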
As we have both folders, we can start compiling...
```
cd opencv-3.4.0
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D INSTALL_PYTHON_EXAMPLES=OFF \
      -D CMAKE_CXX_COMPILER=/usr/bin/g++ \
      -D INSTALL_C_EXAMPLES=OFF \
      -D OPENCV_EXTRA_MODULES_PATH=/YourPath/opencv_contrib-3.4.0/modules \
      -D PYTHON_EXECUTABLE=/usr/bin/python3.6 \
      -D WITH_FFMPEG=OFF \
      -D BUILD_OPENCV_TS=OFF \
      -D WITH_LIBV4L=OFF \
      -D WITH_CUDA=OFF \
      -D WITH_V4L=ON \
      -D WITH_QT=ON \
      -D WITH_LAPACK=OFF \
      -D WITH_OPENCV_BIOINSPIRED=OFF \
      -D WITH_XFEATURES2D=ON \
      -D WITH_OPENCL=OFF \
      -D WITH_FACE=ON \
      -D ENABLE_PRECOMPILED_HEADERS=ON \
      -D WITH_OPENCL_SVM=OFF \
      -D WITH_OPENCLAMDFFT=OFF \
      -D WITH_OPENCLAMDBLAS=OFF \
      -D WITH_OPENCV_DNN=OFF \
      -D BUILD_OPENCV_APPS=ON \
      -D BUILD_EXAMPLES=OFF ..
```
Pay extra attention to passing the correct path to your opencv_contrib folder...it's better to use the full path to avoid mistakes...
And yes...that's a pretty long command for a build...and it took me a long time to make it work...as you need to figure out all the parameters...
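Since the command is so long (and it's easy to repeat or mistype a flag), one option is to generate it. This is a hypothetical helper, not part of the original build: it keeps the options in a dict so each one appears exactly once, and it resolves the opencv_contrib path to an absolute path, failing early if the `modules` folder isn't there. Only a few of the options from the command above are shown.

```python
import shlex
from pathlib import Path


def cmake_command(contrib_dir, options):
    """Build the cmake argument list; each -D option appears once since dict keys are unique."""
    modules = Path(contrib_dir).expanduser().resolve() / "modules"
    if not modules.is_dir():
        raise FileNotFoundError("No modules folder in {}".format(modules.parent))
    args = ["cmake"]
    for key, value in {**options, "OPENCV_EXTRA_MODULES_PATH": str(modules)}.items():
        args += ["-D", "{}={}".format(key, value)]
    args.append("..")  # the OpenCV source folder, as seen from opencv-3.4.0/build
    return args


def as_shell(args):
    """Join the arguments into one shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in args)


options = {
    "CMAKE_BUILD_TYPE": "RELEASE",
    "CMAKE_INSTALL_PREFIX": "/usr/local",
    "PYTHON_EXECUTABLE": "/usr/bin/python3.6",
    "WITH_CUDA": "OFF",
    "WITH_OPENCL": "OFF",
}
# Usage: print(as_shell(cmake_command("~/opencv_contrib-3.4.0", options)))
```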
Once cmake is done, we need to run make...as cmake only prepares the recipe...
```
make -j2
```
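The `-j2` tells make to run two compile jobs in parallel. If your machine has more cores, a higher number usually speeds up the build (at the cost of more memory). A tiny sketch that suggests a value from the core count; "leave one core free" is just an assumption of this example, not a requirement.

```python
import os


def make_jobs(reserve=1):
    """Suggest a -j value: number of CPU cores minus `reserve`, but at least 1."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - reserve)


print("make -j{}".format(make_jobs()))
```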
If the build fails, simply clean it and run make again...
```
make clean
make
```
Then, we can finally install OpenCV by doing this...
```
sudo make install
sudo ldconfig
```
To test that it's working properly...simply do this...
```
python3
>>> import cv2
```
If you don't have any errors...then we're good to go -;)
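Since we built with the contrib modules, it's also worth checking that they made it into the Python bindings. A minimal sketch, assuming a successful contrib build exposes submodules such as `cv2.xfeatures2d` and `cv2.face` (the two contrib features the cmake command tries to enable); `missing_contrib` is a hypothetical helper that takes the module as a parameter, so it can be tried even without OpenCV installed.

```python
def missing_contrib(cv2_module, names=("xfeatures2d", "face")):
    """Return the contrib submodules that are not available on the given cv2 module."""
    return [name for name in names if not hasattr(cv2_module, name)]


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable: check the install steps above")
    else:
        print("OpenCV", cv2.__version__, "missing contrib:", missing_contrib(cv2))
```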
That was quite a lot of work...anyway...we need an extra tool to make sure our image gets nicely processed...
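Download Fred Weinhaus' textcleaner script (it relies on ImageMagick), put it in the same folder as your Python script and make it executable with chmod +x textcleaner, as our script calls it as ./textcleaner...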
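The CozmoOCR.py script further down calls textcleaner through `os.system` with a single command string, which silently ignores failures. An alternative sketch uses `subprocess` with an argument list, so a missing script or a failed run raises an error; the flags are copied from the script's own command, while the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def textcleaner_args(src, dst, script="./textcleaner"):
    """Argument list matching the textcleaner call used in CozmoOCR.py."""
    return [script, "-g", "-e", "normalize", "-o", "12", "-t", "5", "-u", str(src), str(dst)]


def run_textcleaner(src, dst, script="./textcleaner"):
    """Run textcleaner; raises CalledProcessError if it exits with an error."""
    if not Path(script).is_file():
        raise FileNotFoundError("{} not found: put it next to your Python script".format(script))
    subprocess.run(textcleaner_args(src, dst, script), check=True)


# Usage: run_textcleaner("imggray.png", "out.png")
```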
And...just in case you're wondering...yes...we're going to have Cozmo take a picture...we're going to process it...use SAP Leonardo's OCR API and then have Cozmo read it back to us...cool, huh?
SAP Leonardo's OCR API is still on version 2Alpha1...but regardless of that...it works amazingly well -;)
Keep in mind that if the results are not always accurate, that's because of the lighting, the position of the image, your handwriting and the fact that the OCR API is still in Alpha...
Ok...so first things first...we need a whiteboard...
And yes...my handwriting is far from being good... -:(
Now, let's jump into the source code...
CozmoOCR.py
```python
import cozmo
from cozmo.util import degrees
import PIL
import cv2
import numpy as np
import os
import requests
import json
import re
import time
import pygame
import _thread


def input_thread(L):
    # Wait for "Enter" in the terminal, then signal the main loop
    input()
    L.append(None)


def process_image(image_name):
    image = cv2.imread(image_name)
    img = cv2.resize(image, (600, 600))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Blur to remove noise and reduce detail, then denoise
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    denoise = cv2.fastNlMeansDenoising(blur)
    # Separate the dark writing from the white board
    thresh = cv2.adaptiveThreshold(denoise, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)
    blur1 = cv2.GaussianBlur(thresh, (5, 5), 0)
    dst = cv2.GaussianBlur(blur1, (5, 5), 0)
    cv2.imwrite('imggray.png', dst)
    # textcleaner must be in the same folder as this script
    cmd = './textcleaner -g -e normalize -o 12 -t 5 -u imggray.png out.png'
    os.system(cmd)


def ocr():
    url = "https://sandbox.api.sap.com/ml/ocr/ocr"
    img_path = "out.png"
    headers = {
        'APIKey': "APIKey",  # replace with your own API key
        'Accept': "application/json",
    }
    # "with" closes the image file once the request has been sent
    with open(img_path, 'rb') as image_file:
        files = {'files': image_file}
        response = requests.post(url, files=files, headers=headers)
    json_response = json.loads(response.text)
    json_text = json_response['predictions'][0]
    # Fix characters that tend to get misrecognized in my handwriting
    json_text = re.sub('\n', ' ', json_text)
    json_text = re.sub('3', 'z', json_text)
    json_text = re.sub('0|O', 'o', json_text)
    return json_text


def cozmo_program(robot: cozmo.robot.Robot):
    robot.camera.color_image_enabled = False
    L = []
    _thread.start_new_thread(input_thread, (L,))
    robot.set_head_angle(degrees(20.0)).wait_for_completed()
    while True:
        if L:
            filename = "Message.png"
            latest_image = robot.world.latest_image.raw_image
            latest_image.convert('L').save(filename)
            robot.say_text("Picture taken!").wait_for_completed()
            process_image(filename)
            message = ocr()
            print(message)
            robot.say_text(message, use_cozmo_voice=True,
                           duration_scalar=0.5).wait_for_completed()
            break
        time.sleep(0.1)  # avoid busy-waiting while we wait for "Enter"


pygame.init()
cozmo.run_program(cozmo_program, use_viewer=True, force_viewer_on_top=True)
```
Let's analyze the code a little bit...
We're going to use a thread, as we need to have a window where we can see what Cozmo is looking at, while another thread waits for us to press "Enter" as the command to have Cozmo take a picture.
Basically, when we run the application, Cozmo will move his head and get into picture mode...then, if we press "Enter" (in the terminal), he will take a picture and send it to our OpenCV processing function.
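The script uses the low-level `_thread` module and a shared list as a flag. Here's a sketch of the same idea with `threading.Event`, which also lets the main loop wait without spinning. It's an alternative, not the original code: `start_enter_listener` is a hypothetical name, the input function is a parameter only so the pattern can be tried without a keyboard, and it assumes (as the original loop already does) that blocking inside `cozmo_program` is fine.

```python
import threading


def start_enter_listener(read_line=input):
    """Start a daemon thread that sets the returned Event once a line has been read."""
    pressed = threading.Event()

    def listen():
        read_line()  # blocks until the user presses Enter
        pressed.set()

    threading.Thread(target=listen, daemon=True).start()
    return pressed


# Inside cozmo_program, instead of polling the list L:
# pressed = start_enter_listener()
# pressed.wait()  # sleeps until Enter is pressed, then take the picture
```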
This function will simply grab the image, scale it, make it grayscale and apply a GaussianBlur to remove noise and reduce detail. Then we apply denoising to get rid of dust and fireflies...apply an adaptive threshold to separate the white and black pixels, and apply a couple more blurs...
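To make the threshold step a bit more concrete: an adaptive threshold compares each pixel with a threshold computed from its own neighbourhood instead of one global cutoff, which helps with uneven lighting on the whiteboard. The script uses `cv2.ADAPTIVE_THRESH_GAUSSIAN_C` (a Gaussian-weighted neighbourhood); this pure-Python sketch swaps in a plain mean (the idea behind `ADAPTIVE_THRESH_MEAN_C`) to keep it short, with the script's block size (11) and constant (2) as defaults. It's for illustration only, not a replacement for the OpenCV call.

```python
def adaptive_threshold_mean(img, block_size=11, c=2, max_value=255):
    """Binarize a grayscale image (list of rows of 0-255 ints) with a local mean threshold.

    A pixel becomes max_value if it is brighter than (mean of its block_size x block_size
    neighbourhood) - c, otherwise 0. Pixels outside the image repeat the nearest edge pixel.
    """
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be odd and >= 3")
    h, w = len(img), len(img[0])
    r = block_size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)  # clamp to the image edge
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            threshold = total / (block_size * block_size) - c
            row.append(max_value if img[y][x] > threshold else 0)
        out.append(row)
    return out
```

A uniform patch comes out all white, while a dark stroke surrounded by bright board comes out black, which is exactly what we want for the writing.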
Finally, we call textcleaner to further remove noise and make the image cleaner...
So, here is the original picture taken by Cozmo...
This is the picture after our OpenCV post-processing...
And finally, this is our image after using textcleaner...
Finally, once we have the image the way we wanted, we can call the OCR API which is pretty straightforward...
To get the API Key, simply go to https://api.sap.com/api/ocr_api/overview and log in...
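In the script, the key goes straight into the headers (replace the "APIKey" placeholder with yours). If you'd rather not hard-code it, here's a small sketch that reads it from an environment variable. `SAP_API_KEY` and `ocr_headers` are hypothetical names made up for this example; the header names are the ones the script already sends.

```python
import os


def ocr_headers(env_var="SAP_API_KEY"):
    """Build the headers used by ocr(), reading the API key from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError("Set {} to your SAP API key first".format(env_var))
    return {"APIKey": api_key, "Accept": "application/json"}


# Usage: export SAP_API_KEY=... in the terminal, then headers = ocr_headers()
```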
Once we have the response back from the API, we can do some Regular Expression cleanup to fix characters that tend to get misrecognized (like a "3" that should be a "z")...
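The cleanup inside `ocr()` can be pulled out into its own function, which makes it easy to try on sample strings. This sketch performs the same three substitutions as the script (newlines to spaces, "3" to "z", "0"/"O" to "o"), which are tuned to my handwriting rather than general rules. It assumes the response body has the shape the script reads: a "predictions" list whose first item is the recognized text. The function names are hypothetical.

```python
import json
import re


def clean_prediction(text):
    """Apply the same character fixes as ocr() in CozmoOCR.py."""
    text = re.sub("\n", " ", text)
    text = re.sub("3", "z", text)
    return re.sub("0|O", "o", text)


def message_from_response(body):
    """Extract and clean the first prediction from the OCR API's JSON body."""
    predictions = json.loads(body).get("predictions") or []
    if not predictions:
        raise ValueError("The OCR API returned no predictions")
    return clean_prediction(predictions[0])
```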
Finally, we can have Cozmo read the message out loud -;) And just for demonstration purposes...
Here, I was lucky enough that the lighting and everything was perfectly set up...so it was a pretty clean response...further tests were pretty bad -:( But again...it's important to have good lighting...
Of course...you want to see a video of the process in action, right? Well...funny enough...my first try was perfect! Even better than this one...but I didn't shoot the video -:( Further tries were pretty crappy until I could get something acceptable...and this is what you're going to watch now...the sun coming through the window didn't help me...but it's pretty good anyway...
Hope you liked this blog -:)
Greetings,
Blag.
SAP Labs Network.