Monday, July 30, 2018

Cozmo, read to me


Do you know Cozmo? The friendly robot from Anki? Well...here he is...

Cozmo is a programmable robot that has many features...and one of those is a camera...so you can have Cozmo take a picture of something...and then do something with that picture...

To code for Cozmo you need to use Python...actually...Python 3 -;)

For this blog, we're going to need a couple of things...so let's install them...

pip3 install 'cozmo[camera]'

This will install the Cozmo SDK...and you will need to install the Cozmo app on your phone as well...

If you have the SDK installed already, you may want to upgrade it because if you don't have the latest version it might not work...

pip3 install --upgrade cozmo

Now, we need a couple of extra things...

pip3 install pygame
pip3 install pillow
pip3 install numpy

pygame is a games framework.
pillow is a fork of the PIL library and it's used to manage images.
numpy allows us to work with numerical arrays in Python.

That was the easy part...as now we need to install OpenCV...which allows us to manipulate images and video...

This one is a little bit tricky, so if you get stuck...search on Google or just drop me a message...

First, make sure that OpenCV is not installed by removing it...unless you are sure it's working properly for you...

sudo apt-get remove libopencv-dev python-opencv

Then, install the following prerequisites...

sudo apt-get install build-essential cmake pkg-config yasm python-numpy

sudo apt-get install libjpeg-dev libjpeg8-dev libtiff5-dev libjasper-dev libpng12-dev

sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libdc1394-22-dev

sudo apt-get install libxvidcore-dev libx264-dev libxine-dev libfaac-dev

sudo apt-get install libgtk-3-dev libtbb-dev libqt4-dev libmp3lame-dev

sudo apt-get install libatlas-base-dev gfortran

sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libxvidcore-dev x264 v4l-utils

If by any chance, something is not available on your system, simply remove it from the list and try again...unless you're like me and want to spend hours trying to get everything...

Now, we need to download the OpenCV source code so we can build it...from the source...

wget -O opencv-3.4.0.zip https://github.com/opencv/opencv/archive/3.4.0.zip
unzip opencv-3.4.0.zip # This should produce the folder opencv-3.4.0

Then, we need to download the contributions because there are some things not bundled in OpenCV by default...and you might need them for any other project...

wget -O opencv_contrib-3.4.0.zip https://github.com/opencv/opencv_contrib/archive/3.4.0.zip
unzip opencv_contrib-3.4.0.zip # This should produce the folder opencv_contrib-3.4.0

As we have both folders, we can start compiling...

cd opencv-3.4.0
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D CMAKE_CXX_COMPILER=/usr/bin/g++ \
-D INSTALL_C_EXAMPLES=OFF \
-D OPENCV_EXTRA_MODULES_PATH=/YourPath/opencv_contrib-3.4.0/modules \
-D PYTHON_EXECUTABLE=/usr/bin/python3.6 \
-D WITH_FFMPEG=OFF \
-D BUILD_OPENCV_TS=OFF \
-D WITH_LIBV4L=OFF \
-D WITH_CUDA=OFF \
-D WITH_V4L=ON \
-D WITH_QT=ON \
-D WITH_LAPACK=OFF \
-D WITH_OPENCV_BIOINSPIRED=OFF \
-D WITH_XFEATURES2D=ON \
-D WITH_OPENCL=OFF \
-D WITH_FACE=ON \
-D ENABLE_PRECOMPILED_HEADERS=ON \
-D WITH_OPENCL_SVM=OFF \
-D WITH_OPENCLAMDFFT=OFF \
-D WITH_OPENCLAMDBLAS=OFF \
-D WITH_OPENCV_DNN=OFF \
-D BUILD_OPENCV_APPS=ON \
-D BUILD_EXAMPLES=OFF ..

Pay extra attention to the path of your opencv_contrib folder (replace /YourPath with your own)...it's better to pass the full path to avoid errors...

And yes...that's a pretty long command for a build...and it took me a long time to make it work...as you need to figure out all the parameters...

Once cmake is done, we need to run make...as cmake only prepares the recipe...

make -j2

If the build fails, simply clean up and try again...

make clean
make

Then, we can finally install OpenCV by doing this...

sudo make install
sudo ldconfig

To test that it's working properly...simply do this...

python3
>>> import cv2


If you don't have any errors...then we're good to go -;)
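
Since the script further down imports a few more packages besides cv2, a quick check like this one can tell you which ones are still missing. It's only a minimal sketch: the helper name missing_modules is made up for this example, and the list is simply taken from the imports used in CozmoOCR.py.

```python
import importlib.util

# Modules imported by CozmoOCR.py (PIL is the import name of pillow)
REQUIRED = ["cozmo", "PIL", "cv2", "numpy", "requests", "pygame"]

def missing_modules(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All set!")
```

Keep in mind that this only checks that the packages can be found, not that they actually work...so the import test above is still worth doing.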

That was quite a lot of work...anyway...we need an extra tool to make sure our image gets nicely processed...

Download textcleaner, put it in the same folder as your Python script and make it executable (chmod +x textcleaner)...

And...just in case you're wondering...yes...we're going to have Cozmo take a picture...we're going to process it...use SAP Leonardo's OCR API and then have Cozmo read it back to us...cool, huh?
SAP Leonardo's OCR API is still on version 2Alpha1...but regardless of that...it works amazingly well -;)

Although keep in mind that the result is not always accurate...that's because of the lighting, the position of the image, your handwriting and the fact that the OCR API is still in Alpha...

Ok...so first things first...we need a white board...


And yes...my handwriting is far from being good... -:(

Now, let's jump into the source code...


CozmoOCR.py
import cozmo
from cozmo.util import degrees
import PIL
import cv2
import numpy as np
import os
import requests
import json
import re
import time
import pygame
import _thread

# Waits for "Enter" on the terminal, then flags the main loop through the shared list
def input_thread(L):
    input()
    L.append(None)

def process_image(image_name):
    image = cv2.imread(image_name)

    # Scale to a fixed size and convert to grayscale
    img = cv2.resize(image, (600, 600))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Blur, denoise and threshold to separate the writing from the board
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    denoise = cv2.fastNlMeansDenoising(blur)
    thresh = cv2.adaptiveThreshold(denoise, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)
    blur1 = cv2.GaussianBlur(thresh, (5, 5), 0)
    dst = cv2.GaussianBlur(blur1, (5, 5), 0)

    cv2.imwrite('imggray.png', dst)

    # textcleaner must sit in the same folder as this script
    cmd = './textcleaner -g -e normalize -o 12 -t 5 -u imggray.png out.png'
    os.system(cmd)

def ocr():
    url = "https://sandbox.api.sap.com/ml/ocr/ocr"
    img_path = "out.png"

    headers = {
        'APIKey': "APIKey",  # replace with your own API Key
        'Accept': "application/json",
    }

    # Open the image in a with block so the file gets closed after the upload
    with open(img_path, 'rb') as img_file:
        response = requests.post(url, files={'files': img_file}, headers=headers)

    json_response = json.loads(response.text)
    json_text = json_response['predictions'][0]
    # Fix characters that are often recognized wrongly
    json_text = re.sub('\n', ' ', json_text)
    json_text = re.sub('3', 'z', json_text)
    json_text = re.sub('0|O', 'o', json_text)
    return json_text

def cozmo_program(robot: cozmo.robot.Robot):
    robot.camera.color_image_enabled = False
    L = []
    # Wait for "Enter" in the background so the viewer keeps running
    _thread.start_new_thread(input_thread, (L,))
    robot.set_head_angle(degrees(20.0)).wait_for_completed()
    while True:
        if L:
            filename = "Message.png"
            # Grab the latest camera frame and save it in grayscale
            latest_image = robot.world.latest_image.raw_image
            latest_image.convert('L').save(filename)
            robot.say_text("Picture taken!").wait_for_completed()
            process_image(filename)
            message = ocr()
            print(message)
            robot.say_text(message, use_cozmo_voice=True,
                           duration_scalar=0.5).wait_for_completed()
            break
        # Sleep a little so we don't burn CPU while waiting
        time.sleep(0.1)

pygame.init()
cozmo.run_program(cozmo_program, use_viewer=True, force_viewer_on_top=True)


Let's analyze the code a little bit...

We're going to use threads, as we need a window where we can see what Cozmo is looking at while a second thread waits for us to press "Enter", which is the command to have Cozmo take a picture.

Basically, when we run the application, Cozmo will move his head and get into picture mode...then, if we press "Enter" (On the terminal screen) it will take a picture and then send it to our OpenCV processing function.
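
As a side note, the same "wait for Enter in the background" idea can also be written with the higher-level threading module. The following is just a sketch of an alternative, not what the script uses: start_enter_listener and its read_line parameter are names made up for this example.

```python
import threading

def start_enter_listener(read_line=input):
    """Start a background thread that sets an Event once a line has been read."""
    pressed = threading.Event()

    def wait_for_enter():
        read_line()    # blocks until the user presses Enter
        pressed.set()  # tell the main loop it is time to take the picture

    # daemon=True so a pending input() never keeps the program alive on exit
    threading.Thread(target=wait_for_enter, daemon=True).start()
    return pressed
```

Inside the Cozmo program you would call pressed = start_enter_listener() before moving the head and then pressed.wait() instead of looping over a list...the Event takes care of the waiting.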

This function will simply grab the image, scale it, convert it to grayscale and apply a GaussianBlur to remove noise and reduce detail. Then we apply denoising to get rid of dust and fireflies...apply a threshold to separate the white and black pixels, and apply a couple more blurs...
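
To get a feeling for what the threshold step does, here is a tiny pure Python sketch of adaptive thresholding. Please note the assumptions: it uses a plain mean of the neighbourhood instead of the Gaussian-weighted one the script gets from cv2.ADAPTIVE_THRESH_GAUSSIAN_C, it simply clips the window at the borders, and the function name and the 3x3 sample "board" are made up for illustration.

```python
def adaptive_threshold(gray, block=3, c=2, max_value=255):
    """Mean-based adaptive threshold on a list-of-lists grayscale image."""
    height, width = len(gray), len(gray[0])
    half = block // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood around (x, y), clipped at the image borders
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_threshold = sum(window) / len(window) - c
            # Brighter than its surroundings -> white, otherwise -> black
            row.append(max_value if gray[y][x] > local_threshold else 0)
        result.append(row)
    return result

# A bright board with one dark "ink" pixel in the middle
board = [
    [200, 200, 200],
    [200,  50, 200],
    [200, 200, 200],
]
print(adaptive_threshold(board))
```

Because every pixel is compared against its own neighbourhood and not against one global value, uneven lighting on the white board hurts a lot less.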

Finally we're going to call textcleaner to further remove noise and make the image cleaner...
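
The script calls textcleaner through os.system, which will not tell us if something goes wrong. A possible alternative, shown here only as a sketch, is subprocess.run with check=True. The helper names build_textcleaner_cmd and run_textcleaner are made up for this example; the flags are the same ones used in process_image.

```python
import subprocess

def build_textcleaner_cmd(src, dst, script="./textcleaner"):
    """Build the same textcleaner call used in process_image as an argument list."""
    return [script, "-g", "-e", "normalize", "-o", "12", "-t", "5", "-u", src, dst]

def run_textcleaner(src="imggray.png", dst="out.png"):
    # check=True raises CalledProcessError if textcleaner exits with an error
    subprocess.run(build_textcleaner_cmd(src, dst), check=True)
```

Passing the arguments as a list also means file names with spaces won't break the command.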

So, here is the original picture taken by Cozmo...


This is the picture after our OpenCV post-processing...


And finally, this is our image after using textcleaner...

Finally, once we have the image the way we wanted, we can call the OCR API which is pretty straightforward...

To get the API Key, simply go to https://api.sap.com/api/ocr_api/overview and log in...
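
If you would rather not paste your API Key into the source code, one option is to read it from an environment variable. This is only a sketch: the variable name SAP_API_KEY and the helper build_headers are assumptions made for this example, while the header names are the ones used in the ocr() function.

```python
import os

def build_headers(env_var="SAP_API_KEY"):
    """Build the request headers, reading the API Key from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError("Please set the " + env_var + " environment variable first")
    return {
        "APIKey": api_key,
        "Accept": "application/json",
    }
```

Then it's just a matter of doing export SAP_API_KEY=... on the terminal before running the script and using headers = build_headers() inside ocr().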

Once we have the response back from the API, we can do some Regular Expressions cleanup just to make sure some characters don't get wrongly recognized...
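
Here is the cleanup on its own, so you can try it without the robot. It's a small sketch that assumes the response looks the way the ocr() function expects it (a "predictions" list holding the text); the helper names and the sample payload are made up for illustration.

```python
import re

def extract_text(payload):
    """Pull the first prediction out of the parsed JSON, or '' if there is none."""
    predictions = payload.get("predictions") or []
    return predictions[0] if predictions else ""

def clean_text(text):
    """Apply the same substitutions as the ocr() function."""
    text = re.sub("\n", " ", text)   # line breaks become spaces
    text = re.sub("3", "z", text)    # a handwritten z is often read as 3
    text = re.sub("0|O", "o", text)  # zeros and capital O become lowercase o
    return text

sample = {"predictions": ["HELL0\nC03MO"]}  # made-up payload for illustration
print(clean_text(extract_text(sample)))
```

Of course, these replacements are pretty blunt...every 3 and every 0 gets replaced, so they only make sense as long as the message has no numbers in it.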

Finally, we can have Cozmo read the message out loud -;) And just for demonstration purposes...


Here, I was lucky enough that the lighting and everything was perfectly set up...so it was a pretty clean response...further tests were pretty bad -:( But again...it's important to have good lighting...

Of course...you want to see a video of the process in action, right? Well...funny enough...my first try was perfect! Even better than this one...but I didn't shoot the video -:( Further tries were pretty crappy until I could get something acceptable...and this is what you're going to watch now...the sun coming through the window didn't help me...but it's pretty good anyway...


Hope you liked this blog -:)

Greetings,

Blag.
SAP Labs Network.