Saturday, December 15, 2018

Hey Vector, who do I look like?


I have played with Cozmo in the past, so when Vector came out...I knew I needed to do something with it ;)

So...what’s Vector?


Pretty much a black Cozmo? Well...yes and no :) Vector has a better processor with 4 cores, a microphone, almost double the number of parts, a better camera and a color display.

As you know...I’m a really big fan of SAP Leonardo Machine Learning APIs...as they allow you to easily consume Machine Learning services.

For this blog I wanted to do something that I have always liked...take a picture of someone and then compare it with photos of famous actors and actresses and see who this person resembles the most ;)

So, let’s start :D

Installing the Vector SDK

Make sure that Vector is connected to the Internet by using the Vector app on iPhone or Android. Here’s a nice video on how to do that.

Once your Vector is connected to the Internet...make sure to simply kill the Vector app on your phone.

The Vector SDK was only available to the people who backed Anki on their Kickstarter campaign.
...but since November 11th, the SDK is on Public Alpha! :D Which means...you can finally get your hands on it ;)

If by any chance you got the SDK installed before...remove it before moving forward…

python3 -m pip uninstall anki_vector

Then simply install it by doing this…

python3 -m pip install --user anki_vector

Then, you need to authenticate your Vector…

python3 -m anki_vector.configure

You will be asked for Vector’s name, IP address and serial number. You will also be asked for your Anki Cloud credentials.

To get this information, simply put Vector on his charger...and press his top twice. This will give you his name. Then lift his handle up and down in order to get the IP address. The serial number is on Vector’s bottom.

The Learning Phase


First things first...we need a bunch of pictures of famous people...for that I relied on The Movie DB website...


I went and downloaded, almost at random, 100 images of both men and women. I didn’t go into each person’s page but rather saved the “thumbnails”.

Now, there’s an SAP Leonardo API called “Inference Service for Face Feature Extraction” which basically grabs an image, determines whether there’s a face or not, and then extracts its features...like the color of the eyes, the shape of the mouth, the hair and so on...and that information is returned in a nice, although pretty much impossible to decipher, vector of features. I mean...they look just like numbers...and they could mean anything :P
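Just to give you an idea of what comes back...here’s a tiny Python sketch that parses a made-up response with the same shape the API returns. The id and the numbers are invented for illustration...the real feature vector is way longer...

```python
import json

# A hypothetical, heavily trimmed response from the Face Feature
# Extraction API; the real face_feature list has many more numbers.
sample = '''
{
  "id": "abc123",
  "predictions": [
    { "faces": [ { "face_feature": [0.12, -0.58, 1.07] } ] }
  ]
}
'''

data = json.loads(sample)
# Drill down to the feature vector of the first detected face.
features = data["predictions"][0]["faces"][0]["face_feature"]
print(features)  # just numbers...they could mean anything :P
```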

Anyway...I created a folder called “People” and dropped all 100 images there. So, the next step is of course to get the features for all the images...and doing it manually is obviously not only hard but pointless...it’s way better to automate the process ;)

One programming language that I have grown to love is Crystal...fast as C, slick as Ruby...yep, pretty much a better way of doing Ruby :)

Installation is pretty easy and you can find instructions here but I’m using Ubuntu on VMWare, so here are the instructions for it…

On a terminal window copy and paste this…

curl -sSL https://dist.crystal-lang.org/apt/setup.sh | sudo bash

Then simply do this…

sudo apt-get update

sudo apt install crystal

Installation of the following modules is optional but recommended…

sudo apt install libssl-dev      # for using OpenSSL
sudo apt install libxml2-dev     # for using XML
sudo apt install libyaml-dev     # for using YAML
sudo apt install libgmp-dev      # for using Big numbers
sudo apt install libreadline-dev # for using Readline

Once we’re done...it’s time to write the application…first create a folder called “Features”.

Call your script PeopleGenerator.cr and copy and paste the following code…


PeopleGenerator.cr
require "http"
require "json"

class FaceFeature
  JSON.mapping({
    face_feature: Array(Float64)
  })
end

class Predictions
  JSON.mapping({
    faces: Array(FaceFeature)
  })
end

class Person
  JSON.mapping({
    id: String,
    predictions: Array(Predictions)
  })
end

folder = Dir.new("#{__DIR__}/People")
while photo = folder.read
  if photo != "." && photo != ".." && photo != "Features"
    io = IO::Memory.new
    builder = HTTP::FormData::Builder.new(io)

    File.open("#{__DIR__}/People/" + photo) do |file|
      metadata = HTTP::FormData::FileMetadata.new(filename: photo)
      headers = HTTP::Headers{"Content-Type" => "image/jpg"}
      builder.file("files", file, metadata, headers)
    end
    builder.finish

    headers = HTTP::Headers{"Content-Type" => builder.content_type,
                            "APIKey"       => "YourAPIKey",
                            "Accept"       => "application/json"}
    response = HTTP::Client.post("https://sandbox.api.sap.com/ml/facefeatureextraction/face-feature-extraction", body: io.to_s, headers: headers)

    feature_name = "#{__DIR__}/Features/" + File.basename(photo, ".jpg") + ".txt"

    puts photo

    File.write(feature_name, Person.from_json(response.body).predictions[0].faces[0].face_feature)
    sleep 2.seconds
  end
end

command = "zip -r -j features.zip #{__DIR__}/Features"
Process.run("sh", {"-c", command})

puts "Done."

Let’s explain the code before we check the results…

require "http"
require "json"

We need these two libraries to be able to call the SAP Leonardo API and also to be able to read and extract the results…

class FaceFeature
  JSON.mapping({
    face_feature: Array(Float64)
  })
end

class Predictions
  JSON.mapping({
    faces: Array(FaceFeature)
  })
end

class Person
  JSON.mapping({
    id: String,
    predictions: Array(Predictions)
  })
end

This is the JSON mapping that we need to use to extract the information coming back from the API.

folder = Dir.new("#{__DIR__}/People")
while photo = folder.read
  if photo != "." && photo != ".." && photo != "Features"
    io = IO::Memory.new
    builder = HTTP::FormData::Builder.new(io)

    File.open("#{__DIR__}/People/" + photo) do |file|
      metadata = HTTP::FormData::FileMetadata.new(filename: photo)
      headers = HTTP::Headers{"Content-Type" => "image/jpg"}
      builder.file("files", file, metadata, headers)
    end
    builder.finish

    headers = HTTP::Headers{"Content-Type" => builder.content_type, "APIKey" => "YourAPIKey", "Accept" => "application/json"}
    response = HTTP::Client.post("https://sandbox.api.sap.com/ml/facefeatureextraction/face-feature-extraction", body: io.to_s, headers: headers)

    feature_name = "#{__DIR__}/Features/" + File.basename(photo, ".jpg") + ".txt"

    puts photo

    File.write(feature_name, Person.from_json(response.body).predictions[0].faces[0].face_feature)
    sleep 2.seconds
  end
end

command = "zip -r -j features.zip #{__DIR__}/Features"
Process.run("sh", {"-c", command})

puts "Done."

This section is larger. First, we specify the folder from which the images will be read. Then, for each entry, we check whether it’s a picture or a folder...of course we want images only…

Then, we create a FormData builder in order to avoid having to base64 encode the images...put them in a JSON payload and so on...this way it’s easier and native…

We open each image and feed the FormData metadata and headers.

Also, we need to pass the extra “headers” required by SAP Leonardo.

Once that is done, we can simply call the REST API, and then we create a “Feature Name” which is going to be the name of the generated file...basically the image name with a “.txt” extension.

For each file we’re going to extract the feature vector from the JSON response, write the file and add a 2-second delay just so we don’t flood the API…

Once that’s done, we simply call a “zip” command from the terminal and zip it…


Now, the zip file will contain 100 files...one with the features of each image in our “People” folder.

Simple as that...we have trained our application ;)
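If you want to double check what ended up inside the zip...here’s a little Python sketch using the standard zipfile module. The file names here are just made-up examples standing in for the 100 feature files...

```python
import io
import zipfile

# Build a tiny in-memory stand-in for features.zip (the real one holds
# one .txt feature file per image in the "People" folder).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("NicolasCage.txt", "[0.1, 0.2]")
    z.writestr("MerylStreep.txt", "[0.3, 0.4]")

# Re-open it and list the entries, like checking the real archive.
with zipfile.ZipFile(buf) as z:
    names = z.namelist()
print(names)
```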

The Testing and Execution Phase


I know that usually you test your model first...but for this one...we can do both at the same time ;)

We’re going to create a Python script that will deal with taking our picture...call the Features API on that image and then call another API to determine who we look like…

Let’s create a script called GuessWho.py


GuessWho.py
import anki_vector
import threading
import requests
import os
import json
import time
import subprocess
import re
import math
from PIL import Image
from anki_vector.events import Events
from anki_vector.util import degrees

event_done = False
said_text = False
new_width  = 184
new_height = 96

def main():
    args = anki_vector.util.parse_command_args()
    with anki_vector.Robot(args.serial, enable_face_detection=True, 
                           enable_camera_feed=True) as robot:
        evt = threading.Event()

        def on_robot_observed_face(event_type, event):

            global said_text
            if not said_text:
                said_text = True
                robot.say_text("Taking Picture!")
                image = robot.camera.latest_image
                image.save("Temp.png")
                robot.say_text("Picture Taken!")
                evt.set()

        robot.behavior.set_head_angle(degrees(45.0))
        robot.behavior.set_lift_height(0.0)

        robot.events.subscribe(on_robot_observed_face, Events.robot_observed_face)

        try:
            if not evt.wait(timeout=10):
                print("---------------------------------")
        except KeyboardInterrupt:
            pass

def guess_who():
    args = anki_vector.util.parse_command_args()
    with anki_vector.Robot(args.serial) as robot: 
        url = "https://sandbox.api.sap.com/ml/facefeatureextraction/face-feature-extraction"
                
        img_path = "Temp.png"
        files = {'files': open (img_path, 'rb')}

        headers = {
            'APIKey': "YourAPIKey",
            'Accept': "application/json",
        }
    
        response = requests.post(url, files=files, headers=headers)
  
        robot.say_text("I'm processing your picture!")
    
        json_response = json.loads(response.text)
        json_text = json_response['predictions'][0]['faces'][0]['face_feature']
    
        f = open("myfile.txt", "w")
        f.write(str(json_text))
        f.close()
    
        time.sleep(1)
    
        p = subprocess.Popen('zip -u features.zip myfile.txt', shell=True)
    
        time.sleep(1)
    
        url = "https://sandbox.api.sap.com/ml/similarityscoring/similarity-scoring"
    
        files = {'files': ("features.zip", open ("features.zip", 'rb'), 
                 'application/zip')}
        params = {'options': '{"numSimilarVectors":100}'}
    
        response = requests.post(url, data=params, files=files, headers=headers)
        json_response = json.loads(response.text)

        robot.say_text("I'm comparing your picture with one hundred other pictures!")

        for x in range(len(json_response['predictions'])):
            if json_response['predictions'][x]['id'] == "myfile.txt":
                name, _ = os.path.splitext(json_response['predictions'][x]
                          ['similarVectors'][0]['id']) 
                name = re.findall('[A-Z][^A-Z]*', name)
                full_name = " ".join(name)
                pic_name = "People/" + "".join(name) + ".jpg"
                avg = json_response['predictions'][x]['similarVectors'][0]['score']
                robot.say_text("You look like " + full_name + 
                               " with a confidence of " + 
                                str(math.floor(avg * 100)) + " percent")
                image_file = Image.open(pic_name)
                image_file = image_file.resize((new_width, new_height), 
                                                Image.ANTIALIAS)  
                screen_data = anki_vector.screen.convert_image_to_screen_data(
                                                                   image_file)
                robot.behavior.set_head_angle(degrees(45.0))
                robot.conn.release_control()
                time.sleep(1)
                robot.conn.request_control()                
                robot.screen.set_screen_with_image_data(screen_data, 0.0)
                robot.screen.set_screen_with_image_data(screen_data, 25.0)
                
                print(full_name)
                print(str(math.floor(avg * 100)) + " percent")

                time.sleep(5)

if __name__ == '__main__':
    main()
    guess_who()

This script is bigger...so let’s make sure we understand everything that is going on…

import anki_vector
import threading
import requests
import os
import json
import time
import subprocess
import re
import math
from PIL import Image
from anki_vector.events import Events
from anki_vector.util import degrees

That’s a lot of libraries :) The first one is pretty obvious...it’s how we connect to Vector ;)

The second one is to handle “threads” as we need to do a couple of things asynchronously.

The third one is to handle the call to the APIs.

The fourth one is to handle folder access.

The fifth one is to handle the JSON response coming back from the API.

The sixth one is so that we can have a delay in the execution of the application.

The seventh is to be able to call terminal commands.

The eighth one is to use Regular Expressions.

The ninth one is to handle math operations.

The tenth one is to handle image operations.

The eleventh is to handle events as we want Vector to try to detect our face.

The last one is to be able to move Vector’s head.

def main():
    args = anki_vector.util.parse_command_args()
    with anki_vector.Robot(args.serial, enable_face_detection=True, enable_camera_feed=True) as robot:
        evt = threading.Event()

        def on_robot_observed_face(event_type, event):

            global said_text
            if not said_text:
                said_text = True
                robot.say_text("Taking Picture!")
                image = robot.camera.latest_image
                image.save("Temp.png")
                robot.say_text("Picture Taken!")
                evt.set()

        robot.behavior.set_head_angle(degrees(45.0))
        robot.behavior.set_lift_height(0.0)

        robot.events.subscribe(on_robot_observed_face, Events.robot_observed_face)

        try:
            if not evt.wait(timeout=10):
                print("---------------------------------")
        except KeyboardInterrupt:
            pass

This one is for sure...our main event :) Here we’re going to open a connection with Vector, and as we can have multiple Vectors...we need to grab the serial number to specify which one we want to use...also we need to activate both face detection and camera feed.

We’re going to start a thread as we need to call an event where Vector tries to detect our face. If he can see us, then he will say “Taking Picture!”...grab the image, save it and then say “Picture Taken!”. After that the event is done...but...while this is happening we can move his head and drop down his handle so that he can see us better.

As you can see, we subscribe to the robot_observed_face event, which fires when Vector sees a face that is there and visible…

def guess_who():
    args = anki_vector.util.parse_command_args()
    with anki_vector.Robot(args.serial) as robot:
        url = "https://sandbox.api.sap.com/ml/facefeatureextraction/face-feature-extraction"
                
        img_path = "Temp.png"
        files = {'files': open (img_path, 'rb')}

        headers = {
            'APIKey': "YourAPIKey",
            'Accept': "application/json",
        }

        response = requests.post(url, files=files, headers=headers)

        robot.say_text("I'm processing your picture!")

        json_response = json.loads(response.text)
        json_text = json_response['predictions'][0]['faces'][0]['face_feature']

        f = open("myfile.txt", "w")
        f.write(str(json_text))
        f.close()

        time.sleep(1)

        p = subprocess.Popen('zip -u features.zip myfile.txt', shell=True)

        time.sleep(1)

        url = "https://sandbox.api.sap.com/ml/similarityscoring/similarity-scoring"

        files = {'files': ("features.zip", open ("features.zip", 'rb'), 'application/zip')}
        params = {'options': '{"numSimilarVectors":100}'}

        response = requests.post(url, data=params, files=files, headers=headers)
        json_response = json.loads(response.text)

        robot.say_text("I'm comparing your picture with one hundred other pictures!")

        for x in range(len(json_response['predictions'])):
            if json_response['predictions'][x]['id'] == "myfile.txt":
                name, _ = os.path.splitext(json_response['predictions'][x]['similarVectors'][0]['id']) 
                name = re.findall('[A-Z][^A-Z]*', name)
                full_name = " ".join(name)
                pic_name = "People/" + "".join(name) + ".jpg"
                avg = json_response['predictions'][x]['similarVectors'][0]['score']
                robot.say_text("You look like " + full_name + " with a confidence of " + str(math.floor(avg * 100)) + " percent")
                image_file = Image.open(pic_name)
                image_file = image_file.resize((new_width, new_height), Image.ANTIALIAS)  
                screen_data = anki_vector.screen.convert_image_to_screen_data(image_file)
                robot.behavior.set_head_angle(degrees(45.0))
                robot.conn.release_control()
                time.sleep(1)
                robot.conn.request_control()                
                robot.screen.set_screen_with_image_data(screen_data, 0.0)
                robot.screen.set_screen_with_image_data(screen_data, 25.0)
                
                print(full_name)
                print(str(math.floor(avg * 100)) + " percent")

                time.sleep(5)

This method will handle the rough parts of our application…

We connect to Vector once again...although this time we don’t need to activate anything as the picture has already been taken.

We pass the URL for the features API.

Then we open our “Temp.png” file which is the image that Vector took from us.

We need to pass the extra header for the SAP Leonardo API.

We call the API and get the JSON response.

Again, we need to extract the feature information from the JSON response. This time, however, we’re going to create a single file called “myfile.txt”. We’re going to make the application sleep for a second and then call a terminal process to add “myfile.txt” to our features.zip file…

Then we sleep again for another second...and this is just so we don’t flood the API…

Here, we’re going to call a different API, called the Inference Service for Similarity Scoring…

This API will read all 101 feature files and compute the cosine similarity (-1 to 1) between each pair of files. This way it can determine which files are closest to each other and hence whom we resemble the most...providing us with a confidence score.
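Just so the math is clear...here’s a quick Python sketch of the cosine similarity the service computes behind the scenes. This is my own toy implementation for illustration...not SAP Leonardo’s code...

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors:
    # 1 means same direction, 0 orthogonal, -1 opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 -> identical
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 -> unrelated
```

The closer the score is to 1, the more two faces resemble each other...which is why the script later turns it into a percentage of confidence.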

This call is a little bit more complicated than the previous one as we need to upload the zip file…

        files = {'files': ("features.zip", open ("features.zip", 'rb'), 'application/zip')}
        params = {'options': '{"numSimilarVectors":100}'}

        response = requests.post(url, data=params, files=files, headers=headers)
        json_response = json.loads(response.text)

Take into account that while we have 101 files...we need to compare 1 file against 100 others...so we pass 100 as the “numSimilarVectors”.

Once we’ve done that, we need to read each section of the JSON response until we find the id with the value “myfile.txt”. Once we have that, we strip the extension and use a Regular Expression to split the CamelCase file name into words. We also need the name of the image file...so in the end we end up with something like this…

full_name = "Nicolas Cage"
pic_name = "People/NicolasCage.jpg"
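Here’s a tiny runnable Python sketch of that name-extraction logic, using the same splitext and regex calls as the script. The “NicolasCage.txt” id is just an illustrative example...

```python
import os
import re

# How the script turns a feature-file id back into a display name
# and an image path. "NicolasCage.txt" is a made-up example id.
file_id = "NicolasCage.txt"
name, _ = os.path.splitext(file_id)        # drop the ".txt" extension
parts = re.findall('[A-Z][^A-Z]*', name)   # split on the capital letters
full_name = " ".join(parts)
pic_name = "People/" + "".join(parts) + ".jpg"
print(full_name)  # Nicolas Cage
print(pic_name)   # People/NicolasCage.jpg
```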

We need to extract the percentage of confidence as well…

avg = json_response['predictions'][x]['similarVectors'][0]['score'] 

So, we can have Vector saying “You look like Nicolas Cage with a confidence of 75 percent”.

Now...here comes the fun part ;) We already know who we look like...but let’s say...we don’t really remember what Nicolas Cage looks like...so let’s take advantage of Vector’s fancy screen and display it there ;) By the way...we need to release control, gain it back, display the image for zero seconds and then re-display it...this is mainly because Vector’s eyes keep blocking the image on the screen...and this is a way to prevent that behavior ;)

image_file = Image.open(pic_name)
image_file = image_file.resize((new_width, new_height), Image.ANTIALIAS)
screen_data = anki_vector.screen.convert_image_to_screen_data(image_file)
robot.behavior.set_head_angle(degrees(45.0))
robot.conn.release_control()
time.sleep(1)
robot.conn.request_control()
robot.screen.set_screen_with_image_data(screen_data, 0.0)
robot.screen.set_screen_with_image_data(screen_data, 25.0)

First we open the image, then we resize it so it fits on the screen, then we convert it to Vector’s format and finally we display it on the screen, specifying for how long we want it there…

                print(full_name)
                print(str(math.floor(avg * 100)) + " percent")

                time.sleep(5)

We print some information on the screen and then send it to sleep for 5 seconds so the image doesn’t disappear too quickly ;)

Finally! The most important part of the whole script...calling the functions :P

if __name__ == '__main__':
    main()
    guess_who()

And that’s pretty much it :) We open a terminal window and type…

python3 GuessWho.py

Vector is going to try to look at us and detect our face...he will take a picture...the SAP Leonardo APIs are going to be called...and we will hear and see who we look like ;)

Hope you enjoyed this blog...I obviously did :D

And just to wrap up things...here’s a small video…


BTW...this is the picture that Vector took of me...


Greetings,

Blag.
SAP Labs Network.

Wednesday, September 19, 2018

SAP Leonardo Machine Learning APIs on the Go


Working for the d-shop, first in the Silicon Valley and now in Toronto, allows me to use my creativity and grab any new gadget that hits the market.

This time, it was Oculus Go’s turn 😉 and what’s the Oculus Go? Well, it is a standalone VR headset, which basically means…no tangled cables 😉

For this project I had the chance to work with either Unity or Unreal Engine…I had used Unity many times to develop Oculus Rift and Microsoft HoloLens applications…so I thought Unreal Engine would be a better choice this time…although I had never used it in a big project before…especially because nothing beats Unreal when it comes to graphics…

With Unreal chosen…I needed to make another decision…C++ or Blueprints…well…while I have used C++ in the past for a couple of Cinder applications…Blueprints looked better as I wanted to develop faster and without too many complications…and well…that’s half of the truth…sometimes Blueprints can become really messy 😊

Just so you know, I used Unreal Engine 4.20.2 and created a Blueprints application.



Since the beginning I knew that I wanted to use SAP Leonardo Machine Learning APIs…as I had used them before for my blog “Cozmo, read to me”, where I used a Cozmo robot, OpenCV and SAP Leonardo’s OCR API to read a whiteboard with a handwritten message and have Cozmo read it out loud.

The idea

This time, I wanted to showcase more than just one API…so I needed to choose which ones…gladly that wasn’t really hard…most APIs are more “Enterprise” oriented…so that left me with “Image Classification”, “OCR” and “Language Translation”…

With all decided…I still needed to figure out how to use those APIs…I mean…Oculus Go is Virtual Reality…so no chance of looking at something, taking a picture and sending it to the API…

So, I thought…why don’t I use Blender (which is an Open-Source 3D computer graphics software toolset) and make some models…then I can render those models…take a picture and send it to the API…and having models means…I could turn them into “.fbx” files and load them into Unreal for a nicer experience…

With the OCR and Language Translation APIs…it was different…as I needed images with text…so I decided to use Inkscape (which is an Open-Source vector graphics editor).

The implementation

When I first started working on the project…I knew I needed to start step by step…so I first did a Windows version of the App…then ported it to Android (Which was pretty easy BTW) and finally ported it to Oculus Go (Which was kind of painful…)

So, sadly I’m not going to be able to put any source code here…simply because I used Blueprints…and I’m not sure if you would like to reproduce them by hand ☹ You will see what I mean later on this blog…

Anyway…let’s keep going 😊

When I thought about this project, the first thing that came into my mind was…I want to have a d-shop room…with some desks…a sign for each API…some lights would be nice as well…



So, doesn’t look that bad, huh?

Next, I wanted to work on the “Image Classification” API…I wanted it to be fairly similar…but with only one desk in the middle…which later turned into a pedestal…with the 3D objects rotating on top of it…then there should be a space ready to show the results coming back from the API…also…arrows to let the user change the 3D model…and a house icon to allow the user to go back to the “Showfloor”…




You will notice two things right away…first…what is that ball supposed to be? Well…that’s just a placeholder that will be replaced by the 3D Models 😊 Also…you can see a black poster that says “SAP Leonardo Output”…that’s hidden and only becomes visible when we launch the application…

For the “Optical Character Recognition” and “Language Translation” scenes…it’s pretty much the same although the last one doesn't have arrows 😊





The problems

So that’s pretty much how the scenes are related…but of course…I hit the first issue fast…how do you call the APIs using Blueprints? I looked online and most of the plugins are paid ones…but gladly I found a free one that really surprised me…UnrealJSONQuery works like a charm and is not that hard to use…but of course…I needed to change a couple of things in the source code (like adding the header for the API key and changing the parameter used to upload files). Then I simply recompiled it and voila! I got JSON in my application 😉

But you want to know what I changed, right? Sure thing 😊 I simply unzipped the file, went to JSONQuery --> Source --> JSONQuery --> Private and opened JsonFieldData.cpp

Here I added a new header with ("APIKey", "MySAPLeonardoAPIKey") and then I looked for PostRequestWithFile and changed the "file" parameter to "files"…

To compile the source code, I simply created a new C++ project, then a “Plugins” folder in the root folder of my project and put everything from the downloaded folder there…opened the project…let it compile and then re-created everything from my previous project…once that was done…everything started to work perfectly…

So, let’s see part of the Blueprint used to call the API…




Basically, we need to create the JSON, call the API and then read the result and extract the information.

Everything was going fine and dandy…until I realized that I needed to package the 3D images generated by Blender…I had no idea how to do it…so gladly…the Victory Plugin came to the rescue 😉 Victory has some nodes that allow you to read many directories from inside the generated application…so I was all set 😊

This is how the Victory plugin looks when used in a Blueprint…




The Models

For the 3D Models, as I said…I used Blender…I modeled them using “Cycles Render”, baked the materials and then rendered the image using “Blender Render” to be able to generate the .fbx files…





If the apples look kind of metallic or wax like…blame my poor lighting skills ☹

When loaded into Unreal…the models look really nice…


Now…I know you want to see how a full Blueprint screen looks like…this one is for the 3D Models on the Image Classification scene…


Complicated? Well...kind of…usually Blueprints are like that…but they are pretty powerful…

Here’s another one…this time for the “Right Arrow” which allows us to change models…


Looks weird…but works just fine 😉



You may notice that “Image Classification” and “OCR” both have Right and Left arrows…so I needed to reuse some variables and they needed to be shared between Blueprints…so…for that I created a “Game Instance” where I simply created a bunch of public variables that could then be shared and updated.

If you wonder what I used Inkscape for…well…I wanted to have a kind of neon sign image and a handwritten image…



From Android to Oculus Go

You may wonder…why did things change from Android to the Oculus Go? Aren’t they both Android based? Well…yes…but still…from personal experience…I know that things change a lot…

First…on Android…I created the scenes…and everything was fine…on the Oculus Go…no new scenes were loaded…when I clicked on a sign…the first level loaded itself… ☹ Why? Because I needed to include them in the arrays of scenes to be packaged…

And the funny thing is that the default projects folder for Unreal is “Documents”…so when I tried to add the scene it complained because the path was too long…so I needed to clone the project and move it to a folder on C:\

Also…when switching from Windows to Android…it was as simple as changing the “Click” to “Touch”…but for Oculus Go…well…I needed to create a “Pawn”…where I put a camera, a motion controller, and a pointer (acting like a laser pointer)…here I switched the “Touch” for a “Motion Controller Thumbstick”…and then from there I needed to control all the navigation details…very tricky…

Another thing that changed completely was the “SAP Leonardo Output”…let’s see how that looked on Android…



Here you can see that I used a “HUD”…so wherever you look…the HUD will go with you…

On the Oculus Go…this didn’t happen at all…first I needed to put a black image as a background…

Then I needed to create an actor and put the HUD inside…turning it into a 3D HUD…




The final product

When everything was done…I simply packaged my app and loaded it into the Oculus Go…and by using Vysor I was able to record a simple session so you can see how this looks in real life 😉 Of course…the downside (because first…I’m too lazy to keep figuring things out and second…it’s too much hassle) is that you need to run this from the “Unknown Sources” section on the Oculus Go…but…it’s there and working and that’s all that matters 😉

Here’s the video so you can fully grasp what this application is all about 😊





I hope you like it 😉

Greetings,

Blag.
SAP Labs Network.




Monday, July 30, 2018

Cozmo, read to me


Do you know Cozmo? The friendly robot from Anki? Well...here he is...

Cozmo is a programmable robot that has many features...and one of those includes a camera...so you can have Cozmo take a picture of something...and then do something with that picture...

To code for Cozmo you need to use Python...actually...Python 3 ;)

For this blog, we're going to need a couple of things...so let's install them...

pip3 install 'cozmo[camera]'

This will install the Cozmo SDK...and you will need to install the Cozmo app on your phone as well...

If you have the SDK installed already, you may want to upgrade it because if you don't have the latest version it might not work...

pip3 install --upgrade cozmo

Now, we need a couple of extra things...

sudo apt-get install python-pygame
pip3 install pillow
pip3 install numpy

pygame is a games framework
pillow is a wrapper around the PIL library and it's used to manage images.
numpy allows us to work efficiently with numeric arrays in Python.

That was the easy part...as now we need to install OpenCV...which allows us to manipulate images and video...

This one is a little bit tricky, so if you get stuck...search on Google or just drop me a message...

First, make sure that OpenCV is not installed by removing it...unless you are sure it's working properly for you...

sudo apt-get remove opencv

Then, install the following prerequisites...

sudo apt-get install build-essential cmake pkg-config yasm python-numpy

sudo apt-get install libjpeg-dev libjpeg8-dev libtiff5-dev libjasper-dev libpng12-dev

sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libdc1394-22-dev

sudo apt-get install libxvidcore-dev libx264-dev libxine-dev libfaac-dev

sudo apt-get install libgtk-3-dev libtbb-dev libqt4-dev libmp3lame-dev

sudo apt-get install libatlas-base-dev gfortran

sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libxvidcore-dev x264 v4l-utils

If by any chance, something is not available on your system, simply remove it from the list and try again...unless you're like me and want to spend hours trying to get everything...

Now, we need to download the OpenCV source code so we can build it...from the source...

wget -O opencv-3.4.0.zip https://github.com/opencv/opencv/archive/3.4.0.zip
unzip opencv-3.4.0.zip   # this produces the folder opencv-3.4.0

Then, we need to download the contributions because there are some things not bundled in OpenCV by default...and you might need them for any other project...

wget -O opencv_contrib-3.4.0.zip https://github.com/opencv/opencv_contrib/archive/3.4.0.zip
unzip opencv_contrib-3.4.0.zip   # this produces the folder opencv_contrib-3.4.0

As we have both folders, we can start compiling...

cd opencv-3.4.0
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D INSTALL_PYTHON_EXAMPLES=OFF \
      -D CMAKE_CXX_COMPILER=/usr/bin/g++ \
      -D INSTALL_C_EXAMPLES=OFF \
      -D OPENCV_EXTRA_MODULES_PATH=/YourPath/opencv_contrib-3.4.0/modules \
      -D PYTHON_EXECUTABLE=/usr/bin/python3.6 \
      -D WITH_FFMPEG=OFF \
      -D BUILD_opencv_ts=OFF \
      -D WITH_LIBV4L=OFF \
      -D WITH_CUDA=OFF \
      -D WITH_V4L=ON \
      -D WITH_QT=ON \
      -D WITH_LAPACK=OFF \
      -D WITH_OPENCV_BIOINSPIRED=OFF \
      -D WITH_XFEATURES2D=ON \
      -D WITH_OPENCL=OFF \
      -D WITH_FACE=ON \
      -D ENABLE_PRECOMPILED_HEADERS=ON \
      -D WITH_OPENCL_SVM=OFF \
      -D WITH_OPENCLAMDFFT=OFF \
      -D WITH_OPENCLAMDBLAS=OFF \
      -D WITH_OPENCV_DNN=OFF \
      -D BUILD_OPENCV_APPS=ON \
      -D BUILD_EXAMPLES=OFF ..

Pay extra attention: you need to pass the correct path to your opencv_contrib folder...so it's better to pass the full path to avoid making errors...

And yes...that's a pretty long command for a build...and it took me a long time to make it work...as you need to figure out all the parameters...

Once we're done, we need to make it...as cmake will prepare the recipe...

make -j2

If there's any mistake, simply do this...

make clean
make

Then, we can finally install OpenCV by doing this...

sudo make install
sudo ldconfig

To test that it's working properly...simply do this...

python3
>>> import cv2
>>> cv2.__version__
'3.4.0'


If you don't have any errors...then we're good to go -;)

That was quite a lot of work...anyway...we need an extra tool to make sure our image gets nicely processed...

Download textcleaner, make it executable (chmod +x textcleaner) and put it in the same folder as your Python script...

And...just in case you're wondering...yes...we're going to have Cozmo take a picture...we're going to process it...use SAP Leonardo's OCR API and then have Cozmo read it back to us...cool, huh?
SAP Leonardo's OCR API is still on version 2Alpha1...but regardless of that...it works amazingly well -;)

Although keep in mind that if the result is not always accurate, that's because of the lighting, the position of the image, your handwriting and the fact that the OCR API is still in Alpha...

Ok...so first things first...we need a white board...


And yes...my handwriting is far from being good... -:(

Now, let's jump into the source code...


CozmoOCR.py
import cozmo
from cozmo.util import degrees
import PIL
import cv2
import numpy as np
import os
import requests
import json
import re
import time
import pygame
import _thread

def input_thread(L):
    # Wait for "Enter" on the terminal and signal the main loop
    input()
    L.append(None)

def process_image(image_name):
    image = cv2.imread(image_name)

    # Scale the picture and turn it grayscale
    img = cv2.resize(image, (600, 600))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Blur, denoise and threshold to separate ink from paper
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    denoise = cv2.fastNlMeansDenoising(blur)
    thresh = cv2.adaptiveThreshold(denoise, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)
    blur1 = cv2.GaussianBlur(thresh, (5, 5), 0)
    dst = cv2.GaussianBlur(blur1, (5, 5), 0)

    cv2.imwrite('imggray.png', dst)

    # Let textcleaner do the final cleanup
    cmd = './textcleaner -g -e normalize -o 12 -t 5 -u imggray.png out.png'
    os.system(cmd)

def ocr():
    url = "https://sandbox.api.sap.com/ml/ocr/ocr"
    img_path = "out.png"
    files = {'files': open(img_path, 'rb')}
    headers = {
        'APIKey': "APIKey",
        'Accept': "application/json",
    }

    response = requests.post(url, files=files, headers=headers)

    json_response = json.loads(response.text)
    json_text = json_response['predictions'][0]
    # Fix characters that tend to be wrongly recognized
    json_text = re.sub('\n', ' ', json_text)
    json_text = re.sub('3', 'z', json_text)
    json_text = re.sub('0|O', 'o', json_text)
    return json_text

def cozmo_program(robot: cozmo.robot.Robot):
    robot.camera.color_image_enabled = False
    L = []
    _thread.start_new_thread(input_thread, (L,))
    robot.set_head_angle(degrees(20.0)).wait_for_completed()
    while True:
        if L:
            filename = "Message" + ".png"
            pic_filename = filename
            latest_image = robot.world.latest_image.raw_image
            latest_image.convert('L').save(pic_filename)
            robot.say_text("Picture taken!").wait_for_completed()
            process_image(filename)
            message = ocr()
            print(message)
            robot.say_text(message, use_cozmo_voice=True,
                           duration_scalar=0.5).wait_for_completed()
            break

pygame.init()
cozmo.run_program(cozmo_program, use_viewer=True, force_viewer_on_top=True)


Let's analyze the code a little bit...

We're going to use threads, as we need to have a window where we can see what Cozmo is looking at and another with Pygame where we can press "Enter" as a command to have Cozmo take a picture.

Basically, when we run the application, Cozmo will move his head and get into picture mode...then, if we press "Enter" (On the terminal screen) it will take a picture and then send it to our OpenCV processing function.

This function will simply grab the image, scale it, make it grayscale and do a GaussianBlur to blur the image, remove noise and reduce detail. Then we're going to apply a denoising to get rid of dust and fireflies...apply a threshold to separate the white and black pixels, and apply a couple more blurs...

Finally, we call textcleaner to further remove noise and make the image cleaner...

So, here is the original picture taken by Cozmo...


This is the picture after our OpenCV post-processing...


And finally, this is our image after using textcleaner...

Finally, once we have the image the way we wanted, we can call the OCR API which is pretty straightforward...

To get the API Key, simply go to https://api.sap.com/api/ocr_api/overview and log in...

Once we have the response back from the API, we can do some Regular Expressions cleanup just to fix characters that tend to get wrongly recognized...
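If you're curious how that cleanup behaves, here are the same substitutions run on their own against a made-up OCR result (the sample string is hypothetical, just to show the replacements in action):

```python
import re

def clean_ocr_text(text):
    # Same substitutions as in CozmoOCR.py: join lines and undo
    # the character confusions the OCR tends to make on handwriting
    text = re.sub('\n', ' ', text)    # newlines become spaces
    text = re.sub('3', 'z', text)     # a handwritten 'z' often comes back as '3'
    text = re.sub('0|O', 'o', text)   # '0' and 'O' usually mean a lowercase 'o'
    return text

print(clean_ocr_text("HELL0\nC03M0"))  # -> "HELLo CozMo"
```

Of course, these rules only make sense for handwriting where you know you won't use real digits...otherwise they would mangle the message.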

Finally, we can have Cozmo read the message out loud -;) And just for demonstration purposes...


Here, I was lucky enough that the lighting and everything was perfectly set up...so it was a pretty clean response...further tests were pretty bad -:( But again...it's important to have good lighting...

Of course...you want to see a video of the process in action, right? Well...funny enough...my first try was perfect! Even better than this one...but I didn't shoot the video -:( Further tries were pretty crappy until I could get something acceptable...and this is what you're going to watch now...the sun coming through the window didn't help me...but it's pretty good anyway...


Hope you liked this blog -:)

Greetings,

Blag.
SAP Labs Network.

lunes, 21 de mayo de 2018

The Blagchain



Lately, I have been learning about Blockchain and Ethereum. Two really nice and interesting topics...but as they say...the best way to learn is by doing...so I put myself to work on the Blagchain.

So, what's the Blagchain? Basically, it's a small Blockchain application that picks some things from Blockchain and some things from Ethereum and it was built as an educational thing...in the Blagchain you can create a user, post a product or buy one and everything will be stored in a chain-like structure...

Before we jump into the screenshots...let me tell you about the technology I chose for this little project...

There are many technologies out there...so choosing the right one is always a hard thing...halfway through you may realize that nope...that was not the smartest decision...some other language can do a better job in less time...or maybe that particular feature is not available and you didn't know it because you never needed it before...

When I started learning about Blockchain and Ethereum...I knew I wanted to build the Blagchain using a web interface...so the first languages that came into my mind were out of the question...basically because they don't provide web interfaces or simply because it would be too painful to build the app using them...also I wanted a language with few dependencies and with easy installation and extension...I wanted an easy but fast language...and then...almost instantly I knew which one I had to use...

Crystal is similar to Ruby but faster...and nicer -;) Also...it has Kemal, a Sinatra-like web framework...

When I discovered Crystal I was really impressed by how well it is designed...especially because...it's still in Alpha! How can such a young language be so good? Beats me...but Crystal is really impressive...

Anyway...let's see how the Blagchain works...

For sure...it's not a dapp...but that's fine because you only use it locally...it uses two web applications that run on different ports...one working as the server and the other working as the client...


You can add a new product...


You can see here that we have our Genesis Block, a new block for the posting of a product (And they are connected via the Previous Hash) and also you can see that any transaction will cost us 0.1 Blagcoin...
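The Blagchain itself is written in Crystal, but the chaining idea is language-independent. Here's a minimal Python sketch of it — the Genesis Block, the Previous Hash link and the 0.1 Blagcoin fee come from the description above, while the function name and block fields are made up for illustration:

```python
import hashlib
import json

def make_block(data, previous_hash):
    # A block stores its data plus the hash of the previous block,
    # so tampering with any earlier block changes every hash after it
    block = {'data': data, 'previous_hash': previous_hash}
    block['hash'] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

# The chain starts with a Genesis Block...
chain = [make_block('Genesis Block', '0')]

# ...and every transaction (posting or buying a product) appends
# a new block linked to the previous one, costing 0.1 Blagcoin
chain.append(make_block({'user': 'Blag', 'article': 'Book', 'fee': 0.1},
                        chain[-1]['hash']))

# The link can be verified by comparing the stored hashes
assert chain[1]['previous_hash'] == chain[0]['hash']
```

That `previous_hash` field is exactly what connects the blocks you see in the screenshot above.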


Now, we can use another browser to create a new user...


As this user didn't create the product...he/she can buy it...and add a new transaction to the chain...


Money (Blagcoin) goes from one account to the other. The chain grows and everything is recorded...


What if you don't have enough Blagcoin to buy something?


Now...if you like this kind of thing...this is how many lines of code it took me...

Blagchain.cr (Server part) --> 129 lines
BlagchainClient.cr (Client part) --> 125 lines
index.ecr (HTML, Bootstrap and JQuery) --> 219 lines

So not even 500 lines of code for the whole application...that's pretty cool, huh? -;)

And yes...I know you want to see a little bit of the source code, right? Well...why not -:)

BlagchainClient.cr
post "/sellArticle" do |env|
  user = env.params.body["user"]
  article = env.params.body["article"]
  description = env.params.body["description"]
  price = env.params.body["price"]
  amount = (env.session.float("amount") - 0.1).round(2)
  env.session.float("amount", amount)
  HTTP::Client.post("http://localhost:3000/addTransaction", form: "user=" + user + 
                    "&article=" + article + "&description=" + description + "&price=" + price)
  env.session.bool("flag", true)
  env.redirect "/Blagchain"
end

Greetings,

Blag.
SAP Labs Network.

miércoles, 17 de enero de 2018

Wooden Puzzle - My first Amazon Sumerian Game

If you read my previous blog Amazon Sumerian - First impressions you will know that I wouldn't stop there -;)

I have been able to play a lot with Sumerian and most important...to learn a lot...the tutorials are pretty good so you should read them even if you don't have access to Sumerian yet...

One thing that I always wanted to do...was to animate my Cozmo model...that I did on Blender...


I tried doing it on Blender (rigging it and doing the animation) but failed...it was getting weird as it worked fine on Blender but not on Sumerian...now I know why...but at the time I got frustrated...so instead I thought of doing it on Sumerian using its tools...

I gotta admit...at first it didn't work...but then I kept exploring and realized that the Timeline was my friend...and after many tests...I got it working -;)

Here is how it looks like...


So just go to Cozmo and click on the robot to start the animation and then click on him again to restart the animation...

Simple but really cool -:)

After that...I started thinking about doing something else...something more interesting and this time involving some programming...which is actually JavaScript and not NodeJS like I thought initially -:(

Anyway...I tried to do that once in Unity and also in Flare3D, but didn't have enough luck...although fair enough...by that time I didn't know Blender...so I put myself into working on it...


I designed a Wooden Puzzle board using Blender and then imported into Sumerian and applied a Mesh Collision to it...that way...the ball can run around the board and fall down if it gets over a hole...

Here is how it looks like...




To play...simply use the cursor keys to move the board and guide the ball from "Start" to "Finish". Pressing "r" restarts the game.

Here's the link to play it "Wooden Puzzle"...

Was it hard to build? Not really -:) Sumerian is very awesome and pretty powerful...on top of that...the Sumerian team is really nice and they are always more than willing to help...

So far...my Sumerian experience has been nothing but joy...so I can see myself doing more and more projects...

Of course...I'm already working on a couple more -;) Especially one involving the Oculus Rift...but that will take more time for sure...as I need to do a lot of Blender work...

Have you tried Sumerian? Not yet? Why don't you go ahead and request access?

Greetings,

Blag.
Development Culture.

viernes, 22 de diciembre de 2017

Amazon Sumerian - First impressions

For those who know me and for those who don't...as I work as a Developer Evangelist...my main job is to learn, explore and evangelize new technologies and programming languages...so of course...AR/VR has been on my plate for quite some time...

I have played with Unity3D and Unreal Engine...and of course I have developed for the Google Glass, Microsoft HoloLens and Oculus Rift...

When the good folks at Amazon announced Amazon Sumerian you can imagine that I was completely thrilled -:D

So yesterday, I finally got accepted into the Beta program, so of course I started to follow a couple of tutorials and get to know the tool -;)

Please be advised that I'm starting...so I haven't tried or used everything...I want to go step by step following the tutorials and trying to understand everything in the most positive way...

Have I mentioned that Sumerian runs on your browser? How crazy is that? No installation...just launch up your browser and start building AR/VR experiences...

When you first launch it, you will be presented with the following screen...



Where you can create a new scene or simply use a template.

Sumerian provides many tutorials, and so far I have only made my way through the first 3...


So here's how my TV room looks like...


As you can see...Sumerian is a full blown editor that provides all the tools that you can find on any other editor...plus many things that I believe are brand new and exciting...

Of course, you can preview your work...


As for the TV Room tutorial...the idea is that below the TV Screen, there's an Amazon Echo, so you can press it to change the videos presented on the screen. For this you need to use a State Machine and also create a script that will manage the different videos. For the scripting you need to use NodeJS...which is really nice as it is the language that I mainly use when developing applications for Alexa...



This is how my TV Room looks like when playing a video on render mode -:)


Before moving on to learn more about Sumerian...I need to say that the navigation system doesn't seem to be too good for now...you can use the mouse buttons, Tab and Shift...but control keys or WASD don't seem to work like you would expect on Unity3D or Unreal Engine...I have forwarded my question to the Sumerian Team on Slack...so I will update this post as soon as I get an answer :)

*UPDATE* By following the "Lights and Camera" tutorial I found out that while the default camera doesn't allow fine grain navigation...the FlyCam does it! -:D All good in the hood -;)

Till next time,

Blag.
Development Culture.