lunes, 21 de mayo de 2012

When SAP HANA met R - First kiss

If you follow my blogs (I hope you do) then you know I really love the R programming language but I also love SAP HANA and in the past I have dealt with integration between those two:

HANA meets R
R meets HANA
Sanitizing data in SAP HANA with R

But...those integrations were not done using the SAP way...which means, they are not supported or endorsed by SAP...

Gladly...as of today, there's an official SAP way to do it!

First, we need to read and follow all the instructions detailed in Get your own SAP HANA DB server on Amazon Web Services by the most awesome Juergen Schmerder. (Believe! It took me less than 10 minutes to get my SAP HANA running on my laptop...really...a piece of cake).

With our SAP HANA instance up and running, we can dedicate ourselves to the funny part...the #R part

Go to your AWS Management Console and under Amazon EC2, launch a new instance...



You have to choose SUSE Linux Enterprise with 32 bit. I tried with 64 bit and it wasn't funny...didn't work and I lost a lot of time...32 bit for the win!

For the installation, you can follow this link SAP HANA Database Development Guide – Integration with R programming language, but at least in my case, I need to deal with a lot of difficulties, that gladly I'm going to write down in this blog, so you don't have to deal with them

First, we need a compiler as we're going to compile #R from it's source.

sudo zypper install gcc gcc-c++ gcc-fortran

Then we need to get and extract the #R source code.

wget http://cran.r-project.org/src/base/R-2/R-2.13.0.tar.gz
tar zxf R-2.13.0.tar.gz && cd R-2.13.0
./configure --enable-R-shlib --with-realine=no --with-x=no
make clean
make
make install

This step really takes a long time...so you better go doing something more productive in the meantime... 
When #R is finally installed, we need to download and install the Rserve package.

wget http://www.rforge.net/Rserve/snapshot/Rserve_0.6-5.tar.gz

Now, we have to log into R and do the installation...

R
install.packages("/PATH_TO_FILE/Rserve.tar.gz", repos = NULL)
library("Rserve") #To test the installation. If there's no output, then it's working fine
q()

If you get an error regarding a personal library...just say "y". Once Rserve is install, we need to create a config file.

vi /etc/Rserv.conf
maxinbuf 10000000
Maxsendbuf 0
remote enable
#Press ESC key
:w
#Press ESC key
:q!

Now, we have to create a user that will run the Rserve so we can connect to it from SAP HANA.

useradd -m login_name
passwd login_name

For some reason Amazon doesn't provide the password for the root user...but we might need it eventually...so just do this...after all, if your user and you're paying for it...

sudo passwd root
#Assign a password

Great, we're are now ready to start our server! (You need to be logged as the new user that we create in a previous step).

R CMD Rserve --RS-port 6311 --no-save --RS-encoding "utf8"

Now...we're ready to move to move to our SAP HANA server and keep configuring

Right click on your system node at the navigator tab
Select Administration
Select on the right hand side the Configuration tab
Select the indexserver.ini
Select the calcengine
 
#Add the following parameters...
 
cer_timeout - 300
cer_rserve_addresses - Our R Amazon server:6311
cer_rserve_maxsendsize - 0

One more thing, and we're ready to roll...go to your AWS Management Console, EC2 and then choose Security Groups. Our R server is going to be assign to "Quicklaunch-1". Just select it and go to Inbound. And add the port "6311".


That's all folks...we're officially ready to go! On SAP HANA, create a table and call it TICKETS_BY_YEAR with the following structure:


Open a SQL Editor and copy the following code...

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110101',4195);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110201',4245);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110301',4971);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110401',4469);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110501',4257);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110601',4973);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110701',4470);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110801',4981);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110901',4530);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20111001',4167);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20111101',4059);
insert into "SYSTEM"."TICKETS_BY_YEAR" values('20111201',1483);

This table is supposed to hold the tickets sales for a given company, during each month of the year 2011. What we want to do here is to determine or predict how are going to our sales on 2012. We're going to use #R for that matter. Create the following script and call it "Predict_Tickets". This script will have two Stored Procedures, called Prediction_Model and Get_Tickets.

CREATE TYPE T_PREDICTION_TICKETS AS TABLE (
PERIOD VARCHAR(8),
TICKETS INTEGER
);
 
DROP PROCEDURE Prediction_Model;
DROP PROCEDURE Get_Tickets;
 
CREATE PROCEDURE Prediction_Model(IN tickets_year TICKETS_BY_YEAR,OUT result T_PREDICTION_TICKETS)
LANGUAGE RLANG AS
BEGIN
period=as.integer(tickets_year$PERIOD)
tickets=as.integer(tickets_year$TICKETS)
var_year=as.integer(substr(period[1],1,4))
var_year=var_year+1
new_period=gsub("^\\d{4}",var_year,period)
next_year=data.frame(year=new_period)
prt.lm<-lm(tickets ~ period)
pred=round(predict(prt.lm,next_year,interval="none"))
result<-data.frame(PERIOD=new_period,TICKETS=pred)
END;
 
CREATE PROCEDURE Get_Tickets()
LANGUAGE SQLSCRIPT AS
BEGIN
Tickets = SELECT * FROM TICKETS_BY_YEAR;
CALL Prediction_Model(:Tickets,T_PREDICTION_TICKETS);
INSERT INTO "TICKETS_BY_YEAR" SELECT * FROM :T_PREDICTION_TICKETS;
END;
 
CALL Get_Tickets();
SELECT * FROM "TICKETS_BY_YEAR";

As you can see, our first Stored Procedure called Prediction_Model, we're using RLANG as the script language...meaning that we're going to embed R code that will go from our SAP HANA to our R Serve and back with the modified information. Prediction_Model is calling the Stored Procedure Get_Tickets, which is doing a select from the table TICKETS_BY_YEAR and then calling Prediction_Model to finally insert the data back into SAP HANA. At the end of our script, we call Get_Tickets and do a select to TICKETS_BY_YEAR to verify that our script worked.


Success! Our SAP HANA and R integration work like a charm! We never left SAP HANA Studio, but our code went to the R Server and back to bring us the modified information...all in just 829 milliseconds...really fast considering that both servers are in the cloud... That's all for now...I will come back with more information on SAP HANA and R as soon as I can...there's still a lot to discover and test.

Greetings,

Blag.