jueves, 2 de febrero de 2012

R meets HANA


If you read my last blog called HANA meets R you will remember that we read data from HANA into R directly, without having to download an .csv file, but using ODBC. This time, we're going to read data from HANA as well, but after do some nice tricks on R, we're going to post back the information into HANA.

Keep in mind, that is not an standard SAP solution. This only relies on a custom R package that can work with ODBC enabled tables, and like any custom packages, there are many limitations...anyway...this should be fixed when SAP released the official R into HANA integration.

In my previous blog Prediction model with HANA and R we create a stored procedure in HANA to populate a table called TICKETS_BY_YEAR, then on R we calculate the prediction for the next year and generate a nice graphic showing both the real data and the prediction. So...of course I'm not going to repeat all that.

This is the R code that we need to use...


library("RODBC")
ch<-odbcConnect("HANA",uid="P075400",pwd="HrCOpPk4")
Flight_Tickets<-sqlFetch(ch,"P075400.TICKETS_BY_YEAR")
period=Flight_Tickets$PERIOD
tickets=Flight_Tickets$TICKETS
var_year=substr(period[1],1,4)
var_year=as.integer(var_year)
var_year=var_year+1
var_year=as.character(var_year)
new_period=gsub("^\\d{4}",var_year,period)
next_year=data.frame(year=new_period,stringsAsFactors=FALSE)
prt.lm=lm(tickets ~ period)
pred=predict(prt.lm,next_year,interval="none")
period=next_year
tickets=pred
PREDICTION_TICKETS<-data.frame(period,tickets)
sqlDrop(ch,"PREDICTION_TICKETS",errors=FALSE)
sqlSave(ch,PREDICTION_TICKETS,rownames="id")
odbcClose(ch)

After we execute this code, we can check on HANA that our new table called PREDICTION_TICKETS was created...


And the data was populated as expected...


You may wonder...which are the limitations? Everything seems to work like a charm? Easy...not a lot, but important limitations...

* We don't have a way to validate if the table exists or not.
* We must delete the table before doing the insert, otherwise is not going to work.
* Even when the date field was called PERIOD, R named it "year" and pass it into HANA.
* We can't specify the type of the fields, nor the lenght
* We are forced to have an additional column with a numeric index, that we can nicely call "Id"...

As I said early...this is just a custom package that allows us to play...this shouldn't be used as a final solution, but as a playground. Enjoy!

Greetings,

Blag.