sábado, 9 de marzo de 2013

Getting flexible with SAP HANA


Most of you might not be aware of a feature introduced on SAP HANA SPS5. This new feature is called "Flexible Tables", which means that you can define a table that will grow depending on your needs. Let's see an example...

You define a table with ID, NAME and LAST_NAME. The table works fine, but you realize that you need to also add the PhoneNumber and Address...in a normal situation, you will need to open the definition and add those fields...but using Flexibles Tables, you will need to only add those fields as part of the INSERT query and let SAP HANA do it's magic...

Of course, that scenario is very unlikely to happen, because for a couple of fields, it doesn't make sense...so...where do we use this Flexible Tables? Enterprise Search, were we need to have all products on one table, and this table can have a really big amount of columns...as different products will have different characteristics...

So...for this blog...I really break my head trying to find a simple scenario that could cover Flexible Tables...what I came up with? SAP HANA, Flexible Tables, R and Twitter...

The code is very complex, so I'm not going to explain line by line how it works...but of course, I'm going to give a nice overview...

The Twitter API (I'm using version 1 even when it's deprecated, simply because version 1.1 deals with Authentication and didn't want to spend too much time on that...) allows us to fetch information from Twitter...so in this case I was interested in the Hashtags...the ones that starts with an "#" and are used to identify and organize certain tweets related to an event, technology or famous person. (I'm using only Tweets, not taking Retweets into account)...


Using R, I read the User Timeline to get the most recent 200 tweets from a particular account. This information will be send back to SAP HANA to be stored.



With the 200 tweets, I extract all the Hashtags, summarize them and then save the information both in the final table and in a intermediate table (to be used for the next user). In this intermediate table, I will store a long string with all the Hashtags separated by a comma.

When the next user arrives, all 200 tweets are read, Hashtags extracted, combine with the one saved in the intermediate table, summarized (This is very important because we want to keep track of the previous Hashtags, both the ones that are common to both users and the ones that only exist on the first or second user) and the information will be saved in the final table and the information from the next user will be save (replacing the previous one) as a long string in the intermediate table.

Why I need to this? Simple...let's say that the first user has 3 Hashtags...#SAPHANA, #R and #Python with values 3, 2 and 1. The next user will have 5 Hashtags...#SAPHANA, #SAP, #Ruby, #IPhone and #Android with the values 2, 5, 1, 4 and 3.

When we store the first user we will have:

UserNameSAPHANARPython
First_User321

When we store the second user, we will have...

UserNameSAPHANARPython SAPRubyAndroidIPhone
First_User321 ????
Second_User321 5143

Now...you may wonder...why R has "0" for the next user and SAP has "?" for the first user? Easy...as you can see...as we added more fields (at runtime) the table grow...the R field was already there for the first user so it got a "0" for the next user, however SAP wasn't there before, so we don't actually know what should the value for the first user...so a "?" will be in place...

I'm sure you will have a better picture when you see the images of the table after I show the source code...

First, we need to create a table called "TWITTER_USERS", that will hold the users that we want to work with...


Then, we need another table were we are going to store the Hashtags and its values as a long string. This table will be called "FIRST_HASH".


Now, things get interesting, as we're going to create our Flexible Table using a very simple command...this table will be called "TWITTER_HASHTAGS".

Twitter_Hashtags.sql
CREATE COLUMN TABLE TWITTER_HASHTAGS(
USERNAME NVARCHAR(10)
) WITH SCHEMA FLEXIBILITY;

This table will look pretty regular when watching its definition...but it's a Flexible Table...as you can see...we only defined one field...so this table can grow and grow and grow -;)


Next, we need to create a couple of type tables to allow to interact between SAP HANA and R...


Table_Types.sql
CREATE TYPE T_COL_NAMES AS TABLE(
COL_NAMES NVARCHAR(1000)
);
 
CREATE TYPE T_COL_VALUES AS TABLE(
COL_VALUES NVARCHAR(1000)
);

And now...we're ready to start with the code...one R procedure and two SQLScript procedures...

Get_Hashtags.sql
CREATE PROCEDURE GET_HASHTAGS(IN twittername TWITTER_USERS,IN first_hash FIRST_HASH,
                              OUT out_col_names T_COL_NAMES, OUT out_col_values T_COL_VALUES)
LANGUAGE RLANG AS
BEGIN
UserName = twittername$USERNAME
hashline = first_hash$HASH_LINE
hashvalues = first_hash$HASH_VALUES
 
Get_Twitter<-function(p_source,p_pattern){
  datalines = grep(p_pattern,web_page,value=TRUE)
  getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
  g_list = gregexpr(p_pattern,datalines)
  matches = mapply(getexpr,datalines,g_list)
  result = gsub(p_pattern,'\\1',matches)
  names(result) = NULL
  return(result)
}
 
Get_Hashtags<-function(p_source){
  check<-!length(grep('\\"([^,\\"]+)\\"', as.character(p_source)))
  if(!check){
    mypattern = '\\"([^,\\"]+)\\"'
    datalines = grep(mypattern,p_source,value=TRUE)
    getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
    g_list = gregexpr(mypattern,datalines)
    matches = mapply(getexpr,datalines,g_list)
    result = gsub(mypattern,'\\1',matches)
    names(result) = NULL
    return(result)
  }else{
    result<-p_source
    return(result)
  }
}
 
url<-paste("http://api.twitter.com/1/statuses/user_timeline.xml?count=200&screen_name=",UserName,sep="")
mypattern = '<text>([^<]*)</text>'
web_page<-readLines(url)
tweets<-Get_Twitter(web_page,mypattern)
mypattern = '[^\\&]#(\\.?\\w+)'
hash_list<-Get_Twitter(tweets,mypattern)
hashtags<-sapply(hash_list,Get_Hashtags)
hashtags<-as.vector(unlist(hashtags))
hashtags<-toupper(hashtags)
 
dt.hashtags<-data.frame(UserName,hashtags)
tab.hashtags<-table(dt.hashtags)
dt.hashtags<-as.data.frame.matrix(tab.hashtags)
hashtags_names<-names(dt.hashtags)
hashtags_names<-gsub("^\\.",'',hashtags_names)
 
if(length(hashline>=1)){
          hash_line<-gsub("^(\\w)+\\,",'',hashline)
          hash_line<-unlist(strsplit(hash_line, split=","))
          hash_values<-gsub("^(\\'+\\w+\\')+\\,",'',hashvalues)
          hash_values<-as.numeric(unlist(strsplit(hash_values, split=",")))
          hash_frame<-data.frame(names=hash_line,values=hash_values)
          hash_frame["values"]<-0
 
          Col_Names<-""
          Col_Values<-""
 
          for(i in 1:length(hashtags_names)){
                      Col_Names<-paste(Col_Names,hashtags_names[i],sep=",")
                      Col_Values<-paste(Col_Values,dt.hashtags[,i],sep=",")
          }
 
           Col_Names<-gsub("^\\,|\\.",'',Col_Names)
          Col_Values<-gsub("^\\,|\\.",'',Col_Values)
          Col_Names<-unlist(strsplit(Col_Names, split=","))
          Col_Values<-as.numeric(unlist(strsplit(Col_Values, split=",")))
          new_hash_frame<-data.frame(names=Col_Names,values=Col_Values)
          new_hash_frame<-rbind(hash_frame,new_hash_frame)
          new_hash_frame<-aggregate(values ~ names, FUN = "sum", data = new_hash_frame)
          new_hash_names<-new_hash_frame$names
          new_hash_values<-new_hash_frame$values
 
          Col_Names<-"USERNAME"
          Col_Values<-paste("'",UserName,"'",sep="")
 
           for(i in 1:length(new_hash_names)){
                      Col_Names<-paste(Col_Names,new_hash_names[i],sep=",")
                      Col_Values<-paste(Col_Values,new_hash_values[i],sep=",")
          }
}else{
          Col_Names<-"USERNAME"
          Col_Values<-paste("'",UserName,"'",sep="")
 
          for(i in 1:length(hashtags_names)){
                      Col_Names<-paste(Col_Names,hashtags_names[i],sep=",")
                      Col_Values<-paste(Col_Values,dt.hashtags[,i],sep=",")
          }
}
 
col_names<-gsub("^\\,\\.?",'',Col_Names)
col_values<-gsub("^\\,",'',Col_Values)
 
out_col_names<-data.frame(COL_NAMES=col_names)
out_col_values<-data.frame(COL_VALUES=col_values)
END;

Save_Hashtags.sql
CREATE PROCEDURE SAVE_HASHTAGS(IN in_col_names T_COL_NAMES, IN in_col_values T_COL_VALUES)
LANGUAGE SQLSCRIPT AS
v_select VARCHAR(2000);
v_col_names_char NVARCHAR(1000);
v_col_values_char NVARCHAR(1000);
CURSOR c_cursor1 FOR
SELECT COL_NAMES FROM :in_col_names;
CURSOR c_cursor2 FOR
SELECT COL_VALUES FROM :in_col_values;
BEGIN
                    OPEN c_cursor1;
                    FETCH c_cursor1 into v_col_names_char;
                    CLOSE c_cursor1;
                    OPEN c_cursor2;
                    FETCH c_cursor2 into v_col_values_char;
                    CLOSE c_cursor2;
                    DELETE FROM FIRST_HASH;
                    INSERT INTO FIRST_HASH VALUES(:v_col_names_char,:v_col_values_char);
                    v_select := 'INSERT INTO TWITTER_HASHTAGS (' || v_col_names_char || ') 
                                VALUES (' || v_col_values_char || ')';
                    EXEC v_select;
END;

Get_Twitter_Users.sql
CREATE PROCEDURE GET_TWITTER_USERS(UserName NVARCHAR(10))
LANGUAGE SQLSCRIPT AS
BEGIN
          Twitter_Users = SELECT USERNAME FROM TWITTER_USERS WHERE USERNAME = :UserName;
          First_Hash = SELECT HASH_LINE, HASH_VALUES FROM FIRST_HASH;
          CALL GET_HASHTAGS(:Twitter_Users,:First_Hash,T_COL_NAMES,T_COL_VALUES);
          CALL SAVE_HASHTAGS(:T_COL_NAMES,:T_COL_VALUES);
END;

In order for this to work, we need to insert some values on our "TWITTER_USERS" table...


And then, simply call the "GET_TWITTER_USERS" procedure...

Call_Get_Twitter_Users.sql
CALL GET_TWITTER_USERS('Blag');
CALL GET_TWITTER_USERS('Schmerdy');
CALL GET_TWITTER_USERS('ggread');

When we execute the first call...that's it with user @Blag we will have the following on the "FIRST_HASH" table...


And this on our "TWITTER_HASHTAGS" Flexible Table...


As you can see...our table started only with USERNAME...but as we pass in the Hashtags and its values...the table grew to able to hold them...

When we call the next user...that's @Schmerdy we will have this on the Flexible Table...


As you can see...in all the Hashtags that belong to @Blag but doesn't belong to @Schmerdy we have a "0" value...so what will happen to the ones that belong to @Schmerdy but not to @Blag?


Those field will have a "?" value, as they didn't exits before we add them...and again...the table grew to hold all the new fields....

Now...something interesting is that @Schmerdy had more Hashtags than @Blag...so what will happen when we call the last user which is @ggread that by the way...has less Hashtags than @Schmerdy and @Blag...


@ggread will have a value "0" in all the Hashtags that doesn't belong to the user...but will have a value in the one that are similar to @Schmerdy...


So...what will happen with the Hashtags that belongs to @ggread but doesn't exist on @Schmerdy or @Blag? Easy...they will be added and some "?" value will be placed where those Hashtags didn't exist before...


I wish I could put the whole table...but it contains more than 50 columns...so better...I can export them to a .CSV file...and do some analysis using Visual Intelligence...


Here, we can see how often these three user have used the Hashtags #SAPHANA, #SAP and #SAPTECHED in their last 200 tweets...

So...that's it...a nice and simple way to demonstrate how the Flexible Tables work in SAP HANA by using my always beloved R -:)

Greetings,

Blag.


SimpleDiagrams - What else can you ask for?

Today, I needed a simple tool to make some diagrams...I download a couple that weren't really good but did the job...until...I discovered SimpleDiagrams...


This amazing tool build with Adobe Air, just made me fell in love after a couple of minutes using it.

It contains many graphics in hand draw style that would make you diagrams better than ever -;)





When you install it, you can use it for free for 7 days or pay the registration which is $25...I of course...bought the license without thinking it twice...for a tool like this...it's a money investment -;)


Greetings,

Blag.

sábado, 2 de marzo de 2013

PowerBuilder and R get together


The other day I was thinking about writing a blog using PowerBuilder, but couldn't decide which one other technology I should integrate it...of course...R came to my mind...

My journey started around 4 days ago...when I start looking for ways to call R from an external language...last time I used Rook and Heroku to call R from SAP Mobile Platform as explained in my blog Consuming R from SAP Mobile Platform, but this time I knew that I needed to do something different.

My first thought was to use Rserve which is an R Server used by the SAP HANA Studio to connect to R as explained in my blog When SAP HANA met R - First kiss, so I downloaded the REngine files which are two .jar Java files.

In order to connect to Rserve we need to set it up...so basically with the Rserve package installed on my RStudio I simply need to create a small file...

Call_Rserve.R
library("Rserve")
Rserve()

When you run this...the R Server will start as a process that can be seen in the Task Manager.

With the files ready, I went to PowerBuilder and simply add them to the Java classpath like I did for the SAP HANA jdbc connector as explained in my blog PowerBuilder - The new kid on Developer Center's block, it didn't work...the files we in the classpath but I couldn't call any method...so I decided to keep looking for alternatives and then I find Rcaller, a single .jar Java file that instead of using Rserve, call the R executable directly...it didn't work either...thing is...in PowerBuilder (with the exception of the SAP HANA jdbc connector) you need to use either an EAServer or EJB capable server like JBoss...I haven't use any of them before...and also you need to build the .jar Java file yourself as you need some internal information.

I had the impression that I was going to a dead end...but then I think..."Hey...I'm using PowerBuilder.NET!" which means obviously that it relies on the .NET Framework...meaning that I could use .NET dll files...but I only had .jar Java files...

Doing some more research I came to find IKVM which an application that allows you to convert any .jar Java file into a nice .NET dll file...the usage is very simple...

Using IKVM
Open CMD
C:\>ikmvc -out:Rcaller.dll -target:module Rcaller.jar

This will produce a .dll file called Rcaller.dll...I thought I was right on track so I include this file in my project references...it didn't work as sadly, IKVM is still on development and not everything from Java has been translated to .NET, like for example the Java.IO interface...I then convert both REngine .jar Java files into an .dll but had the same luck...some conversions were missing...

Back to Google and keep looking...this time...straight for .NET implementations and I found R.NET which also uses the R executable to make the integration...it look very promising...however, R.NET is still in development and for some reason...I couldn't make it work as somehow the .dll file couldn't be found by the .NET engine...I thought...Ok...maybe it's PowerBuilder's fault...let's try straight on a real .NET environment...so I installed Visual Studio Express 2012 for Desktop...same luck...the .dll couldn't be found...

I was tired and angry...nothing seem to work...but then I get back to the Rserve page and realized that it was a .NET project called RserveCLI that used Rserve to do the integration with R....when downloading the project...there's no available .dll but that wasn't a problem...as I simply compile it on VS Express and got a shiny .dll waiting to be tested...so I create a new C# Console project and test it...it worked perfectly...

When I tried to use it on PowerBuilder.NET for the first time...it didn't work...so...as you may imagine...I was even more tired and even more angry...and for some unexplainable reason...I end deleting a single tiny file from the .NET framework that simply screw everything...I couldn't run PowerBuilder or VS Express any more...thought luck...another journey to fix my mess...I uninstalled all the .NET framework references...VC++ compilers...Runtimes...etc...it took me almost a full day to finally have everything back in place...however...up to this day...VS Express died completely...I can't even install it any more...as the installer show me the VS splash screen as then disappears with no visible error message...

Anyway...at least PowerBuilder was working again...and I finally overcome my issues with the RserveCLI dll...

The program that I'm going to show you, is for sure very simple...as due to the nature of the integration...and the fact that RserveCLI is still on development...there's really not much that we can do...but still...I believe it will help to illustrate two interesting points...


  • PowerBuilder.NET can interact with .NET dll files
  • R can be used by a long range of programming languages


This program will ask you to fill two arrays, Array A and Array B...each with four elements (somehow...and for some reason that I'm still trying to understand...when calling R from PowerBuilder, you need to pass even bi-dimensional arrays...being the smallest of [2,2]...)
Let's take a look at the layout...


So...enough talk...let's go to the Source Code...
First, we're going to define a Global Variable as we will use it in a lot of places...

Global Variables
RserveCli.RConnection conn

Then, we going to call the main windows (w_window) open event.

w_window.Open
#if DEFINED PBDOTNET then
@System.Net.IPAddress ipadd
byte ip[4] = {127,0,0,1}
ipadd = create @System.Net.IPAddress(ip)
@System.Int32 port
port = create System.Int32
port = 6311
conn = create RserveCli.RConnection(ipadd,port,"","")
#end if

Here, we are saying that we're going to use .NET styled code, so it should be ignored by the PowerBuilder compiler...as we're using Rserve on our local machine, we use the localhost IP address and assign the Rserve default port which is 6311. We define a connection to the server using the RConnection method. If you wonder why I'm using the @ it's because we want to make sure that we're calling the System from .NET and from PowerBuilder...I know that I said the code should ignored by the PowerBuilder compiler...but better safe than sorry....

We have a button called cb_load which is going to load the arrays defined on the screen.

cb_load.Clicked
double varA[2,2], varB[2,2]
varA[1,1] = double(Array_A_1.Text)
varA[1,2] = double(Array_A_3.Text)
varA[2,1] = double(Array_A_2.Text)
varA[2,2] = double(Array_A_4.Text)
varB[1,1] = double(Array_B_1.Text)
varB[1,2] = double(Array_B_3.Text)
varB[2,1] = double(Array_B_2.Text)
varB[2,2] = double(Array_B_4.Text)
 
#if DEFINED PBDOTNET then
::conn["A"] = RserveCli.Sexp.Make(varA)
::conn["B"] = RserveCli.Sexp.Make(varB)
#end if

Here, we simply create two arrays and fill them. We call ::conn with the extra "::" to let the compiler know that it's a global variable. When we use the Sexp.Make we're telling that we want to create an R vector using the provided array. So one vector called A and other called B.

Now that we have the arrays loaded in memory, we can proceed with the other buttons.

cb_sum.Clicked
#if DEFINED PBDOTNET then
txtResult.Text = ::conn["A+B"].ToString()
#end if

This is very simple, we're just saying...send this "A+B" command to R and give me the result as a String. In R, A+B will produce a factor sum...meaning that instead of creating a new factor containing the values of A and B, it's going to take the first element from A and sum it to the first element of B, it will do the same with the rest of the elements. You will see this more clear later.

cb_multiply.Clicked
#if DEFINED PBDOTNET then
txtResult.Text = ::conn["A*B"].ToString()
#end if

This is basically the same, with the exception that it will multiply instead of sum.

cb_min.Clicked
#if DEFINED PBDOTNET then
RserveCli.Sexp varC = ::conn["min(c(A,B))"]
txtResult.Text = varC.ToString()
#end if

This is a little bit more interesting...we're going to create an Sexp variable and then perform a small operation in R...by doing "c(A,B)" we going to take all the element from B and them to A, so A it's going to contain both values...using "min()" we're to get the smallest value.

cb_max.Clicked
#if DEFINED PBDOTNET then
RserveCli.Sexp varC = ::conn["max(c(A,B))"]
txtResult.Text = varC.ToString()
#end if

Here it's the same story but we're going to get the biggest value instead of the smallest.

cb_mean.Clicked
#if DEFINED PBDOTNET then
RserveCli.Sexp varC = ::conn["mean(c(A,B))"]
txtResult.Text = varC.ToString()
#end if

Same story again...R is really easy to work with...we calculate the mean which will be simply the sum of all elements divided by the number of elements.

cb_summary.Clicked
#if DEFINED PBDOTNET then
RserveCli.Sexp varC = ::conn["summary(c(A,B))"]
string varS = "Min: " + varC[0].ToString() + " / 1st Qu: " + varC[2].ToString() + " / Median: " &
                                + varC[2].ToString() + " / Mean: " + varC[3].ToString() + " / 3rd Qu: " &
                                + varC[4].ToString() + " / Max: " + varC[5].ToString()
txtResult.Text = varS
#end if

Here, we're going to obtain the summary of mixing A and B. The summary is a complex variable that will return us the Min and Max values, the 1st and 3rd Quadrants, the Median and the Mean values. By simply calling it as an array, we can extract all the values...

Finally...and to allow our program to run...we need to call the last piece of code...

wfpapp.Open
open(w_window)

wpfapp is the name of our application (by PowerBuilder default). Here we say, open our main windows called w_window (another PowerBuilder default)...

Let's run the application and see how it works...


We are simply filling the text boxes with some dummy data...Array A will contain 1,2,3,4 and Array B will contain 5,6,7,8.


When we press the Sum button...1 and 5 will be summed, 2 and 6 will be summed, 3 and 7 will be summed and finally, 4 and 8 will be summed.


When we press the Multiply button, the same thing will happen, but the numbers will be multiplied instead of being summed.



You already know but will happen with Minimum, Maximum and Mean...so we move ahead straight to Summary. It will return the Min and Max, the 1st and 3rd Quadrants and the Median and Mean values.
As you can see...this is a very simple example...but believe me...after full 4 days of work...I'm glad that I could finally make it work...and you thought that was job was easy? Think again -;) 4 stressful days and nights...

Greetings,

Blag.