I joined Nylas around four months ago as their first Senior Developer Advocate. It might seem like a short amount of time, but in the startup world, time moves differently.
What is Nylas?
In a nutshell, Nylas is a set of APIs that allows you to easily connect with any email, calendar and contacts provider without having to go through a lengthy and complex configuration process.
In other words, we do the heavy work so you can focus on your business.
When it comes to Email, we offer “One Email API for Every Provider”. This means our Email API is universal and you can connect to various providers without having to write specific code for each. Also, we provide real-time, bi-directional sync, and full CRUD capabilities. If you want to learn more, just go to our Universal Email API page.
When it comes to Calendar, we offer the same capabilities as our Email API, but also conferencing sync, events metadata, and programmatic webhooks. If you want to learn more, just go to our Universal Calendar API page.
When it comes to Contacts, you can create, update or delete contacts in any supported provider. If you want to learn more, just go to our Universal Contacts API page.
With Email, Calendar, and Contacts APIs, we’re just getting started. Check out what we offer on top of these APIs to give developers even more power.
With Neural API, exploring the world of AI and ML is easy, with ready-made models that can be used to clean conversations, extract signatures, perform OCR, and run sentiment analysis. If you want to learn more, just go to our Neural API page.
Nylas Streams is our ETL solution that requires little to no code to transform and consume communications data for E-Commerce, Sales, Fintech and Customer Success. If you want to learn more, just go to our Nylas Streams page.
Scheduler is a full-featured scheduler with a customizable UI. If you want to learn more, just go to Scheduler.
Components are ready-made, fully flexible UI/UX widgets, ready for immediate use. If you want to learn more, just go to Components.
Why did I join?
The first thing that caught my attention with Nylas was the fact that they provide APIs to make communications easier. Handling Email, Calendars and Contacts gives a lot of space to create amazing applications.
The second thing that attracted me to Nylas was the company culture. Everybody looked committed to each other, as I saw a lot of internal support when the Nylas account posted something on social media, and also my interviews were more like chatting with friends than actual interviews.
The third and probably most important thing for me was going back to Developer Advocacy, which is something that I’m really passionate about. Being able to share with the community is something that makes me feel good and that makes me a better person and a better developer.
Obviously, those three things might not be enough to convince you, so let’s create a small example.
For this I’m going to choose one of my favorite programming languages: R. And while I’m not an R professional or expert, I’m very passionate about it, so bear with me, there might be better ways to do this.
What we are going to do is simply read the first three messages in my inbox and print the subjects.
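To give a feel for what the raw-API route involves, here is a rough JavaScript sketch (not the R code this post is about). It assumes the messages endpoint returns a JSON array whose items carry a `subject` field. The `sampleResponse` data and the `firstSubjects` helper are made up for illustration.

```javascript
// Hypothetical sample of what a "list messages" response might look like.
// The field names are assumptions for illustration, not a documented schema.
const sampleResponse = JSON.stringify([
  { id: "m1", subject: "Welcome aboard" },
  { id: "m2", subject: "Your weekly digest" },
  { id: "m3", subject: "Invoice for March" },
  { id: "m4", subject: "Lunch on Friday?" },
]);

// Parse the raw JSON text and return the subjects of the first `count` messages.
function firstSubjects(rawJson, count) {
  const messages = JSON.parse(rawJson);
  return messages.slice(0, count).map((message) => message.subject);
}

firstSubjects(sampleResponse, 3).forEach((subject) => console.log(subject));
```

With a raw response, the caller has to know where the subject lives in the JSON, which is exactly the part an SDK would hide.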
As you can see, using the SDK is easier, because we don’t need to traverse the JSON response or figure out where the element that we want to print is. Also, if something changes internally, we can rest assured that the SDK will be updated to reflect it, whereas calling the API directly will require some manual work. And the same goes for Calendar, Contacts, and the rest of our offerings.
Why should you care?
Well, to begin with, I’m your friendly Developer Advocate so you know I’m going to provide you with constant and interesting content about the Nylas APIs. Also, signing up for a Nylas account is easy, no Credit Card is required and you can get 14 days to try it out. Ready to go? Just go to https://dashboard.nylas.com/register and follow the instructions.
Also, I’m already working on a series of blog posts that will help you get started, make your first API calls and, overall, get the full Nylas experience.
Don’t think Nylas is for you? Well, recommend us to a friend then. You might know someone who could benefit from having fast, easy, and convenient access to universal communication APIs.
This is my first blog of the year…so I want it to be something really nice and huge -:) You know how much I love the R Programming Language…but I also love other technologies as well…so taking a bunch of them and hooking them up together is what really brings me joy.
Now…you may be wondering about the blog title…”There’s a party at Alexa’s place”…well…I wanted it to describe the blog in a funny way…so let’s see what we’re going to build -;)
Got any idea? Basically…we’re going to use Amazon Alexa as our UI…when we say a command…we’re going to call a NodeJS Server on Heroku (Which BTW has a PhantomJS client installed)…this NodeJS Server will call an R Server on Heroku (Using the Rook Server)…and this R Server is going to call HANA Cloud Platform to get some Flights information and generate nice graphics that are going to be returned to the NodeJS Server, which is going to call our web browser to display the graphic generated by the R Server…of course…by using PhantomJS we’re going to read the generated web page on the browser and this will be sent back to Amazon Alexa so she can read out the response…interesting enough for you? I hope -:) It took me more than two weeks to get all this up and running…so you better like it -:P
So…let’s go in some simple steps…
GET A HANA CLOUD PLATFORM ACCOUNT
You should have one already…if not…just go here to create one…
Then…we need to download the HANA Cloud Platform SDK, extract it and modify the file tools/neo.sh on line 57…
Instead of this…
javaExe="$JAVA_HOME/bin/$javaCommand"
Use this…
javaExe="$JAVA_HOME"
Why? Well…it will make sense later on…or maybe it will make sense now if you have the SAP HANA Client installed…otherwise download it from here and take note that you will need to copy the ngdbc.jar file…
GETTING THE DATA THAT WE'RE GOING TO USE
As always…in almost all my blogs…we’re going to use tables from the Flight model…which of course…doesn’t exist on HANA Cloud Platform…
The easiest way (at least for me) was to access an R/3 server…and simply download the tables as XLS files…convert them into CSV files and upload them into HCP…
And BTW…for some weird reason my R/3 didn’t have American Airlines listed on the SCARR table…so I just added it -;)
Now…if you don’t have access to an R/3 system…then you can download the tables in CSV format from here
CREATE THE R SERVER ON HEROKU
If you don’t have the Heroku Tool Belt installed…then go and grab it…
Steps to install R on Heroku with Graphic Capabilities
mkdir myproject && cd myproject
mkdir bin
echo "puts 'OK'" > config.ru
echo "source 'http://rubygems.org'\n gem 'rack'" > Gemfile
#Open your project folder and modify the Gemfile to replace the "\n" with an actual line break…
bundle install
git init . && git add . && git commit -m "Init"
heroku apps:create myproject --stack=cedar
git push heroku master
#Copy and paste the content of my installR.sh into the /bin folder of your project
git add . && git commit -am "message" && git push heroku master
heroku ps:scale web=0
installR.sh
#!/bin/bash
function download() {
if [ ! -f "$2" ]; then
echo Downloading $2...
curl $1 -o $2
else
echo Got $2...
fi
}
set -e
r_version="${1:-3.2.3}"
r_version_major=${r_version:0:1}
if [ -z "$r_version" ]; then
echo "USAGE: $0 VERSION"
exit 1
fi
basedir="$( cd -P "$( dirname "$0" )" && pwd )"
# create output directory
vendordir=/app/vendor
mkdir -p $vendordir
# R
download http://cran.r-project.org/src/base/R-$r_version_major/R-$r_version.tar.gz R-$r_version.tar.gz
tar xzf R-$r_version.tar.gz
# build R
echo ============================================================
echo Building R
echo ============================================================
cd $basedir/R-$r_version/
./configure --prefix=$vendordir/R --with-blas --with-lapack --enable-R-shlib --with-readline=no --with-x=yes
make
cd /app/bin
ln -s R-$r_version/bin/R
rm R-$r_version.tar.gz
rm -rf erb gem irb rake rdoc ri ruby testrb
rm ruby.exe
cd /app/bin/R-$r_version
rm -rf src
rm Make*
rm -rf doc
rm -rf tests
rm README ChangeLog COPYING INSTALL SVN-REVISION VERSION
Now…we need to do a very important step -:) We need to install the totally awesome heroku-buildpack-multi from ddollar.
With this done…we will have all the missing libraries needed to compile R on the new Cedar Stack on Heroku and also…we will have a nicely installed R instance with Graphic capabilities…but of course…we’re not done yet…
Installing the R Libraries
#This will open R on Heroku…
R
#This will install the libraries with their corresponding dependencies
install.packages("Rook",dependencies=TRUE)
install.packages("Cairo",dependencies=TRUE)
install.packages("maps",dependencies=TRUE)
install.packages("forecast",dependencies=TRUE)
install.packages("plotrix",dependencies=TRUE)
install.packages("ggplot2",dependencies=TRUE)
install.packages("ggmap",dependencies=TRUE)
install.packages("rJava",dependencies=TRUE)
install.packages("RJDBC",dependencies=TRUE)
q()
All right…we’re almost there -;) The problem with Heroku is that it’s not writable…meaning that once you get disconnected…you will lose all your work -:(
So…we need to back it up and send it somewhere else…I used my R Server on Amazon Web Services for this…
First…we need to compress the bin folder like this…
tar -cvzf bin.tar.gz bin
and then we need to save this file on our external server…
and of course after that we need it in our project folder…so we need to send it from our external server to our project folder, where we simply need to uncompress it…
So…let’s take some time to understand what’s going on with this code…we’re going to create a Rook server…which will allow us to host webpages from R…then, we’re going to use our hcp.sh script to get the password for our HANA Cloud Platform bridge…so we can get a JDBC connection to the database…from there we want to get a list of all the airports and also read the airports from a file detailed later (this airports file contains the geolocation of the airports). With this…we want to filter the airports from the file against the airports from HANA…so we don’t have any extra data…now…we have three choices…airports, US airports or carriers…the first one will generate a map of the world with all the airports as little red dots…the second one will generate a map of the US with the airports as little red dots but also showing the names of the cities…the last one will generate a geometric histogram with the details of the flight distances according to their carriers…later on…we’re going to read the generated graphic to create a hexadecimal string of it, along with some information that Alexa should read out…easy as cake, huh?
Procfile
web: bundle exec rackup config.ru
We want this R Server to be able to access HANA Cloud Platform…so let’s do that before we keep going…
With the location of Java…apply this command…
heroku config:set JAVA_HOME='/usr/bin/java'
Now…Copy the following files into your project folder…
“tools” folder from the HANA Cloud Platform SDK
ngdbc.jar from SAP HANA Client
Also…create this little script which is going to allow us to connect to HCP…
var WebSocketServer = require("ws").Server
, http = require("http")
, express = require("express")
, request = require('request')
, fs = require('fs')
, app = express()
, arr = []
, msg = ""
, port = process.env.PORT || 5000
, childProcess = require('child_process')
, phantomjs = require('phantomjs-prebuilt')
, path = require('path')
, binPath = phantomjs.path;
app.use(express.static(__dirname + "/"))
var server = http.createServer(app)
server.listen(port)
var wss = new WebSocketServer({server: server})
var childArgs = [path.join(__dirname, 'phantom.js')]
var childStats = [path.join(__dirname, 'readphantom.js')]
app.get('/path', function (req, res) {
if(req.query.command == 'map'){
URL = "http://blagrookheroku.herokuapp.com/custom/summarize?airports=xyz&us_airports=&carriers=";
request(URL, function (error, response, body) {
if (!error) {
arr = body.split("/");
msg = "There are " + arr[1] + " airports around the world";
var bitmap = new Buffer(arr[0], 'hex');
var jpeg = new Buffer(bitmap,'base64');
fs.writeFileSync('Graph.jpg', jpeg);
res.redirect('/');
};
});
}else if(req.query.command == 'usmap'){
URL = "http://blagrookheroku.herokuapp.com/custom/summarize?airports=&us_airports=xyz&carriers=";
request(URL, function (error, response, body) {
if (!error) {
arr = body.split("/");
msg = "There are " + arr[1] + " airports in the US";
var bitmap = new Buffer(arr[0], 'hex');
var jpeg = new Buffer(bitmap,'base64');
fs.writeFileSync('Graph.jpg', jpeg);
res.redirect('/');
};
});
}else if(req.query.command == 'carriers'){
URL = "http://blagrookheroku.herokuapp.com/custom/summarize?airports=&us_airports=&carriers=xyz";
request(URL, function (error, response, body) {
if (!error) {
arr = body.split("/");
msg = "" + arr[1];
var bitmap = new Buffer(arr[0], 'hex');
var jpeg = new Buffer(bitmap,'base64');
fs.writeFileSync('Graph.jpg', jpeg);
res.redirect('/');
};
});
}else if(req.query.command == 'stat') {
childProcess.execFile(binPath, childArgs, function(err, stdout, stderr){
if(!err){
res.redirect('/');
};
});
}else if(req.query.command == 'readstat') {
childProcess.execFile(binPath, childStats, function(err, stdout, stderr){
if(!err){
res.write(stdout);
res.end();
};
});
}else if(req.query.command == 'bye'){
if(fs.existsSync('Graph.jpg')){
fs.unlinkSync('Graph.jpg'); // synchronous, so no callback is needed
}
res.redirect('/');
}
});
wss.on("connection", function(ws) {
var id = setInterval(function() {
fs.readFile('Graph.jpg', function(err, data) {
if(!err){
ws.send(JSON.stringify("Graph.jpg/" + msg), function() { })
}else{
ws.send(JSON.stringify("Gandalf.jpg/No problem...I'm crunching your data..."), function() { })
}
});
}, 3000)
ws.on("close", function() {
clearInterval(id)
})
})
Let’s explain the code a little bit and believe me…I’m far from being a NodeJS expert…this is really the first time I’ve developed something this complex…and it took me a really long time and tons of research…so please try not to criticize me too much -:(
We’re going to create an Express application that uses Web Sockets to refresh the browser and show the graphics generated by our R Server…it will also call PhantomJS to both create and read the generated web page so we can send it back to Alexa…
Here…we have six choices…map, usmap, carriers, stat, readstat and bye…the first three are going to call our R Server passing all the parameters but leaving the ones that we don’t need empty…and just passing “xyz” as the value of the one we do need…
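As a small sketch of how those three URLs could be built in one place instead of being typed out three times, here is a helper that keeps every parameter in the query string and fills in only the selected one. The base URL and the “xyz” value come from the server code above. The names `PARAM_FOR_COMMAND` and `buildSummarizeUrl` are made up for this example.

```javascript
// Base URL taken from the server code above; "xyz" is the placeholder value it sends.
const R_SERVER = "http://blagrookheroku.herokuapp.com/custom/summarize";

// Map each Node command to the R server parameter it should switch on.
const PARAM_FOR_COMMAND = {
  map: "airports",
  usmap: "us_airports",
  carriers: "carriers",
};

// Build the query string with every parameter present, only one of them non-empty.
function buildSummarizeUrl(command) {
  const selected = PARAM_FOR_COMMAND[command];
  if (!selected) {
    throw new Error("Unknown command: " + command);
  }
  const query = Object.values(PARAM_FOR_COMMAND)
    .map((name) => name + "=" + (name === selected ? "xyz" : ""))
    .join("&");
  return R_SERVER + "?" + query;
}

console.log(buildSummarizeUrl("usmap"));
```

Adding a fourth chart type would then only mean adding one entry to the lookup table.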
When we get the response from R it’s going to be a long string separated by a “/”…which is going to be the hexadecimal string for the graphic along with the text intended for Alexa…Node will read the graphic…generate it and then refresh the browser in order to show it on the screen…
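The decoding step can be sketched on its own, without a server. This version uses `Buffer.from`, which replaces the `new Buffer(...)` constructor in current Node releases, and splits only at the first “/”. The `parseRResponse` name and the sample payload are made up for illustration.

```javascript
// Split "<hex image>/<text for Alexa>" at the first slash only,
// so a slash inside the text part would not be lost.
function parseRResponse(body) {
  const separator = body.indexOf("/");
  if (separator === -1) {
    throw new Error("Response has no '/' separator");
  }
  return {
    image: Buffer.from(body.slice(0, separator), "hex"),
    text: body.slice(separator + 1),
  };
}

// Made-up payload: the first three bytes of a JPEG file plus a number to read out.
const parsed = parseRResponse("ffd8ff/42");
console.log(parsed.image.length, parsed.text);
```

The resulting `image` buffer is what would then be handed to `fs.writeFileSync('Graph.jpg', ...)`.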
The stat option will call our PhantomJS script to simply read the page and create a new file with the JavaScript part already executed…readstat will read this file and extract the text that we need for Alexa…finally…bye will delete the graphic so the web socket goes back to showing the main image on the screen.
Finally…the web socket is going to constantly check…every 3 seconds…to see if there’s a graphic or not…and then display the related image…
index.html
<html>
<head>
<title>I'm a NodeJS Page!</title>
</head>
<body>
<div id="container" align="center"></div>
<script>
var host = location.origin.replace(/^http/, 'ws');
var ws = new WebSocket(host);
ws.onmessage = function (event) {
var container = document.getElementById('container');
var data = JSON.parse(event.data);
data = data.split("/");
var url = data[0];
var msg = data[1];
container.innerHTML = '<img src="' + url + '"><br><p><b>' + msg + '</b></p>';
};
</script>
</body>
</html>
This one is going to be served by our Express application and it will simply listen to the web socket to determine what it needs to display…it has some JavaScript…that’s why we need PhantomJS to interact with it…
phantom.js
var page = require('webpage').create();
var fs = require('fs');
page.open('http://blagnodeheroku.herokuapp.com/', function () {
window.setTimeout(function () {
page.evaluate(function(){
});
fs.write('stats.html', page.content, 'w');
phantom.exit();
},4000);
});
Not the best or most descriptive name…but who cares -:P Anyway…this script will load the page…that is, the Express application…wait 4 seconds for the JavaScript to run and then create a web page called stats.html
readphantom.js
var page = require('webpage').create(),
address = "stats.html";
page.open(address, function (status) {
if (status == 'success') {
var results = page.evaluate(function() {
return document.querySelector('p').innerText.trim();
});
console.log(results);
phantom.exit();
}
});
This script will simply read the stats.html page and return the text that’s located inside the “p” tag…dead simple…
SETTING UP ALEXA
Creating the Lambda function
Now…we need to set up Alexa…so we can control everything via voice commands -:)
First…we need to go to Amazon Lambda and log in if you have an account…otherwise…please create one…and make sure you’re on the N. Virginia (US East) region…
In the list of functions…look for color…
Choose the NodeJS one…Python has been included as well…but wasn’t when I started to work on this blog -:)
Here, just click next....
I already created the function…but you shouldn’t have a problem...
Basic execution role is more than enough…
This will open a pop-up window…simply press the “Allow” button and then “Create function”…we will include the source code later on…but take note of the generated ARN…because we’re going to need it in the next step…
Creating the Skill
Go to http://developer.amazon.com and log in…then choose Apps & Services --> Alexa --> Alexa Skills Set
Choose Alexa Skills Kit and fill in the blanks...
As soon as we hit next an application number will be generated on a new field called Application ID. Grab this number as we’re going to need it for our application code.
The Interaction Model section is very important as here we’re going to define the “Intent Schema” and “Sample Utterances”…the first will define the parameters that we’re going to send to Alexa and the second is how we are going to call our application.
Our variable is going to be called “command” and it’s going to be a LITERAL…other types are NUMBER, DATE, TIME and DURATION. The intent is the method that we’re going to call in our code…
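As a sketch of what such an Intent Schema could look like, the snippet below builds it as a JavaScript object and prints the JSON to paste into the form. The layout (an `intents` array with `intent` and `slots` entries) is an assumption based on the description above and on the intent names handled in index.js, so check it against the Alexa Skills Kit documentation before relying on it.

```javascript
// Assumed layout of the Intent Schema; the intent names match the handlers in index.js.
const intentSchema = {
  intents: [
    {
      intent: "GetFlightsIntent",
      slots: [{ name: "command", type: "LITERAL" }],
    },
    { intent: "HelpIntent", slots: [] },
  ],
};

// Pretty-print the schema so it can be pasted as JSON.
const intentSchemaJson = JSON.stringify(intentSchema, null, 2);
console.log(intentSchemaJson);
```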
Sample Utterances
GetFlightsIntent airports from {around the world|command}
GetFlightsIntent airports from {united states|command}
GetFlightsIntent flight distance from {carriers|command}
GetFlightsIntent {thank you|command}
The test section lets us type commands and see how Alexa responds…but we’re not going to do that here…we’re going to test it using a real Alexa device -;)
Forget about the Publishing Information section unless you really want to publish your application…
Create a folder called Flights…Alexa_Party…or whatever you fancy…then create a folder called src and copy this file in there…calling it AlexaSkill.js
We’re going to need to install only one library….”request”…
sudo npm install --prefix=~/Flights/src request
This will create a folder called “node_modules” with the package in our project folder…then create a file called “index.js” and copy and paste the following code…
index.js
var request = require("request")
, AlexaSkill = require('./AlexaSkill')
, APP_ID = 'amzn1.echo-sdk-ams.app.8c0bd993-723f-4ab2-80b5-84402a7a59ce';
var error = function (err, response, body) {
console.log('ERROR [%s]', err);
};
var getJsonFromFlights = function(command, callback){
var msg = "";
if(command == "thank you"){
request("http://blagnodeheroku.herokuapp.com/path/?command=bye", function (error, response, body) {
if (!error) {
console.log("Done");
};
});
setTimeout(function() {
callback("thank you");
},2000);
}else if (command == "around the world"){
request("http://blagnodeheroku.herokuapp.com/path/?command=bye", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=map", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=stat", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=readstat", function (error, response, body) {
if (!error) {
msg = body;
};
});
};
});
};
});
}
});
setTimeout(function() {
callback(msg.trim());
},15000);
}else if (command == "united states"){
request("http://blagnodeheroku.herokuapp.com/path/?command=bye", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=usmap", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=stat", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=readstat", function (error, response, body) {
if (!error) {
msg = body;
};
});
};
});
};
});
}
});
setTimeout(function() {
callback(msg.trim());
},15000);
}else if (command == "carriers"){
request("http://blagnodeheroku.herokuapp.com/path/?command=bye", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=carriers", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=stat", function (error, response, body) {
if (!error) {
request("http://blagnodeheroku.herokuapp.com/path/?command=readstat", function (error, response, body) {
if (!error) {
msg = body;
};
});
};
});
};
});
}
});
setTimeout(function() {
callback(msg.trim());
},15000);
}
};
var handleFlightsRequest = function(intent, session, response){
getJsonFromFlights(intent.slots.command.value, function(data){
if(data != "thank you"){
var text = data;
var reprompt = 'Please say a command?';
response.ask(text, reprompt);
}else{
response.tell("You're welcome");
}
});
};
var Flights = function(){
AlexaSkill.call(this, APP_ID);
};
Flights.prototype = Object.create(AlexaSkill.prototype);
Flights.prototype.constructor = Flights;
Flights.prototype.eventHandlers.onSessionStarted = function(sessionStartedRequest, session){
console.log("onSessionStarted requestId: " + sessionStartedRequest.requestId
+ ", sessionId: " + session.sessionId);
};
Flights.prototype.eventHandlers.onLaunch = function(launchRequest, session, response){
// This is when they launch the skill but don't specify what they want.
var output = 'Welcome to Flights. ' +
'Please, say a command.';
var reprompt = 'Please, say a command?';
response.ask(output, reprompt);
console.log("onLaunch requestId: " + launchRequest.requestId
+ ", sessionId: " + session.sessionId);
};
Flights.prototype.intentHandlers = {
GetFlightsIntent: function(intent, session, response){
handleFlightsRequest(intent, session, response);
},
HelpIntent: function(intent, session, response){
var speechOutput = 'Get the information for airports and flights. ' +
'Please say a command?';
response.ask(speechOutput);
}
};
exports.handler = function(event, context) {
var skill = new Flights();
skill.execute(event, context);
};
Time to explain what I was trying to do here -:P
The handleFlightsRequest method will manage the response that Alexa will read out for us…and inside this method we can find getJsonFromFlights, which will take the command defined in our Intent Schema. This function will call our NodeJS server for the following commands…”thank you” will simply call the bye command….”around the world” will call the bye, map, stat and readstat commands…”united states” will call the bye, usmap, stat and readstat commands…finally carriers will call the bye, carriers, stat and readstat commands…
After 15 seconds (Yep…I know it’s too much but there are a lot of processes going on) Alexa will get the response message and simply speak it to us -;)
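One possible alternative to the fixed 15 second wait, shown here only as a sketch, is to run the steps one after another and answer as soon as the last one finishes. The `fetchText` argument is a hypothetical helper (a function taking a URL and returning a Promise of the response body), and the demo uses a fake one so nothing goes over the network.

```javascript
// Commands the Node server understands, in the order each voice command needs them.
const STEPS = {
  "around the world": ["bye", "map", "stat", "readstat"],
  "united states": ["bye", "usmap", "stat", "readstat"],
  "carriers": ["bye", "carriers", "stat", "readstat"],
  "thank you": ["bye"],
};

const BASE_URL = "http://blagnodeheroku.herokuapp.com/path/?command=";

// Run the steps one after another and resolve with the body of the last one.
// `fetchText` is a hypothetical helper: (url) => Promise<string>.
async function runCommand(command, fetchText) {
  const steps = STEPS[command];
  if (!steps) {
    throw new Error("Unknown command: " + command);
  }
  let body = "";
  for (const step of steps) {
    body = await fetchText(BASE_URL + step);
  }
  return body.trim();
}

// Demo with a fake fetcher that records which commands were requested.
const visited = [];
const fakeFetch = async (url) => {
  visited.push(url.split("=")[1]);
  return " There are 42 airports in the US \n";
};

runCommand("united states", fakeFetch).then((text) => {
  console.log(visited.join(" -> "));
  console.log(text);
});
```

The caller would still decide what Alexa says, for example answering “You're welcome” for the “thank you” command regardless of what the bye step returns.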
That’s pretty much it…now…I can show some images before we jump into the video…
The first time I heard about R was about 4 years ago...a couple of weeks after I joined SAP. At that time I read in one of our internal documents that SAP HANA was going to be able to interact with the R programming language.
At first, I was totally clueless about R...I had never heard of it before...so of course I started looking for more information, downloaded R and RStudio and started learning how to use it...
After some time...I posted my first blog talking about R...that was on November 28, 2011...the blog was Dealing with R and HANA...
After that I kept learning and using it whenever it was suitable...and I ended up writing my most successful blog on the SAP Community Network...that was on May 21, 2012.
Up to now, it has 21,879 views, 62 comments, 16 likes and 32 bookmarks.
R is huge...it really is...there are thousands of packages that solve thousands of issues...so that's when I started reading R related books from my good friends at Packt Publishing...
As I was improving my R skills...I knew that something was missing...I really didn't know much about Statistics and my Machine Learning skills were pretty dull as well...that's when I read another awesome book...and I think my favorite book on R so far...
If you're wondering how Social Media can be used with R, then please take a look at my blog Getting Flexible with SAP HANA where I use SAP HANA, R, Twitter and Schema Flexibility to analyze hashtags that can be further explored using for example SAP Lumira.
If you are into the Bioinformatics world...which I'm not -:( You should appreciate this book...
If you are an SAP Employee, please follow us on Jam.
At the d-shop we're always looking forward to playing with the latest technologies...and while the Nest Thermostat is not really "new", we're in the process of getting one.
For now, we are happy to be able to play a little bit with the Nest Home Simulator available on the Google Chrome Store.
In this blog, we're going to use the Nest Home Simulator, the statistical programming language R and the Shiny package, which allows us to create awesome web interfaces to manage data. Also, we're going to need the rjson package.
The first step will be to create an account on the Nest Developer Program and then install the Nest Home Simulator.
When we log into the Simulator, we are going to be able to create a Thermostat to start playing around with the current and target temperatures...
The second important thing is that when we are logged into the Nest Developer Program we're going to be able to create a client which will access our thermostat and also, we will be able to generate a unique number to communicate with it.
We just need to copy and paste the URL from the Authorization URL and after accepting the conditions a number will be presented...we need this number for our R code...but keep in mind that once the connection is closed, we will need to generate a new number...
Now, we're ready to add some code to the mix...when working with R, it's always better to use RStudio...so copy and paste this code...and remember to replace the code variable with your own number...also you will need to replace the Client ID and Client Secret with the information from your Nest Developer Program client...
This code will get 10 readings until it refreshes itself to get 10 new ones. So we can update the Current and Target Temperature values in order to see the Dashboard changing...
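One way to read that "10 readings, then start over" behaviour is as a small batching buffer. The JavaScript sketch below (not the R and Shiny code from this post) shows only that idea; `createReadingWindow` and the temperature values are made up for illustration.

```javascript
// Collect readings in batches: once the batch is full, the next reading starts a new one.
function createReadingWindow(size) {
  let readings = [];
  return {
    add(reading) {
      if (readings.length >= size) {
        readings = [];
      }
      readings.push(reading);
      // Return a copy so callers cannot change the internal batch.
      return readings.slice();
    },
  };
}

const windowOfTen = createReadingWindow(10);
let latest = [];
for (let i = 1; i <= 12; i++) {
  // Made-up temperatures; real values would come from the thermostat.
  latest = windowOfTen.add({ current: 20 + i, target: 25 });
}
console.log(latest.length); // 2: the batch restarted after the tenth reading
```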
Nice...0.01522398 seconds...wait...what? Isn't Spark supposed to be pretty fast? Well...I remembered that I read somewhere that Spark shines with big files...
Well...I prepared a file with 5 columns and 1 million records...let's see how that goes...
48.31641 seconds? Looks like Spark was almost twice as fast this time...and this is a pretty simple example...I'm sure that when complexity arises...the gap is even bigger...
And sure...I know that a lot of people can take my plain R code and make it even faster than Spark...but...this is my blog...not theirs -;)
I will come back as soon as I learn more about SparkR -:D
UPDATE
So...I got a couple of comments claiming that read.csv() is too slow...and that I should be measuring the process, not the loading of a csv file...while I don't agree...because everything is included in the process...I did something as simple as moving the start.time to after the csv file is loaded...let's see how much of a change this brings...
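The same idea, timing the load and the processing as two separate phases instead of moving the start time around, can be sketched like this in JavaScript (not the R or SparkR code from this post). `timePhases` and the stand-in workload are made up for illustration, and no timings are shown because they depend on the machine.

```javascript
// Time two phases separately, so loading the data and working on it can be compared.
function timePhases(load, work) {
  const start = process.hrtime.bigint();
  const data = load();
  const loaded = process.hrtime.bigint();
  const result = work(data);
  const done = process.hrtime.bigint();
  return {
    result,
    loadMs: Number(loaded - start) / 1e6,
    workMs: Number(done - loaded) / 1e6,
  };
}

// Stand-ins for "read the csv" and "process it"; both are made up for this sketch.
const timings = timePhases(
  () => Array.from({ length: 100000 }, (_, i) => i),
  (data) => data.reduce((sum, value) => sum + value, 0)
);
console.log("load:", timings.loadMs, "ms, work:", timings.workMs, "ms");
```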
SparkR
Around 1 second faster...which means that reading the csv was really efficient...
Plain R
Around 6 seconds faster...read.csv is not that good...but...SparkR is almost 50% faster...
HOLY CRAP UPDATE! Markus from Spain gave me this code in the comments...I just added a couple of things to make it compliant...but...damn...I wish I could code like that in R! -:D Thanks Markus!!!
Yep...as I have said many times before, the LED Number application has become my new Hello World...whenever I learn a new programming language I try to build this, because it comprises several language constructs and makes you learn more by trying to figure out how to build it again...
Yesterday I blogged about how to build it using Haskell...and...I'm still learning Haskell...so I thought..."It's cool that I'm using this for my new programming languages...but what about the old ones?"
I suddenly realized that I had never written this using R...and that's really sad...I love R -:)
I was excited to read this book, because it's been a while since I read any R book...but...I gotta admit...this is not my kind of book...as I discovered that obviously I had minus one experience in Bioinformatics...
The book is not short but not long either...340 pages...and it's full of recipes...
It starts with a basic introduction to R, which should be appreciated by newbies...but more seasoned developers can just skip it...
There are chapters dedicated to Sequence Analysis, Protein Structure Analysis and even Machine Learning in Bioinformatics...
Of course there are a lot of new packages that are used for the recipes as well as many interesting graphics...
If you have some knowledge of Bioinformatics...then you should for sure get this book...if you don't...well...you can buy it anyway...even if you don't understand anything...it is still a book about R and it's full of interesting code...so you might end up learning a bunch of new things -;)
The recipes are well explained and the result is always shown...which is good so we can know exactly what to expect...
So...as time goes by, I'm getting more proficient with Julia...which is something fairly easy as the learning curve is pretty fast...
I decided to load a file with 590,209 records that I got from Freebase...the file in question contains Actors and Actresses from movies...you can have a quick look here...
For this test, I'm using my Linux box on VMWare running on 2 GB of RAM...running Ubuntu 12.04.4 (Precise)
For R, I'm not using any special package...just plain R...version 2.14.1 and for Julia version 0.2.1, I'm using the DataFrames package...
Let's take a look at the R source code first along with its runtime processing...
This source will first check if the file was loaded already, if not...it will load it...then, it will eliminate the repeated records, delete all the nulls or NAs and then create a new Data Frame, sort it by "Gender" and then write a new CSV file...time will be taken to measure its speed...we will run it twice...the first time the file is not loaded...the second time it is...and that should greatly improve the execution time...
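The cleaning steps themselves (remove duplicates, drop rows with missing values, sort by gender, write CSV) can be sketched on a handful of made-up rows. This is a JavaScript illustration of the pipeline, not the R code that was timed, and `cleanRows` and `toCsv` are hypothetical helper names.

```javascript
// Made-up rows standing in for the actors file; null marks a missing value.
const rows = [
  { name: "Ann", gender: "Female" },
  { name: "Bob", gender: "Male" },
  { name: "Ann", gender: "Female" }, // duplicate
  { name: "Cy", gender: null },      // missing value
  { name: "Dee", gender: "Female" },
];

function cleanRows(input) {
  const seen = new Set();
  return input
    // Keep only the first occurrence of each identical row.
    .filter((row) => {
      const key = JSON.stringify(row);
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    // Drop rows with any null, undefined or empty field.
    .filter((row) => Object.values(row).every((v) => v !== null && v !== undefined && v !== ""))
    // Sort by the gender column.
    .sort((a, b) => a.gender.localeCompare(b.gender));
}

// Naive CSV writer: it does not quote fields that contain commas.
function toCsv(input) {
  if (input.length === 0) return "";
  const header = Object.keys(input[0]);
  const lines = input.map((row) => header.map((col) => row[col]).join(","));
  return [header.join(","), ...lines].join("\n");
}

console.log(toCsv(cleanRows(rows)));
```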
As we can see...the times are really good...and the difference between the first and second run is pretty obvious...for the record...the generated file contains 105874 records...
Here...we're doing the same...we load the DataFrames package (But exclude that from the execution time), check if the file is loaded so we don't load it again on the second run...eliminate duplicates, delete all nulls or NAs, create a new DataFrame, sort it by "Gender" and finally write a new CSV file...
Well...the difference between the second and first run is very significant...but of course...way slower than R...
But...let me tell you one simple thing...Julia is still a brand new language...the DataFrames package is not part of the core Julia language, which means...that it's even newer...and optimizations are being performed as we speak...I would say that for a young language...18 seconds to process 590,209 records is pretty awesome...and of course...my R experience greatly surpasses my Julia experience...
So...I don't really want to leave you with the impression that Julia is not good or not fast enough...because believe me...it is...and you're going to love my next experiment -;)
So this code is fairly simple...we have a couple of vectors with names and last names...then we loop 100000 times and then generate a couple of random numbers simply to read the vectors, create a full name and populate a new vector... with some random funny name combinations...
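For readers who want to see the shape of that loop, here is a small JavaScript sketch of the same idea (it is neither the R nor the Julia code being compared). The name lists are made up, and the `random` parameter is an addition so the function can be checked with a fixed value.

```javascript
// Made-up sample data; the post uses its own lists of names and last names.
const names = ["Ann", "Bob", "Cy"];
const lastNames = ["Smith", "Jones", "Brown"];

// Build `count` full names by picking a random first and last name each time.
// `random` defaults to Math.random and can be swapped for a fixed function in tests.
function randomFullNames(count, random = Math.random) {
  const fullNames = new Array(count);
  for (let i = 0; i < count; i++) {
    const first = names[Math.floor(random() * names.length)];
    const last = lastNames[Math.floor(random() * lastNames.length)];
    fullNames[i] = first + " " + last;
  }
  return fullNames;
}

const generated = randomFullNames(100000);
console.log(generated.length, generated[0]);
```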
Well....the difference between both runs is not really good...the second time was a little bit higher...and 1 minute is kind of a lot...let's see how Julia behaves...
So this code as well creates two arrays with names and last names, loops 100000 times, generates a couple of random numbers, mixes a name with a last name and then populates a new array with some mixed full names...
Just like in the R code...the second time took Julia a little bit more...but...less than a second?! That's something like...amazingly fast and really took R by storm...
Now...I believe you will start to take Julia more seriously -:D