Wednesday, July 16, 2014

Web scraping with Julia and PhantomJS

As I have been reading some PhantomJS books, and I'm always looking to develop something nice using Julia...I thought that integrating them would be an awesome idea -;)

I thought about Twitter and hashtags...wouldn't it be nice to write a PhantomJS script to scrape Twitter and get all the hashtags that I have used?

For this particular script...I'm taking the hashtags from the first 5 pages of tweets on my Twitter profile (the script scrolls down 5 times, letting Twitter load more tweets each time)...

var system = require('system');

var webpage = require('webpage').create();
webpage.viewportSize = { width: 1280, height: 800 };
webpage.scrollPosition = { top: 0, left: 0 };

var userid = system.args[1];
var profileUrl = "http://www.twitter.com/" + userid;

webpage.open(profileUrl, function(status) {
 if (status === 'fail') {
  console.error('webpage did not open successfully');
  phantom.exit(1);
 }
 var i = 0,
 height,
 queryFn = function() {
  return document.body.scrollHeight;
 };
 // Every 3 seconds, scroll to the bottom so Twitter loads more tweets
 setInterval(function() {
  height = webpage.evaluate(queryFn);
  webpage.scrollPosition = { top: height + 1, left: 0 };
  i++;

  // After 5 scrolls, collect the text of every hashtag link on the page
  if (i >= 5) {
   var twitter = webpage.evaluate(function () {
    var twitter = [];
    var forEach = Array.prototype.forEach;
    var tweets = document.querySelectorAll('[data-query-source="hashtag_click"]');
    forEach.call(tweets, function(el) {
     twitter.push(el.innerText);
    });
    return twitter;
   });

   // Print one hashtag per line so Julia can split the output on "\n"
   twitter.forEach(function(t) {
    console.log(t);
   });
   phantom.exit();
  }
 }, 3000);
});

If we run this...we're going to have this output...
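Inside webpage.evaluate() the script is really just collecting the text of every element that carries data-query-source="hashtag_click". To poke at that selector logic without a headless browser, here is a minimal Python sketch using the standard library's html.parser. It assumes hashtag links look like the hypothetical snippet below (an a element wrapping <s>#</s><b>tag</b>); Twitter's real markup may differ and has likely changed since this was written.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class HashtagCollector(HTMLParser):
    """Collect the text of elements whose data-query-source is "hashtag_click"."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # > 0 while we are inside a matching element
        self.current = []    # text pieces of the element being collected
        self.hashtags = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a tag nested inside a hashtag link
        elif dict(attrs).get("data-query-source") == "hashtag_click":
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Roughly what el.innerText gives for a simple link like this
            self.hashtags.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_hashtags(html):
    parser = HashtagCollector()
    parser.feed(html)
    parser.close()
    return parser.hashtags


# Hypothetical markup, loosely modeled on a tweet with two hashtag links
sample = (
    '<p>Playing with <a href="/hashtag/Julia" data-query-source="hashtag_click">'
    '<s>#</s><b>Julia</b></a> and <a href="/hashtag/PhantomJS" '
    'data-query-source="hashtag_click"><s>#</s><b>PhantomJS</b></a> '
    '<a href="/about">about</a></p>'
)
print(extract_hashtags(sample))  # ['#Julia', '#PhantomJS']
```

The ordinary /about link is skipped because only the data-query-source attribute decides what gets collected, which is the same thing the querySelectorAll call does.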

Now...what I want to do with this information is to send it to Julia and get the most used hashtags...so I will count them and then get rid of the ones that only appear once...

Let's see the Julia code...

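# Run the PhantomJS script and capture everything it prints
tweets = readall(`phantomjs Hashtags.js Blag`)
tweets = split(tweets,"\n")

# Count how many times each hashtag appears
hashtags = Dict()
for hash in tweets
  isempty(hash) && continue   # skip the empty string after the last newline
  hashtags[hash] = get(hashtags, hash, 0) + 1
end

# Print only the hashtags used more than once
for (k,v) in hashtags
  if v > 1
    println("$k has been mentioned $v times")
  end
end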

When we run this code...we're going to have this output...

I still don't know how to sort Dicts in Julia...so bear with me -:)

Anyway...by looking at the output...we can get my top 3 hashtags -;)

#LeapMotion ==> 14 times
#Flare3D ==> 11 times
#DevHangout ==> 8 times
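Since the Julia part boils down to count, filter and sort, here is a minimal Python sketch of the whole pipeline using collections.Counter, assuming the PhantomJS output arrives as one hashtag per line; the sample input is made up. (On recent Julia versions, something like sort(collect(hashtags), by = x -> x[2], rev = true) should give a similar descending order, but that call is not from the original code and may not behave the same on the 0.2/0.3 releases used here.)

```python
from collections import Counter


def top_hashtags(output, min_count=2):
    """Count hashtags (one per line), keep those seen at least `min_count` times,
    and return (hashtag, count) pairs from most to least used."""
    counts = Counter(line.strip() for line in output.splitlines() if line.strip())
    frequent = [(tag, n) for tag, n in counts.items() if n >= min_count]
    # Highest count first; ties broken alphabetically so the order is deterministic
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical scraper output, one hashtag per line
sample = "#LeapMotion\n#Julia\n#LeapMotion\n#Flare3D\n#Flare3D\n#LeapMotion\n"
for tag, n in top_hashtags(sample):
    print(f"{tag} ==> {n} times")
```

Here #Julia appears only once, so it is dropped, just like the singletons in the Julia version.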

Hope you like this and see you next time -:)


Development Culture.

PhantomJS Cookbook - Book Review

After reading and blogging about Getting Started with PhantomJS, I decided to buy another PhantomJS book...of course -;)

This time...I got PhantomJS Cookbook...

This book is fairly big, with 304 pages...and it's really, really nice...

Of course...being a "Cookbook"...there are always good and bad things. The bad things are that some of the recipes are boring or too obvious (especially if you already know something about PhantomJS)...and don't get me wrong on this...I just believe that if you're going to read a "Cookbook" it's because you already have some knowledge and don't really need to be told how to do the simplest tasks...but...as always...that's just me...for newbies, those recipes must be really good...

The good thing about this book is that the recipes that are good...are really good...and the recipes that are awesome...are totally awesome! I was really excited and blown away by some of them...

The book covers some CasperJS, Jasmine, Jenkins and more...much more...

Thanks to this book I discovered what HAR is...and how PhantomJS can help you with it...
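For anyone else new to it: HAR (HTTP Archive) is a JSON format that logs every request a page makes, along with timings, and PhantomJS's bundled examples include a netsniff.js script that writes one. As a rough idea of what you can do with such a file, here is a minimal Python sketch that lists the slowest requests. It only reads log.entries[].request.url, .response.status and .time; the tiny HAR document is hand-made and trimmed, since real files carry many more fields.

```python
import json


def slowest_requests(har, limit=3):
    """Return (url, status, time_ms) for the slowest entries in a HAR log."""
    entries = har["log"]["entries"]
    rows = [(e["request"]["url"], e["response"]["status"], e["time"]) for e in entries]
    return sorted(rows, key=lambda r: r[2], reverse=True)[:limit]


# A tiny, hand-written HAR-shaped document (hypothetical values)
har_text = """{"log": {"version": "1.2", "creator": {"name": "example", "version": "0"},
  "entries": [
    {"time": 120, "request": {"method": "GET", "url": "http://example.com/"},
     "response": {"status": 200}},
    {"time": 450, "request": {"method": "GET", "url": "http://example.com/app.js"},
     "response": {"status": 200}},
    {"time": 30, "request": {"method": "GET", "url": "http://example.com/missing.png"},
     "response": {"status": 404}}]}}"""

for url, status, ms in slowest_requests(json.loads(har_text)):
    print(f"{ms:>5} ms  {status}  {url}")
```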

Wanna see some Jenkins? Here it is -:)

I gotta say...if you already bought "Getting Started with PhantomJS"...then you need to buy this "Cookbook"...really...it's really cool -;)

Taking screenshots of a webpage using different sizes is also a very nice script...but of course I'm not going to fill this post with images...because after all...PhantomJS is a headless browser, right? -;)



Monday, July 14, 2014

Getting Started with PhantomJS - Book Review

A couple of days ago I started reading Getting Started with PhantomJS from Packt Publishing. I had heard about PhantomJS in the past but never really used it...so I was really excited about reading the book...

Like all the Getting Started books, this one is kind of short...140 pages...but let me tell you...that's more than enough to keep your attention and make you a PhantomJS advocate -;)

The book starts with a little introduction and then jumps straight into the code examples...which is something that I always appreciate -:P

The first important example is based on Pinterest, but I don't use/like Pinterest...so I changed the example a little bit to use Twitter instead -;)

Here's the source code in case you're interested...

var system = require('system');
var userid = system.args[1];
var page = require('webpage').create();

var profileUrl = "http://www.twitter.com/" + userid;
page.open(profileUrl, function(status) {
 if ( status === "success" ) {
  // evaluate() runs inside the page, so userid is passed in as uid
  var twitter = page.evaluate(function (uid) {
   var username = document.querySelector('[href="/' + uid + '"]').innerText.trim();
   var numTweets = document.querySelector('[data-nav="tweets"]');
   numTweets = numTweets.attributes[1].value;
   var numFollowing = document.querySelector('[href="/' + uid + '/following"]');
   numFollowing = numFollowing.querySelector('[class="ProfileNav-value"]').innerText;
   var numFollowers = document.querySelector('[href="/' + uid + '/followers"]');
   numFollowers = numFollowers.querySelector('[class="ProfileNav-value"]').innerText;
   return {
    name: username,
    tweets: numTweets,
    following: numFollowing,
    followers: numFollowers
   };
  }, userid);

  console.log(twitter.name + ' (' + userid + ')' + ' has written ' + twitter.tweets + ' and has ' +
      twitter.followers + ' followers and is following ' +
      twitter.following + ' accounts');
 } else {
  console.error('webpage did not open successfully');
 }
 phantom.exit();
});


And here's the output...

Cool, huh? For a very first example...I think it's impressive -;)

The book also comes with examples on taking webpage screenshots, measuring page loading performance, modifying the DOM, working with files and more...

It includes even a small introduction to CasperJS, the perfect companion to PhantomJS.

I would say...just go ahead and buy this book...I totally love it and will read it again just to discover and learn more...PhantomJS is just amazing!



Thursday, July 10, 2014

LED - My first Julia package

So yesterday I was thinking about Julia and how easy people claim package development is...of course...I had to give it a try...

I wanted to start small and simple...so I built something useless, mostly for fun and learning...

The LED package simply prints an LED representation of any given number...

julia> using LED

julia> ShowLED(12345)

   _  _       _  
|  _| _| |_| |_  
| |_  _|   |  _| 

As simple as that...and it took me no more than 5 minutes to get it done...
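The post doesn't show how the package draws its digits, so here is one common technique, sketched in Python and not taken from the LED.jl source: describe each digit by which of its seven segments (a to g) are lit, then turn segments into characters row by row. The segment letters, the helper names and the 3-character cell width are choices made for this sketch, so the spacing won't necessarily match ShowLED's output.

```python
# Which of the seven segments are lit for each digit:
#   a = top, b = upper right, c = lower right, d = bottom,
#   e = lower left, f = upper left, g = middle
SEGMENTS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}


def _seg(lit, seg, char):
    """Draw `char` if segment `seg` is lit, otherwise a blank."""
    return char if seg in lit else " "


def led_rows(number):
    """Return the three text rows that draw `number` as LED-style digits."""
    rows = [[], [], []]
    for ch in str(number):
        lit = SEGMENTS[ch]
        rows[0].append(" " + _seg(lit, "a", "_") + " ")
        rows[1].append(_seg(lit, "f", "|") + _seg(lit, "g", "_") + _seg(lit, "b", "|"))
        rows[2].append(_seg(lit, "e", "|") + _seg(lit, "d", "_") + _seg(lit, "c", "|"))
    # One space between digits so they don't run into each other
    return [" ".join(cells) for cells in rows]


for row in led_rows(12345):
    print(row)
```

Keeping the shapes as data (which segments are on) rather than as hand-drawn strings makes every digit the same width, which is what keeps the columns aligned.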

So, I can confirm now that package development in Julia...is a piece of cake -:)

If everything was done nicely...you should be able to do...

julia> Pkg.add("LED")

otherwise...please do...

julia> Pkg.clone("git@github.com:atejada/LED.jl.git")

Of course...this was just an experiment...so I'm planning to put my mind to work and come up with some nice and useful packages -;)



Thursday, July 3, 2014

10 years of Packt Publishing!

My good friends at Packt Publishing are turning 10 years old and they are celebrating it with an awesome promotion!

All ebooks and videos for just $10 each...hurry up! This wonderful offer ends on July 5th...so go ahead and grab as many as you can -:D



Friday, May 16, 2014

Julia versus R - Playing around

So...as time goes by, I'm getting more proficient with Julia...which is fairly easy, as it's pretty quick to learn...

I decided to load a file with 590,209 records that I got from Freebase...the file in question contains Actors and Actresses from movies...you can have a quick look here...

For this test, I'm using my Linux box on VMware, with 2 GB of RAM, running Ubuntu 12.04.4 (Precise)

For R, I'm not using any special package...just plain R, version 2.14.1...and for Julia, version 0.2.1, I'm using the DataFrames package...

Let's take a look at the R source code first, along with its processing time...

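start.time <- Sys.time()
# load the file only if it isn't already in memory (this is what speeds up the second run)
if(!exists("Actors")) Actors<-read.csv("Actors_Table.csv", header=TRUE,
                     stringsAsFactors=FALSE, colClasses="character", na.strings = "")
# keep three columns, drop duplicates and NAs, sort by Gender (output file name assumed)
Actor_Info<-na.omit(unique(Actors[,c("Actor_Id","Name","Gender")]))
Actor_Info<-Actor_Info[order(Actor_Info$Gender),]
write.csv(Actor_Info, "Actor_Info_R.csv", row.names=FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time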

This code first checks whether the file was already loaded; if not...it loads it. Then it eliminates the repeated records, deletes all the null or NA values, creates a new data frame, sorts it by "Gender" and writes a new CSV file...the time taken is measured to check its speed. We will run it twice...the first time the file is not loaded yet...the second time it is...and that should greatly improve the execution time...

As we can see...the times are really good...and the difference between the first and second run is pretty obvious...for the record...the generated file contains 105,874 records...
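For anyone more at home in Python, the same steps (skip reloading, drop duplicates and rows with missing values, keep three columns, sort by Gender, write a CSV) can be sketched with just the standard library. This is not part of the benchmark: the column names come from the Julia code below, the "" and "NA" missing-value markers mirror its nastrings setting, and the sample rows are invented.

```python
import csv
import functools
import io

COLUMNS = ("Actor_Id", "Name", "Gender")
MISSING = {"", "NA"}   # same markers as nastrings=["","NA"] in the Julia code


@functools.lru_cache(maxsize=None)
def load_actors(path):
    """Read the CSV once per process; later calls reuse the cached rows
    (the same idea as the exists()/isdefined() checks in R and Julia)."""
    with open(path, newline="", encoding="utf-8") as f:
        return tuple(csv.DictReader(f))


def clean_actors(rows):
    """Keep three columns, drop duplicates and rows with missing values,
    and sort by Gender (sorted() is stable, so ties keep file order)."""
    seen = set()
    cleaned = []
    for row in rows:
        record = tuple((row.get(col) or "").strip() for col in COLUMNS)
        if record in seen or any(value in MISSING for value in record):
            continue
        seen.add(record)
        cleaned.append(dict(zip(COLUMNS, record)))
    return sorted(cleaned, key=lambda r: r["Gender"])


def write_actors(rows, out):
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Invented sample rows standing in for Actors_Table.csv
sample = io.StringIO(
    "Actor_Id,Name,Gender\n"
    "a1,Ann,Female\n"
    "a2,Bob,Male\n"
    "a1,Ann,Female\n"
    "a3,,Male\n"
    "a4,Cy,Female\n"
)
result = clean_actors(csv.DictReader(sample))
out = io.StringIO()
write_actors(result, out)
print(out.getvalue())
```

With the real file you'd call clean_actors(load_actors("Actors_Table.csv")) and time it with time.perf_counter(); a second call in the same session skips the read, much like the second run above.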

Now...let's see the Julia version of the code...

using DataFrames
start = time()
isdefined(:Actors) || (Actors = readtable("Actors_Table.csv", header=true, nastrings=["","NA"]))
Actor_Info = DataFrame(Actor_Id=Actors["Actor_Id"],Name=Actors["Name"],Gender=Actors["Gender"])
sortby!(Actor_Info, [:Gender])
writetable("Actor_Info_Julia.csv", Actor_Info)
finish = time()
println("Time: ", finish-start)

Here...we're doing the same...we load the DataFrames package (but exclude that from the execution time), check if the file is loaded so we don't load it again on the second run, eliminate duplicates, delete all null or NA values, create a new DataFrame, sort it by "Gender" and finally write a new CSV file...

Well...the difference between the first and second run is very significant...but of course...Julia is way slower than R...

But...let me tell you one simple thing...Julia is still a brand new language...the DataFrames package is not part of the core Julia language, which means that it's even newer...and optimizations are being made as we speak...I would say that for a young language...18 seconds to process 590,209 records is pretty awesome...and of course...my R experience greatly surpasses my Julia experience...

So...I don't really want to leave you with the impression that Julia is not good or not fast enough...because believe me...it is...and you're going to love my next experiment -;)

Let's take a look at the R source code first...

start.time <- Sys.time()
         "Danielle","Rocky","Julien","Uwe","Myles","Mike", "Steven")

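full_names<-c()
for(i in 1:100000){
  # pick a random position in each vector
  name<-sample(1:15, 1)
  last_name<-sample(1:15, 1)
  full_name<-paste(names[name],last_names[last_name],sep=" ")
  # add the new combination to the result vector
  full_names<-c(full_names,full_name)
}
end.time <- Sys.time()
time.taken <- end.time - start.time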

So this code is fairly simple...we have a couple of vectors with names and last names...then we loop 100000 times, generating a couple of random numbers simply to index the vectors, create a full name and populate a new vector...with some random funny name combinations...

Well...the difference between both runs is not really good...the second run was a little bit slower...and 1 minute is kind of a lot...let's see how Julia behaves...
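The same experiment is easy to sketch in Python, too. This is only an illustration and not part of the comparison: the name lists are hypothetical stand-ins (the original vectors hold 15 entries each), and random.choice replaces picking a random index by hand.

```python
import random
import time

# Hypothetical stand-ins for the original 15-entry name vectors
FIRST_NAMES = ["Ann", "Bob", "Cy", "Dee", "Eve"]
LAST_NAMES = ["Smith", "Jones", "Lee", "Moss", "Park"]


def random_full_names(count, rng=random):
    """Build `count` random "first last" combinations."""
    return [f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}" for _ in range(count)]


start = time.perf_counter()
full_names = random_full_names(100_000)
print("Time:", time.perf_counter() - start)
```

Passing a seeded random.Random(...) as rng makes the generated names reproducible between runs, which is handy when comparing timings.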

Here's the Julia source code...

start = time()
       "Danielle","Rocky","Julien","Uwe","Myles","Mike", "Steven"]
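full_name = ""
full_names = String[]
for i = 1:100000
        # pick a random position in each array
        name = rand(1:15)
        last_name = rand(1:15)
        full_name = names[name] * " " * last_names[last_name]
        push!(full_names, full_name)
end
finish = time()
println("Time: ", finish-start)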

So this code, too, creates two arrays with names and last names, loops 100000 times, generates a couple of random numbers, mixes a name with a last name and then populates a new array with the mixed full names...

Just like with the R code...the second run took Julia a little bit longer...but...less than a second?! That's amazingly fast and really leaves R in the dust...

Now...I believe you will start to take Julia more seriously -:D

Hope you liked this blog...



Thursday, May 15, 2014

Social Media Mining with R - Book review

I was really excited when my friend from Packt Publishing sent me this book...as I hadn't read any R book in a while...but don't get me wrong...the book is not bad...it's just that I expected a little bit more...let me explain...

This book is not too big, which is something I appreciate...it's 122 pages...it comes with a short introduction to R, which is good for newbies, and then it goes straight into Social Media Mining using Twitter.

The problem I had with this book...and maybe that's not really a bad thing...is that it has more Social Media Mining explanation than actual code...so sure, it does a great job explaining how Social Media Mining works, but for a die-hard developer like me...the source code is more important...

To be honest with you...I would have bought this book if I didn't have any Social Media Mining experience...but I have built several applications using R and Twitter in the past...so...this book wasn't really for me...



Tuesday, May 13, 2014

My first post on Julia

So...what is Julia? Just another nice programming language -;)

According to its creators...

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

I just started learning it a couple of days ago...and I must say that I really like it...it has a Python-like syntax, so I felt comfortable from the very start...

Of course...it's kind of a brand new language, so things are being added and fixed as we speak...but the community is growing and I'm glad to be amongst its "early" supporters -:)

What I did right after reading the documentation and watching a couple of videos was to simply port one of my old Python applications to Julia...the app was "LCD Numbers", which asks for a number and prints it in LCD format...

This is the Python code...

line1 = ""
line2 = ""
line3 = ""

# Each digit maps a line number (1-3) to its slice of the LCD drawing
zero = {1: ' _  ', 2: '| | ', 3: '|_| '}
one = {1: '  ', 2: '| ', 3: '| '}
two = {1: ' _  ', 2: ' _| ', 3: '|_  '}
three = {1: '_  ', 2: '_| ', 3: '_| '}
four = {1: '    ', 2: '|_| ', 3: '  | '}
five = {1: ' _  ', 2: '|_  ', 3: ' _| '}
six = {1: ' _  ', 2: '|_  ', 3: '|_| '}
seven = {1: '_   ', 2: ' |  ', 3: ' |  '}
eight = {1: ' _  ', 2: '|_| ', 3: '|_| '}
nine = {1: ' _  ', 2: '|_| ', 3: ' _| '}

num_lines = {0: zero, 1: one, 2: two, 3: three, 4: four,
             5: five, 6: six, 7: seven, 8: eight, 9: nine}

def Lines(number):
    # append one digit's slices to the three output lines
    global line1, line2, line3
    line1 += number[1]
    line2 += number[2]
    line3 += number[3]

number = input("\nEnter a number: ")
for digit in number:
    Lines(num_lines[int(digit)])

print("\n")
print(line1)
print(line2)
print(line3)
print("\n")

And this is in turn...the Julia version of it...

zero = [1=> " _  ", 2=> "| | ", 3=> "|_| "]
one = [1=> "  ", 2=> "| ", 3=> "| "]
two = [1=> " _  ", 2=> " _| ", 3=> "|_  "]
three = [1=> "_  ", 2=> "_| ", 3=> "_| "]
four = [1=> "    ", 2=> "|_| ", 3=> "  | "]
five = [1=> " _  ", 2=> "|_  ", 3=> " _| "]
six = [1=> " _  ", 2=> "|_  ", 3=> "|_| "]
seven = [1=> "_   ", 2=> " |  ", 3=> " |  "]
eight = [1=> " _  ", 2=> "|_| ", 3=> "|_| "]
nine = [1=> " _  ", 2=> "|_| ", 3=> " _| "]

num_lines = [0=> zero, 1=> one, 2=> two, 3=> three, 4=> four,
             5=> five, 6=> six, 7=> seven, 8=> eight, 9=> nine]

line = ""; line1 = ""; line2 = ""; line3 = ""

# Append one digit's slices and hand back all three lines as a tuple
function Lines(number, line1, line2, line3)
    line1 *= number[1]
    line2 *= number[2]
    line3 *= number[3]
    line1, line2, line3
end

println("Enter a number: "); number = chomp(readline(STDIN))
len = length(number)
for i in [1:len]
    line = Lines(num_lines[parseint(string(number[i]))],line1,line2,line3)
    line1 = line[1]; line2 = line[2]; line3 = line[3]
end

println("\n" * line1)
println(line2)
println(line3 * "\n")

As you can see...the code looks somewhat similar...but of course...I got rid of those ugly global variables...and used some of the neat Julia features, like returning multiple values and putting several statements on one line... If you want to see the output...here it is...
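Python can do the same trick, by the way: a function can return a tuple and the caller can unpack it, so the global statement in Lines isn't strictly needed. Here is a minimal sketch with just two of the glyph dictionaries from the Python version copied in to keep it self-contained; add_digit is a hypothetical name, not part of the original app.

```python
# Two glyphs copied from the Python version above
zero = {1: ' _  ', 2: '| | ', 3: '|_| '}
one = {1: '  ', 2: '| ', 3: '| '}
num_lines = {0: zero, 1: one}


def add_digit(glyph, line1, line2, line3):
    """Return the three lines extended by one glyph; no globals involved."""
    return line1 + glyph[1], line2 + glyph[2], line3 + glyph[3]


line1 = line2 = line3 = ""
for digit in "10":
    # Tuple unpacking plays the role of Julia's multiple return values
    line1, line2, line3 = add_digit(num_lines[int(digit)], line1, line2, line3)

print(line1)
print(line2)
print(line3)
```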

Of course...this is just a test...things are going to become interesting when I port some R code into Julia and run some speed comparisons -;)



Wednesday, May 7, 2014

Game Development with Three.js - Book Review

Last week I wrote Learning Three.js: The JavaScript 3D Library for WebGL - Book Review, and I said that I was going to read another Three.js book and review it...well...here it is -;)

Well...the book is not so big...just 118 pages...but that's fine...the other book was too big...

It starts with a nice introduction to Three.js, so you can get comfortable with it...and comes with some nice examples and even a First Person Shooter game...which, sadly, is not explained in depth, so you need to download the source code and try to make sense of it...

To be honest...and maybe I'm becoming grumpy...I wouldn't buy this book if I hadn't bought it already...the examples are good enough to give you a sense of what you can do with Three.js but not really good at teaching you how to make real games...what I mean is...I would have preferred several small games showcasing the capabilities and techniques that can be used...instead of an already-made FPS game...

I'm not sure if there are more Three.js books out there...but...I don't think I'm going to go any further in learning it...no matter how good and awesome it is...without a book like the one I want...I don't see much point in spending a lot of time learning it...and sure...maybe it's because I'm not really a JavaScript guy...but that's just me...



Wednesday, April 30, 2014

Learning Three.js: The JavaScript 3D Library for WebGL - Book Review

This week I have been reading Learning Three.js: The JavaScript 3D Library for WebGL, a huge and amazing book...402 pages...and that's where I have a problem with it...

Let's be clear...for me, the book is a Three.js bible...I believe every single command is included...and that can be overwhelming if you're trying to learn Three.js...I would honestly prefer a book with some real-life examples instead of one that covers everything that can be done...as that, for sure, comes later...

Anyway...if I didn't have it...I would buy it again...it's one of the best references for a library that I have ever read...and it has tons of examples...even a GitHub account -:) Learning-Threejs

Three.js is extremely powerful...and runs in a browser! So for me it's a must-learn...but of course...I'm going to look for other books on the subject before I read this one again...

Sorry if this is a short review...but...I'm still overwhelmed by it -:(

