I installed Scala, Hadoop, Spark and SparkR...not sure Hadoop is needed for this...but I wanted to have the full picture -:)
Anyway...I came across a piece of code that reads lines from a file and counts how many lines contain an "a" and how many contain a "b"...
For this code I used the lyrics of Girls Not Grey by AFI...
**SparkR.R**

```r
library(SparkR)

start.time <- Sys.time()

# Start a local Spark context (old AMPLab SparkR API; textFile() and
# filterRDD() are internal functions, hence the triple colon)
sc <- sparkR.init(master = "local")

logFile <- "/home/blag/R_Codes/Girls_Not_Grey"
logData <- SparkR:::textFile(sc, logFile)

# Count the lines containing an "a" and the lines containing a "b"
numAs <- count(SparkR:::filterRDD(logData, function(s) { grepl("a", s) }))
numBs <- count(SparkR:::filterRDD(logData, function(s) { grepl("b", s) }))
paste("Lines with a: ", numAs, ", Lines with b: ", numBs, sep = "")

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
```
**PlainR.R**

```r
library("stringr")  # loaded in the original, though grepl() below is base R

start.time <- Sys.time()

logFile <- "/home/blag/R_Codes/Girls_Not_Grey"
# read.table() splits each line into columns, so glue them back into full lines
logfile <- read.table(logFile, header = F, fill = T)
logfile <- apply(logfile[,], 1, function(x) paste(x, collapse = " "))
df <- data.frame(lines = logfile)

# Check every line for an "a" and a "b", one row at a time
a <- sum(apply(df, 1, function(x) grepl("a", x)))
b <- sum(apply(df, 1, function(x) grepl("b", x)))
paste("Lines with a: ", a, ", Lines with b: ", b, sep = "")

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
```
Nice...0.01522398 seconds...wait...what? Isn't Spark supposed to be pretty fast? Well...I remembered that I read somewhere that Spark shines with big files...on a file this tiny, just spinning up the Spark context costs more than the actual work...
Well...I prepared a file with 5 columns and 1 million records...let's see how that goes...
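In case you want to follow along...here's roughly how such a file could be generated in R. This is just a sketch...the column contents are made up; only the shape (5 columns, a million rows, no header) matches what I used:

```r
# Hypothetical test-data generator -- the real Doc_Header.csv may differ
set.seed(42)
n <- 1000000

df <- data.frame(
  V1 = sample(letters, n, replace = TRUE),
  V2 = sample(LETTERS, n, replace = TRUE),
  V3 = sample(1:1000, n, replace = TRUE),
  V4 = sample(c("alpha", "beta", "gamma"), n, replace = TRUE),
  V5 = runif(n)
)

# No header row, to match the header = F reads below
write.table(df, "/home/blag/R_Codes/Doc_Header.csv",
            sep = ",", row.names = FALSE, col.names = FALSE)
```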
**SparkR.R**

```r
library(SparkR)

start.time <- Sys.time()

# Same script as before, now pointed at the million-record CSV
sc <- sparkR.init(master = "local")

logFile <- "/home/blag/R_Codes/Doc_Header.csv"
logData <- SparkR:::textFile(sc, logFile)

numAs <- count(SparkR:::filterRDD(logData, function(s) { grepl("a", s) }))
numBs <- count(SparkR:::filterRDD(logData, function(s) { grepl("b", s) }))
paste("Lines with a: ", numAs, ", Lines with b: ", numBs, sep = "")

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
```
26.45734 seconds for a million records? Nice job -:) Let's see if plain R wins again...
**PlainR.R**

```r
library("stringr")

start.time <- Sys.time()

logFile <- "/home/blag/R_Codes/Doc_Header.csv"
# read.csv() parses the file into 5 columns; paste them back into single lines
logfile <- read.csv(logFile, header = F)
logfile <- apply(logfile[,], 1, function(x) paste(x, collapse = " "))
df <- data.frame(lines = logfile)

a <- sum(apply(df, 1, function(x) grepl("a", x)))
b <- sum(apply(df, 1, function(x) grepl("b", x)))
paste("Lines with a: ", a, ", Lines with b: ", b, sep = "")

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
```
48.31641 seconds? Looks like Spark was almost twice as fast this time...and this is a pretty simple example...I'm sure that as the complexity grows...the gap gets even bigger...
And sure...I know that a lot of people can take my plain R code and make it even faster than Spark...but...this is my blog...not theirs -;)
I will come back as soon as I learn more about SparkR -:D
UPDATE
So...I got a couple of comments claiming that read.csv() is too slow...and that I should be measuring the process, not the loading of a CSV file...while I don't agree...because everything is part of the process...I did something as simple as moving start.time to after the CSV file is loaded...let's see how much of a change this brings...
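For the plain R script the tweak looks like this (my reconstruction of the change...the SparkR script got the equivalent treatment):

```r
library("stringr")

logFile <- "/home/blag/R_Codes/Doc_Header.csv"
logfile <- read.csv(logFile, header = F)   # loading now happens before the timer

start.time <- Sys.time()                   # the clock starts here instead

logfile <- apply(logfile[,], 1, function(x) paste(x, collapse = " "))
df <- data.frame(lines = logfile)
a <- sum(apply(df, 1, function(x) grepl("a", x)))
b <- sum(apply(df, 1, function(x) grepl("b", x)))
paste("Lines with a: ", a, ", Lines with b: ", b, sep = "")

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
```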
**SparkR**

Around 1 second faster...which means that reading the CSV was really efficient...

**Plain R**
HOLY CRAP UPDATE!
Markus from Spain gave me this code in the comments...I just added a couple of things to make it compliant...but...damn...I wish I could code like that in R! -:D Thanks Markus!!!
**Markus's code**

```r
logFile <- "/home/blag/R_Codes/Doc_Header.csv"
lines <- readLines(logFile)  # read the raw lines; no column parsing needed

start.time <- Sys.time()

# One vectorized pass over all lines; fixed = TRUE means a literal
# match instead of a regular expression
a <- sum(grepl("a", lines, fixed = TRUE))
b <- sum(grepl("b", lines, fixed = TRUE))
paste("Lines with a: ", a, ", Lines with b: ", b, sep = "")

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
```
Simply...superb! -:)
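If you're wondering where the speed comes from: grepl() is vectorized, so it scans the whole character vector in one pass at C level, while my apply() version called an R function once per row...and fixed = TRUE does a literal match, skipping the regex engine. A tiny sketch of the difference (made-up sample data...timings will vary by machine):

```r
# A million made-up lines, just to compare the two styles
lines <- rep(c("abc def ghi", "xyz uvw rst"), 500000)

# Row by row: one R function call per line (what my apply() version did)
system.time(sum(sapply(lines, function(x) grepl("a", x))))

# Vectorized: a single grepl() call over the whole vector (Markus's way)
system.time(sum(grepl("a", lines, fixed = TRUE)))
```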
Greetings,
Blag.
Development Culture.