one man data band

Pictured above: Bus Arrival prototype with data fed by R and OneBusAway API

Tacoma has not historically been known as a city rich in technical talent.  We have Infoblox, TrueBlue, Accumula and a handful of other growing tech companies, but until recently it has been hard to find anyone who knows their way around a spreadsheet, much less a database.

Learning technical skills in the South Sound is born out of the necessity of getting the job done as quickly and as cheaply as possible.  In consulting, most professionals fake it until they make it.  They learn on the job, try out new things and make a few mistakes before shipping their product out the door.  In the public sector, when I worked as a lowly report tech, I was often tasked with not just supplying the statistics a report needed, but also thinking ahead about what the next questions would be.

In my nearly 15 years of working with data, one tool has been there to support me like no other: the R programming language, along with the RStudio IDE.

I have worked with innumerable data systems: SAP, Oracle, MySQL, Access, MongoDB, SQLite, stone tablets, web pages, flat files, APIs returning JSON, XML, and Protocol Buffers, message systems like Kafka and RabbitMQ; the list goes on.  I have encountered log files stretching into the gigabytes that needed parsing on one hand and terabytes of unprocessed, information-rich transactional data on the other, and R has never failed to meet a demand I threw at it.

R reads nearly every type of data I have come across, and it can write information back out in the formats it reads.

R can generate interactive maps from shapefiles, like ArcGIS, *whispers* but for free.  It can serve APIs simply by decorating functions you have already written.  It can employ Selenium to interact with web pages and collect information from them.  It can connect to a Spark cluster to harness the power of multiple computers working together to process huge amounts of data.  It can consume message queue information to listen to Arduino and RabbitMQ sensors or send information to LoRa modules.  And on top of that, it can take the data you load into it and produce models to test hypotheses and even predict the future.  I have done all of these things, mostly on my own, with the power of free, open-source packages developed by regular people wanting to solve problems.

It is crazy how much this one language can do when you learn enough of the packages.  ggplot2 draws beautiful, intuitive charts.  XML reads and writes XML.  jsonlite reads and writes JSON.  httr can send GET, PUT, and POST requests.  And it does all of this without you having to compile anything, because the heavy lifting has generally already been done, with the core functions and features written in compiled languages such as C and C++.  What the coder is left with are high-level scripting APIs that are data focused, generally well documented, similar in design, and a joy to work with.
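To give a feel for how little code one of these packages asks for, here is a minimal sketch of a JSON round trip with jsonlite, assuming the package is installed.  The bus-arrival records and their field names are made up for illustration; they are not the OneBusAway API's actual schema.

```r
library(jsonlite)

# Hypothetical arrival records; the field names are invented for this
# example and are not taken from any real API.
arrivals <- data.frame(
  route = c("1", "2"),
  minutes_away = c(4, 11),
  stringsAsFactors = FALSE
)

# Serialise the data frame to JSON (an array with one object per row)...
json_text <- toJSON(arrivals)

# ...and parse that JSON straight back into a data frame.
parsed <- fromJSON(json_text)
```

The same fromJSON() call also accepts a URL or a file path, which is usually how JSON arrives in practice.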

I tend to set out to do something complicated in R by breaking it down into smaller tasks I know how to accomplish and chaining those smaller tasks into one large script.  Then I think of all the ways the script will need to change over time, and I split it into smaller, modular functions.  Somehow, something that seemed impossible the day before becomes something I can do hundreds of times per day the next.  It really is quite remarkable.  I encourage more people to learn the R language.  I hope that one day I won't be just a one man band programming in it here, but one performer among hundreds.
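For anyone curious what that habit of small steps wrapped into functions looks like, here is a rough sketch in base R.  The log format, the sample lines and the function names are all invented for illustration; real logs are rarely this tidy.

```r
# Step 1: parse one "time level message" log line into a named list.
parse_line <- function(line) {
  parts <- strsplit(line, " ", fixed = TRUE)[[1]]
  list(
    time = parts[1],
    level = parts[2],
    message = paste(parts[-(1:2)], collapse = " ")
  )
}

# Step 2: parse many lines and stack them into one data frame.
parse_log <- function(lines) {
  rows <- lapply(lines, parse_line)
  do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))
}

# Step 3: count how many lines were logged at each level.
count_levels <- function(log_df) {
  table(log_df$level)
}

# Made-up sample input for the sketch.
log_lines <- c(
  "09:00:01 INFO service started",
  "09:00:05 ERROR connection refused",
  "09:00:09 INFO retrying"
)

# Chain the small steps together.
level_counts <- count_levels(parse_log(log_lines))
```

Each function does one small job, so when the log format changes only parse_line() has to change, and the rest of the chain keeps working.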

Look forward to seeing some code examples in R for various things I do.