A Shiny new approach to Data Mining

Published on Sunday, December 2, 2012

In early 2011 RStudio released the beta of their Integrated Development Environment (IDE) for R. For most of us working with R, having an IDE was something that was sorely missing. Not only did RStudio Inc. make the IDE, they have also put an incredible amount of effort into polishing it to its current state. And all that as Free software.

The business model of RStudio has just been extended with Shiny. Shiny is a combination of R packages that enable “reactive programming” in R and present the resulting interface through a web browser. It is “reactive” because the interface responds immediately to any change made by the user. It offers a basic set of widgets (control elements such as radio buttons or sliders) and a clear distinction between what the user sees (the interface) and how the data is handled (the controller). If you do not program, that may not seem very relevant, but once you start playing around with Shiny it will gently force you to work along a so-called Model-View-Controller (MVC) pattern. You will be glad of it once your Shiny apps get larger and more complex.
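To make the split between interface and data handling concrete, here is a minimal sketch of a Shiny app. It is written against the single-file `shinyApp()` style of current Shiny releases, which may differ from the layout functions available when this post was written, and the input name `bins` and the use of the built-in `faithful` data set are my own choices for illustration.

```r
library(shiny)

# View: what the user sees (one slider and one plot).
ui <- fluidPage(
  titlePanel("Reactive histogram (illustrative)"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Approximate number of bins:",
                  min = 5, max = 50, value = 20)
    ),
    mainPanel(plotOutput("hist"))
  )
)

# Controller: how the data is handled.
server <- function(input, output) {
  output$hist <- renderPlot({
    # This block re-runs automatically whenever input$bins changes.
    # hist() treats a single number for 'breaks' as a suggestion only.
    hist(faithful$eruptions, breaks = input$bins,
         main = "Old Faithful eruptions", xlab = "Duration (minutes)")
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, start it with: runApp(app)
```

Moving the slider changes `input$bins`, and Shiny redraws the plot without any explicit event handling code. That is the "reactive" part.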

RStudio already offers Shiny to all R users for use on their local machines. Starting your Shiny app is as simple as:

$ cd ~/ShinyAppFolder
$ R -e "shiny::runApp(port=8100)"   # 8100 is the default port number

I mention the port number above because you can choose any available port, which lets you have multiple Shiny apps running on your machine at the same time. Once you have set up a number of interfaces to explore your data set, that makes for a very cool feature.
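As a rough sketch of running several apps side by side, the helper below builds one `runApp()` call per app folder and gives each its own port. The function name `launch_commands`, the folder names, and the starting port are hypothetical placeholders, not part of Shiny.

```r
# Build one R command per app, each bound to its own port.
# Folder names and the starting port are placeholders for illustration.
launch_commands <- function(app_dirs, first_port = 8100L) {
  ports <- as.integer(first_port) + seq_along(app_dirs) - 1L
  sprintf("shiny::runApp('%s', port = %d, launch.browser = FALSE)",
          app_dirs, ports)
}

cmds <- launch_commands(c("apps/stratigraphy", "apps/ordination"))
print(cmds)

# To start them for real, run each command in its own background R process:
# for (cmd in cmds) system2("R", c("-e", shQuote(cmd)), wait = FALSE)
```

Each app is then reachable in the browser on its own port of the local machine.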


Example of a Shiny interface for stratigraphic plots

For teaching, and for your own efforts to understand libraries, Shiny offers a really neat way to set up an interface to the various options of a package. As an example I set up a Shiny user interface to Stratiplot (part of Gavin Simpson’s analogue package) for paleoecological stratigraphic charts. Stratiplot has an extensive series of plotting options, in part inherited from lattice's xyplot, that give a lot of flexibility but make it a bit cumbersome to try them all out on the command line.

If you look at the example HERE, you can see that there are now sliders for the arguments of the chooseTaxa function, and pull-down menus for the various combinations of plotting options. For a specific project these can easily be adapted to include other abundance scales (depending on the min/max in your data set), or even a selection of taxa to choose from for display in the graph. The main goal of this example is simply to let you play around interactively with the different graphing options that Stratiplot provides, including the stratigraphic zonation.
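The general idea can be sketched roughly as follows. This is not the code behind the linked example. It is a minimal illustration that assumes the analogue package is installed, uses its bundled V12.122 core data with depths taken from the row names (as the package's own examples do), and exposes only a few of Stratiplot's options. The input names are my own.

```r
library(shiny)
library(analogue)

data(V12.122)                              # core data shipped with analogue
depths <- as.numeric(rownames(V12.122))    # assumption: row names hold depths
peak   <- max(as.matrix(V12.122), na.rm = TRUE)

ui <- fluidPage(
  titlePanel("Stratiplot explorer (illustrative)"),
  sidebarLayout(
    sidebarPanel(
      # Sliders feed the arguments of chooseTaxa(); ranges follow the data.
      sliderInput("max_abun", "Minimum peak abundance:",
                  min = 0, max = signif(peak, 2), value = 0),
      sliderInput("n_occ", "Minimum number of occurrences:",
                  min = 1, max = nrow(V12.122), value = 1, step = 1),
      # Pull-downs feed plotting options of Stratiplot().
      selectInput("type", "Plot type:", choices = c("l", "h", "poly")),
      selectInput("sort", "Sort taxa by:", choices = c("none", "wa", "var"))
    ),
    mainPanel(plotOutput("strat", height = "600px"))
  )
)

server <- function(input, output) {
  # Reactive subset of taxa; recomputed when either slider moves.
  taxa <- reactive({
    chooseTaxa(V12.122, n.occ = input$n_occ, max.abun = input$max_abun)
  })

  output$strat <- renderPlot({
    # Show a message instead of an error when the filter removes every taxon.
    validate(need(NCOL(taxa()) > 0, "No taxa match the current settings."))
    Stratiplot(taxa(), y = depths, type = input$type, sort = input$sort)
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, start it with: runApp(app)
```

Keeping the taxon filter in its own `reactive()` means the subset is computed once per slider change and can be reused by other outputs, such as a table of the selected taxa.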

I believe RStudio has released a second hit, and that Shiny is the next best thing after RStudio and R itself. The applications are endless, in business and in science, because the package is well thought out. It is a great way to extend the R skills you already have to an interface that lets you share your data mining approach (or data exploration, or data analysis, depending on what you are doing and the terminology you prefer) with others in your team or company.