Very cool: Biocep-R distributed R OSS project
March 30, 2009
I’ve been playing around with Biocep-R, an OSS project that aims to move the R platform into a whole new realm. I humbly suggest that folks in finance keep an eye on this. It has big potential in many other areas as well (e.g. biostatistics).
It still needs work before it’s ready for prime-time use, but the vision and the ability to execute seem to be there.
A few pieces from the project:
- Central control over distributed R engines in a multi-server environment.
- Amazon cloud support for the above: “Amazon EC2 virtual machines running R servers can be fired up or shut down to scale up or scale down according to the load…”
- Central R object repository to allow many engines to cooperate on large computations.
- The R engines can be persistent or run on demand for certain jobs. Maintain your own R session on a server and connect to it remotely; this is great for individuals with long-running R jobs.
- SOAP/other remote access to R-based services. Submit jobs to the R-cloud from the technology of your choice. You will not want to submit a huge data set over SOAP, so the mechanism for making this work in practice needs some thought.
- A Java-based GUI to work with all this. Not my favorite code editor, but still good.
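To make the "don't ship a huge data set over SOAP" point concrete, here is a toy sketch of one way it could work: the data lives in a central repository, and each job carries only a key that the engine resolves on its side. This is plain Python and not Biocep-R's API. `ObjectRepository`, `run_job`, `submit_jobs`, and the `returns/...` keys are all hypothetical names made up for illustration, and a thread pool stands in for the remote R engines.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


class ObjectRepository:
    """Hypothetical stand-in for a central object store shared by all engines."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]


def run_job(repo, job):
    """Simulate one engine: resolve the data reference, then compute on it."""
    data = repo.get(job["data_key"])  # the job carried only the key, not the data
    return job["name"], fmean(data)


def submit_jobs(repo, jobs, n_engines=2):
    """Fan small job descriptions out to a pool of worker 'engines'."""
    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        return dict(pool.map(lambda job: run_job(repo, job), jobs))


# Load the (potentially large) data into the repository once.
repo = ObjectRepository()
repo.put("returns/2008", [0.01, -0.02, 0.03])
repo.put("returns/2009", [0.02, 0.04])

# Each job is a tiny message that would be cheap to send over SOAP or similar.
jobs = [
    {"name": "mean_2008", "data_key": "returns/2008"},
    {"name": "mean_2009", "data_key": "returns/2009"},
]
results = submit_jobs(repo, jobs)
print(results)
```

The job messages stay small no matter how big the data gets, which is the property you would want from whatever mechanism a real setup ends up using.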
One big stumbling block is still, IMO, the performance of the R runtime. Also, R’s memory model is not well suited to large computations; the S-PLUS Big Data feature tries to help with this (not available for R as of yet), but in the long run a more invasive solution may be needed.
So this project has some way to go before becoming the next big thing. But I think it gives a good picture of some next steps in statistical computing.