Saturday, June 19, 2010

Prediction API & Big Data API

Google has branched out in many directions since its initial search engine and AdWords success. The company has had such healthy profits (mainly from its internet-based advertising) that it has been able to dabble in pretty much every interesting area of computer science today; see here!

For a while now they have been providing cloud computing services, such as Google Storage for Developers (check out the pricing of that service here). Google keeps all the data within its own infrastructure. I think this article tries to explain how the distributed storage is implemented (of course, only as a very generic overview). You will notice that the service scales naturally and is pretty smart in a number of ways.

Google's most recent activity resulted in the announcement of two new APIs (the Prediction API and the BigData API). The diagram below shows how these fit together.



The BigData API is used to query a large cloud store (using an SQL dialect over a web service), and the Prediction API can then be used on that data to train Google-implemented machine learning models for prediction. In other words, the Prediction API seems to be a machine learning library that is accessed over a web service. Naturally, this runs on Google's cloud infrastructure, which has its advantages.
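To make the "query with SQL, then train and predict" flow a bit more concrete, here is a minimal local sketch. It does not call either Google API: an in-memory SQLite table stands in for the cloud store, and a simple nearest-centroid classifier stands in for Google's hosted models (which are not public). The table name `emails`, its columns and the data are all hypothetical. The rows keep the label in the first column, which as far as I understand is also the layout the Prediction API expects for its training data.

```python
import math
import sqlite3


def load_rows():
    """Stand-in for the cloud store: query training examples with SQL."""
    # Hypothetical in-memory table; the label comes first, then numeric features.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (label TEXT, feature_a REAL, feature_b REAL)")
    conn.executemany(
        "INSERT INTO emails VALUES (?, ?, ?)",
        [("spam", 0.9, 12), ("spam", 0.8, 15), ("ham", 0.1, 2), ("ham", 0.2, 3)],
    )
    rows = conn.execute("SELECT label, feature_a, feature_b FROM emails").fetchall()
    conn.close()
    return rows


def train(rows):
    """Stand-in for the hosted trainer: compute one centroid (feature means) per label."""
    sums, counts = {}, {}
    for label, *features in rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))


if __name__ == "__main__":
    model = train(load_rows())
    print(predict(model, [0.85, 14]))  # spam
```

With a hosted service, the `train` and `predict` steps would be web service calls and the model would live on Google's side; the appeal is that you never have to move the data out of the cloud store or run the training yourself.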

A number of machine learning libraries exist, such as WEKA, RapidMiner and many others. I used to write my own code for these algorithms, but over the last few years I have noticed a huge increase in the number of ML libraries, and in most of my work these days I use open source libraries.

I am not quite sure how the pricing of these APIs works (maybe somebody can enlighten us on this). My impression is that it is tied to the Google cloud store service, and that these APIs are meant to give people another reason to use that store.

You can check out some code samples for the API here.
