Saturday, June 19, 2010

Prediction API & Big Data API

Google has branched out in many directions since its initial search-engine & AdWords success. The company has had such healthy profits (mainly from its internet-based advertising) that it has been able to dabble in pretty much every currently interesting application in Computer Science - see here!

For a while now they have been providing cloud computing services, such as Google Storage for Developers; check out the pricing of that service here. Google maintains all the data within its own infrastructure. I think this article tries to explain how the distributed storage is implemented (of course, just a very generic overview). You will notice that the service is naturally scalable and pretty smart in a number of ways.

Google's most recent activity resulted in the announcement of two new APIs (Prediction API and BigData API). The diagram below shows how these fit together.



BigData is used to query a large cloud store (using an SQL dialect over a web service), and the Prediction API can be used on that data to train Google-implemented AI models for prediction. This simply seems to be a machine learning library that can be accessed over a web service. Obviously this runs on Google's cloud infrastructure, and that has its advantages.
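To make the query-then-train-then-predict flow a bit more concrete, here is a minimal local sketch in Python. It does not use either Google API: an in-memory SQLite table stands in for the cloud store, and a toy word-count classifier stands in for the hosted model. The table name `messages`, the sample rows and the functions `load_rows`, `train` and `predict` are all made up for illustration.

```python
import sqlite3
from collections import Counter, defaultdict


def load_rows(conn):
    """Query the store with SQL and return (text, label) pairs."""
    return conn.execute("SELECT text, label FROM messages").fetchall()


def train(rows):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in rows:
        counts[label].update(text.lower().split())
    return counts


def predict(model, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)


# Hypothetical data: a local SQLite table standing in for the cloud store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (text TEXT, label TEXT)")
conn.executemany("INSERT INTO messages VALUES (?, ?)", [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
])

model = train(load_rows(conn))
print(predict(model, "buy cheap money"))
print(predict(model, "notes from the meeting"))
```

With the hosted services, the query and the training would presumably both happen on Google's side, so the data never has to leave the cloud store.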

A number of Machine Learning libraries exist, such as WEKA, RapidMiner and many others. I used to write some of my own code for these algorithms; however, over the last few years I have noticed an amazing increase in the number of ML libraries. In most of my work these days I use open source libraries.

I am not quite sure how the pricing of these APIs works (maybe somebody can enlighten us on this issue). My impression is that it is tied to the Google cloud storage service, and these APIs give developers another reason to use that store.

You can check out some code samples for the API here.

Thursday, June 17, 2010

Open Source Licenses

The best place to visit for more information on licenses is the Open Source Initiative (OSI) at www.opensource.org/docs/definition.php. The last time I checked, OSI had certified about 50 different licenses as conforming to its definition of open source. So there's plenty of choice, but I will mention the three most popular ones here. In any case, it probably makes sense to research these a little more on your own, since you have the best idea of how strict a license you are after.

1- GNU General Public License (GPL)
The best-known open source license (find the authoritative version at www.gnu.org/copyleft/gpl.html).
It is probably one of the strongest (and most notorious) open source licenses. Section 2b of the license is what makes the GPL a "viral" license: if anyone takes your code and distributes software built on it, they have to license their entire software under the GPL too. The GPL restricts the people who receive your code, but not you. As the copyright holder, you can in fact offer the code under a different, more commercial license to specific people or for specific versions at any time. However, people who have already received your code under the GPL can keep distributing that version under the GPL for as long as they like.

2- BSD License, a Berkeley license (www.freebsd.org) originally used for the Berkeley Software Distribution (BSD) of Unix, as you probably know already :-). It is not as strict on code users as the GPL; for example, Microsoft is known to have used parts of BSD's networking code in its commercial versions of Windows.

3- Mozilla Public License (MPL) (www.mozilla.org/MPL): more complex and more loaded with legalese than the GPL, yet largely similar to the GPL in intent. There's one major difference, though. The GPL forbids combining GPL code with proprietary code in a larger piece of work, whereas the MPL expressly allows this. My understanding is that the MPL occupies the middle ground between the GPL and BSD licenses.

The funny thing is, Microsoft doesn't like open source software :-). They criticize various aspects of open source, and so they came up with their own initiative, which they called Shared-Source Licenses (see http://www.microsoft.com/resources/sharedsource/default.mspx for more info).

Finally, I'd love to hear about any experiences you might have had with open source licensing. You can either comment or email me directly (note: your comment will first go for approval - apologies, but this is necessary due to spam).