Sunday, March 3, 2013

Lies, damned lies, and statistics

In my experience, most PhD students in engineering and computer science have to use quantitative data analysis and statistical techniques to evaluate and validate their experiments at some point in their research. Increasingly, social science researchers use these techniques too, mostly through well-established software packages such as SPSS or R.

I have to say that most papers I read, even the relatively theoretical ones, have a strong empirical data-analysis component. It is easy to lose sight of the problems associated with statistical evaluation of empirical work, so I thought it would be a good reminder, for my readers and myself, to go over some common pitfalls of statistical techniques:


  • Discarding unfavorable data

  • Loaded questions

  • Overgeneralization

  • Biased samples

  • Misreporting or misunderstanding of estimated error

  • False causality

  • Proof of the null hypothesis

  • Data dredging

  • Data manipulation

Wikipedia's article on the misuse of statistics is a good starting point for most of these; a good statistics text in combination with some lighter reading will also help.
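To make one of these pitfalls concrete, consider data dredging. If you test 20 independent hypotheses at the usual 5% significance level and none of them is actually true, the probability of getting at least one "significant" result purely by chance is 1 - 0.95^20 ≈ 0.64, so roughly a 64% chance of reporting a spurious finding. That is why running many unplanned comparisons and only reporting the ones that "worked" is so dangerous, and why corrections such as Bonferroni exist.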

There are also calls for researchers to make their code and datasets publicly available so that experiments can be repeated independently. This is increasingly becoming common practice, especially at high-profile journals and conferences, but there are still numerous issues associated with making datasets and code-bases publicly available.

Saturday, March 2, 2013

The server side... and ASP.net side...

Browsing through my draft posts I found this one, which I wrote while I was still a PhD student... just some ramblings about web programming and my past experiences with it.

Stay in Touch

To my surprise, I made a rather interesting observation as a PhD student: a number of my PhD colleagues have no web-based programming knowledge or experience at all. This may be because their university stressed more fundamental CS topics and little time was left for web programming, or because they never had the chance to gain significant coursework or industry practice in it, and once their PhD started, their topic was concerned with something entirely different. Whatever the reason, these days there is no justifiable excuse for a computer science student to ignore server-side languages. During my PhD I therefore challenged myself to stay on top of new technology in server-side programming. Maybe I lost some time that I could have dedicated elsewhere, but at least I am ready to build a web-based system at any time, in pretty much any server-side language you could throw at me!

Arguments for & against...

Within web programming, we have a choice of languages to work with: Perl or C++ in a CGI setting, or PHP, ASP.NET (C#), JSP (Java), to name a few. I dabbled in all of them at some point, but the most significant competition definitely takes place between PHP and ASP.NET.

The choice of language more often than not depends on the background of the programmer. Then come execution speed, client preferences (sometimes these matter more than any other factor, but more about that in another post, perhaps), and, quite importantly, the availability of support and of an existing code base from past projects or third-party sources, whether open-source or commercial. The main idea is that we don't want to begin development from the ground up.

PHP has a lot going for it. Clients like it because it sprang from the open-source movement, and more of the code base out there is open source than for probably any other server-side language, which covers the client-preference and code-availability factors. Its speed is generally acceptable, programmers generally love it, being an interpreted language it is easy to maintain, and the symbiosis between PHP and MySQL works extremely well.

When I first dabbled in ASP.NET, back in 2003/04 on versions 1.1 and 2.0 of the .NET framework, I hated many things about it compared to PHP, but that's a longer story! Since those days Microsoft's engineers have been busy developing the technology, and we are currently at framework 4.0. I heard a lot of hype, as tends to happen with Microsoft releases, so I decided to satisfy my curiosity and look at the three main approaches:
  • ASP.net web-forms
  • ASP.net MVC
  • Stripped down (web-form free) approach
The idea of the stripped-down approach had always been in my head, but to take it you really had to feel comfortable with some of the complexities of the chunky ASP.NET web-forms approach, at least until I found Chris Taylor's article.
ASP.NET is much more complex than PHP, and maybe that is the problem with ASP.NET too: because of its complexity and learning curve, programmers quite rightly keep away from it. To name just a few problems: the ASP.NET Menu control used to render very ugly, non-standards-compliant XHTML markup instead of a CSS-styled list (which would be the way to go here), and ASP.NET would generate client IDs that depended on where in the page a server control occurred rather than keeping the server ID assigned to the control in the first place. Fortunately, at least these issues are resolved in framework 4.0, which gives us some hope for Microsoft.
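To give one concrete example of the kind of fix 4.0 brings: the new ClientIDMode property lets you keep the server ID on the client. A minimal illustration (the control name is invented):

    <asp:TextBox ID="SearchBox" runat="server" ClientIDMode="Static" />

With ClientIDMode="Static" this renders with id="SearchBox" in the page, whereas the old auto-generated ID would look something like ctl00_MainContent_SearchBox, which made writing CSS and JavaScript against it a pain.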

Comparison Table (useful for 1st time asp.net people)

Speed Comparison of server-side languages - check out the source site




Monday, October 15, 2012

Behaving optimally in life

The social sciences and psychology have brought us a number of interesting insights into human behaviour. In a recent StumbleUpon session I discovered a collection of recent scientific journal articles relating to various aspects of life. You can read the original article on psychologytoday; what follows is a subset of the "solutions" suggested by the research papers. For a more complete description I recommend checking out the full articles, and of course I wouldn't take this advice literally, only as something to ponder on :-).

1-How to break bad habits: J. Quinn, A. Pascoe, W. Wood, & D. Neal (2010) Can't control yourself? Monitor those bad habits. Personality and Social Psychology Bulletin, 36, 499-511

Focus on stopping the behavior before it starts (or, as psychologists tend to put it, you need to "inhibit" your bad behavior). According to research by Jeffrey Quinn and his colleagues, the most effective strategy for breaking a bad habit is vigilant monitoring - focusing your attention on the unwanted behavior to make sure you don't engage in it. In other words, thinking to yourself "Don't do it!" and watching out for slipups - the very opposite of distraction. If you stick with it, the use of this strategy can inhibit the behavior completely over time, and you can be free of your bad habit for good.

2-How to make everything seem easier: J. Ackerman, C. Nocera, and J. Bargh (2010) Incidental haptic sensations influence social judgments and decisions. Science, 328, 1712- 1715.

For instance, we associate smoothness and roughness with ease and difficulty, respectively, as in expressions like "smooth sailing," and "rough road ahead." In one study, people who completed a puzzle with pieces that had been covered in sandpaper later described an interaction between two other individuals as more difficult and awkward than those whose puzzles had been smooth. (Tip: Never try to buy a car or negotiate a raise while wearing a wool sweater. Consider satin underpants instead. Everything seems easy in satin underpants.)

3-How to manage your time better: M. Weick & A. Guinote (2010) How long will it take? Power biases time predictions. Journal of Experimental Social Psychology.

You can learn to more accurately predict how long something will take and become a better planner, if you stop and consider potential obstacles, along with two other factors: your own past experiences (i.e., how long did it take last time?), and all the steps or subcomponents that make up the task (i.e., factoring in the time you'll need for each part.)

4-How to be happier: J. Quoidbach, E. Dunn, K. Petrides, & M. Mikolajczak (2010) Money giveth, money taketh away: The dual effect of wealth on happiness. Psychological Science, 21, 759-763.

The basic idea is that when you have the money to eat at fancy restaurants every night and buy designer clothes from chic boutiques, those experiences diminish the enjoyment you get out of the simpler, more everyday pleasures, like the smell of a steak sizzling on your backyard grill, or the bargain you got on the sweet little sundress from Target. Create plans for how to inject more savoring into each day, and you will increase your happiness and well-being much more than (or even despite) your growing riches. And if your riches aren't actually growing, then savoring is still a great way to truly appreciate what you do have.

5-How to have more willpower: M. Muraven (2010) Building self-control strength: Practicing self-control leads to improved self-control performance. Journal of Experimental Social Psychology, 46, 465-468.

New research by Mark Muraven shows that our capacity for self-control is surprisingly like a muscle that can be strengthened by regular exercise. Do you have a sweet tooth? Try giving up candy, even if weight-loss and cavity-prevention are not your goals. Hate exerting yourself physically? Go out and buy one of those handgrips you see the muscle men with at the gym - even if your goal is to pay your bills on time. In one study, after two weeks of sweets-abstinence and handgripping, Muraven found that participants had significantly improved on a difficult concentration task that required lots of self-control. Just by working your willpower muscle regularly, engaging in simple actions that require small amounts of self-control - like sitting up straight or making your bed each day - you can develop the self-control strength you'll need to tackle all of your goals.

6-How to feel more powerful: D. Carney, A. Cuddy, and A. Yap (2010) Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science, 21, 1363-1368.

In the animal kingdom, alphas signal their dominance through body movement and posture. Human beings are no different. The most powerful guy in the room is usually the one whose physical movements are most expansive - legs apart, leaning forward, arms spread wide while he gestures. The nervous, powerless person holds himself very differently - he makes himself physically as small as possible: shoulders hunched, feet together, hands in his lap or arms wrapped protectively across his chest. We adopt these poses unconsciously, and they are perceived (also unconsciously) by others as indicators of our status. In the study, posing in "high power" positions not only created psychological and behavioral changes typically associated with powerful people, it created physiological changes characteristic of the powerful as well. High-power posers felt more powerful, were more willing to take risks, and experienced significant increases in testosterone along with decreases in cortisol (the body's chemical response to stress).

Searching text files by their content

Finding a text file when you don't remember the file name, or where you stored it on the hard drive, can be a nightmare, especially when the built-in Windows file search fails. PowerShell (under Windows) comes to the rescue. All you have to do is open up a PowerShell console (in newer versions of Windows it comes with the OS; in older ones you might need to download it).
  1. Make sure that at the PowerShell prompt you are within the drive you want to search (i.e. use cd to get there, e.g. cd C:\ if you need to search the whole C drive).
  2. Get-ChildItem -Recurse -Include *.txt | Select-String "search string"
where search string is simply a piece of text that you know is in the contents of the file. For example, in my case I typed ANOVA, since I was looking for my notes on ANOVA tests. You can also use regular expressions, since PowerShell's Select-String cmdlet treats the pattern as a regex by default.
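For example, something along these lines should pick up files mentioning either term (a sketch in the same spirit as the command above):

    Get-ChildItem -Recurse -Include *.txt | Select-String -Pattern "ANOVA|t-test"

Add -List if you only want the first match per file, or -SimpleMatch to turn the regex handling off.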

Saturday, May 19, 2012

Sports Informatics and Social Collaboration

After a long time, I felt it was about time for a quick update on my blog. Things have been rather hectic over the last few months. I'm still recuperating from my torn-ligament injury, but I've been working for five months as a Research Associate in the field of Sports Informatics at the computer science department at Loughborough University. The work involves grass-roots research into various applications of computer science within sport. I am tasked with organising seminars and a symposium, in addition to initial academic research. There are numerous potential applications, such as image analysis in sports science, virtual reality in sports science, coaching applications, and AI / intelligent systems for sports-data management and systems solutions.

The table below (Lames 2012 - Departmental Presentation) illustrates the two-way relationships between computer science and sports science subjects. Essentially any work at these subject intersections is known as the field of sports-informatics (see the IACSS association website, which is an umbrella association for this type of work).
My task in this research position is to establish research links for the department with international and nationwide research centres. Loughborough has a strong tradition in sports science with international research excellence (see SSEHS or STI, for instance), and there are many potential applications within the computer science department, for example image-analysis-based tracking of team players, or the use of machine learning to detect team-play patterns. To me, the area of most interest is the application of communications and social-media technologies in sport. Some recent work looked at several sporting events and analysed the social-media UGC (User Generated Content): what kinds of things people talk about in relation to sporting events, how the fan-athlete relationship is changing compared to traditional media, and whether any revealing information is shared (Pegoraro 2010, Kassing and Sanderson 2010). I am especially curious whether Twitter and other social-media contributions may be revealing in relation to, for example, sport draft picks, line-ups and team-play / strategy changes (in other words, problems of talent detection and coaching). There is also more work to be done in investigating fan-athlete communication, such as predicting how likely an athlete / celebrity is to respond with a direct message to fans, identifying fans and classifying them based on the dynamics of their interactions, or correlating match tracking data with social-media contributions, since this type of work has not seen much research yet.

I am also beginning a new RA position on the All-in-One project (funded by the EPSRC) at Leicester University. This project looks at single-infrastructure provision of utilities and its technological and scientific feasibility within the next 100 years. The idea is motivated by climate change, cost reductions and the efficient use of utilities (see this working paper for a basic introduction). My main task on the project is to work on a collaborative, web-based (web 2.0 / social-media style) system that facilitates collaboration and sharing within an academic and also a wider citizen-science community. This is an interesting area of work with various open problems, such as: how do you design a system that facilitates efficient, social, web-based collaboration among many individuals, and how do you attract and maintain an active user base of contributors and collaborators? There is some very interesting research in this area from the Climate CoLab project at MIT's Centre for Collective Intelligence, and the CSCW conference, for example, contains highly relevant research contributions that help answer the questions above. My work within the All-in-One project involves deploying a collaborative system and processes based on an evaluation of prior academic research. Some of the work from my PhD, such as the design of the Newsmental system, is relevant to this, and it will be interesting to put the whole concept of collective intelligence into practice within a larger-scale project such as this one.

References:


  • Pegoraro, A. (2010). Look Who's Talking - Athletes on Twitter: A Case Study. International Journal of Sport Communication, 3, 501-514.
  • Kassing, J. W. & Sanderson, J. (2010). Fan-Athlete Interaction and Twitter Tweeting Through the Giro: A Case Study. International Journal of Sport Communication, 3, 113-128.

Tuesday, March 6, 2012

Simplicity in Web Apps

Dr. BJ Fogg from Stanford University does some interesting work on how Web 2.0 technologies relate to human behaviour... he even teaches a course at Stanford fully dedicated to Facebook :-)

Anyway, this is his model of simplicity as it relates to web apps, a very brief and rough intro, but maybe you'll find it useful - http://behaviormodel.org/ability.html

What I took away from it:

  • An ugly / simplistic but useful definition of SIMPLICITY: The minimally satisfying solution at the lowest cost.
  • Simplicity is contextual, i.e. it depends on the situation or person (not necessarily the product)
  • Simplicity is a function of your scarcest resource at that moment, where Fogg identifies these resources: Time, Money, Physical Effort, Brain Cycles, Social Deviance (i.e. going against socially acceptable norms), Non-Routine
Interesting stuff, he has many more resources on his web pages.

Thursday, June 2, 2011

Automated Financial News Understanding System

I've just had the great pleasure of finishing up http://www.newsmental.com, a news analysis, aggregation and community opinion-formation web 2.0 system. That's right, the system does three main things:
  • News from numerous sources is extracted more than six times a day and clustered based on the similarity of the articles' text, which gives a nice overview of all the news; in addition, trending news is highlighted right at the top of the page. This sounds a little bit like news.google.com, but there's much more to it, read on to find out (a rough sketch of the similarity idea is given further below)... :-)
  • News is automatically analysed using some fairly heavy text-mining / AI techniques in order to extract entities, places, people, facts, relationships and other semantic / meaningful elements from the articles. These extracts are presented in each news item's panel so that the user can avoid having to read the entire article - a quick look at the entities and relationships provides all the information needed under time pressure!
  • Most interestingly, the newsmental system allows any user (logged in or not) to rate the sentiment and impact of a news article as they personally perceive it. This is something I would call collaborative news analysis, web 2.0 style! You are not reading the news alone - every visitor to the site reads the same news items, so why not share your individual understanding with the community and help everyone understand the news better than they would on their own?
Let me mention that newsmental is a PhD research project in community opinion aggregation, and hence an academic study whose aim is a better understanding of collaborative news analysis. In summary, newsmental is a system that can potentially save you a lot of time keeping up with the broad business, economics and finance news. It could be quite useful to traders and similar professionals for whom being aware of the news and its overall implications plays a major role!
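For the technically curious, here is a rough sketch of the kind of text-similarity measure such clustering can be built on. This is an illustration only, not the exact code running behind newsmental: count the words in each article and take the cosine of the angle between the two count vectors.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class TextSimilarity
    {
        // Very naive bag-of-words counts (illustration only - no stemming, no stop-word removal).
        static Dictionary<string, int> WordCounts(string text)
        {
            char[] separators = { ' ', '\t', '\n', '\r', '.', ',', ';', ':', '!', '?', '"', '(', ')' };
            return text.ToLowerInvariant()
                       .Split(separators, StringSplitOptions.RemoveEmptyEntries)
                       .GroupBy(w => w)
                       .ToDictionary(g => g.Key, g => g.Count());
        }

        // Cosine similarity: 1.0 means identical word distributions, 0.0 means no words in common.
        public static double Cosine(string articleA, string articleB)
        {
            Dictionary<string, int> a = WordCounts(articleA);
            Dictionary<string, int> b = WordCounts(articleB);
            double dot = a.Keys.Intersect(b.Keys).Sum(w => (double)a[w] * b[w]);
            double normA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
            double normB = Math.Sqrt(b.Values.Sum(v => (double)v * v));
            return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
        }
    }

Two articles whose similarity exceeds some threshold (say 0.5) would then land in the same cluster; a real system would typically also weight rare words more heavily (TF-IDF) and layer entity extraction on top.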

Thanks for reading all the way, to finish off the article I provide a few useful links:
[1] Quick (2 minute) Guide - http://www.newsmental.com/tutorialintro.aspx
[2] FAQ and some background info: http://www.newsmental.com/faq.aspx (here you can leave your email address if you are interested in the outcomes of our study)
[3] Full Tutorial - http://www.newsmental.com/tutorial.aspx

Wednesday, May 11, 2011

AJAX UpdatePanel in ASP.net fully explained!

Tip: avoid the ASP.NET update panel whenever you can!

The UpdatePanel is a quick and (very) dirty way to enable some AJAX on an ASP.NET web page. You simply put one or several UpdatePanels onto a page together with one ScriptManager; usually you'll want to set the UpdateMode of the UpdatePanel(s) to "Conditional", and if you have user controls on the page you may need to set EnablePartialRendering="true" on the ScriptManager (it is set to true by default, I believe). It often seems to work just great: you get those famous flicker-free partial page postbacks that are so characteristic of AJAX. Unfortunately, under the hood the UpdatePanel causes a full re-instantiation of the page's control tree, and every single control runs through its life-cycle events on every partial postback.
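For reference, a minimal set-up looks roughly like this (the control names and the click handler are invented for the example):

    <asp:ScriptManager ID="PageScriptManager" runat="server" EnablePartialRendering="true" />

    <asp:UpdatePanel ID="ClockPanel" runat="server" UpdateMode="Conditional">
        <ContentTemplate>
            <asp:Label ID="TimeLabel" runat="server" />
            <%-- Clicking this button refreshes only the panel contents, not the whole page --%>
            <asp:Button ID="RefreshButton" runat="server" Text="Refresh" OnClick="RefreshButton_Click" />
        </ContentTemplate>
    </asp:UpdatePanel>

with a code-behind handler along the lines of TimeLabel.Text = DateTime.Now.ToString(); in RefreshButton_Click. Because the button sits inside the ContentTemplate, it automatically becomes an asynchronous trigger for the panel.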

Problems

I can understand that this abstraction offers a certain amount of familiarity and simplicity that some naive programmers will welcome, but it is misleading and utterly non-"ajaxy". Imagine that you generate request- and parameter-specific HTML for the same resource on the fly (not at all uncommon in dynamic web 2.0 applications) and you need some AJAX functionality on that page: the UpdatePanel becomes an utter nightmare. Because the so-called AJAX UpdatePanel actually re-instantiates the entire control tree and plugs into the page life cycle behind the scenes (so that all the control events can be accessed nicely from code-behind), your dynamically generated page needs to be re-loaded from viewstate or session state manually, which kind of sucks! See this post, but especially this post, for an illustration of the extra overhead needed to achieve this.

Try... get this :-)

Obviously this seems like too much work on the server side when all you wanted was to send or retrieve a little bit of data from your web/db server asynchronously. The whole idea of AJAX is that you update only the small area of the page that needs updating; since most of the HTML can stay the way it is, a lot of bits on the wire and server processing time can potentially be saved, with the side effect of a flicker-free, quick, responsive web page. With the ASP.NET UpdatePanel, it seems the main goal of the control is simply the flicker-free update. I found a post that highlights common mistakes with the UpdatePanel, where some comments sadly defend the misleading opinion that this is a down-to-earth, logical design. The truth is that once you know exactly what the UpdatePanel does you can live with it, and in some basic situations it might be quite alright to use, but it certainly isn't good AJAX design by any standard.

The illustration below shows the desired AJAX scenario:


So problems begin if, for example, you generate controls dynamically on the first page load or in response to user interaction, which is very common on today's dynamic web. If an UpdatePanel is used in such a scenario, it becomes necessary to keep track of the controls that have been generated - usually in the viewstate - and the framework doesn't do it for you: you have to code up the state preservation (i.e. saving to / restoring from viewstate and re-creating the controls at the right point in the page life cycle) yourself. This can bring a great deal of unexpected and, most importantly, unneeded complexity.
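What that typically looks like is something along these lines (a hedged sketch only - the control and key names are invented, and the StackOverflow posts linked in the next paragraph cover the corner cases):

    // Re-create previously added controls on EVERY postback, with the same IDs,
    // so that their viewstate and events can be wired back up.
    protected void Page_Load(object sender, EventArgs e)
    {
        int count = (int)(ViewState["DynamicCount"] ?? 0);   // how many we created so far
        for (int i = 0; i < count; i++)
            AddDynamicTextBox(i);
    }

    // The button the user clicks to add one more control.
    protected void AddButton_Click(object sender, EventArgs e)
    {
        int count = (int)(ViewState["DynamicCount"] ?? 0);
        AddDynamicTextBox(count);
        ViewState["DynamicCount"] = count + 1;                // remember it for the next postback
    }

    private void AddDynamicTextBox(int index)
    {
        TextBox box = new TextBox();
        box.ID = "dyn" + index;                               // a stable ID is essential
        DynamicArea.Controls.Add(box);                        // DynamicArea: an asp:PlaceHolder inside the UpdatePanel
    }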

Of course you can decide to stick with the UpdatePanel [for some very, highly, extremely strange reason :-)] and take care of the state management of dynamically generated controls as described in this stackoverflow.com post, or this one. Have fun ;-)...

The Solution (page methods, etc...):

Fortunately, we can simply use direct AJAX calls instead. Microsoft's engineers realised that the UpdatePanel (in most non-trivial scenarios) simply sucks and provided us with alternatives, specifically page methods. These are great: essentially a web-service style method that can be declared as a public static method in the page's code-behind class, rather than having to create a new web service just to expose it. Page methods let you keep the code in one place, and I love them. Data is returned as JSON by default, but the format can easily be changed to XML, for example (since JSON isn't capable of representing certain complicated self-referential data items). Check out this page for a good example of a page method in use. Of course, standard web services can also be used; the options other than the UpdatePanel are discussed in some detail in a great MSDN Magazine article by Jeff Prosise.
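A minimal sketch of what a page method looks like (the class, method and parameter names are invented for illustration):

    using System;
    using System.Web.Services;
    using System.Web.UI;

    public partial class NewsPage : Page
    {
        // Public, static, and marked [WebMethod] - that is all a page method needs.
        [WebMethod]
        public static string SaveRating(int newsItemId, int rating)
        {
            // ...persist the rating somewhere (database call omitted in this sketch)...
            return string.Format("Saved rating {0} for item {1}", rating, newsItemId);
        }
    }

If you use the ASP.NET AJAX client library, setting EnablePageMethods="true" on the ScriptManager exposes it on the client as PageMethods.SaveRating(...); alternatively you can call it directly over HTTP, as shown with jQuery below.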

jQuery, or for that matter any other AJAX-capable JavaScript library, can be used instead (quite easily) to take care of the asynchronous server communication; the author of Encosia shows how to do this with jQuery in a neat, short article - check it out.
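In jQuery, the call to the hypothetical page method above looks roughly like this:

    // POST to Page.aspx/MethodName with a JSON body and a JSON content type.
    $.ajax({
        type: "POST",
        url: "NewsPage.aspx/SaveRating",
        data: JSON.stringify({ newsItemId: 42, rating: 5 }),
        contentType: "application/json; charset=utf-8",
        dataType: "json",
        success: function (response) {
            alert(response.d);   // ASP.NET wraps the return value in a "d" property
        }
    });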

Finally, don't forget that if you use any postback controls, such as an HTML button, asp:Button or asp:LinkButton, the OnClientClick must end with something like "return false;", otherwise a full page postback occurs anyway when the button's server-side click event fires. If you follow up the resources above, you will find that using plain AJAX instead of the UpdatePanel is actually very easy once you've done it a few times.
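For example (again, the client-side handler name is made up):

    <%-- The client-side function runs, and "return false;" suppresses the normal postback --%>
    <asp:Button ID="RateButton" runat="server" Text="Rate" OnClientClick="saveRating(); return false;" />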

Conclusion

In conclusion, the UpdatePanel is nasty: it costs a lot of bandwidth, and a lot of control is lost because of the way Microsoft decided to hook it into the page's life cycle. Some of that control can be regained by using the client-side PageRequestManager object, as described on this page, but it doesn't remove the need for manual state management of dynamically generated controls!

Sunday, April 3, 2011

Removing a Fake-antivirus / Spyware

So this Saturday I noticed that my netbook (running Windows XP) got infected by one of those nasty fake antiviruses. Having worked for over a year at the Loughborough University PC Clinic, I knew straight away what to do, but this was a nasty one and it took me over six hours to clean and repair the system. With a complicated programming environment set up on the laptop, I really didn't feel like re-installing everything, so I decided to identify and eliminate the malware carefully. I'll briefly share my experience here, since it could help some other poor soul with the same problem.
  1. ...as soon as you notice the annoying pop-ups and fake/suspiciously-looking security centre warnings, restart your system and boot it up into "Safe-Mode with Networking". On most laptops you need to press/hold F8 to get the screen from which to choose Safe-Mode...
  2. ...the problem usually is that the spyware will de-associate .exe files and you won't be able to start any programs, such as the command line, regedit or even a browser (the browser's start-up page and proxy settings are often changed too, so be careful to fix your browser settings). If you are lucky, Safe Mode will prevent the spyware from running, but in my case Safe Mode didn't help. There is, however, a surprisingly simple trick: if you have several accounts on your XP system and usually use only one of them, log into one you use rarely or never (quite often this is the Administrator account); if you are lucky, it will turn out that the fake antivirus has not infected that user account!
  3. ...I was lucky that my Admin user account wasn't infected, and from there I was able to look for the malicious process in the Windows Task Manager, search my system for the corresponding files and delete them manually [be careful not to delete system files - most likely you will also have to enable the viewing of system files under Windows XP].
  4. ...in Safe Mode I was also able to run the following tools: Malwarebytes, Spybot Search & Destroy and SUPERAntiSpyware. The reason for choosing "Safe Mode with Networking" is that you can connect to the internet to update to the latest anti-spyware/antivirus definition files. If a connection to the internet cannot be established, make sure you install the latest versions beforehand; for Spybot Search & Destroy you can also install the latest definition files separately, which is very handy. Run all the tools in sequence, rebooting back into Safe Mode between runs [this is important!!]. Each tool found different elements of the spyware, and together they were able to remove most of it. This will take a lot of time; each scan can take well over an hour (I tend to set the process priority of the scan to "Real Time", as the OS scheduler then allocates more CPU time to the process and the scan runs quicker).
  5. ...once I was relatively sure the system was clean, I ran the most thorough scan SUPERAntiSpyware offers again; this detected a few more issues, and only when I was fairly confident the system was clean did I boot back into normal Windows XP.
You might be done at this point - but in my case my .exe file associations were still broken (part of the devastation the fake antivirus left behind). To fix this you can manually edit the registry, download registry entries to merge into your registry, or simply run a small fix tool for XP, which is what I did this time, and it worked a treat.

I recommend you also do your own research, I found many useful articles online, such as this one, and depending on the version of spyware/anti-virus you might need to take a slightly different approach. Good luck! 

Wednesday, March 23, 2011

My Research Survey

In my blog post from February 2nd 2011, I mentioned several resources for constructing proper and effective questionnaire surveys. After an initial draft version, I have now put my anonymous survey online. Please feel free to visit and help me with my research by filling out the questionnaire at www.newsmental.com/survey.aspx.

I designed the survey following some of the principles mentioned in that post, but most of all I made an effort to keep it as short as possible, as I really don't like filling out 20-minute-long surveys myself. It should only take about a minute to complete; if it doesn't, or if you have any issues or comments, feel free to leave me a message.