Daily rant

Sunday, May 23, 2010

Running separate xsession in Ubuntu

Every time I upgrade to a new version of Ubuntu, it breaks my xsession (which runs bare Xmonad). And every time I have to mess around for half an hour to work out how to get it back. All that needs to be done is to modify the file /usr/share/xsessions/xmonad.desktop to look like this:


[Desktop Entry]
Encoding=UTF-8
Name=XMonad
Comment=Lightweight tiling window manager
Exec=/etc/X11/Xsession
Icon=xmonad.png
Type=XSession

Saturday, March 13, 2010

Twitter corpus v0.1

I am pleased to announce that we have published the first version of our Twitter corpus. All the data was collected from Twitter's streaming API over a period of about two months (November 11th 2009 until February 1st 2010). You can download the corpus from our social media website (which we set up recently). There is an accompanying paper which gives some statistics about the corpus. One thing that might interest all the 13-year-old girls out there is that it seems Justin Bieber > Nick Jonas (look at table 3 in the paper). I believe that the Twitter corpus will be of interest to anyone working in social media research and/or NLP. We do plan to release subsequent versions as we get more data (and we might release old data starting from April 2009, but more on this later).

Friday, July 10, 2009

Limits in stdint.h

I recently had problems compiling code that used the numeric limits defined in stdint.h, UINT64_MAX in particular. The error I was getting was


error: ‘UINT64_MAX’ was not declared in this scope

What I didn't know is that simply including stdint.h wasn't enough. The macro __STDC_LIMIT_MACROS has to be defined before the point where stdint.h is included, otherwise the limits are not defined. The best way of defining this macro, IMHO, is by compiling with -D__STDC_LIMIT_MACROS, instead of manually defining the macro somewhere in code. Hope this post will save someone a few minutes (hours?) if they run into similar problems.
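For anyone who would rather see it in code, here is a minimal sketch of the in-source variant (the -D flag is still what I'd recommend). The helper functions saturating_add and self_check are hypothetical names made up for this illustration, not something from my actual code. As far as I know, toolchains that follow C++11 or later define these limits unconditionally, in which case the macro is harmless but unnecessary.

```cpp
// Must come before the FIRST inclusion of stdint.h, including indirect
// inclusions through other headers; otherwise the limits stay undefined
// on toolchains that require the macro.
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS
#endif
#include <stdint.h>

#include <cassert>

// Hypothetical helper (illustration only): adds two values and clamps
// the result to UINT64_MAX instead of silently wrapping around.
uint64_t saturating_add(uint64_t a, uint64_t b) {
    return (a > UINT64_MAX - b) ? UINT64_MAX : a + b;
}

// Hypothetical sanity check showing that UINT64_MAX is now visible.
void self_check() {
    assert(saturating_add(1, 2) == 3);
    assert(saturating_add(UINT64_MAX, 1) == UINT64_MAX);
}
```

The #ifndef guard is there so that the file still compiles cleanly when the macro is also passed on the command line with -D__STDC_LIMIT_MACROS.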

Saturday, June 20, 2009

Do public prediction markets really fail?

A few days ago I came across this article explaining why public prediction markets (PMs) fail. The article gives examples where three different PMs failed to pick the winner of American Idol (Betfair) or Britain's Got Talent (Hubdub and Intrade). While it was definitely an interesting read, I feel that the author didn't take some things into account.
First, the success of PMs depends on participants playing risk-neutral strategies (see, for example, the Manski 2005 paper). When people play with fake money, this may well not hold: they will tend to take more risk than usual because they have nothing to lose.
Also, PMs are said to be more accurate than other ways of aggregating opinions, such as polls or expert opinions. It would be nice if the author had compared the PM predictions with predictions made by opinion polls or experts and shown how they differ.
Then there's the issue with the data from Hubdub. As noted in the article, Hubdub's market on Britain's Got Talent failed to predict the true outcome, giving Susan Boyle a 78% chance of winning. What was not taken into account is that there was in fact more than one market on Hubdub for Britain's Got Talent. The one used in the article can be found here. However, here's another market that accurately predicted that "Susan Boyle OR Diversity" would win (Diversity won). Note the following: the market that got Susan Boyle wrong had $25k of activity, whereas the one that got it right had twice as much. So, if you were to make any decisions based on PM predictions, you would probably go for the one with more activity and you would be right. I don't know if Intrade or Betfair had more markets for the same event.
Anyway, I agree with the bottom line that the accuracy of PMs depends on how much information their participants have, and that, ultimately, they will fail some of the time. However, we should be careful about making statements like "public prediction markets fail", especially since there are so many examples where they don't (this might be a topic for another post). And even when they do fail, it's important to understand why.