Tuesday, June 17, 2008

Firefox 3 download day - Mozilla just got into politics

Well, Firefox 3 finally arrived and all is well in the world. I was just browsing the downloads map and was ready to call it a night when something caught my eye. Something was strange about that map... Ah, that's it, there seems to be one state missing on the map. For all of you who don't know which one is missing, let me give you a hint: it's a small state just north of Albania and south of Serbia. You guessed it, it's Kosovo. Yes, Kosovo. The state that got its independence in February this year is not plotted as a separate country, but rather as a part of Serbia (of which it was a part until recently). Now, it may be considered ranting, but I wonder why Kosovo isn't listed as a separate country. After all, it is recognized by the US (the country Mozilla calls home), the UK, France, Italy, Germany... The list goes on. And it's not like the guys at Mozilla didn't have time to prepare: Kosovo got its independence in February, which gave them some four months to add it to the countries list. So, why can't I see how many people from Kosovo downloaded Firefox 3? And more importantly, who decides if a country is relevant enough to be added to a company's list? It seems that nowadays a country has to be recognized not only by other countries, but also by companies if it wishes to have a meaningful existence...

Monday, April 7, 2008

Saving e-mails to disk (from a gmail account)

As I was trying to download all my spam messages to disk (I'm building a database of spam messages), I realised that this is no simple task. Well, one thing I could do is save the messages using Thunderbird or Outlook, but since I don't use either (I consider the Gmail web interface very nice and user-friendly) this one is out of the question. However, after a little browsing I discovered a wonderful Python package called libgmail. Long story short, here is the script I used to download all the messages from the spam folder:

#!/usr/bin/env python
# savemsg.py -- Download all messages from a specified folder
# License: GPL 2.0
# Written for Python 2 (libgmail is a Python 2 package).

import sys
from getpass import getpass
import libgmail

if __name__ == "__main__":
    # Take the account name from the command line, or ask for it.
    try:
        name = sys.argv[1]
    except IndexError:
        name = raw_input("Gmail account name: ")

    pw = getpass("Password: ")

    ga = libgmail.GmailAccount(name, pw)

    print "\nPlease wait, logging in..."

    try:
        ga.login()
    except libgmail.GmailLoginFailure, e:
        print "\nLogin failed. (%s)" % e.message
        sys.exit(1)
    print "Login successful.\n"

    # Folder names the script accepts.
    FOLDER_list = {'U_INBOX_SEARCH' : 'inbox',
                   'U_STARRED_SEARCH' : 'starred',
                   'U_ALL_SEARCH' : 'all',
                   'U_DRAFTS_SEARCH' : 'drafts',
                   'U_SENT_SEARCH' : 'sent',
                   'U_SPAM_SEARCH' : 'spam'}

    folder_name = raw_input('Choose a folder (inbox, starred, all, drafts, sent, spam): ')
    if folder_name not in FOLDER_list.values():
        sys.exit("Unknown folder: %s" % folder_name)
    folder = ga.getMessagesByFolder(folder_name)

    for thread in folder:
        for msg in thread:
            print "Downloading message %s " % msg.id
            # Look for a charset declaration in the raw message source.
            encIndexStart = msg.source.find('charset=')
            if encIndexStart != -1:
                # The charset value ends at the nearest delimiter after it.
                encIndexEnd = (msg.source.find(' ', encIndexStart),
                               msg.source.find('\n', encIndexStart),
                               msg.source.find(';', encIndexStart),
                               msg.source.find('"', encIndexStart+10))
                encIndexEnd = [ind for ind in encIndexEnd if ind != -1]
                encIndexEnd = min(encIndexEnd or [len(msg.source)])
                enc = msg.source[encIndexStart + 8:encIndexEnd]
                enc = enc.replace('"', '').replace(';', '')
            else:
                # No charset declared, assume plain ASCII.
                enc = 'ascii'
            print "Detected encoding %s\n" % enc
            try:
                f = open(msg.id + " " + msg.subject + ".txt", 'w')
            except IOError:
                # message subject contains characters forbidden by the os in the
                # file name, use just message id
                f = open(msg.id + ".txt", 'w')
            # Save the message re-encoded as UTF-8.
            f.write(msg.source.decode(enc, 'replace').encode('utf-8'))
            f.close()
    print "\n\nDone."

One could use the script to download messages from any Gmail folder. The encoding of each message is detected from the first charset= declaration found in its source (falling back to ASCII when there is none), and the message is saved in UTF-8 to facilitate later processing. Of course, you have to have libgmail installed to run the script. It is also very easy to adapt the script for any other purpose (I actually wrote this script by changing one of the demo scripts that come with libgmail).
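For anyone adapting the script, the charset detection could also be delegated to the standard library instead of searching the raw source by hand. The following is only a sketch of that idea in Python 3, not part of the original script: the function names detect_charset and to_utf8 and the sample message are made up for illustration, and the ASCII fallback mirrors what the script does.

```python
import email


def detect_charset(raw_source, default="ascii"):
    """Return the first charset declared in any part of the message."""
    message = email.message_from_string(raw_source)
    # walk() visits the message itself and, for multipart mail, every part.
    for part in message.walk():
        charset = part.get_content_charset()
        if charset:
            return charset
    return default


def to_utf8(raw_bytes, charset):
    """Decode bytes with the detected charset and re-encode them as UTF-8."""
    return raw_bytes.decode(charset, errors="replace").encode("utf-8")


# Hypothetical sample message, not taken from a real mailbox.
sample = (
    "Subject: test\n"
    'Content-Type: text/plain; charset="ISO-8859-2"\n'
    "\n"
    "body\n"
)
print(detect_charset(sample))
```

Spam often declares charsets that do not exist, in which case decode raises LookupError, so a real downloader would probably want to catch that and fall back to the default as well.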

Tuesday, March 4, 2008

Reddit alien in LaTeX

What to do on a lazy Sunday afternoon? Well, there are probably a thousand useful things one could do, but I chose to do something completely useless. And here it is, the reddit alien drawn in LaTeX (using PSTricks) :)





% head
% ears
% eyes
% mouth
% the tentacle thingie
% arms
% body
% feet


Note that I used the memoir document class, but you could also use article if you don't have memoir available (the results would be the same). The alien does not look exactly the same as the official reddit one, but who cares? And here is what you should get when you run this example through LaTeX:

And people say LaTeX suxx...

Monday, February 25, 2008

Deferred printing in LaTeX

Ever written a textbook? Ever written any document that has questions/problems and answers to those problems? Usually when you do this, you want the questions printed in one place (or in each section of the document) and the answers printed at the end of the document, so that your readers, who are usually students, actually try to solve the problem rather than just look at the solution right under it.
Now, I don't know how most of you usually do this, but up until now the only way I knew was the "brute force" approach: I would simply write the questions and then, at the end of the document, write the answers. The obvious drawback of this approach is that you have to look up each individual question when writing its answer (just to see what it was), and this quickly becomes very boring. Or, if you have the questions and answers on paper, they are usually written in question/answer pairs, so you first have to type all the questions (ignoring the answers), and then go back to the beginning of the paper and type all the answers (ignoring the questions). Anyway, I hope you get the picture of why I hate doing this.
Wouldn't it be great if I could somehow have the question and its answer in the same place in the source of the document (notice that we are not talking about WYSIWYG text processors here), but then defer the printing of answers in the output document (that is, print the answers at the end of the document)? The solution to my problem comes in the form of the LaTeX box mechanism. Just take a look at this minimal example:


\renewcommand{\theproblem}{\textit{Problem \arabic{problem}.}\,}





Some text before...

\problem{What is the capital of the US?}
\answer{Washington, D.~C.}

\problem{What is 2+2?}


Some other text goes here...




Now I know it looks a bit complicated, but it really isn't. What we are doing is creating a box and storing all the answers in it, and then, when the time is right, we simply print the contents of that box using the \printanswers command. This way, we can have the problem and its solution (answer) in the same place in the source of the document, but have the answers appear in a totally different place in the output PDF (or PostScript or whatever). Now, I'm not going to go into the details of the LaTeX code itself here. Those who know LaTeX will pretty much understand the code, and those who don't should first learn the basics before trying to get this. Anyway, the code can be used out of the box; if you would like to change something but don't know how, you can contact me. Last, but not least, here is what the output PDF looks like:
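If the box idea sounds abstract, the same store-now-print-later pattern can be sketched outside of LaTeX. The Python snippet below is only an analogy, not a translation of the LaTeX code: the ProblemSet class and its method names are invented here to mirror \problem, \answer and \printanswers, with a plain list standing in for the box.

```python
class ProblemSet:
    """Collect answers while problems are printed, then flush them later."""

    def __init__(self):
        self.count = 0      # plays the role of the problem counter
        self.output = []    # lines of the "document" in output order
        self.answers = []   # plays the role of the box holding answers

    def text(self, line):
        self.output.append(line)

    def problem(self, question):
        self.count += 1
        self.output.append("Problem %d. %s" % (self.count, question))

    def answer(self, solution):
        # Stored next to its problem in the source, but not printed yet.
        self.answers.append("Answer %d. %s" % (self.count, solution))

    def print_answers(self):
        # Emit everything collected so far and empty the store.
        self.output.extend(self.answers)
        self.answers = []


doc = ProblemSet()
doc.text("Some text before...")
doc.problem("What is the capital of the US?")
doc.answer("Washington, D. C.")
doc.problem("What is 2+2?")
doc.answer("4")
doc.text("Some other text goes here...")
doc.print_answers()
print("\n".join(doc.output))
```

Each answer sits right after its problem in the source, yet both answers come out after the last line of ordinary text, which is exactly the effect the box gives in the output document.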

UPDATE: As Evan noted, if you are going to use multiple files, don't use \include. Instead, use the \input command which does not create new .aux files.

Thursday, January 31, 2008

Unifying expressions in Python

Most of you who took an AI course in college have probably heard of unification of expressions. All of you who have ever programmed (and I use the term loosely) in Prolog know what this is. For all others, just a note that I'm not going to explain here what unification is or how it's done. If you don't know what it is, check out, for example, this WP article.
Here we can see the algorithm for finding the most general unifier of two expressions in pseudo-code (check out the bottom of the page). And here is my implementation of unifying expressions in Python. I'm not going to explain the code here; if you're interested, you can work through it yourself. I'll just give a brief example of how you can use it. So, suppose that you have the following two expressions:
expr1 = P(f(a), g(y), f(w))
expr2 = P(x, g(f(x)), y),
where P is a predicate, g and f are functions, a is a constant, and x, y and w are variables (this is an actual assignment from a test we gave our students some two months ago). Let's solve this using Python:

import mgunifier as mg # just to make the listing shorter

# declare predicates, functions, and variables
P = mg.predicate("P")
g = mg.function("g")
f = mg.function("f")
x = mg.variable("x")
y = mg.variable("y")
w = mg.variable("w")
a = mg.constant("a")

expr1 = [P, [f, a], [g, y], [f, w]]
expr2 = [P, x, [g, [f, x]], y]
uni = mg.mgUnifier(expr1, expr2) # find the most general unifier
new1 = uni(expr1)
new2 = uni(expr2)
print "The most general unifier is ", uni
# check if the two expressions are the same after unification
print new1
print new2

That's all there is to it. If someone has an idea on how to make things simpler or improve the code in any way, I am always glad to hear from you. Now, those of you who have had some experience with Prolog probably know that, although unification is the most basic tool of this language, there's a little bug/wart when using it (this applies to the SWI Prolog implementation; I don't know how others handle it). Typing, for example,

?- X = f(X).

results in X = f(**), meaning that variable X is now f(f(f(f(...)))). This is, of course, wrong, because the left and the right side of the equals sign can never become the same: the algorithm refuses to unify a variable with a term that contains that same variable (and if you don't believe me, check out the algorithm). Why exactly Prolog does this is beyond me. Anyway, try the following code:

import mgunifier as mg

P = mg.predicate("P")
g = mg.function("g")
f = mg.function("f")
x = mg.variable("x")
y = mg.variable("y")
w = mg.variable("w")
z = mg.variable("z") # z is used in both expressions below
a = mg.constant("a")

first = [P, x, [g, y], z]
second = [P, [f, a], z, y]
uni = mg.mgUnifier(first, second)
print uni

Not only do you get an error, but you can also see exactly where the algorithm failed.
Now, what's the whole point of this post? To demonstrate some Python implementation of an algorithm no one cares about? Well, partially. I was looking through the web for something similar and found only this. As it didn't satisfy me, I wrote my own program that unifies expressions, so anyone looking for something like this doesn't have to. But, what I'd really like people to see (and read) is this (copy-paste from the source):

def mgUnifier(k1, k2):
    """Return the most general unifier of two expressions k1 and k2."""

    if not k1 or not k2: return supstitution()
    if isinstance(k1, (constant, variable, function, predicate)) or isinstance(k2, (constant, variable, function, predicate)):
        if k1 == k2:
            return supstitution()
        if isinstance(k1, variable):
            if k1 in k2:
                return error(`k1` + " in " + `k2`)
            return supstitution(k2, k1)
        if isinstance(k2, variable):
            if k2 in k1:
                return error(`k2` + " in " + `k1`)
            return supstitution(k1, k2)
        if not isinstance(k1, variable) and not isinstance(k2, variable):
            return error(`k1` + " and " + `k2` + " cannot be unified!")

    alpha = mgUnifier(head(k1), head(k2))
    if isinstance(alpha, error):
        return alpha
    k3 = alpha(tail(k1))
    k4 = alpha(tail(k2))
    beta = mgUnifier(k3, k4)
    if isinstance(beta, error):
        return beta
    return alpha(beta)

This beautifully (IMHO) demonstrates the power and clarity of Python. Compare this code to the algorithm pseudo-code---it's almost the same. This is what I love about Python---I can simply copy the algorithm's pseudo-code and then just implement the "magic" that happens in the background.
Like I said, the purpose of this program is primarily educational; the code could probably be better written, but I stuck with the implementation that was most similar to the algorithm. Keeping that in mind, I appreciate any feedback on the code.
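For readers who just want to play with the idea without the mgunifier module, here is a self-contained sketch of unification with an occurs check in Python 3. It is not the code from the module above and uses a different representation: compound terms are tuples whose first element is the functor name, constants are strings, and Var, walk, occurs, unify and resolve are names made up for this sketch. Failure is signalled by returning None instead of an error object.

```python
class Var:
    """A logic variable; two variables are equal only if they are the same object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """Return True if var appears anywhere inside term (the occurs check)."""
    term = walk(term, subst)
    if term is var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, item, subst) for item in term)
    return False


def unify(a, b, subst=None):
    """Return a substitution (dict) unifying a and b, or None on failure."""
    if subst is None:
        subst = {}
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var):
        if a is b:
            return subst
        if occurs(a, b, subst):
            return None  # e.g. X = f(X) has no finite solution
        return {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple):
        if len(a) != len(b):
            return None
        for left, right in zip(a, b):
            subst = unify(left, right, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


def resolve(term, subst):
    """Apply the substitution to term until no bound variables remain."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return tuple(resolve(item, subst) for item in term)
    return term


x, y, w, z = Var("x"), Var("y"), Var("w"), Var("z")
expr1 = ("P", ("f", "a"), ("g", y), ("f", w))
expr2 = ("P", x, ("g", ("f", x)), y)
unifier = unify(expr1, expr2)
print(resolve(expr1, unifier) == resolve(expr2, unifier))
```

With the two expressions from the test assignment, both sides resolve to the same term, while the pair P(x, g(y), z) and P(f(a), z, y) is rejected by the occurs check, as is x against f(x).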

Monday, January 21, 2008

Real-life Python

I just came home all tired and the minute I saw the place it hit me: I hadn't
tidied up in ages. There were books everywhere, clothes all over the place, dirty
dishes... As I'm always tired when I come home from work, I never feel like cleaning
the place up. Somehow Python came up in my head as the solution to my problem (well,
not really a solution :) ). What if we could execute Python code in real life? I
mean, wouldn't it be great if we could do something like:


As I of course didn't feel like cleaning, I decided I'd better spend my time writing
about how I would use Python in real life. Besides cleaning the house, I think Python
could be of great help with the following problems as well.

Understanding women
Did you ever have a fight with your girlfriend/wife (it's a rhetorical question,
of course you did)? Did you ever wonder "What is going on in that mind of hers"?
Well, think about this one:

print girlfriend.__doc__

And if that didn't help, wouldn't it be great to be able to do something like

import dis
dis.dis(girlfriend.think) # dis.dis prints the disassembly itself

This would not only help you understand her, but also enable you to predict how
she will react. That's right, no more "What did I do wrong?": just look at the
code and you will know.

The answer to life
For all those in doubt as to whether they are wasting their life going to church
every Sunday, here's an elegant Pythonic solution:

import time
if life.endswith(death):
while alive:
import random
religions = [christianity, islam, buddhism, hinduism]
myReligion = random.choice(religions)
for day in allDays:
if day != sunday:
#no more days, end of time is here

What would you use Python for in real life?

Wednesday, January 9, 2008

How to insert watermark in LaTeX

For my first post this year, I'll start off with something a bit different. This time we'll take a look at how to insert a watermark in a document using LaTeX. Note that by watermark I mean some text that appears in the background of each page of the document.
As with everything else in the wonderful world of LaTeX, there is more than one way to do this. I know of two, which I will write about here. The first one is fairly simple: just include the package draftcopy and voilà. Here's a minimal example of how it works:


\title{Lorem ipsum}


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.


Running the previous example through LaTeX gives the following output:

Note that running pdflatex on this example will NOT insert the watermark. I use the LaTeX->dvi->ps->pdf route to obtain the PDF.
Of course the draftcopy package gives a lot of options to customize the watermark, of which I believe \draftcopyName is the most useful. If you want the word ENTWURF to be printed instead of DRAFT, which is used by default, you would add the line "\draftcopyName{ENTWURF}{155}" in the preamble. The number 155 is the scaling factor for the font---play around with it to get the scaling you want.
For those that like to do it the hard way (like me:) ), here is a piece of code that does pretty much the same thing:


To use this code, you have to include the following packages: graphicx, eso-pic, and type1cm. Here is a minimal example:




\title{Lorem ipsum}


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.


Be sure not to forget the \makeatletter and \makeatother commands. Unlike with draftcopy, running pdflatex on this example WILL insert the watermark in the pdf. I find this code nice because I can directly tinker with the various options. For example, if I want to change the angle of the text, I'll change the number inside the \rotatebox, if I want to move the text around the page, I'll change the two numbers inside \makebox, if I want to change the grayscale, I'll change the number inside \textcolor, etc.
UPDATE: The above code, as Ivijay pointed out, puts the word DRAFT on every page of the document. If you want it to appear on only one page, you should use \AddToShipoutPicture* instead of \AddToShipoutPicture. The starred command only adds the watermark to one page (the first one if you leave the watermark code in your preamble).

UPDATE: Neno wants to know how to use a picture instead of text. Well, it's just a matter of replacing the lines


with something like


Of course, you might have to play around with adjusting the width and height of the picture, depending on its size. Also, you might want to change tempdimb and tempdimc to properly position the image.

UPDATE: hcr suggests using the draftwatermark package to do this sort of thing. I've just had a look at the package, and using it really seems simple. For most of the things I did here, there is a nice command that does the same. However, some things are missing from the package. First, the color of the text is always grey (you can choose the grey scale, but can't have the watermark text in, say, red). Second, there does not seem to be a way to change the typeface of the watermark text. Third, the text is always centered; you can't move it around. And finally, there is no way to insert a picture as a watermark (which is what Neno wanted to do), and IMHO this could be a big issue for many people who may want their company's logo as a watermark. So anyway, draftwatermark is a great choice if you just want to insert a word (centered) in the background of each page. If you need anything more sophisticated than that, you will have to do something along the lines of what I described here.

UPDATE: Amy wants to know whether there is a way to put the watermark only on some pages, preferably using something like \begin{watermark} and \end{watermark}. Well, Amy, there is no easy solution to your problem. One thing you could do is use \AddToShipoutPicture*, which places the watermark on only one page, and paste the code wherever you need it. Of course, this is ugly. I have managed to hack together a solution that seems to work, but the code is really ugly and I don't advise anyone to use it unless you really need to :). So, here goes:






\title{Lorem ipsum}


page one


First watermarked page
Second watermarked page
Third watermarked page


This page should not be watermarked

Well, Amy, let me know if this works for you...