Fun with image stabilization

January 5th, 2009

Image stabilization can turn shaky “reality show” handheld camera footage into smoother, more professional-looking footage. More interestingly, it can also be used to make an impromptu panorama or an interesting dolly zoom effect.

I wrote an application with OpenCV to show these effects, and have videos to illustrate them below: first the panorama (in which I manually rotated the webcam to get a better field of view), then the dolly zoom (in which I just held a telephone at different distances from the webcam). The same application was used for both videos (and yes, that is my arm in the second video).

The Logic of Science, in a nutshell

September 28th, 2008

There’s this book by Jaynes, called Probability Theory: The Logic of Science. It’s supposed to be great, but I haven’t found the time to read it. So instead, I visualized the first 5 chapters of the book with wordle (see image below). The size of each word tells you how often it occurs: for example, the word “probability” occurs more often than “problem”, and “prior” comes up more than the word “posterior.” So this is the first 5 chapters, in a nutshell:

Want to do this yourself for some other book (like Pride and Prejudice)? Go to Project Gutenberg, download a book, and paste it into wordle. See what you get.

Impossible Weather

September 12th, 2008

According to wunderground in my area, the chance of rain tomorrow (for the whole day) is 30%. But their hourly forecast says otherwise: between 8am and 11am tomorrow, for example, they say the chance of precipitation is 70%. Impossible, right? Check it out for your area, and you’ll see something similar. It’s not a bug!

So what do they mean by 30%? It doesn’t mean the obvious thing (“if I stand outside all day, what’s the chance I’ll ever get wet?”), so their all-day forecast must mean something else.

[Image: impossible weather probabilities]

I looked at a bunch of different days, to see what they’re doing, and here’s what I think.

What they’re really saying is this: if you happen to go outside only once during the day, the chance of rain you’re most likely to face at that instant is what they report. Put another way, they pick the mode (yes, the mode!) of all the hourly forecasts, and give you that as the all-day forecast.

Instead of giving you the useful, obvious thing (what percentage of the day will it rain?), they answer a bizarre question about distributions of distributions (what’s the mode of the probabilities?).

Crazy weathermen.
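For the curious, here is a small Python sketch of why the numbers look impossible. The hourly values are made up, the function names are mine, and the “mode” rule is only my guess at what the forecasters do. The any-rain figure assumes each hour is independent of the others, which real weather is not; the lower bound needs no such assumption.

```python
import math
from statistics import mode

# Hypothetical hourly chances of rain for a 12-hour day (made-up numbers).
hourly = [0.3, 0.3, 0.3, 0.7, 0.7, 0.7, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]

def all_day_as_mode(hourly):
    """My guess at the forecaster's method: the most common hourly value."""
    return mode(hourly)

def chance_of_any_rain(hourly):
    """Chance of at least one rainy hour, ASSUMING the hours are independent."""
    return 1 - math.prod(1 - p for p in hourly)

def lowest_possible_chance(hourly):
    """No assumptions needed: the day is at least as rainy as its wettest hour."""
    return max(hourly)

print(all_day_as_mode(hourly))                # the reported all-day number
print(lowest_possible_chance(hourly))         # the "get wet" chance cannot be below this
print(round(chance_of_any_rain(hourly), 3))   # the "get wet" chance if hours were independent
```

Whatever the true dependence between hours, the chance of getting wet at some point can never be lower than the wettest hour’s chance, which is why a 30% day with a 70% morning cannot be the “obvious” number.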

Counting is Human

August 14th, 2008

We learn to count early in life; some even claim counting is innate. But what does it mean to have “two” of something?

Two oranges, for example, share some perceived properties (color, shape) but not others (position in space, precise molecular configuration). They are somehow both similar and different. This holds for every real countable thing in the world; a countable collection includes things which share some properties, and not others. If two things literally share all of their properties, they must be the same thing!

[Image: two oranges]

Such classification is undoubtedly useful: by classifying objects into categories (such as “orange” or “airplane”), we only need to learn about one member of the class to “know” them all. It’s kind of a heuristic. And counting requires this heuristic, in order to have more than one thing in a group.

What’s interesting to me, in short, is that the seemingly logical idea of a “number” of things requires the very human idea of grouping, making numbers seem more like a handy companion to information compression, and less like an ideal, transcendental form.

Do what?

August 9th, 2008

[Image: sign with faded color ink]

My Drive Home

August 6th, 2008

I commute to iRobot daily, as a visiting researcher. The drive takes about 1.25 hours. Here’s a one-minute version of my drive home:

And here’s a one-image version of my drive home. (For the geeks out there, it’s the per-pixel median of all images in the video sequence.)

[Image: time-lapsed drive, median filtered]
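If you want to try the per-pixel median yourself, here is a minimal sketch in plain Python. It is not the code I used for the image: the frames are tiny made-up grayscale grids, and the function name is just for illustration. The idea is that anything that shows up at a pixel in only a few frames (like a passing car) gets voted out by the median.

```python
from statistics import median

def per_pixel_median(frames):
    """Combine equally sized grayscale frames into one image, pixel by pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[row][col] for frame in frames) for col in range(width)]
        for row in range(height)
    ]

# Three made-up 2x3 frames: a bright "car" (255) crosses the bottom row.
frames = [
    [[10, 10, 10], [255, 50, 50]],
    [[12, 10, 10], [50, 255, 50]],
    [[11, 10, 10], [50, 50, 255]],
]

# The car appears at each pixel in only one frame out of three, so it vanishes.
print(per_pixel_median(frames))
```

For real video you would want an array library rather than nested lists, but the operation is the same.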

Numbers Start with One

July 27th, 2008

Pick any number (the number of miles on your car, or the number of pencils in your cup) from the real world, and it has about a 30% chance of starting with the digit 1 (like 14, 155, or 15123).

It sounds crazy, but it has to do with the distribution of numbers in the natural world. Such values tend to be spread evenly on a logarithmic scale rather than a linear one: you’ll find about as many between 1 and 2 as between 2 and 4, and about as many between 100 and 200 as between 200 and 400. Put another way, there’s a log distribution of numbers that ensures that 10 is always more likely than 20, and that 100 is always more likely than 200. This principle is known as Benford’s law.

So if someone asks you to guess a number, and that number measures a real quantity (guess how many beans in the cup), start it with the digit 1.

(Note: A clever friend pointed out that binary numbers always start with 1, so the strength of Benford’s law depends on the base you’re operating in. A big base (say, bigger than the base 10 system we use) will diminish the effects of Benford’s law, but will not diminish the log distribution of numbers in the natural world.)
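Here is a short Python sketch of both points. The formula is the standard statement of Benford’s law; the function names are mine. Since I don’t have a pile of real-world measurements handy, the last line uses powers of two as a stand-in, because they are known to follow the same leading-digit pattern.

```python
import math

def leading_digit_chance(digit, base=10):
    """Benford's law: chance that the first digit (in the given base) is `digit`."""
    return math.log(1 + 1 / digit, base)

def leading_ones_fraction(values):
    """Fraction of positive integers whose decimal form starts with 1."""
    values = list(values)
    return sum(str(v)[0] == "1" for v in values) / len(values)

print(round(leading_digit_chance(1), 3))            # base 10: roughly 30%
print(leading_digit_chance(1, base=2))              # base 2: every number starts with 1
print(round(leading_digit_chance(1, base=100), 3))  # a bigger base weakens the effect

# Stand-in data (not real-world measurements): the first 1000 powers of two.
print(leading_ones_fraction(2 ** n for n in range(1, 1001)))
```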

A General Middle

July 22nd, 2008

What is the mean? People cite it all the time: mean heights, mean income, and so on, but most people just understand it as a measure of central tendency and leave it at that. But it’s not the only measure: the median and mode also give something similar. And until recently, I considered them as qualitatively different tools.

It turns out that the mean, median, and mode are all answers to basically the same optimization problem. They’re related by something called the Minkowski loss, which specifies the “pain” you feel if you guess one value that may be different from the sampled value. Say you’ve seen 100 throws of a weighted die, and you’re asked to guess the value of the next throw. Call the error E the difference between your guess and the actual throw. If you guess the mean of the 100 throws, you’ll minimize the expected squared error (abs(E)^2); guessing the median minimizes the expected absolute error (abs(E)^1), and guessing the mode minimizes the expected chance of being wrong (abs(E)^0, counting an error of zero as zero pain).

So it’s this general error minimization of abs(E)^q that gives us all those estimators: change the value of “q”, and you get a different estimator of central tendency.

This leads to many insights: want a hybrid between a mean and a median? Minimize the expected value of abs(E)^1.5. Want to be in the middle of the span of your data? Then think about what it means to minimize the expected value of abs(E)^q as q goes to infinity. And more generally, one realizes that the higher that “q” is, the more we consider the distance in space, and the lower that “q” is, the more we consider probability. (This insight is credited to a tiny passage in the great book PRML. Also, if you want to get technical, the mode minimizes abs(E)^q as q approaches zero.)
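To see this in action, here is a brute-force Python sketch. The die throws are made up, the function names are mine, and the search simply tries every guess in steps of 0.01 rather than doing anything clever, so treat it as an illustration and not a recipe.

```python
from statistics import mean, median, mode

def minkowski_loss(guess, samples, q):
    """Average "pain" of a guess: abs(error)**q averaged over the samples."""
    return sum(abs(x - guess) ** q for x in samples) / len(samples)

def best_guess(samples, q):
    """Try every guess from min to max in steps of 0.01; keep the lowest loss."""
    lo, hi = min(samples), max(samples)
    candidates = [i / 100 for i in range(lo * 100, hi * 100 + 1)]
    return min(candidates, key=lambda guess: minkowski_loss(guess, samples, q))

# Made-up throws of a weighted die (integers, so the grid hits them exactly).
throws = [1, 1, 1, 1, 2, 3, 4, 6, 6]

print(best_guess(throws, 2), mean(throws))      # q = 2 lands next to the mean
print(best_guess(throws, 1), median(throws))    # q = 1 lands on the median
print(best_guess(throws, 0.01), mode(throws))   # q near 0 lands on the mode
print(best_guess(throws, 50))                   # large q drifts toward the midrange, 3.5
```

A small q stands in for zero here because Python evaluates 0 ** 0 as 1, which would count a perfect guess as a miss.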

Keeping Leopard Lean

December 19th, 2007

Leopard is nice, but its features can make your system slow. If you have Leopard and your system inexplicably slows down sometimes, here’s a recipe for speeding up your system.

When your system is acting slow, check which application has the highest CPU numbers in the activity monitor. If you find the culprit to be “backupd”, “safari”, or “mds”, then read on for potential solutions.

Time machine may be slow: If your computer is slow, and “backupd” is prominent in the activity monitor, time machine may be the problem. Its disk activity slows the system down.

Make time machine faster: Exclude files and directories in the preferences. But which ones? If you’re running entourage (or vmware, or parallels), you may want to reconsider whether you back up their files; they keep their data in large files, and time machine copies the whole file again whenever any part of it changes. If you have no idea why it keeps backing up files, try these commands at the terminal. They list the files it has backed up in the last two hours. Note that you must replace the items in brackets [] with folder names that exist on your computer.


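# Go to the newest snapshot on the backup drive (quoted, since names may contain spaces).
cd "/Volumes/[my backup drive]"
cd "Backups.backupdb/[my computer]/Latest/"
# List items changed in the last 2 hours that have only one hard link (typically fresh copies).
find . -newerct '2 hours ago' -links 1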

Spotlight may be slow: If your mac is slow, and “mds” is prominent in your activity monitor, spotlight may be the problem.

Make spotlight faster: Exclude stuff from spotlight’s indexing that you don’t need indexed. Consider excluding your time machine backup, if you have one (in the privacy section of preferences->spotlight).

Flash may be slow: If your computer is slow, and “safari” is prominent in the activity monitor, Flash may be to blame. Often, a single web page can have 5-10 flash animated advertisements. If you have many such web pages open, it can slow down your computer.

Use flash only when you want to: Use Firefox to browse the web, and install the flashblock add-on. This allows you to see the flash you want (like youtube animations), and avoid the ones you don’t.

Leopard

November 16th, 2007

Apple’s new operating system, OS X 10.5, has been out for a few weeks. Overall, it is really great; time machine and the downloads folder (no more crud on my desktop!) are highlights.

But it does have some problems. They will be fixed, I’m sure, but you may want to know before you upgrade:

  • Apple offers three ways of installing. When I chose the easiest option (“upgrade”), my system fell victim to the grey screen of death. I was forced to install again with the “archive and install” option (rather than “upgrade”), which fixed the problem. Apple blames this problem on third-party enhancements, but I never installed any. If you want to be safe, choose “archive and install” rather than “upgrade.”
  • Some full-screen tasks (such as time machine, and itunes coverflow) do not work on my mini. Admittedly, I run at a high resolution, on a fairly old mini, but it’s disappointing not to have those features work.
  • X11 is a bit broken. With Leopard, it’s supposed to just run automatically when needed, but does not always do so (e.g. with terminal, and ssh -X). And when run on demand, it has problems being activated from the dock, and fires up a terminal (which I don’t want, and cannot turn off).

So, I highly recommend the upgrade once a lot of these bugs are worked out: my best guess would be mid 2008.