Tweet Classifier post-mortem

November 22, 2009 by Dave Ross

I started an experiment Friday to try categorizing tweets (Twitter messages) using a Bayesian classifier. That’s a fancier version of the software that’s separating your junk e-mail (“spam”) from the stuff you want to read. It uses word frequency to figure the probability that a given message belongs in one of the categories it knows about.
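
For a rough idea of how word frequency turns into a probability, here is a minimal sketch of a multinomial naive Bayes classifier in Python. It is not the library used in this experiment. The class name `NaiveBayes`, its methods, the tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> messages learned
        self.vocab = set()                       # every word seen so far

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, category):
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def probabilities(self, text):
        """Return {category: P(category | text)}; needs at least one learned message."""
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the prior: how common the category is overall.
            score = math.log(self.doc_counts[category] / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocab))
                )
            log_scores[category] = score
        # Normalize in log space so long messages don't underflow to zero.
        top = max(log_scores.values())
        scaled = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(scaled.values())
        return {c: v / norm for c, v in scaled.items()}

    def classify(self, text):
        probs = self.probabilities(text)
        return max(probs, key=probs.get)
```

A spam filter would call `learn(text, "spam")` or `learn(text, "not spam")` on messages a person has already sorted, then call `classify(text)` on new ones. The values returned by `probabilities` are the kind of "certainty" figure a classifier like this reports.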

I’m not like Google or Bing, who get access to the whole firehose of real-time Twitter messages. Instead, I had to settle for the 20 latest public tweets from a publicly-accessible API, and I only ran my “learn” script every minute (every other minute toward the end) to be nice to Twitter’s servers. Still, at the time I decided to call it quits, 18,520 tweets had been processed.

I didn’t want to assign categories myself, so I decided to make use of the hashtags that are so popular on Twitter. 3456 of those 18,000+ messages (~19%) contained hashtags, and they consisted of 995 unique hashtags which my program was able to turn into categories. I had some success initially. I got a big kick when it started putting posts with bad grammar in the “cheezeburger” category. Posts with the words “gay” and “movie” were assumed to be about “New Moon”, the new Twilight sequel. A post that said “#weloveyoujustin #weloveyoujustin #weloveyoujustin…” was correctly categorized as “weloveyoujustin”.
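
Turning hashtags into categories can be sketched in a few lines of Python. This is an illustration, not the code from the experiment. The function names, the regular expression, and the choice to lowercase and de-duplicate tags are assumptions.

```python
import re

# Assumed definition of a hashtag: "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_categories(tweet):
    """Return the distinct lowercased hashtags in a tweet, in first-seen order."""
    seen = []
    for tag in HASHTAG.findall(tweet):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


def strip_hashtags(tweet):
    """Return the tweet text with hashtags removed, for use as training text."""
    return HASHTAG.sub("", tweet).strip()
```

Each tag returned by `hashtag_categories` would become a category to train on. Whether to train on the full text or on the output of `strip_hashtags` is a design choice: leaving the tags in makes a tagged post trivially easy to classify, while stripping them forces the classifier to rely on the other words.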

Aside from a few outliers like that, it rarely was more than 20% certain about the categories it was assigning. The same words appeared in all sorts of posts, and Twitter’s 140 character limit discourages big, specific words. I started running it on a Friday, which meant there were a lot of #followfriday and #ff posts adding noise to the mix — it really liked assigning this category when it didn’t know what else to do. And, cross-posting tags like #fb and #in to duplicate posts on Facebook and LinkedIn led to some bad categorizing as well. Aside from #ff and #fb, other tags weren’t used very often. It seems hashtags are mostly used by specific communities like an inside joke or their own internal categorizing system.

A few posts that my program tried to learn from:

  • Eeeeeeeeeeeeee!! (Doctor Who) #pudsey
  • #Whatdoyoudo when ur not on twitter?
  • I hate wal-mart #fb
  • Thanks @swinmill back at ya! #ff
  • @HankYeomans : – ( #FAIL

As you can see, there wasn’t much to build a database of word frequency from.

I ended up killing off the “learn” program last night when it hit a scaling limit. The classifying library I was using was taking forever to assign categories because it had so many to choose from. Don’t forget, programs like this are usually just dealing with two categories: “spam” and “not spam”. 995 categories required too many calculations for my poor Mac Mini to handle.
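
A back-of-the-envelope sketch of why the category count matters: a naive Bayes classifier scores a message against every category, looking up one word probability per word per category. The function name and the 15-word tweet length below are assumptions for illustration. A real library may have other costs, such as database access, that this ignores.

```python
def lookups_per_message(num_categories, words_in_message):
    """Word-probability lookups needed to score one message against every category."""
    return num_categories * words_in_message


WORDS = 15  # assumed length of a typical tweet, in words

spam_filter = lookups_per_message(2, WORDS)      # "spam" and "not spam"
tweet_tagger = lookups_per_message(995, WORDS)   # one category per unique hashtag

print(spam_filter, tweet_tagger, tweet_tagger / spam_filter)
```

Under these assumptions, every classification does roughly 500 times the work of a two-category spam filter, and the gap widens with each new hashtag the program learns.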

In conclusion, I don’t consider this exercise a failure. I got to play with a Twitter library and a Bayesian classifier, which was cool. I got to play voyeur and read all sorts of strangers’ tweets. And I got a few chuckles out of it along the way. But, if a Bayesian strategy is going to be used to categorize Twitter messages, it’s going to need some serious hardware behind it.

I used these libraries:

 

Wacom Bamboo Touch review

November 19, 2009 by Dave Ross

[Image: Bamboo Touch in action]

My stay-at-home vacation kicked off Friday night with a trip to Fry’s to pick up a Wacom Bamboo Touch to try out. I planned to post a review after using it for a week, and hearing it mentioned on MacBreak Weekly reminded me I still had some writing to do.

A co-worker and I were discussing Apple’s new Magic Mouse, and we agreed it would be neat to have something like our MacBooks’ touch pads, only desktop-sized. I immediately thought of the Wacom devices I looked at the last time I went to Fry’s. The online reviews of the Bamboo Touch were mixed; people either loved it or hated it. How did it work out for me? Read on.

I got the smallest Bamboo Touch, an 8.2″ x 5.4″ pad that’s only 1/3″ thick. There are four ExpressKeys along the side, and a white LED lights up to show it’s connected to your computer’s USB. The LED glows brighter when you touch the surface.

The packaging has the coloring and design from the Bamboo Fun line, and the inside of the package was laid out really well. You could tell this box was designed by someone who loved the feeling of opening up a new Apple product. This unboxing video from leopardsoup on YouTube walks you through the box and its contents.

My Mac Mini recognized the Bamboo Touch as a mouse right away, and the drivers supplied with the tablet enabled the two-finger multitouch features. The Bamboo Touch supports a fixed set of gestures: left click, right click, drag, scroll up/down, scroll left/right, zoom in/out (pinch), rotate, and back/forward. Some reviewers found the absence of three- or four-fingered gestures confusing, but my 2006 MacBook doesn’t have them either, so I don’t feel like I’m missing much. On the contrary, I get annoyed by my MacBook’s primitive gestures now (just clicking & scrolling).

[Image: Bamboo Touch pointer settings]

The Mac drivers install a System Preference pane where you associate each of the ExpressKeys with an action (such as “Switch Application” or “Show Desktop”) or assign a custom keystroke. They also let you pick left or right handed operation and adjust normal mouse settings like pointer acceleration. And, trust me, you’ll want to keep this preference pane open for the first day or two as you get used to using and adjusting the pad. I think a lot of the complaints about the pad’s accuracy could have been fixed by adjusting some settings.

I still keep a wired optical mouse around for gaming. I think Quake Live forces the pointer speed all the way up, because even the lightest touch sent my character spinning. But, the Bamboo Touch excels at all the other tasks I use my computer for. Be careful highlighting text, though. It can be tricky. And, with “drag lock” turned on, I sometimes forget to click the pad a second time to release the text or icon I’m dragging.

A couple of other reviewers complained about arm fatigue, since you can’t rest your hand on the pad like you would a mouse. You do have to hold your hand above the pad, or rest your fingers outside the pad’s active area (outlined in gray). A $10 beanbag from Allsop gave me a handy place to rest my hand, and I haven’t had any problems with strain or fatigue. Rather, I feel more relaxed using it since I don’t have to grip a mouse.

[Image: Turing in a basket]

One of my reasons for wanting a new pointing device was that a wireless mouse would let me rid myself of Turing’s favorite cord to sit on. It’s hard to move a mouse around when there’s a cat keeping you from moving more than an inch. The Bamboo Touch is corded, but it doesn’t need to move. You’d think this would be an improvement, but Turing just sits on the pad now. His furry butt registers as a finger, keeping me from getting any work done. So, be careful using this around cats. Also, I don’t know how much weight this pad can support, so be careful around big kitties and dogs. Children should probably be watched around this thing, too. In fact, why don’t you put it away in a drawer when you’re not using it?

The Wacom Bamboo Touch might disappoint folks used to Apple’s current multitouch offerings, but it’s a great addition to my computing arsenal. At only $69, it does everything I’d expect a desktop touchpad to do. Budding artists out there may opt for the Bamboo Pen (also $69, but no touch features) or the Bamboo Pen & Touch ($99).