
Getting Shit Done

I came across an old Lifehacker article, Get Shit Done Blocks Distracting Web Sites So You Can Do As the Name Instructs, that mentions a productivity script, get-shit-done. So I went to the GitHub repository, only to discover that the Python script was no longer functional. I didn't have permission to fix the script at the site, so here's a quick note-to-self to get around to that. Until then...

[Edit]: There's a better, live branch at GitHub. Use leftnode's version.

Here's an archive of my changes before I discovered leftnode's branch.


  • I fixed the syntax errors (like the "play" string in the dictionary at the end that was supposed to be the play method).

  • It's more Windows friendly. Changing the "hosts" file doesn't require a network driver restart in Windows.

  • You no longer have to specify an argument, "work" or "play." It'll just toggle modes every time you run it now.

  • When reverting to "play" mode, it now retains whatever follows the endToken in the hosts file.

  • I removed the bit of code that optionally reads in from an ini file. The ini file format (the keys, in particular) struck me as awkward.



#!/usr/bin/env python

import sys
import getpass
import subprocess
import os

def exit_error(error):
    print >> sys.stderr, error
    sys.exit( 1 )

restartNetworkingCommand = ["/etc/init.d/networking", "restart"]
# Windows users may have to set up an exception for write access
# to the hosts file in Windows Security Essentials.
hostsFile = '/etc/hosts'
startToken = '## start-gsd'
endToken = '## end-gsd'
siteList = ['reddit.com', 'forums.somethingawful.com',
            'somethingawful.com', 'digg.com', 'break.com',
            'news.ycombinator.com', 'infoq.com',
            'bebo.com', 'twitter.com', 'facebook.com',
            'blip.com', 'youtube.com', 'vimeo.com',
            'flickr.com', 'friendster.com', 'hi5.com',
            'linkedin.com', 'livejournal.com',
            'meetup.com', 'myspace.com', 'plurk.com',
            'stickam.com', 'stumbleupon.com', 'yelp.com',
            'slashdot.com', 'lifehacker.com',
            'plus.google.com', 'gizmodo.com']

def rehash():
    if sys.platform != 'cygwin':
        subprocess.check_call(restartNetworkingCommand)

def work():
    with open( hostsFile, 'a' ) as hFile:
        print >> hFile, startToken
        for site in siteList:
            print >> hFile, "127.0.0.1\t" + site
            if site.count( '.' ) == 1:
                print >> hFile, "127.0.0.1\twww." + site
        print >> hFile, endToken
    rehash()

def play( startIndex, endIndex, lines ):
    with open(hostsFile, "w") as hFile:
        hFile.writelines( lines[0:startIndex] )
        hFile.writelines( lines[endIndex+1:] )
    rehash()

if __name__ == "__main__":
    if sys.platform != 'cygwin' and getpass.getuser() != 'root':
        exit_error( 'Please run script as root.' )

    # Determine if our siteList is already present.
    startIndex = -1
    endIndex = -1
    with open(hostsFile, "r") as hFile:
        lines = hFile.readlines()
        for i, line in enumerate( lines ):
            line = line.strip()
            if line == startToken:
                startIndex = i
            elif line == endToken:
                endIndex = i

    if startIndex > -1 and endIndex > -1:
        play( startIndex, endIndex, lines )
    else:
        work()
 
My list of addresses has made its way from a physical address book, to a Palm Pilot, to Microsoft Outlook, to Google Contacts to the iPhone Contacts app. Along the way, each of the transitions has played fast and loose with the mappings of the individual fields.

Nowadays, the normal way for a Windows user to export contacts from an iPhone is to sync between the iPhone and Microsoft Outlook, then export from Outlook to a CSV file. I hate having to go through that middleman.

I prefer to take stronger ownership of my own data, and have settled on Google Contacts as the primary home for it. There are a few reasons. I really like the open "group" (tagging) feature of contacts. I like that they have a public API for accessing and manipulating contacts. But most important is the ability to easily export and import contacts. As Joel Spolsky noticed ten years ago, a good way to get me to try a service is to make it easy for me to change my mind and leave the service.

By default, only the contacts group named "My Contacts" will sync with the iPhone when you set up an Exchange sync between the two endpoints. That suits my purposes just fine.

I've taken some time to groom my contacts list for Google Contacts. Here are some notes from that experience:

When you export contacts, use the Google CSV format. Your contacts will be exported to a UTF-16 file, and all the special characters you use will be retained. If you choose Outlook CSV format, then the file generated will be 8-bit regardless of the characters used in your contact list, and characters that don't map to 8-bit characters will be changed to question marks. So for 安室奈美恵's sake, choose Google CSV format.
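The loss is easy to demonstrate at a Python 3 prompt (a quick illustration, not part of the export itself):

```python
name = u'安室奈美恵'

# Forcing the name into an 8-bit codepage replaces every
# character that doesn't fit with a question mark:
lossy = name.encode('latin-1', 'replace')
print(lossy)  # b'?????'

# UTF-16 round-trips the name intact:
assert name.encode('utf-16').decode('utf-16') == name
```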

Most people edit their CSV files in a spreadsheet editor. That's fine, but I don't trust my own eyes and hands to get everything right, so I prefer to do batch editing programmatically.

If you want to do some batch processing of the CSV file in Python, here are some snippets. These snippets have been pared down to the essentials, and don't represent good coding practices.

To read in the Google CSV:

# unicode_csv_reader and UnicodeWriter are provided
# in the documentation for the csv module.

f = unicode_csv_reader(codecs.open( 'google.csv', 'r', 'utf-16' ))

headings = f.next()
col = {}
for i, heading in enumerate(headings):
    col[heading] = i

rows = list(f)

If you want to print out a list of contacts sorted by last name:

rows.sort(key=operator.itemgetter(col['Family Name']))

Beverly Howard suggests that contacts with no "Name" field but only a "Company" field won't sync with Outlook.
You could check for that:

if row[col['Organization 1 - Name']] and not row[col['Name']]:
    # Ensure these get synced across all devices, some don't!
    print row[col['Organization 1 - Name']]

Finally, after you've made your batch changes, you can write them back to a UTF-16 file like so:

out_file = open( 'google_out.csv', 'wb' )
google_out = UnicodeWriter( out_file, encoding='utf-16' )
google_out.writerow( headings )
google_out.writerows( rows )
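As an aside, Python 3's csv module handles Unicode natively, so the unicode_csv_reader and UnicodeWriter helpers aren't needed there. A rough modern sketch of the same round trip (function names are mine):

```python
import csv
import operator

def read_google_csv(path):
    # Python 3's csv module is Unicode-aware; just open with the right encoding.
    with open(path, 'r', encoding='utf-16', newline='') as in_file:
        reader = csv.reader(in_file)
        headings = next(reader)
        rows = list(reader)
    return headings, rows

def write_google_csv(path, headings, rows):
    # Write the contacts back out as UTF-16, headings first.
    with open(path, 'w', encoding='utf-16', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(headings)
        writer.writerows(rows)

def sort_by(headings, rows, column_name):
    # Sort rows in place by a named column, e.g. 'Family Name'.
    col = {heading: i for i, heading in enumerate(headings)}
    rows.sort(key=operator.itemgetter(col[column_name]))
```

Opening the files with newline='' leaves quoting and embedded newlines to the csv module, per its documentation.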

It's been a long time coming, but I'm glad I've got more of my data in a place where it's easy for me to get it out and manipulate it any way I like.
It really frustrates me to have to follow up Amazon Still Hates Your Life(stream) with more poor decisions by the e-commerce giant.

In late October 2010, Amazon removed the ListLookup operation from the AWSECommerceService API. When I noticed it, I tweeted about it. I was sorta late to the game, though. Amazon announced they would deprecate the service in June 2010. The thing is, they didn't deprecate the service. They removed it. Trying to call it now results in an HTTP 410 error and the text, "Gone."

What's that mean? Well...

That's the operation that accesses your wishlists at Amazon. Some people wrote blogging-engine tools or phone apps that access Amazon wishlists. The easier it is to get at wishlists, the more items get purchased. You'd think Amazon would like that, right?

You'd be wrong. Amazon unplugged the mechanism they wrote to programmatically access their wishlists. I really can't see a justifiable reason behind it. It may have been "little used" by their standards, but it couldn't have been that hard to maintain.

You can blame Amazon for making their own wishlists even less accessible and useful now.
Dreamhost offers a personal backup service.  Here are some notes on what I did to get it working from my Microsoft Windows Vista system.

Install All Required Cygwin Modules


First, I installed any missing required modules for cygwin.  At first, I had trouble with ssh, because it complained it couldn't find cygssp-0.  Using cygcheck confirmed the missing library.

~$ cygcheck ssh
Found: C:\cygwin\bin\ssh.exe
Found: C:\cygwin\bin\ssh.exe
Found: C:\cygwin\bin\ssh.exe
C:\cygwin\bin\ssh.exe
  C:\cygwin\bin\cygcrypto-0.9.8.dll
    C:\cygwin\bin\cygwin1.dll
      C:\Windows\system32\ADVAPI32.DLL
        C:\Windows\system32\ntdll.dll
        C:\Windows\system32\KERNEL32.dll
        C:\Windows\system32\RPCRT4.dll
    C:\cygwin\bin\cygz.dll
      C:\cygwin\bin\cyggcc_s-1.dll
cygcheck: track_down: could not find cygssp-0.dll

Once I installed libssp0, rsync was working correctly, but it still required a password to be entered.

Setup Passwordless Login


Setting up passwordless login was really easy. In an sftp connection, I made a .ssh directory in the backup server's home directory, copied my cygwin .ssh/id_dsa.pub into the new .ssh directory, and renamed the file authorized_keys.

Determining Directories and Files to Copy


I made an exclusion list of files not to back up and named that file excl.txt. Here's what the file contains:

*.obj
*.tmp
*.sbr
*.ilk
*.pch
*.pdb
*.idb
*.ncb
*.opt
*.plg
*.aps
*.dsw
*.pyc
*.pyd
*.bsc
*.projdata
*.projdata1
.svn/
.git/
.bzr/
.hg/

Then I made a shell script called backup_to_dreamhost.sh to back up only certain directories:

#!/bin/bash
rsync -e ssh -avz --exclude-from=excl.txt /cygdrive/c/Users/David/Documents user@b.dh.com:~/David
rsync -e ssh -avz --exclude-from=excl.txt /cygdrive/c/Users/David/Downloads user@b.dh.com:~/David
rsync -e ssh -avz --exclude-from=excl.txt /cygdrive/c/Users/David/Pictures user@b.dh.com:~/David
rsync -e ssh -avz --exclude-from=excl.txt /cygdrive/c/Users/David/Music user@b.dh.com:~/David
rsync -e ssh -avz --exclude-from=excl.txt --exclude=Pinnacle/ /cygdrive/c/Users/Public/Documents user@b.dh.com:~/Public
...

Note that occasionally I have to exclude some directories that contain massive amounts of video or temporary files. I can't copy my pictures and videos without taking up more than the free 50GB space allocated for backups.

Making Automatic Backups


I ran the shell script from within cygwin, and it worked. But then, how to make Vista run it? I created a DOS batch script called backup_to_dreamhost.bat. It contains only one line:

C: && chdir C:\cygwin\home\username && C:\cygwin\bin\bash --login ~/backup_to_dreamhost.sh

And from the Windows Task Scheduler, I created a recurring task that runs backup_to_dreamhost.bat.


Four services I use changed their APIs on me last week.  Four.  What the hey, Internet?

TechCrunch

They migrated to the Disqus commenting system.  In the process of doing so, they broke a feature of their RSS feed.  Their feed has the <slash:comments> element for each item, and it used to contain the correct number of comments.

I'm too busy to have to read each TechCrunch article's title to evaluate whether I should read the article. So I wrote a recommendation engine. The number of comments each article accrues is one of the criteria my cron job uses to evaluate TechCrunch articles.  

TechCrunch broke their <slash:comments> element.  It's still there, but it always evaluates to zero.  I fixed my cron job to get the comment count directly from Disqus instead.

TechCrunch should fix their broken feed anyway.  It's not cool to lie in your RSS feed.


Netflix


Netflix changed the format of their movie URLs.  In some places.  In their new releases RSS feed, the movie URLs separate words in the title with hyphens, like so:

http://www.netflix.com/Movie/Harry-Brown/70117310
But their actual API, like api.netflix.com/catalog/titles, returns movie titles with underscores separating the words in the title.

http://www.netflix.com/Movie/Harry_Brown/70117310
I'm too busy to have to look up a bunch of movies to decide which to rent, so I have a cron job evaluate each week's new releases against Netflix's personal predicted rating for me.  By changing the format of their URLs in one service but not the other, they broke the cron job that matches movies in the feed to their corresponding IMDB ratings.  It took me a while to figure out exactly what broke my service!

I fixed that by having my cron job do a fuzzy match that treats hyphens and underscores as equivalent word separators.
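The idea is just to normalize the separators before comparing; a minimal sketch (the function name is mine):

```python
import re

def title_words(url):
    # Take the title segment of a Netflix movie URL and split it on
    # hyphens or underscores, so both URL styles compare equal.
    title = url.rstrip('/').split('/')[-2]
    return [word.lower() for word in re.split(r'[-_]+', title) if word]

# Both styles of URL now yield the same word list:
feed_url = 'http://www.netflix.com/Movie/Harry-Brown/70117310'
api_url = 'http://www.netflix.com/Movie/Harry_Brown/70117310'
assert title_words(feed_url) == title_words(api_url) == ['harry', 'brown']
```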

Digg

They did a major overhaul when they released V4.  I'm not really interested in the debate over what's better and what's worse.  I'm interested in what they broke.

They broke their user history feeds.  They used to support personal feeds for their users, so that you could easily see what your friends dugg, like so:

http://digg.com/users/dblume/history.rss

Around August 25th, they changed the nature of the feed to also include everything from the people that that user follows.  So instead of being a concise personal history, it became a huge mess.  The next day, they turned off the service altogether.

By changing the nature of the feed, not to mention turning it off altogether, they broke the digg component of my personal lifestream.

Digg should restore the history feeds.  They were useful.  And it's bad form to break services that you used to provide.

Twitter

Twitter turned off basic authentication and left OAuth as the only alternative.  They announced the transition, and gave developers a long time to prepare for it.  It's a good thing.

Sadly, I was using basic authentication to munge together two of their feeds into one, for inclusion into my feed reader.

http://user:password@twitter.com/statuses/home_timeline.rss
http://user:password@twitter.com/statuses/mentions.rss
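The collation itself is simple even without those URLs; here's a rough stdlib-only sketch of merging the items from two RSS documents by date (fetching and OAuth signing omitted):

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def merge_rss_items(*rss_docs):
    # Pull the <item> elements out of each RSS 2.0 document (passed as
    # strings) and return them sorted newest-first by <pubDate>.
    items = []
    for doc in rss_docs:
        items.extend(ET.fromstring(doc).iter('item'))
    items.sort(key=lambda item: parsedate_to_datetime(item.findtext('pubDate')),
               reverse=True)
    return items
```

Each returned element still carries its title, link, and guid, so re-serializing the merged list into one combined feed for the reader is straightforward.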

So for me, all Twitter activity suddenly disappeared one day.  It took me a while to realize that I'd forgotten to migrate my feed collator's authentication from basic to OAuth.  So I went ahead and made the fix.

Oh, the awesome thing about making a certified OAuth app for Twitter?  I can integrate it into my dead man's switch.  Maybe I'll tweet from beyond the grave.

Phew!

In one week, four external services broke four of my personal services.  It felt like so much household maintenance: The toilet broke, or the grass needs mowing. The upside is that in fixing each of these personal services, I added to my skill set.

How To Efficiently Waste Time

Step One: Stop wasting time all willy-nilly.  Decide when to do it, and stick to your decision.
Step Two: Mute your friends who post, tweet or plurk without any substance (or update too frequently), but are still awesome. Save those streams for time-wasting time.  Don't add their feeds to your feed reader.  Make it so you have to force yourself to type in their account's URL.

Here are a couple of my favorite sites:

Here's a new site that has potential: Trending items on Facebook without actually having to go to Facebook.

  • It's Trending (Not sure yet how frequently the content updates, though.)


Scrap of paper: NetGear WNDR3700 + WD TV Live

I've got a scrap of paper on my desk I need to clear away.  So I'm filing it here.  Eventually, I'm going to replace my home WDS (made from a couple of WRT54Gs with DD-WRT installed).

Somebody on a podcast (maybe NPR's Science Friday) around Christmas time recommended looking into the Netgear WNDR3700 and the Western Digital WD TV Live.  But then again, the TWiT folks at CES just mentioned the Boxee Box and one other product they really liked.  (Note to self: It'll probably be transcribed here when the transcription is eventually done.)

Of course, in a couple of weeks, I'll probably be fawning over a certain slate computer.  We'll see.  But for now:

Scrap of paper? Discarded.
Information that was on it? Much more retrievable and actionable.


Finally, I downloaded the Chrome Beta because the Beta supports extensions for the following critical features:
  • RSS Feed Detection - I needed single-click access to this
  • IE Tab - Some corporate sites only work with IE.
  • XMarks Sync - I have a tough decision to make with XMarks now, since Chrome (Beta) supports native bookmark sync.

There are other really attractive extensions, too. I may fall in love with some like these:
  • Google Translate
  • Google Wave Notifier


The HeisenTwitter Uncertainty Principle

[EDIT: Twitter later stabilized @reply visibility:  Users now see @replies when the originator and recipients are known to the user, and don't see @replies otherwise.]

On Twitter, we'd all better second-guess what we see in our friends list of tweets.  Even if you want your public replies to be visible, be aware:

Your friends won't see your public replies to other friends anymore.
Or, when they can, they can't tell to which tweet you replied.

That's a lot like the Heisenberg Uncertainty Principle. That's where you can't precisely know a particle's position and momentum at the same time. You can know either one with high precision, but not both.

At Twitter, you can either see your friends' replies, or know which tweet they were replying to. But not both.

It didn't use to be like that. If we wanted to see our friends' replies, they'd simply show up in our list of tweets, with a handy link to the tweet being replied to. You'd see your friends' replies, and you could click through to see what they were replying to.

The Good Old Days - See replies and know what they replied to.


That was nice. Twitter changed that. Now the replies that do show up in your list of friends' tweets don't have that handy link. So you can see that a friend replied to something, but you can't know for sure which tweet they replied to.


Keep that in mind when you want to reply to a tweet. You have a choice to make. If you want your friends to see your reply (isn't that the point of Twitter?), then you'd better type "@username" instead of clicking the reply button under the star. Or, if you want your reply to be linked to the tweet it's a reply to, click the reply button under the star.  You can't have both.


Having to make that choice sucks. Welcome to the HeisenTwitter Uncertainty Principle. This is why I like Plurk.


I haven't seen this spelled out clearly anywhere on the web, so let me help out. YouTube's got useful programmatic feeds. You can specify a feed for a user's videos like:

http://gdata.youtube.com/feeds/base/users/communitychannel/uploads

You can also specify the order in which you want the videos.  Note the "orderby" parameter in the URLs that follow:

http://gdata.youtube.com/feeds/base/users/communitychannel/uploads?orderby=updated
http://gdata.youtube.com/feeds/base/users/communitychannel/uploads?orderby=published

So far, so great. Now, suppose you want to make a lifestream, and you want to include the videos that you've favorited. They've got a feed for that, too:

http://gdata.youtube.com/feeds/api/users/davidblume/favorites

But it's not right.  If you look at the data you get back, you see that it's not what you wanted. Those videos are going to be associated with the timestamp with which they were updated or published, not the time that you favorited them. And that's the time that matters to your lifestream! Given the way the programmatic feeds are organized, you'd think that there's a way to specify that, and that feed would be as follows, right? --

http://gdata.youtube.com/feeds/api/users/davidblume/favorites?orderby=favorited

Nope. After living with a workaround in my lifestream for months, only today did I learn that YouTube did create the feed I needed, but calls it this: v=2. Yeah, like that jibes with their feed explanation.

Lifestream writers, the favorites feed (ordered by time favorited) that you want is constructed like this:

http://gdata.youtube.com/feeds/api/users/username/favorites?v=2

(Replace "username" with your username, of course.)  Now I can go delete my workaround.