
Getting Shit Done

I came across an old Lifehacker article, "Get Shit Done Blocks Distracting Web Sites So You Can Do As the Name Instructs," that mentions a productivity script, get-shit-done. So I went to the GitHub repository, only to discover that the Python script was no longer functional. I didn't have permission to fix the script at the site, so here's a quick note-to-self to get around to that. Until then...

[Edit]: There's a better, live branch at GitHub. Use leftnode's version.

Here's an archive of my changes before I discovered leftnode's branch.


  • The syntax errors are fixed. (Like the "play" string in the dictionary at the end that was supposed to be the play method.)

  • It's more Windows-friendly. Changing the hosts file doesn't require a networking restart in Windows.

  • You no longer have to specify an argument, "work" or "play." It just toggles modes every time you run it now.

  • When reverting to "play" mode, it now retains whatever follows the endToken in the hosts file.

  • I removed the bit of code that optionally reads from an ini file. The ini file format (the keys, in particular) struck me as awkward.



#!/usr/bin/env python

import sys
import getpass
import subprocess

def exit_error(error):
    print >> sys.stderr, error
    sys.exit( 1 )

restartNetworkingCommand = ["/etc/init.d/networking", "restart"]
# Windows users may have to set up an exception for write access
# to the hosts file in Windows Security Essentials.
hostsFile = '/etc/hosts'
startToken = '## start-gsd'
endToken = '## end-gsd'
siteList = ['reddit.com', 'forums.somethingawful.com',
            'somethingawful.com', 'digg.com', 'break.com',
            'news.ycombinator.com', 'infoq.com',
            'bebo.com', 'twitter.com', 'facebook.com',
            'blip.com', 'youtube.com', 'vimeo.com',
            'flickr.com', 'friendster.com', 'hi5.com',
            'linkedin.com', 'livejournal.com',
            'meetup.com', 'myspace.com', 'plurk.com',
            'stickam.com', 'stumbleupon.com', 'yelp.com',
            'slashdot.com', 'lifehacker.com',
            'plus.google.com', 'gizmodo.com']

def rehash():
    if sys.platform != 'cygwin':
        subprocess.check_call(restartNetworkingCommand)

def work():
    with open( hostsFile, 'a' ) as hFile:
        print >> hFile, startToken
        for site in siteList:
            print >> hFile, "127.0.0.1\t" + site
            if site.count( '.' ) == 1:
                print >> hFile, "127.0.0.1\twww." + site
        print >> hFile, endToken
    rehash()

def play( startIndex, endIndex, lines ):
    with open(hostsFile, "w") as hFile:
        hFile.writelines( lines[0:startIndex] )
        hFile.writelines( lines[endIndex+1:] )
    rehash()

if __name__ == "__main__":
    if sys.platform != 'cygwin' and getpass.getuser() != 'root':
        exit_error( 'Please run script as root.' )

    # Determine if our siteList is already present.
    startIndex = -1
    endIndex = -1
    with open(hostsFile, "r") as hFile:
        lines = hFile.readlines()
        for i, line in enumerate( lines ):
            line = line.strip()
            if line == startToken:
                startIndex = i
            elif line == endToken:
                endIndex = i

    if startIndex > -1 and endIndex > -1:
        play( startIndex, endIndex, lines )
    else:
        work()
 
Yesterday my daughter in the fifth grade got the following homework assignment: "Arrange the digits one through nine into a nine-digit prime number." (Note: since zero isn't included, it's not really a pandigital number.)

So I asked her how she'd start. She started the way I'd want her to, by excluding the digits 2, 4, 5, 6, and 8 from the units place. And then...

...we got nuthin'.

What did the teacher want? Could we use the computer to test answers? Did the teacher teach some tricks I don't know about? Maybe.

After trying and failing to construct a few nine-digit nearly pandigital prime numbers, I finally gave in to every programmer's temptation.

The brute-force tactic! Test them all! In Python, it looks like this:

#!/usr/bin/env python
import itertools

def isprime( n ):  # simple trial division; plenty fast for nine digits
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

l = '123456789'
for p in itertools.permutations( l ):
    n = int( ''.join(p) )
    if isprime( n ):
        print "Found it!", n
        break

We ran it and... What the hey‽ There isn't any such prime‽ What kind of stunt is this teacher trying to pull?
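The empty result isn't a stunt; it's arithmetic. The digits 1 through 9 sum to 45, which is divisible by 9, and any number whose digit sum is divisible by 9 is itself divisible by 9. So every one of the 362,880 permutations is divisible by 9, and none of them can be prime. A quick check in the same Python:

print sum( int(d) for d in '123456789' )  # 45, which is divisible by 9
print 123456789 % 9, 987654321 % 9        # 0 0: composite, like every permutation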

Fire in the Belly

My pet projects seem to choose themselves, as if I had nothing to do with it. Yesterday, one sunk its teeth into me and won't let go. Tenacious little bugger.

Maybe I should say my muse gave me a present for New Year's. But it doesn't feel like that. When a project like this, no matter how small, takes hold, there's little else you can do. You can't enjoy sitting with a cup of coffee and a book. You can't enjoy working out, or watching a video. You're compelled to explore the idea.

This time, the project came from a comment that Sjon Svenson left in my last entry, Happy Birthday, Me. I got you data portability.

I was happy that I'd completed the migration of my contacts to Google Contacts, but Sjon mentioned that a hardcopy of his contacts survived an electronic device failure, and the hardcopy is the version that the whole family maintains now.

The beauty of that is that my brother can update that just as well, and his wife can (and does).


That is really handy. So it's something that I'm going to implement across my family's Google Contacts accounts. We'll use tags to designate with whom we want to share the contacts, and only those will be synced across accounts. Everybody will benefit from the improvements anybody else makes. It'll also keep a history of changes, in case a change or merge turns out to be incorrect.

It's very similar to a private wiki, or a shared Docs page, but it's important that this is done directly to the contacts that get synced to all of our mobile devices, too.

It seems like a pretty obvious project. If there's already an open source project for this, or if this is already your 20% project, leave me a comment.

[Update 2011-01-03]: Drat. The proper way to do this, with OAuth and the contacts stream in the official feed, won't work. It doesn't enumerate all the fields available when you export contacts. I can still do what I want as a one-off, because I can have my script get the exported CSV values from the accounts I need access to, but I dearly hate code that scrapes or otherwise performs actions meant for humans. I wish Google provided (or made apparent how to get) all the data, every single field, for each contact in the API's feed.

Tron and the Realm of Fantastic Opportunity

Around 1982, I was a teenager working off-the-books at the local arcade. I took out the trash, cleaned the front windows, the bathroom, and the machines for a handful of bills that'd usually get converted to tokens and fed right back into the machines.

When I went to church, I had a trick to stay awake. I imagined playing the upper levels of Tempest.



Planning judicious use of the Superzapper is enough to get you through even the most boring sermons.

That was the year that Tron came out. And Tron spoke to me personally. I had an Apple ][+ and loved programming on it. Computers were full of so much potential, and Flynn knew how to take advantage of that. I wanted to be him.

That helped set my life on its course. I went to college, studied Computer Science, and got a job as a Programmer, just like Flynn.

I'm still a programmer, and I watch over a village of cron jobs and daemons who report to me how they're doing, and have permission to email me when they get into trouble with the other programs on the net.

Now it's 2010, and Tron: Legacy has come out. Nobody's mistaking it for high art.

Tron: Legacy did get some things right. In both the original and the sequel, life in the Grid was surreal, and I mean surreal etymologically: life in the Grid was above or beyond reality. In 1982's Tron, artists hand-painted the digital glow onto programs in the Grid, and in 2010's Tron: Legacy, only the scenes in the Grid were in 3D.



There was reality, which was normal; and then there was the Grid, whose reality was capable of more. I still feel that way about computers and the Internet. Nearly anything is possible.

Computers remain the realm of fantastic opportunities.

It really frustrates me to have to follow up Amazon Still Hates Your Life(stream) with more poor decisions from the e-commerce giant.

In late October 2010, Amazon removed the ListLookup operation from the AWSECommerceService service. When I noticed it, I tweeted about it. I was sorta late to the game, though. Amazon announced they would deprecate the service in June 2010. The thing is, they didn't deprecate the service. They removed it. Trying to call it now results in an HTTP 410 error and the text, "Gone."

What's that mean? Well...

That's the function for accessing your wishlists at Amazon. Some people wrote blogging engine tools or phone apps that access Amazon wishlists. The more people who make wishlists easier to access, the more items get purchased. You'd think Amazon would like that, right?

You'd be wrong. Amazon unplugged the mechanism they wrote to programmatically access their wishlists. I really can't see a justifiable reason behind it. It may have been "little used" by their standards, but it couldn't have been that hard to maintain.

You can blame Amazon for making their own wishlists even less accessible and useful now.

Remote Backup Doublechecker

My Remote Backup Script is working nicely. After the backup, it writes some status to a file, "rsync_completed.txt."

But I noticed a few days ago that one of my backups didn't run. That's probably because the computer wasn't on at the designated time, or because nobody was logged in, or because the system was too busy to run that particular task.

In any case, I wrote a Remote Backup Occurred Doublechecker. It runs every time I log in.  Because I'm crazy. I thought about making it a DOS batch script, but went with Python because it'd be faster for me that way.

And just to reveal my craziness, here it is (mostly):

import os
import sys
import time
import datetime
import traceback

# my_root is defined earlier in the script; it's the folder that holds
# rsync_completed.txt and the backup batch file.
try:
    rsync_file = os.path.join(my_root, "rsync_completed.txt")
    scriptname = sys.argv[0]
    if os.sep in scriptname:
        scriptname = scriptname.rsplit(os.sep, 1)[1]
    ask_user = False
    if not os.path.exists(rsync_file):
        import win32con
        ask_user = True
        msg = "Could not verify backup with file %s. Backup now?" % rsync_file
        flags = win32con.MB_ICONWARNING | win32con.MB_YESNO
    else:
        mtime = os.path.getmtime(rsync_file)
        dur = datetime.timedelta(seconds=time.time() - mtime)
        if dur.days > 2:
            import win32con
            ask_user = True
            msg = "It's been %d days since the last backup. Backup now?" % dur.days
            flags = win32con.MB_ICONQUESTION | win32con.MB_YESNO
    if ask_user:
        import win32ui
        response = win32ui.MessageBox(msg, scriptname, flags)
        if response == win32con.IDYES:
            import subprocess
            cmd = os.path.join(my_root, "backup_to_dreamhost.bat")
            subprocess.Popen(cmd)
except Exception, e:
    f = file(os.path.join(my_root, "Desktop", "%s Fail.txt" % scriptname), 'w')
    f.write("An exception occurred: %s %s\n" % (str(e.__class__), str(e)))
    traceback.print_exc(file=f)
    f.close()
Four services I use changed their APIs on me last week.  Four.  What the hey, Internet?

TechCrunch

They migrated to the Disqus commenting system.  In the process of doing so, they broke a feature of their RSS feed.  Their feed has the <slash:comments> element for each item, and it used to contain the correct number of comments.

I'm too busy to read each TechCrunch article's title just to decide whether I should read the article. So I wrote a recommendation engine. The number of comments each article accrues is one of the criteria my cron job uses to evaluate TechCrunch articles.

TechCrunch broke their <slash:comments> element. It's still there, but it always evaluates to zero. I fixed my cron job to get the comment count directly from Disqus instead.
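The replacement lookup is a single HTTP request per article. Here's a sketch against the Disqus 3.0 web API; the threads/details.json endpoint, the thread:link lookup parameter, and the posts field are from memory and may have changed (check their docs), and api_key is a placeholder for a key you register with Disqus.

import json
import urllib
import urllib2

def comment_count( api_key, forum, article_url ):
    params = urllib.urlencode( { 'api_key': api_key,
                                 'forum': forum,
                                 'thread:link': article_url } )
    url = 'http://disqus.com/api/3.0/threads/details.json?' + params
    data = json.load( urllib2.urlopen( url ) )
    # A thread's "posts" field holds its comment count.
    return data['response']['posts']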

TechCrunch should fix their broken feed anyway.  It's not cool to lie in your RSS feed.


Netflix

Netflix changed the format of their movie URLs. In some places. In their new releases RSS feed, the movie URLs separate words in the title with hyphens, like so:

http://www.netflix.com/Movie/Harry-Brown/70117310
But their actual API, like api.netflix.com/catalog/titles, returns movie titles with underscores separating the words in the title.

http://www.netflix.com/Movie/Harry_Brown/70117310
I'm too busy to have to look up a bunch of movies to decide which to rent, so I have a cron job evaluate each week's new releases with Netflix's personal predicted rating for me. By changing the format of their URLs in one service but not the other, they broke my cron job that matches movies in the feed to their corresponding IMDB ratings. It took me a while to figure out exactly what broke my service!

I fixed that by having my cron job do a fuzzy match that treats hyphens and underscores in the title as equivalent.
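The fuzzy match is just a normalization before comparing. A minimal sketch, using the two example URLs from above:

import re

def slug_normalize( url ):
    # Treat hyphens and underscores in the title slug as the same thing.
    return re.sub( r'[-_]', '-', url )

a = "http://www.netflix.com/Movie/Harry-Brown/70117310"   # from the RSS feed
b = "http://www.netflix.com/Movie/Harry_Brown/70117310"   # from the catalog API
print slug_normalize( a ) == slug_normalize( b )          # True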

Digg

They did a major overhaul when they released V4.  I'm not really interested in the debate over what's better and what's worse.  I'm interested in what they broke.

They broke their user history feeds.  They used to support personal feeds for their users, so that you could easily see what your friends dugg, like so:

http://digg.com/users/dblume/history.rss

Around August 25th, they changed the nature of the feed to also include everything from the people that user follows. So instead of being a concise personal history, it became a huge mess. The next day, they turned off the service altogether.

By changing the nature of the feed, not to mention turning it off altogether, they broke the Digg component of my personal lifestream.

Digg should restore the history feeds.  They were useful.  And it's bad form to break services that you used to provide.

Twitter

Twitter turned off basic authentication and left OAuth as the only alternative.  They announced the transition, and gave developers a long time to prepare for it.  It's a good thing.

Sadly, I was using basic authentication to munge together two of their feeds into one, for inclusion in my feed reader.

http://user:password@twitter.com/statuses/home_timeline.rss
http://user:password@twitter.com/statuses/mentions.rss

So for me, all Twitter activity suddenly disappeared one day.  It took me a while to realize that I'd forgotten to migrate my feed collator's authentication from basic to OAuth.  So I went ahead and made the fix.
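The collator needn't be much code. Here's a minimal sketch using the tweepy library (one way to do it, chosen here for illustration); the four uppercase names are placeholders for the credentials Twitter issues when you register an OAuth app.

import tweepy

auth = tweepy.OAuthHandler( CONSUMER_KEY, CONSUMER_SECRET )
auth.set_access_token( ACCESS_TOKEN, ACCESS_TOKEN_SECRET )
api = tweepy.API( auth )

# Munge the home timeline and mentions into one list, newest first,
# just like the two basic-auth feeds used to be.
statuses = api.home_timeline() + api.mentions_timeline()
statuses.sort( key=lambda s: s.created_at, reverse=True )
for status in statuses:
    print status.created_at, status.text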

Oh, the awesome thing about making a certified OAuth app for Twitter? I can integrate it into my dead man's switch. Maybe I'll tweet from beyond the grave.

Phew!

In one week, four external services broke four of my personal services. It felt like so much household maintenance: the toilet breaks, or the grass needs mowing. The upside is that in fixing each of these personal services, I added to my skill set.

"Random" Notes

I carpool with a friend in a nearby building to the climbing gym. To decide who drives, we play email roshambo. The HTML code for the Rock/Paper/Scissors choice is presented below:

<SELECT name="p_throw">
<OPTION value="r">Rock</OPTION>
<OPTION value="p">Paper</OPTION>
<OPTION value="s">Scissors</OPTION>
</SELECT>

We've done this for years, and we each end up driving about 50% of the time, with a few short streaks breaking up the routine. I decided it was too much effort to have to actually choose rock, paper or scissors if I didn't want to. It'd be nice for the website to randomly suggest one for me.

So I added the following PHP code:

<? $suggest = rand(0, 2); ?>
<SELECT name="p_throw">
<OPTION <? if ($suggest == 0) echo "SELECTED "; ?>value="r">Rock</OPTION>
<OPTION <? if ($suggest == 1) echo "SELECTED "; ?>value="p">Paper</OPTION>
<OPTION <? if ($suggest == 2) echo "SELECTED "; ?>value="s">Scissors</OPTION>
</SELECT>

There, now the web page will suggest a random throw when it loads. How convenient!

Except that my friend started destroying me in our challenges. From December 2009 through March 2010, he began winning over 75% of our challenges, and I had to keep driving the carpool. This was costing me money!

It turns out that while I was using the suggested random throw, he had a different strategy. He considered his suggested throw, and made the throw that would beat it.

So there was a correlation between the random throw suggested for me, and the one suggested for him! Even though our suggestions are generated from different pages (mine from index.php because I start the challenge, and his from throw.php because he responds to a challenge), the random number generator runs on the same server.

It's just not random enough, and it exposed an exploit that was costing me money!

There's a simple fix for this: use mt_rand() instead of rand(). So I made the following change:

<? $suggest = mt_rand(0, 2); ?>

Although I made the change to mt_rand(), the fact that the pseudo-random numbers were being generated on the same physical machine still bothered me. I decided that instead of generating the suggested throw on the server, it'd be best to generate the suggestion on the client computer, in JavaScript. So I wrote the following code and deployed that:

function Set_suggested_throw()
{
    var randomnumber = Math.floor( Math.random() * 3 );
    document.rpsform.p_throw.selectedIndex = randomnumber;
}
</head>
<body onLoad="Set_suggested_throw()">
That's better. Now his random suggestion is generated on an entirely different machine than mine...

Except it bothered me that both random numbers were seeded in a similar fashion, and there'd often be a constant offset between the local times and uptimes of the two machines. This wouldn't do. I needed better randomness. Luckily, there's a site for that.

Great! I'll have my JavaScript make a quick call to random.org to make the suggestion.

function Set_suggested_throw()
{
    var xmlhttp = null;
    if (window.XMLHttpRequest) {
        xmlhttp = new XMLHttpRequest();
        if (typeof xmlhttp.overrideMimeType != 'undefined') {
            xmlhttp.overrideMimeType('text/xml');
        }
    } else if (window.ActiveXObject) {
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    // Set the handler before send(), and parse the one-digit response.
    xmlhttp.onreadystatechange = function() {
        if (this.readyState == 4 && this.status == 200) {
            document.rpsform.p_throw.selectedIndex = parseInt(this.responseText, 10);
        }
    };
    // Only if quota not exceeded. See http://www.random.org/clients/
    xmlhttp.open('GET',
        '/rnd_org/integers/?num=1&min=0&max=2&col=1&base=10&format=plain&rnd=new',
        true);
    xmlhttp.send(null);
}
Making a call to random.org from a page generated at rps.dlma.com runs afoul of the Same Origin Policy. My server is a Linux server, so all I have to do is add a ProxyPass to my httpd.conf and restart it.

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

ProxyPass /rnd_org http://www.random.org

A downside is that the proxy content can be cached, so I'd have to be sure to disable that.

Even worse is that my server is a shared server running at DreamHost, and I don't have access to httpd.conf. And one can't specify a ProxyPass in a .htaccess file either.

So the work-around is to have the PHP code make the call to random.org. Great! Just code that sucker up, ensure we don't spam random.org by checking our quota, and fall back on the JavaScript implementation if we do go over quota. I won't show you all that, just the PHP snippet.

$ctx = stream_context_create( array( 'http' => array( 'timeout' => 5 ) ) );
$url = "http://www.random.org/integers/?num=1&min=0&max=2&col=1&base=10&format=plain";
$result = file_get_contents( $url, 0, $ctx );
if ( $result !== false ) {
    $suggest = intval( trim( $result ) );  // a single digit, 0 through 2
}
Great! The next time we played Email Roshambo, I won! Phew! As he drove me to the gym, I asked him how he'd been since I last saw him.

He said, "Oh, I've had better Fridays. I was RIFfed. We won't be carpooling anymore."

Amazon Still Hates Your Life(stream)

Wow, it's hard to believe that even in 2010 the good folks at Amazon are still blocking word-of-mouth advertising via lifestreams.

A little history: A lifestream is a voluntary record of your activities online. Generally they're made from Atom and RSS feeds. Those feeds are mostly lists of dates and URLs, so that you can make a note that "at that time, I was there."

Smart companies make user activity feeds for you.  Like FourSquare, GoodReads, Blippr.  They provide the feeds, and if you use them, then your friends see what you want to share, and they might want to spend money in that direction, too.  Everybody wins.

Amazon doesn't want to win. Not in this way.

It seems they never did.

Even after they bought Shelfari, they've refused to create user activity feeds for two years and counting. We asked for it. And well over 50 replies later, Amazon/Shelfari has nothing to show for it.

Amazon's got wishlists, right?  Isn't the whole point of a wishlist that you share it with friends?  A wishlist is a natural for a lifestream, too.  It'd be handy to make note that "back in 2006, I wanted that book on Winning With Subprime Mortgages."

Amazon never made it easy to associate an item on your wishlist with the time you added it. Which is dumb. The data is there. And user feeds should reflect user activity. We care about when we were doing stuff.

Well, I fixed that problem for Amazon once. (Same link as the last one, just under the Amazon logo.)

But they broke it again.  Now the URL in the old fix needs authentication.  And it's even harder to extract the date from their services' results for your feed.  There are a few Yahoo Pipes that list the few most recent wishlist items, but none of them associate the items with when you wished for them.

I fixed the problem again.  Here's how to make an Amazon Wishlist Feed, complete with when you wished for each item.

import base64
import hashlib
import hmac
import time
import urllib
import urllib2
from xml.dom import minidom

def Make_amazon_wishlist_feed( access_key, secret_key, list_id ):
    domain = "xml-us.amznxslt.com"
    base_url = "http://%s/onca/xml" % domain
    params = [ ( 'AWSAccessKeyId', access_key ),
               ( 'ListId', list_id ),
               ( 'ResponseGroup', 'ListFull' ),
               ( 'ListType', 'WishList' ),
               ( 'Operation', 'ListLookup' ),
               ( 'Service', 'AWSECommerceService' ),
               ( 'Sort', 'DateAdded' ),
               ( 'Timestamp', time.strftime( "%Y-%m-%dT%H:%M:%SZ", time.gmtime() ) ) ]
    params.sort()
    url_params_string = urllib.urlencode( params )
    string_to_sign = "GET\n%s\n/onca/xml\n%s" % ( domain, url_params_string )
    signature = hmac.new( secret_key, string_to_sign, hashlib.sha256 ).digest()
    signature = base64.encodestring( signature ).strip()
    urlencoded_signature = urllib.quote_plus( signature )
    url_params_string += "&Signature=%s" % urlencoded_signature
    url_handle = urllib2.urlopen( '%s?%s' % ( base_url, url_params_string ) )
    response_dom = minidom.parse( url_handle )
    for i in response_dom.getElementsByTagName( "ListItem" ):
        # Get_node_text is a small helper (defined elsewhere) that
        # concatenates the text of a node's children.
        pubDate = Get_node_text( i.getElementsByTagName( "DateAdded" )[0].childNodes )
        title = Get_node_text( i.getElementsByTagName( "Title" )[0].childNodes )
        url = "http://amazon.com/gp/product/%s" % \
              Get_node_text( i.getElementsByTagName( "ASIN" )[0].childNodes )

Inside that "for" loop at the bottom, you can see the three fields you'll need for the individual items of your feed: the pubDate, the item's Title, and its URL. With that, you're good to go.
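To turn those fields into an actual feed document, any RSS writer will do. Here's a sketch with the PyRSS2Gen library (one option among many); it assumes you collected the three fields into a wishlist_items list inside the loop above, and that DateAdded arrives as a "YYYY-MM-DD" string, so adjust if yours differs.

import datetime
import PyRSS2Gen

items = []
for pubDate, title, url in wishlist_items:
    items.append( PyRSS2Gen.RSSItem(
        title = title,
        link = url,
        guid = PyRSS2Gen.Guid( url ),
        # Assumes DateAdded looks like "2010-12-31"; adjust if it differs.
        pubDate = datetime.datetime.strptime( pubDate, "%Y-%m-%d" ) ) )

rss = PyRSS2Gen.RSS2( title = "My Amazon Wishlist",
                      link = "http://amazon.com",
                      description = "Wishlist items, sorted by date added",
                      items = items )
rss.write_xml( open( "wishlist.xml", "w" ) )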

Seriously, it's ridiculous that Amazon doesn't just make a feed right there at your wishlist.  There should be the little orange feed indicator right next to the URL in the URL bar.

Scrap of paper: NetGear WNDR3700 + WD TV Live

I've got a scrap of paper on my desk I need to clear away, so I'm filing it here. Eventually, I'm going to replace my home WDS (made from a couple of WRT54Gs with DD-WRT installed).

Somebody on a podcast (maybe NPR's Science Friday) around Christmastime recommended looking into the Netgear WNDR3700 and the Western Digital WD TV Live. But then again, the TWiT folks at CES just mentioned the Boxee Box and one other product they really liked. (Note to self: it'll probably be transcribed here when the transcription is eventually done.)

Of course, in a couple of weeks, I'll probably be fawning over a certain slate computer.  We'll see.  But for now:

Scrap of paper? Discarded.
Information that was on it? Much more retrievable and actionable.
