@grammer_man who the fuck is this nigga and why u comin at me like that #Hoeassnigga


Posted: 2012-01-09 20:06   |  More posts about code computers funny idiots internet oddities

Had a spare hour last Thursday and decided to write a little Twitter bot. There he is above. His name is Grammer_Man and he corrects other Twitter users' misspellings, using data scraped from these Wikipedia pages.

Responses have been pouring in already, some agitated, some confused, but most positive -- which was a pleasant surprise. In any event, the minimal amount of effort in coding has paid off many times over in entertainment.

You can see who's responding at the moment by searching for @grammer_man, and also by checking his list of favourites.

Here is the (somewhat slapdash) code that powers our fearless spelling Nazi:

grabber.py

This module grabs the spelling data from Wikipedia.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import pickle

import requests
from BeautifulSoup import BeautifulSoup

def grab(letter):
    '''
    Grabs spellings from the Wikipedia misspellings page for one letter.
    Returns a {misspelling: correction} dictionary.
    '''
    url = 'http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/%s' % letter
    html = requests.get(url).content
    soup = BeautifulSoup(html)
    bullets = soup.findAll('li')
    retval = {}
    for bullet in bullets:
        # Entries are 'plainlinks' list items of the form "misspelling (correction)"
        if 'plainlinks' in repr(bullet):
            values = bullet.text.split('(')
            if len(values) == 2:
                # shave off the ) at the end and any stray whitespace
                retval[values[0].strip()] = values[1][:-1].strip()
    return retval

def get_spellings():
    '''
    Returns a dictionary of {misspelling: correction} pairs, scraping
    Wikipedia on the first run and reading a pickled cache thereafter.
    '''
    if not os.path.exists('words.pkl'):
        retval = {}
        for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
            print 'Getting typos - %s' % c
            retval.update(grab(c))
        print 'Dumping...'
        f = open('words.pkl', 'wb')  # binary mode, as pickle expects
        pickle.dump(retval, f)
        f.close()
        return retval
    else:
        f = open('words.pkl', 'rb')
        retval = pickle.load(f)
        f.close()
        return retval

if __name__ == '__main__':
    get_spellings()
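
Once the cache is built, get_spellings() hands back a plain dictionary mapping each misspelling to its correction. Something like this, though the exact entries naturally depend on what Wikipedia lists at the time:

>>> from grabber import get_spellings
>>> words = get_spellings()
>>> words[u'abritrary']
u'arbitrary'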

bot.py

The bot itself. It picks misspellings at random, searches Twitter for them and replies to offenders, taking short breaks between tweets and a longer break after every hundred corrections.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import random
import time
import pickle

import twitter

from grabber import get_spellings

API = twitter.Api()  # authentication details omitted -- posting requires real credentials

# Response templates live in here, one per line; blank lines and '#' comment
# lines are filtered out so they can never be tweeted by accident.
MESSAGES = [m for m in u'''
Hey $USERNAME, didn't you mean $CORRECT there?
#
# All messages stored in here, one per line.
#
'''.split('\n') if m and not m.startswith('#')]

def compose_message(twitter_post, mistake, correct):
    '''
    Choose a message from MESSAGES at random, substitute fields to personalise it and
    check that it fits within the twitter message limit. Try this 100 times before failing.
    '''
    retries = 0
    while retries < 100:
        message = random.choice(MESSAGES)
        message = message.replace('$USERNAME', '@%s' % twitter_post.user.screen_name)
        message = message.replace('$MISTAKE', '"%s"' % mistake).replace('$CORRECT', '"%s"' % correct)
        if message and len(message) < 141:
            return message
        retries += 1  # count the failed attempt so we eventually give up
    return None

def correct_spelling(twitter_post, mistake, correct):
    '''
    Correct someone's spelling in a twitter_post
    '''
    print u'Correcting @%s for using %s...' % (twitter_post.user.screen_name,
                                               mistake)
    message = compose_message(twitter_post, mistake, correct)
    if not message:
        print u'All messages were too long... Aborting...'
        return False
    try:
        API.PostUpdate(message, in_reply_to_status_id=twitter_post.id)
    except Exception, e:
        print 'Failed to submit tweet (%s).' % e
        return False
    return True

def search(word):
    '''
    Search twitter for uses of a word, return one if it's been used recently.
    Otherwise return None.

    TODO: Add time awareness.
    '''
    print 'Searching for uses of %s...' % word
    results = API.GetSearch(word)
    if results:
        for result in results:
            if not check_if_done(result.id) and\
                not result.user.screen_name == 'grammer_man' and word in result.text:
                return result
    return None

def check_if_done(id):
    '''
    Checks if a tweet has already been responded to
    '''
    if os.path.exists('done.pkl'):
        f = open('done.pkl', 'rb')
        done = pickle.load(f)
        f.close()
        if id in done:
            return True
    return False

def update_done(id):
    '''
    Updates the list of tweets that've been replied to
    '''
    if os.path.exists('done.pkl'):
        f = open('done.pkl', 'rb')
        done = pickle.load(f)
        f.close()
    else:
        done = []

    done.append(id)

    f = open('done.pkl', 'wb')
    pickle.dump(done, f)
    f.close()

def main():
    '''
    Main program flow
    '''
    words = get_spellings()
    counter = 0
    while True:
        if counter > 100:
            rand_time = random.randint(120*60, 240*60)
            print 'Done %s tweets, sleeping for %s minutes' % (counter, rand_time/60)
            time.sleep(rand_time)
            counter = 0
        word = random.choice(words.keys())
        # TODO: PROPERLY PRUNE THE MISTAKES/CORRECTIONS FROM WIKIPEDIA AND REMOVE THIS:
        if u',' in word + words[word] or u';' in word + words[word]:
            continue  # entries with several corrections are skipped for now
        post = search(word)
        if post:
            result = correct_spelling(post, word, words[word])
            if result:
                counter += 1
                print '#%s Done' % counter
                update_done(post.id)
                time.sleep(random.randint(300, 500))


if __name__ == '__main__':
    main()
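
To set him loose, run grabber.py once to build the words.pkl cache, then leave bot.py running:

python grabber.py
python bot.py

Note that the twitter.Api() call needs real credentials before PostUpdate will succeed; searching worked without them at the time.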

Grammer_Man uses the following libraries: python-twitter, requests and BeautifulSoup.



Big Picture Cataloguer 0.5 Released


Posted: 2011-01-23 20:49   |  More posts about code

Mainly a bug-fix release, with problems relating to HTML in captions fixed.

If you already downloaded the "Haiti: One Year Later" photo album, you might want to delete it and run this version of the cataloguer.

Download available here.



Big Picture Cataloguer 0.4 Released


Posted: 2010-10-09 11:19   |  More posts about code

New version of The Big Picture Cataloguer available from here. Thanks for your patience; sorry it took so long.



Critical Bug in Big Picture Cataloguer


Posted: 2010-09-23 19:24   |  More posts about code

Update: Version 0.4 now released. Please upgrade immediately.

I'm aware of and have fixed a critical bug in the Big Picture cataloguer: it stops downloading when it reaches a recent photo album whose title ends with a full stop.

I moved recently and unfortunately my main computer was destroyed in the process. With it went my proper development environment.

This means I'll not be able to update the executable versions of the cataloguer for perhaps a week, but in the meantime, a new version of the script is up.

If you are encountering this bug, please check back next week for an updated version of the executables. The program will resume where it left off; you will not have lost the chance to get any galleries.



Tricks with python and music


Posted: 2010-05-22 17:30   |  More posts about art code computer science computers experimental music oddities

From Music Machinery:

One of my favorite hacks at last weekend’s Music Hack Day is Tristan’s Swinger. The Swinger is a bit of python code that takes any song and makes it swing. It does this by taking each beat and time-stretching the first half of each beat while time-shrinking the second half. It has quite a magical effect. Some examples:

Every Breath You Take (swing version) by TeeJay
Sweet Child O' Mine (Swing Version) by plamere

You can find more examples in the original blog post. The results really are impressive. I'm looking forward to playing with Tristan Jehan's code, and also having a look at his PhD thesis:
Machines have the power and potential to make expressive music on their own. This thesis aims to computationally model the process of creating music using experience from listening to examples. Our unbiased signal-based solution models the life cycle of listening, composing, and performing, turning the machine into an active musician, instead of simply an instrument. We accomplish this through an analysis-synthesis technique by combined perceptual and structural modeling of the musical surface, which leads to a minimal data representation.

Fascinating stuff!
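
The core trick is easy enough to sketch. Here's a rough, hypothetical rendition of the idea -- not Tristan's actual code -- where timestretch(samples, factor) stands in for whatever time-stretching routine is at hand (a phase vocoder, the dirac bindings in Echo Nest Remix, etc.):

def swing(audio, beats, ratio=2.0/3.0):
    '''
    Stretch the first half of each beat to occupy `ratio` of the beat,
    and shrink the second half to fill what's left, so a straight
    rhythm starts to swing. `audio` is a sequence of samples and
    `beats` a list of (start, end) sample offsets.
    '''
    out = []
    for start, end in beats:
        mid = (start + end) // 2
        out.extend(timestretch(audio[start:mid], 2 * ratio))       # e.g. x1.33
        out.extend(timestretch(audio[mid:end], 2 * (1 - ratio)))   # e.g. x0.67
    return out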



Big Picture Cataloguer: An update


Posted: 2010-05-20 20:11   |  More posts about art code computers internet photography

In just over a week since I released the Big Picture Cataloguer, there's been a surprising amount of interest and enthusiasm. Since I still haven't gotten binary versions of the program up for OS X and Linux (I've no access to an OS X computer, and getting the required libraries installed on Linux has proved quite difficult), I've decided to relent and share the source code of the cataloguer under a Creative Commons license.

The script makes use of pyexiv2 - the 0.2 branch - for metadata editing, mechanize for grabbing pages and submitting error reports, the very handy unaccented_map() class (included) for unicode trickery and, of course, the wonderful HTML parser, BeautifulSoup.
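
For the curious, writing a caption into a photo with pyexiv2 0.2 boils down to a few lines. This is a minimal sketch rather than an excerpt from the cataloguer itself:

import pyexiv2

def write_caption(path, caption):
    '''Embed a caption in a photo's EXIF ImageDescription field.'''
    metadata = pyexiv2.ImageMetadata(path)
    metadata.read()   # the file must be read before tags can be modified
    key = 'Exif.Image.ImageDescription'
    metadata[key] = pyexiv2.ExifTag(key, caption)
    metadata.write()  # write the tag back into the file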

Naturally, it's available from the Big Picture Cataloguer's page in the Code section of this site.

Given how much The Big Picture galleries' HTML format has subtly changed over time, and the fact that I wrote this in a rush, it's quite messy, but it does the job.

Today's update is version 0.3, which adds an optional "quiet mode" so that users can schedule the program to run frequently. Enjoy!



Boston.com Big Picture Cataloguer


Posted: 2010-05-12 20:02   |  More posts about art code computers internet media photography politics

I'm a big fan of The Boston Globe's photojournalism series, The Big Picture. So much so, in fact, that I decided to dedicate a few hours this week to building a program that would not just download the entire series, but also add caption metadata to each photo, since many of the captions are informative and look very nice in Picasa, for example.

Now, I'm happy that the application is stable enough to release to the world in the Code section of my website.

Since I don't want people hammering The Boston Globe's servers, I've made the script wait a fraction of a second between each request, and since I don't want people to be able to disable this behaviour, unfortunately only binaries will be available for the time being. Windows binaries are available already; OS X and Linux binaries are to come in a few days.
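
The throttling itself is nothing exotic -- the idea is simply along these lines (a sketch, not the cataloguer's actual code):

import time
import random

def polite_read(opener, url, delay=0.5):
    '''Fetch a URL, then pause so consecutive requests are spaced out.'''
    response = opener.open(url)          # e.g. a mechanize.Browser()
    time.sleep(delay + random.random())  # always wait at least `delay` seconds
    return response.read()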

Indeed, if those at The Boston Globe have a problem with how the program operates, they need only contact me and we can come to an agreement, but I've worked hard to make sure that the program contacts their servers as little as possible.

Bug reports will be automatically submitted through this website too, but if you have any unforeseen problems (e.g. a crash or a hang), email me with as much information as possible: the "Traceback" text printed before the crash, which album/photo the program was working on, and so on.

What can you do once you've got the entire 2GB collection of photos downloaded? Well, you can simply look through them at your own pace and comfort, or indeed choose to create a montage screensaver from them (although be warned - a screensaver that fades from a beautiful Antarctic landscape to a bloody photo of a victim of the war in Afghanistan might not be exactly what you had in mind.)

But in any event, hopefully it'll be of some use. Enjoy!
