Skip's Python Bits

Add a term to the Python Glossary!

Python is a language for agile development that has gained an enthusiastic following. You can read all about it at the Python.org web site.

I've written or collected little bits and pieces of quasi-useful Python stuff. What I've announced to the public is available here.

More or Less Current Stuff

Simple Logging Wrapper for prstat
prstat is Sun's version of top. Where I work we used to use top as a crude logging tool, just letting it run with output redirected to a rotating set of logfiles. prstat can almost substitute for top in this context, but it doesn't timestamp its output. prstat-t.py fixes that shortcoming. (last updated 2009-09-14)
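The timestamping idea can be sketched in a few lines. This is not the actual prstat-t.py, just a minimal illustration that assumes the tool's output arrives as a stream of lines; the function name and the clock parameter are made up for the example.

```python
import time


def timestamp_lines(lines, clock=time.time):
    """Yield each input line prefixed with the local date and time."""
    for line in lines:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        yield "%s %s" % (stamp, line.rstrip("\n"))


# Typical use would be as a filter over sys.stdin, for example:
#     for out in timestamp_lines(sys.stdin):
#         print(out, flush=True)
for out in timestamp_lines(["load averages: 0.10, 0.12, 0.09\n"]):
    print(out)
```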
Minimalist Mailman Review Page
If you manage a popular mailing list with Mailman these days, you know how hard it can be to review the messages that get held for your review. mmfold.py fetches the review page for a mailing list and presents a more condensed version of the review page in your web browser. The new version accepts password info in the URL. (last updated 2008-06-30)
zipargs function for shell scripts
A colleague wanted to perform a zip operation (Python's zip, not the compression program of the same name) in a shell script. So I wrote something for him. Of course, it uses Python's zip() function under the covers.
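A rough sketch of what such a helper might look like follows. The calling convention (comma-separated lists as arguments, one space-joined row per output line) is an assumption made for the example, not the interface of the real script.

```python
def zipargs(args, sep=","):
    """Zip separator-delimited argument lists into space-joined rows."""
    columns = [arg.split(sep) for arg in args]
    return [" ".join(row) for row in zip(*columns)]


# A shell script could loop over the printed rows with `while read`.
for row in zipargs(["a,b,c", "1,2,3"]):
    print(row)
```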
Introspective dir() function
If you use the dir() function as a cheap introspection tool, you've probably noticed that it doesn't work very well for exploring package hierarchies. Here's a replacement which roots around in package directories and eggs and lets you know what submodules and packages it contains. (last updated 2008-03-18)
bsddb185 module
Python 3.0 will no longer come with the bsddb185 module. While it's rarely used, it does have some use on systems which still use the Berkeley DB 1.85 library, mostly BSD-derived Unix systems (including Macs). I extracted the module from the current trunk (2.6a0) and stuck it on PyPI.
Lock resources
Python has a couple of different file locking APIs. None are portable. The lockfile package (currently alpha - version 0.2) implements a cross-platform API and three different classes which use that API:
  • LinkFileLock (relies on the atomic nature of the link(2) system call)
  • MkdirFileLock (relies on the atomic nature of the mkdir(2) system call)
  • SQLiteFileLock (uses an SQLite database to lock files)
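To illustrate why an atomic mkdir(2) is enough to build a lock, here is a minimal sketch. It is not the lockfile package's API; the class name, the ".lock" suffix and the polling interval are all assumptions made for the example.

```python
import os
import tempfile
import time


class MkdirLock:
    """Hypothetical lock built on the atomicity of mkdir(2)."""

    def __init__(self, path):
        self.lock_dir = path + ".lock"

    def acquire(self, timeout=None, poll=0.1):
        deadline = None if timeout is None else time.time() + timeout
        while True:
            try:
                os.mkdir(self.lock_dir)  # succeeds for exactly one caller
                return True
            except FileExistsError:
                if deadline is not None and time.time() >= deadline:
                    return False
                time.sleep(poll)

    def release(self):
        os.rmdir(self.lock_dir)


# Demo in a scratch directory: the second locker is shut out until release.
scratch = tempfile.mkdtemp()
first = MkdirLock(os.path.join(scratch, "resource"))
second = MkdirLock(os.path.join(scratch, "resource"))
print(first.acquire(timeout=0))   # True
print(second.acquire(timeout=0))  # False
first.release()
```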
Add or print iCal events or todos from the command line
I use a Powerbook but rarely take it to work. This makes it difficult to manage events and todos with iCal. The appscript module makes it fairly easy to script many Mac OS X applications from Python. ical.py is a fairly simple example of appscript usage. It also relies on the dateutil package to support flexible date/time parsing.
Queue based on sockets
A thread on comp.lang.python got into a discussion of communication between multiple processes. I suggested creation of a class like Python's threaded Queue class. SocketQueue.py is a trivial implementation of the idea. (last updated 2005-09-28)
Mmencode in Python
Way back in the early days of MIME there was mmencode. It was a classical Unix filter. It was small and did one thing well. Somewhere along the way it got replaced by other tools and on my latest web server I found it's not available (at least not without grubbing around for the proper RPM). Here's a simple replacement in Python. It only implements the -q and -u flags and only writes to stdout, but that probably accounts for 99% of the usage. (last updated 2005-08-12)
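The core of such a replacement can be sketched with the standard library's base64 and quopri modules. This assumes the traditional mmencode behavior, where base64 is the default, -q selects quoted-printable and -u decodes; the function names are invented for the example and flag parsing is left out.

```python
import base64
import quopri


def encode(data, quoted_printable=False):
    """Encode bytes as base64, or as quoted-printable (the -q case)."""
    if quoted_printable:
        return quopri.encodestring(data)
    return base64.encodebytes(data)


def decode(data, quoted_printable=False):
    """Reverse encode() (the -u case)."""
    if quoted_printable:
        return quopri.decodestring(data)
    return base64.decodebytes(data)


print(encode(b"hi"))  # b'aGk=\n'
```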
Autoload modules
Someone on comp.lang.python whose name I didn't record (who are you?) came up with this nifty module autoloader. I modified it slightly. (last updated 2005-03-16)
Config file reader/writer
In response to a ConfigParser Shootout I wrote one such little beastie. Its main features are: indentation-based file format, nesting to arbitrary depth, read/write round trip (sans comments at the moment) and attribute-style or dict-style access. (last updated 2004-10-22)
Rebind global variables during reload()
The subject of the behavior of the reload() function came up recently in comp.lang.python. This trivial implementation may cover most of the perceived shortcomings of the builtin reload(). (last updated 2004-03-14)
Decode strings heuristically
When dealing with Unicode inputs from various sources you may or may not know how the input is encoded. If you don't know you probably have to guess. This little module demonstrates one set of guesses. You will almost certainly want to modify it for your needs. (last updated 2004-03-01)
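One possible set of guesses looks like the sketch below. The order of encodings is an assumption for illustration and is not necessarily what the module itself tries; since latin-1 accepts any byte sequence, it only makes sense as the last resort.

```python
def guess_decode(raw, encodings=("ascii", "utf-8", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode the input" % (encodings,))


print(guess_decode(b"plain text"))
print(guess_decode("caf\u00e9".encode("utf-8")))
print(guess_decode(b"caf\xe9"))
```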
Session save/restore
Gerrit Holl suggested save() and load() builtins on python-dev. He was thinking about using pickles, but I implemented a simpleminded version using the readline module. Unfortunately, the readline requirement means it won't work on Windows. Feel free to fix that shortcoming and send me a patch. (last updated 2003-12-01)
Simple progress meter
For long-running calculations, it's nice to have a simple way to display progress. progress.py provides a couple classes to support this. (last updated 2004-01-24)
Latin-1-to-ASCII codec
From time to time you really, really, really just want ASCII, as when some spammer sends you a message with the subject, "We cän makë it lönger now" or "keep up th¯e strugglê, get out ¨of that mess" (whatever that means). latscii.py is a simple codec which makes a reasonable attempt to strip accents from Latin-1 letters and map other characters to reasonable ASCII equivalents (such as mapping '¡' to '!'). (last updated 2003-11-11)
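The accent-stripping part can be approximated with unicodedata, as in this sketch. It is a plain function rather than a registered codec, it is not how latscii.py itself works, and the EXTRA table is an assumption (only the '¡' to '!' mapping comes from the description above).

```python
import unicodedata

# Characters with no useful decomposition; '¿' is an added guess.
EXTRA = {"\u00a1": "!", "\u00bf": "?"}


def to_ascii(text):
    """Strip accents and map a few symbols to ASCII look-alikes."""
    text = "".join(EXTRA.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII becomes '?'.
    return stripped.encode("ascii", "replace").decode("ascii")


print(to_ascii("We c\u00e4n mak\u00eb it l\u00f6nger now"))
```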
Regular expressions as dictionary keys
The topic of using regular expressions as dictionary keys came up recently on comp.lang.python. (It's also come up in the past.) I had a need for this, but with dictionaries containing hundreds of keys, all the regular expression matching makes the straightforward implementation a dog. REDict.REDict uses a binary search of the keys to speed things up. has_key() is O(log len(d)) instead of O(len(d)). Using the REDict.FastREDict class, matching is more like O(1). More could probably be done (caching compiled regular expressions or optimizing the large generated regular expressions), but this suffices for the time being. (last updated 2003-10-22)
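For reference, the straightforward O(len(d)) approach that the entry above calls a dog might look like this. It is only the baseline, not the REDict or FastREDict implementation, and the class name is hypothetical.

```python
import re


class RegexDict:
    """Baseline: keys are patterns, lookup scans them in insertion order."""

    def __init__(self):
        self._items = []  # list of (compiled pattern, value) pairs

    def __setitem__(self, pattern, value):
        self._items.append((re.compile(pattern), value))

    def __getitem__(self, key):
        for regex, value in self._items:
            if regex.match(key):  # anchored at the start of key only
                return value
        raise KeyError(key)

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True


d = RegexDict()
d[r"foo\d+"] = "numbered foo"
d[r"ba[rz]"] = "bar or baz"
print(d["foo42"], "|", d["baz"], "|", "qux" in d)
```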
Bulk Discard of Queued Mailman Messages
A recent virus attack left me trying to manually discard a thousand or so messages per day for a Mailman-2.1 list I help administer. I wrote mmdiscard.py to deal with that from the command line. (last updated 2003-10-15)
Speeding up Python programs
(Moved to Python wiki.)
Date-parsing module
I wrote this module several years ago to recognize dates in many different formats on-the-fly. The most useful bit is the parse_date function. You probably won't want to use it as-is. Just nick the regular expressions. (last updated 2003-06-28)
Persistent Sets
2003-06-05. I just stumbled upon this. I thought it was so cool how easily Python's new Set objects (new in 2.3) could be made persistent that I thought it worth mentioning here.
Firewall-1 Logfile Summarizer
2003-05-29. This script summarizes the csv log file which can be dumped by Firewall-1 NG, at least as it exists on Solaris. It requires Python 2.3 or later, as it uses the csv module introduced in that version. It also requires Volker Tanger's fw1rules package, which is used to dump a csv file containing your rules.
CGI Environment Printer
2003-05-08. This has come in handy on a number of occasions since I wrote it several years ago. It simply displays information about the CGI environment. Compare that with your shell environment to help figure out why your CGI scripts don't work as expected. (More could be done, like displaying the path to the executable. For some reason, I never needed that.)
Dynamic Instruction Frequency Collector
2003-01-13. Every once in a while, someone on comp.lang.python wonders about optimizing some bit of Python bytecode. The discussion usually boils down to:
  1. You need to generate a dynamic execution profile (DXP) to decide if the optimization is worthwhile.
  2. Does someone already have some DXPs lying around?
  3. Utter silence.
DXPserver is an XML-RPC server which is meant to collect and distribute dynamic execution profiles. If you think it might be useful, let me know. I don't currently run it, but would be happy to if there were some demand.
Readline & command history
2002-11-08. I refer to this during interactive startup (it is one of the files which gets imported via PYTHONSTARTUP). I send it to people from time to time. It's a useful file and also demonstrates how to use the atexit module.
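The general pattern of combining readline with atexit is sketched below. It is not the file itself; the function name and the history path in the comment are assumptions.

```python
import atexit
import os

try:
    import readline
except ImportError:  # readline is not available on every platform
    readline = None


def enable_history(histfile):
    """Load saved history now and arrange to save it at interpreter exit."""
    if readline is None:
        return False
    try:
        readline.read_history_file(histfile)
    except OSError:
        pass  # nothing saved yet
    atexit.register(readline.write_history_file, histfile)
    return True


# A startup file named by PYTHONSTARTUP might then call, for example:
#     enable_history(os.path.expanduser("~/.pyhist"))
```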
Marshal written in Python
2002-10-03. Guido sent me a version of the marshal module written in Python a few years ago. (I no longer remember why.) Once when I encountered a corrupted marshal file I modified it to not raise an exception when encountering an error during load(). Instead it returns what it has accumulated up to that point. Warning: Do not install this as marshal.py! If you do, you will almost certainly live to regret that mistake!
Category deletion for ifile
2002-09-21. I've been experimenting with ifile recently and flubbed some Emacs macros which I was using to categorize incoming messages. I thus wound up with some bogus categories in my .idata file. This script allows you to delete arbitrary categories from .idata files.
Alarms for asyncore
2002-01-23. I recently had a reason to start using asyncore. It's a marvelous package for doing I/O with several network sockets. One of the first things I wanted to do after getting it working was implement alarms. signal.alarm is ugly and may not work everywhere anyway, so I took advantage of the fact that asyncore uses the timeout feature of select() and poll().
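The timeout trick can be sketched independently of asyncore (which is no longer part of the standard library as of Python 3.12). The class below is hypothetical: it keeps pending alarms in a heap and reports how long the event loop may block before the next one is due.

```python
import heapq
import time


class AlarmQueue:
    """Hypothetical helper: schedule callbacks, derive a select() timeout."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._heap = []  # entries are (due_time, sequence, callback)
        self._seq = 0    # tie-breaker so callbacks are never compared

    def set_alarm(self, delay, callback):
        heapq.heappush(self._heap, (self._clock() + delay, self._seq, callback))
        self._seq += 1

    def next_timeout(self, default=30.0):
        """Seconds the loop may block before the next alarm is due."""
        if not self._heap:
            return default
        return max(0.0, min(default, self._heap[0][0] - self._clock()))

    def fire_due(self):
        """Run every callback whose time has come; return how many ran."""
        fired = 0
        while self._heap and self._heap[0][0] <= self._clock():
            _, _, callback = heapq.heappop(self._heap)
            callback()
            fired += 1
        return fired
```

An event loop would pass next_timeout() as the timeout of each select() or poll() call and then call fire_due() when that call returns.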
Weekend Edition Sunday Puzzle
2001-10-14. I listen on occasion to NPR's Sunday Weekend Edition. Perhaps the best segment of the show is the Puzzle run by Will Shortz. On October 7th, 2001, this challenge was posted:
Draw a 4 by 3 box. The object is to fill it with letters spelling 3 four-letter words across and 4 three-letter words reading down. The conditions: your box can not repeat any letters, and it must use all six vowels (a, e, i, o, u, y) once. All words must be uncapitalized, common English words.
The code in nprpuzzle.py solves this problem using a straightforward O(N**3) algorithm. I don't claim it's the best way to approach the problem, but it was a fun diversion for a Sunday. It uses my little progress module to track progress.
Locate Division Operators
2001-08-13. With the coming change to the semantics of integer division you'll probably want to run something like finddiv.py over your code to identify potential trouble spots. It does nothing more than identify lines containing a "/" operator. It doesn't perform any analysis to try and prune the possible list of lines it displays. It does display lines in a format that Emacs's next-error command understands.
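A comparable scan can be sketched with the tokenize module, which avoids flagging a "/" inside a string or comment. This is an alternative approach rather than the finddiv.py code; the function name is made up, and the "file:line: text" output follows the usual compiler-message layout that next-error parses.

```python
import io
import tokenize


def find_divisions(source, filename="<string>"):
    """Return 'file:line: text' entries for each '/' or '/=' operator."""
    hits = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.OP and tok.string in ("/", "/="):
            hits.append("%s:%d: %s" % (filename, tok.start[0], tok.line.strip()))
    return hits


sample = "a = 1 / 2\nb = '/'\nc = 3 // 2\n"
for hit in find_divisions(sample, "sample.py"):
    print(hit)
```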
Editor Support for Python
2001-09-10. This is no longer maintained by me. You will be redirected to the new editors page on the main Python website.
ConstantMap.py - map numeric constants to their names
The ConstantMap.ConstantMap class can be instantiated from modules of constants to map "magic numbers" back to their names. This is useful when debugging code that returns such numbers. For example, the numeric constant modules generated by the h2py script all map semi-meaningful names to mostly meaningless numbers. ConstantMap allows you to map them back. (last updated 2004-03-07)
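The reverse mapping can be illustrated with the standard errno module. This sketch is not the ConstantMap class; the function name is hypothetical, and it returns a list per value because several names can share one number.

```python
import errno


def constant_names(module):
    """Map each integer constant in module to the names bound to it."""
    names = {}
    for name, value in vars(module).items():
        if name.isupper() and isinstance(value, int) and not isinstance(value, bool):
            names.setdefault(value, []).append(name)
    return names


errno_names = constant_names(errno)
print(errno.ENOENT, "->", errno_names[errno.ENOENT])
```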
Watch - keyboard/mouse monitor
This Python script (hosted at SourceForge) monitors keyboard and mouse activity and enforces work and rest times. It currently only runs on Linux, but it has run on Windows in the past (only directly monitoring mouse activity) and could probably run on the Mac without a lot of effort.
Soundex module
2000-12-22. This module is a Python replacement for the now defunct soundex.c. This module is a merging of separate ones written by Tim Peters and Fred Drake.
SYLK file reader
2000-10-10. This module reads SYLK files and generates CSV files. Note that it currently has only been tested with files generated from AppleWorks 5.0 on a Mac.
Rough Size Calculator
2000-09-27. There are three general sources of memory leaks in long-running Python programs: cyclical objects that reference counting can't reclaim, botches at the low-level malloc interface, and growth of container objects that are reachable, but whose growth you're unaware of. Neil Schemenauer's garbage collector in Python 2.0 does a good job identifying cyclical garbage. This module attacks the third case. The test case uses the Cache module below.
Simple Caching Dictionary
2000-09-27. Sometimes you need to cache results of long computations or database queries, but don't want your memory consumption to grow without bound. The Cache class subclasses UserDict.UserDict to provide a cache that discards values based on access time.
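A small sketch of access-time-based eviction follows, written against today's collections.UserDict. It is not the Cache class itself; the class name, the max_age parameter and the explicit expire() call are assumptions chosen to keep the example short.

```python
import time
from collections import UserDict


class ExpiringCache(UserDict):
    """Hypothetical cache that drops entries not touched within max_age."""

    def __init__(self, max_age=60.0, clock=time.time):
        super().__init__()
        self.max_age = max_age
        self._clock = clock
        self._last_access = {}

    def __setitem__(self, key, value):
        self.data[key] = value
        self._last_access[key] = self._clock()

    def __getitem__(self, key):
        value = self.data[key]  # raises KeyError for unknown keys
        self._last_access[key] = self._clock()
        return value

    def __delitem__(self, key):
        del self.data[key]
        del self._last_access[key]

    def expire(self):
        """Discard stale entries; return how many were removed."""
        cutoff = self._clock() - self.max_age
        stale = [k for k, t in self._last_access.items() if t < cutoff]
        for key in stale:
            del self[key]
        return len(stale)
```

Calling expire() periodically, or at the top of each lookup, keeps memory bounded by how much is touched within max_age seconds.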
XML-RPC validation suite
2000-06-05. This server passes the XML-RPC validation suite as implemented at validator.xmlrpc.com as of June 5th, 2000.
Adding gzip encoding capability to XML-RPC clients and servers
2000-04-19. The instructions in gzip-xmlrpc.txt describe simple mods to XML-RPC servers and clients to allow responses to be encoded using gzip when possible. This can help performance significantly when using XML-RPC over wide area networks. You can also download the version of xmlrpclib.py that I use which includes one or two other mods. It is based on version 0.9.8 of Fredrik Lundh's xmlrpclib package.
Manipulating recurring dates
1999-08-09. Recurring dates occur frequently in my business, e.g., "The Bill Baldwin Trio appears every Thursday at 10pm". recur.py is a crack at the problem. It allows you to intersect two recurring dates or generate a finite subset of dates that fit the recurrence pattern. There's a long comment at the start that's not much more than me thinking out loud, followed by a fairly small amount of code. Feedback is much appreciated. Experimental (last updated 1999-08-09)
Validation of CGI script parameters
1999-09-30. The cgi module that comes with Python eliminates the tedium of marshalling script parameters from the HTTP input stream. This module enhances that with type checking. You can indicate which parameters are required, which are optional and what their types must be. Type information can be given in the input form, in an auxiliary file on the server or in the CGI script itself. The latest version also handles multi-valued parameters such as would be encountered with <select> tags having the multiple attribute.
Finite State Machine
2001-09-06. I recently added the ability for the states to be regular expression objects. This makes it easier to match some inputs in a case-insensitive manner. I'm sure creative folks will find other weird uses for the capability. (Note that when testing for re matches, it simply loops through the possible inputs for that state. The first re that matches the input is considered to match. If you have multiple re's that match a particular input, which one gets picked is non-deterministic.)
Browsable Python sources
I find it convenient to use this to refer to individual files from web pages. I used to tar it all up and use a Python CGI script called tgzextr.py to pull out individual files, but I now have enough disk space to keep the sources lying about... :-) This directory is just a snapshot of the CVS repository. (last updated 1999-09-30)
Simple-minded TCP client and server
I wrote a small client and server to test TCP connection and transmission speeds using either AF_INET or AF_UNIX sockets. Some implementations of AF_INET sockets degrade as you perform more and more connections in a short period of time. This is presumably due to linear search for an available port and the fact that TCP requires sockets to hang around for a while after closing to catch late-arriving packets. (last updated 1996-06-07)
Parser for robots.txt files
Writing a Web wanderer in Python? Here's a little piece of code to help you along. Delivered with the Python distribution now.

Demo Scripts

The same questions seem to pop up from time to time. Here are some short scripts that demonstrate various Python modules or features.

Submitting URLs through a proxy
The Mapblast script takes two arguments, a city and a state or Canadian province, and tries to extract the city's lat/long information from the Mapblast web server. If it succeeds, it displays a line with the city, state, latitude and longitude separated by colons. If the city name is slightly misspelled (e.g. Pittsburg instead of Pittsburgh), the Mapblast server does a fair job of correcting your spelling. This script displays what Mapblast said, not what you wrote... (last updated 1998-05-12)
Submitting a web form using the POST method
The sony.py script accepts a key representing a single Sony Music artist on stdin and submits it to Sony's artist search engine by mimicking the POST method submission that a user would trigger by searching for their favorite artist. It's one stage in a pipeline that looks something like:
      httpget http://www2.music.sony.com/musicdb/TourInfo | \
          egrep -i 'DJ HONDA' | \
          sed -e 's:<OPTION VALUE="\([0-9]\+\)">.*:\1:' | \
          sony.py | \
          ....
(last updated 1998-05-17)

More or Less Obsolete Stuff

Getting interactive help about objects
I include the two functions in help.py in my .pythonrc file. During interactive sessions I can then execute help(foo) for an arbitrary object foo. If it's a Python function or method, it will display its declaration. If it has a doc string it will display that as well.
Warning about return statement usage
This patch to .../Python/compile.c causes the Python byte code compiler to print warnings if it encounters inconsistent use of the return statement within a function. This is experimental code. Use at your own risk! (last updated 1999-09-10)
Warning about hiding builtin names
This script takes a list of module names and tells you which top-level functions have local variables that hide builtin objects with the same name. (last updated 1999-09-30)
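The check can be approximated by comparing each function's local variable names against the builtins module. This is a sketch for current Python, not the script above; the function name is hypothetical, and it only looks at plain top-level functions.

```python
import builtins
import types


def shadowed_builtins(module):
    """Map function name -> sorted local names that hide a builtin."""
    builtin_names = set(dir(builtins))
    report = {}
    for name, obj in vars(module).items():
        if isinstance(obj, types.FunctionType):
            # co_varnames holds the arguments and local variables.
            hidden = sorted(set(obj.__code__.co_varnames) & builtin_names)
            if hidden:
                report[name] = hidden
    return report


# A throwaway module to try it on.
demo = types.ModuleType("demo")
exec("def f(list, x):\n    str = 1\n    return x\n", demo.__dict__)
print(shadowed_builtins(demo))  # {'f': ['list', 'str']}
```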
Paper for SPAM-7
This paper describes some initial work I did with a peephole optimizer for Python byte code. (last updated 1998-11-22)
Faster stack manipulation
This patch against the 1.5.1 source distribution reduces the number of temporary variables and PUSH/POP operations performed by the Python interpreter. It also cleans up a few other miscellaneous nits. See README.stack file for a summary of the changes. Experimental (last updated 1998-07-05)
Peephole optimizer for Python byte code
This patch against the 1.5.1 source distribution adds a peephole optimizer to Python. See README.peep file for an introduction. Experimental (last updated 1998-07-02)
DOS-ification of file names
1996-12-19. I wrote this short script to map long filenames to DOS 8.3 names. I imagine there are some errors in the assumptions about what can go in a DOS filename. Feedback is welcome. (last updated 1996-12-19)
Test coverage tool
This little knock-off of the profile module provides test coverage of Python scripts in the spirit of Sun's now ancient (and probably defunct) tcov tool. This is superseded by the version incorporated into the standard Python library.
Experimental threaded web server
This server avoids forking by mapping URLs to modules and then calls the appropriate function in the handler thread. This was an interesting exercise, but you should probably use Medusa instead.
Quick Index of Python Library Reference
I got tired of trying to find the names of modules in the Python Library Reference table of contents, so I wrote a little script to create an alphabetically organized table that makes it easier (for me at least) to find what I'm looking for.
Python support for VTK 2.0
The above gzip'd diffs for VTK 2.0 support integration of VTK with Python (85839 bytes). It was obsoleted by a rewrite of the VTK wrapper code generator, so if you are running a more recent version of VTK than what was released in mid-May 1998, this won't work.
Enhanced urllib module
CGI scripts tend to do lots of quoting and unquoting. I wrote a small enhancement to Python's urllib module that migrates urllib.quote and urllib.unquote into the urlop C module. The test1 function at the bottom of the urllib module runs 50-70 times faster using the urlop module on my P100/BSDI system.
Improved (?) allocation of short strings (Feedback welcome)
I wrote a special-purpose allocator for short string objects (short being <= 128 bytes, including the object header). A modified version of stringobject.c contains the whole thing. A modified version of stropmodule.c defines a "counts" function that can be called to retrieve a list of counters that track creation and deletion of, respectively, large, <=32-byte, <=64-byte, <=128-byte strings. Note that this is not yet ready for prime time! This is just experimental code at this point. It is not well enough packaged yet.
Partial C implementation of the regsub module
This is another not-quite-ready-for-prime-time module. It only implements regsub.sub and regsub.gsub at this point. I just checked my version of regsub.py and noticed I'm not even using it... I suspect there's a bug lurking in there but punted on trying to find it. Whoever wants to take on the challenge, here it is.
SPAM-1 presentation on Python/C++
Ages and ages ago (1994! That's fourteen Internet years as I write this in September 1996) it seems I was doing some Python/C++ integration. It's been superseded by lots of other work. Still, some may find the concepts presented useful.


Skip Montanaro
skip@pobox.com


Last modified: Thu Oct 8 19:47:39 MDT 2009