The accompanying patch file (optimizer.patch) implements a peephole optimizer for Python byte code. When applied, invoking the Python interpreter with the -O flag will run the optimizer on the generated byte code before writing a .pyo file. The optimizer results in small performance gains in many areas. Attribute access and complex number manipulation appear to be the most enhanced. The optimizer proper is written entirely in Python although some changes were made to ceval.c, compile.c and intobject.c. This means it's rather slow at what it does but easily modified. Thanks to Tim Peters for many useful suggestions about how to organize the optimizer and particular optimizations to try. Thanks to Marc-Andre Lemburg for pybench, which allowed me to isolate improvements and problems. Here is a brief summary of the changes: * ceval.c - A number of new instructions were added: - LOADI - a fast load instruction for small positive integers (those referenced by the small_ints array in intobject.c). - LOAD_NONE - a fast load instruction for the constant None. - LOAD_ATTR_FAST - a fast load instruction for attributes where the object name is in the local scope. - STORE_ATTR_FAST - a fast store instruction for attributes where the object name is in the local scope. - LOAD_TWO_FAST - a fast load instruction for loading two local variables at a time. - STORE_TWO_FAST - a fast store instruction for storing two local variables at a time. * dis.py - The disassembler has been modified to accomodate the new instructions. * opcodes.py - A new module that defines opcode information (values, types, etc). Used by both dis.py and optimize.py. * optimize.py - Each optimization is defined as a subclass of the OptimizeFilter base class and performs a single optimization (or small set of strongly related optimizations). Optimizers are chained together in a pipeline. A code object is fed in the front of the pipeline and emitted out the back. The user can set the OPTLEVEL and/or OPTSTATS environment variables to affect the behavior of the optimizer a bit. See the source code for details. Here are the optimizations currently implemented. - DeadCodeRemover - deletes code in any basic block that follows an unconditional transfer - LineNumberRemover - deletes SET_LINENO instructions. Currently not being exercised because the basic optimization in compile.c already does this. - LoadStoreCompressor - combines pairs of LOAD_FAST or STORE_FAST instructions into LOAD_TWO_FAST or STORE_TWO_FAST instructions where possible. (Local variable reference must be < 256.) - MultiLoadEliminator - converts n loads of the same item into a single load and n-1 DUP_TOP instructions. - StoreLoadEliminator - converts a STORE followed immediately by a LOAD of the same item into a DUP_TOP/STORE pair. - ConstPopEliminator - deletes LOAD_CONST/POP_TOP pairs. - JumpNextEliminator - deletes JUMPs to the next block that are the last instruction of the current block. - JumpOptimizer - optimizes JUMPs that are to other unconditional jumps. - TupleRearranger - reorganizes BUILD_TUPLE n/UNPACK_TUPLE n/STORE/ ... sequences to eliminate the BUILD and UNPACK and reorder the STOREs. - ConstantShortcut - maps LOAD_CONST n, where n refers to a small positive integer, into LOADI n instructions, whose execution directly accesses the small_ints cache in intobject.c. - ConstantExpressionEvaluator - precalculates constant expressions. This is especially helpful if you use complex numbers. * intobject.c - This file has been modified to expose the small_ints cache to the interpreter. The optimizer generates LOADI instructions for small positive integers. LOADI is faster than the equivalent LOAD_CONST instruction. One significant change to its overall behavior was to initialize the entire small_ints array in the beginning instead of one-by-one as constants are encountered. This code probably bears some scrutiny by another pair of eyes. * compile.c - Hooks were added to import and execute the optimizer when the -O flag is given. It is not an error for the optimize module to not be located. Optimization is just not done. A quickie disassembler is available (compile-time flag) to help debug the optimizer's actions. (If you screw up the optimizer you will find it difficult to run the regular disassembler. This disassembler doesn't require working byte code.) If you have any questions or problems or would like to contribute new optimization filters or ideas, contact me at skip@calendar.com. Skip Montanaro