I've been around the block a couple of times (well, maybe one and a half
times) when it comes to integrating C++ with an interpreted very high-level
language (VHLL). Here I discuss some of the experiences I've had and the
problems I've encountered, in the hope that they will stimulate discussion
during the "Interfacing to C++ and inheriting built-in types" session at the
Python workshop.
Background
While at GE's R&D Center, I worked on a VHLL called LYMB. Python and
LYMB have very similar architectures: a high-level interpreted language used
for application development, with much of the underlying system implemented
in C. The major differences between LYMB and Python are the language syntax
and where most classes are implemented (in C versus the VHLL).
Several significant applications were developed in LYMB, including a
scientific visualization tool called VISAGE (VISualization
And Graphics Environment)[1]. The ability to easily modify the application
(even on the fly) and to create new combinations of visualization operators
and new user interface subsystems convinced us that this was the way to go.
Customers asked about C++ again and again (even though most of them weren't
using it and many had no immediate plans to; it was just "the next thing").
They worried about our mechanisms for creating classes in C (which aren't
all that different from what C++ does, but simpler). We finally got a
customer who said, "We use C++, and we'll fund some development".
We did some C++ integration work at GE CRD that convinced me this was a
fruitful path, especially after I saw how cumbersome programming in C++
could be at times. Getting LYMB away from GE was like rescuing Jonah from
the whale, however, so I let the whale have Jonah and submitted an SBIR
proposal to the NSF to look at the problem.
The original plan was to write a new VHLL to front C++ that would be
well integrated with it but would be a better application development language.
Quite frankly, I was a bit worried about designing and implementing a new
language and trying to get it accepted by the community. Then I discovered
Python.
Evaluating Python
I fiddled around with Python for a little bit. Classes were written in
Python, not C like in LYMB. This is better from an application development
perspective, since you'd like to be able to use the same organizational
approach for the application as a whole as you use for the application's
underpinnings. (LYMB does have a metaclass class, which allows classes to
be developed in the LYMB scripting language, but its implementation is not
as clean as Python's class mechanism.) Namespaces are no problem in Python;
LYMB has no construct similar to Python's module. Internally, Python looked
pretty similar to LYMB, with which I was quite familiar. It even had
multiple inheritance. (One of the LYMB/C++ problems was mapping C++'s
multiple inheritance model onto LYMB's single inheritance model.) "By
George!", I thought. "This might actually work!"
First Steps
Once I chose Python as my VHLL (conceptually creating Python++), I had to
actually implement something. The goal of the research is to see how much
of the conversion we can do automatically, where we can't, and how we can
get over those places. My first task was to generate an interface manually
to a couple of simple classes. I chose the RNG and MLCG
classes in the Free Software Foundation's libg++ class library.
They had a number of interesting traits:
- Both classes were small, so manual coding wouldn't take forever.
- All visible parameter and return value types were simple (int, float,
double).
- Inheritance had to be dealt with (RNG is the base class for MLCG).
After converting RNG and MLCG manually and seeing
that the Python versions behaved as expected, I
decided it was time to try automating the process.
So I started hacking some Python. I first wrote a tokenizer class
(nominally written in Python, but mostly sed and
cpp). Next came a simple class parser. Finally, an interface
generator.
For the prototype, the basic approach was to implement a C module for each
class, and worry about proper subclassing in a later phase. The code for
the MLCG module shows the result.
Each wrapper object keeps a private pointer to the C++ object, a pointer to
its type (for type checking), and a flag indicating whether the wrapper is
the "owner" of the pointer and is therefore responsible for deleting it.
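typedef struct _CPPobject {
    OB_HEAD                  /* standard Python object header */
    void *__cpp_inst__;      /* the wrapped C++ instance */
    typeobject *__type__;    /* its type, used for type checking */
    int __iown__;            /* nonzero if this wrapper owns the instance */
} CPPobject;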
Learning to Walk
At this stage, I knew I could convert simple classes, but there were several
tasks I hadn't yet approached and would have to get some experience with. I
chose a significantly more complex set of classes for my next experiment and
began refining my hacked-together module generator. I picked a set of
about 200 classes written by some friends as demo software for a book they
are working on. I took one of their simpler demo programs and tried
wrapping just the classes used in the application. That worked okay, except
for two problems. First, there were some constructs my generator couldn't
handle, which I expected to find. Second, there were some classes behind
the scenes (base classes and such) that my software assumed were wrapped.
The first wasn't so bad, because for the prototype I was resolved to do some
manual fix-up. The second problem was bigger. I added the half dozen or so
unwrapped classes to the list of classes to interface and turned the crank
again. Hmmm, there are some more undefined modules. One more time... By
the time I grew tired of this exercise, I needed to generate interfaces to
a very large part of the entire library. That manual fix-up problem
suddenly got a whole lot bigger.
Any sane person would have gone back to square one and figured out a way to
not wrap that first set of "indirect" classes. Instead, I decided to tackle
the interface problems that could not be handled by my existing interface
generator. Here are some of the problems I faced.
- Compound arguments
- Passing a pointer and a count as two separate arguments often serves as
a poor man's vector in C. In mapping Python arguments into C++ and back, I
encountered this several times.
- Pointers in general
- Pointers raise several problems:
- Who allocated them?
- Who should delete them?
- What do they point to, and how much of that has to be mapped
between Python and C++?
- What about function pointers?
- What about pointers masquerading as vectors?
- Overloaded operators
- It's difficult to see how more than a few operators will map into Python.
Somewhere along the way, I think C++ programmers will have to realize that
their code might get used outside a C++ environment and make sure full class
functionality is implemented without operators. Punt.
- Templates
- A primary motivation for templates in C++ is to implement type-safe
container classes[2]. To paraphrase the bandit in Humphrey Bogart's The
Treasure of the Sierra Madre, "We don't need no steenking collections":
Python's lists and dictionaries already fill that role. Besides, since
templates are a compile-time mechanism, it's difficult to see how they could
be integrated cleanly into a run-time facility. Eventually something will
probably have to be done, but for now we punt.
- Saving information
- Somehow, you have to tell the interface generator
how to handle constructs for which it has incomplete information. You'd
like to only have to give that information once. For the first prototype I
used Emacs and the Patch program, but that won't work forever.
- True Subclassing
- I think this can be handled with two small changes to Python's internals.
First, the interpreter's mechanism for implementing classes needs to be
formally exposed to the C programmer. Second, we need to create a C++
analogue of Python's classobject which, coupled with the external C++ class
of interest, serves as a foundation for the class the interface generator
generates. Because of the properties inherited from its parents, it should
be possible to derive new classes from it in Python.
- Code Length
- The current method of repeated inline calls to getarg() and err_clear()
(try getarg() against one signature; if it fails, call err_clear() and try
the next) was straightforward to implement, but if there are many versions
of an overloaded function, the code necessary to implement it gets rather
large (and potentially slow if calling getarg() is expensive). It would be
nice if there were some sort of table-driven mechanism available to identify
exactly which version of the function should be called.
When Do We Get to Run?
For my work there is still the minor problem of securing follow-on funding,
but presuming that is successful, I have a number of ideas about how to
solve the above problems. Assuming they prove tractable (and I think they
will), there are more problems I haven't yet considered, some technical and
some more marketing-related:
- How do we get Python++ accepted in the programming community at large?
- How about programming support?
- Can you create a debugger that seamlessly crosses the boundary between
Python and C++?
I'm sure there are others. My hope is we can unearth the major ones during
the workshop and perhaps establish strategies for addressing them.
Slides
PostScript versions of the couple of slides I will toss on the projector at
the workshop are available as well.
References
1. W. J. Schroeder, W. E. Lorensen, G. D. Montanaro, and C. Volpe. VISAGE: An Object-Oriented Scientific Visualization System. In Proceedings of the Visualization '92 Conference, pp. 219-225, 1992.
2. B. Stroustrup. The C++ Programming Language. Addison-Wesley, 1991.