I've been around a couple of times (well, maybe one and a half times) when it comes to integrating C++ with an interpreted very high-level language (VHLL). Here I discuss some of the experiences I've had and the problems I've encountered, in the hope that they will stimulate discussion during the "Interfacing to C++ and inheriting built-in types" session at the Python workshop.

Background

While at GE's R&D Center, I worked on a VHLL called LYMB. Python and LYMB have very similar architectures: a high-level interpreted language used for application development, with much of the underlying system implemented in C. The major differences between LYMB and Python are the language syntax and where most classes are implemented (in C versus the VHLL).

Several significant applications were developed in LYMB, including a scientific visualization tool called VISAGE (VISualization And Graphics Environment)[1]. The ability to easily modify the application (even on-the-fly), create new combinations of visualization operators and new user interface subsystems convinced us that this was the way to go.

Customers asked about C++ again and again (even though most of them weren't using it and many had no immediate plans - it was just "the next thing"). They worried about our mechanisms for creating classes in C (which aren't all that different from what C++ does, but simpler). We finally got a customer who said, "We use C++, and we'll fund some development".

We did some C++ integration work at GE CRD that convinced me this was a fruitful path, especially after I saw how cumbersome programming in C++ could be at times. Getting LYMB away from GE was like rescuing Jonah from the whale, however, so I let the whale have Jonah and submitted an SBIR proposal to the NSF to look at the problem.

The original plan was to write a new VHLL to front C++ that would be well-integrated with it, but a better application development language. Quite frankly, I was a bit worried about designing and implementing a new language and trying to get it accepted by the community. Then I discovered Python.

Evaluating Python

I fiddled around with Python for a little bit. Classes were written in Python, not C like in LYMB. This is better from an application development perspective, since you'd like to be able to use the same organizational approach for the application as a whole as you use for the application's underpinnings. (LYMB does have a metaclass class, which allows classes to be developed in the LYMB scripting language, but its implementation is not as clean as Python's class mechanism.) Namespaces are no problem in Python, whereas LYMB has no construct similar to Python's module. Internally, Python looked pretty similar to LYMB, with which I was quite familiar. It even had multiple inheritance. (One of the LYMB/C++ problems was mapping C++'s multiple inheritance model onto LYMB's single inheritance model.) "By George!", I thought. "This might actually work!"

First Steps

Once I chose Python as my VHLL (conceptually creating Python++), I had to actually implement something. The goal of the research is to see how much of the conversion we can do automatically, where we can't, and how we can get over those places. My first task was to generate an interface manually to a couple of simple classes. I chose the RNG and MLCG classes in the Free Software Foundation's libg++ class library. They had a number of interesting traits:

After converting RNG and MLCG manually and seeing that the Python versions behaved as expected, I decided it was time to try automating the process. So I started hacking some Python. I first wrote a tokenizer class (nominally written in Python, but mostly sed and cpp). Next came a simple class parser. Finally, an interface generator.
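As a rough illustration of the "simple class parser" stage, the sketch below pulls a class name, its base classes, and its public member function signatures out of an already preprocessed C++ declaration. It is not the original generator. The regular expressions, the `parse_class` function, and the `MLCG`-like sample declaration are all assumptions made for the example, and it copes with only a tiny subset of C++.

```python
import re

# Hypothetical, simplified declaration; the real libg++ header differs.
SAMPLE = """
class MLCG : public RNG {
public:
    MLCG(long seed1, long seed2);
    unsigned long asLong();
    void reset();
private:
    long seedOne;
};
"""

# class NAME [: BASES] { BODY };
CLASS_RE = re.compile(r"class\s+(\w+)\s*(?::\s*([^{]+))?\{(.*?)\};", re.S)
# [RETURN TYPE] NAME ( ARGS ) ;
METHOD_RE = re.compile(r"([\w\s\*&]*?)\b(\w+)\s*\(([^)]*)\)\s*;")


def parse_class(source):
    """Return (name, bases, public_methods) for the first class in source."""
    match = CLASS_RE.search(source)
    if match is None:
        raise ValueError("no class declaration found")
    name, base_spec, body = match.group(1), match.group(2) or "", match.group(3)
    # Keep only the class name from entries such as "public RNG".
    bases = [b.split()[-1] for b in base_spec.split(",") if b.strip()]
    methods = []
    access = "private"  # C++ default access for 'class'
    for line in body.splitlines():
        line = line.strip()
        if line in ("public:", "private:", "protected:"):
            access = line[:-1]
            continue
        m = METHOD_RE.match(line)
        if m and access == "public":
            return_type = m.group(1).strip()  # empty for constructors
            args = [a.strip() for a in m.group(3).split(",") if a.strip()]
            methods.append((return_type, m.group(2), args))
    return name, bases, methods
```

An interface generator could then walk the returned list and emit one wrapper function per entry, leaving anything the parser could not recognize for manual fix-up.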

For the prototype, the basic approach was to implement a C module for each class, and worry about proper subclassing in a later phase. The code for the MLCG module shows the result. Each module keeps a pointer to the C++ object as a private variable, a pointer to its type (for type checking), and a flag indicating if this module is the "owner" of the pointer.

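    typedef struct _CPPobject {
        OB_HEAD                 /* standard Python object header */
        void *__cpp_inst__;     /* pointer to the wrapped C++ object */
        typeobject *__type__;   /* type of the wrapped object, for type checking */
        int __iown__;           /* nonzero if this module owns the pointer */
    } CPPobject;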
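To make the role of the three fields concrete, here is a small Python model of the same bookkeeping. It is only a sketch: the `CppHandle` class, the `destroy` callback, and the rule that only the owning wrapper destroys the C++ object are assumptions about how such a flag would typically be used, not a description of the generated module.

```python
class CppHandle:
    """Python stand-in for the CPPobject struct (illustrative only)."""

    def __init__(self, inst, type_name, owns, destroy):
        self.inst = inst            # plays the role of __cpp_inst__
        self.type_name = type_name  # plays the role of __type__
        self.owns = owns            # plays the role of __iown__
        self._destroy = destroy     # hypothetical stand-in for C++ delete

    def check_type(self, expected):
        # Reject handles of the wrong class before touching the pointer.
        if self.type_name != expected:
            raise TypeError("expected %s, got %s" % (expected, self.type_name))
        return self.inst

    def release(self):
        # Assumed policy: only the owner frees the underlying object.
        if self.owns and self.inst is not None:
            self._destroy(self.inst)
        self.inst = None
```

Under this assumed policy, two handles can refer to the same C++ object as long as exactly one of them is created with `owns` set, so the object is destroyed once no matter which handle is released first.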

Learning to Walk

At this stage, I knew I could convert simple classes, but there were several tasks I hadn't yet approached and would have to get some experience with. I chose a significantly more complex set of classes for my next experiment and began refining my hacked-together module generator. I chose a set of about 200 classes written by some friends as demo software for a book they are working on. I took one of their simpler demo programs and tried wrapping just the classes used in the application. That worked okay, except for two problems. First, there were some constructs my generator couldn't handle, which I expected to find. Second, there were some classes behind the scenes (base classes and such) that my software assumed were wrapped. The first wasn't so bad, because for the prototype I was resolved to do some manual fix-up. The second problem was bigger. I added the half dozen or so unwrapped classes to the list of classes to interface and turned the crank again. Hmmm, there are some more undefined modules. One more time... By the time I grew tired of this exercise, I needed to generate interfaces to a very large part of the entire library. That manual fix-up problem suddenly got a whole lot bigger.
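The repeated "turn the crank" cycle described above amounts to computing a transitive closure over class dependencies. The sketch below shows that computation, assuming a dependency table (each class name mapped to the classes its interface mentions) is already available. The `classes_to_wrap` function and the class names are invented for illustration.

```python
from collections import deque


def classes_to_wrap(requested, depends_on):
    """Return every class reachable from `requested` via `depends_on`."""
    needed = set(requested)
    queue = deque(requested)
    while queue:
        cls = queue.popleft()
        for dep in depends_on.get(cls, ()):
            if dep not in needed:
                needed.add(dep)
                queue.append(dep)
    return needed


# Hypothetical dependency table: base classes and argument types.
DEPENDS_ON = {
    "Renderer": ["Object", "Camera"],
    "Camera": ["Object", "Transform"],
    "Transform": ["Matrix"],
    "Matrix": ["Object"],
}
```

Asking for a single class in this toy table pulls in every other class, which mirrors the experience of ending up with a very large part of the library on the list.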

Any sane person would have gone back to square one and figured out a way to not wrap that first set of "indirect" classes. Instead, I decided to tackle the interface problems that could not be handled by my existing interface generator. Here are some of the problems I faced.

Compound arguments
Passing a pointer and a count as two separate arguments often serves as a poor man's vector in C. In mapping Python arguments into C++ and back, I encountered this several times.
Pointers in general
Pointers raise several problems:
Overloaded operators
It's difficult to see how more than a few operators will map into Python. Somewhere along the way, I think C++ programmers will have to realize that their code might get used outside a C++ environment and make sure full class functionality is implemented without operators. Punt.
Templates
A primary motivation for templates in C++ is to implement type-safe container classes[2]. To paraphrase Humphrey Bogart, "We don't need no steenking collections." Besides, since templates are a compile-time mechanism, it's difficult to see how they could be integrated cleanly into a run-time facility. Eventually something will probably have to be done, but for now we punt.
Saving information
Somehow, you have to tell the interface generator how to handle constructs for which it has incomplete information. You'd like to only have to give that information once. For the first prototype I used Emacs and the Patch program, but that won't work forever.
True Subclassing
I think this can be handled with two small changes to Python's internals. One, the mechanism for implementing classes by the interpreter needs to be formally exposed to the C programmer. Two, we need to create a C++ analogue of Python's classobject which, coupled with the external C++ class of interest, serves as a foundation for a class the interface generator generates. Because of the properties inherited from its parents, it should be possible to derive new classes from it in Python.
Code Length
The current method of repeated inline calls to getarg() and err_clear() was straightforward to implement, but if there are many versions of an overloaded function, the code necessary to implement it gets rather large (and potentially slow if calling getarg() is expensive). It would be nice if there were some sort of table-driven mechanism available to identify exactly which version of the function should be called.
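One possible shape for a table-driven mechanism of the kind wished for under Code Length is sketched here in Python rather than in generated C. Each overloaded C++ function gets a table keyed by argument types, and a single lookup replaces the chain of trial argument parses. The `OverloadTable` class, its methods, and the exact-type matching rule are assumptions for illustration; real C++ overload resolution also considers conversions, which this ignores.

```python
class OverloadTable:
    """Map tuples of argument types to the implementation to call."""

    def __init__(self, name):
        self.name = name
        self._table = {}

    def register(self, arg_types, func):
        # One row per overloaded version of the function.
        self._table[tuple(arg_types)] = func

    def __call__(self, *args):
        # Build the lookup key from the types of the actual arguments.
        key = tuple(type(a) for a in args)
        try:
            func = self._table[key]
        except KeyError:
            raise TypeError("no version of %s accepts %r" % (self.name, key))
        return func(*args)


# Hypothetical stand-ins for two C++ overloads of a "seed" member function.
seed = OverloadTable("seed")
seed.register((int,), lambda n: ("seed(long)", n))
seed.register((int, int), lambda a, b: ("seed(long, long)", a, b))
```

With this layout the cost of selecting a version is one dictionary lookup regardless of how many overloads exist, and the generator only has to emit table rows instead of nested parse attempts.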

When Do We Get to Run?

For my work there is still the minor problem of securing follow-on funding, but presuming that is successful, I have a number of ideas about how to solve the above problems. Presuming they prove tractable (and I think they will), there are more problems I haven't yet considered, some technical and some more marketing:

I'm sure there are others. My hope is we can unearth the major ones during the workshop and perhaps establish strategies for addressing them.

Slides

There are PostScript versions of the couple of slides I will toss on the projector at the workshop available as well. They are:

References

1. W. J. Schroeder, W. E. Lorensen, G. D. Montanaro, and C. Volpe. VISAGE: An Object-Oriented Scientific Visualization System. In Proceedings of the Visualization '92 Conference, pp. 219-225, 1992.

2. Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, 1991.


Skip Montanaro
Email: skip@automatrix.com