Introduction to KLUDGE

1. What's KLUDGE?

It's terribly incomplete in the first place. A mere technology demonstrator (remember the X-33, anyone?). A rudimentary experimental object/directory database engine, implementing a few core ideas, without much ambition to be stable and production-useful. That is, at the moment.

2. Just why KLUDGE?

Essentially because, at the moment, it's an irregular-shaped collection of existing freely available open-source technologies, with a few lines of my own glue code. Said collection includes MySQL for a database backend, libglib with its GTrees for indexes (not for the n-ary trees themselves), GCC's fstream::fstream(int socket) for streams tied to BSD sockets, embedded Perl for interpreted member methods etc. KLUDGE does not stand for any meaningful acronym.

In other words, at this stage this name seems very appropriate. There's one thing I'm afraid of though, with respect to the name "kludge": I am aware that some of the authors of the software that Kludge is using might feel offended by the name of my creation. I would like to stress that the name "kludge" addresses strictly my own contribution, certainly not their quality products.

3. Download

Get the tarball here.

4. Example

Get a sample session capture here.

5. Basic ideas and functionality

Please choose your favourite style of explanation based on your background:

A) UML-style class diagrams, ERD, graphical object modeling

B) SQL, LDAP, NIS/YP etc.

If you've read as far as here, you probably already have some OO knowledge after all.

5.1. Explanation starting from graphical diagrams

Umm. The diagrams presented below are probably somewhat wrong - in the notation of arrows etc. I've received no formal education in UML. If someone could set me straight, that would be excellent.

In any object modeling framework, there are two essential "design planes": the plane of classes and the plane of instances. Actually the plane of instances is hardly ever considered - the model usually works with classes and the instances are considered some mundane real-world cargo that happens somewhere far down the road of a project's lifecycle. Nevertheless, the meta-model of Kludge addresses both these planes.

A class is a conceptual mold or archetype of a whole category of objects that we need to keep track of - these are generally called instances. E.g., Babs Jensen is an instance of class Person. In fact, this simple concept is already present in the good old relational model - there are entities and there are database records following a particular entity attribute set. Enough of this philosophic stuff - let's take a look at the KLUDGEy paradigm.

[Fig.1: Classes, instances and the root pseudo-object]
Being object-oriented, KLUDGE follows the essential OO concept - it provides two basic storage spaces, one for classes and another one for instances. Kludge is also somewhat hierarchical - thus, the space of classes is an n-ary tree, and so is the space of instances. This is a simple way to maintain a natural primary identification path for each and every class and instance. With classes, this tree simply follows the strict single-parent inheritance concept (hence, KLUDGE features a single-rooted class hierarchy with one common base class). With instances, this primary relationship is more arbitrary, but the general notion of single ownership seems good enough.
As KLUDGE aims to integrate the means of storage and manipulation for classes and instances as much as possible, there's also a single root pseudo-object - the KLUDGE Root Object.
To sum up, there are two sub-trees, one for classes and one for instances, artificially planted inside a single root pseudo-object, thus comprising a single tree - the meta-schema of KLUDGE.

Time to take a look at the meta-model in more detail.

[Fig.2: The complete tree including properties and relationships]
As explained above, classes and instances present the skeletal nodes in the twin n-ary tree. But there's more - there are leaf objects.
Firstly, both classes and instances can have a variable number of properties (equal to UML/ERD attributes). Each property belongs exclusively to a single class or instance.
Secondly, both classes and instances can have relationships among them (UMLspeak: associations. Perhaps I should better replace the word "relationship" with "association" everywhere in KLUDGE.)

[Fig.3: Relationships - a closer look]
Relationships are a perverse bunch for several reasons. They're leaf objects, but they "serve two masters" - each relationship must have both of its ends defined, otherwise it cannot exist. This is to maintain referential integrity. If you try to delete a class or instance that has relationships, KLUDGE will object. Also, the class relationships gave me quite a headache when I was putting the metamodel together - depending on the way I thought of them, sometimes they seemed to act like class relationships, while ten minutes later they had a startling "relationship class" flavour (UMLspeak: association class). To put it another way, I was in terrible doubt whether to implement class relationships as a special double-ended leaf object in the meta-schema, which would limit the possibilities of class-wise behavior, or whether I'd better implement them as a special sort of classes, which would introduce a formally messy referential integrity checking mechanism.

It took me a while to conclude that initially I'd implement "lightweight" relationships, both for classes and for instances, dismissing for a start any unnerving thoughts of relationships with a variable number of attributes or even a variable number of ends (objects related). Put another way, I wanted the basic meta-model of KLUDGE to be as simple as possible (compared to the relative complexity of e.g. UML). I also wanted to have as little system functionality as possible implemented outside the base meta-model.

Arguably, in data modeling practice, there are numerous examples of relationship-like objects that need complex event-driven behavior, an arbitrary set of attributes and perhaps the capability to act as an endpoint of other relationships. Such heavy-weight relationships should be modeled in Kludge "user space", on top of the hardwired Kludge meta-model - as Kludge classes and instances, with an appropriate set of attributes, methods and lightweight relationships, comprising the required complex properties, behavior and inheritance.

To sum up, the meta-schema consists of six fundamental object types:

classes

class properties

class relationships

instances

instance properties

instance relationships

Each of these six types is implemented as a C++ class within the KLUDGE server daemon and has a corresponding table in the backend SQL database.

As far as inheritance is concerned, the classes inherit properties and relationship stubs from their base classes (parents). For the sake of clarity, inherited items are not included in Figure 2.

At this moment, Kludge is a string-oriented database daemon, similar to e.g. OpenLDAP - completely lacking a GUI front-end of any kind. Figures 1 to 3 have been drawn in a generic vector-based graphical editor, for illustration purposes only.

5.2. Explanation starting from LDAP and SQL

The traditional relational databases are fine for systems of limited size and "intelligence". They're excellent for basic data storage, some are able to help you keep referential integrity and can support stored procedures - but in reality almost any intelligence beyond these very basic mechanisms needs to be taken care of at application level. Which means that you end up creating custom C++ classes matching SQL tables (or vice versa). Yes, the process can be automated, both table definitions and C++ class skeletons can be generated out of a CASE tool - nevertheless, my idea of elegance and efficiency is different. Generated table defs and C++ classes in a larger system cause quite some bloat. Even with advanced and tightly integrated development tools, it may prove complicated to maintain the generated C++ skeletons separately from (or symbiotically with) any hand-written custom code. Also, with SQL, polymorphism is very limited or impossible.

As Bruce Eckel says in his excellent book on C++, almost any difficult programming problem can be alleviated and solved using an additional level of indirection. Which may certainly seem to be an argument in favor of the aforementioned "skeleton generation" approach. In my opinion, the "skeleton generation" approach is far from elegant. If the approach is used to facilitate implementation of a rather large system, it equals parasitism on the back-end mechanisms of SQL and C++, creating considerable load on compiler resources, lots of redundant code in the compiled binary, lots of similar table definitions in the SQL store. This excessive load may be considered pure overhead - resulting from the low-level brute-force approach to object orientedness.

Arguably, another abstraction layer (order of indirection) is due. The database engine should store polymorphic objects with inheritance and referential integrity, rather than boring table records. There can be multiple layers of database technologies if necessary. The basic object-oriented principles should not be difficult to implement in a database server and would certainly improve overall system consistency and manageability (if not raw TPS). If said OO principles could be propagated as far as the client-side C++ API, lots of custom yet repetitive C++ coding work could be saved, and lots of dedicated compiled code. Obviously the more OO intelligence you want to transfer to the user interface, the more meta-modeling you have to do in your C++ code (as opposed to straight hard-coding a set of "user-space" classes).

In the arena of hierarchical directory databases, X.500/LDAP is an important standard. It's even object-oriented to a limited degree. For a certain type of applications it's an excellent solution (large-scale directories, with multiple maintainers, geographically distributed, with a limited number of object types). There are nevertheless potential areas of deployment where LDAP (or perhaps the OpenLDAP implementation) lacks in internal consistency checking and dynamic flexibility. Specifically, it falls behind in skunk-works style projects - centralised object stores with a dynamically evolving schema and maximum server-side functionality required, and minimum maintenance manpower available. Perhaps I'm just under-educated regarding LDAP's functionality, or I should consider other server implementations beyond OpenLDAP. Speaking of OpenLDAP: the schema is static, stored in a flat text file, needs to be edited separately, the alias records are "symbolic", pointing to absolute distinguished names and therefore prone to the "dangling pointer" syndrome, there's no "stored procedure" or "member method" functionality. No client-side transactional integrity support in a multi-user environment.

Still, LDAP is perhaps the most important inspiration for my Kludge thing. As suggested above, Kludge is intended for a different scenario than LDAP.

Kludge is built around the general object-oriented notion of classes and instances. The classes comprise a single-rooted inheritance tree. Similarly, instances comprise a single-rooted tree, based on an arbitrary "primary ownership" association, the only purpose of which is unique primary identification. In addition, there's a single root pseudo-object, common to the aforementioned two hierarchies, superior to the root class and the root instance. Thus, within the whole KLUDGE system, any Kludge object can be uniquely identified using a primary path specifier, similar to a filesystem path. The concept of a KLUDGE path is however closer to the LDAP concept of an absolute distinguished name.
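As a rough illustration of the path idea, here is a minimal C++ sketch of assembling an absolute path by walking from a node up to the root pseudo-object. The PathNode type, the absolute_path function and the subtree names used in any example call are hypothetical - they are not taken from the actual Kludge sources.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type, not the real Kludge engine class.
struct PathNode {
    std::string rdn;          // relative distinguished name, unique among siblings
    const PathNode* parent;   // nullptr only for the root pseudo-object
};

// Build the absolute path by walking up to the root and joining the RDNs.
std::string absolute_path(const PathNode& node) {
    std::vector<const PathNode*> chain;
    for (const PathNode* n = &node; n->parent != nullptr; n = n->parent) {
        assert(!n->rdn.empty());  // every non-root node must carry a name
        chain.push_back(n);
    }
    if (chain.empty()) return "/";  // the root pseudo-object itself
    std::string path;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        path += "/" + (*it)->rdn;
    }
    return path;
}
```

Because every node has exactly one parent, the walk is unambiguous and each object ends up with exactly one primary path.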

Kludge classes and instances are the node objects of the twin tree. Per se, they only have a few system attributes. User-defined properties and relationships (attributes and associations) are separate leaf objects, hooked into the respective class or instance hierarchy. Properties are key-value pairs, where key is unique inside the local "namespace" of a particular class or instance. Relationships are double-ended leaf objects, identified by a separate key (relative distinguished name) at each of the two related objects (classes or instances).

To sum up, there are six basic KLUDGE object types:

classes

class properties

class relationships

instances

instance properties

instance relationships

The classes and instances are similar to Perl packages in that they have local symbol tables (relative namespaces). All the contents within a particular class, be it child classes, properties or relationships, can be reached using their Relative Distinguished Names (an LDAP term). The same applies to the tree of instances. Any KLUDGE object is identified using a KLUDGE path (absolute distinguished name), which is effectively a chain of all the relative distinguished names, starting from root ("/"). The KLUDGE path resembles a filesystem path more than anything else, and the KLUDGE CLI (Command Line Interface) also knows the notion of a current path (PWD). There's one exception: KLUDGE class names are globally unique (the names of instances and of all the leaf objects are not), so classes can also be identified by their bare name - they have a global namespace. If the user tries to identify an object using a bare name that could be resolved either as a global class name or as a relative name within the current path (PWD), the name resolution routine returns the local content.
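The lookup order for bare names might be sketched in C++ roughly as follows. This is only an illustration of the "local first, global class name as a last resort" rule; the SymbolTable alias, the function name and the idea of mapping names to path strings are assumptions made for the example, not the engine's real data structures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a symbol table: bare name -> absolute path.
using SymbolTable = std::map<std::string, std::string>;

// Resolve a bare name: the local namespace of the current path wins, the
// global class namespace is only consulted as a last resort.
// Returns an empty string if the name is unknown in both.
std::string resolve_bare_name(const std::string& name,
                              const SymbolTable& local_names,
                              const SymbolTable& global_classes) {
    assert(!name.empty());
    auto local = local_names.find(name);
    if (local != local_names.end()) return local->second;
    auto cls = global_classes.find(name);
    if (cls != global_classes.end()) return cls->second;
    return "";
}
```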

Kludge uses MySQL as a database back-end. The MySQL database contains six tables, used to store the six basic KLUDGE object types. In this relational backend store, each class and instance has a unique OID - an integer, functioning as a low-level object identifier. These OID's are implemented using MySQL's auto-increment column type.

Consequently, any nested content (relational table records in any of the six tables) is referring to the superior class or instance using its OID. To be absolutely correct, class relationships also have unique OID's (so that instance-side objects can point to them) and so do instance relationships, for some obscure internal implementation reasons. Properties are identified using the superior node object's OID as a foreign key, together with the property's RDN.

These internal OID's are not visible to the user / application programmer. The point is, that the user may rename or move the class or instance, while said node object forever retains its internal OID - so that any properties or even relationships still refer to the same object, regardless of its RDN and position in the tree. Any existing relationships follow the physical referenced object. This is one of a few core ideas that help to keep referential integrity.
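The effect of the OID indirection can be shown with a small in-memory C++ sketch. The Store type below merely mimics two backend tables with std::map and imitates the auto-increment counter; all names are hypothetical and the real table layout is not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Oid = unsigned long;

// Hypothetical in-memory stand-ins for two of the backend tables.
struct Store {
    std::map<Oid, std::string> node_rdn;                          // OID -> current RDN
    std::map<std::pair<Oid, std::string>, std::string> property;  // (owner OID, RDN) -> value
    Oid next_oid = 1;                                             // mimics auto-increment

    Oid insert_node(const std::string& rdn) {
        const Oid oid = next_oid++;
        node_rdn[oid] = rdn;
        return oid;
    }

    // Renaming touches only the node's own record; the OID stays put,
    // so every property keyed by that OID keeps pointing at the same node.
    void rename_node(Oid oid, const std::string& new_rdn) {
        assert(node_rdn.count(oid) == 1);
        node_rdn[oid] = new_rdn;
    }
};
```

Since nothing outside the node's own record stores its name, a rename never has to chase down and rewrite references.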

Another feature addressing referential integrity is the way in which relationships are visible at the KLUDGE CLI. At each of the two objects related, a relationship has a relative distinguished name (just like any other KLUDGE object) - thus, a relationship has two names (name at Master, name at Slave) and can be addressed by two different KLUDGE paths. The important point is that if the user deletes the relationship by either of its two names, i.e. by either of its two ends (the DELETE command takes only one pathname at a time), the relationship is erased as a whole, disappearing from the symbol tables at both the objects originally related. The two objects simply cease to be related.
Similarly, it's impossible to delete a class or instance that has nested contents, or a class that has instances etc.
Just like at the logical level of ERD-style or UML-style data modeling, the user doesn't need to deal with physical-level foreign key attributes etc. He can simply insert nested content with a particular RDN into an existing class or instance, or establish a relationship between two existing objects. Likewise, he can delete nested contents or cancel a relationship between objects that continue to exist.
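The "one relationship, two names" behavior described above could be modeled along these lines. This is a minimal C++ sketch under stated assumptions: KObject and Relationship are hypothetical types, and shared ownership through std::shared_ptr is just a convenient choice for the example, not necessarily what the engine does.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Relationship;

// Hypothetical node object with a local symbol table for relationships.
struct KObject {
    std::map<std::string, std::shared_ptr<Relationship>> relationships;
};

struct Relationship {
    KObject* master;
    KObject* slave;
    std::string name_at_master;
    std::string name_at_slave;
};

// Create the relationship and register it under one name at each end.
void relate(KObject& master, const std::string& name_m,
            KObject& slave, const std::string& name_s) {
    // A self-referral relationship needs two distinct names in one namespace.
    assert(&master != &slave || name_m != name_s);
    auto rel = std::make_shared<Relationship>(
        Relationship{&master, &slave, name_m, name_s});
    master.relationships[name_m] = rel;
    slave.relationships[name_s] = rel;
}

// Delete by whichever end the caller happened to name: both entries go.
bool delete_relationship(KObject& end, const std::string& name) {
    auto it = end.relationships.find(name);
    if (it == end.relationships.end()) return false;
    std::shared_ptr<Relationship> rel = it->second;  // keep alive while erasing
    rel->master->relationships.erase(rel->name_at_master);
    rel->slave->relationships.erase(rel->name_at_slave);
    return true;
}
```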

In other words, the KLUDGE "engine" written in C++ is keeping an eye on referential integrity. It's only six core classes, so it's been perfectly feasible to prevent basic referential inconsistencies - at least considering an ideal single-thread environment. The referential integrity is a fairly generic concept, so it does make quite some sense in an object directory. I wouldn't want to code something like that on a per-class basis in a somewhat complex system relying on a myriad hard-coded C++ classes with corresponding SQL tables.

The current referential integrity mechanism in KLUDGE is still rather rudimentary - but also somewhat extensible. For instance, I can imagine adding ctor/dtor-style methods to class relationships, that could operate upon further related objects, so that larger instance conglomerates could be manipulated by a single relationship creation or deletion.

The relationships are perhaps the most interesting type of objects within the meta-model of KLUDGE. Conceptually, if we abstract from a possible arbitrary hierarchy, any relationship is essentially peer-to-peer. Any two objects can have a relationship and it may not always be clear which one is the master and which one is a slave. Technically, any KLUDGE relationship has a clear master endpoint and a clear slave endpoint, so that technically the engine can always distinguish whether it's talking about "this end" or "the other end", even e.g. in case of self-referral relationships. If the conceptual relationship depicted is peer-to-peer, it doesn't matter which endpoint is technically set to be a master or a slave.

On the inside, in the C++ code, the two endpoints are implemented as "relationship stubs", that are integral parts of a complete relationship, while it remains completely clear which end is the master and which end is the slave. In the SQL backend however, a relationship is stored in a single record, including its two ends (foreign keys of the objects related).

Any class relationship has a system attribute that says what type the relationship is: master/slave, peer-to-peer or ownership. While the M/S and P2P class relationship types are, quite unsurprisingly, instantiated into instance relationships, the "ownership" type of class relationship gets instantiated into the primary ownership references within the tree of instances. The ownership type of class relationship regulates not only subordinate owned instances but also instance properties.

KLUDGE is currently based on string-wise data storage, which can be considered a heritage of Perl, LDAP and RPSL-style hash block flat files. Class properties are mundane key-value pairs, instance properties can have their value checked against a regular expression pattern (this has yet to be coded - feature under construction).

Indexes of pointers to inherited items (properties and class relationship stubs) are runtime-only structures. Only own class properties and relationships are stored in the backend SQL store. Any class property or relationship stub is owned by only one class (the true owner of such item) and is only created once on the heap. The inheritance indexes (symbol tables) at any respective child class are mere run-time collections of weak pointers to the single allocated occurrence of the item inherited.
When any content (including a child class name) is added, modified or removed within a class, the class invalidates its own index of inherited items and also the inheritance indexes at its children presently loaded, cascade-wise. As a result, the inheritance indexes at a particular KLUDGE class will later be reassembled on demand when someone tries to resolve a name within that class.
Any inheritable content of a base class only gets inherited by the derived class if that content doesn't name-clash with local own content (in the derived class) of any kind, including further derived subclasses - there's only one local/relative namespace for all sorts of contents. If a prospectively inheritable item clashes with a local name, the local name takes precedence, effectively overriding any inheritable items of the base class known under the same name. When new own content is inserted in a class or when existing class content is renamed, its proposed name is only checked against existing own contents - inherited contents are ignored and get invalidated, so that overriding can take place upon recalculation of the inheritance indexes.
Ownership-type class relationships have their stubs inherited, too. When an owned instance is inserted somewhere in the tree of instances, we need to indicate at the newly created instance both the class of the instance and the class relationship followed upon ownership instantiation. Kludge is able to enforce per-class-relationship maximum instantiation counts (an optional system attribute of any class relationship).
The class relationships' inheritance mechanism is rather crude - but also free of significant conflict. More refined relationship inheritance mechanisms are certainly thinkable, but also rather tricky. Any dark corners of the chosen inheritance algorithm just stand out once you try to code it in C++ to embed it in the Kludge engine.
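The interplay of own content, lazily rebuilt inheritance indexes and name-based overriding can be sketched in C++ as follows. This is a simplified illustration, not the engine's code: KClass is a hypothetical type, all kinds of content are reduced to name/value strings, and plain pointers stand in for the weak pointers mentioned above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical class node; every kind of content is a name -> value string.
struct KClass {
    KClass* parent = nullptr;
    std::vector<KClass*> children;
    std::map<std::string, std::string> own;               // persisted content
    std::map<std::string, const std::string*> inherited;  // runtime-only index
    bool index_valid = false;

    // Drop this class's inherited index and those of all loaded descendants.
    void invalidate() {
        index_valid = false;
        inherited.clear();
        for (KClass* child : children) child->invalidate();
    }

    // New own content is only checked against other own content.
    bool insert_own(const std::string& name, const std::string& value) {
        assert(!name.empty());
        if (!own.emplace(name, value).second) return false;  // own-name clash
        invalidate();
        return true;
    }

    // Rebuild on demand: take everything visible in the parent unless a
    // local name of the same spelling overrides it.
    void rebuild() {
        if (index_valid) return;
        inherited.clear();
        if (parent != nullptr) {
            parent->rebuild();
            for (const auto& item : parent->own)
                if (own.count(item.first) == 0) inherited[item.first] = &item.second;
            for (const auto& item : parent->inherited)
                if (own.count(item.first) == 0) inherited[item.first] = item.second;
        }
        index_valid = true;
    }

    // Resolve a name within this class; nullptr if unknown.
    const std::string* lookup(const std::string& name) {
        auto mine = own.find(name);
        if (mine != own.end()) return &mine->second;
        rebuild();
        auto theirs = inherited.find(name);
        return theirs == inherited.end() ? nullptr : theirs->second;
    }
};
```

The inherited index never owns anything - it only points at the single occurrence held by the true owner, which is why throwing it away on any change is cheap.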

6. BLOBs and embedded Perl

KLUDGE is capable of storing large data objects as BLOBs in the backend database. There are five special object types, internally anchored to the CLASS_PROPERTY and INSTANCE_PROPERTY: CLS_BLOB, CLS_TXT_BLOB and METHOD at the plane of classes, and INST_BLOB and INST_TXT_BLOB at the plane of instances. The point is that BLOB content is never kept on the heap (except for the duration of the INSERT/SET/GET operations) and thus doesn't clutter system RAM on the server. The methods are written in Perl and are parsed on demand individually, and only the parsed subtree of Perl OPs is kept on the heap.

The implicitly involved property object holds an integer value called a BLOB ID; the BLOB data per se is stored in two more relational tables: one for binary data and another one for text data (including methods). This is to facilitate future implementation of full-text searches on text data and methods. Think of full-text searching of stored documents or dependency-checking of code in METHODs versus the database schema. The implicitly involved property object provides a name in the KLUDGE namespace and proper inheritance.

As for storage of documents: there are excellent document management systems out there. The original purpose of Kludge is to model complex systems with lots of relationships, e.g. topological data. Kludge was not intended for storage of large flat binary data. The BLOBs have been added to allow the user of KLUDGE e.g. to attach icons or simple vector sketches in arbitrary formats to various KLUDGE classes or instances - to facilitate development of generalized graphical interfaces.

The embedded Perl interpreter is currently quite limited in expression capabilities. This is due to the embedded environment, the multithreading, and the under-developed XS module for direct access to the KLUDGE engine. See the Bugs&Deficiencies chapter for more information :)

METHODs can be INSERTed just like any other BLOB and then can be RUN by name, either statically using a path via the class, or even via any instance of that class (and its children). The RUN command also takes a variable number of named arguments (variable definitions) enclosed in parentheses. If the METHOD's code should need access to these arguments, it has to declare something along the lines of

my %args = (@_);

and then it can access the arguments by name, e.g.

my @argument_names = keys(%args);
print $CLIENT_SOCKET $args{"some_argument_name"};
my $this_obj_name = Kludge_pl::get_object_name($args{"this"});

If the METHOD is called in the member->method fashion, Kludge inserts an additional argument into the arguments list, called "this", holding a C++ pointer to the Kludge_Instance that invoked the method (last-but-one path element). The embedded Perl has access to the client socket using a filehandle reference called $CLIENT_SOCKET, which is pseudo-thread-private - it's really a lexical (a "my" variable), its declaration&initialization is silently inserted before parsing into every METHOD. There's a module called Kludge_pl that provides a handful of XSUBs, granting some level of access to the KLUDGE engine. The number of XSUBs available is currently too limited - this will have to be worked on.

Speaking of user-space methods (stored procedures), I have to proudly announce that Kludge also knows the notion of constructors and destructors. When an INSERT command is received at the CLI, with an INSTANCE definition as a parameter, Kludge searches the Class specified, looking for a method called CTOR - if it finds one, it executes it once the instance has been inserted along with any potential nested contents. It's possible to pass arguments in the form of (name=value; name=value;) to the CTOR within the INSERT command. Similarly, when deleting an INSTANCE, Kludge searches for a DTOR method in the respective class, and executes it before it starts to delete nested content (which can be requested using the RECURSIVE option at the CLI). It's not possible to pass arguments to the DTOR method. (Would it make sense? If so, it's not a problem to implement). The CTOR and DTOR methods both obtain a hash of named arguments, just like any other method - and both obtain 'this' as one of these arguments (with DTOR, 'this' is the only argument). Currently, CTORs and DTORs do get inherited in the sense that they show up in GET listings, but inherited CTORs and DTORs are ignored during object creations and deletions. This design decision is not final - subject to further discussion. C++ doesn't inherit constructors, but perhaps it might make sense in Kludge, right?
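For illustration, an argument list in the (name=value; name=value;) form mentioned above could be split into a name-to-value map with a few lines of C++. This is only a sketch: the function names are hypothetical, and it assumes that values contain neither ';' nor ')' and need no quoting, which may well differ from the real CLI grammar.

```cpp
#include <cassert>
#include <map>
#include <string>

// Trim leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Parse "(name=value; name=value;)" into a name -> value map.
// Assumption: no quoting or escaping, values hold neither ';' nor ')'.
std::map<std::string, std::string> parse_named_args(const std::string& text) {
    std::map<std::string, std::string> args;
    const auto open = text.find('(');
    const auto close = text.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return args;
    const std::string body = text.substr(open + 1, close - open - 1);
    std::string::size_type start = 0;
    while (start < body.size()) {
        auto end = body.find(';', start);
        if (end == std::string::npos) end = body.size();
        const std::string item = body.substr(start, end - start);
        const auto eq = item.find('=');
        if (eq != std::string::npos) {
            const std::string name = trim(item.substr(0, eq));
            if (!name.empty()) args[name] = trim(item.substr(eq + 1));
        }
        start = end + 1;
    }
    return args;
}

// Usage example with made-up argument names.
void demo_parse_named_args() {
    auto args = parse_named_args("(colour=black; legs=4;)");
    assert(args.size() == 2);
    assert(args["colour"] == "black");
    assert(args["legs"] == "4");
}
```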

Now that we've briefly examined the current capabilities of the Perl embedded in KLUDGE, I can also point out the greatest conceptual weaknesses.
The most important and fundamental issue is that Kludge has its own basic object types and its own hierarchical namespace for its objects. These object types, namespace and naming resolution rules are incompatible with those of Perl - there's no straight integration. The only way from Perl to the KLUDGE engine and back is through manipulator XSUBs, with Kludge Object references (C++ pointers) being typedeffed to Perl-space integers in order to be retrieved from the KLUDGE engine and passed back as XSUB arguments (the good old XS way). My conclusion is that no amount of "magic" currently available off-the-shelf from the Perl guts can digest the namespace construction of Kludge in such a way that Kludge classes, instances and relationships could be passed to Perl code as Perl objects (blessed references to tied hashes). At the very least, such magic-heavy approach would lead to duplications of Kludge objects in Perl space in the form of Perl-space shim objects, probably resulting in inconsistencies upon any modification that would happen to the authoritative trees at the Kludge engine without Perl knowing.
In other words, there's a basic impedance mismatch between Kludge and Perl. Obviously I don't want to modify the essentials of Kludge just to better fit the model of Perl - I have created my Kludge exactly because so far there's been no such thing :) (OK, sure, I know Smalltalk has been around since the early seventies). Which inherently means that Perl with its essentials can't be used as a seamless scripting complement to Kludge. The programmer in Perl embedded in Kludge will have to work with some abstraction and pay attention not to overly rely on Perl-space data, which will always be just ephemeral references to the only true reality of Kludge. Any references to Kludge engine objects kept in Perl space will only be valid so long as some other thread doesn't change the contents of the Kludge engine's trees, or the engine doesn't partially unload objects off the heap, or the Perl-space code in the current thread doesn't invoke changes larger than the programmer could fathom. Changes done by other threads and by the engine can be addressed by locking and transactional atomicity in general (subject to further study anyway); thread-private surprises will always be a problem in this scenario.

Enough of void philosophy. If someone figures out an elegant solution to this, please let me know.

To sum up the topic of Kludge and embedded Perl: at the moment Perl's capabilities are limited, but even this kludgey way of accessing the Kludge engine's trees using explicit manipulator XSUBs is far better than no scripting at all.

7. CLI Command reference

The command-line interface of Kludge knows the concept of a "current path" (pwd). Consequently, virtually all the commands that require the specification of a path, accept either an absolute pathname or a relative name within the current pwd, or a relative path starting from the current pwd. An absolute path is one that's starting from the root - with a leading slash. A relative name or path lacks the leading slash or is starting with an explicit "./".
All object names are relative - therefore, in general, the only way to specify an object unambiguously is using an absolute path. With one minor exception: CLASS names are unique and are searched bare as a last resort attempt - therefore, it's often enough to specify just the bare class name, irrespective of the current pwd. Still, if there's incidentally a local relative name that's equal to a name of one of the classes, the name resolves to the local object (local names are searched first).
One other interesting anomaly is that any relationship can be identified by two distinct absolute paths, which can be used interchangeably.
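The absolute/relative distinction boils down to a small normalization step, sketched here in C++. The function is hypothetical and covers only the rules stated above (leading slash, optional "./" prefix), plus the assumption, inferred from the INSERT example in this reference, that a lone "." denotes the current path; the bare-class-name fallback is left out.

```cpp
#include <cassert>
#include <string>

// Turn a user-supplied path into an absolute one, given the current path.
std::string to_absolute(const std::string& pwd, const std::string& path) {
    if (!path.empty() && path[0] == '/') return path;  // already absolute
    std::string rel = path;
    if (rel == ".") {
        rel.clear();                      // assumed: "." is the pwd itself
    } else if (rel.compare(0, 2, "./") == 0) {
        rel.erase(0, 2);                  // explicit "./" prefix
    }
    if (rel.empty()) return pwd;
    const std::string base = (pwd == "/") ? std::string() : pwd;
    return base + "/" + rel;
}

// Usage example; the paths are made up for illustration.
void demo_to_absolute() {
    assert(to_absolute("/classes", "Cat") == "/classes/Cat");
    assert(to_absolute("/classes", "./Cat/legs") == "/classes/Cat/legs");
    assert(to_absolute("/classes", "/instances/Babs") == "/instances/Babs");
    assert(to_absolute("/", "classes") == "/classes");
}
```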

All the commands (the command strings per se) are case-insensitive.

7.1. (HELP)

Not implemented yet.

7.2. GET

Synopsis:


GET <path>;

Description:
get a listing of contents of the object specified by <path>.

7.3. INSERT

Synopsis:


INSERT <where> <what>;

Description:
insert a new Kludge object. The 'where' argument is a kludge path, the 'what' argument is a Kludge object definition - one of the following:


CLASS <name>;
CLASS <name> { <kludge object def>; [<kludge object def>;] ... };

CLASS_PROPERTY <name> <value>;

CLS_BLOB | CLS_TXT_BLOB | METHOD <name> {
<Base64 encoded content, multiple lines>
<terminated by an empty line>
};

CLASS_RELATIONSHIP { 
   MASTER="<master class name|path>";
   SLAVE="<slave class name|path>";
   NAME_M="<name at master>";
   NAME_S="<name at slave>";
  [TYPE="<OWN | M/S | P2P>";]
  [MINCOUNT_M="<minimum count at master>";]                # not active yet
  [MAXCOUNT_M="<maximum count at master>";]                # not active yet
  [DEFAULTCOUNT_M="<default count at master>";]            # not active yet
  [NAMINGGEN_M="<naming generation pattern at master>";]   # not active yet
  [NAMINGMATCH_M="<name matching pattern at master>";]
  [MINCOUNT_S="<minimum count at slave>";]                 # not active yet
  [MAXCOUNT_S="<maximum count at slave>";]                 # not active yet
  [NAMINGGEN_S="<naming generation pattern at slave>";]    # not active yet
  [NAMINGMATCH_S="<name matching pattern at slave>";]
};

INSTANCE <name> <class name|path> <class relationship path>;
INSTANCE <name> <class name|path> <class relationship path>
{
   <kludge object>;
  [<kludge object>;]
   ...
};
INSTANCE <name> <class name|path> <class relationship path>
[(arg1=val1; arg2=val2; ...)]
{
   <kludge object>;
  [<kludge object>;]* 
};

INSTANCE_PROPERTY <name> <class name|path> <class relationship path> [value];

INSTANCE_RELATIONSHIP { 
   MASTER="<master instance path>";
   SLAVE="<slave instance path>";
   NAME_M="<name at master>";
   NAME_S="<name at slave>";
   CLS_REL="<class relationship path>";
};

INST_BLOB | INST_TXT_BLOB <name> <class name|path> <class relationship path> {
<Base64 encoded content, multiple lines>
<terminated by an empty line>
};

Comments:
When inserting a class or an instance, you can specify nested contents in the same command. When inserting an instance, you can also specify constructor arguments (a list of variable definitions).
When inserting relationships, the MASTER and SLAVE fields are not really mandatory - if omitted, the INSERT destination will serve as a default. Thus, when both MASTER and SLAVE are omitted, they both point to the same class, giving birth to a self-referential relationship :) With CLASS_RELATIONSHIPs, if the TYPE parameter is omitted, type "OWN" is chosen by default. The namingmatch patterns follow Perl-compatible regular expression syntax.
The attributes of CLASS_RELATIONSHIPs tagged "# not active yet" can be entered and have their defaults but are not observed yet - they were meant to facilitate automatic generation of repetitive names following a simple numbering convention etc. (a very old facility, planned from the very beginning, when the inclusion of embedded Perl seemed nothing but a pipe dream).

Example:

INSERT . CLASS Cat;

For more examples, check out the sample session capture.
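
The blob body format from the synopsis above (Base64 lines, terminated by an empty line) is easy to generate mechanically. The following is a minimal Python sketch, not part of Kludge: the helper name build_blob_insert and the 76-column line width are assumptions made for illustration, and only the class-side kinds (CLS_BLOB, CLS_TXT_BLOB, METHOD) are covered.

```python
import base64

# Kinds taken from the INSERT synopsis; everything else here is illustrative.
BLOB_KINDS = ("CLS_BLOB", "CLS_TXT_BLOB", "METHOD")


def build_blob_insert(where, kind, name, payload, width=76):
    """Return an INSERT command string carrying a Base64 blob body.

    `width` (76 columns) is an assumption; the document does not state
    a required line length.
    """
    if kind not in BLOB_KINDS:
        raise ValueError("unsupported blob kind: " + kind)
    encoded = base64.b64encode(payload).decode("ascii")
    # Split the Base64 text into fixed-width lines.
    body = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    header = "INSERT %s %s %s {" % (where, kind, name)
    # The empty string yields the empty line that terminates the blob.
    return "\n".join([header] + body + ["", "};"])


if __name__ == "__main__":
    # Hypothetical method body; any bytes will do.
    print(build_blob_insert(".", "METHOD", "hello", b'print "hello\\n";'))
```

The resulting string can be pasted into a CLI session or sent over the socket as-is.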

7.4. DELETE

Synopsis:


DELETE <what> [RECURSIVE];

Description:
delete the Kludge object specified by <what>. If the option RECURSIVE is not specified, the DELETE command refuses to delete an object that has nested contents. If RECURSIVE is specified, the DELETE command will delete class-side or instance-side objects cascade-wise - it will still refuse to delete classes or class relationships that are referred to by instance-side objects.
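
The refusal rules above can be summarized in a few lines of Python. This is only a sketch of the described behaviour, not the server's code: the KludgeObject class, its field names and the reference counter are hypothetical, and it assumes that a class-side object referred to from the instance side is protected regardless of the RECURSIVE option.

```python
class KludgeObject:
    """Toy stand-in for a Kludge object; field names are hypothetical."""

    def __init__(self, name, side):
        self.name = name
        self.side = side            # "class" or "instance"
        self.children = []          # nested contents
        self.instance_refs = 0      # instance-side objects referring to this one

    def add(self, child):
        self.children.append(child)
        return child


def may_delete(obj, recursive=False):
    """Return True if DELETE would be allowed under the rules described above."""
    if obj.children and not recursive:
        return False                # nested contents need RECURSIVE
    pending = [obj]
    while pending:                  # walk the whole subtree to be removed
        current = pending.pop()
        if current.side == "class" and current.instance_refs > 0:
            return False            # still referred to from the instance side
        pending.extend(current.children)
    return True
```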

7.5. MOVE

Synopsis:


MOVE <what> <where>;

Description:
Move an object somewhere else. You can move a class-side object under another CLASS. You can move an instance-side object under another INSTANCE. You cannot move an object under its own child CLASS / owned INSTANCE (that would create a convoluted, unreachable space).
The MOVE command currently does not perform renaming.
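
The restriction on MOVE amounts to a check that the destination does not lie inside the subtree being moved. A minimal Python sketch of that check follows; the parent_of mapping (object name to parent name) is a hypothetical simplification of the real tree, and refusing a move of an object under itself is an assumption.

```python
def is_same_or_descendant(node, ancestor, parent_of):
    """Walk up from `node`; True if `ancestor` is met on the way (or equals node)."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)
    return False


def may_move(what, where, parent_of):
    """A move is refused when the destination lies inside the moved subtree."""
    return not is_same_or_descendant(where, what, parent_of)
```

Walking up from the destination keeps the check proportional to the depth of the tree rather than to the size of the moved subtree.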

7.6. RENAME

Synopsis:


RENAME <what> <new name>;

Description:
Rename an object, specified by a path in the <what> argument, to another relative name.

7.7. SET

Synopsis:


SET <what - a property> <new value> ;
SET <what - a blob thingy> <new BLOB value> ;       # {\nBase64\n\n} 
SET <what - a relationship> <attribute=value> ;     # a variable definition

Description:
Set a new value to a property, a blob, or a relationship attribute.

7.8. RUN

Synopsis:


RUN <what> [(arg1=value1; arg2=value2; ... )];

Description:
Run a method. Optionally you can provide named arguments that will become available in the Perl-space %args hash.
The RUN command exhibits a name resolution anomaly. It treats the last path element in a special way. Please note that the last path element equals the relative name of the method to be run. Consequently, depending on the Kludge type of the last-but-one path element, the path argument specifying <what> to run can either approach that method via its respective class (static-style call) or via any instance of that class or its children (an instance-side call). Normally, methods never show up in any listing of content names of an instance - our anomaly is the result of the aforementioned special treatment. Depending on the call style, the %args hash either does or does not contain an automatically added argument called "this".
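
The two call styles can be pictured with a short Python sketch. It is an illustration only: the function name is hypothetical, it assumes that "this" is added for instance-side calls and left out for static-style calls, and it assumes that "this" carries the instance's path (the actual content of "this" is not specified here).

```python
def build_run_args(parent_kind, parent_path, user_args):
    """Assemble the argument hash for a RUN call.

    parent_kind is the Kludge type of the last-but-one path element:
    "CLASS" for a static-style call, "INSTANCE" for an instance-side call.
    """
    args = dict(user_args)              # the named arguments from the command
    if parent_kind == "INSTANCE":
        args["this"] = parent_path      # assumption: "this" identifies the instance
    elif parent_kind != "CLASS":
        raise ValueError("methods are reached via a CLASS or an INSTANCE")
    return args
```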

7.9. CD

Synopsis:


CD <where to> ;

Description:
change "current directory" :) Obviously we're speaking kludge paths and kludge objects, rather than filesystem paths.

7.10. PWD

Synopsis:


PWD ;

Description:
display the "current working path" of this client thread's CLI.

7.11. EXIT or QUIT

Synopsis:


EXIT ;
QUIT ;

Description:
terminate the current client session.

7.12. SHUTDOWN

Synopsis:


SHUTDOWN ;

Description:
tell the server to shut down. Equal to pressing CTRL+C at the server console or to sending a SIGINT to the server in another way.
The server thread attempts a graceful shutdown. Upon the first signal received, it stops accepting new clients and waits for the currently active clients to log out. When the last client logs out, the server goes down. If the admin is impatient, he can send a second shutdown signal - in response to that, the server forces the client threads to finish (sends a signal that breaks their blocking IO calls, thus causing them to throw an exception) and, after all the client threads exit, the server thread goes down too. If the client threads fail to finish for some reason, it's possible to issue a third shutdown request - the server gives up any cleanup of the child threads, leaving that part up to the operating system, and goes down immediately. In each of the three cases though, the server does perform (or, in the third case, attempt) proper teardown of the twin tree from the heap.
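
The three-stage escalation described above can be modelled as a small state machine. The Python sketch below only mirrors the described behaviour; the class and stage names are invented for illustration and do not come from the Kludge sources.

```python
from enum import Enum


class ShutdownStage(Enum):
    RUNNING = 0
    DRAINING = 1      # no new clients; wait for active ones to log out
    FORCING = 2       # interrupt blocking I/O in client threads
    ABANDONING = 3    # skip thread cleanup, leave it to the OS


class ShutdownController:
    def __init__(self):
        self.stage = ShutdownStage.RUNNING

    def request_shutdown(self):
        """Called for each SHUTDOWN command or SIGINT; escalates one stage."""
        if self.stage is not ShutdownStage.ABANDONING:
            self.stage = ShutdownStage(self.stage.value + 1)
        return self.stage

    def accepts_new_clients(self):
        return self.stage is ShutdownStage.RUNNING

    def may_exit(self, active_clients):
        """Stage 3 exits at once; earlier stages wait for clients to finish."""
        if self.stage is ShutdownStage.RUNNING:
            return False
        if self.stage is ShutdownStage.ABANDONING:
            return True
        return active_clients == 0
```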

8. Motivation

When I first contemplated Kludge, sometime in early autumn of 2000, I was an IP networking tech, and my original aim was (and still is) to inventory and manage complex, heterogeneous network topologies. Now, more than two years later, I can say no more than that. I am bound by several agreements (my work contract, NDAs with software suppliers) not to disclose what I've learned in my job since May, 2002. That's why the examples I'm giving strictly avoid talking about routers, switches, services, bearer circuits, copper and fiber based outside plant etc. - in general, the actual models that I seek to implement in the user space of an object database. I believe that Kludge alone currently contains none of this business-confidential know-how.

Have you ever tried implementing a directory app using an SQL database for the backend and some C/C++ for the user interface? If you have, you've probably ended up writing hard-coded C++ classes (or C structs) for each type of object to be stored in the directory, along with hard-coded SQL table definitions to store the contents of your C++ objects. Potentially with lots of repetitive code. In C++ programming textbooks, this is called C++ objects with persistent storage. Those textbooks that are labeled "advanced" usually close the matter with a suggestion that the way to go is to use a graphical CASE tool to design the class model and generate the code out of that - both the C++ class skeletons and the SQL table definitions.

Being an apprentice programmer, with some rather passive college experience related to CASE tools, I can't say that I've "been there, done that." I've seen examples of such apps; OpenH323 is a good one, if memory serves (lots of ASN.1 classes converted into C++ classes and then compiled). Thus, all I know is that this approach is viable - but also terribly appalling to me.

Without having a large enough problem to solve, I'd never have got the grasp of object-oriented approaches. Only when I was presented with a problem complex enough to spin my brain the right way, a few months after I left college, did the once-memorized object-oriented lingo sink in. I've gone quite some way from basic C to basic object-oriented C++ (with great help from Mr. Bruce Eckel's excellent online book). But before I even started to toy with the object-oriented features of C++, I already had quite a clear concept of what KLUDGE would be about.

There are commercial object-based directory servers available, for a myriad different uses - some of them built for the management of office PC networks, others intended mainly for distributed remote authentication in TCP/IP networks, and yet another distinct class designed to function as network-based stores of human-oriented contact information.

There's DNS, there's Novell NDS, there's the Microsoft Active Directory, there's NIS/YP, there's RADIUS and there are other proprietary products. Many backbone IP networks in different countries are documented using RPSL-like derivatives of plain text files containing key/data hash-like objects. Most importantly, there's LDAP, the internet-based derivative of X.500, and its OpenLDAP implementation - perhaps the only standards-based and open-source solution available. Still, I was looking for something yet more refined. I was very sad to abandon The Metacase tool - its "meta" functionality seemed promising at first, but I soon found out that I was really missing a database, not a CASE tool, even though some advanced CASE-wise visualization of my database data would certainly be excellent for my purposes.

I've seen about two or three proprietary ODBMSes - they had proprietary server-side scripting languages (or perhaps SmallTalk), newbie-unfriendly documentation, Windows-only APIs and other features that were uncomfortable to me. Then there's also the public domain free ODBMS by Konstantin Knizhnik - my C++ coding skills are far too poor to get the grasp of what's going on there :)
I should also mention the claimed "object/relational" nature of PostgreSQL (table definition = class, data records = instances) which in my opinion is still far from true object-orientedness.

To sum up, I was looking for an object-oriented system that would preferably store the class model (the "schema" if you speak LDAP) in the same manner as the object instances are stored. So that the schema could be updated at run-time, without restarting the server, let alone recompiling it. So that only a fairly simple meta-schema would have to be coded in C++ and SQL database structures. I also wanted the server to have a command-line user interface, perhaps somewhere between HTTP and SQL - so that I could telnet into the server for testing purposes and talk to it in a vaguely humanoid structured language. And perhaps most importantly, I wanted the server to keep strict referential integrity - which e.g. bare MySQL or LDAP can't do. Both are prone to invalid foreign key references / dangling aliases. From the very start I knew my directory would have a hierarchical tree-like skeleton to serve for primary identification of objects. I didn't insist on partial data distribution that e.g. LDAP allows (which requires one-way symbolic "alias" pointers) - hence I could contemplate a system with strong references. My directory database was to be a single-homed, centralised, strongly consistent data store. No call for messy geographical/organizational data distribution. My system was to be multi-user and multithreaded. Later I found out that I could include a scripting language for easy implementation of member methods - why not embed Perl?

And, last but not least, the system would have to be simple enough for my undereducated tired mind.

In other words, originally I was looking for a tool to meet my object modeling and storage needs. As I didn't find one, I decided to try to implement something that might at least hint at the direction of my thoughts. I'm not a good C++ coder - I would like to do some high-level data modeling and manipulation, only I don't have the tools :( Therefore, for the time being the journey has become my goal.

I know I'm probably reinventing the wheel in numerous areas - I have decided to ignore this unfortunate condition. It's a fact of life. I don't even have the time to study all the commercial ODBMS implementations in detail.

9. Bugs and other deficiencies

The set of CLI commands is somewhat incomplete and imperfect - there are numerous areas where improvements could be suggested.

At this moment, the server has basic multi-threaded functionality (can serve several clients simultaneously) but I have not yet started to work on mutual locking among threads, to prevent inconsistencies. This is a high-priority issue. Lots of fun to be expected here (deadlocks etc.)

At this moment, the server does load data from the database onto the heap `on-demand' - but it does not unload data that is not needed anymore (no garbage-collection style release) until it's shut down. It certainly does proper memory management in terms of C++ new/delete or C malloc/free - I hope there are no memory leaks. It just does not yet implement a mechanism to release Kludge Objects that no client has requested for some time. This is another important ToDo item for any potential production use.
On the same note, Kludge currently does not support write-back operation (from the heap to the back-end database). Any data modifications/insertions are "written through" straight into the database - the Kludge CLI doesn't return the prompt until the operation is accomplished. The implementation of a write-back mechanism would probably require the implementation of a DB backend management thread - a moderate amount of work. Also, the Kludge server would have to take care of generating the integer OID's - currently it relies on the AUTO_INCREMENT feature of MySQL.
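
The difference between the current write-through behaviour and a possible write-back mode can be illustrated with a toy Python sketch. Plain dictionaries stand in for the heap and for the SQL back-end, and both class names are hypothetical; a real write-back implementation would run flush() from a dedicated backend management thread and would need locking.

```python
class WriteThroughStore:
    """Every modification reaches the back-end before the call returns."""

    def __init__(self, backend):
        self.backend = backend      # stands in for the SQL database
        self.heap = {}

    def set(self, key, value):
        self.backend[key] = value   # completes before the caller gets control back
        self.heap[key] = value


class WriteBackStore:
    """Modifications stay on the heap until flush() pushes them out."""

    def __init__(self, backend):
        self.backend = backend
        self.heap = {}
        self.dirty = {}             # keys modified since the last flush (ordered)

    def set(self, key, value):
        self.heap[key] = value      # returns immediately
        self.dirty[key] = True

    def flush(self):
        """Write all pending modifications; returns how many keys were written."""
        for key in self.dirty:
            self.backend[key] = self.heap[key]
        written = len(self.dirty)
        self.dirty.clear()
        return written
```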

Next, as of now, there's no client-side API. I'm hoping to create a client-side API for C++ and perhaps Perl, preferably shipping with sample source code of a complete generic client app. My intention is that the client-side API should include a lightweight version of the server core engine, effectively replicating objects from the server on demand, with the client session on the server keeping track of the objects replicated and sending updates, and with the client API providing means to integrate the light client engine with the user interface in a way that allows immediate release of objects that are not needed.

The introduction of this light client-side replicating engine (or a full data replication between servers) will probably require that I introduce an option in the CLI to show the internal OID's. Along with much more complex arrangements to allow the server to "push" updates through the session connection in an asynchronous fashion (with respect to the CLI protocol/language). Also, any error reporting text produced by the server should be prefixed by numeric codes for machine processing at the client (a proven good practice in a number of venerable RFC-standardized protocols).

There's no access control - no user logon, no privilege resolution system. Also, there's no encryption of session traffic. All of this has to be added for production use.

The embedded Perl is not yet full-fledged. The XSUB interface needs to be extended to support the full set of basic KLUDGE operations (INSERT, SET, RENAME, MOVE, DELETE etc.). Debugging the scripts is a terrible experience - there are no parse-time error messages except for the sad Kludge report that "something went wrong", and any Perl-space run-time error makes the server exit immediately (!). This is something that urgently calls for remedy. The Perl embedded in Kludge is multithreaded, using "5.005 threads" that most people consider deprecated. This is subject to further discussion and perhaps subject to change, if ITHREADS prove viable in this scenario.

See the file kludge_todo.txt (in the Kludge tarball) for a more complete and low-level listing of what's missing and what I could think of for the future.

10. Licensing policy, disclaimer stuff

Kludge is free software.

I myself would probably prefer a FreeBSD-style license (use my code for whatever you want, including your own closed-source stuff), but the GPL'ed components of Kludge cause the whole of Kludge to be covered by the GPL as well.

This software is provided "as is". I disclaim liability for any potential damage caused or implied to anyone by using this Kludge.

Kludge is a hobby project. I'm working on it in my free time, at my own discretion. It is not and it hasn't been supported by my past and present employers. In this document, I speak on behalf of myself - the opinions, ideas, misconceptions and rants expressed are my own. In this document, I do not speak on behalf of my employers, past and present.

11. Final notes, a look into the future

-- a message from the author --

I've always been looking forward to the moment when I'd publish my Kludge. The One Thing I've been missing quite desperately is peer review. That is the true reason for this premature release. I need to know opinions of other people: whether or not this is pure lunacy, if it could be any use, what I should focus on, who's been there before, suggestions to trash it all and finally get a real life etc. Being a social creature, I'm publishing kludge with a hope that it would attract feedback.

If someone happens to consider this Kludge a viable approach, I am open to offers of contributions or cooperation on further development. I have very little time to spend developing Kludge myself :(

I am aware that the introduction of cooperative development would require quite some improvement in overall maturity of the means I'm using. I'd have to start using CVS and perhaps Autoconf. I'd have to perform a thorough rework of my error reporting facilities - introduce compile-time (preprocessor-based?) debugging hooks, make a clear distinction between user-space and internal errors, introduce proper logging, daemonization of the server, numbered responses in the CLI protocol... I'd have to improve portability and account for different versions of libglib (or replace the GTrees with my own library), Perl, PCRE, different versions of MySQL or even different SQL back-end databases, let alone native storage to some paged flat file structure.

In other words, there's still a LOT I have to learn if Kludge is to become a mature piece of software somewhere down the road.

Last but not least, I would like to thank my employers, past and present, for giving me my regular job, which pays for my living and has always been a rewarding experience, driving me to think and study further.

Frank Rysanek (Frantisek.Rysanek AT post.cz)