Programming Techniques, Tools and Languages Group

eXtensible Typing Library

XTL: an eXtensible Typing Library

Type systems built directly into the compiler or interpreter of a programming language cannot be easily extended to keep track of run-time invariants of new abstractions. Yet, programming with domain-specific abstractions could benefit from additional static checking. This paper presents library techniques for extending the type system of C++ to support domain-specific abstractions. The main contribution is a programmable "subtype" relation. As a demonstration of the techniques, we implement a type system for defining type qualifiers in C++, as well as a type system for the XML processing language, capable of, e.g., statically guaranteeing that a program only produces valid XML documents according to a given XML Schema.

Standard Types

XTL's user-extensible subtyping relation is defined by the is_subtype metafunction: if the metafunction invocation is_subtype<S, T>::type evaluates to mpl::true_, XTL considers the type S to be a subtype of type T. Since physical representations of values in different types may vary, this definition of subtyping relation implies existence of a unified conversion mechanism capable of transforming physical representation of a subtype into a physical representation of a supertype. Such a conversion in XTL is accomplished with the subtype_cast function template, invoked as subtype_cast<T>(val), where T has to be a supertype of val's type or otherwise a compile-time error is issued.

Consider the following types A and B that are in a subtyping relationship B <: A (XTL provides a rule that turns inheritance into subtyping):

class A { ... }; 
class B : public A { ... };

We further define two function types B_to_A and A_to_B. The function types here are instances of the std::function template, but with few additional restrictions can also be built-in function types:

typedef std::function<A(B)> B_to_A; // function type B -> A
typedef std::function<B(A)> A_to_B; // function type A -> B

According to the usual subtyping rules between function types (covariant on return types, contravariant on parameter types), a function type A -> B is a subtype of B -> A, and we thus would like to be able to use functions of type A_to_B everywhere where functions of type B_to_A are expected, but not vice versa. For example:

A f(B);
B g(A);
B_to_A b2a = &f;   // Wrap f into std::function
A_to_B a2b = &g;   // Wrap g into std::function
b2a = a2b;         // OK, but rejected!
a2b = b2a;         // Error

The first assignment above, even though safe, is rejected. If implicit conversions apply between corresponding argument types and return types of two functions, std::function defines an implicit conversion between those two function types. This is of no help, since there in general may not be implicit conversion from B to A. XTL, however, recognizes the subtyping relation between A_to_B and B_to_A, and with a subtype_cast the safe assignment is accepted:

b2a = subtype_cast<B_to_A>(a2b); // OK
a2b = subtype_cast<A_to_B>(b2a); // Error

Type Qualifiers

Type qualifiers are generic modifiers of a type that uniformly restrict or extend the set of values belonging to a type. Genericity of modifiers means that they are applicable to multiple and not just to a given type. In programs type qualifiers are used to capture additional semantic properties of the values. A well-known example of a type qualifier is the const qualifier of C++, used for tracking immutability of values at different program points. Other examples include type qualifiers for distinguishing between user and kernel level pointers, safe handling of format strings, and tracking of values with certain mathematical properties.

Different type qualifiers are sufficiently similar so that their treatment in the language can be generalized. General theory of type qualifiers, have been extensively studied by Jeffrey Foster in his PhD thesis. Here we will only mention that there are two kinds of type qualifiers: positive and negative ones.

A type qualifier q is positive if T <: q T for all types T for which q T is defined.
A type qualifier q is negative if q T <: T for all types T for which q T is defined.

The C++ qualifier const, type qualifier tainted, and optional are examples of positive type qualifiers because T <: const T, T <: tainted T, and T <: optional T, respectively. Qualifiers pos,neg,zero (track sign), nonnull (tracks initialized pointers), and untainted are examples of negative type qualifiers because pos T <: T, nonnull T <: T, and untainted T <: T.

It's important to note that multiple type qualifiers can be applied to a type. The semantics of the resulting type is a set of values defined by the set theoretic operations implied by the qualifier (negative qualifiers restrict the set, while positive qualifiers extend it). Such set theoretic definition of the type implies that the the order of applying qulifiers to a type should not matter (should not change the meaning of the resulting type).

Here is a complete example of working with type qualifiers using XTL. The absense of explicit subtype_cast calls is not a mistake here — they are embedded into generic conversion operator for brief syntax.

// Declare qualifiers we would like to use

DECLARE_NEGATIVE_QUALIFIER(nz);     // Non-zero value
DECLARE_NEGATIVE_QUALIFIER(pos);    // Positive value
DECLARE_NEGATIVE_QUALIFIER(neg);    // Negative value
DECLARE_NEGATIVE_QUALIFIER(untainted);// Secure value
DECLARE_POSITIVE_QUALIFIER(tainted);// Unsecure value

// Define how various operations (+,-,*,/ etc.) affect them

template<> struct add<pos, pos> {typedef qual<pos> type;};
template<> struct add<neg, neg> {typedef qual<neg> type;};
template<> struct sub<pos, neg> {typedef qual<pos> type;};
template<> struct sub<neg, pos> {typedef qual<neg> type;};
template<> struct mul<pos, pos> {typedef qual<pos> type;};
template<> struct mul<pos, neg> {typedef qual<neg> type;};
template<> struct mul<neg, pos> {typedef qual<neg> type;};
template<> struct mul<neg, neg> {typedef qual<pos> type;};
template<> struct mul<nz , nz > {typedef qual<nz>  type;};
template<> struct div<nz , nz > {typedef qual<nz>  type;};

int main()
{
  untainted<nz<neg<int>>> a(-42);
  pos<untainted<nz<int>>> b(7);
  neg<nz<long>> m = a * b;          // ok to drop neg.qual. untainted
//nz<pos<double>> e = b - a;        // error: nz is not carried by -
  pos<double> d = b - a;            // ok: no nz in the LHS
  pos<tainted<double>> e = b + a*m; // ok to add pos.qual. tainted
//pos<double> f = e;                // error: but not ok to drop it!
}

Regular Expression Types for XML

Type systems that understand XML data have gained considerable interest. The central idea is to harness the type system to guarantee statically that a particular program cannot manipulate or produce XML documents that do not conform to a particular DTD or Schema. The insight is that XML data corresponds directly to regular expression types, which then can be given a representation in the type systems of various languages.

Regular expression types, e.g., as defined in XDuce language, are sets of sequences over certain domains. Values from those domains denote singleton and composite sequences. Composite sequences are formed with the regular expression operators "," (concatenation), "|" (alternation), "*" (repetition), and "?" (optionality), together with type constructors of the form "l[...]". If S and T are types, then S,T denotes all the sequences formed by concatenating a sequence from S and a sequence from T. S | T denotes a type that is a union of sequences from S and sequences from T. Type l[T], where T is a type and l ranges over a set of labels, defines a set of labeled sequences where each sequence from T becomes classified with the label l. Type T* denotes a set of sequences obtained by concatenating a finite number of elements from T. The empty sequence is denoted with (), and T? denotes any sequence from T or an empty sequence.

Consider for example the following XML snippet describing a contact:

<contact>
  <name>Humpty Dumpty</name>
  <tel>555-4321</tel>
  <email>humpty.dumpty@tamu.edu</email>
</contact>

This snippet conforms to the following XML Schema:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xsd:schema xmlns:xsd ="http://www.w3.org/2001/XMLSchema">
<xsd:element name="name"  type="xsd:string"/>
<xsd:element name="tel"   type="xsd:string"/>
<xsd:element name="email" type="xsd:string"/>
<xsd:element name="contact">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="name"/>
      <xsd:element ref="tel"   maxOccurs="unbounded"/>
      <xsd:element ref="email" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
</xsd:schema>

Labels classify types similarly to how XML tags classify their content; For example, the type email[...] corresponds to <email>...</email> in XML parlance. Using XDuce syntax, the regular expression type

contact[name[string], tel[string]*, email[string]*]  

corresponds to the above schema: both define the set of XML snippets with "contact" as the root element containing single "name" element, followed by zero or more "tel", followed by zero or more "email" elements.

Subtyping relation between regular expression types is defined as the subset relation between languages generated by tree automata corresponding to those types. For example, the following subtyping relationships hold:

T*, U* <: (T | U)*
T, (T)* <: T*

In practice, subtyping between two regular expression types corresponds to safe convertibility between XML fragments belonging to those types. Subtyping between XML fragments is useful, for example, in providing backward compatibility of documents that correspond to an older schema: code written against a newer schema should work for older schemas, as long as the type defined by the newer schema is a supertype of the type defined by the older one.

We define an encoding of regular expression types in C++ in which they are represented as nested template instantiations, consisting of sequence types, variants, and lists. The XML elements themselves are represented as a struct parameterized with two types, the first of which represents the element's tag and second the element's data. The following typedef represents the encoding of the regular expression type "contact" discussed above:

typedef 
element<
  contact, 
  seq<
          element<name,  string>  , 
      rep<element<tel,   string> >, 
      rep<element<email, string> > 
     > 
> Contact;

Again we utilize the static metaprogramming capabilities of C++ to establish a subtyping relation between C++ encodings of two regular expression types.

In addition to the core type system, we have implemented some supporting functionality as part of our XML processing library. This includes I/O and automatic generation of the C++ types from an XML Schema. For I/O, we provide direct streaming operations. To automate generation of the C++ types from XML types, we provide an XSL transformation from an XML Schema to C++ source code. To give a general feel for the use of our XML framework, here is a complete example of turning an XML Schema into the corresponding C++ encodings of XML types, and working with them. The comments in the code point out the parts of the code that were generated from an .xsd file (the XML Schema description), where our support functionality (input and output) is invoked, and where the subtyping checks are performed.

using namespace xml;
using namespace std;

// --- definitions generated by XSLT transformation -->
struct name { /*...*/ }; struct email   { /*...*/ };
struct tel  { /*...*/ }; struct contact { /*...*/ };

typedef element<name,    string>                 Name;
typedef element<email,   string>                 Email;
typedef element<tel,     alt<string, int> >      Tel;
typedef element<contact, seq<Name, Tel, Email> > Contact;
typedef element<contact, seq<Name, rep<Tel>, rep<Email> > > ContactEx;
// <-- definitions generated by XSLT transformation ---

int main() {
  try {
    ifstream xml("contact.xml"); 
    Contact contact; 
    xml >> contact;                 // Parse file
    ContactEx contact_ex = contact; // OK, implicit subtype_cast<ContactEx>(contact)
    cout << contact_ex;             // Output XML 
    // contact = contact_ex;        // Error, not in subtyping relation
  }
  catch(invalid_input& x) {
    cerr << "Error parsing " << x.what();
    return -1;
  }
}

To demonstrate the practicality of the library, as another, larger example, we generated XML types for two established Internet standards for syndication, the RSS and Atom, and modeled a subtyping relation between documents of these standards, allowing thus a type safe conversion frome one to another.

More details about the library as well as corresponding references can be found in one of the following papers.

Publications

Presentations

Headlines

About Us | Parasol Lab | Computer Science and Engineering | ©2009 Texas A&M University