Standard Types
XTL's user-extensible subtyping relation is defined by the is_subtype
metafunction: if the metafunction invocation is_subtype<S, T>::type
evaluates to mpl::true_, XTL considers the type S to be a subtype
of type T. Since physical representations of values in different types
may vary, this definition of subtyping relation implies existence of a unified
conversion mechanism capable of transforming physical representation of a
subtype into a physical representation of a supertype. Such a
conversion in XTL is accomplished with the subtype_cast
function template, invoked as subtype_cast<T>(val), where T
has to be a supertype of val's type or otherwise a compile-time error is
issued.
Consider the following types A and B that are in a subtyping
relationship B <: A (XTL provides a rule that turns inheritance into
subtyping):
class A { ... };
class B : public A { ... };
We further define two function types B_to_A and A_to_B.
The function types here are instances of the std::function template,
but with few additional restrictions can also be built-in function types:
typedef std::function<A(B)> B_to_A; // function type B -> A
typedef std::function<B(A)> A_to_B; // function type A -> B
According to the usual subtyping rules between function types (covariant on
return types, contravariant on parameter types), a function type A -> B
is a subtype of B -> A, and we thus would like to be able to use
functions of type A_to_B everywhere where functions of type
B_to_A are expected, but not vice versa. For example:
A f(B);
B g(A);
B_to_A b2a = &f; // Wrap f into std::function
A_to_B a2b = &g; // Wrap g into std::function
b2a = a2b; // OK, but rejected!
a2b = b2a; // Error
The first assignment above, even though safe, is rejected. If implicit
conversions apply between corresponding argument types and return types of two
functions, std::function defines an implicit conversion between those
two function types. This is of no help, since there in general may not be
implicit conversion from B to A. XTL, however, recognizes the
subtyping relation between A_to_B and B_to_A, and with a
subtype_cast the safe assignment is accepted:
b2a = subtype_cast<B_to_A>(a2b); // OK
a2b = subtype_cast<A_to_B>(b2a); // Error
Type Qualifiers
Type qualifiers are generic modifiers of a type that uniformly restrict or
extend the set of values belonging to a type. Genericity of modifiers means
that they are applicable to multiple and not just to a given type.
In programs type qualifiers are used to capture additional semantic properties
of the values. A well-known example of a type qualifier is the const
qualifier of C++, used for tracking immutability of values at different
program points. Other examples include type qualifiers for distinguishing
between user and kernel level pointers, safe handling of format
strings, and tracking of values with certain mathematical
properties.
Different type qualifiers are sufficiently similar so that their treatment in
the language can be generalized. General theory of type qualifiers, have been
extensively studied by Jeffrey Foster in his PhD thesis. Here we will only
mention that there are two kinds of type qualifiers: positive and
negative ones.
- A type qualifier q is positive if T <: q T for all types T for which q T is defined.
- A type qualifier q is negative if q T <: T for all types T for which q T is defined.
The C++ qualifier const, type qualifier tainted,
and optional are examples of positive type qualifiers
because T <: const T, T <: tainted T, and T <: optional T,
respectively. Qualifiers pos,neg,zero (track sign), nonnull
(tracks initialized pointers), and untainted are examples of negative
type qualifiers because pos T <: T, nonnull T <: T, and
untainted T <: T.
It's important to note that multiple type qualifiers can be applied to a type.
The semantics of the resulting type is a set of values defined by the set
theoretic operations implied by the qualifier (negative qualifiers restrict the
set, while positive qualifiers extend it). Such set theoretic definition of the
type implies that the the order of applying qulifiers to a type should not
matter (should not change the meaning of the resulting type).
Here is a complete example of working with type qualifiers using XTL. The
absense of explicit subtype_cast calls is not a mistake here — they are
embedded into generic conversion operator for brief syntax.
// Declare qualifiers we would like to use
DECLARE_NEGATIVE_QUALIFIER(nz); // Non-zero value
DECLARE_NEGATIVE_QUALIFIER(pos); // Positive value
DECLARE_NEGATIVE_QUALIFIER(neg); // Negative value
DECLARE_NEGATIVE_QUALIFIER(untainted);// Secure value
DECLARE_POSITIVE_QUALIFIER(tainted);// Unsecure value
// Define how various operations (+,-,*,/ etc.) affect them
template<> struct add<pos, pos> {typedef qual<pos> type;};
template<> struct add<neg, neg> {typedef qual<neg> type;};
template<> struct sub<pos, neg> {typedef qual<pos> type;};
template<> struct sub<neg, pos> {typedef qual<neg> type;};
template<> struct mul<pos, pos> {typedef qual<pos> type;};
template<> struct mul<pos, neg> {typedef qual<neg> type;};
template<> struct mul<neg, pos> {typedef qual<neg> type;};
template<> struct mul<neg, neg> {typedef qual<pos> type;};
template<> struct mul<nz , nz > {typedef qual<nz> type;};
template<> struct div<nz , nz > {typedef qual<nz> type;};
int main()
{
untainted<nz<neg<int>>> a(-42);
pos<untainted<nz<int>>> b(7);
neg<nz<long>> m = a * b; // ok to drop neg.qual. untainted
//nz<pos<double>> e = b - a; // error: nz is not carried by -
pos<double> d = b - a; // ok: no nz in the LHS
pos<tainted<double>> e = b + a*m; // ok to add pos.qual. tainted
//pos<double> f = e; // error: but not ok to drop it!
}
Regular Expression Types for XML
Type systems that understand XML data have gained considerable interest. The
central idea is to harness the type system to guarantee statically that a
particular program cannot manipulate or produce XML documents that do not
conform to a particular DTD or Schema. The
insight is that XML data corresponds directly to regular expression
types, which then can be given a representation in the type systems of
various languages.
Regular expression types, e.g., as defined in XDuce
language, are sets of sequences over certain domains.
Values from those domains denote singleton and composite sequences. Composite
sequences are formed with the regular expression operators ","
(concatenation), "|" (alternation), "*" (repetition), and "?"
(optionality), together with type constructors of the form "l[...]". If
S and T are types, then S,T denotes all the sequences formed by
concatenating a sequence from S and a sequence from T. S | T denotes a
type that is a union of sequences from S and sequences from T. Type l[T],
where T is a type and l ranges over a set of labels, defines a set of
labeled sequences where each sequence from T becomes classified with the
label l. Type T* denotes a set of sequences obtained by concatenating a
finite number of elements from T. The empty sequence is denoted with (),
and T? denotes any sequence from T or an empty sequence.
Consider for example the following XML snippet describing a contact:
<contact>
<name>Humpty Dumpty</name>
<tel>555-4321</tel>
<email>humpty.dumpty@tamu.edu</email>
</contact>
This snippet conforms to the following XML Schema:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xsd:schema xmlns:xsd ="http://www.w3.org/2001/XMLSchema">
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="tel" type="xsd:string"/>
<xsd:element name="email" type="xsd:string"/>
<xsd:element name="contact">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="name"/>
<xsd:element ref="tel" maxOccurs="unbounded"/>
<xsd:element ref="email" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Labels classify types similarly to how XML tags classify their content; For
example, the type email[...] corresponds to <email>...</email> in
XML parlance. Using XDuce syntax, the regular expression type
contact[name[string], tel[string]*, email[string]*]
corresponds to the above schema: both define the set of XML snippets with
"contact" as the root element containing single "name" element, followed by
zero or more "tel", followed by zero or more "email" elements.
Subtyping relation between regular expression types is defined as the subset
relation between languages generated by tree automata
corresponding to those types. For example, the following subtyping
relationships hold:
T*, U* <: (T | U)*
T, (T)* <: T*
In practice, subtyping between two regular expression types
corresponds to safe convertibility between XML fragments belonging to those
types. Subtyping between XML fragments is useful, for example, in providing
backward compatibility of documents that correspond to an older schema: code
written against a newer schema should work for older schemas, as long as the
type defined by the newer schema is a supertype of the type defined by the
older one.
We define an encoding of regular expression types in C++ in which they are
represented as nested template instantiations, consisting of sequence types,
variants, and lists. The XML elements themselves are represented as a struct
parameterized with two types, the first of which represents the element's tag
and second the element's data. The following typedef represents the encoding of
the regular expression type "contact" discussed above:
typedef
element<
contact,
seq<
element<name, string> ,
rep<element<tel, string> >,
rep<element<email, string> >
>
> Contact;
Again we utilize the static metaprogramming capabilities of C++ to establish
a subtyping relation between C++ encodings of two regular expression types.
In addition to the core type system, we have implemented some supporting
functionality as part of our XML processing library. This includes I/O and
automatic generation of the C++ types from an XML Schema. For I/O, we
provide direct streaming operations. To automate generation of the C++ types
from XML types, we provide an XSL transformation from an XML Schema to C++
source code.
To give a general feel for the use of our XML framework, here is a complete
example of turning an XML Schema into the corresponding C++ encodings of XML
types, and working with them. The comments in the code point out the parts of
the code that were generated from an .xsd file (the XML Schema
description), where our support functionality (input and output) is invoked,
and where the subtyping checks are performed.
using namespace xml;
using namespace std;
// --- definitions generated by XSLT transformation -->
struct name { /*...*/ }; struct email { /*...*/ };
struct tel { /*...*/ }; struct contact { /*...*/ };
typedef element<name, string> Name;
typedef element<email, string> Email;
typedef element<tel, alt<string, int> > Tel;
typedef element<contact, seq<Name, Tel, Email> > Contact;
typedef element<contact, seq<Name, rep<Tel>, rep<Email> > > ContactEx;
// <-- definitions generated by XSLT transformation ---
int main() {
try {
ifstream xml("contact.xml");
Contact contact;
xml >> contact; // Parse file
ContactEx contact_ex = contact; // OK, implicit subtype_cast<ContactEx>(contact)
cout << contact_ex; // Output XML
// contact = contact_ex; // Error, not in subtyping relation
}
catch(invalid_input& x) {
cerr << "Error parsing " << x.what();
return -1;
}
}
To demonstrate the practicality of the library, as another, larger example, we
generated XML types for two established Internet standards for syndication, the
RSS and Atom, and modeled a subtyping relation
between documents of these standards, allowing thus a type safe conversion
frome one to another.
More details about the library as well as corresponding references can be found in one of the following papers.