Design and Build Reusable XML Schemas: 10 Tips

XML Schema Definition (XSD) files define the structure and data types used in XML messages. Any application that exchanges XML benefits from a schema, and XML Schema has become a common definition language for integrating systems and for defining shared data interchange formats. Although there is no "one size fits all" standard for creating schemas, IT organizations should define XML Schema standards so that schemas can be easily reused, maintained and extended with minimal impact on existing integrations. Without best practices and naming conventions, a project can end up with inconsistent schemas that are too rigid or too relaxed to meet its requirements. In this blog post, I will go over ten practical tips for designing and building reusable XML schemas. These recommendations can serve as a starting point for defining XML Schema standards within your organization.

1. Use XML Schema naming conventions

XML schemas primarily define complex data types by nesting simple (primitive) types. Just like you would do in any programming language, it is important to follow naming conventions. XML Schema naming standards should include rules for casing and naming of elements, among others. XML elements and attributes are typically named using lowerCamelCase, while types are typically named using PascalCase (or UpperCamelCase). These naming conventions can also specify standard suffixes to be used, e.g. all Date fields should be suffixed with the word Date.

It is also critical to adopt naming conventions for XML namespaces. XML namespaces are the equivalent of packages in Java and namespaces in .NET. Never leave a tool-generated default namespace in place or omit the namespace altogether, even for internal applications. I have seen many applications make it to production with default namespaces (e.g. http://tempuri.org). Start by defining a namespace template similar to this: http://www.company.com/businessDomain/appName/entityGroup. Keep in mind that, unless overridden, XML binding tools derive the package name of generated application objects from the XML namespace (e.g. com.summa_tech.intranet.blog.types). Define a standard for XML namespaces in your organization if there isn't one already. Keep an inventory of all namespaces in use and have a lightweight review process for approving new ones.

Once you have defined the namespace for your schema, start by creating your .xsd file containing the targetNamespace and a namespace prefix as follows:

<xs:schema targetNamespace="http://www.summa-tech.com/common/types" xmlns:tns="http://www.summa-tech.com/common/types" xmlns:xs="http://www.w3.org/2001/XMLSchema">

It is also a good idea to have guidelines for namespace prefixes. Usually a three-letter acronym or abbreviation will do. Keep in mind that at runtime several XML binding frameworks may generate the namespace prefix dynamically (e.g. ns1, ns2, etc.).
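As a rough illustration of these conventions, the sketch below puts them together in one schema file. The purchaseOrder element, its child elements, and the elementFormDefault setting are assumptions made for this example, not part of any existing standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.summa-tech.com/common/types"
           xmlns:tns="http://www.summa-tech.com/common/types"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <!-- Elements use lowerCamelCase (hypothetical element name) -->
  <xs:element name="purchaseOrder" type="tns:PurchaseOrder" />

  <!-- Types use PascalCase -->
  <xs:complexType name="PurchaseOrder">
    <xs:sequence>
      <xs:element name="orderNumber" type="xs:string" />
      <!-- Date fields carry the Date suffix -->
      <xs:element name="orderDate" type="xs:date" />
      <xs:element name="shipDate" type="xs:date" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Setting elementFormDefault to "qualified" places nested elements in the target namespace as well; whether to require that is one more decision your naming standard can record.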

2. Organize schema files using import and include directives

XML Schema provides two directives for organizing files: import and include. The import directive references types from a different namespace defined in a separate schema file, while the include directive references types from the same namespace defined in separate XSD files. Without these directives, schema files can grow out of proportion. Organizing schemas across multiple files also helps split ownership. For instance, one enterprise-wide group can have primary responsibility for maintaining the schema files that contain common types, while individual groups own their application-specific types. At a minimum, use these directives to break common types into separate files so that they can be reused without duplication. A popular use of the import directive is to keep the XML types used by a WSDL in a separate file. This allows the types defined in the schema to be reused without necessarily having to use web services.
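The following sketch shows how the two directives might be combined. All three file names, the crm/customer namespace, and the type names are hypothetical and chosen only for illustration. The first file holds common types in their own namespace:

```xml
<!-- common-types.xsd (hypothetical file): shared types in the common namespace -->
<xs:schema targetNamespace="http://www.summa-tech.com/common/types"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="street" type="xs:string" />
      <xs:element name="city" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The second file holds application types that share a namespace with the main schema:

```xml
<!-- customer-types.xsd (hypothetical file): same namespace as customer.xsd -->
<xs:schema targetNamespace="http://www.summa-tech.com/crm/customer"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:simpleType name="CustomerId">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

The main schema then imports the first file and includes the second:

```xml
<!-- customer.xsd (hypothetical file) -->
<xs:schema targetNamespace="http://www.summa-tech.com/crm/customer"
           xmlns:tns="http://www.summa-tech.com/crm/customer"
           xmlns:cmn="http://www.summa-tech.com/common/types"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <!-- import: types that live in a different namespace -->
  <xs:import namespace="http://www.summa-tech.com/common/types"
             schemaLocation="common-types.xsd" />
  <!-- include: types that live in this schema's own namespace -->
  <xs:include schemaLocation="customer-types.xsd" />

  <xs:element name="customer" type="tns:Customer" />

  <xs:complexType name="Customer">
    <xs:sequence>
      <xs:element name="customerId" type="tns:CustomerId" />
      <xs:element name="address" type="cmn:Address" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Note that the imported types are referenced through their own prefix (cmn), while the included types are referenced through the same prefix as the local ones (tns).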

3. Use explicit complex types

In addition to having a root element, schemas may contain several complex types. Complex types can be defined inline or explicitly. The first example below defines a root element using an inline complex type:

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="message" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

The next schema is equivalent to the one above, but references a named complex type instead.

<xs:element name="root" type="tns:ComplexType" />
<xs:complexType name="ComplexType">
  <xs:sequence>
    <xs:element name="message" type="xs:string" />
  </xs:sequence>
</xs:complexType>

XML files generated from either schema are identical. The benefit of defining explicit complex types is that the type can be reused in other elements. For instance, you could define an Address complex type and use it as the type for both a billingAddress and a shippingAddress element. An explicit complex type can also be referenced from other XSD files that import or include your schema. In my experience, it is better to define all complex types explicitly upfront so they can be reused without further refactoring.
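The Address example described above could be sketched as follows. The Order type and the individual address fields are assumptions made for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.summa-tech.com/common/types"
           xmlns:tns="http://www.summa-tech.com/common/types"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <!-- Defined once as an explicit (named) complex type -->
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="street" type="xs:string" />
      <xs:element name="city" type="xs:string" />
      <xs:element name="postalCode" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="order" type="tns:Order" />

  <!-- Hypothetical type that reuses Address for two different elements -->
  <xs:complexType name="Order">
    <xs:sequence>
      <xs:element name="billingAddress" type="tns:Address" />
      <xs:element name="shippingAddress" type="tns:Address" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Had Address been declared inline inside billingAddress, the same three fields would have to be repeated for shippingAddress.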

4. Use base types

Base types can be used to define common elements and attributes across several XML types. For instance, a base type can define an "id" element to be available on all types extending it:

<xs:complexType name="BaseType">
  <xs:sequence>
    <xs:element name="id" type="xs:string" />
  </xs:sequence>
</xs:complexType>
<xs:complexType name="Customer">
  <xs:complexContent>
    <xs:extension base="tns:BaseType">
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Note that you can add a base type to any complex type by simply wrapping its sequence group in the complexContent/extension elements.

5. Define and use XML Enumerations

XML enumerations constrain element values to a discrete set of valid values. They can only be defined on simple types (e.g. string). To get a similar effect with complex types (i.e. types with more than one element), use the choice element, which allows exactly one of several alternative elements to appear.

Enumerations can be defined inline or as explicit types. My preference is to define them as explicit types so that they can be reused within the same schema or from other schemas. Here is an example of an enumeration defined as an explicit simple type:

<xs:element name="role" type="tns:RoleEnum" />
...
<xs:simpleType name="RoleEnum">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Administrator" />
    <xs:enumeration value="User" />
  </xs:restriction>
</xs:simpleType>
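
For the complex-type alternative mentioned at the start of this tip, a choice group might look like the sketch below. The Payment, CreditCard and BankTransfer types and their fields are hypothetical names used only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.summa-tech.com/common/types"
           xmlns:tns="http://www.summa-tech.com/common/types"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <xs:element name="payment" type="tns:Payment" />

  <xs:complexType name="Payment">
    <!-- Exactly one of the two child elements may appear -->
    <xs:choice>
      <xs:element name="creditCard" type="tns:CreditCard" />
      <xs:element name="bankTransfer" type="tns:BankTransfer" />
    </xs:choice>
  </xs:complexType>

  <xs:complexType name="CreditCard">
    <xs:sequence>
      <xs:element name="cardNumber" type="xs:string" />
      <xs:element name="expirationDate" type="xs:gYearMonth" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="BankTransfer">
    <xs:sequence>
      <xs:element name="accountNumber" type="xs:string" />
      <xs:element name="routingNumber" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```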

6. Use XML Schema attributes for metadata

I am not a big fan of XML attributes, mainly for two reasons: they can only be declared with simple (primitive) types, and they can make parsing logic a bit more complex. They can, however, be very useful in certain situations. For instance, they are a good fit for metadata that would otherwise pollute your type definitions. An attribute can hold a status value indicating that a given entity has been inserted, updated or deleted. This status may not belong in the logical model, but it is required for implementation purposes. Attributes can be specified as optional (the default), required or prohibited.
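
A minimal sketch of such a status attribute follows. The ChangeStatusEnum type, its values, and the customer element are assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.summa-tech.com/common/types"
           xmlns:tns="http://www.summa-tech.com/common/types"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <xs:element name="customer" type="tns:Customer" />

  <xs:complexType name="Customer">
    <xs:sequence>
      <xs:element name="name" type="xs:string" />
    </xs:sequence>
    <!-- Attributes are declared after the content model; use defaults to optional -->
    <xs:attribute name="status" type="tns:ChangeStatusEnum" use="optional" />
  </xs:complexType>

  <!-- Hypothetical enumeration of change states -->
  <xs:simpleType name="ChangeStatusEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Inserted" />
      <xs:enumeration value="Updated" />
      <xs:enumeration value="Deleted" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

An instance document using it could look like this, with the metadata kept out of the element content:

```xml
<cmn:customer xmlns:cmn="http://www.summa-tech.com/common/types" status="Updated">
  <cmn:name>Jane Doe</cmn:name>
</cmn:customer>
```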

7. Do not make every element in the schema optional

By default all elements in a schema are required (minOccurs="1") and not nullable (nillable="false"). Unless it serves a very specific purpose, a schema with all elements required is often too rigid. The applications using such a schema may not have all data elements available and will end up providing default values that can be misleading. At the other extreme, if all elements in a schema are marked as optional, it becomes too flexible: applications must implement multiple validation rules themselves, and what is required in each situation has to be clearly documented. The ideal (and often difficult) solution is to find a middle ground, making elements optional only when they must be.
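
As an illustration of that middle ground, the sketch below mixes required, optional, nillable and repeating elements. The Customer fields shown are hypothetical, and which ones should be optional depends entirely on your requirements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.summa-tech.com/common/types"
           xmlns:tns="http://www.summa-tech.com/common/types"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <xs:complexType name="Customer">
    <xs:sequence>
      <!-- Required: minOccurs and maxOccurs both default to 1 -->
      <xs:element name="id" type="xs:string" />
      <xs:element name="name" type="xs:string" />
      <!-- Optional: the element may be left out entirely -->
      <xs:element name="middleName" type="xs:string" minOccurs="0" />
      <!-- Required but nillable: must be present, may carry xsi:nil="true" -->
      <xs:element name="birthDate" type="xs:date" nillable="true" />
      <!-- Optional and repeating: zero or more occurrences -->
      <xs:element name="phoneNumber" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```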

8. Reuse common types and common XSD libraries

Do not reinvent the wheel. If someone within your organization has a good XML type library that you can leverage, reuse it and strive to standardize it. Such a library can include type definitions for addresses, standard headers, or certain business entities. Business entities are the hardest to define as common, because different business areas need different sets of data elements for the same entity. This is why enterprise-wide canonical models often end up with most elements marked as optional. Even a minimalistic definition of such business entities can increase reusability and interoperability among the applications sharing these types.

9. Use tools

Even if you are an XML Schema expert, an XML Schema editor can help you become more productive. In my experience, a good tool must meet three basic requirements. First, it must provide auto-complete capabilities and allow you to validate the schema. Second, it should let you generate sample XML files with various options, such as populating optional nested elements up to N levels. Finally, it must let you validate an XML file against a set of schemas. This last capability is essential during integration testing for validating files generated by applications that do not rely on an XML binding framework.

10. Governance

Without standards and processes in place, XML schemas can become too complex and hard to maintain and extend. Such processes are typically defined as part of the SOA governance within an organization. If no such governance exists, a small steering committee can help. By committee, I mean a small group of experts who can help define new schemas and review them to ensure they comply with existing standards. This group should also encourage reuse of common schemas. It is equally important that this group does not become a bottleneck for creating and defining new schemas.
