Discussion about JSON payloads and code generation
In this post I would like to talk about the current state of describing JSON payloads and how we can use such specifications for code generation. To get started, I will explain the situation from my point of view and share the lessons we have learned from building TypeSchema.
Currently the best-known specification for describing JSON payloads is JSON Schema, partly because it is integrated into OpenAPI. JSON Schema is great for validating JSON payloads, but it is difficult to use for code generators because of its highly recursive and flexible nature. There are tools available which use JSON Schema for code generation, but they all have to work around the inherent problems of JSON Schema somehow.
It looks like future versions of OpenAPI will be more independent of JSON Schema, so that it would be possible to use other schema specifications which are optimized for a specific use case such as code generation. There is already JTD (RFC 8927), which tries to solve those problems, and we have developed a specification called TypeSchema for our use cases. JSON Schema also has the idea of vocabularies, which could theoretically lead to a solution, but there is currently no vocabulary available for code generation. From my point of view it would be best to develop this in a separate specification, since all vocabularies inherit properties from JSON Schema which are not suited for code generation.
To be more specific, we have written down some points which are (from our view) useful for a code generation specification:
1. Determine type based on the schema
In TypeSchema we identify each type based on the keywords used, e.g. a type “integer” means the type is an integer, a type “object” with the “properties” keyword means it is a struct type, a type “object” with the “additionalProperties” keyword means it is a map type, etc. JTD follows the same logic: the type can be determined from the schema (the keywords used) alone. In JSON Schema we have the problem that a schema can only be evaluated in combination with the actual data, since multiple types and keywords can be combined and the keywords are applied depending on the provided data. This is great for validation but a problem for code generation. A code generator must be able to determine the type from the schema alone, which works in JTD and TypeSchema.
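A minimal sketch of this keyword-based dispatch, written in Python for illustration. The function name `classify` and the returned labels are assumptions made for this example and are not part of TypeSchema or JTD; the point is only that the decision never looks at instance data.

```python
def classify(schema: dict) -> str:
    """Return the kind of type a schema describes, looking at keywords only."""
    if "$ref" in schema:
        return "reference"
    if "oneOf" in schema:
        return "union"
    if "allOf" in schema:
        return "intersection"
    kind = schema.get("type")
    if kind == "object":
        # "properties" marks a struct, "additionalProperties" marks a map.
        if "properties" in schema:
            return "struct"
        if "additionalProperties" in schema:
            return "map"
        raise ValueError("object schema needs 'properties' or 'additionalProperties'")
    if kind == "array":
        return "array"
    if kind in ("string", "integer", "number", "boolean"):
        return kind
    # e.g. {"type": ["string", "integer"]} cannot be mapped to a single type.
    raise ValueError(f"cannot determine type from keywords: {sorted(schema)}")
```

A schema such as `{"type": ["string", "integer"]}`, which is valid JSON Schema, ends up in the final `ValueError` branch because the keywords alone do not yield one type.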
2. Unique name for each entity
One really important requirement for each code generator is that it can find a unique name for each entity. In JSON Schema we have a problem with e.g. nested objects:
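As a hedged illustration, a schema of this shape could look like the Python dict below. The property names and the helper `anonymous_objects` are hypothetical and only serve to show where a generator runs out of names.

```python
# Hypothetical JSON Schema fragment: "parent" is declared inline,
# so the object it describes has no name of its own.
NESTED_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "parent": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
            },
        },
    },
}


def anonymous_objects(schema: dict, path: str = "") -> list[str]:
    """Collect the property paths of object schemas that are declared inline."""
    found = []
    for key, prop in schema.get("properties", {}).items():
        child_path = f"{path}/{key}"
        if prop.get("type") == "object":
            found.append(child_path)
            found.extend(anonymous_objects(prop, child_path))
    return found
```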
In this case a code generator has no unique name for the “parent” object. We need this name since the object may also be used in other objects and we want to generate only a single class to represent this type. Because of this we have disallowed nested objects in TypeSchema and allow only references inside an object, so the example above would look like:
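A sketch of the flattened form, again as a Python dict. The type names `Student` and `Person` and the helper `class_names` are assumptions for this example; the only thing taken from the text is that every object sits under “definitions” and is referenced by name.

```python
# Hypothetical TypeSchema-style document: every object lives under
# "definitions" and is referenced by name instead of being nested.
FLAT_SCHEMA = {
    "definitions": {
        "Student": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "parent": {"$ref": "Person"},
            },
        },
        "Person": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
            },
        },
    },
}


def class_names(document: dict) -> list[str]:
    """Return one class name per definition, rejecting inline struct objects."""
    names = []
    for name, definition in document["definitions"].items():
        for key, prop in definition.get("properties", {}).items():
            if "properties" in prop:
                raise ValueError(f"{name}.{key}: nested object, use $ref instead")
        names.append(name)
    return names
```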
Through this restriction we are always able to find a unique name which we can use e.g. as a class name and which the author of the schema also recognizes. In JTD it looks like nested objects are allowed, which would result in the same problem.
3. Extending existing types
In TypeSchema we have added a new keyword “$extends” which explicitly states that an entity extends another entity. This also allows code generators to generate a programmatic “extends” for each class. In JSON Schema we need to use the “allOf” keyword for this, which allows combining multiple schemas, but the keyword really means “the provided data must be valid against all of the provided schemas”. We can derive an extends logic from this, but it is not intended for it, and we could also run into problems with multiple inheritance. As far as I can see, there is currently no way in JTD to extend an existing entity.
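To show how directly “$extends” maps to generated code, here is a small sketch that renders a TypeScript interface from a definition. The function `render_interface`, the `TYPE_MAP` table and the example property are hypothetical; marking every property optional is an assumption of this sketch, not a rule from the specification.

```python
# Assumed mapping from schema scalar types to TypeScript types.
TYPE_MAP = {"string": "string", "integer": "number", "number": "number", "boolean": "boolean"}


def render_interface(name: str, definition: dict) -> str:
    """Render a TypeScript interface, using "$extends" as the parent if present."""
    header = f"interface {name}"
    parent = definition.get("$extends")
    if parent is not None:
        header += f" extends {parent}"
    fields = [
        f"  {key}?: {TYPE_MAP[prop['type']]};"
        for key, prop in definition.get("properties", {}).items()
    ]
    return "\n".join([header + " {", *fields, "}"])


student = {
    "$extends": "Person",
    "type": "object",
    "properties": {"matricleNo": {"type": "integer"}},
}
print(render_interface("Student", student))
```

With “allOf” the generator would first have to guess which of the combined schemas is meant to be the parent; here the parent is stated explicitly.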
4. Generics
This is a more advanced feature of TypeSchema and we could probably live without it, but it is really nice to be able to define reusable generic types. The most prominent example is a collection object: on our platform we don't want to define a collection object for every entity type, so instead we create a collection using generics and can then use this collection with different types, e.g.:
This also makes it easy to change an existing collection even if it is used by many entities. Code generators can also generate actual “generics” if the language supports them, as e.g. Java or TypeScript do.
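A hedged sketch of such a generic collection and of how a tool could expand it. The keyword names “$generic” and “$template”, the type names and the helper functions are assumptions made for this example and should be checked against the TypeSchema specification.

```python
# Hypothetical definitions; "$generic" and "$template" are assumed keyword names.
DEFINITIONS = {
    "Collection": {
        "type": "object",
        "properties": {
            "totalResults": {"type": "integer"},
            "entries": {"type": "array", "items": {"$generic": "T"}},
        },
    },
    "Student": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
    },
    "StudentCollection": {
        "$ref": "Collection",
        "$template": {"T": "Student"},
    },
}


def substitute(node, template: dict):
    """Return a copy of node with each generic placeholder replaced by a $ref."""
    if isinstance(node, dict):
        if "$generic" in node:
            return {"$ref": template[node["$generic"]]}
        return {key: substitute(value, template) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, template) for item in node]
    return node


def resolve(name: str, definitions: dict) -> dict:
    """Expand a templated reference into a concrete definition."""
    definition = definitions[name]
    if "$template" not in definition:
        return definition
    return substitute(definitions[definition["$ref"]], definition["$template"])
```

For a language with generics, a generator would skip this expansion and emit something like `Collection<Student>` instead.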
5. Union/Intersection types
In TypeSchema we treat the “oneOf” and “allOf” keywords as union and intersection types; the “anyOf” keyword is not supported by TypeSchema. Code generators which support such types can then generate actual union types, e.g. for TypeScript we generate “type MyType = FooType | BarType” if a union type is used, and this type can then also be referenced. Of course the object model of TypeScript fits TypeSchema perfectly, and there are probably not many languages which support union/intersection types in this way. But in general it is a feature which is really important for describing JSON payloads.
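The mapping to TypeScript can be sketched in a few lines. The function `render_alias` is hypothetical, and the sketch assumes that every member of “oneOf” or “allOf” is a “$ref” to a named type.

```python
def render_alias(name: str, definition: dict) -> str:
    """Render a TypeScript type alias for a "oneOf" or "allOf" definition."""
    if "oneOf" in definition:
        members, operator = definition["oneOf"], " | "  # union type
    elif "allOf" in definition:
        members, operator = definition["allOf"], " & "  # intersection type
    else:
        raise ValueError("expected 'oneOf' or 'allOf'")
    return f"type {name} = {operator.join(member['$ref'] for member in members)}"


union = {"oneOf": [{"$ref": "FooType"}, {"$ref": "BarType"}]}
print(render_alias("MyType", union))  # type MyType = FooType | BarType
```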
6. Import
Since we have removed JSON Pointer from TypeSchema, it is no longer possible to reference an arbitrary schema inside a remote document. Because of this we have added the “$import” keyword, which reads the complete remote schema and imports all types under the “definitions” key into a specified namespace. With the “$ref” keyword we can now access an imported type via e.g. “my_ns:StudentMap”, which is a little bit like an XML namespace. Regarding JTD, I think we need some way to combine multiple schema documents, so that it is possible to reference external types.
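A rough sketch of how a namespaced reference could be resolved. The shape of the “$import” value (a map from namespace to source), the source string `file:///remote.json`, the type names and the function `resolve_ref` are all assumptions for this example; loading the remote document is left out and replaced by an in-memory table.

```python
# Hypothetical remote document that has already been loaded.
REMOTE = {
    "definitions": {
        "Student": {"type": "object", "properties": {"name": {"type": "string"}}},
        "StudentMap": {"type": "object", "additionalProperties": {"$ref": "Student"}},
    },
}

# Hypothetical local document importing the remote one under "my_ns".
DOCUMENT = {
    "$import": {"my_ns": "file:///remote.json"},
    "definitions": {
        "Report": {
            "type": "object",
            "properties": {"students": {"$ref": "my_ns:StudentMap"}},
        },
    },
}

# Maps each import source to its loaded document.
LOADED = {"file:///remote.json": REMOTE}


def resolve_ref(ref: str, document: dict, loaded: dict) -> dict:
    """Resolve "Name" locally or "ns:Name" through the "$import" table."""
    if ":" in ref:
        namespace, name = ref.split(":", 1)
        source = document["$import"][namespace]
        return loaded[source]["definitions"][name]
    return document["definitions"][ref]
```

Because only whole names under “definitions” can be referenced, the resolver needs two dictionary lookups and no path evaluation.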
7. Keyword compatibility
In TypeSchema we have tried hard not to come up with many new keywords, so we use “properties” for objects, “additionalProperties” for maps, “items” for arrays, “oneOf” for union and “allOf” for intersection types. For the “$ref” keyword we have removed the need for JSON Pointer since we always directly reference an entity inside the “definitions” catalog. JTD follows the same logic but uses the “ref” keyword instead of “$ref”. In general the question is how much compatibility we want to have with JSON Schema; in the case of TypeSchema we are still able to parse specific limited versions of JSON Schema.
Conclusion
This is of course our opinionated view of what a schema specification should solve for code generators. Currently we convert every TypeSchema into a corresponding JSON Schema in order to generate a valid OpenAPI specification, but we think there is great potential to improve the situation for code generators.