I don't know what AT is doing but I can say that while JSON Schema is okay as a ...

dtech · on May 11, 2023

There are indeed few languages that can model all of JSON Schema in their type systems. TypeScript comes close. However, as you said, you can just use a subset.

I don't really understand why this is a problem. Unless you're using something like Haskell, Julia, or Scala with Shapeless, you generally accept that not everything is modeled at the type level. I don't know the nuances of the C++ types you mentioned, but I have not encountered the ambiguity you described in TypeScript or on the JVM. E.g., JSON Schema is pretty clear that any object can contain additional unspecified keys (`std::map`, I assume) unless `additionalProperties: false` is specified.
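A minimal C++ sketch of that open-object reading, assuming a hypothetical `Person` schema where `name` is required and `age` is optional: the known properties become named members, and any unlisted keys land in a `std::map`. Keeping the extra values as raw JSON text is an assumption made here for self-containment; a real generator would more likely use its JSON library's value type.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical schema:
//   {"type": "object",
//    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
//    "required": ["name"]}
// "additionalProperties" is absent, so any other key is also valid.
struct Person {
    std::string name;                               // listed in "required"
    std::optional<long long> age;                   // may be missing
    std::map<std::string, std::string> additional;  // unlisted keys, kept as raw JSON text
};

// Same schema plus "additionalProperties": false. Extra keys are invalid,
// so there is nothing extra to keep.
struct ClosedPerson {
    std::string name;
    std::optional<long long> age;
};
```

E.g. `Person p{"Ada", std::nullopt, {{"nickname", "\"ada\""}}};` keeps the unlisted `nickname` key instead of dropping it.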

> `anyOf`, `oneOf` and `allOf` for [...] C++ data structures?

Like I said, I don't know C++ well enough, but these have clear translations in type theory that are supported by multiple languages. I don't know if C++ types are powerful enough to express them.

`allOf` is an intersection type and `anyOf` is a union type. `oneOf` is challenging, but is usually modelled well enough as a union type.
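For the C++ side of the question, a rough sketch of the usual approximations, assuming two hypothetical component schemas `A` and `B`: `allOf` becomes a type carrying the members of both, while `anyOf` and `oneOf` become `std::variant`. The variant is only an approximation: it records a single matching alternative, and the "exactly one matches" rule of `oneOf` still has to be enforced by validation rather than by the type.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Two hypothetical component schemas.
struct A { std::string id; };
struct B { double weight = 0.0; };

// allOf: [A, B]. A value must satisfy both schemas (intersection),
// approximated by a type that carries the members of both.
struct AllOfAB : A, B {};

// anyOf: [A, B]. A union type; std::variant holds exactly one alternative,
// so a value that matches both schemas is recorded as only one of them.
using AnyOfAB = std::variant<A, B>;

// oneOf: [A, B]. Modelled the same way; "exactly one matches" is not
// expressed in the type and must still be checked during validation.
using OneOfAB = std::variant<A, B>;

static_assert(std::is_base_of_v<A, AllOfAB> && std::is_base_of_v<B, AllOfAB>,
              "AllOfAB has the members of both A and B");
```

Since `AllOfAB` is an aggregate in C++17, `AllOfAB v{{"id-1"}, {2.5}};` gives a value with both `v.id` and `v.weight`.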

frumiousirc · on May 12, 2023

Thanks for the comment. It helps me think about how to clarify what I was trying to say.

What I wanted to express is that using JSON Schema (or any similar schema language) for validation involves a many-to-one mapping: from multiple possible types, across any or all programming languages, to a single JSON Schema form. That is, instances of multiple programming-language types may be serialized to JSON such that their data can be validated against a single, common JSON Schema form. This is fine, no problem.

OTOH, using JSON Schema (or any similar schema language) for codegen reverses that mapping, making it one-to-many. It is this reversal that leads to ambiguity and problems.

Restricting ourselves to a subset of JSON Schema only goes so far. For example, we cannot discard JSON Schema `object`, as it is too fundamental. But given a simple `object` schema that happens to specify that all properties have a common type `T`, it is ambiguous whether to generate a C++ `class`, a `struct`, or a `std::map<string,T>`. Likewise, a JSON Schema `array` can be mapped to a large set of possible collection types.
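To make the one-to-many direction concrete, here is a sketch of several C++ targets that all fit the same two schemas (both schemas are hypothetical examples). Nothing in either schema says which target the user actually wants. A fixed-size `std::array` is only a candidate when `minItems` and `maxItems` pin the length.

```cpp
#include <array>
#include <list>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical schema:
//   {"type": "object",
//    "properties": {"x": {"type": "number"}, "y": {"type": "number"}}}
// Every property has the same type (T = double), so all of these are
// plausible targets and all serialize to the same JSON shape {"x": ..., "y": ...}.
struct PointStruct { double x = 0; double y = 0; };            // fixed, named fields
using PointMap = std::map<std::string, double>;                // keys known only at runtime
using PointHashMap = std::unordered_map<std::string, double>;  // same, unordered

// Hypothetical schema: {"type": "array", "items": {"type": "number"}}
// Again, several standard containers fit equally well.
using SamplesVector = std::vector<double>;
using SamplesList = std::list<double>;
using Samples3 = std::array<double, 3>;  // only if minItems == maxItems == 3
```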

To fight the ambiguity, one possibility is to augment the schema with language-specific information. For example, given a JSON Schema `object`, we may add a (non-`required`) property that provides a hint, e.g. a `cpp_type` property. However, the overhead of using a codegen schema typically only pays off if we generate code in multiple languages, so this type-hinting approach means growing our hints to include a `java_type`, a `python_type`, etc. This is minor overhead compared to writing the language types out "long hand", but it is still somewhat unsatisfying.

With enough type-theory expertise (which I lack), perhaps it is possible to name the desired type abstractly and precisely, so that the codegen for each language can implement it without ambiguity. But given the wealth of types, even if we stick to a programming language's standard library, this abstraction may be fraught with complications. I think of the remaining ambiguity between C++'s `std::map` and `std::unordered_map` given an abstract type hint of, say, `associative_array`, or between `std::array`, `std::list`, `std::vector` and `std::tuple` given a JSON Schema `array`.
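A minimal sketch of the per-language hint idea, assuming a non-standard `cpp_type` keyword and a deliberately tiny, hypothetical `SchemaNode` type (a real generator would walk the parsed schema instead). The hint wins when present; otherwise the generator falls back to a fixed default mapping, which removes the ambiguity but also limits the user. The functions are `constexpr` only so the mapping can be checked at compile time.

```cpp
#include <string_view>

// Hypothetical, heavily simplified schema node: the JSON Schema "type"
// keyword plus an optional, non-standard "cpp_type" hint. Supporting more
// languages would mean more keys ("java_type", "python_type", ...).
struct SchemaNode {
    std::string_view type;      // e.g. "object" or "array"
    std::string_view cpp_type;  // empty when the schema gives no hint
};

// Fallback mapping chosen by the codegen author when no hint is given.
constexpr std::string_view defaultCppType(std::string_view type) {
    if (type == "object") return "struct";  // i.e. generate a named struct
    if (type == "array") return "std::vector";
    if (type == "string") return "std::string";
    return "";  // not handled by this sketch
}

// Prefer the explicit hint; otherwise fall back to the default.
constexpr std::string_view cppTypeFor(const SchemaNode& node) {
    return node.cpp_type.empty() ? defaultCppType(node.type) : node.cpp_type;
}
```

E.g. `cppTypeFor({"object", "std::unordered_map"})` yields `std::unordered_map`, while the same node without the hint yields `struct`.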

I don't think this is a failing of JSON Schema per se, but rather an inherent problem that any codegen schema must confront. Something new must enter the picture to remove the ambiguity. In (my) practice, this ambiguity is killed simply by making limiting choices in the implementation of the codegen. This is fine until it isn't, and the user says, "what do you mean I can't generate a `std::map`!". Ask me how I know. :)

dtech · on May 15, 2023

Clear. Yes, that is indeed a problem. I haven't personally found it to be a huge problem, but that might be ecosystem-dependent.

baudehlo · on May 10, 2023

I mean, the parent said OpenAPI + JSON Schema, not JSON Schema alone. OpenAPI has a ton of generators that are tweakable in the extreme.