
I don't know what AT is doing but I can say that while JSON Schema is okay as a validation schema, it is less okay as a codegen schema. I don't know if there is a fundamental divide between these two uses but in JSON Schema there is definitely an impedance mismatch.

For example, the JSON Schema structures: `anyOf`, `oneOf` and `allOf` are fairly clear when applied toward validation. But how do you map these to generating code for, say, C++ data structures?

You can of course minimize the problem by restricting to a subset of JSON Schema. But that leaves other problems. For example, a limited JSON Schema `object` construct can be mapped to, say, a C++ `struct` but it can also map to a homogeneous associative array, eg `std::map`, or a fixed heterogeneous AA, eg `std::tuple`. It can also be mapped to an open-ended hetero AA which has no standard form in C++ (maybe `std::map<string,variant<...>>`). Figuring out the mapping intended by the author of the JSON Schema is nontrivial and can be ambiguous.
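To make the ambiguity concrete, here is a rough C++17 sketch of four types that could all hold the instance `{"x": 1, "y": 2}`. The schema in the comment and every name (`PointStruct`, `PointMap`, `PointTuple`, `OpenObject`, the `as_*` helpers) are made up for illustration; nothing here is what any particular generator emits.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <variant>

// Hypothetical schema:
//   {"type": "object",
//    "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}}}
// The instance {"x": 1, "y": 2} validates against it, and each type
// below is a defensible target for a code generator.

// 1. Fixed, named fields.
struct PointStruct {
    int x;
    int y;
};

// 2. Homogeneous associative array: arbitrary string keys, all values int.
using PointMap = std::map<std::string, int>;

// 3. Fixed heterogeneous layout, addressed by position instead of by name.
using PointTuple = std::tuple<int, int>;

// 4. Open-ended heterogeneous associative array (no standard form in C++).
using Value = std::variant<int, double, std::string>;
using OpenObject = std::map<std::string, Value>;

// The same data expressed in each representation.
inline PointStruct as_struct() { return PointStruct{1, 2}; }
inline PointMap as_map() { return PointMap{{"x", 1}, {"y", 2}}; }
inline PointTuple as_tuple() { return PointTuple{1, 2}; }
inline OpenObject as_open() {
    OpenObject o;
    o["x"] = Value{1};
    o["y"] = Value{2};
    return o;
}
```

All four round-trip the same JSON, so the schema alone does not say which one the author had in mind.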

At some level, I think this is an inherent divide as each language one targets for codegen supports different types of data structures differently. One can not escape that codegen is not always a fully functional transformation.



There are indeed few languages that can model all of JSON Schema in their type system. TypeScript comes close. However, you can just use a subset as you said.

I don't really understand why this is a problem. Unless you're using things like Haskell, Julia, or Shapeless Scala, you generally accept that not everything is modeled at the type level. I don't know the nuances of the C++ types you mentioned, but I have not encountered the ambiguity you described in TypeScript or on the JVM. E.g. JSON Schema is pretty clear that any object can contain additional unspecified keys (`std::map` I assume) unless `additionalProperties: false` is specified.

> `anyOf`, `oneOf` and `allOf` for [...] C++ data structures?

Like I said, I don't know C++ well enough, but these have clear translations in type theory which are supported by multiple languages. I don't know if C++ types are powerful enough to express this.

`allOf` is an intersection type and `anyOf` is a union type. `oneOf` is challenging, but is usually modelled well enough as a union type.
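For what it's worth, a rough C++17 approximation might look like the sketch below. It is only an approximation: `std::variant` holds exactly one alternative, so it is arguably closer to `oneOf` than to `anyOf`, and inheritance only imitates an intersection for object-like schemas. All names (`StringOrInt`, `Named`, `Aged`, `NamedAndAged`, `describe`) are hypothetical.

```cpp
#include <string>
#include <variant>

// anyOf [string, integer], approximated as a tagged union.
using StringOrInt = std::variant<std::string, int>;

// allOf [Named, Aged], approximated by composing both sets of fields.
struct Named {
    std::string name;
};
struct Aged {
    int age;
};
struct NamedAndAged : Named, Aged {};

// Inspect which alternative the union currently holds.
inline std::string describe(const StringOrInt& v) {
    if (std::holds_alternative<std::string>(v)) {
        return "string:" + std::get<std::string>(v);
    }
    return "int:" + std::to_string(std::get<int>(v));
}
```

A value that satisfies several `anyOf` branches at once has no direct counterpart here, which is one place the translation stops being clean.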


Thanks for the comment. It helps me think how to clarify what I was trying to say.

What I wanted to express is that using JSON Schema (or any such) for validation encounters a many-to-one mapping from multiple possible types across any/all given programming languages to a single JSON Schema form. That is, instances of multiple programming language types may be serialized to JSON such that their data may be validated according to a single, common JSON Schema form. This is fine, no problem.

OTOH, using JSON Schema (or any such) for codegen reverses that mapping to be one-to-many. It is this that leads to ambiguity and problems.

Restricting to a subset of JSON Schema only goes so far. For example, we cannot discard JSON Schema `object` as it is too fundamental. But, given a simple `object` schema that happens to specify that all properties have a common type `T`, it is ambiguous whether to generate a C++ `class` or `struct` or a `std::map<string,T>`. Likewise, a JSON Schema `array` can be mapped to a large set of possible collection types.

To fight the ambiguity, one possibility is to augment the schema with language-specific information. At least, if we have a JSON Schema `object` we may add a (non `required`) property to provide a hint. Eg, we may add a `cpp_type` property. Then, typically, the overhead of using a codegen schema is only beneficial if we will generate code in multiple languages. So, this type hinting approach means growing our hints to include a `java_type`, `python_type`, etc. This is minor overhead compared to writing language types in "long hand" but still somewhat unsatisfying. With enough type-theory expertise (which I lack) perhaps it is possible to abstractly and precisely name the desired type, which codegen for each language can then implement without ambiguity. But, given the wealth of types, even sticking with just a programming language's standard library, this abstraction may be fraught with complication. I think of the remaining ambiguity between specifying use of C++'s `std::map` vs `std::unordered_map` given an abstract type hint of, say, `associative_array` (or `std::array`, `std::list`, `std::vector`, `std::tuple` if given a JSON Schema `array`).
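A minimal sketch of how a generator might consume such a hint, assuming C++17. Only the `cpp_type` key comes from the idea above; the `ObjectSchema` struct, the `cpp_container_for` function and the fallback to `std::map` are assumptions made for illustration, not any existing tool's behavior.

```cpp
#include <map>
#include <string>

// Hypothetical, stripped-down view of an `object` schema whose
// properties all share one value type.
struct ObjectSchema {
    std::string value_type;                    // e.g. "int"
    std::map<std::string, std::string> hints;  // e.g. cpp_type -> std::unordered_map
};

// Choose the C++ type name to emit. With no `cpp_type` hint the
// generator has to make a limiting choice; here it is std::map
// (an arbitrary default picked for this sketch).
inline std::string cpp_container_for(const ObjectSchema& schema) {
    const auto it = schema.hints.find("cpp_type");
    const std::string container =
        (it != schema.hints.end()) ? it->second : "std::map";
    return container + "<std::string, " + schema.value_type + ">";
}
```

With no hint this yields `std::map<std::string, int>` for an `int` value type; adding `cpp_type = std::unordered_map` switches the emitted name. The fallback branch is exactly the kind of limiting choice that eventually surprises a user.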

I don't think this is a failing of JSON Schema per se but is an inherent problem for any codegen schema to confront. Something new must enter the picture to remove the ambiguity. In (my) practice, this ambiguity is killed simply by making limiting choices in the implementation of the codegen. This is fine until it isn't and the user says, "what do you mean I can't generate a `std::map`!". Ask me how I know. :)


Clear, yes, that is indeed a problem. I haven't personally encountered this to be a huge problem, but that might be ecosystem dependent.


I mean the parent said OpenAPI + JSON Schema, not JSON Schema alone. OpenAPI has a ton of generators that are tweakable in the extreme.



