JSON Web Proofs (github.com/json-web-proofs)
51 points by tentacleuno on Dec 2, 2023 | 24 comments


eyJ... is the beginning of base64-encoded JSON. The fact that I know that, and that I see it in standards, makes me feel bad.

I can't prove that this is wrong, but encoding one text format in another text format and wrapping it in a third text format seems wasteful and hacky. This applies not only to this spec, but to web development in general.


base64 is not just a text format, but one that's safely transmissible through e.g. 7-bit-only mediums, embeddable in a URL (assuming URL-safe b64!), storable in a cookie and passed through janky middlewares, etc. etc.

Further, base64 doesn't actually encode strings, it encodes bytes. If you're doing cryptography on something, you want to have a canonical byte representation. Canonicalizing JSON itself is error-prone, whereas decoding b64 gives you the same bytes every time.
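
A minimal sketch of that last point (plain Python, illustrative only): two JSON texts that mean the same thing hash differently, while a base64url segment always decodes to the same bytes:

  import base64
  import hashlib
  import json

  # Two JSON texts with identical meaning but different bytes:
  # key order and whitespace both change the serialization.
  a = json.dumps({"alg": "RS256", "typ": "JWT"}, separators=(",", ":"))
  b = json.dumps({"typ": "JWT", "alg": "RS256"}, indent=2)
  print(hashlib.sha256(a.encode()).digest() == hashlib.sha256(b.encode()).digest())  # False

  # A base64url segment, by contrast, decodes to one fixed byte string,
  # so signing and verifying always operate on the same input.
  segment = "eyJhbGciOiJSUzI1NiJ9"
  print(base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4)))  # b'{"alg":"RS256"}'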


Sanitized, well-formed JSON generally doesn't URL-encode too badly. It's typically less overhead than base64.

Also, well-formed JSON (not arbitrary JSON) works fine in HTTP headers. I think those two situations cover about 90% of use cases.

For example, here is a JSON payload URL encoded. It's not too bad, and much better than base64:

https://cyphr.me/coze#?input={%22pay%22:{%22msg%22:%22Hello%...

The initial payload is 238 bytes; URL encoded, that payload is 288 bytes; as base64 it is 318 bytes. (Here's another tool just for that: https://convert.zamicol.com/#?inAlph=text&in=%257B%2522pay%2...)
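
The exact payload behind that link isn't reproduced here, but the comparison is easy to check with any small JSON document. A rough Python sketch with a made-up, coze-shaped payload (field names and sizes are just illustrative); which way the comparison goes depends heavily on how many characters the percent-encoder actually escapes:

  import base64
  import json
  from urllib.parse import quote

  # Hypothetical stand-in payload, roughly coze-shaped; not the one linked above.
  payload = json.dumps(
      {"pay": {"msg": "Hello HN", "alg": "ES256", "iat": 1701500000,
               "tmb": "x" * 43}},  # "tmb" stands in for a base64url key thumbprint
      separators=(",", ":"),
  )

  # Percent-encoding cost depends on which characters the encoder leaves bare;
  # the linked tool leaves {}:, literal and escapes only quotes, spaces, etc.
  strict = quote(payload, safe="")          # escape everything reserved
  lenient = quote(payload, safe="{}:,[]")   # escape quotes and spaces only
  b64 = base64.urlsafe_b64encode(payload.encode()).rstrip(b"=").decode()

  print(len(payload), len(strict), len(lenient), len(b64))
  # base64 is a flat ~4/3x; percent-encoding wins only when few characters
  # need escaping (long alphanumeric values, a lenient safe set).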


I'd rather try to build systems that always work, rather than only working for well-formed inputs transmitted over well-behaved mediums.



The JWS/JWE compact and JSON encodings are text-based formats, safe for use in various internet protocols (such as embedding in an HTTP header, URL query parameter, or cookie).

The header is JSON and could have potentially used another encoding for space. The payload and signature are both binary, so they needed a way to be represented in the 66 or so safe characters across all of those uses. In that case, non-padded URL-safe Base64 encoding is the best option.
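
For illustration (a sketch, not taken from the specs), this is what that looks like: arbitrary signature-sized bytes carried as a non-padded, URL-safe base64 segment:

  import base64
  import os

  # Stand-in for a binary signature or key (32 random bytes).
  raw = os.urandom(32)

  # Non-padded URL-safe base64 uses only A-Z, a-z, 0-9, '-' and '_',
  # so it survives URLs, cookies and HTTP headers unchanged.
  segment = base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
  print(len(raw), len(segment))  # 32 -> 43 characters

  # Round trip: restore the padding the encoding dropped, then decode.
  assert base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4)) == raw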

Unfortunately, nested JWTs (signed and encrypted) as well as embedded binary data (such as public keys and thumbprints) in a JSON format also need to be base64 encoded. So there's a bit of a penalty in size for including these in the message, and that puts a bit of a design motivation in applications using these to limit such binary data within the messages themselves.

There is COSE, which uses CBOR to be entirely binary, but CBOR is rather robust and library support isn't close to what we have for JSON support.

For JSON Web Proofs, the goal is to define the core primitives in terms of binary data, such that a CBOR encoding does not require reinvention.


Yes. Regarding JOSE (JWS/JWE/JWA/JWK/JWT), I consider that point one of the core design differences I have with JOSE, as it results in re-encode ballooning. That's separate from the fact that JSON was intended to be human readable, and base64 encoding of otherwise human readable payloads robs JSON of this design goal. (If you're going so far as to base64 encode a human readable payload, why not take advantage of efficient binary messaging standards instead of using JSON? The whole point of the "bloat" of JSON over binary forms is human readability.)

Not only is JOSE base64 encoded, but many situations have base64 payloads embedded in the JSON, meaning payloads are double encoded, and as far as I'm aware, JOSE headers are always base64 encoded, regardless of the outer (re-)encoding. Each round of base64 encoding adds overhead.

For example, starting with a 32-byte payload, the first round base64 encodes it to 43 bytes; inserted into JSON and base64 re-encoded, it becomes 58 bytes. The 32-byte starting payload balloons to nearly twice its size at 58 bytes.

For a direct example from one of the RFCs, the header `{"alg":"RS256"}` becomes `"eyJhbGciOiJSUzI1NiJ9"`. That's 15 bytes of unencoded, human-friendly JSON versus 20 bytes of encoded, unfriendly base64.
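
Both numbers are easy to reproduce; a quick sketch (plain Python, nothing JOSE-specific):

  import base64
  import os

  def b64url(data: bytes) -> str:
      """Non-padded, URL-safe base64, as JOSE uses it."""
      return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

  # Double-encoding ballooning: 32 raw bytes -> 43 chars -> 58 chars.
  payload = os.urandom(32)
  once = b64url(payload)           # 43 characters
  twice = b64url(once.encode())    # 58 characters, nearly 2x the original
  print(len(payload), len(once), len(twice))

  # The header example: 15 bytes of JSON vs 20 bytes of base64url.
  header = '{"alg":"RS256"}'
  print(len(header), b64url(header.encode()))  # 15 eyJhbGciOiJSUzI1NiJ9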

Further, there's a later RFC 7797, the 'JSON Web Signature (JWS) Unencoded Payload Option', which was intended to address these complaints, but it too fails to address encode ballooning in the headers. From the RFC:

{ "protected": "eyJhbGciOiJIUzI1NiIsImI2NCI6ZmFsc2UsImNyaXQiOlsiYjY0Il19", "payload": "$.02", "signature": "A5dxf2s96_n5FLueVuW1Z_vh161FwXZC4YLPff6dmDY" }

That encoded header is 56 bytes, but unencoded, {"alg":"HS256","b64":false,"crit":["b64"]}, it is only 42 bytes. There's no compelling reason for this design.

The better solution? Just stay as JSON.

I have more to say, but I'll leave it there. You can also check out my presentation on the matter here: https://docs.google.com/presentation/d/1bVojfkDs7K9hRwjr8zMW...

Right now a relevant slide is 115 (or Ctrl-F "ballooning"), and I have other text documents up on GitHub.


eyJhbG


I'm excited to watch this work evolve. It aims to create the envelopes necessary to support new trust systems on the web, like those based on zero-knowledge proofs, for more privacy-preserving claims.

What's new about this vs. JWTs? Roughly,

JWTs generally happen in two steps:

  1. Get JWT (private)
  2. Present JWT
JWPs can have three:

  1. Get JWP precursor (private)
  2. Generate JWP from JWP precursor (private)
  3. Present JWP
Between steps 1 and 2, JWPs can support algorithms that provide selective disclosure, proof generation, etc., so you can show you're over 21 without revealing your date of birth. This extra step means you don't necessarily need to share the original payload you got from an issuer. Another nice aspect is that ZKP-based algorithms can improve unlinkability to make surveillance more difficult.


I clicked on this thinking it might be something like Agda for web development.


JWT, JWS, JWE, JWP. This is getting confusing.

I am still unsure if JWT can be encrypted. You find libraries encrypting it but then does it become JWE? Is it the same thing if the library still calls it a JWT? Are all of these just subtypes of JWTs or standalone RFCs?

It is all over the place. Makes me just take the route of creating a custom JSON payload and then encrypting or signing it.


Agree the naming is a bit confusing. They're all types of JSON Web Tokens (JWT), though. They're also defined in separate RFCs, but reference back to the JWT one.

They're all made of base64url-encoded segments, separated by periods. The first segment is always the header, and you can easily identify them because they start with "eyJ".

A plain JWT has two segments; the second is the payload. In practice, AFAIK, it's not really used because it just bloats your data.

JWS has a third segment that's a signature over the payload, made with the private key, and it can be verified if you have the public key. JWT.io is a site where you can play with these.

JWE has five segments including an encrypted symmetric key, the IV and ciphertext.

So yeah, JWE is technically a different subtype. I'll only note that you can stick an already-encrypted payload in a plain JWT or a JWS, and you can also take a JWT or JWS and encrypt it, and neither of those is the same as an actual JWE token.
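
To make the segment structure concrete, here's a rough sketch in Python (a toy HS256 token built by hand purely to show the dot-separated base64url segments; use a real JOSE library for anything serious):

  import base64, hashlib, hmac, json

  def b64url(data: bytes) -> str:
      return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

  def b64url_decode(seg: str) -> bytes:
      return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

  # Build a toy HS256 compact JWS by hand.
  key = b"demo-secret"
  header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
  payload = b64url(json.dumps({"sub": "1234567890", "admin": False}, separators=(",", ":")).encode())
  signature = b64url(hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest())
  token = f"{header}.{payload}.{signature}"

  # Three dot-separated segments, each base64url; the first is always the
  # JSON header, which is why these tokens start with "eyJ".
  h, p, s = token.split(".")
  print(b64url_decode(h))  # b'{"alg":"HS256","typ":"JWT"}'
  print(b64url_decode(p))  # b'{"sub":"1234567890","admin":false}'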


Yes, encrypted JWTs are a thing. You typically have a JWT as a signed payload, then encrypt that JWT using JWE.

An example of such an approach is in the RFC appendix: https://datatracker.ietf.org/doc/html/rfc7519#appendix-A.2

Perhaps the biggest misconception about the JOSE specs and JWT is treating them as a "batteries included" format; they are really components, meant to be used to define further applications, such as ACMEv2.

Libraries are often written to solve specific problems (say, validating an OpenID Connect id_token) and blur the decisions those specifications made when profiling JOSE for their own use.

The nesting approach provides for non-repudiation of the decrypted JWT, which is important for federated identity use cases.


Might I suggest Paseto (https://paseto.io/) instead of writing it yourself - it solves a lot of the headaches of JWT. Signing and encryption are two different things that require two different sets of keys, so you can't mess it up.

(Full disclosure, I've written one implementation: https://github.com/auth70/paseto-ts)


And shameless plug for another alternative, Coze: https://github.com/Cyphrme/Coze



That's not relevant. All the things in question are answers to different problems. They're not different approaches to the same problem.


Hello!

I am one of the editors; feel free to ask questions.


What are some use cases for these?


You can say JWS (JSON Web Signature) provides integrity/authentication and JWE (JSON Web Encryption) provides confidentiality.

JWP (JSON Web Proofs) are meant to provide for privacy. I'll give an example of digital documents used as credentials.

Imagine a health record containing your vaccination history. This could be signed and shared as an interchange record between doctors. It could also be used as a proof of vaccination - this is what SMART Health Cards were.

However, there is considerable information in those records that I may not want to share - not just that I _was_ vaccinated for instance, but when and where, by whom, from which batch, potentially under what insurance. It is also common to have personal information like full name and potentially home address.

So you may want the ability to select which information is disclosed. You can do this with traditional signature and hashing algorithms by building a Merkle tree or nesting signatures with ephemeral keys. The document still has integrity, but only a subset of the information has been shared.
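
A toy sketch of the hash-based flavor of this (illustrative only; real schemes such as SD-JWT or BBS add proper signatures, holder binding, and unlinkability):

  import hashlib, json, os

  record = {"name": "Alice Example", "dob": "1990-04-01", "vaccinated": "yes"}

  # Issuer: commit to each field with a per-field random salt, then sign
  # the list of commitments (the signature itself is omitted here).
  salts = {k: os.urandom(16).hex() for k in record}
  commitments = {k: hashlib.sha256(f"{salts[k]}:{k}:{v}".encode()).hexdigest()
                 for k, v in record.items()}

  # Holder: disclose only one field, plus its salt, alongside the signed
  # commitment list.
  disclosed = {"vaccinated": (record["vaccinated"], salts["vaccinated"])}

  # Verifier: recompute the disclosed field's commitment and check it against
  # the signed list; "name" and "dob" stay hidden but are still covered by
  # the issuer's signature over all commitments.
  for k, (value, salt) in disclosed.items():
      assert hashlib.sha256(f"{salt}:{k}:{value}".encode()).hexdigest() == commitments[k]
  print("disclosed:", json.dumps({k: v for k, (v, _) in disclosed.items()}))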

The signatures and hashes from a classic algorithm serve as potentially trackable identifiers as well. Newer algorithms like BBS provide unlinkability, where presentation of the same source document multiple times does not allow for parties to correlate my usages.

Finally, you also may want to disclose derived information - such as just that I have a vaccination in the last x months, rather than the precise date or location, or just that I'm of legal age to purchase alcohol in my country, rather than my birthdate. These sorts of predicate proofs are also of interest to the group, although the algorithms and representations aren't as far along yet.


> Finally, you also may want to disclose derived information - such as just that I have a vaccination in the last x months, rather than the precise date or location, or just that I'm of legal age to purchase alcohol in my country, rather than my birthdate. These sorts of predicate proofs are also of interest to the group, although the algorithms and representations aren't as far along yet.

That sounds kind of crazy. I can't imagine how you'd prove derived data. Very cool though. Thanks!


Yes, crazy moon math! I am not a cryptographer, and try to be careful when I wave my hands around.

Cryptocurrencies seemed to accelerate the revision, implementation and deployment of newer cryptographic techniques. Range proofs are used in that space to obscure transaction amounts, using techniques like bulletproofs.

A more primitive but easier example to understand would be a hash chain, https://en.wikipedia.org/wiki/Hash_chain.

The issuing authority hashes some shared secret seed value x times, where x is my age, and signs the result. If I want to prove I'm over 18, I hash the seed value x - 18 times and give that along with the signed hash. The verifier hashes it 18 more times to get the signed value and thus knows I'm "at least" 18 years old.

If I include a second hash of 150 minus my age, I can use that to disclose that I'm under a certain age as well. Together, I can establish an age range.
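
A toy version of that hash chain in Python (illustrative only; signing the anchor and binding the seed to the holder are hand-waved here):

  import hashlib

  def hash_n(value: bytes, n: int) -> bytes:
      """Apply SHA-256 n times."""
      for _ in range(n):
          value = hashlib.sha256(value).digest()
      return value

  seed = b"per-credential secret seed"
  age = 34

  # Issuer: hashes the seed `age` times and signs the result (signature
  # omitted); the holder keeps the seed.
  signed_anchor = hash_n(seed, age)

  # Holder: to show "at least 18", reveal the chain element 18 steps short
  # of the anchor, i.e. the seed hashed (age - 18) times.
  proof = hash_n(seed, age - 18)

  # Verifier: 18 more hashes must land exactly on the signed anchor.
  assert hash_n(proof, 18) == signed_anchor  # holder is at least 18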

Range proofs will really lessen the amount of design consideration you need when creating credentials with strong privacy requirements.

For example, even processing metadata like a credential's expiry time leaks correlatable information unless you take additional care. The renewal dates for a corresponding physical document may be spread out evenly over the year and often stay consistent, so the expiry time of the physical document might divide people into one of 365 groups without some (potentially painful) additional considerations.

A range proof instead would say 'the current time is within the validity period of the document'.

At the extreme this gets to verifiable computation, where I can verify that an output was created from an input document and a set of instructions, without ever seeing the input document. With this, you could externalize the entirety of the processing and, rather than getting personal information back, get a message saying "yes, this meets your policy".


IMO one of the major problems with JOSE is that the algorithm to be used for verification is part of the object being verified and is generally not part of the verification API or the verification key. This results in the alg "none" attack and no shortage of potential downgrade attacks.

After quickly skimming these drafts, I can’t tell whether it’s fixed. Certainly a lot of algorithm names are sprinkled around the encoded objects, but I didn’t spot a description of what a public key is.

I am moderately concerned by statements like:

6.3.2. Issuer Setup

To use the MAC algorithm the issuer must have a stable public key pair to perform signing. To start the issuance process, a single 32-byte random Shared Secret must first be generated. This value will be shared privately to the holder as part of the issuer's JWP proof value

Few if any cryptosystems are intended to be secure if the private key is used in the wrong algorithm. This paragraph reads like, as part of the API used by the application, one party sends 32 bytes, char[32]-style, to another. I don't think any new design should work like this. New designs should fail hard when misused or when parties disagree about which protocol variant they're speaking.

Is there something in here to make the new standard clearly immune to algorithm confusion attacks?


> IMO one of the major problems with JOSE is that the algorithm to be used for verification is part of the object being verified and is generally not part of the verification API or the verification key. This results in the alg "none" attack and no shortage of potential downgrade attacks.

Right, and I believe this is one of the major misconceptions about JOSE - it isn't defining a data format for secure messages, but common tooling/patterns to be profiled for applications needing secure messaging.

This leads to a confusion of responsibilities between a JOSE library and the application using it.

So we are watching / participating in efforts like https://www.ietf.org/archive/id/draft-tschofenig-jose-cose-g... and https://www.ietf.org/archive/id/draft-ietf-httpbis-message-s... .

I'm hoping there is stricter guidance across the board by the time JWP goes to publication.



