“Please do not make it public” (Tencent’s Sogou Input Method)

phyzome · on Aug 9, 2023

Even though it's part of the original post's title, "please do not make it public" is an extremely misleading quote.

capableweb · on Aug 9, 2023

How is it misleading exactly?

> Vulnerability disclosed to IMETS@tencent.com.

> Vulnerability disclosed again via Tencent Security Response Centre (TSRC) web portal.

> Tencent: “Thank you for your interest in Tencent security. There is no low or low security risk for this issue. We look forward to your next more exciting report.”

> Tencent: “Sorry, my previous reply was wrong, we are dealing with this vulnerability, please do not make it public, thank you very much for your report.”

> Tencent’s initial rejection of our disclosure and subsequent about-face served as inspiration for the title of this report.

It's a direct quote from a Tencent reply.

paxys · on Aug 9, 2023

Just because it is a direct quote doesn't mean it can't be misleading when shared without all the necessary context. Tencent asked for it to not be made public during the period while they were actively fixing it and well within any standard vulnerability disclosure deadline.

JohnFen · on Aug 9, 2023

I agree. I don't see anything here that seems out of line.

015a · on Aug 9, 2023

Because they said it essentially as soon as the vulnerability was reported. That's an entirely reasonable thing to ask for: don't make this public, we're working on it. And it's a totally normal allowance from security researchers.

The title leads readers into thinking that they said this in some other context. Example 1: They aren't working toward fixing it, don't release this, let's just keep it hush hush. This isn't what happened. Example 2: They did fix it, but they didn't want the researcher to publish details of the problem after they fixed it. This also isn't what happened.

Assuming I understand the context correctly, it's absolutely an inflammatory title that has no place in security disclosure articles like this.

netsharc · on Aug 9, 2023

Yeah, kinda disappointing that the CitizenLab folks are exploiting the (I presume) non-mastery of subtle English of the developers to create a "clickbait" title.

If they were English speakers they would've written something along the lines of "We thank you that you respected the vulnerability disclosure policy and notified us. We expect you'll continue respecting the policy and not publish this vulnerability before we resolve the issue and after a period of time where the updated software has been uploaded."

drekipus · on Aug 10, 2023

I agree that it's a miscommunication but batting for citizenlab here, it's just an all-round misunderstanding of culture and language.

Chinese culture has a very strong "save face" mentality, especially at big companies that have much government involvement. So they aren't going to admit fault or indicate that they have to do something.

The correct reply to Tencent's initial response was to say that you are looking for a status update and will disclose the vulnerability by X time. Please let us know when the issue has been fixed.

ysavir · on Aug 9, 2023

When I read the title, my impression wasn't that it regarded keeping a vulnerability private until fixed, but that there was some functionality that tencent didn't want people to know about.

rdtsc · on Aug 9, 2023

> “Please do not make it public” (Tencent’s Sogou Input Method) (citizenlab.ca)

Ok, so they didn't make it public and the development team fixed the bugs.

Maybe I am missing some new trend where the headline in these disclosures _has_ to come from the communication with the company. Kind of like vulnerabilities need custom websites with logos and cool made up names?

> Even with the reported vulnerabilities now resolved, the Sogou app relies on transmitting typed content to Sogou’s servers as part of its ordinary functionality.

Well besides the email firewall mess back and forth, shouldn't that have been the main headline: "Everything you're typing on your keyboard is being sent to China"?

Waterluvian · on Aug 9, 2023

“These findings underscore the importance for software developers in China to use well-supported encryption implementations such as TLS instead of attempting to custom design their own.”

I’m very interested in better understanding this. Why do they elect to do this? Is this just developer hubris, as found everywhere? Does this relate to government regulation or control, whether above or under the table?

newaccount74 · on Aug 9, 2023

My experience with TLS is that it is not trivial to use.

Understanding how to use eg. OpenSSL APIs correctly to ensure that a connection is secure, the certificates are valid, etc. is not trivial. The APIs are poorly documented, hard to use, and many examples you can find are outdated (some OpenSSL APIs return different numbers on success/failure depending on version).

The platform native libraries are not much better. For example, the SecTrust APIs on macOS / iOS are also poorly documented, hard to use, and have bugs (eg. some time ago they suddenly started to reject valid certificates from Google cloud for some reason).

Also, your code is always a ticking time bomb, because TLS algorithms are deprecated, certificates expire, etc. So you are always at the risk of your client code to stop working at some point.

So in my opinion, there are often good reasons not to use TLS. But if you make a mistake, everyone will say "You should have used TLS". I wonder what people say when they find a bug despite you using standard crypto?

manuelabeledo · on Aug 9, 2023

It may not be trivial to use, but I fail to understand how a solution to a very hard problem is better if tailored. For example, Open/LibreSSL are widespread, have large communities of both maintainers and developers, which necessarily subjects them to continuous audits over time.

> Also, your code is always a ticking time bomb, because TLS algorithms are deprecated, certificates expire, etc. So you are always at the risk of your client code to stop working at some point.

Certificate expiration should be handled as part of the configuration management lifecycle. Same goes for TLS algos. If you are hardcoding either of these, you are definitely doing something wrong.

est31 · on Aug 9, 2023

For the deprecated TLS algorithms, just use a bunch of reverse proxies at the front using the latest Debian, CentOS, or Ubuntu LTS, with mostly default settings.

For OpenSSL, app developers don't need it. There is OS builtin libraries to do http requests (which is what was done here).

As for certificates, there is plenty of solutions allowing for auto-renewal. It's very easy to set up using automation.

newaccount74 · on Aug 10, 2023

> There is OS builtin libraries to do http requests

If all you want to do is fetch a URL from a public HTTPS server, then I agree.

If you want to do anything even slightly more complex (eg. pin a specific cert, or use TLS with a protocol other than HTTPS) then the built in APIs are almost as bad as OpenSSL (at least on macOS/iOS, I don't know other platforms).

> As for certificates, there is plenty of solutions allowing for auto-renewal.

It's not a problem on the server side. It's a problem on the client side when the root certs expire. It puts an expiration date on your software.
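For the certificate pinning case mentioned above, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not a recommendation for any particular platform: the helper names `fingerprint_matches` and `connect_pinned` are hypothetical, and pinning the SHA-256 of the leaf certificate is just one of several possible pinning strategies.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint_matches(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Compare the SHA-256 of a DER-encoded certificate to a pinned value."""
    actual = hashlib.sha256(der_cert).hexdigest()
    # Constant-time comparison; lower-case the pin so either case is accepted.
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())


def connect_pinned(host: str, port: int, pinned_sha256_hex: str) -> ssl.SSLSocket:
    """Connect with normal verification, then additionally enforce the pin."""
    context = ssl.create_default_context()
    raw_sock = socket.create_connection((host, port), timeout=10)
    try:
        tls_sock = context.wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        raw_sock.close()
        raise
    # binary_form=True returns the peer certificate as DER bytes.
    der_cert = tls_sock.getpeercert(binary_form=True)
    if der_cert is None or not fingerprint_matches(der_cert, pinned_sha256_hex):
        tls_sock.close()
        raise ssl.SSLError("server certificate does not match the pinned fingerprint")
    return tls_sock
```

Note that a pin on the leaf certificate has to be rotated whenever the server certificate is renewed, which is the same kind of built-in expiration date on client software described above.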

jsiepkes · on Aug 9, 2023

> I wonder what people say when they find a bug despite you using standard crypto?

Not using TLS doesn't automatically mean you need to "roll your own crypto". They could have used a well-documented library such as Google Tink[1] instead of doing their own crypto.

[1] https://github.com/google/tink

Nextgrid · on Aug 9, 2023

It's still more trivial than rolling your own?

I would understand (not approve of it, but merely understand) completely ignoring security/authentication - that would obviously be easier and avoid having to answer hard questions and make hard decisions.

But here it seems like they've put even more effort actually designing some custom encryption scheme based on (wrongly-applied) cryptographic primitives complete with custom request encapsulation format, etc. This is more work than just swapping your TCP channel with a TLS one and reasonably trivial auxiliary code to load/renew certificates. In this case since they're running it over HTTP it's even easier to just put a reverse-proxy in front that will add HTTPS on top.
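To make "swapping your TCP channel with a TLS one" concrete, here is a rough Python sketch using the standard `ssl` module. The function names are hypothetical and the TLS 1.2 floor is an assumption; the point is only that the library defaults already handle certificate and hostname verification.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Return a client-side TLS context with the library's secure defaults."""
    # create_default_context() loads the system trust store and enables
    # certificate verification plus hostname checking.
    context = ssl.create_default_context()
    # Refuse legacy protocol versions; TLS 1.2 is an assumed floor here.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS, verifying the server."""
    raw_sock = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname drives both SNI and hostname verification.
        return make_client_context().wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        raw_sock.close()
        raise
```

After that, the returned socket is used like the plain TCP socket it replaced (`sendall`, `recv`), so the application protocol on top does not need to change.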

hangonhn · on Aug 9, 2023

Developer ignorance rather than hubris. People who don't really know anything about cryptography have the naive and wrong impression that encryption renders your secret completely safe against anything and that only by getting the key or a major cipher vulnerability would the plaintext be revealed. They treat it like a black box because they don't know anything. In recent years, some of the crypto libraries have added methods (e.g. Fernet) that are much safer and take care of these issues for you, but it's still very possible to make mistakes. I've seen engineers use a static IV for AES because they didn't know how they would be able to search for the encrypted data other than making the ciphertext the same for a given key and plaintext. Basically they severely weakened it because they didn't understand the purpose of a random IV. Again, they thought key + plaintext -> encrypt = super secure.
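A small sketch of why a fixed IV/nonce is a problem. This is deliberately not AES: it is a toy keystream cipher built from SHA-256 so the example runs on the Python standard library alone, and it must not be used for real encryption. The function names and messages are made up for illustration. The leak it shows (identical inputs give identical outputs, and two ciphertexts XOR to the XOR of their plaintexts) is the same kind of leak a reused IV/nonce introduces.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter), concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream (toy cipher, illustration only)."""
    stream = keystream(key, nonce, len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


key = os.urandom(32)
static_nonce = bytes(16)  # the mistake: the same nonce for every message

a = toy_encrypt(key, static_nonce, b"attack at dawn")
b = toy_encrypt(key, static_nonce, b"attack at dawn")
c = toy_encrypt(key, static_nonce, b"attack at dusk")

# Leak 1: equal plaintexts are visible as equal ciphertexts.
same_message_leaks = a == b

# Leak 2: XOR of two ciphertexts equals XOR of the two plaintexts,
# with no knowledge of the key required.
xor_of_ciphertexts = bytes(x ^ y for x, y in zip(a, c))
xor_of_plaintexts = bytes(
    x ^ y for x, y in zip(b"attack at dawn", b"attack at dusk")
)

# With a fresh random nonce per message, the same plaintext encrypts differently.
fresh1 = toy_encrypt(key, os.urandom(16), b"attack at dawn")
fresh2 = toy_encrypt(key, os.urandom(16), b"attack at dawn")
```

The "searchable ciphertext" property the engineers wanted is exactly the first leak: anyone who can see the stored values can also tell which records are equal.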

JohnFen · on Aug 9, 2023

It's pretty common for devs who are inexperienced with cryptography to succumb to the temptation to roll their own, especially if they start studying cryptography algorithms.

It's always a mistake, though. This is something I had to cover with younger devs quite a bit back when I worked for a company that made heavy use of cryptography.

2OEH8eoCRo0 · on Aug 9, 2023

> Why do they elect to do this?

They could be rightly suspicious of a western TLS implementation but discovered the pitfall of writing their own. Could have also been intentional.

manuelabeledo · on Aug 9, 2023

> They could be rightly suspicious of a western TLS implementation but discovered the pitfall of writing their own. Could have also been intentional.

They could have deployed TLS with some cipher of Chinese origin; it's not like Chinese companies haven't done this before [0]

[0] https://ciphersuite.info/cs/TLS_SM4_GCM_SM3/

lucubratory · on Aug 9, 2023

If there's a zero day that's been embedded in a protocol by the NSA or actively used by the NSA, I normally wouldn't expect it to come from the actual encryption process itself. It would be something that choosing your own cipher wouldn't fix, because it would be about compromising security on the software level rather than the encryption level. There's a very good reason the PRC won't allow compromised Cisco routers; it wouldn't surprise me if there was similar thinking here, justified or not.

manuelabeledo · on Aug 10, 2023

> … actively used by the NSA …

So the NSA would purposefully embed an exploitable bug in a cipher they themselves use?

> … it would be about compromising security on the software level rather than the encryption level

Those are two very different things. A cipher could very well be exploitable. What you are talking about is the implementation.

In other words, a vulnerable cipher would be exploitable regardless of the implementation.

It is not the case, though. They tried to deploy their own protocol on top of a well known cipher.

paxys · on Aug 9, 2023

The article says that they use both HTTP and HTTPS endpoints, and the exchanges using HTTPS are secure (as expected). My guess is they had to build their own encryption scheme paired with plain HTTP for older devices or those that for some reason weren't compatible with the latest TLS standards (of which there are a lot).

olliej · on Aug 9, 2023

I'm not a huge fan of the blog title - the clear intent of the title is to make it sound like they didn't want any public disclosure, but my reading is that the first response incorrectly considered it low priority, and then after Tencent realized it was a real issue they quickly said "whoops, please don't disclose this as we need to fix it".

It seems like this could be in part mitigated by making sure their server is not an oracle (though obviously fixing the primitives is also important, but older/non-updatable clients could exist).

I would guess the traffic is all over TLS on iOS due to "App Transport Security" requiring HTTPS by default - it's not a huge leap to turn it off, but it's controlled by the app's Info.plist so is trivially indexable. Also probably more work than just adding 's' to the protocol (at least from the PoV of the individual dev working on the code).
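On the "server is not an oracle" point above, a rough Python sketch of one common mitigation, assuming the oracle meant here is the kind where the server's response reveals whether decryption of a tampered message succeeded. The idea is to authenticate the ciphertext first and reject everything invalid with one uniform error before any decryption happens. The names `seal`, `open_sealed`, and `RejectedMessage` are hypothetical, and the actual decryption step is left out.

```python
import hashlib
import hmac

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


class RejectedMessage(Exception):
    """Single, deliberately uninformative error for any bad input."""


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over the ciphertext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, message: bytes) -> bytes:
    """Verify the tag and return the ciphertext, or raise RejectedMessage."""
    if len(message) < TAG_LEN:
        raise RejectedMessage("invalid message")
    ciphertext, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison, and the same error as every other failure,
    # so the caller learns nothing about why a message was refused.
    if not hmac.compare_digest(tag, expected):
        raise RejectedMessage("invalid message")
    # Only now is it safe to pass the ciphertext to the decryption routine.
    return ciphertext
```

A tampered message never reaches the decryption code, so there is no decryption outcome for an attacker to observe. This only helps clients that can be updated to send the tag, which ties back to the point about older clients.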

miki123211 · on Aug 9, 2023

> In this report, we analyze Tencent’s Sogou Input Method, the most popular Chinese input method with over 455 million monthly active users and versions of the app for multiple platforms, including Windows, Android, and iOS. Sogou Input Method accounts for 70% of Chinese input method users, with products by iFlytek and Baidu taking second and third place, respectively.

This part is surprising to me. Are the Chinese input methods provided directly by the operating systems not enough? I'm surprised that Microsoft, Apple et al provide such a sub-par service in China that over 450 million people were bothered enough to install a third-party keyboard.

saurik · on Aug 9, 2023

It is a lot better now, but the Chinese keyboard used to be so not-as-good on iOS that Baidu's official help for people (and even for a while an advertisement on their home page!) suggested they jailbreak their phone to get a better keyboard.

lmm · on Aug 9, 2023

> I'm surprised that Microsoft, Apple et al provide such a sub-par service in China that over 450 million people were bothered enough to install a third-party keyboard.

American companies, man. They really don't care. So much stuff just breaks or does the bare minimum if you're using any kind of IME, or if you're not using UTF-8, or even if you're not using ASCII. There seems to just be a general cultural incomprehension that there's any other way of writing on computers. (It's not even limited to companies - there's a whole bunch of Linux stuff with the same problem, e.g. Snap/Flatpak just break everything and don't care)

dmoy · on Aug 9, 2023

Phone-native pinyin->character input seems fine now? Google's keyboard and the iPhone's keyboard for pinyin->characters work fine. If you're using normal Mandarin.

Windows OS pinyin->character input.... I don't know if I've ever seen someone use something other than Sogou lol, so honestly I can't say how good or bad windows native is at this now.

It's a hard problem, because of the 1:many fanout for any given pinyin input. "bao" can mean like 40 different things - bread thing, weak (?), hug, violence (?), treasured-one (think like "my precious", like a pet name for a baby?), etc etc. Sogou had a big leg up for a long time because it figured out the correct words from context much better than other alternatives, requiring fewer manual selections.

(semi-related note, google's voice->text still fails pretty hard for regional mandarin. For example it really doesn't like the hard Rs at the end of words in far northeastern mandarin. It can't seem to figure out that "baoERRRR" is actually just "bao". That problem doesn't exist for pinyin->characters though)
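To illustrate the 1:many fanout and why context helps, here is a toy Python sketch. Everything in it is made up for illustration: the candidate list is a tiny subset, the bigram counts are invented, and a real IME uses far larger models. It only shows the idea of reordering the candidates for "bao" based on the previous character.

```python
# Hypothetical, tiny candidate table: one pinyin syllable, several characters.
CANDIDATES = {"bao": ["包", "薄", "抱", "暴", "宝"]}

# Invented counts of (previous character, candidate) pairs.
BIGRAM_COUNTS = {
    ("面", "包"): 90,  # 面包, bread
    ("拥", "抱"): 80,  # 拥抱, hug
    ("宝", "宝"): 70,  # 宝宝, baby
    ("风", "暴"): 60,  # 风暴, storm
}


def rank_candidates(syllable: str, previous_char: str) -> list[str]:
    """Order candidates so the one most often seen after previous_char is first."""
    candidates = CANDIDATES.get(syllable, [])
    # sorted() is stable, so candidates with equal counts keep their default order.
    return sorted(
        candidates,
        key=lambda ch: -BIGRAM_COUNTS.get((previous_char, ch), 0),
    )
```

With no useful context the user has to pick from the default list; with good context the intended character is already first, which is the "fewer manual selections" advantage described above.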

djtango · on Aug 9, 2023

Haven't used sogou since like 2009 but back then it not only was ace at contextual prediction but also had colloquialisms built in

LordShredda · on Aug 9, 2023

Very responsible handling of a common cryptography failure. What's more impressive is the Tencent developers' willingness to cooperate despite the firewalls and communication issues. Also, do not make your own crypto algorithm.

myself248 · on Aug 9, 2023

I must've missed the bit where they explained why _a keyboard_ would be sending anything at all _across the network_ in the first place.

Anyone?

taldo · on Aug 9, 2023

From TFA

> While alphabetic keyboards typically provide autocomplete features for more expedient typing, predictive features in Chinese input methods are more crucial when using input methods such as pinyin where hundreds of characters might match an inputted pinyin syllable. For longer strings of syllables, an IME will commonly reach out over the network to a cloud-based service for suggestions if suitable suggestions are not available in the input method’s local database.

Not saying whether they should, but it's pretty easy to understand why they do it.

Nextgrid · on Aug 9, 2023

It's impressive that users are ok with this. This is even beyond the (now generally-accepted) analytics and ad targeting, it's literally "we'll send all your keystrokes to a remote server", a literal keylogger.

lucubratory · on Aug 9, 2023

The privacy environment in the PRC is pretty different to here (I'm in Australia, but speaking for the West in general). In the PRC there is an expectation by everyone (government and citizen) that the government will have access to your data if it wants it and trying to keep that data away from the government is itself criminal. Foreigners are held to a different standard, but the takeaway is that for the vast majority of PRC internet users security from government invasion of privacy is "not a concern"; it's essentially known that it will always be the case, so why be worried about it?

On the consumer privacy side, talking about consumer data safety from companies using it for commercial exploitation, that's where there is allowable space to be protective, and the laws reflect that in the PRC. If Tencent was found to be doing something that was seriously exploitative of user privacy in a way that made enough Chinese very angry, it would be almost certain that Tencent would be breaking some law to do so, and depending on the circumstances you could see anything from mandatory "make it right" directives from the government to the execution of Pony Ma as a result. PRC citizenry expect that if they are seriously harmed, en masse, by a company, the government will make it right.

This is a very different environment and culture from what we have in the West. In general, if I was the victim of significant harm (alongside many other people) by a large corporation, I would not expect justice. The CEO or those who made the decisions that harmed me wouldn't be executed, the company wouldn't be forced to push an update making my device safe to use again or pay a full refund to everyone, etc. As a result, if I care about this issue I'm not really thinking about the consequences for big companies of harming me, I'm just thinking about how to protect myself. What software can I individually use to protect my privacy, what companies should I individually avoid because I know I don't like what they're doing, etc. These are philosophically quite different approaches to privacy concerns - we have a lot more individual freedom in the West if we choose to use it, when it comes to individual net privacy, but the general attitude of PRC is that they don't have to worry about the privacy thing because the government will handle it one way or another.

pm2222 · on Aug 9, 2023

“These findings underscore the importance for software developers in China to use well-supported encryption implementations such as TLS instead of attempting to custom design their own.”

So generally speaking established standard are scrutinized more and thus more trustworthy, right? I can think of all those WiFi encryption methods we’ve been through and they are all vulnerable, sooner or later.

JohnFen · on Aug 9, 2023

> So generally speaking established standard are scrutinized more and thus more trustworthy, right?

Yes, in large part.

Also, implementing good cryptography requires specialist mathematical skills on par with dev skills. It's very easy to make a really trivial mistake such that it looks like the crypto is solid, when it's in fact very weak.

The ability to make a trivial mistake that's hard to spot, combined with the high stakes involved, makes cryptography something that's better left to the experts.

pphysch · on Aug 9, 2023

What is the significance of the headline? It seems like the editors are trying to play into popular stereotypes for clicks, because reading through the disclosure log, it seems like a straightforward process marred by some minor email/communication issues. No real attempt at "suppression/censorship", as the headline implies. What am I missing?

myself248 · on Aug 9, 2023

Tencent initially misclassified the issue as not a security risk. Shortly after, they reconsidered and asked the researchers not to make it public.

stefan_ · on Aug 9, 2023

Yes, what could be wrong with some keyboard input addon that sends every keypress to Tencent, and on top of that, in a manner trivial for a passive eavesdropper to decode?

We used to call these things "keyloggers".

pphysch · on Aug 10, 2023

The severity of the vulnerability has nothing to do with this sensationalized headline.