
Right! I was looking forward to some insight into what's in that thing, but there was nothing.

Why IS the Gmail app 700MB?



Emerge Tools has an old thread on why it's actually so big: https://x.com/emergetools/status/1810790280922288617


Thanks, that thread is great!

They have a neat treemap breakdown here: https://www.emergetools.com/app/example/ios/com.google.Gmail

130MB is localization data.

This detail was interesting too: https://twitter.com/emergetools/status/1810790291714314706

> There's over 20k files in the app, 17k of which are under 4 kB. In iOS, the minimum file size allocation is 4 kB, so having many small files causes unnecessary size bloat. Gmail could save 56.4 MB by moving their small files to an Asset catalog


Yep, localization is a huge source of size bloat for enterprisey apps that support many locales. There is no Apple-provided way to dynamically download select localization packs based on the device locale. Meta came up with their own solution: https://engineering.fb.com/2022/05/09/android/language-packs...

The small-file issue is something we commonly see in games; I was surprised to see it for Gmail.

And btw we open-sourced much of our analysis after being acquired by Sentry: https://github.com/getsentry/launchpad


130 MB for localization? At 50 languages that would be 2.6 MB/language. If we assume an average 50 bytes per string and another 50 for an identifier, that's 27,000 strings.

That doesn't seem right. Localization feels like it should add a few MB. Not over 100. (Plus shouldn't it be compressed, and locally uncompressed the first time a language gets used?)
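
The back-of-envelope estimate above can be reproduced in a few lines. This is only a sketch: the 50-language count and the 50-byte key and value sizes are the commenter's assumptions, and the function name is made up for illustration. It also shows that the "27,000" figure lines up with binary megabytes rather than decimal ones.

```python
def strings_per_language(total_bytes, languages, bytes_per_entry):
    """Estimate how many key/value entries fit in each language's share."""
    per_language = total_bytes / languages
    return per_language / bytes_per_entry

MB = 1000 * 1000    # decimal megabyte
MIB = 1024 * 1024   # binary mebibyte

# Assumed inputs: 130 MB total, 50 languages, 50-byte key + 50-byte value.
decimal_estimate = strings_per_language(130 * MB, 50, 50 + 50)
binary_estimate = strings_per_language(130 * MIB, 50, 50 + 50)

print(round(decimal_estimate))  # 26000
print(round(binary_estimate))   # 27263
```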


Localization doesn’t just mean string translations. Apple platforms give you the freedom to redo the UI to fit the language. For example, parts of System Preferences (not sure about Settings) would look completely different in languages with long words because the original design for English simply didn’t fit. The translators rearranged buttons to make the text fit.


I just looked. In this case, it is just string translations.

In the version I'm looking at there are 27,470 .strings files totaling 69 MiB, but they take up 155.9 MiB of disk space due to the 4 KiB filesystem block size.

The keys for the strings take up 39% of the space while the values take up 61%. About 12% of translations are duplicated (the word "Cancel" is translated 50+ times per language).

So about 55% of the space used for strings localization is pure waste due to having so many small files. The long keys are rather wasteful too.

Some of this is arguably Apple's fault. Their whole approach of one .strings file per table per language is incredibly space-inefficient by default.
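
The block-size overhead described in this comment can be illustrated with a small sketch. It assumes a simplified model in which every file occupies a whole number of 4 KiB blocks; the file sizes in the list are hypothetical, and only the 69 MiB and 155.9 MiB figures come from the comment itself.

```python
BLOCK_SIZE = 4096  # 4 KiB allocation unit, per the comment above

def allocated_size(file_size, block_size=BLOCK_SIZE):
    """Round a file size up to a whole number of blocks (simplified model)."""
    blocks = -(-file_size // block_size)  # ceiling division on integers
    return blocks * block_size

# Hypothetical file sizes in bytes, chosen only for illustration.
sizes = [120, 900, 2600, 4096, 5000]
logical = sum(sizes)
on_disk = sum(allocated_size(s) for s in sizes)
print(logical, on_disk)  # 12716 24576

# Figures quoted in the comment: 69 MiB of content in 155.9 MiB on disk.
wasted_fraction = 1 - 69 / 155.9
print(f"{wasted_fraction:.1%}")  # 55.7%
```

With the quoted figures the wasted share comes out a little under 56%, consistent with the roughly 55% mentioned above.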


Sure, but that's a few KBs for each locale at most. Still a long way to 100 MB.

They probably just have lots of leftover localized assets that nobody dares to touch because they aren't sure whether they're used anywhere.


Thank you! I've been wondering about exactly this, your explanation makes complete sense.


Any RTL language will probably require a lot of different assets, if for no other reason than gradients. Plus anything asymmetric.


The version I'm looking at has 27,470 .strings files, most of them less than 4 KB each.

Since the iOS filesystem uses 4 KB blocks, it looks like about half the space is just wasted.


It probably isn’t just text that is localized.


Images may also need localization.


4 kB is also the typical minimum allocation size for a file on Linux, so I imagine a similar issue could exist on Android.


The Gmail app on Android is 150MiB in size.

Android traditionally puts resources into a compressed archive, though, so by simply using an archive for storage, Google may be avoiding the 4k size problem.


I wonder if it would be better to create separate localized app downloads, such as gmail-japanese, etc.


Google Play offers such functionality already, it's called App Bundles. Instead of uploading an entire APK, the developers can upload the app assets that get bundled into device-specific APKs containing only the resources necessary for the end device. So you'd only get native libs for your phone CPU architecture, translations for the device language and image assets matching the device resolution for example. In fact, I think it's mandatory now to use the app bundles format (but you're still free to configure it to some extent)

I now see the article is about the iOS app, but it looks like the Android app is anywhere between 50 MB and 100 MB (depending on the APK download site I look at), which is much more reasonable.



Author here. Thanks for sharing this. It seems they released an updated version of this analysis last year [1]. It matches what I saw when analyzing the IPA. I tried to do a deeper analysis on the code itself using several tools, including Google's own bloaty [2], which was not very useful without symbols; classdumpios [3], which revealed something like 50k interfaces starting with "ComGoogle"; and Ghidra [4], which I left running for a day to analyze the binary, but it kept hanging and freezing so I gave up on it. Perhaps comparing the Android and iOS code could lead to something more fruitful.

[1] https://x.com/emergetools/status/1943060976464728250

[2] https://github.com/google/bloaty

[3] https://github.com/lechium/classdumpios

[4] https://github.com/NationalSecurityAgency/ghidra


Looks like it's mostly strings, probably due to localization. They should consider compressing each localization/language, and decompressing the needed bundle on first startup (or language change). Even better: Download the language bundle when needed.


Well, that's a question for the OS level. If the OS itself doesn't require a download to switch languages, so that switching to a new language works offline, I could see it being frustrating that switching languages in the app requires being online.

So compression/deduplication is probably the better option. Rather than storing as 1 zip per language, though, you'd probably want a compression format that also eliminates duplication that may occur between languages if you're storing all languages compressed on the system. That means you'd need compression to handle the entire language complex being in one massive compressed blob and you'd just extract out the languages you needed. I assume there are some forms of zipping that do this better than others.
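
The difference between one archive per language and one combined blob can be sketched with Python's standard zlib module. Everything in the data is hypothetical: the keys and values are hash-derived stand-ins rather than real app strings, and the helper names are invented for illustration. The only point is that a single compressed stream can reuse keys shared across languages, which separately compressed files cannot.

```python
import hashlib
import zlib

def fake_token(seed, length):
    """Deterministic stand-in text derived from a hash (not real app data)."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:length]

def build_table(language, key_count=40):
    """Build one hypothetical .strings-style table for a language."""
    lines = []
    for i in range(key_count):
        key = "mail.ui." + fake_token(f"key-{i}", 32)  # same in every language
        value = fake_token(f"{language}-{i}", 16)      # differs per language
        lines.append(f'"{key}" = "{value}";\n')
    return "".join(lines).encode("utf-8")

languages = ["en", "de", "fr", "ja", "ar"]
tables = [build_table(lang) for lang in languages]

# One compressed stream per language versus a single stream over all of them.
separate = sum(len(zlib.compress(t, 9)) for t in tables)
combined = len(zlib.compress(b"".join(tables), 9))

print("raw bytes:", sum(len(t) for t in tables))
print("one stream per language:", separate)
print("single combined stream:", combined)
```

The trade-off is access cost: a plain deflate stream has to be decompressed from the start to reach a later language, and its 32 KiB window only finds repeats that sit close together. This is also the difference between zip, which compresses each member independently, and tar followed by gzip, which compresses everything as one stream.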


So is the extra space that isn't accounted for between then and now AI-related pieces?



