
Took me a while to figure it out, but it appears every instance of the letters 'xe' has been removed. Anyone care to guess why?

a herd of red on (oxen)

Lufthansa bos (boxes)

his day's greatest ertion (exertion - that's when I caught on)

ercising his jaw muscle (exercising)



Almost certainly a botched attempt to fix Unicode code points showing up in text.

For instance, the Euro symbol (U+20AC) is encoded in UTF-8 as the three bytes E2 82 AC. In Python (and other languages?) these bytes can sometimes end up in text as the literal 12-character string `\xe2\x82\xac`. The leading '\x' is Python's escape sequence for a two-digit hex byte value.

Someone who didn't know what they were looking at might try a couple of heavy-handed replacements on the article text to undo this. '\xe' is commonly seen in bad UTF-8 -> ASCII translations because of how UTF-8 encodes code points: every three-byte sequence starts with a byte in the range E0-EF.
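To make that concrete, here is a rough sketch of how such a cleanup could eat real words. The `naive_cleanup` function is purely hypothetical (nobody outside the publisher knows what was actually run); it just deletes the fragments 'xe' and 'xb' wherever they occur.

```python
# The Euro sign is one code point that UTF-8 encodes as three bytes.
euro = "\u20ac"
encoded = euro.encode("utf-8")      # the bytes E2 82 AC

# repr() shows those bytes in escaped form; strip the b'...' wrapper
# to get the literal text that can leak into badly converted content.
escaped = repr(encoded)[2:-1]       # backslash-x escapes, 12 characters

def naive_cleanup(text: str) -> str:
    """Hypothetical heavy-handed fix: delete every 'xe' and 'xb'."""
    for fragment in ("xe", "xb"):
        text = text.replace(fragment, "")
    return text

for sample in ("a herd of red oxen", "Lufthansa boxes", "exertion"):
    print(sample, "->", naive_cleanup(sample))
```

Run against the phrases quoted at the top of the thread, this yields "red on", "bos" and "ertion", which matches what the article shows. That is consistent with the theory but does not prove it.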

See the specific Euro example here: https://en.wikipedia.org/wiki/UTF-8#Examples

And a popular answer on StackOverflow specifically on '\xe2' in python output: https://stackoverflow.com/questions/21639275/python-syntaxer...


Your observation was so unexpected and unusual that it is actually more interesting than the long-winded article itself. I had noticed the weird half-eaten words but thought it was just a poorly edited website with typos.

The Unicode elimination explanation by another person replying to your comment was also quite interesting to read.


Missed the edit deadline, but here's an update:

I've got the day off today, so I'm on this like a poorly-disciplined bloodhound. After searching GQ.com for all 26 two-letter combinations of the form x_, it seems 'xb' and 'xe' have been removed, site-wide (edit: not site-wide), while none of the other combos are affected. My test words are listed below.

So, is this starting to ring a bell for anybody? Do xb and xe have anything in common? Defunct formatting codes? Emacs function keys, I ask only half-jokingly?

examination

Oxbridge

excuse

Disney XD

boxes

exfoliate

foxglove

exhale

exit

Jaguar XJ

Jaguar XK

axle

axman

Oxnard

exoskeleton

expose

exquisite

iPhone XR

coxswain

next

sexual

Final Fantasy XV

Maxwell

XXX

sexy

Olympus XZ-1
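A quick sketch for checking the list above: it confirms that every combination 'xa' through 'xz' is covered by at least one test word, then shows which words would be hit if only 'xb' and 'xe' were stripped. Matching the stripped fragments in lowercase only is an assumption; the thread doesn't establish whether the removal is case-sensitive.

```python
import string

test_words = [
    "examination", "Oxbridge", "excuse", "Disney XD", "boxes", "exfoliate",
    "foxglove", "exhale", "exit", "Jaguar XJ", "Jaguar XK", "axle", "axman",
    "Oxnard", "exoskeleton", "expose", "exquisite", "iPhone XR", "coxswain",
    "next", "sexual", "Final Fantasy XV", "Maxwell", "XXX", "sexy",
    "Olympus XZ-1",
]

# Every two-letter combination 'xa' .. 'xz'.
pairs = ["x" + letter for letter in string.ascii_lowercase]

# Which test words exercise each pair (case-insensitive)?
covered = {p: [w for w in test_words if p in w.lower()] for p in pairs}
missing = [p for p, words in covered.items() if not words]

# Fragments reported as removed; exact-case matching is an assumption.
stripped = ("xb", "xe")
affected = [w for w in test_words if any(s in w for s in stripped)]

print("uncovered pairs:", missing)
print("words that would be mangled:", affected)
```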


Wow, great work so far. This is really interesting.

My guess is that they were moving content out of some proprietary early-2000s CMS around 2015. Instead of carefully parsing the storage format and extracting the text, they dumped it and the output was peppered with garbage. To sanitize the output, they simply elided certain character sequences.

Further speculation: 'xb' and 'xe' (for 'beginning' and 'end') were control sequences marking the extent of something in the old CMS format.

Edit: These people would be the ones to ask:

> The Software Engineering team at Condé Nast International (CNI) knew it needed an automated way to migrate the vast quantities of content, and it developed a tool to do just that, recognizing that no off-the-shelf tool could cope with the disparate set of content it was facing, spanning multiple territories, languages and content types. But to meet its hard three-month deadline of migrating the first territory, Germany, CNI also saw the need for additional resources who were experienced in key technologies, including Node.js and React, so it selected NearForm.

https://www.nearform.com/blog/case-study/accelerating-transf...


Ha - I suspect you're looking in the right direction. Especially when they talk about that "hard three-month deadline." What is it with the arbitrary deadlines, people? The deadline happens once, but the mistake hangs around forever.


"xe" has occasionally been used as a gender-neutral pronoun. Possibly at some point GQ changed their style guide with regard to it, and some (presumably well-meaning) editor ran a malformed search and replace on it.


This makes some sort of sense but then wouldn't we be seeing "etheyrtion" instead of "ertion"? Even assuming a find/replace properly targeted the word "xe" we don't have any "replace" happening, which doesn't really track.
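The distinction can be sketched in a few lines. The replacement word "they" is just the one used in the example above, not something known about any actual style-guide change.

```python
import re

text = "his day's greatest exertion"

# A properly targeted find/replace matches only the standalone word "xe",
# so ordinary words containing those letters are left untouched.
word_only = re.sub(r"\bxe\b", "they", text)

# A malformed replace that hits every "xe" substring inserts the new word.
substring_replace = text.replace("xe", "they")

# Deleting the substring outright is what reproduces the article's text.
substring_delete = text.replace("xe", "")

print(word_only)
print(substring_replace)
print(substring_delete)
```

Only the last variant produces "ertion", so whatever happened looks like a deletion rather than a substitution.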


Not every instance:

next, context, exterior, exhaustion, exploded, FedEx


ex vs xe.


Whoops, yes, I transposed them when I was trying to determine when it may have been introduced.

archive.org has only "red on" (archived since 2015, though the article is dated September 2003).

Google Books finds a 2016 book with this article containing "red oxen".

And gq.com elsewhere has the word "oxen" September 2015: https://www.gq.com/gallery/andrew-moore-photography-book-dir....

Earliest instance I found was February 2003: https://www.gq.com/story/michael-paterniti-surfing-2003 ("ercise"). There were many others, through at least May 2015: https://www.gq.com/story/old-people-are-robbing-more-banks ("safe deposit bos").

Failed to post another edit: missing-xe disease affects more Condé Nast publications in that time frame than just GQ. For example, Teen Vogue talks about "ercise" a lot, including https://www.teenvogue.com/story/simple-ways-to-excercise (note the "exCercise").


Holy cow. So it has to have been done recently then, I guess, and probably globally? Which might lend some credence to the "pronoun" theory.

I searched for "bos" and turned up quite a few:

https://duckduckgo.com/?q=%22bos%22+site%3Awww.gq.com

Edit: found a new one - remixes/remis:

https://duckduckgo.com/?q=remis+site%3Awww.gq.com


So does "fos" instead of "foxes". This is kind of silly.

https://duckduckgo.com/?q=%22fos%22+site%3Awww.gq.com&kp=1&i...



