The difference is that Word2Vec "learned" these relationships auto-magically, purely from the patterns of surrounding words in the written text it was trained on. Don't forget that this was a revolutionary result at the time, and the actual techniques involved were novel. Word2Vec is the foundation of modern LLMs in many ways.
I can't edit my own post but there are two other big differences between the Prolog example and the Word2Vec example.
1. The W2V example is approximate. Not "fuzzy" in the sense of fuzzy logic. I mean that Man, Woman, Queen and King are all essentially just arrows pointing in different directions (in a high-dimensional space). Summing vectors is like averaging their angles. So subtracting "King - Man" is a kind of anti-average, and "King - Man + Woman" then averages that intermediate thing with "Woman", which just so happens to yield a direction very close to that of "Queen" (there's a quick sketch of this arithmetic right after this list). This is, again, entirely emergent from the algorithm and the training data. It's also probably a non-representative, cherry-picked example, but other commenters have gone into detail about that and it's not the point I'm trying to make.
2. In addition to requiring hand-crafted rules, any old-school logic programming system has to run some kind of unification or backtracking algorithm to obtain a solution. Meanwhile here we have vector arithmetic, which is probably one of the fastest things you can do on modern computing hardware, not to mention being linear in time and space. Not a big deal in this example, but it could be quite a big deal in larger applications.
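To make point 1 concrete, here is a minimal sketch of that arithmetic using gensim and one of its downloadable pretrained vector sets (GloVe here, standing in for Word2Vec; the model name, the topn values and the exact neighbors you get are all just illustrative and depend on which vectors you load):

    import gensim.downloader as api

    # Load a small set of pretrained vectors (GloVe as a stand-in for Word2Vec).
    wv = api.load("glove-wiki-gigaword-50")

    # The "anti-average" by hand: king - man + woman, then nearest neighbors by cosine.
    # Note that the raw sum's nearest neighbor is usually "king" itself.
    vec = wv["king"] - wv["man"] + wv["woman"]
    print(wv.similar_by_vector(vec, topn=5))

    # gensim's built-in helper does essentially the same arithmetic,
    # but normalizes the vectors first, which tends to give cleaner results.
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))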
And yes you could have some kind of ML/AI thing emit a Prolog program or equivalent but again that's a totally different topic.
Someone once told me you need humongous vectors to encode nuance, but people are good at things computers are bad at, and vice versa. I don't want nuance from computers any more than I want instant, precise floating-point calculations from people.
I think you are missing the difference between a program derived from training data and logic created explicitly by hand. Go ahead and try doing what you are doing for every word in the dictionary and see how the implementation goes.
It depends on whether you want your system to handle all of natural language and give answers which are correct most of the time (but where it isn't easy to tell when it's wrong), or to handle a limited subset of natural language and either give answers which are demonstrably correct (once it's fully debugged or proven correct) or tell you when it doesn't know the answer.
These are two opposing approaches to AI. Rule induction is somewhere in between - you use training data and it outputs (usually probabilistic) human-readable rules.
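For anyone who hasn't seen rule induction in practice, a decision tree is one simple (non-probabilistic) flavor of it: you feed it training data and get back rules a human can read. A minimal sketch, assuming scikit-learn and its bundled iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Induce a classifier from labelled training data.
    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(iris.data, iris.target)

    # The learned model prints as human-readable if/else rules.
    print(export_text(clf, feature_names=list(iris.feature_names)))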
This completely misses how crazy word2vec is. The model isn't told anything about word meanings or relationships, and yet the training results in incredibly meaningful representations that capture many properties of these words.
And in reality you can use it in much broader applications than just words. I once threw it onto the session data of an online shop, with just the visited item_ids one after another for each individual session (the session is the "sentence", each item_id a "word").
You end up with really powerful embeddings for the items based on how users actually shop. And you can do more by adding other features into the mix. By adding "season_summer/autumn/winter/spring" tokens into the session sentences based on when each session took place, you can then project the item_id embeddings onto those season embeddings and get a measure of which items are the most "summer-y", etc.
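For reference, the setup is basically this (a rough sketch with gensim; the item ids, season tokens and hyperparameters are placeholders for illustration, and in practice you'd raise min_count well above 1):

    from gensim.models import Word2Vec

    # Each session is a "sentence"; item ids (and season tokens) are the "words".
    sessions = [
        ["item_102", "item_77", "item_934", "season_winter"],
        ["item_77", "item_501", "season_summer"],
        # ... one list per real user session
    ]

    model = Word2Vec(sentences=sessions, vector_size=64, window=5,
                     min_count=1, sg=1, workers=4)

    # Items that co-occur in similar sessions end up close together.
    print(model.wv.most_similar("item_102", topn=5))

    # "Summer-ness" of an item: compare its vector with the season embedding
    # (cosine similarity is the normalized version of that projection).
    print(model.wv.similarity("item_102", "season_summer"))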
I prefer the old school, where definitions are human-readable rules and words are symbols.