Maybe the problem is that most of these models rely on sequential order (even the transformer needs it for left-to-right generation of text) to encode long-range information.
But I can’t remember the last time I relied on remembering the exact ordering of tokens to complete an essay, or hell, even reply to an email.
Structurally, we retain some kind of hierarchical information (topic, places, names, events) about text.
Is there any active research looking into text generation models which do this? Maybe some kind of query made in a learned vector space that is not temporally dependent but rather “spatial” - as in, these are the facts about the text generated so far.
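To make the "spatial query" idea concrete, here's a toy sketch: facts about the text so far live as key vectors in an embedding space, and generation retrieves them by similarity rather than by position in the sequence. Everything here (the fact names, the random placeholder embeddings, the `query` helper) is made up for illustration; in a real model the keys and queries would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Pretend these key vectors were learned; here they're random placeholders.
fact_keys = {
    "protagonist_name": rng.normal(size=dim),
    "current_location": rng.normal(size=dim),
    "unresolved_event": rng.normal(size=dim),
}
fact_values = {
    "protagonist_name": "Mara",
    "current_location": "the lighthouse",
    "unresolved_event": "the missing letter",
}

def query(q, keys, values):
    """Return the stored fact whose key is most similar to the query vector."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    best = max(keys, key=lambda k: cos(q, keys[k]))
    return best, values[best]

# A query vector close to one of the keys retrieves that fact,
# regardless of where in the text the fact was introduced.
q = fact_keys["current_location"] + 0.01 * rng.normal(size=dim)
print(query(q, fact_keys, fact_values))
```

The point is that retrieval depends only on what the query vector encodes, not on how many tokens ago the fact appeared - which is roughly the property sequential position encodings lack.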
I have been trying to fine-tune GPT-2 on genre fiction to work as a sort of "fiction replicator". Stylistically it actually seems to do quite reasonably, but it lacks narrative cohesion. This problem, as you point out, is corpus agnostic.
I thought of trying to keep track of characters and key interactions outside of the model, but I haven't figured out how to make the two components interact reliably -- outside of just having the tracker generate prompts for the language model in a kind of cooperative setting.
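The cooperative setup could be as simple as a plain data structure that records characters and events and folds them into a prompt prefix for the generator. The `StoryTracker` class and the prompt format below are my own invention, not an established interface - just a sketch of the "first component generates prompts for the second" idea:

```python
class StoryTracker:
    """External state the language model can't be trusted to keep itself."""

    def __init__(self):
        self.characters = {}   # name -> short description
        self.events = []       # chronological list of key interactions

    def add_character(self, name, description):
        self.characters[name] = description

    def add_event(self, event):
        self.events.append(event)

    def build_prompt(self, instruction):
        """Fold tracked state into a prompt for the generation model."""
        lines = ["Characters:"]
        lines += [f"- {name}: {desc}" for name, desc in self.characters.items()]
        lines.append("Key events so far:")
        lines += [f"- {event}" for event in self.events]
        lines.append(instruction)
        return "\n".join(lines)

tracker = StoryTracker()
tracker.add_character("Mara", "a lighthouse keeper")
tracker.add_event("Mara finds an unsigned letter")
prompt = tracker.build_prompt("Continue the story:")
print(prompt)
```

The hard part this sketch skips is the reverse direction: parsing the model's output to update the tracker, which is where the reliability problem really lives.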
Is there a known way to set up a transformer to do infix generation? That is: give it a start and an end prompt, and an estimated number of tokens to fill in between. That seems like it should be doable and could improve things, but I haven't found any work on this problem yet, and haven't had the time (and potentially don't have the skills) to dig deeply myself.
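One trick I've seen discussed for getting infix generation out of a left-to-right model is rearranging training text with sentinel tokens so the middle span comes last: the model sees prefix, then suffix, then learns to produce the middle. At inference you feed prefix and suffix and sample after the middle sentinel. The token names below are placeholders, not any particular model's vocabulary:

```python
# Placeholder sentinel tokens; a real setup would add these to the tokenizer.
PRE, SUF, MID = "<|prefix|>", "<|suffix|>", "<|middle|>"

def make_fim_example(text, mid_start, mid_end):
    """Rearrange a training document so the middle span comes last."""
    prefix = text[:mid_start]
    middle = text[mid_start:mid_end]
    suffix = text[mid_end:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

def make_fim_prompt(prefix, suffix):
    """At generation time, the model continues after the middle sentinel."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"

doc = "Once upon a time, something happened, and they lived happily."
print(make_fim_example(doc, 18, 38))
print(make_fim_prompt("Once upon a time, ", "and they lived happily."))
```

A rough length target could be hinted the same way, e.g. an extra token encoding the expected middle length, though whether the model respects it is another question.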