No, that is the training agent implemented as an instance of quicksort. I don't know why the OP reports that it's "a few percent faster than quicksort" because I can't find where that is claimed in the paper.
In fact, as far as I understand it, the paper claims that the learned model (i.e. the student agent) has learned to reproduce the behaviour of this teacher agent and, after training, takes a smaller number of steps on average than the teacher needs to sort a list. That is what the entire claim about superior efficiency rests on (there is no example of a learned model and no asymptotic analysis of the program such a model represents).
As I understand it, they wrote quicksort that way so that they could train with it. That is quite clear from the video, where it is labelled "quick sort agent" in contrast to the model one, and function1 and function2 appear in the stack trace.
"We found adding the previously executed action to the input state st is sufficient to handle dis-ambiguation for this quick sort implementation. Alg. 8 shows the converted quick sort scripted agent"
Right! I was hoping to see the body of one of their system-induced algorithms. They don't seem to have included any, or maybe I am mistaken and their system is opaque and does not allow a generated algorithm to be read out...