itissid on May 13, 2024, on: GPT-4o

Since the blog says it's only image, text, and audio input, does GPT-4o likely have a YOLO-like model on the phone to pre-process the video frames and send bounding boxes (BBoxes) to the server?