

leetharris 3 months ago | on: GLM-OCR – A multimodal OCR model for complex docum...

None of them do it well from our experience. We had to write our own custom pipeline with a mixture of legacy CV approaches to handle this (AI contract analysis). We constantly benchmark every new multimodal and VLM model that comes out and are consistently disappointed.

coder543 3 months ago

If someone releases a benchmark/dataset, I'm sure that significantly increases the chances of one of these AI labs training on the task.
