{"id":2082,"date":"2025-10-20T09:36:17","date_gmt":"2025-10-20T13:36:17","guid":{"rendered":"https:\/\/ozer.gt\/log\/?p=2082"},"modified":"2025-10-21T11:01:42","modified_gmt":"2025-10-21T15:01:42","slug":"update-on-using-llms-for-ocr","status":"publish","type":"post","link":"https:\/\/ozer.gt\/log\/2025\/10\/20\/update-on-using-llms-for-ocr\/","title":{"rendered":"Update on using LLMs for OCR"},"content":{"rendered":"<p>Here&#8217;s an update on <a href=\"https:\/\/ozer.gt\/log\/2025\/09\/30\/using-llms-for-text-extraction\/\">using LLMs for OCR<\/a> without having to use the same hammer (generic model) for all nails. DeepSeek has released an OCR-focused model: <a href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-OCR\">https:\/\/github.com\/deepseek-ai\/DeepSeek-OCR<\/a><\/p>\n<p>Check out the deep parsing mode, which is parsing images within documents through secondary model calls. Very useful for data extraction. The results are pretty impressive too:<\/p>\n<blockquote><p>Our work represents an initial exploration into the boundaries of vision-text compression, investigating how many vision tokens are required to decode \ud835\udc41 text tokens. The preliminary results are encouraging: DeepSeek-OCR achieves near-lossless OCR compression at approximately 10\u00d7 ratios, while 20\u00d7 compression still retains 60% accuracy. These findings suggest promising directions for future applications, such as implementing optical processing for dialogue histories beyond \ud835\udc58 rounds in multi-turn conversations to achieve 10\u00d7 compression efficiency.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s an update on using LLMs for OCR without having to use the same hammer (generic model) for all nails. DeepSeek has released an OCR-focused model: https:\/\/github.com\/deepseek-ai\/DeepSeek-OCR Check out the deep parsing mode, which is parsing images within documents through secondary model calls. Very useful for data extraction. The results are pretty impressive too: Our [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"cybocfi_hide_featured_image":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-2082","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/2082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/comments?post=2082"}],"version-history":[{"count":1,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/2082\/revisions"}],"predecessor-version":[{"id":2083,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/2082\/revisions\/2083"}],"wp:attachment":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/media?parent=2082"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/categories?post=2082"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/tags?post=2082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}