Mumbai
Bengaluru-based startup Sarvam AI has said its new vision and speech models outperform major global rivals in key tests for Indian languages. The company claimed its systems beat Google Gemini and ChatGPT in optical character recognition and text-to-speech benchmarks. In a social media post, co-founder Pratyush Kumar said Sarvam Vision reached 84.3 per cent accuracy on the olmOCR Bench English subset, surpassing results posted by larger frontier models.
On another test, OmniDocBench version 1.5, the model scored 93.28 per cent overall, showing strong ability to read complex formulas, charts, and page layouts. Kumar said the Bulbul V3 text-to-speech system supports 35 voices and works across all 22 scheduled Indian languages, even with low-quality scans.
He added that Sarvam Vision is currently the strongest model for Indian-language documents. The Vision series uses a three-billion-parameter model that can caption images, read scene text, understand charts, and parse difficult tables. Examples shared online showed accurate extraction of technical data from merged tables and charts in the latest Economic Survey.
Sarvam AI said its goal is to make artificial intelligence useful and accessible across India by building tools suited to local needs. Union IT minister Ashwini Vaishnaw also praised the work, saying it reflects the growing strength of India’s national AI mission.
The startup said its research focuses on practical use cases such as governance, education, and business documents. By supporting every scheduled language, the tools aim to reduce dependence on foreign platforms and help citizens, firms, and public offices adopt digital services with greater trust, speed, and linguistic inclusion nationwide. Officials said the achievement signals rapid progress.


