Digitization of data and content is crucial to enabling our digital ecosystem and economy. With this in mind, Avenir offers a wide range of outsourcing services, including scanning, indexing, quality and data management, transcription, and NLP (Natural Language Processing) training data generation.
Scanning and digitization of land records and provident fund records for a State government in India
Avenir has been involved in the digitization and registration of provident fund documents of unorganized labourers under the SASPFW scheme. We successfully digitized records amounting to INR four crores pertaining to six financial years (2011-12, 2012-13, 2013-14, 2014-15, 2015-16, and 2016-17) for different blocks of Diamond Harbor, Murshidabad, and Malda districts. The project was divided into two parts: a) Online Form Registration (Form1) and b) Data Entry (Form4). It was a challenging project given the quality of the documents, the illegible handwriting of agents, mismatches in the money deposited, server downtime, and missing account numbers in the system. A team of about 15 to 20 operators worked round the clock in shifts to complete the job on time.
Avenir was delighted to be part of the scanning and digitization of SDL and LRO records in A3, A4, and Legal page sizes at Darjeeling subdivision. The challenge lay not in the number of pages but in the locational disadvantage and the condition of the pages.
Processes that were followed included:
- Collection of raw documents from the district office
- Scanning of Mouja-specific documents with an ADF or book scanner before rectification of curled and torn pages
- Image correction and quality control on a per-Khatiyan basis, including orientation, splitting, fixing the resolution at 300 DPI, deskewing, adding a margin around the image, cropping, and final verification
- Merging the set of images for each Khatiyan into a single color PDF file, named according to a fixed naming convention
- Handing over the final files to the client after acceptance testing
- Maintaining proper security measures in handling documents
- Project status reporting to the client on both a daily and a weekly basis
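As an illustration of the per-Khatiyan merge step, the sketch below groups scanned page files by Mouja and Khatiyan and orders them by page number before they are merged into one PDF. The filename pattern `<mouja>_<khatiyan>_<page>.<ext>`, the output name format, and the function name `plan_khatiyan_pdfs` are assumptions made for this example; the actual naming convention used in the project is not described here.

```python
import re
from collections import defaultdict

# Hypothetical scan filename pattern: <mouja>_<khatiyan>_<page>.<ext>
# The project's real naming convention is not stated; this one is assumed.
SCAN_NAME = re.compile(
    r"^(?P<mouja>[A-Za-z0-9]+)_(?P<khatiyan>\d+)_(?P<page>\d+)"
    r"\.(?:tif|tiff|jpg|jpeg|png)$",
    re.IGNORECASE,
)


def plan_khatiyan_pdfs(filenames):
    """Group page images by (Mouja, Khatiyan) and order them by page number.

    Returns (plan, rejected): plan maps an output PDF name to its ordered
    page files; rejected lists filenames that do not match the pattern.
    """
    pages = defaultdict(list)
    rejected = []
    for name in filenames:
        match = SCAN_NAME.match(name)
        if match is None:
            rejected.append(name)  # leave for manual review
            continue
        key = (match["mouja"], int(match["khatiyan"]))
        pages[key].append((int(match["page"]), name))

    plan = {}
    for (mouja, khatiyan), items in sorted(pages.items()):
        # Assumed output name: Mouja code plus zero-padded Khatiyan number.
        pdf_name = f"{mouja}_{khatiyan:05d}.pdf"
        # Sort numerically so that page 10 follows page 2, not page 1.
        plan[pdf_name] = [name for _, name in sorted(items)]
    return plan, rejected
```

Each entry in the resulting plan could then be handed to an imaging library to write the color PDF at 300 DPI, while the rejected list flags files that need manual renaming before the merge.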
AI/ML training data generation and quality audit for NLP startup firm
Avenir has been part of the voice data acquisition project of Mihup, a major vernacular voice interface provider, carried out to train its mixed-language speech recognition engine. As part of the project, the team took part in:
a. Voice recording of given or impromptu speeches on selected topics through Mihup's voice recorder app
b. Recording in Bengali, English, and Hindi
c. Uploading the voice recording clips to Mihup's engine server
Avenir has also been part of the quality audit of transcription files against customer-sales interaction audio clips, carried out to train and tune Mihup's mixed-language speech recognition engine. The process included:
- Listening to real sales conversation clips, which were mostly noisy and in mixed languages (English, Bengali, and Hindi)
- Validating the transcribed files provided by Mihup against the clips
- Rectifying any errors in the transcribed files, with the text written in Unicode
- Submitting the corrected files to the Mihup server for feeding to the engine
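The requirement that corrected transcripts be written in Unicode can be partly checked by machine before submission. The sketch below is one possible approach, not a description of Mihup's or Avenir's tooling: it normalizes a transcript line to NFC and reports which scripts appear and which characters fall outside Bengali, Devanagari (used for Hindi), and basic Latin. The function names and the choice of NFC are assumptions made for this example.

```python
import unicodedata

# Unicode blocks for the scripts named in the text.
SCRIPT_RANGES = {
    "bengali": (0x0980, 0x09FF),
    "devanagari": (0x0900, 0x097F),  # script used for Hindi
}

# Danda, double danda, and zero-width (non-)joiner are shared across
# Indic scripts, so they are treated as script-neutral here.
COMMON_EXTRA = {0x0964, 0x0965, 0x200C, 0x200D}


def script_of(ch):
    """Classify one character as bengali, devanagari, latin, common, or other."""
    code = ord(ch)
    if code in COMMON_EXTRA:
        return "common"
    for script, (low, high) in SCRIPT_RANGES.items():
        if low <= code <= high:
            return script
    if ch.isascii():
        return "latin" if ch.isalpha() else "common"
    return "other"


def audit_transcript_line(text):
    """Return (normalized text, scripts found, unexpected characters)."""
    normalized = unicodedata.normalize("NFC", text)
    scripts = set()
    unexpected = []
    for ch in normalized:
        script = script_of(ch)
        if script == "other":
            unexpected.append(ch)  # e.g. legacy-font debris or stray symbols
        elif script != "common":
            scripts.add(script)
    return normalized, scripts, unexpected
```

A line that returns a non-empty list of unexpected characters would be sent back for manual correction; the set of scripts gives a quick view of how mixed a given line is.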
Let’s work together on your next data and content outsourcing requirement
Whether it is digitization of physical documents, data entry, indexing, metadata generation, master data management, transcription, or quality data generation and annotation for AI/ML (machine learning) applications, Avenir is ready to help.