OpenTrain AI

Document Data Extraction (Law/Finance/Accounting)

OpenTrain AI · Remote · Worldwide · Posted May 15, 2026

Hourly · $8–$11.2/hr

In this document extraction and annotation project, you'll work with PDF and DOCX files across a variety of document types (legal contracts, patents, financial records, compliance documents, and more). Your role is to read each document carefully and extract structured information into an annotation panel according to a defined JSON schema.

Each task comes with three AI-generated draft outputs as starting points. Your job is to review all three, select the best one, validate every field against the source document, correct any errors or omissions, and ensure the final output is accurate and complete.

1. Reading source PDF/DOCX documents thoroughly before extracting
2. Understanding and following JSON schemas (required/optional fields, data types, arrays, objects, nested structures)
3. Selecting the best AI-generated draft output and editing it — not starting from scratch
4. Validating all fields against the source document with high accuracy
5. Capturing all relevant repeated/dynamic entries (e.g., line items, multiple clauses)
6. Setting truly missing required fields to null per schema instructions
7. Writing original summaries in your own words where the schema calls for it
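
To make the schema-related points above more concrete, here is a minimal Python sketch of what "setting truly missing required fields to null" and checking data types could look like. The schema, field names, and helper functions are all hypothetical; the project's actual schema and annotation tooling are not described in this posting, and in practice the checks are done by a person reading the source document.

```python
import json

# Hypothetical schema fragment, invented for illustration only.
SCHEMA = {
    "type": "object",
    "required": ["contract_title", "effective_date", "line_items"],
    "properties": {
        "contract_title": {"type": "string"},
        "effective_date": {"type": "string"},
        "governing_law": {"type": "string"},   # optional field
        "line_items": {"type": "array"},       # repeated entries
    },
}

# Mapping from a few JSON Schema type names to Python types.
JSON_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def fill_missing_required(draft, schema):
    """Return a copy of the draft with absent required fields set to None (JSON null)."""
    result = dict(draft)
    for field in schema.get("required", []):
        result.setdefault(field, None)
    return result


def type_errors(record, schema):
    """List fields whose non-null value does not match the declared type."""
    errors = []
    for field, spec in schema.get("properties", {}).items():
        value = record.get(field)
        if value is None:
            continue  # null or absent values are not type-checked here
        expected = JSON_TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(field)
    return errors


# A hypothetical AI-generated draft with one missing field and one wrong type.
draft = {"contract_title": "Master Services Agreement", "line_items": "see appendix"}
final = fill_missing_required(draft, SCHEMA)

print(json.dumps(final, indent=2))   # effective_date appears as null
print(type_errors(final, SCHEMA))    # line_items should be an array, not a string
```

In this sketch the optional `governing_law` field is simply left out when it is absent, while the required `effective_date` is written as null. A real task would still require confirming against the source document that the value is truly missing before leaving it null.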