OCR Glossary
OCR jargon got you stumped? Our handy glossary explains common terms. Then optimize your document processing easily with our smart OCR tool.
Accuracy
The measure of how correctly an OCR system recognizes characters, usually reported as the percentage of characters or words recognized correctly. Modern neural-network-based systems can exceed 97% accuracy on clean printed text.
Activation Function
Mathematical functions such as ReLU, sigmoid, and tanh that introduce non-linearity into neural networks.
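As a rough illustration, the three functions named above can be written in a few lines of plain Python. This is a scalar sketch for intuition only; real frameworks apply these element-wise to whole tensors.

```python
import math

def relu(x: float) -> float:
    """Pass positive inputs through unchanged; clamp negatives to zero."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Squash a real input into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    """Squash a real input into the range -1 to 1."""
    return math.tanh(x)

# Compare the three functions on a negative, zero, and positive input.
for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(sigmoid(x), 4), round(tanh(x), 4))
```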
Alignment
The process of matching extracted text with its corresponding visual position in the original document.
Anchor Boxes
Predefined bounding boxes of various shapes used in object detection models like YOLO and Faster R-CNN.
ANPR
Automatic Number Plate Recognition. Specialized OCR technology for reading vehicle license plates in traffic and security applications.
API
Application Programming Interface. A set of protocols enabling software integration. OCR APIs allow developers to integrate text extraction capabilities into applications.
ASCII
American Standard Code for Information Interchange. A 7-bit character encoding standard for representing text in computers, commonly used in OCR output.
Attention Mechanism
A neural network component allowing models to focus on the relevant parts of the input by weighting them with query, key, and value vectors.
Augmentation
Techniques to artificially expand training datasets through transformations like rotation, scaling, and noise addition.
Autoencoder
A neural network that learns efficient data representations through encoding and decoding, used for compression and feature extraction.
Backpropagation
Algorithm for training neural networks that propagates the error backward through the layers to compute gradients, which an optimizer then uses to update the weights.
Baseline Detection
Identifying the imaginary line on which characters sit, crucial for accurate text line segmentation.
Batch Processing
Automated processing of multiple documents without manual intervention. Modern systems can process thousands of pages per day.
Beam Search
Decoding algorithm that maintains multiple hypotheses during sequence generation for better accuracy.
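A minimal sketch of the idea follows. It assumes, for simplicity, that each time step's symbol probabilities are fixed in advance and do not depend on earlier choices; a real OCR decoder would re-score each hypothesis with the model. The function name and the toy probabilities are made up for illustration.

```python
import math

def beam_search(step_probs, beam_width=2):
    """Return the `beam_width` best (sequence, log_prob) pairs.

    step_probs: list of dicts, one per time step, mapping symbol -> probability.
    Toy assumption: probabilities at each step are independent of the prefix.
    """
    beams = [("", 0.0)]  # (sequence so far, cumulative log-probability)
    for probs in step_probs:
        # Extend every surviving hypothesis with every possible symbol.
        candidates = [
            (seq + sym, score + math.log(p))
            for seq, score in beams
            for sym, p in probs.items()
            if p > 0
        ]
        # Keep only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

steps = [{"c": 0.6, "e": 0.4}, {"a": 0.7, "o": 0.3}, {"t": 0.9, "l": 0.1}]
best = beam_search(steps, beam_width=2)
print(best)
```

With `beam_width=1` this reduces to greedy decoding; wider beams trade compute for a better chance of finding the most probable sequence.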
Benchmark Dataset
Standard datasets like MNIST, COCO-Text, ICDAR used for evaluating OCR model performance.
BERT
Bidirectional Encoder Representations from Transformers. Transformer-based model for natural language understanding, used in document AI for semantic comprehension.
BiLSTM
Bidirectional Long Short-Term Memory. RNN variant that processes sequences in both forward and backward directions, used in handwriting recognition.
Binarization
Converting images to black and white (binary) format to simplify OCR processing and improve character detection.
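The simplest form of binarization is a fixed global threshold, sketched below on a tiny grayscale image stored as nested lists. The threshold of 128 is an arbitrary assumption; production pipelines usually pick the threshold adaptively (for example with Otsu's method) rather than hard-coding it.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 0 (black) or 255 (white)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]

# Toy 2x3 "page": light background with a few dark ink pixels.
page = [
    [250, 240, 30],
    [245, 20, 25],
]
binary = binarize(page)
print(binary)
```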
Bounding Box
Rectangular regions enclosing detected text or objects, defined by (x, y, width, height) coordinates.
Byte Pair Encoding
Tokenization method that builds vocabulary by merging frequent character sequences, used in modern NLP.
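One merge step of the procedure might look like the sketch below. The toy corpus, its frequencies, and the helper names are invented for illustration; a real tokenizer repeats this step thousands of times and records the merges it makes.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words and return the most common."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy corpus: words as tuples of characters, with made-up frequencies.
corpus = {
    ("l", "o", "w"): 5,
    ("l", "o", "w", "e", "r"): 2,
    ("l", "o", "t"): 4,
    ("n", "e", "w"): 3,
}
pair = most_frequent_pair(corpus)
corpus = merge_pair(corpus, pair)
print(pair, corpus)
```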
CER
Character Error Rate. Metric measuring OCR errors: CER = (Substitutions + Deletions + Insertions) / Total Characters in the reference text. Lower is better.
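The numerator of the formula is the Levenshtein (edit) distance between the reference text and the OCR output. A small sketch, assuming plain Python strings and no text normalization:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate relative to the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)

# One substituted character ("o" read as "0") in a 7-character reference.
print(cer("invoice", "inv0ice"))
```

Because insertions count as errors, CER can exceed 1.0 when the OCR output is much longer than the reference.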
Character Segmentation
Process of separating individual characters from connected text for recognition.
CLIP
Contrastive Language-Image Pre-training. OpenAI model that learns visual-textual relationships through contrastive learning, used in DeepSeek-OCR's encoder.
CNN
Convolutional Neural Network. Deep learning architecture using convolutional layers to extract spatial features from images, fundamental to OCR.
Confidence Score
Probability value indicating how certain the OCR system is about a recognition result.
Context Optical Compression
DeepSeek's technique for compressing document text into a small number of visual tokens. DeepSeek reports about 97% decoding accuracy at compression ratios of up to roughly 10×.
Contrastive Learning
Training method using positive/negative pairs to learn discriminative representations.
CRNN
Convolutional Recurrent Neural Network. Architecture combining CNN feature extraction with RNN sequence modeling for text recognition.
Cross-Attention
Attention mechanism relating two different sequences, crucial for vision-language model integration.
CTC
Connectionist Temporal Classification. Loss function for training sequence models without requiring a pre-segmented alignment between input frames and output labels.
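The sketch below does not implement the loss itself. It shows the collapsing rule that CTC relies on to map per-frame predictions to a label sequence: merge repeated labels, then drop the blank symbol. The `-` blank character and the function name are assumptions made for this example.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol for this sketch

def ctc_greedy_decode(frame_labels):
    """Collapse runs of repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != BLANK)

# Twelve frames of per-frame best labels for the word "hello".
decoded = ctc_greedy_decode(list("hh-e-ll-lo--"))
print(decoded)
```

The blank between the two runs of `l` is what lets the model emit a genuine double letter instead of merging it into one.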
Data Augmentation
Techniques expanding training data through rotation, scaling, noise, perspective transformation.
Deep Learning
Machine learning using multi-layer neural networks to learn hierarchical representations.
DeepEncoder
DeepSeek-OCR's 380M parameter vision encoder combining SAM (local) and CLIP (global) with 16× compression.
Denoising
Removing noise and artifacts from scanned images to improve OCR accuracy.
Deskewing
Correcting rotated or tilted document images to horizontal alignment before OCR processing.
Detectron2
Object detection framework from Meta AI (formerly Facebook AI Research), used for document layout analysis and table detection.
Document AI
AI technologies for understanding, extracting, and processing information from documents.
Document Layout Analysis
Identifying and categorizing document regions (text, tables, images) with geometric and logical analysis.
DPI
Dots per inch. Image resolution metric. A scan resolution of 300 DPI or higher is commonly recommended for good OCR accuracy.
Dropout
Regularization technique randomly dropping neurons during training to prevent overfitting.
Embedding
Dense vector representations of tokens capturing semantic meaning in continuous space.
Encoder-Decoder
Architecture where encoder processes input and decoder generates output, used in seq2seq tasks.
End-to-End Learning
Training models directly from raw input to final output without manual feature engineering.
F1 Score
Harmonic mean of precision and recall; a single metric that balances both when evaluating classification performance.
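For a concrete feel, here is a small sketch that computes precision, recall, and F1 from raw counts. The counts are invented for the example, and returning 0.0 for undefined cases is a convention chosen here, not a universal rule.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 text regions found correctly, 2 false alarms, 4 regions missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))
```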
Feature Extraction
Process of identifying and extracting relevant patterns from input data for model processing.
Feature Map
Output of convolutional layers showing detected features at different abstraction levels.
Fine-tuning
Adapting pre-trained models to specific tasks by training on domain-specific data.
Font Recognition
Identifying font types and styles in document images to improve OCR accuracy.
Form Recognition
Specialized OCR for structured documents like invoices, receipts, and forms with key-value extraction.
Fusion Model
Architecture merging representations from multiple modalities (vision + text) for unified understanding.
hOCR
File format embedding OCR output in HTML with layout info, bounding boxes, and confidence scores.
HTR
Handwritten Text Recognition. Machine learning technology for recognizing handwritten text, a harder problem than OCR of printed text.
Hyperparameter
Configuration settings (learning rate, batch size, layers) set before model training.
ICR
Intelligent Character Recognition. Advanced OCR using ML to recognize both typed and handwritten text across various fonts and writing styles.
Image Preprocessing
Techniques like binarization, deskewing, denoising applied before OCR to improve quality.
Inference
Using trained models to make predictions on new, unseen data.
IoU
Intersection over Union. Metric measuring bounding box overlap, calculated as intersection area / union area.
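A short sketch of the calculation, assuming boxes in the (x, y, width, height) format used elsewhere in this glossary, with (x, y) as the top-left corner:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Two 10x10 boxes offset by 5 pixels in each direction.
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))
```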
IWR
Intelligent Word Recognition. AI technology that recognizes whole words by matching OCR/ICR character outputs against user-defined dictionaries.
Language Model
Statistical model predicting word sequences, used in OCR for spell-checking and context understanding.
LayoutLM
Microsoft's transformer model combining text, layout, and image information for document understanding.
Learning Rate
Hyperparameter controlling step size in gradient descent optimization.
Line Segmentation
Separating document pages into individual text lines for sequential processing.
LoRA
Low-Rank Adaptation. PEFT method that fine-tunes a model by training small low-rank adapter matrices instead of all weights.
Loss Function
Mathematical function measuring the difference between predictions and ground truth; minimizing it guides training.
LSTM
Long Short-Term Memory. RNN variant with gates to handle long-range dependencies, used in sequence-to-sequence OCR tasks.
Named Entity Recognition
NLP task identifying and classifying entities (names, dates, locations) in extracted text.
Neural Architecture Search
Automated method for discovering optimal neural network architectures.
Normalization
Scaling input data to standard range, improving model training stability and convergence.
OCR
Optical Character Recognition. Technology converting images of text into machine-readable digital text using computer vision and ML.
OLR
Optical Layout Recognition. Document layout analysis that segments text zones from non-text zones before OCR processing.
OMR
Optical Mark Recognition. Technology that detects marks made on paper documents, such as ticked checkboxes and filled-in bubbles on forms and answer sheets.
Optimization
Process of adjusting model parameters to minimize loss and improve performance.
Overfitting
A model memorizing its training data instead of learning generalizable patterns, which leads to poor performance on new data.
PDF Parsing
Extracting text, layout, and structure from PDF files, combining text extraction with OCR for scanned PDFs.
PEFT
Parameter-Efficient Fine-Tuning. Techniques like LoRA that enable model adaptation with minimal trainable parameters and memory.
Positional Encoding
Adding position information to token embeddings in transformers to maintain sequence order.
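One common scheme is the fixed sinusoidal encoding from the original Transformer paper, sketched below for a single position. This is only one option; many models learn their position embeddings instead. The function name is an assumption for this example.

```python
import math

def sinusoidal_encoding(position: int, d_model: int):
    """Return the d_model-dimensional sinusoidal encoding for one position."""
    vec = []
    for i in range(d_model):
        # Each pair of dimensions (2k, 2k+1) shares one wavelength.
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

# Encodings for the first three positions of a 4-dimensional embedding.
for pos in range(3):
    print(pos, [round(v, 4) for v in sinusoidal_encoding(pos, 4)])
```

The resulting vector is added element-wise to the token embedding at that position.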
Precision
Ratio of true positives to all predicted positives, measuring prediction correctness.
Preprocessing
Preparing raw images through binarization, deskewing, noise removal before OCR.
QLoRA
Quantized LoRA. Memory-efficient fine-tuning that loads the pretrained model as frozen 4-bit quantized weights and trains LoRA adapters on top.
Quantization
Reducing the numeric precision of model weights (for example, from 32-bit floats to 8-bit or 4-bit values) to decrease memory use and increase inference speed.
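To make the trade-off concrete, the sketch below applies symmetric 8-bit quantization to a short list of weights and then restores them. It assumes one scale factor for the whole list; real toolkits typically use per-channel or per-block scales and more careful rounding.

```python
def quantize_int8(weights):
    """Symmetric quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, restored)
```

Each restored value differs from the original by at most half a quantization step, which is the precision given up in exchange for storing one byte per weight.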
Query-Key-Value
Three components of attention mechanism: Query (what to find), Key (compare against), Value (retrieve).
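The three components come together in scaled dot-product attention. The sketch below uses plain Python lists for clarity and leaves out the learned projection matrices, masking, and multiple heads that real models add on top.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Compare the query against every key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Retrieve a weighted mix of the values.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
# A query aligned with the first key pulls the output toward the first value.
print(attention([[1.0, 0.0]], keys, values))
```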
SAM
Segment Anything Model. Meta's vision model for image segmentation, used in DeepSeek-OCR's encoder for local detail capture.
Scene Text Recognition
OCR for text in natural images (street signs, products), more challenging than document OCR.
Self-Attention
Attention mechanism where each token attends to every token in the same sequence, including itself.
Semantic Segmentation
Pixel-level classification assigning each pixel to a category, used in layout analysis.
Table Detection
Identifying and extracting structured table data from documents with cell segmentation.
Tesseract
Open-source OCR engine supporting 100+ languages, widely used as a baseline system.
Text-Line Extraction
Segmenting document pages into individual text lines for processing.
Token
Basic unit of text (word, subword, character) processed by NLP models.
Transfer Learning
Using knowledge from pretrained models to improve performance on new tasks with less data.
Transformer
Architecture using self-attention mechanisms, foundation of modern NLP and vision models.
