A fundamental evaluation metric measuring the proportion of correct predictions made by a machine learning model out of all predictions, providing a basic measure of model performance.
Accuracy
Accuracy is one of the most fundamental and widely used evaluation metrics in machine learning, measuring the proportion of correct predictions a model makes out of the total number of predictions. It provides a straightforward assessment of overall model performance across all classes or outcomes.
Mathematical Definition
Basic Formula Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)
In Terms of Confusion Matrix Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
Percentage Form Accuracy is often expressed as a percentage: Accuracy × 100%
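A minimal sketch of the formula in Python, assuming NumPy and scikit-learn are available; the example labels are purely illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels for a binary classifier
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy = correct predictions / total predictions
accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.2f}")            # 0.75
print(f"Percentage: {accuracy * 100:.1f}%")   # 75.0%

# Equivalent scikit-learn helper
print(accuracy_score(y_true, y_pred))         # 0.75
```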
Types of Accuracy
Overall Accuracy Standard accuracy across all classes:
- Single metric for entire dataset
- Equal weighting of all predictions
- Most common accuracy measurement
- Suitable for balanced datasets
Balanced Accuracy Average of per-class accuracies (sketched below):
- (Sensitivity + Specificity) / 2 for binary classification
- Average recall across all classes for multi-class
- Better for imbalanced datasets
- Prevents bias toward majority classes
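A minimal sketch of balanced accuracy on an imbalanced binary problem, assuming scikit-learn; `balanced_accuracy_score` is the macro-average of per-class recall, which for two classes equals (sensitivity + specificity) / 2.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hypothetical imbalanced labels: 8 negatives, 2 positives
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])  # misses one of the two positives

print(accuracy_score(y_true, y_pred))                 # 0.90, looks strong
print(balanced_accuracy_score(y_true, y_pred))        # 0.75, exposes the weak minority class
print(recall_score(y_true, y_pred, average="macro"))  # 0.75, same as balanced accuracy
```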
Top-K Accuracy Considering multiple predictions (example below):
- Correct if true label is in top-k predictions
- Common in image classification (top-5 accuracy)
- Useful for models with uncertainty
- More lenient evaluation metric
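A small sketch of top-k accuracy computed directly from predicted class probabilities with NumPy; the probabilities, labels, and helper name `top_k_accuracy` are illustrative assumptions (scikit-learn also provides `top_k_accuracy_score`).

```python
import numpy as np

def top_k_accuracy(y_true, y_prob, k=2):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(y_prob, axis=1)[:, -k:]  # indices of the k largest scores per row
    hits = [y_true[i] in top_k[i] for i in range(len(y_true))]
    return np.mean(hits)

# Hypothetical probabilities over 3 classes for 4 samples
y_prob = np.array([[0.6, 0.3, 0.1],
                   [0.1, 0.5, 0.4],
                   [0.2, 0.2, 0.6],
                   [0.4, 0.4, 0.2]])
y_true = np.array([1, 2, 0, 1])

print(top_k_accuracy(y_true, y_prob, k=1))  # stricter: only exact top-1 matches count
print(top_k_accuracy(y_true, y_prob, k=2))  # more lenient: a top-2 match counts
```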
Applications by Domain
Classification Tasks
- Image classification and recognition
- Text categorization and sentiment analysis
- Medical diagnosis and screening
- Fraud detection and security systems
Multi-Class Problems
- Object detection and identification
- Language identification
- Product categorization
- Customer segmentation
Limitations and Considerations
Class Imbalance Problems When dataset classes are imbalanced (illustrated below):
- High accuracy can be misleading
- Model may simply predict majority class
- Need complementary metrics (precision, recall)
- Consider balanced accuracy alternatives
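A short sketch of why accuracy misleads under class imbalance: a baseline that always predicts the majority class, on a hypothetical 95/5 split, scores highly while catching no positives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical dataset: 95% negatives, 5% positives (e.g., fraud cases)
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that simply predicts the majority class every time
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.95, looks excellent
print(recall_score(y_true, y_pred))    # 0.0, not a single positive detected
```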
Cost-Sensitive Scenarios When different errors have different costs:
- Medical diagnosis: false negatives costly
- Spam detection: false positives annoying
- Security systems: different risk levels
- Accuracy alone insufficient for evaluation
Threshold Sensitivity For probabilistic classifiers (see the sketch after this list):
- Accuracy depends on decision threshold
- Different thresholds yield different accuracy
- May need threshold optimization
- Consider ROC curves and AUC metrics
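A minimal sketch of threshold sensitivity: the same probabilistic predictions yield different accuracies as the decision threshold moves. The scores and labels are illustrative assumptions.

```python
import numpy as np

# Hypothetical predicted positive-class probabilities and true binary labels
y_prob = np.array([0.15, 0.35, 0.45, 0.55, 0.65, 0.80, 0.30, 0.70])
y_true = np.array([0, 0, 1, 0, 1, 1, 1, 1])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    accuracy = np.mean(y_pred == y_true)
    print(f"threshold={threshold:.1f}  accuracy={accuracy:.3f}")
```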
Improving Model Accuracy
Data Quality Enhancement
- Clean and preprocess data thoroughly
- Handle missing values appropriately
- Remove or correct mislabeled examples
- Ensure representative training data
Feature Engineering
- Select relevant and informative features
- Create new features from existing ones
- Remove redundant or noisy features
- Apply appropriate scaling and normalization
Model Selection and Tuning
- Choose appropriate algorithms for the problem
- Optimize hyperparameters systematically
- Use cross-validation for robust evaluation (sketched after this list)
- Consider ensemble methods for improvement
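A brief sketch of a cross-validated accuracy estimate with scikit-learn; the synthetic dataset from `make_classification` and the logistic-regression model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation gives a more robust accuracy estimate than a single split
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```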
Training Strategies
- Implement proper regularization techniques
- Use adequate training data size
- Apply data augmentation when appropriate
- Monitor for overfitting and underfitting
Accuracy in Different Contexts
Training Accuracy Performance on training data:
- Indicates model learning capacity
- Should improve during training
- High training accuracy may indicate overfitting
- Compare with validation accuracy
Validation Accuracy Performance on held-out validation data:
- Guides model selection and hyperparameter tuning
- Prevents overfitting during development
- Should track closely with training accuracy
- Used for early stopping criteria
Test Accuracy Final performance evaluation (example after this list):
- Unbiased estimate of generalization performance
- Should only be computed once after model finalization
- Represents real-world performance expectation
- Critical for model deployment decisions
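A compact sketch of reporting training, validation, and test accuracy from a two-stage split; the dataset, model, and split sizes are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out 20% as the final test set, then carve a validation set out of the remainder
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Train accuracy:     ", accuracy_score(y_train, model.predict(X_train)))
print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
print("Test accuracy:      ", accuracy_score(y_test, model.predict(X_test)))  # compute once, at the end
```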
Complementary Metrics
Precision and Recall For more detailed performance analysis (sketched below):
- Precision: Quality of positive predictions
- Recall: Coverage of actual positive cases
- F1-Score: Harmonic mean of precision and recall
- Important for imbalanced datasets
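A minimal sketch comparing accuracy with precision, recall, and F1 on the same predictions, assuming scikit-learn; the labels are illustrative.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical binary labels and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # overall correctness
print("Precision:", precision_score(y_true, y_pred))  # quality of positive predictions
print("Recall:   ", recall_score(y_true, y_pred))     # coverage of actual positives
print("F1-score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```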
Confusion Matrix Detailed breakdown of predictions (example below):
- Shows exact prediction patterns
- Reveals class-specific performance
- Helps identify systematic errors
- Enables targeted model improvements
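A short sketch of the confusion matrix behind an accuracy figure, assuming scikit-learn and reusing the illustrative labels above.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# Accuracy = (TN + TP) / sum of all four cells
```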
ROC and AUC For probabilistic classifiers (see the sketch after this list):
- ROC curve shows threshold trade-offs
- AUC summarizes overall classification ability
- Threshold-independent evaluation
- Useful for binary classification problems
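A minimal sketch of threshold-independent evaluation with ROC AUC, assuming scikit-learn; the labels and probability scores are illustrative.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted positive-class probabilities
y_true = [0, 0, 1, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90, 0.55, 0.20]

# AUC summarizes ranking quality across all possible thresholds
print("ROC AUC:", roc_auc_score(y_true, y_prob))

# The ROC curve itself exposes the trade-off at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print("FPR:", fpr)
print("TPR:", tpr)
print("Thresholds:", thresholds)
```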
Reporting Best Practices
Statistical Significance
- Report confidence intervals (bootstrap sketch after this list)
- Use cross-validation for robust estimates
- Test statistical significance of improvements
- Consider sample size effects
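A small sketch of a bootstrap confidence interval for accuracy, assuming NumPy; the simulated labels and the number of resamples are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical evaluation set: true labels and predictions that are correct ~85% of the time
y_true = rng.integers(0, 2, size=200)
y_pred = np.where(rng.random(200) < 0.85, y_true, 1 - y_true)
correct = (y_true == y_pred).astype(float)

# Bootstrap: resample the per-sample correctness indicators and recompute accuracy
boot_acc = [rng.choice(correct, size=correct.size, replace=True).mean()
            for _ in range(2000)]

lower, upper = np.percentile(boot_acc, [2.5, 97.5])
print(f"Accuracy: {correct.mean():.3f}")
print(f"95% bootstrap CI: [{lower:.3f}, {upper:.3f}]")
```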
Context and Baselines
- Compare against relevant baselines
- Provide domain-specific context
- Report accuracy alongside other metrics
- Explain practical significance of results
Error Analysis
- Analyze misclassified examples
- Identify systematic error patterns
- Report per-class accuracy when relevant
- Discuss failure modes and limitations
Understanding both the usefulness and the limitations of accuracy is crucial for effective machine learning model evaluation and for sound deployment decisions.