
Hyperparameter Optimization for Iceberg Order Prediction

Abstract

This paper presents a systematic approach to hyperparameter optimization for machine learning models that predict iceberg order execution in quantitative trading. We develop a comprehensive optimization framework that respects the unique challenges of financial time series data, implementing custom trading-specific evaluation metrics and time-series cross-validation to prevent look-ahead bias. Our experiments compare four model types—XGBoost, LightGBM, Random Forest, and Logistic Regression—across extensive parameter spaces defined by domain knowledge. The results demonstrate that carefully optimized models achieve significant performance improvements, with the best Logistic Regression configuration reaching a score of 0.6899 using an elasticnet penalty and strong regularization. Most notably, we discover that shorter training windows (just two time periods) consistently outperform longer historical datasets across all model types, challenging the conventional assumption that more data leads to better predictive performance in financial markets. This finding suggests that recent market patterns hold greater predictive value than extended historical data, with important implications for trading system design: frequent retraining on recent data should be prioritized over accumulating larger historical datasets. We provide practical optimization strategies for trading applications and discuss future directions including adaptive optimization and multi-objective approaches to balance competing trading metrics.

Keywords: machine learning, quantitative trading, hyperparameter optimization, XGBoost, LightGBM

Introduction: Why Hyperparameter Optimization Matters in Trading

In quantitative trading, model performance can directly impact profit and loss. When predicting iceberg order execution, even small improvements in precision and recall translate to meaningful trading advantages. This paper examines our systematic approach to hyperparameter optimization for machine learning models that predict whether detected iceberg orders will be filled or canceled.

Optimization Framework Architecture

Figure: XGBoost complete workflow. The complete system architecture, showing data acquisition, preprocessing, model optimization, and trading integration.

Our hyperparameter optimization system consists of two key components:

  1. ModelEvaluator: Manages model training, evaluation, and performance tracking
  2. HyperparameterTuner: Conducts systematic search for optimal parameters

Model Evaluator Design

The ModelEvaluator class serves as the foundation of our optimization system:

import pandas as pd
import neptune

class ModelEvaluator:
    def __init__(self, models, model_names, random_state):
        model_keys = {
            "Dummy": "DUM",
            "Logistic Regression": "LR",
            "Random Forest": "RF",
            "XGBoost": "XG",
            "XGBoost RF": "XGRF",
            "LightGBM": "LGBM",
        }

        self.random_state = random_state
        self.models = models
        self.model_names = model_names
        self.models_metadata = {}  # Store model metadata

        # Initialize storage for feature importances; the Dummy baseline is skipped,
        # and model_names[1:] assumes "Dummy" is listed first
        self.feature_importances = {name: [] for name in model_names if name != 'Dummy'}
        self.mda_importances = {name: [] for name in model_names[1:]}
        self.shap_values = {name: [] for name in model_names[1:]}

        self.X_train_agg = {name: pd.DataFrame() for name in model_names}
        self.y_train_agg = {name: [] for name in model_names}
        self.X_test_agg = {name: pd.DataFrame() for name in model_names}
        self.y_test_agg = {name: [] for name in model_names}
        self.y_pred_agg = {name: [] for name in model_names}

        self.best_params = {name: {} for name in model_names}
        self.tuned_models = {name: None for name in model_names}
        self.partial_dependences = {name: [] for name in model_names}

        # initialize new neptune run
        self.run = neptune.init_run(
            capture_stdout=True,
            capture_stderr=True,
            capture_hardware_metrics=True,
            source_files=['./refactored.py'],
            mode='sync'
        )

The class provides several core capabilities:

  1. Dataset Management: Handles time-series data splitting and feature extraction
  2. Custom Evaluation Metrics: Implements trading-specific performance measures
  3. Model Persistence: Saves optimized models for production deployment
  4. Experiment Tracking: Records performance metrics and visualizations via Neptune

Hyperparameter Tuner Implementation

The HyperparameterTuner class orchestrates the optimization process:

class HyperparameterTuner:
    def __init__(self, model_evaluator, hyperparameter_set_pct_size):
        self.model_evaluator = model_evaluator
        self.run = model_evaluator.run
        self.hyperparameter_set_pct_size = hyperparameter_set_pct_size

        self.hyperopt_X_train_agg = {name: pd.DataFrame() for name in self.model_evaluator.model_names}
        self.hyperopt_y_train_agg = {name: [] for name in self.model_evaluator.model_names}
        self.hyperopt_X_test_agg = {name: pd.DataFrame() for name in self.model_evaluator.model_names}
        self.hyperopt_y_test_agg = {name: [] for name in self.model_evaluator.model_names}
        self.hyperopt_y_pred_agg = {name: [] for name in self.model_evaluator.model_names}

        # get unique dates only used for hyperopt
        self._get_hyperparameter_set_dates()
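
Putting the two pieces together, a minimal wiring sketch looks like the following; the model list, random seed, and hold-out fraction are illustrative, and instantiating ModelEvaluator requires Neptune credentials since it opens a run:

```python
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Illustrative wiring only: the model list, random_state, and 20% hyperopt
# hold-out are placeholders, not the production configuration.
models = [DummyClassifier(), LogisticRegression(), XGBClassifier()]
model_names = ["Dummy", "Logistic Regression", "XGBoost"]

evaluator = ModelEvaluator(models, model_names, random_state=42)  # starts a Neptune run
tuner = HyperparameterTuner(evaluator, hyperparameter_set_pct_size=0.2)
```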

The tuner performs several critical functions:

  1. Parameter Space Definition: Defines search spaces for each model type
  2. Objective Function: Evaluates parameter configurations using time-series cross-validation
  3. Optimization Coordination: Manages the Optuna study for each model
  4. Hyperparameter Logging: Records all trial information for analysis
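
A sketch of the coordination step (item 3 above): one Optuna study per model, maximizing the custom trading score. The method name tune_model and the trial count are illustrative; the trial indices reported later suggest roughly 50 trials per model:

```python
import optuna

# Sketch of a HyperparameterTuner method (name and trial count are illustrative)
def tune_model(self, model, model_name, n_trials=50):
    # Each model gets its own study that maximizes the custom trading score
    study = optuna.create_study(direction="maximize")
    study.optimize(lambda trial: self.objective(trial, model, model_name), n_trials=n_trials)

    # Keep the best configuration for later evaluation and log it to Neptune
    self.model_evaluator.best_params[model_name] = study.best_params
    self.run[f"model/{model_name}/hyperoptimized_best_params"] = study.best_params
    return study
```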

Time Series Cross-Validation Strategy

Financial data requires special handling to prevent look-ahead bias. Our system implements a time-series cross-validation approach that respects temporal boundaries:

def _create_time_series_splits(self, train_size, dates):
    splits = []
    n = len(dates)

    for i in range(n):
        if i + train_size < n:
            train_dates = dates[i:i + train_size]
            test_dates = [dates[i + train_size]]
            splits.append((train_dates, test_dates))

    return splits

This method:

  1. Creates rolling windows of specified length
  2. Trains on past data, tests on future data
  3. Prevents information leakage from future market states
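
For example, with five hypothetical trading dates and train_size=2, the logic above yields three splits, each testing on the date immediately after its training window:

```python
# Illustrative dates only; the production system uses actual trade dates
dates = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-08"]
train_size = 2

splits = []
for i in range(len(dates)):
    if i + train_size < len(dates):
        splits.append((dates[i:i + train_size], [dates[i + train_size]]))

for train_dates, test_dates in splits:
    print(train_dates, "->", test_dates)
# ['2024-01-02', '2024-01-03'] -> ['2024-01-04']
# ['2024-01-03', '2024-01-04'] -> ['2024-01-05']
# ['2024-01-04', '2024-01-05'] -> ['2024-01-08']
```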

Figure: XGBoost time-series CV. Visualization of time series cross-validation, showing rolling windows that respect temporal boundaries.

Hyperparameter Search Spaces

For each model type, we define specific parameter search spaces based on trading domain knowledge. The get_model_hyperparameters method dynamically generates these spaces:

def get_model_hyperparameters(self, trial, model_name):
    # Define hyperparameters for the given model
    if model_name == "XGBoost":
        return {
            'eval_metric': trial.suggest_categorical('eval_metric', 
                ['logloss', 'error@0.7', 'error@0.5']),
            'learning_rate': trial.suggest_float('learning_rate', 
                0.01, 0.05, step=0.01),
            'n_estimators': trial.suggest_categorical('n_estimators', 
                [100, 250, 500, 1000]),
            'max_depth': trial.suggest_int('max_depth', 3, 5, step=1),
            'min_child_weight': trial.suggest_int('min_child_weight', 5, 10, step=1),
            'gamma': trial.suggest_float('gamma', 0.1, 0.2, step=0.05),
            'subsample': trial.suggest_float('subsample', 0.8, 1.0, step=0.1),
            'colsample_bytree': trial.suggest_float('colsample_bytree', 0.8, 1.0, step=0.1),
            'reg_alpha': trial.suggest_float('reg_alpha', 0.1, 0.2, step=0.1),
            'reg_lambda': trial.suggest_int('reg_lambda', 1, 3, step=1)
        }

Key design considerations for these search spaces include:

  1. Trading Domain Knowledge: Ranges are informed by prior experience with market data
  2. Computational Efficiency: Parameter distributions focus on promising regions
  3. Regularization Focus: Special attention to parameters that prevent overfitting to market noise
  4. Training Configuration: Includes both model hyperparameters and training setup parameters (like train_size)
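
The remaining branches of get_model_hyperparameters follow the same pattern. As an illustration, a Logistic Regression search space consistent with the best configuration reported below might look like this; the exact ranges used in our runs may differ:

```python
    # Continuation of get_model_hyperparameters (sketch; ranges are illustrative)
    elif model_name == "Logistic Regression":
        return {
            'penalty': trial.suggest_categorical('penalty', ['l2', 'elasticnet']),
            'C': trial.suggest_float('C', 0.01, 1.0, log=True),
            'solver': trial.suggest_categorical('solver', ['saga']),
            'l1_ratio': trial.suggest_float('l1_ratio', 0.0, 1.0, step=0.25),
            'max_iter': trial.suggest_categorical('max_iter', [1000])
        }
```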

The Optimization Objective Function

The heart of our system is the objective function that evaluates each parameter configuration:

def objective(self, trial, model, model_name):
    model_params = self.get_model_hyperparameters(trial, model_name)
    model.set_params(**model_params)

    self.hyperopt_y_pred_agg[model_name] = []
    self.hyperopt_y_test_agg[model_name] = []

    train_size = trial.suggest_categorical('train_size', [2, 3, 4, 5, 6, 7, 8, 9, 10])

    for train_dates, test_dates in tqdm(self.model_evaluator.generate_splits([train_size], 
                                       self.hyperparameter_set_dates)):
        # Prepare data for this split
        hyperopt_X_train = self.hyperopt_X_dataset.query("tradeDate.isin(@train_dates)")
        hyperopt_y_train = self.hyperopt_y_dataset.to_frame().query(
            f"tradeDate.isin(@train_dates)").T.stack(-1).reset_index(
            level=0, drop=True, name='mdExec').rename('mdExec')
        
        hyperopt_X_test = self.hyperopt_X_dataset.query("tradeDate.isin(@test_dates)")
        hyperopt_y_test = self.hyperopt_y_dataset.to_frame().query(
            f"tradeDate.isin(@test_dates)").T.stack(-1).reset_index(
            level=0, drop=True, name='mdExec').rename('mdExec')

        # Train and validate the model
        model.fit(hyperopt_X_train, hyperopt_y_train)
        hyperopt_y_pred = model.predict(hyperopt_X_test)

        # Accumulate results
        self.hyperopt_y_test_agg[model_name] += hyperopt_y_test.tolist()
        self.hyperopt_y_pred_agg[model_name] += hyperopt_y_pred.tolist()

    # Calculate and return the score
    score = self.model_evaluator.max_precision_optimal_recall_score(
        self.hyperopt_y_test_agg[model_name], 
        self.hyperopt_y_pred_agg[model_name])
    return score

This function:

  1. Applies the parameter configuration to the model
  2. Conducts time-series cross-validation across multiple train/test splits
  3. Aggregates predictions and true values across all splits
  4. Calculates the custom trading-specific scoring metric
  5. Returns the score for Optuna to optimize
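
Step 4 relies on max_precision_optimal_recall_score, which is not reproduced here. The sketch below captures the intent behind such a metric, favoring precision (avoiding costly false positives) subject to a workable level of recall; the threshold and weighting are illustrative rather than the exact implementation:

```python
from sklearn.metrics import precision_score, recall_score

def max_precision_optimal_recall_score(y_true, y_pred, min_recall=0.3):
    """Illustrative sketch: favor precision once a minimum recall is reached."""
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred, zero_division=0)
    if recall < min_recall:
        # Scale down the score for configurations that sacrifice too much recall
        return precision * (recall / min_recall)
    return precision
```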

Optimization Results

Parameter Optimization Analysis

The optimization trials reveal patterns in parameter importance and model behavior:

Figure: XGBoost hyperparameter optimization history.
Figure: LightGBM parameter contours.

These visualizations reveal:

  1. Convergence Patterns: XGBoost optimization shows rapid improvement, achieving its best score of 0.6746 at trial 21
  2. Parameter Interactions: LightGBM performance depends on complex interactions between feature_fraction and min_data_in_leaf
  3. Trade-offs: Models with train_size=2 consistently outperform those with longer training windows

Best Parameters by Model

Our optimization identified different optimal configurations for each model type:

Optimized Model Parameters

| Model Type | Key Parameters | Trading Implications |
| --- | --- | --- |
| XGBoost | `eval_metric=error@0.5`, `learning_rate=0.03`, `n_estimators=250`, `max_depth=4`, `min_child_weight=8`, `gamma=0.2`, `subsample=1.0`, `colsample_bytree=0.8`, `reg_alpha=0.2`, `reg_lambda=2`, `train_size=2` | Higher precision with recent data focus; robust to market noise with moderate regularization |
| Random Forest | `n_estimators=500`, `max_depth=4`, `min_samples_split=7`, `min_samples_leaf=3`, `train_size=2` | Ensemble diversity with moderate tree complexity; recent data focus |
| LightGBM | `objective=regression`, `learning_rate=0.05`, `n_estimators=100`, `max_depth=4`, `num_leaves=31`, `min_sum_hessian_in_leaf=10`, `extra_trees=true`, `min_data_in_leaf=100`, `feature_fraction=1.0`, `bagging_fraction=0.8`, `bagging_freq=0`, `lambda_l1=2`, `lambda_l2=0`, `min_gain_to_split=0.1`, `train_size=2` | Fast training with leaf-wise growth; heavy regularization through min_data_in_leaf |
| Logistic Regression | `penalty=elasticnet`, `C=0.01`, `solver=saga`, `max_iter=1000`, `l1_ratio=0.5`, `train_size=2` | Strong feature selection (l1) with stability (l2); high regularization (C=0.01) |
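
For reference, the optimized Logistic Regression configuration from the table maps directly onto a scikit-learn estimator; train_size=2 is applied by the rolling-window training loop, not the estimator itself:

```python
from sklearn.linear_model import LogisticRegression

# Best Logistic Regression configuration from the table above;
# train_size=2 controls the rolling training window, not the estimator.
best_lr = LogisticRegression(
    penalty="elasticnet",
    C=0.01,
    solver="saga",
    l1_ratio=0.5,
    max_iter=1000,
)
```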

Performance Comparison

The optimization process improved all models significantly, with Logistic Regression achieving the highest score and XGBoost leading the tree-based models:

Optimized Model Performance

| Model | Best Score | Best Trial | Parameters | Duration (m:ss) | Train Size |
| --- | --- | --- | --- | --- | --- |
| XGBoost | 0.6746 | 21 | eval_metric=error@0.5, n_estimators=250 | 5:59.99 | 2 |
| Random Forest | 0.6648 | 46 | n_estimators=500, max_depth=4 | 2:48.91 | 2 |
| LightGBM | 0.6745 | 49 | objective=regression, n_estimators=100 | 0:34.48 | 2 |
| Logistic Regression | 0.6899 | 26 | penalty=elasticnet, C=0.01 | 1:15.74 | 2 |

Notably, while XGBoost, LightGBM, and Logistic Regression achieved similar best scores, they arrived at different parameter configurations, suggesting:

  1. Multiple local optima in the parameter space
  2. Different model strengths for different market patterns
  3. Potential for ensemble approaches combining complementary models
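
As an illustration of the third point, the three similarly scoring models could be combined into a soft-voting ensemble. The sketch below uses abbreviated parameter sets and is not a configuration we evaluated:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Illustrative soft-voting ensemble of the three similarly scoring models;
# parameter sets are abbreviated, not the full tuned configurations.
ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=250, max_depth=4, learning_rate=0.03)),
        ("lgbm", LGBMClassifier(n_estimators=100, max_depth=4, learning_rate=0.05)),
        ("lr", LogisticRegression(penalty="elasticnet", C=0.01, solver="saga",
                                  l1_ratio=0.5, max_iter=1000)),
    ],
    voting="soft",
)
```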

Parameter Importance Analysis

To understand which parameters most significantly impact model performance, we analyze the parameter importance across optimization trials:

Figure: XGBoost parameter importance.

These visualizations provide crucial insights for trading system design:

  1. Regularization Dominance: Parameters controlling model complexity (like min_child_weight and max_depth) have high impact across models, emphasizing the importance of preventing overfitting to market noise
  2. Evaluation Metric Sensitivity: The choice of evaluation metric (eval_metric) has significant impact on XGBoost performance, suggesting careful selection of trading-relevant metrics
  3. Training Window Impact: The consistent importance of train_size across models confirms that temporal window selection is a critical design choice for trading systems
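
The importance ranking can be reproduced directly from a completed Optuna study; a brief sketch, assuming `study` holds the finished study for a given model:

```python
import optuna

# fANOVA-based importance of each hyperparameter to the objective score
importances = optuna.importance.get_param_importances(study)
for param, value in importances.items():
    print(f"{param}: {value:.3f}")
```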

Parallel Coordinate Analysis

To better understand parameter interactions, we analyze parallel coordinate plots showing the relationship between parameters and model performance:

Figure: XGBoost parallel coordinate plot.

This visualization reveals:

  1. Parameter Clustering: High-performing configurations (scores >0.67) cluster in specific parameter regions
  2. Interaction Patterns: Certain parameter combinations consistently perform well, particularly when train_size=2
  3. Sensitivity Variations: Some parameters like learning_rate show wide variation in high-performing models, suggesting lower sensitivity

Hyperparameter Slice Analysis

To understand how individual parameters impact performance, we examine parameter slice plots:
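
The slice views, like the parallel coordinate and importance plots above, can be regenerated from the study object using Optuna's built-in visualization module (a sketch; `study` is the completed study for the model of interest):

```python
import optuna.visualization as vis

# Interactive Plotly figures for parameter interactions and per-parameter slices
fig_parallel = vis.plot_parallel_coordinate(study, params=["learning_rate", "max_depth", "train_size"])
fig_slice = vis.plot_slice(study)
fig_parallel.show()
fig_slice.show()
```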

Time Series Evaluation

After identifying optimal parameters, we evaluate model performance across time periods to assess temporal stability:

**Performance Over Trial Sequence**

```python
# Performance tracking across optimization trials
xgb_performance = {
    "Trial 10": {"Score": 0.6691, "Train Size": 2, "Parameters": "error@0.5, n_estimators=250"},
    "Trial 21": {"Score": 0.6746, "Train Size": 2, "Parameters": "error@0.5, n_estimators=250"},
    "Trial 27": {"Score": 0.6706, "Train Size": 2, "Parameters": "error@0.5, n_estimators=500"},
    "Trial 44": {"Score": 0.6715, "Train Size": 2, "Parameters": "logloss, n_estimators=1000"},
    "Trial 46": {"Score": 0.6691, "Train Size": 2, "Parameters": "error@0.5, n_estimators=250"}
}
```

Performance of top XGBoost trials showing consistent scores with train_size=2.

**LightGBM Performance Stability**

```python
# Performance of top LightGBM trials
lgbm_performance = {
    "Trial 9": {"Score": 0.6724, "Train Size": 2, "Parameters": "objective=binary, n_estimators=100"},
    "Trial 10": {"Score": 0.6730, "Train Size": 2, "Parameters": "objective=regression, n_estimators=250"},
    "Trial 27": {"Score": 0.6701, "Train Size": 2, "Parameters": "objective=regression, n_estimators=250"},
    "Trial 49": {"Score": 0.6745, "Train Size": 2, "Parameters": "objective=regression, n_estimators=100"}
}
```

LightGBM trial performance showing stability across different model configurations with train_size=2.

The time series evaluation demonstrates:

  1. Model Consistency: Top-performing models maintain consistent scores across different trials
  2. Parameter Robustness: Similar performance across different parameter configurations suggests robustness
  3. Training Window Stability: The consistent performance with train_size=2 confirms the advantage of recent data

Implementation for Production

To deploy optimized models in production trading systems, our framework provides several capabilities for versioning model parameters, configurations, and preprocessing.

Model Persistence and Versioning

def save_model_to_neptune(self):
    """Save model and metadata to Neptune for versioning and tracking"""
    # Log model parameters
    for model_name in self.model_names:
        if model_name == 'Dummy':
            continue
            
        # Get model index
        model_idx = self.model_names.index(model_name)
        model = self.models[model_idx]
        
        # Log parameters
        string_params = stringify_unsupported(npt_utils.get_estimator_params(model))
        if "missing" in string_params.keys():
            string_params.pop("missing")
            
        # Log to Neptune
        self.run[f"model/{model_name}/estimator/params"] = string_params
        self.run[f"model/{model_name}/estimator/class"] = str(model.__class__)
        
        # Log best parameters
        if model_name in self.best_params:
            self.run[f"model/{model_name}/hyperoptimized_best_params"] = self.best_params[model_name]

Feature Transformation Persistence

def save_feature_transformers(self):
    """Save feature transformation parameters for consistent preprocessing"""
    transformer_dir = f"models/transformers/{self.timestamp}"
    os.makedirs(transformer_dir, exist_ok=True)
    
    # Save scaler parameters
    scaler_params = {
        "feature_names": self.feature_names,
        "categorical_features": self.categorical_features,
        "numerical_features": self.numerical_features,
        "scaler_mean": self.scaler.mean_.tolist(),
        "scaler_scale": self.scaler.scale_.tolist()
    }
    
    with open(f"{transformer_dir}/transformer_params.json", 'w') as f:
        json.dump(scaler_params, f, indent=2)
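
At inference time these parameters can be used to rebuild an equivalent scaler. A minimal sketch, assuming the JSON layout written above:

```python
import json
import numpy as np
from sklearn.preprocessing import StandardScaler

def load_feature_transformer(transformer_dir):
    """Sketch: rebuild the scaler saved above (assumes the JSON layout shown)."""
    with open(f"{transformer_dir}/transformer_params.json") as f:
        params = json.load(f)

    scaler = StandardScaler()
    scaler.mean_ = np.array(params["scaler_mean"])
    scaler.scale_ = np.array(params["scaler_scale"])
    return params["feature_names"], scaler
```

The rebuilt scaler can then be applied to live numerical features so that production inputs match the training-time preprocessing.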

Neptune Integration for Tracking

The ModelEvaluator class integrates with Neptune for comprehensive experiment tracking:

# Initialize Neptune run
self.run = neptune.init_run(
    capture_stdout=True,
    capture_stderr=True,
    capture_hardware_metrics=True,
    source_files=['./refactored.py'],
    mode='sync'
)

# Log model parameters and metrics
self.run[f"model/{model_name}/hyperoptimized_best_params"] = study.best_params
self.run[f"metrics/{name}/ROC_AUC"] = roc_auc

This integration enables:

  1. Comprehensive version tracking
  2. Performance monitoring
  3. Parameter evolution analysis
  4. Model comparison

Optimization Strategies for Trading Systems

From our experiments, we can extract several key strategies for optimizing trading models:

  1. Favor Short Training Windows: All models performed best with train_size=2, indicating that recent market data is more valuable than longer history
  2. Focus on Regularization: Parameters controlling model complexity (min_data_in_leaf=100 in LightGBM, C=0.01 in Logistic Regression) are critical for robust performance
  3. Optimize for Trading Metrics: Custom metrics like error@0.5 in XGBoost consistently outperform standard ML metrics
  4. Parameter Boundaries Matter: Constrained search spaces based on domain knowledge (like learning_rate between 0.01-0.05) lead to better performance
  5. Monitor Across Trials: Performance stability across trials indicates model robustness

Conclusion and Future Directions

Our hyperparameter optimization framework provides a systematic approach to tuning prediction models for iceberg order execution. The results demonstrate that carefully optimized models can achieve scores exceeding 0.67 (Logistic Regression reaching 0.69), creating a significant advantage for trading strategies.

Future work will focus on:

  1. Adaptive Optimization: Automatically adjusting parameters as market conditions change
  2. Multi-objective Optimization: Balancing multiple trading metrics simultaneously
  3. Transfer Learning: Leveraging parameter knowledge across related financial instruments
  4. Ensemble Integration: Combining complementary models with different strengths
  5. Reinforcement Learning: Moving beyond supervised learning to directly optimize trading decisions

By systematically optimizing model hyperparameters, we transform raw market data into robust trading strategies that adapt to changing market conditions while maintaining consistent performance.

**TL;DR –** Hyperparameter optimization significantly improves model performance for iceberg order prediction, with the best Logistic Regression configuration achieving a score of 0.6899, while revealing that recent market data (just 2 time periods) is more valuable than longer history.
