To get accurate predictions from an already trained model, follow these steps to create a compatible dataset with metrics:
1. Match the Training Dataset Structure
Your prediction dataset must exactly replicate the structure of the dataset used during model training. This includes:
- Metrics: Ensure all metrics are calculated using the same formulas as in the training dataset.
- Columns: Maintain the same column names, formats, and sequence.
- Data Scope: Include only data for the last available observation date (date_observation).
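The structure checks above can be sketched in a few lines of Python. This is a minimal illustration, not the product's own validation: the helper names check_columns and latest_rows, and every column name except date_observation, are hypothetical, and the date comparison assumes ISO-formatted dates (YYYY-MM-DD), which sort correctly as strings.

```python
def check_columns(training_header, prediction_header):
    """Return a list of problems; an empty list means the headers match."""
    problems = []
    missing = [c for c in training_header if c not in prediction_header]
    extra = [c for c in prediction_header if c not in training_header]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    # Same names but a different sequence is still a structural mismatch.
    if not missing and not extra and list(training_header) != list(prediction_header):
        problems.append("columns are in a different order")
    return problems


def latest_rows(rows, date_column="date_observation"):
    """Keep only the rows from the most recent observation date."""
    if not rows:
        return []
    latest = max(row[date_column] for row in rows)
    return [row for row in rows if row[date_column] == latest]


# Hypothetical example data; only date_observation comes from the text above.
training_header = ["user_id", "date_observation", "sessions_7d"]
prediction_rows = [
    {"user_id": "u1", "date_observation": "2024-05-01", "sessions_7d": "3"},
    {"user_id": "u1", "date_observation": "2024-05-02", "sessions_7d": "4"},
    {"user_id": "u2", "date_observation": "2024-05-02", "sessions_7d": "1"},
]

problems = check_columns(training_header, list(prediction_rows[0].keys()))
scoped = latest_rows(prediction_rows)
```

Running a check like this before uploading makes a renamed, missing, or reordered column visible early, and scoped contains only the rows for the last observation date.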
2. If the Model Was Trained Using a .csv File
- Create a .csv file with metrics that mirror the training dataset in every aspect.
- Double-check the consistency of column names, data types, and calculated values.
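One way to double-check data types is to infer a rough type for every column in both files and compare the results. The sketch below uses only the Python standard library; the function names and the sample columns (other than date_observation) are assumptions made for illustration, and the int/float/str classification is deliberately coarse, so treat any reported difference as a prompt for manual review rather than a definitive error.

```python
import csv
import io


def infer_type(value):
    """Classify a raw CSV cell as 'int', 'float', or 'str'."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except (ValueError, TypeError):
            pass
    return "str"


def column_types(csv_text):
    """Map each column name to the set of types seen in that column."""
    types = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            types.setdefault(column, set()).add(infer_type(value))
    return types


def type_mismatches(training_csv, prediction_csv):
    """Return columns whose inferred types differ between the two files."""
    train = column_types(training_csv)
    pred = column_types(prediction_csv)
    return {
        column: (sorted(train.get(column, set())), sorted(pred.get(column, set())))
        for column in sorted(set(train) | set(pred))
        if train.get(column) != pred.get(column)
    }


# Hypothetical file contents; in practice, read them from the two .csv files.
training_csv = "user_id,date_observation,sessions_7d\nu1,2024-05-01,3\n"
prediction_csv = "user_id,date_observation,sessions_7d\nu1,2024-05-02,3.5\n"

mismatches = type_mismatches(training_csv, prediction_csv)
```

Here mismatches flags sessions_7d, because the training file holds whole numbers while the prediction file holds a decimal value.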
3. If the Model Was Trained Using a Google BigQuery (BQ) Table
You have two options for creating the prediction dataset:
- Option 1: Extend the Original Table
- Add rows with new data (users and their metrics) to the table used during training.
- The system will automatically select rows with the latest date_observation for predictions.
- Option 2: Create a New Prediction Table
- Set up a new table in BigQuery that mirrors the structure of the training dataset:
- Use identical metrics, column names, and formats.
- Configure a regular update schedule for this table:
- Calculate metrics periodically based on all available data up to the analysis date.
- This ensures predictions are based on the most up-to-date information.
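The periodic metric calculation in Option 2 can be illustrated with a small Python sketch. It assumes a hypothetical input of (user_id, event_date) pairs and a single made-up metric, events_total; real metrics must use the same formulas as the training dataset. Loading the resulting rows into BigQuery and scheduling the job are left out.

```python
from collections import Counter
from datetime import date


def build_snapshot(events, analysis_date):
    """Build one row per user from all events up to and including analysis_date.

    events: iterable of (user_id, event_date) tuples (hypothetical input shape).
    """
    counts = Counter(
        user_id for user_id, event_date in events if event_date <= analysis_date
    )
    return [
        {
            "user_id": user_id,
            "date_observation": analysis_date.isoformat(),
            "events_total": total,  # made-up metric for illustration only
        }
        for user_id, total in sorted(counts.items())
    ]


# Hypothetical raw events; the last one falls after the analysis date.
events = [
    ("u1", date(2024, 5, 1)),
    ("u1", date(2024, 5, 3)),
    ("u2", date(2024, 5, 2)),
    ("u2", date(2024, 5, 9)),
]

snapshot = build_snapshot(events, date(2024, 5, 3))
```

Each scheduled run would call a function like this with the current analysis date, so every row carries that date as its date_observation and no later data leaks into the metrics.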
4. Switching from .csv to BigQuery (Optional)
If you initially trained the model using a .csv file but now want to use BigQuery for predictions: