How to Prepare a Dataset for Prediction

To get accurate predictions from an already trained model, follow these steps to create a compatible dataset of metrics:

Your prediction dataset must exactly replicate the structure of the dataset used during model training. This includes:

- Metrics: Ensure all metrics are calculated using the same formulas as in the training dataset.
- Columns: Maintain the same column names, formats, and order.
- Data scope: Include only data for the last available observation date (date_observation).
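
To illustrate the data-scope rule, here is a minimal Python sketch that keeps only the rows for the latest date_observation. It assumes the data is already loaded as a list of dictionaries and that dates are stored as ISO 8601 strings (YYYY-MM-DD); the function name keep_latest_observation and the columns user_id and sessions_30d are hypothetical.

```python
import csv
import io
from datetime import date


def keep_latest_observation(rows, date_column="date_observation"):
    """Return only the rows whose observation date equals the latest date present.

    Assumes dates are ISO 8601 strings (YYYY-MM-DD).
    """
    if not rows:
        return []
    dates = [date.fromisoformat(row[date_column]) for row in rows]
    latest = max(dates)
    return [row for row, d in zip(rows, dates) if d == latest]


# Hypothetical example data: user_id and sessions_30d are illustrative column names.
sample_csv = """user_id,date_observation,sessions_30d
u1,2024-05-01,12
u1,2024-06-01,9
u2,2024-06-01,4
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
latest_rows = keep_latest_observation(rows)
print([row["user_id"] for row in latest_rows])  # ['u1', 'u2']
```

Parsing the dates instead of comparing raw strings keeps the filter correct even if a date is written in a slightly different but still valid ISO form.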

Create a .csv file whose metrics mirror the training dataset in every respect.
Double-check the consistency of column names, data types, and calculated values.
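
The consistency check can be partly automated. Below is a minimal sketch, using only Python's standard csv module, that compares a prediction file against the training file for missing or extra columns, column order, and a coarse per-column type ("numeric" vs "text"). It does not verify that the metric values were calculated with the same formulas; that still needs a manual review. The function names and sample columns are hypothetical.

```python
import csv
import io


def _is_number(value):
    """True if the string parses as a float (e.g. '12', '0.5')."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def read_header_and_kinds(csv_text):
    """Return the header and a {column: 'numeric' | 'text'} map inferred from the values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    header = list(reader.fieldnames or [])
    kinds = {}
    for column in header:
        # Skip empty or missing values when inferring the column kind.
        values = [row[column] for row in rows if row[column]]
        kinds[column] = "numeric" if values and all(_is_number(v) for v in values) else "text"
    return header, kinds


def check_schema(training_csv, prediction_csv):
    """List mismatches between the training and prediction files; an empty list means they match."""
    train_header, train_kinds = read_header_and_kinds(training_csv)
    pred_header, pred_kinds = read_header_and_kinds(prediction_csv)
    problems = []
    missing = [c for c in train_header if c not in pred_header]
    extra = [c for c in pred_header if c not in train_header]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra and train_header != pred_header:
        problems.append("column order differs from training")
    for column in train_header:
        if column in pred_kinds and pred_kinds[column] != train_kinds[column]:
            problems.append(f"{column}: expected {train_kinds[column]}, found {pred_kinds[column]}")
    return problems


# Hypothetical sample files.
training = "user_id,date_observation,sessions_30d\nu1,2024-05-01,12\n"
reordered = "user_id,sessions_30d,date_observation\nu2,4,2024-06-01\n"
bad_values = "user_id,date_observation,sessions_30d\nu2,2024-06-01,n/a\n"

print(check_schema(training, reordered))   # ['column order differs from training']
print(check_schema(training, bad_values))  # ['sessions_30d: expected numeric, found text']
```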

You have two options for creating the prediction dataset:

Option 1: Extend the Original Table
- Add rows with new data (users and their metrics) to the table used during training.
- The system will automatically select rows with the latest date_observation for predictions.
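
If the training table is kept as a CSV file, Option 1 might look like the minimal sketch below: new rows are appended in the table's existing column order, and rows with missing or extra columns are rejected. This is an illustration only; append_rows and the sample columns are hypothetical, and selecting the latest date_observation is left to the system as described above.

```python
import csv
import io


def append_rows(existing_csv, new_rows):
    """Append dict rows to CSV text, keeping the existing column order.

    Raises ValueError if a new row has missing or extra columns.
    """
    header = next(csv.reader(io.StringIO(existing_csv)))
    for row in new_rows:
        if set(row) != set(header):
            raise ValueError(f"row columns {sorted(row)} do not match table columns {sorted(header)}")
    out = io.StringIO()
    out.write(existing_csv if existing_csv.endswith("\n") else existing_csv + "\n")
    # DictWriter writes each row's values in the header's order, whatever the dict key order is.
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writerows(new_rows)
    return out.getvalue()


table = "user_id,date_observation,sessions_30d\nu1,2024-05-01,12\n"
updated = append_rows(table, [{"sessions_30d": "9", "user_id": "u1", "date_observation": "2024-06-01"}])
print(updated)
```

The new row is written as `u1,2024-06-01,9`, matching the original column order even though the dictionary listed its keys differently.
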
Option 2: Create a New Prediction Table
- Set up a new table in BigQuery that mirrors the structure of the training dataset:
  - Use identical metrics, column names, and formats.
- Configure a regular update schedule for this table:
  - Calculate metrics periodically based on all available data up to the analysis date.
  - This ensures predictions are based on the most up-to-date information.
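
The key point of a scheduled refresh is that each snapshot uses only data available up to its analysis date, and that the same metric logic is reused for training and prediction snapshots. The minimal Python sketch below shows this idea on an in-memory event log. The (user_id, event_date, amount) log format and the two metrics, total_amount and days_since_last_event, are hypothetical examples, not a required schema; in BigQuery the equivalent logic would live in the scheduled query.

```python
from collections import defaultdict
from datetime import date


def build_snapshot(events, analysis_date):
    """Compute per-user metrics from events on or before analysis_date.

    events: iterable of (user_id, event_date, amount) tuples (hypothetical format).
    """
    totals = defaultdict(float)
    last_seen = {}
    for user_id, event_date, amount in events:
        if event_date > analysis_date:
            continue  # ignore data from after the analysis date
        totals[user_id] += amount
        if user_id not in last_seen or event_date > last_seen[user_id]:
            last_seen[user_id] = event_date
    return [
        {
            "user_id": user_id,
            "date_observation": analysis_date.isoformat(),
            "total_amount": totals[user_id],
            "days_since_last_event": (analysis_date - last_seen[user_id]).days,
        }
        for user_id in sorted(last_seen)
    ]


events = [
    ("u1", date(2024, 5, 20), 10.0),
    ("u1", date(2024, 6, 3), 5.0),
    ("u2", date(2024, 5, 28), 7.5),
]

snapshot = build_snapshot(events, date(2024, 6, 1))
# u1: total_amount 10.0, days_since_last_event 12 (the 2024-06-03 event is excluded)
# u2: total_amount 7.5, days_since_last_event 4
```

Running the same function with a later analysis date produces the next snapshot, so every refresh is calculated with identical formulas.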

If you initially trained the model using a .csv file but now want to use BigQuery for predictions: