Optional: Large Dataset
If you’ve completed the main activities and have a specific idea that could benefit from more data, a larger dataset is available at:
/fast_scratch_1/TRISEP_data/AdvancedTutorial/large_dataset.parquet
This version contains 4 million events, but we don't recommend using it by default. For most learning goals in this tutorial, the smaller dataset is more than sufficient, and models train much faster on it. Training on the larger dataset can take roughly 30 minutes per epoch, so use it only if you have a specific hypothesis or idea that requires more data.
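If you do want to experiment with the larger file, one option is to start from a subset and only scale up once your idea looks promising. The sketch below is a minimal illustration, not part of the tutorial code. The helper names `estimate_epoch_minutes` and `load_first_events` are hypothetical. The time estimate assumes epoch time scales linearly with the number of events, which is only a rough guide because the real cost depends on your model and hardware. The loader assumes `pyarrow` is installed in your environment.

```python
from pathlib import Path

# Path and figures taken from the text above.
LARGE_DATASET = Path("/fast_scratch_1/TRISEP_data/AdvancedTutorial/large_dataset.parquet")
TOTAL_EVENTS = 4_000_000        # number of events in the large dataset
MINUTES_PER_FULL_EPOCH = 30.0   # rough epoch time on the full dataset


def estimate_epoch_minutes(n_events: int) -> float:
    """Rough epoch time for n_events.

    ASSUMPTION: time scales linearly with event count. Treat the result
    as an order-of-magnitude guide, not a measurement.
    """
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    n_events = min(n_events, TOTAL_EVENTS)
    return MINUTES_PER_FULL_EPOCH * n_events / TOTAL_EVENTS


def load_first_events(path, n_events: int):
    """Read the first n_events rows of a Parquet file as a pyarrow Table.

    The file is streamed in batches, so the whole dataset is never held
    in memory at once. Call .to_pandas() on the result for a DataFrame.
    """
    # Imported here so the estimate helper works without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    batches = []
    remaining = n_events
    for batch in parquet_file.iter_batches(batch_size=65_536):
        if remaining <= 0:
            break
        take = min(remaining, batch.num_rows)
        batches.append(batch.slice(0, take))
        remaining -= take
    return pa.Table.from_batches(batches, schema=parquet_file.schema_arrow)


if __name__ == "__main__":
    n = 500_000
    print(f"Estimated minutes per epoch for {n} events: {estimate_epoch_minutes(n):.1f}")
    # Uncomment on a machine where the file is available:
    # events = load_first_events(LARGE_DATASET, n).to_pandas()
```

Note that taking the first rows is only a fair sample if the file is not ordered by class or by any other property. If you are unsure, shuffle or sample randomly after loading.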