Week 5: Section Activity
- Due No due date
- Points 10
- Questions 2
- Available after Feb 24 at 12am
- Time Limit None
- Allowed Attempts Unlimited
Instructions
Section Activity Part 1: Choosing Hyperparameters
Launch JupyterHub. Then download SmallGPT.ipynb from Canvas. Once the server is running, upload that file to JupyterHub. You are going to train your own GPT model on Shakespeare's work.
Your goal as a section is to try a bunch of different combinations of hyperparameters, and see which ones give the best result (lowest validation loss) after running for 45 minutes. Here are the hyperparameters you can change in the notebook, with suggested but not exhaustive possibilities. Your TF will write a table of choices up on the board to ensure that everyone has a different set of hyperparameters. You won't want to just set all parameters to their maximum values—that will make for a large model that will train very slowly. We are trying to find a "sweet spot" that achieves good performance for a fixed duration of training.
Once you have selected your hyperparameters, edit the 2nd cell of the Python notebook to reflect your choices.
- Batch Size: This determines how many training samples get processed together at each step.
You could try: 8, 16, 32, or 64.
- Block Size: This determines the context length (characters of "context" or "attention").
You could try: 32, 64, 96, or 128.
- n_embd: This sets the dimension of the embedding space.
You could try: 32, 64, or 96.
- n_head: This sets the number of parallel "attention heads" in the model.
You could try: 2, 4, or 8.
- n_layer: This sets the number of layers of "attention" in the model.
You could try: 2, 4, 6, or 8.
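As an illustration, after editing the 2nd cell it might look something like the following. The variable names mirror the hyperparameters listed above; the specific values are just one possible combination from the suggestions, not a recommendation:

```python
# One possible hyperparameter combination for the 2nd cell.
# These are example values, not a recommended "best" setting.
batch_size = 32   # training samples processed together at each step
block_size = 64   # context length in characters
n_embd = 64       # dimension of the embedding space
n_head = 4        # number of parallel attention heads
n_layer = 4       # number of attention layers

# n_embd must divide evenly by n_head, since each head works with
# n_embd // n_head of the embedding dimensions.
assert n_embd % n_head == 0
```

Note the constraint in the last line: when picking n_embd and n_head from the table, make sure n_embd is divisible by n_head.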
Once all the hyperparameters are set, run the cells in the notebook in order, starting from the top. Recall that you can run a cell either with Shift-Return or by clicking the "play" triangle button.
After you run the 3rd cell, which ends with "follow()," it will print the total number of parameters (in millions), train the model for one step, and print the output—which will look like random gibberish. This is a test to ensure that you haven't broken anything in the code. If something goes wrong, ask a classmate or TF for help. If necessary, you can close the notebook, delete it, and upload a new fresh copy of SmallGPT from your computer.
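If you want a feel for how your hyperparameter choices drive the printed parameter count, here is a rough back-of-the-envelope estimate, assuming a standard GPT-style transformer with a character-level vocabulary of about 65 symbols (an assumption; the notebook's printed count is authoritative and will differ somewhat):

```python
# Rough parameter-count estimate for a GPT-style model (an approximation,
# not the notebook's exact formula).
def approx_params(n_layer, n_embd, block_size, vocab_size=65):
    # Each transformer block: ~4*n_embd^2 (attention) + ~8*n_embd^2 (MLP)
    blocks = 12 * n_layer * n_embd ** 2
    # Token and position embedding tables
    embeddings = (vocab_size + block_size) * n_embd
    return blocks + embeddings

# e.g. n_layer=4, n_embd=64, block_size=64 gives ~0.2M parameters
print(f"{approx_params(4, 64, 64) / 1e6:.2f}M parameters")
```

The key takeaway is that the block term grows with the square of n_embd, which is why doubling the embedding size makes the model much slower to train than doubling n_layer.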
Now go back to the start of that 3rd cell, where you'll see:
# How long to train the model?
train_hours = 1e-10
Change that so it will run for 45 minutes, or 3/4 of an hour:
train_hours = 0.75
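Equivalently, you can let Python do the minutes-to-hours arithmetic; the cell just needs the final value expressed in hours:

```python
# 45 minutes expressed in hours; either form works in the cell.
train_hours = 45 / 60
print(train_hours)  # 0.75
```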
Then run that cell (the 3rd cell). It should start training. You'll see an updated training loss and validation loss every 100 steps of training. If something goes wrong, ask a classmate or TF for help.
While your model is training, you'll do the next part of the section activity. Once the training is done, record your final validation loss.
Section Activity Part 2: Paper Discussion
Your TF will ask you to share with the rest of the class:
- A brief description of your paper topic and what you found
- How you used Generative AI
- What you found to be useful or not useful
Plan on about 2-3 minutes to present this information to your peers. You'll just be presenting verbally, informally (no slides).
Section Activity Part 3: Predicting text
After 45 minutes, you should have a model that has finished training. In cell 4, you can type in a text prompt ("Hamlet" in the example), and the model will generate predicted text using your input as a stem. Do you get gibberish or something sensible?
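Under the hood, this kind of generation is autoregressive: starting from your stem, the model repeatedly samples one next character and appends it to the context. The toy sketch below illustrates the loop with a made-up, fixed character distribution; it is NOT the notebook's actual code, where a trained GPT conditions the distribution on the preceding context:

```python
import random

# Toy illustration of autoregressive generation (not the notebook's code):
# start from a stem, repeatedly sample the next character, and append it.
# A real GPT recomputes the next-character distribution from the context
# at every step; here we fake it with one fixed distribution.
def generate(stem, n_chars, seed=0):
    rng = random.Random(seed)
    chars = list("etaoin shrdlu")                       # a few common letters
    weights = [12, 9, 8, 8, 7, 7, 13, 6, 6, 6, 4, 4, 3]  # made-up frequencies
    out = stem
    for _ in range(n_chars):
        out += rng.choices(chars, weights=weights)[0]
    return out

print(generate("Hamlet", 20))
```

With a fixed distribution the output is always gibberish; the point of your 45 minutes of training is to make each step's distribution depend sensibly on the characters that came before.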