Ask HN: What's something interesting you learned from training your own GPT?

Posted by amadeuswoo 1 day ago

Not using APIs, but actually training a model from scratch, even a small one.

What surprised you about the data, the training process, or the output?

Comments

Comment by linolevan 1 day ago

For tiny models, the SFT data mixture is unbelievably critical to usability. They are barely able to generalize at all. If you don't have multi-turn conversations in the mixture, they will not be able to hold multi-turn conversations. If your multi-turn conversations are just chatting, and your math examples are all single-turn, the model will be unable to do math in a multi-turn setting. This is much less true for bigger models.
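
To make that concrete, here is a minimal sketch of what such a mixture might look like, assuming an OpenAI-style messages format. The example conversations, the math_fraction value, and build_sft_mixture are illustrative placeholders, not the commenter's actual data or code.

    import random

    # Illustrative multi-turn chat examples (placeholders).
    chat_multiturn = [
        [
            {"role": "user", "content": "Hey, how was your weekend?"},
            {"role": "assistant", "content": "Quiet, mostly reading. Yours?"},
            {"role": "user", "content": "Busy, but good."},
            {"role": "assistant", "content": "Glad to hear it!"},
        ],
    ]

    # Math embedded in a multi-turn conversation rather than as isolated
    # single-turn pairs, so the tiny model sees "math inside a dialogue"
    # during SFT.
    math_multiturn = [
        [
            {"role": "user", "content": "Quick question before we continue."},
            {"role": "assistant", "content": "Sure, go ahead."},
            {"role": "user", "content": "What is 17 * 24?"},
            {"role": "assistant", "content": "17 * 24 = 408."},
        ],
    ]

    def build_sft_mixture(chat, math, math_fraction=0.3, n=1000, seed=0):
        """Sample an SFT mixture in which math only ever appears multi-turn."""
        rng = random.Random(seed)
        return [
            rng.choice(math if rng.random() < math_fraction else chat)
            for _ in range(n)
        ]

    mixture = build_sft_mixture(chat_multiturn, math_multiturn)
    print(len(mixture), "conversations in the mixture")

The point the sketch encodes: math shows up inside multi-turn dialogues, in the same shape you want the model to use it at inference time.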

Comment by dlcarrier 1 day ago

Neural network development platforms have surpassed the record for bloat and brokenness previously set by FPGA development platforms and even mobile phone development platforms.

Comment by baranmelik 1 day ago

That it's really easy to overfit a model.
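
A common way to catch this is to watch the gap between training and validation loss and stop once validation stops improving. Below is a minimal early-stopping sketch; train_step and eval_step are caller-supplied placeholders (simulated here with synthetic loss curves), not any particular framework's API.

    import random

    def train_with_early_stopping(train_step, eval_step, max_epochs=50, patience=3):
        """Stop training once validation loss stops improving.

        train_step() and eval_step() are caller-supplied callables that run
        one epoch and return the current train / validation loss.
        """
        best_val, stale = float("inf"), 0
        for epoch in range(max_epochs):
            train_loss, val_loss = train_step(), eval_step()
            print(f"epoch {epoch}: train={train_loss:.3f} val={val_loss:.3f}")
            # Overfitting signature: train loss keeps falling while validation
            # loss stalls or rises.
            if val_loss < best_val - 1e-4:
                best_val, stale = val_loss, 0
            else:
                stale += 1
                if stale >= patience:
                    print("validation loss stopped improving; likely overfitting")
                    break

    # Toy demo with synthetic losses: train loss keeps dropping while
    # validation loss bottoms out and then creeps back up.
    rng = random.Random(0)
    state = {"epoch": 0}

    def fake_train():
        state["epoch"] += 1
        return 2.0 * 0.8 ** state["epoch"]

    def fake_eval():
        e = state["epoch"]
        return 1.0 + 0.005 * (e - 6) ** 2 + rng.uniform(0.0, 0.01)

    train_with_early_stopping(fake_train, fake_eval, max_epochs=20, patience=3)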