Contents

These are a collection of notes I’ve taken at UT, from textbooks, or from papers. They are split into sections for easy reading and are written in \(\LaTeX\)-enriched markdown.

The sections are in no particular order, but within each section the better notes are placed at the top.

Papers

NLP

  1. Multilingual Translation: Where does backtranslation, as an objective, come from? Why does it work? How far can we push it?

  2. BERT: What is BERT? Why is it used everywhere? Why is it better than word2vec?

  3. Neural Text Degeneration: What decoding schemes are optimal, and when? Why do transformers repeat themselves so much?

  4. Product Keys: How can we help huge transformers overfit? How can we better keep training data in the parameters of our big models?

  5. CLIP: How can we tie vision and NLP techniques together? How can we decide whether a given image matches a class label expressed in natural language?

  6. MASS: Why is MASS more than just an autoencoder? Why does it work at all for translating?

  7. Aligning Embeddings: Can we use word2vec / GloVe to translate? Why or why not?

  8. Aesthetic Image Captioning: How can we generate a meaningful captioning dataset from Instagram comments? Why is this even useful?

Vision

  1. Image Transformer: How can we apply transformers to images? How does attention work? Is it good (yes!)?

  2. StyleGAN: Why is StyleGAN so good? What even is it? What can we learn from it?

  3. Discrete (Normalizing) Flows: How can we generalize normalizing flows to the discrete case? Why do we even want to?

  4. Challenging Common Assumptions in Disentanglement: What is disentanglement? Is it even, mathematically, possible?

  5. Latent Skip: How can we help avoid mode collapse in VAEs?

  6. Conditional Priors: How can we frame a VAE as conditional generation (\(p(x \vert z, y)\))?

  7. Balancing Reconstruction Loss: How can we balance reconstruction loss in VAEs?

  8. Adversarial Autoencoders: What if, in our VAE, we don’t want a Gaussian prior?