Why this Class?

  • Deep learning is the dominant approach for almost all modern AI tasks
  • Breakthroughs with deep learning
    • LLMs (Large Language Models) - ChatGPT, Llama-2, PaLM-2
    • Diffusion models - Stable Diffusion, Midjourney
    • Multimodal: DALL-E 3
    • Automatic speech recognition

Large Language Model (LLM)

  • Probabilistic model that assigns probabilities to sequences of words, based on the text corpora it was trained on
  • Built from a stack of Transformer layers
    • GPT-3 (175B)
    • Llama-2 (7B, 13B, 70B)
    • PaLM-2 (rumor: ~340B (L))
    • GPT-4 (rumor: > 1,000B)
  • It can perform
    • Reasoning
    • Multilingual translation
    • Coding
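
To make "assigns probabilities to sequences of words" concrete, here is a minimal sketch using a bigram count model rather than a Transformer. The corpus and the function names (next_word_probs, sequence_prob) are made up for illustration; a real LLM conditions on the whole preceding context, not just the previous word, and learns its probabilities with a neural network rather than by counting.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real LLM is trained on vastly more text.
corpus = "the cat sat . the cat ran . the dog sat .".split()

# Count how often each word follows each preceding word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    """Estimate P(next word | previous word) from the bigram counts."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

def sequence_prob(words):
    """Chain rule: multiply the conditional probability of each word
    given the word before it (the first word is taken as given)."""
    p = 1.0
    for prev, nxt in zip(words, words[1:]):
        p *= next_word_probs(prev).get(nxt, 0.0)
    return p

print(next_word_probs("the"))
print(sequence_prob(["the", "cat", "sat"]))
print(sequence_prob(["the", "dog", "ran"]))
```

In this corpus "dog ran" never occurs, so the second sequence gets probability zero. Generating text amounts to repeatedly sampling the next word from next_word_probs and appending it.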

Figure: Transformer architecture (image from "Attention Is All You Need")

Diffusion Model

  • An image generative model
    • Allows you to sample from an image distribution
    • Unconditional / conditional
    • Stable Diffusion and Midjourney are both text-to-image models

Flaws / Dangers of AI

Although deep learning models have made huge strides in recent years, they still have many flaws, and we are still far from AGI.

Hallucinations: LLMs sometimes generate information with no basis in their input or training data and present it as fact, e.g., inventing the name of a city that does not exist.

Vulnerability to Input Perturbations: AI models can output incorrect responses when inputs are deliberately designed to mislead them, e.g., an adversarial pattern that causes an image of a panda to be classified as a gibbon. More dangerous examples include adversarial patterns on street signs and jailbreaking of LLMs.
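
As a rough illustration of how such a perturbation can be built, here is a sketch in the spirit of the fast gradient sign method, one well-known attack (the slides do not say which method is meant). The classifier, its weights, and the input are all hypothetical; real attacks target deep networks and take the gradient by backpropagation.

```python
# Hypothetical toy linear classifier: score > 0 means "panda", otherwise "gibbon".
weights = [0.5, -0.3, 0.8, -0.6]
bias = 0.0

def score(x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(x):
    return "panda" if score(x) > 0 else "gibbon"

def sign(v):
    return (v > 0) - (v < 0)

def perturb(x, epsilon):
    """Move every feature by at most epsilon in the direction that lowers
    the "panda" score. For a linear model, the gradient of the score with
    respect to the input is simply the weight vector."""
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]

x = [0.2, 0.1, 0.1, 0.1]          # made-up input, classified as "panda"
x_adv = perturb(x, epsilon=0.1)   # small, bounded change to each feature

print(predict(x), score(x))
print(predict(x_adv), score(x_adv))
```

Each feature changes by no more than epsilon, but because every change pushes the score in the same direction, the small shifts add up across all input dimensions and flip the prediction. With images, which have many thousands of pixels, a change too small to see can be enough.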

DALL-E 3

DALL-E 3 combines ChatGPT with an image generative model for better prompt understanding and more faithful generations (link to video).