Fellowship

This fellowship reading list for the UChicago AI Safety Club provides a structured exploration of key AI safety topics across seven weeks. It covers subjects ranging from scaling laws and instrumental convergence to AI governance and critical perspectives on AI safety.

Week 1: Scaling and Instrumental Convergence

Explore the implications of increasingly intelligent systems, focusing on scaling laws, superintelligence, and instrumental convergence.

Week 2: Outer Alignment

Examine the challenges in correctly specifying training goals for AI systems.

Week 3: Deception & Mesa-optimization

Investigate the concept of mesa-optimizers and the potential for deceptive behavior in AI systems.

  • Deceptive Alignment

    An in-depth exploration of deceptive alignment and pseudo-alignment, providing insights into inner alignment issues.

Week 4: AI Security Concerns

Explore AI security issues, including jailbreaks, adversarial examples, and other potential vulnerabilities.

Week 5: AI Governance

Examine the challenges and approaches to governing AI development and deployment.

Week 6: Criticisms and Counter-Arguments

Examine critiques of AI safety concerns and alternative perspectives on AI development.

Week 7: Further Reading and Discussion

Explore various AI alignment approaches and dive deeper into specific areas of interest. Fellows will choose one of the optional readings to focus on for the week.