Overview
This project investigates generative monoculture in large language models — the phenomenon where model outputs become systematically less diverse than the underlying training data, raising concerns about alignment and equitable representation.
Key Contributions
- Formalized diversity collapse using distributional dispersion metrics, providing a rigorous framework for measuring output homogenization across model generations (a minimal dispersion sketch follows this list).
- Proposed a group-aware fairness definition to detect when diversity loss disproportionately affects certain demographic or cultural groups (see the per-group gap sketch after this list).
- Analyzed the implications of monoculture for alignment and representation in generative models, highlighting risks in downstream applications where output variety matters.
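
To make the dispersion framing concrete, here is a minimal Python sketch, assuming the formalization resembles mean pairwise distance over a sample of generations. The project's actual metric is not reproduced here; mean pairwise Jaccard distance over token sets, the function names, and the toy strings are all illustrative stand-ins.

```python
# Minimal sketch of a dispersion-style diversity metric. Mean pairwise
# Jaccard distance over token sets is an illustrative stand-in for the
# project's formal metric; the toy strings are hypothetical.
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical token sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_jaccard(texts):
    """Average pairwise distance across a sample of texts.
    Lower dispersion => more homogeneous (monocultural) output."""
    token_sets = [set(t.lower().split()) for t in texts]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    model_outputs = [  # toy generations from one prompt
        "the movie was great and fun",
        "the movie was great and exciting",
        "the film was great overall",
    ]
    training_samples = [  # toy samples from the underlying data
        "a slow burn but a rewarding ending",
        "terrible pacing ruined the film for me",
        "visually stunning with a weak plot",
    ]
    # Diversity collapse: model dispersion falls well below data dispersion.
    print(f"model dispersion: {mean_pairwise_jaccard(model_outputs):.3f}")
    print(f"data dispersion:  {mean_pairwise_jaccard(training_samples):.3f}")
```

Comparing the two numbers operationalizes the "less diverse than the training data" claim: collapse shows up as the model's dispersion sitting well below the data's dispersion on matched samples.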
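The group-aware definition can be illustrated the same way by comparing per-group dispersion. This is a hedged sketch, not the project's actual definition: `group_diversity_gap`, the stand-in `distinct_ratio` metric, the group labels, and the tolerance `epsilon` are all assumptions made for illustration.

```python
# Hedged sketch of a group-aware diversity check, assuming the fairness
# definition compares per-group output dispersion. The group labels, the
# distinct_ratio stand-in metric, and the tolerance epsilon are all
# illustrative assumptions.
from collections import defaultdict


def distinct_ratio(texts):
    """Stand-in dispersion: unique tokens / total tokens (distinct-1)."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def group_diversity_gap(generations, groups, dispersion=distinct_ratio):
    """Per-group dispersion and the max-min gap across groups.
    A large gap means diversity loss falls disproportionately on
    some groups, even if aggregate diversity looks acceptable."""
    by_group = defaultdict(list)
    for text, group in zip(generations, groups):
        by_group[group].append(text)
    scores = {g: dispersion(texts) for g, texts in by_group.items()}
    return max(scores.values()) - min(scores.values()), scores


if __name__ == "__main__":
    outputs = [
        "doctors are dedicated and skilled",
        "doctors are dedicated and caring",
        "nurses come from many walks of life",
        "nurses balance night shifts, research, and family",
    ]
    labels = ["group_a", "group_a", "group_b", "group_b"]
    gap, per_group = group_diversity_gap(outputs, labels)
    print(per_group)
    print("disparate homogenization:", gap > 0.1)  # epsilon = 0.1, illustrative
```

The gap-over-threshold form mirrors standard group-fairness checks: aggregate diversity can look healthy while one group's outputs have quietly collapsed.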