Conclusion and Bibliography for “Understanding the diffusion of large language models”
This post is one part of the sequence Understanding the diffusion of large language models. As context for this post, I strongly recommend reading at least the 5-minute summary of the sequence.
Conclusion
In this sequence I presented key findings from case studies on the diffusion of eight language models that are similar to GPT-3. The phenomenon of diffusion has broad relevance to risks from TAI:
- The diffusion of AI technology affects when TAI will be developed, and which actors will lead AI development by what margin. This in turn affects how safe the TAI systems are, how the systems are used, and what the state of global politics and economics is like when the systems are used.
- Diffusion can have benefits, such as helping less-resourced actors to scrutinize leading AI developers, and supporting AI alignment research outside of leading industry AI labs.
GPT-3-like models are quite a specific domain, and may seem far from TAI. Nonetheless, I centered my research on case studies of GPT-3-like models because I think they are relatively informative about how diffusion will impact TAI development. In particular:
- The way that diffusion works today (in broad terms) might persist until the development of TAI, especially if TAI is developed relatively soon (e.g., in the next 10 years).
- TAI systems (or components of them) might resemble today’s best-performing language models, especially if the scaling hypothesis is true. So the implications of diffusion related to such models may be similar to the implications of diffusion related to transformative AI systems.
- Even if a lot changes between now and TAI, the history of diffusion improves our understanding of what could happen.
My research has strong limitations, including that:
- Much of the data from my case studies is highly uncertain, with quantitative estimates often spanning an order of magnitude.
- I often generalize from a small set of case studies in a narrow domain. Some of my conclusions are not robust to counterexamples that I might discover in the future. However, I have tried my best to factor this possibility into my confidence levels.
- Many of my bottom-line conclusions are not supported by much hard evidence, and are instead based on a combination of logical arguments and intuitions.
I think diffusion is a productive framing for studying competition, publication strategy, and other important dynamics of AI development. I’m excited for other researchers to continue work on diffusion. These are some of my recommended topics for future work (see this previous post for more):
- Further evaluation of my proposals to limit access to datasets and algorithmic insights.
- The relevance and importance of diffusion mechanisms that were not involved in my case studies.
- These mechanisms include the theft or leaking of information.
- Case studies in other domains of AI.
- This would be useful both to expand the overall amount of empirical data on diffusion, and to make comparisons to my existing case studies.
- Notable candidates for study are AlphaGo Zero (in the game-playing domain) and DALL-E (in the text-to-image domain).
- How the publication strategy of emerging AI developers will shift as they grow.
- How much deployment costs (rather than development costs) will limit the diffusion of (transformative) AI capabilities.
- How much different inputs to AI development contribute to AI progress.
- At various points in this sequence I presented my best guesses about the relative importance of different inputs to AI development, but I still have a lot of uncertainty that warrants further research.
Bibliography
AI21 Labs. (2022). Announcing AI21 Studio and Jurassic-1 Language Models. https://www.ai21.com/blog/announcing-ai21-studio-and-jurassic-1
Ahmed, N., & Wahed, M. (2020). The De-Democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research. ArXiv. https://arxiv.org/abs/2010.15581
Aiken, C., Kagan, R., & Page, M. (2020). “Cool Projects” or “Expanding the Efficiency of the Murderous American War Machine?” AI Professionals’ Views on Working With the Department of Defense. Center for Security and Emerging Technology. https://cset.georgetown.edu/publication/cool-projects-or-expanding-the-efficiency-of-the-murderous-american-war-machine/
Alvi, A., & Kharya, P. (2021). Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
Anderljung, M. (2021). Compute Governance Ideas. Some AI Governance Research Ideas. https://docs.google.com/document/d/13LJhP3ksrcEBKxYFG5GkJaC2UoxHKUYAHCRdRlpePEc
[Anthony]. (2020). Date Weakly General AI is Publicly Known. Metaculus. https://perma.cc/P6KM-LZY9
Baidu Research. (2021). Introducing PCL-BAIDU Wenxin (ERNIE 3.0 Titan), the World’s First Knowledge Enhanced Multi-Hundred-Billion Model. http://research.baidu.com/Blog/index-view?id=165
Barnett, M. (2020). Date of Artificial General Intelligence. Metaculus. https://perma.cc/2UTN-PME7
Barr, J. (2019). Amazon EC2 Update - Inf1 Instances with AWS Inferentia Chips for High Performance Cost-Effective Inferencing. Amazon Web Services. https://aws.amazon.com/blogs/aws/amazon-ec2-update-inf1-instances-with-aws-inferentia-chips-for-high-performance-cost-effective-inferencing/
Biderman, S., Bicheno, K., & Gao, L. (2022). Datasheet for the Pile. EleutherAI. https://arxiv.org/pdf/2201.07311.pdf
BigScience. (2022). Introducing the World’s Largest Open Multilingual Language Model: BLOOM. https://bigscience.huggingface.co/blog/bloom
Black, S., Biderman, S., Hallahan, E., Anthony, Q., Gao, L., Golding, L., He, H., Leahy, C., McDonell, K., Phang, J., Pieler, M., Prashanth, U. S., Purohit, S., Reynolds, L., Tow, J., Wang, B., & Weinbach, S. (2022). GPT-NeoX-20B: An Open-Source Autoregressive Language Model. EleutherAI. https://arxiv.org/abs/2204.06745
Bloem, P. (2019). Transformers from Scratch. Peterbloem.nl. https://peterbloem.nl/blog/transformers
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., … Liang, P. (2021). On the Opportunities and Risks of Foundation Models. Center for Research on Foundation Models. https://arxiv.org/abs/2108.07258
Bostrom, N. (2019). The Vulnerable World Hypothesis. Global Policy. https://nickbostrom.com/papers/vulnerable.pdf
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language Models are Few-Shot Learners. OpenAI. https://arxiv.org/abs/2005.14165
Buchanan, B., Musser, M., Lohn, A., & Sedova, K. (2021). Truth, Lies, and Automation: How Language Models Could Change Disinformation. Center for Security and Emerging Technology. https://cset.georgetown.edu/wp-content/uploads/CSET-Truth-Lies-and-Automation.pdf
Carlsmith, J. (2022). Is Power-Seeking AI an Existential Risk? Open Philanthropy. https://arxiv.org/abs/2206.13353
Chen, H., Fu, C., Rouhani, B. D., Zhao, J., & Koushanfar, F. (2019). DeepAttest: An End-to-End Attestation Framework for Deep Neural Networks. Association for Computing Machinery. https://www.microsoft.com/en-us/research/uploads/prod/2019/05/DeepAttest.pdf
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways. Google Research. https://arxiv.org/pdf/2204.02311.pdf
Clare, S. (2021). Great Power Conflict. Founders Pledge. https://founderspledge.com/stories/great-power-conflict
Clark, J., Brundage, M., & Solaiman, I. (2019). GPT-2: 6-Month Follow-Up. OpenAI. https://openai.com/blog/gpt-2-6-month-follow-up/
Clifton, J. (2021). CLR’s Recent Work on Multi-Agent Systems. AI Alignment Forum. https://www.alignmentforum.org/posts/EzoCZjTdWTMgacKGS/clr-s-recent-work-on-multi-agent-systems
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Google AI Language. https://arxiv.org/abs/1810.04805
Dillet, R. (2021). Hugging Face raises $40 million for its natural language processing library. TechCrunch. https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library/
Erdil, E., & Besiroglu, T. (2022). Algorithmic Progress in Computer Vision. Epoch. https://arxiv.org/pdf/2212.05153.pdf
Etchemendy, J., & Li, F. (2020). National Research Cloud: Ensuring the Continuation of American Innovation. Human-Centered Artificial Intelligence. https://hai.stanford.edu/news/national-research-cloud-ensuring-continuation-american-innovation
Fedus, W., Zoph, B., & Shazeer, N. (2021). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Google. https://arxiv.org/abs/2101.03961
Field, H. (2022). How Microsoft and Google Use AI Red Teams to “Stress Test” Their Systems. Emerging Tech Brew. https://www.emergingtechbrew.com/stories/2022/06/14/how-microsoft-and-google-use-ai-red-teams-to-stress-test-their-system
[GAA] (2021). Nuclear Espionage and AI Governance. Effective Altruism Forum. https://forum.effectivealtruism.org/posts/CKfHDw5Lmoo6jahZD/nuclear-espionage-and-ai-governance-1
Ganguli, D., Hernandez, D., Lovitt, L., DasSarma, N., Henighan, T., Jones, A., Joseph, N., Kernion, J., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., Drain, D., Elhage, N., Showk, S. E., Fort, S., … Clark, J. (2022). Predictability and Surprise in Large Generative Models. Association for Computing Machinery. https://arxiv.org/abs/2202.07785
Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., Nabeshima, N., Presser, S., & Leahy, C. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. EleutherAI. https://arxiv.org/abs/2101.00027
Gertler, A., Aird, M., [Leo], & [Pablo]. (2021). Credal Resilience. Effective Altruism Forum. https://forum.effectivealtruism.org/topics/credal-resilience
Gong, N. (2021). Model Stealing Attacks. Duke University. https://people.duke.edu/~zg70/courses/AML/Lecture14.pdf
Gwern.net. (2020). The Scaling Hypothesis. https://www.gwern.net/Scaling-hypothesis
H., D. (2020). How Much Did AlphaGo Zero Cost? Dansplaining. https://www.yuzeh.com/data/agz-cost.html
Hao, K. (2020). The Messy, Secretive Reality Behind OpenAI’s Bid to Save the World. MIT Technology Review. https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/
Hernandez, D., & Brown, T. (2020). AI and Efficiency. OpenAI. https://openai.com/blog/ai-and-efficiency/
Hernandez, D., Brown, T., Conerly, T., DasSarma, N., Drain, D., El-Showk, S., Elhage, N., Hatfield-Dodds, Z., Henighan, T., Hume, T., Johnston, S., Mann, B., Olah, C., Olsson, C., Amodei, D., Joseph, N., Kaplan, J., & McCandlish, S. (2022). Scaling Laws and Interpretability of Learning from Repeated Data. Anthropic. https://arxiv.org/abs/2205.10487
Hobbhahn, M., & Besiroglu, T. (2022). Trends in GPU Price-Performance. Epoch. https://epochai.org/blog/trends-in-gpu-price-performance
Hobson, D. (2022). A Data Limited Future. LessWrong. https://www.lesswrong.com/posts/gqqhYijxcKAtuAFjL/a-data-limited-future
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D. D. L., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., Driessche, G. V. D., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., & Sifre, L. (2022). Training Compute-Optimal Large Language Models. DeepMind. https://arxiv.org/abs/2203.15556
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling Laws for Neural Language Models. OpenAI. https://arxiv.org/abs/2001.08361
Karnofsky, H. (2016). Some Background on Our Views Regarding Advanced Artificial Intelligence. Open Philanthropy. https://www.openphilanthropy.org/research/some-background-on-our-views-regarding-advanced-artificial-intelligence/
Karnofsky, H. (2021). AI Timelines: Where the Arguments, and the “Experts,” Stand. Cold Takes. https://www.cold-takes.com/where-ai-forecasting-stands-today/
Karnofsky, H. (2022). How Might We Align Transformative AI If It’s Developed Very Soon? LessWrong. https://www.lesswrong.com/posts/rCJQAkPTEypGjSJ8X/how-might-we-align-transformative-ai-if-it-s-developed-very
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in Vision: A Survey. ACM Comput. Surv., 54(10s). https://dl.acm.org/doi/abs/10.1145/3505244
Khrushchev, M. (2022). Yandex Publishes YaLM 100B. It’s the Largest GPT-Like Neural Network in Open Source. Yandex. https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6
Kim, B., Kim, H., Lee, S., Lee, G., Kwak, D., Jeon, D. H., Park, S., Kim, S., Kim, S., Seo, D., Lee, H., Jeong, M., Lee, S., Kim, M., Ko, S. H., Kim, S., Park, T., Kim, J., … Sung, N. (2021). What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers. Naver. https://arxiv.org/pdf/2109.04650.pdf
Ladish, J., & Heim, L. (2022). Information Security Considerations for AI and the Long Term Future. Effective Altruism Forum. https://forum.effectivealtruism.org/posts/WqQDCCLWbYfFRwubf/information-security-considerations-for-ai-and-the-long-term
Leahy, C. (2022). Announcing GPT-NeoX-20B. EleutherAI. https://blog.eleuther.ai/announcing-20b/
[lennart] (2021). Compute Governance and Conclusions - Transformative AI and Compute [¾]. Effective Altruism Forum. https://forum.effectivealtruism.org/posts/g6cwjcKMZba4RimJk/compute-governance-and-conclusions-transformative-ai-and
Leopold, G. (2019). AWS to Offer Nvidia’s GPUs for AI Inferencing. HPC Wire. https://www.hpcwire.com/2019/03/19/aws-upgrades-its-gpu-backed-ai-inference-platform/
Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., Krikun, M., Shazeer, N., & Chen, Z. (2020). GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. Google. https://arxiv.org/pdf/2006.16668.pdf
Lieber, O., Sharir, O., Lenz, B., & Shoham, Y. (2021). Jurassic-1: Technical Details and Evaluation. AI21 Labs. https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf
Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2021). What Makes Good In-Context Examples for GPT-3? Microsoft Dynamics 365 AI. https://arxiv.org/abs/2101.06804
Lohn, A., & Musser, M. (2022). AI and Compute: How Much Longer Can Computing Power Drive Artificial Intelligence Progress? Center for Security and Emerging Technology. https://cset.georgetown.edu/publication/ai-and-compute/
Muehlhauser, L. (2019). What Open Philanthropy Means by “Transformative AI”. Open Philanthropy. https://docs.google.com/document/d/15siOkHQAoSBl_Pu85UgEDWfmvXFotzub31ow3A11Xvo/edit
Narayanan, D., Shoeybi, M., Casper, J., LeGresley, P., Patwary, M., Korthikanti, V. A., Vainbrand, D., Kashinkunti, P., Bernauer, J., Catanzaro, B., Phanishayee, A., & Zaharia, M. (2021). Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. NVIDIA. https://arxiv.org/abs/2104.04473
Naver. (2021). Press Release: Naver Unveils Korea’s First Super-Scale AI ‘HyperCLOVA’... “We Will Lead the Era of AI for All”. https://www.navercorp.com/promotion/pressReleasesView/30546
[nostalgebraist] (2022). Chinchilla’s Wild Implications. LessWrong. https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications
OpenAI. (2022). Best Practices for Deploying Language Models. https://openai.com/blog/best-practices-for-deploying-language-models/
OpenAI. (2022). Powering Next Generation Applications with OpenAI Codex. https://openai.com/blog/codex-apps/
[Pablo], & [Leo]. (2021). AI Race. Effective Altruism Forum. https://forum.effectivealtruism.org/topics/ai-race
[Pablo], Aird, M., & [Leo]. (2021). Alignment Tax. Effective Altruism Forum. https://forum.effectivealtruism.org/topics/alignment-tax
Radford, A., & Narasimhan, K. (2018). Improving Language Understanding by Generative Pre-Training. Semantic Scholar. https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford-Narasimhan/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035
Radford, A., Wu, J., Amodei, D., Clark, J., Brundage, M., & Sutskever, I. (2019). Better Language Models and Their Implications. OpenAI. https://openai.com/blog/better-language-models/
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models Are Unsupervised Multitask Learners. Semantic Scholar. https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe
Rae, J. W., Borgeaud, S., Cai, T., Millican, K., Hoffmann, J., Song, F., Aslanides, J., Henderson, S., Ring, R., Young, S., Rutherford, E., Hennigan, T., Menick, J., Cassirer, A., Powell, R., Driessche, G., Hendricks, L. A., Rauh, M., Huang, P., … Irving, G. (2021). Scaling Language Models: Methods, Analysis & Insights from Training Gopher. DeepMind. https://arxiv.org/abs/2112.11446
Rae, J., Irving, G., & Weidinger, L. (2021). Language Modelling at Scale: Gopher, Ethical Considerations, and Retrieval. DeepMind. https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Google. https://arxiv.org/abs/1910.10683
Ramesh, A., Pavlov, M., Goh, G., & Gray, S. (2021). DALL·E: Creating Images from Text. OpenAI. https://openai.com/blog/dall-e/
Rosset, C. (2020). Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/
Ryugen, H. (2022). Taiwan’s Share of Contract Chipmaking to Hit 66% This Year: Report. Nikkei Asia. https://asia.nikkei.com/Business/Tech/Semiconductors/Taiwan-s-share-of-contract-chipmaking-to-hit-66-this-year-report
Sandbrink, J., Hobbs, H., Swett, J., Dafoe, A., & Sandberg, A. (2022). Differential Technology Development: A Responsible Innovation Principle for Navigating Technology Risks. SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4213670
Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Le Scao, T., Raja, A., Dey, M., Bari, M. S., Xu, C., Thakker, U., Sharma, S. S., Szczechla, E., Kim, T., … Rush, A. M. (2021). Multitask Prompted Training Enables Zero-Shot Task Generalization. International Conference on Learning Representations. https://arxiv.org/abs/2110.08207
Schneider, J. (2022). War in Taiwan and AI Timelines. Effective Altruism Forum. https://forum.effectivealtruism.org/posts/PAxTSZPW7MBXKkvZg/war-in-taiwan-and-ai-timelines
Sevilla, J., Heim, L., Ho, A., Besiroglu, T., Hobbhahn, M., & Villalobos, P. (2022). Compute Trends Across Three Eras of Machine Learning. ArXiv. https://arxiv.org/abs/2202.05924
Sevilla, J., Heim, L., Hobbhahn, M., Besiroglu, T., Ho, A., & Villalobos, P. (2022). Estimating Training Compute of Deep Learning Models. Epoch. https://epochai.org/blog/estimating-training-compute#appendix-b-comparing-the-estimates-of-different-methods
Sevilla, J., Villalobos, P., Ceron, J. F., Burtell, M., Heim, L., Nanjajjar, A. B., Ho, A., Besiroglu, T., Hobbhahn, M., Denain, J., & Dudney, O. (2022). Parameter, Compute and Data Trends in Machine Learning. https://docs.google.com/spreadsheets/d/1AAIebjNsnJj_uKALHbXNfn3_YsT6sHXtCU0q7OIPuc4/edit#gid=1917852922
Shah, R. (2020). Alignment Newsletter #103 - ARCHES: An Agenda for Existential Safety, and Combining Natural Language with Deep RL. LessWrong. https://www.lesswrong.com/posts/gToGqwS9z2QFvwJ7b/an-103-arches-an-agenda-for-existential-safety-and-combining
Shevlane, T. (2022). The Artefacts of Intelligence: Governing Scientists’ Contribution to AI Proliferation. Centre for the Governance of AI. https://www.governance.ai/research-paper/the-artefacts-of-intelligence-governing-scientists-contribution-to-ai-proliferation
Shevlane, T. (2022). Structured Access: An Emerging Paradigm for Safe AI Deployment. University of Oxford. https://arxiv.org/abs/2201.05159
Shevlane, T., & Dafoe, A. (2020). The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? Future of Humanity Institute. https://www.fhi.ox.ac.uk/wp-content/uploads/The-Offense-Defense-Balance-of-Scientific-Knowledge.pdf
Shliazhko, O., Fenogenova, A., Tikhonova, M., Mikhailov, V., Kozlova, A., & Shavrina, T. (2022). mGPT: Few-Shot Learners Go Multilingual. ArXiv. https://arxiv.org/abs/2204.07580
Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. NVIDIA. https://arxiv.org/abs/1909.08053
Silver, D., & Hassabis, D. (2017). AlphaGo Zero: Starting from Scratch. DeepMind. https://www.deepmind.com/blog/alphago-zero-starting-from-scratch
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., & Hassabis, D. (2017). Mastering the Game of Go Without Human Knowledge. DeepMind. https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf
Soltan, S., Ananthakrishnan, S., FitzGerald, J., Gupta, R., Hamza, W., Khan, H., Peris, C., Rawls, S., Rosenbaum, A., Rumshisky, A., Prakash, C. S., Sridhar, M., Triefenbach, F., Verma, A., Tur, G., & Natarajan, P. (2022). AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model. Amazon Alexa AI. https://arxiv.org/abs/2208.01448
Sutton, R. (2019). The Bitter Lesson. Incomplete Ideas. http://incompleteideas.net/IncIdeas/BitterLesson.html
Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H. S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., … Le, Q. (2022). LaMDA: Language Models for Dialog Applications. Google. https://arxiv.org/abs/2201.08239
Tian, Y., Ma, J., Gong, Q., Sengupta, S., Chen, Z., Pinkerton, J., & Zitnick, L. (2019). ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero. 36th International Conference on Machine Learning. https://arxiv.org/abs/1902.04522
Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., & Ristenpart, T. (2016). Stealing Machine Learning Models via Prediction APIs. 25th USENIX Security Symposium. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer
Tsinghua University. (2022). GLM-130B: An Open Bilingual Pre-Trained Model. http://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. & Polosukhin, I. (2017). Attention Is All You Need. 31st Conference on Neural Information Processing Systems. https://arxiv.org/abs/1706.03762
Villalobos, P., Sevilla, J., Heim, L., Besiroglu, T., Hobbhahn, M., & Ho, A. (2022). Will We Run Out of ML Data? Evidence From Projecting Dataset Size Trends. Epoch. https://epochai.org/blog/will-we-run-out-of-ml-data-evidence-from-projecting-dataset
Wang, S., Sun, Y., Xiang, Y., Wu, Z., Ding, S., Gong, W., Feng, S., Shang, J., Zhao, Y., Pang, C., Liu, J., Chen, X., Lu, Y., Wang, X., Bai, Y., Chen, Q., Zhao, L., Li, S., … Wang, H. (2021). ERNIE 3.0 Titan: Exploring Larger-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation. Baidu Inc. https://arxiv.org/pdf/2112.12731.pdf
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., & Le, Q. V. (2021). Finetuned Language Models Are Zero-Shot Learners. Google Research. https://arxiv.org/pdf/2109.01652v1.pdf
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., & Le, Q. V. (2022). Finetuned Language Models Are Zero-Shot Learners. Google Research. https://arxiv.org/pdf/2109.01652.pdf
Wiblin, R., & Harris, K. (2022). Nova DasSarma on Why Information Security May Be Critical to the Safe Development of AI Systems. 80,000 Hours. https://80000hours.org/podcast/episodes/nova-dassarma-information-security-and-ai-systems/
Wiggers, K. (2021). AI21 Labs trains a massive language model to rival OpenAI’s GPT-3. VentureBeat. https://venturebeat.com/business/ai21-labs-trains-a-massive-language-model-to-rival-openais-gpt-3/
Wiggers, K. (2022). OpenAI Rival AI21 Labs Raises $64M to Ramp Up its AI-Powered Languages Services. TechCrunch. https://techcrunch.com/2022/07/12/openai-rival-ai21-labs-raises-64m-to-ramp-up-its-ai-powered-language-services/
Wu, S., Zhao, X., Yu, T., Zhang, R., Shen, C., Liu, H., Li, F., Zhu, H., Luo, J., Xu, L., & Zhang, X. (2021). Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning. Inspur Artificial Intelligence Research Institute. https://arxiv.org/abs/2110.04725
Zeng, A., Liu, X., Du Z., Wang, Z., Lai, H., Ding, M., Yang, Z., Xu, Y., Zheng, W., Xia, X., Tam, W. L., Ma, Z., Xue, Y., Zhai, J., Chen, W., Zhang, P., Dong, Y., & Tang, J. (2022). GLM-130B: An Open Bilingual Pre-trained Model. Tsinghua University. https://arxiv.org/pdf/2210.02414.pdf
Zeng, W., Ren, X., Su, T., Wang, H., Liao, Y., Wang, Z., Jiang, X., Yang, Z., Wang, K., Zhang, X., Li, C., Gong, Z., Yao, Y., Huang, X., Wang, J., Yu, J., Guo, Q., Yu, Y., Zhang, Y., … Tian, Y. (2021). PanGu-α: Large-Scale Autoregressive Pretrained Chinese Language Models With Auto-Parallel Computation. PanGu-α Team. https://arxiv.org/pdf/2104.12369.pdf
Zhang, S., Diab, M., & Zettlemoyer, L. (2022). Democratizing Access to Large-Scale Language Models with OPT-175B. Meta AI. https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/
Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., Mihaylov, T., Ott, M., Shleifer, S., Shuster, K., Simig, D., Koura, P. S., Sridhar, A., Wang, T., & Zettlemoyer, L. (2022). OPT: Open Pre-trained Transformer Language Models. Meta AI. https://arxiv.org/abs/2205.01068
Zwetsloot, R., & Dafoe, A. (2019). Thinking About Risks From AI: Accidents, Misuse and Structure. Lawfare. https://www.lawfareblog.com/thinking-about-risks-ai-accidents-misuse-and-structure
Acknowledgements
In addition to feedback-givers, I'd like to thank:
- My manager at Rethink Priorities, Michael Aird, for helping me become a better researcher throughout this project. Michael’s support, advice, and feedback were crucial to improving and finishing this sequence.
- Rethink Priorities for supporting me to do this project.
- All of the experts who responded to my questions.
- Adam Papineau for copyediting.
This research is a project of Rethink Priorities. It was written by Ben Cottier. Thanks to Alexis Carlier, Amanda El-Dakhakhni, Ashwin Acharya, Ben Snodin, Bill Anderson-Samways, Erich Grunewald, Jack Clark, Jaime Sevilla, Jenny Xiao, Lennart Heim, Lewis Ho, Lucy Lim, Luke Muehlhauser, Markus Anderljung, Max Räuker, Micah Musser, Michael Aird, Miles Brundage, Oliver Guest, Onni Arne, Patrick Levermore, Peter Wildeford, Remco Zwetsloot, Renan Araújo, Shaun Ee, Tamay Besiroglu, and Toby Shevlane for helpful feedback. If you like our work, please consider subscribing to our newsletter. You can explore our completed public work here.