Editorial note: The findings from this series of semi-structured interviews provided grounding for a larger project on the usage and utility of LLMs in the workplace. Parts 2 and 3 of this three-part series focus on usage in the general population (available here) and usage among programmers (available here), using survey-based methodology and large sample sizes, with questions informed by the present research. We thank Open Philanthropy (Open Philanthropy Project LLC) for funding this research report. The views expressed herein are not necessarily endorsed by Open Philanthropy.
Executive summary
Motivation and research context
There is considerable discussion regarding the potential transformative impact that LLMs will have in the workplace and on society more broadly. This qualitative research forms part of a larger effort to understand and quantify the usage of LLMs, with an emphasis on work-related or commercial settings. Understanding what LLMs are being used for, and the extent of their usage, can inform estimates of the broader societal impacts expected from AI adoption.
This report covers findings from 19 semi-structured interviews with self-identified LLM power users, conducted between April and July of 2024. Power users are distinct from frontier AI developers: they are sophisticated or enthusiastic early adopters of LLM technology in their lines of work, but do not necessarily represent the pinnacle of what is possible with a dedicated focus on LLM development. Nevertheless, their embedding across a range of roles and industries makes them excellently placed to appreciate where deployment of LLMs can create value, and what their strengths and limitations are for various use cases.
These qualitative interviews represent the first stage of a wider project that includes large scale surveys of the general public and of software engineers/developers or frequent coders. The purpose of these interviews was two-fold:
- Scoping out potential use cases, promises, and pitfalls related to LLMs at work to reduce the chances that our subsequent quantitative treatments of this topic are missing important areas of interest
- Getting a more detailed picture of precisely how and for what LLMs are being used in the workplace and integrated into workflows that is not possible via more quantitative methods
This part of the research is not intended to directly quantify the extent of LLM usage or the value of different use cases, nor are the use cases exhaustive.
Use cases
We identified eight broad categories of use case, namely:
- Information gathering and advanced search
- Summarizing information
- Explaining information and concepts
- Writing
- Chatbots and customer service agents
- Coding – code generation, debugging/troubleshooting, cleaning and documentation
- Idea generation
- Categorization, sentiment analysis, and other analytics
These general categories of use case are not surprising, but some specific use cases were technically impressive, for example:
- A real-time speech to text pipeline that would listen to an incoming call, and provide the service agent with relevant policy information (see here)
- An automated pipeline to collect information about upcoming events and generate descriptive text about them to appear automatically on a website (see here)
- A multimodal pipeline to generate variations of products (see here)
- A customized customer service tool augmented with specific information about a company (see here)
These more advanced use cases tended to come from those with coding backgrounds whose job roles directly involved producing AI-based tools or pipelines. However, we note that the ways in which these same people utilized LLMs to assist in such work tended to fit the more mundane use case categories noted above.
In terms of how interviewees now approached their work (vs. before the advent of LLMs), common themes were:
- For coders, less reliance upon forums, searching, and asking questions of others when dealing with bugs
- A shift from more traditional search processes to one that uses an LLM as a first port of call
- Using an LLM to brainstorm ideas and consider different solutions to problems as a first step
- Some workflows are affected by virtue of using proprietary tools within a company that reportedly involve LLMs (e.g., to aid customer service assistants, deal with customer queries)
Most interviewees engaged with LLMs via the typical chat interfaces, with use of APIs mostly restricted to when developing LLM-based tools. Several respondents felt that more integration of LLMs into the tools/applications people already use could improve their workflow and increase adoption.
One respondent highlighted ‘low-’ and ‘no code’ application development frameworks that facilitate building LLM-based apps and agents, such as LangChain and Semantic Kernel. Some such tools could change development workflows by requiring less knowledge of coding to develop quite sophisticated tools, although we are uncertain whether these tools are being used to produce cutting edge or production-ready applications.
Value of use cases
Multiple interviewees reported substantial speed increases as a result of using LLMs. In some cases such speed increases allowed tasks to be approached differently and produced products of better quality.
Some interviewees reported high failure rates (50% or more) when seeking useful assistance from LLMs, but noted that this was low cost because failures were quick to discover. Interviewees highlighted the importance of checking the validity of LLM outputs, although almost none reported highly systematic approaches to checking outputs for errors. Some use cases may also have been valuable to the user, but a source of concern for their employers – for example, an interviewee relying nearly entirely on LLM responses to both propose and validate cybersecurity checks.
Automation
Most respondents had not developed or did not use fully automated LLM-based pipelines, with humans still ‘in the loop’. The greatest indications of automation were in customer service oriented roles, and interviewees in this sector expected large changes and possible job loss as a result of LLMs. Several interviewees felt that junior, gig, and freelance roles were most at risk from LLMs.
Views among developers regarding automation varied considerably. Some were skeptical that development would be readily automatable with current LLMs, with more rote tasks such as code documentation and data processing being seen as more likely targets. Others believed that LLMs would dramatically change what was expected of developers and whether it was worth learning to code given the advent of LLMs.
The possibility of errors/hallucinations was a frequently cited challenge to the prospect of fuller automation. Executives' perception of risk, or the potential fallout if mistakes are made, may also be a strong barrier to fuller automation. However, interviewees reported more encouragement or indifference than concern from colleagues and managers about using LLMs.
We note that people may be inclined to underestimate the likelihood of their job roles being automated, as this is a stressful prospect to take seriously.
Limitations and drawbacks of LLMs
Perceived limitations or concerns about LLMs could be condensed into the following main categories:
- Privacy and intellectual property. Risk of exposing sensitive information and proprietary code to LLMs.
- Out of date information/access to contemporary information. LLMs are perceived as often lacking the latest data, limiting their reliability for tasks needing current information.[1]
- Hallucinations/inaccuracies. LLMs may generate false or inaccurate information, causing trust issues and potential errors in outputs.
- Concerns over training data. Training data may contain inaccuracies or conspiracy theories, and online information may further degrade over time with LLM-generated content.
- Liability. Uncertainty over who is responsible for mistakes made by LLMs, necessitating clear regulations.
- Stylistic concerns. LLM outputs may have distinct styles or “personalities” that can be inappropriate or limiting for various use cases.
- Poor performance in low information domains. LLMs perform poorly in areas with less training data, leading to biases and reduced effectiveness in specific domains or languages.
- Skill degradation and laziness. Reliance on LLMs may cause skills such as writing and coding to atrophy or fail to develop properly.
These limitations formed the basis of limitation-related questions in our large sample survey work with the general population and with software developers/coders.
Concluding remarks
These interviews reveal that LLM power users primarily employed the technology for core tasks such as information gathering, writing, and coding assistance, with the most advanced applications coming from those with coding backgrounds. Although users reported significant productivity gains, they usually maintained human oversight due to concerns about accuracy and hallucinations. The findings suggest LLMs were primarily being used as sophisticated assistants rather than autonomous replacements, but many interviewees remained concerned that their jobs might be at risk or dramatically changed with improvements to or wider adoption of LLMs.
Introduction
This report focuses on qualitative, semi-structured interviews conducted from April to July of 2024, with 19 self-identified power-users of LLMs in the workplace. Respondents were primarily recruited through outreach on tech-related forums in which users were discussing use cases of LLMs that they had been involved with, as well as via a data science meetup group based in the Netherlands. Three respondents were additionally identified through personal connections, and another through direct outreach after reading their material about LLM use cases online. Each user was pre-screened before the interview to filter out probable disingenuous responders, as well as those whose use cases seemed to be highly generic (e.g., asking GPT to write an email response or using it like a dictionary), or whose roles and use cases seemed to be well covered by interviews we had already conducted.
A template for the semi-structured interview can be found here. Semi-structured interviews aim to provide a balance between exploring potentially novel information that arises in the process of an interview, while also guiding conversations towards particular topics of interest to the interviewer. The template was not followed verbatim, but rather used as a guide for the interviewer to seek to cover the primary topics of consideration. Depending on time constraints, or if answers to some prompts meant that other prompts were already covered, some sections or subsections were skipped, or reached in a different order from that indicated in the template.
The interview sought to cover:
- Background information on interviewees – the interviewee’s job role responsibilities, as well as the LLMs they used and typical amount/frequency of use.
- LLM use cases and their value – the tasks for which the interviewee used LLMs, including discussion of how useful they were for such tasks, and how the use of LLMs may or may not have changed their workflow.
- Knowledge, strategies, and best practices for LLMs – whether interviewees were aware of and/or had utilized any particular techniques such as prompt engineering and fine tuning, and whether they used any particular plugins or APIs to better integrate LLMs into their workflow.
- Automation and future use cases – whether interviewees had managed to automate any aspects of their workflow using LLMs, the perceived prospect of automation in the future, and how they saw LLMs evolving in their industry in the near- (~1-2 year) to medium-term (~5+ years) future, and whether they thought LLMs posed a threat to their job roles or those of their colleagues.
- Concerns, threats, and limitations – perceived issues with LLMs, including technical limitations as well as possible pushback or skepticism from others.
In the following section, we provide an overarching summary of interviewees’ responses to each of these sections. Note that in some cases we quantify the number of people endorsing certain ideas or concerns, but we do not believe such quantification should be the focus of qualitative work like this, nor that it is representative; this work is aimed at scoping out a wide range of different LLM uses.
Background information on interviewees
Interviewees came from a wide range of organizations and job roles, although – understandably, given the focus on tech power users – there was a predominance of those with data science or engineering backgrounds (n = 9), or technology/AI consultants (n = 3).[2] Other roles included a policy generalist based in the US, a non-profit field-building advocate, an in-house corporate lawyer, a programmer at a trading firm, a 3D technical artist, customer-service oriented roles, PhD candidates/recent graduates, and content producers. Industry areas covered law, policy, advocacy, computer gaming, finance/trading, insurance, health/medical content, academia, cloud management, cybersecurity, energy, disaster management and satellite imagery, and shipping and deliveries. Background information on each interviewee is presented in Table 1.
With respect to frequency of usage, all respondents reported using LLMs almost every day or more often. However, the extent of this usage per day was reported to vary depending on the types of work or projects they were engaged in. In addition, the depth of this usage varied considerably among respondents. For example, one respondent reported being almost entirely reliant upon responses from an LLM when working in a cybersecurity role that they had been shifted into after their initial job role was closed down. On the other end of the spectrum, other respondents reported that they would use LLMs quite frequently, but often just in short bursts to ask the LLM a particular question or to make a one-shot effort at producing some code. Given that the exact usage was dependent on the use cases, we think an impression of the amount of usage is better gained through looking at our use cases and workflow section below.
With respect to the specific LLMs that respondents used, we saw a wide variety of generative AI products. Everyone reported having used GPT/ChatGPT, and 17 of the respondents noted it as their primary or joint-primary LLM. Seven interviewees noted having also used Claude, and seven noted having used Gemini. Other LLMs or LLM-based applications reported by a minority of users were Llama, Mistral, Perplexity, InflectionAI, BERT, and Copilot. A couple of respondents also reported using other generative AI products such as Adobe Firefly and DALL-E for image generation.
Table 1. Background information on interviewees
| # | Industry | Job description | Country | Sex | Age | Education | Income (USD) | Racial identification (based on US census categories) |
|---|---|---|---|---|---|---|---|---|
| 1 | Policy | Policy generalist | USA | Male | 25-34 | Some college, no degree | Between $100,000 and $150,000 | White |
| 2 | Technology | AI engineer | Australia | Male | 45-64 | Completed master’s degree | Between $80,000 and $99,999 | Other race |
| 3 | IT Consulting | AI consultant | The Netherlands | Male | 35-44 | Completed master’s degree | Between $75,000 and $79,999 | Asian or Asian American |
| 4 | Travel and Tourism | Corporate law (regulatory compliance) | USA | Male | 45-64 | Completed bachelor’s degree | Between $100,000 and $150,000 | White |
| 5 | 3D Graphics | 3D technical artist | USA | Male | 25-34 | Some college, no degree | Between $100,000 and $150,000 | Identify with two or more races |
| 6 | Computer Science Research | PhD in security systems/ reliability | USA | Male | 35-44 | Completed master’s degree | Between $100,000 and $150,000 | White |
| 7 | Technology | Programmer | UK | Male | 25-34 | Completed master’s degree | Between $80,000 and $99,999 | Black or African American |
| 8 | Technology | Running desktop services for a large tech company | USA | Male | 35-44 | Completed master’s degree | Over $150,000 | Asian or Asian American |
| 9 | Technology | Data scientist | Nigeria | Male | 25-34 | Completed bachelor’s degree | Between $20,000 and $49,999 | Black or African American |
| 10 | Energy | Head of customer service | USA | Male | 25-34 | Completed master’s degree | Between $80,000 and $99,999 | Prefer not to say |
| 11 | Health media start-up | Health media design, content creation, copywriting | USA | Female | 18-24 | Completed bachelor’s degree | Between $80,000 and $99,999 | Asian or Asian American |
| 12 | Geospatial industry | Data scientist for mapping using GIS | USA | Male | 25-34 | Completed bachelor’s degree | Between $75,000 and $79,999 | Black or African American |
| 13 | Technology | ML engineer | South Africa | Male | 25-34 | Completed bachelor’s degree | Between $80,000 and $99,999 | Black or African American |
| 14 | Environmental Science | PhD in climate data and ML | USA | Female | 35-44 | Completed doctorate degree | Between $100,000 and $150,000 | White |
| 15 | Technology | Data scientist | South Africa | Male | 25-34 | Completed bachelor’s degree | Between $15,000 and $19,999 | Black or African American |
| 16 | Deliveries | Handling customer information and complaints | USA | Male | 25-34 | Completed master’s degree | Between $80,000 and $99,999 | Black or African American |
| 17 | Cloud Software for conversational commerce | Data scientist | The Netherlands | Male | 25-34 | Completed master’s degree | Between $50,000 and $74,999 | White |
| 18 | Advocacy Nonprofit | Co-Founder | USA | Female | 25-34 | Completed doctorate degree | Prefer not to say | Asian or Asian American |
| 19 | Finance | Programmer | USA | Male | 35-44 | Completed master’s degree | Over $150,000 | White |
LLM use cases and their value
Use cases
Responses from our interviewees covered a wide range of use cases, which we have categorized into eight overarching categories. Where some use cases may not be totally clear, we provide some tangible examples. Some of the use cases also intersect or are not wholly independent of one another.
Many of the categories of use case are similar to what might be expected from quite novice users – such as using LLMs for gathering information or to help with writing – but our interviewees appear to differ from more casual users in the extent of their engagement with LLMs as part of their standard workflow. We’ve provided some more detail in boxes below about specific examples that appeared to be either especially time-saving or innovative among the more mundane examples.
Example use case 1. Automated, multimodal customer service pipelines. Interviewee 2, a developer who also consults and provides strategic advice on how companies can utilize AI, reported having developed an innovative customer service pipeline. The client was an insurance company in an area that was prone to severe weather events, which can result in sudden inundations of claims from highly stressed customers. Typically a customer service agent would have to be speaking with a customer and trying to calm them down all while searching through lengthy policy documents and claims information to determine which parts of the company policies applied to the customer. The interviewee implemented a system in which OpenAI’s Whisper (an automatic speech recognition system) received audio from the call, and in turn could generate text of the ongoing conversation. This text was then fed to GPT, which also had access to specific company documents and policy information, and was prompted to find the most relevant aspects of the documentation and help assess which parts of a claim might be covered. This information would be fed back in real time to the customer service agent, reportedly with information tagging exactly where to find the relevant information in case it needed to be checked. This was reported to greatly speed up the processing of claims and ease the burden on service operators.
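The shape of such a pipeline can be sketched as follows. This is a minimal, illustrative stand-in, not the interviewee's implementation: the policy snippets, the keyword-overlap retriever, and the stubbed `transcribe` function are all assumptions for the sketch (the reported system used OpenAI's Whisper for transcription and GPT for finding and tagging the relevant policy text).

```python
# Hypothetical policy store; in practice this would be the insurer's documents.
POLICY_SNIPPETS = {
    "flood.txt#s2": "Flood damage to ground floors is covered up to $50,000.",
    "wind.txt#s4": "Roof damage from wind is covered after a $1,000 deductible.",
    "general.txt#s1": "Claims must be filed within 60 days of the event.",
}

def transcribe(audio_chunk: bytes) -> str:
    """Stand-in for a speech-to-text call (e.g., Whisper)."""
    return "My roof was torn off by the wind last night, what do I do?"

def retrieve_relevant(transcript: str, snippets: dict, top_k: int = 1):
    """Rank policy snippets by keyword overlap with the live transcript.
    (A crude stand-in for the LLM's retrieval over the documents.)"""
    words = set(transcript.lower().split())
    scored = sorted(
        snippets.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def agent_prompt(transcript: str, snippets: dict) -> str:
    """Assemble the prompt sent to the LLM, with source tags so the
    service agent can verify where each excerpt came from."""
    hits = retrieve_relevant(transcript, snippets)
    context = "\n".join(f"[{src}] {text}" for src, text in hits)
    return (
        "Customer said:\n" + transcript + "\n\n"
        "Relevant policy excerpts:\n" + context + "\n\n"
        "Suggest what the agent should tell the customer, citing sources."
    )

print(agent_prompt(transcribe(b""), POLICY_SNIPPETS))
```

The key design point the interviewee described is the source tagging: because each excerpt carries a pointer back to its document, the human agent can check the suggestion rather than trusting the model blindly.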
Information gathering and advanced search
Multiple interviewees used LLMs to help search for and gather information. For example, Interviewee 4’s (the corporate lawyer) use of LLMs for search and information gathering formed a key part of their new workflow, with searching for sources and documents via LLMs taking over from the much more laborious process of directly finding and searching through all the potentially relevant regulations. Using an LLM greatly sped up the time it took to get to a point where he knew he was in the right ballpark with respect to the information he had on hand, from which he could start conducting further due diligence into the specifics of the identified rules and regulations.
A further form of search and information gathering was using LLMs for difficult search queries for which traditional search engines were reportedly not well suited. For example, one PhD candidate (Interviewee 6) noted using Gemini to identify obscure research papers – this was difficult because the individual search terms were very generic, whereas an LLM could understand that the combination of words related to a more specific computer science issue. After identifying even just one such paper, he could then start more manually working from that point using the paper’s citations.
Interviewee 19 (a programmer at a trading firm) explicitly noted the advantage of using LLMs for search over a typical search engine was that the LLM seems to understand what is being searched for, and that it is possible to iterate on top of an initial query if the intention is misunderstood.
In terms of value, interviewees often felt that they could much more rapidly arrive at a place where they had valid and usable knowledge than if they had to go through a traditional search and reading process.
Summarizing information
Multiple interviewees reported using LLMs to summarize and condense information. For example, an early career employee at a content-related service for hospitals and clinics (Interviewee 11) used LLMs to find key points of interest from medical information about different illnesses. LLMs would be used to condense information from longer-form articles down to shorter, engaging chunks of information. She felt that this capability of LLMs enabled her to use a wider variety of sources, as she would not realistically be able to read through so many long form pieces and would previously have to rely on information that was already highly condensed.
Another example came from Interviewee 7, who had manually built a web-scraping system to collect information about a host of charity- and church-related events, in all sorts of different formats, from many different websites. The LLM was integrated into this system and able to provide summary information about different upcoming events and what might be expected at them – this could then be pasted directly in summary form onto their own website. A less advanced version of this was done by Interviewee 18, who used LLMs to summarize and create a ranking/top 10 of relevant advocacy-related events from a long list of possible events.
Example use case 2. Automatic updating of upcoming event information. Interviewee 7 is a developer who worked on a project with a company that works in the faith/charity sector. A key task was to try to automate the collection and presentation of information about upcoming church or charity related events from across the UK. The interviewee was able to develop a system that could collect information from newsletters and websites from a host of different organizations. GPT was used to extract important event information, and produce summary materials that could be made to appear automatically on their website, providing users with ongoing, up-to-date information about upcoming events. This approach was reportedly very successful and useful, although the interviewee noted that its development took many months: he had to manually generate the system that retrieved all the information, with all the different sites requiring different approaches to extract the relevant information, as well as trying to futureproof the system in case those sources changed over time.
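The LLM's role in this pipeline, turning scraped free text into structured, publishable event entries, can be sketched as below. The extraction prompt, the field names, and the `fake_llm` stub are assumptions for illustration only; in the reported system, GPT performed the extraction over text gathered by the hand-built scraper.

```python
import json

# Hypothetical extraction prompt; the real prompt wording is not reported.
EXTRACTION_PROMPT = (
    "Extract the event name, date, and location from the text below and "
    "reply with JSON only, using keys: name, date, location.\n\n{text}"
)

def fake_llm(prompt: str) -> str:
    """Stand-in for a GPT call; returns the structured JSON reply we would
    expect the model to produce for a scraped newsletter paragraph."""
    return '{"name": "Spring Charity Fair", "date": "2024-05-12", "location": "Leeds"}'

def event_summary(raw_text: str) -> str:
    """Ask the (stubbed) LLM for structured fields, then render the short
    summary line that would appear automatically on the website."""
    reply = fake_llm(EXTRACTION_PROMPT.format(text=raw_text))
    event = json.loads(reply)
    return f"{event['name']} ({event['date']}, {event['location']})"

line = event_summary("Join us in Leeds on 12 May 2024 for the Spring Charity Fair...")
```

Requesting JSON-only replies and parsing them, rather than publishing raw model text, is one common way such pipelines keep the website output uniform across wildly different source formats.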
Explaining information and concepts
Multiple respondents noted using LLMs to rapidly skill up or inform themselves about a particular topic area or concept. For example, Interviewee 7 discussed using an LLM to provide them with background knowledge to understand what a Kalman filter is, as well as to inform them about what a particular piece of software (Tesseract) is, and how it might be possible to use in their work. Multiple interviewees noted using LLMs similarly, as a means of explaining complex information in simple terms more suited to their level of understanding, or to help learn about concepts relevant to their work (e.g., Interviewee 2). Interviewee 14, the recent PhD graduate, felt that access to LLMs in the early stages of her PhD might have enabled her to much more rapidly reach a point where she could meaningfully contribute to science in her area of expertise. She also noted that the LLM helps get past the barrier of feeling like you are wasting someone’s time or embarrassing yourself by asking a ‘stupid’ question.
This approach would sometimes also be applied to code-related problems. For example, Interviewee 5 used LLMs to help with explaining the code of others who wrote in a different style, or to parse other people’s explanations of a complex technical problem in less technical language.
Writing
Multiple interviewees reported using LLMs to help with writing tasks. For example, the policy generalist (Interviewee 1) reported using LLMs to produce more targeted and well crafted policy-related letters of support for potential supporters of a bill. Through prompting with examples, these letters could be crafted to more closely match the vision and voice of the potential supporter. This reportedly worked better, as well as faster and more cheaply, than the usual process of contracting a communications specialist. Other interviewees noted somewhat more mundane, but still time saving, use cases for writing, such as drafting professional-sounding emails either to send to supervisors (e.g., Interviewee 17) or to summarize things and keep their team up to date without the risk of sounding unprofessional or taking too long due to English being a second language (e.g., Interviewee 8). However, many interviewees noted feeling that the LLMs often had a particular tone or style of writing that could feel very unnatural or forced (e.g., Interviewee 14), and some felt it was not especially useful for writing (e.g., Interviewee 6 felt it was not useful for writing academic work).
Chatbots and customer service agents
Multiple interviewees noted either the use of LLM-based chatbots/customer service agents in their organization, or were working on developing them. For example, Interviewee 8 suggested that an LLM-based chatbot was used in their organization to handle certain personnel-related requests. For instance, if an employee wanted a new ergonomic keyboard, the chatbot was integrated such that the request could be understood and the product searched for in an internal catalog. The bot would suggest a range of options and then order the item if the requester liked the options. It was suggested this system could handle 30-40% of frontline requests. Interviewee 7 was working on developing a chatbot-style system that could be provided to other organizations, and would have access to internal information about things within the organization through Retrieval Augmented Generation (RAG). Employees could then ask questions, with the bot responding according to company-specific information.
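The request-to-catalog flow described above can be sketched as follows. The catalog contents and the keyword matcher are invented for illustration; the reported system used an LLM to interpret the free-text request (and, in Interviewee 7's case, RAG over company documents) rather than the simple word-overlap scoring used here.

```python
# Hypothetical internal catalog; item IDs and prices are made up.
CATALOG = [
    {"id": "KB-201", "name": "ergonomic split keyboard", "price": 129},
    {"id": "KB-105", "name": "compact wired keyboard", "price": 35},
    {"id": "MS-310", "name": "vertical ergonomic mouse", "price": 59},
]

def suggest_products(request: str, catalog, top_k: int = 2):
    """Rank catalog items by how many request words appear in their names,
    standing in for the LLM's interpretation of the employee's request."""
    words = set(request.lower().split())
    scored = [
        (len(words & set(item["name"].split())), item) for item in catalog
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    # Only return items that matched at least one word of the request.
    return [item for score, item in scored[:top_k] if score > 0]

options = suggest_products("I'd like a new ergonomic keyboard", CATALOG)
```

In the system the interviewee described, the human stays in the loop at the final step: the bot only places the order once the requester approves one of the suggested options.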
In one case, Interviewee 2 noted having developed an LLM-based system for a company dealing with insurance claims, which often arrive en masse owing to severe weather events that affect many people at once. Usually, service agents would have to interact very professionally with sometimes irate customers, all while searching through policy materials to find relevant information about what parts of a claim would or would not be covered. A system was developed in which real-time calls were fed to Whisper (a speech-to-text system) to convert the audio to text. This text could then be fed to GPT-4, which was also given access to policy information. The system could then surface the information relevant to the service agent on the call, citing where the information was from so that it could be checked if needed. This was reported to substantially reduce call times and also the burden on the service agent.
Example use case 3. A multimodal system for product ideation. Interviewee 3 was a consultant and developer specializing in AI. Although not part of an official project he was working on, he generated a proof-of-concept system for fashion-related product ideation. Firstly, typical coding was used to scrape images, prices, and a short product description from a store website. Next, the image and description were fed to GPT, with a prompt focused on providing a more interesting and elaborate description of the product pictured. These augmented product descriptions were then fed to an AI image generator, with instructions to produce new fashion items based on those descriptions. Although the interviewee produced this pipeline just as a side project to see whether it was possible in principle and could be useful, he subsequently discovered that a fast fashion company he had previously worked with was investigating the use of a very similar system.
Coding – code generation, debugging/troubleshooting, cleaning and documenting
All the interviewees who were involved with coding noted having used LLMs to try to help with writing code. This took various forms, from asking the LLM to simply produce code based on a natural language command, to checking existing code for potential bugs or other issues. Troubleshooting with respect to bugs was a use case with major utility for these interviewees, who noted that the typical process can be very time consuming, and requires a lot of searching the internet and forums such as Stack Overflow to hunt down potentially relevant information. Failing that, one might need to post on such a forum and rely on a volunteer to help.
Some interviewees had used LLMs to augment their capabilities in terms of what sorts of code they could tackle. For example, Interviewee 6 used ChatGPT to help develop code for a compiler as part of the LLVM Compiler Infrastructure Project. He felt that help from ChatGPT enabled work that would otherwise have taken months to be completed in weeks.
Some of the coding uses were relatively mundane, such as using the LLM to more quickly produce a rough code skeleton that the interviewee could have produced themselves, just with a bit more effort (e.g., some simple code for a graph). Similarly, Interviewee 19 noted most often using LLMs to tackle relatively simple problems that he was confident could be solved, but that he might not know specifically how to tackle in the language he was using. In other cases, such as Interviewee 8, the interviewee reported tackling a project they would otherwise have had no idea how to approach by asking the LLM to produce a project plan for them. The LLM was then used to further break down each step, prompting it to help with aspects of code needed at different stages.
Another use case for LLMs in relation to coding was in documenting and cleaning code. For example, Interviewee 17 noted how LLMs seemed quite proficient at providing documentation for code, as well as cleaning up code to make it more legible to other users.
Some of the coders had used Copilot to help with coding, although this was much less common than simply interacting with an LLM chat interface about code. Copilot did not yet seem to be regarded as especially transformative, with one interviewee likening it to simple autocomplete (Interviewee 6). Interviewee 19 likewise noted that Copilot was used by some in his organization and seemed helpful for completing simple chunks of code, but did not seem revolutionary.
In addition to coding himself, Interviewee 3 highlighted his use and experimentation with various ‘no code’ development environments, which enable the user to create outputs such as web applications without writing code. These platforms use LLMs under the hood to generate the requisite code, alongside tools such as Semantic Kernel and LangChain that help integrate AI-based agents into applications.
Idea generation
Some interviewees noted using LLMs to help get started on projects or writing tasks. For example, Interviewee 6 asked the LLM how he should tackle a project that he would not have known how to approach. Others, such as Interviewee 5, reported asking the LLM to brainstorm potential approaches to technical problems to better scope out ways the issue might be solved – he reported this now being one of the first things he does when embarking on a project. Similarly, Interviewee 1 noted colleagues starting to pitch a question to an LLM and using the LLM's response as something like a basic prior or starting point for team discussions. Interviewee 19 noted a similar approach for coding problems, highlighting that LLMs have a vast knowledge of algorithms and data structures, so one can brainstorm with an LLM about different ways to solve an issue (though he noted that when the issue was complex, the suggested solutions often did not entirely solve it – perhaps because there was no clean solution).
Categorization, sentiment analysis, and other analytics
Two interviewees reported that LLMs were used as part of their work to help with categorizing and scoring some types of information (Interviewees 13 and 15). Both noted, for example, using LLMs to help analyze the sentiment of certain pieces of raw data, such as social media posts. Interviewee 9 also reported that LLMs were used to help evaluate their own analyses, such as interpreting summary statistics and commenting on whether a model appeared to be performing well, or which model might be best. Hence, LLMs were involved at various stages of an analytic pipeline across interviewees – sometimes augmenting data, sometimes helping code analyses, and sometimes helping to interpret them.
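A minimal sketch of the kind of LLM-based sentiment categorization these interviewees described might look as follows. The function names and the fixed label set are our own illustration, and `call_llm` is a stub standing in for a real model API; a production system would also constrain the model's output more tightly.

```python
# Hypothetical sketch: classifying the sentiment of a social media post
# with an LLM. `call_llm` is a stub in place of a real API call.

LABELS = ["positive", "neutral", "negative"]

def build_sentiment_prompt(post):
    """Ask for one label only, a common prompt-design constraint."""
    return (f"Classify the sentiment of this social media post as one of "
            f"{', '.join(LABELS)}. Reply with the label only.\n\nPost: {post}")

def call_llm(prompt):
    # Stub standing in for a real model; replies often carry stray
    # capitalization or whitespace, which the parser must tolerate.
    return "Negative "

def parse_label(raw):
    """Normalize the model reply; fall back to 'neutral' if unparseable."""
    cleaned = raw.strip().lower()
    return cleaned if cleaned in LABELS else "neutral"

label = parse_label(call_llm(build_sentiment_prompt("Shipping was late again!")))
```

The fallback in `parse_label` reflects a point raised throughout the interviews: LLM outputs cannot be trusted blindly, so even simple pipelines need a defensive parsing step.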
Example use case 4. Using retrieval augmented generation (RAG) to enhance customer service capabilities Interviewee 17 was a data scientist at a company that provides other businesses with customer service tools, including chatbots. The company has recently started integrating generative AI into their products. The interviewee is working on pipelines that utilize RAG to create ‘client specific environments’. RAG is a means of supplying LLMs with specific information without fine tuning them – for example, by essentially including as contextual information in a prompt all sorts of material that may be useful for answering the kinds of questions the chatbot is likely to receive. This can include company opening times, delivery information, pricing, policies, etc. When prompted, the LLM is first instructed to search the additional materials for relevant information that might help answer the question at hand. These RAG-based customer service agents are reportedly in their early stages, but have already been implemented and are offered as part of the company’s product options.
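The retrieval-then-prompt pattern described above can be sketched as follows. This is a deliberately naive illustration under our own assumptions – the document names (`KNOWLEDGE_BASE`) and keyword-overlap retrieval are stand-ins; real RAG systems typically use embedding-based similarity search and then pass the assembled prompt to an LLM API.

```python
# Naive RAG sketch: retrieve the most relevant company documents,
# then prepend them as context to the user's question.

KNOWLEDGE_BASE = [
    "Opening hours: Mon-Fri 9am-5pm.",
    "Standard delivery takes 3-5 working days.",
    "Returns are accepted within 30 days with a receipt.",
]

def retrieve(question, documents, top_k=2):
    """Rank documents by keyword overlap with the question (a stand-in
    for the embedding similarity search a real system would use)."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question, documents):
    """Prepend retrieved context so the LLM answers from company data."""
    context = "\n".join(retrieve(question, documents))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}")

prompt = build_prompt("How long does delivery take?", KNOWLEDGE_BASE)
```

The key property, as the interviewee described, is that company-specific facts reach the model through the prompt at query time rather than through fine tuning, so the knowledge base can be updated without retraining anything.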
Perceived value of use cases
We should not use the small sample of use cases here to determine the extent to which LLMs as a whole speed up or change workflows, but even within this small sample we see substantial variability in how transformative or useful LLMs were perceived to be. It can be seen from the use cases above that many users did feel that LLMs sped up their completion of certain tasks, and could often help them get up to speed on more difficult tasks more quickly. As some examples, Interviewee 13 reported using an LLM to generate code for data analyses that took only minutes to produce usable values – much quicker than a more manual approach, and enabling more models to be tested. Interviewee 14 reported large proportional gains, but for tasks she considered relatively less transformative or time-consuming, such as a scripting task she already knew well going from 15 minutes to 3 minutes. With respect to producing code for publication figures, Interviewee 6 reported tasks sometimes going from 2 days' work to about 4 hours: he could get a shell of a script for figures in as little as 2 minutes, but would then need to spend up to 40 minutes tweaking the graph to get all colors, sizes, and other formatting ready for publication. The same interviewee said that his searches for obscure computer science papers, which might usually take many hours, could sometimes get started in minutes with the help of an LLM. As part of the pipeline that Interviewee 2 built to help with insurance claim calls, he reported that calls may have gone from around 10 minutes each to just a couple of minutes, as agents no longer needed to manually search through all the documents.
While not an LLM use case, Interviewee 7 reported using generative AI to make images in minutes that he would previously have outsourced to a freelancer, requiring many hours of work and waiting time. Interviewee 19 thought that, for programming, current LLMs might take a programmer from 1 to 1.05 in terms of efficiency and capability, but that they were not yet at a point where you would dramatically reduce the number of programmers you might hire (e.g., more like hiring 48 as opposed to 50 on a team).
LLMs also helped users understand and tackle tasks that might otherwise have required considerable training and research to navigate. In one extreme case, Interviewee 8 reported almost complete reliance upon Gemini in a cybersecurity role that he was shifted into due to a company restructuring. He relied on Gemini both to know what to ask other teams to provide in terms of evidence for a particular component of security compliance, and in assessing the evidence they provided. He referred to the capacity of LLMs to bring people up to competency as a ‘democratization of development’. This same user also reported that colleagues of his who worked as developers said they were completing some tasks in 2-3 days that previously might have taken 4-5 weeks.
However, it is worth noting that some users reported high failure rates for help from LLMs. For example, Interviewee 18 estimated that a satisfying solution could not be reached in around 50% of the cases that she had tasked an LLM-oriented consultant to try to solve, and she did not believe this was due to the capacities of the consultant as opposed to the limitations of LLMs. Similarly, Interviewee 14 estimated that GPT failed to produce much of use about 50% of the times she interacted with it – yet this was still sufficiently quick that it was worth trying as a first port of call for many problems. Interviewee 5 felt that GPT tended to perform poorly when the issues he was facing involved anything slightly difficult mathematically. Interviewee 19 noted that, for complex coding tasks, LLMs would often produce buggy code or solutions that did not work – though he highlighted that he was not sure for some of these tasks that there was a solution, which was why he was interacting with the LLM to try to tackle it in the first place.
Changes to workflows
Again, there was considerable variability in the extent to which LLMs affected workflows, both across people and across tasks. Some common themes were:
- A shift from more traditional search processes to one that uses an LLM as a first port of call
- Using an LLM to brainstorm ideas and consider different solutions to problems
- For coders, less reliance upon forums, searching, and asking questions of others when dealing with bugs
- One interviewee highlighted the advent of various ‘no code’ application development environments that help build LLM-based tools without coding, such that people may be able to develop applications without coding skills (e.g., LangChain)
Workflows of some interviewees may have been affected simply due to their use of AI tools that were generated for their specific roles, while they did not always know how things were done previously. For example, Interviewee 16 reported that an LLM was used to organize and tag for approval different pieces of shipment information, and his role appeared to mostly be providing an ‘ok’ on what this AI system had suggested. Similarly, Interviewee 12 reported that LLMs integrated into a geospatial tagging system they used had substantially reduced the amount of manual labeling that he was involved with.
Checks on validity of outputs for different use cases
When prompted in the interview regarding whether, and how, they checked the validity of outputs produced by LLMs, respondents universally agreed that checking outputs was important, and in general that one could not trust LLMs with complete confidence. Multiple respondents noted the propensity of LLMs to hallucinate, or sometimes to just provide useless responses. The extent to which interviewees engaged with checking of outputs varied considerably.
For some tasks, checking was simply part of the natural process of conducting the work: one would not copy out a piece of gibberish text when using the LLM to help with a writing task. Others noted how the LLM was more of a starting point from which they could take over, such as the corporate lawyer or the PhD candidate using LLMs for obscure search tasks. For coding tasks, most interviewees highlighted the importance of checking the code through test runs just as one would when generating one’s own code. These coders also reported using the LLMs to help explain the code and what it was supposed to achieve.
Interviewees rarely described highly formal or systematic means of checking LLM outputs; checking often amounted to more of a sense check. For example, in deriving a ranking of possible events that people might be interested in attending, Interviewee 18 reported checking the final lists to ensure they looked reasonable, but in developing the pipeline had not exhaustively checked the initial list of events to ensure that potentially very relevant events were not being missed. However, Interviewee 17 reported that he and his colleagues had been prototyping a means of more systematically assessing the outputs of LLMs, and variability across which models were used and how they were prompted, using a ‘ground truth’ of correct responses against which model outputs might be compared.
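The ground-truth approach Interviewee 17 described might be sketched roughly as follows. Everything here is illustrative: the question set, the exact-match scoring, and the two stubbed "variants" are our own assumptions, and a real harness would likely use fuzzier matching or an LLM-based judge rather than string equality.

```python
# Hypothetical sketch of a ground-truth evaluation harness: run each
# model/prompt variant over questions with known answers and score accuracy.

GROUND_TRUTH = {
    "What are the opening hours?": "mon-fri 9am-5pm",
    "How long is the returns window?": "30 days",
}

def accuracy(answer_fn, ground_truth):
    """Fraction of questions answered exactly right (case-insensitive)."""
    correct = sum(
        answer_fn(q).strip().lower() == a for q, a in ground_truth.items()
    )
    return correct / len(ground_truth)

# Stubs standing in for two model/prompt configurations under comparison.
variants = {
    "model_a": lambda q: "Mon-Fri 9am-5pm" if "opening" in q else "30 days",
    "model_b": lambda q: "we are always open",
}

scores = {name: accuracy(fn, GROUND_TRUTH) for name, fn in variants.items()}
```

The value of even a crude harness like this is that it turns the informal "sense check" most interviewees relied on into a repeatable number that can be compared across models and prompt variants.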
Some discussions also revealed more subtle potential failure modes. For example, Interviewee 5 discussed how sometimes an LLM might suggest a path to tackling a problem that would work for the proximate tasks, but which a more experienced developer might recognize would lead to a roadblock further down the road.
Knowledge, strategies, and best practices for LLMs
Techniques
Although not all interviewees were familiar with the specific term ‘prompt engineering’, almost all of them were engaging in the practice to some extent. For example, multiple people described having learned how best to prompt their LLM of choice, often using known prompt engineering principles such as breaking a target problem into small steps, providing very explicit instructions, and requesting outputs in particular formats. They also noted that you may need to ask for particular tones of voice, or for responses aimed at particular audiences, so as to avoid certain generic ‘LLM styles’. The majority of interviewees seemed generally aware that the quality of responses from an LLM could depend heavily on how it was prompted.
Fine tuning, on the other hand, was rarely mentioned. This seems reasonable given that most interviewees were end users of LLMs rather than people developing LLMs themselves. Interviewee 1 did report having conducted some pretraining and fine-tuning on insurance policies as part of a pipeline he was developing for an LLM-assisted customer service agent.
One additional technique that we had not asked about, but which came up in discussion with Interviewee 17, was retrieval augmented generation (RAG). RAG, a technique that can be used to provide additional contextual information to an LLM along with an incoming prompt, was incorporated into the product that Interviewee 17 was developing so that people chatting with the agent could more reliably query it about company-specific information.
Interfaces and plugins
The bulk of our interviewees interacted with LLMs primarily through the typical chat interfaces. There is, however, an increasing array of proprietary chat-based models, which some of our interviewees were either developing for other companies or using as part of their roles within companies. In addition, Interviewee 19 reported that his firm used a wrapper around ChatGPT and Claude so that no information was shared with these LLMs' parent companies.
Some of the most technically impressive use cases involved the development of pipelines that interacted with model APIs, sometimes piecing together various LLM-related tools. We have already mentioned Interviewee 2 having developed an LLM-based assistant for insurance agents that used both Whisper and GPT to provide the customer service representative with information pertinent to the case. This interviewee had developed similar tools that provided insights about calls, such as customer mood and the nature of the complaints being received. Another example was Interviewee 3, who built a proof-of-concept application that would scrape information and images from fashion stores, augment that information by feeding the images to a text generator for more elaborate descriptions, and then use an image generator to iterate on fashion ideas using the description as a prompt. While this was a toy example for the user, he reported discovering that a fashion label was apparently doing something very similar.
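The shape of the insurance-call assistant can be sketched as a three-stage chain. Every stage below is a stub under our own assumptions (the policy texts, the lookup logic, and both model calls are invented for illustration); in the real pipeline, `transcribe` would be a speech-to-text model such as Whisper and `draft_guidance` an LLM call.

```python
# Hedged sketch of a call-assistant pipeline: transcribe audio, look up
# the relevant policy, and draft guidance for the human agent.
# All three stages are stubs standing in for real model/API calls.

POLICIES = {
    "water damage": "Water damage is covered up to $5,000 with a $250 excess.",
    "theft": "Theft claims require a police report filed within 48 hours.",
}

def transcribe(audio):
    # Stand-in for a speech-to-text model (Whisper's role in the pipeline).
    return "Customer is asking about a water damage claim on their kitchen."

def find_policy(transcript):
    """Return the first policy whose topic appears in the transcript."""
    for topic, text in POLICIES.items():
        if topic in transcript.lower():
            return text
    return "No matching policy found."

def draft_guidance(transcript, policy):
    # Stand-in for the LLM call that turns raw policy text into guidance.
    return f"Relevant policy: {policy}"

transcript = transcribe(b"...")
guidance = draft_guidance(transcript, find_policy(transcript))
```

The chaining itself is the point: each stage's output becomes the next stage's input, which is what lets the agent receive policy information in real time rather than searching documents mid-call.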
While interviewees typically were not using more than the standard LLM interfaces, some noted feeling that LLMs could be more valuable to them if they were more integrated into the different tools they were using (e.g., Interviewee 11).
Automation and future use cases
Current automation
For most of the tasks that interviewees discussed with us, humans were still very much in the loop. Interviewee 1, for example, discussed how working with an LLM was something akin to working with a ‘mediocre employee’, and only a quite narrow set of tasks could be outsourced. Others described working with an LLM as being like having a ‘very effective assistant’ (Interviewee 4). Several interviewees noted how the best outputs from LLMs were generated by breaking the task down into manageable chunks and going through piece by piece, rather than just assigning a task and letting the LLM run with it. Others noted feeling like you sometimes have to ‘spoon feed’ the LLM (Interviewee 4) or have a lot of back and forth with it (Interviewee 5). Hence, most interviewees were heavily using LLMs for a range of tasks, but usually just via the typical chat interfaces, and reported being very directly involved in the process.
However, a handful of interviewees did indicate relatively high levels of automation for some tasks. These use cases tended to come from people who were actively involved in developing LLM-based pipelines for commercial purposes. For example, Interviewee 2 had helped work on more than one customer service-related pipeline in which an LLM was integrated. As we discussed above, one such case was a chaining of Whisper (voice to text) with GPT to generate real time feedback to a customer service agent about what policies and regulations were relevant to their ongoing call. This interviewee had generated a similar pipeline that would log information about calls and issues faced by customers for another company.
Automation of some aspects of customer service was also indicated by other interviewees: for example, Interviewee 8's organization was using an LLM-based chatbot that reportedly dealt with 30-40% of internal service requests. Similarly, Interviewee 10, head of a small customer service team at an energy supplier, indicated that an LLM-based bot was used as the first contact for customers before they were potentially transferred to a human agent. He suggested that 60% of customers don't proceed beyond the LLM.
Another example of a relatively more automated pipeline was Interviewee 7, who reported having developed a system that would track upcoming events from a host of charitable or church-related events and summarize and display information about them directly onto a website. This system appeared to now be working automatically, although he noted that its initial development, including simply collecting all the potential sources of information and manually looking into how to parse and extract the relevant information from myriad sources and webpages, probably took 5 or 6 months of calendar time to develop, and was very laborious.
Interviewee 12 provided an example of using a system that appears to have successfully automated a lot of geographic/satellite imagery tagging and labeling. Although he was still required to remain in the loop by providing inputs and checking outputs for this more automated process, he reported that what used to take a day’s work and required considerable, continued concentration, could now be done in about 4 hours.
Prospects for automation/future of LLMs in their industry
Many of our interviewees expected LLMs to play increasingly important roles in their industry in the coming years, although how transformative they expected this to be varied considerably. The two interviewees who felt that automation would play relatively little role in their specific jobs were the two academics (Interviewees 6 and 14). They indicated that the core or most important parts of their roles typically did not involve ‘rote’ tasks or standard, replicable procedures. Interviewee 14 expressed that, for those parts of her role that could perhaps be more automated – for example, cleaning data in certain ways – she would likely write her own Python script.
Interviewee 4, the corporate lawyer, felt that full automation might be more likely for certain administrative tasks. He expected LLMs would have a major role to play in the industry, but that legal professionals would still be needed. He argued that LLMs could not go to court and actually argue a case, nor could they get around requirements for licensing and certifications that are necessary for providing certain kinds of legal advice professionally.
Interviewees who were heavily involved in coding, such as Interviewees 6 and 17, thought that the clearest prospects for full automation were relatively simple tasks such as producing code documentation/docstrings, or changing naming conventions and how a project is structured. Interviewee 13 similarly highlighted relatively simple tasks around automatically cleaning and preparing data, such as removing special characters and typos. Interviewee 7 reported that while many parts of his job might potentially be automated, he did not see development in general as easily automatable, as it requires understanding across many different layers of complexity. Similarly, Interviewee 17 did not feel that what he had seen of LLMs so far was enough to entrust them with completely automating development. This sentiment was echoed by Interviewee 19, who reported that LLMs often produced buggy code outside of relatively simple tasks, and that one would want to keep a human in the loop for the foreseeable future. Yet he thought it conceivable that updates to current LLMs could make them much more capable, as they already seemed within reach of high competence. He also thought it possible that current LLMs would already perform much better in an environment where they had access to all the information a typical programmer has, such as being able to test out and iteratively debug code.
Interviewee 11, the health-related content writer, felt that a lot of her current role could be automated. Because much of the information that her content was based on is publicly available, she deemed it possible to scrape a vast amount of this information and collect it together, and then perhaps just automatically produce a lot of new content on the basis of it. However, she noted that a key selling point of their organization’s content was its reliability, which would be in jeopardy if it were just being churned out automatically.
One interviewee (Interviewee 8) anticipated very large amounts of automation sweeping across the customer service sector, expecting as much as 50% of customer service roles to be automated away in the next 12 months and as much as 90% in the next 3 to 5 years. He felt that current tools were only scratching the surface of what was possible, and more automation might be unlocked with multimodal systems. However, as others had noted (e.g., Interviewees 4 and 11), the possibility of automated systems making mistakes might be a barrier to companies leaning into automation – especially as it was unclear who would be liable for such mistakes.
In discussing possible reasons why efforts at automating processes with LLMs might have failed in various cases she had tried to help people with, Interviewee 18 noted that a common element was the difficulty of integration. Many desired use cases might involve extracting information from, or combining information across, multiple existing systems to feed into LLMs, and this seemed to be particularly difficult to do.
Interviewee 10 highlighted a host of processes he thought it would be good, and possible, to automate with LLMs – from data entry and report writing, to summarizing customer emails and providing solutions, to predicting energy consumption from weather patterns and integration into the physical network to anticipate equipment failure. Notably, not all of these ideas seem especially geared towards the capacities of LLMs specifically: a typical ML or other statistical engine may be better suited to (and already capable of) predicting energy usage from weather forecasts, and some other type of automated system is what one would expect to be integrated into the physical infrastructure of an energy grid to detect issues. It thus seems some people may be highly speculative or wishful about exactly what an LLM-based system specifically would add in terms of automation.
Perceived threats of job loss/replacement
As with the variation in the extent to which people believed elements of their job could be automated, there was variation in the perceived threat of LLMs to one's job, or the prospect of job loss. The two academic interviewees did not see LLMs as a threat to their roles, given that they did not think many of their responsibilities could be automated away. Similarly, the corporate lawyer did not anticipate his role becoming obsolete, as he felt LLMs could not get around things such as licensing requirements, and that his knowledge of the law was necessary to guide and interpret the LLM.
Interviewee 2 emphasized that, based on his experience working with various companies, companies seemed to always want to automate as much as possible. Still, the prospect of costly mistakes presented a challenge to extensive automation, and this risk was referenced by many of the interviewees (e.g., Interviewees 6 and 11). Even so, both Interviewee 2 and Interviewee 8 – who had both worked in roles related to customer service – expressed that a large number of customer service oriented roles were at risk, especially at lower levels (in turn, some managerial roles might also be at risk: as Interviewee 8 noted, if there is no longer a team of service agents, there is no need for a team manager either).
One theme that arose across multiple interviews was that certain roles were seen as particularly at risk of automation or job loss. These at-risk roles were perceived to be gig workers/freelancers (for example, content writers or image creators), some types of contract work, and some entry level or ‘stepping stone’ early career roles. For example, Interviewee 6 noted that gig workers would be at risk because various simple tasks could now be completed with the help of an LLM alone. He also believed that barriers to entry could rise, as one would need to show capabilities beyond what could be done with an LLM in order to land a role. Likewise, Interviewee 8 suggested that lower level jobs were particularly at risk.
Interviewee 1, the policy generalist, noted that he already considered some of the LLM writing he'd generated to be better than what he would have paid a contractor to create. As such, he suggested certain roles, such as speech writing, could become obsolete. Some other interviewees suggested particular skills might become obsolete – for example, Interviewee 8 felt that people focused purely on coding (‘code monkeys’) could be in for a shock in the job market, and referenced the Nvidia CEO having reportedly suggested that people not study computer science but instead become domain experts in areas where AI might end up being used.[3] Several interviewees suggested that their training or particular knowledge of the domain in which they were using LLMs was key to utilizing LLMs effectively – for example, knowledge of the law for Interviewee 4, but also knowledge of coding and a broad understanding of what one wanted to make (Interviewees 5 and 13).
A theme that arose across multiple interviews was that, even while people thought LLMs could substantially impact their industry, they were not necessarily concerned about losing their jobs. One expression of this was people indicating that LLMs would simply enable them to focus on different aspects of their work (for example, strategy or networking, as in Interviews 11 and 18). Alternatively, several interviewees indicated that their roles would adapt. They felt confident that, so long as they kept abreast of the latest developments in LLMs and could be the ones deciding upon and managing the deployment of LLMs, they would be safe – it would be people who don't know how to use LLMs who are most at risk (Interviewees 8, 9, 13, and 15). Numerous interviewees generally expressed that their jobs would just evolve, and that it was necessary to be able to adapt and keep up with changes.
Limitations, concerns, and threats
At the close of the interviews, interviewees were asked if they perceived any limitations of LLMs, or had any concerns about their usage as part of work. Some limitations arose across multiple interviews.
Privacy and intellectual property
Multiple interviewees raised concerns over the privacy and security of information that was fed into LLMs. The corporate lawyer (Interviewee 4) noted that he had to be exceedingly careful, as many projects involved privileged company information that must not be exposed via an LLM. Similarly, Interviewee 3 – a consultant – indicated that the finance department in one company he had worked with was particularly concerned about the prospect of information leaking through an LLM, and preferred more traditional solutions. Multiple interviewees involved with coding noted concerns over intellectual property and the licensing of code (Interviewees 5 and 9), including some companies banning the use of Copilot (Interviewee 7). These concerns covered both the prospect of one's own proprietary code or other personal information somehow leaking to other users or being used to train LLMs, and the possibility of inadvertently infringing on others' proprietary information. Solutions may be available, of course: Interviewee 19 reported that the firm he worked for used a wrapper around GPT and Claude to shield information from being shared outside the company.
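One simple form such a privacy wrapper might take is a redaction layer that scrubs obviously sensitive strings before a prompt ever leaves the company. The sketch below is our own assumption about how this could work, not a description of the firm's actual system; a real wrapper would be far more thorough (named-entity detection, allow-lists, audit logging).

```python
# Hypothetical sketch of a privacy wrapper: redact sensitive patterns
# before handing a prompt to an external LLM API.

import re

PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def scrub(prompt):
    """Replace each sensitive pattern with a neutral placeholder."""
    for pattern, placeholder in PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

def guarded_llm_call(prompt, send):
    """Redact first, then let `send` forward the prompt to the model API."""
    return send(scrub(prompt))

out = guarded_llm_call(
    "Summarize the ticket from jane.doe@example.com re: SSN 123-45-6789",
    send=lambda p: p,  # stub in place of a real API client
)
```

Because redaction happens before the `send` call, nothing matching the listed patterns can reach the external model, regardless of which provider sits behind the wrapper.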
Out of date information/access to contemporary information
Several interviewees reported that many LLMs were limited by not having sufficiently up-to-date information, or by an inability to actively update their information via web search (Interviewees 1, 4, 6, and 11). For example, the corporate lawyer noted that one might still need to supplement LLM-based search with checks for any more recently added regulations, and the health content writer (Interviewee 11) noted that health information is frequently changing or being updated with new knowledge. This limits the possibility of relying upon, or fully delegating tasks to, an LLM.
Hallucinations/inaccuracies
Multiple interviewees noted that the prospect of inaccuracies in LLM responses – and in particular hallucinations – presented an issue, especially with respect to trusting automated processes involving LLMs. Interviewee 6, the computer science PhD candidate, noted that GPT would hallucinate fake papers when he was asking it for information. Similarly, Interviewee 8, who worked in tech, emphasized how we did not really understand how LLMs worked or why they hallucinate, which limits how much we can trust them. Hallucinations were also raised as a possibly permanent issue by Interviewee 17, the data scientist working on developing AI applications for businesses.
Interviewee 14, a recent PhD graduate, reported having become less confident in relying upon ChatGPT the more she had interacted with it, noting that it did not seem to differentiate between cases where it was more or less likely to be correct in its responses – it always sounded confident. Interviewee 16 similarly noted that when ChatGPT was incorrect, it was confidently incorrect, and this could be misleading. In a similar vein, Interviewee 18 indicated that ChatGPT's capacity to write polished prose could sometimes mask mistakes or poor reasoning.
On the other hand, a couple of interviewees thought that one limitation holding back the power of LLMs might be excessive concern about their accuracy: Interviewee 10 – the head of a small customer service team – noted that some people are so concerned about accuracy that they manually check all the responses and therefore gain nothing from their usage. Interviewee 16, who handled a lot of customer information at a delivery/shipping firm, felt that the automated system he used was very rarely incorrect and only getting better over time.
Concerns over training data
An extension of concerns over inaccuracies and hallucinations was the possibility that an LLM might be trained on data that was already inaccurate or contained conspiracy theories (Interviewee 4). Looking into the future, Interviewee 15 was concerned that as LLM-generated content proliferated, the training data could become increasingly untrustworthy or further removed from the ground truth.
Liability
Following on from considerations about inaccuracies and hallucinations, several interviewees noted concerns over liability with respect to LLMs and automated systems. Interviewee 8 stressed how one mistake caused by an automated system could destroy a business or its reputation, and Interviewee 5 – the 3D technical artist – noted how using automation in a production setting could be very risky if the cost of a mistake in the system were high. Interviewee 2, who had worked on producing AI-powered customer service pipelines, noted there was some uncertainty around who would be responsible for mistakes if a customer complained about some kind of automated decision, and thought that regulation was needed to ensure that companies would be held accountable and be cautious in developing and deploying LLM-powered systems.
Stylistic concerns
Some interviewees reported having stylistic concerns about LLM outputs. Many interviewees spoke of different LLMs as having different ‘personalities’ or styles, which some of the interviewees in turn felt limited their utility. Certain ways of speaking or unusual word choices were felt to be diagnostic of LLM-produced text, and this style was often felt to be inappropriate for a range of use cases (though some interviewees did note that you could prompt LLMs to respond with different styles).
Poor performance in low information domains
Some interviewees noted that LLMs may perform poorly in domains where the interviewee believed there was probably less training data – and in some cases this was perceived as leading to biases as well as poor performance. This general possibility was noted across a range of use cases, from simple writing/understanding, to labeling of satellite images, to coding. The 3D artist in Interview 5, for example, noted that GPT seemed to perform worse when working with the 3D software Houdini, which he believed was because Houdini has much less content available online. Interviewee 12, who worked with satellite imaging and disaster management, felt that the semi-automated system he was using performed more poorly in rural than urban areas, and also when the local language of the area being mapped (and therefore its place names) was not a widely used language such as English or French. This was problematic because many of the natural events they were trying to map often affected rural areas more than urban ones. Similarly, Interviewee 9 felt that LLMs were somewhat limited by their capacities across different languages. Interviewee 10 also felt that, in responding to customer complaints and understanding issues, LLMs were limited in failing to comprehend local variations in languages and colloquialisms that might be present across different regions.
Skill degradation and laziness
Some interviewees expressed concerns that people may become lazy or lose skills through their use of, or reliance upon, LLMs. Interviewee 15, a data scientist, stated that some of the copywriters in his organization seemed to be getting lazy and producing clearly ChatGPT-written content without modification. Others similarly expressed that relying on ChatGPT might prevent the development of some skills that they felt were important, such as writing skills (Interviewee 6). Interviewee 6 noted feeling that his reliance upon LLMs may be causing some of his coding skills – or at least his memory for different functions – to atrophy. However, he did not consider this specific case a serious loss, as his core contributions were not just remembering functions or particular scripts – the coding, just like using an LLM, was a means to an end. Interviewee 5 expressed concern that not getting a grounding in how to produce and think about coding projects, and solely relying on LLMs, could result in poor habits being instilled.
Other concerns
Almost none of the interviewees expressed concerns over more existential or global threats posed by LLMs, although this may be due to the interview focusing on work-related tasks. Interviewee 8 expressed some concern about the use of LLMs by bad actors or hostile countries to produce weapons of mass destruction, and also raised concerns about the substantial energy consumption involved in training and running LLMs. Two respondents also raised the cost of running LLM systems or using an LLM API as a possible issue.
Resistance or skepticism from colleagues
Interviewees were prompted to consider whether they had encountered any resistance or skepticism over the usage of LLMs from colleagues or managers. Having worked with a range of different organizations, Interviewee 2 said there was substantial variation in attitudes towards AI among leadership. Several respondents noted there may be an age or generational gap in acceptance of AI, with the corporate lawyer noting younger people and recent graduates being more receptive, and Interviewee 18 reporting that it was typically older people at presentations who seemed more skeptical. In addition, Interviewee 12 reported that a shift towards using more AI had occurred in their company when a new, younger management team came in. The computer science PhD candidate also reported that his supervisor had a somewhat ‘old school’ attitude towards such tools and was skeptical of their value.
The most common response to this question was that people had encountered little skepticism or concern regarding their use of LLMs for work. Multiple interviewees reported that the key thing was delivering good work, and their managers were not especially concerned with how this was produced. Interviewee 14 thought that she was probably the most skeptical among her colleagues and supervisor. Other interviewees reported their colleagues were all using LLMs in similar ways to them, and that they were sometimes encouraged to use them either by colleagues or by managers.
One interviewee said that, when previously working at a major online retail company, the use of LLMs was shut down due to concerns over licensing the code that they produced (Interviewee 5). Interviewee 7 indicated that their company did not allow the use of Copilot for similar reasons. However, most people reported no official policies with respect to the use of LLMs in their company.
One other source of resistance or skepticism that was noted by a few interviewees was that customers may be skeptical of interacting with an LLM-based bot. Interviewee 10 expressed a desire for LLMs that would be difficult to distinguish from a real human (or for no obligation to disclose that they were not human), because some people want to skip any automated interactions and talk directly with a person. Interviewee 8 similarly noted that people still want a human element in customer service. In a somewhat similar vein, Interviewee 17 reported interactions with other teams at their organization, who dealt with customer questions or complaints and wished to understand exactly why LLMs sometimes answered the way that they did.
Conclusion
Our qualitative interviews with LLM power users revealed several core tasks for which LLMs are being used in the workplace, ranging from information gathering and writing assistance to the development of sophisticated, automated pipelines. Although users often felt they had benefitted substantially in terms of productivity, they highlighted that real or perceived limitations of LLMs meant that humans currently remained in the loop. Nevertheless, interviewees voiced significant concerns regarding possible job loss or job transformation, with customer service roles and junior or stepping-stone positions being especially at risk. The findings from this series of interviews were used as the basis for more quantitative assessments of LLM use in both the general public and amongst workers with programming experience, which shed further light on LLM adoption and use cases.
Appendix
Semi-structured interview template
Thank you for agreeing to participate in this interview, we really appreciate it and are excited to hear about how you’ve been using LLMs.
Before we begin, I’d like to briefly introduce myself and what we’re doing with this interview. I’m [NAME] and I work for a non-profit organization called Rethink Priorities. In this project we are seeking to understand how people are using LLMs in their day-to-day work, to better understand the capacities of LLMs and how their uses might influence the workplace and society more broadly. This is not a commercial project, but we are really interested to know how you’re using these tools in a work-related or commercial setting.
This is going to be what we call a semi-structured interview, so while I do have some specific questions, we are expecting that the conversation might go in various different directions and I encourage you to elaborate on your answers and just tell us what you’re thinking.
It would be great for you to introduce yourself as well, but before we do that I just wanted to check with a procedural question, which is just to confirm that you are ok for this conversation to be recorded? The transcripts might be shared with our research partners, but we will redact any personally identifying information such as your name or the company you work for. We will not share the actual video. Information that we gather from these interviews might be used in a public report on the topic, but again this information will not be personally identifying.
[If no, we need to cease the interview]
Great, thank you. So for the next XX minutes, I’d like to talk about how you use LLMs in your work. We will start by having you introduce yourself and your responsibilities at work. Then we’ll go on to discuss how valuable you have found LLMs to be and for what specific use cases. We’ll cover any particular strategies or best practices you might have developed, as well as any ways in which you might have automated your workflow with LLMs. Finally, we’ll talk about challenges and limitations you see with LLMs. At the very end of the interview I will also link you to a very brief set of demographic questions for you to answer in another tab, just so we have some information about the different types of people we’ve interviewed.
Basic background information
[The idea being to elicit from the respondent some information about who they are and what they do, as well as which LLMs they use and which might be most useful to focus upon in the discussion]
- So, please go ahead and tell me a bit about yourself, and describe your current role and responsibilities in your organization.
- Which large language models (LLMs) do you currently use in your work?
- How frequently do you use each LLM?
- Do any of these LLMs stand out as having been more useful or transformative for you than others?
Checklist:
- Job role and responsibilities
- Which LLMs used
- Frequency of using each
- Which are most useful/transformative
Use value and use cases of LLMs
[Depending on how many different models people report using, it seems likely we’d need to limit discussion to focus on the most used or the most useful of the LLMs they mention]
- For what specific tasks do you use [most frequent/most useful] LLMs? [This could plausibly be quite an extensive discussion, and for each use case discussed we might seek to cover:]
- How useful do you find [LLM] for doing this? How would you rate the usefulness of [LLM] for this task?
- Prompts: Does it help you to complete this task a lot more quickly than you would otherwise? [How much more quickly are you able to complete this task]
- Does it make doing this task a lot easier? [How much easier does it make this task]
- Are you able to do a lot more of this because of the use of LLMs?
- This could include speed of completion, ease of completion, breadth of what can be covered, among other things
- How, if at all, has the incorporation of [LLM] into your workflow changed the way you approach or complete your work/this task?
- Prompts: Was this task part of your work responsibilities previously, or a new task that is enabled by LLMs?
- Do you check that the outputs or results generated by LLMs are accurate and reliable for your purposes, and if so, how?
- Do you know of any colleagues who use LLMs for use cases other than those you’ve described above, or are LLMs involved in any notable aspects of your organization’s work – what are these use cases?
- What is your impression of the value of this use/quality of the work produced?
- How useful do you find [LLM] for doing this? How would you rate the usefulness of [LLM] for this task?
- If this has not been covered in the previous more specific questions, then: More broadly, has the incorporation of LLMs into your workflow changed the way you approach or complete your work?
- Are there any useful or interesting ways you have used LLMs in your work that you think might be unique to you, or are not often discussed or mentioned by other users? [aiming to ensure that we do not miss novel or interesting use cases that are not necessarily how the user most frequently engages with LLMs, or their most often used or most useful LLM in general]
Checklist
- Specific tasks done with LLM
- How useful is LLM in tasks
- Has LLM changed workflow
- Colleagues who’ve used LLMs
- Possible unique uses of LLMs
- Check the outputs are accurate and reliable
Knowledge, strategies, and best practices for LLMs
- Have you explored or used techniques such as prompt engineering or fine-tuning to customize LLMs for your specific needs?
- What strategies or best practices have you developed for effectively prompting or interacting with LLMs?
- Are there any interfaces/plugins/APIs/tools/programs you have used to more usefully or seamlessly integrate LLMs into your workflow? [examples might be copilot and other things to get LLMs in the coding space, setting up speech to text pipelines, code pipelines that otherwise interact with LLMs]
Checklist
- Have they used techniques such as fine-tuning or prompt engineering
- Development of any strategies or best practices
- Interfaces/tools to integrate LLMs into workflow
Automation and future uses
- Are there any tasks for which you have created pipelines that essentially automate tasks by delegating them to an LLM? Can you describe these?
- Are there any tasks or responsibilities you currently perform that you believe could be fully automated or taken over by LLMs in the future?
- What features/developments do you think would be needed for this to take place?
- How do you see the role of LLMs in your industry evolving in the near (1-2 years) to medium term (5-10 years) future?
- What new capabilities or advancements in LLMs would be especially valuable or impactful for your work?
- Do you perceive LLMs as potentially threatening or replacing certain aspects of your job responsibilities or those of your colleagues?
Checklist
- Any tasks automated with use of LLMs?
- Tasks that could be automated – what would be necessary for this
- Role of LLMs in near to medium term future
- Potential threat/replacement of responsibilities
Limitations, concerns, and threats
- What are your main concerns or reservations, if any, about relying on LLMs for work-related tasks?
- Prompts: Have you encountered any sticking points or bottlenecks when using LLMs in your work?
- Are there any significant limitations of LLMs that have prevented their use in your work, or negatively impacted your work?
- Have you encountered any resistance or skepticism from colleagues or managers about using LLMs?
Checklist
- Concerns about LLMs
- Limitations
- Pushback
Ok, thank you. Are there any other thoughts you have about LLMs that we haven’t covered in what we’ve discussed that you’d like to share?
Great, the final thing I’d like to do is just to have you go to this link and provide some simple demographic information. We’re doing this separately from the interview to help keep any identifying information apart from our discussion. Could you let me know if the link is working, and when you’ve completed the questions?
Contributions and acknowledgments
Jamie Elsey, Willem Sleegers, and David Moss developed the research project and designed interview materials. Willem Sleegers recruited interviewees. Jamie Elsey conducted the interviews and wrote the report. David Moss reviewed and edited the report.
