Editorial note: This is an opinion piece by RP’s CEO Marcus A. Davis. It was originally published as a post on the Effective Altruism Forum in July 2025.
Summary
In the following article, I argue that most of the interesting cross-cause prioritization decisions and conclusions rest on philosophical evidence that is not robust enough to justify high degrees of certainty that any given intervention (or class of cause interventions) is “best” above all others. In particular, the case for high confidence in conclusions about which interventions are, all things considered, “best” seems to rely on particular approaches to handling normative uncertainty. The evidence for these approaches is weak, and different methods can yield radically different recommendations, suggesting that cross-cause prioritization rankings and conclusions are fundamentally fragile and that high confidence in any single approach is unwarranted.
I think the reliance of cross-cause prioritization conclusions on philosophical evidence that is not robust has been previously underestimated in effective altruism (EA) circles, and I would like others involved in the movement (individuals, groups, and foundations) to take this uncertainty seriously, not just in words but in their actions. I’m not in a position to say what this means for any particular actor, but my big takeaway is that we should be humble in our assertions about cross-cause prioritization and not confident that any specific intervention is, all things considered, “best,” since any particular intervention or cause conclusion is premised on a lot of shaky evidence. This means we shouldn’t be confident that preventing global catastrophic risks is the best thing we can do, nor should we be confident that it’s preventing animal suffering or helping people in extreme poverty.
Key arguments I am advancing:
- The interesting decisions about cross-cause prioritization rely on many philosophical judgments (more).
- Generally speaking, I find the type of evidence for these types of conclusions to be weak (more).
- I think this is true when you consider the direct types of evidence you get on these questions.
- I think it’s also true if you step back and consider what philosophical arguments on these topics look like compared to other epistemic domains.
- Aggregation methods for handling normative uncertainty (i.e., decision procedures) profoundly disagree about what to do at an object level, given the same beliefs and empirical facts (more).
- Aggregation methods may matter more than normative ethics, particularly if you have low credences across a variety of normative theories.
- Small changes to the set of included projects can radically change the outputs.
- If you were uncertain over normative theories and aggregation methods (which you should be), you likely wouldn’t end up strongly favoring any of global health and development (GHD), animal welfare (AW), or global catastrophic risks (GCR).
- Small amounts of risk aversion can dramatically change results and tend to favor GHD and AW (public post).
- The evidence for using any given aggregation method to make decisions is weak (more).
- Generally speaking, quantitative studies provide more robust evidence than the philosophical and empirical foundations of cause prioritization do. Moreover, EAs are very skeptical of the strength of evidence available from single RCTs or causal inference via observational studies (more).
- Not only are we skeptical of the specific evidence from those studies, but we are also presumptively skeptical of even high-quality studies: we require additional corroborating evidence, and the survival of rigorous scrutiny, before accepting conclusions drawn from such evidence.
- The resulting uncertainty about what to do is a serious problem, but I’m not convinced that some possible responses here eliminate or significantly reduce this problem.
- The nature of what “effective altruism” is or the idea of “doing the most good” doesn’t justify ignoring this uncertainty and leaning into a particular form of consequentialism or utilitarianism (more).
- Nothing about this justifies a largely intuition- or “priors”-driven approach to determining cause prioritization. If anything, relying heavily on intuitions is worse from an EA perspective than relying on relatively weak philosophical evidence (more).
- Being an anti-realist about ethics may eliminate the concern, but it may only do so if you basically commit to an “anything goes” version of anti-realism. Also, anti-realism isn’t equally plausible across all relevant domains, and the concern about philosophical evidence being weak may undermine staking out such strong anti-realist claims (more).
- You cannot escape this by claiming that an approach to cross-cause prioritization is “non-philosophical” or a common-sense approach (more).
- This isn’t a reductio ad absurdum against the idea of comparing charities or doing cross-cause comparisons at all. It doesn’t justify an “anything goes” approach in philanthropy, and we can still rule that some charities are better than others (more).
- Some might argue that they have high confidence in particular normative theories and/or specific aggregation methods, which would limit this concern. But even if you do (and I don’t think you should), I believe both aggregation methods and normative theories are underspecified enough that even with this stipulation, it’s probably unclear what you should practically do (more).
- I’m not in a position to say what this means for any particular actor, but I think a big takeaway is we should be humble in our assertions about cross-cause prioritization.
- I’d be keen to see more work on these meta-normative questions of how effective altruists should act in the face of uncertainty about these questions.
- I take these concerns as evidence in favor of diversifying approaches, though justifying that position in greater detail is beyond the scope of this post.
- I think you should remain skeptical that we have really resolved enough of the uncertainties raised here to confidently claim any interventions are truly, all things considered, best.
Cause Prioritization Is Uncertain, and Some Key Philosophical Evidence for Particular Conclusions Is Structurally Weak
The decision-relevant aspects of cross-cause prioritization rely heavily on philosophical conclusions
Where should we give money and resources if we want to take the best moral actions we can? I take this to be perhaps the central question at the heart of the effective altruist (EA) approach to doing good. That there are better and worse ways of improving the world, and that we can use evidence and reason to determine where to give to select better options, is also a key pillar of this approach.
I think this claim is important and true. Still, it can be overstated in many circumstances if it’s taken to be a claim that we can know with high confidence how many potentially very promising interventions rank relative to each other.
It’s relatively easy to argue, all else equal, that it’s better to save 100 lives rather than 10 or that interventions with robust evidence of effectiveness are more appealing than those without such evidence. It’s harder (but still relatively easy) to argue that, when spending charitable dollars, it’s better to save 100 lives than spend the same amount of money exposing 100 people to art for one hour each. Not everyone is on board with these claims[1], but they are very difficult to argue against. These are largely taken as background considerations within EA. Importantly, these points aren’t particularly contingent on highly contentious philosophical claims and can be endorsed by people with different views about normative ethics (i.e., deontologists and utilitarians), how to value present vs future people, which decision theory to use, how to compare human to animal welfare, and how to deal with moral uncertainty.
However, some of the main topics of EA concern, such as weighing how causes (like global health and animal welfare) or interventions (say, malaria nets vs corporate campaigns to improve hen welfare) compare to each other, do not turn on uncontroversial questions like whether saving 100 lives is more valuable than saving 10. Instead, I think EA cause prioritization decisions often rest on (implicit or explicit) philosophical considerations that are much tougher to justify holding with high confidence. This is because deciding between, say, malaria prevention and pandemic prevention necessarily involves, among other things, consideration of normative views (which ethical theories to use), decision theories (what procedures we should use to select actions), and population ethics (how we handle problems when actions affect current and future populations). Not only can it be very difficult to compare truly disparate outcomes well, but the outcomes of these comparisons are also often heavily theory-laden and fragile to changes in the assumptions or approach used, as WIT pointed out in their recent post on the different types of cause prioritization.
I think this is a general problem for cause prioritization, given the reliance on philosophical evidence and particularly a problem for handling decisions under normative uncertainty (how we should combine our ethical views into a single choice if we aren’t sure about a given theory), and selecting among competing methods to aggregate views given uncertainty. This, in turn, is a significant problem for achieving high certainty about which specific interventions are best because aggregation methods disagree on what to do, and, in my opinion, the evidence supporting aggregation methods is weak.
Philosophical evidence about the interesting cause prioritization questions is generally weak
Let’s consider an inside view and an outside view of how we can think about the strength of evidence for cause prioritization. The inside view is from the perspective of the particulars about the concrete arguments about cause prioritization. The outside view abstracts away these particulars and instead examines the broader epistemic landscape of similar problems. I think both suggest that we shouldn’t have high confidence in the philosophical arguments needed to justify strong views about the action-relevant points for cross-cause prioritization within EA.
An inside view against having strong views
In philosophy, for the most part, when it comes to major considerations like ethical theories (e.g., consequentialism vs deontology) and decisions under normative uncertainty, people aren’t even claiming to demonstrate that X is true, only giving some considerations in favor of X and against Y.
But almost every philosophical argument has a counterargument. I’m not a nihilist about reaching conclusions on any of these matters whatsoever, but on a lot of fundamental conceptual issues of cause prioritization (e.g., normative views, decision theory, aggregation methods, population ethics), the evidence in question is often a series of attributes nearly everyone agrees on while disagreeing about how much of a strength or weakness each one is, or about how to add up these considerations[2].
That is, one issue may be “disqualifying” for Susan but not for Tim. Tim thinks that expected value (EV) reasoning leads to fanaticism (the view that, no matter how small the probability of obtaining an outcome, there’s always a payoff N high enough that EV reasoning recommends pursuing it). Still, Tim thinks this is a cost worth paying to preserve the structure of EV reasoning[3]. By contrast, Susan may think fanaticism is a severe enough issue that we should look for alternatives that avoid it[4]. These are the typical terms of the debate when directly considering the specific evidence and arguments presented for adopting one view or another.
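To see the structure of the worry Tim accepts, here is a minimal numerical sketch (all quantities are invented for illustration, not drawn from any real intervention):

```python
# Fanaticism under expected value (EV) reasoning: for any probability
# p > 0, there is a payoff N large enough that the long shot beats a
# sure thing. All numbers below are purely illustrative.

sure_thing_value = 100             # e.g., lives saved with certainty
p = 1e-9                           # minuscule probability of success
N = 1e12                           # astronomically large payoff

ev_sure = 1.0 * sure_thing_value   # EV of the certain option: 100
ev_long_shot = p * N               # EV of the long shot: 1,000

print(ev_sure, ev_long_shot)       # EV reasoning picks the long shot, and
                                   # shrinking p can always be offset by
                                   # scaling N up further
```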
A particularly clear example of this dynamic appears in Chapter 4 of Beckstead’s 2013 thesis On the Overwhelming Importance of Shaping the Far Future, where he considers how different ethical views deal with causing additional people to exist. After considering a variety of population ethics views and thought experiments to test those theories against particular cases, he produced the following table on page 95. The details of the cases and views aren’t particularly relevant to the point I’m making, but “X” means “The view faces problems with this case”, “X*” means “The view faces problems with a version of this case”, and “?” means “It isn’t clear whether the view has intuitively implausible implications about this case”.
| View | The Happy Child | The Wretched Child | Obligation to have kids? | Sight or Paid Pregnancy | Repugnant Conclusions | Better to create happy people? | Bad to have kids? | Extinction Cases | The Risk-Averse Mother | The Medical Programmes | Disease Now or Disease Later | Mostly Good or Extinction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Strict Symmetric | ? | X | X | X | X | X | X | X | | | | |
| Strict Asymmetric | ? | X | X | X | X | X | X | X | X | | | |
| Moderate Symmetric (low weight) | ? | X* | ? | X | X | X | | | | | | |
| Moderate Asymmetric (low weight) | ? | ? | X* | X* | X | X | X* | | | | | |
| Unrestricted | ? | X | ? | | | | | | | | | |
Beckstead ultimately took this as evidence in favor of the “unrestricted” view, given that it fared best in this comparison, though he notes (i) this was his judgment of these cases and (ii) one could potentially produce a different set of cases to get a different result (and some of these cases are related)[5]. Those are concerns, but the bigger concern for me is that no view here is free of unintuitive implications. Given that, the actionable questions, to me, are: which unintuitive bullets are you willing to bite to endorse any view, how do you weigh those unintuitive demerits against each other, and how do you weigh these demerits against whatever benefits you get from the theory[6]?
This setup isn’t unique to population ethics. I think this dynamic (competing views with varying strengths and weaknesses, backed by competing claims across thought experiments that show different opposing views as unintuitive) also applies to the competing decision theories and procedures and normative theories that are central to cross-cause prioritization[7]. People may profess very high confidence that a given view is correct because they perceive their view to be very intuitive (or more intuitive than the alternatives), but weighing up which theories survive the intellectual obstacle course of thought experiments doesn’t look like the kind of evidence that can lead one to high confidence that we have ultimately come to the correct conclusion in these domains.
I think this is another way of saying that, in the domains most relevant to reaching concrete decisions in cross-cause prioritization, the evidence is often insufficient for coming to strong conclusions that a particular view is actually true. It’s really hard for me to believe we could be, say, 90% sure a particular view is right when the evidence looks like this (and I’d often argue you should be far less than 90% confident given this type of evidence).
An outside view on having strong views
Now abstract away the particular arguments and consider what this looks like from an outside perspective[8]. Suppose there were ten accounts of some historical biological issue, such as the origin of life, and a decades-long disagreement about which theories are correct. The field disagrees about which methodologies are most appropriate to address the question and about what each methodology, once applied, shows, and, given the nature of the debate, strong empirical evidence demonstrating that one theory is in fact true is unlikely ever to arrive. Further, as a social matter, suppose scientists who adhere to different theories all get peer-reviewed by the whole community, but there’s a strong social and financial incentive to advance the theory you started your career advocating for.
What credence (the measure of your beliefs’ strength) should you, someone who’s not a biologist, have in the correct theory? It seems very unlikely to me that the answer would be significantly higher than 0.1, and it certainly wouldn’t be > 0.5[9]. And even if you were a biologist in the field, given the peer disagreement, the weakness of the evidence, and the social incentives, it would seem unjustifiable to place much more credence than 0.2 in any one theory, even if, in practice, a large number of biologists profess credences of ≥ 0.75 in particular theories. Again, perhaps you can make some special case that would justify higher certainty, but it seems very unlikely that, on the merits of the epistemic situation, you should hold 0.75 credence in any theory.
Enough cause prioritization issues take this general shape that you should be really skeptical that you find yourself in the situation of the special biologist who is justified in giving ≥ 0.5 credence to their favorite theory. For these types of questions, the strongest argument on offer is often some version of pointing at an alternative view and saying, “That’s unintuitive!” That is not very compelling as grounds for high confidence in a particular view, given that people disagree, often strongly, about what’s unintuitive.
While decades of investigation have examined many major philosophical debates, there hasn’t been widespread, sustained debate on several issues related to EA. Interspecies comparisons of moral weight, aggregation methods across normative theories, the correct philanthropic discount rate, and many others are largely niche issues.
I submit that philosophy moves slowly, and even when it converges on an answer, or a series of answers, it takes robust discussion for that to happen. To a rough approximation, one could say that, historically, most philosophy has been wrong or at least misguided[10]. In the face of that type of history, I would argue that the best default attitude on any arbitrary philosophical position should be skepticism unless the position is extraordinarily well-mapped out.
It’s always true that “this time could be different,” but to come to strong conclusions on these issues at the heart of cause prioritization seems to me to be like William Godwin and his friends deciding in the early 1800s that they had resolved all the major issues of utilitarianism and now could give detailed object-level advice about how to set up the structure of government to enact the ideal version of utilitarianism.
Aggregation methods disagree
Last year, some of Rethink Priorities’ work investigated how different worldviews affect project funding decisions under normative uncertainty. It turned out that the recommended course of action depends significantly on how you combine credences across various worldviews. Different aggregation methods can lead to very different conclusions. This is because the way you weigh preferences and apply decision-making criteria can significantly change the outcome of your analysis.
To endorse one cause area as much more important than everything else, you have to adopt a number of philosophical premises[11], many of which are the subject of active philosophical debate and discussion.
I have used RP’s moral parliament tool to examine what would happen if you distributed your credences across various ethical theories[12]. You can view the results of that exercise, which show how different aggregation methods would distribute resources across interventions, in this spreadsheet and in Figures 1 and 2 below.
Figure 1: Allocations Across Causes and Aggregation Theories. Shows how resources allocated to cause areas vary across aggregation methods. More information on the aggregation methods used and how they are implemented in the underlying tool can be found here. Further details on the inputs used to produce these results can be found in Footnote 12, and all values are in this spreadsheet.
There are several takeaways from this.
- Aggregation methods may matter more than normative ethics, particularly if you have low credences across various normative theories. Think of “aggregation methods” as different ways of solving a complex puzzle. How you choose to solve the puzzle can significantly change the final picture, even if the puzzle pieces (your fundamental beliefs) stay the same. (See Figure 1).
- Depending on the aggregation method, the same set of normative and empirical commitments can assign ≥ 75%, and in some cases ~100%, of resources to any one of Global Health & Development (GHD), Animal Welfare (AW), or Global Catastrophic Risk (GCR), and, in some instances, to different particular projects within areas.
- Small changes to the set of included projects can radically change the outputs under some aggregation approaches. Adding a single additional project to consideration can shift the majority of resources away from what was previously favored toward a different project, and/or shift more resources to projects already under consideration. That is, adding a new project isn’t purely additive for the new project with corresponding drops for all other projects (see Figure 2)[13].
- If you were uncertain over normative theories and aggregation methods (which you should be), you likely wouldn’t end up strongly favoring any of GHD, AW, or GCR[14].
- Small amounts of risk aversion can dramatically change results and tend to favor GHD and animals (you can also see this demonstrated in WIT’s portfolio tool).
Figure 2: Consequentialism 1 vs Consequentialism 2. Shows how resources allocated to cause areas vary across aggregation methods when an additional project is added for consideration. Consequentialism 1 contains 3 projects for each cause area. Consequentialism 2 adds a single project on reducing the worst pest management practices to the nine projects previously considered and produces substantially different results for the Borda method and marginally different results elsewhere. More information on the aggregation methods used and how they are implemented in the underlying tool can be found here. Further details on the inputs used to produce these results can be found in Footnote 12, and all values are in this spreadsheet.
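To make the disagreement concrete, here is a minimal sketch with invented theories, credences, and choiceworthiness numbers (a toy model, not RP’s actual tool): given the very same inputs, a cardinal method like MEC and an ordinal method like Borda pick different winners.

```python
# A toy illustration of how two aggregation methods can disagree given
# identical credences and identical choiceworthiness assignments.
# Theories T1-T3, credences, and all values are invented for illustration.

credences = {"T1": 0.5, "T2": 0.3, "T3": 0.2}

# choiceworthiness[theory][option]; T3 is a high-stakes theory that
# assigns an enormous value to GCR work.
choiceworthiness = {
    "T1": {"GHD": 6, "AW": 10, "GCR": 0},
    "T2": {"GHD": 0, "AW": 5, "GCR": 9},
    "T3": {"GHD": 0, "AW": 4, "GCR": 100},
}
options = ["GHD", "AW", "GCR"]

# Maximize expected choiceworthiness (MEC): credence-weighted sum.
mec = {o: sum(credences[t] * cw[o] for t, cw in choiceworthiness.items())
       for o in options}

# Borda: each theory ranks the options (best = 2 points, worst = 0),
# and points are weighted by credence. Intensity is ignored.
borda = {o: 0.0 for o in options}
for t, cw in choiceworthiness.items():
    ranked = sorted(options, key=lambda o: cw[o])  # worst to best
    for points, o in enumerate(ranked):
        borda[o] += credences[t] * points

print("MEC scores:  ", mec)    # GCR wins (22.7 vs 7.3 and 3.0)
print("Borda scores:", borda)  # AW wins (1.5 vs 1.0 and 0.5)
```

Because MEC is sensitive to intensity, the low-credence, high-stakes theory T3 carries the day under MEC; Borda, which only sees rankings, hands the win to the broadly acceptable option.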
Evidence for aggregation methods is weaker than empirical evidence, of which EAs are skeptical
Selecting an aggregation method under normative uncertainty didn’t get a definitive treatment until ~2000 with Lockhart’s Moral Uncertainty and Its Consequences, and Will MacAskill’s longer work Normative Uncertainty followed only in 2014. This is barely a blink on the timescale of analytic philosophy, even though this is already an area of active debate, including MacAskill and Ord 2018 defending maximizing expected choiceworthiness (MEC) as a solution to normative uncertainty, and Greaves and Cotton-Barratt 2023 comparing MEC and Nash bargaining.
Even if one finds MacAskill and Ord’s arguments for MEC compelling, we must critically examine the strength of philosophical evidence. This work, like most philosophical work, relies on a combination of explicit arguments about what follows from what, simplifying modeling assumptions, and intuition, and it generally involves many implicit or explicit choice points. Philosophical arguments, no matter how persuasive, have a historical tendency to be subsequently challenged or dismantled. In fact, I would argue that a compelling philosophical argument for MEC’s theoretical or practical superiority provides substantially weaker evidence than an empirical social science randomized controlled trial (RCT) demonstrating a causal effect[15]. And further, I submit that the selection effects of caring about optimizing charitable giving so much that you end up, say, endorsing GiveWell and reading meta-commentary on cause prioritization mean you are likely very skeptical of the results of such a single RCT.
Within effective altruism circles, there’s a presumptive skepticism of empirical results. The default stance toward empirical research is “interesting initial finding, but let’s see if it holds up to future challenges and replications.” This attitude should similarly guide our assessment of the correct answer to EA-related philosophical questions, such as which aggregation method to use to address normative uncertainty.
Indeed, for empirical studies, it is common for an initial study claiming a causal relationship to be subsequently complicated or nuanced by follow-up investigations. Holden Karnofsky’s blog post Does X Cause Y? An in-depth evidence review effectively illustrated this dynamic. The strength of the evidence produced by philosophy in the action-relevant cross-cause prioritization cases is markedly weaker than the quantitative evidence that exists in the domain Karnofsky was critiquing, and certainly weaker than the RCTs Karnofsky says he likes to rely on.
Objections and Replies
Aren’t we here to do the most good? / Aren’t we here to do consequentialism? / Doesn’t our competitive edge come from being more consequentialist than others in the nonprofit sector?
Consequentialism, and even utilitarianism, underspecifies how to act generally, but uncertainty across the various forms of consequentialism presents these same problems with different aggregation methods, suggesting radically different actions. See the consequentialism tabs in the aggregation methods moral parliament results here.
Further, I don’t think the goal of EA is to do whatever consequentialism or utilitarianism says, regardless of how confident one should be in the underlying theory or aggregation method.
And while it is true that, to a significant extent, people in EA are on average much more consequentialist than others working in charity, the point of doing our work is supposed to be improving the world. We’re not a for-profit venture looking for a market-based edge; we’re trying to do what is right and do that well. It’s true we could lean into doing a certain version of utilitarianism and provide some value compared to those who aren’t as clear about how they are reaching their tradeoffs. Still, we should only do this if we think we ought to be advancing this as a normative claim, not because we found an opportunity to sell this vision to ourselves or others.
Can’t I just use my intuitions or my “priors” about the right answers to these questions? I agree that philosophical evidence is weak, so we should just do what our intuitions say
There are multiple senses of “priors” that are relevant here. To the extent this means your reflective credences in all the relevant theories, formed after a review of the evidence for and against them, using “priors” may be completely unobjectionable. However, there is another relevant sense: taking your unreflective inside view, or consciously or subconsciously inserting your favorite theory as the relevant benchmark, and calling the application of that process using “priors” to reach a determination. In this latter sense, using “priors” (which may or may not be synonymous with your intuitions) to decide matters of cause prioritization is, if anything, worse than just taking strong philosophical stances in light of limited evidence.
To the extent EA has norms, one of them should be clear reasoning and giving evidence for your position. Once you give up doing that and assert your intuitions for making these calls, you will often have largely abandoned the attempt to reason to the correct answer.
As a point of comparison, it seems pretty clear that an individual’s philosophical intuition on many of these matters is even weaker evidence than a single empirical study. Intuitions about these topics should be treated as “break glass in case of emergency” tools, not a key deciding factor. Some reliance on them may be unavoidable, but you should be careful not to overuse them, and when they are used to make important decisions, there should be extensive reporting of how and why that was done.
This is not to say we shouldn’t consider intuitions when it’s all we have, or that we can’t, say, combine intuitions about these matters in some rigorous manner. It is to suggest that your “priors” in the context should be uncertain, and we should be cautious and clear about when we are relying on intuition rather than clear evidence and argument, and be aware of all the biases that could come from doing so.
We can use common sense or a “non-philosophical” approach and conclude which cause area(s) to support. For example, it’s common sense that humanity going extinct would be really bad, so we should work on that
This has overlapping issues with the use of “priors” and intuitions discussed above because “common sense” is often a stand-in for the speaker’s views. Additionally, there is very rarely a singular “common sense” answer (and even if you poll people to empirically get such a view, you’d often get different answers across times and locations).
Even putting those issues aside, this approach doesn’t answer the central questions we face. That the lives of the global poor are more important than the ruined suit of someone in a rich country; that it’s bad to torture animals; and that, all else equal, it would be good for humanity to continue to exist are all common-sense views (even if contentious once you get into the details). However, this approach doesn’t resolve how to make concrete trade-offs among these common-sense claims. The hard cause prioritization questions are about which specific tradeoffs are desirable. Common sense might have an answer to “Should we try to make humanity continue to exist and have a flourishing future or not?” It doesn’t say anything helpful about “How many resources should we place on creating that future relative to preventing global poverty now?”
Any answer given to that latter question at least implicitly takes stances on the empirical and philosophical justifications for taking one action rather than another. Similarly, to make decisions in cause prioritization, one needs to answer questions like:
- What is the correct amount to weigh different animal species relative to each other and to humans?
- How does one weigh the value of foreigners compared to locals?
- What is the tradeoff between income and health? (How many dollars to people in extreme poverty is a year of healthy life worth?)
- How do you value the future beyond the next ~200 years, if at all?
- What is the net value of existence now and in the future?
- Can many really weak claims of harm (weak in probability and/or intensity) outweigh one really strong claim of harm? If so, under what circumstances?
It’s implausible that there are “common sense” or “non-philosophical” answers to these questions. Moreover, insofar as there are common-sense or “non-philosophical” answers, there is no reason to think that they will lead to a consistent and reasonable overall view.
In any case, part of the appeal of common sense is that it allows us to act with reasonably high confidence. However, it’s hard to see how common sense can provide that here. We will be forced to grapple with difficult, uncomfortable tradeoffs that require explicit reasoning.
I’m an anti-realist about philosophical questions, so I think that whatever I value is right, by my lights, so why should I care about any uncertainty across theories? Can’t I just endorse whatever views seem best to me?
Being an anti-realist in this way, holding that you can simply set aside the strength of the evidence and arguments for claims, is a position untouched by this critique of the evidence, but likely only if you’ve truly committed to ignoring the evidence entirely. Not all anti-realists are completely “anything goes” in this way.
To some extent, there is overlap here with the above concern about the use of “priors” on these questions: if you are giving up entirely on using evidence and reason to reach your views, you would be violating the norm of using evidence and reason to reach conclusions that EA is built upon. The strongest anti-realist stance could also justify any arbitrary position, including, say, endorsing Derek Parfit’s Future Tuesday Indifference (where a hedonist cares about the quality of all their future experiences except those that occur on a future Tuesday) or Parfit’s Within-a-Mile-Altruism (where a person cares only about events that occur within a mile of their home and doesn’t care at all about the suffering of those farther than one mile away). This may or may not be too much of a bullet to bite, depending on your commitment to anti-realism.
Separately, it may be more plausible to stake out such an anti-realist position on some normative claims than others. An anti-realist position about ethical theories may be more plausible than an anti-realist position about decision theory.
If the evidence in philosophy is as weak as you say, this suggests there are no right answers at all and/or that potentially anything goes in philanthropy. If you can’t confidently rule things out, wouldn’t this imply that you can’t distinguish a scam charity from a highly effective group like Against Malaria Foundation?
No, it doesn’t suggest that anything goes. First, whether there are right answers is one thing; whether we can know those answers is another. Second, whether we can have high confidence is one thing; whether we can get action-relevant evidence is another. To do the hard work of setting priorities in philanthropy, we need action-relevant evidence, which often means acting on significant uncertainties. Still, if the differences between the best charities and the rest are large enough—and they are!—we can make progress.
More concretely, recall that the moral parliament tool I used only included the types of charitable actions that were promising according to the worldviews under consideration. That is, the charity list (which does not consist of real charities) isn’t a definitive list of all interventions actually being pursued by all humans. Many real charities are less efficient at saving lives, improving happiness, or creating a just society, and would still be strongly disfavored even by an approach that relies more modestly on philosophical conclusions in controversial areas. Charities that accomplish the same goals as another group but less efficiently will be dominated, even under this approach.
Additionally, nothing about this approach means that you can’t distinguish between, say, a charity that adds value by exposing wealthy individuals to more art and corporate campaigns for chickens. Again, there is a range restriction in what is shown, since only at least somewhat plausible philosophical views were included.
While I see the concern that this makes it less clear what the best opportunities are, I think a more accurate reading of the concern about philosophical evidence being weak is something like “among plausible charitable targets like those included in that moral parliament tool, it’s difficult to draw strong conclusions of one cause area over another” rather than “you typically can’t draw strong conclusions about charity X over charity Y working on the same problem”.
I have high confidence in MEC (or some other aggregation method) and/or some more narrow set of normative theories, so cause prioritization is more predictable than you are suggesting, despite some uncertainty in what theories I give some credence to
Again, I would note that you likely shouldn’t have strong beliefs, given the weak nature of the evidence. But even if you do have strong beliefs, it is still likely not clear what aggregation methods practically recommend, because:
- Many aggregation methods are underspecified:
  - There are numerous potential variations of these methods, and small changes in how you approach the problem can significantly alter the recommended actions.
  - The way you understand and frame normative uncertainty can change how these methods are implemented.
- In many cases, the recommended action an aggregation method suggests depends on subtle variations in underlying normative theories. Some methods consider not only the ranking of preferences but also their intensity. This means that how strongly a theory argues for something can be as important as what it argues for.
1. Aggregation methods are underspecified
a. There are a large number of potential variants of these methods
First, MEC is a probability-weighted sum of an action’s choiceworthiness under each of the theories you consider. But to produce that sum, you need some way to handle intertheoretic comparisons. This is an initial choice point, as there are many ways you could handle such comparisons and normalize across theories[16].
Cotton-Barratt, MacAskill, and Ord 2020 suggests mirroring how we handle empirical uncertainty by mapping normative theories onto a single metric while keeping the variance within the theories the same[17]. That does not tell you, among other things, which type of data set with empirical uncertainty we are attempting to model and, hence, how to handle outliers (extreme values that differ significantly from other data points). As a result, variants on the theme of MEC (or other aggregation methods) might produce very different results.
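Here is a minimal sketch of how much this choice point can matter (the theories, credences, and values are all invented for illustration): taking each theory’s units at face value versus normalizing them changes which option MEC recommends.

```python
# A toy illustration of one MEC choice point: whether to take each
# theory's value units at face value or to normalize them before summing.
from statistics import mean, pstdev

credences = {"modest": 0.7, "grand": 0.3}
values = {  # values[theory][option], each on the theory's own scale
    "modest": {"A": 2, "B": 1, "C": 0},
    "grand":  {"A": 0, "B": 500, "C": 1000},  # a theory with huge stakes
}
options = ["A", "B", "C"]

def mec(transform):
    scores = {o: sum(credences[t] * transform(vals)[o]
                     for t, vals in values.items()) for o in options}
    return max(scores, key=scores.get), scores

def face_value(vals):  # intertheoretic comparison at face value
    return vals

def range_norm(vals):  # rescale each theory onto [0, 1]
    lo, hi = min(vals.values()), max(vals.values())
    return {o: (v - lo) / (hi - lo) for o, v in vals.items()}

def variance_norm(vals):  # z-score each theory: mean 0, variance 1
    m, s = mean(vals.values()), pstdev(vals.values())
    return {o: (v - m) / s for o, v in vals.items()}

print(mec(face_value))     # C wins: the grand theory's units dominate
print(mec(range_norm))     # A wins
print(mec(variance_norm))  # A wins
```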
When handling empirical uncertainty, sometimes you should drop outliers (e.g., when such outliers are very likely to be erroneous), other times you may want to amplify outlier weights (e.g., when deviation from the middle of a data set is particularly important), and sometimes you might want to keep outliers but reduce their weight or regress them back towards the mean (e.g., when what’s important is the shape of the distribution and/or when you suspect outliers are erroneous but directionally correct).
Critical questions arise: Are extreme/outlier values when handling normative uncertainty more like one of these cases than the others? Supposing you think you should amplify outliers, should the outliers be squared or cubed? Supposing you should drop some outliers, when do you start to drop values? Supposing you should do some regressing to the mean, at what point do you start the regression?
These considerations could significantly alter the conclusions that different versions of an aggregation method yield. There are numerous transformations that could be applied, creating a wide variety of gradations in how an aggregation method processes normative theory outputs.
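As a toy illustration (the scores are invented, and the transformations are simple stand-ins for the options just described), the choice of outlier treatment alone can flip the recommendation:

```python
# How outlier handling alone can flip an MEC-style recommendation.
# Scores are per-theory normalized choiceworthiness values for two
# options; each theory gets equal credence here.
from statistics import mean, pstdev

scores = {
    "A": [2, 2, 2, -10],  # one theory condemns A in the extreme
    "B": [1, 1, 1, 1],    # every theory mildly favors B
}

def keep(xs):                  # take all values at face value
    return xs

def drop_outliers(xs, k=1.5):  # discard values beyond k std devs
    m, s = mean(xs), pstdev(xs)
    return [x for x in xs if abs(x - m) <= k * s] or xs

def winsorize(xs, cap=3):      # clamp extreme values to +/- cap
    return [max(-cap, min(cap, x)) for x in xs]

def amplify(xs):               # cube values, exaggerating outliers
    return [x ** 3 for x in xs]

for method in (keep, drop_outliers, winsorize, amplify):
    verdict = max(scores, key=lambda o: mean(method(scores[o])))
    print(f"{method.__name__:>13}: pick {verdict}")
# keep, winsorize, and amplify pick B; drop_outliers picks A.
```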
This issue of nearby variants applies to MEC and many other aggregation methods. For example, when modeling approval voting as a normative aggregation method, RP assessed the best possible distribution by the lights of the theories under consideration and then had to assign a threshold to determine how close to that optimal distribution an output would have to be to approve of it. There is no natural answer to this question[18]. Similar issues apply to other aggregation methods that rely on bargaining, like Nash bargaining (but also to voting methods like score voting and quadratic voting).
b. How you conceptualize the challenge of normative uncertainty can change how you implement aggregation methods
Second, how you conceptualize normative uncertainty may alter how some implementations of aggregation behave. How hard should theories negotiate if they are bargaining? Should theories with large values for certain outcomes refuse to accept any outcome that doesn’t result in them getting their way, even if they aren’t in a strong negotiating position about the percentage of seats in a moral parliament? In real-world parliaments, such tactics might prevent a coalition government from forming or cause it to collapse. However, in real life, the range of political actors is broad: it includes parties bargaining hard along with softer negotiators who can almost always reach a compromise; it also includes terrorists who refuse any bargain that is not 100% in their favor on pain of death to others and themselves.
In handling normative uncertainty, are normative theories (or certain specific actions a theory commands) that would demand most or all the resources to themselves because of strong preferences more like terrorists (who are often in real life forbidden from parliament because they refuse to negotiate and/or because they take actions other theories strongly disapprove of) or are they more like coalition partners within a government just driving a hard bargain?
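A stylized sketch of this dynamic (a toy two-theory bargaining model, not RP’s implementation): in Nash bargaining, raising a theory’s disagreement point, i.e., letting it credibly “walk away” unless it gets a large share, shifts the allocation toward it.

```python
# Toy Nash bargaining over a budget: x goes to theory 1's project and
# 1 - x to theory 2's. Each theory's utility is its share; d1 and d2
# are disagreement points (the utility a theory falls back on if the
# bargain collapses). The Nash solution maximizes the product of gains.

def nash_allocation(d1, d2, steps=10_001):
    best_x, best_product = None, float("-inf")
    for i in range(steps):
        x = i / (steps - 1)
        u1, u2 = x - d1, (1 - x) - d2
        if u1 > 0 and u2 > 0:  # both must gain over their fallback
            if u1 * u2 > best_product:
                best_x, best_product = x, u1 * u2
    return best_x

print(nash_allocation(d1=0.0, d2=0.0))  # 0.5: an even split
print(nash_allocation(d1=0.6, d2=0.0))  # 0.8: the hard bargainer wins more
```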
How you analogize this situation could significantly alter which results aggregation methods return (for all but non-weighted voting methods), but how you conceptualize this turns at least in part on what you are trying to accomplish by turning to normative aggregation methods in the first place[19]. I think, in light of the weak evidence inherent to resolving disagreements of this kind about the purpose of aggregating across uncertainty, we should be cautious in thinking it’s clear what results various aggregation methods would return, even if you give relatively high confidence to certain aggregation methods.
2. Normative theories are underspecified
Finally, as noted by Baker (relevant section in penultimate draft), MEC is potentially susceptible to spending resources on modifications to the normative theories under consideration that amplify the typical results of the theory. That is, for some normative theory A that says what you should do and the stakes are X, there can be an A* theory that says you should do the same thing, but the stakes are 1000x. Given the added stakes, if you incorporate A* into your normative considerations, it can dominate your actions even if you think A* is 100 times less likely than A. For example, suppose you give contractualism 10% credence. There could be a version of contractualism that takes the same normative approach but applies a 100,000x multiplier to the stakes of all outcomes. If you assigned a 0.1% credence to this modified contractualism, it would still dominate standard contractualism if it were incorporated in MEC. In general, so long as we aren’t nearly certain such amplifications are false, they might upend what you think MEC returns, as these amplified theories could (and plausibly would) dominate your actions.
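Following the contractualism example above, the arithmetic (with the standard theory’s stakes set to 1 for illustration) looks like this:

```python
# The amplification worry in MEC, using the toy numbers from the text.
credence_standard = 0.10     # credence in standard contractualism
credence_amplified = 0.001   # credence in the 100,000x "amplified" variant
stakes_multiplier = 100_000

# Each theory's contribution to an action's expected choiceworthiness,
# taking stakes at face value (standard theory's stakes s = 1):
s = 1
contribution_standard = credence_standard * s
contribution_amplified = credence_amplified * stakes_multiplier * s

print(contribution_standard)   # 0.1
print(contribution_amplified)  # 100.0: 1,000x the standard theory's
                               # contribution, despite 100x lower credence
```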
Aggregation methods that rank order theories without consideration of the intensity of preferences within theories will not be affected (like ranked choice voting or approval voting). Still, many, if not all, other aggregation method results could be at least somewhat altered by this consideration. For some, like MEC itself, these amplifications may be the only thing that matters, depending on how you handle improbable theories and/or large value outputs from a theory.
In other words, suppose you go all the way in on a normative uncertainty approach like MEC (and you should not). By choosing MEC, you are applying a method that heavily rewards the intensity of preferences in theories. And, if you take first-order philosophical uncertainty seriously, you may end up heavily endorsing theories considered implausible because you can’t easily rule them out.
Conclusion (or “well, what do I recommend?”)
I have presumptive skepticism against reaching strong conclusions based on limited evidence generally, and that applies as much to philosophical evidence that supports cross-cause prioritization decisions as it does to empirical investigations or scientific studies. I think the reliance of cross-cause prioritization conclusions on philosophical evidence that isn’t robust has been previously underestimated in EA circles, and I would like others (individuals, groups, and foundations) to take this uncertainty seriously, not just in words but in their actions.
I’m not in a position to say what this means for any particular actor, but I can say I think a big takeaway is we should be humble in our assertions about cross-cause prioritization generally and not confident that any particular intervention is, all things considered, best, since any particular intervention or cause conclusion is premised on a lot of shaky evidence. This means we shouldn’t be confident that preventing global catastrophic risks is the best thing we can do, nor should we be confident that it’s preventing animal suffering or helping people in low-income countries. This applies at the cause level and, often, at the level of specific interventions when comparing across cause areas. I would be keen to see more work on these meta-normative questions of how effective altruists should act in the face of uncertainty about these questions (i.e., have we captured all the relevant uncertainties? Are there particular actions that are very robust to these considerations? What are they?).
In short, EA remains primarily a set of questions and an approach rather than an answer to me. This is despite EA in practice over the last several years becoming much more siloed by cause area.
Overall, I take these concerns as evidence in favor of diversifying because you can’t be sure any given approach is “truly right”, though extensively justifying that take is beyond the scope of this post. Maybe one day our smarter descendants will resolve some of these problems, but it’s also possible they won’t because there aren’t answers to them. For now, though, I would say you should remain skeptical that we have really resolved enough of the uncertainties raised here to confidently claim any interventions are truly, all things considered, best. I would encourage everyone, regardless of which area or intervention they currently consider best, to reflect explicitly on these uncertainties and how they should update their behavior in light of them.
Acknowledgements
This post was written by Marcus A. Davis. Thank you to Arvo Muñoz Morán, Hayley Clatterbuck, Derek Shiller, Bob Fischer, Laura Duffy, David Moss, and Ula Zarosa for their helpful feedback. Thanks to Willem Sleegers and Arvo Muñoz Morán for providing graphs showing the results of the use of the moral parliament tool.
Rethink Priorities is a global priority think-and-do tank aiming to do good at scale. We research and implement pressing opportunities to make the world better. We act upon these opportunities by developing and implementing strategies, projects, and solutions to key issues. We do this work in close partnership with foundations and impact-focused non-profits or other entities. If you’re interested in Rethink Priorities’ work, please consider subscribing to our newsletter. You can explore our completed public work here. You can also subscribe to my Substack Charity for All here, which will have a short version of this post next week.