Interrater reliability of the Cumulative Pain Framework: Welfare threats in egg and chicken production

Executive summary

What I did

  • The Welfare Footprint Framework enables researchers to estimate Cumulative Pain: the time population members spend in varying severity levels of pain due to a certain welfare threat. I assessed the extent to which estimates vary depending on who is conducting the evaluation.
  • Three animal welfare scientists replicated four Cumulative Pain estimates for farmed chickens originally completed by the Welfare Footprint Institute.

What I found

  • Welfare Footprint Institute estimates consistently erred on the conservative side.
    • Other methodological decisions may play a greater role in whether their estimates give conventional conditions the benefit of the doubt.
  • Overlap in average time in pain was fairly low, suggesting that raters’ raw estimates were overprecise (i.e., their estimates of duration and prevalence were too narrow).
    • That said, modest overlap did not always imply strong disagreement in absolute terms.
  • Raters were in closer agreement about severity attributions for acute (and presumably more severe) threats than they were about chronic issues.

Recommendations for increasing reliability of Cumulative Pain estimates

  • Calibration training before the evaluations may mitigate overprecision.
  • Have multiple raters independently produce estimates, and then either pool their estimates or resolve their disagreement through discussion.
  • Allow differentiation between uncertainty and heterogeneity among severity levels.
  • Experiment with having raters visualize distributions instead of reporting uncertainty intervals.
  • Provide resources to help raters systematize their beliefs before reporting estimates (e.g., the templates in Tables 1 and 2 of Schuck-Paim et al., 2025).

Background

What is the Welfare Footprint Framework?

The Welfare Footprint Institute developed the Welfare Footprint Framework method to estimate how much pain (a catch-all term for all negatively valenced experiences, whether physical or emotional) a particular set of circumstances causes. A unique feature of the framework is that it measures the severity of all negatively valenced experiences on a single, ordinal metric. The severity categories are “grounded on (evolutionary) principles that should be common to most pain experiences: the disruptive character of the pain experience and its effectiveness to promote adaptive behaviors. [ . . . ] The greater the threat, the louder this signal should be to ensure it will take precedence over other bids for behavioral execution” (Alonso & Schuck-Paim, 2021, p. 2). While Welfare Footprint Institute conceptualizes pain severity as a continuum, the method currently uses just five ordered categories to make subjective judgments of severity more tractable (see Box 1).

Box 1: Pain severity definitions

No pain: Although the welfare threat is present, it is not causing a negatively valenced experience.

Annoying: Pain experiences are not intense enough to disrupt the routine or daily activities of individuals, their possibility to enjoy pleasant (positive) experiences, or their ability to conduct mentally demanding tasks that require attention. Sufferers do not think about this sensation most of the time, and when they do they can adapt to it.

Hurtful: Pain experiences that most would consider disruptive of daily routine. Although not entirely preventing individuals from functioning, their ability to do so is impaired as the direct result of pain, and often accompanied by the desire to take painkillers or seek treatment. Frequent complaints are often present. The possibility to enjoy pleasant experiences is impaired, as is performance on mentally demanding tasks, alertness and attention to ongoing stimuli.

Disabling: Most forms of functioning or enjoyment are prevented as the direct result of pain. Symptoms are continuously distressing. Individuals affected often substantially reduce activity levels and refrain from moving. Pain at this level can disrupt or prevent sleeping. Only strong analgesia can relieve it.

Excruciating: Threshold of pain under which many people would choose to take their life rather than standing the pain. This is the case, for example, of severe burning events, which may make victims jump from buildings, or other conditions associated with suicidal attempts by sufferers (e.g., cluster headaches). Many forms of torture have been designed to inflict pain at this level. Behavioral patterns can include loud screaming, involuntary shaking and extreme restlessness.

Notes. These definitions are directly excerpted from Table 1 of Alonso & Schuck-Paim, 2021, except for the No Pain description, which I wrote.

Researchers estimate “Cumulative Pain” by assessing the duration of time population members spend at each severity level of pain when enduring the welfare threat of interest.[1] By weighting Cumulative Pain estimates by the prevalence of the welfare threat (i.e., the proportion of population members that are actually exposed to it), the analyst can estimate the duration in pain for an average population member.

The minimal steps to estimate Cumulative Pain include:

  1. Identify the population and environmental conditions of interest. For example, Schuck-Paim et al. (2025a) examined air exposure during the slaughter of farmed rainbow trout.
  2. Individuate the welfare threats into discrete stages. For example, Schuck-Paim et al. (2025a) divided slaughter of rainbow trout via asphyxiation into four stages: (i) alarm immediately after air exposure, (ii) hypercapnia and acidosis, (iii) metabolic exhaustion, and (iv) depression of neuronal activity.[2]
  3. Based on reviews of all relevant literature, report estimates with an appropriate level of uncertainty[3] for each stage of the welfare threat. For each stage, the analyst:
    1. Estimates the duration of time that each stage lasts for the average population member who experiences the welfare threat.
    2. Assigns proportions to each of these five severity categories that must sum to 1. The proportion represents the fraction of the stage’s duration that pain is at that severity level. (Equivalently, it can be thought of as the proportion of population members who experience the threat who endure pain of that level for the whole stage.)
    3. Estimates the prevalence of the welfare threat (or for individual stages, if they are separable) as a proportion between 0 and 1.
  4. Model time in pain for the average population member using a Monte Carlo approach.[4] For each severity level, multiply the duration of the stage by the proportion of time spent at that severity level. Summing these products across stages provides the time that the average population member who is actually exposed to the welfare threat endures it at that severity level. See Figure 1 in Schuck-Paim et al., 2025a for an example of air exposure in rainbow trout. Discount this estimate by the prevalence of the welfare threat to obtain the time in pain for the “average” population member.
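
The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Welfare Footprint Institute's implementation: the stage names, durations, proportions, and prevalence are made-up placeholders, and point values stand in for the distributions that a Monte Carlo model would sample from.

```python
SEVERITIES = ("Annoying", "Hurtful", "Disabling", "Excruciating")

# Hypothetical two-stage welfare threat. Every number is a made-up placeholder.
STAGES = [
    {"name": "acute stage", "duration_hours": 2.0,
     "proportions": {"No pain": 0.0, "Annoying": 0.1, "Hurtful": 0.5,
                     "Disabling": 0.4, "Excruciating": 0.0}},
    {"name": "recovery stage", "duration_hours": 20.0,
     "proportions": {"No pain": 0.2, "Annoying": 0.5, "Hurtful": 0.3,
                     "Disabling": 0.0, "Excruciating": 0.0}},
]
PREVALENCE = 0.10  # hypothetical share of the population exposed to the threat


def time_in_pain(stages, prevalence):
    """Return (hours per affected member, hours per average member) by severity."""
    affected = dict.fromkeys(SEVERITIES, 0.0)
    for stage in stages:
        proportions = stage["proportions"]
        # Proportions across the five categories must sum to 1.
        if abs(sum(proportions.values()) - 1.0) > 1e-9:
            raise ValueError(f"proportions for {stage['name']} do not sum to 1")
        for severity in SEVERITIES:
            # Duration of the stage times the share of it spent at this severity.
            affected[severity] += stage["duration_hours"] * proportions[severity]
    # Discount by prevalence to obtain the "average" population member.
    average = {s: hours * prevalence for s, hours in affected.items()}
    return affected, average


affected, average = time_in_pain(STAGES, PREVALENCE)
for severity in SEVERITIES:
    print(f"{severity}: {affected[severity]:.2f} h if affected, "
          f"{average[severity]:.2f} h on average")
```

In a Monte Carlo version, the same calculation would be repeated many times with duration and prevalence drawn from distributions, yielding a distribution of time in pain for each severity level rather than a single number.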

Box 2: Why is the Welfare Footprint Framework useful?

Cause prioritization. Analysts can compute the aggregate amount of pain caused by the welfare threat by multiplying the time in pain for an average population member by the size of the population. All else equal, the most efficient way to help a population is to mitigate the welfare threats causing the most pain first. More broadly, different welfare threats to the same population can be added together to yield the overall burden of pain for that population. All else equal, it is more efficient to prioritize populations experiencing more aggregate pain.

Assess the net effect of a reform. Many welfare improvements have a host of effects, some positive and some negative. For example, while cage-free housing meets more of the behavioral needs of egg-laying hens than caged housing overall, there is a concern that certain welfare threats may be more common, such as keel bone fractures. By quantifying the pain incurred from myriad welfare threats in both housing systems on the same unidimensional metric of pain severity, it is possible to compare the aggregate amount of pain caused by each system.

Cost-effectiveness analysis. Stakeholders are more likely to implement reforms when the magnitude of the net benefit is high relative to the cost. The net effect of a reform can serve as an input into the calculation of how much it costs to avert one unit of pain (e.g., see Schuck-Paim et al., 2025b).

Transparency about uncertainty. Due to gaps in the evidence base or time constraints in reviewing it, analysts will, in practice, never be fully confident about attributions of pain severity, duration, or prevalence. The Welfare Footprint Framework enables analysts to explicitly model how much uncertainty they have.

Why would Cumulative Pain estimates differ across raters?

The trustworthiness of Cumulative Pain estimates depends on the “rater” (the term used by psychometricians for a person providing subjective data) calibrating their beliefs to the available evidence. In theory, attributions of duration and prevalence should be fairly clear-cut. In practice, scientists may not have published studies that measure the presence of a given welfare threat with much precision. Exacerbating the issue, raters may have trouble quantifying their uncertainty.

Attributions of pain severity likely depend even more on subjective judgment because mental states are not directly observable. Our understanding of how pain severity maps onto behavioral and physiological indicators is incomplete, especially for evolutionarily distant species and populations that have not been extensively studied.

Raters are likely to cope with uncertainty in idiosyncratic ways. They may have different priors based on personal experience or beliefs about what sources are most dependable. There are also an indefinite number of non-substantive factors (e.g., a rater’s mood on a given day) that could play a role. And, of course, a biased agenda can affect raters, especially when evaluating controversial issues.

In theory, raters could recognize all of these subjective factors and adjust their estimates accordingly. However, exactly how to make these adjustments requires subjective judgment. Raters may also lack awareness of how their evaluation process differs from others. Or, if they do have a bias towards a particular outcome, they may simply not want to correct for it.

After this project was completed, Welfare Footprint Institute confirmed that they now integrate assessments across several independent raters. Pooled estimates should cancel out different raters’ biases (i.e., systematic over- or under-estimation), so long as the variation in biases is random across raters. Pooled estimates can also cancel out over- and under-confidence (i.e., individual raters’ tendencies to report more or less certainty, respectively, than is warranted by the evidence) if both tendencies are well-represented. However, overprecision is more common than underprecision (Moore & Healy, 2008), so pooling estimates may be insufficient to correct for overly narrow distributions.
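
A toy simulation can illustrate why pooling helps with bias but may not fix overprecision. Everything here is an assumption made for illustration: the true value, the spread of rater errors, the reported interval half-width, and the pooling rule (averaging each rater's interval endpoints, which is only one of several ways estimates could be pooled).

```python
import random
import statistics


def simulate_pooling(n_raters, n_trials=5_000, truth=10.0,
                     bias_sd=3.0, reported_half_width=1.0, seed=7):
    """Return (mean absolute error of the pooled point estimate,
    share of trials in which the pooled interval contains the truth)."""
    rng = random.Random(seed)
    abs_errors, hits = [], 0
    for _ in range(n_trials):
        # Each rater's central estimate is off by an independent, zero-mean error.
        centres = [truth + rng.gauss(0, bias_sd) for _ in range(n_raters)]
        pooled_centre = statistics.fmean(centres)
        # Every rater reports the same overly narrow interval width, so
        # averaging the endpoints leaves the pooled width unchanged.
        low = pooled_centre - reported_half_width
        high = pooled_centre + reported_half_width
        abs_errors.append(abs(pooled_centre - truth))
        hits += low <= truth <= high
    return statistics.fmean(abs_errors), hits / n_trials


for n in (1, 4):
    error, coverage = simulate_pooling(n)
    print(f"{n} rater(s): mean abs. error {error:.2f}, coverage {coverage:.0%}")
```

Under these assumptions, pooling four raters brings the point estimate closer to the true value, yet the pooled interval still contains the true value far less often than the nominal 95%.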

Goals of present study

For evaluations that do not require much effort or domain expertise to complete, using several raters is not a major issue for researchers. However, Cumulative Pain assessments are typically quite effortful. Measuring rater disagreement would help determine whether multiple raters justify the additional effort.

Secondarily, it would be useful to know how Welfare Footprint Institute’s own Cumulative Pain estimates differ from other raters. In their work on egg-laying hens, they claim to have “preferred to err on the side of caution than potentially overestimated reform benefits in any particular aspect.” If they, in fact, have a general tendency to give conventional conditions the benefit of the doubt, we would expect them to attribute less pain to conventional farming practices than other raters and more pain to reformed practices than other raters. If they are simply conservative in general, then their estimates should be lower than other raters across the board.

I addressed both goals by having poultry scientists replicate randomly selected Cumulative Pain estimates from Welfare Footprint Institute’s books, Quantifying Pain in Laying Hens and Quantifying Pain in Broiler Chickens.

Methodology

Protocol

I recruited three animal welfare scientists whose area of expertise was either egg-laying hens or broiler chickens. None of the raters had previous familiarity with the Welfare Footprint Framework.

I randomly sampled two chapters from Quantifying Pain in Laying Hens, and randomly selected one Cumulative Pain estimate within each chapter. I completed the same procedure for Quantifying Pain in Broiler Chickens. The four estimates selected were:

  • Chronic egg peritonitis in conventional housing for hens (“Pain-Track 5.2. Hypotheses proposed for the temporal evolution of the pain endured as a result of chronic peritonitis”)
  • A fracture during depopulation and transport for hens raised in conventional housing (“Pain-Track 7.1. Hypotheses proposed for the temporal evolution of the pain endured as a result of fractures suffered by laying hens during depopulation”)
  • Intentional fatal electrocution of broiler chickens (“Pain-Track 9. Hypothesis for the pain associated with fatal electrocution (cardiac arrest) in broilers”)
  • Heat stress in slower-growing commercial broilers (“Pain-Track 2. Hypothesis for the intensity and duration of the pain endured per day by slower-growing commercial broilers affected by heat stress in slower-growing breeds kept to 8 weeks in indoor housing systems”)

I had raters start by reading an instruction guide I created that included resources for learning about the methodology. They completed their estimates using a template I created in Google Sheets. Raters were restricted to the references that Welfare Footprint Institute cited in the chapter where the original estimates appeared. Therefore, disagreement among raters is due to how different scientists use the same evidence, rather than which evidence they examine in the first place.

I took several steps to prevent artificial agreement. First, raters never met or knew of each other’s identity, and so did not influence each other’s ratings. Second, I emphasized to them several times that they should not read Welfare Footprint Institute’s work on chickens other than chapters 1-2 of the book on hens and chapter 1 of the book on broiler chickens. Third, while I did occasionally ask raters to clarify their understanding of what each parameter meant within the context of a given welfare threat, I did not question ratings in cases where I worried that my knowledge of Welfare Footprint Institute’s estimates might influence my assessment of raters’ estimates.

Statistical Approach

The data and script are available on GitHub. To estimate the time in pain for each severity category for the average population member (see Table 1), I sampled 10,000 draws from a truncated normal distribution for duration, and 10,000 draws from a beta distribution for prevalence. I summed the Annoying, Hurtful, Disabling, and Excruciating distributions to estimate the total time in pain for an average population member.
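
The sampling step can be sketched as follows. This is not the script on GitHub: the input values are hypothetical, the truncation point (zero) is an assumption, and the method-of-moments conversion from a mean and standard deviation to beta parameters is just one plausible way to parameterize prevalence.

```python
import random


def truncated_normal(mean, sd, lower, rng):
    """Draw from a normal distribution, rejecting values below `lower`."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= lower:
            return x


def beta_params_from_moments(mean, sd):
    """Method-of-moments shape parameters for a beta distribution."""
    var = sd ** 2
    if not 0 < mean < 1 or var >= mean * (1 - mean):
        raise ValueError("mean and sd are incompatible with a beta distribution")
    k = mean * (1 - mean) / var - 1
    return mean * k, (1 - mean) * k


def equal_tailed_interval(draws, level=0.95):
    """Empirical equal-tailed interval from a list of draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1 - level) / 2
    return ordered[int(tail * n)], ordered[min(n - 1, int((1 - tail) * n))]


def simulate_weighted_time(duration_mean, duration_sd, prevalence_mean,
                           prevalence_sd, proportion, n_draws=10_000, seed=2024):
    """Prevalence-weighted time at one severity level, one value per draw."""
    rng = random.Random(seed)
    a, b = beta_params_from_moments(prevalence_mean, prevalence_sd)
    return [
        truncated_normal(duration_mean, duration_sd, 0.0, rng)
        * proportion
        * rng.betavariate(a, b)
        for _ in range(n_draws)
    ]


# Hypothetical rater inputs: 4 h (SD 1 h) duration, 10% (SD 2%) prevalence,
# and half of the stage spent at the severity level of interest.
draws = simulate_weighted_time(4.0, 1.0, 0.10, 0.02, proportion=0.5)
print(equal_tailed_interval(draws))
```

Total time in pain would then be obtained by running the same simulation for each severity category and summing the draws across Annoying, Hurtful, Disabling, and Excruciating.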

Using the 95% equal-tailed intervals resulting from the Monte Carlo models, I quantified the level of agreement for each pair of raters. In particular, I calculated the Jaccard Coefficient, which “measures the probability that an element of at least one of two sets is an element of both” (Levandowsky & Winter, 1971, p. 2). It is calculated as the ratio of the length of the intersection to the length of the union of two intervals.

One disadvantage of the Jaccard coefficient is that it doesn’t make any distinction between non-overlapping distributions that are far apart versus close together. Thus, this metric is primarily useful for determining whether raters report adequate levels of uncertainty. Judgment is still required to determine whether absolute differences in pain attributions are decision-relevant.

Also, Jaccard Coefficients can be high simply because most or all raters agree that a particular severity category is absent. For example, it is straightforward to rule out the No Pain category for fully conscious chickens that are dropped in boiling water. Agreement in these “easy” cases may say little about whether to expect agreement in more ambiguous cases.
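
For concreteness, here is a minimal sketch of the Jaccard Coefficient applied to two uncertainty intervals. The function name and the example intervals are hypothetical, and the handling of two zero-width intervals (treated as full agreement when they coincide) is a convention assumed here rather than something taken from the analysis script.

```python
def interval_jaccard(a, b):
    """Jaccard coefficient of two closed intervals given as (low, high)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("each interval must satisfy low <= high")
    # Length of the overlap; zero when the intervals are disjoint.
    intersection = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    # Length of the union: both lengths minus the part counted twice.
    union = (a_hi - a_lo) + (b_hi - b_lo) - intersection
    if union == 0:
        # Both intervals are single points (e.g., both raters report zero
        # time at a severity level): assumed to count as full agreement.
        return 1.0 if a_lo == b_lo else 0.0
    return intersection / union


print(interval_jaccard((2.0, 6.0), (4.0, 8.0)))      # partial overlap
print(interval_jaccard((0.0, 1.0), (2.0, 3.0)))      # disjoint but close
print(interval_jaccard((0.0, 1.0), (100.0, 101.0)))  # disjoint and far apart
```

The last two calls both return zero, which is the limitation noted above: the coefficient cannot distinguish near misses from large disagreements.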

Results

The raw data are available in Tables A1–A4, but it is generally easier to spot trends using Table 1. In terms of severity attribution, Welfare Footprint Institute is more selective in its use of the Excruciating category. In terms of time in pain (i.e., duration, weighted by prevalence), Welfare Footprint Institute estimates more total time in pain just once (Fractures during Depopulation and Transportation), and the absolute difference is fairly small. On the whole, Welfare Footprint Institute appears conservative about severity and time in pain.

However, lower estimates of pain burden are not unequivocal evidence of giving conventional practices the benefit of the doubt. Welfare Footprint Institute reports substantially less time in pain for heat stress and uses milder severity categories than the other raters. Since a slower-growing breed is the main welfare reform they champion in Quantifying Pain in Broiler Chickens, this finding could be indicative of a pro-reform bias, or of being conservative regardless of the system under consideration.

Turning to the general level of agreement across all four sources, Intentional Fatal Electrocution and Fractures During Depopulation and Transportation showed the most agreement in terms of severity category attribution. These are acute and presumably severe welfare threats. The chronic issues, Chronic Peritonitis and Heat Stress, showed less agreement for severity attribution. In Table 1, Table A1, and Table A4, there was significant variation in the use of Disabling, Hurtful, and Annoying categories. Two possibilities could explain these patterns: either (a) there is less data to discriminate among severity categories for chronic issues than there is for acute issues, or (b) it is generally easier to discriminate between Excruciating and non-Excruciating pain than it is to discriminate between Annoying, Hurtful, and Disabling pain.

Table 1.

Prevalence-weighted time in pain

Notes. Fatal electrocution is measured in seconds. All other issues are measured in hours.

Figure 1 displays the Jaccard Coefficient for every rating pair and the mean of these coefficients. Both it and Table 1 show that there is close agreement on total time in pain for Chronic Peritonitis, save for one outlier. While time in pain for Heat Stress shows notable variation, the possibility remains that it might be harder to judge the severity of chronic issues than it is to judge their duration or prevalence.

Figure 1.

Jaccard coefficients for prevalence-weighted time in pain

Except for cases of universal agreement or three raters collectively disagreeing with a fourth, average overlap was lower than .5, meaning that it was more likely than not that a value that was within one rater’s 95% uncertainty interval was not in a second rater’s interval. These findings are consistent with the hypothesis that raters’ uncertainty intervals for duration and prevalence were too narrow.
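
The pairwise comparison behind an average overlap figure can be sketched as below. The four intervals are invented for illustration and are not taken from Table 1 or the appendix; the rater labels are likewise placeholders.

```python
from itertools import combinations
from statistics import fmean


def interval_jaccard(a, b):
    """Length of the intersection divided by length of the union."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    # Two zero-width intervals: assumed to agree fully only if they coincide.
    return intersection / union if union > 0 else float(a[0] == b[0])


# Hypothetical 95% intervals (hours) for one welfare threat.
INTERVALS = {
    "Original estimate": (1.0, 3.0),
    "Rater A": (2.0, 4.0),
    "Rater B": (2.5, 6.0),
    "Rater C": (8.0, 9.0),
}

# Four sets of estimates yield six distinct pairs.
pairwise = {
    (name_1, name_2): interval_jaccard(iv_1, iv_2)
    for (name_1, iv_1), (name_2, iv_2) in combinations(INTERVALS.items(), 2)
}
mean_overlap = fmean(pairwise.values())

for pair, coefficient in pairwise.items():
    print(pair, round(coefficient, 3))
print("mean:", round(mean_overlap, 3))
```

In this invented example, a single rater whose interval is disjoint from the rest contributes three zeros to the six pairwise coefficients, which pulls the mean down sharply even though the other three sets of estimates partly overlap.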

Minimal overlap was not always problematic when judged in absolute magnitude. For example, overlap for total time in pain was low for Intentional Fatal Electrocution. But Table 1 and Table A3 show that the disagreements are small in absolute terms, as this welfare threat is unique in unfolding over seconds rather than hours. Overall, Heat Stress in Slower-Growing Broilers appears to be the hardest welfare threat of the four to assess.

Recommendations for increasing interrater reliability

Training in quantifying uncertainty probably matters. While forecasters deliberately practice creating distributions that match their subjective beliefs, scientists typically do not receive such training. Scientists typically do not report their beliefs as confidence intervals, a concept that, at any rate, has a frequentist interpretation in most academic settings. Possibly, disagreement was artificially magnified by noise in how raters translated their beliefs into quantitative reports. Investigators could sample forecasters who are trained in quantifying uncertainty, especially if adequate domain expertise can be obtained over the course of the project.

If domain experts are required, calibration training (e.g., O’Hagan et al., 2006) may mitigate overprecision. My understanding is that the Welfare Footprint Institute now incorporates some calibration training for new raters.

User-friendly tools could reduce reliance on skill in quantifying uncertainty. Visualization tools like Guesstimate would enable raters to toy around with distributions until they find the one that best represents their beliefs. In our work on farmed shrimp (McKay & McAuliffe, 2024), we used mean and standard deviations as inputs to functions that generated distributions, which we personally found more intuitive than confidence intervals. A dedicated app might be required to balance model flexibility and user-friendliness.

Increase flexibility in how uncertainty in severity is reported. In Welfare Footprint Institute’s implementation of the Cumulative Pain metric, the only method for representing uncertainty in severity is to spread out percentages more evenly across categories. This approach confounds uncertainty with genuine heterogeneity in pain severity across population members or within a single population member over the course of the threat. The inability to separate uncertainty from heterogeneity could also potentially create artificial disagreement among raters. In our work on farmed shrimp, we modeled uncertainty about severity categories using Dirichlet distributions.
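
One way to see how a Dirichlet distribution separates the two ideas is sketched below. This is an assumption-laden illustration rather than the parameterization used in the shrimp report: the best-guess proportions and the `concentration` values are hypothetical, with the proportions encoding believed heterogeneity and the concentration encoding how confident the rater is in them.

```python
import random
import statistics

CATEGORIES = ("No pain", "Annoying", "Hurtful", "Disabling", "Excruciating")


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw via normalized gamma variates (zero alphas stay zero)."""
    gammas = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def severity_draws(best_guess, concentration, n_draws=10_000, seed=3):
    """Sample plausible proportion vectors around a rater's best guess."""
    if abs(sum(best_guess) - 1.0) > 1e-9:
        raise ValueError("best-guess proportions must sum to 1")
    rng = random.Random(seed)
    alphas = [p * concentration for p in best_guess]
    return [sample_dirichlet(alphas, rng) for _ in range(n_draws)]


# Hypothetical belief: the stage is 20% Annoying, 50% Hurtful, 30% Disabling.
best_guess = [0.0, 0.2, 0.5, 0.3, 0.0]
hurtful = CATEGORIES.index("Hurtful")

for label, concentration in (("unsure", 5), ("confident", 200)):
    draws = severity_draws(best_guess, concentration)
    share = [d[hurtful] for d in draws]
    print(f"{label}: mean {statistics.fmean(share):.2f}, "
          f"SD {statistics.stdev(share):.2f}")
```

Both raters in this sketch express the same heterogeneity (the same expected split across categories), but the less confident one produces a much wider spread of plausible proportions, which a single fixed set of percentages cannot convey.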

Limitations

Small sample of raters and welfare threats. Due to sampling error, the results are only able to provide a hazy sense of how much unreliability we can expect between any two randomly selected raters evaluating a randomly selected threat facing farmed chickens. The results are also silent on how much agreement we can expect about issues facing other species.

We did not directly address whether assessments are biased in favor or against conventional practices. Random sampling ensures that our estimates of interrater reliability are generalizable, but it does not directly speak to all relevant uncertainties. For example, it might be harder to remain impartial about aspects of cage-free housing that industry commentators criticize (e.g., Brown, 2015; Jacob, 2006). A more direct test of whether previous comparisons of conventional and reformed conditions are biased would involve having independent raters evaluate the same welfare threat in both systems.

In personal correspondence after the project was completed, Welfare Footprint Institute clarified that they try to be conservative across the board in their actual assessments. They also noted two ways in which they have historically given conventional conditions the benefit of the doubt. First, they do not include in their analysis all welfare threats that are more common or severe in conventional systems (see p. 738 of Schuck-Paim et al., 2025). Second, they have upcoming work (previewed in this talk) arguing that barren environments magnify pain severity relative to enriched environments, which they had not considered in their previous work.

Yet, other methodological choices may have erred in favor of reformed conditions. Consider new fractures during depopulation and transportation. In chapter 7 of Quantifying Pain in Laying Hens, Welfare Footprint Institute argues that caged housing tends to cause more fractures than cage-free housing. They add the Cumulative Pain from each fracture together linearly, which they acknowledge is a debatable decision.

“Where more than one fracture is likely (furnished and conventional cages), the resulting time in pain adds to the time in pain due to previous fractures (depopulation fractures can occur in different bones, representing a source of pain at different anatomical locations). These estimates assume that the aversive experience of one harm is not affected by the other, but future refinements (including synergistic effects) are possible.” (p. 13, emphasis mine).

In a more recent blog post (Schuck-Paim & Alonso, 2023), Welfare Footprint Institute argued that co-occurring pains, especially severe ones, cancel each other out to some degree:

With concurrent pain of similar intensity, the share of attention devoted to the painful experience is likely summated in a sub-additive way. The extent to which summation is sub-additive may, however, depend on pain intensity itself. Because pain of greater intensities demands a greater share of the organism’s attention, remaining attentional resources for the processing of additional pain would become progressively scarcer.

Depending on the welfare threat, it may be that methodological decisions that relate to unresolved theoretical debates introduce far more uncertainty than the idiosyncrasies of individual raters.

We did not make use of all available resources to standardize raters’ evaluations. For example, we could have used the templates in Tables 1 and 2 in Schuck-Paim et al. (2025), which may have helped standardize how raters make evaluations. To our knowledge, these resources did not exist when the Welfare Footprint Institute originally published its books on farmed chickens, so we chose not to use them. However, we would recommend that future researchers use them when creating new estimates.

My instructions may have been confusing or otherwise sub-optimal. Because raters could ask me as many questions as they wanted, we were collectively able to address several cases where my instructions were unclear. However, there may have been additional misunderstandings that nobody noticed or brought up. In other cases, I may have provided instructions that were clear but sub-optimal. For example, I asked raters to assume uniform distributions for duration and prevalence, even though I did not model these parameters using uniform distributions. As it happens, these instructions seemed to make little difference: Raters reported simply choosing the lowest and highest value that seemed plausible. Only one rater said they might have used wider intervals had they assumed normal distributions instead.

Acknowledgments

This report is a project of Rethink Priorities—a think tank dedicated to informing decisions made by high-impact organizations and funders across various cause areas. The author is William McAuliffe. Thanks to Sophie Williamson, Michael St. Jules, Cynthia Schuck-Paim, and Wladimir Alonso for feedback on earlier drafts. Thanks to Hannah McKay for suggesting improvements to the code and Monte Carlo models. Finally, I am particularly grateful to the raters: Alexandra Ulans, Maëva Manet, and one who preferred to remain anonymous.

If you are interested in RP’s work, please visit our research database and subscribe to our newsletter.

References

Brown, J. (2015). Cage-free hen housing: How far will the pendulum swing? https://www.thepoultrysite.com/focus/zoetis/poultry-health-today-issue-5-expert-advice-cagefree-hen-housing-how-far-will-the-pendulum-swing

Jacob, J. (2006). The welfare of the laying hen: The caged versus cage-free debate. https://mndaily.com/199789/opinion/welfare-laying-hen-caged-versus-cage-free-debate/

Levandowsky, M., & Winter, D. (1971). Distance between sets. Nature, 234(5323), 34-35.

McKay, H., & McAuliffe, W. (2024). Quantifying and prioritizing shrimp welfare threats. Rethink Priorities. https://rethinkpriorities.org/research-area/quantifying-and-prioritizing-shrimp-welfare-threats/

Moore, D. A., & Healy, P. J. (2008). The trouble with overconfidence. Psychological Review, 115(2), 502.

O’Hagan, A., Buck, C. E., Daneshkhah, A., Eiser, J. E., Garthwaite, P. H., Jenkinson, D. J., Oakley, J. E., & Rakow, T. (2006). Uncertain judgements: Eliciting expert probabilities. Wiley.

Schuck-Paim, C. (2025). The pain echo chamber effect: How cages and barren environments may amplify pain perception in animals. https://www.youtube.com/watch?v=6fYhVkc0m98&t=3232s

Schuck-Paim, C., & Alonso, W. J. (2021). Quantifying Pain in Laying Hens. A Blueprint for the Comparative Analysis of Welfare in Animals. Center for Welfare Metrics.

Schuck-Paim, C., & Alonso, W. J. (2022). Quantifying Pain in Broiler Chickens: Impact of the Better Chicken Commitment and Adoption of Slower-Growing Breeds on Broiler Welfare. Center for Welfare Metrics.

Schuck-Paim, C., & Alonso, W. J. (2023). Simultaneous affective experiences and potential for positive welfare. https://welfarefootprint.org/2023/04/16/attention-positivewelfare/

Schuck-Paim, C., Alonso, W. J., Pereira, P. A., Saraiva, J. L., Cerqueira, M., Chiang, C., & Sneddon, L. U. (2025a). Quantifying the welfare impact of air asphyxia in rainbow trout slaughter for policy and practice. Scientific Reports, 15(1), 19850.

Schuck-Paim, C., Alonso, W. J., Verkuijl, C., Hegwood, M., & Hartcher, K. (2025b). The Welfare Footprint Framework can help balance animal welfare with other food system priorities. Nature Food, 1-3.

Appendix

Table A1.

Raw data for chronic egg peritonitis

Table A2.

Raw data for fractures during depopulation and transportation

Table A3.

Raw data for fatal electrocution during electric waterbath stunning

Table A4.

Raw data for heat stress in slow-growing broiler chickens

  1. The Welfare Footprint Framework also encompasses an analogous metric for Cumulative Pleasure, based on an ordinal categorization of positively valenced experiences. The focus of this report is only the Cumulative Pain metric.
  2. The number of stages depends in part on how granular an analysis the evaluator is able to provide, given time constraints and the availability of evidence. At a minimum though, the stages should be labeled in ways that make clear which contiguous events are out of scope. In the example above, “analytical boundaries are limited to analyzing the impact of air exposure for the slaughter of rainbow trout. The assessment spans from the moment of emersion to loss of consciousness. Exposure to ice or ice slurry, handling or pre-slaughter practices to which fish are exposed are beyond the scope of this study” (p. 2).
  3. There is no single prescription for how uncertainty should be reported, but the Welfare Footprint Institute typically reports prevalence and duration as 95% uncertainty intervals (i.e., the 2.5th and 97.5th percentiles of a frequency distribution representing beliefs about the true value). They do not directly model uncertainty about severity, but they do allow for spreading proportions more evenly across categories.
  4. Welfare Footprint Institute typically uses a normal distribution to model duration. I have not found a description of the distribution they use to model prevalence, but a beta distribution is a reasonable choice.