Recently, *Nature* published my correspondence “Dispense with redundant P values”. It highlights my concern that p-values are often calculated because “everybody does it”. This reminded me of the mechanical repetition that parrots are well-known for (footnote 1). Parroting of p-value reporting should stop, and I suggest presenting a p-value in a figure only if it is necessary for interpretation.

During the editing process of my contribution, a specific example of a redundant p-value was removed. The reasoning was that it seemed unfair to single out one paper. I agreed, and I would like to stress that the parroting of p-value reporting is not restricted to a specific paper, a specific issue of *Nature*, or a specific journal. I simply found it very ironic that the same issue of *Nature* that proposes “Five ways to fix statistics” (Leek et al., 2017) contains several clear examples of figures (in different papers) with meaningless p-values.

Since I am convinced that an example will clarify my point, I have extracted the data (see footnote 2) from one of the papers in the aforementioned issue (without disclosing the nature of the paper). I performed a t-test (two-tailed, unequal variances) as described in the paper and reproduced the p-value (footnote 3). The resulting figure is shown below on the left and closely mimics the figure of the original paper.
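For readers who want to check the calculation themselves, the t-test described above can be reproduced in a few lines of Python. This is a minimal sketch, assuming SciPy is available; the numbers are the extracted values listed in footnote 2:

```python
# Welch's t-test (two-tailed, unequal variances) on the extracted data
# from footnote 2. Assumes SciPy is installed.
from scipy.stats import ttest_ind

control = [305, 318, 355, 364]
treated = [160, 125, 120, 127]

# equal_var=False selects Welch's t-test, which does not assume
# equal variances in the two groups.
t, p = ttest_ind(control, treated, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```

Because the difference between the two conditions is so large, the resulting p-value is tiny, which is exactly the point: the figure communicates the effect without it.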

Clearly, there is a large difference between the ‘Control’ and ‘Treated’ conditions, reflecting a large effect of the treatment. To reach that conclusion, there is no need for a p-value. Moreover, the p-value does not convey any relevant information for interpretation of the figure. As such, the p-value qualifies as chartjunk (E.R. Tufte, 1983) and should be omitted. Omitting it yields a cleaner figure (shown above on the right) that emphasizes the data.

The problems with p-values extend beyond their meaningless use in figures when effects are large. The way that p-values are defined involves some confusing backward logic (footnote 4). Consequently, p-values are often misinterpreted (Greenland et al., 2016; Lakens, 2017) or misused as a ‘measure of credibility’ (Goodman, 2016). The misconception that p-values represent the strength of evidence is reinforced by categorizing p-values, e.g. by using an increasing number of asterisks (* P < 0.05; ** P < 0.01; *** P < 0.001). P-values cannot be used as a rating system (Wasserstein and Lazar, 2016), and such categories should be avoided at all times.

To avoid the unnecessary, and at times misleading, use of p-values, the mechanical repetition of current practices should stop. Whether p-values are important for the interpretation of the figure should be a central question. Before that question can be answered, the correct definition of a p-value needs to be thoroughly understood. In addition, the correct interpretation and common misconceptions (Greenland et al., 2016) of p-values should be considered. I hope that careful reflection on the meaning of p-values will decrease their use and improve figures.

**Acknowledgments**: I am indebted to Marten Postma for the many discussions about statistical concepts and applied statistics, which have increased my understanding of the topic.

**Footnotes**

__Footnote 1__: The original title of my correspondence was “Prevent p-value parroting”. This title was changed to “Dispense with redundant P values” by *Nature* after I returned the proofs and without consulting me.

__Footnote 2__: The raw data that I extracted are listed below in CSV format:

```
Condition,Value
Control,305
Control,318
Control,355
Control,364
Treated,160
Treated,125
Treated,120
Treated,127
```

__Footnote 3__: This is an example of yet another questionable practice: calculating a p-value for a dataset with only a couple of data points per condition. Ironically, several examples can be found in the aforementioned issue, even though this matter has been addressed previously by David Vaux (2012) in *Nature* (and by many others as well).

__Footnote 4__: The p-value is the probability of obtaining the observed data, or data that are more extreme, assuming that the null hypothesis (there is zero difference between the two conditions) is true.
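This backward logic can be made concrete with a small permutation simulation on the data from footnote 2: assume the null hypothesis is true (so the condition labels are arbitrary), shuffle the labels many times, and count how often a difference at least as extreme as the observed one occurs. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(42)
control = np.array([305, 318, 355, 364])
treated = np.array([160, 125, 120, 127])

# Observed absolute difference between the group means.
observed = abs(control.mean() - treated.mean())
pooled = np.concatenate([control, treated])

n_sims = 10_000
n_extreme = 0
for _ in range(n_sims):
    # Under the null hypothesis, the labels are exchangeable.
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:4].mean() - shuffled[4:].mean())
    if diff >= observed:
        n_extreme += 1

# Fraction of shuffles that are at least as extreme as the observation.
p = n_extreme / n_sims
print(f"permutation p ≈ {p:.3f}")
```

With only eight values there are just 70 distinct ways to assign four of them to ‘Control’, and only the observed split and its mirror image separate the groups this strongly, so the exact permutation p-value is 2/70 ≈ 0.029. Note that the p-value says something about the data under the null hypothesis, not about the probability that the null hypothesis is true.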