What works in particle physics does not necessarily work for everything else.
It doesn't really work in particle physics either. It's just that the inherent problems with significance testing (plus some additional ones) were more readily apparent in HEP, and as an unfortunate consequence the experimentalists and consulting statisticians changed little of what they had already borrowed from the social and behavioral sciences: p-values, and the atheoretical, horribly mangled mixture of Fisherian statistics with contradictory, incompatible notions from Neyman-Pearson hypothesis testing. In the main, they did the minimum required to combat the issues now facing standard statistical methods applied to so-called Big Data (e.g., the issues associated with large n), the look-elsewhere effect, and the related issue of multiple testing: they simply lowered the significance level. Thankfully, because almost the entirety of experimental methods and findings in HEP are rooted in QFT and the standard model (the latter, or some variant of it, often serving as a null), experimentalists mostly don't have to figure out what to discover, or even how; they only have to ensure that when they inevitably find what is predicted, they can do so without egg on a whole lot of faces. What is lost, unfortunately, is a great deal of time and money, as well as the possibility of new discoveries, because particle physics adopted bad methods from other sciences, methods that were criticized even there by their very founders.
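To put a rough number on how the lowered significance level addresses the look-elsewhere/multiple-testing problem, here is a minimal simulation sketch (assuming Python with NumPy/SciPy; the number of searches and the cutoffs are illustrative choices, not figures from any experiment). It compares a conventional 0.05 cutoff with a 5-sigma-style threshold across many independent looks at pure noise:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate many independent "searches" in which the null is true:
# each search yields a single z-score from pure noise.
n_searches = 1_000_000
z = rng.standard_normal(n_searches)
p = stats.norm.sf(z)  # one-sided p-values

# Conventional 0.05 cutoff vs. a 5-sigma-style threshold (~2.87e-7).
alpha_soft = 0.05
alpha_5sigma = stats.norm.sf(5)

print("false positives at alpha = 0.05:", np.sum(p < alpha_soft))    # roughly 50,000
print("false positives at 5 sigma     :", np.sum(p < alpha_5sigma))  # roughly 0
```

Lowering the cutoff suppresses the flood of spurious "discoveries" that many independent looks would otherwise produce, which is essentially all that was done.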
And for some distributions being considered, if you used a .0005 cutoff you'd start racking up type II errors instead of type I errors.
This is more of the unfortunate, confused approach that has been part of standard indoctrination for decades now: combining the Fisherian significance-cutoff approach with the conceptually, mathematically, and philosophically incompatible Neyman-Pearson hypothesis testing approach. Type I and type II errors are only statistically meaningful in the pre-data N-P approach, in which the parameter space in question is partitioned into mutually exclusive regions so that one can apply methods that control the probability of making a decision error (which is why Wald later reformulated the approach more formally as decision theory). The trade-off makes sense from the pre-data perspective, because one can minimize the long-run frequency of procedural errors under the frequentist interpretation inherent to the N-P approach. It makes no sense in terms of a Fisherian p-value interpreted post-data as the probability of obtaining data at least as extreme as the data one has, under the assumption of a null and with no alternative specified.
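For illustration, here is a short sketch of that pre-data trade-off in a textbook one-sided z-test (assuming Python with SciPy; the effect size mu1, sigma, and n below are arbitrary choices for the example, not values from the discussion). Fixing alpha determines the rejection region in advance, and beta then follows from the stated alternative:

```python
import numpy as np
from scipy import stats

# Pre-data Neyman-Pearson framing of a one-sided z-test:
# H0: mu = 0 vs H1: mu = mu1 > 0, known sigma, sample size n.
def type2_error(alpha, mu1=0.5, sigma=1.0, n=25):
    z_crit = stats.norm.isf(alpha)         # critical value fixed by alpha (pre-data)
    shift = mu1 * np.sqrt(n) / sigma       # standardized shift under the alternative
    return stats.norm.cdf(z_crit - shift)  # beta = P(fail to reject | H1 true)

for alpha in (0.05, 0.005, 0.0005):
    print(f"alpha = {alpha:<6}  beta = {type2_error(alpha):.3f}")
# Shrinking alpha enlarges beta -- but the trade-off is only defined because
# both hypotheses and the decision rule were specified before seeing the data.
```

None of this machinery attaches to a lone Fisherian p-value reported after the fact with no alternative in view.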
There are methods for trying to find an optimal alpha level if you must use one, but really you don't need a significance level at all. You can simply interpret the p-value for what it is, a conditional probability.
This is completely false and absolutely, horribly mistaken. It is a common but no less destructive misconception. The p-value is ABSOLUTELY NOT a conditional probability. This is why it is important to avoid defining it (in words) with something like "the probability of obtaining data X, given that the null is true." The "given" suggests a conditional probability, but the truth of the null hypothesis is not a random variable, and thus it is meaningless to speak of the p-value as a conditional probability p(data | null hypothesis is true), despite unfortunately frequent abuses of notation; the p-value is computed under the assumption that the null is true, not conditional on it. For Bayesians, hypotheses (and hence p-values) can be treated as random variables, but by adopting this approach one has to involve priors and compute posteriors, and the entire approach is conceptually, philosophically, computationally, and mathematically quite different.
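As a rough illustration of the distinction, the sketch below (assuming Python with NumPy/SciPy; the 50/50 prior on the hypotheses, the N(0, tau^2) prior on the effect under H1, and the toy data are all assumptions made up for the example) computes a p-value as a tail probability under the assumption that the null is true, and then computes something that genuinely is a probability of the null given the data, which requires exactly the Bayesian machinery described above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy data: n observations from N(mu, 1); test H0: mu = 0 against H1: mu != 0.
n, true_mu = 50, 0.3
x = rng.normal(true_mu, 1.0, size=n)
z = np.sqrt(n) * x.mean()

# Frequentist p-value: a tail probability computed UNDER THE ASSUMPTION that
# H0 is true. The hypothesis is fixed; nothing is being conditioned on.
p_value = 2 * stats.norm.sf(abs(z))

# A posterior probability of H0 requires priors: here an assumed 50/50 prior
# on H0 vs H1 and an assumed N(0, tau^2) prior on mu under H1.
tau = 1.0
m_h0 = stats.norm.pdf(z, loc=0, scale=1.0)                      # z | H0 ~ N(0, 1)
m_h1 = stats.norm.pdf(z, loc=0, scale=np.sqrt(1 + n * tau**2))  # z | H1 (marginal)
post_h0 = m_h0 / (m_h0 + m_h1)

print(f"p-value      = {p_value:.4f}")
print(f"P(H0 | data) = {post_h0:.4f}  (changes with the assumed priors)")
```

The two numbers answer different questions, and only the second one involves conditioning on anything; getting it required priors and a posterior, i.e., a different framework altogether.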