Another aspect here is the very definition of the W in your formula. It represents the number of microstates 'equivalent' to the given macrostate. But how is a macrostate defined? We identify certain variables (temperature, volume, total energy, molar composition) as relevant for the macrostate. But, for example, if we cannot distinguish two different substances, the molar composition we assign to the system is different from the one we would assign if we could distinguish them, and so is the entropy. And, again, there is a sense that this measures the loss of information in going from the microscopic states (where we get W = 1 for each microstate) to the macrostate, where we only use macroscopic variables to describe the system.
So, why should the number of microstates equivalent to a system's macrostate increase in one direction of time, consistently across space?
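To make the distinguishability point concrete, here is a minimal sketch with numbers I am inventing purely for illustration: 4 particles in a two-sided box, 2 of a species A and 2 of a species B. Defining the macrostate without regard to species gives a coarser description, and therefore a larger W, than defining it with each species counted separately:

```python
from math import comb, log

k = 1.380649e-23  # Boltzmann constant, J/K

# Toy model (invented for illustration): 4 particles, each in the left or
# right half of a box; 2 are of species A and 2 of species B.

# Macrostate defined WITHOUT distinguishing the species:
# "2 particles on the left" -> any 2 of the 4 may be the ones on the left.
W_indistinct = comb(4, 2)                 # 6 microstates

# Macrostate defined WITH the species distinguished:
# "1 A and 1 B on the left" -> fewer microstates are compatible with it.
W_distinct = comb(2, 1) * comb(2, 1)      # 4 microstates

print("S, species not distinguished:", k * log(W_indistinct))
print("S, species distinguished:    ", k * log(W_distinct))
```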
OK, gardening finished!
I can see you have been thinking a lot harder about all this than I have. I suspect that, as an academic (as I see you are from your profile), you may have better access to real Stat. TD experts than I have, relying, as I have to, on forty-year-old recollections and textbooks. But I shall manfully try to understand what you are driving at.
Just to check what assumptions we hold in common, I presume that the mathematics of probability holds across the universe. So, for example, if we have a system that can either be in a state that can be realised in 2 ways, or in one that can be realised in 4 ways, the latter is the more probable, other things being equal.
If that is a given then, very simplistically, it seems to me there will be an inherent tendency over time towards those states that can be realised in more ways, e.g. two gases diffusing into one another. I cannot see how this could ever not be so, if mathematical logic applies everywhere. In a hand-wavy way, that is my mental picture of why entropy can be said to be the arrowhead of time.
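As a rough illustration of that counting argument (a toy model I am making up, with arbitrary numbers): take 10 molecules of each gas and let a microstate record which side of the partition each individual molecule sits on. The fully mixed macrostate is realisable in vastly more ways than the unmixed one, so, other things being equal, it is vastly more probable:

```python
from math import comb

# Toy model (numbers invented for illustration): 10 molecules of gas A start
# on the left of a partition, 10 of gas B on the right. A macrostate is
# (number of A molecules on the left, number of B molecules on the left);
# W counts the microstates (which individual molecules sit where) that match it.
def W(a_left, b_left, n_a=10, n_b=10):
    return comb(n_a, a_left) * comb(n_b, b_left)

print("Unmixed (all A left, all B right):", W(10, 0))   # 1 way
print("Evenly mixed (5 A, 5 B each side):", W(5, 5))    # 252 * 252 = 63504 ways
```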
On the point about the composition of a substance affecting the entropy, let me retrace why (excuse me if I go slowly here - it's been quite a while). It would be, I think, because the atomic masses, bonding interactions, etc. will affect the partition function, Q, which represents the degree to which the atoms or molecules are able to escape the ground state and populate excited states. And we have the result that S = k ln Q + U/T. Does that sound right?
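If it helps, here is a rough sketch of that relation for the simplest case I can think of: a single two-level system whose level spacing eps stands in, hypothetically, for the atomic masses and bonding details. Widening the gap makes the excited state harder to populate, shrinking Q and hence S = k ln Q + U/T:

```python
from math import exp, log

k = 1.380649e-23  # Boltzmann constant, J/K

def entropy_two_level(eps, T):
    """S = k ln Q + U/T for one two-level system: ground state at 0, excited state at eps (J)."""
    Q = 1.0 + exp(-eps / (k * T))          # partition function
    U = eps * exp(-eps / (k * T)) / Q      # mean energy
    return k * log(Q) + U / T

T = 300.0  # kelvin
# The spacing eps is a stand-in for the atomic/bonding details: wider spacing
# means the excited state is harder to reach, so Q and S both shrink.
for eps in (1e-22, 1e-21, 1e-20):
    print(f"eps = {eps:.0e} J  ->  S = {entropy_two_level(eps, T):.3e} J/K")
```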
Where my grasp of what you are saying gets more shaky, I'm afraid, is in the point you want to make about a loss of information in "going from" the microstates to a macrostate. I understand we don't generally keep tabs on the continual flux among the microstates, all exchanging energy with one another in a bulk sample of a substance, and that knowledge of the bulk properties doesn't tell us about all that. Are you saying that if we did have all that information, the information entropy would be a lot lower, zero even?
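Let me try to put my own question into symbols, to check I am asking it sensibly. Using the Gibbs form S = -k Σ p ln p over the microstate probabilities, and a made-up figure of 1000 equally likely microstates compatible with the macrostate: knowing only the macrostate gives S = k ln W, while pinning down the exact microstate gives S = 0. Is that the "loss of information" you mean?

```python
from math import log

k = 1.380649e-23  # Boltzmann constant, J/K

def gibbs_entropy(probs):
    """S = -k * sum(p ln p) over microstate probabilities (terms with p = 0 contribute nothing)."""
    return -k * sum(p * log(p) for p in probs if p > 0)

W = 1000  # made-up number of microstates compatible with the macrostate

# Knowing only the macrostate: all W microstates equally likely -> S = k ln W.
print("Macrostate only: ", gibbs_entropy([1 / W] * W))
# Knowing the exact microstate: one probability is 1, the rest 0 -> S = 0.
print("Exact microstate:", gibbs_entropy([1.0] + [0.0] * (W - 1)))
```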