Most of us know that DNA is a sequence of base pairs, with one of four possibilities at each position. Naively, this means 2 bits per base pair. Multiply that by the number of base pairs and you get the total amount of information in the DNA of an individual.
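To make the naive count concrete, here is a minimal sketch of the arithmetic. The figure of roughly 3.1 billion base pairs for a haploid human genome is an approximate value I am assuming for illustration, and the function name `naive_bits` is my own.

```python
import math

def naive_bits(num_base_pairs: int, alphabet_size: int = 4) -> float:
    """Naive information content: log2(alphabet size) bits per position."""
    return num_base_pairs * math.log2(alphabet_size)

# Assumption: approximate haploid human genome length, for illustration only.
HUMAN_BP = 3_100_000_000

bits = naive_bits(HUMAN_BP)
print(f"{bits:.3g} bits, about {bits / 8 / 1e6:.0f} MB")
```

This is only the upper bound that treats every position as independent and uniformly distributed.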
But things are not quite that simple. We can regard DNA as a 'data stream' and analyze it to find the amount of Shannon information in that data. Because of repetitions and similarities between stretches of DNA, the actual amount of Shannon information encoded is much smaller than the naive count above. In essence, when asking about the Shannon information, we are asking for the smallest number of bits in a lossless compression of the data.
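A small sketch of why the naive count overestimates: on a toy, highly repetitive sequence, the per-symbol frequencies still give 2 bits per base, while a lossless compressor finds the repetition and needs far fewer. I am using Python's general-purpose `zlib` here purely as a stand-in; it is not a genome-specific compressor, the toy sequence is made up, and the compressed size is only an upper bound on the information content.

```python
import math
import zlib
from collections import Counter

def order0_entropy_bits_per_base(seq: str) -> float:
    """Shannon entropy of the single-symbol frequencies (ignores context)."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def compressed_bits_per_base(seq: str) -> float:
    """Upper bound on bits per base from a general-purpose compressor."""
    compressed = zlib.compress(seq.encode("ascii"), 9)
    return 8 * len(compressed) / len(seq)

# Hypothetical toy sequence: a 4-base motif repeated many times.
repetitive = "ACGT" * 2500

print(order0_entropy_bits_per_base(repetitive))  # frequencies alone: 2 bits/base
print(compressed_bits_per_base(repetitive))      # far below 2 bits/base
```

Real genomes are nowhere near this regular, so the gap between the two numbers will be much smaller for a single sequence taken on its own.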
We know that data compression factors in the hundreds are possible and I have seen reports of up to 1200. But my guess is that even these are far from optimal.
Is there any actual data on the total Shannon entropy of a human genome? How does it compare to that of other species? Again, a simple comparison of DNA lengths is not enough to answer this. Some analysis of compressibility has to be done.
It is known that the DNA length in some yeast is larger than that of humans. But how does the Shannon entropy of the two compare? Is anything known?