
How much Information does DNA have?

Polymath257

Think & Care
Staff member
Premium Member
So, most of us know that the sequences in DNA are base pairs with one of four possibilities in each spot. Naively, this would mean that there are 2 bits per base pair. Multiply that by the number of base pairs and you get the total amount of information in the DNA of an individual.

But things are not quite that simple. We can regard DNA as a 'data stream' and analyze it to find the amount of Shannon information in that data. Because of repetitions and similarities between stretches of DNA, the actual amount of Shannon information encoded is much smaller than the naive count above. In essence, when asking about the Shannon information, we are asking for the smallest number of bits in a lossless compression of the data.
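Here is a minimal sketch of that idea in Python, using zlib as a stand-in for a real genome compressor (the sequence is a made-up toy example, and a generic byte-level compressor is far from optimal for DNA):

```python
import zlib

# Toy "genome": deliberately repetitive, hypothetical data.
seq = "ACGT" * 1000 + "AAAA" * 500

# Naive count: 2 bits per base, since each position has 4 possibilities.
naive_bits = 2 * len(seq)

# Upper bound on the Shannon information: the size of a lossless
# compression. zlib sees 8-bit ASCII, so this estimate is crude;
# specialized DNA compressors do considerably better.
compressed_bits = 8 * len(zlib.compress(seq.encode(), level=9))

print(f"naive:      {naive_bits} bits")
print(f"compressed: {compressed_bits} bits")
print(f"ratio:      {naive_bits / compressed_bits:.1f}x")
```

The compressed size is only an upper bound on the Shannon information: any regularity the compressor misses means the true figure is lower still.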

We know that data compression factors in the hundreds are possible, and I have seen reports of factors up to 1200. But my guess is that even these are far from optimal.

Is there any actual information on the total Shannon entropy of a human genome? How does this information compare to that for other species? Again, a simple comparison of length of DNA strands is not enough to answer this. Some analysis of compression has to be done.

It is known that the DNA length in some yeast is larger than that of humans. But how does the Shannon entropy of the two compare? Is anything known?
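Short of running a compressor, one can also estimate an entropy rate directly from k-mer frequencies. A rough sketch (the sequence is a placeholder; a real comparison would read actual genome files, and short-block estimates like this miss long-range repeats entirely):

```python
from collections import Counter
from math import log2

def entropy_rate(seq: str, k: int = 4) -> float:
    """Empirical Shannon entropy in bits per base, from k-mer frequencies.
    The maximum is 2 bits per base for the 4-letter alphabet."""
    kmers = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(kmers.values())
    h_block = -sum((n / total) * log2(n / total) for n in kmers.values())
    return h_block / k

# Placeholder sequence: highly repetitive, so the rate is well below 2.
print(f"{entropy_rate('ACGT' * 2500):.2f} bits/base")
```

Comparing this figure for two genomes would say more than comparing raw lengths, though it still overestimates the true entropy rate.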
 

Sunstone

De Diablo Del Fora
Premium Member
I'm giving the OP a "winner" rating for its admirable attempt to elevate the general tone of conversation on this Forum. In an age when a little googling allows nearly anyone but the laziest of us to get past unfamiliar technical terms, we could do with a few more threads around here that are not dumbed down.
 

BSM1

What? Me worry?
Polymath257 said: ↑ [full OP quoted]

Not only does my DNA have information, but it also claims to have photos and videos. I paid what was asked because I couldn't afford the exposure...just sayin'.
 

Cacotopia

Let's go full Trottle
Theoretically, it is possible to store all of the information humanity produces in a year in about 4 grams of DNA.
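A back-of-envelope check on the order of magnitude, assuming about 2 bits per base pair and an average mass of roughly 650 daltons per base pair (round figures; real storage schemes spend extra bases on addressing and error correction, and published estimates vary a lot with the encoding assumed):

```python
# Rough information density of DNA as a storage medium.
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate mass of one base pair, g/mol
BITS_PER_BP = 2.0          # 4 bases -> 2 bits, ignoring redundancy

bp_per_gram = AVOGADRO / BP_MASS_G_PER_MOL
bytes_per_gram = bp_per_gram * BITS_PER_BP / 8

print(f"{bytes_per_gram:.1e} bytes per gram")  # ~2e20, i.e. ~200 exabytes
```

At a few hundred exabytes per gram, zettabyte-scale annual data needs somewhere between grams and hundreds of grams of DNA, depending on the assumptions.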
 

exchemist

Veteran Member
Polymath257 said: ↑ [full OP quoted]
Thanks for raising this. I'll need to leave this one to others, as information theory is not my speciality, but I'll be very interested to read any informed contributions.

One potential caveat I offer is that, according to my understanding, one cannot equate Shannon entropy with thermodynamic entropy.
 

Jumi

Well-Known Member
I suspect that DNA, like QM, is often used to debate points of view it doesn't really apply to. Though I think comparing DNA to computer code is erroneous, one thing is certain: neither human-made nor nature-made code is error free. Nor is code that is more complex, or that requires more space, necessarily better...
 

Polymath257

Think & Care
Staff member
Premium Member
exchemist said: ↑ [full post quoted]

They are related, with thermodynamic entropy being a special case, but I'm definitely NOT asking about thermodynamic entropy here. I am curious about the Shannon entropy of the sequence of base pairs.
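For reference, the formal connection (standard definitions, nothing DNA-specific): the Shannon entropy of a distribution \(p_i\) is

\[ H = -\sum_i p_i \log_2 p_i \quad \text{(bits per symbol)}, \]

while the Gibbs entropy of statistical mechanics is

\[ S = -k_B \sum_i p_i \ln p_i, \]

the same functional form up to the base of the logarithm and Boltzmann's constant \(k_B\). The question in this thread is the first quantity, taken over the distribution of base-pair sequences.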
 

Polymath257

Think & Care
Staff member
Premium Member
Sunstone said: ↑ [full post quoted]

While it is fairly easy to find information about the data size of the human genome (roughly 700-800 MB), it seems to be much harder to find out the actual information content. One issue is that some places look at *comparative* information, where you are given a template genome and compare others to it. Since the difference between human genomes is fairly small, this leads to a very small amount of Shannon information. Also, there is stuff on the internet that deals with how much information *we* can store on DNA. That is another matter entirely from how much *is* stored.
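A toy illustration of why the comparative framing gives such small numbers, with made-up sequences standing in for two human genomes (real reference-based compressors are far more sophisticated than this):

```python
import zlib

# Hypothetical "reference" genome and a variant with a few point mutations.
reference = ("ACGT" * 250 + "GGCCTTAA" * 125) * 4
variant = list(reference)
for pos in (10, 998, 2500, 3999):  # made-up mutation positions
    variant[pos] = "T"
variant = "".join(variant)

# Standalone: compress the whole variant sequence.
standalone = len(zlib.compress(variant.encode(), level=9))

# Comparative: store only the positions and bases that differ.
diffs = [(i, b) for i, (a, b) in enumerate(zip(reference, variant)) if a != b]
comparative = len(zlib.compress(repr(diffs).encode(), level=9))

print(f"standalone:  {standalone} bytes")
print(f"comparative: {comparative} bytes")  # tiny: just the mutations
```

Given a template, the second number is essentially the information in the *differences*, which for two humans is a minuscule fraction of the genome.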

I've yet to see a convincing answer to my question. But perhaps I'm lazy. :D
 

Daemon Sophic

Avatar in flux
Polymath257 said: ↑ [full OP quoted]
Interesting topic, but perhaps the wrong venue in which to raise it?
That goes beyond my comfort zone for casual discussion, but I do have two brothers with PhDs in biomathematics.
o_O Actually... Joe? Matt? Is that you?
 

Revoltingest

Pragmatic Libertarian
Premium Member
Polymath257 said: ↑ [first paragraph of the OP quoted]
Multiplication seems the wrong (underestimating) operation here.
I say factorial would apply to some extent because order matters.
 

Thermos aquaticus

Well-Known Member
Others have defined genetic information differently than you have.


How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial ‘protein’ in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
Schneider TD. Evolution of biological information. Nucleic Acids Research. 2000;28(14):2794-2799.

In this definition the sender and receiver are the protein and the DNA sequence it binds to. An increase in information would then be increased binding between the two. I don't know how well that meshes with the rest of information theory, but it is certainly one way that the question has been approached.
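For the curious, Schneider's measure is straightforward to compute from an alignment of binding sites: each column contributes \(R_i = 2 - H_i\) bits, where \(H_i\) is the Shannon entropy of the bases seen in that column. A small sketch with invented sites (real analyses use many genomic sites and apply a small-sample correction):

```python
from collections import Counter
from math import log2

def site_information(sites: list[str]) -> float:
    """Sum over alignment columns of R_i = 2 - H_i, in bits."""
    total = 0.0
    for column in zip(*sites):
        n = len(column)
        h = -sum((c / n) * log2(c / n) for c in Counter(column).values())
        total += 2.0 - h  # 2 bits = max entropy over {A, C, G, T}
    return total

# Invented "binding sites", conserved at most positions.
sites = ["TACGAT", "TACGTT", "TACGCT", "TAGGAT", "TACGAT", "TACGTT"]
print(f"{site_information(sites):.2f} bits")
```

Fully conserved columns contribute the full 2 bits; completely random columns contribute nothing, which matches the idea of information as reduced uncertainty in the receiver.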
 