Help!  My Segments Are So Sticky!


Back in the day, it used to be popular to refer to certain segments as “sticky” when they appeared to be passed down from generation to generation untouched.  That sort of name-calling has reduced greatly now that we have a clearer understanding of the statistical rules that our chromosomes follow as a result of random recombination.  It turns out that the smaller a segment, the more likely it is to escape the chopping block of recombination in each generation and instead either be passed to the child in full or not at all.  Let’s take a look at some numbers and see how this plays out.

As our starting point, we’re going to go back to our definition of centiMorgan, as explained in my blog post from a few weeks ago about the statistical impossibility of two full siblings not sharing any DNA segments.  If you missed that one, that’s OK; here’s the way I like to think about a cM:  A cM is a unit that denotes a span of a chromosome that has a 1% likelihood of being split at least once by recombination within a single generation.  Conversely, a 1 cM span of chromosome has a 99% chance of avoiding recombination per generation.

So from this definition, let’s look at a 7 cM segment on a chromosome in terms of “stickiness.”  There are exactly three possibilities in our inheritance model:  A)  This 7 cM segment is passed intact from parent to child; B) the segment is not passed at all and instead the parent passes genetic material from his or her opposite parent to the child; or C) at least one recombination occurs across this span and the parent passes pieces of both copies of his or her chromosome across this span (say 4 cM from one grandparent and 3 cM from the other).

First, let’s calculate the odds of no recombination occurring across a 7 cM span of chromosome.  We use an “AND” operator and multiply the probabilities of no recombination on each 1 cM span that makes up the 7 cM in question.  So the odds of no recombination across the 7 cM span can be restated as the odds of no recombination on the first cM AND no recombination on the second cM AND no recombination on the third cM, etc.  In statistics, independent probabilities linked by an “AND” operator can be simply multiplied.  Therefore, the odds of no recombination across a 7 cM span of chromosome in a single generation = 0.99 * 0.99 * 0.99 * 0.99 * 0.99 * 0.99 * 0.99, which we can express in exponent notation as (0.99)^7, which according to my calculator is about 93%.  The two possibilities (inherited in its entirety and not inherited at all) must split this 93% equally (47% each, rounding to the nearest whole percent), whereas the odds of the segment getting the chop-chop are only 7%.  Therefore, we can say that a 7 cM segment is necessarily “sticky”: the odds of recombination in a generation (as opposed to acting all sticky) are low.  This is not a property special to your 7 cM segment.  Rather, this applies to all 7 cM segments, whether or not you have matches on them and whether or not endogamy is at play, and with total disregard for the age of the segment (failing to recombine in a hundred years doesn’t make it any more likely to recombine in the next generation).
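For readers who would rather check this in code than on a calculator, here is a minimal Python sketch of the arithmetic above.  It assumes, as the paragraph does, that each 1 cM has an independent 99% chance of avoiding recombination.  The function name `single_generation_odds` and its parameters are illustrative labels of my own, not part of any tool.

```python
def single_generation_odds(cm, per_cm_no_recomb=0.99):
    """Odds of the three outcomes for a segment of `cm` centiMorgans
    in one generation, under the simple independent-per-cM model."""
    no_recomb = per_cm_no_recomb ** cm      # e.g. 0.99 ** 7
    return {
        "intact": no_recomb / 2,            # passed down whole
        "absent": no_recomb / 2,            # other copy passed instead
        "recombined": 1 - no_recomb,        # at least one crossover
    }


odds = single_generation_odds(7)
for outcome, p in odds.items():
    print(f"{outcome}: {p:.1%}")
# intact: 46.6%
# absent: 46.6%
# recombined: 6.8%
```

Rounded to whole percents, these are the same 47%, 47%, and 7% quoted above.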

Now, let’s extend this concept to segments of different lengths in cM using Excel.  Here’s the story from 7 cM all the way up to 40 cM:

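| cM | Odds of no recombination in a generation | Odds of at least one recombination across segment in a generation | Odds of inheriting entire segment | Odds of not inheriting segment at all |
|---|---|---|---|---|
| 7 | 93% | 7% | 47% | 47% |
| 8 | 92% | 8% | 46% | 46% |
| 9 | 91% | 9% | 46% | 46% |
| 10 | 90% | 10% | 45% | 45% |
| 11 | 90% | 10% | 45% | 45% |
| 12 | 89% | 11% | 44% | 44% |
| 13 | 88% | 12% | 44% | 44% |
| 14 | 87% | 13% | 43% | 43% |
| 15 | 86% | 14% | 43% | 43% |
| 16 | 85% | 15% | 43% | 43% |
| 17 | 84% | 16% | 42% | 42% |
| 18 | 83% | 17% | 42% | 42% |
| 19 | 83% | 17% | 41% | 41% |
| 20 | 82% | 18% | 41% | 41% |
| 21 | 81% | 19% | 40% | 40% |
| 22 | 80% | 20% | 40% | 40% |
| 23 | 79% | 21% | 40% | 40% |
| 24 | 79% | 21% | 39% | 39% |
| 25 | 78% | 22% | 39% | 39% |
| 26 | 77% | 23% | 39% | 39% |
| 27 | 76% | 24% | 38% | 38% |
| 28 | 75% | 25% | 38% | 38% |
| 29 | 75% | 25% | 37% | 37% |
| 30 | 74% | 26% | 37% | 37% |
| 31 | 73% | 27% | 37% | 37% |
| 32 | 72% | 28% | 36% | 36% |
| 33 | 72% | 28% | 36% | 36% |
| 34 | 71% | 29% | 36% | 36% |
| 35 | 70% | 30% | 35% | 35% |
| 36 | 70% | 30% | 35% | 35% |
| 37 | 69% | 31% | 34% | 34% |
| 38 | 68% | 32% | 34% | 34% |
| 39 | 68% | 32% | 34% | 34% |
| 40 | 67% | 33% | 33% | 33% |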

Can you guess why I stopped at 40 cM?  It’s because that’s where a segment will exhibit equal probabilities of each of the 3 described scenarios, which I would consider to be not very sticky.
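If you don’t have Excel handy, the 40 cM stopping point can be sketched in a few lines of Python.  This is only an illustration under the same 99%-per-cM assumption; the helper `no_recombination` is a name I made up for the example.  The three scenarios are equally likely when the odds of no recombination equal 2/3, so we can solve for that length directly.

```python
import math


def no_recombination(cm, per_cm=0.99):
    """Odds that a span of `cm` centiMorgans sees no recombination
    in one generation (assumes independent 1 cM steps)."""
    return per_cm ** cm


# The three outcomes are equal when no_recombination / 2 == 1 - no_recombination,
# which means no_recombination == 2/3.  Solve 0.99 ** cm == 2/3 for cm.
equal_point = math.log(2 / 3) / math.log(0.99)
print(round(equal_point, 1))  # 40.3

# Spot-check a few lengths: no recombination, recombination, inherited whole.
for cm in (7, 20, 40):
    p = no_recombination(cm)
    print(cm, f"{p:.0%}", f"{1 - p:.0%}", f"{p / 2:.0%}")
# 7 93% 7% 47%
# 20 82% 18% 41%
# 40 67% 33% 33%
```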

But there’s more to this story.  Surely, “stickiness” relates somehow to the expected age of a segment.  That is, let’s ask the question “How old is my sticky little 7 cM segment likely to be?”  In this calculation, we start with the premise that it’s inherited in its entirety from one parent and that it’s a valid segment of genetic material from just one copy of that parent’s chromosome.  We’ve already calculated the probability of no recombination at 93%, but let’s switch our rounding to tenths of a percent for a bit more accuracy.  Our 7 cM segment has a 93.2% chance of being inherited without recombination in a generation.  How about 2 generations?  Well, it has to be passed down in one generation AND another, so our probability of a segment being at least two generations old is 0.932^2 = 86.9%.  If we continue with this drill, we will find the odds are still at 70.3% that the segment is at least 5 generations old.  Let’s switch from generations to years, since as genealogists we really care about whether our matches are related to us in a historical timeframe when there are records with which we can build our trees.  Let’s assume that the average generation span is 25 years, and accordingly, 400 years ago (around the beginning of the genealogical era) takes us back 16 generations.  So what are the odds that our 7 cM segment is older than 400 years?  The answer is going to shock some people who insist that autosomal matches only go back 400 years.  In fact, the odds that a 7 cM segment on my genome exceeds 400 years in age are 30.2%.  That is to say that for any given 7 cM segment, there’s about a 70% chance that all of your matches on that segment have a common ancestor who was born less than 400 years ago, but over a 30% chance that a segment this size dates back further.
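Here is a hedged Python sketch of that “one generation AND another” drill, using the unrounded 0.99^7 rather than 0.932.  The function name `survival` is my own label for illustration.  One thing worth noticing when you run it: under this simple model, 16 transmissions give about 32.4%, while the 30.2% figure corresponds to 17 transmissions, so the exact number depends on whether “older than 400 years” is counted as 16 or 17 generations.

```python
def survival(cm, generations, per_cm=0.99):
    """Chance that a segment of `cm` centiMorgans goes `generations`
    transmissions in a row without recombination (simple model,
    each generation treated as independent)."""
    return (per_cm ** cm) ** generations


for g in (1, 2, 5, 16, 17):
    print(g, f"{survival(7, g):.1%}")
# 1 93.2%
# 2 86.9%
# 5 70.3%
# 16 32.4%
# 17 30.2%
```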

Well, how far back?  I’ve taken it upon myself to carry out some advanced statistical calculations with which I won’t bore you, but I’ll give you some interesting figures, and I’ll show you a few colorful charts that might change the way you think about your DNA matches.  First, let’s talk quartiles.  For a 7 cM segment, we can divide the possible ages of our segment into four statistical quartiles, each representing an even 25% probability that the age of our segment falls within that range.  These quartiles are as follows:  0-100 years, 100-250 years, 250-500 years, and >500 years.  Based on this, the median age is about 250 years, but it’s just as likely that the segment is over 500 years old as it is that the segment is, say, 100 to 250 years old.  Let’s keep it going and talk 100-year ranges instead of quartiles.  I made a pie chart to show you the probability that a 7 cM segment dates back to some different time-frames:
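The quartile boundaries can be approximated with a short Python sketch.  This is not necessarily the exact calculation behind the charts; it simply assumes the same 99%-per-cM model, a 25-year generation, and rounding to the nearest generation.  The function name `age_quartiles_years` is hypothetical.

```python
import math


def age_quartiles_years(cm, years_per_generation=25, per_cm=0.99):
    """Approximate ages (in years) at which 75%, 50%, and 25% of
    segments of this length are still unrecombined, rounded to the
    nearest whole generation."""
    p = per_cm ** cm  # odds of no recombination in one generation
    bounds = []
    for still_intact in (0.75, 0.50, 0.25):
        # Solve p ** generations == still_intact for generations.
        generations = math.log(still_intact) / math.log(p)
        bounds.append(round(generations) * years_per_generation)
    return bounds


print(age_quartiles_years(7))  # [100, 250, 500]
```

The three numbers are the boundaries between the four quartiles, so the ranges are 0-100, 100-250, 250-500, and over 500 years.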


Yes, you’re reading this correctly.  There’s a 6% chance that our 7 cM segment has been passed down untouched for over 1000 years!  Now that’s what I call a sticky segment.  Eew!  So what, it’s just 6%, right?  Well, we have over 7000 cM of real estate on our chromosomes and that’s not even counting the X (we’ll get to that in a bit).  So, that’s an expected 60 little super-sticky segments where you’re never ever going to find your common ancestor because he/she walked the earth over 1000 years ago.  Now, when you see someone say from a traditionally endogamous population, and they have like a zillion matches on a lot of their segments, I want you to understand this.  Their common ancestor from which they all inherited their hyper-sticky segments will often have lived over 1000 years ago and may be an ancestor of a large swath of the population (and many times over due to intermarriage among even distantly related descendants over time).  That doesn’t make a segment like this any less real though, just less useful for genealogy in terms of finding the most recent common ancestor you share with other matches thereon.
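The back-of-the-envelope count of super-sticky segments can be sketched like this.  It is a rough illustration only: it assumes 25-year generations, treats the 7000 cM as 1000 non-overlapping 7 cM slots, and uses variable names of my own choosing.

```python
PER_CM = 0.99              # odds that 1 cM avoids recombination per generation
YEARS_PER_GENERATION = 25  # assumed average generation span

generations = 1000 // YEARS_PER_GENERATION       # 40 generations in 1000 years
p_untouched = (PER_CM ** 7) ** generations        # 7 cM surviving all 40
slots = 7000 / 7                                  # non-overlapping 7 cM slots
expected_ancient = slots * p_untouched

print(f"{p_untouched:.1%}")     # 6.0%
print(round(expected_ancient))  # 60
```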

Next, let’s move on to 20 cM.  Why 20 cM?  Because that’s where Ancestry sets the threshold for shared matches.  Here, the picture is very different, with quartiles (rounding to the nearest generation) being 0-25 years, 25-75 years, 75-175 years, and >175 years.  It’s no surprise that Ancestry considers this to be the fourth cousin boundary, since the average fourth cousin shares an ancestor born about 125 years prior, and since a 20 cM single-segment match is likely to be related at fourth cousin level or closer about 75.5% of the time!  Here’s that same pie chart for a 20 cM segment:


Awesome.  There’s now a 96% chance that our common ancestor for a shared segment of 20 cM lived within the past 400 years.  Note, however, there is still a 1.8% chance that a 20 cM segment is over 500 years old!
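Both of those 20 cM figures can be reproduced with a small Python sketch, again assuming 99% per cM and 25 years per generation.  The helper name `older_than` is purely illustrative.

```python
def older_than(cm, years, years_per_generation=25, per_cm=0.99):
    """Chance that a segment of `cm` centiMorgans has gone at least
    `years` without recombination (simple independent model)."""
    generations = years / years_per_generation
    return (per_cm ** cm) ** generations


print(f"{1 - older_than(20, 400):.0%}")  # 96%
print(f"{older_than(20, 500):.1%}")      # 1.8%
```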

Finally, the million dollar question that everybody’s asking.  What about that crazy X chromosome?  I heard those segments are ancient if they’re not at least 15 cM.  Well, maybe so, but first let's talk about whether they’re even real segments (IBD).  One problem with the X chromosome is that some parts are poorly sampled, with very low SNP counts per cM.  I personally recommend that a segment have at least 75 tested SNPs per cM (and a minimum of 7 cM) before you can rely on it being IBD.  This just ain’t happening on some parts of the X due to low sampling rate (SNP per cM).  But let’s assume we’ve got a nice segment with some good SNP density, but it’s not too long.  Let’s take everyone’s favorite 15 cM threshold and see what kind of stats we get in the context of segment dating.

First, we need another tool in our arsenal, and I’m going to call it “effective generation span.”  While a generation span in real life might be 25 years or so, the “effective generation span” of an X chromosome is 37.5 years.  Here’s why.  Let’s talk about the last time an X chromosome segment had any chance of recombining.  That would be when a female ancestor had it.  A male’s X chromosome isn’t recombined when passed to his daughters because he has only one X and therefore nothing with which to recombine.  I’m ignoring the PARs (pseudoautosomal regions) because they’re puny and practically useless for genealogy.  So, the last chance an X segment had to recombine was in a donor’s mother, or in a donor’s father’s mother.  Whether a segment was inherited from either of those two ancestors we’ll assume is equally likely for purposes of our discussion, and I’ll assert that this is a reasonable assumption.  So, from the last opportunity for recombination, there’s a 50% chance of one generation (25 years) and a 50% chance of two generations (50 years).  To calculate our effective generation span for the X, we simply take the weighted average:  (0.5 * 25) + (0.5 * 50) = 37.5 years.  Then, we can use the same methodology as we did on the autosomal chromosomes to calculate the age ranges of our favorite 15 cM segment on the X.  It turns out there’s about an 80% chance that such a match shares an MRCA within 400 years, matching the common wisdom in our community that 15 cM is a nice place to start examining our X matches (given that our comparison to the other donor includes at least 15 * 75 = 1125 SNPs).  Here’s the same pie chart for date ranges for a 15 cM X segment:
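For the curious, here is a minimal Python sketch of the X calculation.  It assumes the 50/50 weighting described above and, as a simplification of my own, allows a fractional number of recombination opportunities in 400 years.  The variable names are illustrative only.

```python
PER_CM = 0.99              # odds that 1 cM avoids recombination per opportunity
YEARS_PER_GENERATION = 25  # assumed average generation span

# Weighted average: half the time one generation back, half the time two.
effective_span = 0.5 * YEARS_PER_GENERATION + 0.5 * (2 * YEARS_PER_GENERATION)

# Number of recombination opportunities in 400 years (fractional, by assumption).
opportunities = 400 / effective_span

# Chance a 15 cM X segment recombined at least once in that time.
p_within_400 = 1 - (PER_CM ** 15) ** opportunities

print(effective_span)         # 37.5
print(f"{p_within_400:.0%}")  # 80%
print(15 * 75)                # 1125 SNPs at 75 SNPs per cM
```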


That’s all for this week.  Despite the liberties I’ve taken using the word “sticky,” as you now know, there’s nothing inherently sticky about one segment vs. the other, but rather segments only appear to “stick” because of their length in cM.  Any apparent stickiness is simply a direct result of the statistical nature of DNA inheritance, and the phenomenon applies across the board to all small segments.

If you’ve enjoyed this post, I encourage you to check out my website www.borlandgenetics.com where I’m accepting uploads to an autosomal database that focuses on making simple and powerful (and for the most part free) DNA reconstruction tools accessible to the average genetic genealogist.

Comments

  1. Very interesting! Thanks for this. And while I tell people (other eastern Polynesians like myself) who are predicted 2nd - 3rd cousin matches to me or my relatives to look at the largest segment of at least 30cM in order to determine a true 2nd cousin relationship, this chart makes sense except for the 20cM.

    Ancestry's shared matches are based on TOTAL shared. While we can have a good 20cM total shared, the number of segments can be as much as 5 segments (just looking at my own). So if say there are 5 segments, make it 3 segments (I had a lot of 3 and 2 segments), that's about 6.6cM.

    Would love to see you work with my data! ;) Thanks again for this though, definitely enlightening!

  2. Excellent, Kevin. Thanks for explaining this. I still don't like the term "sticky" since there's nothing to prevent recombination from selecting the other chromosomal segment in the same location (as indicated in the column about odds of not passing on the segment at all). The probabilities you've given for various shared amounts of DNA and segment longevity are very helpful. It seems I routinely get confronted by outliers. Recently on behalf of someone looking for his great grandfather's father, we approached the grandfather of a match who shares 118 cM. For the number of generations, this indicated to us that this particular line was the correct one. But the grandfather shared only 121 cM.

  3. This is incredibly in-depth, but you've done a great job of explaining it. Thank you for this!

  4. Great post, Kevin! Thank you for making this readable and understandable and for adding the graphics! This was a very helpful post.

  5. Although the calculations in this piece are mathematically correct, I think they are conceptually wrong for genetic genealogy. The probabilities calculated are in a forward direction, answering the question "If two people share an ancestor n generations ago what is the probability that they share a segment of x cM from that ancestor?" Generally that is not the question we are interested in, we know for certain that two people share a segment of x cM and we want to know how long ago the ancestor was. This is the fundamental difference between this approach and the Speed and Balding approach summarised here https://isogg.org/wiki/Identical_by_descent. An analogous question, with known relationship and question about inheritance, would be: If A and B are siblings, what is the probability they have the same colour eyes? (Answer: reasonably high). The converse is a question with known genetics and unknown relationship: If A and B both have blue eyes how probable is it that they are siblings? (Answer: quite low.)

    Replies
    1. I believe I saw the context within which this post was inspired. The question was whether a shared segment could have come from a shared 9th great grandparent -- Is this even plausible? It appears that this post is an answer to that question. And the answer, as I interpreted it, is that we can expect to have an abundance of IBD DNA connections from distant shared ancestral couples, particularly with matches in the 7-20 cM range. Unfortunately, many people appear to be under the impression that the absolute limit is approximately 5 generations.

      The question of whether a specific segment can be assigned to a specific ancestral connection is a much more difficult one but this post doesn't appear to be an attempt to tackle that issue.

  6. I'd calculate the “effective generation span” of an X chromosome to be 25/3 (male X) + 25/3 (female X from mother) + 50/3 (female X from father) ⁼ 33.3 years.

  7. Would you grant permission to quote you and a chart to a Family Genealogy Group? Great blog post!

  8. This rather connects with my own experience. My wife is Spanish and she has two Arab matches via her autosomal results, plus a Jewish one. And both Jews and Arabs were finally evicted from Spain over four hundred years ago. My mother from Northern Ireland has a match with an Icelandic lady and the latter only has Icelandic connections in her family tree.
