Help! My Segments Are So Sticky!
Back in the day, it used to be popular to refer to certain
segments as “sticky” when they appeared to be passed down from generation to
generation untouched. That sort of
name-calling has reduced greatly now that we have a clearer understanding of
the statistical rules that our chromosomes follow as a result of random
recombination. It turns out that the
smaller a segment, the more likely it is to escape the chopping block of
recombination in each generation and instead either be passed to the child in
full or not at all. Let’s take a look at
some numbers and see how this plays out.
As our starting point, we’re going to go back to our
definition of centiMorgan, as explained in my blog from a few weeks ago about
the statistical impossibility of two full siblings not sharing any DNA
segments. If you missed that one, that’s
OK, here’s the way I like to think about a cM:
A cM is a unit that denotes a span of a chromosome that has exactly 1% likelihood
of being split at least once by recombination, within a single generation. Conversely, a 1 cM span of chromosome has a
99% chance of avoiding recombination per generation.
So from this definition, let’s look at a 7 cM segment on a chromosome
in terms of “stickiness.” There are
exactly three possibilities in our inheritance model: A)
This 7 cM segment is passed intact from parent to child; B) the segment
is not passed at all and instead the parent passes genetic material from his or
her opposite parent to the child; or C) at least one recombination
occurs across this span and the parent passes pieces of both copies of his or
her chromosome across this span (say 4 cM from one grandparent and 3 cM from the
other).
First, let’s calculate the odds of no recombination occurring
across a 7 cM span of chromosome. We use
an “AND” operator and multiply the probabilities of no recombination on each 1
cM span that comprises the 7 cM in question.
So the odds of no recombination across the 7 cM span can be restated as the
odds of no recombination on the first cM AND no recombination on the second cM
AND no recombination on the third cM, etc.
In statistics, independent probabilities linked by an “AND” operator can
be simply multiplied. Therefore, the
odds of no recombination across a 7 cM span of chromosome in a single
generation = 0.99 * 0.99 * 0.99 * 0.99 * 0.99 * 0.99 * 0.99, which we can express in
exponent notation as (0.99)^7, which according to my calculator is about
93%. The two possibilities (inherited in
the entirety and not inherited at all) must occupy this 93% equally (47% each,
rounding to the nearest whole percent), whereas the odds of the segment getting
the chop-chop are only 7%. Therefore, we
can say that a 7 cM segment is necessarily “sticky”: the odds of recombination
in a generation (as opposed to acting all sticky) are low. This is not a property special to your 7 cM
segment. Rather, this applies to all 7
cM segments, whether or not you have matches on them and whether or not
endogamy is at play, and with total disregard for the age of this segment
(failing to recombine in a hundred years doesn’t make it any more likely to recombine
in the next generation).
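The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, assuming the simplified model used in this post (each 1 cM span independently has a 99% chance of avoiding recombination); the function name `segment_outcomes` is hypothetical and not from any library.

```python
def segment_outcomes(cm, p_no_recomb_per_cm=0.99):
    """Return (intact, not_inherited, split) probabilities for one generation.

    Assumes the post's simplified model: every 1 cM span independently
    avoids recombination with probability 0.99.
    """
    p_none = p_no_recomb_per_cm ** cm  # no recombination anywhere in the span
    p_intact = p_none / 2              # this copy is passed whole
    p_absent = p_none / 2              # the opposite copy is passed whole
    p_split = 1 - p_none               # at least one recombination in the span
    return p_intact, p_absent, p_split


intact, absent, split = segment_outcomes(7)
print(f"intact {intact:.1%}, not inherited {absent:.1%}, split {split:.1%}")
```

For 7 cM this gives roughly 46.6% for each of the two "whole" outcomes and 6.8% for a split, which round to the 47% and 7% quoted above.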
Now, let’s extend this concept to segments of different lengths
in cM using Excel. Here’s the story from
7 cM all the way up to 40 cM:
cM | Odds of no recombination in a generation | Odds of at least one recombination across segment in a generation | Odds of inheriting entire segment | Odds of not inheriting segment at all
7 | 93% | 7% | 47% | 47%
8 | 92% | 8% | 46% | 46%
9 | 91% | 9% | 46% | 46%
10 | 90% | 10% | 45% | 45%
11 | 90% | 10% | 45% | 45%
12 | 89% | 11% | 44% | 44%
13 | 88% | 12% | 44% | 44%
14 | 87% | 13% | 43% | 43%
15 | 86% | 14% | 43% | 43%
16 | 85% | 15% | 43% | 43%
17 | 84% | 16% | 42% | 42%
18 | 83% | 17% | 42% | 42%
19 | 83% | 17% | 41% | 41%
20 | 82% | 18% | 41% | 41%
21 | 81% | 19% | 40% | 40%
22 | 80% | 20% | 40% | 40%
23 | 79% | 21% | 40% | 40%
24 | 79% | 21% | 39% | 39%
25 | 78% | 22% | 39% | 39%
26 | 77% | 23% | 39% | 39%
27 | 76% | 24% | 38% | 38%
28 | 75% | 25% | 38% | 38%
29 | 75% | 25% | 37% | 37%
30 | 74% | 26% | 37% | 37%
31 | 73% | 27% | 37% | 37%
32 | 72% | 28% | 36% | 36%
33 | 72% | 28% | 36% | 36%
34 | 71% | 29% | 36% | 36%
35 | 70% | 30% | 35% | 35%
36 | 70% | 30% | 35% | 35%
37 | 69% | 31% | 34% | 34%
38 | 68% | 32% | 34% | 34%
39 | 68% | 32% | 34% | 34%
40 | 67% | 33% | 33% | 33%
Can you guess why I stopped at 40 cM? It’s because that’s where a segment will
exhibit equal probabilities of each of the 3 described scenarios, which I would
consider to be not very sticky.
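If you would rather not use Excel, the table can be regenerated with a short Python sketch. It assumes the same 0.99-per-cM model as above; the helper name `table_row` is hypothetical. It also solves for the length at which the three scenarios become equally likely, which is where the odds of no recombination equal 2/3.

```python
import math


def table_row(cm, p=0.99):
    """Return (cM, no recombination %, recombination %, inherited whole %)."""
    p_none = p ** cm
    return (cm, round(p_none * 100), round((1 - p_none) * 100),
            round(p_none / 2 * 100))


rows = [table_row(cm) for cm in (7, 20, 40)]
for row in rows:
    print(row)

# The three scenarios are equally likely when p_none / 2 == 1 - p_none,
# i.e. when 0.99 ** cm == 2 / 3.
crossover = math.log(2 / 3) / math.log(0.99)
print(f"equal odds at about {crossover:.1f} cM")
```

The crossover lands a little past 40 cM, which is why the table stops there.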
But there’s more to this story. Surely, “stickiness” relates somehow to the
expected age of a segment. That is, let’s
ask the question “How old is my sticky little 7 cM segment likely to be?” In this calculation, we start with the premise
that it’s inherited in its entirety from one parent and that it’s a valid
segment of genetic material from just one copy of that parent’s chromosome. We’ve already calculated that probability at 93%,
but let’s switch our rounding to tenths of a percent for a bit more accuracy. Our 7 cM segment has a 93.2% chance of being
inherited without recombination in a generation. How about 2 generations? Well, it has to be passed down in one
generation AND another, so our probability of a segment being at least two
generations old is 0.932^2 = 86.9%. If
we continue with this drill, we will find the odds are still at 70.3% that the
segment is at least 5 generations. Let’s
switch from generations to years, since as genealogists we really care about
whether our matches are related to us in a historical timeframe when there are
records with which we can build our trees.
Let’s assume that the average generation span is 25 years, and
accordingly, 400 years ago (around the beginning of the genealogical era) takes
us back 16 generations. So what are the
odds that our 7 cM segment is older than 400 years? The answer is going to shock some people who
insist that autosomal matches only go back 400 years. In fact, the odds that a 7 cM segment on my genome
exceeds 400 years in age are 30.2%.
That is to say that for any given 7 cM segment, there’s about a 70%
chance that all of your matches on that segment have a common ancestor who was
born less than 400 years ago, but there’s over a 30% chance that a segment this size
goes back further.
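The generation-by-generation drill can be sketched as follows. This is a minimal sketch under the post's model, with a hypothetical function name; it uses the unrounded value of 0.99^7 rather than 0.932, which does not change the quoted percentages.

```python
def p_at_least_generations(cm, generations, p=0.99):
    """Probability a segment of `cm` survives unrecombined for the given
    number of consecutive generations, under the 0.99-per-cM model."""
    return (p ** cm) ** generations


for n in (1, 2, 5, 16, 17):
    print(n, f"{p_at_least_generations(7, n):.1%}")
```

Under this model, 16 generations give about 32.4% and 17 generations give about 30.2%, so the 30.2% figure quoted above appears to count 17 generations; that reading is an assumption, not something the post states.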
Well, how far back? I’ve
taken it upon myself to carry out some advanced statistical calculations with
which I won’t bore you, but I’ll give you some interesting figures, and I’ll show
you a few colorful charts that might change the way you think about your DNA
matches. First, let’s talk
quartiles. For a 7 cM segment, we can
divide the possible ages of our segment into four statistical quartiles, each representing an
even 25% probability. These quartiles are as follows: 0-100 years, 100-250 years, 250-500 years,
and >500 years. Based on this, our
median age is 250 years, but it’s just as likely that the segment is
over 500 years old as it is that the segment is, say, 100 to 250 years old. Let’s keep it going and talk 100-year ranges
instead of quartiles. I made a pie chart
to show you the probability that a 7 cM segment dates back to some different time-frames:
Yes, you’re reading this correctly. There’s a 6% chance that our 7 cM segment has
been passed down untouched for over 1000 years!
Now that’s what I call a sticky segment.
Eew! So what, it’s just 6%, right? Well, we have over 7000 cM of real estate on
our chromosomes and that’s not even counting the X (we’ll get to that in a
bit). That’s room for about 1000 segments of 7 cM each, and 6% of 1000 gives an expected 60 little
super-sticky segments where you’re never ever going to find your common ancestor
because he/she walked the earth over 1000 years ago. Now, when you see someone, say, from a
traditionally endogamous population who has like a zillion matches on a
lot of their segments, I want you to understand this. Their common ancestor from whom they all
inherited their hyper-sticky segments will often have lived over 1000 years ago
and may be an ancestor of a large swath of the population (and many times over
due to intermarriage among even distantly related descendants over time). That doesn’t make a segment like this any
less real though, just less useful for genealogy in terms of finding the most
recent common ancestor you share with other matches thereon.
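The quartile boundaries, the 100-year ranges, and the count of super-sticky segments can all be sketched with one survival function. This is a sketch under the post's assumptions (0.99 per cM, 25 years per generation, 7000 cM of total real estate); the names `survival` and `age_bins` are hypothetical, and the pie chart's exact slices are not reproduced here.

```python
def survival(cm, years, gen_years=25, p=0.99):
    """Probability that a segment of `cm` is at least `years` old."""
    return (p ** cm) ** (years / gen_years)


def age_bins(cm, width=100, top=1000):
    """Probability of each `width`-year age range, plus one '>top' bin."""
    bins = {}
    for lo in range(0, top, width):
        bins[f"{lo}-{lo + width}"] = survival(cm, lo) - survival(cm, lo + width)
    bins[f">{top}"] = survival(cm, top)
    return bins


# Quartile boundaries for a 7 cM segment (roughly 75%, 50%, 25% survival)
for years in (100, 250, 500):
    print(years, f"{survival(7, years):.1%}")

for label, prob in age_bins(7).items():
    print(f"{label:>9} years: {prob:.1%}")

# Expected number of 7 cM segments older than 1000 years across 7000 cM
expected_old = (7000 / 7) * survival(7, 1000)
print(round(expected_old))
```

The last line comes out at about 60, matching the estimate in the text.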
Next, let’s move on to 20 cM. Why 20 cM?
Because that’s where Ancestry sets the threshold for shared
matches. Here, the picture is very
different, with quartiles (rounding to the nearest generation) being 0-25 years,
25-75 years, 75-175 years, and >175 years. It’s no surprise that Ancestry considers this
to be the fourth cousin boundary, since the average fourth cousin shares an
ancestor born about 125 years prior, and since a 20 cM single-segment match is
likely to be related at fourth cousin level or closer about 75.5% of the time! Here’s that same pie chart for a 20 cM
segment:
Awesome. There’s now
a 96% chance that our common ancestor for a shared segment of 20 cM lived
within the past 400 years. Note,
however, there is still a 1.8% chance that a 20 cM segment is over 500 years old!
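The 20 cM figures follow from the same model. Here is a minimal sketch, assuming 0.99 per cM and 25-year generations, with a hypothetical function name. The 75.5% figure corresponds to 7 generations (175 years) in this calculation.

```python
def p_within_generations(cm, generations, p=0.99):
    """Probability that a segment of `cm` is younger than the given number
    of generations, under the 0.99-per-cM model."""
    return 1 - (p ** cm) ** generations


print(f"within 7 generations (175 years):  {p_within_generations(20, 7):.1%}")
print(f"within 16 generations (400 years): {p_within_generations(20, 16):.1%}")
print(f"over 20 generations (500 years):   {1 - p_within_generations(20, 20):.1%}")
```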
Finally, the million dollar question that everybody’s
asking. What about that crazy X
chromosome? I heard those segments are
ancient if they’re not at least 15 cM.
Well, maybe so, but first let's talk about whether they’re even real
segments (IBD). One problem with the X
chromosome is that some parts are poorly sampled, with very low SNP counts per
cM. I personally recommend that a
segment have at least 75 tested SNPs per cM (and a minimum of 7 cM) before you
can rely on it being IBD. This just ain’t
happening on some parts of the X due to low sampling rate (SNP per cM). But let’s assume we’ve got a nice segment
with some good SNP density, but it’s not too long. Let’s take everyone’s favorite 15 cM
threshold and see what kind of stats we get in the context of segment dating.
First, we need another tool in our arsenal, and I’m going to
call it “effective generation span.”
While a generation span in real life might be 25 years or so, the “effective
generation span” of an X chromosome is 37.5 years. Here’s why.
Let’s talk about the last time an X chromosome segment had any chance of
recombining. That would be when a female
ancestor had it. A male’s X chromosome
isn’t recombined when passed to his daughters because he has only one X
and therefore nothing with which to recombine.
I’m ignoring PAR (pseudo-autosomal regions) because they’re puny and
practically useless for genealogy. So,
the last chance an X segment had to recombine was in a donor’s mother, or in a
donor’s father’s mother. Whether a
segment was inherited from either of those two ancestors we’ll assume is
equally likely for purposes of our discussion, and I’ll assert that this is a
reasonable assumption. So, from the last
opportunity for recombination, there’s a 50% chance of one generation (25
years) and a 50% chance of two generations (50 years). To calculate our effective generation span for
the X, we simply take the weighted average:
(0.5 * 25) + (0.5 * 50) = 37.5 years.
Then, we can use the same methodology as we did on the autosomal
chromosomes to calculate the age ranges of our favorite 15 cM segment on the X. Turns out there’s about an 80% chance that
such a match shares a MRCA within 400 years, matching the common wisdom in our
community that 15 cM is a nice place to start examining our X matches (given
that our comparison to the other donor includes at least 15 * 75 = 1125 SNPs). Here's the same pie chart for date ranges for a 15 cM X segment:
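The X calculation can be sketched the same way, swapping in the effective generation span. This is a sketch under the assumptions stated above (a 50/50 split between one and two generations since the last chance to recombine, 25-year generations, 0.99 per cM); the names are hypothetical.

```python
GEN_YEARS = 25

# Weighted average: 50% one generation back, 50% two generations back
effective_span = 0.5 * GEN_YEARS + 0.5 * (2 * GEN_YEARS)


def p_within_years(cm, years, span, p=0.99):
    """Probability that a segment of `cm` is younger than `years`, given
    one opportunity to recombine every `span` years."""
    return 1 - (p ** cm) ** (years / span)


p_x = p_within_years(15, 400, effective_span)
print(effective_span)   # 37.5
print(f"{p_x:.0%}")
```

With a 37.5-year span, 400 years holds about 10.7 opportunities to recombine, which leaves roughly a 20% chance that a 15 cM X segment is older than that.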
That’s all for this week.
Despite the liberties I’ve taken using the word “sticky,” as you now
know, there’s nothing inherently sticky about one segment vs. the other, but
rather segments only appear to “stick” because of their length in cM. Any apparent stickiness is simply a direct
result of the statistical nature of DNA inheritance, and the phenomenon applies
across the board to all small segments.
If you’ve enjoyed this post, I encourage you to check out my
website www.borlandgenetics.com
where I’m accepting uploads to an autosomal database that focuses on making
simple and powerful (and for the most part free) DNA reconstruction tools accessible
to the average genetic genealogist.
Nice read Borland!
Very interesting! Thanks for this. And while I tell people (other eastern Polynesians like myself) who are predicted 2nd - 3rd cousin matches to me or my relatives to look at the largest segment of at least 30cM in order to determine a true 2nd cousin relationship, this chart makes sense except for the 20cM.
Ancestry's shared matches are based on TOTAL shared. While we can have a good 20cM total shared, the number of segments can be as much as 5 segments (just looking at my own). So if say there are 5 segments, make it 3 segments (I had a lot of 3 and 2 segments), that's about 6.6cM.
Would love to see you work with my data! ;) Thanks again for this though, definitely enlightening!
Excellent, Kevin. Thanks for explaining this. I still don't like the term "sticky" since there's nothing to prevent recombination from selecting the other chromosomal segment in the same location (as indicated in the column about odds of not passing on the segment at all). The probabilities you've given for various shared amounts of DNA and segment longevity are very helpful. It seems I routinely get confronted by outliers. Recently on behalf of someone looking for his great grandfather's father, we approached the grandfather of a match who shares 118 cM. For the number of generations, this indicated to us that this particular line was the correct one. But the grandfather shared only 121 cM.
This is incredibly in-depth, but you've done a great job of explaining it. Thank you for this!
Great post, Kevin! Thank you for making this readable and understandable and for adding the graphics! This was a very helpful post.
Although the calculations in this piece are mathematically correct, I think they are conceptually wrong for genetic genealogy. The probabilities calculated are in a forward direction, answering the question "If two people share an ancestor n generations ago what is the probability that they share a segment of x cM from that ancestor?" Generally that is not the question we are interested in, we know for certain that two people share a segment of x cM and we want to know how long ago the ancestor was. This is the fundamental difference between this approach and the Speed and Balding approach summarised here https://isogg.org/wiki/Identical_by_descent. An analogous question, with known relationship and question about inheritance, would be If A and B are siblings, what is the probability they have the same colour eyes? (Answer: reasonably high). The converse is, a question with known genetics and unknown relationship: If A and B both have blue eyes how probable is it that they are siblings? (Answer: quite low.)
I'd calculate the “effective generation span” of an X chromosome to be 25/3 (male X) + 25/3 (female X from mother) + 50/3 (female X from father) = 33.3 years.
Sounds correct.
Would you grant permission to quote you and a chart to a Family Genealogy Group? Great blog post!
Sure, no problem.
Much appreciated.
This rather connects with my own experience. My wife is Spanish and she has two Arab matches via her autosomal results, plus a Jewish one. And both Jews and Arabs were finally evicted from Spain over four hundred years ago. My mother from Northern Ireland has a match with an Icelandic lady and the latter only has Icelandic connections in her family tree.
It would be interesting to look at how increased generation span affected the figures. (25yrs is insufficient in west Cornwall women where married at 26+ and your ancestor is on average her middle child five+ years later). Would also be good to reflect on factors that affect recombination, e.g. maternal age and chromosome length. X chromosome in particular often does not recombine. I guess I was disappointed not to see these caveats in the calculations. Though the general point made is useful.
I am glad you posted this / I only started doing my Ancestry in Feb 2021 when I received my DNA results. Since then I've had these labels stapled on me and had them spread so much bull that they got me banned off WikiTree. Can you believe that, a bunch of old people who claim to be professional genealogists branding this on someone behind their back? I only found out by stumbling on to their conversation left on a post. By the time I saw it, it was already too late; everything snowballed, the amount of incest comments and "your dad's not your dad" etc., and that's just the beginning of it. I wish I had seen this then / I am still waiting for their evidence proving their claims / I've lost my spark for it / there's no point. I understood the way you presented it way better than some other ones I've seen, so thanks.
I have a DNA match with someone on Ancestry, where we only have single segment being shared, 106cm, we can dismiss 4 generations of common ancestors, as one family migrated thousands of miles away in 1912, additionally, the daughter of the match has performed a DNA test on Ancestry, and matches 99cm with me, albeit, now in two segments, 92cm and 7cm. So surely this is sticky? My question is, if only 7cm are lost in a generation by such a large sized segment, what figures can be extrapolated from your modelling? Are the segments that are unwilling to be recombined?
Could you explain why "a 1 cM span of chromosome has a 99% chance of avoiding recombination per generation"? I don't think I understand the math.
That's just the definition of a centi-Morgan, and why the unit of measurement has the prefix "centi-" in it. A cM span of a chromosome is a statistical unit specifically defined by having a 1/100 chance of a recombination event across it in a generation.
Have you calculated the odds of segments longer than 40 cM being passed down a given number of generations? That would be of interest to me and the unknown commentator a couple of lines above.
I should probably turn the calculation to a tool on the Borland Genetics site if other people are interested in this kind of thing. My next "programming marathon" for the site will begin in July and I'll put it on my list of ideas for new site content. Thanks!
Thanks Kevin. That would be great--there's no rush to respond. I tried running some numbers for a 47 cM shared segment (the size my dad shares with someone who I think could be a 4th cousin twice removed). This match's ancestor did have 15 or 16 kids that may have had offspring. If I am reading the formula right, the chance of inheriting the 47 cM segment intact is 62.5% for each generation distant. So the odds of sharing a large segment are very low, but it's also hard to be confident in which generation the lines connect. I'm reading about a 37% chance that it is at the 4C2R level versus something more distant (adding up the rows below 0.57% until they get close to 0). However, we have other shared segments in the 30-40 cM range with folks who are 5th cousins of this match, so that would seem to push the odds of the closer relationship much higher again. I think I'm imagining a single large segment version of the WATO calculator....
P(Shared 47 cM segment) Steps Relationship
62.50% 1 sibling
39.06% 2 sibling 1R
24.41% 3 1C
15.26% 4 1C1R
9.54% 5 2C
5.96% 6 2C1R
3.73% 7 3C
2.33% 8 3C1R
1.46% 9 4C
0.91% 10 4C1R
0.57% 11 5C
0.36% 12 5C1R
0.22% 13 6C
0.14% 14 6C1R
0.09% 15 7C
0.05% 16 7C1R
0.03% 17 8C
0.02% 18 8C1R
0.01% 19 9C
0.01% 20 9C1R
0.01% 21 10C
0.00% 22 10C1R