Marcia, Marcia, Marcia!
"My big sister Marcia matches me at 3600 cM, and all this time I never knew she was my mother!!! Now I know why she always treated me like a child!"
When I see a post like this in one of the Facebook Groups (and I actually see it quite frequently), I always take a minute to stop and read the comments, because jumping to this conclusion is usually the result of one of two common genetic genealogy pitfalls, both of which often lead beginner genetic genealogists down a rabbit hole. Furthermore, when the sibling who comes in at around 3600 cM is an older brother, I frequently see comments from well-meaning beginners insisting that the only possible conclusion is that the brother must have committed some act of incest that resulted in the birth of the individual who made the post (the original poster, in FB terminology). Needless to say, these types of conclusions can lead to serious family conflicts and permanent damage to family relations.
Yes, sometimes a family hides a teenage pregnancy and the conclusion is correct, and sadly, sometimes the conclusion regarding incest is correct. However, nine out of ten times (probably closer to 99 out of 100 times), conclusions like the ones I speak of are incorrectly reached due to one of two common beginner genetic genealogy pitfalls. I will explain each of them herein.
First (and Most Common) Situation
The first question I ask of the original poster, when I see
a post like this, is always a request for the poster to share the source of the
shared cM information upon which they rely.
Almost always, the answer is 23 & Me. Here’s why.
Like inches or centimetres, the cM (centimorgan) is simply a unit of measurement. It’s a complex non-linear unit of measurement based on statistical analysis, but nonetheless, at the end of the day, it’s just a unit of measurement. Now, I want you to think about a situation. It’s Friday, and you just got your paycheck, and you’ve gone to Best Buy and bought a new 55 inch television set for the family room. You take it home and set it up, and then your child, playing with the tape measure, says, “Look daddy, it’s only 48 inches” after measuring the width of the screen. Furious, you pack it back up and haul it back to Best Buy, where a friendly sales associate calms you down by explaining that the size of televisions and monitors is measured along the diagonal, not by the width of the screen (its longest traditional dimension).
Now, I’m going to be that Best Buy associate. It’s not enough that you have a measurement of
3600 cM, just like it wasn’t enough information to simply know the store’s measurement
of 55 inches. Equally important for your
analysis is what dimension or property is being measured. So let’s talk about what is being measured at
23 & Me when they report shared cM, vs. what all the other sites are
measuring when they report shared cM, because it’s not the same. 23 & Me provides measurements of total
shared DNA on both copies of the chromosome, whereas the other testing
companies use what’s called the HIR method of comparison. To understand the difference, let’s look at
the following diagram of a single pair of chromosomes found in our subject’s
genome.
Let’s call it chromosome 8. Let’s call our subject (original poster) Cindy. The sister, whose relationship is being
called into question, we’ll call Marcia.
On the first (paternal) copy of Cindy’s chromosome 8, green indicates
all of the regions or “blocks” where Cindy’s paternal chromosome matches Marcia’s
paternal chromosome. As full siblings,
we expect that somewhere around half of the time, Cindy inherited DNA on this
chromosome from the same copy of their father Mike’s chromosome as Marcia,
whereas approximately half of the time, Mike passed Cindy DNA from his father’s
side while passing Marcia DNA from his mother’s copy or vice versa, resulting
in non-matching regions when Cindy and Marcia inherited DNA from opposite
paternal grandparents. The exact same
thing is going on with Cindy’s maternal copy of chromosome 8. That is, Cindy matches Marcia only about
half the time, in the regions where their mother Carol passed both of them DNA from the same copy of her chromosome.
By random chance, the matching portions on each copy of chromosome 8 are
along different (but overlapping) regions of the chromosomes. I have labeled the blocks by their individual
length in cM.
Now, let’s look at the way 23 & Me would tabulate the
match into a total measurement of shared cM on this chromosome. 23 & Me takes the simple approach of
adding the lengths of the three green blocks together, resulting in 189 cM (36 cM
+ 63 cM + 90 cM). They do this because
this method accurately reflects how much genetic information, in
terms of blocks of genetic code, the two siblings share.
This third and final illustration, however, shows the predominant
method for tabulating shared cM by nearly every other genetic genealogy website
(including the default method on GEDmatch although they allow users to select
among methodologies). Here, we simply
treat both copies of chromosome 8 as a single span of the genome, and we care
only whether the siblings (or any other relatives, for that matter) match
one another due to inheritance on either copy of the chromosome. If the siblings happen to match on both
copies, the region is only counted once.
This is called the HIR method of comparison, where HIR stands for “half-identical
regions.” Technically, HIR regions are
being added to the tally, but so are FIR regions (fully-identical regions),
although the FIR regions are being treated as HIR for tabulation purposes and
not being counted twice as is the case when we use the 23 & Me method. It should be noted that a third method (the
FIR method) can be employed using advanced settings in GEDmatch which tabulates
only FIR regions (and counts them once).
This third method is useful for distinguishing between full and three-quarter
siblings, but that’s beyond the scope of this post. The Borland Genetics chromosome browser provides
a list of all regions and whether each is HIR, FIR or NIR (not matching at
all), as this regime is most useful for DNA reconstruction, which the site sets
out to accomplish.
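The difference between the tabulations described above can be sketched in a few lines of Python. This is only an illustration: the block lengths (36, 63 and 90 cM) come from the example, but the start and end positions, and the choice of which blocks sit on which copy of chromosome 8, are hypothetical values picked so that the blocks partly overlap. The function names are likewise made up for this sketch and do not correspond to any testing company's software.

```python
# Hypothetical block positions in cM along chromosome 8. Only the block
# lengths (36, 63 and 90 cM) come from the post; the positions and the
# assignment of blocks to copies are assumptions for illustration.
paternal_blocks = [(0, 36), (80, 143)]  # assumed to sit on the paternal copy
maternal_blocks = [(20, 110)]           # assumed to sit on the maternal copy


def total_length(blocks):
    """Sum the lengths of a list of (start, end) blocks."""
    return sum(end - start for start, end in blocks)


def merge(blocks):
    """Merge overlapping (start, end) blocks into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(blocks_a, blocks_b):
    """Return the regions covered by both lists of blocks."""
    shared = []
    for a_start, a_end in blocks_a:
        for b_start, b_end in blocks_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                shared.append((start, end))
    return merge(shared)


# Two-copy tabulation: every matching block on either copy is added up.
two_copy_total = total_length(paternal_blocks) + total_length(maternal_blocks)

# HIR tabulation: a region matching on either copy is counted once.
hir_total = total_length(merge(paternal_blocks + maternal_blocks))

# FIR tabulation: only regions matching on both copies, counted once.
fir_total = total_length(overlap(paternal_blocks, maternal_blocks))

print(two_copy_total)  # 189
print(hir_total)       # 143
print(fir_total)       # 46
```

With these assumed positions, the two-copy total exceeds the HIR total by exactly the FIR total (189 - 46 = 143), which is the double counting the HIR method avoids.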
Now, let’s extrapolate the results across the full set of 22
autosomal chromosomes. An additional piece
of information you need to understand here is that the total length of one copy
of all 22 autosomal chromosomes is approximately 3600 cM. If we were to use the 23 & Me method, we
would find the following: On average, for each parent, one
quarter of the span of each chromosome represents regions where that parent
passed DNA from his or her own father to both children (and therefore the match is tabulated);
one quarter of the span of each chromosome represents regions where that parent
passed DNA from his or her own mother to both children (and therefore the match is tabulated). The other half of the chromosomal regions
represent regions where opposite DNA was passed to each child by that parent (paternal
to one but maternal to the other). The
result is that on the paternal copies of the chromosomes, full siblings will match
approximately half of the time, resulting in 1800 cM to be added to the total. Likewise, an additional 1800 cM will be added
to the total where the maternal copies match due to same-side inheritance. Therefore, the expectation value for full siblings
using the 23 & Me method is 3600 cM.
Note that if Marcia were the mother of Cindy, they would likewise have the
same expectation value, but Cindy would match Marcia along 100% of the maternal
copies of the chromosomes (yielding 3600 cM), but would not match Marcia at all
along the opposing paternal copies (adding nothing to the total). So, if we use the 23 & Me method of
comparison, since both relationships (parent/child and full sibling) yield an
expectation value of about 3600 cM, we need to employ additional comparison
tools to determine which is the case.
Let’s see what the statistics look like for the HIR method
of comparison. We already know that full
siblings are likely to match one another about 1800 cM on each copy of the
chromosomes. This time, the first 1800
cM gets counted, but not all of the additional matching blocks on the second
copy get counted, because statistically speaking, half of the span of the matching
blocks on the second copy lies in regions already tabulated as matching on the first
copy. Therefore, only 900 cM, on
average, of new matching spans of chromosome are added to the tally from the
second copy of the chromosome, resulting in an expectation value of 1800 cM +
900 cM = 2700 cM when using the HIR method of comparison. However, just like with the 23 & Me
method, a parent/child relationship will still come in at 3600 cM on average.
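The arithmetic behind these expectation values can be written out as a short Python sketch. It assumes, as in the discussion above, that one copy of the 22 autosomes spans about 3600 cM and that full siblings match on each copy half the time, independently, so that a quarter of the genome matches on both copies, half on exactly one copy, and a quarter on neither. The dictionary and function names are made up for this illustration.

```python
AUTOSOMAL_LENGTH_CM = 3600  # approximate length of one copy of chromosomes 1-22

# Expected fraction of the genome where two relatives match on neither copy,
# exactly one copy, or both copies of the chromosome.
SHARING = {
    "full siblings": {"none": 0.25, "one_copy": 0.50, "both_copies": 0.25},
    "parent/child": {"none": 0.00, "one_copy": 1.00, "both_copies": 0.00},
}


def expected_two_copy_cm(fractions):
    """Two-copy tabulation: regions matching on both copies count twice."""
    return AUTOSOMAL_LENGTH_CM * (
        fractions["one_copy"] + 2 * fractions["both_copies"]
    )


def expected_hir_cm(fractions):
    """HIR tabulation: any region matching on at least one copy counts once."""
    return AUTOSOMAL_LENGTH_CM * (
        fractions["one_copy"] + fractions["both_copies"]
    )


for relationship, fractions in SHARING.items():
    print(
        relationship,
        expected_two_copy_cm(fractions),
        expected_hir_cm(fractions),
    )
# full siblings 3600.0 2700.0
# parent/child 3600.0 3600.0
```

These are averages only. Real full siblings vary around them, so a single measurement near either figure should be read together with the method that produced it.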
So if we use the 23 & Me method of comparison, our
measurement of cM will be about 3600 regardless of whether the relation is
parent/child or full sibling. However,
when we use the HIR method, as most other sites do, a parent/child relationship
is likely to be about 3600 cM, whereas a full sibling relation will, on
average, register at about 2700 cM. Most
genetic genealogists are familiar with the HIR numbers. Many do not know that 23 & Me is measuring
something different, and therefore will jump to the conclusion of a
parent/child relationship when they see a 3600 cM measurement.
Moral of the story: An
inch is a unit of measurement. It can be
used to measure length, width, or height.
Likewise, a cM is just a unit of measurement. Make sure you’re measuring what you think you’re
measuring! Otherwise, you will reach the
wrong conclusion as to what the measurement represents.
Second Pitfall: GEDmatch Navigation Issues
This pitfall is common among beginners who are not familiar
with how to read the tables and charts on GEDmatch. It occurs when someone doesn’t realize whose
matches they are looking at because they clicked through a few screens of
results and lost track. A high
percentage of those who upload to GEDmatch are heavy into genetic genealogy and
have tested not only themselves but also one or more parents or children. Therefore, the closest matches on any given match-list
often are about 3600 cM. To make a long
story short, suppose Cindy and Marcia have separate GEDmatch accounts, and that
Marcia has also uploaded data for her son Mickey.
What happens next is more like Three’s Company than the Brady Bunch. Cindy, our beginner, clicks on Marcia in her match list, spends some time there, and then clicks on Mickey, who is Marcia’s top match, perhaps listed only under the alias “M.L.” for privacy reasons (Mickey being a minor). Then Cindy forgets, or doesn’t realize, that she is no longer looking at her own matches but rather snooping on Mickey’s matches, and she sees Marcia as the top match at 3600 cM. She then says, “I knew it!” and hops on to Facebook and makes the post about sharing 3600 cM with Marcia, when in fact it was Mickey all along who shared this amount with Marcia, not Cindy. However, since the name “Mickey” appeared nowhere on the page, Cindy ignored that it said “M.L.” near the top, thinking it was some sort of genetic genealogy jargon like HIR or FIR (not realizing that it stood for her nephew Mickey Logan’s initials).
As with all of my blog posts in this series, I hope this helps
steer some beginners along the right track and that it helps provide more
experienced genetic genealogists with an analogy to share with those they are
assisting with their research. Genetic
genealogy is powerful, and you can use it to solve a wide variety of difficult
problems. However, with this power comes
the responsibility to understand the techniques and tools available to us, so
we don’t wind up led astray.
I found this article through your Sticky Segment blog posting and you are blowing my mind with the 3600 cm not being a parent child relationship! I have never heard of the 23 and me calculations being different and when I put that number in to the shared cm project, it indicates 100% parent child relationship with no caveat regarding 23 and me tests!! Shouldn't this be advertised somewhere??
I agree that the different counting system of 23 & Me should be more "advertised." I think it's on one version of the "green chart," but few notice it.
When 23andme debuted their "tree", I was horrified. Relationships that I know to be true were just cannibalized! And exactly what you've described here happened - my great-grandmother was one of 12 children born over 21 years (to the same set of parents). After my tree was created at my request, several of my ggm's siblings show up as her children (I know this because of known cousins going back to her as an ancestors rather than her parents). I attempted to correct it, and there were so many other errors, I just gave up. Too bad because it was an interesting concept.