Marcia, Marcia, Marcia!

"My big sister Marcia matches me at 3600 cM, and all this time I never knew she was my mother!!!  Now I know why she always treated me like a child!"

When I see a post like this in one of the Facebook Groups (and I actually see it quite frequently), I always take a minute to stop and read the comments, because jumping to this conclusion is usually the result of one of two common genetic genealogy pitfalls, both of which often lead beginner genetic genealogists down a rabbit hole.  Furthermore, when the sibling that comes in at around 3600 cM is an older brother, I frequently see well-meaning beginners respond by insisting that the only possible conclusion is that the brother committed some act of incest that resulted in the birth of the individual who made the post (the original poster in FB terminology).  Needless to say, these types of conclusions can lead to serious family conflicts and permanent damage to family relations.

Yes, sometimes a family hides a teenage pregnancy and the conclusion is correct, and sadly, sometimes the conclusion regarding incest is correct.  However, nine times out of ten (probably closer to 99 times out of 100), conclusions like these are reached incorrectly due to one of two common beginner genetic genealogy pitfalls.  I will explain each of them below.

First (and Most Common) Situation

The first question I ask of the original poster, when I see a post like this, is always a request for the poster to share the source of the shared cM information upon which they rely.  Almost always, the answer is 23 & Me.  Here’s why.

Like inches or centimetres, the cM (centimorgan) is simply a unit of measurement.  It’s a complex non-linear unit of measurement based on statistical analysis, but at the end of the day, it’s just a unit of measurement.  Now, I want you to think about a situation.  It’s Friday, you just got your paycheck, and you’ve gone to Best Buy and bought a new 55-inch television set for the family room.  You take it home and set it up, and then your child, playing with the tape measure, says, “Look daddy, it’s only 48 inches” after measuring the width of the screen.  Furious, you pack it back up and haul it back to Best Buy, where a friendly sales associate calms you down by explaining that the size of a television or monitor is measured along the diagonal, not along the width of the screen (traditionally its longest side).

Now, I’m going to be that Best Buy associate.  It’s not enough that you have a measurement of 3600 cM, just like it wasn’t enough information to simply know the store’s measurement of 55 inches.  Equally important for your analysis is what dimension or property is being measured.  So let’s talk about what is being measured at 23 & Me when they report shared cM, vs. what all the other sites are measuring when they report shared cM, because it’s not the same.  23 & Me provides measurements of total shared DNA on both copies of the chromosome, whereas the other testing companies use what’s called the HIR method of comparison.  To understand the difference, let’s look at the following diagram of a single pair of chromosomes found in our subject’s genome.


Let’s call it chromosome 8.  Let’s call our subject (original poster) Cindy.  The sister, whose relationship is being called into question, we’ll call Marcia.  On the first (paternal) copy of Cindy’s chromosome 8, green indicates all of the regions or “blocks” where Cindy’s paternal chromosome matches Marcia’s paternal chromosome.  Because they are full siblings, we expect that about half of the time Cindy inherited DNA on this chromosome from the same copy of their father Mike’s chromosome as Marcia did.  The other half of the time, Mike passed Cindy DNA from his father’s copy while passing Marcia DNA from his mother’s copy, or vice versa, resulting in non-matching regions where Cindy and Marcia inherited DNA from opposite paternal grandparents.  The exact same thing is going on with Cindy’s maternal copy of chromosome 8.  That is, Cindy matches Marcia only about half the time on the DNA inherited from their mother, Carol.  By random chance, the matching portions on each copy of chromosome 8 lie along different (but overlapping) regions of the chromosome.  I have labeled the blocks by their individual length in cM.


Now, let’s look at the way 23 & Me would tabulate the match into a total measurement of shared cM on this chromosome.  23 & Me takes the simple approach of adding the lengths of the three green blocks together, resulting in 189 cM (36 cM + 63 cM + 90 cM).  They do this because this method accurately reflects how much genetic information, in terms of blocks of genetic code, the two siblings share.


This third and final illustration, however, shows the predominant method for tabulating shared cM used by nearly every other genetic genealogy website (including the default method on GEDmatch, although they allow users to select among methodologies).  Here, we simply treat both copies of chromosome 8 as a single span of the genome, and we care only whether the siblings (or any other relatives, for that matter) match one another due to inheritance on either copy of the chromosome.  If the siblings happen to match on both copies, the region is only counted once.  This is called the HIR method of comparison, where HIR stands for “half-identical regions.”  Technically, HIR regions are being added to the tally, but so are FIR regions (fully-identical regions); the FIR regions are simply treated as HIR for tabulation purposes and are not counted twice, as they are when we use the 23 & Me method.  It should be noted that a third method (the FIR method), which tabulates only FIR regions (and counts them once), can be employed using advanced settings in GEDmatch.  This third method is useful for distinguishing between full and three-quarter siblings, but that’s beyond the scope of this post.  The Borland Genetics chromosome browser provides a list of all regions and whether each is HIR, FIR or NIR (not matching at all), as this regime is most useful for DNA reconstruction, which the site sets out to accomplish.
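For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the two tallies on a single chromosome.  Only the block lengths (36, 63 and 90 cM) come from the example above.  The start and end positions, the assignment of blocks to the paternal or maternal copy, and the function names are assumptions made purely for illustration; this is not how any testing company actually implements its comparison.

```python
# Hypothetical block coordinates in cM along chromosome 8.
# Only the lengths (36, 63, 90) come from the example; positions are made up.
paternal_blocks = [(10, 46), (80, 143)]   # 36 cM and 63 cM
maternal_blocks = [(30, 120)]             # 90 cM


def total_both_copies(*copies):
    """Add up every matching block on every copy (the 23 & Me style tally)."""
    return sum(end - start for blocks in copies for start, end in blocks)


def hir_total(*copies):
    """Merge the blocks from all copies and count each covered region once."""
    intervals = sorted(block for blocks in copies for block in blocks)
    if not intervals:
        return 0
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:
            # Overlaps (or touches) the current run: extend it.
            cur_end = max(cur_end, end)
        else:
            # Gap found: close the current run and start a new one.
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
    covered += cur_end - cur_start
    return covered


print(total_both_copies(paternal_blocks, maternal_blocks))  # 189
print(hir_total(paternal_blocks, maternal_blocks))          # 133
```

With these made-up positions, the first tally gives 189 cM and the second gives 133 cM.  The 56 cM difference is the span where the siblings match on both copies, which the first method counts twice and the HIR method counts once.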

Now, let’s extrapolate the results across the full set of 22 autosomal chromosomes.  An additional piece of information you need here is that the total length of one copy of all 22 autosomal chromosomes is approximately 3600 cM.  If we were to use the 23 & Me method, we would find the following for each parent:  On average, one quarter of the span of each chromosome represents regions where the parent passed DNA from his or her own paternal copy to both children (and therefore the match is tabulated); one quarter represents regions where the parent passed DNA from his or her own maternal copy to both children (and therefore the match is tabulated).  The other half represents regions where the parent passed opposite DNA to each child (paternal to one but maternal to the other).  The result is that on the paternal copies of the chromosomes, full siblings will match approximately half of the time, adding 1800 cM to the total.  Likewise, an additional 1800 cM will be added to the total where the maternal copies match due to same-side inheritance.  Therefore, the expectation value for full siblings using the 23 & Me method is 3600 cM.  Note that if Marcia were the mother of Cindy, they would have the same expectation value, but for a different reason:  Cindy would match Marcia along 100% of the maternal copies of the chromosomes (yielding 3600 cM) and would not match Marcia at all along the paternal copies (adding nothing to the total).  So, if we use the 23 & Me method of comparison, since both relationships (parent/child and full sibling) yield an expectation value of about 3600 cM, we need to employ additional comparison tools to determine which is the case.

Let’s see what the statistics look like for the HIR method of comparison.  We already know that full siblings are likely to match one another along about 1800 cM on each copy of the chromosomes.  This time, the first 1800 cM gets counted, but not all of the matching blocks on the second copy get counted, because statistically speaking, half of the span of the matching blocks on the second copy was already tabulated as matching on the first copy.  Therefore, only 900 cM, on average, of new matching spans are added to the tally from the second copy, resulting in an expectation value of 1800 cM + 900 cM = 2700 cM when using the HIR method of comparison.  However, just like with the 23 & Me method, a parent/child relationship will still come in at about 3600 cM.
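If the "half was already counted" step feels slippery, a quick simulation can serve as a sanity check.  The sketch below is a deliberately crude model, not a real inheritance simulator: it assumes every 1 cM bin is inherited independently with a 50% chance of matching on each copy, whereas real matches come in long blocks.  That simplification changes the spread of the results but not the averages.  The function name and parameters are hypothetical.

```python
import random

AUTOSOME_CM = 3600  # approximate length of one copy of chromosomes 1-22


def simulate_full_siblings(trials=100, seed=1):
    """Average the both-copies tally and the HIR tally over simulated sibling pairs.

    Simplifying assumption: each 1 cM bin matches independently with
    probability 0.5 on the paternal copy and 0.5 on the maternal copy.
    """
    rng = random.Random(seed)
    both_copies_sum = 0
    hir_sum = 0
    for _ in range(trials):
        for _ in range(AUTOSOME_CM):
            paternal_match = rng.random() < 0.5
            maternal_match = rng.random() < 0.5
            # 23 & Me style: a bin matched on both copies counts twice.
            both_copies_sum += paternal_match + maternal_match
            # HIR style: a bin counts once if it matches on either copy.
            hir_sum += paternal_match or maternal_match
    return both_copies_sum / trials, hir_sum / trials


both_copies_avg, hir_avg = simulate_full_siblings()
print(round(both_copies_avg), round(hir_avg))
```

The two averages should land close to 3600 cM and 2700 cM respectively, in line with the reasoning above.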

So if we use the 23 & Me method of comparison, our measurement of cM will be about 3600 regardless of whether the relation is parent/child or full sibling.  However, when we use the HIR method, as most other sites do, a parent/child relationship is likely to be about 3600 cM, whereas a full sibling relation will, on average, register at about 2700 cM.  Most genetic genealogists are familiar with the HIR numbers.  Many do not know that 23 & Me is measuring something different, and therefore will jump to the conclusion of a parent/child relationship when they see a 3600 cM measurement.
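The same comparison can be written as a few lines of arithmetic.  This is only a sketch of the expectation values described above, assuming that matching on the paternal copy and matching on the maternal copy are independent events; the function and variable names are hypothetical.

```python
AUTOSOME_CM = 3600  # approximate length of one copy of chromosomes 1-22


def expected_cm(p_paternal, p_maternal, length=AUTOSOME_CM):
    """Return (both-copies tally, HIR tally) in cM.

    p_paternal / p_maternal: chance that two people match at a given spot
    on the paternal / maternal copy (assumed independent of each other).
    """
    p_both = p_paternal * p_maternal                # match on both copies (FIR)
    p_either = p_paternal + p_maternal - p_both     # match on at least one copy
    both_copies = length * (p_paternal + p_maternal)  # FIR counted twice
    hir_method = length * p_either                    # each region counted once
    return both_copies, hir_method


full_siblings = expected_cm(0.5, 0.5)   # match half the time on each copy
parent_child = expected_cm(0.0, 1.0)    # match everywhere on one copy only

print(full_siblings)  # (3600.0, 2700.0)
print(parent_child)   # (3600.0, 3600.0)
```

Only the HIR tally separates the two relationships; the both-copies tally comes out the same for each.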

Moral of the story:  An inch is a unit of measurement.  It can be used to measure length, width, or height.  Likewise, a cM is just a unit of measurement.  Make sure you’re measuring what you think you’re measuring!  Otherwise, you will reach the wrong conclusion as to what the measurement represents.

Second Pitfall:  GEDmatch Navigation Issues

This pitfall is common among beginners who are not familiar with how to read the tables and charts on GEDmatch.  It occurs when someone doesn’t realize whose matches they are looking at because they clicked through a few screens of results and lost track.  A high percentage of those who upload to GEDmatch are heavily into genetic genealogy and have tested not only themselves but also one or more parents or children.  Therefore, the closest matches on any given match list are often at about 3600 cM.  To make a long story short, suppose Cindy and Marcia have separate GEDmatch accounts, and that Marcia has also uploaded data for her son Mickey.

What happens next is more like Three’s Company than the Brady Bunch.  Cindy clicks on Marcia in her match list, spends some time there, and then clicks on Mickey, who is Marcia’s top match, perhaps listed only under the alias “M.L.” for privacy reasons (Mickey being a minor).  Then, Cindy forgets or doesn’t realize that she is no longer looking at her own matches, but rather snooping on Mickey’s matches, and she sees Marcia as the top match at 3600 cM.  She then says, “I knew it!” and hops on to Facebook and makes the post about sharing 3600 cM with Marcia, when in fact it was Mickey all along who shared this amount of cM, not Cindy.  Since the name “Mickey” appeared nowhere on the page, Cindy ignored that it said “M.L.” near the top, thinking it was some sort of genetic genealogy jargon like HIR or FIR (not realizing that these were her nephew Mickey Logan’s initials).

As with all of my blog posts in this series, I hope this helps steer some beginners along the right track and that it helps provide more experienced genetic genealogists with an analogy to share with those they are assisting with their research.  Genetic genealogy is powerful, and you can use it to solve a wide variety of difficult problems.  However, with this power comes the responsibility to understand the techniques and tools available to us, so we don’t wind up led astray.

Comments

  1. I found this article through your Sticky Segment blog posting and you are blowing my mind with the 3600 cm not being a parent child relationship! I have never heard of the 23 and me calculations being different and when I put that number in to the shared cm project, it indicates 100% parent child relationship with no caveat regarding 23 and me tests!! Shouldn't this be advertised somewhere??

    1. I agree that the different counting system of 23 & Me should be more "advertised." I think it's on one version of the "green chart," but few notice it.

    2. When 23andme debuted their "tree", I was horrified. Relationships that I know to be true were just cannibalized! And exactly what you've described here happened - my great-grandmother was one of 12 children born over 21 years (to the same set of parents). After my tree was created at my request, several of my ggm's siblings show up as her children (I know this because of known cousins going back to her as an ancestors rather than her parents). I attempted to correct it, and there were so many other errors, I just gave up. Too bad because it was an interesting concept.

