Thursday, November 15, 2018

DIY: Refuting the Khazar myth

Showing the Khazar fallacy with open genomics tools


This is the third part of my Khazar special series. Read part I, and part II if you haven't already.

In my previous posts, I reviewed the academic papers refuting the Khazar "hypothesis" and debunked the most "serious" attempts to actually prove this narrative (Elhaik et al., Das et al., etc.).

However, since most people would find it difficult to actually understand everything that is written in those studies, I think the best way to find something out is to do it yourself.

Fortunately, simple at-home DNA ancestry tests such as 23andMe, AncestryDNA and MyHeritage are now easily accessible, and they can demonstrate the same conclusion detailed in those peer-reviewed publications without much effort. And thanks to the open source and open data communities, anyone with a GEDmatch account (GEDmatch is an open-data personal genomics database and genealogy website) and a little bit of technical capability can verify this for themselves in no time.

There are a number of ADMIXTURE calculators on GEDmatch that you can run on your own kit, or on any other kit, to see both the closest single populations and the closest mixes of populations.

For example, let's run my own kit - a full East European Ashkenazi Jew:

Eurogenes K13 calculator results:

Single Population Sharing:

#   Population (source)   Dist.
1   Ashkenazi             4.14
2   East_Sicilian         9.03
3   Italian_Jewish        9.18
4   Central_Greek         9.64
5   South_Italian         9.73
6   Algerian_Jewish       10.09
7   Sephardic_Jewish      10.66
8   West_Sicilian         10.91
9   Italian_Abruzzo       12.03
10  Greek_Thessaly        12.33
11  Tunisian_Jewish       12.9
12  Libyan_Jewish         13.8
13  Cyprian               15.76
14  Tuscan                16.14
15  Lebanese_Muslim       19.41
16  Bulgarian             19.63
17  Lebanese_Druze        20.7
18  Syrian                20.9
19  Samaritan             21.17
20  Palestinian           21.84

As can be seen, the closest single population to my kit is other Ashkenazi Jews, followed by non-Jewish East Mediterranean populations and other (Western) Jewish populations.
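
To make the "Dist." column less mysterious, here is a minimal sketch of how such a ranking could be computed. I'm assuming the distance is a plain Euclidean distance between a kit's admixture percentages and each population's average percentages; the calculators' exact implementation may differ. The three-component values and the population names (Pop_A, Pop_B, Pop_C) are invented purely for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length component vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_populations(kit, references):
    """Return (population, distance) pairs sorted from closest to farthest."""
    scored = [(name, euclidean_distance(kit, vec)) for name, vec in references.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical admixture percentages (three made-up components per row).
kit = [50.0, 30.0, 20.0]
references = {
    "Pop_A": [52.0, 29.0, 19.0],
    "Pop_B": [40.0, 35.0, 25.0],
    "Pop_C": [10.0, 20.0, 70.0],
}

ranking = rank_populations(kit, references)
for name, dist in ranking:
    print(f"{name}\t{dist:.2f}")
```

The smaller the distance, the more similar the kit's component profile is to that population's average profile.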

My ADMIXTURE results also show I am pretty much your average Ashkenazi Jew:

Mixed Mode Population Sharing:

# Primary Population (source) Secondary Population (source) Distance
1 97.2% Ashkenazi + 2.8% Lebanese_Druze @ 4.1
2 100% Ashkenazi + 0% Abhkasian @ 4.14
3 100% Ashkenazi + 0% Adygei @ 4.14
4 100% Ashkenazi + 0% Afghan_Pashtun @ 4.14
5 100% Ashkenazi + 0% Afghan_Tadjik @ 4.14
6 100% Ashkenazi + 0% Afghan_Turkmen @ 4.14
7 100% Ashkenazi + 0% Aghan_Hazara @ 4.14
8 100% Ashkenazi + 0% Algerian @ 4.14
9 100% Ashkenazi + 0% Algerian_Jewish @ 4.14
10 100% Ashkenazi + 0% Altaian @ 4.14
11 100% Ashkenazi + 0% Armenian @ 4.14
12 100% Ashkenazi + 0% Assyrian @ 4.14
13 100% Ashkenazi + 0% Austrian @ 4.14
14 100% Ashkenazi + 0% Austroasiatic_Ho @ 4.14
15 100% Ashkenazi + 0% Azeri @ 4.14
16 100% Ashkenazi + 0% Balkar @ 4.14
17 100% Ashkenazi + 0% Balochi @ 4.14
18 100% Ashkenazi + 0% Bangladeshi @ 4.14
19 100% Ashkenazi + 0% Bantu_N.E. @ 4.14
20 100% Ashkenazi + 0% Bantu_S.E. @ 4.14
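
If you wonder why almost every row says "100% Ashkenazi + 0%" with the same distance, a rough sketch of a two-population search may help. This is only my assumption of how a mixed-mode fit could work (a grid search over blend shares that minimises Euclidean distance), not the calculator's actual code, and all the numbers are invented.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length component vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_two_way_mix(kit, primary, secondary, steps=1000):
    """Grid-search the share of `primary` (0..1) whose blend with
    `secondary` lies closest to the kit. Returns (share, distance)."""
    best_share, best_dist = 1.0, distance(kit, primary)
    for i in range(steps + 1):
        share = i / steps
        blend = [share * p + (1 - share) * s for p, s in zip(primary, secondary)]
        d = distance(kit, blend)
        if d < best_dist:
            best_share, best_dist = share, d
    return best_share, best_dist

# Hypothetical admixture percentages, for illustration only.
kit = [50.0, 30.0, 20.0]
primary = [52.0, 29.0, 19.0]
helpful = [30.0, 40.0, 30.0]    # lies in the direction the kit deviates from primary
unhelpful = [70.0, 20.0, 10.0]  # lies in the opposite direction

print(best_two_way_mix(kit, primary, helpful))
print(best_two_way_mix(kit, primary, unhelpful))
```

When a secondary population cannot pull the blend any closer to the kit, the best fit stays at 100% primary and the distance simply equals the single-population distance, which is the pattern in rows 2 to 20 above.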
  
And if I try some other calculators, I get pretty much the same picture:

puntDNAL K13 Global:

Single Population Sharing:

#   Population (source)   Distance
1   Ashkenazy_Jew         3.82
2   Italian_Sicilian      4.67
3   Greek_Central         5.86
4   Italian_Abruzzo       6.52
5   Sephardic_Jew         7.16
6   Greek_Thessaly        10.54
7   Albanian              11.07
8   Kosovar               12.41
9   Italian_Tuscan        13.03
10  Turkish               13.23
11  Cypriot               14.68
12  Turkish_Aydin         15.13
13  Turkish_Kayseri       15.74
14  Bulgarian             16.74
15  Macedonian            17.24
16  Syrian                18.37
17  Romanian              18.58
18  Lebanese_Christian    18.95
19  Italian_Bergamo       18.99
20  Lebanese_Druze        19.19


MDLP K23b:

Single Population Sharing:

#   Population (source)   Distance
1   Ashkenazi_Jew         3.04
2   Sicilian_West         4.78
3   Sicilian_Siracusa     5.21
4   Sicilian_Agrigento    5.45
5   French_Jew            5.72
6   Maltese               5.87
7   Turk_Jew              5.9
8   Sephardic_Jew         5.98
9   Greek_Peloponnesos    6.27
10  Sicilian_Trapani      6.28
11  Sicilian_East         6.3
12  Italian_Jew           6.39
13  Ashkenazi             6.65
14  Greek_Northwest       7.14
15  Greek_Thessaloniki    7.53
16  Greek_Thessaly        7.54
17  Bulgarian             7.56
18  Macedonian            7.7
19  Moroccan_Jew          7.82
20  Romanian_Jew          8.56

And so forth. As can be seen, this is pretty consistent with the PCA I posted in my previous entry:




Ashkenazi Jews, like their genetically closest population, Sephardic Jews, fall within the Western Jewish cluster, which in a larger scope can be viewed as part of the East Mediterranean continuum. In other words, Ashkenazi Jews cluster genetically with non-Jewish East Mediterranean populations: Aegean Greeks (from Crete, Rhodes, etc.), Sicilians, Maltese and Cypriots.

The PCA above is made from academic samples gathered as part of Davidski's Global25 endeavour. It also includes coordinates for 471 Ashkenazi individuals (colored in grey) from Bray et al. 2010, who, as can be seen, all cluster tightly together, as expected from such an endogamous population.
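
For readers who have never seen what a PCA does under the hood, here is a toy sketch in plain Python. It finds only the first principal component by power iteration on the covariance matrix, and it is not how Global25 or PAST3 compute their plots; the six samples and their coordinates are invented for illustration.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def first_component(cov, iterations=200):
    """Dominant eigenvector of `cov` via power iteration.
    Starts from a unit axis vector: an all-ones start would be wiped out
    when every row sums to the same total, as percentages do."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, axis):
    """Score of each centered row along `axis`."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]

# Hypothetical samples from two made-up groups, A and B.
names = ["A1", "A2", "A3", "B1", "B2", "B3"]
rows = [
    [50.0, 30.0, 20.0], [52.0, 29.0, 19.0], [49.0, 31.0, 20.0],
    [10.0, 20.0, 70.0], [12.0, 19.0, 69.0], [9.0, 21.0, 70.0],
]

centered = center(rows)
axis = first_component(covariance(centered))
scores = project(centered, axis)
for name, score in zip(names, scores):
    print(f"{name}\t{score:.2f}")
```

Samples from the same group end up with similar scores on the first axis, which is what "clustering tightly together" means on a PCA plot.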

When I look for the populations at the shortest genetic distance from these ~500 Ashkenazi samples, the closest is the Ashkenazi Global25 reference panel, followed by various East Mediterranean populations:

 Ashkenazi_G25_reference 1.031630  
 Maltese 1.995762  
 Italian_South 2.020773  
 Sicilian_East 2.166531  
 Sicilian_West 2.313971  
 Italian_Abruzzo 2.593161  
 Italian_Jew 2.757282  
 Greek_Crete 2.930938  


The same trend can be observed in other PCAs, based on Eurogenes K36 (one of the ADMIXTURE calculators found on GEDmatch) values, using a mix of both academic and non-academic samples:




As can be seen, with open genetic data now available, and with people getting their own ancestry profile and raw DNA data for less than $100 these days, it's quite easy to refute the Khazar myth with about five minutes of work, simply by showing that Ashkenazi Jews cluster with all other Western Jews and close to other Mediterranean people.

But don't just take my word for it.
If you're of Ashkenazi origin, or know someone of Ashkenazi origin who has done one of those home ancestry tests, get the raw data and upload it to GEDmatch.
Run any of the ADMIXTURE calculators.

And if you want to dive deeper, you can research nMonte, PAST3, and the other open source tools that I used to make the PCAs included in this post. There’s a vast world of information that’s easily accessible to anyone who is curious about genetics and wants to test theories out for themselves.

So far I've used autosomal analyses to refute the Khazar narrative. This means that I’ve dealt with overall ancestry and not with any sex-specific chromosomes.

In my next post, I will tackle uniparental lineages: the Y chromosome, which is passed down from father to son, and mitochondrial DNA, which is passed down the maternal line. These can offer more detailed information in some areas that can complement what we’ve already seen. Stay tuned!

Monday, November 5, 2018

The Return of the Khazars

"Proving" the Khazar ancestry of Ashkenazi Jews with bad science


This is the second part of a series of blog entries dedicated to showing the invalidity of the theory that Ashkenazi Jews are Khazars. If you haven't read the first part yet, I suggest you do.

In my previous post, I referenced a long list of peer reviewed studies from the past decade or so that completely dismantle this now-defunct theory.

However, as I mentioned at the end of the post, the overwhelming amount of conclusive data was still not enough to kill off the Khazar theory.

In December 2012, Dr. Eran Elhaik published his peer-reviewed study "The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses" in the Oxford journal Genome Biology and Evolution. It garnered a lot of publicity at the time by claiming to prove, using population genetics, that Ashkenazi Jews are indeed descendants of the Khazars.

At the time, the study’s controversial results, the scientific community’s rejection of it, and Elhaik's own cries that he's being persecuted due to politics rather than admitting that perhaps there was something wrong with his study, all helped to elevate its publicity. Most non-scientific journals that covered this study accepted it without even questioning its scientific validity.

However, the scientific community soon responded with force: in 2013, Human Biology published a study that refuted both Elhaik's claims and the Khazar narrative in general. The lead author of this paper was Doron M. Behar, a known geneticist and researcher on Jewish genetics. Twenty-nine other scholars, many of whom are well-known scientists, co-authored the paper. Here's the abstract of that study:
The origin and history of the Ashkenazi Jewish population have long been of great interest, and advances in high-throughput genetic analysis have recently provided a new approach for investigating these topics. We and others have argued on the basis of genome-wide data that the Ashkenazi Jewish population derives its ancestry from a combination of sources tracing to both Europe and the Middle East. It has been claimed, however, through a reanalysis of some of our data, that a large part of the ancestry of the Ashkenazi population originates with the Khazars, a Turkic-speaking group that lived to the north of the Caucasus region ~1,000 years ago. Because the Khazar population has left no obvious modern descendants that could enable a clear test for a contribution to Ashkenazi Jewish ancestry, the Khazar hypothesis has been difficult to examine using genetics. Furthermore, because only limited genetic data have been available from the Caucasus region, and because these data have been concentrated in populations that are genetically close to populations from the Middle East, the attribution of any signal of Ashkenazi-Caucasus genetic similarity to Khazar ancestry rather than shared ancestral Middle Eastern ancestry has been problematic. Here, through integration of genotypes on newly collected samples with data from several of our past studies, we have assembled the largest data set available to date for assessment of Ashkenazi Jewish genetic origins. This data set contains genome-wide single-nucleotide polymorphisms in 1,774 samples from 106 Jewish and non-Jewish populations that span the possible regions of potential Ashkenazi ancestry: Europe, the Middle East, and the region historically associated with the Khazar Khaganate. The data set includes 261 samples from 15 populations from the Caucasus region and the region directly to its north, samples that have not previously been included alongside Ashkenazi Jewish samples in genomic studies.
Employing a variety of standard techniques for the analysis of population-genetic structure, we find that Ashkenazi Jews share the greatest genetic ancestry with other Jewish populations, and among non-Jewish populations, with groups from Europe and the Middle East. No particular similarity of Ashkenazi Jews with populations from the Caucasus is evident, particularly with the populations that most closely represent the Khazar region. Thus, analysis of Ashkenazi Jews together with a large sample from the region of the Khazar Khaganate corroborates the earlier results that Ashkenazi Jews derive their ancestry primarily from populations of the Middle East and Europe, that they possess considerable shared ancestry with other Jewish populations, and that there is no indication of a significant genetic contribution either from within or from north of the Caucasus region.


Citations:

Behar, Doron M.; Metspalu, Mait; Baran, Yael; Kopelman, Naama M.; Yunusbayev, Bayazit; Gladstein, Ariella; Tzur, Shay; Sahakyan, Hovhannes; Bahmanimehr, Ardeshir; Yepiskoposyan, Levon; Tambets, Kristiina; Khusnutdinova, Elza K.; Kusniarevich, Aljona; Balanovsky, Oleg; Balanovsky, Elena; Kovacevic, Lejla; Marjanovic, Damir; Mihailov, Evelin; Kouvatsi, Anastasia; Triantaphyllidis, Costas; King, Roy J.; Semino, Ornella; Torroni, Antonio; Hammer, Michael F.; Metspalu, Ene; Skorecki, Karl; Rosset, Saharon; Halperin, Eran; Villems, Richard; and Rosenberg, Noah A.

These are basically the foremost experts on Jewish genetic studies.

And you can read the entire rebuttal here:

No Evidence from Genome-Wide Data of a Khazar Origin for the Ashkenazi Jews

In response to this rebuttal, Elhaik seemingly became obsessed with proving the Khazar hypothesis, going so far as to create an entire website dedicated to it. And when his previous attempts fell flat in the face of science, history, and logic, he continued publishing similar follow-up "studies" as part of Das et al. in 2016 and 2017, which was essentially akin to trolling the scientific community.

To understand just how problematic Elhaik's papers and theories are, one has to understand that population genetics is essentially a comparative field. You have to carefully construct reference groups as a basis for relationships between populations. For example, if I assume that my Near East/Middle East reference group is composed of Iraqis, Iranians and Kurds, I might end up with bogus results in which Lebanese, Druze and Palestinians appear to be only about 50% Middle Eastern / Near Eastern and about 50% South European.
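
Here is a toy sketch of that reference-group problem. It models a sample as a blend of two references by projecting it onto the line between them, which is a deliberately simplified stand-in and not the method used in any of the papers discussed here. The 2-D coordinates are invented so that the "Iran-like" reference sits twice as far from the European reference as the Levantine one does.

```python
def near_east_share(target, europe_ref, near_east_ref):
    """Project `target` onto the Europe-to-Near-East line and return the
    Near East fraction, clamped to the range 0..1."""
    axis = [n - e for n, e in zip(near_east_ref, europe_ref)]
    offset = [t - e for t, e in zip(target, europe_ref)]
    share = sum(a * o for a, o in zip(axis, offset)) / sum(a * a for a in axis)
    return max(0.0, min(1.0, share))

# Hypothetical coordinates, for illustration only.
south_europe = [0.0, 0.0]
levant = [1.0, 0.0]       # a Levantine-based Near East reference
iran_like = [2.0, 0.0]    # a reference built only from Iraqis, Iranians and Kurds
lebanese_sample = [1.0, 0.1]

share_levant_ref = near_east_share(lebanese_sample, south_europe, levant)
share_iran_ref = near_east_share(lebanese_sample, south_europe, iran_like)
print(f"Levantine reference: {share_levant_ref:.0%} Near Eastern")
print(f"Iran-like reference: {share_iran_ref:.0%} Near Eastern")
```

The sample itself never changes. Only the reference does, and that alone moves the estimate from fully Near Eastern to a bogus half-and-half split.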

In addition to this, assumptions about modern populations representing ancient ones need to be carefully verified with ancient DNA samples before being used as such. A good example is the Haber et al. paper from 2017, which successfully established that modern-day Lebanese are pretty good proxies for Bronze Age Canaanites by testing ancient samples found in Sidon, Lebanon and dated to ~1750 BC. This paper estimated that modern-day Lebanese derive about 93% of their ancestry from a population related to those ancient samples. So, it's safe to say that we can use Lebanese as a good modern reference population for Levantine ancestry.

And while Elhaik's 2012 paper has numerous flaws, these two factors—reference groups and using modern populations as proxies of ancient populations—are where his entire narrative totally collapses.

First, in his paper, he seems to have intentionally omitted all Western Jewish populations except for Ashkenazi Jews. Considering what data was widely available at the time of this study, 2012, this seems to have been a deliberate and calculated move; one cannot escape the thought that he knew that the autosomal similarities between Ashkenazi, Sephardi, Italian, and even North African Jews would completely undermine the entire premise of his study. Oddly enough, he actually admits to having Sephardic Jewish samples, yet without giving any clear reason, states:
In congruence with the literature that considers “Ashkenazi Jews” distinct from “Sephardic Jews,” we excluded the later.
Just like that. No reason is given why they were excluded.

Second, in what Elhaik describes as the choice of "surrogate" populations—essentially what I've described as using modern populations as proxies for ancient ones—he states the following:
Choice of Surrogate Populations
As the ancient Judeans and Khazars have been vanquished and their remains have yet to be sequenced, in accordance with previous studies (Levy-Coffman 2005; Kopelman et al. 2009; Atzmon et al. 2010; Behar et al. 2010), contemporary Middle Eastern and Caucasus populations were used as surrogates. Palestinians were considered proto-Judeans because they are assumed to share a similar linguistic, ethnic, and geographic background with the Judeans and were shown to share common ancestry with European Jews (Bonné-Tamir and Adam 1992; Nebel et al. 2000; Atzmon et al. 2010; Behar et al. 2010). Similarly, Caucasus Georgians and Armenians were considered proto-Khazars because they are believed to have emerged from the same genetic cohort as the Khazars (Polak 1951; Dvornik 1962; Brook 2006).

Essentially, he chose to represent ancient Levantine Jewish population with modern day Palestinians, and Khazars with Georgians and Armenians. The funny thing is that he actually claims one of the reasons for choosing Palestinians as his reference group was that they were shown to share common ancestry with European Jews! This here alone indicates that he recognizes the Levantine ancestry of Ashkenazi Jews.

However, using Palestinians as a reference group for ancient Judean Jews, lacking any concrete historical or genetic evidence at the time for such a connection, can rightly be considered more politically-driven than good science, and using Armenians (or Georgians) as Khazar proxies is just odd.

Palestinians are Levantine people, just like ancient Judean Jews most likely were. However, the majority of Palestinians today are Muslim, and Muslim Levantines are not the best proxy for ancient Levantines because it has been established by previous studies that they drift towards North African and peninsular Arab (Saudi, Yemenite) populations. While it is true that Elhaik's original paper came out years before Haber et al. provided proof that Lebanese are a much better proxy for the ancient Levant, the aforementioned drift that Levantine Muslims show on the different PCAs, and even some degree of Sub-Saharan African ancestry found among them that is lacking in Christian and Jewish populations, had been established at least as early as 2003:

Extensive Female-Mediated Gene Flow from Sub-Saharan Africa into Near Eastern Arab Populations

This study found that Haplogroups L1-L3A, which are common among people of sub-Saharan African descent and usually indicate such admixture, can be found among Muslim Middle Eastern populations:
“Haplogroups L1–L3A in the Near East reach their highest frequency in the Yemen Hadramawt (∼35%). Other Arab populations—Palestinians, Jordanians, Syrians, Iraqis, and Bedouin—have ∼10%–15% of lineages of sub-Saharan African origin. These types are rarely shared between different Arab populations. By contrast, non-Arab Near Eastern populations—Turks, Kurds, Armenians, Azeris, and Georgians—have few or no such lineages, suggesting that gene flow from Africa has been specifically into Arab populations. “

And, also, regarding non-Muslim Middle Easterners, specifically Middle Eastern Jews:
“Near Eastern Jewish groups almost entirely lack haplogroups L1–L3A. “
Later studies reaffirmed these findings, which can be seen in the PCA I posted in my previous entry here.

Another important fact from the 2003 paper is that the members of the "Khazar" reference group Elhaik constructed, Armenians and Georgians, are treated here as Near Eastern populations rather than Caucasus populations related to the northern Caucasus that the Khazars inhabited and ruled over. And for a good reason, but we'll get to that soon.

First, let’s return to Elhaik's choice of Palestinians as a "surrogate" population representing ancient Judean Jews. He justifies this choice with an argument that undermines the paper's central premise: that Palestinians were shown to share common ancestry with Ashkenazi Jews. That amounts to recognizing the Levantine ancestry and partial Levantine origin of Ashkenazi Jews, in contradiction to the Khazar narrative that he would later conclude with. From this, the conclusion that I (and many others who have read his paper) draw is that this decision was driven by political views.

In fact, one cannot but suspect that Elhaik believes that Palestinians descend from the ancient Jews and are thus the real Jews, while Ashkenazi Jews, who make up the majority of the world’s modern Jewish population, are essentially fake Jews. This seems to me (and others) to be the main reason that he chose Palestinians rather than Samaritans, for instance, who would have made a more logical surrogate population due to historical, genetic, cultural, and religious factors.

In his later studies and on the website that he created to promote his ideas, Elhaik regularly alludes to Shlomo Sand's theories. Sand, the author of politically-biased and historically-controversial books such as The Invention of the Jewish People (2009) and How I Ceased to Be a Jew (2013), has repeatedly claimed that all of the different Jewish ethnic groups that lived around the world are made up of local converts to Judaism and that there is no common Jewish ethnicity or ancestral origin. He similarly claims that modern Palestinians, rather than modern Jews, are the true descendants of ancient Jews. These claims are not supported by population genetic studies. And predictably, when Elhaik published his first paper in 2012, Sand was quick to seize on it and dismiss all other genetic studies as erroneous, despite the fact that he has no credentials in, or knowledge of, population genetics. Sand's ideological and controversial books on Jewish history and Elhaik's seemingly poorly-reasoned papers complement each other, and are equally detached from earlier and more recent scientific evidence.


Elhaik’s use of Armenians and Georgians as a proxy for Khazars further illustrates the ahistorical nature of his narrative. He admits in his paper that “Khazars have been vanquished and their remains have yet to be sequenced,” so nobody really knows which modern populations, if any, are predominantly descended from the Khazars. In virtually all genetic studies, Armenians are considered to be a Near Eastern population that overlaps with Mesopotamian populations like modern Assyrians and, to a lesser degree, Kurds. This is clearly evident in both the Eurogenes PCA:


















and if I zoom in on the Global25 PCA that I posted in my previous entry:














Armenians very clearly overlap with Assyrians, Kurds, and Iranians. Ironically, their closest Jewish populations are Mizrahi Jews—Georgian, Iranian, and Iraqi Jews—not Ashkenazi Jews. These populations all cluster tightly with Armenians and other Mesopotamian-like non-Jewish populations. Are we now to believe that Iraqi and Iranian Jews descend from Khazars?

Elhaik seems to have chosen not to use the Turkic-speaking Chuvash people, who are widely assumed by scholars to be the closest modern population to the Khazars. In fact, he didn’t choose a single Turkic-speaking people as a reference for Khazars despite the fact that the Khazars were almost certainly Turkic-speaking themselves. He also didn't choose any North Caucasian populations such as Kumyks or Ossetians, both of whom reside in the actual areas that the Khazars’ kingdom was centered around. On top of all this, the Ossetians, a North Caucasus Iranian people, even have historical traditions linking their ethnogenesis to the Khazars via the medieval Alans.

Lastly, the only non-Ashkenazi Jewish population that Elhaik chose to feature in his study are Azeri Jews, otherwise known as Mountain/Caucasian Jews. Elhaik cites Kevin Alan Brook's The Jews of Khazaria (2006) as one of his bases for choosing Armenians and Georgians as his "surrogate populations" for Khazars. I actually have a copy of this book at home, and I highly recommend it as an excellent scholarly work about the Khazars. In his book as well as on his website, Brook actually argues against the notion that Azeri Jews are descended from Khazars:

"I have not yet been convinced of a connection between Mountain Jews and Khazarian Jews. It is possibly a coincidence that Khazarian Jews and Mountain Jews lived in roughly the same geographic area. And most of the Khazars who remained in the Caucasus after the 10th century are known to have been forced into Islam, leaving us with the more likely scenario that the Turkic groups of the North Caucasus who are Muslims, especially the Karachays and Balkars, but not the Kumukhs, are partly descended from the Khazars. "
It's funny because in the very work that Elhaik cites, Brook claims that North Caucasus populations can be more reasonably assumed to be descendants of the Khazars, and yet Elhaik still chose Armenians and Georgians, who inhabit regions that are on the very edge of Khazaria’s historical southern boundaries and that the Khazars did not consistently control. In fact, Arabs had a much stronger hold over these regions, especially Armenia, than the Khazars did during this time period. Azeri Jews are similarly assumed by Elhaik to be the descendants of Khazars without any explanation, again contradicting the same sources that he cites.

Elhaik basically assumes with no rational basis that Armenians are descended from Khazars, and that Palestinians are the primary and most authentic modern descendants of ancient Jews. Therefore, he concludes that Ashkenazi Jews, who show some affinity to other Northern Near Eastern populations like Armenians (which in reality is likely due to shared Near Eastern ancestry), are descendants of Khazars.

If anything, Elhaik unknowingly corroborated a paper that was published the following year by Haber et al. (2013), which found that:

"Levantine populations [can be split to] two branches: one leading to Europeans and Central Asians that includes Lebanese, Armenians, Cypriots, Druze and Jews, as well as Turks, Iranians and Caucasian populations; and a second branch composed of Palestinians, Jordanians, Syrians, as well as North Africans, Ethiopians, Saudis, and Bedouins." 

I hope that this entry was clear and elaborate enough to show just how poorly-reasoned and unscientific the most serious attempt in recent years to "prove" the Khazar ancestry of Ashkenazi Jews (or any Jews for that matter) was. In fact, it had to be this bad because this theory has no merit.

Unfortunately, as I mentioned earlier in this blog entry, Elhaik didn't stop with his first bad study, and has subsequently tried to support this theory in later studies, by Das et al. (2015, 2016 and 2017), though he has changed his narrative repeatedly to accommodate the evidence that overwhelmingly contradicts his arguments. Bizarrely, though, he has pushed his narrative to an even greater extreme, arguing that Ashkenazi Jews derive only 3% of their ancestry from the Levant by assuming that Bedouins are in fact pure Levantines and that, as a result, modern-day Levantine populations, including Palestinians and Lebanese, are heavily descended from Iranian and Anatolian populations.

In my next post, I'll show how simple it is to disprove the Khazar theory with today's available open genomics data.