The origin of Minnan & Hakka, the so-called "Taiwanese", inferred by HLA study

M. Lin (Mackay Memorial Hospital)


Key words: anthropology; HLA class I and class II genes; HLA haplotypes; Minnan and Hakka; "Taiwanese"

Acknowledgements: We would like to thank Drs. L.C. Yu and Y.C. Tsai of the Research Department, Mackay Memorial Hospital, for helping us to collect blood samples from Hakka. This work was supported by a grant from the National Health Research Institute of Taiwan (DOH 87-HR-601).

Abstract: The Minnan and Hakka people groups, the so-called "Taiwanese", are the descendants of early settlers from the southeast coast of China during the last few centuries. Genetically they showed affinities to southern Asian populations as determined by phylogenetic trees and correspondence analysis calculated from HLA allele frequencies. This corresponds historically with the fact that they are the descendants of the southeast coastal indigenous population (Yueh) of China and should therefore not be considered as descendants of "pure" northern Han Chinese. A33-B58-DRB1*03 (A33-Cw10-B58-DRB1*03-DQB1*02), the most common HLA haplotype among "Taiwanese", with a haplotype frequency of 6.3%, has also been found to be the most common haplotype among Thai-Chinese and Singapore Chinese, two other populations also originating from the southeast coast of China. These observations suggest that this haplotype is the most well conserved ancient haplotype of the Yueh.

"Taiwanese", the major population group in Taiwan, comprise the Minnan and Hakka peoples, who constitute 73.5% and 17.5%, respectively, of the total population. They are descendants of early settlers who came from the southeast coast (Fuchien and Kwangton) of China over the past 400 years or more. Many "Taiwanese" intermarried with the preexisting indigenous tribes after arrival, mainly with the plains tribes but also with the mountain tribes. In our previous study, we found that 13% of "Taiwanese" HLA-A, -B and -C three-locus haplotypes most likely originated from these mountain tribes and also from the Pazeh, who are a disappearing plains tribe. This suggests that only a small proportion of indigenous genes are present in the "Taiwanese" gene pool, although HLA data from the already extinct plains tribes (9 tribes) are not available, and so the degree of contribution of these tribes to the "Taiwanese" gene pool is at present unknown (1). This is in contrast to the traditional account of "Taiwanese" origins handed down through successive generations, either orally or through written family pedigrees. In this account, "Taiwanese" have been told that their ancestors originated from the Central Plains of North China but migrated to the southeast coastal area sometime after the Han Dynasty, during the invasion of the north by northern pastoral nomads. Hence they are assumed to be descendants of "pure" northern Han Chinese from the Central Plains and thus to belong to the great tradition of Han (Hwa-Shia).

In this study, we analyzed the HLA data of both Minnan and Hakka by constructing phylogenetic trees and performing correspondence analysis, and also by tracing the most common HLA haplotype seen in Minnan and Hakka, and compared the results with data from other populations. In this way we hope to clarify the origin of "Taiwanese".

Material and methods

During 1997 and 1998 a total of 123 families were enrolled for paternity testing at Mackay Memorial Hospital. From these families, 167 unrelated persons, consisting of either the parents of inclusion trios or the mothers of exclusion trios, whose parents were either Minnan or Hakka (136 individuals had Minnan parents, 24 had Hakka parents and 7 had one Minnan and one Hakka parent), were studied in this report. Some of these individuals had been included in our previous HLA class I study (1).

Blood samples were tested for HLA class I antigens by a serological method (standard microlymphocytotoxicity test) using Terasaki Chinese HLA-A, -B and -C 72-well Trays (lot 2, 3, 3A, 3B) and the latest Terasaki Special Monoclonal Tray-Asian HLA class I (lot 3). All samples were also tested with a local typing tray for HLA class I antigens. HLA class II DNA typing was performed using One Lambda Micro SSP Generic HLA Class II DNA Typing Trays (lot 2, 3) of medium-level resolution.

Blood samples were also obtained from 75 Hakka individuals. These individuals were unrelated and had parents who were both Hakka. All samples were submitted as panel cells at the 1998 Japanese Red Cross Central Block Histocompatibility Workshop, where 420 antisera collected and specified by the participating labs from Japan, Korea, Thailand, US, South Africa and Taiwan were used to test for HLA-A, -B and -C antigens. HLA class II DNA typing was also performed. All blood samples were collected in ACD tubes and all testing was performed at the Transfusion Medicine Research Laboratory, Mackay Memorial Hospital.

The HLA-A, B, C, DRB1 and DQB1 gene frequencies of 136 Minnan and 99 Hakka (75 + 24) individuals were estimated by the maximum likelihood method developed for the 11th International Histocompatibility Workshop (11th IHW) (2). The A-Cw-B, A-B-DRB1 and A-Cw-B-DRB1-DQB1 haplotype frequencies of these 167 individuals were calculated by direct counting of the haplotypes of parents (inclusion trios) and mothers (exclusion trios) of these families.
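The direct-counting step can be illustrated with a short sketch. It assumes that each individual's two haplotypes have already been resolved from the family data, and it takes the frequency as the count divided by twice the number of individuals. The function name and the toy genotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def haplotype_frequencies(individuals):
    """Count haplotypes directly and convert counts to frequencies.

    individuals: list of (haplotype_1, haplotype_2) pairs, one pair per
    unrelated person, with phase already resolved from family data.
    Returns a dict mapping haplotype -> (count, frequency), where
    frequency = count / (2 * number of individuals).
    """
    counts = Counter(hap for pair in individuals for hap in pair)
    total = 2 * len(individuals)  # each person carries two haplotypes
    return {hap: (n, n / total) for hap, n in counts.items()}


# Hypothetical toy data (illustration only, not from the study).
toy = [
    ("A33-Cw10-B58", "A2-Cw1-B46"),
    ("A33-Cw10-B58", "A1101-Cw7-B60"),
    ("A2-Cw1-B46", "A1101-Cw10-B13"),
    ("A33-Cw10-B58", "A33-Cw10-B58"),
]
freqs = haplotype_frequencies(toy)
for hap, (count, freq) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
    print(f"{hap}: count={count}, frequency={freq:.3f}")
```

With 167 individuals the denominator would be 334 haplotypes, so a reported frequency of 6.3% corresponds to a count of about 21.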

The genetic distances D (Nei's standard genetic distance, 1972) (3) and DA (Nei, 1983) (4) between populations were calculated using "ODEN" (public domain software) from the HLA-A, B, C gene frequencies of this report and data reported at the 11th IHW (5). The studied populations are shown in the map of East Asia (Fig. 1). Using the genetic distances, phylogenetic trees were constructed by the neighbor-joining (NJ) method (6); the tree based on genetic distance D is shown in Fig. 2. The correspondence analysis of many populations, based on HLA-A, -B and -C gene frequencies (5) and performed with Vista (free software), is shown in Fig. 3. These populations included the Minnan and Hakka of this report and many other populations from data reported at the 11th IHW (5).
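For readers unfamiliar with the two distance measures, the following minimal sketch computes them from allele frequencies using the standard published definitions: D = -ln(J_XY / sqrt(J_X J_Y)), with the J terms averaged over loci, and DA = 1 - (1/r) Σ_loci Σ_alleles sqrt(x y). It is not the ODEN implementation; the function names and toy frequencies are hypothetical, and details such as the handling of blank alleles are not addressed.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance D.

    pop_x, pop_y: lists with one dict per locus (allele -> frequency),
    loci given in the same order for both populations.
    Note: undefined if the populations share no alleles (J_XY = 0).
    """
    r = len(pop_x)
    jx = sum(sum(f * f for f in locus.values()) for locus in pop_x) / r
    jy = sum(sum(f * f for f in locus.values()) for locus in pop_y) / r
    jxy = sum(
        sum(fx * ly.get(allele, 0.0) for allele, fx in lx.items())
        for lx, ly in zip(pop_x, pop_y)
    ) / r
    return -math.log(jxy / math.sqrt(jx * jy))


def nei_da_distance(pop_x, pop_y):
    """Nei et al. (1983) DA distance: 1 - mean over loci of sum sqrt(x*y)."""
    r = len(pop_x)
    shared = sum(
        sum(math.sqrt(fx * ly.get(allele, 0.0)) for allele, fx in lx.items())
        for lx, ly in zip(pop_x, pop_y)
    )
    return 1.0 - shared / r


# Hypothetical two-locus toy frequencies (illustration only).
pop_a = [{"A2": 0.5, "A11": 0.5}, {"B46": 0.25, "B58": 0.75}]
pop_b = [{"A2": 0.25, "A11": 0.75}, {"B46": 0.5, "B58": 0.5}]

print("D  =", nei_standard_distance(pop_a, pop_b))
print("DA =", nei_da_distance(pop_a, pop_b))
```

A matrix of such pairwise distances is the input to the neighbor-joining step, which is not reproduced here.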




The HLA-A, B, C, DRB1 and DQB1 gene frequencies of 136 Minnan and 99 Hakka are shown in Table 1. There are significant differences in A2 and B27 gene frequencies (P < 0.05) between Minnan and Hakka, most likely due to the small samples used here, since the frequencies were similar in a large-sample study (7). The frequency of the Cw8 gene is higher and that of the blank Cw gene is lower among Hakka, which could be because 420 antisera were used to type HLA class I genes for the Hakka in the HLA workshop in this study. A2, A1101, B60, B46, B58, B13, Cw10, Cw7, Cw1, DRB1*04, DRB1*09, DQB1*06, DQB1*0301 and DQB1*05 showed high gene frequencies in these two ethnic groups. Analysis of HLA haplotypes for the A-Cw-B, A-B-DR and A-Cw-B-DR-DQ loci of "Taiwanese" (Minnan and Hakka) demonstrated 126 haplotypes for the three loci A-Cw-B, 212 for the three loci A-B-DR and 247 for the five loci A-Cw-B-DR-DQ. Only haplotypes with a haplotype count of six or more for three-locus haplotypes (Table 2) and a haplotype count of four or more for five-locus haplotypes (Table 3) are shown. From Table 2 it can be seen that the five most common haplotypes for the A-Cw-B loci are A33-Cw10-B58 (10.8%), A2-Cw1-B46 (7.8%), A1101-Cw7-B60 (5.7%), A1101-Cw10-B13 (3.9%) and A1101-Cw1-B46 (3.2%). The five most common haplotypes for the A-B-DR loci are A33-B58-DRB1*03 (6.3%), A2-B46-DRB1*09 (3.0%), A33-B58-DRB1*13 (3.0%), A1101-B13-DRB1*15 (1.8%) and A1101-B75-DRB1*12 (1.8%). The three most common haplotypes for the five loci A-Cw-B-DR-DQ, as shown in Table 3, are A33-Cw10-B58-DRB1*03-DQB1*02 (6.3%), A33-Cw10-B58-DRB1*13-DQB1*06 (3%) and A2-Cw1-B46-DRB1*09-DQB1*0303 (2.7%).

The phylogenetic tree constructed from D, as shown in Fig. 2, reveals that Minnan and Hakka merge together and cluster with Thai-Chinese and Singapore Chinese. This group also forms a cluster with the neighboring groups of Thais, Vietnamese and Buyi, and another group of Southern Han and Miao, thus forming a southern Asian cluster. Li is a separate entity, while Northern Han, Hui, Man, Mongolians, Buriat, Uygur, Kazakhs, Korean, Japanese and Orochon form a northern Asian cluster. The phylogenetic tree constructed by DA revealed that Minnan and Hakka clustered together with Southern Han, Buyi and Miao, then with Thai-Chinese and Singapore Chinese, followed by Thais and Vietnamese, again forming a southern Asian cluster. Northern Han clustered with Li, and with the northern Asian populations, Hui, Buriat, Uygur, Kazakhs, Man and Mongolians.


Correspondence analysis of Minnan and Hakka and many other populations, as shown in Fig. 3, also showed that Minnan and Hakka clustered together with other southern Asian populations including southern Han, Singapore Chinese and Thai-Chinese. Northern Han formed a cluster with Koreans as well as Man and Hui populations. Taiwan's indigenous groups clustered together and showed an affinity to Highlanders.


Because of the high degree of polymorphism of the HLA system, it is a useful genetic marker for the characterization of human populations and the analysis of their relationships for anthropological purposes. Differences in the distribution of HLA alleles among various human populations are more marked than those of other genetic markers. This becomes even more obvious if HLA haplotypes (particular combinations of alleles) are used as an index. It is interesting that many haplotypes were found to have a unique organization of HLA genes that has been well conserved through thousands of years, and that each characteristic haplotype shows a limited regional distribution. Therefore the HLA haplotype is a powerful marker and is useful for surveys among closely related ethnic groups (8). Family studies have been found to be the best method for studying multilocus HLA haplotype distribution; in Taiwan, this study is the first report based on family study in "Taiwanese" (Minnan and Hakka).

A33-B58-DRB1*03 was found to be the most common and also the best conserved A-B-DR three-locus haplotype among "Taiwanese" (6.3%), and was exclusively related to the most common five-locus haplotype, A33-Cw10-B58-DRB1*03-DQB1*02 (6.3%). Similar to our results, the haplotype A33-B58-DR17 (DRB1*03) was also shown to be the most common haplotype in Taiwan among Minnan (n = 7137, 5.59%) and Hakka (n = 714, 5.10%) by the Tzu Chi Taiwan Marrow Donor Registry, as estimated by the maximum likelihood method (7). The three-locus haplotype A33-B58-DRB1*03 has also been found among southeast Asians (Thai-Chinese 7.1%; Singapore Chinese 4.4%) (5) and northern Asians (Han in Urumchi, China 3.5%; Khalha in Mongolia 3.4%; Uygur in Urumchi, China 1.8%; Kazakhs in Urumchi, China 1.8%; Korean 1.2%) (9, 10). The five-locus haplotype has also been found in Thais (present-day Thais 2.2%; Black Thai 2.8%) (11). It is interesting to note that the frequency of the HLA-A, B two-locus haplotype A33-B58 among southern Han (n = 138, samples collected in Hunan and Fuchien) was reported to be only 2.4%, with no significant A-B-DR three-locus haplotype frequency, at the 11th IHW (5), while in an earlier report the A33-B17 haplotype frequency among southern Han (n = 844) was 1.53%, and among southern minorities (n = 621) was 1.29% (12). In another report, the haplotype A33-Cw10-B58 was found among northern Han (n = 196, 3.1%); however, the DR gene most frequently associated with this haplotype in northeast Asians is more likely DR13 than DR3 (13), and the distribution of haplotype B58-DR3 was found to be clearly focused in southeast Asian populations (8). Therefore, although the three- to five-locus haplotype A33-Cw10-B58-DRB1*03-DQB1*02 has been found to be distributed among all east and southeast Asians, the highest frequencies have been found in "Taiwanese", Singapore Chinese and Thai-Chinese, and these three populations are presumed to be the descendants of early settlers from the southeast coast of China. This haplotype is thus probably the most well-conserved haplotype among these three populations.


The second most common three-locus haplotype among "Taiwanese" in this study, A2-B46-DRB1*09 (3.0%), has also been found in Singapore Chinese (7.7%), Vietnamese (5.2%), southern Han (5.1%), Thais (4.7%) and Thai-Chinese (2.4%) (5). The corresponding five-locus haplotype A2-Cw1-B46-DRB1*09-DQB1*0303 (2.7%) has also been found in Black Thai (8.3%), Dai Lui (Thais, 5.1%) and present-day Thais (2.1%) (11). This haplotype is thus characteristic of southern Asians (8).

It is well known from the results of HLA studies that there are genetic differences between southern and northern Han Chinese (7, 8, 14, 15). This corresponds well with the prehistory and history of China. The Central Plains (Chung Yuan) culture developed in the loess region of the Huang Ho (Yellow River) basin in the north of China, and was dated to the sixth or even seventh millennium BC from excavated millets (16). This culture went through the Shia civilization, followed by the Shang and Chou Dynasties, and was limited to the north (north of the Yangtze River). Only in the Chin Dynasty (221-206 B.C.) was political control gained over the central region of the south (17). Recent archeological discoveries showed that in the south of China an independent Yueh coastal culture (from the Yangtze delta to the Hong river delta in North Vietnam) existed at almost the same period (18). However, the history of the south only began around 500 B.C., just before the battles between the two countries of Yueh and Wu (17). There is a well known story about the King of the Yueh, who slept on firewood for a mattress and had gall hung over his bed to remind him of the bitterness of his defeat by Wu, which finally enabled him to win the battle and resulted in national recovery. The Yueh people lived along the southeast coast of China (Chechiang, Fuchien, Kwangton and Kwangsi), and were named the Hundred (Pai) Yueh before the Han Dynasty (206 B.C.-220 A.D.) due to the great diversity of local cultures. Not much Yueh history is available from Chinese history (of the Central Plains) apart from the battles between Yueh and Wu during the Spring and Autumn Warring States Period and the dispersion of part of the Yueh peoples to the north during the Han Dynasty, since in Chinese history populations other than those of the Central Plains have been considered as "barbarians" (17, 19).

The Minnan (Min) were one of the ethnic groups among the Yueh who lived in Fuchien, and according to Lin (17) and Meacham (19) the present-day Minnan are descendants of indigenous Minnan peoples, although limited gene flow from the northern Han probably occurred during the Chin and Han Dynasties and also during the following several centuries, due to invasions of nomads from Mongolia in the north which caused internal migration from north to south (17). The barbarian status of the Yueh gradually disappeared and they were finally given Han status in history, probably resulting in the misinterpretation and erroneous self-assertion of present-day Minnan as "pure" descendants of the northern Han. In Chinese history, many ethnic minorities adopted Han culture, and many peoples from these ethnic groups often announced that they were Han, most likely because the Han culture was dominant at that time and so being a member of a Han ethnic group was both beneficial and a source of pride (17, 19, 20). This also applies to the Hakka, as during the South Song Dynasty (1127-1279 A.D.) or even earlier there were limited numbers of immigrants from the Central Plains to the southeast region of China. These peoples, with their dominant Han culture, influenced the indigenous Yueh peoples culturally and linguistically, especially those who lived in Kwangton. Subsequently, the few initial immigrants and the vast majority of indigenous Yueh peoples formed the Hakka ethnic group.


The two phylogenetic trees constructed from the genetic distances D and DA showed a slight difference in the order of clustering among southern Asians. The D tree appears to fit better with the history of "Taiwanese", Thai-Chinese and Singapore Chinese, since they originated from the same area of the southeast coast of China but migrated to different places during the last few centuries. However, the trees constructed from both D and DA separated the southern Asian populations, including "Taiwanese", from northern Han. Correspondence analysis of "Taiwanese" and many other populations also showed a clear separation between the "Taiwanese" and northern Han. A phylogenetic tree constructed from HLA gene frequencies in marrow donors also showed that "Taiwanese" (Minnan and Hakka) cluster together with the southern Han and are separated from the northern Han (8). Many other studies, including HLA as well as other genetic markers such as immunoglobulins (22), blood groups (23), glucose-6-phosphate dehydrogenase (24) and microsatellites (25), also clearly separate the southern Han from the northern Han. A study on Chinese surnames and genetic differences between north and south China also separated southern Han from the northern Han (20). In that study, the southern Han cluster could be further subdivided into three subclusters: the lower Yangtze river centering around Shanghai; most of the Yangtze river basin; and the southeast coastal areas and islands off the coast (including Taiwan). Chinese surnames have a history of probably more than 4,000 years and are transmitted via the male line, and can thus be considered as a Y-chromosome gene. The finding of southeast coastal subdivisions of the southern Han corresponds well with the history and populations of the Yueh.

Also in that study, the map of the first two principal coordinates, calculated from the gene frequencies for ABO, MN and Rh(D) of 28 provinces of China plus Taiwan, revealed a clear split between the populations of Taiwan and the southeast coastal provinces and those of all other parts of China, suggesting that "Taiwanese" and their originating southeast coastal populations (Yueh peoples) might be genetically distinct from the others.

These genetic data indicate that the southern Han are basically of southern origin and remain genetically distinct from the northern Han. "Taiwanese", who are descendants of the ancient Yueh peoples, preserve the ancestral HLA haplotype A33-Cw10-B58-DRB1*03-DQB1*02 of the Yueh. The genetic distance between "Taiwanese" and southern Han warrants further study because half of the population samples contributing to the HLA data of southern Han used in this study were from Fuchien (5).


1. Lin M, Chu LL, Lee HL et al. Heterogeneity of Taiwan's indigenous population: possible relation to prehistoric Mongoloid dispersals. Tissue Antigens 2000: 55: 1 - 9.

2. Imanishi T, Akaza T, Kimura A et al. Allele and haplotype frequencies for HLA and complement loci in various ethnic groups. In: Tsuji K, Aizawa M, Sasazuki T, eds. HLA 1991. Proceedings of the 11th International Histocompatibility Workshop and Conference. Vol 1. Oxford: Oxford University Press, 1992: 1067 - 74, 1141 - 9.

3. Nei M. Genetic variation within species. In: Molecular evolutionary genetics. New York: Columbia University Press, 1987: 220 - 1.

4. Nei M, Tajima F, Tateno Y. Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data. J Mol Evol 1983: 19: 153 - 70.

5. Imanishi T, Akaza T, Kimura A et al. Estimation of allele and haplotype frequencies for HLA and complement loci. In: Tsuji K, Aizawa M, Sasazuki T, eds. HLA 1991. Proceedings of the 11th International Histocompatibility Workshop and Conference. Vol 1. Oxford: Oxford University Press, 1992: 76 - 9.

6. Saitou N, Nei M. The neighbor joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987: 4: 406 - 25.

7. Shaw CK, Chen LL, Lee A, Lee TD. Distribution of HLA gene and haplotype frequencies in Taiwan: a comparative study among Min-nan, Hakka, aborigines and mainland Chinese. Tissue Antigens 1999: 53: 51 - 64.

8. Tokunaga K, Imanishi T, Takahashi K, Juji T. On the origin and dispersal of East Asian populations as viewed from HLA haplotypes. In: Akazawa T, Szathmary EJ, eds. Prehistoric Mongoloid Dispersals. Oxford: Oxford University Press, 1996: 187 - 97.

9. Tanaka H, Tokunaga K, Inoko H et al. Distribution of HLA-A, B, and DRB1 alleles and haplotypes in Northeast Asia. In: Charron D, ed. Proceedings of the 12th International Histocompatibility Workshop and Conference. Vol 1. Paris: EDK, 1997: 285 - 91.

10. Park MH, Hwang YS, Park KS et al. HLA haplotypes in Koreans based on 107 families. Tissue Antigens 1998: 51: 347 - 55.

11. Chandanayingyong D, Bejrachandra S, Kunachiwa W et al. HLA in the Thai population. Proceedings of 10th Regional Congress of the International Society of Blood Transfusion Western Pacific Region. Taipei, 1999: 102 - 9.

12. Lee TD, Zhao TM, Mickey K et al. The polymorphism of HLA antigens in the Chinese. Tissue Antigens 1988: 12: 188 - 208.

13. Inoue T, Ogawa A, Tokunaga K et al. Diversity of HLA-B17 alleles and haplotypes in East Asians and a novel Cw6 allele (Cw*0604) associated with B*5701. Tissue Antigens 1999: 53: 534 - 44.

14. Chen RB, Zhao TM, Ye YG et al. Joint report on Mainland Chinese HLA polymorphism. In: Aizawa M, ed. HLA in Asia-Oceania: Proceedings of the 3rd Asia-Oceania Histocompatibility Workshop and Conference. 1986: 224 - 30.

15. Chen RB, Ye GY, Geng ZC et al. HLA polymorphism of the principal minority nationalities in mainland China. In: Tsuji K, Aizawa M, Sasazuki T, eds. HLA 1991. Proceedings of the 11th International Histocompatibility Workshop and Conference. Vol 1. Oxford: Oxford University Press, 1992: 676 - 9.

16. An Chih-min. Pei-li-kang, Tzu-shan, ho Yang-shao-shih lun Chung-Yuan hsin shih-Chi wen-hua ti Yuan-Yuan Chi fa-chan. Kao-ku 1979: 4: 335 - 46.

17. Lin Hui-Shiang. Chuon Kuo ming Chu shi. Taipei: Taiwan Commercial Press, 1936.

18. Chang KC. The archaeology of ancient China. 3rd edition. New Haven: Yale University Press, 1977.

19. Meacham W. Origins and development of the Yueh coastal Neolithic: A microcosm of culture change of the mainland of East Asia. In: Keightly DK, ed. The origins of Chinese civilization. Berkeley, CA: University of California Press, 1981.

20. Du R, Yuan Y, Hwang J et al. Chinese surnames and the genetic differences between north and south China. Monograph series 5. J Chinese Linguistics 1992.

21. Wu Song-Ti. Hakka Nan Song yuen liou suoh. Soh Hwei Ko Shueh Pang. Fu Tan Shueh Pau 1995, No. 5.

22. Matsumoto H. Characteristics of Mongoloid and neighboring populations based on the genetic markers of human immunoglobulins. Hum Genet 1988: 80: 207 - 18.

23. Lin M, Broadberry RE. Immunohematology in Taiwan. Transfus Med Rev 1998: 12: 56 - 72.

24. Chu JY. Glucose-6-phosphate dehydrogenase kang Taiwan chu chung te shue yen. Taiwan Med J 1999: 42: 252 - 6.

25. Chu JY, Huang W, Kuang SQ et al. Genetic relationship of populations in China. Proc Natl Acad Sci U S A 1998: 95: 11763 - 8.