What is TibetanQA?
The Tibetan Machine Reading Comprehension Dataset (TibetanQA) is a Tibetan reading comprehension dataset containing 20,000 question-answer pairs and 1,513 articles.
Multi-domain resources: The articles come from the Tibetan Encyclopedia website and cover 12 topics, such as nature, culture and education.
High quality: A complete and strict workflow is adopted for passage collection, question construction and answer verification to ensure high data quality.
We invited native Tibetan students to construct the dataset using a program dedicated to collecting question-answer pairs. We trained them first, and only students whose accuracy rate exceeded 90% took part in the construction. We encouraged them to write questions in varied forms. In addition, we invited another group to verify the answers. We discarded incomplete and grammatically incorrect question-answer pairs, leaving 20,000 verified pairs. We divided the questions into four categories: word matching, synonym replacement, multi-sentence reasoning and fuzzy questions. These four categories demand progressively stronger reasoning ability from the machine.
Rank    Model                EM        F1
-       Human Performance    86.831    89.452
Minzu University of China
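The EM and F1 columns above are the usual span-extraction metrics. The source does not describe the official TibetanQA scoring script, so the following is only a minimal sketch of how such scores are commonly computed. It assumes that answers are compared as sequences of Tibetan syllables, split on the tsheg (U+0F0B), the shad (U+0F0D) and whitespace; the function names `syllables`, `exact_match`, `f1_score` and `evaluate` are hypothetical.

```python
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter
SHAD = "\u0f0d"   # Tibetan clause/sentence delimiter


def syllables(text):
    """Split text into syllables (assumed tokenization, not the official one)."""
    return text.replace(TSHEG, " ").replace(SHAD, " ").split()


def exact_match(prediction, reference):
    """1.0 if both answers have identical syllable sequences, else 0.0."""
    return float(syllables(prediction) == syllables(reference))


def f1_score(prediction, reference):
    """Syllable-level F1 between a predicted and a reference answer."""
    pred, ref = syllables(prediction), syllables(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared syllable at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average EM and F1 over paired answers, reported as percentages."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("need equally sized, non-empty answer lists")
    n = len(predictions)
    em = sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = sum(f1_score(p, r) for p, r in zip(predictions, references)) / n
    return {"EM": 100.0 * em, "F1": 100.0 * f1}
```

Calling `evaluate(model_answers, gold_answers)` on two parallel lists of strings returns a dictionary with `EM` and `F1` on the same 0 to 100 scale as the table. If a question has several reference answers, a common convention is to take the maximum score over the references, which this sketch does not do.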