Artificial intelligence

NCSOFT unveils AI dataset to rival hyperscale language models

FoCus Dataset is the first of its kind, utilizing both user personas and outside knowledge 

Apr 14, 2022 (GMT+09:00)


NCSoft building in Pangyo, near South Korea's capital Seoul (Courtesy of NCSOFT Corp.)

NCSOFT Corp. on Thursday unveiled an artificial intelligence (AI) conversation dataset that it developed with a Korea University research center.

The South Korean game developer and publisher headquartered in Pangyo city is positioning the latest development as the much-awaited rival to the hyperscale language models dominating the natural language processing (NLP) field. 

Lim Hui-seok, a professor of computer science and engineering at the university, led the research. Lim also heads the academic institute’s NLP and AI research center. 

The collection of data is named FoCus Dataset, a short form of For Customized Conversation Dataset. 

The research team says it is the first such dataset to encompass both user personas and outside knowledge. As it stands, it comprises more than 15,000 conversations on some 8,000 subjects.

An AI equipped with the FoCus Dataset will be able to comprehend the experience and preferences of the person with whom it is conversing. It will also be able to source and learn the latest information available on Wikipedia in real time.

The collection and utilization of language data for AI adaptation falls in the NLP category. The goal of the machine learning technology is to program computers to process and analyze large amounts of the language spoken by humans for seamless communication between machines and people. 

In this context, a persona is a profile that represents a large segment of data. It is easier to test a given strategy against an average of different individuals, i.e. a persona, than against thousands of individuals.

What sets FoCus Dataset apart from other data collections is that it can enable sophisticated conversations without the help of hyperscale language models.

Even though typical large-scale language models take a long time to learn and deduce meaning, they still hit a bottleneck when it comes to drawing on real-time data and reflecting personal experiences.

In late February, NCSOFT and Korea University jointly published a paper on the dataset at the AAAI 2022 conference. Founded in 1979, the Association for the Advancement of Artificial Intelligence is one of the most highly regarded scientific societies in the AI community.

Come this October, the two entities will host the first workshop on the customized chat technology at COLING 2022, an international conference on computational linguistics. 

“Recently in the NLP academic circle, the need for alternative conversation technologies that will rival hyperscale language models has risen – for financial and environmental reasons,” Lee Yeon-soo, director of NCSOFT’s Language AI Lab, said.

The lead scientist at NCSOFT added that he hopes the dataset will spark vibrant discussion and technological development within the NLP sector.

NCSOFT is best known for the distribution of massively multiplayer online role-playing games (MMORPGs) such as Lineage and Guild Wars. In recent years, it has been expanding its foothold in other tech sectors. 

Write to Jee Abbey Lee at jal@hankyung.com