
NCSOFT unveils AI dataset to rival hyperscale language models

FoCus Dataset is the first of its kind, utilizing both user personas and outside knowledge

By Jee Abbey Lee, Apr 14, 2022 (GMT+09:00)

jal@hankyung.com


NCSOFT building in Pangyo, near South Korea’s capital Seoul (Courtesy of NCSOFT Corp.)

NCSOFT Corp. unveiled an artificial intelligence (AI) conversation dataset it developed with Korea University’s research center on Thursday.

The South Korean game developer and publisher, headquartered in Pangyo, is positioning the dataset as a much-awaited rival to the hyperscale language models that dominate the natural language processing (NLP) field.

Lim Hui-seok, a professor of computer science and engineering at Korea University, led the research. Lim also heads the university’s NLP and AI research center.

The dataset is named FoCus Dataset, short for For Customized Conversation Dataset.

The research team says it is the first dataset of its kind to incorporate both user personas and outside knowledge. It currently comprises more than 15,000 conversations on some 8,000 subjects.

An AI model trained on the FoCus Dataset will be able to take into account the experiences and preferences of the person it is conversing with. It will also be able to draw on the latest information available on Wikipedia in real time.
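To make the idea of a conversation grounded in both a user persona and outside knowledge more concrete, here is a toy sketch. It is an illustration under stated assumptions, not the FoCus Dataset’s actual schema or the method in the NCSOFT and Korea University paper: the `Dialogue` record, its field names, and the helper functions are hypothetical, the `knowledge` list stands in for passages that would be pulled from Wikipedia, and the simple word-overlap scoring is a deliberately crude stand-in for whatever selection model a real system would use.

```python
import re
from dataclasses import dataclass, field


# Hypothetical record layout; field names are illustrative, not FoCus's real schema.
@dataclass
class Dialogue:
    persona: list[str]    # sentences describing the user (experiences, preferences)
    knowledge: list[str]  # stand-in for passages retrieved from Wikipedia
    turns: list[str] = field(default_factory=list)  # utterances; the last one is the user's


def tokens(text: str) -> set[str]:
    """Lowercased set of alphanumeric words, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate sharing the most words with the query.

    max() keeps the first candidate among ties, so ordering acts as a tiebreaker.
    """
    q = tokens(query)
    return max(candidates, key=lambda c: len(q & tokens(c)))


def ground(dialogue: Dialogue) -> tuple[str, str]:
    """Pick one persona sentence and one knowledge passage for the latest user turn.

    A reply generator (not shown) would then condition on both selections.
    """
    last_user_turn = dialogue.turns[-1]
    return (best_match(last_user_turn, dialogue.persona),
            best_match(last_user_turn, dialogue.knowledge))


if __name__ == "__main__":
    d = Dialogue(
        persona=["I like to hike in the mountains.", "I have visited Busan twice."],
        knowledge=["Bukhansan is a mountain on the northern edge of Seoul.",
                   "Gyeongbokgung is a palace in Seoul."],
        turns=["Which mountain near Seoul should I hike?"],
    )
    print(ground(d))
```

Run on the sample dialogue, the sketch picks the hiking persona sentence and the Bukhansan passage, showing how one reply can be tied to what the user likes and to an external fact at the same time. A production system would replace the word-overlap heuristic with learned selection and live retrieval.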

Collecting and using language data to train AI falls under NLP, a field of AI that aims to enable computers to process and analyze large amounts of human language so that machines and people can communicate seamlessly.

In this context, a persona is a profile that represents a large group of individuals: it is easier to test a given strategy against one representative profile, or persona, than against thousands of individuals.

What sets the FoCus Dataset apart from other datasets is that it can enable sophisticated conversations without relying on hyperscale language models.

Typical large-scale language models take a long time to train, yet they still hit a bottleneck when it comes to handling real-time information and reflecting users’ personal experiences.

In late February, NCSOFT and Korea University jointly presented a paper on the dataset at the AAAI 2022 conference. Founded in 1979, the Association for the Advancement of Artificial Intelligence (AAAI) is one of the most highly regarded scientific societies in the AI community.

This October, the two will host the first workshop on customized chat technology at COLING 2022, an international conference on computational linguistics.

“Recently in the NLP academic circle, the need for alternative conversation technologies that will rival hyperscale language models has risen for financial and environmental reasons,” said Lee Yeon-soo, director of NCSOFT’s Language AI Lab.

Lee expressed hope that the dataset will spark lively discussion and technological development in the NLP field.

NCSOFT is best known for massively multiplayer online role-playing games (MMORPGs) such as Lineage and Guild Wars. In recent years, it has been expanding into other technology sectors.

Write to Jee Abbey Lee at jal@hankyung.com
