Singaporean writers object to IMDA using works to train a large language model

Apr 09, 2024 05:58 pm

The writing community is objecting to the Infocomm Media Development Authority’s (IMDA) plans to build a South-east Asia-focused large language model (LLM).

The National Multimodal LLM Programme (NMLP), “a base model with regional context that can understand Singapore’s and the region’s unique linguistic characteristics and multilingual environment”, was announced in December 2023.

But Singapore writers whose works would have to be used to train the LLM recently voiced their displeasure about the project.

An LLM is an artificial intelligence (AI) that can understand and generate text responses after being trained on a data set of written materials. The issue of writers’ works being illegally copied for data used to train LLMs such as Llama from Meta and OpenAI’s ChatGPT has triggered lawsuits in the United States.

The IMDA sent out a survey on March 28 through Sing Lit Station (SLS) to gather writers’ responses on using their work to train the NMLP. The April 7 deadline was extended to April 15, and the form will remain online indefinitely “to gather a full range of views”, with subsequent responses to be shared with IMDA on a rolling basis.

But writers are frustrated at the short timeframe and lack of clarity about usage and payment terms.

Ethos publisher Ng Kah Gay told The Straits Times: “If implemented without due consideration and safeguards, AI software will start assimilating material that would otherwise be copyrighted, and adversely impact the livelihood of existing authors and publishers.”

New York City-based transnational literary organisation Singapore Unbound raised concerns in an Instagram post on April 9, saying: “The survey does not state anywhere that IMDA recognises that the writings it is seeking are the intellectual property of the authors, or of the publishers to which the authors have sold the rights to their work.”

Boston-based Singaporean author Ally Chua told ST that while she appreciates the preliminary survey to engage authors, “the gist of the survey was entirely centred on sharing work to train LLMs – as if it was a foregone conclusion that usage of such written material is a go-ahead, and the survey and further discussion are just a matter of negotiation”.

“Our written works are not just blank ‘data’ that one can feed into a computer. This topic (using copyrighted works to train AI) is a contentious one across the world, and more conversation about what this means for intellectual property, commodification of unique Asean works, et cetera, should be done before jumping straight to ‘what can convince you to share your written materials with us?’”

A spokesperson for SLS told ST that it was in the preliminary stages of discussion with IMDA. “Our discussion focused on exploring the merits of including Singapore literary works in regional language research, and considering ways for IMDA to work with interested partners while protecting their rights. For now, we are still collating responses from the community.”

Mr Ng said he has asked IMDA to consult different stakeholders such as writers, translators, editors and publishers. “Without such consultation, we will not be able to forecast the impact of such LLM training on the practice and income of these stakeholders.”
