The Prague Post - Inbred, gibberish or just MAD? Warnings rise about AI models

Inbred, gibberish or just MAD? Warnings rise about AI models / Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content but more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
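The idea of picking the "most likely sequence" can be illustrated with a deliberately tiny sketch. It is not how ChatGPT works internally: the whitespace tokenizer, the three-sentence corpus and the helper names `train_bigram` and `most_likely_continuation` are all assumptions made for illustration. The toy model only counts which token tends to follow which, then greedily chains the most frequent follower.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.lower().split()  # crude whitespace tokenizer (an assumption)
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def most_likely_continuation(follows, start, length=4):
    """Greedily append the most frequent follower of the last token."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # token never seen with a follower in training
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical miniature "training data".
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat on the rug ."
model = train_bigram(corpus)
print(most_likely_continuation(model, "cat"))  # prints "cat sat on the cat"
```

The output is fluent-looking but meaningless, a small-scale hint of why statistically likely text is not the same as true or sensible text.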

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
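The loss of rarer elements can be reproduced in a toy simulation. This is a minimal sketch, not the Nature paper's experiment: the "model" here is just the observed token frequencies, and the token names, dataset size and number of generations are assumptions chosen for illustration. Each generation is trained only on data sampled from the previous one, so a token that fails to appear once can never come back.

```python
import random
from collections import Counter

def next_generation(counts, sample_size, rng):
    """Fit a 'model' (the empirical frequencies) and sample a new dataset from it."""
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return Counter(rng.choices(tokens, weights=weights, k=sample_size))

def simulate(generations=30, sample_size=200, seed=0):
    rng = random.Random(seed)
    # Generation 0: hypothetical "human" data, two common tokens and 20 rare ones.
    data = Counter({"common_a": 80, "common_b": 60})
    data.update({f"rare_{i}": 3 for i in range(20)})
    history = [data]
    for _ in range(generations):
        # Each new dataset is drawn only from the previous generation's model.
        data = next_generation(data, sample_size, rng)
        history.append(data)
    return history

history = simulate()
for gen in (0, 1, 5, 10, 30):
    print(f"generation {gen:2d}: {len(history[gen])} distinct tokens survive")
```

The number of surviving tokens can only stay flat or fall from one generation to the next, and how fast it falls depends on the random seed and the sample size.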

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field who pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

W.Cejka--TPP