The Prague Post - Inbred, gibberish or just MAD? Warnings rise about AI models


Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
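The idea of picking the "most likely sequence" can be illustrated with a toy example. The sketch below is not how ChatGPT works internally: it assumes whole words as tokens, a tiny made-up corpus, and a simple count of which word follows which. The function names `train_bigram_counts` and `most_likely_continuation` are hypothetical, chosen only for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_continuation(follows, start, length=4):
    """Greedily append the most frequent next token, one step at a time."""
    sequence = [start]
    for _ in range(length):
        candidates = follows.get(sequence[-1])
        if not candidates:
            break  # the model has never seen anything follow this token
        # Highest count wins; ties are broken alphabetically for determinism.
        best = min(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        sequence.append(best)
    return sequence


# Made-up training text, standing in for text scraped from the web.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram_counts(corpus)
print(" ".join(most_likely_continuation(model, "the")))
```

Real systems use learned neural networks over sub-word tokens rather than raw counts, but the principle that the training data determines which continuation is judged most likely is the same.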

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
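The loss of rarer elements can be reproduced in a very simplified setting. The sketch below is not the method used in the Nature paper: it assumes a "model" that is nothing more than a table of token frequencies, and every number in it (50 tokens, 200 samples per round, 30 rounds) is an arbitrary choice for illustration. Each generation is fitted only to samples drawn from the previous one.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """Sample from the current model, then refit on those samples alone."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    samples = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(samples)
    # Tokens that were never sampled get no probability and cannot come back.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(probs, generations=30, sample_size=200, seed=0):
    """Return the number of distinct tokens the model knows at each generation."""
    rng = random.Random(seed)
    support_sizes = [len(probs)]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        support_sizes.append(len(probs))
    return support_sizes


# Hypothetical "human" data: 5 common tokens and 45 rare ones.
original = {f"common_{i}": 0.18 for i in range(5)}
original.update({f"rare_{i}": 0.10 / 45 for i in range(45)})

print(simulate(original))
```

In this toy setup the count of distinct tokens can only stay level or fall from one generation to the next, because a token that is missed once is gone for good. That mirrors, in miniature, the loss of rare elements the authors describe.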

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

W.Cejka--TPP