The Prague Post - Inbred, gibberish or just MAD? Warnings rise about AI models

Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
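As a rough illustration of "selecting the most likely sequence", the sketch below counts which word follows which in a tiny sample text and then picks the most frequent follower. It is a toy under loud assumptions: tokens here are whitespace-separated words, and the function names `train_bigram` and `most_likely_next` are invented for this example. Production systems such as ChatGPT use sub-word tokens and neural networks rather than simple counts.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.split()  # assumption: one token per whitespace-separated word
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows, token):
    """Return the token seen most often after `token`, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


# Hypothetical training text, chosen only for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(most_likely_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

The model can only ever return what its training text contained, which is why the quality of that text matters so much.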

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
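The loss of rare elements can be reproduced in a toy setting. The sketch below is not the method used in the Nature paper. It is a simplified simulation in which each "generation" draws a finite sample from the previous generation's token frequencies and then re-estimates those frequencies from the sample alone. The starting distribution, the sample size and the names `next_generation` and `simulate` are all assumptions made for illustration.

```python
import random
from collections import Counter


def next_generation(dist, sample_size, rng):
    """Sample from `dist`, then re-estimate the distribution from that sample."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    sample = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(sample)
    # Tokens that were never sampled get no entry, so they are gone for good.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(dist, generations, sample_size, seed=0):
    """Train each generation only on the previous generation's output."""
    rng = random.Random(seed)
    support_sizes = [len(dist)]  # how many distinct tokens survive
    for _ in range(generations):
        dist = next_generation(dist, sample_size, rng)
        support_sizes.append(len(dist))
    return dist, support_sizes


# Hypothetical "human" data: two common tokens and eight rare ones.
human = {"common_a": 0.45, "common_b": 0.45}
human.update({f"rare_{i}": 0.0125 for i in range(8)})

final, sizes = simulate(human, generations=30, sample_size=50)
print(sizes)  # distinct tokens remaining after each generation
```

Because a token that is missed in one generation has zero probability in every later one, the number of distinct tokens can only stay flat or shrink. With small samples the rare tokens tend to disappear first, which mirrors the behaviour the authors describe.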

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as they added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

W.Cejka--TPP