The Prague Post - Inbred, gibberish or just MAD? Warnings rise about AI models

Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly trained on their own output.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs such as the large language models (LLMs) behind chatbots involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
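As a rough illustration of "the most likely sequence", the sketch below counts which word follows which in a tiny sample text and then picks the most frequent follower. It is a toy under stated assumptions: whole words stand in for tokens, and the names `train_bigram`, `most_likely_next` and the sample sentence are made up for this example. Real LLMs use neural networks and far larger contexts, not a lookup table.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word in the text, which words follow it."""
    tokens = text.split()  # assumption: one word = one token
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows, token):
    """Return the follower seen most often in training, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


# Hypothetical training text for illustration only.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

# "cat" follows "the" twice and "mat" follows it once.
print(most_likely_next(model, "the"))
```

A word the toy model never saw, such as "zebra", returns `None`, which hints at why such systems depend so heavily on the breadth of their training data.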

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
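The loss of rarer elements can be sketched with a toy simulation. This is not the method used in the Nature paper, only a simplified stand-in: each "model" is just a table of word counts, and each new generation is fitted to a finite sample drawn from the previous one. The function names, word labels and numbers are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def next_generation(counts, sample_size, rng):
    """Fit a new 'model' to samples drawn from the previous model."""
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    samples = rng.choices(tokens, weights=weights, k=sample_size)
    # Words that were never sampled get no entry, so they are lost for good.
    return Counter(samples)


def simulate(initial_counts, generations, sample_size, seed=0):
    """Return the list of models, from the original to the last generation."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    history = [Counter(initial_counts)]
    for _ in range(generations):
        history.append(next_generation(history[-1], sample_size, rng))
    return history


# Hypothetical starting data: one common word and two rarer ones.
initial = {"common": 90, "uncommon": 8, "rare": 2}
history = simulate(initial, generations=50, sample_size=100)

for gen in (0, 10, 50):
    print(gen, sorted(history[gen]))
```

Because a word that drops out of one generation can never be sampled again, the vocabulary can only shrink or stay the same over time. How quickly the rare words vanish depends on the sample size and the random seed.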

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and were increasingly marred by undesirable elements as the researchers added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

W.Cejka--TPP