The Prague Post - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argued in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks that AI could commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

H.Dolezal--TPP