The Prague Post - AI systems are already deceiving us -- and that's a problem, experts warn

Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argued in a paper published Friday in the journal Patterns.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A broad review by Park and colleagues found this was just one of many cases in which AI systems used deception to achieve their goals without being explicitly instructed to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into solving an "I'm not a robot" CAPTCHA for it.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk of AI being used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing systems' internal "thought processes" with their external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

H.Dolezal--TPP