The Prague Post - AI systems are already deceiving us -- and that's a problem, experts warn

Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether a user is interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" with its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

H.Dolezal--TPP