The Internal State of an LLM Knows When It's Lying

Amos Azaria, Tom M. Mitchell. The Internal State of an LLM Knows When It's Lying. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 967-976. Association for Computational Linguistics, 2023.

@inproceedings{AzariaM23,
  title = {The Internal State of an {LLM} Knows When It's Lying},
  author = {Amos Azaria and Tom M. Mitchell},
  year = {2023},
  url = {https://aclanthology.org/2023.findings-emnlp.68},
  researchr = {https://researchr.org/publication/AzariaM23},
  pages = {967-976},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023},
  editor = {Houda Bouamor and Juan Pino and Kalika Bali},
  publisher = {Association for Computational Linguistics},
  isbn = {979-8-89176-061-5},
}