Table 1 shows the results of causal mediation analysis for 11 different LMs. In these experiments, we examined causal behaviour using reasoning chains generated by GPT-4 (controlled setting). We find that in-context learning and instruction-tuning yield a higher indirect effect than training with only a language modelling objective (e.g., LLaMA and Mistral), indicating that these methods help the model align better with the reasoning chains. We observe that models trained with an RLHF objective (ChatGPT, Llama-2-7B-Chat) exhibit a larger direct effect than indirect effect, suggesting that training on human feedback may have disincentivized faithful reasoning. Interestingly, none of the models has a high indirect or direct effect on the causal understanding task. One intuitive reason is that the causal understanding task is challenging, and the performance of models under 10B parameters is nearly random; hence, the effects are weak. Overall, we observe that LLMs are inconsistent in faithfully reasoning over the CoT.
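For reference, the two effects compared above can be sketched in standard causal-mediation notation (textbook definitions; our intervention procedure instantiates them over questions and reasoning chains). Let X be the input, Z the generated reasoning chain (the mediator), Y the final answer, and x -> x' the intervention:

\[
\mathrm{NIE} = \mathbb{E}\left[Y_{x}\left(Z_{x'}\right)\right] - \mathbb{E}\left[Y_{x}\left(Z_{x}\right)\right],
\qquad
\mathrm{NDE} = \mathbb{E}\left[Y_{x'}\left(Z_{x}\right)\right] - \mathbb{E}\left[Y_{x}\left(Z_{x}\right)\right].
\]

Intuitively, the indirect effect captures how much the answer changes when only the reasoning chain is intervened on, while the direct effect captures how much it changes when the input is intervened on but the chain is held fixed.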
Table 2 shows the zero-shot performance of the ChatGPT and GPT-4 models. We observe that for the StrategyQA and Causal Understanding tasks, GPT-4 has a higher natural indirect effect than ChatGPT, suggesting that it reasons better over the intermediate reasoning steps for these tasks. However, for mathematical reasoning (GSM8K), ChatGPT has a higher indirect effect. Qualitatively, we find that for mathematical reasoning, when we provide intervened reasoning steps, GPT-4 treats them as incorrect and continues to generate correct reasoning steps, which lowers its indirect effect score. Moreover, GPT-4 exhibits a more pronounced direct effect than ChatGPT, suggesting that its outputs are more causally sensitive to the reasoning problem itself.
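To make the zero-shot measurement concrete, the following is a minimal sketch of estimating the indirect effect as the fraction of examples whose answer flips under an intervened reasoning chain. The generate callable, the prompt template, and the example fields are hypothetical placeholders, not the exact interface used in our experiments.

# Minimal sketch of estimating the indirect effect of a CoT on the final
# answer. `generate(prompt)` is a hypothetical helper that queries an LM
# and returns its answer string; the prompt format is illustrative only.

def answer_with_cot(generate, question, cot):
    # Ask the model for a final answer conditioned on a given reasoning chain.
    prompt = f"Question: {question}\nReasoning: {cot}\nAnswer:"
    return generate(prompt)

def estimate_indirect_effect(generate, examples):
    # Fraction of examples whose answer changes when only the reasoning
    # chain is intervened on (e.g., one intermediate step perturbed).
    # Each example supplies "question", "cot", and "intervened_cot".
    flips = 0
    for ex in examples:
        y_original = answer_with_cot(generate, ex["question"], ex["cot"])
        y_intervened = answer_with_cot(generate, ex["question"], ex["intervened_cot"])
        flips += int(y_original != y_intervened)
    return flips / len(examples)

A model that faithfully conditions on its reasoning chain should show a high flip rate here, whereas a model that answers from the question alone (a large direct effect) will be largely insensitive to the intervention.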
@misc{debjit2024frodo,
  title={Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning},
  author={Debjit Paul and Robert West and Antoine Bosselut and Boi Faltings},
  year={2024},
  eprint={2402.13950},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}