FIPO: Fallacy-Informed Preference Optimization - Steering LLMs Toward Logically Sound Arguments

🏆 Outstanding Paper Award Winner at NAACL 2025!

I’m thrilled to share that our paper “A Logical Fallacy-Informed Framework for Argument Generation” has received the Outstanding Paper Award at the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025)!

🚨 The Problem: LLMs Generate Fallacious Arguments

Despite their remarkable capabilities, Large Language Models (LLMs) still struggle to generate logically sound arguments, often producing content that contains logical fallacies, i.e., errors in reasoning that undermine an argument's validity. Our preliminary study revealed a startling finding: 21% of the arguments generated by ChatGPT contain logical fallacies.

Consider these examples of fallacious vs. logically sound arguments:

Topic: AI is bad for the job market (Support)
Ad Hominem: “AI proponents are tech enthusiasts disconnected from real-world job market issues.”
Circular Reasoning: “AI is harmful to the employment sector because AI is bad for the job market.”
Appeal to Emotion: “AI will leave millions of families struggling and unemployed.”
Logically Sound: “AI automates tasks, reducing employment opportunities by replacing humans in manufacturing and administrative jobs.”

💡 Our Solution: Fallacy-Informed Preference Optimization (FIPO)

We introduce FIPO, a novel framework that helps steer LLMs toward generating logically sound arguments by making them explicitly aware of logical fallacies during training.

Key Innovation: Weighted Classification Loss

FIPO combines traditional preference optimization with a weighted cross-entropy classification loss that penalizes the model based on fallacy frequency, applying stronger penalties for misclassifying more common fallacies.

The FIPO loss function is:

$$\mathcal{L}_{\text{FIPO}}(\theta) = \mathcal{L}_{\text{CPO}}(\theta) + \lambda\,\mathcal{L}_{\text{CLF}}(\theta)$$

Where the classification loss $\mathcal{L}_{\text{CLF}}$ is defined as:

$$\mathcal{L}_{\text{CLF}}(\theta) = -\,\mathbb{E}_{(t,s,y_w,y_l,k)\sim D'}\left[\, w_0 \log P_0^{h_\theta}(y_w \mid t,s) + w_k \log P_k^{h_\theta}(y_l \mid t,s) \,\right]$$

Here:

  • $t$, $s$: the topic and stance conditioning the argument
  • $y_w$, $y_l$: the preferred (logically sound) and dispreferred (fallacious) arguments
  • $w_k$: weight of fallacy type $k$, based on its frequency in the training data (class $0$ means "no fallacy")
  • $P_k^{h_\theta}$: probability the classification head $h_\theta$ assigns to fallacy type $k$
  • $\lambda = 0.3$: balancing parameter (optimized through experiments)
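
To make the objective concrete, here is a minimal PyTorch-style sketch of how it could be assembled. The function and variable names (`fipo_loss`, `logits_w`, `logits_l`) and the frequency-to-weight mapping are illustrative assumptions, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def fallacy_class_weights(fallacy_counts: torch.Tensor) -> torch.Tensor:
    """Per-class weights w_k from fallacy frequencies in the training data.
    Index 0 is 'no fallacy'; indices 1..13 are the fallacy types."""
    return fallacy_counts / fallacy_counts.sum()

def fipo_loss(cpo_loss, logits_w, logits_l, fallacy_labels, weights, lam=0.3):
    """L_FIPO = L_CPO + lambda * L_CLF.

    logits_w / logits_l: classification-head logits for the preferred (y_w)
    and dispreferred (y_l) arguments, shape (batch, num_classes).
    fallacy_labels: fallacy type k of each dispreferred argument.
    """
    batch = logits_w.size(0)
    no_fallacy = torch.zeros(batch, dtype=torch.long, device=logits_w.device)
    # -w_0 * log P_0(y_w | t, s): preferred arguments should look fallacy-free
    loss_w = F.cross_entropy(logits_w, no_fallacy, weight=weights)
    # -w_k * log P_k(y_l | t, s): dispreferred arguments should be flagged as fallacy k
    loss_l = F.cross_entropy(logits_l, fallacy_labels, weight=weights)
    return cpo_loss + lam * (loss_w + loss_l)
```

Because the `weight` argument of `F.cross_entropy` scales each class's contribution, more frequent fallacies dominate the classification loss, matching the intuition that common fallacies deserve the strongest penalties.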

🧠 The 13 Types of Logical Fallacies

We define 13 categories of logical fallacies based on centuries of logical reasoning research dating back to Aristotle:

Most Common Fallacies (in our dataset):

  1. Faulty Generalization (18.0%) - Drawing conclusions about all instances from limited examples

    • Example: “I know someone who smoked cannabis and became successful. Therefore, everyone who smokes cannabis will be successful.”
  2. Ad Hominem (12.3%) - Attacking the person making the argument rather than the argument itself

    • Example: “Those climate scientists are just trying to get more funding for their research.”
  3. Ad Populum (9.5%) - Argument based on what the majority believes

    • Example: “Most people think this policy is good, so it must be correct.”
  4. False Causality (8.8%) - Implying causal relationships without supporting evidence

    • Example: “Ever since we installed those wind turbines, there have been more bird deaths in the area.”
  5. Circular Reasoning (7.0%) - The conclusion comes back to the premise without proving itself

    • Example: “We should trust the news because the news says we should trust it.”

Other Fallacy Types:

  • Appeal to Emotion (6.8%) - Manipulating emotions rather than using logic
  • Fallacy of Relevance (6.6%) - Introducing irrelevant information
  • Fallacy of Logic (6.2%) - Errors in logical structure
  • Intentional (5.8%) - Deliberately wrong arguments
  • False Dilemma (5.8%) - Presenting only two options when many exist
  • Fallacy of Extension (5.8%) - Attacking exaggerated versions of arguments
  • Fallacy of Credibility (5.4%) - Attacking speaker’s character
  • Equivocation (2.0%) - Using ambiguous language

*Figure: distribution of the 13 fallacy types in our dataset.*

🔬 Methodology: 4-Step Framework

Our approach involves four key steps:

Step 1: Supervised Fine-Tuning (SFT)

Train the base model on the ExplaGraphs dataset containing topics, stances, and arguments.
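
As a rough illustration, the SFT step could be run with Hugging Face's `trl` library as sketched below. The dataset field names and the serialization format are assumptions, and `SFTTrainer`'s argument names vary somewhat across `trl` versions:

```python
from trl import SFTConfig, SFTTrainer

def format_example(ex):
    # Assumed serialization of an ExplaGraphs-style (topic, stance, argument) triple
    return f"Topic: {ex['topic']}\nStance: {ex['stance']}\nArgument: {ex['argument']}"

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    args=SFTConfig(output_dir="sft-argument-model", num_train_epochs=3),
    train_dataset=explagraphs_dataset,  # hypothetical preprocessed dataset object
    formatting_func=format_example,
)
trainer.train()
```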

Step 2: Preference Data Collection

Generate fallacious arguments using ChatGPT for each original argument, creating preference pairs where:

  • Preferred (yw): Original logically sound argument
  • Dispreferred (yl): Generated fallacious argument with label k

We generated 7,872 fallacious arguments spanning all 13 fallacy types following the natural distribution found in real-world discourse.
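
Each resulting preference example can be pictured as a simple record; the field names below are illustrative, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    topic: str          # t: debate topic
    stance: str         # s: "support" or "counter"
    chosen: str         # y_w: original, logically sound argument
    rejected: str       # y_l: ChatGPT-generated fallacious argument
    fallacy_type: int   # k: index of the fallacy exhibited by y_l (1..13)

example = PreferencePair(
    topic="AI is bad for the job market",
    stance="support",
    chosen="AI automates tasks, reducing employment opportunities by "
           "replacing humans in manufacturing and administrative jobs.",
    rejected="AI is harmful to the employment sector because AI is bad "
             "for the job market.",
    fallacy_type=5,  # hypothetical index for Circular Reasoning
)
```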

Step 3: Preference Optimization

Apply standard preference optimization methods (DPO, PPO, CPO, KTO) using the preference dataset.
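
For the baseline preference-optimization runs, a library like `trl` provides ready-made trainers. The sketch below uses `CPOTrainer`; exact argument names differ across `trl` versions, so treat it as illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

trainer = CPOTrainer(
    model=model,
    args=CPOConfig(beta=0.1, output_dir="cpo-argument-model"),
    train_dataset=preference_dataset,  # hypothetical dataset with prompt/chosen/rejected columns
    tokenizer=tokenizer,
)
trainer.train()
```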

Step 4: FIPO Enhancement

Add our weighted classification loss to the best-performing preference method (CPO) to create FIPO.
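
Conceptually, the classification head $h_\theta$ can be a small linear layer on top of the policy model's final hidden states. The pooling strategy below (last non-padding token) is an assumption about one reasonable design, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class FallacyHead(nn.Module):
    """Linear fallacy classifier h_theta over the LM's final hidden states."""

    def __init__(self, hidden_size: int, num_classes: int = 14):
        super().__init__()
        # 13 fallacy types plus class 0 for "no fallacy"
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor):
        # Pool the hidden state of each sequence's last non-padding token
        last_idx = attention_mask.sum(dim=1) - 1
        batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
        pooled = hidden_states[batch_idx, last_idx]
        return self.classifier(pooled)  # (batch, num_classes) logits
```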

📊 Outstanding Results

Dramatic Fallacy Reduction

| Model | Baseline (SFT) | Best Previous Method | FIPO | Improvement |
| --- | --- | --- | --- | --- |
| Llama-2 (7B) | 34.5% | 26.0% (PPO) | 17.0% | 17.5 pp reduction |
| Mistral (7B) | 32.5% | 27.75% (KTO) | 19.5% | 13.0 pp reduction |

Quality Improvements (Win-Rate)

FIPO not only reduces fallacies but also maintains high argument quality:

  • Human Evaluation Win-Rate: 46%
  • GPT-4 Evaluation Win-Rate: 63.5% (highest among all methods)
  • Loss rate in human evaluation drops from 40.3% (CPO) to 23% (FIPO)

Fallacy-Specific Performance

FIPO excels particularly at reducing the most common fallacy types:

| Fallacy Type | Llama-2 SFT | Llama-2 FIPO | Change |
| --- | --- | --- | --- |
| Faulty Generalization | 27.5% | 7.0% | −20.5 pp |
| False Causality | 2.5% | 3.5% | +1.0 pp (kept low) |
| Appeal to Emotion | 1.0% | 2.5% | +1.5 pp (kept low) |

The weighted classification loss ensures the model focuses on the most problematic and frequent fallacies.

🔍 Human Evaluation Validation

GPT-4 Reliability Verification

We validated GPT-4’s ability to identify fallacies through human annotation:

  • Randolph’s κ agreement: 0.640 (substantial agreement)
  • Majority agreement ratio: 95.5% between annotators and GPT-4
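
For reference, Randolph's free-marginal κ compares the observed per-item agreement against the chance level $1/q$ for $q$ categories. A small sketch (the input format is illustrative):

```python
import numpy as np

def randolph_kappa(ratings: np.ndarray, num_categories: int) -> float:
    """ratings: (items, raters) array of integer category labels."""
    n_items, n_raters = ratings.shape
    per_item = []
    for row in ratings:
        counts = np.bincount(row, minlength=num_categories)
        # Fraction of agreeing rater pairs for this item
        per_item.append((counts * (counts - 1)).sum() / (n_raters * (n_raters - 1)))
    p_o = float(np.mean(per_item))
    p_e = 1.0 / num_categories  # free-marginal chance agreement
    return (p_o - p_e) / (1.0 - p_e)
```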

Comparative Analysis

We independently classified 200 arguments and compared our labels with GPT-4's:

  • Strong alignment on most fallacy types
  • Main discrepancy: Fallacy of Relevance detection
  • Some confusion between Faulty Generalization and False Causality (expected due to subtle differences)

🧪 Ablation Studies Confirm Design Choices

We validated our design through rigorous ablation studies:

| Approach | Fallacy Rate | Analysis |
| --- | --- | --- |
| Uniform dataset distribution | 37.5% | Natural distribution is crucial |
| Unweighted cross-entropy | 29.0% | Weighting by frequency is essential |
| FIPO (full method) | 17.0% | Best performance |

These results confirm that both the natural fallacy distribution and weighted classification loss are essential components of FIPO.

🌍 Generalization: Out-of-Domain Performance

FIPO’s effectiveness extends beyond training data:

Debatepedia Dataset Results:

  • Fallacy Rate: 45% (second-best, close to KTO’s 44%)
  • Win Rate: 62% (highest among all methods)
  • Excellent at reducing: False Causality and Fallacy of Relevance

This demonstrates FIPO’s ability to generalize to new domains and topics.

💻 Implementation Details

Base Models

  • Llama-2 (7B) and Mistral (7B)
  • LoRA fine-tuning: reduces trainable parameters from 7B to 8.3M (0.12%)

Training Configuration

  • Classification loss weight (λ): 0.3 (optimal balance)
  • Fallacy weights (wk): Based on natural frequency distribution
  • Base method: CPO (best trade-off between win-rate and fallacy-rate)

Key Hyperparameters

  • Learning rate: 2×10⁻⁴
  • LoRA rank: 16, α = 32
  • Training epochs: 3
  • β (CPO): 0.1
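
A hedged sketch of this setup with the `peft` library; the target modules are an assumption, since the post does not pin them down:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                  # LoRA rank
    lora_alpha=32,                         # scaling factor alpha
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)  # `model` from the SFT step
peft_model.print_trainable_parameters()  # ~8.3M of 7B parameters trainable
```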

🔮 Impact and Future Directions

Immediate Impact

  • First work to systematically address logical fallacies in argument generation
  • Novel framework combining preference optimization with classification objectives
  • Significant performance gains across multiple models and evaluation metrics

Broader Implications

  • Trust and Safety: Reduces spread of logically flawed arguments
  • Educational Applications: Could help teach logical reasoning
  • Democratic Discourse: Promotes more sound public debate

Future Research Directions

  1. Scaling to larger models: Apply FIPO to models >10B parameters
  2. Multi-domain extension: Expand to scientific, legal, and technical arguments
  3. Real-time fallacy detection: Interactive systems for argument evaluation
  4. Cross-lingual fallacy analysis: Extend to non-English languages
  5. Integration with fact-checking: Combine logical and factual verification

🎯 Key Takeaways

  1. LLMs have a significant logical fallacy problem: Even advanced models like ChatGPT generate fallacious arguments 21% of the time
  2. Explicit fallacy awareness helps: Making models aware of specific fallacy types significantly improves argument quality
  3. Weighted classification loss is crucial: Focusing on frequent fallacies yields better overall performance
  4. Preference optimization isn’t enough alone: Standard methods like DPO and PPO help but miss fine-grained fallacy distinctions
  5. FIPO provides substantial improvements: Up to a 17.5-percentage-point reduction in fallacy rates while maintaining argument quality

👥 Acknowledgments

This work was a collaborative effort with amazing researchers:

  • Luca Mouchel (Lead Author, Master's internship)
  • Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings
  • EPFL, Switzerland

Special thanks to the ICT-48 Network of AI Research Excellence Center “TAILOR”, the Swiss National Science Foundation, and our other funding partners.


📖 Citation

@inproceedings{mouchel-etal-2025-logical,
    title = "A Logical Fallacy-Informed Framework for Argument Generation",
    author = "Mouchel, Luca and Paul, Debjit and Cui, Shaobo and West, Robert and Bosselut, Antoine and Faltings, Boi",
    booktitle = "Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    year = "2025",
    pages = "7296--7314",
    publisher = "Association for Computational Linguistics"
}

🤔 Discussion Questions

For Researchers:

  • How might FIPO be adapted for other types of reasoning tasks beyond argument generation?
  • What other fine-grained classification objectives could improve preference optimization?

For Practitioners:

  • How can we integrate fallacy detection into real-world applications like social media platforms or educational tools?
  • What are the computational trade-offs when adding classification heads to large language models?

For Society:

  • How do we balance improving logical reasoning with maintaining diverse perspectives in AI-generated content?
  • What role should AI play in moderating online discourse and identifying fallacious arguments?

This research represents a significant step toward more trustworthy and logically sound AI systems. As LLMs become increasingly prevalent in decision-making and public discourse, ensuring they generate well-reasoned arguments is not just a technical challenge—it’s a societal imperative.

🏆 Proud to have this work recognized with the Outstanding Paper Award at NAACL 2025!



