Case Study Preparation with GenAI
Picture this: It’s late at night, and a group of management students are frantically preparing for tomorrow’s strategy class. They’re tasked with analyzing a complex case study on corporate strategy, but the sheer volume of information is daunting. As they struggle to identify key points and formulate discussion questions, they wish for a tool that could streamline this process. Enter the world of Large Language Models (LLMs) and intelligent prompting – a game-changer in case study preparation.
In this blog post, we’ll dive deep into how AI can be leveraged to automatically generate insightful questions and answers from any document or case study. We’ll walk through a step-by-step process, explaining each component of a Python script that brings this concept to life. By the end, you’ll understand how to harness the power of AI to transform case study analysis and preparation.
Step 1: Setting Up the Environment
Before we dive into the code, let’s understand the tools we’ll be using. Our script relies on several key libraries:
import os
import json
import tempfile
from PyPDF2 import PdfReader
from io import BytesIO
import streamlit as st
import pandas as pd
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document as LangchainDocument
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
These libraries serve different purposes:
- os and json handle system operations and data formatting.
- PyPDF2 and BytesIO manage PDF file operations.
- streamlit creates an interactive web interface.
- pandas helps with data manipulation and analysis.
- The langchain libraries provide tools for working with LLMs and document processing.
Step 2: Crafting the Instruction Template
The instruction template is crucial – it’s essentially the “brain” of our Q&A generation system. Let’s break it down:
instruction_template = """You are a professor teaching a case study in a class of management program. This case is based on strategy management.
Your task is to generate a set of questions and it's answers from the context that can be discussed in the class. The questions should be based on the following topics:
- Factual: Facts or information that contains numbers, dates, events etc. that are mostly quantitative or qualitative data
- SWOT: Key Strength, weakness, opportunities or threats that are mentioned in the case study
- Decisions and Outcomes: Key decisions taken and it's successful or failed outcomes and reasons
- Concepts: Key management ideas that are proven or innovative that were applied
- Ethical and Governance: Key considerations from ethical and governance perspective
The question can be little elaborate (not more than 100 words) with some details of context why the question is being asked. The question MUST NOT mention something like "according to the passage" or "context".
The output should only an array json elements with the following fields and nothing else. Do not not generate more than 4 questions.
Question:
Question_Type:
Answer:
Tags: # a comma separated list of numbers or statistics or terminology mentioned in the answer. It should not be a list. And maximum of three items.
"""
# Define the context string template
# This template is used to wrap the content of each chunk when sending it to the AI model
context_str = """<context>
{context_content}
</context>"""
This template does several important things:
- It sets the context for the AI, positioning it as a professor teaching strategy management.
- It outlines specific types of questions to generate, ensuring comprehensive coverage of the case study.
- It provides guidelines for question format and length.
- It specifies the desired output structure, making it easy to process the AI’s responses.
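To make the expected structure concrete, here is what one element of the returned JSON array might look like. The case details below are invented purely to illustrate the shape of the output, not taken from any real case:

[
  {
    "Question": "Between 2015 and 2018 the company entered three new regional markets while its main rival consolidated in one. Given the margin pressure described in the case, was this expansion a sound strategic bet?",
    "Question_Type": "Decisions and Outcomes",
    "Answer": "The expansion raised fixed costs in the short term but secured first-mover advantages in two of the three markets, which supported the revenue recovery after 2018.",
    "Tags": "2015-2018, three markets, first-mover advantage"
  }
]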
Step 3: Initializing the AI Model
We use OpenAI’s GPT-4 model for our Q&A generation. Here’s how we set it up:
def getModel(key):
    os.environ['OPENAI_API_KEY'] = key
    return ChatOpenAI(temperature=0,
                      model="gpt-4-turbo",
                      max_tokens=1000)
This function does two key things:
- It securely sets the OpenAI API key as an environment variable.
- It initializes the ChatOpenAI model with specific parameters:
  - temperature=0 for more deterministic outputs
  - model="gpt-4-turbo" to use the GPT-4 Turbo model
  - max_tokens=1000 to limit the length of responses
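In practice, the API key would be collected from the user at runtime rather than hard-coded. A minimal sketch of how the function might be wired into the Streamlit app (the widget label is illustrative, not taken from the original script):

# Hypothetical wiring inside the Streamlit app
api_key = st.sidebar.text_input("OpenAI API key", type="password")
if api_key:
    model = getModel(api_key)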
Step 4: Processing the Document
Large documents need to be broken down into manageable chunks. We use the RecursiveCharacterTextSplitter for this:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
This splitter breaks the text into 2000-character chunks with a 200-character overlap between chunks. The overlap ensures that context is maintained across chunk boundaries.
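To see this behavior on a small scale, you can run the same splitter class on a short string; the tiny limits below are purely for illustration and are not the values used in the script:

# Illustrative only: small limits so the overlap is easy to see
demo_splitter = RecursiveCharacterTextSplitter(
    chunk_size=40,
    chunk_overlap=15,
    separators=["\n\n", "\n", ".", " ", ""],
)
sample = "Strategy cases reward careful reading of numbers dates and decisions across every section"
for chunk in demo_splitter.split_text(sample):
    print(repr(chunk))
# Consecutive chunks typically repeat the tail of the previous chunk,
# which is how context is preserved across chunk boundaries.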
Step 5: Generating Q&A
The heart of our system is the generate_qa function. Let's examine its key components:
def generate_qa(pages, instructions, model):
    # Initialize a text splitter to break down the document into manageable chunks
    # This helps in processing long documents by splitting them into smaller parts
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,        # Each chunk will have a maximum of 2000 characters
        chunk_overlap=200,      # Overlap between chunks to maintain context
        add_start_index=True,   # Add start index to each chunk for reference
        separators=["\n\n", "\n", ".", " ", ""],  # Hierarchy of separators to use for splitting
    )

    # Split the documents into chunks
    docs_processed = text_splitter.split_documents(pages)

    # Combine the instructions with the context wrapper and chain the prompt to the model
    prompt = ChatPromptTemplate.from_messages([("user", instructions + context_str)])
    chain = prompt | model

    output_df = pd.DataFrame(columns=['Chunk_ID', 'Question', 'Question_Type', 'Answer', 'Tags'])
    chunk_id = 0

    # Process each chunk to generate Q&A
    for context in docs_processed:
        try:
            # Generate Q&A for the current chunk by invoking the chain
            output_QA_couple = chain.invoke({"context_content": context.page_content})

            # Parse the JSON output from the model
            parsed_json = json.loads(output_QA_couple.content)

            # Add the generated Q&A to the DataFrame
            for item in parsed_json:
                # Add chunk ID and the original chunk text to each Q&A item
                item.update({"Chunk_ID": chunk_id, "Chunk_Text": context.page_content})
                new_row = pd.DataFrame([item])
                # Append the new row to the output DataFrame
                output_df = pd.concat([output_df, new_row], ignore_index=True)
            chunk_id += 1
        except Exception:
            # If there's an error processing a chunk, skip it and continue with the next one
            continue

    return output_df
This function:
- Splits the document into chunks.
- Creates a chat prompt template by combining the instructions and the context.
- Sets up a chain that links the prompt template to the AI model.
- Processes each chunk, generating Q&A pairs.
- Stores the results in a pandas DataFrame for easy manipulation and analysis.
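Putting the pieces together, a minimal driver for the Streamlit app might look like the sketch below. The widget labels, the temporary-file handling, and the download step are assumptions for illustration, not code copied from the original script:

# Hypothetical end-to-end flow in the Streamlit app
uploaded_file = st.file_uploader("Upload a case study (PDF)", type="pdf")
api_key = st.sidebar.text_input("OpenAI API key", type="password")

if uploaded_file and api_key:
    model = getModel(api_key)
    # PyPDFLoader needs a file path, so persist the upload to a temporary file first
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded_file.getvalue())
        tmp_path = tmp.name
    pages = PyPDFLoader(tmp_path).load()   # one Document per PDF page
    qa_df = generate_qa(pages, instruction_template, model)
    st.dataframe(qa_df)
    st.download_button("Download Q&A as CSV", qa_df.to_csv(index=False), "case_qa.csv")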
Step 6: Handling PDF Input
To work with PDF files, we use the PyPDF2 library:
def read_pdf(file):
    pdf_reader = PdfReader(file)
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text() + "\n"
    return text
This function extracts text from each page of the PDF, concatenating it into a single string.
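Note that generate_qa calls split_documents, which expects LangChain Document objects rather than a plain string. If you take the PyPDF2 route, the extracted text has to be wrapped first; a minimal sketch, assuming uploaded_file comes from a Streamlit file uploader and model from getModel:

# Wrap the raw text so generate_qa can split it with split_documents
raw_text = read_pdf(uploaded_file)
pages = [LangchainDocument(page_content=raw_text)]
qa_df = generate_qa(pages, instruction_template, model)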
The Power of AI-Assisted Case Study Preparation
By leveraging this AI-powered tool, students can transform their approach to case study analysis:
- Comprehensive Coverage: The AI generates questions across various aspects of the case, ensuring no crucial points are missed.
- Time Efficiency: Students can quickly obtain a set of relevant questions and answers, saving hours of initial analysis.
- Diverse Perspectives: The AI’s ability to consider multiple angles (factual, SWOT, decisions, concepts, ethics) encourages a more holistic understanding of the case.
- Structured Learning: The generated Q&A provides a framework for organizing thoughts and preparing for class discussions.
- Enhanced Critical Thinking: With basic questions answered, students can focus on deeper analysis and developing their own insights.
Looking Ahead: The Future of AI in Education
This Q&A generation tool is just the beginning. As AI continues to evolve, we can anticipate even more sophisticated applications in education:
- Personalized learning paths based on individual student responses to AI-generated questions
- Real-time case study generation that incorporates current events
- Interactive AI tutors that can engage in Socratic dialogues about case studies
The integration of AI in case study preparation doesn’t replace critical thinking – it enhances it. By automating the initial analysis, AI frees students to engage in higher-level discussions, challenge assumptions, and develop innovative solutions to complex business problems.
As we embrace these technological advancements, it’s crucial to remember that AI is a tool to augment human intelligence, not replace it. The real value lies in how students use these AI-generated insights to fuel their own creativity and problem-solving skills.
In conclusion, AI-powered Q&A generation is revolutionizing the way students approach case studies. By providing a solid foundation of questions and answers, it empowers students to dive deeper into analysis, fostering more engaging and productive classroom discussions. As we continue to explore the possibilities of AI in education, we’re not just preparing students for exams – we’re equipping them with the skills to tackle real-world business challenges in an increasingly complex global landscape.