Case Study Preparation with GenAI
Picture this: It’s late at night, and a group of management students are frantically preparing for tomorrow’s strategy class. They’re tasked with analyzing a complex case study on corporate strategy, but the sheer volume of information is daunting. As they struggle to identify key points and formulate discussion questions, they wish for a tool that could streamline this process. Enter the world of Large Language Models (LLMs) and intelligent prompting – a game-changer in case study preparation.
In this blog post, we’ll dive deep into how AI can be leveraged to automatically generate insightful questions and answers from any document or case study. We’ll walk through a step-by-step process, explaining each component of a Python script that brings this concept to life. By the end, you’ll understand how to harness the power of AI to transform case study analysis and preparation.
Step 1: Setting Up the Environment
Before we dive into the code, let’s understand the tools we’ll be using. Our script relies on several key libraries:
import os
import json
import tempfile
from PyPDF2 import PdfReader
from io import BytesIO
import streamlit as st
import pandas as pd
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document as LangchainDocument
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
These libraries serve different purposes:
- os and json handle system operations and data formatting.
- PyPDF2 and BytesIO manage PDF file operations.
- streamlit creates an interactive web interface.
- pandas helps with data manipulation and analysis.
- The langchain libraries provide tools for working with LLMs and document processing.
Step 2: Crafting the Instruction Template
The instruction template is crucial – it’s essentially the “brain” of our Q&A generation system. Let’s break it down:
instruction_template = """You are a professor teaching a case study in a class of management program. This case is based on strategy management.
Your task is to generate a set of questions and it's answers from the context that can be discussed in the class. The questions should be based on the following topics:
- Factual: Facts or information that contains numbers, dates, events etc. that are mostly quantitative or qualitative data
- SWOT: Key Strength, weakness, opportunities or threats that are mentioned in the case study
- Decisions and Outcomes: Key decisions taken and it's successful or failed outcomes and reasons
- Concepts: Key management ideas that are proven or innovative that were applied
- Ethical and Governance: Key considerations from ethical and governance perspective
The question can be little elaborate (not more than 100 words) with some details of context why the question is being asked. The question MUST NOT mention something like "according to the passage" or "context".
The output should only an array json elements with the following fields and nothing else. Do not not generate more than 4 questions.
Question:
Question_Type:
Answer:
Tags: # a comma separated list of numbers or statistics or terminology mentioned in the answer. It should not be a list. And maximum of three items.
"""
# Define the context string template
# This template is used to wrap the content of each chunk when sending it to the AI model
context_str = """<context>
{context_content}
</context>"""
This template does several important things:
- It sets the context for the AI, positioning it as a professor teaching strategy management.
- It outlines specific types of questions to generate, ensuring comprehensive coverage of the case study.
- It provides guidelines for question format and length.
- It specifies the desired output structure, making it easy to process the AI’s responses.
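To make the expected structure concrete, here is what one element of the returned JSON array might look like. The case details below are invented purely to illustrate the shape of the output, not taken from any real case:

[
  {
    "Question": "Between 2015 and 2018 the company entered three new regional markets while its main rival consolidated in one. Given the margin pressure described in the case, was this expansion a sound strategic bet?",
    "Question_Type": "Decisions and Outcomes",
    "Answer": "The expansion raised fixed costs in the short term but secured first-mover advantages in two of the three markets, which supported the revenue recovery after 2018.",
    "Tags": "2015-2018, three markets, first-mover advantage"
  }
]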
Step 3: Initializing the AI Model
We use OpenAI’s GPT-4 model for our Q&A generation. Here’s how we set it up:
def getModel(key):
    os.environ['OPENAI_API_KEY'] = key
    return ChatOpenAI(temperature=0,
                      model="gpt-4-turbo",
                      max_tokens=1000)
This function does two key things:
- It securely sets the OpenAI API key as an environment variable.
- It initializes the ChatOpenAI model with specific parameters:
  - temperature=0 for more deterministic outputs
  - model="gpt-4-turbo" to use the GPT-4 Turbo model
  - max_tokens=1000 to limit the length of responses
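In practice, the API key would be collected from the user at runtime rather than hard-coded. A minimal sketch of how the function might be wired into the Streamlit app (the widget label is illustrative, not taken from the original script):

# Hypothetical wiring inside the Streamlit app
api_key = st.sidebar.text_input("OpenAI API key", type="password")
if api_key:
    model = getModel(api_key)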
Step 4: Processing the Document
Large documents need to be broken down into manageable chunks. We use the RecursiveCharacterTextSplitter for this:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
This splitter breaks the text into 2000-character chunks with a 200-character overlap between chunks. The overlap ensures that context is maintained across chunk boundaries.
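To see this behavior on a small scale, you can run the same splitter class on a short string; the tiny limits below are purely for illustration and are not the values used in the script:

# Illustrative only: small limits so the overlap is easy to see
demo_splitter = RecursiveCharacterTextSplitter(
    chunk_size=40,
    chunk_overlap=15,
    separators=["\n\n", "\n", ".", " ", ""],
)
sample = "Strategy cases reward careful reading of numbers dates and decisions across every section"
for chunk in demo_splitter.split_text(sample):
    print(repr(chunk))
# Consecutive chunks typically repeat the tail of the previous chunk,
# which is how context is preserved across chunk boundaries.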
Step 5: Generating Q&A
The heart of our system is the generate_qa function. Let's examine its key components:
def generate_qa(pages, instructions, model):
    # Initialize a text splitter to break down the document into manageable chunks
    # This helps in processing long documents by splitting them into smaller parts
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,        # Each chunk will have a maximum of 2000 characters
        chunk_overlap=200,      # Overlap between chunks to maintain context
        add_start_index=True,   # Add start index to each chunk for reference
        separators=["\n\n", "\n", ".", " ", ""],  # Hierarchy of separators to use for splitting
    )

    # Split the documents into chunks
    docs_processed = text_splitter.split_documents(pages)

    # Combine the instructions with the context wrapper and chain the prompt to the model
    prompt = ChatPromptTemplate.from_messages([("user", instructions + context_str)])
    chain = prompt | model

    output_df = pd.DataFrame(columns=['Chunk_ID', 'Question', 'Question_Type', 'Answer', 'Tags'])
    chunk_id = 0

    # Process each chunk to generate Q&A
    for context in docs_processed:
        try:
            # Generate Q&A for the current chunk by invoking the chain
            output_QA_couple = chain.invoke({"context_content": context.page_content})

            # Parse the JSON output from the model
            parsed_json = json.loads(output_QA_couple.content)

            # Add the generated Q&A to the DataFrame
            for item in parsed_json:
                # Add chunk ID and the original chunk text to each Q&A item
                item.update({"Chunk_ID": chunk_id, "Chunk_Text": context.page_content})
                new_row = pd.DataFrame([item])
                # Append the new row to the output DataFrame
                output_df = pd.concat([output_df, new_row], ignore_index=True)
            chunk_id += 1
        except Exception:
            # If there's an error processing a chunk, skip it and continue with the next one
            continue

    return output_df
This function:
- Splits the document into chunks.
- Creates a chat prompt template by combining the instructions and the context.
- Sets up a chain that links the prompt template to the AI model.
- Processes each chunk, generating Q&A pairs.
- Stores the results in a pandas DataFrame for easy manipulation and analysis.
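Putting the pieces together, a minimal driver for the Streamlit app might look like the sketch below. The widget labels, the temporary-file handling, and the download step are assumptions for illustration, not code copied from the original script:

# Hypothetical end-to-end flow in the Streamlit app
uploaded_file = st.file_uploader("Upload a case study (PDF)", type="pdf")
api_key = st.sidebar.text_input("OpenAI API key", type="password")

if uploaded_file and api_key:
    model = getModel(api_key)
    # PyPDFLoader needs a file path, so persist the upload to a temporary file first
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded_file.getvalue())
        tmp_path = tmp.name
    pages = PyPDFLoader(tmp_path).load()   # one Document per PDF page
    qa_df = generate_qa(pages, instruction_template, model)
    st.dataframe(qa_df)
    st.download_button("Download Q&A as CSV", qa_df.to_csv(index=False), "case_qa.csv")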
Step 6: Handling PDF Input
To work with PDF files, we use the PyPDF2 library:
def read_pdf(file):
    pdf_reader = PdfReader(file)
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text() + "\n"
    return text
This function extracts text from each page of the PDF, concatenating it into a single string.
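Note that generate_qa calls split_documents, which expects LangChain Document objects rather than a plain string. If you take the PyPDF2 route, the extracted text has to be wrapped first; a minimal sketch, assuming uploaded_file comes from a Streamlit file uploader and model from getModel:

# Wrap the raw text so generate_qa can split it with split_documents
raw_text = read_pdf(uploaded_file)
pages = [LangchainDocument(page_content=raw_text)]
qa_df = generate_qa(pages, instruction_template, model)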
The Power of AI-Assisted Case Study Preparation
By leveraging this AI-powered tool, students can transform their approach to case study analysis:
- Comprehensive Coverage: The AI generates questions across various aspects of the case, ensuring no crucial points are missed.
- Time Efficiency: Students can quickly obtain a set of relevant questions and answers, saving hours of initial analysis.
- Diverse Perspectives: The AI’s ability to consider multiple angles (factual, SWOT, decisions, concepts, ethics) encourages a more holistic understanding of the case.
- Structured Learning: The generated Q&A provides a framework for organizing thoughts and preparing for class discussions.
- Enhanced Critical Thinking: With basic questions answered, students can focus on deeper analysis and developing their own insights.
Looking Ahead: The Future of AI in Education
This Q&A generation tool is just the beginning. As AI continues to evolve, we can anticipate even more sophisticated applications in education:
- Personalized learning paths based on individual student responses to AI-generated questions
- Real-time case study generation that incorporates current events
- Interactive AI tutors that can engage in Socratic dialogues about case studies
The integration of AI in case study preparation doesn’t replace critical thinking – it enhances it. By automating the initial analysis, AI frees students to engage in higher-level discussions, challenge assumptions, and develop innovative solutions to complex business problems.
As we embrace these technological advancements, it’s crucial to remember that AI is a tool to augment human intelligence, not replace it. The real value lies in how students use these AI-generated insights to fuel their own creativity and problem-solving skills.
In conclusion, AI-powered Q&A generation is revolutionizing the way students approach case studies. By providing a solid foundation of questions and answers, it empowers students to dive deeper into analysis, fostering more engaging and productive classroom discussions. As we continue to explore the possibilities of AI in education, we’re not just preparing students for exams – we’re equipping them with the skills to tackle real-world business challenges in an increasingly complex global landscape.