Introduction to Tool-integrated Reasoning (TIR)

I wanted to try solving math problems with an LLM, so I set out to use the Qwen2.5-Math-7B-Instruct model.

https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct

On the model page I came across the notice below, which got me interested in TIR, so I looked into it.

🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR.
We do not recommend using this series of models for other tasks.

Tool-integrated Reasoning is a technique introduced in the paper "ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving," published by Microsoft in 2023. This post summarizes part of that paper.

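There are two ways to solve math problems with an LLM: (a) solving the problem through natural-language reasoning, and (b) having the model generate a program that solves the problem. The two approaches complement each other.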

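(a) Step-by-step natural-language reasoning is well suited to semantic analysis, planning, and abstract reasoning. (b) Program-based reasoning delegates computation such as algebra and algorithmic processing to a tool, so the arithmetic is handled reliably.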

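ToRA lets open-source models such as LLaMA-2 use both approaches. Specifically, the authors design an interleaved reasoning format, collect interactive tool-use trajectories for math problems from datasets such as GSM8k and MATH, and then apply imitation learning on these high-quality annotations, achieving better performance than existing open-source models.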

Because the data does not cover every type of problem, relying on imitation learning alone can restrict the model's output space.

The authors therefore apply output space shaping: a separate, larger teacher model is used so that the model learns both from solutions it solved correctly and from corrected versions of its incorrect solutions.

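Output space shaping: the process of adjusting or restricting the space of outputs a model can generate, in generative models or optimization problems
1) Setting constraints: restrict the range of outputs to a specific constraint (e.g. "I'm developing XX in Python" → return only Python results)
2) Adjusting the output distribution: shift the distribution of results the model produces so that it generates certain patterns more often (e.g. a recommender system that recommends only the more relevant items)
3) Post-processing: take the model's output and re-adjust or filter it to produce output that meets the desired conditions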
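As a rough illustration of the filtering idea, the sketch below splits sampled solutions into those whose answer matches a reference (kept as they are) and those that do not (candidates for correction by a teacher model). The function name `split_by_correctness`, the dictionary layout, and the sample data are all hypothetical; this is not the ToRA training code.

```python
def split_by_correctness(candidates, reference_answer):
    """Split sampled solutions into correct ones and ones needing correction."""
    keep, needs_correction = [], []
    for candidate in candidates:
        if candidate["answer"] == reference_answer:
            keep.append(candidate)
        else:
            needs_correction.append(candidate)
    return keep, needs_correction


# Hypothetical sampled solutions for 4x + 5 = 6x + 7 (reference answer: -1).
sampled = [
    {"solution": "-2 = 2x, so x = -1", "answer": "-1"},
    {"solution": "2 = 2x, so x = 1", "answer": "1"},
    {"solution": "x = -2 / 2 = -1", "answer": "-1"},
]

keep, needs_correction = split_by_correctness(sampled, "-1")
print(len(keep), len(needs_correction))  # prints: 2 1
```

In a real pipeline the second list would be handed to the teacher model for correction rather than discarded.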

Fig (c): Given a math problem q, TORA starts reasoning in natural language and produces r_1.

Then, when it judges that program-based tool use is a better fit for a task such as solving an equation, TORA generates a program a_1 for tool use, following the natural-language guidance r_1.

The output o_1 of running this program is fed into the following steps, where it is used to adjust the tool use, solve sub-tasks, or complete the answer.

This process repeats until the model puts its final answer inside "\boxed{}".

The resulting trajectory is written τ = r_1 a_1 o_1 ... r_{n-1} a_{n-1} o_{n-1} r_n, where r_n contains the final answer.
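The loop described above can be sketched roughly as follows. `scripted_model` is a hypothetical stand-in for an LLM that returns a rationale and an optional program, and `run_program` is an assumed helper that executes Python with `exec`; neither comes from the ToRA paper or from any library, and a real system would run generated code in a sandbox.

```python
import contextlib
import io


def run_program(code):
    """Execute a Python snippet and return what it printed (the output o_i)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # only acceptable for trusted code; sandbox in practice
    return buffer.getvalue().strip()


def scripted_model(trajectory):
    """Hypothetical stand-in for an LLM: returns (rationale, program or None)."""
    if not trajectory:
        rationale = "Rearranging 4x + 5 = 6x + 7 gives x = (7 - 5) / (4 - 6)."
        return rationale, "print((7 - 5) / (4 - 6))"
    return "The program printed " + trajectory[-1] + ", so the answer is \\boxed{-1}.", None


def tool_integrated_reasoning(model, max_rounds=5):
    """Alternate rationale r_i, program a_i, output o_i until a boxed answer."""
    trajectory = []
    for _ in range(max_rounds):
        rationale, program = model(trajectory)
        trajectory.append(rationale)
        if "\\boxed{" in rationale or program is None:
            break
        trajectory.extend([program, run_program(program)])
    return trajectory


trajectory = tool_integrated_reasoning(scripted_model)
print(trajectory[-1])
```

Here the trajectory ends up as r_1, a_1, o_1, r_2, with the boxed answer in the last rationale.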

In short, TIR (Tool Integrated Reasoning) is a method that combines reasoning with tools and works through a problem step by step, so that complex problems can be solved more reliably.

Finally, let's run a simple comparison of CoT and TIR using the Qwen/Qwen2.5-Math-7B-Instruct model on Hugging Face.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-7B-Instruct"
device = "cuda" # the device to load the model onto

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the problem we want to solve
prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
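The CoT and TIR runs that follow differ only in the system prompt. One way to make that explicit is a small helper that builds the message list for either mode. `build_messages` and `SYSTEM_PROMPTS` are hypothetical names introduced here for illustration; the two prompt strings are the ones used in this post.

```python
SYSTEM_PROMPTS = {
    "cot": "Please reason step by step, and put your final answer within \\boxed{}.",
    "tir": (
        "Please integrate natural language reasoning with programs to solve "
        "the problem above, and put your final answer within \\boxed{}."
    ),
}


def build_messages(prompt, mode):
    """Return the chat messages for the given mode ("cot" or "tir")."""
    if mode not in SYSTEM_PROMPTS:
        raise ValueError("mode must be 'cot' or 'tir', got " + repr(mode))
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[mode]},
        {"role": "user", "content": prompt},
    ]


messages = build_messages("Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$.", "tir")
print(messages[0]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in the same way as the hand-written `messages` lists.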

Asking the model to solve the problem with CoT gives the following.

%%time

# CoT
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]


text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

To solve the equation \(4x + 5 = 6x + 7\), we need to isolate the variable \(x\). Here are the steps to do that:

1. Start with the original equation:
   \[
   4x + 5 = 6x + 7
   \]

2. Subtract \(4x\) from both sides of the equation to move all the \(x\)-terms to one side:
   \[
   4x + 5 - 4x = 6x + 7 - 4x
   \]
   Simplifying both sides, we get:
   \[
   5 = 2x + 7
   \]

3. Next, subtract 7 from both sides to move the constant term to the other side:
   \[
   5 - 7 = 2x + 7 - 7
   \]
   Simplifying both sides, we get:
   \[
   -2 = 2x
   \]

4. Finally, divide both sides by 2 to solve for \(x\):
   \[
   \frac{-2}{2} = \frac{2x}{2}
   \]
   Simplifying both sides, we get:
   \[
   -1 = x
   \]

So, the value of \(x\) that satisfies the equation is \(\boxed{-1}\).
CPU times: total: 1min 46s
Wall time: 6min 46s
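To compare answers across runs programmatically, the boxed answer can be pulled out of the response text. The sketch below is one way to do it; `extract_boxed` is a hypothetical helper, not part of transformers, and it tracks brace depth so that nested LaTeX such as `\frac{...}{...}` stays intact.

```python
def extract_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    content = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(content)
        content.append(ch)
    return None  # braces never closed, e.g. a truncated response


answer = extract_boxed("So, the value of \\(x\\) is \\(\\boxed{-1}\\).")
print(answer)  # prints: -1
```

A response that is cut off before the closing brace returns `None`, which makes truncated generations easy to spot.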

Asking the model to solve the problem with TIR gives the following.

%%time

# TIR
messages = [
    {"role": "system", "content": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

To solve the equation \(4x + 5 = 6x + 7\), we need to isolate \(x\). Let's follow these steps:

1. Subtract \(4x\) from both sides of the equation to get the \(x\) terms on one side:
   \[
   5 = 2x + 7
   \]

2. Subtract 7 from both sides to isolate the term with \(x\):
   \[
   5 - 7 = 2x
   \]
   \[
   -2 = 2x
   \]

3. Divide both sides by 2 to solve for \(x\):
   \[
   x = \frac{-2}{2}
   \]
   \[
   x = -1
   \]

Let's verify this solution by substituting \(x = -1\) back into the original equation to ensure it holds true. We'll use Python to confirm our result.
```python
# Define the value of x
x = -1

# Left side of the equation
left_side = 4 * x + 5

# Right side of the equation
right_side = 6 * x + 7

# Check if both sides are equal
print((left_side, right_side, left_side == right_side))
```
```output
(1, 1, True)
```
The left side of the equation evaluates to \(1\) and the right side also evaluates to \(1\), confirming that the solution \(x = -1\) satisfies the equation \(4x + 5 = 6x + 7\).

Thus, the value of \(x\) that satisfies the equation is \(\boxed{-1}\).
CPU times: total: 2min 13s
Wall time: 7min 59s

Unlike CoT, the TIR response uses a tool (a Python program) alongside the natural-language reasoning.
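Note that the notebook cell above only calls `model.generate`, so nothing in it executes the generated program; the `output` block is part of the text the model produced. To check that output, the program can be extracted from the response and run locally. The sketch below assumes the response marks code with a python fence, as in the output above; `extract_python_blocks` and `run_python` are hypothetical helpers, and `exec` should only be used on code you trust.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def extract_python_blocks(response):
    """Return the source of every python-fenced block in the response."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    return pattern.findall(response)


def run_python(code):
    """Run a snippet and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # trusted code only; use a sandbox for model output
    return buffer.getvalue().strip()


# Shortened copy of the TIR response shown above.
response = "\n".join([
    "We'll use Python to confirm our result.",
    FENCE + "python",
    "x = -1",
    "left_side = 4 * x + 5",
    "right_side = 6 * x + 7",
    "print((left_side, right_side, left_side == right_side))",
    FENCE,
])

blocks = extract_python_blocks(response)
result = run_python(blocks[0])
print(result)  # prints: (1, 1, True)
```

Here the locally computed result matches the `output` block in the model's response.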

For a more striking comparison, we can try a different problem.

prompt = """
"Find the derivative f'(x) of the function f(x) = x^3 * ln(x^2 + 1)."
"""

Raw LaTeX syntax is hard to read at a glance, so I used ChatGPT to tidy up the formatting of the output text.

With CoT:

To find the derivative \( f'(x) \) of the function \( f(x) = x^3 \ln(x^2 + 1) \), we will use the product rule. The product rule states that if \( f(x) = g(x)h(x) \), then \( f'(x) = g'(x)h(x) + g(x)h'(x) \).

Here, we can identify \( g(x) = x^3 \) and \( h(x) = \ln(x^2 + 1) \).

First, we find the derivative of \( g(x) \):
\[ g'(x) = \frac{d}{dx}(x^3) = 3x^2. \]

Next, we find the derivative of \( h(x) \). To do this, we use the chain rule. The chain rule states that if \( h(x) = \ln(u) \) and \( u = x^2 + 1 \), then \( h'(x) = \frac{1}{u} \cdot \frac{du}{dx} \). Here, \( u = x^2 + 1 \), so \( \frac{du}{dx} = 2x \). Therefore,
\[ h'(x) = \frac{1}{x^2 + 1} \cdot 2x = \frac{2x}{x^2 + 1}. \]

Now, we apply the product rule:
\[ f'(x) = g'(x)h(x) + g(x)h'(x) = 3x^2 \ln(x^2 + 1) + x^3 \cdot \frac{2x}{x^2 + 1}. \]

We can simplify the second term:
\[ x^3 \cdot \frac{2x}{x^2 + 1} = \frac{2x^4}{x^2 + 1}. \]

So, the derivative is:
\[ f'(x) = 3x^2 \ln(x^2 + 1) + \frac{2x^4}{x^2 + 1}. \]

Therefore, the final answer is:
\[ \boxed{3x^2 \ln(x^2 + 1) + \frac{2x^4}{x^2 + 1}}. \]
CPU times: total: 2min 54s
Wall time: 11min 15s

With TIR:

To find the derivative of the function \( f(x) = x^3 \ln(x^2 + 1) \), we can use the product rule. The product rule states that if \( f(x) = g(x)h(x) \), then \( f'(x) = g'(x)h(x) + g(x)h'(x) \).

In this case, let \( g(x) = x^3 \) and \( h(x) = \ln(x^2 + 1) \).

1. First, we find the derivative of \( g(x) = x^3 \):
\[ g'(x) = 3x^2. \]

2. Next, we find the derivative of \( h(x) = \ln(x^2 + 1) \). Using the chain rule, we get:
\[ h'(x) = \frac{1}{x^2 + 1} \cdot \frac{d}{dx}(x^2 + 1) = \frac{2x}{x^2 + 1}. \]

Now we can apply the product rule:
\[ f'(x) = g'(x)h(x) + g(x)h'(x) = 3x^2 \ln(x^2 + 1) + x^3 \cdot \frac{2x}{x^2 + 1}. \]

Simplifying the expression, we get:
\[ f'(x) = 3x^2 \ln(x^2 + 1) + \frac{2x^4}{x^2 + 1}. \]

Let's confirm this by using Python to compute the derivative.
```python
import sympy as sp

# Define the variable and function
x = sp.symbols('x')
f = x**3 * sp.ln(x**2 + 1)

# Compute the derivative
f_prime = sp.diff(f, x)
print(f_prime)
```
```output
2*x**4/(x**2 + 1) + 3*x**2*log(x**2 + 1)
```
The derivative of the function \( f(x) = x^3 \ln(x^2 + 1) \) is indeed \( f'(x) = 3x^2 \ln(x^2 + 1) + \frac{2x^4}{x^2 + 1} \).

So the final answer is:
\[ \boxed
CPU times: total: 2min 49s
Wall time: 11min 38s

I gave the model a problem that requires differentiation and confirmed that it actively uses a Python package (sympy) to solve it.
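The derivative reported in both responses can also be sanity-checked without sympy. The sketch below compares the closed form against a central-difference approximation at a few points, using only the standard library. The sample points, step size `h`, and tolerance are arbitrary choices made for this illustration.

```python
import math


def f(x):
    """f(x) = x^3 * ln(x^2 + 1)"""
    return x**3 * math.log(x**2 + 1)


def f_prime(x):
    """Closed form from the responses: 3x^2 ln(x^2 + 1) + 2x^4 / (x^2 + 1)"""
    return 3 * x**2 * math.log(x**2 + 1) + 2 * x**4 / (x**2 + 1)


def central_difference(func, x, h=1e-5):
    """Numerical derivative (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)


checks = [
    math.isclose(central_difference(f, x), f_prime(x), rel_tol=1e-6)
    for x in (0.5, 1.0, 2.0)
]
print(all(checks))
```

Agreement at a handful of points is not a proof, but it catches most algebra slips.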
