๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๊ต์œก/์ŠคํŒŒ๋ฅดํƒ€

[TIL] ๋‚ด์ผ๋ฐฐ์›€์บ ํ”„ AI 9๊ธฐ - 23ํšŒ

by gomdeng, Feb 6, 2025

๐Ÿถ ํ•™์Šต ๋ฒ”์œ„

  • ์Šคํƒ ๋‹ค๋“œ๋ฐ˜ 
  • ๊ฐœ์ธ ๊ณต๋ถ€

 

๐Ÿถ ํ•™์Šต ๋‚ด์šฉ

โœจ ๊ฐœ์ธ ๊ณต๋ถ€

โœ”๏ธ ์œ„์น˜ ์ธ์ฝ”๋”ฉ
  1. ์œ„์น˜ ์ธ์ฝ”๋”ฉ
   1) ๋‚ด์šฉ
     - ํŠธ๋žœ์Šคํฌ๋จธ์—์„œ๋Š” ๋ชจ๋“  ์ž…๋ ฅ์„ ๋™์‹œ์— ์ฒ˜๋ฆฌ
     - ๊ทธ ๊ณผ์ •์—์„œ ์ˆœ์„œ ์ •๋ณด๊ฐ€ ์‚ฌ๋ผ์ง
     - ์ด๋•Œ ์ˆœ์„œ๋ฅผ ์ถ”๊ฐ€ํ•ด์ฃผ๋Š” ์—ญํ• ์„ ๋‹ด๋‹น

  2. ์ ˆ๋Œ€์  ์œ„์น˜ ์ธ์ฝ”๋”ฉ(absolute position encoding)
   1) ๋‚ด์šฉ
     - ์ž…๋ ฅ ํ† ํฐ์˜ ์œ„์น˜์— ๋”ฐ๋ผ ๊ณ ์ •๋œ ์ž„๋ฒ ๋”ฉ์„ ๋”ํ•จ
     - ํ† ํฐ๊ณผ ํ† ํฐ ์‚ฌ์ด์˜ ์ƒ๋Œ€์ ์ธ ์œ„์น˜์ •๋ณด ํ™œ์šฉํ•˜์ง€ ๋ชปํ•จ
     - ๊ธด ํ…์ŠคํŠธ๋ฅผ ์ถ”๋ก ํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์„ฑ๋Šฅ์ด ๋–จ์–ด์ง


โœ”๏ธ ํ† ํฐํ™” ์ƒ˜ํ”Œ ์ฝ”๋“œ

# ํ…์ŠคํŠธ๋ฅผ ์ ์ ˆํ•œ ๋‹จ์œ„๋กœ ๋‚˜๋ˆ” (๋„์–ด์“ฐ๊ธฐ ๊ธฐ์ค€)
input_text = "๋‚˜๋Š” ์ตœ๊ทผ ํŒŒ๋ฆฌ ์—ฌํ–‰์„ ๋‹ค๋…€์™”๋‹ค"
input_text_list = input_text.split()

print(input_text_list) # ['๋‚˜๋Š”', '์ตœ๊ทผ', 'ํŒŒ๋ฆฌ', '์—ฌํ–‰์„', '๋‹ค๋…€์™”๋‹ค']

# ํ† ํฐ (์ˆซ์žID ๋ถ€์—ฌ)
# ์•„์ด๋”” ๋”•์…”๋„ˆ๋ฆฌ์™€ ์•„์ด๋””
str2idx = {word:idx for idx, word in enumerate(input_text_list)} 
# ํ† ํฐ ๋”•์…”๋„ˆ๋ฆฌ ๋งŒ๋“ค๊ธฐ
idx2str = {idx:word for idx, word in enumerate(input_text_list)} 

print(str2idx) # {'๋‚˜๋Š”': 0, '์ตœ๊ทผ': 1, 'ํŒŒ๋ฆฌ': 2, '์—ฌํ–‰์„': 3, '๋‹ค๋…€์™”๋‹ค': 4}
print(idx2str) # {0: '๋‚˜๋Š”', 1: '์ตœ๊ทผ', 2: 'ํŒŒ๋ฆฌ', 3: '์—ฌํ–‰์„', 4: '๋‹ค๋…€์™”๋‹ค'}

# ํ† ํฐ์„ ํ† ํฐ ์•„์ด๋””๋กœ ๋ณ€ํ™˜
input_ids = [str2idx[word] for word in input_text_list]
print(input_ids) # [0, 1, 2, 3, 4]
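As a quick follow-up I added (continuing from the block above), idx2str maps the IDs back to the original text:

decoded = " ".join(idx2str[i] for i in input_ids)
print(decoded)  # 나는 최근 파리 여행을 다녀왔다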

 

โœ”๏ธ ์ ˆ๋Œ€์  ์œ„์น˜ ์ธ์ฝ”๋”ฉ

import torch
import torch.nn as nn

embedding_dim = 16
max_position = 12

# 1. Create the token embedding layer, weight shape (5, 16)
embed_layer = nn.Embedding(len(str2idx), embedding_dim)

# 2. Create the position encoding layer, weight shape (12, 16)
position_embed_layer = nn.Embedding(max_position, embedding_dim)

# 3. Generate position IDs and look up their encodings
# - torch.arange(len(input_ids)) :: one position ID per input token
# - unsqueeze(0) :: add a batch dimension, giving shape (1, sequence length)
position_ids = torch.arange(len(input_ids), dtype=torch.long).unsqueeze(0)
position_encodings = position_embed_layer(position_ids)

# 4. Look up the token embeddings and adjust the dimensions
token_embeddings = embed_layer(torch.tensor(input_ids)) # (5, 16)
token_embeddings = token_embeddings.unsqueeze(0) # (1, 5, 16)

# 5. Add the token embeddings and position encodings to get the final input embeddings
input_embeddings = token_embeddings + position_encodings
print(input_embeddings.shape) # torch.Size([1, 5, 16])
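As a sanity check I added (reusing the layers defined above), the same token ID placed at two different positions now gets two different input embeddings, which shows the order information being injected:

repeated = embed_layer(torch.tensor([[0, 0]])) + position_embed_layer(torch.arange(2).unsqueeze(0))
print(torch.allclose(repeated[0, 0], repeated[0, 1]))  # False: the positions differ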

 

๐Ÿถ ๋А๋‚€์ 

1. ๊ณต๋ถ€ํ•  ๊ฑด ๋งŽ๊ณ  ์‹œ๊ฐ„์€ ๋ถ€์กฑํ•˜๋‹ค.
2. ๋‹ค ํ• ์ˆ˜ ์—†๋‹ค. ๊ณต๋ถ€ ์ „๋žต์„ ์„ธ์šฐ์ž