새우위키 — 새우위키

Modifying model layers

2024.04.25

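Sometimes a model loaded with the Hugging Face transformers library needs a small modification. In my case, I am testing a larger conditioning image input for the diffusers ControlNet so that it can take several conditions at once. I assumed that simply changing in_channels would be enough, but I found that the whole layer has to be replaced for the model to run without errors, so I am writing it down before I forget. Thinking about it while writing, this is obvious: changing in_channels alone does not create any new weights. When only a parameter is changed: let's define a ControlNet as below and change only the conditioning image input. controlnet = ControlNetModel() controlnet.controlnet_..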
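The point about in_channels can be sketched with plain PyTorch, without diffusers. This is only a minimal illustration under assumptions: the helper name widen_conv_in_channels is hypothetical, a generic nn.Conv2d stands in for the ControlNet conditioning layer, and zero-initializing the extra input channels is one possible choice, not necessarily what the post does.

```python
import torch
from torch import nn


def widen_conv_in_channels(conv: nn.Conv2d, new_in_channels: int) -> nn.Conv2d:
    """Build a replacement Conv2d with more input channels (hypothetical helper).

    Editing conv.in_channels alone would leave conv.weight at its old shape,
    so a new layer is created and the old weights are copied into it.
    """
    if conv.groups != 1:
        raise ValueError("this sketch only handles groups == 1")
    if new_in_channels < conv.in_channels:
        raise ValueError("new_in_channels must be >= conv.in_channels")

    new_conv = nn.Conv2d(
        new_in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
        padding_mode=conv.padding_mode,
    )
    with torch.no_grad():
        # Assumption: extra channels start at zero so the old behaviour is kept.
        new_conv.weight.zero_()
        new_conv.weight[:, : conv.in_channels] = conv.weight
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


torch.manual_seed(0)
conv_in = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # stand-in for a 3-channel input layer
wide = widen_conv_in_channels(conv_in, 6)             # now accepts 6 input channels

x = torch.randn(1, 6, 8, 8)
out = wide(x)
print(tuple(conv_in.weight.shape), tuple(wide.weight.shape), tuple(out.shape))
```

With this choice the widened layer initially ignores the new channels, so its output matches the original layer applied to the first three channels; the new weights only become useful after further training.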

Theory/Diffusion

[Paper Review] Video LDM, Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

2024.04.20

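This time, rather than reviewing the whole paper, I will give a brief review of specific parts. Much of this is my own thinking, so some of it may be wrong. I would appreciate comments pointing out any such parts. The paper came out in April 2023. I wanted to read Stable Video Diffusion, but since it is said to use this paper's architecture, I will go over this one quickly first. Introduction: In February this year, OpenAI unveiled a video generation model called Sora. Given only text as input, it can generate about a minute of realistic, high-quality video, which is a remarkable technology. Before that, a startup called Pika Labs had shown its own video generation technology. Startups in this area have released technology, and OpenAI has announced a model as well ..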

Theory/GAN

[Paper Review] CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

2024.04.18

Introduction https://arxiv.org/abs/2005.09544 CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks The unprecedented increase in the usage of computer vision technology in society goes hand in hand with an increased concern in data privacy. In many real-world scenarios like people tracking or action recognition, it is important to be able to process the arxiv.org As for me, de-identification using Stable Diffusion ..

Theory/Diffusion

[Paper Review] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models

2024.01.29

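Since Stable Diffusion appeared, many methods built on top of it have been proposed. This time I will review ControlNet (Adding Conditional Control to Text-to-Image Diffusion Models), which studies how to apply various conditions to Stable Diffusion. Before the review, to get a concrete feel for the model, here is a video of a ControlNet demo I implemented. It is set up to take an image as input, but internally it extracts and uses only the OpenPose skeleton. In other words, the picture is generated from just the OpenPose below and the text. Introduction: Since the arrival of Stable Diffusion, image generation AI has advanced considerably, and not only researchers ..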

Misc/Git

[Git] Changing the Git Author, and setting it differently per repository

2023.12.09

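When working with GitHub, you sometimes need to change the Author recorded in the commit log. By default, the user information of the local machine that creates the commit is used as the Author, so there are times when you want to change it to your GitHub account or similar. Here is a short summary of how to do that. Changing the Git Author: The Author is easy to change. You only need to change user.name and user.email in git config. Let's start with how to change them globally. Add the --global option and run the following commands in a terminal. git config --global user.name name-you-want git config --glob..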

Theory/Diffusion

[Paper Review] Stable Diffusion (High-Resolution Image Synthesis with Latent Diffusion Models)

2023.11.24

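This is a review of High-Resolution Image Synthesis with Latent Diffusion Models, presented at CVPR 2022. It is the paper behind Stable Diffusion, a leading example of recent generative AI that creates images from text. Generative AI has been receiving a great deal of attention lately. Looking at natural language first, generative AI built on LLMs (large language models) has emerged, and with the rise of models such as ChatGPT and Bard, it has reached the point where being without ChatGPT feels inconvenient. Image generation models have kept advancing as well. With the rise of GANs, novel ideas such as StyleGAN + CLIP ..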

Theory/LLM

[Paper Review] LoRA: Low-Rank Adaptation of Large Language Models

2023.11.11

LoRA: Low-Rank Adaptation of Large Language Models An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes le arxiv.org One of the most active topics in natural language processing recently is how to fine-tune an LLM for the specific task or domain you care about. ..

Theory/LLM

[Paper Review] Stanford Alpaca

2023.11.05

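If you read recent articles about natural language processing, you will often come across the words alpaca and llama. On this page we look at Alpaca, one of these language models. In short, Alpaca is a model made by fine-tuning a model called LLaMA. Since it was tuned from the original LLaMA model, it seems to have been named after a similar animal, the alpaca. For reference, LLaMA stands for Large Language model Meta AI. It is an important model, to the extent that most open-source LLMs can be said to have started from the release of LLaMA's weights and training method. Stanford CRFM We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52..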

Theory/LLM

[Paper Review] LLaMA: Open and Efficient Foundation Language Models

2023.11.04

LLaMA: Open and Efficient Foundation Language Models We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, witho arxiv.org This is a review of LLaMA: Open and Efficient Foundation Language Models, released by Meta AI this year ..

Theory/LLM

[Paper Review] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

2023.11.04

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based arxiv.org BART: Denoising Se..
