Project 5A: The Power of Diffusion Models!

Background

In Part A, we experiment with diffusion models: we implement diffusion sampling loops and use them for other tasks such as inpainting and creating optical illusions. We use the DeepFloyd IF diffusion model throughout.

0. Sampling from the Model

Approach

In this part, I used num_inference_steps=20 for Stage 1 and num_inference_steps=5 for Stage 2. The images are shown below.

part0model2changedto5.jpg
Stage 1 with num_inference_steps=20 (Top) and Stage 2 with num_inference_steps=5 (Bottom)
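
For reference, a minimal sketch of how the two stages can be driven through HuggingFace diffusers, following the DeepFloyd IF model card (the model IDs, encode_prompt call, and argument names below are assumptions based on that card, not the project starter code):

    import torch
    from diffusers import DiffusionPipeline

    # Stage 1 generates a 64x64 image; Stage 2 upsamples it to 256x256.
    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
    stage_2 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16)

    prompt_embeds, negative_embeds = stage_1.encode_prompt("an oil painting of a snowy mountain village")
    image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
                    num_inference_steps=20, output_type="pt").images
    image = stage_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
                    num_inference_steps=5, output_type="pt").images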

1.1 Implementing the Forward Process

Approach

In order to implement the forward process, I created a forward function that applies the equation from the project spec, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps ~ N(0, I), to the test image. The results are shown below.

part1.1.jpg
Test Images at Noise Levels 250, 500 and 750
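
A minimal sketch of the forward function, assuming alphas_cumprod is the noise-schedule tensor exposed by the starter code (e.g. stage_1.scheduler.alphas_cumprod):

    import torch

    def forward(im, t):
        # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)
        alpha_bar = alphas_cumprod[t]
        eps = torch.randn_like(im)
        return torch.sqrt(alpha_bar) * im + torch.sqrt(1 - alpha_bar) * eps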

1.2 Classical Denoising

Approach

In this part, I applied torchvision.transforms.functional.gaussian_blur to each of the noisy images from the previous part. The results are shown below, with the top image being the noisy input and the bottom the blurred result.

part1.2t250.jpg part1.2t500.jpg part1.2t750.jpg
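
The call itself is a one-liner; the kernel size and sigma below are illustrative values, not necessarily the ones I used:

    import torchvision.transforms.functional as TF

    # Classical "denoising": blur away the high-frequency noise
    # (and, unavoidably, the high-frequency image detail with it).
    denoised = TF.gaussian_blur(im_noisy, kernel_size=5, sigma=2.0)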

1.3 One-Step Denoising

Approach

In order to one-step denoise, I first applied the forward process to the test image and then passed the noisy image through stage_1.unet to estimate the noise. Then, I used the equation (im_noisy - noise_est * torch.sqrt(1 - alpha_cumprod)) / torch.sqrt(alpha_cumprod) to recover a clean-image estimate. The results are shown below:

part1.3t250.jpg part1.3t500.jpg part1.3t730.jpg
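
A sketch of that estimate. The DeepFloyd UNet predicts 6 channels, of which the first 3 are the noise estimate; prompt_embeds is assumed to hold the text embedding used for denoising:

    with torch.no_grad():
        # Estimate the noise present in im_noisy at timestep t
        noise_est = stage_1.unet(im_noisy, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
    alpha_bar = alphas_cumprod[t]
    # Invert the forward process to recover a clean-image estimate
    im_clean = (im_noisy - torch.sqrt(1 - alpha_bar) * noise_est) / torch.sqrt(alpha_bar)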

1.4 Iterative Denoising

Approach

To iteratively denoise, I applied equations 6 and 7 of the DDPM paper, as reproduced in the project spec, with strided timesteps running from 990 down to 0. The results are displayed below:

part1.4iter690-390.jpg part1.4iter240-clean.jpg
Noisy Campanile at t=690 to 0
part1.4onestepandgauss.jpg
Single Denoising Step (Top) and Gaussian Blurring (Bottom)
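
One step of the loop, sketched. The interpolation coefficients follow the spec's formula; add_variance stands in for however the starter code injects the v_sigma term, so treat the exact names as assumptions:

    for i in range(len(strided_timesteps) - 1):
        t, t_prev = strided_timesteps[i], strided_timesteps[i + 1]
        alpha_bar, alpha_bar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        alpha = alpha_bar / alpha_bar_prev
        beta = 1 - alpha
        noise_est = stage_1.unet(im, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
        # Clean-image estimate, exactly as in part 1.3
        x0_est = (im - torch.sqrt(1 - alpha_bar) * noise_est) / torch.sqrt(alpha_bar)
        # Interpolate between the clean estimate and the current noisy image
        im = ((torch.sqrt(alpha_bar_prev) * beta / (1 - alpha_bar)) * x0_est
              + (torch.sqrt(alpha) * (1 - alpha_bar_prev) / (1 - alpha_bar)) * im)
        im = add_variance(im, t)  # v_sigma term (placeholder name)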

1.5 Diffusion Model Sampling

Approach

To generate images from scratch, I generated random noise with torch.randn and applied my iterative_denoise function to it. The 5 sampled images are shown below:

part1.5p1.jpg part1.5p2.jpg
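
Sampling then reduces to denoising pure noise from the very first timestep (the 64x64 shape matches Stage 1; i_start follows the spec's convention of indexing into the strided timesteps):

    # Start from pure noise and run the full denoising schedule
    im = torch.randn(1, 3, 64, 64)
    sample = iterative_denoise(im, i_start=0)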

1.6 Classifier-Free Guidance (CFG)

Approach

To implement iterative_denoise_cfg, I edited my iterative_denoise function to run the UNet twice per step, once with the conditional prompt and once with the null prompt, and combine the two estimates as noise_est = uncond_noise_est + scale * (noise_est - uncond_noise_est). Then, using the same procedure as in 1.5, I generated random noise with torch.randn and applied my iterative_denoise_cfg function to it. The 5 sampled images are shown below:

part1.6p1.jpg part1.6p2.jpg
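
The core change inside the loop, sketched. uncond_prompt_embeds would be the embedding of the empty prompt "", and scale > 1 is the CFG scale (the exact value I used is not pinned down here):

    # Two UNet passes per step: conditional and unconditional
    cond_est = stage_1.unet(im, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
    uncond_est = stage_1.unet(im, t, encoder_hidden_states=uncond_prompt_embeds).sample[:, :3]
    # Push the estimate away from the unconditional direction
    noise_est = uncond_est + scale * (cond_est - uncond_est)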

1.7 Image-to-image Translation

Approach

In this part, I applied the forward process to the test image, Kingaroo, and Uniqloroo, then ran my iterative_denoise_cfg function on each at the noise levels [1, 3, 5, 7, 10, 20]. The results are shown below:

part1.7test1.jpg part1.7test2.jpg
Test Image
part1.7kingaroo1.jpg part1.7kingaroo2.jpg
Kingaroo
part1.7uniqloroo1.jpg part1.7uniqloroo2.jpg
Uniqloroo
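
The edit loop, sketched; the indices select starting points in the strided timestep schedule, so a larger index means less noise is added and the edit stays closer to the input:

    for i_start in [1, 3, 5, 7, 10, 20]:
        t = strided_timesteps[i_start]
        noisy = forward(im, t)  # noise the input up to level t
        edited = iterative_denoise_cfg(noisy, i_start=i_start)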

1.7.1 Editing Hand-Drawn and Web Images

Approach

In this part, I found one image online and drew two myself. Then, I passed all three images through the starter code. The results are shown below:

bruh.jpg
Online Image
part1.7.1handraw1edit.jpg
Hand Drawn 1
part1.7.1handraw2edit.jpg
Hand Drawn 2

1.7.2 Inpainting

Approach

In this part, I implemented the inpaint function by editing my iterative_denoise_cfg code to apply equation 5 from the project spec, which accounts for the mask by resetting everything outside it to the appropriately noised original image at each step. The results are shown below (left is the image and mask, right is the inpainted result):

part1.7.2test.jpg part1.7.2testedit.jpg
Test Image
part1.7.2kingaroo.jpg part1.7.2kingarooedit.jpg
Kingaroo
part1.7.2uniqloroo.jpg part1.7.2uniqlorooedit.jpg
Uniqloroo
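
The per-step change, sketched (m is a binary mask that is 1 where new content should be generated, and x_orig is the original image):

    # After each denoising step, reset everything outside the mask to the
    # original image noised to the current level, so only the masked
    # region is actually generated.
    im = m * im + (1 - m) * forward(x_orig, t_prev)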

1.7.3 Text-Conditional Image-to-image Translation

Approach

In this part, I did the same as in 1.7, but with the prompt "a rocket ship". The results are below.

part1.7.3test1.jpg part1.7.3test2.jpg
Test Image
part1.7.3kingaroo1.jpg part1.7.3kingaroo2.jpg
Kingaroo
part1.7.3uniqloroo1.jpg part1.7.3uniqloroo2.jpg
Uniqloroo

1.8 Visual Anagrams

Approach

In this part, I implemented make_flip_illusion by editing my iterative_denoise_cfg function to apply the equations from the project spec, which run the UNet on both the upright and flipped image (one prompt each) and average the noise estimates. I then ran it on three different pairs of prompts, shown below:

part1.8mancampfire.jpg part1.8manhatpencil.jpg part1.8manrocket.jpg
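
The combined noise estimate per step, sketched (embeds_1 and embeds_2 are the two prompt embeddings; CFG is omitted for brevity; flips are over the height dimension of a BxCxHxW tensor):

    # Prompt 1 sees the image as-is
    eps_1 = stage_1.unet(im, t, encoder_hidden_states=embeds_1).sample[:, :3]
    # Prompt 2 sees the upside-down image; flip its estimate back
    eps_2 = torch.flip(
        stage_1.unet(torch.flip(im, dims=[2]), t, encoder_hidden_states=embeds_2).sample[:, :3],
        dims=[2])
    # Average so both orientations are denoised toward their prompt
    noise_est = (eps_1 + eps_2) / 2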

1.9 Hybrid Images

Approach

In this part, I implemented make_hybrids by editing my iterative_denoise_cfg function to apply the equations from the project spec, which combine the low frequencies of one prompt's noise estimate with the high frequencies of the other's. I then ran it on three different pairs of prompts, shown below:

1.9skullwaterfall.jpg
"a lithograph of a skull" and "a lithograph of waterfalls"
1.9pencilrocket.jpg
"a rocket ship" and "a pencil"
1.9hippiebaristavillage.jpg
"an oil painting of a snowy mountain village" and "a photo of a hipster barista"

Project 5B: Diffusion Models From Scratch!

Background

In this sub-project, we train three diffusion models on MNIST: a Single-Step Denoising UNet, a Time-Conditioning UNet, and a Class-Conditioning UNet. Credit to my CS189 Homework 6 from last semester for inspiring my model-training code.

1. Training a Single-Step Denoising UNet

Approach

Using the provided skeleton code and the diagram in the project spec, I implemented the Conv and UnconditionalUNet classes. I then created the functions add_noise and add_noise_to_dataset to handle adding noise to a single image and to an entire dataset, and ran them with sigma = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]. Then I trained the model on the MNIST dataset and sampled results on the test set after the first and the fifth epoch, and with out-of-distribution noise levels after training. The results are shown below:

fig3.jpg
Noising Process for Different Sigmas
fig4.jpg
Training Loss Curve
fig5.jpg
Sample Results on the Test Set (Epoch 1)
fig6.jpg
Sample Results on the Test Set (Epoch 5)
fig7.jpg
Sample Results on the Test Set With Out-of-Distribution Noise Levels

2. Training a Diffusion Model

Time-Conditioning UNet

Approach

Using the provided skeleton code and the diagram in the project spec, I implemented the FCBlock, TimeConditional, and DDPM classes, as well as the ddpm_schedule, ddpm_forward, and ddpm_sample functions. Then I trained the model on the MNIST dataset and sampled results on the test set after the fifth and the twentieth epoch. The results are shown below:

fig10.jpg
Training Loss Curve
tc5.jpg
Sample Results on the Test Set (Epoch 5)
tc20.jpg
Sample Results on the Test Set (Epoch 20)

Class-Conditioning UNet

Approach

My process for this part was essentially identical to the previous one, except I added two extra FCBlocks for the class conditioning, as mentioned in the spec. Then I once again trained the model on the MNIST dataset and sampled results on the test set after the fifth and the twentieth epoch. The results are shown below:

ccloss.jpg
Training Loss Curve
cc5.jpg
Sample Results on the Test Set (Epoch 5)
cc20.jpg
Sample Results on the Test Set (Epoch 20)