Project 5A: The Power of Diffusion Models!

Background

In Part A, we experiment with diffusion models: we implement diffusion sampling loops and use them for other tasks such as inpainting and creating optical illusions. We use the DeepFloyd IF diffusion model throughout.

0. Sampling from the Model

Approach

In this part, I used num_inference_steps=20 for Stage 1 and num_inference_steps=5 for Stage 2. The images are shown below.

part0model2changedto5.jpg
Stage 1 with num_inference_steps=20 (Top) and Stage 2 with num_inference_steps=5 (Bottom)
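
For reference, a minimal sketch of how the two stages can be driven through HuggingFace diffusers, following the DeepFloyd IF model card (the model IDs, encode_prompt call, and argument names below are assumptions based on that card, not the project starter code):

    import torch
    from diffusers import DiffusionPipeline

    # Stage 1 generates a 64x64 image; Stage 2 upsamples it to 256x256.
    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
    stage_2 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16)

    prompt_embeds, negative_embeds = stage_1.encode_prompt("an oil painting of a snowy mountain village")
    image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
                    num_inference_steps=20, output_type="pt").images
    image = stage_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
                    num_inference_steps=5, output_type="pt").images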

1.1 Implementing the Forward Process

Approach

In order to implement the forward process, I created a forward function that applies the equation from the project spec, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps ~ N(0, I), to the test image. The results are shown below.

part1.1.jpg
Test Images at Noise Levels 250, 500 and 750
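
A minimal sketch of the forward function, assuming alphas_cumprod is the noise-schedule tensor exposed by the starter code (e.g. stage_1.scheduler.alphas_cumprod):

    import torch

    def forward(im, t):
        # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)
        alpha_bar = alphas_cumprod[t]
        eps = torch.randn_like(im)
        return torch.sqrt(alpha_bar) * im + torch.sqrt(1 - alpha_bar) * eps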

1.2 Classical Denoising

Approach

In this part, I applied torchvision.transforms.functional.gaussian_blur to each of the noisy images from the previous part. The results are shown below, with the top image being the noisy input and the bottom the blurred result.

part1.2t250.jpg part1.2t500.jpg part1.2t750.jpg
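
The call itself is a one-liner; the kernel size and sigma below are illustrative values, not necessarily the ones I used:

    import torchvision.transforms.functional as TF

    # Classical "denoising": blur away the high-frequency noise
    # (and, unavoidably, the high-frequency image detail with it).
    denoised = TF.gaussian_blur(im_noisy, kernel_size=5, sigma=2.0)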

1.3 One-Step Denoising

Approach

In order to one-step denoise, I first applied the forward process to the test image and then passed the noisy image through stage_1.unet to estimate the noise. Then, I used the equation (im_noisy - noise_est * torch.sqrt(1 - alpha_cumprod)) / torch.sqrt(alpha_cumprod) to recover a clean-image estimate. The results are shown below:

part1.3t250.jpg part1.3t500.jpg part1.3t730.jpg
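
A sketch of that estimate. The DeepFloyd UNet predicts 6 channels, of which the first 3 are the noise estimate; prompt_embeds is assumed to hold the text embedding used for denoising:

    with torch.no_grad():
        # Estimate the noise present in im_noisy at timestep t
        noise_est = stage_1.unet(im_noisy, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
    alpha_bar = alphas_cumprod[t]
    # Invert the forward process to recover a clean-image estimate
    im_clean = (im_noisy - torch.sqrt(1 - alpha_bar) * noise_est) / torch.sqrt(alpha_bar)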

1.4 Iterative Denoising

Approach

To iteratively denoise, I applied equations 6 and 7 of the DDPM paper, as reproduced in the project spec, with strided timesteps running from 990 down to 0. The results are displayed below:

part1.4iter690-390.jpg part1.4iter240-clean.jpg
Noisy Campanile at t=690 to 0
part1.4onestepandgauss.jpg
Single Denoising Step (Top) and Gaussian Blurring (Bottom)
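
One step of the loop, sketched. The interpolation coefficients follow the spec's formula; add_variance stands in for however the starter code injects the v_sigma term, so treat the exact names as assumptions:

    for i in range(len(strided_timesteps) - 1):
        t, t_prev = strided_timesteps[i], strided_timesteps[i + 1]
        alpha_bar, alpha_bar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        alpha = alpha_bar / alpha_bar_prev
        beta = 1 - alpha
        noise_est = stage_1.unet(im, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
        # Clean-image estimate, exactly as in part 1.3
        x0_est = (im - torch.sqrt(1 - alpha_bar) * noise_est) / torch.sqrt(alpha_bar)
        # Interpolate between the clean estimate and the current noisy image
        im = ((torch.sqrt(alpha_bar_prev) * beta / (1 - alpha_bar)) * x0_est
              + (torch.sqrt(alpha) * (1 - alpha_bar_prev) / (1 - alpha_bar)) * im)
        im = add_variance(im, t)  # v_sigma term (placeholder name)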

1.5 Diffusion Model Sampling

Approach

To generate images from scratch, I generated random noise with torch.randn and applied my iterative_denoise function to it. The 5 sampled images are shown below:

part1.5p1.jpg part1.5p2.jpg
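
Sampling then reduces to denoising pure noise from the very first timestep (the 64x64 shape matches Stage 1; i_start follows the spec's convention of indexing into the strided timesteps):

    # Start from pure noise and run the full denoising schedule
    im = torch.randn(1, 3, 64, 64)
    sample = iterative_denoise(im, i_start=0)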

1.6 Classifier-Free Guidance (CFG)

Approach

To implement iterative_denoise_cfg, I edited my iterative_denoise function to run the UNet twice per step, once with the conditional prompt and once with the null prompt, and combine the two estimates as noise_est = uncond_noise_est + scale * (noise_est - uncond_noise_est). Then, using the same procedure as in 1.5, I generated random noise with torch.randn and applied my iterative_denoise_cfg function to it. The 5 sampled images are shown below:

part1.6p1.jpg part1.6p2.jpg
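
The core change inside the loop, sketched. uncond_prompt_embeds would be the embedding of the empty prompt "", and scale > 1 is the CFG scale (the exact value I used is not pinned down here):

    # Two UNet passes per step: conditional and unconditional
    cond_est = stage_1.unet(im, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
    uncond_est = stage_1.unet(im, t, encoder_hidden_states=uncond_prompt_embeds).sample[:, :3]
    # Push the estimate away from the unconditional direction
    noise_est = uncond_est + scale * (cond_est - uncond_est)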

1.7 Image-to-image Translation

Approach

In this part, I applied the forward process to the test image, Kingaroo, and Uniqloroo, then ran my iterative_denoise_cfg function on each at the noise levels [1, 3, 5, 7, 10, 20]. The results are shown below:

part1.7test1.jpg part1.7test2.jpg
Test Image
part1.7kingaroo1.jpg part1.7kingaroo2.jpg
Kingaroo
part1.7uniqloroo1.jpg part1.7uniqloroo2.jpg
Uniqloroo
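
The edit loop, sketched; the indices select starting points in the strided timestep schedule, so a larger index means less noise is added and the edit stays closer to the input:

    for i_start in [1, 3, 5, 7, 10, 20]:
        t = strided_timesteps[i_start]
        noisy = forward(im, t)  # noise the input up to level t
        edited = iterative_denoise_cfg(noisy, i_start=i_start)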

1.7.1 Editing Hand-Drawn and Web Images

Approach

In this part, I found one image online and drew two myself. Then, I passed all three images through the starter code. The results are shown below:

bruh.jpg
Online Image
part1.7.1handraw1edit.jpg
Hand Drawn 1
part1.7.1handraw2edit.jpg
Hand Drawn 2

1.7.2 Inpainting

Approach

In this part, I implemented the inpaint function by editing my iterative_denoise_cfg code to apply equation 5 from the project spec, which accounts for the mask by resetting everything outside it to the appropriately noised original image at each step. The results are shown below (left is the image and mask, right is the inpainted result):

part1.7.2test.jpg part1.7.2testedit.jpg
Test Image
part1.7.2kingaroo.jpg part1.7.2kingarooedit.jpg
Kingaroo
part1.7.2uniqloroo.jpg part1.7.2uniqlorooedit.jpg
Uniqloroo
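
The per-step change, sketched (m is a binary mask that is 1 where new content should be generated, and x_orig is the original image):

    # After each denoising step, reset everything outside the mask to the
    # original image noised to the current level, so only the masked
    # region is actually generated.
    im = m * im + (1 - m) * forward(x_orig, t_prev)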

1.7.3 Text-Conditional Image-to-image Translation

Approach

In this part, I did the same as in 1.7, but with the prompt "a rocket ship". The results are below.

part1.7.3test1.jpg part1.7.3test2.jpg
Test Image
part1.7.3kingaroo1.jpg part1.7.3kingaroo2.jpg
Kingaroo
part1.7.3uniqloroo1.jpg part1.7.3uniqloroo2.jpg
Uniqloroo

1.8 Visual Anagrams

Approach

In this part, I implemented make_flip_illusion by editing my iterative_denoise_cfg function to apply the equations from the project spec, which run the UNet on both the upright and flipped image (one prompt each) and average the noise estimates. I then ran it on three different pairs of prompts, shown below:

part1.8mancampfire.jpg part1.8manhatpencil.jpg part1.8manrocket.jpg
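
The combined noise estimate per step, sketched (embeds_1 and embeds_2 are the two prompt embeddings; CFG is omitted for brevity; flips are over the height dimension of a BxCxHxW tensor):

    # Prompt 1 sees the image as-is
    eps_1 = stage_1.unet(im, t, encoder_hidden_states=embeds_1).sample[:, :3]
    # Prompt 2 sees the upside-down image; flip its estimate back
    eps_2 = torch.flip(
        stage_1.unet(torch.flip(im, dims=[2]), t, encoder_hidden_states=embeds_2).sample[:, :3],
        dims=[2])
    # Average so both orientations are denoised toward their prompt
    noise_est = (eps_1 + eps_2) / 2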

1.9 Hybrid Images

Approach

In this part, I implemented make_hybrids by editing my iterative_denoise_cfg function to apply the equations from the project spec, which combine the low frequencies of one prompt's noise estimate with the high frequencies of the other's. I then ran it on three different pairs of prompts, shown below:

1.9skullwaterfall.jpg
"a lithograph of a skull" and "a lithograph of waterfalls"
1.9pencilrocket.jpg
"a rocket ship" and "a pencil"
1.9hippiebaristavillage.jpg
"an oil painting of a snowy mountain village" and "a photo of a hipster barista"

Project 5B: Diffusion Models From Scratch!

Background

In this sub-project, we train three diffusion models on MNIST: a Single-Step Denoising UNet, a Time-Conditioning UNet, and a Class-Conditioning UNet. Credit to my CS189 Homework 6 from last semester for inspiring my model-training code.

1. Training a Single-Step Denoising UNet

Approach

Using the provided skeleton code and the diagram in the project spec, I implemented the Conv and UnconditionalUNet classes. I then created the functions add_noise and add_noise_to_dataset to handle adding noise to a single image and to an entire dataset, and ran them with sigma = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]. Then I trained the model on the MNIST dataset and sampled results on the test set after the first and the fifth epoch, and with out-of-distribution noise levels after training. The results are shown below:

fig3.jpg
Noising Process for Different Sigmas
fig4.jpg
Training Loss Curve
fig5.jpg
Sample Results on the Test Set (Epoch 1)
fig6.jpg
Sample Results on the Test Set (Epoch 5)
fig7.jpg
Sample Results on the Test Set With Out-of-Distribution Noise Levels

2. Training a Diffusion Model

Time-Conditioning UNet

Approach

Using the provided skeleton code and the diagram in the project spec, I implemented the FCBlock, TimeConditional, and DDPM classes, as well as the ddpm_schedule, ddpm_forward, and ddpm_sample functions. Then I trained the model on the MNIST dataset and sampled results on the test set after the fifth and the twentieth epoch. The results are shown below:

fig10.jpg
Training Loss Curve
tc5.jpg
Sample Results on the Test Set (Epoch 5)
tc20.jpg
Sample Results on the Test Set (Epoch 20)

Class-Conditioning UNet

Approach

My process for this part was essentially identical to the previous one, except I added two extra FCBlocks for the class conditioning, as mentioned in the spec. Then I once again trained the model on the MNIST dataset and sampled results on the test set after the fifth and the twentieth epoch. The results are shown below:

ccloss.jpg
Training Loss Curve
cc5.jpg
Sample Results on the Test Set (Epoch 5)
cc20.jpg
Sample Results on the Test Set (Epoch 20)