DeepFaceLab Freezes with SAEHD and AMP Models on RTX 4060 Ti

This topic has 1 reply, 1 voice, and was last updated 5 months ago by whois306.

Viewing 2 posts - 1 through 2 (of 2 total)

Author

Posts
January 2, 2025 at 2:31 am #10312
whois306
Participant
I’m experiencing a persistent issue with DeepFaceLab where it freezes when attempting to train using the SAEHD or AMP models. The program gets stuck after loading data, without consuming any CPU, memory, or GPU resources. This issue occurs even when pretraining is disabled. Interestingly, the Quick96 and XSeg models work perfectly fine.
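A freeze with no CPU or GPU activity usually means the process is blocked waiting on something, though that is an inference from the symptom and not a confirmed diagnosis. One way to see where a stalled Python process is waiting is the standard-library faulthandler module. The sketch below is a minimal example and not part of DeepFaceLab: the helper names enable_hang_dump and disable_hang_dump are hypothetical, and it assumes you can add a call near the start of the training entry point.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s=60.0, log_path=None):
    """Arm a watchdog that dumps every thread's stack each `timeout_s` seconds.

    Returns the open log file (or None when dumping to stderr). The caller
    must keep the file open for as long as the watchdog is armed.
    """
    log_file = open(log_path, "w") if log_path else None
    faulthandler.dump_traceback_later(
        timeout_s,
        repeat=True,
        file=log_file if log_file is not None else sys.stderr,
    )
    return log_file


def disable_hang_dump(log_file=None):
    """Cancel the watchdog and close the log file, if one was opened."""
    faulthandler.cancel_dump_traceback_later()
    if log_file is not None:
        log_file.close()
```

If training stalls, the repeated dumps show the line each thread is stuck on, which helps tell a blocked data-loader worker apart from a stalled GPU call.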

Troubleshooting Steps Taken:

Data Check: Verified the training data quality and paths, ensuring they are not the cause of the issue.

Environment Verification: Confirmed DeepFaceLab is using its built-in CUDA and Python environment.

GPU Driver Check: Installed the latest GPU drivers for my RTX 4060 Ti.

Resource Monitoring: Monitored GPU, CPU, and RAM usage, observing no activity when the program freezes.

DeepFaceLab Logs: Checked DeepFaceLab logs, but no obvious errors were found.

train.py Configuration: Modified batch_size and resolution in train.py to reduce memory usage, but the problem persists.

Pretraining Disabled: Disabled pretraining for the SAEHD model, but the issue remains.

Software Conflict Check: Closed unnecessary software and services, but this did not resolve the issue.

Dependency Analysis: Used pip list to check the installed DeepFaceLab dependencies and tried updating or downgrading some of them, but the issue remains.
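The resource-monitoring step above can be made repeatable with a small script that samples the GPU through nvidia-smi. This is a minimal sketch, assuming nvidia-smi is on the PATH and reports numeric values for the queried fields; the function names are hypothetical and the 5% idle threshold is an arbitrary choice.

```python
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = ("utilization.gpu", "memory.used", "memory.total")


def parse_gpu_line(line):
    """Parse one CSV line of nvidia-smi output into a dict of integers."""
    values = [part.strip() for part in line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError("unexpected nvidia-smi output: %r" % line)
    return {name: int(value) for name, value in zip(QUERY_FIELDS, values)}


def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


def looks_idle(sample, max_util_percent=5):
    """Return True when GPU utilization is at or below the threshold."""
    return sample["utilization.gpu"] <= max_util_percent
```

Calling sample_gpus() every few seconds while the trainer is stuck, and logging the result, gives a record that can be attached to a bug report instead of a one-off observation.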

Possible Causes (Speculated):

Hardware Incompatibility: A potential compatibility issue between my RTX 4060 Ti and DeepFaceLab, especially with SAEHD and AMP models.

DeepFaceLab Code Bug: A potential bug within the SAEHD or AMP model code in DeepFaceLab, leading to the freeze.

Software Conflict: A potential software conflict, specific to my system environment, impacting only the SAEHD and AMP models.

DeepFaceLab Version Issue: A potential bug within my specific DeepFaceLab version.

Seeking Help On:

RTX 4060 Ti Compatibility: Are there any known compatibility issues with RTX 4060 Ti and DeepFaceLab’s SAEHD/AMP models?

DeepFaceLab Code Analysis: If someone is familiar with DeepFaceLab code, could you help analyze the SAEHD and AMP model code, focusing on:

Model initialization.

Data loading and preprocessing.

Loss function calculation.

Gradient update.

Code related to hardware resource allocation and instruction sets.

Software Conflicts: Are there any known software conflicts specific to DeepFaceLab, especially impacting only the SAEHD and AMP models?

Dependency Issues: Any possible dependency issues that might cause this freezing behavior? I can provide a list of dependency versions using pip list.
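For anyone comparing dependency versions, the pip list output can be captured and filtered programmatically. The following is a minimal sketch, assuming it is run with the same Python interpreter DeepFaceLab uses and that pip is available to it; the function names are hypothetical.

```python
import subprocess
import sys


def parse_freeze(text):
    """Turn `name==version` lines into a {lowercase name: version} dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable or URL installs
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def installed_versions():
    """Ask pip (under the current interpreter) for installed packages."""
    output = subprocess.check_output(
        [sys.executable, "-m", "pip", "list", "--format=freeze"],
        universal_newlines=True,
    )
    return parse_freeze(output)


def report(versions, names):
    """Format one line per requested package, noting any that are missing."""
    return ["%s: %s" % (n, versions.get(n.lower(), "not installed")) for n in names]
```

For example, report(installed_versions(), ["tensorflow", "numpy"]) yields a short list that is easy to paste into a reply; the package names here are only illustrative choices.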

System Information:

GPU: RTX 4060 Ti 16GB

CPU: i7 Processor

RAM: 32GB

Operating System: Windows 11

DeepFaceLab Version: DeepFaceLab_NVIDIA_RTX3000_series

CUDA: DeepFaceLab’s built-in CUDA

Python: DeepFaceLab’s built-in Python

Additional Information:
I have tried everything listed above but the issue persists. Any insights or suggestions would be greatly appreciated.
January 2, 2025 at 5:50 am #10313
whois306
Participant
Update: Using DeepFaceLab_DirectX12, I can train models normally. That build has some limitations, but for my purposes the problem is solved.