In the previous post, I demonstrated how to obtain the complete file list of a dataset without downloading the actual files. In this post, I will illustrate how to retrieve the full file list of a model. The solution is nearly identical, but the code to write the URLs into a file differs slightly.
This time, I will use the “snapshot_download” function to download the model. Here is an example:
from huggingface_hub import snapshot_download


def loadModels():
    # Optional: route requests through a local proxy (adjust or remove as needed)
    proxies = {
        "http": "http://127.0.0.1:7890",
        "https": "http://127.0.0.1:7890",
    }
    # Placeholder: replace with your own Hugging Face access token
    token = "hf_..."
    snapshot_download(repo_id="Undi95/dbrx-base", token=token, proxies=proxies)


loadModels()
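Line numbers inside “snapshot_download” change between huggingface_hub releases, so it can help to locate the installed source file and the relevant lines programmatically. A small sketch, assuming the installed version still uses the names “filtered_repo_files” and “HF_HUB_ENABLE_HF_TRANSFER” (other releases may not); “matching_lines” is a hypothetical helper. “inspect.unwrap” is applied because “snapshot_download” may be wrapped by a decorator, in which case its code object would point at the decorator's file instead.

```python
import inspect
from typing import List


def matching_lines(source: str, needle: str) -> List[int]:
    """Return the 1-based line numbers of `source` that contain `needle`."""
    return [i for i, line in enumerate(source.splitlines(), start=1) if needle in line]


if __name__ == "__main__":
    from huggingface_hub import snapshot_download

    # Follow __wrapped__ (set by functools.wraps) back to the original function.
    path = inspect.getsourcefile(inspect.unwrap(snapshot_download))
    with open(path, encoding="utf-8") as f:
        source = f.read()
    print(path)
    for needle in ("filtered_repo_files", "HF_HUB_ENABLE_HF_TRANSFER"):
        print(needle, "->", matching_lines(source, needle))
```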
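Inside “snapshot_download” (defined in huggingface_hub/_snapshot_download.py), the repository's file names are retrieved and stored in the “filtered_repo_files” list after any allow_patterns/ignore_patterns filters are applied. In the version I used, this happens at line 261; the number varies between releases. I then iterate through “filtered_repo_files” and write a download URL for each file into a text file before any file is downloaded.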
# e.g. "Undi95/dbrx-base" -> "dbrx-base.txt"
fileName = repo_id.split("/")[1].lower()
with open(f"{fileName}.txt", mode="w", encoding="utf-8") as f:
    for file in filtered_repo_files:
        # Assumes the default "main" revision
        ds = f"https://huggingface.co/{repo_id}/resolve/main/{file}?download=true"
        f.write(ds + "\n")
# Search for HF_HUB_ENABLE_HF_TRANSFER and paste the code above just before the following lines:
if HF_HUB_ENABLE_HF_TRANSFER:
    ......
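If patching the installed library feels too intrusive, huggingface_hub also provides a public “list_repo_files” function that returns a repository's file names without downloading them. Below is a minimal sketch of that patch-free route, assuming a huggingface_hub version whose “list_repo_files” accepts “revision” and “token” keyword arguments; “resolve_url” and “save_model_url_list” are hypothetical helper names. Unlike the patch above, it URL-quotes file names and the revision. No proxy is passed explicitly; I would expect the standard HTTPS_PROXY environment variable to be honored by the underlying HTTP client, but that is an assumption.

```python
from urllib.parse import quote


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a download URL in the same format the patched code writes."""
    # Quote the file name (keeping "/") so names with spaces or special
    # characters still give valid URLs; files in subfolders keep their path.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}?download=true"
    )


def save_model_url_list(repo_id: str, token=None, revision: str = "main") -> str:
    """List a model repo's files without downloading them; write one URL per line."""
    # Imported here so resolve_url() works even without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    files = list_repo_files(repo_id, revision=revision, token=token)
    out_name = f"{repo_id.split('/')[1].lower()}.txt"
    with open(out_name, mode="w", encoding="utf-8") as f:
        for file in files:
            f.write(resolve_url(repo_id, file, revision) + "\n")
    return out_name


if __name__ == "__main__":
    # Pass token="hf_..." if the repository requires authentication.
    print(save_model_url_list("Undi95/dbrx-base"))
```

Running it writes dbrx-base.txt next to the script, with the same URL layout as the patched version.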