In the previous post, I demonstrated how to obtain the complete file list of a dataset without downloading the actual files. In this post, I will illustrate how to retrieve the full file list of a model. The solution is nearly identical, but the code to write the URLs into a file differs slightly.

This time, I will use the “snapshot_download” function to download the model. Here is an example:

from huggingface_hub import snapshot_download

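def loadModels():
    # Route traffic through a local proxy; drop this argument if you do not need one.
    proxies={
        "http": "http://127.0.0.1:7890",
        "https": "http://127.0.0.1:7890"
    }

    # Placeholder: use your own access token and never publish a real one.
    token="hf_xxx"
    snapshot_download(repo_id="Undi95/dbrx-base", token=token, proxies=proxies)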

Inside the “snapshot_download” function, the repository's file names are collected in the “filtered_repo_files” list (line 261 in the version I used; the line number may differ in other releases). To write the URLs into a file before any file is downloaded, I iterate over “filtered_repo_files”:

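    # Use the last path segment so repo ids without a namespace also work.
    fileName = repo_id.split("/")[-1].lower()
    with open(f"{fileName}.txt", mode="w", encoding="utf-8") as f:
        for file in filtered_repo_files:
            # One download URL per line; "main" is the default branch.
            ds = f"https://huggingface.co/{repo_id}/resolve/main/{file}?download=true"
            f.write(ds + "\n")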

# Search for HF_HUB_ENABLE_HF_TRANSFER in the library source and paste the code above just before the following line:
if HF_HUB_ENABLE_HF_TRANSFER:
......
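
The URL-writing step can also be tried on its own, without touching the library source. Below is a minimal sketch, assuming you already have a list of file names; the helpers “build_resolve_urls” and “write_url_list” and the sample file names are hypothetical and only serve as illustration. It percent-encodes file names and lets you pick a revision other than “main”.

```python
import os
import tempfile
from urllib.parse import quote


def build_resolve_urls(repo_id, filenames, revision="main"):
    """Return one download URL per file name (hypothetical helper)."""
    base = f"https://huggingface.co/{repo_id}/resolve/{quote(revision, safe='')}"
    # quote() keeps "/" by default, so sub-folder paths stay intact.
    return [f"{base}/{quote(name)}?download=true" for name in filenames]


def write_url_list(repo_id, filenames, out_dir=".", revision="main"):
    """Write the URLs to <repo name>.txt, one per line, and return the path."""
    # [-1] also works for repo ids without a namespace, such as "gpt2".
    file_name = repo_id.split("/")[-1].lower()
    path = os.path.join(out_dir, f"{file_name}.txt")
    with open(path, mode="w", encoding="utf-8") as f:
        for url in build_resolve_urls(repo_id, filenames, revision):
            f.write(url + "\n")
    return path


if __name__ == "__main__":
    # Made-up file names for illustration; not the real repository listing.
    sample_files = ["config.json", "weights/part 1.bin"]
    with tempfile.TemporaryDirectory() as tmp:
        out = write_url_list("Undi95/dbrx-base", sample_files, out_dir=tmp)
        with open(out, encoding="utf-8") as f:
            print(f.read(), end="")
```

The resulting text file can then be handed to any download manager that accepts a list of URLs.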