PartnerinAI

Anonymous data upload for ACL submission: a practical guide

Understand anonymous data upload for ACL submission, EMNLP artifact rules, hosting options, and metadata risks before you share models.

📅 May 23, 2026 · 8 min read · 📝 1,664 words

⚡ Quick Answer

Anonymous data upload for ACL submission usually means authors must avoid deliberate deanonymization while still providing reviewers access to replication artifacts. The safest approach is to use a host that supports private or neutral sharing, scrub metadata carefully, and follow the venue’s stated anonymity rules rather than guessing.

Anonymous data upload for ACL submission sounds easy until you attempt it. You need reproducibility and you need reviewer access. You also need to avoid leaking your lab, company, username, or account trail through small signals the call for papers rarely spells out. That gap between policy language and day-to-day artifact handling is where many careful submissions start to wobble.

What does anonymous data upload for ACL submission actually require?

Anonymous data upload for ACL submission usually means keeping identifying details out of view while still making artifacts available to reviewers when the venue asks for them. ACL-style venues, including EMNLP, tend to frame anonymity around author behavior. They do not assume an imaginary world where no metadata exists anywhere online. That distinction matters. A hosting platform may keep internal logs, but that does not mean an author revealed their identity through a public profile, model card, repository history, or branded storage link. ACL's reviewing policies and conference handbooks have long stressed double-blind review norms, and artifact instructions usually focus on preventing reviewer-facing deanonymization. We'd argue that's the right lens. The practical question isn't whether a platform could technically track downloads. It's whether your setup hands reviewers obvious clues that point straight back to you, such as a public S3 bucket named after your lab.
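The function below is a minimal sketch of that reviewer-facing check. It flags a share link whose hostname, path, or query contains identifiers you list yourself, such as lab acronyms, usernames, or bucket names. Several pieces are illustrative assumptions and come from no venue's policy: the function name, the identifier list, and the institutional-suffix heuristic. A clean result does not prove a link is anonymous.

```python
from urllib.parse import urlparse

# Hypothetical heuristic: hostnames ending in these suffixes often name an institution.
INSTITUTIONAL_SUFFIXES = (".edu", ".ac.uk", ".ac.jp")


def url_identity_hits(url: str, identifiers: list[str]) -> list[str]:
    """Return reasons a reviewer-facing URL might point back to the authors."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Only the parts a reviewer actually sees: host, path, and query string.
    visible = f"{host}{parsed.path}?{parsed.query}".lower()
    hits = [ident for ident in identifiers if ident.lower() in visible]
    if host.endswith(INSTITUTIONAL_SUFFIXES):
        hits.append(f"institutional host: {host}")
    return hits


if __name__ == "__main__":
    # A bucket named after a lab is flagged; prints ['smithlab']
    print(url_identity_hits(
        "https://smithlab-data.s3.amazonaws.com/release/v1.tar.gz",
        ["smithlab", "jdoe"],
    ))
```

The check is substring-based on purpose, so "smithlab" also matches "smithlab-data". Extra false alarms are cheaper than a missed bucket name.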

Is Hugging Face anonymous submission policy compatible with ACL and EMNLP rules?

Hugging Face can fit anonymous submission rules, but only if you strip out visible identity cues and check the venue's latest artifact guidance. Researchers like it because reviewers can access models and datasets quickly, and the tooling is familiar across NLP work. But several surfaces can expose authorship in seconds: a standard account page, an organization name, model card wording, linked papers, and commit metadata. The platform itself isn't the problem; your configuration is. If you use Hugging Face, take these steps:

- Create a neutral account.
- Avoid self-identifying descriptions.
- Inspect model card metadata line by line.
- Confirm whether access statistics are visible to anyone besides the account owner.

Many researchers fixate on hypothetical backend tracking and ignore the public breadcrumbs they leave all over the artifact. A model card that says "built from our Berkeley pipeline" gives the game away immediately.
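A rough line-by-line pass over a model card's README text can catch the most common cues before upload. This sketch assumes the card is available as a local string. The phrase patterns and the `scan_model_card` name are hypothetical. The scan only flags lines for a human to review; it does not decide what is acceptable.

```python
import re

# Hypothetical patterns for wording that tends to point back to the authors.
SELF_REFERENCE = [
    ("first-person 'our'", r"\bour\b"),
    ("'we previously'", r"\bwe previously\b"),
]
URL_PATTERN = re.compile(r"https?://\S+")


def scan_model_card(card_text: str, identifiers: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for model card lines worth a second look."""
    findings = []
    for lineno, line in enumerate(card_text.splitlines(), start=1):
        lowered = line.lower()
        # Names, lab acronyms, handles, or institutions supplied by the authors.
        for ident in identifiers:
            if ident.lower() in lowered:
                findings.append((lineno, f"identifier: {ident}"))
        # Self-referential phrasing such as "our previous work".
        for label, pattern in SELF_REFERENCE:
            if re.search(pattern, lowered):
                findings.append((lineno, f"self-reference: {label}"))
        # Every link, including ones in YAML front matter, is flagged for manual review.
        for url in URL_PATTERN.findall(line):
            findings.append((lineno, f"link to check: {url}"))
    return findings
```

The scan deliberately over-reports. Every "our" and every URL gets flagged, because a false alarm costs seconds while a missed link can cost the blind review.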

How to share replication data anonymously across hosting options

The safest way to share replication data anonymously is to pick a platform based on three things: reviewer access, metadata exposure, and what you'll need if the paper is accepted.

- **OSF** can work well because it supports view-only links, with an option to anonymize contributor names, and has a long academic track record. You still need to inspect project settings carefully.
- **Anonymous GitHub mirrors** can work for code. Commit history, issue templates, and default account links can leak identity if you rush the setup.
- **Institutional storage** may offer neutral links, yet it sometimes exposes university branding in the URL or access page.
- **Zenodo** is strong for final archival release. It usually fits the camera-ready stage better than blind review unless you configure it with real care.

No platform becomes anonymous by accident. Weigh convenience against discoverability for each option, because a sloppy "good enough" upload can undo months of careful blind writing. A rushed GitHub mirror with an old avatar still attached is a classic own goal.
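The trade-offs above can be written down as a tiny decision helper so a co-author can see why a host was picked. This hypothetical sketch simply encodes this article's suggestions. The artifact categories, host labels, and check lists are assumptions, and the venue's current artifact policy should override any of them.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    kind: str   # hypothetical categories: "model", "code", or "bundle"
    stage: str  # "review" (blind) or "camera_ready"


def suggest_host(artifact: Artifact) -> tuple[str, list[str]]:
    """Return a suggested host and the exposure points to check before sharing."""
    if artifact.stage == "camera_ready":
        # Final release: move models to a lab account, archive everything else on Zenodo.
        if artifact.kind == "model":
            return "Hugging Face (lab account)", [
                "transfer or re-upload from the neutral account", "update the model card",
            ]
        return "Zenodo", ["restore author names", "mint a DOI", "link the final paper"]
    if artifact.kind == "model":
        return "Hugging Face (neutral account)", [
            "account name and avatar", "model card wording", "commit metadata",
            "who can see access statistics",
        ]
    if artifact.kind == "code":
        return "anonymized GitHub mirror", [
            "fresh commit history", "issue templates", "profile links",
        ]
    if artifact.kind == "bundle":
        return "OSF view-only link", [
            "anonymize option on the link", "project settings", "file names",
        ]
    raise ValueError(f"unknown artifact kind: {artifact.kind!r}")
```

Raising on unknown kinds, rather than falling back to a default host, forces the team to make an explicit choice for anything unusual.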

EMNLP anonymous dataset hosting checklist: where metadata leaks happen

EMNLP anonymous dataset hosting usually fails through metadata leaks, not through the file payload itself. Common leak points include:

- Filenames that include lab acronyms.
- README text that mentions an earlier project page.
- Model cards that cite "our previous work" and link to a personal site.
- File headers that carry usernames left behind by training scripts or archiving tools.
- PDF documentation that exposes author names in its document properties.
- Hugging Face and GitHub pages that reveal profile images, follower links, or organization affiliations.

The Association for Computational Linguistics has repeatedly stressed anonymization discipline in submission materials, and artifacts deserve the same scrutiny. Treat your artifact as a forensic surface: if a reviewer can infer your identity with one click, the setup wasn't anonymous enough. A stray username in a tarball header can do more damage than anything in the dataset itself.
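Below is a minimal sketch of an automated pre-upload sweep. It assumes the artifact sits in one local directory and that you supply your own list of identifiers. It checks three of the leak surfaces listed above: file and directory names, text files, and the owner fields stored in tar member headers. It does not read PDF document properties, which need a separate PDF library, or anything on the hosting page itself. `scan_artifact_dir` and `TEXT_SUFFIXES` are illustrative names.

```python
import tarfile
from pathlib import Path

# File types read as text; extend this for your own artifact (illustrative choice).
TEXT_SUFFIXES = {".md", ".txt", ".json", ".yaml", ".yml", ".py", ".cfg", ".ipynb", ".sh"}


def scan_artifact_dir(root: str, identifiers: list[str]) -> list[str]:
    """Return human-readable findings where an identifier shows up in the artifact."""
    needles = [ident.lower() for ident in identifiers]
    findings = []
    for path in sorted(Path(root).rglob("*")):
        rel = path.relative_to(root).as_posix()
        # 1. File and directory names, e.g. a lab acronym in "smithlab_train.py".
        if any(n in path.name.lower() for n in needles):
            findings.append(f"name: {rel}")
        if not path.is_file():
            continue
        if path.suffix.lower() in TEXT_SUFFIXES:
            # 2. Text content: README files, configs, scripts, notebooks.
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            findings.extend(f"content: {rel} mentions {n!r}" for n in needles if n in text)
        elif tarfile.is_tarfile(str(path)):
            # 3. Tar member headers record the owner and group name of each archived file.
            with tarfile.open(str(path)) as tar:
                for member in tar.getmembers():
                    owner = f"{member.uname} {member.gname}".lower()
                    if any(n in owner for n in needles):
                        findings.append(
                            f"tar owner: {rel}:{member.name} ({member.uname}/{member.gname})"
                        )
    return findings
```

Zip archives and binary formats other than tar pass through unchecked in this sketch. Treat an empty result as "nothing obvious found", not as proof of anonymity.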

Step-by-Step Guide

  1. Read the venue artifact policy first

    Start with the specific ACL, EMNLP, or workshop instructions for the current cycle. Policies change, and supplementary material rules often differ from archival artifact rules. Don’t rely on Reddit threads from two years ago when the call for papers is the controlling document.

  2. Choose a neutral hosting platform

    Pick the host that matches your artifact type and anonymity needs. Hugging Face works well for models, OSF works well for project bundles, and institutional storage can work if the links are truly neutral. Choose based on reviewer usability and metadata exposure, not habit.

  3. Scrub visible identifiers

    Remove author names, lab names, organization handles, profile photos, branded URLs, and self-referential model card language. Check file properties, Git history, notebook metadata, and dataset documentation too. Small leaks are still leaks.

  4. Test the artifact as an outsider

    Open the link in a private browser session and click through every visible page. Ask a colleague who is not on the project to inspect the artifact for identity signals. If they can guess the authors, reviewers probably can too.

  5. Document replication without self-reference

    Write clear instructions for running the artifact, but avoid phrases like “our lab server” or links to your personal site. Neutral wording still supports reproducibility. In fact, clearer and more generic instructions often work better for reviewers.

  6. Plan the camera-ready handoff

    Decide in advance how the anonymous artifact will become the final public release if the paper is accepted. That may mean moving from a neutral account to a lab account, publishing on Zenodo, or attaching a DOI later. Planning early prevents rushed edits that break reproducibility.
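Parts of the scrubbing in step 3 can be automated. The sketch below assumes Python's standard `tarfile` and `json` modules and a Jupyter notebook stored as nbformat 4 JSON. Several details are assumptions: the helper names, the `"anonymous"` owner string, and the choice to keep only `kernelspec` and `language_info` in notebook metadata. It does not touch Git history, which needs a separate fresh-history export.

```python
import json
import tarfile


def anonymize_member(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """TarFile.add filter: overwrite owner fields copied from the local account."""
    info.uid = info.gid = 0
    info.uname = info.gname = "anonymous"  # placeholder owner string (an assumption)
    return info


def build_anonymous_tarball(src_dir: str, out_path: str) -> None:
    """Archive src_dir under a neutral top-level name with scrubbed owner fields."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname="artifact", filter=anonymize_member)


def strip_notebook_metadata(nb_path: str) -> None:
    """Drop notebook/cell metadata and code outputs, which can embed paths and usernames."""
    with open(nb_path, encoding="utf-8") as f:
        nb = json.load(f)
    # Keep only what a reviewer needs to pick a kernel; kernelspec display
    # names can still carry an environment name, so check them by hand.
    keep = ("kernelspec", "language_info")
    nb["metadata"] = {k: v for k, v in nb.get("metadata", {}).items() if k in keep}
    for cell in nb.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    with open(nb_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Clearing outputs means reviewers must re-run the notebook to see results, so say so in the replication instructions from step 5.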

Key Statistics

ACL 2024 received thousands of submissions across its main conference and workshops, increasing pressure on reproducibility workflows. High submission volume means reviewers need artifact access that is simple and consistent, which makes platform choice and anonymity hygiene more consequential.
Hugging Face reported more than 1 million models hosted on its platform in 2024. That scale explains why researchers gravitate toward Hugging Face for anonymous model sharing, even though account configuration can expose identity.
OSF has served millions of files for research sharing across disciplines, making it a common choice for view-only academic artifact access. Its academic roots make OSF a credible option when authors want neutral project hosting without leaning on a personal code profile.
A 2023 Nature survey on research reproducibility found many scientists still struggle with access and documentation gaps in shared materials. Anonymous upload decisions should preserve reviewer usability, because a perfectly anonymous artifact is pointless if no one can replicate the paper.

Key Takeaways

  • Anonymous hosting is about avoiding identity leaks, not pretending platforms keep zero logs
  • Hugging Face may be acceptable if authors remove obvious identifying metadata
  • ACL and EMNLP rules usually care most about deliberate deanonymization by authors
  • A simple risk checklist beats vague advice when uploading datasets or models
  • Plan the anonymous upload and camera-ready transition at the same time