AI copyright challenge: India’s artificial intelligence push is running ahead of its legal architecture. NITI Aayog’s roadmap on AI for Inclusive Societal Development frames AI as a productivity engine for nearly 490 million informal workers, promising gains in skilling, market access, and social protection through digital platforms. Yet the same technologies driving these ambitions—large language models and generative AI systems—are forcing policymakers to confront unresolved questions of copyright, authorship, and data use.
These questions are no longer abstract. Generative AI systems are trained on vast volumes of text, images, and audio scraped from the open internet, much of it copyrighted. As these models move rapidly into search, education, media, and enterprise software, the central policy challenge has sharpened: how can India protect creators’ rights without imposing licensing barriers that slow innovation and raise entry costs for domestic firms?
Generative AI and the economics of training data
Generative AI systems do not store or reproduce creative works in the conventional sense. They learn statistical patterns from massive datasets, converting language and images into mathematical representations. Their performance depends on two inputs: compute power and training data. It is the second that lies at the heart of the copyright debate.
Text-to-image models such as Stable Diffusion and the large language models underlying ChatGPT have been trained on billions of documents, many of them protected works. AI developers argue that training constitutes a non-expressive use and should fall outside traditional copyright restrictions. Content owners counter that large-scale scraping amounts to unauthorised reproduction, eroding the commercial value of journalism, publishing, and creative work.
The dispute has intensified because model capabilities are improving at exceptional speed, driven by ever-larger datasets and rapid cycles of retraining. As generative AI systems increasingly substitute for or complement human labour in content-heavy sectors, the economic stakes for creators and publishers have risen sharply.
Copyright law was built for humans, not machines
Most copyright regimes, including India’s, rest on the principle of human authorship. The Copyright Act, 1957 protects “original works” created by natural persons. Purely autonomous AI output, without meaningful human creative control, falls outside this framework.
This creates two legal fault lines. The first concerns training: does using copyrighted material to train a model infringe the author’s exclusive right of reproduction? The second concerns outputs: who, if anyone, owns AI-generated content?
Indian courts have not yet offered definitive guidance. In the absence of jurisprudence, policymakers are attempting to resolve these questions through administrative and regulatory mechanisms rather than case law. That choice increases the importance of getting policy design right at the outset.
India’s policy response: the DPIIT working paper
It is in this context that the Department for Promotion of Industry and Internal Trade (DPIIT) released a working paper proposing a mandatory blanket licensing and royalty framework for generative AI training. Under this approach, AI systems would be permitted to crawl publicly accessible content, while royalties would be collected through a copyright society and distributed to rights holders.
The proposal reflects debates unfolding internationally. The United States Copyright Office has launched a multi-part review addressing digital replicas, the copyrightability of AI outputs, and the legality of training practices. Courts in the United States and Europe are hearing lawsuits brought by news publishers and authors against AI developers, with outcomes that could shape global norms.
India’s choice to explore ex-ante licensing, rather than waiting for litigation to clarify boundaries, signals a preference for regulatory certainty. But certainty achieved too early can also harden inefficient outcomes.
The risk of over-correction
Protecting creators is essential. Yet a consent-heavy or inflexible licensing regime carries clear risks. Requiring permission for every training use would hand veto power to copyright holders, raising compliance costs and slowing research. It would also favour large firms with the financial capacity to negotiate licences, while smaller startups and open-source developers struggle to participate.
Even a blanket licensing model raises difficult questions. How should royalties be calculated—by volume of data scraped, by model revenue, or by downstream commercial use? How should payouts distinguish between small publishers producing original reporting and large media houses publishing at scale? Poor answers could turn licensing into a blunt instrument that taxes innovation without delivering fair compensation.
Market power, jobs, and the limits of national regulation
Copyright policy in the AI era cannot be separated from labour outcomes and market structure. Generative AI is already reshaping income streams in journalism, publishing, translation, and other content-driven services that rely heavily on freelance and informal labour. As routine cognitive tasks are automated, bargaining power shifts away from individual creators. In this setting, copyright becomes not only a question of authorship, but a mechanism for distributing AI-generated rents.
There is also a competition dimension that deserves explicit attention. Mandatory licensing regimes may inadvertently reinforce the dominance of global hyperscalers that can absorb compliance costs and negotiate at scale. Indian startups and open-source developers, by contrast, face higher relative barriers. Without parallel scrutiny from bodies such as the Competition Commission of India, copyright levies risk entrenching data moats rather than broadening access.
Enforcement further complicates matters. Many large language models are trained and deployed across borders, beyond the effective reach of domestic regulators. Unilateral licensing frameworks may struggle to capture value from offshore training pipelines, raising questions of trade compatibility and jurisdiction. In practice, India’s leverage will depend less on formal consent requirements and more on its ability to influence global norms and embed copyright principles into platform governance and market-access rules.
AI copyright: Toward a workable middle path
A durable framework must separate three issues that are often conflated. Training use should be treated differently from reproduction or substitution. Copyright protection should attach only where there is demonstrable human creative control in AI-assisted works. Compensation mechanisms must be proportionate, transparent, and predictable.
One plausible approach is a hybrid model. Publicly accessible content could remain available for training, while commercial AI deployments contribute to a revenue-linked levy. Funds could be pooled through a neutral copyright society and distributed using weighted metrics that reward originality and public-interest content, not merely scale. Such an approach preserves low entry barriers for innovation while ensuring that creators share in the economic upside of AI diffusion.
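To make the arithmetic of this hybrid model concrete, the sketch below works through a toy version of it: a small levy on commercial AI revenue is pooled, then distributed by weights that reward originality and public-interest content rather than raw volume. Every rate, weight, and name here is a hypothetical illustration, not a proposed or official parameter.

```python
# Illustrative sketch of the hybrid levy model described above.
# All rates, weights, and rights-holder names are hypothetical assumptions.

def collect_levy(ai_revenue: float, levy_rate: float = 0.02) -> float:
    """Pool a small, revenue-linked levy from commercial AI deployments."""
    return ai_revenue * levy_rate

def distribute(pool: float, rights_holders: list[dict]) -> dict[str, float]:
    """Split the pool by weighted scores, so originality and public-interest
    value count alongside the number of works, not scale alone."""
    scores = {
        rh["name"]: rh["originality"] * rh["public_interest"] * rh["works"]
        for rh in rights_holders
    }
    total = sum(scores.values())
    return {name: pool * s / total for name, s in scores.items()}

holders = [
    # A small publisher producing original reporting: few works, high weights.
    {"name": "small_publisher", "originality": 0.9, "public_interest": 0.8, "works": 1_000},
    # A large outlet publishing at scale: many works, lower weights.
    {"name": "large_aggregator", "originality": 0.3, "public_interest": 0.4, "works": 50_000},
]

pool = collect_levy(ai_revenue=1_000_000)   # 2% levy on assumed revenue
payouts = distribute(pool, holders)
```

Under a purely volume-based split the small publisher would receive under 2 per cent of the pool; with the illustrative weights its share rises severalfold, which is the distributional point the weighted-metric design is meant to capture.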
India’s AI ambitions rest on scale—of talent, data, and adoption. Over-regulation of training data would blunt this advantage. Under-protection of creators would weaken the knowledge ecosystem on which AI ultimately depends.
The objective, therefore, is not to police every act of data use, but to align incentives. Copyright policy must recognise AI as a general-purpose technology. It should be governed with a light but credible hand—protecting human creativity, rewarding original work, and keeping India’s innovation pipeline open.
The balance India strikes now will determine whether AI becomes a driver of inclusive growth, or another domain where policy lags technology.