Artificial intelligence and data have a symbiotic relationship. Both data and AI need each other to survive and thrive. AI leverages data to function, and data relies on AI for effective analyses.
While this relationship is defined by a simple transaction, there are complex implications to consider. In the era of data privacy, it is crucial to understand and adhere to the principles of intellectual property. At the beginning of every ML project, it’s necessary to identify what assets need to be safeguarded. While some developers believe that copyright laws present barriers to the progression of AI, data protection and legal ownership are integral to the development of AI into the mainstream. As the metaverse becomes more and more data-centric, patented algorithms will be the way of the future. In this article, we will look at how to identify IP properties within any machine learning project.
Why do we need IP protection for AI?
Before undergoing the legal pursuit of protecting an algorithm, it’s worth knowing why we should patent AI projects. The answer is simple. Without intellectual property rights, developers would feel discouraged from developing new technologies. Training datasets, machine learning algorithms, software, and output all require different degrees of protection to uphold industry integrity. Patented algorithms are also key to the monetisation of artificial intelligence projects. Without the ability to assign legal ownership of an ML algorithm, investors may feel less inclined to finance a project.
There is no industry-wide consensus about the role of IP in AI, however. Many argue that the prevalence of patent protection for AI projects could harm innovation and competition. Some argue that AI advances could lower the cost of innovation, resulting in a large number of patents. These patents may then be held only by a few industry heavyweights who have access to the best technology and data.
Whatever side of the fence you find yourself on, some level of protection is needed to comply with industry standards.
How does it work?
In Ireland, intellectual property law protects ownership of AI. However, The Department of Enterprise, Trade, and Employment noted the current regulatory gaps in the National AI strategy. We expect a shift in the level of protection available to developers in coming years on a global scale as the World Intellectual Property Organisation continues to develop sanctions around AI development.
Artificial Intelligence can be protected by patents, copyrights, or as trade secrets:
- Patent: Patented algorithms exist in a legal grey area. To put it simply, patents protect people. An inventor of an algorithm can own a patent and therefore protect their project. However, work developed by an AI is technically outside of patent protection.
- Copyright: Copyrights are simpler and protect a wide variety of AI applications. Section 21(f) of the Copyrights Act ensures protection over computer-generated works.
- Trade Secrets: Trade secrets provide unique protection far exceeding the limits of a simple patent. Trade secrets are applied to more valuable projects that require more resources for protection.
Best Practice for identifying intellectual property in ML projects
It’s a common misconception that AI projects cannot be patented. In reality, IP exists at every step of an AI project lifecycle. The IP regulations that exist within any given ML project are innumerable. Since AI is notoriously indefinable and ‘unprotectable’ patent thicketing (the process by which a multi-layered patent system is introduced) is becoming increasingly common.
To meet IP compliance standards, developers need to know how to effectively classify each process within an ML project into a sect of IP. Generally speaking, the more significant and unique a process is, the more protection it will require. We’re going to take you through the four phases within every ML project that require IP identification:
Phase 1: Setup
The setup phase comprises collecting, classifying, and labelling data. It is made up four IP components; data, annotation protocol, labels, and label taxonomy.
- Data: Data is the core property of any ML project. Each data point represents the atomic unit used to build a machine learning-powered application. In terms of identifying intellectual property at this stage, aggregated data can be protected by copyright laws.
- Annotation Protocol: Annotation protocol encompasses a set of instructions used by human annotators to perform a data annotation task. The main purpose of such protocol is to “train” human annotators to correctly assign labels to data. This unique methodology can be copyrighted.
- Labels: Labels are a list of metadata tags required to perform data annotation. To comply with IP regulations, labels should be copyrighted.
- Label Taxonomy: An organised structure of labels typically represented by a graph where each label is a node linking to one or multiple nodes. Copyright laws protect this structure as it is unique to the project.
Phase 2: Models Training
During the models training phase, there are three main areas that necessitate IP protection:
- Labels metadata: Labels metadata constitutes the entire ensemble of metadata labels and their corresponding data samples. As such, labels metadata can be proprietary, and qualifiable for copyright protection.
- Machine Learning Models: Trained machine learning models are “serialised” self-contained files. Such files are trained to recognise specific patterns while used in conjunction with the inference software interface. The models are typically trained using a training software interface. This interface loads data and corresponding labelled data, for development. Third-party machine learning software libraries are what generate the machine learning models. Unique machine learning models can be protected by trade secrets.
- Active learning workflow: The active learning workflow is an algorithm that can proactively dispatch curation jobs according to certain criteria. Effectively following IP guidelines means patenting the active learning workflow algorithm.
Phase 3: Models Exploitation
The third phase within an ML project sees the models enter into a software pipeline. The pipeline is flooded with data to create the desired output. We have identified four elements within the models exploitation phase that require IP protection:
- ML Output: ML output is made up of the desired results from a trained algorithm. Results are often difficult to define and similarly can be difficult to protect.
- Inference software interface: The interference software interface can be copyrighted as it is defined as a proprietary computer-based work. This interface describes the computer program that loads machine learning models, third-party machine learning software libraries, and newly generated data to produce an ML output.
- ML Pipelines: Machine Learning Pipelines are the mechanism by which data is processed. ML pipelines can be patented for protection and ownership.
- Training software interface: Copyrights can be applied to source code and thus are the option we deploy for the training software interface.
Phase 4: Deployment
The deployment stage sees the integration of ML models into a production environment. There are five steps within the deployment stage before the ML life cycle can be fully realised.
- Web applications: Web application software runs on a web server, unlike computer-based software programs that are run locally on an operating system. However, this does not mean web applications are defenseless. We apply patents to our web applications for IP compliance and protection.
- ML APIs: ML API’s constitute a type of software interface, offering a service to other pieces of software. Typically, they encapsulate an ML Pipeline (patented) and they are made available on the web through a deployment infrastructure. Copyrights can protect ML APIs.
- Third-party Machine learning Software Libraries: Data stores consisting of common algorithms and utilities to train, test, optimise, load, run and update machine Learning Models. Patents protect and uphold the intellectual property of third-party libraries.
- Edge Applications: Edge applications are a distributed computing paradigm that brings computation and data storage closer to data sources (users). These applications can be patented.
- Deployment infrastructure: Ths infrastructure typically comprises ML Pipelines deployed in the cloud using specific tools made available by cloud service providers. The unique infrastructure used in an ML project meets the credentials for copyright certification.
Outside of these four phases, developers must consider the relevancy of background IP. Background IP encompassses all the work completed prior to or separately from the specific contract that may be used in the project. Background IP is important for developers to consider when ascertaining ownership of a project. Essentially, background IP makes it easy to assign proprietorship within an ML project, recognising both past and present contributions.
The proliferation of artificial intelligence is undeniable. Soon the world as we know it will evolve into an AI-enabled ecosystem. As it evolves, so will the value of legal protections and intellectual property. Identifying key intellectual properties in an ML project is paramount in the upkeep of transparency and integrity in AI.
Author: Johanna Walsh