When it comes to privacy on the internet, people often push back against the biggest innovators and advancements because of the bad rap they get on privacy issues. Machine learning is no exception. People worry that their data will be processed with enough sophistication to predict their future behavior.
Current State of Privacy in Machine Learning
Machine learning is a tool for processing large amounts of data and creating models that make decisions with minimal human intervention. Privacy and machine learning are at odds because training these systems requires access to data that is sometimes private. One of the goals of machine learning is to observe a population and, based on those observations, predict the future actions of the population or of individuals within it.
Once artificial intelligence systems are applied in ways that greatly improve how internet-connected tools work for us, people’s concerns will likely subside. Still, it’s easy to imagine how these systems could get out of control. The general public has very little understanding of how these systems work, and the companies that are driving machine learning forward aren’t giving people much transparency into their processes. Machine learning and artificial intelligence systems don’t rely on any information that Google or Facebook isn’t already collecting; they just improve the systems that analyze that data.
Anonymizing Data
As data comes into a system in its raw form, it may contain unique identifiers like people’s names or email addresses. The best systems will remove any identifiable information, which is more than just people’s names. According to a study by Philippe Golle of the Palo Alto Research Center, 63 percent of the US population is uniquely identifiable from their gender, ZIP code and date of birth.
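To make the point concrete, here is a minimal sketch of how someone might check a data set for this kind of re-identification risk. It counts how many records are the only ones with their combination of gender, ZIP code and date of birth, even after names have been removed. The function name `unique_fraction`, the field names, and the toy records are all made up for illustration; they do not come from the study.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    # Build one tuple per record from the chosen fields.
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    counts = Counter(keys)
    # A record is "unique" if no other record shares its combination.
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Toy, invented records with names already stripped out.
records = [
    {"gender": "F", "zip": "94304", "dob": "1980-02-14"},
    {"gender": "F", "zip": "94304", "dob": "1980-02-14"},
    {"gender": "M", "zip": "94304", "dob": "1975-07-01"},
    {"gender": "F", "zip": "10001", "dob": "1990-11-30"},
]

print(unique_fraction(records, ["gender", "zip", "dob"]))  # 0.5
print(unique_fraction(records, ["gender"]))                # 0.25
```

In this toy data, half of the "anonymous" records can still be singled out by the three fields together, while gender alone singles out far fewer. A real check would run the same count over the full data set before deciding which fields to drop or generalize.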
Anyone who is deciding how to anonymize their data sets needs to consider that stripping out the basic identification data points isn’t always effective. Apple’s use of differential privacy is one of the more well-known applications of data anonymization. Apple collects information from iPhone users about which features they use most often, but it doesn’t store information about who the users are or any information that could be linked back to their device. That is because Apple isn’t interested in specific demographic information, but in data about how people in general use their devices.
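The general idea behind this kind of collection can be sketched with randomized response, a classic textbook technique related to differential privacy. This is not Apple’s actual algorithm, just a simplified illustration: each device adds random noise to its own answer before reporting it, so no single report can be trusted, yet the overall usage rate can still be estimated. The function names, the 30 percent usage rate, and the 0.5 truth probability are assumptions chosen for the example.

```python
import random


def randomized_response(truth, p_truth, rng):
    """Report the true answer with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_rate(reports, p_truth):
    """Recover the population rate from noisy reports.

    Expected observed rate = p_truth * true_rate + (1 - p_truth) * 0.5,
    so we solve that equation for true_rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical population: 3,000 of 10,000 users use a given feature.
true_bits = [i < 3000 for i in range(10000)]
reports = [randomized_response(bit, 0.5, rng) for bit in true_bits]

# The estimate should land close to the real 30 percent usage rate.
print(round(estimate_rate(reports, 0.5), 2))
```

Any individual report here is deniable, because it may simply be the result of a coin flip. The trade-off is that lowering `p_truth` protects individuals more but requires more reports to get an accurate estimate.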
Read More: Learning with Privacy at Scale – Apple
Anonymization Doesn’t Slow Innovation
Some companies that collect and process data may argue that if they are required to anonymize any data that they use, innovation and progress will be hindered. This simply isn’t true. In most cases, someone’s name and contact information aren’t useful to the process the system is analyzing.
One case where anonymization may hinder progress is for companies that rely on the behavior of specific users. For example, a company like Amazon may need to know what you’ve purchased in the past to make product recommendations that are relevant to you. If privacy regulations are put in place that prevent this type of tracking of specific users, these types of companies will suffer.
For big data, machine learning, and artificial intelligence as a whole, there are ways to deliver effective results without blatantly violating people’s privacy.
Why Does Machine Learning Amplify The Issue of Data Privacy?
Machine learning is an automated use of massive amounts of data. It relies on data sets which may include private or sensitive information. Privacy advocates argue that machine learning is the ultimate privacy breach. It concerns these people because it doesn’t just give companies access to your data; it allows them to use your private information to drive their decision making.
Privacy Challenges with Machine Learning
Artificial intelligence, which is driven by machine learning, isn’t always “set up” to tolerate data that has been stripped of unique identifiers. However, this is an issue created by the people who have made these systems. If these A.I. systems were built with privacy by design, this wouldn’t be an issue. For systems to stand the test of time, including new privacy laws, they should integrate data anonymization and privacy protection from the start.
The Future of Privacy and Machine Learning
As privacy laws improve and place more limits on which information companies can collect and how they process it, machine learning and artificial intelligence will have to adapt to fit current privacy norms. As the internet shifts as a result of stricter privacy laws, the technologies that drive the internet will shift as well. We can see a future where systems using machine learning won’t have to put people’s privacy at risk to deliver the same results. The smart people who have made these amazing systems are totally capable of creating a privacy-friendly version of artificial intelligence.
Read More: Big Data vs. Privacy