October 31, 2023
As AI tools proliferate, researchers need to scrutinize how vendors handle their data. With that insight, researchers can make better-informed choices.
With all the buzz about the latest and greatest new AI tool, it would behoove us researchers to take a step back and ask prospective (or current!) vendors these two questions:

1. How is my data being used?
2. How is my data being protected?
The first question has received a lot of scrutiny, especially after the Zoom “Terms of Service” debacle last August, but many researchers don’t fully understand how their data is used in the context of generative AI/LLM model training. And most researchers (except those who focus on IT and the technology industry) may not be well versed in cybersecurity risks and mitigation techniques either.
So, let’s address both questions and empower you, the research community, with the information you need to decide which AI solution best assists your research studies without violating the privacy of your participants/respondents.
Question 1: How is my data being used?

This question is fundamental to maintaining data privacy and confidentiality, as well as ensuring adherence to the corresponding rules and regulations.
At a minimum, any AI solution should be GDPR- and CCPA-compliant (yes, there are many other data privacy rules and regulations out there, but these two tend to be the most stringent). Here is an easy rule of thumb for figuring out which of the myriad data privacy laws applies in your specific situation: privacy laws generally follow your participants, so if your respondents reside in the EU, GDPR applies; if they reside in California, CCPA applies; and so on. Whichever law applies in the situation, or is the most stringent if multiple ones apply, is the one the AI vendor should follow.
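To make that rule of thumb concrete, here is a minimal Python sketch of the logic. The region-to-law mapping and the “stringency” ranking are simplifying assumptions for illustration only, not legal advice:

```python
# Hypothetical helper: the region-to-law mapping and the "stringency"
# ranking below are simplifying assumptions for illustration, not legal advice.
REGION_LAWS = {
    "EU": "GDPR",
    "UK": "UK GDPR",
    "California": "CCPA/CPRA",
    "Canada": "PIPEDA",
}

# Rough stringency ranking (assumption): higher = stricter.
STRINGENCY = {"GDPR": 3, "UK GDPR": 3, "CCPA/CPRA": 2, "PIPEDA": 1}

def strictest_applicable_law(participant_regions: list[str]) -> str | None:
    """Return the most stringent law triggered by where participants reside."""
    applicable = {REGION_LAWS[r] for r in participant_regions if r in REGION_LAWS}
    return max(applicable, key=lambda law: STRINGENCY.get(law, 0), default=None)

print(strictest_applicable_law(["California", "EU"]))  # -> GDPR
```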
The other area to understand is whether the data you upload will, in turn, be used to train the AI model. Training a bespoke model (one that is either internally developed or ‘walled off’ from other users) with your data is very useful for making the model smarter about your business. But you must ensure the data isn’t also being used to train a general-purpose AI model or LLM (like OpenAI’s GPT models, Google’s Bard, or Anthropic’s Claude). Explicit assurance from the vendor is key; OpenAI, for example, addresses this directly in its terms of service, not only explaining which data is used for model training but also offering the ability to opt out.
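That assurance is worth capturing as an explicit, machine-readable setting rather than a verbal promise. The sketch below is purely hypothetical: the endpoint, field names, and header are invented placeholders showing the kind of opt-out control worth asking a vendor about, not any real vendor’s API:

```python
# Hypothetical illustration: the URL, fields, and header below are invented
# placeholders, not any real vendor's API.
import requests

resp = requests.post(
    "https://api.example-ai-vendor.com/v1/account/data-controls",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "use_customer_data_for_training": False,  # the opt-out itself
        "retention_days": 30,  # limit how long uploaded data is kept
    },
    timeout=10,
)
resp.raise_for_status()  # fail loudly if the settings weren't accepted
print(resp.json())
```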
Once you are assured of how your data is being used (or not used!), the second question needs to be addressed.
Question 2: How is my data being protected?

I spend much of my time interviewing security professionals, and this very question keeps most of them up at night.
If you think of a Venn diagram with “Cybersecurity Risks” in one circle and “Data Privacy Risks” in the other, the overlapping area is the unauthorized use of PII (personally identifiable information). This overlap is where we want our vendors taking extraordinary measures to secure and protect our data.
Cybersecurity threats center on vulnerability management: preventing ‘bad actors’ from compromising systems through hacking, ransomware, and DDoS (distributed denial-of-service) attacks. Security teams have options to mitigate these risks, such as firewalls and restricting access through advanced verification methods like multi-factor authentication. In addition to securing access, they should encrypt data both ‘in transit’ (moving between computers/servers) and ‘at rest’ (where the data is stored).
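As a concrete illustration of those two encryption modes, here is a minimal Python sketch using the widely available cryptography and requests libraries. The key handling is deliberately simplified; a real deployment would keep keys in a managed vault:

```python
# pip install cryptography requests
from cryptography.fernet import Fernet
import requests

# Encryption at rest: encrypt data before writing it to storage.
key = Fernet.generate_key()  # in production, keep this in a key vault
f = Fernet(key)              # Fernet = AES-128-CBC plus HMAC authentication
ciphertext = f.encrypt(b"participant interview transcript ...")
# ... write ciphertext to disk/object storage; only key holders can read it.
assert f.decrypt(ciphertext).startswith(b"participant")

# Encryption in transit: HTTPS gives you TLS, and requests verifies the
# server's certificate by default (verify=True).
resp = requests.get("https://example.com/api/health", timeout=10)
print(resp.status_code)
```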
Privacy threats can arise from data processing, where disparate sets of data are linked to identify an individual; that data can then be shared without consent. Privacy is also put at risk when companies deviate from security and data best practices, standards, and regulations. To mitigate these threats, any AI solution should ideally deidentify, anonymize, or pseudonymize any PII. In addition, we should upload only the bare minimum of data necessary to complete our analysis.
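For illustration, here is a minimal Python sketch of pseudonymizing and redacting data before upload. The secret key, helper names, and regex are assumptions for demonstration; real anonymization of research data needs far more care (e.g., handling quasi-identifiers):

```python
import hashlib
import hmac
import re

# Assumption: in practice this key lives in a secrets manager, not source code.
SECRET_KEY = b"rotate-me-and-store-me-securely"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable, keyed pseudonym."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()
    return f"participant_{digest[:10]}"

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> str:
    """Strip email addresses from free-text responses before upload."""
    return EMAIL_RE.sub("[EMAIL REDACTED]", text)

# The same input always maps to the same pseudonym, so analysis stays
# linkable across responses without exposing the underlying identity.
print(pseudonymize("jane.doe@example.com"))
print(redact_emails("You can reach me at jane.doe@example.com anytime."))
```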
Ideally, the AI solution should be built with privacy-enhancing technologies, and, in a perfect world, it would have been designed from the beginning with privacy in mind rather than having privacy bolted onto a solution that already exists.
When our data is well protected, we can be assured of confidentiality, integrity, and availability as well as privacy, letting us and our stakeholders/clients rest easy.
So, as you evaluate your next AI solution (or any software-as-a-service offering), ask your prospective vendor the two simple questions outlined in this article. And if you are already using an AI tool, be sure to ask your solution provider how it currently uses and protects your data.