Ethical Data Collection in the Digital Age

25 March 2024

In 2022, a lawsuit was filed against Amazon for illegal voice data collection via the Alexa virtual assistant. This case illustrates the growing importance of ethical data collection. In an era when data misuse and data breaches are common and can do severe reputational damage to a company, an ethical approach to data collection is a necessity, not an option. It’s more than a matter of legal compliance or measures against misuse of sensitive information. It’s about corporate responsibility and customer trust, as is evident from Amazon’s case, in which the company faced privacy charges, worker walkouts, and strong public criticism.

Being experts in data collection and analysis, we decided to publish this article to walk you through the jungle of ethics in data collection and explain its key principles, regulatory framework, and best practices.

Key principles of ethical data collection

As more businesses adopt a data-centric approach, they need to consider the ethical aspects of their data practices, including data collection. Below we cover five key principles of ethical data collection, although there are more:


1. Consent

Informed consent is probably the most important principle when it comes to the ethics of data collection. From 2008 to 2010, Google Street View cars gathered email and password data from unencrypted Wi-Fi networks. Even though Google did this unintentionally while capturing images, the situation drew a slew of criticism because consent was absent.

As part of the agreement between the data owner and the business, consent must be obtained before the actual data collection starts. It’s also critical to word the request in clear and understandable language, so the data subject knows what they’re granting consent for.
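
To make the "consent first" rule concrete, here is a minimal Python sketch of a gate that refuses to collect data unless consent for that exact purpose is on record. The names `ConsentRecord`, `ConsentRequired`, and `collect`, as well as the purpose labels, are our own hypothetical choices for illustration, not part of any standard or library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Hypothetical record of what a data subject agreed to, and when."""
    subject_id: str
    purposes: set = field(default_factory=set)
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ConsentRequired(Exception):
    """Raised when collection is attempted without matching consent."""


def collect(subject_id, purpose, payload, consents):
    """Return the payload for storage only if consent covers this purpose."""
    record = consents.get(subject_id)
    if record is None or purpose not in record.purposes:
        raise ConsentRequired(f"no consent from {subject_id!r} for {purpose!r}")
    return {"subject_id": subject_id, "purpose": purpose, "data": payload}


# Illustrative data: user-1 agreed to order fulfilment only.
consents = {"user-1": ConsentRecord("user-1", {"order_fulfilment"})}
stored = collect("user-1", "order_fulfilment", {"city": "Lviv"}, consents)
```

Calling `collect("user-1", "marketing", {}, consents)` raises `ConsentRequired`, because consent is tied to a purpose rather than granted once for everything.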

2. Data minimization

The principle of data minimization means that businesses should collect only the data they need for a specific purpose. Gathering excessive information is a bad idea because it increases the risk of data breaches and privacy infringement.
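
In code, data minimization often comes down to an allowlist of fields per purpose. The sketch below is one possible way to do it; the `ALLOWED_FIELDS` mapping and the field names are assumptions made up for this example.

```python
# Hypothetical allowlist: the only fields each purpose is assumed to need.
ALLOWED_FIELDS = {
    "order_fulfilment": {"name", "shipping_address", "email"},
    "newsletter": {"email"},
}


def minimize(record, purpose):
    """Keep only the fields that the stated purpose requires."""
    allowed = ALLOWED_FIELDS[purpose]  # an unknown purpose raises KeyError by design
    return {key: value for key, value in record.items() if key in allowed}


raw = {
    "name": "Ada",
    "email": "ada@example.com",
    "birth_date": "1990-01-01",
    "shipping_address": "1 Main St",
}
print(minimize(raw, "newsletter"))  # {'email': 'ada@example.com'}
```

Filtering at the point of collection means that fields like `birth_date` never reach storage, so they can’t be leaked later.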

3. Transparency

Being open and transparent about data collection practices is also important. This way, businesses tell data providers what specific information is being collected, how it’s going to be used, and with whom it might be shared. Let’s learn from the best: following Apple’s latest iOS update, many consumers gained insight into the extent of their data usage. Here is how Craig Federighi, Apple’s senior vice president of software engineering, explains the company’s latest focus on data privacy, which also implies transparency in data collection:

“Privacy means peace of mind, it means security, and it means you are in the driver’s seat when it comes to your own data.”

The data owner should also have enough control over their data. This means that if, for any reason, they decide to stop the use and sharing of their data, they should be able to opt out easily.

4. Fairness

You as a business executive should make sure that your data collection contributes neither to inequality nor injustice. This means being mindful of potential biases and prejudices in data gathering, which can then affect algorithmic decision-making.

The story of the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) algorithm shows that data collection and AI can easily perpetuate discrimination if not managed properly. Because of biased data collection, the model used in the US court system produced about twice as many false positives for recidivism among Black defendants (45%) as among white ones (23%).
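
The disparity described above is a gap in false positive rates: the share of people who did not reoffend but were still flagged as high risk. The following Python sketch shows how such a rate can be computed per group. The function name and the toy rows are invented for illustration; they are not the COMPAS data or the method used in any published analysis.

```python
from collections import defaultdict


def false_positive_rates(rows):
    """rows: iterable of (group, predicted_high_risk, reoffended) tuples.

    Returns, per group, the share of people who did NOT reoffend
    but were still predicted to be high risk.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in rows:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {group: false_positives[group] / negatives[group] for group in negatives}


# Invented toy data, not the COMPAS dataset.
rows = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("A", True, True), ("B", True, True),
]
rates = false_positive_rates(rows)
print(rates)  # {'A': 0.5, 'B': 0.25}
```

Comparing these rates across groups before deployment is one simple way to spot the kind of imbalance seen in this case.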

5. Accountability

Last but not least, businesses should stay accountable for their data collection methods. If a data breach or any other problem occurs as a consequence of your actions, you should take responsibility and try to remedy the situation. A company that follows this principle will likely have a risk mitigation strategy in place, which makes it easier to handle any challenges.

Integrating these principles into your data collection practices makes it easier to navigate the intricacies of data ethics. This also benefits the business itself because data providers who are confident their data is being collected responsibly are more inclined to share it. Let’s discuss the benefits of ethical data collection in more detail.

Business benefits of ethical data collection


If a company pays attention to ethics in data collection, it’ll likely benefit from:

  • Greater trust and reputation: Customers, partners, and the general public will trust the company more readily if they know it prioritizes ethics in data collection. As a result, the company can count on a positive brand reputation and loyal stakeholders, which make a strong foundation for long-term sustainability and growth.
  • Compliance: If the company complies with data protection and privacy regulations when it comes to data usage and data collection, it avoids possible legal consequences, from fines to reputational damage.
  • Data quality: More focus on ethical considerations in data collection can enhance data quality and, thus, your model results. As mentioned, if your company’s data collection methods are trustworthy, others are more likely to share their data.
  • Cost efficiency: Businesses can expect significant cost savings when they prevent data breaches and the legal and financial consequences that usually follow.
  • Competitive advantage: With more emphasis on ethical data collection practices, it’s also possible to set yourself apart from competitors. Stakeholders are more willing to work with companies that put ethics first.

Challenges and risks in ethical data collection

As is evident, data ethics, including ethical data collection, goes beyond a moral obligation; it also brings many growth opportunities for businesses. At the same time, complying with ethical standards isn’t easy. Below are some ethical dilemmas and risks that you could face.

  1. Accuracy vs. privacy: The biggest issue in ethical data collection lies in the conflict between privacy and effective analytics. AI and ML technologies need lots of data to be more accurate in results, but this also poses more data privacy risks. The healthcare sector with sensitive patient data is the prime example here. In 2018, the personal information of 2.65 million patients was leaked, with names, addresses, social security numbers, etc. falling into the wrong hands due to a lack of proper data management.
  2. Legality vs. stakeholder expectations: Another issue in digital data collection is the gap between what’s legally permissible and what your stakeholders believe is fair. Sometimes, companies gather more information than they actually need, even though they act within the existing regulations. This could raise uncomfortable questions from their stakeholders, especially if the company didn’t warn its partners about its plans.
  3. Third-party risks: Don’t ignore ethical issues in data collection posed by third parties. Your company could be fully transparent in data collection methods, and then you decide to buy data from a third-party vendor whose practices are less secure. Since businesses have limited control over their partners’ policies, this could create a new ethical challenge for you.

Each of these challenges demands thoughtful consideration, but we’ll cover some ideas on how to mitigate these risks later. Meanwhile, we can talk about the specific laws and regulations that impact your data collection.

Regulatory framework and compliance

Here we mention the most well-known compliance standards and regulations for data protection and privacy:

  • General Data Protection Regulation (GDPR), enacted in the European Union, is acknowledged as one of the strictest data privacy laws. Its idea is to make businesses maintain strict control over user data as well as empower users to have their data deleted from the systems where it is stored. On the matter of data collection, the law requires a lawful basis, such as the data subject’s consent, for collecting personal data, and limits collection to what is necessary for a specified purpose. The data also shouldn’t be kept longer than you need it. (See the infographic below to learn more about GDPR.)


  • California Consumer Privacy Act of 2018 (CCPA) gives users greater control over personal data collected by businesses. Covering California residents and companies that do business within the state, the law provides 1) the right to be informed about the personal information that’s being collected and how it’s used and shared; 2) the right to correct inaccurate data collected about the user; 3) the right to delete the collected information (with some exceptions); 4) the right to opt out of the sale and sharing of personal data; 5) the right to non-discrimination when exercising the CCPA rights; and others. To a large extent, CCPA was modeled on the GDPR.
  • The Data Protection Act of 2018 represents the UK’s version of the GDPR. The act regulates the use of personal data by businesses, organizations, and the government. Among other things, it proclaims the following data protection principles: data collection and use in a fair, lawful, and transparent manner; data collection for specific, explicitly stated purposes; data retention for no longer than necessary; data collection in a way that is adequate, relevant, and limited to what is necessary; etc.
  • Health Insurance Portability and Accountability Act (HIPAA) is the US federal law that aims to protect health data or information related to patient care. The law mainly targets hospitals, insurance companies, and other businesses that work with patient data, generally prohibiting disclosure of health data to third parties unless a patient gives clear, well-documented consent.
  • The Children’s Online Privacy Protection Act (COPPA) was enacted in the US to protect children’s data. This framework sets out guidelines, including dos and don’ts, for collecting and processing the data of children under 13. For example, it covers when consent from a parent or guardian is required and when data collection is prohibited.

Businesses that don’t adhere to these standards and regulations can be subject to legal liability like fines and other penalties. Even harsher risks of non-compliance include damage to the reputation, challenges in securing investments, lower profits, and others.
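
Several of the rules above say that data shouldn’t be kept longer than necessary. One way to enforce this is a periodic job that flags records past their retention period. The Python sketch below assumes a hypothetical `RETENTION` table and record layout; the actual periods must come from your own policy and legal advice.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per purpose; real values come from your policy.
RETENTION = {
    "order_fulfilment": timedelta(days=365),
    "newsletter": timedelta(days=730),
}


def expired(records, now=None):
    """Return the records that have outlived the retention period of their purpose."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["collected_at"] > RETENTION[r["purpose"]]]


# Invented records with a fixed "now" so the example is reproducible.
now = datetime(2024, 3, 25, tzinfo=timezone.utc)
records = [
    {"id": 1, "purpose": "order_fulfilment", "collected_at": now - timedelta(days=400)},
    {"id": 2, "purpose": "order_fulfilment", "collected_at": now - timedelta(days=30)},
]
to_delete = expired(records, now)
print([r["id"] for r in to_delete])  # [1]
```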


Best practices for ethical data collection

Now that we’ve made it clear why data ethics is important, the next logical step is to explain how to achieve it. Consider these best practices for ethical data collection:

1. Develop a policy for ethical data collection

Start by setting up company-specific rules for data collection. This could be a part of a larger framework for data usage that employs a shared vision and mission regarding the company’s use of data and, thus, speeds up decision-making.

Here are some pieces of advice to make your data policy more applicable:

  • Make sure the C-level executives participate in the process of policymaking.
  • Tailor your data collection rules to your specific industry, as well as the products or services offered. For example, if you work with patient data, you might need some extra rules, such as rules on the use of HIPAA-compliant solutions.
  • Ensure the policy is accessible to all your stakeholders, including employees and partners.
  • Revisit and update the rules regularly to adapt them to changes in business and technology.

Questions to ask to set up a framework for ethical data collection

2. Comply with regulations

Compliance is a critical aspect of ethical data collection, which has to be strongly rooted in your data policy. Data protection laws, such as the GDPR, CCPA, etc., should be the foundation for your company’s rules. Also, make sure to stay updated about the latest changes within these regulations, as they are reviewed regularly. For example, monitor any updates regarding extra consent needed from individuals or new security practices.

Think also about regular audits and compliance checks within your organization. From time to time, organize an independent examination of data collection to check how this process aligns with the applicable laws as well as your internal policy.

3. Cultivate a responsible data culture

It’s easier to use transparent and fair data collection practices in a company with a responsible data culture. First, build a privacy-first mindset in your company, prioritizing privacy rights and data protection in your operations. Secondly, conduct regular data ethics training, where you explain the importance of ethical data collection, the potential risks and ethical implications in case of non-compliance. Training programs could also help you educate employees on ethical data collection methods and ethical data handling in general.

Handling Data Responsibly

4. Mind your data collection methods

At a more practical level, make sure your data scientists employ ethical data collection methods like:

  • Using encryption for sensitive data to enhance security. Overall, when dealing with any sensitive data like identification details and health records, it’s recommended to use precautions to ensure the data won’t be leaked.
  • Safeguarding the data from unauthorized access by using strong passwords, keys, and authentication methods.
  • Implementing data anonymization whenever possible. This typically means removing any personally identifiable data from the dataset or, if that’s impossible, separating the data from sensitive information. We also recommend choosing an anonymization technique based on your specific project requirements. For example, k-anonymity protects your data by grouping similar individuals and generalizing data fields with identifying information. This technique is well suited to patient, marketing, census, or credit card data.
  • Categorizing your data according to its sensitivity before making it available for ML models. For example, businesses can classify data as normal, sensitive, highly sensitive, and strictly confidential. Tokenization should be used for the last two categories, especially if the data is stored in the cloud.
  • Being conscious about your data sources. It’s generally less risky to use publicly available sources like market research studies, GitHub repositories, social media, etc. In other cases, you might need private information received from IoT devices, CRM software, or ERP systems; take extra security measures in this scenario.
  • Using APIs instead of web scraping. Avoid web scraping if possible since it involves duplicating the company’s data and can potentially lead to copyright infringement.
  • Backing up your data regularly and using reputable and trustworthy cloud or on-premise storage providers. Also, monitor your data storage so that you can react immediately to any anomalies, breaches, or errors.
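
To illustrate the k-anonymity idea from the list above, here is a minimal Python sketch that generalizes two quasi-identifiers (age and ZIP code) and then checks that every combination occurs at least k times. The helper names, the toy patient records, and the generalization rules are assumptions chosen for this example; production projects usually rely on dedicated anonymization tooling.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Invented toy records; 'diagnosis' is the sensitive attribute.
patients = [
    {"age": 31, "zip": "79000", "diagnosis": "flu"},
    {"age": 36, "zip": "79012", "diagnosis": "asthma"},
    {"age": 44, "zip": "79045", "diagnosis": "flu"},
    {"age": 47, "zip": "79051", "diagnosis": "diabetes"},
]

# Generalize: exact age becomes a 10-year range, ZIP keeps its first 3 digits.
generalized = [
    {"age": generalize_age(p["age"]), "zip": p["zip"][:3] + "**", "diagnosis": p["diagnosis"]}
    for p in patients
]

print(is_k_anonymous(patients, ["age", "zip"], k=2))     # False
print(is_k_anonymous(generalized, ["age", "zip"], k=2))  # True
```

In the raw records every person is unique by age and ZIP code, while after generalization each person shares those values with at least one other record.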

5. Prevent discrimination and biases

To follow the principle of fairness, your company should avoid discrimination and biases in data collection. For example, training algorithms on partial data risks producing biased and misleading results. Ensure the data collected is diverse and representative, and that it doesn’t marginalize or unfairly target any specific groups.

The data should accurately represent the demographic that the ML model is intended to affect. So if an AI solution is developed specifically for women, the data shouldn’t be based on a male audience, as that would make it irrelevant.
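
A quick representativeness check can catch this kind of mismatch early. The Python sketch below compares the share of each group in a collected sample with its share in a reference population. The function name, the 5% tolerance, and the numbers are illustrative assumptions, not an established fairness standard.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares, tolerance=0.05):
    """Return groups whose share in the sample differs from the reference
    population share by more than `tolerance` (absolute difference)."""
    counts = Counter(sample_groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps


# Invented numbers for illustration only.
sample = ["women"] * 20 + ["men"] * 80
reference = {"women": 0.5, "men": 0.5}
gaps = representation_gaps(sample, reference)
print(gaps)  # {'women': -0.3, 'men': 0.3}
```

A negative value means the group is underrepresented in the sample, which is a signal to collect more data for it before training.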

The bottom line

Ethics in data collection and usage overall is gaining more prominence in today’s data-driven environment. And this impact goes far beyond legal responsibility; it’s about cultivating trust and credibility in the eyes of customers, partners, and other stakeholders. Ethics of data collection also refers to protecting privacy and mitigating reputational and financial risks.

What’s more, the role of ethics in data collection and analysis is expected to grow further. As the amount of data increases over time, so does the public awareness of data privacy issues. Consequently, businesses might want to prioritize ethical data collection practices, keeping up with the evolving data privacy regulations and frameworks.

Planning to kick off a data science project? Having vast expertise in data science and machine learning, Intelliarts knows everything about data ethics and follows ethical data collection principles thoroughly. We’ll gladly help you transform your data into valuable insights.



Alexander Barinov
Managing Partner