Data Ethics Considerations in Data Science
The Office of the Australian Information Commissioner (OAIC) recently revealed that 35% of reported data breaches from Australian companies over a 12-month period between 2018 and 2019 were caused by human error. Such an alarming statistic demonstrates how crucial it is for aspiring data scientists to understand the role of ethics in data collection, management and sharing. Below are some of the most prevalent ethical issues in data management you’re likely to encounter as a data scientist.
When it comes to data ethics principles, almost nothing takes precedence over privacy. Protecting the privacy of modern citizens is becoming increasingly complex, with separate pieces of legislation regulating particular types of sensitive information in Australia. For instance, privacy issues related to telecommunications are dealt with under the Telecommunications Act 1997, while healthcare information is protected by the My Health Records Act 2012.
Alongside laws concerning specific categories of personal data, Australian Government agencies and many private organisations are also required to comply with the Privacy Act 1988. Designed to promote and protect the privacy of individuals, the Act regulates how these agencies and organisations handle personal information. Understanding privacy legislation is essential for data scientists, as they play a vital role in identifying when it has been breached.
A privacy breach occurs when personal information is accessed or disclosed without authorisation. In one of Australia’s best-known examples of a privacy breach, the details of about 100,000 Westpac customers were exposed when hackers targeted the payments platform PayID. With incidents like these becoming increasingly common, data scientists are on the frontline of privacy protection. As well as deriving insights from personal information within the confines of privacy laws, they’re also responsible for strengthening privacy protection systems.
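One common way to limit the damage of a breach is to avoid storing raw identifiers alongside analytical data. The sketch below is a minimal illustration of pseudonymisation using a keyed hash from Python's standard library. The records, field names and `SECRET_KEY` value are hypothetical and invented for this example; a real system would keep the key in a separate, secured store.

```python
import hashlib
import hmac

# Hypothetical secret key, for illustration only. In practice it would be
# stored separately from the dataset (for example, in a secrets manager).
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest that stands in for the raw identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical customer records, invented for this example.
records = [
    {"email": "alex@example.com", "balance": 1200},
    {"email": "sam@example.com", "balance": 340},
]

# Replace the email address with a pseudonym before the data is analysed.
safe_records = [
    {"customer_id": pseudonymise(r["email"]), "balance": r["balance"]}
    for r in records
]
```

The same email always maps to the same pseudonym, so records can still be linked for analysis. Pseudonymised data can still count as personal information, so this reduces exposure rather than removing privacy obligations.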
Consent is another high-priority issue in the context of data science ethics. If an individual or entity wants to collect personal data, they need to obtain informed and explicitly expressed consent from the person the data relates to. Failing to do so can result in serious consequences, such as fines and reputational damage. For example, consulting firm PwC was fined €150,000 under the General Data Protection Regulation (GDPR), the European Union’s data protection law, in connection with its processing of employee data without proper consent.
To allow for the legal collection and processing of data, consent must be freely given, specific and informed. Obtaining consent is key to unlocking consumer insights, making it a crucial concept for data scientists to understand. It can also be an effective tool in limiting the impact of growing consumer mistrust, allowing brands to position themselves as transparent and trustworthy.
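In practice, consent is often recorded per purpose, and an analysis should only use records whose owners agreed to that purpose. The following is a minimal sketch of that check. The `CustomerRecord` class, the purpose labels and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    """Hypothetical record that tracks which purposes a customer agreed to."""
    customer_id: str
    consented_purposes: set = field(default_factory=set)


def records_with_consent(records, purpose):
    """Keep only the records whose owner consented to the given purpose."""
    return [r for r in records if purpose in r.consented_purposes]


# Hypothetical sample data, invented for this example.
customers = [
    CustomerRecord("c1", {"billing", "marketing"}),
    CustomerRecord("c2", {"billing"}),
    CustomerRecord("c3"),  # no consent recorded
]

marketing_ids = [r.customer_id for r in records_with_consent(customers, "marketing")]
billing_ids = [r.customer_id for r in records_with_consent(customers, "billing")]
```

Treating a missing consent entry as "no consent", as the empty default set does here, is the safer design choice: a record is excluded unless consent was explicitly captured.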
Bias refers to the human inclination to show prejudice towards a certain demographic, resulting in systematic disadvantages for groups of people. Even though data management relies heavily on machine learning, it’s still susceptible to bias, because models learn from data and decisions shaped by people. Data insights can also be deliberately or unintentionally misinterpreted, leading to an analysis being wrongly regarded as fact.
A (now defunct) recruiting tool developed by Amazon provides a clear case study of bias influencing data-driven technology. The company’s computers were programmed to vet applicants by looking at patterns in resumes submitted over a period of 10 years. Since the tech industry has traditionally been dominated by men, most of the resumes came from male applicants. This inadvertently taught the system to treat female applicants as less preferable. Not only did this cause a serious gender bias, it also distorted the model’s original objective: to match the right candidate to the right job. As a result, Amazon shut down the project.
By acknowledging that data management processes can be biased, data scientists can help prevent prejudice and systematic disadvantages in data management.
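A simple first check for this kind of bias is to compare how often a model selects people from each group. The sketch below computes per-group selection rates and the ratio between the lowest and highest rate. The function names and sample outcomes are hypothetical and invented for illustration; this is one basic fairness check among many, not a complete audit.

```python
from collections import defaultdict


def selection_rates(outcomes):
    """Compute the share of selected applicants for each group.

    `outcomes` is an iterable of (group, was_selected) pairs.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in outcomes:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Return the lowest selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so there is no disparity to measure
    return min(rates.values()) / highest


# Hypothetical screening outcomes: 10 applicants per group.
outcomes = (
    [("group_a", True)] * 6 + [("group_a", False)] * 4
    + [("group_b", True)] * 3 + [("group_b", False)] * 7
)

rates = selection_rates(outcomes)
ratio = disparity_ratio(rates)
```

With this sample data, group_a is selected 60% of the time and group_b 30% of the time, giving a ratio of 0.5. A ratio well below 1 suggests one group is selected far less often than another and that the model and its training data deserve a closer look.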
Even though it holds the key to a wealth of business opportunities, big data can also be dangerous. Worth around $203 billion, the global big data industry is filled with complex ethical questions. It raises countless issues related to privacy and identity, forcing data scientists to pay careful attention to ethical considerations in their work.
Understanding how big data is handled is an essential step in realising its potential benefits in an ethical manner. By mastering the software tools and techniques used for big data engineering, such as programming languages, database management, modelling and data visualisation, data scientists will be prepared to confront the ethical challenges of big data.
Monash Online can help you acquire the skills and knowledge to confidently navigate ethical issues in data management. Our 100% online Graduate Diploma of Data Science develops advanced data-management skills to put you at the forefront of the data science industry. You’ll explore relevant ethical issues relating to data, including:
- Statistical modelling foundations
- Information accessibility
- Governance and accountability for the data repositories
- Negotiation of data rights
- Handling and processing of big data.
In addition to data ethics, the course also covers the foundational skills needed to build a fulfilling career as a data scientist. Contact a Monash Online Advisor to learn more about the course today.