Singapore University of Social Sciences

Data Wrangling (ANL503)

Applications Open: 01 April 2021

Applications Close: 31 May 2021

Next Available Intake: July 2021

Course Types: Modular Graduate Course

Language: English

Duration: 6 months

Fees: $2200

Area of Interest: Business Administration

Schemes: Lifelong Learning Credit (L2C), Resilience

Funding: To be confirmed

School/Department: School of Business


Synopsis

ANL503 Data Wrangling aims to equip students with advanced data acquisition and manipulation skills and techniques. Using MySQL, Python, and R, students learn the ins and outs of acquiring data from relational database systems (using SQL), web scraping, and web APIs (using Python) in a scalable and reproducible manner. Students also learn how to transform raw data into formats suitable for deeper analytics (using MySQL, Python, and R). The course rounds off with an introduction to visualisation (using R).

Level: 5
Credit Units: 5
Presentation Pattern: Every July

Topics

  • Introduction to the MySQL RDBMS and SQL as a glue language for analytics
  • Essential concepts in probability and statistics
  • Introduction to SQL and data manipulation with the SELECT statement
  • Combining data from multiple sources with unions and joins
  • Understanding regular expressions
  • Introduction to R
  • Data manipulation with R
  • Essential principles of data visualisation
  • Introduction to Python programming
  • Practical Python for data acquisition, mangling, and reporting
  • Handling web APIs and web scraping
  • Using Python for scalable spreadsheet data acquisition

Learning Outcomes

  • Assess the appropriateness of database designs based on the characteristics of the data and the analytics needs
  • Critique data visualisations constructively
  • Construct suitable SQL queries to acquire and reshape data
  • Assemble effective data flows in MySQL, Python, and R as part of a reproducible workflow
  • Create Python scripts to automate the acquisition and processing of web and spreadsheet data
  • Design and implement effective data visualisations in R