Course Code: MAV503
Synopsis
MAV503 Data Wrangling with Python is a comprehensive course that equips students with the skills and techniques needed to acquire, clean, transform, and prepare data for analysis. Using Python as the primary tool, students will learn to extract data from diverse sources, including relational databases (queried with SQL), web pages (through web scraping), and web APIs. The course emphasises the practical application of data manipulation techniques, so that students can handle real-world datasets in a scalable and reproducible manner. By the end of the course, students will be proficient in using Python to transform raw data into structured formats that are ready for advanced analytics and visualisation, laying a strong foundation for data-driven decision-making.
Level: 5
Credit Units: 5
Presentation Pattern: EVERY REGULAR SEMESTER
Topics
- Introduction to Python for Data Wrangling
- Understanding Data Acquisition Techniques
- Data Extraction from Relational Databases using SQL in Python
- Web Scraping Techniques with Python
- Accessing Data from Web APIs using Python
- Data Cleaning and Transformation Principles
- Advanced Data Manipulation with Pandas
- Handling Missing and Inconsistent Data
- Working with Different Data Formats (CSV, JSON, Excel) in Python
- Creating Reproducible Data Wrangling Workflows
- Introduction to Data Visualisation for Exploratory Data Analysis
- Integrating Data Wrangling Techniques in Real-World Case Studies
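As an illustration of how several of these topics fit together (SQL extraction, handling missing and inconsistent data, and writing CSV/JSON output), the following is a minimal sketch using only the Python standard library. It assumes a hypothetical `sales` table in an in-memory SQLite database; the table, its columns, the sample rows, and the helper functions are illustrative only and are not taken from the course materials.

```python
import csv
import io
import json
import sqlite3


def extract_rows(conn):
    """Run a SQL query and return each row as a dict keyed by column name."""
    cur = conn.execute("SELECT region, amount FROM sales")
    columns = [d[0] for d in cur.description]  # column names from the cursor
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def clean_rows(rows):
    """Drop rows with a missing amount and normalise inconsistent region labels."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue  # missing value: this sketch simply drops the record
        # Trim whitespace and unify case; a missing region becomes "Unknown"
        region = (row["region"] or "unknown").strip().title()
        cleaned.append({"region": region, "amount": float(row["amount"])})
    return cleaned


def summarise(rows):
    """Total the amount per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def to_csv(rows):
    """Serialise cleaned rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["region", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical messy data: inconsistent labels, a missing amount, a missing region
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(" north", 10.0), ("NORTH", 5.0), ("south ", None), ("South", 7.5), (None, 2.0)],
)

rows = clean_rows(extract_rows(conn))
totals = summarise(rows)
print(json.dumps(totals, sort_keys=True))
# prints {"North": 15.0, "South": 7.5, "Unknown": 2.0}
```

Dropping rows with a missing value is only one design choice; depending on the analysis, imputing or flagging those records may be more appropriate. In practice the same steps are often expressed with pandas, which the course covers separately.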
Learning Outcomes
- Critique the suitability of various data sources, including relational databases, web scraping, and web APIs, for different analytical requirements
- Assess advanced data wrangling techniques in Python for transforming raw data into structured formats for analysis
- Integrate data cleaning and transformation processes to design efficient workflows for complex data preparation tasks
- Construct Python programs to extract, clean, and transform data from diverse sources, ensuring reproducibility
- Design and implement comprehensive data wrangling pipelines using Python, incorporating best practices for data preparation
- Create reproducible data workflows using Python, applying advanced manipulation techniques for structured data analysis