Python for Predictive Data Analytics
This is a course for data analysts, quants, statisticians, software developers, and other technical staff interested in learning to use Python for analysing and visualising data and performing powerful predictive analytics.
Some familiarity with programming concepts (in any language) is assumed.
By the end of the course, you will know enough to start using Python competently for processing, analysing, modelling, and visualising various kinds of data, with a focus on time series. You will have used Python for scripting, data-manipulation, and plotting tasks with data in a variety of formats, including CSV, Excel spreadsheets, SQL databases, JSON, and API endpoints. You will have applied tools for optimisation, regression, classification, and clustering in practical settings on a variety of data sets. You will understand the strengths of the Python language and its ecosystem of packages for data analytics, and you will be well placed to keep learning as you use it day to day.
Day 1: Python Basics
Day 1 covers how to use Python for basic scripting and automation tasks, including tips and tricks for making this easy. The syllabus is as follows:
- Why use Python for predictive analytics? What’s possible? Python versus other languages …
- Setting up your Python development environment (IDE, Jupyter)
- Modules and packages
- Python concepts: an introduction through examples
- Essential data types: strings, tuples, lists, dicts, sets
- Worked example: fetching and ranking real-time temperature data for global cities
- Raising and handling exceptions
- Handling CSV data: introduction to Pandas
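To give a flavour of the Day 1 material, here is a minimal sketch that combines dicts, tuples, sets, and exception handling to rank cities by temperature. It is not the course's worked example: the readings are made up rather than fetched live, and the names `parse_temperature` and `rank_cities` are hypothetical.

```python
# Hypothetical readings, hard-coded for illustration (no network access needed).
readings = {"Sydney": "22.5", "Oslo": "3.1", "Cairo": "n/a", "Lima": "18.0"}


def parse_temperature(text):
    """Convert a raw reading to a float, re-raising with context on failure."""
    try:
        return float(text)
    except ValueError as exc:
        raise ValueError(f"unreadable temperature: {text!r}") from exc


def rank_cities(raw):
    """Return (city, temperature) tuples, hottest first, plus the skipped cities."""
    parsed = []
    skipped = set()
    for city, text in raw.items():
        try:
            parsed.append((city, parse_temperature(text)))
        except ValueError:
            # Bad readings are recorded rather than silently dropped.
            skipped.add(city)
    parsed.sort(key=lambda pair: pair[1], reverse=True)
    return parsed, skipped


ranking, skipped = rank_cities(readings)
for city, temperature in ranking:
    print(f"{city}: {temperature:.1f}")
```

Keeping the parsing step in its own function means a malformed reading raises a descriptive error in one place, and the caller decides whether to skip it or stop.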
Day 2: Handling, Analysing, and Presenting Data in Python
The Pandas package is an amazingly productive tool for working with and analysing data in Python. Day 2 gives a thorough introduction to analysing data with Pandas and visualising it easily:
- Reading and writing essential data formats: CSV, Excel, SQL databases, JSON, time-series
- Indexing and selecting data in Pandas
- Data fusion: joining & merging datasets
- Summarisation with “group by” operations; pivot tables
- Publication-quality 2D plotting with Seaborn and Matplotlib
- Interactive visualisation with Plotly
- Worked example: creating automated reports with Jupyter, Pandas, and nbconvert
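As a rough illustration of the joining, "group by", and pivot-table topics above, the following sketch uses two small tables invented for this example. The column names and figures are assumptions, not course data.

```python
import pandas as pd

# Hypothetical sales and store tables, invented for illustration.
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "revenue": [100.0, 120.0, 80.0, 95.0, 60.0],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3],
    "region": ["North", "North", "South"],
})

# Join: attach each sale's region through the shared store_id key.
merged = sales.merge(stores, on="store_id", how="left")

# Group by: total revenue per region.
totals = merged.groupby("region")["revenue"].sum()

# Pivot table: regions as rows, quarters as columns, summed revenue as values.
pivot = merged.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)

print(totals)
print(pivot)
```

The South region has no Q2 sales in this toy data, so that pivot cell comes out as a missing value rather than zero.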
Day 3: Time-series, simulation, inference and modelling
Day 3 demonstrates more advanced features of Pandas for working with data, including time-series data. It then describes Monte Carlo simulation methods and walks you through using powerful Bayesian methods of inference and modelling for different kinds of data in Python:
- Time-series analysis: parsing dates, resampling, handling time-zones
- Secret weapons for Pandas: searchsorted, hierarchical indices, unstack, categorical, qgrid
- Introduction to NumPy for linear algebra and Monte Carlo simulation methods
- Classical statistics with scipy.stats and statsmodels
- Density estimation with scikit-learn
- Bayesian inference with PyMC3: parameter estimation and model selection; incorporating prior information
- Bayesian regression; assessing reliabilities
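For a sense of what Monte Carlo simulation with NumPy looks like, here is a small textbook sketch that estimates pi by sampling random points in the unit square. It is not taken from the course notes; the function name `estimate_pi` and the sample size are assumptions chosen for illustration.

```python
import numpy as np


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points inside the unit quarter-circle."""
    rng = np.random.default_rng(seed)
    # Draw points uniformly in the unit square [0, 1) x [0, 1).
    points = rng.random((n_samples, 2))
    # A point lies inside the quarter-circle when x^2 + y^2 <= 1.
    inside = (points ** 2).sum(axis=1) <= 1.0
    # The quarter-circle has area pi / 4, so scale the hit rate by 4.
    return 4.0 * inside.mean()


estimate = estimate_pi(100_000)
print(f"Estimated pi: {estimate:.4f}")
```

The whole simulation is vectorised: there is no Python-level loop over samples, which is the main reason NumPy suits this kind of work.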
Day 4: Machine learning
Day 4 introduces a more automated approach to modelling real-world data with several powerful machine learning algorithms using scikit-learn. The datasets are selected from a range of industries: financial, geospatial, medical, and social sciences. The syllabus is:
- Classification with scikit-learn: Naive Bayes, logistic regression, SVMs, random forests, with application to diagnosis, AI systems, and time-series prediction
- Nonlinear regression, with application to forecasting
- Clustering data with DBSCAN, with application to outlier detection
- Dimensionality reduction with PCA
- Validation and model selection
- Deploying machine learning models in production
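To illustrate the outlier-detection topic above, this minimal sketch runs scikit-learn's `DBSCAN` on a synthetic point set: a tight grid of points plus one distant point. The data and the `eps` and `min_samples` values are assumptions picked to make the example clear-cut, not recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data: a dense 5 x 5 grid of points, plus one point far away.
grid = np.array([(i * 0.1, j * 0.1) for i in range(5) for j in range(5)])
X = np.vstack([grid, [[10.0, 10.0]]])

# Points with at least min_samples neighbours within eps form clusters;
# points that belong to no cluster receive the label -1.
labels = DBSCAN(eps=0.3, min_samples=4).fit_predict(X)

outliers = X[labels == -1]
print("Cluster labels found:", sorted(set(labels.tolist())))
print("Outliers:", outliers)
```

On real data the choice of `eps` matters a great deal, and features usually need scaling first so that distances are comparable across dimensions.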
We will supply you with printed course notes and a USB stick containing a complete Python environment based on VirtualBox. This saves time in the course and allows us to focus on using Python rather than installing it. The USB stick also contains kitchen-sink Python installers for multiple platforms, solutions to the programming exercises, several written tutorials, and reference documentation on Python and the third-party packages covered in the course.
- Personal help: Your trainer(s) will be available after the course each day for you to ask any one-on-one questions you like, whether about the course material and exercises or about specific problems you face in your work and how to use Python to solve them.
- Food and drink: We will provide lunch, morning and afternoon tea, and drinks.
- The course will run from 9:00 to roughly 17:00 each day, with breaks of 50 minutes for lunch and 20 minutes each for morning and afternoon tea.
Python for Predictive Data Analytics:
60 Margaret Street (Level 11), Sydney CBD
15 Apr – 18 Apr 2019