Advanced Python for Finance Technologies
In this course you will learn to
- Automatically extract financial data from common data providers
- Clean, aggregate and manipulate financial data effectively
- Conduct elementary time series analysis
- Understand stochastic processes and common noise models
- Construct models for inference and forecasting, such as ARIMA, linear regression and logistic regression
- Generate powerful visualizations, such as candlestick charts
- Extract financial data by scraping websites
- Understand the fundamentals of supervised and unsupervised machine learning models as applied to finance
- Apply Recurrent Neural Nets (RNNs) and Long Short-Term Memory Units (LSTMs) to financial time series and understand their limitations
- Understand the principles behind blockchain technology
Training materials
All Python training students will receive comprehensive courseware.
Suggested attendees
Students who are familiar with fundamental Python syntax and concepts.
Course Outline
- Crunch numbers: numerical Python with NumPy
- Introduce the n-dimensional array (ndarray)
- NumPy operations
- Broadcasting
- Missing data in NumPy (masked arrays)
- NumPy structured arrays
- Improve performance through vectorization
- Random number generation
- Introduce Monte-Carlo methods
- General approaches to implementing mathematical algorithms
- Acquire and manipulate financial data with pandas and pandas-datareader
- Series vs. DataFrames
- Data types in pandas overview
- Pandas I/O tools: CSV/Excel/SQL
- Pandas I/O tools: pandas-datareader
- Subset DataFrames
- Create and delete variables
- Discretize continuous data
- Scale and standardize data
- Identify duplicates
- Dummy coding
- Exploratory data analysis and advanced pandas methods
- Uni- and multi-variate statistical summaries and detecting outliers
- Group-wise calculations using pandas
- Pivot tables
- Long to wide and back: pivoting, stacking and melting
- Python visualization: Matplotlib and seaborn
- Pandas visualization: histograms, bar, box plots, scatter plots and pie charts
- Group-by plotting
- Pandas plot formatting
- mpl-finance and candlestick charts
- Merge DataFrames
- Pandas string methods
- Implement regular expressions in pandas
- Handle missing data in pandas
- Elementary time series analysis
- Date/time formats in Python and pandas
- Running and rolling aggregates
- Resample
- Stochastic processes
- Noise models overview
- Stationarity
- Random walks and martingales
- Brownian motion
- Diffusion models
- Black-Scholes model—and its limitations
- Time series forecasting
- De-trending and seasonality
- Interpolation and extrapolation
- Auto-Regressive Integrated Moving Average (ARIMA) models
- Measure impact: test for group differences
- Null hypothesis testing and p-values
- Group comparisons (p-values, t-tests, ANOVA, Chi-square tests)
- Correlation
- Progress with regression models
- Linear regression
- Logistic regression
- Regression on count outcomes (Poisson processes)
- Optional: scraping by—obtain financial data from publicly accessible websites
- Requirements: Basic Python. Time required: 2 hours
- Parse HTML/CSS with BeautifulSoup
- Navigate tree data structures
- Select named node elements
- Select by property
- Establish a connection
- urllib3 and connections
- POST and GET requests
- Build a Web Scraper
- Parse a list of websites
- Collect and store data
- Advanced scraping: Build a Web Spider with Scrapy
- Optional: machine learning fundamentals for finance with scikit-learn
- Requirements: NumPy, pandas. Time required: 4 hours
- Machine learning approaches to multi-variate statistics
- Machine learning theory
- Data pre-processing
- Supervised vs. unsupervised learning
- Unsupervised learning: clustering
- Clustering algorithms
- Evaluate cluster performance
- Dimensionality reduction
- A priori
- Principal component analysis (PCA)
- Penalized regression
- Supervised learning: regression
- Linear regression
- Penalized linear regression
- Stochastic gradient descent
- Scoring new data sets
- Cross-validation
- Bias-variance trade-off
- Feature importance
- Supervised learning: classification
- Logistic regression
- LASSO
- Random forests
- Ensemble methods
- Feature importance
- Score new data sets
- Cross-validation
- Optional: recurrent neural nets and LSTMs with PyTorch
- Requirements: NumPy, pandas, machine learning fundamentals. Time required: 4 hours
- Introduce PyTorch
- Introduce tensor algebra and calculus
- Tensor algebra in PyTorch
- Train and validate models
- Regression in PyTorch
- Optimizers in PyTorch
- Linear regression
- Logistic regression
- Artificial Neural Networks
- Overview of Artificial Neural Networks (ANNs)
- Recurrent Neural Networks (RNNs)
- Sequence models and Long Short-Term Memory Networks (LSTMs)
- RNNs/LSTMs with PyTorch
- Build, train and validate a basic ANN
- Create an RNN
- Build an LSTM
- Applications to financial time series and cautionary tales
- Optional: blockchain technologies
- Requirements: Basic Python, NumPy (useful, but not mandatory). Time required: 4 hours.
- The ingredients for a blockchain
- Transaction records
- The distributed ledger
- Chain validation
- Nonces
- The hash function
- Overview of hash functions and tables
- Cryptographic hash functions
- Proof-of-work
- Advanced functions
- Return statements
- The JSON format
- Exception trapping
- Assertions
- Construct your own blockchain
- Generate a block
- Genesis block
- Generate a chain through block validation
- Shortcomings of blockchain technologies
Requirements
- Any Windows, Linux or macOS operating system
- Python 3.x installed (Anaconda bundle recommended)
- An IDE with Python support (Jupyter Notebook, Spyder or PyCharm Community Edition, which is free)