AWS Big Data Study Notes – AWS Machine Learning and IoT
This cheat sheet covers AWS Machine Learning (ML) and IoT services.
Machine Learning (ML) Services
- Amazon Comprehend: natural language processing (NLP) service that uses machine learning to find insights and relationships in text
- Amazon Lex: provides the advanced deep learning functionalities of automatic speech recognition (ASR), for converting speech to text, and natural language understanding (NLU), for recognizing the intent of the text, so you can build applications with highly engaging user experiences and lifelike conversational interactions (e.g. chatbots)
- Amazon Polly: turns text into lifelike speech
- Amazon Rekognition: image and video analysis
- Amazon Transcribe: automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications
- Amazon Textract: automatically extracts text and data from scanned documents and forms
- Amazon Translate: neural machine translation service that delivers fast, high-quality, and affordable language translation
- Amazon Forecast: time-series forecasting service that delivers highly accurate forecasts and is based on the same technology used at Amazon.com
- Amazon Personalize: creates individualized product and content recommendations for the customers of your applications
- Amazon Comprehend Medical: natural language processing service that extracts relevant medical information from unstructured text using advanced machine learning models
Amazon Machine Learning (Amazon ML)
- Datasources contain metadata associated with data inputs to Amazon ML
- ML Models generate predictions using the patterns extracted from the input data
- Evaluations measure the quality of ML models
- Batch Predictions asynchronously generate predictions for multiple input data observations
- Real-time Predictions synchronously generate predictions for individual data observations
- Splitting data into training and evaluation datasources:
  - Pre-split the data – split the data into two input locations before uploading them to Amazon Simple Storage Service (Amazon S3), then create a separate datasource from each.
  - Amazon ML sequential split – tell Amazon ML to split your data sequentially when creating the training and evaluation datasources.
  - Amazon ML random split – tell Amazon ML to split your data using a seeded random method when creating the training and evaluation datasources.
- ML Models
  - Supervised learning
    - Regression: predict a numeric value
    - Multiclass: predict values that belong to a limited, pre-defined set of permissible values
    - Binary: predict values that can only have one of two states, such as true or false
  - Unsupervised learning – not supported
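The three split options can be pictured with a small, local Python sketch over an in-memory list of observations. This is not Amazon ML's implementation: the function names, the 70/30 ratio, and the use of `random.Random` for the seeded shuffle are assumptions chosen for illustration. Pre-splitting amounts to running a split like this yourself before uploading the two parts to S3.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sequential_split(rows: Sequence[T], train_percent: int = 70) -> Tuple[List[T], List[T]]:
    """Keep the original order: the first rows train, the rest evaluate."""
    cut = len(rows) * train_percent // 100  # integer arithmetic avoids float rounding
    return list(rows[:cut]), list(rows[cut:])


def seeded_random_split(
    rows: Sequence[T], train_percent: int = 70, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, so the same seed always yields the same split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    observations = list(range(10))  # stand-in for ten rows ordered by time
    print(sequential_split(observations))  # ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])
    first = seeded_random_split(observations, seed=7)
    second = seeded_random_split(observations, seed=7)
    print(first == second)  # True: a fixed seed reproduces the split
```

A sequential split keeps the original row order, so evaluation uses the last rows (useful when the data is ordered by time). A seeded random split avoids a biased evaluation set when the input is sorted, e.g. by the target, while the fixed seed keeps the split reproducible.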
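To make the three supervised model types concrete, the sketch below shows the kind of output each one produces, assuming the model has already computed its raw scores. A binary model's score is turned into true/false with a cut-off; the 0.5 default and the `>=` comparison here are assumptions, as are all function names. The linear formula in the regression helper is only a stand-in for whatever the trained model computes.

```python
from typing import Dict, Sequence


def binary_prediction(score: float, cutoff: float = 0.5) -> bool:
    """Binary: map a score in [0, 1] to one of two states (>= at the cut-off is assumed)."""
    return score >= cutoff


def multiclass_prediction(class_scores: Dict[str, float]) -> str:
    """Multiclass: choose the permissible class with the highest score."""
    return max(class_scores, key=lambda label: class_scores[label])


def regression_prediction(
    weights: Sequence[float], features: Sequence[float], bias: float = 0.0
) -> float:
    """Regression: return an unbounded numeric value (here a simple linear combination)."""
    return bias + sum(w * x for w, x in zip(weights, features))


if __name__ == "__main__":
    print(binary_prediction(0.83))  # True
    print(multiclass_prediction({"red": 0.1, "green": 0.7, "blue": 0.2}))  # green
    print(regression_prediction([2.0, 3.0], [1.0, 1.0], bias=0.5))  # 5.5
```

Raising the cut-off makes a binary model predict "true" less often, trading recall for precision.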
Amazon SageMaker
- Build, train, and deploy machine learning models quickly; removes the 100 GB training-data limit of Amazon Machine Learning
- SageMaker provides fully managed, pre-built Jupyter notebooks to address common use cases
- SageMaker supports reinforcement learning (RL) in multiple frameworks, including TensorFlow and MXNet, as well as custom-developed frameworks
- GPU-accelerated deep learning and effectively unlimited scaling of hyperparameter tuning jobs; P2, P3, and G3 instance types for deep learning
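SageMaker automatic model tuning runs many training jobs with different hyperparameter values and supports strategies such as random search and Bayesian optimization. The toy sketch below shows only a local random search: the objective function is a hypothetical stand-in for a training job's validation metric, and the parameter name and range are assumptions.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(
    objective: Callable[[Params], float],
    ranges: Dict[str, Tuple[float, float]],
    max_jobs: int = 20,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample each hyperparameter uniformly from its range, evaluate every sample,
    and return the sample with the lowest objective value."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("inf")
    for _ in range(max_jobs):
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        score = objective(params)  # in SageMaker, each evaluation would be a training job
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_loss(params: Params) -> float:
    """Hypothetical stand-in for a validation metric (minimum at learning_rate = 0.1)."""
    return (params["learning_rate"] - 0.1) ** 2


if __name__ == "__main__":
    best, loss = random_search(toy_validation_loss, {"learning_rate": (0.001, 0.5)}, max_jobs=50)
    print(best, loss)
```

Because random-search samples do not depend on each other, the jobs can run in parallel, which is what lets tuning scale out. Bayesian optimization instead uses earlier results to choose later hyperparameters, trading some parallelism for fewer jobs.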
IoT Virtuous Cycle
- Amazon FreeRTOS – IoT operating system for microcontrollers
  - Local connectivity libraries (Wi-Fi & Ethernet)
  - Cloud connectivity and security libraries
  - Over-the-air updates with code signing
- AWS IoT Greengrass – Extend AWS IoT to the edge
  - Connectors: pre-built integrations with third-party services, on-premises software, and AWS services
  - Secrets Manager: securely store, access, rotate, and manage secrets at the edge
  - Other capabilities: local messages and triggers, local actions, data and state sync, security, local resource access, over-the-air updates, machine learning inference
  - Hardware Security Integration: store private device keys on a hardware secure element
  - Ways to interact with AWS IoT: AWS IoT Device SDK (runs on devices), AWS IoT API (HTTP/HTTPS requests), AWS SDKs (language-specific APIs), AWS CLI (commands)
  - AWS IoT Device Tester: Windows/Linux/Mac test automation tool for connected devices
- AWS IoT Core – Secure device connectivity and messaging at scale
  - Security and identity: authentication with X.509 certificates, AWS IAM credentials, or third-party authentication via Amazon Cognito; custom authentication and authorization strategies (e.g. JSON Web Token verification, OAuth provider callout)
  - Device gateway: allows secure, low-latency, low-overhead, bi-directional communication between connected devices and cloud and mobile applications
  - Message broker: publish through an HTTP REST interface, or publish and subscribe using the MQTT protocol directly or MQTT over WebSocket
    - Three MQTT patterns: point-to-point, broadcast, and fan-in
  - Rules engine: enables continuous processing of data sent by connected devices; rules filter and transform the data and route it to other AWS services
    - A rule consists of a rule name, an optional description, a SQL statement (simple SQL syntax to filter messages received on an MQTT topic), a SQL version, one or more actions (what happens when the rule executes, e.g. write data to DynamoDB/S3/Kinesis, invoke a Lambda function, publish to an SNS topic), and an error action
  - Device shadow: enables cloud and mobile applications to query data sent from devices and send commands to devices using a simple REST API, while AWS IoT Core handles the underlying communication with the devices
  - Registry: register devices and associate up to three custom attributes with each one, along with certificates and MQTT client IDs; devices can also be categorized into groups
- AWS IoT Device Management – Maintain fleet health at scale
  - Batch fleet provisioning
  - Real-time fleet index & search
  - Fine-grained device logging & monitoring
  - Over-the-air updates
- AWS IoT Device Defender
  - Audit device configurations
  - Monitor device behavior
  - Identify anomalies
  - Generate alerts
  - Investigate and mitigate security issues
- AWS IoT Things Graph – Connect devices and web services with little to no code
- AWS IoT Analytics – Run and operationalize sophisticated analytics on massive volumes of IoT data
  - Process, enrich, store, analyze, and visualize IoT data in one service
- AWS IoT SiteWise – Collect data from industrial equipment with a local gateway, then structure and search that IoT data at scale
- AWS IoT Events – Detect and respond to events from data across IoT sensors and applications
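The AWS IoT Core message broker delivers each published message to every subscription whose topic filter matches the topic. The `+` (single-level) and `#` (multi-level) wildcards follow the MQTT specification, and AWS IoT Core supports both in subscriptions. The function below is a local sketch of those matching rules, not AWS code, and the topic names are hypothetical; the usage shows the fan-in pattern, where many devices publish to their own topics and one subscriber uses a wildcard filter.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a well-formed MQTT topic filter (with + and # wildcards) matches a topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Wildcards in the first level do not match topics that start with '$' (reserved topics).
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches this level and everything below, including the parent.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    # Fan-in: one subscriber receives readings from many devices.
    subscription = "sensors/+/temperature"
    for topic in ["sensors/dev1/temperature", "sensors/dev2/temperature", "sensors/dev1/humidity"]:
        print(topic, topic_matches(subscription, topic))
```

Point-to-point corresponds to an exact filter such as a per-device command topic, and broadcast to many subscribers sharing one topic.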
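The parts of an AWS IoT rule can be modeled locally to show how they interact: FROM selects the topic, WHERE filters, SELECT projects fields, each action receives the result, and a failed action triggers the error action. The `TopicRule` class, the `process` function, and the topic names are hypothetical; real rules are evaluated by AWS IoT Core from the SQL statement itself, which this sketch keeps only as documentation. The SQL string uses AWS IoT SQL syntax, and `2016-03-23` is one of the published SQL versions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]
Action = Callable[[Message], None]


@dataclass
class TopicRule:
    """Hypothetical in-memory model of the parts of an AWS IoT rule."""
    name: str
    sql: str                          # documentation only; this sketch does not parse SQL
    topic_filter: str                 # the FROM clause
    where: Callable[[Message], bool]  # the WHERE clause
    select: List[str]                 # the SELECT list
    actions: List[Action]
    error_action: Optional[Action] = None
    description: str = ""
    sql_version: str = "2016-03-23"


def _matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT matching: supports only the single-level '+' wildcard."""
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    return len(f_levels) == len(t_levels) and all(f == "+" or f == t for f, t in zip(f_levels, t_levels))


def process(rule: TopicRule, topic: str, payload: Message) -> Optional[Message]:
    """Apply FROM and WHERE, project the SELECT fields, then run every action."""
    if not _matches(rule.topic_filter, topic) or not rule.where(payload):
        return None  # this rule ignores the message
    result = {key: payload[key] for key in rule.select if key in payload}
    for action in rule.actions:
        try:
            action(result)
        except Exception as exc:  # a failed action triggers the error action, if one is set
            if rule.error_action is not None:
                rule.error_action({"rule": rule.name, "error": str(exc), "payload": result})
    return result


if __name__ == "__main__":
    stored: List[Message] = []  # stands in for e.g. a DynamoDB table
    errors: List[Message] = []  # stands in for e.g. an error-logging target

    def unavailable_target(message: Message) -> None:
        raise RuntimeError("target unavailable")  # simulate a failing action

    rule = TopicRule(
        name="HighTemperature",
        sql="SELECT temperature, deviceId FROM 'sensors/+/telemetry' WHERE temperature > 50",
        topic_filter="sensors/+/telemetry",
        where=lambda m: m.get("temperature", 0) > 50,
        select=["temperature", "deviceId"],
        actions=[stored.append, unavailable_target],
        error_action=errors.append,
    )
    process(rule, "sensors/dev1/telemetry", {"temperature": 72, "deviceId": "dev1", "battery": 90})
    process(rule, "sensors/dev2/telemetry", {"temperature": 20, "deviceId": "dev2"})
    print(stored)       # [{'temperature': 72, 'deviceId': 'dev1'}]
    print(len(errors))  # 1
```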
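A device shadow is a JSON document whose `state` holds a `desired` section (typically written by apps) and a `reported` section (written by the device). AWS IoT Core computes a `delta` from the two: desired fields that are missing from or different in the reported state appear in the delta, while reported-only fields do not. The function below is a local re-implementation of that comparison for illustration, not an AWS API, and the shadow contents are hypothetical.

```python
from typing import Any, Dict

_MISSING = object()  # marks a desired key that the device has not reported at all


def shadow_delta(desired: Dict[str, Any], reported: Dict[str, Any]) -> Dict[str, Any]:
    """Return the desired fields whose values differ from (or are missing in) the reported state."""
    delta: Dict[str, Any] = {}
    for key, wanted in desired.items():
        actual = reported.get(key, _MISSING)
        if isinstance(wanted, dict) and isinstance(actual, dict):
            nested = shadow_delta(wanted, actual)  # compare nested objects field by field
            if nested:
                delta[key] = nested
        elif wanted != actual:
            delta[key] = wanted  # lists and scalars are compared as whole values
    return delta


if __name__ == "__main__":
    shadow = {
        "state": {
            "desired": {"color": "green", "led": {"on": True, "brightness": 80}},
            "reported": {"color": "red", "led": {"on": True, "brightness": 50}, "firmware": "1.2"},
        }
    }
    state = shadow["state"]
    print(shadow_delta(state["desired"], state["reported"]))
    # {'color': 'green', 'led': {'brightness': 80}}; 'firmware' is reported-only, so it is not in the delta
```

In AWS IoT Core, a device subscribed to the `$aws/things/<thingName>/shadow/update/delta` topic receives this difference, applies it, and then reports its new state.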
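AWS IoT Analytics organizes its processing as a channel that collects raw messages, a pipeline of activities (such as filter, math, and device-registry enrichment), a data store for the processed messages, and data sets that query the store. The toy sketch below mimics only the channel, pipeline, and data store flow in plain Python; the helper names, the Celsius-to-Fahrenheit formula, and the device metadata are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Message = Dict[str, Any]
Activity = Callable[[Message], Optional[Message]]  # returning None drops the message


def filter_activity(predicate: Callable[[Message], bool]) -> Activity:
    """Keep only messages that satisfy the predicate."""
    return lambda message: message if predicate(message) else None


def math_activity(attribute: str, formula: Callable[[Message], Any]) -> Activity:
    """Add a computed attribute to each message."""
    return lambda message: {**message, attribute: formula(message)}


def enrich_activity(metadata: Dict[str, Message], key: str) -> Activity:
    """Merge in extra attributes looked up by a key (e.g. device metadata)."""
    return lambda message: {**message, **metadata.get(message[key], {})}


def run_pipeline(channel: Iterable[Message], activities: List[Activity]) -> List[Message]:
    """Pass each raw channel message through the activities and store the survivors."""
    data_store: List[Message] = []
    for raw in channel:
        current: Optional[Message] = raw
        for activity in activities:
            if current is None:
                break  # an earlier filter dropped the message
            current = activity(current)
        if current is not None:
            data_store.append(current)
    return data_store


if __name__ == "__main__":
    channel = [
        {"deviceId": "dev1", "tempC": 21.5},
        {"deviceId": "dev2", "tempC": -999.0},  # sensor error reading
        {"deviceId": "dev2", "tempC": 30.0},
    ]
    device_metadata = {"dev1": {"site": "plant-a"}, "dev2": {"site": "plant-b"}}
    store = run_pipeline(channel, [
        filter_activity(lambda m: m["tempC"] > -100),
        math_activity("tempF", lambda m: m["tempC"] * 9 / 5 + 32),
        enrich_activity(device_metadata, "deviceId"),
    ])
    for row in store:
        print(row)
```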
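AWS IoT Events expresses detection logic as a detector model: a state machine whose transitions are triggered by conditions on incoming input data and can run actions such as sending a notification. The sketch below imitates that idea locally; the classes, state names, and the 80/70 temperature thresholds are hypothetical and are not the service's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Reading = Dict[str, float]


@dataclass
class Transition:
    condition: Callable[[Reading], bool]
    next_state: str
    action: str = ""  # hypothetical label for an action, e.g. sending a notification


@dataclass
class DetectorModel:
    initial_state: str
    transitions: Dict[str, List[Transition]]  # state name -> outgoing transitions


@dataclass
class Detector:
    """One running instance of a detector model, e.g. for a single device."""
    model: DetectorModel
    state: str = ""
    fired_actions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.state:
            self.state = self.model.initial_state

    def on_input(self, reading: Reading) -> str:
        """Evaluate the current state's transitions in order; take the first that matches."""
        for transition in self.model.transitions.get(self.state, []):
            if transition.condition(reading):
                self.state = transition.next_state
                if transition.action:
                    self.fired_actions.append(transition.action)
                break
        return self.state


if __name__ == "__main__":
    model = DetectorModel(
        initial_state="Normal",
        transitions={
            "Normal": [Transition(lambda r: r["temperature"] > 80, "Overheated", "notify: overheated")],
            "Overheated": [Transition(lambda r: r["temperature"] <= 70, "Normal", "notify: recovered")],
        },
    )
    detector = Detector(model)
    print([detector.on_input({"temperature": t}) for t in [65, 85, 75, 60]])
    # ['Normal', 'Overheated', 'Overheated', 'Normal']
    print(detector.fired_actions)  # ['notify: overheated', 'notify: recovered']
```

Using different thresholds for entering and leaving `Overheated` keeps the detector from flapping when readings hover near one value. Creating one `Detector` per device mirrors how a detector model can keep a separate detector instance for each value of a key attribute such as a device ID.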