
AI for Sustainability and Sustainability in AI

I will be referring to the following 3 papers on this very interesting topic.

(1) Van Wynsberghe, A.: Sustainable AI: AI for sustainability and the sustainability of AI. AI and Ethics, 2021, Springer. https://link.springer.com/article/10.1007/s43681-021-00043-6

(2) Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models. https://www.researchgate.net/publication/342763375_Carbontracker_Tracking_and_Predicting_the_Carbon_Footprint_of_Training_Deep_Learning_Models/link/5f0ef0f2a6fdcc3ed7083852/download

(3) Lacoste, A., Luccioni, A., Schmidt, V., Dandres, T.: Quantifying the Carbon Emissions of Machine Learning. 2019.

While there is a tremendous push to use new-generation generative AI based on large language models to solve business problems, there are also voices of concern from experts in the community about the dangers and ethical consequences. A lot has been written about this, but one aspect that has not picked up sufficient traction, in my opinion, is Sustainable AI.

In (1), Van Wynsberghe defines two disciplines at the intersection of AI and sustainability: AI for Sustainability and Sustainable AI.

AI for Sustainability covers any business application that uses AIML technology to solve climate problems – using this new-generation technology to help with climate change and CO2 reduction. Major applications are being developed for optimal energy distribution across renewable and fossil energy sources; every additional percentage drawn from renewable sources reduces fossil-fuel use and helps combat climate change. Other applications include better climate predictions and the use of less water, pesticides, and fertilizers in food production. Many Industry 4.0 applications that build smart factories, smart cities, and smart buildings fall into this category.

Sustainable AI, on the other hand, measures the massive energy used by GPUs and other compute, storage, and communications resources while building AI models, and suggests ways to reduce it. While conventional software development and testing can be done on a few developers' laptops with minimal IT resources, the AIML software development life cycle calls for massive training data and deep-learning neural networks with many millions of nodes. Some of the new-generation large language models use billions of parameters, beyond the imagination of most of us. The energy use does not stop there: fine-tuning for specific domains, or relearning, is as energy-consuming as the original training, and sometimes more so. Some numbers mentioned in (1) are reproduced here to highlight the point. One deep-learning NLP model consumed energy equivalent to 600,000 lbs of CO2. Google's AlphaGo Zero generated over 90 tonnes of CO2 over the 40 days it took for the initial training. These numbers are large and at least call for review and discussion. I hope I have been able to open your eyes and generate some interest in this new dimension of AI and ethics, i.e., its impact on climate change.
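To make such figures concrete, reference (3) estimates emissions essentially as the energy drawn by the training hardware multiplied by the carbon intensity of the local electricity grid (with a data-centre overhead factor). Below is a minimal back-of-envelope sketch of that calculation; every number in it is an illustrative assumption, not a figure taken from the papers.

```python
# Rough CO2 estimate for a training run, in the spirit of reference (3).
# All numbers below are illustrative assumptions, not data from (1)-(3).
gpu_count = 8            # GPUs used in parallel (assumed)
gpu_power_kw = 0.3       # average draw per GPU in kW (assumed)
hours = 24 * 14          # two weeks of training (assumed)
pue = 1.6                # data-centre power usage effectiveness (assumed)
grid_intensity = 0.475   # kg CO2e per kWh, a rough global-average figure

energy_kwh = gpu_count * gpu_power_kw * hours * pue
co2_kg = energy_kwh * grid_intensity
print(f"~{energy_kwh:.0f} kWh, ~{co2_kg:.0f} kg CO2e")  # ~1290 kWh, ~613 kg CO2e
```

Scaled up to thousands of GPUs, months of training, and repeated fine-tuning runs, this simple multiplication is how the headline numbers above arise.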

I am sure many of you will ask, "Hasn't every next generation of industrialization, from horse carriages to automobiles or from steam engines to internal combustion, increased the use of energy, so why do we need to worry about this for AI?" Or, "There has been so much talk about how many light bulbs one could power with the energy used for a simple Google search, so why worry about this now?" All valid questions.

However, I will argue that:

  1. The current climate change situation is already at a critical stage, and any unplanned large-scale new use of energy can become "the straw that broke the camel's back".
  2. Fully data-driven life cycles and deep neural networks with billions of parameters are being used at an industrial scale, industry-wide, for the first time, and there are too many unknowns.

What are the suggestions?

  • Energy consumption measurement and publication must become part of the AI & Ethics practice followed by all AI development organizations. The Carbontracker tool (2) and the Machine Learning Emissions Calculator (3) are suggestions for this crucial measurement (a minimal usage sketch follows this list). I strongly recommend that organizations use their Quality & Metrics departments to research and agree on a measurement acceptable to everyone within the organization. More research and discussion are needed to calculate the net increase in energy use compared to current IT tools, so that the measurement is a fair one. In some cases, the current IT tools may be running on legacy mainframes and expensive dedicated communication lines that consume large amounts of energy, so the net difference from adopting AIML may not be that large.
  • Predicting energy use at the beginning of the AIML project life cycle is also required (3).
  • The predicted CO2-equivalent emissions need to be treated as another cost when approving AIML projects.
  • Emission prediction will also force AIML developers to select right-sized training data and the right models for the application. Avoid the temptation of running the model on billions of data points just because the data is available. Use the right tools for the right job: you don't need a tank to kill an ant!
  • Ask whether deep learning is appropriate for the business application at hand. For example, a simple HR application for recruitment or employee-loyalty prediction built with deep-learning models may turn out to be too expensive in terms of CO2 emissions and need not be considered a viable project.
  • CEOs should include this data in their climate change initiatives report to the board and shareholders, and also deduct the carbon credits used by these AIML applications from the company's carbon-credit commitments.
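As a pointer to how the per-training-run measurement in (2) can be wired into a training loop, here is a minimal sketch based on the usage published for the carbontracker package; treat the exact class and method names as assumptions to verify against the version you install.

```python
# Minimal sketch of per-epoch energy/CO2 tracking with carbontracker (2).
# The API shown follows the package's published examples; confirm it against
# the installed version before relying on it.
from carbontracker.tracker import CarbonTracker

max_epochs = 10
tracker = CarbonTracker(epochs=max_epochs)

for epoch in range(max_epochs):
    tracker.epoch_start()
    # ... one epoch of model training goes here ...
    tracker.epoch_end()

tracker.stop()  # logs measured and predicted energy use and CO2 emissions
```

The tool's prediction after the first few epochs is what makes it useful for the up-front estimates suggested above.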

More Later,

L Ravichandran

Trends and challenges to consider for Enterprise AI solutions

Every company now wants to be an AI company or embed AI into their offerings.

While the maximum potential of AI has not yet been tapped, there are many innovative as well as controversial applications of the technology that have been tried, and some are also in production.

We are already seeing simple applications of AI that make things easier for the humans using these solutions – from predictive text while composing messages or even documents (this post has been completely hand-crafted and not generated by a machine 😊), and chatbots that do not get confused when a response does not fit one of the proposed options, to more complex applications such as self-learning robots in rescue and assistive situations.

Many teams have been experimenting with revised ways of working to structure the software and solutions development processes.

Some of the challenges and proposed approaches are highlighted in these posts by Anil Sane on the aithoughts.org site.

https://aithoughts.org/aiml-solutions-security-considerations/

https://aithoughts.org/ai-and-aiops-a-perspective-for-it-services-industry/ 

The emergence of data science as a specialization has also meant that the software development lifecycle needs to acknowledge this and incorporate it in the flow.

Governance aspects of AI based solution development are also undergoing changes.

Some inherent issues related to AI-based solutions, such as ethics, explained in the post by L Ravi at https://aithoughts.org/ai-ethics-self-governance/, also need to be considered as part of the overall approach and be reviewed.

The unpredictability of real-time usage situations adds to the complexity of the algorithms to be implemented. It is no longer just an algorithm that processes data: the data determines or influences the behavior of the algorithm, or even which algorithm is chosen at runtime.
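As a purely illustrative sketch of that last point (nothing here is from the article, and the model objects are hypothetical placeholders), routing logic might inspect the incoming data and pick between a cheap model and a heavier one at runtime:

```python
# Hypothetical sketch: choose an algorithm at runtime based on the data itself.
import numpy as np

def choose_model(batch: np.ndarray, simple_model, complex_model):
    """Route sparse or low-variance batches to a cheap model, the rest to a deep one."""
    if batch.size == 0:
        raise ValueError("empty batch")
    if np.std(batch) < 0.1 or (batch == 0).mean() > 0.9:
        return simple_model      # e.g. a linear or rule-based model
    return complex_model         # e.g. a deep neural network

# usage: choose_model(incoming_batch, linear_model, dnn_model).predict(incoming_batch)
```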

Additional ‘-ities’, such as ‘explainability’, are also emerging for AI-based solutions as non-functional requirements that need to be considered.

There are some topics related to the overall solutions supply chain that are still being explored.

Some of these, such as addressing bias in the learning data, aim to improve the overall quality of the decisions derived or suggested by AI systems.

As with any emerging technology, there are bad actors who keep looking for holes in these solutions to exploit, to the detriment of the social good.

Deep Fakes are a good example of advanced algorithms being used to mislead or even trigger anxiety or unrest in individuals or communities.

With the increased interest in the metaverse and related business, information is going to get even more distributed and fragmented.

This would then mean that any solution designer needs to think of ecosystems and not just point solutions.

We have already seen the advantage of this approach – such as using location services or map services offered by some companies being embedded in solutions delivered by many businesses.

Thinking of the ecosystem, one must consider the full supply chain and the potential for benefit and fraud at every stage.

These include challenges related to the bias in the learning data, data poisoning, deep fakes and compromises to the edge devices or sensors.

A recent report by CB Insights identified the following as emerging areas of concern and action, when it comes to AI. Here are the highlights, with my own thoughts also included.

  • Synthetic data for privacy. This could also be very useful for increasing confidence in testing, where there is a constant struggle to share real data between developers and data scientists. Generating synthetic data is not simple, and there are many considerations that go with it – ensuring adequacy, fairness (no bias), and constant validation of the neutrality of the data. We are used to capturing data mostly about positive results. For example, we need to understand the patterns related to parts rejected during the manufacturing process, and that is another potential application of synthetic data – generating images from the quantitative or descriptive data that might have been captured in various analysis reports (an illustrative sketch appears after this list).

  • Edge AI: embedding AI into the chips that act as sensors (IoT devices?). In addition to these being secure and immune to noise, some smartness is also needed to ensure that these entry points are trusted.

  • Policing or protection in the metaverse. While one may desire self-regulation in the metaverse, one cannot wish away the need for some mechanism of policing – essentially to create deterrents against abuse. Explainable AI and other such principles are useful in a reactive situation, but what is more important is to have some proactive mechanisms.

  • Eliminating Deepfakes. We already spoke about this earlier in this article. When deepfakes are machine generated, the speed and volume could be a major challenge to combat

  • The report also talks about augmented coding. We are seeing tools from multiple vendors that embed intelligence in the development environment itself. For teams and organizations to learn faster, there would be a need to [selectively] share the learning across users. The question of how to tell the machine what may be shared and what may not is another area that needs to mature.

  • The next level of evolution of conversational AI is to support omni-channel interactions and multi-modal AI that can understand and process concepts from video, audio, text, images, etc. This may be considered the next evolution of natural interfaces and interactions, beyond just the spoken or written language.

  • Black-box AI – or end-to-end machine learning platforms – would become the preferred option for enterprises to accelerate the adoption of company-wide solutions.
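To illustrate the synthetic data item above, here is a deliberately small sketch (my own illustration, not from the report): fit a simple distribution to real tabular data and sample privacy-friendly synthetic rows from it. A production approach would add far stronger privacy guarantees and the bias and neutrality checks mentioned above.

```python
# Illustrative sketch: synthetic tabular data by sampling from a distribution
# fitted to the real data. Not a substitute for formal privacy techniques.
import numpy as np

def synthesize(real: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Fit a multivariate normal to the real table and draw synthetic rows."""
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

real_data = np.random.default_rng(1).normal(size=(500, 4))  # stand-in for real records
synthetic = synthesize(real_data, n_samples=1000)
print(real_data.mean(axis=0).round(2))   # the synthetic sample should track
print(synthetic.mean(axis=0).round(2))   # the real data's summary statistics
```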

As seen from the above, the AI based solutions space offers enterprises unprecedented opportunities as well as unforeseen or unforeseeable complexities and challenges as the ecosystem also evolves.

In future articles, I intend to go deeper into some of the above trends and associated techniques or tools. The focus for me is to not lose human centricity when we embed more and more intelligence into the machines.

 

If you have any specific topics that you would like covered or questions to be answered, do reach out.

AIML Solutions: Security Considerations

Preface

AI-ML solutions have now become an integral part of IT-enabled solutions provided to businesses.

We examined various life cycle stages, including the conventional SDLC, which are not entirely suited to data science projects due to their broader scope, experimental nature, data dependency, creative yet often chaotic and non-linear process, and relatively intangible deliverables (in the form of knowledge and insight).

We also examined DevOps and DevSecOps practices that help with repeatability and provide an overarching ecosystem to continuously build and deploy solutions in a systematic manner. There are also MLOps practices that cater to the requirements of building and deploying AIML solutions in production in a systematic and secure manner. This ecosystem supports continuous experimentation, learning, building, deployment, and monitoring at a larger scale.

In this part, we discuss the elements of the AIML CICD life cycle stages, with key security considerations at every stage. The intent, eventually, is to build a set of practices and processes that will help an organization securely build and maintain AIML solutions consistently. Towards the latter half of this write-up, we touch upon the overall AIML operations ecosystem that is essential for building, maintaining and monitoring AIML solutions.

In subsequent write-ups, we will deal in more detail with each of the areas – mainly planning, effective testing, performance management, maintenance and currency of the solutions, maturity mechanisms, etc. This will include developing an overall ecosystem comprising legacy solutions and AIML solutions, and an overall transformation into such an ecosystem.

 

AIML CICD Life Cycle 

Below is a high-level functional representation of the standard life cycle stages that AIML projects adopt in order to deliver an appropriate solution in a consistent and secure manner. The diagram below illustrates a few key security considerations in the ML CICD cycle (more details are available in the table following the diagram).

As depicted in the diagram, the AIML life cycle typically has the following stages.

It may vary for different models (supervised or unsupervised) or other techniques such as NLP, deep learning, and so on.

  1. Problem Definition – define the problem, stakeholders, environment, what data is available, and what the goals and performance expectations are
  2. Data Build – caters to data collection, collation and annotation – building the data pipeline
  3. Model Train & Build – feature engineering, model building, testing – building the model pipeline
  4. Test – evaluation, validation or quality assurance/testing of the model before deploying
  5. Integrate and Release – freeze the code baseline, baseline versioning, release notes, release infra readiness
  6. Deployment – deploying the model into production – independently or integrated with other applications, as decided by the serving mechanism
  7. Model Serving – serving mechanism, serving performance, adjustments
  8. Monitoring – monitoring performance throughout the life cycle; fine-tuning, adjusting or retiring the model based on its performance and changes in the overall ecosystem
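Purely as an illustration of how these stages can be strung together with a security check attached to each one (the class and function names below are my own assumptions, not part of any specific MLOps framework):

```python
# Illustrative sketch: the lifecycle stages above as an ordered pipeline,
# with a security hook run before each stage. Names are assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    run: Callable[[], None]
    security_check: Callable[[], None]

def run_pipeline(stages: List[Stage]) -> None:
    for stage in stages:
        stage.security_check()   # e.g. static scan, bias check, container scan
        stage.run()
        print(f"completed stage: {stage.name}")

pipeline = [
    Stage("data_build", run=lambda: None, security_check=lambda: None),
    Stage("model_build", run=lambda: None, security_check=lambda: None),
    Stage("test", run=lambda: None, security_check=lambda: None),
    Stage("deploy", run=lambda: None, security_check=lambda: None),
]
run_pipeline(pipeline)
```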

The following section describes briefly, for each stage, the typical tasks that are performed, expected output for that stage and more importantly, security considerations that may be employed for the stage.

Life Cycle Stages – Key Tasks, Output and Security Considerations

Stage: Problem Definition

Tasks: Brainstorm problems; define boundaries, goals, thresholds; define data sources, type, frequency; define the ideal outcome, visualization, usage; define metrics and success/failure criteria; define resources; define the methodology and proposed models to be used; define an overall implementation project plan; evaluate threats, vulnerabilities, remedies.

Output: Clearly laid out problem statement, defined data needs, ideal outcome, infra needs, resource competency, measures of success and goals; clearly defined project plan covering timelines, schedule, cost, delivery artefacts and release schedule; Threat Management Plan (RATP).

Security Considerations: Identify vulnerabilities, possible attack scenarios, and probable risks to the data, model, infra and overall system; define probable mitigation actions, creating an RATP (Risk Analysis and Treatment Plan).

Stage: Data Build

Tasks: Collect/ingest data from sources; cleanse (missing or improper values, outliers); transform – labelling, annotating, feature engineering (devise, build/extract and select features); analyse for meaningfulness, completeness, fairness, etc.; build training, validation and test data repositories; verify data and data-building scripts (static code analysis); check for data bias; define data coverage and prediction-accuracy rules/thresholds; study data patterns and statistical properties in order to decide on an appropriate model.

Output: Labelled, annotated data with explicit features identified; training, validation and test data repositories; vulnerabilities identified in data, features and data-build scripts.

Security Considerations: Security of databases, APIs, features, infra and data in transformation; data formation and transformation scripts analysed using static code analysers; data privacy (e.g., GDPR, HIPAA) compliance requirements.

Stage: Model Build

Tasks: Select the appropriate model(s); build model(s) by defining a model pipeline (code, unit test, security test); train model(s) with the training data (including the data pipeline); evaluate with the validation data; refine model(s) as required; containerize (build and package) everything into an image/build file; unit test; store artefacts in the artefact repository; version the model code; static code analysis; simulation of run-time behaviour where possible.

Output: Trained, evaluated model(s); container package; unit test reports; training and evaluation metrics and reports; version-controlled artefacts (code, data, evaluation reports); static code analysis report.

Security Considerations: Application code scans to identify security risks to software, libraries, containers and other artefacts, and to ensure code coverage and adherence to coding standards, using SAST tools; analysis of source code, byte code and binaries; separate secure build, staging and production environments.

Stage: Test

Tasks: Deployment on staging; model evaluation with test data (testing with fraudulent data, data-injection tests); integration test; UI test; API testing; penetration test; model bias validation; test defect remediation; model refinement.

Output: Test reports; remediation reports; tested model(s).

Security Considerations: Black-box testing using DAST tools; memory consumption, resource usage, encryption algorithms, privileges, cross-site scripting, SQL injection, third-party interfaces, cookie manipulation, etc.; test data security and anomaly testing; model bias evaluation.

Stage: Integrate & Release

Tasks: Freeze code and feature list; appropriate configuration and versioning of all artefacts; release notes creation.

Output: Code and feature list; release notes.

Security Considerations: (none specific listed)

Stage: Deploy

Tasks: Perform security audit and remediation; deployment on production; test on deployment (smoke testing); vulnerability testing for containers; verification of infrastructure-as-code automation scripts.

Output: Release deployed on production; security audit reports; smoke test reports; infrastructure-as-code automation run reports.

Security Considerations: Infrastructure security – scan infrastructure-as-code templates, scan Kubernetes application manifests, scan container images, and scan model algorithms on production and versions promoted from staging to production.

Stage: Operate (Model Serving)

Tasks: Monitor model performance on live data (alerts, KPIs, under/over-fitting, prediction accuracy, etc.); learn and refine the model as required; remove or retire models as required; monitor the integrated functionality; security events management; monitor triggers, alarms and error logs; remediation as per SOP (incident management, automatic event handlers).

Output: Model performance and KPI reports; incidents and events managed and addressed; change management on models; refined models and artefacts, including the list of models removed or retired.

Security Considerations: Model security, data security, API security, infrastructure security, pipeline security, output/UI security.


As is evident, it is important to plan for security checks right from the beginning and at every stage in order to build overall secure and unbiased AIML solutions.
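As a concrete flavour of the "check for data bias" task in the Data Build stage and the "model bias validation" task in the Test stage, here is a deliberately small, hypothetical sketch; the column names and the 80 percent threshold are illustrative assumptions, not requirements from this post.

```python
# Illustrative sketch: a simple pre-training bias check on the data build
# output - positive-outcome rates per protected group and the "80% rule"
# disparate-impact ratio. Column names are hypothetical.
import pandas as pd

def disparate_impact(df: pd.DataFrame, group_col: str, label_col: str) -> float:
    """Ratio of positive-outcome rates between the least and most favoured groups."""
    rates = df.groupby(group_col)[label_col].mean()
    return rates.min() / rates.max()

df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M", "F", "M", "F"],
    "hired":  [1,   0,   1,   1,   0,   0,   1,   1],
})
ratio = disparate_impact(df, "gender", "hired")
print(f"disparate impact ratio = {ratio:.2f}")  # flag for review if below ~0.8
```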

MLOPS – An Ecosystem

While the above section describes the life cycle stages of AIML development projects, an overall MLOps ecosystem provides environments and areas for experimenting, building, continuous training, data management, continuous evaluation, integration with other applications, and deployment (standalone as a microservice or integrated with other customer applications), as represented in the functional diagram below.

 

MLOPS Ecosystem

Typical areas covered are:

  • Requirement Management – managing frequent requirements / problem definitions, especially new requirements and the feedback that comes from production serving of existing models and from new data that might become available
  • Model Development – includes experimentation, research, development, evaluation and go/no-go decisions, and overall model security
  • Data & Feature Management – overall data estate management, management of the overall data pipeline, management of metadata related to models, scripts for building data instances and features, and overall data security
  • Continuous Training – this environment provides continuous training for existing and newly built models
  • Continuous Evaluation – mechanisms to evaluate models while building or refining them, and to test the efficacy of the model with test and real data
  • Continuous Deployment – systematic processes and pipelines for continuous integration and deployment
  • Serving Management – manage production serving, verify methods of serving, capture serving results, and tune and refine as required
  • Continuous Monitoring & Improvement – monitor the health of models, detect additions or changes to data behaviour, continuously improve model performance, and remove/retire models as required
  • Repository/Registry Management – manage overall repositories for models, data, features, pipelines, etc.; this includes establishing overall version control and baseline traceability and encouraging reuse (an illustrative sketch using an experiment-tracking tool appears at the end of this section)

The following are examples of outcomes from these areas:

  • Models experimented with, data sets used, reusable assets, model code/packages/containers
  • Data pipeline, data assets, features, data connectors
  • Training and testing data, training environments, ML accelerators
  • Serving packages, serving logs
  • Evaluation metrics, performance models, fairness/bias measurements
  • Overall repository and registry of models – experimented models, failed/discarded models, metadata
  • Overall repository of data – reusable data sets, features, ETL scripts
  • Other artefacts – code packages, containers, infrastructure-as-code

The AIML CICD life cycle (along with the security considerations) described in the earlier part of this blog therefore becomes part of this overall ecosystem. The ecosystem plays an important role in managing the assets generated across multiple AIML solutions over the life of those solutions.
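As one concrete way of realizing the Repository/Registry Management area above (an assumed tool choice on my part; the post does not prescribe any specific product), an experiment-tracking tool such as MLflow can version models, parameters, metrics and artefacts:

```python
# Illustrative sketch using MLflow as one possible model registry backend.
# The tool choice is an assumption; any registry with versioning would do.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

with mlflow.start_run(run_name="baseline-logreg"):
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)                        # versioned hyperparameter
    mlflow.log_metric("train_accuracy", model.score(X, y))   # evaluation metric
    mlflow.sklearn.log_model(model, "model")                 # model artefact in the registry
```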

 

References

  • Practitioners Guide to MLOps: A framework for continuous delivery and automation of machine learning – Google
  • DevOps – Secure and scalable CI/CD pipelines with AWS