8. TML Solution Building

8.1. Why Do I Need TML?

TML is the world’s only technology that can perform entity based machine learning, in-memory, on real-time data integrated with Apache Kafka. Anywhere you need to process real-time data, you need TML. It can be used in any industry globally.

Important

TML offers several advantages over conventional stream processing. It is:

the FASTEST and EASIEST way to build advanced, scalable, secure, and cost-effective real-time solutions, with GenAI, for the Enterprise,

in roughly TWO (2) minutes, with

automated documentation,

automated Docker builds, and

automated code commits to GitHub,

with tight integration with Apache Airflow and Apache Kafka.

More Reasons:

1. Stream processing technologies such as AWS Kinesis or Spark Streaming do not perform in-memory, entity based machine learning or processing of real-time data. TML does.

2. Stream processing technologies are very expensive. Because TML is composed of 3 binaries, they can be operated like microservices with very little cost overhead (if any). Real-time data is processed in-memory, so no external databases are needed for machine learning, which reduces storage, compute and network transfer costs.

3. Stream processing solutions still use SQL to process data. TML uses in-memory JSON processing, which is faster, cheaper and easier to manage.

4. Performing machine learning with stream processing is difficult and costly, and it does not operate at the entity level. TML performs in-memory machine learning at the entity level for each device that is producing real-time data. This makes it very effective at learning each individual device's behaviours and predicting future behaviours more accurately.

5. Stream processing technologies still require lots of code. TML solutions are low-code or no-code using the TML Solution Studio (TSS). The TSS uses DAGs that allow users to quickly configure their TML solutions, automatically deploy them with Docker, automatically generate the documentation for the solution, and commit code to GitHub repos.

6. TML is integrated with GenAI using PrivateGPT and Qdrant vector DB. This integration makes it the first solution that provides fast AI integrated with real-time data processing and machine learning at the entity level.

7. To ingest data from devices, TML offers pre-built client Python code. Users can easily use gRPC, REST API, or MQTT to ingest data directly from devices and stream it to Kafka. Refer to STEP 3: Produce to Kafka Topics for more details.
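The points above about in-memory JSON processing and entity-level learning can be pictured with a minimal sketch. This is not TML's implementation: the message fields (entity, value), the window size, and the function name are assumptions chosen for illustration. Each entity (for example, each device) gets its own small in-memory window, so no external database is involved.

```python
import json
from collections import defaultdict, deque
from statistics import mean


def rolling_means(json_messages, window=3):
    """Return the latest rolling mean per entity, held entirely in memory."""
    # One bounded window per entity; the oldest value falls off automatically.
    windows = defaultdict(lambda: deque(maxlen=window))
    latest = {}
    for raw in json_messages:
        msg = json.loads(raw)  # hypothetical schema: {"entity": ..., "value": ...}
        values = windows[msg["entity"]]
        values.append(float(msg["value"]))
        latest[msg["entity"]] = mean(values)
    return latest


if __name__ == "__main__":
    stream = [
        '{"entity": "device-1", "value": 10}',
        '{"entity": "device-2", "value": 100}',
        '{"entity": "device-1", "value": 20}',
    ]
    print(rolling_means(stream))
```

Because the state is keyed by entity, the statistics for one device never mix with those of another, which is the essence of working at the entity level.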

8.2. Where Is TML Used?

Note

TML is used by companies and people around the world to process real-time data. Because TML is free for students and researchers, it is used by thousands of students in universities and colleges around the world as an official part of the curriculum in IoT, Cybersecurity, Machine Learning, Data Science, and Big Data Management courses.

8.3. TML Solutions Can Be Built In 10 Steps Using Pre-Written DAGs (Directed Acyclic Graphs)

Users simply make configuration changes to the DAGs and build the solution. TML Studio will even automatically containerize your complete solution and auto-generate online documentation.

8.4. Where Do I Start?

Attention

START HERE: The fastest way to build TML solutions with your real-time data is to use the TML Solution Studio (TSS) Container

8.5. Pre-Written 10 Apache Airflow DAGs To Speed Up TML Solution Builds

The TML solution process with DAGs is explained in detail below. The entire TML solution build process is highly efficient: advanced, scalable, real-time TML solutions with GenAI integrations can be built in a few hours!

8.5.1. DAG Solution Process Explanation

Note

The above process shows Ten (10) DAGs that are used to build advanced, scalable, real-time TML solutions with no-code - just configurations to the DAGs.

The build process starts with setting up system parameters for the initial TML solution setup. Users simply need to provide configuration information in the following DAG:

STEP 1: Get TML Core Params: tml_system_step_1_getparams_dag

The next step is to create all your topics in Kafka - these topics will store all your input and output data. This is done in:

STEP 2: Create Kafka Topics: tml_system_step_2_kafka_createtopic_dag

Your initial TML setup is complete.

Next, you want to start generating and producing data to the topics you created, and choose an Ingest Real-Time Data Method. TML provides you with FOUR (4) methods to stream your own data from any device. This is done in the following DAGs - you need to CHOOSE ONE method:

STEP 3: Produce to Kafka Topics

3a. MQTT: STEP 3a: Produce Data Using MQTT: tml-read-MQTT-step-3-kafka-producetotopic-dag

3b. REST API: STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag

3c. gRPC: STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag

3d. Local File: STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

You are also provided with CLIENT files for MQTT, REST API and gRPC - these clients connect to the SERVERS in 3a, 3b and 3c:

3a.i: STEP 3a.i: MQTT CLIENT

3b.i: STEP 3b.i: REST API CLIENT

3c.i: STEP 3c.i: gRPC API CLIENT

The MQTT method is intended for cases where you are using an MQTT broker for machine-to-machine communication.
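Of the four ingest methods, the local file method is the simplest to picture. The sketch below is only an illustration under stated assumptions, not the TML DAG itself: it reads CSV rows, turns each into a JSON message, and hands it to `produce`, a hypothetical callable standing in for whatever actually writes to the Kafka topic. The column names in the sample are also made up.

```python
import csv
import io
import json


def csv_to_json_messages(csv_file):
    """Yield one JSON string per CSV row, keyed by the header names."""
    for row in csv.DictReader(csv_file):
        yield json.dumps(row)


def stream_rows(csv_file, produce):
    """Send every row to `produce` (a stand-in for a Kafka producer call)."""
    count = 0
    for message in csv_to_json_messages(csv_file):
        produce(message)
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO("deviceid,temperature\nd1,21.5\nd2,19.0\n")
    stream_rows(sample, print)  # print stands in for the real producer
```

The MQTT, REST API and gRPC methods differ in how the data arrives, but the same idea applies: each reading becomes a JSON message that is streamed to a Kafka topic.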

After you have chosen an ingest data method and are producing data, you are ready to Preprocess Real-Time Data - the next DAG performs this function:

STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag - Preprocessing is a very quick way to start generating insights from your real-time data in a few minutes. All preprocessing is done in-memory and no external databases are needed, only Kafka. After you have preprocessed your data, you can use this preprocessed data for machine learning - the next DAG performs this function.

4a. STEP 4a: Preprocessing Data: tml-system-step-4a-kafka-preprocess-dag - This preprocessing step uses jsoncriteria to extract data from Step 4.

4b. STEP 4b: Preprocessing 2 Data: tml-system-step-4b-kafka-preprocess-dag - This second preprocessing step is important: it uses the preprocessed data for additional processing in machine learning. In the conventional machine learning sense, STEP 4 is like “feature engineering” and STEP 4b uses the engineered features for a much deeper understanding of the streaming data variables.

4c. STEP 4c: Preprocessing 3 Data: tml-system-step-4c-kafka-preprocess-dag - This third preprocessing step allows users to incorporate TEXT files with machine learning outputs and to incorporate “past memory” with sliding time windows. Users can control how TML maintains memory of past sliding time windows. For details see How TML Maintains Past Memory of Events Using Sliding Time Windows.

STEP 5: Entity Based Machine Learning: tml-system-step-5-kafka-machine-learning-dag - This is another powerful DAG that automatically starts building entity based machine learning models for your real-time data. Note that TML will continuously build ML models as new data streams in. All machine learning is done in-memory and no external databases are needed, only Kafka. As these models are trained on your real-time data, they can be used for predictions - the next DAG performs this function.

STEP 6: Entity Based Predictions: tml-system-step-6-kafka-predictions-dag - These predictions are generated automatically, in parallel with the machine learning training process in STEP 5. As predictions are being generated, you can stream these predictions to a real-time dashboard - the next DAG performs this function.

STEP 7: Real-Time Visualization: tml-system-step-7-kafka-visualization-dag - The visualization data are streamed directly from the TML solution container over websockets to the client browser; this eliminates any need for third-party visualization software. Now that you have built the ENTIRE TML SOLUTION END-TO-END, you are ready to deploy it to Docker - the next DAG performs this function.

STEP 8: Deploy TML Solution to Docker: tml-system-step-8-deploy-solution-to-docker-dag - The TML Docker container is automatically built for you and pushed to Docker Hub. If you have chosen to integrate GPT into your solution, you can initiate the PrivateGPT and Qdrant containers - the next DAG performs this function.

STEP 9: PrivateGPT and Qdrant Integration: tml-system-step-9-privategpt_qdrant-dag - This DAG integrates your real-time solution seamlessly with GenAI using the privateGPT container; see TML and Generative AI.

9b. STEP 9b: Multi-Agentic AI: tml-system-step-9b-agenticai-dag - This DAG integrates Multi-Agentic AI seamlessly with your real-time solution; see TML and Agentic AI.

STEP 10: Create TML Solution Documentation: tml-system-step-10-documentation-dag.

YOU ARE DONE! You just built an advanced, scalable, end-to-end real-time solution and deployed it to Docker, integrated with AI and with online documentation. ENJOY!
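STEP 4c above relies on sliding time windows to keep a "past memory" of events. As a rough illustration of the general technique only (TML's actual mechanism is described in How TML Maintains Past Memory of Events Using Sliding Time Windows), the sketch below keeps the events whose timestamps fall within the most recent span. The class name, the span parameter, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import deque


class SlidingTimeWindow:
    """Keep only the events that fall inside the most recent `span_seconds`."""

    def __init__(self, span_seconds):
        self.span_seconds = span_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Add an event, evict expired ones, and return the values still remembered."""
        self.events.append((timestamp, value))
        cutoff = timestamp - self.span_seconds
        # Drop everything that is at least `span_seconds` older than the newest event.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return [v for _, v in self.events]


if __name__ == "__main__":
    window = SlidingTimeWindow(span_seconds=10)
    for ts, reading in [(0, "a"), (5, "b"), (12, "c")]:
        print(ts, window.add(ts, reading))
```

A larger span keeps more history in memory; a smaller span forgets sooner. That trade-off is the kind of control the text refers to when it says users can decide how past memory is maintained.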

DAGs (Directed Acyclic Graphs) are an easy way to build powerful real-time TML solutions quickly. Users are provided with the following DAGs:

Note

The numbers in the DAGs indicate solution process step. For example, step 2 is dependent on step 1.
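The dependency rule in the note can be sketched with Python's standard `graphlib` module (Python 3.9+). The step labels below are shortened, hypothetical names rather than the real DAG ids, and the sketch assumes a simple chain in which each step depends on the one before it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical short labels for the ten solution steps, in process order.
STEPS = [
    "step_1_getparams",
    "step_2_createtopic",
    "step_3_produce",
    "step_4_preprocess",
    "step_5_machine_learning",
    "step_6_predictions",
    "step_7_visualization",
    "step_8_deploy_to_docker",
    "step_9_privategpt_qdrant",
    "step_10_documentation",
]


def build_order(steps):
    """Return a valid run order when every step depends on the previous one."""
    graph = {step: set() for step in steps}
    for previous, step in zip(steps, steps[1:]):
        graph[step].add(previous)  # step cannot run until previous has finished
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in build_order(STEPS):
        print(name)
```

Because the graph is acyclic, a valid order always exists; for a plain chain like this one it is simply step 1 through step 10.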

8.5.2. DAG Table

DAG Name

STEP 1: Get TML Core Params: tml_system_step_1_getparams_dag

Description: This DAG will get the core TML connection and tokens needed for operations.

STEP 2: Create Kafka Topics: tml_system_step_2_kafka_createtopic_dag

Description: This DAG will create all the necessary topics in Kafka (on-prem or Cloud) for your TML solution.

STEP 3a: Produce Data Using MQTT: tml-read-MQTT-step-3-kafka-producetotopic-dag

Description: This DAG is an MQTT server and will listen for a connection from a client. You use this if your TML solution ingests data from an MQTT system like HiveMQ and streams it to Kafka.

STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag

Description: This DAG is a REST API server and will listen for a connection from a REST client. You use this if your TML solution ingests data from devices and you want to leverage a REST connection and stream the data to Kafka.

STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag

Description: This DAG is a gRPC server and will listen for a connection from a gRPC client. You use this if your TML solution ingests data from devices and you want to leverage a gRPC connection and stream the data to Kafka.

STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

Description: This DAG will read a local CSV file for data and stream it to Kafka.

STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag

Description: This DAG performs entity level preprocessing on the real-time data. There are over 35 different preprocessing types in TML.

STEP 4b: Preprocessing 2 Data: tml-system-step-4b-kafka-preprocess-dag

Description: This DAG performs entity level preprocessing on the feature engineered variables from STEP 4. The processed variables are named in a standard way, following the procedure in Preprocessed Variable Naming Standard.

STEP 4c: Preprocessing 3 Data: tml-system-step-4c-kafka-preprocess-dag

Description: Step 4c is a very powerful task that incorporates real-time memory using sliding time windows. For details see How TML Maintains Past Memory of Events Using Sliding Time Windows. This is the RTMS solution: https://tml.readthedocs.io/en/latest/rtms.html

STEP 5: Entity Based Machine Learning: tml-system-step-5-kafka-machine-learning-dag

Description: This DAG performs entity level machine learning on the real-time data.

STEP 6: Entity Based Predictions: tml-system-step-6-kafka-predictions-dag

Description: This DAG performs predictions using the trained algorithms for every entity.

STEP 7: Real-Time Visualization: tml-system-step-7-kafka-visualization-dag

Description: This DAG streams the output to a real-time dashboard.

STEP 8: Deploy TML Solution to Docker: tml-system-step-8-deploy-solution-to-docker-dag

Description: This DAG automatically deploys the entire TML solution to a Docker container and pushes it to Docker Hub.

STEP 9: PrivateGPT and Qdrant Integration: tml-system-step-9-privategpt_qdrant-dag

Description: This DAG integrates your real-time solution seamlessly with GenAI using the privateGPT container; see TML and Generative AI. This is a very powerful, secure, and low-cost way of harnessing the power of AI for fast AI analysis of your streaming data. No data is sent outside your network; the privateGPT container runs locally.

STEP 9b: Multi-Agentic AI: tml-system-step-9b-agenticai-dag

Description: This DAG integrates your real-time solution seamlessly with Multi-Agentic AI; see TML and Agentic AI. This is a very powerful, secure, and low-cost way of harnessing the power of Multi-Agentic AI for fast agent-based analysis of your streaming data. No data is sent outside your network; the agentic AI solution container runs locally.

STEP 10: Create TML Solution Documentation: tml-system-step-10-documentation-dag

Description: This DAG will automatically create the documentation for your solution on readthedocs.io.

8.5.3. STEP 1: Get TML Core Params: tml_system_step_1_getparams_dag

Below is the complete definition of the tml_system_step_1_getparams_dag. Users only need to configure the code in the USER CHOSEN PARAMETERS section.
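Much of the DAG definition below copies the chosen parameters into KEY=VALUE lines of each viper.env file. As a minimal sketch of that kind of update (not the DAG's own code), the hypothetical helper below rewrites only lines whose key matches exactly and leaves comments and unknown keys alone. Matching on the exact key is a design choice of this sketch: it avoids touching LOGSTREAMTOPICPARTITIONS when only LOGSTREAMTOPIC is being set.

```python
def update_env_lines(lines, overrides):
    """Return a copy of `lines` with KEY=VALUE entries replaced from `overrides`."""
    updated = []
    for line in lines:
        stripped = line.strip()
        # Leave comments, blank lines, and anything that is not KEY=VALUE untouched.
        if stripped.startswith("#") or "=" not in stripped:
            updated.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        if key in overrides:
            updated.append("{}={}\n".format(key, overrides[key]))
        else:
            updated.append(line)
    return updated


if __name__ == "__main__":
    env = ["# sample\n", "ONPREM=1\n", "LOGSTREAMTOPICPARTITIONS=3\n"]
    print(update_env_lines(env, {"ONPREM": "0", "LOGSTREAMTOPIC": "viperlogs"}))
```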

Tip

For details on the parameters below refer to MAADS-VIPER Environmental Variable Configuration (Viper.env)

Watch the YouTube video on DAG configurations.
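The three solution port parameters in the definition below accept either a fixed port number or -1, in which case TSS picks a free port. The sketch below illustrates that decision under stated assumptions: `resolve_port` and `get_free_port` are hypothetical names, `get_free_port` is only a stand-in for `tsslogging.getfreeport()` (whose implementation is not shown here), and `kube_override` represents a port supplied through the environment when running in Kubernetes.

```python
import socket


def get_free_port():
    """Ask the operating system for an unused TCP port (stand-in helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return sock.getsockname()[1]


def resolve_port(configured, kube_override=None):
    """Pick a port: Kubernetes override first, then the configured value, else a free port."""
    if kube_override not in (None, "", "-1"):
        return int(kube_override)
    if configured != "-1":
        return int(configured)
    return get_free_port()


if __name__ == "__main__":
    print(resolve_port("4040"))          # fixed port from the parameters
    print(resolve_port("-1", "5050"))    # value supplied by the environment
    print(resolve_port("-1"))            # a free port chosen at run time
```

Note that the values are compared as strings before conversion; comparing an integer with the string '-1' would always report them as different.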

   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   from datetime import datetime
   from airflow.decorators import dag, task
   import os
   import sys
   import tsslogging
   import time
   import subprocess
   import shutil
   import glob

   sys.dont_write_bytecode = True
   ######################################################USER CHOSEN PARAMETERS ###########################################################
   default_args = {
    'owner': 'Sebastian Maurice',  # <<< ******** change as needed
    'brokerhost' : '127.0.0.1',  # <<<<***************** THIS WILL ACCESS LOCAL KAFKA - YOU CAN CHANGE TO CLOUD KAFKA HOST
    'brokerport' : '9092',     # <<<<***************** LOCAL AND CLOUD KAFKA listen on PORT 9092
    'cloudusername' : '',  # <<<< --THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API KEY  - LEAVE BLANK
    'cloudpassword' : '',  # <<<< --THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API SECRET - LEAVE BLANK
    'solutionname': '_mysolution_',   # <<< *** DO NOT MODIFY - THIS WILL BE AUTOMATICALLY UPDATED
    'solutiontitle': 'My Solution Title', # <<< *** Provide a descriptive title for your solution
    'solutionairflowport' : '4040', # << If -1, TSS will choose a free port randomly, or set this to a fixed number
    'solutionexternalport' : '5050', # << If -1, TSS will choose a free port randomly, or set this to a fixed number
    'solutionvipervizport' : '6060', # << If -1, TSS will choose a free port randomly, or set this to a fixed number
    'description': 'This is an awesome real-time solution built by TSS',   # <<< *** Provide a description of your solution
    'HTTPADDR' : 'https://',
    'COMPANYNAME' : 'My company',
    'WRITELASTCOMMIT' : '0',   ## <<<<<<<<< ******************** FOR DETAILS ON BELOW PARAMETER SEE: https://tml.readthedocs.io/en/latest/viper.html
    'NOWINDOWOVERLAP' : '0',
    'NUMWINDOWSFORDUPLICATECHECK' : '5',
    'DATARETENTIONINMINUTES' : '1440',
    'USEHTTP' : '0',
    'ONPREM' : '0',
    'WRITETOVIPERDB' : '0',
    'VIPERDEBUG' : '2',
    'MAXOPENREQUESTS' : '10',
    'LOGSTREAMTOPIC' : 'viperlogs',
    'LOGSTREAMTOPICPARTITIONS' : '1',
    'LOGSTREAMTOPICREPLICATIONFACTOR' : '3',
    'LOGSENDTOEMAILS' : '',
    'LOGSENDTOEMAILSSUBJECT' : '[VIPER]',
    'LOGSENDTOEMAILFOOTER' : 'This e-mail is auto-generated by Transactional Machine Learning (TML) Technology Binaries: Viper, HPDE or Viperviz.  For more information please contact your TML Administrator.  Or, e-mail info@otics.ca for any questions or concerns regarding this e-mail. If you received this e-mail in error please delete it and inform your TML Admin or e-mail info@otics.ca, website: https://www.otics.ca.  Thank you for using TML Data Stream Processing and Real-Time Transactional Machine Learning technologies.',
    'LOGSENDINTERVALMINUTES' : '500',
    'LOGSENDINTERVALONLYERROR' : '1',
    'MAXTRAININGROWS' : '300',
    'MAXPREDICTIONROWS' : '50',
    'MAXPREPROCESSMESSAGES' : '5000',
    'MAXPERCMESSAGES' : '5000',
    'MAXCONSUMEMESSAGES' : '5000',
    'MAXVIPERVIZROLLBACKOFFSET' : '',
    'MAXVIPERVIZCONNECTIONS' : '10',
    'MAXURLQUERYSTRINGBYTES' : '10000',
    'MYSQLMAXLIFETIMEMINUTES' : '4',
    'MYSQLMAXCONN' : '4',
    'MYSQLMAXIDLE' : '10',
    'MYSQLHOSTNAME' : '127.0.0.1:3306',
    'KUBEMYSQLHOSTNAME' : 'mysql-service:3306', # this is the mysql service in kubernetes
    'MYSQLDB' : 'tmlids',
    'MYSQLUSER' : 'root',
    'SASLMECHANISM' : 'PLAIN',
    'MINFORECASTACCURACY' : '55',
    'COMPRESSIONTYPE' : 'gzip',
    'MAILSERVER' : '', #i.e.  smtp.broadband.rogers.com,
    'MAILPORT' : '', #i.e. 465,
    'FROMADDR' : '',
    'SMTP_USERNAME' : '',
    'SMTP_PASSWORD' : '',
    'SMTP_SSLTLS' : 'true',
    'SSL_CLIENT_CERT_FILE' : 'client.cer.pem',
    'SSL_CLIENT_KEY_FILE' : 'client.key.pem',
    'SSL_SERVER_CERT_FILE' : 'server.cer.pem',
    'KUBERNETES' : '0',
   }

   ############################################################### DO NOT MODIFY BELOW ####################################################

   def reinitbinaries(sname):
       pywindowfiles=glob.glob("/tmux/pythonwindows_*")

       for f in pywindowfiles:
           try:
             with open(f, 'r', encoding='utf-8') as file:
               data = file.readlines()
               for d in data:
                 # strip the newline first, otherwise blank lines never compare equal to ""
                 d=d.rstrip()
                 if d != "":
                   subprocess.call(["tmux", "kill-window", "-t", "{}".format(d)])
             os.remove(f)
           except Exception as e:
            print("ERROR=",e)
            pass

       vizwindowfiles=glob.glob("/tmux/vipervizwindows_*")

       for f in vizwindowfiles:
           try:
             with open(f, 'r', encoding='utf-8') as file:
                data = file.readlines()
                for d in data:
                    d=d.rstrip()
                    dsw = d.split(",")[0]
                    dsp = d.split(",")[1]
                    if dsw != "":
                      subprocess.call(["tmux", "kill-window", "-t", "{}".format(dsw)])
                      # $(...) is only expanded by a shell, so run the command as one string with shell=True
                      subprocess.call("kill -9 $(lsof -i:{} -t)".format(dsp), shell=True)
                      time.sleep(1)
             os.remove(f)
           except Exception as e:
            pass

       # copy folders
       shutil.copytree("/tss_readthedocs", "/{}".format(sname),dirs_exist_ok=True)
       #remove local logs
       try:
         os.remove('/dagslocalbackup/logs.txt')
       except Exception as e:
         pass

   def updateviperenv():
       # update ALL
       os.environ['tssbuild']="0"
       os.environ['tssdoc']="0"

       cloudusername = ""
       cloudpassword = ""

       if 'KAFKACLOUDUSERNAME' in os.environ:
             cloudusername = os.environ['KAFKACLOUDUSERNAME']
       if 'KAFKACLOUDPASSWORD' in os.environ:
             cloudpassword = os.environ['KAFKACLOUDPASSWORD']
       if 'KAFKABROKERHOST' in os.environ:
             default_args['brokerhost'] = os.environ['KAFKABROKERHOST']
             default_args['brokerport']=''
       if 'SASLMECHANISM' in os.environ:
          default_args['SASLMECHANISM']=os.environ['SASLMECHANISM']

       if '127.0.0.1' in default_args['brokerhost']:
         cloudusername = ""
         cloudpassword = ""
         if 'KUBE' in os.environ:
            if os.environ['KUBE'] == "1":
             if 'KAFKABROKERHOST' in os.environ:
                 default_args['brokerhost'] = os.environ['KAFKABROKERHOST']
                 default_args['brokerport']=''
             if "KUBEBROKERHOST" in os.environ:
                 buf = os.environ['KUBEBROKERHOST']
                 sp = buf.split(":")
                 default_args['brokerhost']=sp[0]
                 default_args['brokerport']=sp[1]
             else:
                default_args['brokerhost']="kafka-service"

       filepaths = ['/Viper-produce/viper.env','/Viper-preprocess/viper.env','/Viper-preprocess1/viper.env','/Viper-preprocess-pgpt/viper.env','/Viper-preprocess-agenticai/viper.env','/Viper-preprocess2/viper.env','/Viper-preprocess3/viper.env','/Viper-ml/viper.env','/Viper-predict/viper.env','/Viperviz/viper.env']
       for mainfile in filepaths:
        with open(mainfile, 'r', encoding='utf-8') as file:
          data = file.readlines()
        r=0
        for d in data:
          if d[0] == '#':
             r += 1
             continue

          if 'KAFKA_CONNECT_BOOTSTRAP_SERVERS' in d:
            if default_args['brokerport'] == '':
              data[r] = "KAFKA_CONNECT_BOOTSTRAP_SERVERS={}\n".format(default_args['brokerhost'])
            else:
              data[r] = "KAFKA_CONNECT_BOOTSTRAP_SERVERS={}:{}\n".format(default_args['brokerhost'],default_args['brokerport'])
          if 'CLOUD_USERNAME' in d:
            data[r] = "CLOUD_USERNAME={}\n".format(cloudusername)
          if 'CLOUD_PASSWORD' in d:
            data[r] = "CLOUD_PASSWORD={}\n".format(cloudpassword)
          if 'WRITELASTCOMMIT' in d:
            data[r] = "WRITELASTCOMMIT={}\n".format(default_args['WRITELASTCOMMIT'])
          if 'NOWINDOWOVERLAP' in d:
            data[r] = "NOWINDOWOVERLAP={}\n".format(default_args['NOWINDOWOVERLAP'])
          if 'NUMWINDOWSFORDUPLICATECHECK' in d:
            data[r] = "NUMWINDOWSFORDUPLICATECHECK={}\n".format(default_args['NUMWINDOWSFORDUPLICATECHECK'])
          if 'USEHTTP' in d:
            data[r] = "USEHTTP={}\n".format(default_args['USEHTTP'])
          if 'ONPREM' in d:
            data[r] = "ONPREM={}\n".format(default_args['ONPREM'])
          if 'WRITETOVIPERDB' in d:
            data[r] = "WRITETOVIPERDB={}\n".format(default_args['WRITETOVIPERDB'])
          if 'VIPERDEBUG' in d:
            data[r] = "VIPERDEBUG={}\n".format(default_args['VIPERDEBUG'])
          if 'MAXOPENREQUESTS' in d:
            data[r] = "MAXOPENREQUESTS={}\n".format(default_args['MAXOPENREQUESTS'])
          if 'LOGSTREAMTOPIC' in d:
            data[r] = "LOGSTREAMTOPIC={}\n".format(default_args['LOGSTREAMTOPIC'])
          if 'LOGSTREAMTOPICPARTITIONS' in d:
            data[r] = "LOGSTREAMTOPICPARTITIONS={}\n".format(default_args['LOGSTREAMTOPICPARTITIONS'])
          if 'LOGSTREAMTOPICREPLICATIONFACTOR' in d:
            data[r] = "LOGSTREAMTOPICREPLICATIONFACTOR={}\n".format(default_args['LOGSTREAMTOPICREPLICATIONFACTOR'])
          if 'LOGSENDTOEMAILS' in d:
            data[r] = "LOGSENDTOEMAILS={}\n".format(default_args['LOGSENDTOEMAILS'])
          if 'LOGSENDTOEMAILSSUBJECT' in d:
            data[r] = "LOGSENDTOEMAILSSUBJECT={}\n".format(default_args['LOGSENDTOEMAILSSUBJECT'])
          if 'LOGSENDTOEMAILFOOTER' in d:
            data[r] = "LOGSENDTOEMAILFOOTER={}\n".format(default_args['LOGSENDTOEMAILFOOTER'])
          if 'LOGSENDINTERVALMINUTES' in d:
            data[r] = "LOGSENDINTERVALMINUTES={}\n".format(default_args['LOGSENDINTERVALMINUTES'])
          if 'LOGSENDINTERVALONLYERROR' in d:
            data[r] = "LOGSENDINTERVALONLYERROR={}\n".format(default_args['LOGSENDINTERVALONLYERROR'])
          if 'MAXTRAININGROWS' in d:
            data[r] = "MAXTRAININGROWS={}\n".format(default_args['MAXTRAININGROWS'])
          if 'MAXPREDICTIONROWS' in d:
            data[r] = "MAXPREDICTIONROWS={}\n".format(default_args['MAXPREDICTIONROWS'])
          if 'MAXPREPROCESSMESSAGES' in d:
            data[r] = "MAXPREPROCESSMESSAGES={}\n".format(default_args['MAXPREPROCESSMESSAGES'])
          if 'MAXPERCMESSAGES' in d:
            data[r] = "MAXPERCMESSAGES={}\n".format(default_args['MAXPERCMESSAGES'])
          if 'MAXCONSUMEMESSAGES' in d:
            data[r] = "MAXCONSUMEMESSAGES={}\n".format(default_args['MAXCONSUMEMESSAGES'])
          if 'MAXVIPERVIZROLLBACKOFFSET' in d:
            data[r] = "MAXVIPERVIZROLLBACKOFFSET={}\n".format(default_args['MAXVIPERVIZROLLBACKOFFSET'])
          if 'MAXVIPERVIZCONNECTIONS' in d:
            data[r] = "MAXVIPERVIZCONNECTIONS={}\n".format(default_args['MAXVIPERVIZCONNECTIONS'])
          if 'MAXURLQUERYSTRINGBYTES' in d:
            data[r] = "MAXURLQUERYSTRINGBYTES={}\n".format(default_args['MAXURLQUERYSTRINGBYTES'])
          if 'MYSQLMAXLIFETIMEMINUTES' in d:
            data[r] = "MYSQLMAXLIFETIMEMINUTES={}\n".format(default_args['MYSQLMAXLIFETIMEMINUTES'])
          if 'MYSQLMAXCONN' in d:
            data[r] = "MYSQLMAXCONN={}\n".format(default_args['MYSQLMAXCONN'])
          if 'MYSQLMAXIDLE' in d:
            data[r] = "MYSQLMAXIDLE={}\n".format(default_args['MYSQLMAXIDLE'])
          if 'SASLMECHANISM' in d:
            data[r] = "SASLMECHANISM={}\n".format(default_args['SASLMECHANISM'])
          if 'MINFORECASTACCURACY' in d:
            data[r] = "MINFORECASTACCURACY={}\n".format(default_args['MINFORECASTACCURACY'])
          if 'COMPRESSIONTYPE' in d:
            data[r] = "COMPRESSIONTYPE={}\n".format(default_args['COMPRESSIONTYPE'])
          if 'MAILSERVER' in d:
            data[r] = "MAILSERVER={}\n".format(default_args['MAILSERVER'])
          if 'MAILPORT' in d:
            data[r] = "MAILPORT={}\n".format(default_args['MAILPORT'])
          if 'FROMADDR' in d:
            data[r] = "FROMADDR={}\n".format(default_args['FROMADDR'])
          if 'SMTP_USERNAME' in d:
            data[r] = "SMTP_USERNAME={}\n".format(default_args['SMTP_USERNAME'])
          if 'SMTP_PASSWORD' in d:
            data[r] = "SMTP_PASSWORD={}\n".format(default_args['SMTP_PASSWORD'])
          if 'SMTP_SSLTLS' in d:
            data[r] = "SMTP_SSLTLS={}\n".format(default_args['SMTP_SSLTLS'])
          if 'SSL_CLIENT_CERT_FILE' in d:
            data[r] = "SSL_CLIENT_CERT_FILE={}\n".format(default_args['SSL_CLIENT_CERT_FILE'])
          if 'SSL_CLIENT_KEY_FILE' in d:
            data[r] = "SSL_CLIENT_KEY_FILE={}\n".format(default_args['SSL_CLIENT_KEY_FILE'])
          if 'SSL_SERVER_CERT_FILE' in d:
            data[r] = "SSL_SERVER_CERT_FILE={}\n".format(default_args['SSL_SERVER_CERT_FILE'])
          if 'KUBERNETES' in d:
            data[r] = "KUBERNETES={}\n".format(default_args['KUBERNETES'])
          if 'COMPANYNAME' in d:
            data[r] = "COMPANYNAME={}\n".format(default_args['COMPANYNAME'])
          if 'MYSQLHOSTNAME' in d:
            if "KUBE" in os.environ:
              if os.environ["KUBE"] == "1":
               data[r] = "MYSQLHOSTNAME={}\n".format(default_args['KUBEMYSQLHOSTNAME'])
              else:
               data[r] = "MYSQLHOSTNAME={}\n".format(default_args['MYSQLHOSTNAME'])
            else:
              data[r] = "MYSQLHOSTNAME={}\n".format(default_args['MYSQLHOSTNAME'])
          if 'MYSQLDB' in d:
            data[r] = "MYSQLDB={}\n".format(default_args['MYSQLDB'])
          if 'MYSQLUSER' in d:
            data[r] = "MYSQLUSER={}\n".format(default_args['MYSQLUSER'])

          r += 1
        with open(mainfile, 'w', encoding='utf-8') as file:
         file.writelines(data)

       subprocess.call("/tmux/starttml.sh", shell=True)
       time.sleep(3)

   def getparams(**context):
     args = default_args
     VIPERHOST = ""
     VIPERPORT = ""
     HTTPADDR = args['HTTPADDR']
     HPDEHOST = ""
     HPDEPORT = ""
     VIPERTOKEN = ""
     HPDEHOSTPREDICT = ""
     HPDEPORTPREDICT = ""

     tsslogging.locallogs("INFO", "STEP 1: Build started")

     try:
       if os.environ['TSS']=="1":
        if 'READTHEDOCS' in os.environ:
         if  len(os.environ['READTHEDOCS']) < 4:
           sys.exit()
         f = open("/tmux/rd4.txt", "w")
         rd=os.environ['READTHEDOCS']
         f.write(rd[:4])
         f.close()
        else:
          sys.exit()
     except Exception as e:
       pass

     if os.environ.get('TSS')=="1":
       try:
         shutil.rmtree("/rawdata/rtms")
       except Exception as e:
          pass
       try:
          with open("/tmux/step5.txt", "r") as f:
              dirbuf=f.read()
              shutil.rmtree(dirbuf)
       except Exception as e:
         pass

     sd = context['dag'].dag_id
     pname = args['solutionname']
     sname = tsslogging.rtdsolution(pname,sd)
     try:
       f = open("/tmux/step1projectname.txt", "w")
       f.write(pname)
       f.close()
     except Exception as e:
       pass

     try:
       f = open("/tmux/step1solution.txt", "w")
       f.write(sname)
       f.close()
     except Exception as e:
       pass

     if 'step1description' in os.environ:
       desc = os.environ['step1description']
     else:
       desc = args['description']

     if 'step1solutiontitle' in os.environ:
       stitle = os.environ['step1solutiontitle']
     else:
       stitle = args['solutiontitle']

     brokerhost = args['brokerhost']
     brokerport = args['brokerport']
     reinitbinaries(sname)
     updateviperenv()

     with open("/Viper-produce/admin.tok", "r") as f:
       VIPERTOKEN=f.read()

     if VIPERHOST=="":
       with open('/Viper-produce/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOST = output.split(",")[0]
         VIPERPORT = output.split(",")[1]
       with open('/Viper-preprocess/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS = output.split(",")[0]
         VIPERPORTPREPROCESS = output.split(",")[1]
       with open('/Viper-preprocess1/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS1 = output.split(",")[0]
         VIPERPORTPREPROCESS1 = output.split(",")[1]
       with open('/Viper-preprocess2/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS2 = output.split(",")[0]
         VIPERPORTPREPROCESS2 = output.split(",")[1]
       with open('/Viper-preprocess3/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS3 = output.split(",")[0]
         VIPERPORTPREPROCESS3 = output.split(",")[1]
       with open('/Viper-preprocess-pgpt/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESSPGPT = output.split(",")[0]
         VIPERPORTPREPROCESSPGPT = output.split(",")[1]
       with open('/Viper-preprocess-agenticai/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESSAGENTICAI = output.split(",")[0]
         VIPERPORTPREPROCESSAGENTICAI = output.split(",")[1]
       with open('/Viper-ml/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTML = output.split(",")[0]
         VIPERPORTML = output.split(",")[1]
       with open('/Viper-predict/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREDICT = output.split(",")[0]
         VIPERPORTPREDICT = output.split(",")[1]
       with open('/Hpde/hpde.txt', 'r') as f:
         output = f.read()
         HPDEHOST = output.split(",")[0]
         HPDEPORT = output.split(",")[1]
       with open('/Hpde-predict/hpde.txt', 'r') as f:
         output = f.read()
         HPDEHOSTPREDICT = output.split(",")[0]
         HPDEPORTPREDICT = output.split(",")[1]


     if 'CHIP' in os.environ:
        chip = os.environ['CHIP']
        chip = chip.lower()
     else:
         chip = 'amd64'

     if 'VIPERVIZPORT' in os.environ:
         if os.environ['VIPERVIZPORT'] != '' and os.environ['VIPERVIZPORT'] != '-1':
              vipervizport = int(os.environ['VIPERVIZPORT'])
         else:
              vipervizport=tsslogging.getfreeport()
     else:
              vipervizport=tsslogging.getfreeport()

     #   Check the solution airflow port and see if user modified port in kubernetes
     if default_args['solutionairflowport'] != '-1':
             solutionairflowport = int(default_args['solutionairflowport'])
             if 'KUBE' in os.environ:
               if os.environ['KUBE'] == '1' and os.environ['SOLUTIONAIRFLOWPORT'] != '-1':
                 solutionairflowport = int(os.environ['SOLUTIONAIRFLOWPORT'])
     else:
        if 'KUBE' in os.environ:
           if os.environ['KUBE'] == "0":
             solutionairflowport=tsslogging.getfreeport()
           elif os.environ['SOLUTIONAIRFLOWPORT'] != '-1':
            solutionairflowport=int(os.environ['SOLUTIONAIRFLOWPORT'])
           else:
             solutionairflowport=tsslogging.getfreeport()
        else:
         solutionairflowport=tsslogging.getfreeport()

     #   Check the solution external port and see if user modified port in kubernetes
     if default_args['solutionexternalport'] != '-1':
             solutionexternalport = int(default_args['solutionexternalport'])
             if 'KUBE' in os.environ:
               if os.environ['KUBE'] == '1' and os.environ['SOLUTIONEXTERNALPORT'] != '-1':
                 solutionexternalport = int(os.environ['SOLUTIONEXTERNALPORT'])
     else:
        if 'KUBE' in os.environ:
           if os.environ['KUBE'] == "0":
             solutionexternalport=tsslogging.getfreeport()
           elif os.environ['SOLUTIONEXTERNALPORT'] != '-1':
            solutionexternalport=int(os.environ['SOLUTIONEXTERNALPORT'])
           else:
             solutionexternalport=tsslogging.getfreeport()
        else:
         solutionexternalport=tsslogging.getfreeport()

     #   Check the solution visualization port and see if user modified port in kubernetes
     if default_args['solutionvipervizport'] != '-1':
             solutionvipervizport = int(default_args['solutionvipervizport'])
             if 'KUBE' in os.environ:
               if os.environ['KUBE'] == '1' and os.environ['SOLUTIONVIPERVIZPORT'] != '-1':
                 solutionvipervizport = int(os.environ['SOLUTIONVIPERVIZPORT'])
     else:
        if 'KUBE' in os.environ:
           if os.environ['KUBE'] == "0":
             solutionvipervizport=tsslogging.getfreeport()
           elif os.environ['SOLUTIONVIPERVIZPORT'] != '-1':
            solutionvipervizport=int(os.environ['SOLUTIONVIPERVIZPORT'])
           else:
             solutionvipervizport=tsslogging.getfreeport()
        else:
         solutionvipervizport=tsslogging.getfreeport()

     if 'AIRFLOWPORT' in  os.environ:
         airflowport = os.environ['AIRFLOWPORT']
     else:
         airflowport = tsslogging.getfreeport()

     externalport=VIPERPORT
     if 'EXTERNALPORT' in  os.environ:
         if os.environ['EXTERNALPORT'] != "-1":
           externalport = os.environ['EXTERNALPORT']

     tss = os.environ['TSS']
     task_instance = context['task_instance']

     if tss == "1":
       task_instance.xcom_push(key="{}_SOLUTIONEXTERNALPORT".format(sname),value="_{}".format(solutionexternalport))
       task_instance.xcom_push(key="{}_SOLUTIONVIPERVIZPORT".format(sname),value="_{}".format(solutionvipervizport))
       task_instance.xcom_push(key="{}_SOLUTIONAIRFLOWPORT".format(sname),value="_{}".format(solutionairflowport))
     else:
       task_instance.xcom_push(key="{}_SOLUTIONEXTERNALPORT".format(sname),value="_{}".format(os.environ['SOLUTIONEXTERNALPORT']))
       task_instance.xcom_push(key="{}_SOLUTIONVIPERVIZPORT".format(sname),value="_{}".format(os.environ['SOLUTIONVIPERVIZPORT']))
       task_instance.xcom_push(key="{}_SOLUTIONAIRFLOWPORT".format(sname),value="_{}".format(os.environ['SOLUTIONAIRFLOWPORT']))
      # killports()

     if 'MQTTUSERNAME' in os.environ:
       task_instance.xcom_push(key="{}_MQTTUSERNAME".format(sname),value=os.environ['MQTTUSERNAME'])
     else:
       task_instance.xcom_push(key="{}_MQTTUSERNAME".format(sname),value="")

     if 'MQTTPASSWORD' in os.environ:
       task_instance.xcom_push(key="{}_MQTTPASSWORD".format(sname),value=os.environ['MQTTPASSWORD'])
     else:
       task_instance.xcom_push(key="{}_MQTTPASSWORD".format(sname),value="")

     if 'KAFKACLOUDUSERNAME' in os.environ:
       task_instance.xcom_push(key="{}_KAFKACLOUDUSERNAME".format(sname),value=os.environ['KAFKACLOUDUSERNAME'])
     else:
       task_instance.xcom_push(key="{}_KAFKACLOUDUSERNAME".format(sname),value="")

     if 'KAFKACLOUDPASSWORD' in os.environ:
       task_instance.xcom_push(key="{}_KAFKACLOUDPASSWORD".format(sname),value=os.environ['KAFKACLOUDPASSWORD'])
     else:
       task_instance.xcom_push(key="{}_KAFKACLOUDPASSWORD".format(sname),value="")

     task_instance.xcom_push(key="{}_TSS".format(sname),value="_{}".format(tss))

     task_instance.xcom_push(key="{}_EXTERNALPORT".format(sname),value="_{}".format(externalport))
     task_instance.xcom_push(key="{}_AIRFLOWPORT".format(sname),value="_{}".format(airflowport))

     task_instance.xcom_push(key="{}_VIPERVIZPORT".format(sname),value="_{}".format(vipervizport))
     task_instance.xcom_push(key="{}_VIPERTOKEN".format(sname),value=VIPERTOKEN)
     task_instance.xcom_push(key="{}_VIPERHOST".format(sname),value=VIPERHOST)
     task_instance.xcom_push(key="{}_VIPERPORT".format(sname),value="_{}".format(VIPERPORT))
     task_instance.xcom_push(key="{}_VIPERHOSTPRODUCE".format(sname),value=VIPERHOST)
     task_instance.xcom_push(key="{}_VIPERPORTPRODUCE".format(sname),value="_{}".format(VIPERPORT))
     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS".format(sname),value=VIPERHOSTPREPROCESS)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS".format(sname),value="_{}".format(VIPERPORTPREPROCESS))
     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS1".format(sname),value=VIPERHOSTPREPROCESS1)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS1".format(sname),value="_{}".format(VIPERPORTPREPROCESS1))

     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS2".format(sname),value=VIPERHOSTPREPROCESS2)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS2".format(sname),value="_{}".format(VIPERPORTPREPROCESS2))
     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS3".format(sname),value=VIPERHOSTPREPROCESS3)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS3".format(sname),value="_{}".format(VIPERPORTPREPROCESS3))

     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESSPGPT".format(sname),value=VIPERHOSTPREPROCESSPGPT)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESSPGPT".format(sname),value="_{}".format(VIPERPORTPREPROCESSPGPT))

     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESSAGENTICAI".format(sname),value=VIPERHOSTPREPROCESSAGENTICAI)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESSAGENTICAI".format(sname),value="_{}".format(VIPERPORTPREPROCESSAGENTICAI))

     task_instance.xcom_push(key="{}_VIPERHOSTML".format(sname),value=VIPERHOSTML)
     task_instance.xcom_push(key="{}_VIPERPORTML".format(sname),value="_{}".format(VIPERPORTML))
     task_instance.xcom_push(key="{}_VIPERHOSTPREDICT".format(sname),value=VIPERHOSTPREDICT)
     task_instance.xcom_push(key="{}_VIPERPORTPREDICT".format(sname),value="_{}".format(VIPERPORTPREDICT))
     task_instance.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)
     task_instance.xcom_push(key="{}_HPDEHOST".format(sname),value=HPDEHOST)
     task_instance.xcom_push(key="{}_HPDEPORT".format(sname),value="_{}".format(HPDEPORT))
     task_instance.xcom_push(key="{}_HPDEHOSTPREDICT".format(sname),value=HPDEHOSTPREDICT)
     task_instance.xcom_push(key="{}_HPDEPORTPREDICT".format(sname),value="_{}".format(HPDEPORTPREDICT))
     task_instance.xcom_push(key="{}_solutionname".format(sd),value=sname)
     task_instance.xcom_push(key="{}_projectname".format(sd),value=pname)
     task_instance.xcom_push(key="{}_solutiondescription".format(sname),value=desc)
     task_instance.xcom_push(key="{}_solutiontitle".format(sname),value=stitle)

     task_instance.xcom_push(key="{}_containername".format(sname),value='')
     task_instance.xcom_push(key="{}_brokerhost".format(sname),value=brokerhost)
     task_instance.xcom_push(key="{}_brokerport".format(sname),value="_{}".format(brokerport))
     task_instance.xcom_push(key="{}_chip".format(sname),value=chip)

     tsslogging.locallogs("INFO", "STEP 1: completed - TML system parameters successfully gathered")
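
The listing above repeats two patterns. Each `viper.txt` or `hpde.txt` file is read and split on commas, with the first field used as the host and the second as the port. Every port is then pushed to XCom with a leading underscore, which later steps strip with `[1:]` (for example `VIPERPORT[1:]` in Step 2). The following is a minimal sketch of both patterns, not the TSS implementation. The helper names `read_host_port`, `encode_port` and `decode_port` are hypothetical and are not part of `tsslogging` or `maadstml`. Unlike the listing, the sketch also strips surrounding whitespace from each field.

```python
def read_host_port(path):
    """Return (host, port) from a file whose first two comma-separated
    fields are the host and the port, e.g. '127.0.0.1,8000'."""
    with open(path, "r") as f:
        fields = f.read().strip().split(",")
    if len(fields) < 2:
        raise ValueError("expected 'host,port' in {}".format(path))
    return fields[0].strip(), fields[1].strip()


def encode_port(port):
    """Format a port the way the DAG pushes it to XCom: '_<port>'."""
    return "_{}".format(port)


def decode_port(value):
    """Undo encode_port: drop the leading underscore and return an int."""
    return int(value[1:])
```

A later step that pulls `"_8000"` from XCom would call `decode_port("_8000")` to recover the integer `8000`.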

8.5.3.1. DAG STEP 1: Parameter Explanation

Json Key	Description
owner	Change as needed.
start_date	Date of solution creation
brokerhost	This is the IP address for Kafka. If Kafka is running on localhost then use ‘127.0.0.1’ or add Kafka Cloud cluster address. Note, if using multiple brokers, you can separate them by a comma, and set brokerport as empty.
brokerport	The default port for Kafka on-premise or in the cloud is ‘9092’
cloudusername	If you are running Kafka on-premise on 127.0.0.1 - then this should be left blank. If you are using Kafka Cloud then this is the API KEY
cloudpassword	If you are running Kafka on-premise on 127.0.0.1 - then this should be left blank. If you are using Kafka Cloud then this is the API SECRET
solutionairflowport	This is your solution airflow port. If -1, TSS will choose a free port randomly, or set this to a fixed number to prevent the port from changing.
solutionexternalport	This is the external port that you WILL need to stream external data to your TML solution: you will need this port in the REST and gRPC clients. If -1, TSS will choose a free port randomly; set this to a fixed number to prevent the port from changing.
solutionvipervizport	This is your solution dashboard port. If -1, TSS will choose a free port randomly; set this to a fixed number to prevent the port from changing.
ingestdatamethod	You must choose how you will ingest your data. Choose ONE method from: localfile, mqtt, rest, grpc
solutionname	DO NOT MODIFY. THIS WILL BE AUTOMATICALLY UPDATED when you create your solution. Refer to Lets Start Building a TML Solution
solutiontitle	Provide a descriptive title for your solution
description	Describe your solution in one line.
retries	Change as needed; 1 is usually fine.
KUBEMYSQLHOSTNAME	If deploying in Kubernetes - the MySql service will be used.
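
The rule for the three solution ports (a fixed value is kept, `-1` means "pick a free port", and Kubernetes can override through an environment variable) can be sketched as a single function. This is an illustration of the logic in the Step 1 listing, not the TSS code itself. `resolve_port` is a hypothetical name, and `get_free_port` is an assumed stand-in for `tsslogging.getfreeport()`, whose implementation is not shown in this document. The sketch treats a missing environment variable as `-1`.

```python
import socket


def get_free_port():
    """Ask the operating system for an unused TCP port.
    Assumed stand-in for tsslogging.getfreeport()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def resolve_port(configured, env_name, environ, free_port=get_free_port):
    """Resolve one solution port.

    configured -- value from default_args, e.g. '-1' or '5050'
    env_name   -- e.g. 'SOLUTIONAIRFLOWPORT'
    environ    -- a mapping such as os.environ
    """
    kube = environ.get("KUBE")
    env_port = environ.get(env_name, "-1")

    if configured != "-1":
        # A fixed port was requested; Kubernetes (KUBE == '1') may override it.
        if kube == "1" and env_port != "-1":
            return int(env_port)
        return int(configured)

    # configured == '-1': use the environment value only when KUBE is set
    # and is not '0'; otherwise pick a free port.
    if kube is not None and kube != "0" and env_port != "-1":
        return int(env_port)
    return free_port()
```

For example, `resolve_port('-1', 'SOLUTIONAIRFLOWPORT', os.environ)` picks a free port on a plain Docker run, while the same call in a pod with `KUBE=1` and `SOLUTIONAIRFLOWPORT=9000` returns `9000`.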

8.5.4. STEP 2: Create Kafka Topics: tml_system_step_2_kafka_createtopic_dag

Below is the complete definition of the tml_system_step_2_kafka_createtopic_dag that creates all the topics for your solution. Users only need to configure the code highlighted in the USER CHOSEN PARAMETERS.

Tip

Watch the YouTube video for Step 2 dag configurations. YouTube Video

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import maadstml
 import sys
 import tsslogging
 import os
 import subprocess

 sys.dont_write_bytecode = True

 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
  'owner' : 'Sebastian Maurice', # <<< ********** You change as needed
  'companyname': 'Otics',  # <<< ********** You change as needed
   'myname' : 'Sebastian',  # <<< ********** You change as needed
   'myemail' : 'Sebastian.Maurice',  # <<< ********** You change as needed
   'mylocation' : 'Toronto',  # <<< ********** You change as needed
   'replication' : '1',  # <<< ********** You change as needed
   'numpartitions': '1',  # <<< ********** You change as needed
   'enabletls': '1',  # <<< ********** You change as needed
   'brokerhost' : '',  # <<< ********** Leave as is
   'brokerport' : '-999',  # <<< ********** Leave as is
   'microserviceid' : '',  # <<< ********** You change as needed
   'raw_data_topic' : 'iot-raw-data', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'preprocess_data_topic' : 'iot-preprocess,iot-preprocess2', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'ml_data_topic' : 'ml-data', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'prediction_data_topic' : 'prediction-data', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'pgpt_data_topic' : 'cisco-network-privategpt',  #  PrivateGPT will produce responses to this topic - change as  needed
   'description' : 'Topics to store iot data',
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 def deletetopics(topic):

     if 'KUBE' in os.environ:
        if os.environ['KUBE'] == "1":
          return
     buf = "/Kafka/kafka_2.13-3.0.0/bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic {} --delete".format(topic)

     proc=subprocess.run(buf, shell=True)
     #proc.terminate()
     #proc.wait()

     repo=tsslogging.getrepo()
     tsslogging.tsslogit("Deleting topic {} in {}".format(topic,os.path.basename(__file__)), "INFO" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

 def setupkafkatopics(**context):
  # Set personal data

   tsslogging.locallogs("INFO", "STEP 2: Create topics started")

   args = default_args
   companyname=args['companyname']
   myname=args['myname']
   myemail=args['myemail']
   mylocation=args['mylocation']
   description=args['description']

   # Replication factor for Kafka redundancy
   replication=int(args['replication'])
   # Number of partitions for joined topic
   numpartitions=int(args['numpartitions'])
   # Enable SSL/TLS communication with Kafka
   enabletls=int(args['enabletls'])
   # If brokerhost is empty then this function will use the brokerhost address in your
   # VIPER.ENV in the field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
   brokerhost=args['brokerhost']
   # If this is -999 then this function uses the port address for Kafka in VIPER.ENV in the
   # field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
   brokerport=int(args['brokerport'])
   # If you are using a reverse proxy to reach VIPER then you can put it here - otherwise if
   # empty then no reverse proxy is being used
   microserviceid=args['microserviceid']

   if 'step2raw_data_topic' in os.environ:
      args['raw_data_topic']=os.environ['step2raw_data_topic']

   if 'step2preprocess_data_topic' in os.environ:
      args['preprocess_data_topic']=os.environ['step2preprocess_data_topic']

   raw_data_topic=args['raw_data_topic']
   preprocess_data_topic=args['preprocess_data_topic']
   ml_data_topic=args['ml_data_topic']
   prediction_data_topic=args['prediction_data_topic']

   sd = context['dag'].dag_id
   sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))

   VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
   VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
   VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
   mainbroker = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_brokerhost".format(sname))
   HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

   ti = context['task_instance']
   ti.xcom_push(key="{}_companyname".format(sname), value=companyname)
   ti.xcom_push(key="{}_myname".format(sname), value=myname)
   ti.xcom_push(key="{}_myemail".format(sname), value=myemail)
   ti.xcom_push(key="{}_mylocation".format(sname), value=mylocation)
   ti.xcom_push(key="{}_replication".format(sname), value="_{}".format(replication))
   ti.xcom_push(key="{}_numpartitions".format(sname), value="_{}".format(numpartitions))
   ti.xcom_push(key="{}_enabletls".format(sname), value="_{}".format(enabletls))
   ti.xcom_push(key="{}_microserviceid".format(sname), value=microserviceid)
   ti.xcom_push(key="{}_raw_data_topic".format(sname), value=raw_data_topic)
   ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=preprocess_data_topic)
   ti.xcom_push(key="{}_ml_data_topic".format(sname), value=ml_data_topic)
   ti.xcom_push(key="{}_prediction_data_topic".format(sname), value=prediction_data_topic)



   #############################################################################################################
   #                         CREATE TOPIC TO STORE TRAINED PARAMS FROM ALGORITHM

   topickeys = ['raw_data_topic','preprocess_data_topic','ml_data_topic','prediction_data_topic','pgpt_data_topic']
   VIPERHOSTMAIN = "{}{}".format(HTTPADDR,VIPERHOST)
   ptarr = ""
   for k in topickeys:
     producetotopic=args[k]
     description=args['description']
     if producetotopic != "":
       ptarr = ptarr + producetotopic.strip() + ","
     topicsarr = producetotopic.split(",")
     for topic in topicsarr:
         if topic != '' and "127.0.0.1" in mainbroker:
           try:
             deletetopics(topic)
           except Exception as e:
             print("ERROR: ",e)
             continue

   if '127.0.0.1' in mainbroker:
         replication=1

     #for topic in topicsarr:
   if ptarr != '':
      ptarr=ptarr[:-1]
      print("Creating topic=",ptarr)
      try:
         result=maadstml.vipercreatetopic(VIPERTOKEN,VIPERHOSTMAIN,VIPERPORT[1:],ptarr,companyname,
                                  myname,myemail,mylocation,description,enabletls,
                                  brokerhost,brokerport,numpartitions,replication,
                                  microserviceid='')
      except Exception as e:
        tsslogging.locallogs("ERROR", "STEP 2: Cannot create topic {} in {} - {}".format(ptarr,os.path.basename(__file__),e))

        repo=tsslogging.getrepo()
        tsslogging.tsslogit("Cannot create topic {} in {} - {}".format(ptarr,os.path.basename(__file__),e), "ERROR" )
        tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

   tsslogging.locallogs("INFO", "STEP 2: Completed")
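
The topic handling in `setupkafkatopics` comes down to two steps: optionally override two topic settings from environment variables, then join every non-empty topic setting into one comma-separated string that is passed to `maadstml.vipercreatetopic`. The sketch below isolates those steps so they can be checked without Kafka or VIPER. The function names `apply_env_overrides` and `collect_topics` are hypothetical.

```python
def apply_env_overrides(args, environ):
    """Return a copy of args in which the Step 2 environment variables,
    when present, replace the raw and preprocess topic settings."""
    out = dict(args)
    overrides = (
        ("step2raw_data_topic", "raw_data_topic"),
        ("step2preprocess_data_topic", "preprocess_data_topic"),
    )
    for env_key, arg_key in overrides:
        if env_key in environ:
            out[arg_key] = environ[env_key]
    return out


def collect_topics(args, keys):
    """Join the non-empty topic settings into one comma-separated string.
    A single setting may itself hold several comma-separated topics."""
    parts = []
    for key in keys:
        value = args.get(key, "").strip()
        if value != "":
            parts.append(value)
    return ",".join(parts)
```

With the default parameters shown above, `collect_topics` returns `iot-raw-data,iot-preprocess,iot-preprocess2,ml-data,prediction-data,cisco-network-privategpt`.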

8.5.4.1. DAG STEP 2: Parameter Explanation

Json Key	Description
owner	Change as needed
companyname	Change as needed
myname	Change as needed
myemail	Change as needed
mylocation	Change as needed
replication	If using on-premise Kafka at address 127.0.0.1 then this should be 1. If using Kafka Cloud, then this MUST be a minimum of 3
numpartitions	Number of partitions for the topics; a minimum of 3 partitions is usually fine
enabletls	Set to 1 for TLS encryption, 0 for no encryption
brokerhost	The setting in Step 1 is fine
brokerport	The setting in Step 1 is fine
microserviceid	If you are using a microservice behind a load balancer, e.g. NGINX, you can specify the route here
raw_data_topic	This is the topic your solution will produce raw data to - see STEP 3: Produce to Kafka Topics
preprocess_data_topic	This is where all the preprocessed data will be stored - separate multiple topics with a comma
ml_data_topic	This is where the ML estimated parameters are stored.
prediction_data_topic	This is where all the predictions will be stored.
description	Description for the topics.
start_date	Solution start date
retries	DAG retries, i.e. 1 is usually fine
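
The `replication` rule in the table can be expressed as a small check. The first branch mirrors what the Step 2 DAG does (it forces `replication=1` when the broker address contains `127.0.0.1`). The second branch is an assumption added for illustration: the DAG shown above does not itself reject a cloud replication factor below 3. The function name `effective_replication` and the example host name are hypothetical.

```python
def effective_replication(mainbroker, replication):
    """Return the replication factor to use for the given broker address."""
    if "127.0.0.1" in mainbroker:
        # Local, single-broker Kafka: the DAG forces replication to 1.
        return 1
    if replication < 3:
        # Assumed guard based on the table: Kafka Cloud needs at least 3.
        raise ValueError("Kafka Cloud requires replication >= 3, got {}".format(replication))
    return replication
```

For example, `effective_replication("127.0.0.1", 3)` returns `1`, while `effective_replication("broker.example.com", 1)` raises a `ValueError`.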

8.5.5. STEP 3: Produce to Kafka Topics

Important

You must CHOOSE how you want to ingest data and produce to a Kafka topic.

The TML solution provides FOUR (4) ways to ingest data and produce to a topic: MQTT, gRPC, RESTAPI, and LOCALFILE. The DAGs in the following table are SERVER files, which wait for connections from the client files. For convenience, client files are provided to access the server DAGs below.

Tip

The data file used by the client examples for LOCALFILE, REST, MQTT, and gRPC can be downloaded from GitHub:

https://github.com/smaurice101/raspberrypi/tree/main/tml-airflow/data

Also, watch this YouTube video that describes the four ingestion methods: YouTube

8.5.5.1. Four Ways to Ingest Data Into Your TML Solution Container

Data Ingest DAG Name	Client File Name	Description
tml-read-MQTT-step-3-kafka-producetotopic-dag	An on_message(client, userdata, msg) event is triggered by the MQTT broker. This DAG will automatically handle the on_message event and produce the data to Kafka.	This DAG is an MQTT server and will listen for a connection from a client. Use this if your TML solution ingests data from an MQTT system such as HiveMQ and streams it to Kafka.
tml-read-LOCALFILE-step-3-kafka-producetotopic-dag	You can process a local file and stream the data to Kafka.	This DAG will read a local CSV file for data and stream it to Kafka.
tml-read-gRPC-step-3-kafka-producetotopic-dag	NOTE: For this client you will also need: tml_grpc_pb2_grpc, and tml_grpc_pb2	This DAG is a gRPC server and will listen for a connection from a gRPC client. Use this if your TML solution ingests data from devices and you want to leverage a gRPC connection to stream the data to Kafka.
tml-read-RESTAPI-step-3-kafka-producetotopic-dag	This is one of the most popular APIs.	This DAG is a RESTAPI server and will listen for a connection from a REST client. Use this if your TML solution ingests data from devices and you want to leverage a REST connection to stream the data to Kafka.
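
As a quick reference, the four `ingestdatamethod` values from Step 1 line up with the four server DAGs in the table. The lookup below is only an illustration of that correspondence; how TSS selects the DAG internally is not shown in this document, and the names `INGEST_DAGS` and `ingest_dag_for` are hypothetical.

```python
# ingestdatamethod value -> server DAG name from the table above
INGEST_DAGS = {
    "localfile": "tml-read-LOCALFILE-step-3-kafka-producetotopic-dag",
    "mqtt": "tml-read-MQTT-step-3-kafka-producetotopic-dag",
    "rest": "tml-read-RESTAPI-step-3-kafka-producetotopic-dag",
    "grpc": "tml-read-gRPC-step-3-kafka-producetotopic-dag",
}


def ingest_dag_for(method):
    """Return the server DAG name for one ingestdatamethod value."""
    key = method.strip().lower()
    if key not in INGEST_DAGS:
        raise ValueError(
            "ingestdatamethod must be one of: {}".format(", ".join(sorted(INGEST_DAGS)))
        )
    return INGEST_DAGS[key]
```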

8.5.5.2. STEP 3a: Produce Data Using MQTT: tml-read-MQTT-step-3-kafka-producetotopic-dag

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import paho.mqtt.client as paho
 from paho import mqtt
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random
 import json

 sys.dont_write_bytecode = True
 ##################################################  MQTT SERVER #####################################
 # This is a MQTT server that will handle connections from a client.  It will handle connections
 # from an MQTT client for on_message, on_connect, and on_subscribe

 # If Connecting to HiveMQ cluster you will need USERNAME/PASSWORD and mqtt_enabletls = 1
 # USERNAME/PASSWORD should be set in your DOCKER RUN command of the TSS container

 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',
   'enabletls': '1',
   'microserviceid' : '',
   'producerid' : 'iotsolution',
   'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
   'identifier' : 'TML solution',
   'mqtt_broker' : '', # <<<****** Enter MQTT broker i.e. test.mosquitto.org
   'mqtt_port' : '', # <<<******** Enter MQTT port i.e. 1883, 8883    (for HiveMQ cluster)
   'mqtt_subscribe_topic' : '', # <<<******** enter name of MQTT to subscribe to i.e. tml/iot
   'mqtt_enabletls': '0', # set 1=TLS, 0=no TLS
   'delay' : '7000', # << ******* maximum delay, in milliseconds, for VIPER to wait for Kafka to confirm the message was received and written to the topic
   'topicid' : '-999', # <<< ********* do not modify
 }

 ######################################## DO NOT MODIFY BELOW #############################################


 # This sets the lat/longs for the IoT devices so they can be mapped
 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""
 VIPERHOSTFROM=""
 # this is change 5
 # setting callbacks for different events to see if it works, print the message etc.
 def on_connect(client, userdata, flags, rc, properties=None):
   print("CONNACK received with code %s." % rc)

 # print which topic was subscribed to
 def on_subscribe(client, userdata, mid, granted_qos, properties=None):
   print("Subscribed: " + str(mid) + " " + str(granted_qos))

 def on_message(client, userdata, msg):
   data=json.loads(msg.payload.decode("utf-8"))
   datad = json.dumps(data)
   readdata(datad)

 def mqttserverconnect():

  repo = tsslogging.getrepo()
  tsslogging.tsslogit("MQTT producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
  tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

  username = ""
  password = ""
  if 'MQTTUSERNAME' in os.environ:
        username = os.environ['MQTTUSERNAME']
  if 'MQTTPASSWORD' in os.environ:
        password = os.environ['MQTTPASSWORD']

  try:
    client = paho.Client(paho.CallbackAPIVersion.VERSION2)
    mqttBroker = default_args['mqtt_broker']
    mqttport = int(default_args['mqtt_port'])
    if default_args['mqtt_enabletls'] == "1":
      client.tls_set(tls_version=mqtt.client.ssl.PROTOCOL_TLS)
      client.username_pw_set(username, password)
  except Exception as e:
    tsslogging.locallogs("ERROR", "Cannot connect to MQTT broker in {} - {}".format(os.path.basename(__file__),e))

    tsslogging.tsslogit("ERROR: Cannot connect to MQTT broker in {} - {}".format(os.path.basename(__file__),e), "ERROR" )
    tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
    print("ERROR: Cannot connect to MQTT broker")
    return

  client.connect(mqttBroker,mqttport)

  if client:
    print("Connected")
    tsslogging.locallogs("INFO", "MQTT connection established...")
    client.on_subscribe = on_subscribe
    client.on_message = on_message
    b=client.subscribe(default_args['mqtt_subscribe_topic'], qos=1)
    if 'MQTT_ERR_SUCCESS' not in str(b):
            print("ERROR Making a connection to HiveMQ:",b)
            tsslogging.locallogs("ERROR", "Cannot connect to MQTT broker in {} - {}".format(os.path.basename(__file__),str(b)))
            tsslogging.tsslogit("CANNOT Connect to MQTT Broker in {}".format(os.path.basename(__file__)), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
    else:
      client.on_connect = on_connect
      client.loop_forever()
  else:
     print("Cannot Connect")
     tsslogging.locallogs("ERROR", "Cannot connect to MQTT broker in {}".format(os.path.basename(__file__)))
     tsslogging.tsslogit("CANNOT Connect to MQTT Broker in {}".format(os.path.basename(__file__)), "ERROR" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")


 def producetokafka(value, tmlid, identifier,producerid,maintopic,substream,args):
  inputbuf=value
  topicid=int(args['topicid'])

  # Add a 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
  delay=int(args['delay'])
  enabletls = int(args['enabletls'])
  identifier = args['identifier']

  try:
     result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,substream,
                                         topicid,identifier)
  except Exception as e:
     print("ERROR:",e)


 def readdata(valuedata):
   # MAin Kafka topic to store the real-time data
   maintopic = default_args['topics']
   producerid = default_args['producerid']
   try:
       producetokafka(valuedata, "", "",producerid,maintopic,"",default_args)
       # change time to speed up or slow down data
       #time.sleep(0.15)
   except Exception as e:
       print(e)
       pass

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startproducing(**context):
        global VIPERTOKEN
        global VIPERHOST
        global VIPERPORT
        global HTTPADDR
        global VIPERHOSTFROM

        tsslogging.locallogs("INFO", "STEP 3: producing data started")

        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        hs,VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
        ti = context['task_instance']
        ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='MQTT')
        ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])
        buf = default_args['mqtt_broker'] + ":" + default_args['mqtt_port']
        ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="")
        buf="MQTT Subscription Topic: " + default_args['mqtt_subscribe_topic']
        ti.xcom_push(key="{}_IDENTIFIER".format(sname),value=buf)
        ti.xcom_push(key="{}_FROMHOST".format(sname),value="{},{}".format(hs,VIPERHOSTFROM))
        ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

        ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="_{}".format(default_args['mqtt_port']))
        ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="_{}".format(default_args['mqtt_port']))

        ti.xcom_push(key="{}_PORT".format(sname),value="_{}".format(VIPERPORT))
        ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('produce',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOSTFROM,VIPERPORT[1:]), "ENTER"])


 if __name__ == '__main__':

     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          VIPERTOKEN = sys.argv[2]
          VIPERHOST = sys.argv[3]
          VIPERPORT = sys.argv[4]

          mqttserverconnect()

Note

There is no MQTT client because MQTT is machine to machine communication, which means if a machine is writing to an MQTT broker, the above DAG automatically gets an on_message(client, userdata, msg) event and streams the data to Kafka. This is a powerful way to use TML with MQTT to process real-time data instantly.
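
The `on_message` callback in the DAG decodes the payload as UTF-8, parses it as JSON, and re-serializes it before producing to Kafka. A payload that is not valid JSON would raise inside the callback. The sketch below shows one way to make that step tolerant of bad messages; it is a suggestion, not part of the DAG, and `normalize_payload` is a hypothetical helper name.

```python
import json


def normalize_payload(payload):
    """Decode an MQTT payload (bytes) and return it as a JSON string,
    or None when the payload is not valid UTF-8 encoded JSON."""
    try:
        data = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return json.dumps(data)
```

Inside `on_message`, the result would be passed to `readdata` only when it is not `None`, so a single malformed message is skipped instead of raising in the callback.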

8.5.5.3. DAG STEP 3a: Parameter Explanation

Json Key	Explanation
owner	Change as needed
enabletls	Set to 1 for TLS encryption, 0 no encryption
microserviceid	Enter the route if using a load balancer, e.g. NGINX
producerid	Enter a name, e.g. ‘iotsolution’
topics	The topic to store the raw data, which you created in SYSTEM STEP 2
identifier	Some identifier for the data i.e. ‘TML solution data’
mqtt_broker	Enter the address of the mqtt broker i.e. test.mosquitto.org
mqtt_port	Enter MQTT port i.e. 1883
mqtt_subscribe_topic	Enter name of MQTT topic to subscribe to i.e. tml/iot
mqtt_enabletls	You can set to 1 to enable TLS or 0 no TLS. If you are using a HiveMQ cluster or some other MQTT cloud cluster, this is usually set to 1. If you are using a cloud cluster, a USERNAME/PASSWORD is also usually needed. Set the MQTTUSERNAME and MQTTPASSWORD on the Docker RUN command of your TSS container: TSS Docker Run Command
delay	Maximum delay, in milliseconds, for VIPER to wait for Kafka to confirm that the message was received and written to the topic
topicid	Leave at -999
start_date	Solution start date
retries	DAG retries
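
Because `mqtt_broker`, `mqtt_port` and `mqtt_subscribe_topic` are empty in the template, the DAG cannot connect until they are filled in (for instance, `int(default_args['mqtt_port'])` fails on an empty string). A small pre-flight check along the following lines can catch this before the DAG runs. It is a sketch under the parameter rules in the table above; `validate_mqtt_args` is a hypothetical helper, not part of TSS.

```python
def validate_mqtt_args(args):
    """Return a list of problems found in the Step 3a MQTT parameters.
    An empty list means the checked parameters look usable."""
    problems = []
    if args.get("mqtt_broker", "").strip() == "":
        problems.append("mqtt_broker is empty")
    port = args.get("mqtt_port", "")
    if not port.isdigit() or not (0 < int(port) < 65536):
        problems.append("mqtt_port must be a number between 1 and 65535")
    if args.get("mqtt_subscribe_topic", "").strip() == "":
        problems.append("mqtt_subscribe_topic is empty")
    if args.get("mqtt_enabletls") not in ("0", "1"):
        problems.append("mqtt_enabletls must be '0' or '1'")
    if args.get("topicid") != "-999":
        problems.append("topicid should be left at '-999'")
    return problems
```

Calling `validate_mqtt_args(default_args)` on the unmodified template reports the three empty settings.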

8.5.5.4. STEP 3a.i: MQTT CLIENT

tml_client_MQTT_step_3_kafka_producetotopic.py

 import paho.mqtt.client as paho
 from paho import mqtt
 import time
 import sys
 from datetime import datetime

 default_args = {
   'mqtt_broker' : 'b526253c5560459da5337e561c142369.s1.eu.hivemq.cloud', # <<<****** Enter MQTT broker, e.g. test.mosquitto.org
   'mqtt_port' : '8883', # <<<******** Enter MQTT port, e.g. 1883 (8883 is commonly used with TLS)
   'mqtt_subscribe_topic' : 'tml/iot', # <<<******** Enter name of MQTT topic to subscribe to, e.g. encyclopedia/#
   'mqtt_enabletls' : '1', # << Enable TLS if connecting to a cloud cluster like HiveMQ
 }


 sys.dont_write_bytecode = True
 ##################################################  MQTT CLIENT #####################################
 # This is an MQTT client that connects to the broker, subscribes to the topic in
 # default_args, and publishes each line of a local data file to that topic

 ######################################## USER CHOSEN PARAMETERS ########################################


 def mqttconnection():
      username="<Enter MQTT username>"
      password="<Enter MQTT password>"

      client = paho.Client(paho.CallbackAPIVersion.VERSION2)
      mqttBroker = default_args['mqtt_broker']
      mqttport = int(default_args['mqtt_port'])
      # Only enable TLS when mqtt_enabletls is set to '1'
      if default_args['mqtt_enabletls'] == '1':
         client.tls_set(tls_version=mqtt.client.ssl.PROTOCOL_TLS)
      client.username_pw_set(username, password)
      client.connect(mqttBroker,mqttport)

      client.subscribe(default_args['mqtt_subscribe_topic'], qos=1)
      return client

 def publishtomqttbroker(client,line):

      b=client.publish(topic=default_args['mqtt_subscribe_topic'], payload=line, qos=1, retain=False)
      if 'MQTT_ERR_SUCCESS' in str(b):
         print(line)
         client.loop()
      else:
         print("ERROR Making a connection to HiveMQ:",b)

 def readdatafile(client,inputfile):

   ##############################################################
   # NOTE: You can send any "EXTERNAL" data through this API
   # It is reading a localfile as an example
   ############################################################

   try:
     file1 = open(inputfile, 'r')
     print("Data Producing to Kafka Started:",datetime.now())
   except Exception as e:
     print("ERROR: Something went wrong ",e)
     return
   k = 0
   while True:
     line = file1.readline()
     line = line.replace(";", " ")
     print("line=",line)
     # add lat/long/identifier
     k = k + 1
     try:
       if line == "":
         #break
         file1.seek(0)
         k=0
         print("Reached End of File - Restarting")
         print("Read End:",datetime.now())
         continue
       publishtomqttbroker(client,line)
       # change time to speed up or slow down data
       time.sleep(.15)
     except Exception as e:
       print(e)
       time.sleep(.15)
       pass

 client=mqttconnection()
 inputfile = "IoTDatasample.txt"
 readdatafile(client,inputfile)
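
The client above hard-codes placeholder credentials. One alternative, sketched here under assumptions, is to read them from environment variables. The names MQTTUSERNAME and MQTTPASSWORD are the ones the parameter table mentions for the TSS container; whether they are also set where this client script runs is an assumption, and mqtt_credentials is a hypothetical helper.

```python
import os

def mqtt_credentials(env=os.environ):
    """Return (username, password) from the environment, or raise if either is missing."""
    username = env.get("MQTTUSERNAME", "")
    password = env.get("MQTTPASSWORD", "")
    if not username or not password:
        raise RuntimeError("Set MQTTUSERNAME and MQTTPASSWORD before running the client")
    return username, password

# Example with an explicit mapping instead of the real environment
print(mqtt_credentials({"MQTTUSERNAME": "demo-user", "MQTTPASSWORD": "demo-pass"}))
```

Inside mqttconnection(), the two placeholder assignments could then be replaced by a single call such as username, password = mqtt_credentials().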

8.5.5.5. MQTT Reference Architecture

If using HiveMQ cluster:

8.5.5.6. STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag
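
Before the full DAG listing, a short client-side sketch may help show how its routes are meant to be called. The example builds, but does not send, a POST request for the /api/v1/createtopic route, using the JSON keys that route reads (topics, numpartitions, replication, description, enabletls). The host, the port 9002 (the 'rest_port' value in default_args), the plain-HTTP scheme, and the build_createtopic_request helper are all assumptions for illustration.

```python
import json
import urllib.request

def build_createtopic_request(base_url, topics, numpartitions=3, replication=1,
                              description="user topic", enabletls=1):
    """Build (but do not send) a POST request for the /api/v1/createtopic route."""
    body = {
        "topics": topics,                # comma-separated topic names
        "numpartitions": numpartitions,
        "replication": replication,
        "description": description,
        "enabletls": enabletls,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/createtopic",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed address: localhost on port 9002
req = build_createtopic_request("http://localhost:9002", "iot-raw-data,iot-preprocess")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

The same pattern applies to the other POST routes in the listing; only the path and the JSON keys change.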

   import maadstml
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   import json
   from datetime import datetime, timezone
   from airflow.decorators import dag, task
   from flask import Flask, request, jsonify
   from gevent.pywsgi import WSGIServer
   import sys
   import tsslogging
   import os
   import subprocess
   import time
   import random
   import shlex
   from typing import Dict, Any
   import re
   import threading
   from fastapi import FastAPI
   from fastapi.middleware.cors import CORSMiddleware
   import uvicorn
   from typing import List
   #import nest_asyncio
   #nest_asyncio.apply()

   lock = threading.Lock()
   mqtt_lock = threading.Lock()


   sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
   import scadaglobals as sg
   import scada_modbus as cv
   import mqtt_loop as mq

   VIPERTOKEN = "" #os.environ['VIPERTOKEN']
   VIPERHOST = "" #os.environ['VIPERHOST']
   VIPERPORT = "" #os.environ['VIPERPORT']
   HTTPADDR = ""
   sys.dont_write_bytecode = True
   ##################################################  REST API SERVER #####################################
   # This is a REST API server that will handle connections from a client
   # There are two endpoints you can use to stream data to this server:
   # 1. jsondataline -  You can POST a single JSON message from your client app. Your JSON will be streamed to the Kafka topic.
   # 2. jsondataarray -  You can POST a JSON array from your client app. Your JSON will be streamed to the Kafka topic.


   ######################################## USER CHOSEN PARAMETERS ########################################
   default_args = {
     'owner' : 'Sebastian Maurice',
     'enabletls': '1',
     'microserviceid' : '',
     'producerid' : 'iotsolution',
     'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
     'identifier' : 'TML solution',
     'tss_rest_port' : '9001',  # <<< ***** replace with the TSS REST port number, e.g. 9001
     'rest_port' : '9002',  # <<< ***** replace with the REST port number, e.g. 9002
     'delay' : '7000', # << ******* 7000 millisecond maximum delay for VIPER to wait for Kafka to confirm the message was received and written to the topic
     'topicid' : '-999', # <<< ********* do not modify
   }

   ######################################## DO NOT MODIFY BELOW #############################################

   def writeviperlogs(errortype,message,VIPERTOKEN, VIPERHOST, VIPERPORT):

     args = default_args
     dt = datetime.now(timezone.utc)
     timestamp = dt.strftime("[%a, %d %b %Y %H:%M:%S UTC]")

     vmsg=f"{timestamp} {errortype.upper()} [{message}]"
     Logjson = json.dumps({
         "MESSAGE": str(vmsg),
         "SERVICE": "TML-Plugin",
         "HOST": VIPERHOST,
         "PORT": str(VIPERPORT),
         "KAFKA_CONNECT_BOOTSTRAP_SERVERS": "Kafka Broker"
     })

     #Logjson=f'{"MESSAGE":"{vmsg}","SERVICE": "TML-Plugin", "HOST": "{VIPERHOST}","PORT": "{str(VIPERPORT)}","KAFKA_CONNECT_BOOTSTRAP_SERVERS": "Kafka Broker"}'

   #  print("Logjson=",Logjson)
     producetokafka(Logjson, "", "","plugin-producer","viperlogs","",args,VIPERTOKEN, VIPERHOST, VIPERPORT)

   def producetokafka(value, tmlid, identifier,producerid,maintopic,substream,args,VIPERTOKEN, VIPERHOST, VIPERPORT):
        inputbuf=value
        topicid=int(args['topicid'])

        # Maximum delay in milliseconds (7000 by default) for VIPER to wait for Kafka to confirm the message was received and written to the topic
        delay=int(args['delay'])
        enabletls = int(args['enabletls'])
        identifier = args['identifier']

        try:
           result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,substream,
                                               topicid,identifier)
           print("produce result========",result)
        except Exception as e:
           print("ERROR:",e)


   # Check if tmux window exists BEFORE creating
   def tmuxsession(windowinstance,steps):

       chip='amd64'
       mainos='linux'
       cdir=''
       isnew1=0
       isnew2=0
       viperrun=''
       viperport=-1

       if 'CHIP' in os.environ:
         chip=os.environ['CHIP']

       chip=chip.lower()
       windowinstance=windowinstance.replace("_","-")

       # start the binary
       if steps=="4":
          cdir="/Viper-preprocess"
          viperrun=f"/Viper-preprocess/viper-{mainos}-{chip}"
       if steps=="5":
          cdir="/Viper-ml"
          viperrun=f"/Viper-ml/viper-{mainos}-{chip}"
       if steps=="6":
          cdir="/Viper-predict"
          viperrun=f"/Viper-predict/viper-{mainos}-{chip}"
       if steps=="9":
          cdir="/Viper-preprocess-pgpt"
          viperrun=f"/Viper-preprocess-pgpt/viper-{mainos}-{chip}"
       if steps=="9b":
          cdir="/Viper-preprocess-agenticai"
          viperrun=f"/Viper-preprocess-agenticai/viper-{mainos}-{chip}"

       if windowinstance != 'default':
         check_result = subprocess.run(
             ["tmux", "has-session", "-t", f"plugin_{windowinstance}"],
             capture_output=True
         )
         check_result2 = subprocess.run(
             ["tmux", "has-session", "-t", f"plugin_{windowinstance}_{steps}"],
             capture_output=True
         )

         if check_result.returncode != 0:
             # Window doesn't exist - create it
             subprocess.run(["tmux", "new-session", "-d", "-s", f"plugin_{windowinstance}"])
             subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}", f"cd /{cdir}", "ENTER"], capture_output=True, text=True)
             isnew1=1
         else:
            subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}", "C-c"])

         if check_result2.returncode != 0:
             # Window doesn't exist - create it
             subprocess.run(["tmux", "new-session", "-d", "-s", f"plugin_{windowinstance}_{steps}"])
             isnew2=1
         else:
             subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}_{steps}", "C-c"])

       with open(f"{cdir}/viper.txt", 'r', encoding='utf-8') as file:
           line = file.readline()
           oldviperport=line.split(",")[1]

       if windowinstance!='default':
         subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}_{steps}", f"cd /{cdir}", "ENTER"], capture_output=True, text=True)
         subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}_{steps}", viperrun, "ENTER"], capture_output=True, text=True)

       if isnew2:
         time.sleep(5)

       with open(f"{cdir}/viper.txt", 'r', encoding='utf-8') as file:
           line = file.readline()
           viperport=line.split(",")[1]

       return oldviperport,viperport,f"plugin_{windowinstance}_{steps}",f"plugin_{windowinstance}"
       #start the script
     #  subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}", new_pythonrun, "ENTER"], capture_output=True, text=True)


   def flatten_for_shell(arg_list):
       """Flatten lists and remove newlines from strings"""
       flat_args = []
       for arg in arg_list:
           if isinstance(arg, list):
               # Strip newlines/spaces from each list item before joining
               cleaned_items = [str(x).replace('\n', '').replace('\r', '').strip() for x in arg]
               joined = ' '.join(cleaned_items)
               flat_args.append(f'"{joined}"')
           else:
               # Strip newlines from single args too
               arg_str = str(arg).replace('\n', '').replace('\r', '').strip()
               if ' ' in arg_str or ',' in arg_str:
                   flat_args.append(f'"{arg_str}"')
               else:
                   if arg_str.isdigit():
                     flat_args.append(arg_str)
                   else:
                     flat_args.append(f'"{arg_str}"')

       return ' '.join(flat_args)

   def stopstart(step,stepsarr,windowinstance='default'):

     print("Stopstart")
     pythonrun=''

     print("windowinstance==",windowinstance)
     print("step==",isinstance(step,str),step)
     step=str(step)

     if step=="4":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)
           args[-4] = stepsarr[-5]    # raw_data_topic
           args[-3] = stepsarr[-4]    # preprocesstypes
           args[-2] = stepsarr[-3]    # jsoncriteria
           args[-1] = stepsarr[-2]    # preprocess_data_topic

           args[-6] = viperport    # viper port
           args[-5] = stepsarr[-1]    # rollbackoffset

           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")
     elif step=="5":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)
           args[-11] = viperport  # viper port
           args[-8] = stepsarr[-8]
           args[-7] = stepsarr[-7]
           args[-6] = stepsarr[-6]
           args[-5] = stepsarr[-5]
           args[-4] = stepsarr[-4]
           args[-3] = stepsarr[-3]
           args[-2] = stepsarr[-2]
           args[-1] = stepsarr[-1]
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")

     elif step=="6":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)
           args[-10] = viperport  # viper port
           args[-7] = stepsarr[-7]
           args[-6] = stepsarr[-6]
           args[-5] = stepsarr[-5]
           args[-4] = stepsarr[-4]
           args[-3] = stepsarr[-3]
           args[-2] = stepsarr[-2]
           args[-1] = stepsarr[-1]
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")
     elif step=="9":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)

           args[-24] = viperport  # viper port
           args[-23] = stepsarr[-18]   #vectorcollectionname
           args[-22] = stepsarr[-17]   #consumefrom
           args[-21] = stepsarr[-16]   #pgpt data topic
           args[-18] = stepsarr[-15]    #rollback
           args[-17] = stepsarr[-14]    #prompt
           args[-16] = stepsarr[-13]    #context
           args[-15] = stepsarr[-12]   #keyattribute
           args[-14] = stepsarr[-11]   #keyprocess

           args[-13] = stepsarr[-10]    #hyperbatch
           args[-12] = stepsarr[-9]     #docfolder
           args[-11] = stepsarr[-8]    #docingestinterval

           args[-7] = stepsarr[-7]    #temp
           args[-6] = stepsarr[-6]    #vectorsearch
           args[-5] = stepsarr[-5]    ##context window
           args[-4] = stepsarr[-4]    #pgptcontainername
           args[-3] = stepsarr[-3]    #pgpthost
           args[-2] = stepsarr[-2]    #pgptport
           args[-1] = stepsarr[-1]    #vectordimension
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")
     elif step=="9b":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)

           args[-27] = viperport  # viper port
           args[-26] = stepsarr[-17]
           args[-25] = stepsarr[-16]
           args[-23] = stepsarr[-15]
           args[-22] = stepsarr[-14]
           args[-18] = stepsarr[-13]
           args[-17] = stepsarr[-12]
           args[-14] = stepsarr[-11]
           args[-13] = stepsarr[-10]
           args[-12] = stepsarr[-9]
           args[-11] = stepsarr[-8]
           args[-10] = stepsarr[-7]
           args[-9] = stepsarr[-6]
           args[-8] = stepsarr[-5]
           args[-7] = stepsarr[-4]
           args[-3] = stepsarr[-3]
           args[-2] = stepsarr[-2]
           args[-1] = stepsarr[-1]
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")

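     if step not in ("4","5","6","9","9b"):
       # No branch above handled this step (e.g. "4c"), so there is no command to restart
       raise ValueError(f"stopstart: unsupported step '{step}'")
     new_pythonrun=new_pythonrun.replace("<<n>>",'\n')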
     if windowinstance=='default':
       subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "{}".format(new_pythonrun), "ENTER"],capture_output=True, text=True)
     else:
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(swn), "{}".format(new_pythonrun), "ENTER"],capture_output=True, text=True)

       #subprocess.run(["tmux", "new", "-d", "-s", "{}".format(windowinstance)])
       #subprocess.run(["tmux", "send-keys", "-t", "{}".format(windowinstance), "{}".format(new_pythonrun), "ENTER"],capture_output=True, text=True)

   def terminatetmuxwindows(step,wn):
     # Get all tmux sessions
     wt=""
     if wn == 'all':
       result = subprocess.run(['tmux', 'list-sessions'], capture_output=True, text=True)
       sessions = result.stdout.strip().split('\n')

       for session in sessions:
           if session.startswith('plugin_'):
               session_name = session.split(':')[0]
               subprocess.run(['tmux', 'kill-session', '-t', session_name])

               print(f"Killed tmux session: {session_name}")

               # Keep the trailing comma so the window names appended below stay separated
               wt = wt + session_name + ","
       with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn
     elif wn=='default':
       if step=="4":
         with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="5":
         with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="6":
         with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="9b":
         with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="9":
         with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="0":
         with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
         with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
         with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
         with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
            wt = wt + wn + ","
         with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn
     else:
          subprocess.run(['tmux', 'kill-session', '-t', f"plugin_{wn}_{step}"])
          subprocess.run(['tmux', 'kill-session', '-t', f"plugin_{wn}"])
          wt = wn
     return wt

   def gettmlsystemsparams():
       repo=tsslogging.getrepo()

     ############################################### API Routes ########################################

       if VIPERHOST != "":
           #app = Flask(__name__)
           app = FastAPI()

           app.add_middleware(
                 CORSMiddleware,
                 allow_origins=["*"],  # Allow all for dev
                 allow_credentials=True,
                 allow_methods=["*"],
                 allow_headers=["*"],
           )

   #-------------------------------- TERMINATE WINDOW -----------------------------------------------------
           @app.post('/api/v1/terminatewindow')
           def windowterminate(jdata: dict):
   #          jdata = request.get_json()
             if not jdata:
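               # FastAPI needs an exception (not a Flask-style tuple) to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing windows")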

             step = jdata.get('step','')
             windowname = jdata.get('windowname','')

             if windowname != '':
                  wd=terminatetmuxwindows(step,windowname)
                  return {
                       'status': f"success: windows terminated: {wd}",
                  }

             return {
                 'status': 'success: no windows terminated',
             }

   #-------------------------------- CREATETOPIC -----------------------------------------------------
           @app.post('/api/v1/createtopic')
           def storecreatetopic(jdata: dict):
   #          jdata = request.get_json()
             if not jdata or not jdata.get('topics'):
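               # FastAPI needs an exception (not a Flask-style tuple) to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing topics")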

             topics = jdata.get('topics')
             numpartitions = int(jdata.get('numpartitions',3))
             replication = int(jdata.get('replication',1))
             description = jdata.get('description','user topic')

             enabletls = int(jdata.get('enabletls',1))
             ptarr = [t.strip() for t in topics.split(",") if t.strip()]
             brokerhost=''
             brokerport=''
             try:
               for pt in ptarr:
                 if len(pt)>0:
                   result=maadstml.vipercreatetopic(VIPERTOKEN,VIPERHOST,VIPERPORT,pt,'companyname',
                                    'myname','myemail','mylocation',description,enabletls,
                                    brokerhost,brokerport,numpartitions,replication,'')
                   print(result)
                   writeviperlogs("INFO",f"Creating Topic: {pt}",VIPERTOKEN,VIPERHOST,VIPERPORT)
               return {
                 'status': 'success',
                 'topics': topics,
                 'partitions': numpartitions,
                 'replication': replication,
                 'description': description
               }
             except Exception as e:
               writeviperlogs("ERROR",f"Creating Topic failed: {pt}: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
               return {
                 'status': f"error: {e}",
                 'topics': topics,
                 'partitions': numpartitions,
                 'replication': replication,
                 'description': description
               }


   #-------------------------------- PREPROCESS -----------------------------------------------------
           @app.post('/api/v1/preprocess')
           def storepreprocess(jdata: dict):
   #          jdata = request.get_json()
             if not jdata or not jdata.get('rawdatatopic'):
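               # FastAPI needs an exception (not a Flask-style tuple) to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing preprocess or invalid preprocess")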

             step = str(jdata.get('step','') )
             try:
              if step=='4':
               step4raw_data_topic = jdata.get('rawdatatopic','')
               step4preprocess_data_topic = jdata.get('preprocessdatatopic','')
               step4preprocesstypes = jdata.get('preprocesstypes','')
               step4jsoncriteria = jdata.get('jsoncriteria','')
               rollbackoffset = jdata.get('rollbackoffsets',200)

               windowinstance = jdata.get("windowinstance","default")
               step4arr = [step4raw_data_topic,step4preprocesstypes,step4jsoncriteria,step4preprocess_data_topic,rollbackoffset]
               stopstart(step,step4arr,windowinstance)

              elif step=='4c':
                maxrows = jdata.get('maxrows',10)
                searchterms = jdata.get('searchterms','')
                rememberpastwindows = jdata.get('rememberpastwindows',5)
                patternwindowthreshold = jdata.get('patternwindowthreshold',30)
                raw_data_topic = jdata.get('raw_data_topic','')
                rtmsstream = jdata.get('rtmsstream','')
                rtmsscorethreshold = jdata.get('rtmsscorethreshold',0.6)
                attackscorethreshold = jdata.get('attackscorethreshold',0.6)
                patternscorethreshold = jdata.get('patternscorethreshold',0.6)
                localsearchtermfolder = jdata.get('localsearchtermfolder','')
                localsearchtermfolderinterval = jdata.get('localsearchtermfolderinterval','')
                rtmsfoldername = jdata.get('rtmsfoldername','')
                rtmsmaxwindows = jdata.get('rtmsmaxwindows',10000)
                windowinstance = jdata.get("windowinstance","default")
                step4carr = [maxrows,searchterms,rememberpastwindows,patternwindowthreshold,raw_data_topic,rtmsstream,rtmsscorethreshold,attackscorethreshold,patternscorethreshold,
                            localsearchtermfolder,localsearchtermfolderinterval,rtmsfoldername,rtmsmaxwindows]
                stopstart(step,step4carr,windowinstance)

              return {
                 'status': 'success',
                 'step4raw_data_topic': jdata.get('rawdatatopic',''),
                 'step4preprocess_data_topic': jdata.get('preprocessdatatopic',''),
                 'step4preprocesstypes': jdata.get('preprocesstypes',''),
                 'step4jsoncriteria': jdata.get('jsoncriteria',''),
                 'rollbackoffset': jdata.get('rollbackoffsets',200),
                 'windowinstance': jdata.get("windowinstance","default")
                 }
             except Exception as e:
              writeviperlogs("ERROR",f"Preprocessing failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
              return {
                 'status': f"error:{e}",
                 'step4raw_data_topic': jdata.get('rawdatatopic',''),
                 'step4preprocess_data_topic': jdata.get('preprocessdatatopic',''),
                 'step4preprocesstypes': jdata.get('preprocesstypes',''),
                 'step4jsoncriteria': jdata.get('jsoncriteria',''),
                 'rollbackoffset': jdata.get('rollbackoffsets',200),
                 'windowinstance': jdata.get("windowinstance","default")
                 }


   #-------------------------------- MACHINE LEARNING -----------------------------------------------------
           @app.post('/api/v1/ml')
           def storeml(jdata: dict):
   #          jdata = request.get_json()
             if not jdata:
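               # FastAPI needs an exception (not a Flask-style tuple) to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing ml or invalid ml")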

             step = str(jdata.get('step','') )
             try:
               if step=="5":
                trainingdatafolder = jdata.get('trainingdatafolder','')
                ml_data_topic = jdata.get('ml_data_topic','')
                preprocess_data_topic = jdata.get('preprocess_data_topic','')
                islogistic = jdata.get('islogistic',0)
                dependentvariable = jdata.get('dependentvariable','failure')
                independentvariables = jdata.get('independentvariables','')
                processlogic = jdata.get('processlogic','')
                rollbackoffsets = jdata.get('rollbackoffsets',50)
                windowinstance = jdata.get('windowinstance','default')
                step5arr = [rollbackoffsets,processlogic,independentvariables,dependentvariable,
                            islogistic,preprocess_data_topic,ml_data_topic,trainingdatafolder]
                stopstart(step,step5arr,windowinstance)
                return {
                 'status': "success",
                 'trainingdatafolder': jdata.get('trainingdatafolder',''),
                 'ml_data_topic': jdata.get('ml_data_topic',''),
                 'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                 'islogistic': jdata.get('islogistic',0),
                 'dependentvariable': jdata.get('dependentvariable','failure'),
                 'independentvariables': jdata.get('independentvariables',''),
                 'processlogic': jdata.get('processlogic',''),
                 'rollbackoffsets': jdata.get('rollbackoffsets',50),
                 'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"Machine learning failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                 'trainingdatafolder': jdata.get('trainingdatafolder',''),
                 'ml_data_topic': jdata.get('ml_data_topic',''),
                 'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                 'islogistic': jdata.get('islogistic',0),
                 'dependentvariable': jdata.get('dependentvariable','failure'),
                 'independentvariables': jdata.get('independentvariables',''),
                 'processlogic': jdata.get('processlogic',''),
                 'rollbackoffsets': jdata.get('rollbackoffsets',50),
                 'windowinstance': jdata.get("windowinstance","default")
                 }

   #-------------------------------- PREDICTIONS -----------------------------------------------------
           @app.post('/api/v1/predict')
           def predictdata(jdata: dict):
   #          jdata = request.get_json()
             if not jdata:
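               # FastAPI needs an exception (not a Flask-style tuple) to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing prediction or invalid prediction")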

             step = str(jdata.get('step','') )

             try:
               if step=="6":
                pathtoalgos = jdata.get('pathtoalgos','')
                maxrows = jdata.get('rollbackoffsets',50)
                consumefrom = jdata.get('consumefrom','')
                inputdata = jdata.get('inputdata','')
                streamstojoin = jdata.get('streamstojoin','')
                ml_prediction_topic = jdata.get('ml_prediction_topic','')
                preprocess_data_topic = jdata.get('preprocess_data_topic','')
                windowinstance = jdata.get('windowinstance','default')
                step6arr = [maxrows,preprocess_data_topic,ml_prediction_topic,streamstojoin,inputdata,consumefrom,pathtoalgos]
                stopstart(step,step6arr,windowinstance)
                return {
                 'status': "success",
                  'pathtoalgos': jdata.get('pathtoalgos',''),
                  'maxrows': jdata.get('rollbackoffsets',50),
                  'consumefrom': jdata.get('consumefrom',''),
                  'inputdata': jdata.get('inputdata',''),
                  'streamstojoin': jdata.get('streamstojoin',''),
                  'ml_prediction_topic': jdata.get('ml_prediction_topic',''),
                  'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                  'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"Predictions failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                  'pathtoalgos': jdata.get('pathtoalgos',''),
                  'maxrows': jdata.get('rollbackoffsets',50),
                  'consumefrom': jdata.get('consumefrom',''),
                  'inputdata': jdata.get('inputdata',''),
                  'streamstojoin': jdata.get('streamstojoin',''),
                  'ml_prediction_topic': jdata.get('ml_prediction_topic',''),
                  'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                  'windowinstance': jdata.get('windowinstance','default')
                 }
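           # Example request (illustrative sketch, not part of the original source).
           # The keys mirror the jdata.get(...) lookups in predictdata(); the host,
           # port, and every value shown are placeholder assumptions. The handler
           # only acts when step is "6".
           #
           #   import requests
           #   payload = {
           #       "step": "6",
           #       "rollbackoffsets": 50,
           #       "consumefrom": "<topic-to-consume-from>",
           #       "inputdata": "",
           #       "streamstojoin": "",
           #       "pathtoalgos": "<path-to-trained-algorithms>",
           #       "ml_prediction_topic": "<prediction-topic>",
           #       "preprocess_data_topic": "<preprocess-topic>",
           #       "windowinstance": "default",
           #   }
           #   r = requests.post("http://<host>:<rest_port>/api/v1/predict",
           #                     json=payload, timeout=30)
           #   print(r.json())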

   #-------------------------------- AI -----------------------------------------------------
           @app.post('/api/v1/ai')
           def aidata(jdata: dict):
   #          jdata = request.get_json()
             if not jdata:
               return "Missing ai or invalid ai", 400

             step = str(jdata.get('step','') )
             try:
               if step=="9":
                vectordimension = jdata.get('vectordimension','768')
                contextwindowsize= jdata.get('contextwindowsize','8192')
                vectorsearchtype= jdata.get('vectorsearchtype','Manhattan')
                temperature= float(jdata.get('temperature','0.1'))
                docfolderingestinterval= jdata.get('docfolderingestinterval','900')
                docfolder= jdata.get('docfolder','')
                vectordbcollectionname= jdata.get('vectordbcollectionname','tml-pgpt')
                hyperbatch= jdata.get('hyperbatch','0')
                keyprocesstype= jdata.get('keyprocesstype','')
                keyattribute= jdata.get('keyattribute','hyperprediction')
                context= jdata.get('context','')
                prompt= jdata.get('prompt','')
                pgptport= jdata.get('pgptport','8001')
                pgpthost= jdata.get('pgpthost','http://127.0.0.1')
                pgpt_data_topic = jdata.get('pgpt_data_topic','')
                consumefrom = jdata.get('consumefrom','')
                rollbackoffset = jdata.get('rollbackoffset','5')
                pgptcontainername = jdata.get('pgptcontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2')
                windowinstance = jdata.get('windowinstance','default')

                step9arr = [vectordbcollectionname,consumefrom,pgpt_data_topic, rollbackoffset, prompt,context,keyattribute,keyprocesstype,
                            hyperbatch,docfolder,docfolderingestinterval, temperature,vectorsearchtype,contextwindowsize,pgptcontainername, pgpthost,pgptport,vectordimension]

                stopstart(step,step9arr,windowinstance)

                return {
                 'status': "success",
                  'vectordimension': jdata.get('vectordimension','768'),
                  'contextwindowsize': jdata.get('contextwindowsize','8192'),
                  'vectorsearchtype': jdata.get('vectorsearchtype','Manhattan'),
                  'temperature': jdata.get('temperature','0.1'),
                  'docfolderingestinterval': jdata.get('docfolderingestinterval','900'),
                  'docfolder': jdata.get('docfolder',''),
                  'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-pgpt'),
                  'hyperbatch': jdata.get('hyperbatch','0'),
                  'keyprocesstype': jdata.get('keyprocesstype',''),
                  'keyattribute': jdata.get('keyattribute','hyperprediction'),
                  'context': jdata.get('context',''),
                  'prompt': jdata.get('prompt',''),
                  'pgptport': jdata.get('pgptport','8001'),
                  'pgpthost': jdata.get('pgpthost','http://127.0.0.1'),
                  'pgpt_data_topic': jdata.get('pgpt_data_topic',''),
                  'consumefrom': jdata.get('consumefrom',''),
                  'rollbackoffset': jdata.get('rollbackoffset','5'),
                  'pgptcontainername': jdata.get('pgptcontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2'),
                  'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"AI failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                  'vectordimension': jdata.get('vectordimension','768'),
                  'contextwindowsize': jdata.get('contextwindowsize','8192'),
                  'vectorsearchtype': jdata.get('vectorsearchtype','Manhattan'),
                  'temperature': jdata.get('temperature','0.1'),
                  'docfolderingestinterval': jdata.get('docfolderingestinterval','900'),
                  'docfolder': jdata.get('docfolder',''),
                  'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-pgpt'),
                  'hyperbatch': jdata.get('hyperbatch','0'),
                  'keyprocesstype': jdata.get('keyprocesstype',''),
                  'keyattribute': jdata.get('keyattribute','hyperprediction'),
                  'context': jdata.get('context',''),
                  'prompt': jdata.get('prompt',''),
                  'pgptport': jdata.get('pgptport','8001'),
                  'pgpthost': jdata.get('pgpthost','http://127.0.0.1'),
                  'pgpt_data_topic': jdata.get('pgpt_data_topic',''),
                  'consumefrom': jdata.get('consumefrom',''),
                  'rollbackoffset': jdata.get('rollbackoffset','5'),
                  'pgptcontainername': jdata.get('pgptcontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2'),
                  'windowinstance': jdata.get('windowinstance','default')
                 }

   #-------------------------------- AGENTIC AI -----------------------------------------------------
           @app.post('/api/v1/agenticai')
           def agenticaidata(jdata: dict):
   #          jdata = request.get_json()
             if not jdata:
               return "Missing agentic ai or invalid agentic ai", 400

             step = str(jdata.get('step','') )

             try:
               if step=="9b":
                maxrows = jdata.get('rollbackoffsets',10)
                ollamamodel= jdata.get('ollama-model','phi3:3.8b,phi3:3.8b,llama3.2:3b') #agent - team lead - supervisor
                vectordbpath= jdata.get('vectordbpath','/rawdata/vectordb')
                temperature= float(jdata.get('temperature','0.1'))
                vectordbcollectionname= jdata.get('vectordbcollectionname','tml-llm-model')
                ollamacontainername= jdata.get('ollamacontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools')
                embedding= jdata.get('embedding','nomic-embed-text')
                agents_topic_prompt= jdata.get('agents_topic_prompt','')
                teamlead_topic= jdata.get('teamlead_topic','team-lead-responses')
                teamleadprompt= jdata.get('teamleadprompt','')
                supervisor_topic= jdata.get('supervisor_topic','supervisor-responses')
                supervisorprompt= jdata.get('supervisorprompt','')
                agenttoolfunctions= jdata.get('agenttoolfunctions','')
                agent_team_supervisor_topic= jdata.get('agent_team_supervisor_topic','all-agents-responses')
                contextwindow = jdata.get('contextwindow','4096')
                localmodelsfolder = jdata.get('localmodelsfolder','/rawdata/ollama')
                agenttopic = jdata.get('agenttopic','agent-responses')
                windowinstance = jdata.get('windowinstance','default')
                step9barr = [maxrows,ollamamodel,vectordbpath,temperature,vectordbcollectionname,ollamacontainername,embedding,agents_topic_prompt,teamlead_topic,teamleadprompt,
                            supervisor_topic,supervisorprompt,agenttoolfunctions,agent_team_supervisor_topic,contextwindow,localmodelsfolder,agenttopic]
                stopstart(step,step9barr,windowinstance)

                return {
                 'status': "success",
                 'rollbackoffset': jdata.get('rollbackoffsets',10),
                 'ollamamodel': jdata.get('ollama-model','phi3:3.8b,phi3:3.8b,llama3.2:3b'), #agent - team lead - supervisor
                 'vectordbpath': jdata.get('vectordbpath','/rawdata/vectordb'),
                 'temperature': jdata.get('temperature','0.1'),
                 'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-llm-model'),
                 'ollamacontainername': jdata.get('ollamacontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools'),
                 'embedding': jdata.get('embedding','nomic-embed-text'),
                 'agents_topic_prompt': jdata.get('agents_topic_prompt',''),
                 'teamlead_topic': jdata.get('teamlead_topic','team-lead-responses'),
                 'teamleadprompt': jdata.get('teamleadprompt',''),
                 'supervisor_topic': jdata.get('supervisor_topic','supervisor-responses'),
                 'supervisorprompt': jdata.get('supervisorprompt',''),
                 'agenttoolfunctions': jdata.get('agenttoolfunctions',''),
                 'agent_team_supervisor_topic': jdata.get('agent_team_supervisor_topic','all-agents-responses'),
                 'contextwindow': jdata.get('contextwindow','4096'),
                 'localmodelsfolder': jdata.get('localmodelsfolder','/rawdata/ollama'),
                 'agenttopic': jdata.get('agenttopic','agent-responses'),
                 'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"Agentic AI failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                 'rollbackoffset': jdata.get('rollbackoffsets',10),
                 'ollamamodel': jdata.get('ollama-model','phi3:3.8b,phi3:3.8b,llama3.2:3b'), #agent - team lead - supervisor
                 'vectordbpath': jdata.get('vectordbpath','/rawdata/vectordb'),
                 'temperature': jdata.get('temperature','0.1'),
                 'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-llm-model'),
                 'ollamacontainername': jdata.get('ollamacontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools'),
                 'embedding': jdata.get('embedding','nomic-embed-text'),
                 'agents_topic_prompt': jdata.get('agents_topic_prompt',''),
                 'teamlead_topic': jdata.get('teamlead_topic','team-lead-responses'),
                 'teamleadprompt': jdata.get('teamleadprompt',''),
                 'supervisor_topic': jdata.get('supervisor_topic','supervisor-responses'),
                 'supervisorprompt': jdata.get('supervisorprompt',''),
                 'agenttoolfunctions': jdata.get('agenttoolfunctions',''),
                 'agent_team_supervisor_topic': jdata.get('agent_team_supervisor_topic','all-agents-responses'),
                 'contextwindow': jdata.get('contextwindow','4096'),
                 'localmodelsfolder': jdata.get('localmodelsfolder','/rawdata/ollama'),
                 'agenttopic': jdata.get('agenttopic','agent-responses'),
                 'windowinstance': jdata.get('windowinstance','default')
                 }

   #-------------------------------- CONSUME -----------------------------------------------------
           @app.post('/api/v1/consume')
           def consumedata(jdata: dict):
   #          jdata = request.get_json()
             osdu = jdata.get('osdu','false')
             kind = jdata.get('kind','tml')

             if not jdata or not jdata.get('topic'):
               if osdu=='false':
                 return "Missing ml or invalid consume", 400
               else:
                 return {
                     "kind": f"{kind}",
                     "id": "consume-error",
                     "error": {
                         "code": 400,
                         "message": "Missing topic or invalid consume request",
                         "reason": "Topic parameter required"
                     }
                 }
             forward_statuses = []
             maintopic = jdata.get('topic','')
             forwardurl = jdata.get('forwardurl','')
             legal = jdata.get('legal','tml-legal')

             forward_headers = {'Content-Type': 'application/json'}

             if maintopic != '':
              try:
               rollbackoffsets = int(jdata.get('rollbackoffsets',100))
               enabletls = int(jdata.get('enabletls',1))
               consumerid='tmlconsumerplugin'
               companyname='companyname'
               offset = int(jdata.get('offset',-1))
               brokerhost = ''
               brokerport = -999
               microserviceid = ''
               topicid = jdata.get('topicid','-999')
               preprocesstype = ''
               delay = 100
               partition = -1

               result=maadstml.viperconsumefromtopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,
                           consumerid,companyname,partition,enabletls,delay,
                           offset, brokerhost,brokerport,microserviceid,
                           topicid,rollbackoffsets,preprocesstype)
               now_iso = datetime.utcnow().isoformat() + "Z"
               result = json.loads(result)
               if osdu=='false':
                   response =  {
                       'status': 'consumed',
                       'topic': maintopic,
                       'Messages': result,  # viperconsumefromtopic output
                       'consumer_id': consumerid
                   }
               else:
                   response = {
                       "kind": f"{kind}",
                       "id": f"osdu:tml:consume:{maintopic}:{int(time.time())}",
                       "data": {
                           "Topic": maintopic,
                           "ConsumerID": consumerid,
                           "CompanyName": companyname,
                           "Messages": result,  # Your viperconsumefromtopic output
                           "Partition": partition,
                           "Offset": offset,
                           "RollbackOffsets": rollbackoffsets,
                           "meta": {
                               "dataPartitionId": "tml-id",
                               "createTime": f"{now_iso}",
                               "modificationTime": f"{now_iso}",
                               "acl": {
                                   "viewers": ["data.default.viewers@tml.group"],
                                   "owners": ["data.default.owners@tml.group"]
                               },
                               "legal": {
                                   "legaltags": f"{legal}",
                                   "status": "compliant"
                               }
                           }
                       }
                   }

               if forwardurl == '':
                   #print("response=",response)
                   return response
               else:
                  farr = [fw.strip() for fw in forwardurl.split(",") if fw.strip()]  # Clean whitespace, skip empty entries
                  for fw in farr:
                    try:
                      fwdresponse = requests.post(
                       f"{fw}",
                        json=response,
                        headers={'Content-Type': 'application/json', 'data-partition-id': 'tml-id'}, timeout=30 )
                      forward_statuses.append({
                         'url': fw.strip(),
                         'status': fwdresponse.status_code,
                         'success': fwdresponse.ok
                      })
                    except Exception as e:
                       forward_statuses.append({'url': fw.strip(), 'error': str(e)})
                       writeviperlogs("ERROR",f"Forwarding URL failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)

                  response['forward_statuses'] = forward_statuses
                  return response
              except Exception as e:
                  print("Error=",e)
                  writeviperlogs("ERROR",f"Consume failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                  return {"error": f"Consumption failed: {e}"}
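           # Example request body for /api/v1/consume (illustrative sketch, not part
           # of the original source). Only "topic" is required; the other keys are
           # optional and are shown with the defaults read by consumedata(). The
           # topic name and forward URLs are placeholder assumptions.
           #
           #   payload = {
           #       "topic": "<kafka-topic>",
           #       "rollbackoffsets": 100,   # cast with int() by the handler
           #       "offset": -1,
           #       "enabletls": 1,
           #       "topicid": "-999",
           #       "osdu": "false",          # any other value selects the OSDU-style envelope
           #       "kind": "tml",
           #       "legal": "tml-legal",
           #       "forwardurl": "<url-1>,<url-2>",  # optional, comma-separated
           #   }
           #
           # When "forwardurl" is set, the same response is POSTed to each URL and
           # the per-URL outcome is returned under "forward_statuses".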


   ##################### INDUSTRIAL API ##############################################################
   #-------------------------------- SCADA/MODBUS -----------------------------------------------------
           @app.post("/api/v1/scada_modbus_read")
           async def start_vessel_read(req: dict):

               #req = request.get_json()
               job_id = str(time.time())

               scada_cfg = {
                   "host": req.get("scada_host", "127.0.0.1"),
                   "port": req.get("scada_port", 2502),
                   "unit_id": req.get("slave_id", 1),
               }

               with lock:  # ✅ Thread-safe
                   # Stop existing thread first
                   if sg.read_thread and sg.read_thread.is_alive():
                       sg.read_job["stop"] = True
                       sg.read_thread.join(timeout=float(req.get("read_interval_seconds", 0.3))+1.0)


                   sg.read_job = {"stop": False, "job_id": job_id}
                   sg.read_thread = threading.Thread(
                   target=cv.modbus_read_loop,
                   args=(
                       scada_cfg,
                       req.get("read_interval_seconds", 0.3),
                       req.get("callback_url",""),
                       req.get("max_reads",-1),
                       req.get("fields", []),
                       req.get("scaling", {}),
                       req.get("start_register", 40001) - 40001,
                       req.get("sendtotopic", ""),
                       job_id,
                       VIPERTOKEN,
                       VIPERHOST,
                       VIPERPORT,
                       default_args,
                       req.get("vessel_names", {}),
                       req.get("createvariables", "")  # ✅ Dynamic from request
                      ),
                      daemon=True,
                   )
                   sg.read_thread.start()

               return {
                   "message": "SCADA Vessel read started",
                   "job_id": job_id,
                   "config_from_request": {
                       "fields": len(req.get("fields", [])),
                       "has_createvariables": bool(req.get("createvariables"))
                   }
               }
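
           # Example request body for /api/v1/scada_modbus_read (illustrative sketch,
           # not part of the original source). Keys and defaults mirror the
           # req.get(...) calls in start_vessel_read(); the field names, URL, and
           # topic below are placeholder assumptions.
           #
           #   payload = {
           #       "scada_host": "127.0.0.1",
           #       "scada_port": 2502,
           #       "slave_id": 1,
           #       "read_interval_seconds": 0.3,
           #       "max_reads": -1,
           #       "start_register": 40001,
           #       "fields": ["<field-1>", "<field-2>"],
           #       "scaling": {},
           #       "vessel_names": {},
           #       "createvariables": "",
           #       "callback_url": "<callback-url>",
           #       "sendtotopic": "<kafka-topic>",
           #   }
           #
           # start_register uses 4xxxx-style numbering: the handler subtracts 40001
           # before passing it on, so 40001 becomes offset 0 and 40011 becomes
           # offset 10.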


           @app.post("/api/v1/vessel_data")
           def vessel_data_callback(data: dict):
   #            data = request.get_json()

               # DYNAMIC: Handle ANY data structure from callback
               vessel = data.get('vessel', data) or {}  # Nested OR flat; {} if 'vessel' is null

               # DYNAMIC: Find vessel identifier (vesselIndex OR first field)
               vessel_id = (vessel or {}).get('vesselIndex',
                    next(iter(vessel), 'N/A') if vessel else 'N/A')

               # DYNAMIC: Find pressure field (first numeric field whose key contains 'pressure')
               pressure = 0
               for key, val in vessel.items():
                   if isinstance(val, (int, float)) and 'pressure' in key.lower():
                      pressure = val
                      break

               print(f"📨 Job {data.get('job_id', 'N/A')} | Vessel {vessel_id}: {pressure:.1f}")
               print(f"   Total fields: {len(vessel) if vessel else 0}")

               # DYNAMIC: Show computed vars (anything not in original fields list)
               original_fields = data.get('fields', [])
               computed_fields = {k: v for k, v in vessel.items()
                                 if k not in original_fields and isinstance(v, (int, float))}

               for field, value in list(computed_fields.items())[:3]:
                   print(f"   {field}: {value:.0f}")

               print(json.dumps(data))
               return json.dumps(data)


           @app.post("/api/v1/scada_read_stop")
           def stop_vessel_read():
               if sg.read_job:
                   sg.read_job["stop"] = True
               return {"message": "Stop signal sent"}

           @app.get("/api/v1/scada_status")
           def status():
               return {
                   "running": bool(sg.read_job) and not sg.read_job.get("stop", True)
               }

   ################################# MQTT #############################################################

           @app.post("/api/v1/mqtt_subscribe")
           def start_mqtt_subscribe(req: dict):

            try:
             job_id = str(time.time())
             mqtt_cfg = {
               "broker": req.get("mqtt_broker", ""),
               "port": int(req.get("mqtt_port", "8883")),
               "topic": req.get("mqtt_subscribe_topic", ""),
               "sendtotopic": req.get("sendtotopic",""),
               "username": os.environ.get('MQTTUSERNAME', ''),
               "password": os.environ.get('MQTTPASSWORD', ''),
               "enable_tls": req.get("mqtt_enabletls","1"),
               "VIPERTOKEN": app.config['VIPERTOKEN'],
               "VIPERHOST":  app.config['VIPERHOST'],
               "VIPERPORT": app.config['VIPERPORT'],
               "default_args": default_args,
             }

             with mqtt_lock:  # New lock for MQTT globals (add to scadaglobals.py)
               # Stop existing MQTT thread
               if sg.mqtt_thread and sg.mqtt_thread.is_alive():
                 sg.mqtt_job["stop"] = True
                 sg.mqtt_client.disconnect()
   #              sg.mqtt_thread.join(timeout=2.0)

               sg.mqtt_job = {"stop": False, "job_id": job_id}
               sg.mqtt_thread = threading.Thread(
                  target=mq.mqttserverconnect_threaded,  # Your function, modified below
                  args=(mqtt_cfg, job_id),
                  daemon=False
                )
               sg.mqtt_thread.start()

               # Keep this thread alive as long as the job is running

             return {
               "message": "MQTT subscription started",
               "job_id": job_id
             }

            except Exception as e:
               print("❌ MQTT subscribe ERROR:", str(e))
               return {"error": f"MQTT subscribe failed: {str(e)}"}
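
           # Example request body for /api/v1/mqtt_subscribe (illustrative sketch,
           # not part of the original source). The broker and topic values are
           # placeholder assumptions. Credentials are not part of the request:
           # the handler reads them from the MQTTUSERNAME and MQTTPASSWORD
           # environment variables.
           #
           #   payload = {
           #       "mqtt_broker": "<broker-hostname>",
           #       "mqtt_port": "8883",          # cast with int() by the handler
           #       "mqtt_subscribe_topic": "<mqtt-topic>",
           #       "mqtt_enabletls": "1",
           #       "sendtotopic": "<kafka-topic>",
           #   }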
   ####################################################################################################

           @app.post('/api/v1/jsondataline')
           def storejsondataline(jdata: dict):
   #          jdata = request.get_json()
             topic = jdata.get('sendtotopic','')
             jdata = json.dumps(jdata)
             readdata(jdata,VIPERTOKEN,VIPERHOST,VIPERPORT,topic)
             return "ok"

           @app.post('/api/v1/jsondataarray')
           def storejsondataarray(jdata: List[dict]):
   #          jdata = request.get_json()

             for item in jdata:
                topic = item.get('sendtotopic','')
                item = json.dumps(item)
                readdata(item,VIPERTOKEN,VIPERHOST,VIPERPORT,topic)
             return "ok"
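
           # Example request bodies for the two endpoints above (illustrative sketch,
           # not part of the original source; field names other than "sendtotopic"
           # are placeholder assumptions).
           #
           #   /api/v1/jsondataline   expects a single JSON object:
           #       {"sendtotopic": "<kafka-topic>", "<field>": 1.0}
           #
           #   /api/v1/jsondataarray  expects a JSON array of objects:
           #       [{"sendtotopic": "<kafka-topic>", "<field>": 1.0},
           #        {"<field>": 2.0}]
           #
           # Each object is serialized whole (including "sendtotopic") and produced
           # to Kafka. If "sendtotopic" is missing or empty, readdata() falls back
           # to default_args['topics'].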

   ####################################################################################################
           @app.post('/api/v1/health')
           def tmux_health_check_json() -> Dict[str, Any]:
               def run_tmux(cmd):
                   try:
                       result = subprocess.run(['tmux'] + cmd, capture_output=True, text=True, timeout=10)
                       return result.stdout.strip()
                   except Exception:  # tmux missing, timed out, or failed: treat as no output
                       return ""

               result = {
                   "timestamp": datetime.now().isoformat(),
                   "sessions": [],
                   "summary": {
                       "total_plugin_windows": 0,
                       "error_count": 0,
                       "healthy": True
                   }
               }

               # Get clean session list
               sessions_raw = run_tmux(['ls', '-F', '#{session_name}']) or run_tmux(['list-sessions', '-F', '#{session_name}'])
               sessions = [s.strip() for s in sessions_raw.split('\n') if s.strip()]

               crash_patterns = [r'panic[:\s]', r'fatal\s+error', r'segmentation.*fault',
                                r'SIGSEGV', r'runtime\s+error', r'goroutine\s+panic',
                                r'signal:.*killed', r'signal:.*abrt']

               for session_name in sessions:
                   # ✅ FIX 1: Check if SESSION starts with plugin_
                   is_plugin_session = session_name.startswith('plugin_')
                   session_name_user ="n/a"
                   if is_plugin_session:
                     session_name_user=session_name.split("_")[1]

                   session_data = {
                       "name": session_name,
                       "user_session": session_name_user,
                       "is_plugin_session": is_plugin_session,
                       "plugin_windows": [],
                       "status": "healthy",
                       "plugin_window_count": 0
                   }

                   # Get windows for this session
                   windows_raw = run_tmux(['list-windows', '-t', session_name,
                                          '-F', '#{window_index}:#{window_name}'])
                   windows = [w for w in windows_raw.split('\n') if ':' in w]

                   # NOTE: every window in the session is currently included; the
                   # plugin_ name filter below is disabled.
                   plugin_windows = []
                   for win in windows:
                       win_index, win_name = win.split(':', 1)
                       #if win_name.startswith('plugin_') or is_plugin_session:
                       plugin_windows.append((win_index, win_name))

                   # Process plugin windows
                   for win_index, win_name in plugin_windows:
                       result["summary"]["total_plugin_windows"] += 1
                       session_data["plugin_window_count"] += 1

                       pane_content = run_tmux(['capture-pane', '-t', f'{session_name}:{win_index}.0',
                                              '-S', '-1000', '-e', '-q'])

                       crashes = [line.strip() for line in pane_content.split('\n')
                                 if any(re.search(p, line, re.IGNORECASE) for p in crash_patterns)]

                       window_data = {
                           "index": win_index,
                           "name": win_name,
                           "status": "healthy" if not crashes else "crashed",
                           "crash_lines": crashes[:5]
                       }

                       if crashes:
                           result["summary"]["error_count"] += 1
                           session_data["status"] = "unhealthy"
                           result["summary"]["healthy"] = False

                       session_data["plugin_windows"].append(window_data)

                   # ✅ FIX 3: Include ANY session with plugin activity
                   if session_data["plugin_window_count"] > 0 or is_plugin_session:
                       result["sessions"].append(session_data)

               writeviperlogs("INFO",f"{result}",VIPERTOKEN,VIPERHOST,VIPERPORT)

               return result



   ####################################################################################################
           #app.run(port=default_args['rest_port']) # for dev
           if os.environ['TSS']=="0":
             try:
               #http_server = WSGIServer(('', int(default_args['rest_port'])), app)

               uvicorn.run(
                 app,  # ASGI application object to serve
                 host="0.0.0.0",
                 port=int(default_args['rest_port']),
                 log_level="info",
                 reload=False  # Disable reload in production
               )

             except Exception as e:
              tsslogging.locallogs("ERROR", "STEP 3: Cannot start uvicorn server in {} - {}".format(os.path.basename(__file__),e))
              tsslogging.tsslogit("ERROR: Cannot start uvicorn server in {}".format(os.path.basename(__file__)), "ERROR" )
    #          tsslogging.git_push("/{}".format(repo),"Entry from {} - {}".format(os.path.basename(__file__),e),"origin")
              print("ERROR: Cannot start uvicorn server")
              writeviperlogs("ERROR",f"Cannot start TML Plugin server: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
              return
           else:
             try:
               print("Listening")
               writeviperlogs("INFO","TML Plugin Server Started",VIPERTOKEN,VIPERHOST,VIPERPORT)
               #http_server = WSGIServer(('', int(default_args['tss_rest_port'])), app)

               uvicorn.run(
                  app,  # ASGI application object to serve
                  host="0.0.0.0",
                  port=int(default_args['tss_rest_port']),
                  log_level="info",
                  reload=False  # Disable reload in production
               )
             except Exception as e:
              tsslogging.locallogs("ERROR", "STEP 3: Cannot start uvicorn server in {} - {}".format(os.path.basename(__file__),e))
              tsslogging.tsslogit("ERROR: Cannot start uvicorn server in {}".format(os.path.basename(__file__)), "ERROR" )
   #           tsslogging.git_push("/{}".format(repo),"Entry from {} - {}".format(os.path.basename(__file__),e),"origin")
              print("ERROR: Cannot start uvicorn server")
              writeviperlogs("ERROR",f"Cannot start plugin server: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
              return

           tsslogging.locallogs("INFO", "STEP 3: RESTAPI HTTP Server started ... successfully")
   #        http_server.serve_forever()

        #return [VIPERTOKEN,VIPERHOST,VIPERPORT]

   def readdata(valuedata,VIPERTOKEN, VIPERHOST, VIPERPORT,topic=''):
         args = default_args

         # Main Kafka topic to store the real-time data
         if topic=='':
           maintopic = args['topics']
         else:
           maintopic = topic

         producerid = args['producerid']
         try:
             producetokafka(valuedata, "", "",producerid,maintopic,"",args,VIPERTOKEN, VIPERHOST, VIPERPORT)
             # change time to speed up or slow down data
             #time.sleep(0.15)
         except Exception as e:
             print(e)
             pass

   def windowname(wtype,sname,dagname):
       randomNumber = random.randrange(10, 9999)
       wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
       with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
         file.writelines("{}\n".format(wn))

       return wn

   def startproducing(**context):
          global VIPERTOKEN, VIPERHOST, VIPERPORT, HTTPADDR
          sd = context['dag'].dag_id
          sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
          pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

          VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
          VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
          VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
          HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

          tsslogging.locallogs("INFO", "STEP 3: producing data started")

          chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

          repo=tsslogging.getrepo()
          if sname != '_mysolution_':
           fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
          else:
            fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

          hs,VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
          ti = context['task_instance']
          ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='REST')
          ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])
          if os.environ['TSS']=="0":
            ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['rest_port']))
          else:
            ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['tss_rest_port']))

          ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="_{}".format(default_args['tss_rest_port']))
          ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="_{}".format(default_args['rest_port']))

          ti.xcom_push(key="{}_IDENTIFIER".format(sname),value=default_args['identifier'])
          ti.xcom_push(key="{}_FROMHOST".format(sname),value="{},{}".format(hs,VIPERHOSTFROM))
          ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

          ti.xcom_push(key="{}_PORT".format(sname),value="_{}".format(VIPERPORT))
          ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)

          wn = windowname('produce',sname,sd)
          subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOSTFROM,VIPERPORT[1:]), "ENTER"])

   if __name__ == '__main__':

       if len(sys.argv) > 1:
          if sys.argv[1] == "1":
            VIPERTOKEN = sys.argv[2]
            VIPERHOST = sys.argv[3]
            VIPERPORT = sys.argv[4]
            os.environ['VIPERTOKEN']=VIPERTOKEN
            os.environ['VIPERHOST']=VIPERHOST
            os.environ['VIPERPORT']=VIPERPORT

            gettmlsystemsparams()

8.5.5.7. STEP 3b: Parameter Explanation

Parameter	Explanation
owner	Specify the owner of the DAG.
enabletls	Set to 1 for encryption, 0 for no encryption.
microserviceid	If using a load balancer, set this to the microservice ID; otherwise leave blank.
producerid	Specifies a producer identifier name, e.g. iotsolution.
topics	Name of the topic to store data in. Note: this is the raw_data_topic in the STEP 2 DAG.
identifier	An identifying name for the solution, e.g. TML solution.
tss_rest_port	The port used for TSS dev testing. Point your REST API client (rest_port) to this port when testing in the TSS.
rest_port	The TML solution port. Point your client's rest_port here when running the TML solution in its own container. tss_rest_port and rest_port are different numbers but serve the same purpose: tss_rest_port is for DEV, rest_port is for the container.
delay	Maximum time, in milliseconds, that VIPER waits for Kafka to confirm that a message was received and written to the topic.
topicid	Monitors all device entities. Leave at -999.
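
The relationship between tss_rest_port and rest_port can be sketched as a small helper. This is a minimal illustration, assuming the same rule the DAG applies when it publishes the client port: a TSS environment variable equal to "0" selects rest_port (container), and any other value selects tss_rest_port (DEV). The function name select_rest_port and the port values below are hypothetical and are not part of the TML code.

```python
def select_rest_port(args, tss_flag):
    """Return the port a REST client should target.

    args     -- dict with 'rest_port' and 'tss_rest_port' (as in default_args)
    tss_flag -- value of the TSS environment variable: "0" means the solution
                runs in its own container, any other value means TSS (DEV)
    """
    key = 'rest_port' if tss_flag == "0" else 'tss_rest_port'
    return int(args[key])


# Hypothetical port values, for illustration only
example_args = {'tss_rest_port': '9001', 'rest_port': '9002'}
print(select_rest_port(example_args, "0"))  # container port
print(select_rest_port(example_args, "1"))  # TSS dev port
```

In practice the flag would come from os.environ['TSS'], as it does in the DAG listing above.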

8.5.5.8. STEP 3b.i: REST API CLIENT

tml-client-RESTAPI-step-3-kafka-producetotopic.py

import requests
import sys
from datetime import datetime
import time
import json

sys.dont_write_bytecode = True

# defining the api-endpoint
rest_port = "9002"  # <<< ***** Change Port to match the Server Rest_PORT
httpaddr = "http:" # << Change to https or http

# Modify the apiroute: jsondataline, or jsondataarray
# 1. jsondataline: You can send one JSON message at a time
# 2. jsondataarray: You can send a JSON array

apiroute = "jsondataline"

# USE THIS ENDPOINT IF TML RUNNING IN DOCKER CONTAINER
# DOCKER CONTAINER ENDPOINT
#API_ENDPOINT = "{}//localhost:{}/{}".format(httpaddr,rest_port,apiroute)

# USE THIS ENDPOINT IF TML RUNNING IN KUBERNETES
# KUBERNETES ENDPOINT
API_ENDPOINT = "{}//tml.tss/ext/{}".format(httpaddr,apiroute)

def send_tml_data(data):
  # data to be sent to api
  headers = {'Content-type': 'application/json'}
  print(API_ENDPOINT)
  r = requests.post(url=API_ENDPOINT, data=json.dumps(data), headers=headers)

  # extracting response text
  return r.text


def readdatafile(inputfile):

  ##############################################################
  # NOTE: You can send any "EXTERNAL" data through this API
  # It is reading a localfile as an example
  ############################################################

  try:
    file1 = open(inputfile, 'r')
    print("Data Producing to Kafka Started:",datetime.now())
  except Exception as e:
    print("ERROR: Something went wrong ",e)
    return
  k = 0
  while True:
    line = file1.readline()
    line = line.replace(";", " ")
    print("line=",line)
    # add lat/long/identifier
    k = k + 1
    try:
      if line == "":
        #break
        file1.seek(0)
        k=0
        print("Reached End of File - Restarting")
        print("Read End:",datetime.now())
        continue
      ret = send_tml_data(line)
      print(ret)
      # change time to speed up or slow down data
      time.sleep(.1)
    except Exception as e:
      print(e)
      time.sleep(0.1)
      pass

def start():
      inputfile = "IoTData.txt"
      readdatafile(inputfile)

if __name__ == '__main__':
    start()

8.5.5.9. STEP 3b.i: REST API CLIENT: Explanation

The REST API client runs outside the TML solution container. The client API lets you connect to your internal systems or devices and stream their data directly to the TML server producer, which receives the data from the REST API client and produces it to Kafka.

Important

The REST API client runs outside the TML solution container. This is a simple and convenient way to stream any type of JSON data from any device in your environment.

Client Core Variables	Explanation
rest_port	The same rest_port JSON field as in STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag
apiroute	Indicates how you are sending your JSON message. There are two options: jsondataline sends one JSON message in each API call; jsondataarray sends a JSON array in each API call. Note: your JSON must be valid JSON. Just store your JSON in datajson.
API_ENDPOINT	API_ENDPOINT = "{}//localhost:{}/{}".format(httpaddr,rest_port,apiroute) when TML runs in a Docker container, or "{}//tml.tss/ext/{}".format(httpaddr,apiroute) when it runs in Kubernetes. This connects to the endpoint defined in STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag
httpaddr	The URL scheme prefix: "http:" or "https:".
readdatafile(inputfile)	This function is only for demo purposes. You can send any data you want using this API.
start()	This function starts the process. Note: you can modify this function as needed to stream your own data repeatedly.
send_tml_data(data)	This is the main function that streams your data to STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag
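
The client variables above can be combined as in the following sketch. It builds the endpoint URL from the two patterns in the client listing and checks that a line is valid JSON before it is sent. The helper names build_endpoint and is_valid_json and the sample device message are hypothetical; the sketch makes no network call and does not show how the server parses the request body.

```python
import json


def build_endpoint(apiroute, rest_port="9002", httpaddr="http:", kubernetes=False):
    """Build the URL the client posts to, following the two patterns in the listing."""
    if apiroute not in ("jsondataline", "jsondataarray"):
        raise ValueError("apiroute must be jsondataline or jsondataarray")
    if kubernetes:
        # Kubernetes endpoint pattern from the client listing
        return "{}//tml.tss/ext/{}".format(httpaddr, apiroute)
    # Docker container endpoint pattern from the client listing
    return "{}//localhost:{}/{}".format(httpaddr, rest_port, apiroute)


def is_valid_json(text):
    """Return True when text parses as JSON, so bad lines can be skipped before sending."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(build_endpoint("jsondataline"))
print(build_endpoint("jsondataarray", kubernetes=True))
print(is_valid_json('{"device": "sensor-1", "value": 21.5}'))  # hypothetical message
print(is_valid_json('{"device": '))
```

Validating on the client side is a design choice rather than a requirement of the API: it keeps malformed lines from being posted at all, at the cost of parsing each message once more.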

8.5.5.10. REST API Reference Architecture

8.5.5.11. STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag

 import asyncio
 import signal
 from google.protobuf.json_format import MessageToJson
 from grpc_reflection.v1alpha import reflection
 import maadstml
 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import grpc
 from concurrent import futures
 import time
 import tml_grpc_pb2_grpc as pb2_grpc
 import tml_grpc_pb2 as pb2

 import tsslogging
 import sys
 import os
 import subprocess
 import random
 import json
 import nest_asyncio
 nest_asyncio.apply()
 #from grpc.experimental import aio
 sys.dont_write_bytecode = True
 ##################################################  gRPC SERVER ###############################################
 # This is a gRPCserver that will handle connections from a client
 # There are two endpoints you can use to stream data to this server:
 # 1. jsondataline -  You can POST a single JSON message from your client app. Your JSON will be streamed to the Kafka topic.
 # 2. jsondataarray -  You can POST JSON arrays from your client app. Your JSON will be streamed to the Kafka topic.

 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice', # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< ***** leave blank
   'producerid' : 'iotsolution',  # <<< *** Change as needed
   'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
   'identifier' : 'TML solution',  # <<< *** Change as needed
   'tss_gRPC_Port' : '9001',  # <<< ***** replace with the TSS (dev) gRPC port, i.e. this gRPC server listens on port 9001 in the TSS
   'gRPC_Port' : '9002',  # <<< ***** replace with the container gRPC port, i.e. this gRPC server listens on port 9002 in the solution container
   'delay' : '7000', # << ******* 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'topicid' : '-999', # <<< ********* do not modify
 }

 ######################################## DO NOT MODIFY BELOW #############################################


 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""
 VIPERHOSTFROM=""


 class TmlprotoService(pb2_grpc.TmlprotoServicer):

   def __init__(self, *args, **kwargs):
     pass

   async def GetServerResponse(self, request, context):

     maintopic = default_args['topics']
     producerid = default_args['producerid']


     if request is not None:
      # Fall back to the raw message so the error responses below can always report it
      inputbuf = request.message
      try:
       message = json.dumps(json.loads(request.message))
       inputbuf=f"{message}"
       print("inputbuf=",inputbuf)

       topicid=default_args['topicid']

       # Add a 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
       enabletls = int(default_args['enabletls'])
       identifier = default_args['identifier']
       delay = int(default_args['delay'])
       try:
         result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,'',
                                             topicid,identifier)
         return pb2.MessageResponse(message="Success producing message",received=True)
       except Exception as e:
         return pb2.MessageResponse(message="Failed to produce message, err={} message={}".format(e,inputbuf),received=False)
      except Exception as e:
       return pb2.MessageResponse(message="Failed to produce message, err={} message={}".format(e,inputbuf),received=False)


     return pb2.MessageResponse(message="Failed to produce message",received=False)

 async def serve():


     tsslogging.locallogs("INFO", "STEP 3: producing data started")
     repo=tsslogging.getrepo()
     tsslogging.tsslogit("gRPC producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
     mainport=0
     server_options = [
         ("grpc.keepalive_time_ms", 20000),
         ("grpc.keepalive_timeout_ms", 10000),
         ("grpc.http2.min_ping_interval_without_data_ms", 5000),
         ("grpc.max_connection_idle_ms", 10000),
         ("grpc.max_connection_age_ms", 30000),
         ("grpc.max_connection_age_grace_ms", 5000),
         ("grpc.http2.max_pings_without_data", 5),
         ("grpc.keepalive_permit_without_calls", 1),
     ]

     try:
         server = grpc.aio.server(futures.ThreadPoolExecutor(),options=server_options)
 #        server = grpc.server(futures.ThreadPoolExecutor(max_workers=100))
         SERVICE_NAMES = (
           pb2.DESCRIPTOR.services_by_name["Tmlproto"].full_name,
           reflection.SERVICE_NAME,
         )
         reflection.enable_server_reflection(SERVICE_NAMES, server)

         pb2_grpc.add_TmlprotoServicer_to_server(TmlprotoService(), server)
         if os.environ['TSS']=="0":
 #          server_creds = grpc.alts_server_credentials()
           with open('/{}/tml-airflow/certs/server.key'.format(repo), 'rb') as f:
             server_key = f.read()
           with open('/{}/tml-airflow/certs/server.crt'.format(repo), 'rb') as f:
            server_cert = f.read()
           server_creds = grpc.ssl_server_credentials( [(server_key, server_cert)] )
           mainport=int(default_args['gRPC_Port'])
           server.add_secure_port("[::]:{}".format(int(default_args['gRPC_Port'])), server_creds)

         else:
           server.add_insecure_port("[::]:{}".format(int(default_args['tss_gRPC_Port'])))
           mainport=int(default_args['tss_gRPC_Port'])
     except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 3: Cannot connect to gRPC server in {} - {}".format(os.path.basename(__file__),e))

            tsslogging.tsslogit("ERROR: Cannot connect to gRPC server in {} - {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            print("ERROR: Cannot connect to gRPC server in:",e)
            return

     tsslogging.locallogs("INFO", "STEP 3: gRPC server started .. waiting for connections")
     await server.start()
     print("gRPC server started - listening on port ",mainport)
     await server.wait_for_termination()

 async def shutdown_server(server) -> None:
     #logging.info ("Shutting down server...")
     await server.stop(None)

 def handle_sigterm(sig, frame) -> None:
     asyncio.create_task(shutdown_server(server))

 async def handle_sigint() -> None:
     loop = asyncio.get_running_loop()
     for sig in (signal.SIGINT, signal.SIGTERM):
         loop.add_signal_handler(sig, loop.stop)

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startproducing(**context):
        global VIPERTOKEN
        global VIPERHOST
        global VIPERPORT
        global HTTPADDR
        global VIPERHOSTFROM

        tsslogging.locallogs("INFO", "STEP 3: producing data started")

        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
        repo=tsslogging.getrepo()

        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        hs,VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
        ti = context['task_instance']
        ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='gRPC')
        ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])

        if os.environ['TSS']=="0":
         ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['gRPC_Port']))
        else:
         ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['tss_gRPC_Port']))

        ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="_{}".format(default_args['tss_gRPC_Port']))
        ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="_{}".format(default_args['gRPC_Port']))

        ti.xcom_push(key="{}_IDENTIFIER".format(sname),value=default_args['identifier'])

        ti.xcom_push(key="{}_FROMHOST".format(sname),value="{},{}".format(hs,VIPERHOSTFROM))
        ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

        ti.xcom_push(key="{}_PORT".format(sname),value=VIPERPORT)
        ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)

        wn = windowname('produce',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOSTFROM,VIPERPORT[1:]), "ENTER"])

        tsslogging.locallogs("INFO", "STEP 3: producing data completed")

 if __name__ == '__main__':

     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          VIPERTOKEN = sys.argv[2]
          VIPERHOST = sys.argv[3]
          VIPERPORT = sys.argv[4]
 #         serve()

          server = None
          signal.signal(signal.SIGTERM, handle_sigterm)
          try:
             print("Starting asyncio event loop")
             asyncio.get_event_loop().run_until_complete(serve())
          except KeyboardInterrupt:
            pass

8.5.5.12. STEP 3c: Parameter Explanation

Parameter	Explanation
owner	Specify the owner of the DAG.
enabletls	Set to 1 for encryption, 0 for no encryption.
microserviceid	If using a load balancer, set this to the microservice ID; otherwise leave blank.
producerid	Specifies a producer identifier name, e.g. iotsolution.
topics	Name of the topic to store data in. Note: this is the raw_data_topic in the STEP 2 DAG.
identifier	An identifying name for the solution, e.g. TML solution.
tss_gRPC_Port	The port used for TSS dev testing. Point your gRPC API client (self.server_port) to this port when testing in the TSS.
gRPC_Port	The TML solution port. Point your gRPC client port here when running the TML solution in its own container. tss_gRPC_Port and gRPC_Port are different numbers but serve the same purpose: tss_gRPC_Port is for DEV, gRPC_Port is for the container.
delay	Maximum time, in milliseconds, that VIPER waits for Kafka to confirm that a message was received and written to the topic.
topicid	Monitors all device entities. Leave at -999.
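
The choice between the two gRPC ports, and whether the port is secured, follows the serve() function in the listing above: when the TSS environment variable is "0" the server binds gRPC_Port with TLS credentials (add_secure_port), otherwise it binds tss_gRPC_Port without TLS (add_insecure_port). The sketch below only reproduces that decision; the helper name select_grpc_binding is hypothetical, and no gRPC server is started.

```python
def select_grpc_binding(args, tss_flag):
    """Return (bind_address, secure) for the gRPC server.

    args     -- dict with 'gRPC_Port' and 'tss_gRPC_Port' (as in default_args)
    tss_flag -- value of the TSS environment variable; "0" means container mode
    """
    if tss_flag == "0":
        # Container mode: the DAG uses add_secure_port with server.key/server.crt
        return "[::]:{}".format(int(args['gRPC_Port'])), True
    # TSS (DEV) mode: the DAG uses add_insecure_port
    return "[::]:{}".format(int(args['tss_gRPC_Port'])), False


# Port values taken from the default_args shown in the DAG listing
grpc_args = {'tss_gRPC_Port': '9001', 'gRPC_Port': '9002'}
print(select_grpc_binding(grpc_args, "0"))
print(select_grpc_binding(grpc_args, "1"))
```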

8.5.5.13. STEP 3c.i: gRPC API CLIENT

tml_client_gRPC_step_3_kafka_producetotopic.py

 import grpc
 import tml_grpc_pb2_grpc as pb2_grpc
 import tml_grpc_pb2 as pb2
 import sys
 from datetime import datetime
 import time
 import os
 import subprocess
 import base64
 import json
 # Set kubernetes = 1 if TML solution running in kubernetes
 # Set kubernetes = 0 if TML solution running in docker
 import warnings
 #warnings.filterwarnings("error")

 #host='tml.tss:443' #- use this if using Kubernetes
 host='127.0.0.1:9002'  #- use this if running TML in a standalone Docker container

 sys.dont_write_bytecode = True

 # NOTE YOU WILL NEED TO INSTALL grpcurl in Linux

 def sendgrpcurl(mjson):
     #first encode the json
     mainjson = '{"message":' + json.dumps(mjson) + '}'

    # mainjson=pb2.Message(message=mjson)
     sent=0
     while sent==0:
             cmd="grpcurl -insecure -keepalive-time 10 -import-path . -proto tml_grpc.proto -d '{}' {} tmlproto.Tmlproto/GetServerResponse 2>/dev/null".format(mainjson,host)
            # print("CMD=",cmd.replace("\n",""))
             cmd=cmd.replace("\n","")
             print(cmd)
             proc = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE)
             out, err = proc.communicate()
             proc.terminate()
             proc.wait()

             if out.decode('utf-8')=="":
                sent=0
             else:
                print(out.decode('utf-8'))
                sent=1
                break


 def readdata(inputfile):

       ##############################################################
       # NOTE: You can send any "EXTERNAL" data through this API
       # It is reading a localfile as an example
       ############################################################

       try:
         file1 = open(inputfile, 'r')
         print("Data Producing to Kafka Started:",datetime.now())
       except Exception as e:
         print("ERROR: Something went wrong ",e)
         return
       k = 0
       while True:
         line = file1.readline()
         line = line.replace(";", " ")
     #    print("line2=",line)
         # add lat/long/identifier
         k = k + 1
         try:
           if line == "":
             #break
             file1.seek(0)
             k=0
             print("Reached End of File - Restarting")
             print("Read End:",datetime.now())
             continue
           sendgrpcurl(line.rstrip())
           time.sleep(.0)
         except Exception as e:
           print("Main loop error=",e)
           time.sleep(.5)
           pass

 if __name__ == '__main__':
     try:

       inputfile = "IoTData.txt"
       result = None  # stays None until the readdata call below is uncommented
       #result = readdata(inputfile) ##### UNCOMMENT TO READ FILE
       print(f'{result}')
     except Exception as e:
       print("ERROR: ",e)

8.5.5.14. STEP 3c.i: gRPC API CLIENT: Explanation

The gRPC API client runs outside the TML solution container. The client API lets you connect to your internal systems or devices and stream their data directly to the TML server producer, which receives the data from the gRPC API client and produces it to Kafka.

Important

The gRPC API client runs outside the TML solution container. This is a simple and convenient way to stream any type of JSON data from any device in your environment.

Client Core Variables	Explanation
gRPC imports	You will need the gRPC files: tml_grpc_pb2_grpc (imported as pb2_grpc), tml_grpc_pb2 (imported as pb2), and tml_grpc.proto. Download these files and place them in the same folder as your gRPC client.
grpcurl	The client makes grpcurl calls to the TML server through the NGINX secure proxy on port 443. You must have the grpcurl tool installed: see Using gRPcurl to Write Data to the TML gRPC Server
connection parameters	You need to set: self.host = 'tml.tss' and self.server_port = 443. This is the gRPC_Port in STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag
sendgrpcurl	Pass your JSON message to this function (the demo passes each line read from the input file). You can send any JSON message to the gRPC TML server using this gRPC client.
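
The grpcurl call made by sendgrpcurl can also be assembled as an argument list instead of a shell string, which avoids quoting problems when a message contains a single quote. The sketch below is an illustration under that assumption, not the client's actual implementation: it reuses only the grpcurl flags and the service path that appear in the listing, while the helper name build_grpcurl_args and the sample message are hypothetical.

```python
import json


def build_grpcurl_args(message, host, proto="tml_grpc.proto"):
    """Build the grpcurl argument list for one JSON message (no shell involved)."""
    # Same envelope as the listing: {"message": "<your JSON as a string>"}
    envelope = json.dumps({"message": message})
    return [
        "grpcurl", "-insecure",
        "-keepalive-time", "10",
        "-import-path", ".",
        "-proto", proto,
        "-d", envelope,
        host,
        "tmlproto.Tmlproto/GetServerResponse",
    ]


# Hypothetical message and the standalone Docker host from the client listing
args = build_grpcurl_args('{"device": "sensor-1", "value": 21.5}', "127.0.0.1:9002")
print(args)
# To actually send it (requires grpcurl and a running TML gRPC server):
#   import subprocess
#   subprocess.run(args, capture_output=True, text=True)
```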

8.5.5.15. gRPC Reference Architecture

8.5.5.16. STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import json
 import time
 import random
 import threading
 from contextlib import contextmanager
 from contextlib import ExitStack
 import re

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice', # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< *** leave blank
   'producerid' : 'iotsolution',   # <<< *** Change as needed
   'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
   'identifier' : 'TML solution',   # <<< *** Change as needed
   'inputfile' : '', # e.g. '/rawdatademo/cisco_network_data.txt'  <<< ***** replace with the input file name to read. NOTE: this data file should contain one JSON message per line and be stored in the HOST folder mapped to the /rawdata folder
   'delay' : '7000', # << ******* 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'topicid' : '-999', # <<< ********* do not modify
   'sleep' : 0.15, # << Controls how fast data streams - if 0, the data will stream as fast as possible - BUT this may cause "connection reset by peer" errors
   'docfolder' : 'mylogs,mylogs2', # You can read TEXT files or any file in these folders that are inside the volume mapped to /rawdata
   'doctopic' : 'rtms-stream-mylogs,rtms-stream-mylogs2',  # This is the topic that will contain the docfolder file data
   'chunks' :3000, # if 0, the files in docfolder are read line by line; otherwise they are read in chunks of this many bytes, e.g. 512
   'docingestinterval' : 0, # specify the frequency in seconds to read files in docfolder - if 0 the files are read ONCE
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 # This sets the lat/longs for the IoT devices so they can be mapped
 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""

 def read_in_chunks(file_object, chunk_size=1024):
     """Lazy function (generator) to read a file piece by piece.
     Default chunk size: 1k."""
     while True:
         try:
           if chunk_size != 0:
             data = file_object.read(chunk_size).decode('utf-8')
             if len(data)>0 and data[-1] != ' ':
                  ct=0
                  for c in reversed(data):
                    if c == ' ':
                         break
                    ct = ct +1
                  if ct < len(data):
                    file_object.seek(file_object.tell()-ct)
                    data = data[:len(data)-ct]
           else:
             data = file_object.readline().decode('utf-8')
           data=data.replace('"','').replace("'","").replace("\\n"," ").replace('\n'," ").replace("\\r"," ").replace('\r'," ").replace(';'," ").replace('&'," ").strip()
           if not data:
                break
           yield data
         except Exception as e:
            break

 def readallfiles(fd,tr,cs=1024):
   args=default_args
   producerid='userfilestream'
   print("fd=",fd.name)
   for piece in read_in_chunks(fd,cs):
         piece=re.sub(' +', ' ', piece)
         pj='{"RTMSMessage":"' + piece + '"}'

         producetokafka(pj, "", "",producerid,tr,"",args)
   return []

 def ingestfiles():
     args = default_args
     buf = default_args['docfolder']
     chunks = int(default_args['chunks'])
     maintopic = default_args['doctopic']
     producerid='userfilestream'
     interval=int(default_args['docingestinterval'])

     #gather files in the folders
     dirbuf = buf.split(",")
     # check if user wants to split folders to separate topics
     maintopicbuf = maintopic.split(",")
     if len(maintopicbuf) > 1:
       if len(dirbuf) != len(maintopicbuf):
         tsslogging.locallogs("ERROR", "STEP 3: Produce LOCALFILE in {}: if you specify multiple doctopics, their number must match the number of docfolders".format(os.path.basename(__file__)))
         return
     elif len(maintopicbuf) == 1 and len(dirbuf) > 1:
        for i in range(len(dirbuf)-1):
          maintopicbuf.append(maintopic)
     else:
        return

     while True:
        for dr,tr in zip(dirbuf,maintopicbuf):
          filenames = []
          if os.path.isdir("/rawdata/{}".format(dr)):
            a = [os.path.join("/rawdata/{}".format(dr), f) for f in os.listdir("/rawdata/{}".format(dr)) if
            os.path.isfile(os.path.join("/rawdata/{}".format(dr), f))]
            filenames.extend(a)
            print("filename=",filenames)
            if len(filenames) > 0:
              with ExitStack() as stack:
                files = [stack.enter_context(open(i, "rb")) for i in filenames]
                contents = [readallfiles(file,tr,chunks) for file in files]
        if interval==0:
          break
        else:
         time.sleep(interval)

 def startdirread():
   if 'docfolder' not in default_args and 'doctopic' not in default_args and 'chunks' not in default_args and 'docingestinterval' not in default_args:
      return

   if default_args['docfolder'] != '' and default_args['doctopic'] != '':
     print("INFO startdirread")
     try:
       t = threading.Thread(name='child procs', target=ingestfiles)
       t.start()
     except Exception as e:
       print(e)

 def producetokafka(value, tmlid, identifier,producerid,maintopic,substream,args):
  inputbuf=value
  topicid=int(args['topicid'])

  # Add a 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
  delay = int(args['delay'])
  enabletls = int(args['enabletls'])
  identifier = args['identifier']

  try:
     result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,substream,
                                         topicid,identifier)
 #    print("result=",result)
  except Exception as e:
     print("ERROR:",e)

 def readdata():

   repo = tsslogging.getrepo()
   tsslogging.tsslogit("Localfile producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
   tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

   args = default_args
   inputfile=args['inputfile']

   # Main Kafka topic to store the real-time data
   maintopic = args['topics']
   producerid = args['producerid']

   startdirread()

   if maintopic=='' or inputfile=='':
      return
   k=0
   try:
     file1 = open(inputfile, 'r')
     print("Data Producing to Kafka Started:",datetime.now())
   except Exception as e:
     tsslogging.locallogs("ERROR", "Localfile producing DAG in {} - {}".format(os.path.basename(__file__),e))

     tsslogging.tsslogit("Localfile producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
     return

   tsslogging.locallogs("INFO", "STEP 3: reading local file..successfully")

   while True:
     line = file1.readline()
     line = line.replace(";", " ")
     print("line=",line)
     # add lat/long/identifier
     k = k + 1
     try:
       if line == "":
         #break
         file1.seek(0)
         k=0
         print("Reached End of File - Restarting")
         print("Read End:",datetime.now())
         continue
       producetokafka(line.strip(), "", "",producerid,maintopic,"",args)
       # change time to speed up or slow down data
       time.sleep(args['sleep'])
     except Exception as e:
       print(e)
       pass

   file1.close()

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startproducing(**context):

   tsslogging.locallogs("INFO", "STEP 3: producing data started")

   sd = context['dag'].dag_id

   sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
   pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))
   VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
   VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
   VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
   HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

   VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
   ti = context['task_instance']
   ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='LOCALFILE')
   ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])
   ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="")
   ti.xcom_push(key="{}_IDENTIFIER".format(sname),value="{},{}".format(default_args['identifier'],default_args['inputfile']))

   ti.xcom_push(key="{}_FROMHOST".format(sname),value=VIPERHOSTFROM)
   ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

   ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="")
   ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="")

   ti.xcom_push(key="{}_PORT".format(sname),value="_{}".format(VIPERPORT))
   ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)

   if 'step3localfileinputfile' in os.environ:
        default_args['inputfile']=os.environ['step3localfileinputfile']
   # Read inputfile after the environment override so the override reaches the tmux command below
   inputfile=default_args['inputfile']
   ti.xcom_push(key="{}_inputfile".format(sname),value=inputfile)

   docfolder=''
   if 'docfolder' in default_args and 'doctopic' in default_args:
     docfolder=default_args['docfolder']
     ti.xcom_push(key="{}_docfolder".format(sname),value=default_args['docfolder'])
     ti.xcom_push(key="{}_doctopic".format(sname),value=default_args['doctopic'])
     ti.xcom_push(key="{}_chunks".format(sname),value="_{}".format(default_args['chunks']))
     ti.xcom_push(key="{}_docingestinterval".format(sname),value="_{}".format(default_args['docingestinterval']))
   else:
     ti.xcom_push(key="{}_docfolder".format(sname),value='')
     ti.xcom_push(key="{}_doctopic".format(sname),value='')
     ti.xcom_push(key="{}_chunks".format(sname),value='')
     ti.xcom_push(key="{}_docingestinterval".format(sname),value='')

   if 'step3localfiledocfolder' in os.environ:
        default_args['docfolder']=os.environ['step3localfiledocfolder']
        ti.xcom_push(key="{}_docfolder".format(sname),value=default_args['docfolder'])

   chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

   repo=tsslogging.getrepo()

   if sname != '_mysolution_':
      fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
   else:
      fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

   wn = windowname('produce',sname,sd)
   subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
   subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
   subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],inputfile,docfolder), "ENTER"])

 if __name__ == '__main__':

     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          VIPERTOKEN = sys.argv[2]
          VIPERHOST = sys.argv[3]
          VIPERPORT = sys.argv[4]
          inputfile = sys.argv[5]
          default_args['inputfile']=inputfile
          docfolder = sys.argv[6]
          default_args['docfolder']=docfolder
          readdata()

8.5.5.17. Core Parameter Explanation

Note

The parameters docfolder and doctopic are needed for https://tml.readthedocs.io/en/latest/tmlbuilds.html#step-4c-preprocesing-3-data-tml-system-step-4c-kafka-preprocess-dag. For details on correlating past information in real time using sliding time windows, refer to: How TML Maintains Past Memory of Events Using Sliding Time Windows

Parameter	Explanation
inputfile	This is the container path to your local file. When you start TSS, you must map a volume to the /rawdata folder so that TSS can read your local file. This is explained below in section: Producing Data Using a Local File
docfolder	The folder name(s) you want TML to read. For example, if docfolder=mylogs, TML assumes the container path /rawdata/mylogs, which is mapped to your local machine. All text in this folder will be read.
doctopic	This is the Kafka topic that will contain the data from the files in docfolder. NOTE: You can send different folders to different topics. For example, if doctopic=topic1,topic2 and docfolder=folder1,folder2, TML will stream files in folder1 -> topic1 and files in folder2 -> topic2. This is convenient if you have lots of logs and want to analyse them separately.
chunks	This specifies how to read the files: line by line or in chunks. If chunks=0, the files are read and streamed to Kafka line by line; if chunks=512, the files are read and streamed to Kafka in chunks of 512.
docingestinterval	This specifies how frequently the files in docfolder are read. If docingestinterval=0, they are read ONCE; if non-zero, e.g. docingestinterval=120, they are read every 120 seconds.
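
As a rough illustration of how docfolder and doctopic pair up, the sketch below splits both comma-separated values and matches them by position, prefixing each folder with /rawdata as described in the table. The helper name pair_folders_with_topics is hypothetical and is not part of TML. How TML itself handles lists of different lengths is not stated here, so the sketch simply raises an error in that case.

```python
def pair_folders_with_topics(docfolder, doctopic, root="/rawdata"):
    """Pair each folder in docfolder with the topic at the same position in doctopic."""
    folders = [f.strip() for f in docfolder.split(",") if f.strip()]
    topics = [t.strip() for t in doctopic.split(",") if t.strip()]
    if len(folders) != len(topics):
        # Assumption: reject mismatched lists; TML's actual behaviour may differ.
        raise ValueError("docfolder and doctopic must list the same number of entries")
    # Container path of each folder -> Kafka topic that receives its files
    return {"{}/{}".format(root, f): t for f, t in zip(folders, topics)}


mapping = pair_folders_with_topics("folder1,folder2", "topic1,topic2")
for path, topic in mapping.items():
    print("{} -> {}".format(path, topic))
```

With the values from the table, files under /rawdata/folder1 would go to topic1 and files under /rawdata/folder2 to topic2.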

8.5.5.18. Producing Data Using a Local File

Important

If you are producing data by reading from a local file, you must map a volume on your host system to the rawdata folder in the container when you run the TSS Docker Run command, then change inputfile to /rawdata/<your filename>. For example, add -v <path to a local folder on your machine>:/rawdata to the docker run command:

-v /your_localmachine/foldername:/rawdata:z

For example, your TSS Docker Run command should look similar to this (replace /your_localmachine/foldername with the actual folder name):

docker run -d --net="host" \
--env CHIP="AMD64" \
--env MAINHOST=127.0.0.1 \
--env TSS=1 \
--env SOLUTIONNAME=TSS \
--env AIRFLOWPORT=9000 \
--env VIPERVIZPORT=9005 \
--env EXTERNALPORT=-1 \
-v /var/run/docker.sock:/var/run/docker.sock:z \
-v /<your local dagsbackup folder>:/dagslocalbackup:z \
-v /your_localmachine/foldername:/rawdata:z \
--env READTHEDOCS='<Token>' \
--env GITREPOURL='<your git hub repo>' \
--env GITUSERNAME='<your github username>' \
--env GITPASSWORD='<Personal Access Token>' \
--env DOCKERUSERNAME='<your docker hub account>' \
--env DOCKERPASSWORD='<password>' \
--env MQTTUSERNAME='<enter MQTT username>' \
--env MQTTPASSWORD='<enter MQTT password>' \
--env KAFKACLOUDUSERNAME='' \
--env KAFKACLOUDPASSWORD='<Enter your API secret>' \
--env UPDATE=1 \
maadsdocker/tml-solution-studio-with-airflow-amd64

Then,

Add the name of the file you want to read by updating ‘inputfile’ : ‘/rawdata/?’ in STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag
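
To make the relationship between the host folder and the inputfile value concrete, here is a minimal sketch that translates a host file path into the path the container sees under /rawdata. The function container_inputfile and the file name IoTData.txt are hypothetical and used only for illustration. The sketch assumes a POSIX-style host path, so Windows paths would need different handling.

```python
from pathlib import PurePosixPath


def container_inputfile(host_file, host_folder, container_folder="/rawdata"):
    """Translate a file path on the host into the path seen inside the container."""
    # relative_to raises ValueError if host_file is not inside the mapped folder
    relative = PurePosixPath(host_file).relative_to(PurePosixPath(host_folder))
    return str(PurePosixPath(container_folder) / relative)


# Volume mapping used in the docker run command: /your_localmachine/foldername:/rawdata
inputfile = container_inputfile(
    "/your_localmachine/foldername/IoTData.txt",  # hypothetical file name
    "/your_localmachine/foldername",
)
print(inputfile)  # /rawdata/IoTData.txt
```

A file that lives outside the mapped folder is not visible to the container, which is why the sketch raises an error rather than returning a path.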

8.5.5.19. Local File Reference Architecture

8.5.6. STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag

Note

All preprocessed data is also written to the “/rawdata/preprocess” folder in the container.

If you mapped the rawdata folder, you can access these files from your host machine.

8.5.6.1. Preprocessing Types

TML preprocesses real-time data for every entity along each sliding time window. This is a quick and powerful way to accelerate insights from real-time data with very little effort. TML provides over 35 different preprocessing types:

Tip

Watch the YouTube video on how to configure the parameters in this DAG: YouTube Video

Preprocessing Type	Description
anomprob	This will determine the probability that there is an anomaly for each entity in the sliding time windows
anomprobx-y	Here x and y are numbers or “n”; “n” means examine all anomalies for recurring patterns. This will find the anomalies in the data while ignoring set patterns, so you can check whether anomalies in the streams are truly anomalies and not part of a pattern. For example, if an IoT device shuts off and turns on again routinely, this may be picked up as an anomaly when in fact it is normal behaviour. To ignore these cases, ANOMPROB2-5 tells Viper to check anomalies for patterns of 2-5 peaks. If the stream has two classes, such as 0 and 1000, and they show a pattern, then they should not be considered an anomaly: class=0 is the device shutting down and class=1000 is the device turning back on. With ANOMPROB3-10, Viper will check for patterns of classes 3 to 10 to see if they recur routinely. This is very helpful for reducing false positives and false negatives.
autocorr	This will determine the autocorrelation in the data for each entity in the sliding time windows
avg	This will determine the average value for each entity in the sliding time windows
std	This will determine the standard deviation value for each entity in the sliding time windows
datacleanstd#_#	This is a powerful function for data cleaning. It uses a Standard Deviation Filter (often referred to as Z-Score filtering). In data science and AI, this is a standard technique for automatically removing “outliers” or “noise” from a dataset so that your model looks at reliable trends rather than anomalies. It also allows users to eliminate extreme values before the analysis begins. The code defines an “envelope”, or safe zone, as: upperLimit: Mean + (Tolerance * StdDev), lowerLimit: Mean - (Tolerance * StdDev), where Tolerance = #, Mean = mean of all data in the sliding time window, and StdDev = standard deviation of all data in the sliding time window. For example, if you specify datacleanstd3, TML defines the envelope as: upperLimit: Mean + (3 * StdDev), lowerLimit: Mean - (3 * StdDev). Any data point inside this envelope (inclusive) is considered “safe”; any point outside this envelope is considered an outlier or noise and is removed from the analysis. You can specify any reasonable number: datacleanstd5 gives upperLimit: Mean + (5 * StdDev), lowerLimit: Mean - (5 * StdDev); datacleanstd10 gives upperLimit: Mean + (10 * StdDev), lowerLimit: Mean - (10 * StdDev); and so on. To delete extreme values first, you can specify, for example, datacleanstd5_10000: this deletes any value less than -10000 or greater than 10000, and then performs the Z-score filtering. This function ensures you have clean data in your analysis and machine learning/AI.
datacleanmad_#	This is another powerful function for data cleaning. It uses Mean Absolute Deviation (MAD) to clean the data. You can choose to delete extreme values first, e.g. datacleanmad_10000
datacleaniqr_#	This is another powerful function for data cleaning. It uses the Interquartile Range (IQR) to clean the data. You can choose to delete extreme values first, e.g. datacleaniqr_10000
avgtimediff	This will determine the average time in seconds between the first and last timestamp for each entity in sliding windows; time should be in this layout:2006-01-02T15:04:05.
consistency	This will check if the data all have consistent data types. Returns 1 for consistent data types, 0 otherwise for each entity in sliding windows
count	This will count the number of numeric data points in the sliding time windows for each entity
countstr	This will count the number of string values in the sliding time windows for each entity
cv	This will determine the coefficient of variation for each entity in sliding windows
dataage_[UTC offset]_[timetype]	dataage checks how much time has passed since the data in the data stream was last updated, relative to the current local time. You can specify the UTC offset to adjust the current time to match the timezone of the data stream. You can specify timetype as millisecond, second, minute, hour, or day. For example, with dataage_1_minute, this processtype compares the last timestamp in the data stream to the current time at UTC offset +1 and returns the difference in minutes. This is a very powerful processtype for data quality and data assurance programs for any number of data streams.
diff	This will find the difference between the highest and lowest points in the sliding time windows for each entity
diffmargin	This will find the percentage difference between the highest and lowest points in the sliding time windows for each entity
entropy	This will determine the entropy in the data for each entity in the sliding time windows; will compute the amount of information in the data stream.
geodiff	This will determine the distance in kilometres between two latitude and longitude points for each entity in sliding windows
gm (geometric mean)	This will determine the geometric mean for each entity in sliding windows
hm (harmonic mean)	This will determine the harmonic mean for each entity in sliding windows
iqr	This will compute the interquartile range between Q1 and Q3 for each entity in sliding windows
kurtosis	This will determine the kurtosis for each entity in sliding windows
mad	This will determine the mean absolute deviation for each entity in sliding windows
max	This will determine the maximum value for each entity in the sliding time windows
median	This will find the median of the numeric points in the sliding time windows for each entity
meanci95	returns a 95% confidence interval: mean, low, high for each entity in sliding windows.
meanci99	returns a 99% confidence interval: mean, low, high for each entity in sliding windows.
midhinge	This will determine the average of the first and third quartiles for each entity in sliding windows
min	This will determine the minimum value for each entity in the sliding time windows
outliers	This will find the outliers of the numeric points in the sliding time windows for each entity
outliersx-y	Here x and y are numbers or “n”; “n” means examine all outliers for recurring patterns. This will find the outliers in the data while ignoring set patterns, so you can check whether outliers in the streams are truly outliers and not part of a pattern. For example, if an IoT device shuts off and turns on again routinely, this may be picked up as an outlier when in fact it is normal behaviour. To ignore these cases, OUTLIERS2-5 tells Viper to check outliers for patterns of 2-5 peaks. If the stream has two classes, such as 0 and 1000, and they show a pattern, then they should not be considered an outlier: class=0 is the device shutting down and class=1000 is the device turning back on. With OUTLIERS3-10, Viper will check for patterns of classes 3 to 10 to see if they recur routinely. This is very helpful for reducing false positives and false negatives.
raw	Will not process data stream for each entity in sliding windows.
skewness	This will determine the skewness for each entity in sliding windows
spikedetect	This will determine if there are any spikes in the data using the zscore, using lag = 5, threshold = 3.5 (standard deviation), influence = 0.5, for each entity in sliding windows
sum	This will find the sum of the numeric points in the sliding time windows for each entity
timediff	This will determine, in seconds, the time difference between the first and last timestamp for each entity in sliding windows; time should be in this layout:2006-01-02T15:04:05.
trend	This will determine the trend value for each entity in the sliding time windows. If the trend value is less than zero then data in the sliding time window is decreasing, if trend value is greater than zero then it is increasing.
trimean	This will determine the average of the median and the midhinge for each entity in sliding windows
unique	This will determine if there are unique numeric values in the data for each entity in sliding windows. Returns 1 if no data duplication (unique), 0 otherwise.
uniquestr	This will determine if there are unique string values in the data for each entity in sliding windows. Checks string data for duplication. Returns 1 if no data duplication (unique), 0 otherwise.
variance	This will find the variance of the numeric points in the sliding time windows for each entity
varied	This will determine if there is variation in the data in the sliding time windows for each entity.
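
To give a feel for what "for each entity in the sliding time windows" means, the following minimal sketch keeps a separate window per entity and recomputes a few of the simpler statistics (avg, min, max, diff) whenever a new value arrives. It is only an illustration in plain Python and not how Viper is implemented. The count-based window, the WINDOW_SIZE value, and the device names are all assumptions.

```python
from collections import defaultdict, deque
from statistics import mean

WINDOW_SIZE = 3  # hypothetical window length, counted in messages

# One independent sliding window per entity (for example, per device)
windows = defaultdict(lambda: deque(maxlen=WINDOW_SIZE))


def update(entity, value):
    """Add a value to the entity's window and return statistics for that window."""
    window = windows[entity]
    window.append(value)  # the oldest value drops out once the window is full
    return {
        "avg": mean(window),
        "min": min(window),
        "max": max(window),
        "diff": max(window) - min(window),  # highest minus lowest point
    }


stream = [("device-1", 10.0), ("device-2", 5.0), ("device-1", 14.0),
          ("device-1", 12.0), ("device-1", 20.0)]
for entity, value in stream:
    latest = update(entity, value)
    print(entity, latest)
```

Because each entity has its own window, values from device-2 never affect the statistics computed for device-1.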

8.5.7. Data Cleaning

Ensuring high data quality is critical for machine learning.

Users can autoclean the data using three methods:

Data Cleaning Preprocessing Type	Description
datacleanstd#_#	This is a powerful function for data cleaning. It uses a Standard Deviation Filter (often referred to as Z-Score filtering). In data science and AI, this is a standard technique for automatically removing “outliers” or “noise” from a dataset so that your model looks at reliable trends rather than anomalies. It also allows users to eliminate extreme values before the analysis begins. The code defines an “envelope”, or safe zone, as: upperLimit: Mean + (Tolerance * StdDev), lowerLimit: Mean - (Tolerance * StdDev), where Tolerance = #, Mean = mean of all data in the sliding time window, and StdDev = standard deviation of all data in the sliding time window. For example, if you specify datacleanstd3, TML defines the envelope as: upperLimit: Mean + (3 * StdDev), lowerLimit: Mean - (3 * StdDev). Any data point inside this envelope (inclusive) is considered “safe”; any point outside this envelope is considered an outlier or noise and is removed from the analysis. You can specify any reasonable number: datacleanstd5 gives upperLimit: Mean + (5 * StdDev), lowerLimit: Mean - (5 * StdDev); datacleanstd10 gives upperLimit: Mean + (10 * StdDev), lowerLimit: Mean - (10 * StdDev); and so on. To delete extreme values first, you can specify, for example, datacleanstd5_10000: this deletes any value less than -10000 or greater than 10000, and then performs the Z-score filtering. This function ensures you have clean data in your analysis and machine learning/AI.
datacleanmad_#	This is another powerful function for data cleaning. It uses Mean Absolute Deviation (MAD) to clean the data. You can choose to delete extreme values first, e.g. datacleanmad_10000
datacleaniqr_#	This is another powerful function for data cleaning. It uses the Interquartile Range (IQR) to clean the data. You can choose to delete extreme values first, e.g. datacleaniqr_10000

Note

Deleting extreme values can be important because sensor data may contain very extreme values that seem normal if the above algorithms have nothing to compare them against. These extreme values may be due to a sensor malfunction. In this case, deleting extreme values like 999999999 is sensible.
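
The envelope logic described for datacleanstd#_# can be sketched in a few lines of Python. This is an illustration of the technique rather than TML's actual code. The function name dataclean_std is hypothetical, and using the population standard deviation (statistics.pstdev) is an assumption, since the text does not say which variant TML uses.

```python
from statistics import mean, pstdev


def dataclean_std(values, tolerance, extreme=None):
    """Z-score style filter: optionally drop extreme values, then keep points inside the envelope."""
    if extreme is not None:
        # Step 1 (optional): delete values below -extreme or above +extreme
        values = [v for v in values if -extreme <= v <= extreme]
    if len(values) < 2:
        return list(values)
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    lower = mu - tolerance * sigma
    upper = mu + tolerance * sigma
    # Step 2: keep points inside the envelope (inclusive)
    return [v for v in values if lower <= v <= upper]


window = [10, 11, 9, 10, 12, 999999999]  # last value mimics a sensor malfunction

print(dataclean_std(window, 3))         # like datacleanstd3
print(dataclean_std(window, 3, 10000))  # like datacleanstd3_10000
```

In this small window the value 999999999 inflates the mean and standard deviation so much that it still falls inside the 3-sigma envelope, so the first call keeps it. Deleting extreme values first, as in the second call, removes it before the envelope is computed.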

8.5.8. STEP 4: Preprocessing Data DAG: tml-system-step-4-kafka-preprocess-dag

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '',  # <<< *** leave blank
   'producerid' : 'iotsolution',   # <<< *** Change as needed
   'raw_data_topic' : 'iot-raw-data', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topics you created in SYSTEM STEP 2
   'preprocess_data_topic' : 'iot-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topics you created in SYSTEM STEP 2
   'maxrows' : '800', # <<< ********** Number of offsets to roll back the data stream - i.e. roll back the stream by 800 offsets
   'offset' : '-1', # <<< Rollback from the end of the data streams
   'brokerhost' : '',   # <<< *** Leave as is
   'brokerport' : '-999',  # <<< *** Leave as is
   'preprocessconditions' : '', ## <<< Leave blank
   'delay' : '70', # Maximum delay of 70 milliseconds for VIPER to wait for Kafka to confirm the message was received and written to the topic
   'array' : '0', # do not modify
   'saveasarray' : '1', # do not modify
   'topicid' : '-999', # do not modify
   'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
   'asynctimeout' : '120', # <<< 120 seconds for connection timeout
   'timedelay' : '0', # <<< connection delay
   'tmlfilepath' : '', # leave blank
   'usemysql' : '1', # do not modify
   'streamstojoin' : '', # leave blank
   'identifier' : 'IoT device performance and failures', # <<< ** Change as needed
   'preprocesstypes' : 'anomprob,trend,avg', # <<< **** MAIN PREPROCESS TYPES CHANGE AS NEEDED refer to https://tml-readthedocs.readthedocs.io/en/latest/
   'pathtotmlattrs' : 'oem=n/a,lat=n/a,long=n/a,location=n/a,identifier=n/a', # Change as needed
   'jsoncriteria' : 'uid=metadata.dsn,filter:allrecords~\
 subtopics=metadata.property_name~\
 values=datapoint.value~\
 identifiers=metadata.display_name~\
 datetime=datapoint.updated_at~\
 msgid=datapoint.id~\
 latlong=lat:long' # <<< **** Specify your json criteria. Here is an example of a multiline json --  refer to https://tml-readthedocs.readthedocs.io/en/latest/
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""

 def processtransactiondata():
  global VIPERTOKEN
  global VIPERHOST
  global VIPERPORT
  global HTTPADDR
  preprocesstopic = default_args['preprocess_data_topic']
  maintopic =  default_args['raw_data_topic']
  mainproducerid = default_args['producerid']

 #############################################################################################################
   #                                    PREPROCESS DATA STREAMS


   # Number of offsets to roll back each data stream - use a larger number if you want more data
   # For supervised machine learning you need a minimum of 30 data points in each stream
  maxrows=int(default_args['maxrows'])

   # offset=-1 rolls back from the end of each stream: if the last offset is 500 and maxrows=50,
   # the streams are rolled back to offset=500-50=450
  offset=int(default_args['offset'])
   # raw data topic to preprocess
  topic=maintopic
   # producerid of the topic
  producerid=mainproducerid
   # use the host in Viper.env file
  brokerhost=default_args['brokerhost']
   # use the port in Viper.env file
  brokerport=int(default_args['brokerport'])
   # if load balancing, enter the microserviceid to route the HTTP request to a specific machine
  microserviceid=default_args['microserviceid']


   # You can preprocess with the following functions: MAX, MIN, SUM, AVG, COUNT, DIFF,OUTLIERS
   # here we will take max values of the arcturus-humidity, we will Diff arcturus-temperature, and average arcturus-Light_Intensity
   # NOTE: The number of process logic functions MUST match the streams - the operations will be applied in the same order
 #
  preprocessconditions=default_args['preprocessconditions']

  # Maximum delay, in milliseconds, for VIPER to wait for Kafka to confirm the message was received and written to the topic
  delay=int(default_args['delay'])
  # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
  enabletls=int(default_args['enabletls'])
  array=int(default_args['array'])
  saveasarray=int(default_args['saveasarray'])
  topicid=int(default_args['topicid'])

  rawdataoutput=int(default_args['rawdataoutput'])
  asynctimeout=int(default_args['asynctimeout'])
  timedelay=int(default_args['timedelay'])

  jsoncriteria = default_args['jsoncriteria']

  tmlfilepath=default_args['tmlfilepath']
  usemysql=int(default_args['usemysql'])

  streamstojoin=default_args['streamstojoin']
  identifier = default_args['identifier']

  # if dataage - use:dataage_utcoffset_timetype
  preprocesstypes=default_args['preprocesstypes']
  pathtotmlattrs=default_args['pathtotmlattrs']

  try:
     result=maadstml.viperpreprocesscustomjson(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,jsoncriteria,rawdataoutput,maxrows,enabletls,delay,brokerhost,
                                       brokerport,microserviceid,topicid,streamstojoin,preprocesstypes,preprocessconditions,identifier,
                                       preprocesstopic,array,saveasarray,timedelay,asynctimeout,usemysql,tmlfilepath,pathtotmlattrs)
     #print(result)
     return result
  except Exception as e:
     print(e)
     return e

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def dopreprocessing(**context):
        tsslogging.locallogs("INFO", "STEP 4: Preprocessing started")
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        if 'step4raw_data_topic' in os.environ:
          default_args['raw_data_topic']=os.environ['step4raw_data_topic']
        if 'step4preprocesstypes' in os.environ:
            default_args['preprocesstypes']=os.environ['step4preprocesstypes']
        if 'step4jsoncriteria' in os.environ:
            default_args['jsoncriteria']=os.environ['step4jsoncriteria']
        if 'step4preprocess_data_topic'  in os.environ:
            default_args['preprocess_data_topic']=os.environ['step4preprocess_data_topic']

        ti = context['task_instance']
        ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_preprocessconditions".format(sname), value=default_args['preprocessconditions'])
        ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
        ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
        ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
        ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
        ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
        ti.xcom_push(key="{}_preprocesstypes".format(sname), value=default_args['preprocesstypes'])
        ti.xcom_push(key="{}_pathtotmlattrs".format(sname), value=default_args['pathtotmlattrs'])
        ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])
        ti.xcom_push(key="{}_jsoncriteria".format(sname), value=default_args['jsoncriteria'])

        maxrows=default_args['maxrows']
        if 'step4maxrows' in os.environ:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4maxrows']))
          maxrows=os.environ['step4maxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))


        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('preprocess',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" \"{}\" \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,default_args['raw_data_topic'],default_args['preprocesstypes'],default_args['jsoncriteria'],default_args['preprocess_data_topic']), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Preprocessing DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         maxrows =  sys.argv[5]
         default_args['maxrows'] = maxrows
         default_args['raw_data_topic'] =  sys.argv[6]
         default_args['preprocesstypes'] =  sys.argv[7]
         default_args['jsoncriteria'] =  sys.argv[8]
         default_args['preprocess_data_topic'] =  sys.argv[9]

         tsslogging.locallogs("INFO", "STEP 4: Preprocessing started")

         while True:
           try:
             processtransactiondata()
             time.sleep(1)
           except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4: Preprocessing DAG in {} {}".format(os.path.basename(__file__),e))
            tsslogging.tsslogit("Preprocessing DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            break

8.5.8.1. Preprocessed Variable Naming Standard

Important

When a raw variable is preprocessed, TML renames it using this standard:

[Variable Name]_preprocessed_[Process Type]

For example, say you want to perform an AnomProb on the variable Voltage. The new preprocessed variable name will be: Voltage_preprocessed_AnomProb

If you want to take the min of Voltage, then the new preprocessed variable name will be: Voltage_preprocessed_Min

This standard naming will be very important when you want to perform machine learning on the “preprocessed” variable.
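The naming standard above can be sketched in a few lines of Python. This is only an illustration: the helper name preprocessed_name is hypothetical and is not part of TML or maadstml. It also shows that the standard chains, because a preprocessed variable that is preprocessed again picks up a second suffix.

```python
def preprocessed_name(variable: str, process_type: str) -> str:
    """Build a name of the form [Variable Name]_preprocessed_[Process Type].

    Hypothetical helper for illustration only; TML generates these names itself.
    """
    return "{}_preprocessed_{}".format(variable, process_type)


# First pass: AnomProb on the raw variable Voltage
first_pass = preprocessed_name("Voltage", "AnomProb")
print(first_pass)   # Voltage_preprocessed_AnomProb

# Second pass: Avg on the already preprocessed variable
second_pass = preprocessed_name(first_pass, "Avg")
print(second_pass)  # Voltage_preprocessed_AnomProb_preprocessed_Avg
```

Knowing the exact name in advance is useful when you list preprocessed variables in a later step, for example in streamstojoin.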

8.5.8.2. Preprocessed Sample JSON Output

{
 "hyperprediction": "0.980",
 "Maintopic": "iot-preprocess2",
 "Topic": "topicid287_Current_preprocessed_AnomProb_preprocessed_Avg",
 "Type": "External",
 "ProducerId": "ProducerId-OAA--s0Ee-sqUX8QqLfdtivZSKRHoMShBe",
 "TimeStamp": "2024-08-15 19:49:24",
 "Unixtime": 1723751364617162000,
 "kafkakey": "OAA-tFTP8Ym6BHy-bnw2X5XdSUoUSOjns7",
 "Preprocesstype": "Avg",
 "WindowStartTime": "2024-08-15 19:49:08.36546688 +0000 UTC",
 "WindowEndTime": "2024-08-15 19:49:21.600164096 +0000 UTC",
 "WindowStartUnixTime": "1723751348365466880",
 "WindowEndUnixTime": "1723751361600164096",
 "Conditions": "",
 "Identifier": "Current~Current-(mA)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name
  (Current), value:datapoint.value, identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords, Joinedidentifiers:
  ~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=dd4dfbbc-7fb3-11ec-e36d-
  28c9ca7b5376(145,34.04893,-111.09373,Current,n/a,n/a,{}); dd781c12-7fb3-11ec-fa99-012971124b46(0,34.04893,-111.09373,Current,n/a,n/a,{});dd94c90c-7fb3-11ec-
  727b-6d558b1c7fe4(0,34.04893,-111.09373,Current,n/a,n/a,{}); ddb6f676-7fb3-11ec-5c48-b5377c00ff05(0,34.04893,-111.09373,Current,n/a,n/a,{});dde3be22-7fb3-
  11ec-4c2e-f10dea945ccd(0,34.04893,-111.09373,Current,n/a,n/a,{}); ddf6a5e6-7fb3-11ec-c25b-509766b7a301(0,34.04893,-111.09373,Current,n/a,n/a,{});de11b6d8-
  7fb3-11ec-77c8-a93cc4b538b6(0,34.04893,-111.09373,Current,n/a,n/a,{}); de2850f0-7fb3-11ec-5b6a-ac3b205641e0(0,34.04893,-111.09373,Current,n/a,n/a,
  {});de405510-7fb3-11ec-bba7-9b0ce93d49d2(0,34.04893,-111.09373,Current,n/a,n/a,{}); de4ee062-7fb3-11ec-3252-
  7c7e46faf86b(0,34.04893,-111.09373,Current,n/a,n/a,{})~latlong=~mainuid=AC000W020496398",
  "PreprocessIdentifier": "IoT Data preprocess",
  "Numberofmessages": 6,
  "Offset": 27041,
  "Consumerid": "StreamConsumer",
  "Generated": "2024-08-15T19:49:55.619+00:00",
  "Partition": 0
  }
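As a rough sketch of how the time fields in this sample can be used downstream, the Python below computes the length of the sliding time window and converts Unixtime to a readable UTC time. It assumes that Unixtime, WindowStartUnixTime and WindowEndUnixTime are nanoseconds since the Unix epoch, which is inferred from the sample values and not stated by TML. The function names are hypothetical.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000  # assumption: the Unix time fields are in nanoseconds


def window_seconds(message: dict) -> float:
    """Return the length of the sliding time window in seconds."""
    start_ns = int(message["WindowStartUnixTime"])
    end_ns = int(message["WindowEndUnixTime"])
    return (end_ns - start_ns) / NANOS_PER_SECOND


def unixtime_to_utc(unixtime_ns: int) -> datetime:
    """Convert a nanosecond Unix time to a UTC datetime (whole seconds only)."""
    return datetime.fromtimestamp(unixtime_ns // NANOS_PER_SECOND, tz=timezone.utc)


# Only the fields needed here, copied from the sample output above
sample = {
    "Unixtime": 1723751364617162000,
    "WindowStartUnixTime": "1723751348365466880",
    "WindowEndUnixTime": "1723751361600164096",
}

print(round(window_seconds(sample), 3))  # 13.235
print(unixtime_to_utc(sample["Unixtime"]).strftime("%Y-%m-%d %H:%M:%S"))  # 2024-08-15 19:49:24
```

The converted Unixtime agrees with the TimeStamp field of the sample, which supports the nanosecond assumption.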

8.5.8.3. Preprocessed Sample JSON Output: Explanations

Important

It will be important to carefully study these fields for the visualization or for other downstream analysis.

JSON Field	Description
hyperprediction	This contains the preprocessed value for the Preprocesstype: Avg. In this case, the value is 0.980
Maintopic	This is the topic being consumed: iot-preprocess2
Topic	This is the topic name for the preprocessed variable. For example, topicid287_Current_preprocessed_AnomProb_preprocessed_Avg, means entity id 287 was processed (this number 287 is an internal number associated with device serial number: AC000W020496398)
Type	This is an internal parameter
ProducerId	This is an internal parameter: ProducerId-OAA--s0Ee-sqUX8QqLfdtivZSKRHoMShBe
TimeStamp	This is the UTC timestamp of the calculation creation: 2024-08-15 19:49:24
Unixtime	This is the Unixtime of the calculation: 1723751364617162000
kafkakey	This is the TML Kafka key that identifies it came from TML: OAA-tFTP8Ym6BHy-bnw2X5XdSUoUSOjns7
Preprocesstype	This is the preprocessed type used: Avg
WindowStartTime	This is the start of the sliding time window: 2024-08-15 19:49:08.36546688 +0000 UTC
WindowEndTime	This is the end of the sliding time window: 2024-08-15 19:49:21.600164096 +0000 UTC
WindowStartUnixTime	This is the start of the sliding time window in Unix time: 1723751348365466880
WindowEndUnixTime	This is the end of the sliding time window in Unix time: 1723751361600164096
Conditions	This contains any preprocessed conditions
Identifier	This stores all the data used in the Avg calculation of the Current variable, delimited by “~”. Parsing the “Msgsjoined” field gives the RAW data, for example: dd4dfbbc-7fb3-11ec-e36d-28c9ca7b5376(145,34.04893,-111.09373,Current,n/a,n/a,{}). The first alphanumeric, dd4dfbbc-7fb3-11ec-e36d-28c9ca7b5376, is the msgid; the number 145 is the Current value used in the calculation; then come the latitude (34.04893) and longitude (-111.09373), the variable being processed (Current), and any additional information. Another important field is mainuid=AC000W020496398: mainuid is the entity identifier in the UID field of the JSON criteria (JSON PROCESSING). In summary, TML processed (took the average of) 6 messages from this one device (with DSN=AC000W020496398) for the Current stream, in the sliding time window starting at 2024-08-15 19:49:08 and ending at 2024-08-15 19:49:21. The full Identifier value is shown in the sample JSON output above.
PreprocessIdentifier	This is the preprocess identifier: IoT Data preprocess
Numberofmessages	This is the number of messages used in the Avg calculation: 6
Offset	This is the Kafka Offset where this message is stored: 27041
Consumerid	This is the id of the consumer: StreamConsumer
Generated	This is the time stamp when this message was consumed: 2024-08-15T19:49:55.619+00:00
Partition	This is the Kafka partition this message was stored in: 0
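The Identifier row describes a field that has to be parsed before the raw data can be used. A minimal sketch of that parsing is shown below. It assumes the layout seen in the sample (sections separated by “~”, Msgsjoined entries separated by “;”, each entry written as msgid(value,lat,long,variable,...)) and that the entries contain no nested commas or parentheses. The function name parse_identifier and the shortened sample string are illustrative only.

```python
import re

# One Msgsjoined entry looks like: msgid(value,lat,long,variable,n/a,n/a,{})
ENTRY = re.compile(r"\s*([0-9a-f\-]+)\(([^)]*)\)\s*")


def parse_identifier(identifier: str) -> dict:
    """Extract mainuid and the raw Msgsjoined entries from an Identifier string."""
    result = {"mainuid": None, "messages": []}
    for part in identifier.split("~"):
        part = part.strip()
        if part.startswith("mainuid="):
            result["mainuid"] = part[len("mainuid="):]
        elif part.startswith("Msgsjoined="):
            for entry in part[len("Msgsjoined="):].split(";"):
                match = ENTRY.fullmatch(entry)
                if match is None:
                    continue  # skip anything that does not follow the assumed layout
                fields = match.group(2).split(",")
                result["messages"].append({
                    "msgid": match.group(1),
                    "value": float(fields[0]),
                    "lat": float(fields[1]),
                    "long": float(fields[2]),
                    "variable": fields[3],
                })
    return result


# Shortened version of the sample Identifier (two of the joined messages)
identifier = (
    "Current~Current-(mA)~iot-preprocess~uid:metadata.dsn~"
    "IoT device performance and failures~"
    "Msgsjoined=dd4dfbbc-7fb3-11ec-e36d-28c9ca7b5376(145,34.04893,-111.09373,Current,n/a,n/a,{}); "
    "dd781c12-7fb3-11ec-fa99-012971124b46(0,34.04893,-111.09373,Current,n/a,n/a,{})"
    "~latlong=~mainuid=AC000W020496398"
)

parsed = parse_identifier(identifier)
print(parsed["mainuid"])
for message in parsed["messages"]:
    print(message["msgid"], message["value"], message["lat"], message["long"])
```

Parsed this way, each raw value can be tied back to its message id and to the entity (mainuid) it came from, which is what a visualization or downstream analysis usually needs.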

8.5.9. STEP 4a: Preprocessing Data: tml-system-step-4a-kafka-preprocess-dag

Note

Step 4a is similar to Step 4b; the only difference is that it allows for jsoncriteria.

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

from datetime import datetime
from airflow.decorators import dag, task
import sys
import maadstml
import tsslogging
import os
import subprocess
import time
import random

sys.dont_write_bytecode = True
######################################## USER CHOSEN PARAMETERS ########################################
default_args = {
  'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
  'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
  'microserviceid' : '',  # <<< *** leave blank
  'producerid' : 'iotsolution',   # <<< *** Change as needed
  'raw_data_topic' : 'rtms-pgpt-ai', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
  'preprocess_data_topic' : 'rtms-pgpt-ai-mitre', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
  'maxrows' : '50', # <<< ********** Number of offsets to roll back the data stream - e.g. '50' rolls the stream back by 50 offsets
  'offset' : '-1', # <<< Rollback from the end of the data streams
  'brokerhost' : '',   # <<< *** Leave as is
  'brokerport' : '-999',  # <<< *** Leave as is
  'preprocessconditions' : '', ## <<< Leave blank
  'delay' : '70', # Add a 70 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
  'array' : '0', # do not modify
  'saveasarray' : '1', # do not modify
  'topicid' : '-999', # do not modify
  'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
  'asynctimeout' : '120', # <<< 120 seconds for connection timeout
  'timedelay' : '0', # <<< connection delay
  'tmlfilepath' : '', # leave blank
  'usemysql' : '1', # do not modify
  'streamstojoin' : '', # Change as needed - THESE VARIABLES ARE CREATED BY TML IN tml_system_step_4_kafka_preprocess2_dag.py
  'identifier' : 'Mitre ATTCK', # <<< ** Change as needed
  'preprocesstypes' : 'avg', # <<< **** MAIN PREPROCESS TYPES CHANGE AS NEEDED refer to https://tml-readthedocs.readthedocs.io/en/latest/
  'pathtotmlattrs' : 'oem=n/a,lat=n/a,long=n/a,location=n/a,identifier=n/a', # Change as needed
  'jsoncriteria' : 'uid=tactic,filter:allrecords~\
subtopics=technique,technique,technique~\
values=FinalAttackScore,FinalPatternScore,RTMSSCORE~\
identifiers=FinalAttackScore,FinalPatternScore,RTMSSCORE~\
datetime=TimeStamp~\
msgid=Entity,PartitionOffsetFound,NumAttackWindowsFound,NumPatternWindowsFound,SearchEntity,rtmsfolder,CurrentRTMSMAXWINDOW~\
latlong=' # <<< **** Specify your json criteria. Here is an example of a multiline json --  refer to https://tml-readthedocs.readthedocs.io/en/latest/
}

######################################## DO NOT MODIFY BELOW #############################################

VIPERTOKEN=""
VIPERHOST=""
VIPERPORT=""
HTTPADDR=""

def processtransactiondata():
         global VIPERTOKEN
         global VIPERHOST
         global VIPERPORT
         global HTTPADDR
         preprocesstopic = default_args['preprocess_data_topic']
         maintopic =  default_args['raw_data_topic']
         mainproducerid = default_args['producerid']

        #############################################################################################################
          #                                    PREPROCESS DATA STREAMS


          # Number of offsets to roll each data stream back by - increase this if you want more data
          # For supervised machine learning you need a minimum of 30 data points in each stream
         maxrows=int(default_args['maxrows'])

          # offset=-1 starts from the last offset of each stream: if lastoffset=500 and maxrows=50,
          # then the streams are rolled back to offset=500-50=450
         offset=int(default_args['offset'])
          # topic to read the data to be preprocessed from
         topic=maintopic
          # producerid of the topic
         producerid=mainproducerid
          # use the host in Viper.env file
         brokerhost=default_args['brokerhost']
          # use the port in Viper.env file
         brokerport=int(default_args['brokerport'])
          #if load balancing enter the microserviceid to route the HTTP to a specific machine
         microserviceid=default_args['microserviceid']


          # You can preprocess with the following functions: MAX, MIN, SUM, AVG, COUNT, DIFF,OUTLIERS
          # here we will take max values of the arcturus-humidity, we will Diff arcturus-temperature, and average arcturus-Light_Intensity
          # NOTE: The number of process logic functions MUST match the streams - the operations will be applied in the same order
        #
         preprocessconditions=default_args['preprocessconditions']

         # Maximum delay, in milliseconds, for VIPER to wait for Kafka to confirm that the message was received and written to the topic
         delay=int(default_args['delay'])
         # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
         enabletls=int(default_args['enabletls'])
         array=int(default_args['array'])
         saveasarray=int(default_args['saveasarray'])
         topicid=int(default_args['topicid'])

         rawdataoutput=int(default_args['rawdataoutput'])
         asynctimeout=int(default_args['asynctimeout'])
         timedelay=int(default_args['timedelay'])

         jsoncriteria = default_args['jsoncriteria']

         tmlfilepath=default_args['tmlfilepath']
         usemysql=int(default_args['usemysql'])

         streamstojoin=default_args['streamstojoin']
         identifier = default_args['identifier']

         # if dataage - use:dataage_utcoffset_timetype
         preprocesstypes=default_args['preprocesstypes']

         try:
                result=maadstml.viperpreprocessproducetotopicstream(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,maxrows,enabletls,delay,brokerhost,
                                                  brokerport,microserviceid,topicid,streamstojoin,preprocesstypes,preprocessconditions,identifier,
                                                  preprocesstopic,jsoncriteria)
                #print(result)
         except Exception as e:
                print("ERROR:",e)

def windowname(wtype,sname,dagname):
    randomNumber = random.randrange(10, 9999)
    wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
    with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
      file.writelines("{}\n".format(wn))

    return wn

def dopreprocessing(**context):
       tsslogging.locallogs("INFO", "STEP 4a: Preprocessing started")
       sd = context['dag'].dag_id
       sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
       pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

       VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
       VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS1".format(sname))
       VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS1".format(sname))
       HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

       chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

       if 'step4ajsoncriteria' in os.environ:
          default_args['jsoncriteria']=os.environ['step4ajsoncriteria']
       if 'step4apreprocesstypes' in os.environ:
          default_args['preprocesstypes']=os.environ['step4apreprocesstypes']
       if 'step4araw_data_topic' in os.environ:
         default_args['raw_data_topic']=os.environ['step4araw_data_topic']
       if 'step4apreprocess_data_topic' in os.environ:
          default_args['preprocess_data_topic']=os.environ['step4apreprocess_data_topic']

       ti = context['task_instance']
       ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
       ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
       ti.xcom_push(key="{}_preprocessconditions".format(sname), value=default_args['preprocessconditions'])
       ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
       ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
       ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
       ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
       ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
       ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
       ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
       ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
       ti.xcom_push(key="{}_preprocesstypes".format(sname), value=default_args['preprocesstypes'])
       ti.xcom_push(key="{}_pathtotmlattrs".format(sname), value=default_args['pathtotmlattrs'])
       ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])
       ti.xcom_push(key="{}_jsoncriteria".format(sname), value=default_args['jsoncriteria'])

       maxrows=default_args['maxrows']
       if 'step4amaxrows' in os.environ:
         ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4amaxrows']))
         maxrows=os.environ['step4amaxrows']
       else:
         ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))


       repo=tsslogging.getrepo()
       if sname != '_mysolution_':
        fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
       else:
         fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

       wn = windowname('preprocess1',sname,sd)
       subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess1", "ENTER"])
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" \"{}\" \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,default_args['jsoncriteria'],default_args['preprocesstypes'],default_args['raw_data_topic'],default_args['preprocess_data_topic']), "ENTER"])

if __name__ == '__main__':
    if len(sys.argv) > 1:
       if sys.argv[1] == "1":
        repo=tsslogging.getrepo()
        try:
          tsslogging.tsslogit("Preprocessing DAG in {}".format(os.path.basename(__file__)), "INFO" )
          tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
        except Exception as e:
            #git push -f origin main
            os.chdir("/{}".format(repo))
            subprocess.call("git push -f origin main", shell=True)

        VIPERTOKEN = sys.argv[2]
        VIPERHOST = sys.argv[3]
        VIPERPORT = sys.argv[4]
        maxrows =  sys.argv[5]
        default_args['maxrows'] = maxrows

        default_args['jsoncriteria'] =  sys.argv[6]
        default_args['preprocesstypes'] =  sys.argv[7]
        default_args['raw_data_topic'] =  sys.argv[8]
        default_args['preprocess_data_topic'] =  sys.argv[9]

        tsslogging.locallogs("INFO", "STEP 4a: Preprocessing started")

        while True:
          try:
            processtransactiondata()
            time.sleep(1)
          except Exception as e:
           tsslogging.locallogs("ERROR", "STEP 4a: Preprocessing DAG in {} {}".format(os.path.basename(__file__),e))
           tsslogging.tsslogit("Preprocessing DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
           break
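The jsoncriteria value in default_args above is a single string whose sections are separated by “~”, each written as key=value with comma-separated items. A small sketch such as the following can help check a criteria string before running the DAG. The helper name parse_jsoncriteria is hypothetical, and the sketch only reflects the layout visible in this example, not a full specification of the JSON criteria syntax.

```python
def parse_jsoncriteria(jsoncriteria: str) -> dict:
    """Split a '~'-delimited jsoncriteria string into {section: [items]}."""
    sections = {}
    for section in jsoncriteria.split("~"):
        key, _, value = section.strip().partition("=")
        # Empty values (such as 'latlong=') become an empty list
        sections[key] = [item for item in value.split(",") if item]
    return sections


# Same criteria as in default_args above, written without line continuations
jsoncriteria = (
    "uid=tactic,filter:allrecords~"
    "subtopics=technique,technique,technique~"
    "values=FinalAttackScore,FinalPatternScore,RTMSSCORE~"
    "identifiers=FinalAttackScore,FinalPatternScore,RTMSSCORE~"
    "datetime=TimeStamp~"
    "msgid=Entity,PartitionOffsetFound,NumAttackWindowsFound,"
    "NumPatternWindowsFound,SearchEntity,rtmsfolder,CurrentRTMSMAXWINDOW~"
    "latlong="
)

criteria = parse_jsoncriteria(jsoncriteria)
for name, items in criteria.items():
    print("{:12s} {} item(s): {}".format(name, len(items), items))
```

In this example subtopics, values and identifiers each list three items, so printing the counts side by side is a quick way to spot a section that is missing an entry.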

8.5.10. STEP 4b: Preprocessing 2 Data: tml-system-step-4b-kafka-preprocess-dag

Tip

Watch the YouTube video that discusses how to configure this DAG, which is used to process the variables preprocessed in Step 4: YouTube Video

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '',  # <<< *** leave blank
   'producerid' : 'iotsolution',   # <<< *** Change as needed
   'raw_data_topic' : 'iot-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'preprocess_data_topic' : 'iot-preprocess2', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'maxrows' : '350', # <<< ********** Number of offsets to roll back the data stream - e.g. '350' rolls the stream back by 350 offsets
   'offset' : '-1', # <<< Rollback from the end of the data streams
   'brokerhost' : '',   # <<< *** Leave as is
   'brokerport' : '-999',  # <<< *** Leave as is
   'preprocessconditions' : '', ## <<< Leave blank
   'delay' : '70', # Add a 70 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'array' : '0', # do not modify
   'saveasarray' : '1', # do not modify
   'topicid' : '-1', # do not modify
   'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
   'asynctimeout' : '120', # <<< 120 seconds for connection timeout
   'timedelay' : '0', # <<< connection delay
   'tmlfilepath' : '', # leave blank
   'usemysql' : '1', # do not modify
   'streamstojoin' : 'Voltage_preprocessed_AnomProb,Current_preprocessed_AnomProb', # Change as needed - THESE VARIABLES ARE CREATED BY TML IN tml_system_step_4_kafka_preprocess2_dag.py
   'identifier' : 'IoT device performance and failures', # <<< ** Change as needed
   'preprocesstypes' : 'avg,avg', # <<< **** MAIN PREPROCESS TYPES CHANGE AS NEEDED refer to https://tml-readthedocs.readthedocs.io/en/latest/
   'pathtotmlattrs' : 'oem=n/a,lat=n/a,long=n/a,location=n/a,identifier=n/a', # Change as needed
   'jsoncriteria' : '', # <<< **** Specify your json criteria. Here is an example of a multiline json --  refer to https://tml-readthedocs.readthedocs.io/en/latest/
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""

 def processtransactiondata():
          global VIPERTOKEN
          global VIPERHOST
          global VIPERPORT
          global HTTPADDR
          preprocesstopic = default_args['preprocess_data_topic']
          maintopic =  default_args['raw_data_topic']
          mainproducerid = default_args['producerid']

         #############################################################################################################
           #                                    PREPROCESS DATA STREAMS


           # Number of offsets to roll each data stream back by - increase this if you want more data
           # For supervised machine learning you need a minimum of 30 data points in each stream
          maxrows=int(default_args['maxrows'])

           # offset=-1 starts from the last offset of each stream: if lastoffset=500 and maxrows=50,
           # then the streams are rolled back to offset=500-50=450
          offset=int(default_args['offset'])
           # topic to read the data to be preprocessed from
          topic=maintopic
           # producerid of the topic
          producerid=mainproducerid
           # use the host in Viper.env file
          brokerhost=default_args['brokerhost']
           # use the port in Viper.env file
          brokerport=int(default_args['brokerport'])
           #if load balancing enter the microserviceid to route the HTTP to a specific machine
          microserviceid=default_args['microserviceid']


           # You can preprocess with the following functions: MAX, MIN, SUM, AVG, COUNT, DIFF,OUTLIERS
           # here we will take max values of the arcturus-humidity, we will Diff arcturus-temperature, and average arcturus-Light_Intensity
           # NOTE: The number of process logic functions MUST match the streams - the operations will be applied in the same order
         #
          preprocessconditions=default_args['preprocessconditions']

          # Maximum delay, in milliseconds, for VIPER to wait for Kafka to confirm that the message was received and written to the topic
          delay=int(default_args['delay'])
          # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
          enabletls=int(default_args['enabletls'])
          array=int(default_args['array'])
          saveasarray=int(default_args['saveasarray'])
          topicid=int(default_args['topicid'])

          rawdataoutput=int(default_args['rawdataoutput'])
          asynctimeout=int(default_args['asynctimeout'])
          timedelay=int(default_args['timedelay'])

          jsoncriteria = default_args['jsoncriteria']

          tmlfilepath=default_args['tmlfilepath']
          usemysql=int(default_args['usemysql'])

          streamstojoin=default_args['streamstojoin']
          identifier = default_args['identifier']

          # if dataage - use:dataage_utcoffset_timetype
          preprocesstypes=default_args['preprocesstypes']

          pathtotmlattrs=default_args['pathtotmlattrs']

          try:
                 result=maadstml.viperpreprocessproducetotopicstream(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,maxrows,enabletls,delay,brokerhost,
                                                   brokerport,microserviceid,topicid,streamstojoin,preprocesstypes,preprocessconditions,identifier,preprocesstopic)
                 #print(result)
          except Exception as e:
                 print("ERROR:",e)


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def dopreprocessing(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS2".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS2".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        if 'step4bpreprocesstypes' in os.environ:
           default_args['preprocesstypes']=os.environ['step4bpreprocesstypes']

        if 'step4bjsoncriteria' in os.environ:
           default_args['jsoncriteria']=os.environ['step4bjsoncriteria']

        if 'step4braw_data_topic' in os.environ:
           default_args['raw_data_topic']=os.environ['step4braw_data_topic']

        if 'step4bpreprocess_data_topic' in os.environ:
          default_args['preprocess_data_topic']=os.environ['step4bpreprocess_data_topic']

        ti = context['task_instance']
        ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_preprocessconditions".format(sname), value=default_args['preprocessconditions'])
        ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
        ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
        ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
        ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
        ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
        ti.xcom_push(key="{}_preprocesstypes".format(sname), value=default_args['preprocesstypes'])
        ti.xcom_push(key="{}_pathtotmlattrs".format(sname), value=default_args['pathtotmlattrs'])
        ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])
        ti.xcom_push(key="{}_jsoncriteria".format(sname), value=default_args['jsoncriteria'])

        maxrows=default_args['maxrows']
        if 'step4bmaxrows' in os.environ:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4bmaxrows']))
          maxrows=os.environ['step4bmaxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('preprocess2',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess2", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" \"{}\" \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,default_args['preprocesstypes'],default_args['jsoncriteria'],default_args['raw_data_topic'],default_args['preprocess_data_topic']), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Preprocessing2 DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         maxrows =  sys.argv[5]
         default_args['maxrows'] = maxrows

         default_args['preprocesstypes'] =  sys.argv[6]
         default_args['jsoncriteria'] =  sys.argv[7]
         default_args['raw_data_topic'] =  sys.argv[8]
         default_args['preprocess_data_topic'] =  sys.argv[9]

         tsslogging.locallogs("INFO", "STEP 4b: Preprocessing 2 started")

         while True:
           try:
             processtransactiondata()
             time.sleep(1)
           except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4b: Preprocessing2 DAG in {} {}".format(os.path.basename(__file__),e))
            tsslogging.tsslogit("Preprocessing2 DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            break
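A comment in the Step 4b listing above notes that the number of preprocess types must match the number of streams, and that the operations are applied in the same order. A check like the following sketch could be run on the default_args values before starting the DAG. The function name pair_streams_with_types is hypothetical and not part of maadstml, and the check is meant for Step 4b style configurations where streamstojoin is filled in.

```python
def pair_streams_with_types(streamstojoin: str, preprocesstypes: str) -> list:
    """Pair each stream with the preprocess type applied to it, in order.

    Raises ValueError when the two comma-separated lists differ in length.
    """
    streams = [s.strip() for s in streamstojoin.split(",") if s.strip()]
    types = [t.strip() for t in preprocesstypes.split(",") if t.strip()]
    if len(streams) != len(types):
        raise ValueError(
            "{} stream(s) but {} preprocess type(s)".format(len(streams), len(types))
        )
    return list(zip(streams, types))


# Values taken from the Step 4b default_args above
pairs = pair_streams_with_types(
    "Voltage_preprocessed_AnomProb,Current_preprocessed_AnomProb",
    "avg,avg",
)
for stream, ptype in pairs:
    print("{} -> {}".format(stream, ptype))
```

With the default values this pairs each AnomProb variable from Step 4 with avg, which is consistent with topic names such as topicid287_Current_preprocessed_AnomProb_preprocessed_Avg in the sample output.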

8.5.11. STEP 4c: Preprocessing 3 Data: tml-system-step-4c-kafka-preprocess-dag

Important

This Step 4c is a very powerful task that will incorporate real-time memory using sliding time windows: for details see How TML Maintains Past Memory of Events Using Sliding Time Windows.

Users can cross-reference entities with TXT files. The advantage is that you can combine machine learning outputs with TXT files to get a deeper understanding of each entity. This can be important when analysing log files for unusual search terms such as authentication failures, unknown users, etc.

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random
 import base64
 import threading
 import shutil

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '',  # <<< *** leave blank
   'producerid' : 'rtmssolution',   # <<< *** Change as needed
   'raw_data_topic' : 'iot-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'preprocess_data_topic' : 'rtms-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'maxrows' : '200', # <<< ********** Number of offsets to rollback the data stream - i.e. rollback stream by 200 offsets
   'offset' : '-1', # <<< Rollback from the end of the data streams
   'brokerhost' : '',   # <<< *** Leave as is
   'brokerport' : '-999',  # <<< *** Leave as is
   'delay' : '70', # Add a 70 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'array' : '0', # do not modify
   'saveasarray' : '1', # do not modify
   'topicid' : '-999', # do not modify
   'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
   'asynctimeout' : '120', # <<< 120 seconds for connection timeout
   'timedelay' : '0', # <<< connection delay
   'tmlfilepath' : '', # leave blank
   'usemysql' : '1', # do not modify
   'rtmsstream' : 'rtms-stream-mylogs', # Change as needed - STREAM containing log file data (or other data) for RTMS
                                                     # If entitystream is empty, TML uses the preprocess type only.
   'identifier' : 'RTMS Past Memory of Events', # <<< ** Change as needed
   'searchterms' : 'rgx:p([a-z]+)ch ~~~ |authentication failure,--entity-- password failure ~~~ |unknown--entity--', # Main search terms: prefix a group with @ for AND or | for OR (first character); default is OR
                                                              # Must include --entity-- if correlating with entity - this will be replaced
                                                              # dynamically with the entities found in raw_data_topic
   'localsearchtermfolder': '|mysearchfile1,|mysearchfile2', # Specify a folder of files containing search terms - each term must be on a new line - use comma
                                # to apply each folder to the rtmstream topic
                                # Use @ =AND, |=OR to specify whether the terms in the file should be AND, OR
                                # For example, @mysearchfolder1,|mysearchfolder2, means all terms in mysearchfolder1 should be AND
                                # |mysearchfolder2, means all search terms should be OR'ed
   'localsearchtermfolderinterval': '60', # This is the number of seconds between reading the localsearchtermfolder.  For example, if 60,
                                        # The files will be read every 60 seconds - and searchterms will be updated
   'rememberpastwindows' : '500', # Past windows to remember
   'patternwindowthreshold' : '30', # check for the number of patterns for the items in searchterms
   'rtmsscorethreshold': '0.6',  # RTMS score threshold i.e. '0.8'
   'rtmsscorethresholdtopic': 'rtmstopic',   # All rtms score greater than rtmsscorethreshold will be streamed to this topic
   'attackscorethreshold': '0.6',   # Attack score threshold i.e. '0.8'
   'attackscorethresholdtopic': 'attacktopic',   # All attack score greater than attackscorethreshold will be streamed to this topic
   'patternscorethreshold': '0.6',   # Pattern score threshold i.e. '0.8'
   'patternscorethresholdtopic': 'patterntopic',   # All pattern score greater than patternscorethreshold will be streamed to this topic
   'rtmsfoldername': 'rtms',
   'rtmsmaxwindows': '10000'
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""

 def processtransactiondata():
          global VIPERTOKEN
          global VIPERHOST
          global VIPERPORT
          global HTTPADDR
          preprocesstopic = default_args['preprocess_data_topic']
          maintopic =  default_args['raw_data_topic']
          mainproducerid = default_args['producerid']

         #############################################################################################################
           #                                    PREPROCESS DATA STREAMS


           # Roll back each data stream by 10 percent - change this to a larger number if you want more data
           # For supervised machine learning you need a minimum of 30 data points in each stream
          maxrows=int(default_args['maxrows'])

           # Go to the last offset of each stream: If lastoffset=500, then this function will rollback the
           # streams to offset=500-50=450
          offset=int(default_args['offset'])
           # Max wait time for Kafka to response on milliseconds - you can increase this number if
           #maintopic to produce the preprocess data to
          topic=maintopic
           # producerid of the topic
          producerid=mainproducerid
           # use the host in Viper.env file
          brokerhost=default_args['brokerhost']
           # use the port in Viper.env file
          brokerport=int(default_args['brokerport'])
           # if load balancing, enter the microserviceid to route the HTTP request to a specific machine
          microserviceid=default_args['microserviceid']

          # Maximum delay in milliseconds (see default_args['delay']) for VIPER to wait for Kafka to confirm the message was received and written to the topic
          delay=int(default_args['delay'])
          # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
          enabletls=int(default_args['enabletls'])
          array=int(default_args['array'])
          saveasarray=int(default_args['saveasarray'])
          topicid=int(default_args['topicid'])

          rawdataoutput=int(default_args['rawdataoutput'])
          asynctimeout=int(default_args['asynctimeout'])
          timedelay=int(default_args['timedelay'])
          tmlfilepath=default_args['tmlfilepath']
          usemysql=int(default_args['usemysql'])

          rtmsstream=default_args['rtmsstream']
          identifier = default_args['identifier']
          searchterms=default_args['searchterms']
          rememberpastwindows = default_args['rememberpastwindows']
          patternwindowthreshold = default_args['patternwindowthreshold']

          rtmsscorethreshold = default_args['rtmsscorethreshold']
          rtmsscorethresholdtopic = default_args['rtmsscorethresholdtopic']
          attackscorethreshold = default_args['attackscorethreshold']
          attackscorethresholdtopic = default_args['attackscorethresholdtopic']
          patternscorethreshold = default_args['patternscorethreshold']
          patternscorethresholdtopic = default_args['patternscorethresholdtopic']
          rtmsmaxwindows=default_args['rtmsmaxwindows']

          searchterms = str(base64.b64encode(searchterms.encode('utf-8')))
          try:
                 result=maadstml.viperpreprocessrtms(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,maxrows,enabletls,delay,brokerhost,
                                                   brokerport,microserviceid,topicid,rtmsstream,searchterms,rememberpastwindows,identifier,
                                                   preprocesstopic,patternwindowthreshold,array,saveasarray,rawdataoutput,
                                                   rtmsscorethreshold,rtmsscorethresholdtopic,attackscorethreshold,
                                                   attackscorethresholdtopic,patternscorethreshold,patternscorethresholdtopic,rtmsmaxwindows)
 #                print(result)
          except Exception as e:
                 print("ERROR:",e)


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 # add any non-file search terms to the file search terms
 def updatesearchterms(searchtermsfile,regx):
     # check if search terms exist
     stcurr = default_args['searchterms']
     stcurrfile = searchtermsfile
     mainsearchterms=""

     if len(regx) > 0:
         for r in regx:
            mainsearchterms = mainsearchterms + r + "~~~"

     if stcurr != "":
        stcurrarr = stcurr.split("~~~")
        stcurrarrfile = stcurrfile.split("~~~")
        for a in stcurrarr:
           stcurrarrfile.append(a)
        stcurrarrfile = set(stcurrarrfile)
        mainsearchterms = mainsearchterms + '~~~'.join(stcurrarrfile)
        #mainsearchterms = mainsearchterms[:-1]
     else:
        stcurrarrfile = stcurrfile.split("~~~")
        stcurrarrfile = set(stcurrarrfile)
        mainsearchterms = mainsearchterms + '~~~'.join(stcurrarrfile)
        #mainsearchterms = mainsearchterms[:-1]


     return  mainsearchterms

 def ingestfiles():
     buf = default_args['localsearchtermfolder']
     interval=int(default_args['localsearchtermfolderinterval'])
     searchtermsfile = ""

     dirbuf = buf.split(",")
     if len(dirbuf) == 0:
        return

     while True:
      try:
       lg=""
       buf = default_args['localsearchtermfolder']
       interval=int(default_args['localsearchtermfolderinterval'])
       searchtermsfile = ""
       dirbuf = buf.split(",")
       rgx = []
       for dr in dirbuf:
          filenames = []
          linebuf=""
          ibx = []
          if dr != "":
             if dr[0]=='@':
               dr = dr[1:]
               lg="@"
             elif dr[0]=='|':
               dr = dr[1:]
               lg="|"
             else:
               lg="|"

          if os.path.isdir("/rawdata/{}".format(dr)):
            a = [os.path.join("/rawdata/{}".format(dr), f) for f in os.listdir("/rawdata/{}".format(dr)) if
            os.path.isfile(os.path.join("/rawdata/{}".format(dr), f))]
            filenames.extend(a)

          if len(filenames) > 0:
            filenames = set(filenames)

            for fdr in filenames:
              with open(fdr) as f:
               lines = [line.rstrip('\n').strip() for line in f]
               lines = set(lines)
               # check regex
               for m in lines:
                 if len(m) > 0:
                   if 'rgx:' in m and m[:4]=="rgx:":
                     rgx.append(m)
                   elif '~~~' in m and m[:3]=="~~~":
                     ibx.append(m)
                   else:
                     m=m.replace(",", " ")
                     if m[0] != "~":
                       linebuf = linebuf + m + ","

          if linebuf != "":
            linebuf = linebuf[:-1]
            searchtermsfile = searchtermsfile + lg + linebuf +"~~~"
          if len(ibx)>0:
             ibxs = ''.join(ibx)
             ibxs=ibxs[3:]
             searchtermsfile = searchtermsfile + ibxs +"~~~"

       if searchtermsfile != "":
         searchtermsfile = searchtermsfile[:-3]
         searchtermsfile=updatesearchterms(searchtermsfile,rgx)
         default_args['searchterms']=searchtermsfile
         print("INFO:", searchtermsfile)

       if interval==0:
         break
       else:
        time.sleep(interval)
      except Exception as e:
        print("ERROR: ingesting files:",e)
        continue


 def startdirread():
   if 'localsearchtermfolder' not in default_args:
      return

   if default_args['localsearchtermfolder'] != '' and default_args['localsearchtermfolderinterval'] != '':
     print("INFO startdirread")
     try:
       t = threading.Thread(name='child procs', target=ingestfiles)
       t.start()
     except Exception as e:
       print(e)

 def dopreprocessing(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS3".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS3".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        ti = context['task_instance']
        ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
        ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
        ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
        ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
        ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
        ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])

        ti.xcom_push(key="{}_rtmsscorethresholdtopic".format(sname), value=default_args['rtmsscorethresholdtopic'])
        ti.xcom_push(key="{}_attackscorethresholdtopic".format(sname), value=default_args['attackscorethresholdtopic'])
        ti.xcom_push(key="{}_patternscorethresholdtopic".format(sname), value=default_args['patternscorethresholdtopic'])

        localsearchtermfolder=default_args['localsearchtermfolder']
        if 'step4clocalsearchtermfolder' in os.environ:
          ti.xcom_push(key="{}_localsearchtermfolder".format(sname), value=os.environ['step4clocalsearchtermfolder'])
          localsearchtermfolder=os.environ['step4clocalsearchtermfolder']
        else:
         ti.xcom_push(key="{}_localsearchtermfolder".format(sname), value=default_args['localsearchtermfolder'])

        localsearchtermfolderinterval=default_args['localsearchtermfolderinterval']
        if 'step4clocalsearchtermfolderinterval' in os.environ:
          ti.xcom_push(key="{}_localsearchtermfolderinterval".format(sname), value=os.environ['step4clocalsearchtermfolderinterval'])
          localsearchtermfolderinterval=os.environ['step4clocalsearchtermfolderinterval']
        else:
         ti.xcom_push(key="{}_localsearchtermfolderinterval".format(sname), value="_{}".format(default_args['localsearchtermfolderinterval']))

        rtmsstream=default_args['rtmsstream']
        if 'step4crtmsstream' in os.environ:
          ti.xcom_push(key="{}_rtmsstream".format(sname), value=os.environ['step4crtmsstream'])
          rtmsstream=os.environ['step4crtmsstream']
        else:
          ti.xcom_push(key="{}_rtmsstream".format(sname), value=default_args['rtmsstream'])

        maxrows=default_args['maxrows']
        if 'step4cmaxrows' in os.environ:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4cmaxrows']))
          maxrows=os.environ['step4cmaxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))

        searchterms=default_args['searchterms']
        if 'step4csearchterms' in os.environ:
          ti.xcom_push(key="{}_searchterms".format(sname), value="{}".format(os.environ['step4csearchterms']))
          searchterms=os.environ['step4csearchterms']
        else:
          ti.xcom_push(key="{}_searchterms".format(sname), value=default_args['searchterms'])

        raw_data_topic=default_args['raw_data_topic']
        if 'step4crawdatatopic' in os.environ:
          ti.xcom_push(key="{}_raw_data_topic".format(sname), value="{}".format(os.environ['step4crawdatatopic']))
          raw_data_topic=os.environ['step4crawdatatopic']
        else:
          ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])

        rememberpastwindows=default_args['rememberpastwindows']
        if 'step4crememberpastwindows' in os.environ:
          ti.xcom_push(key="{}_rememberpastwindows".format(sname), value="_{}".format(os.environ['step4crememberpastwindows']))
          rememberpastwindows=os.environ['step4crememberpastwindows']
        else:
          ti.xcom_push(key="{}_rememberpastwindows".format(sname), value="_{}".format(default_args['rememberpastwindows']))

        patternwindowthreshold=default_args['patternwindowthreshold']
        if 'step4cpatternwindowthreshold' in os.environ:
          ti.xcom_push(key="{}_patternwindowthreshold".format(sname), value="_{}".format(os.environ['step4cpatternwindowthreshold']))
          patternwindowthreshold=os.environ['step4cpatternwindowthreshold']
        else:
          ti.xcom_push(key="{}_patternwindowthreshold".format(sname), value="_{}".format(default_args['patternwindowthreshold']))

        rtmsscorethreshold=default_args['rtmsscorethreshold']
        if 'step4crtmsscorethreshold' in os.environ:
          ti.xcom_push(key="{}_rtmsscorethreshold".format(sname), value="_{}".format(os.environ['step4crtmsscorethreshold']))
          rtmsscorethreshold=os.environ['step4crtmsscorethreshold']
        else:
          ti.xcom_push(key="{}_rtmsscorethreshold".format(sname), value="_{}".format(default_args['rtmsscorethreshold']))

        attackscorethreshold=default_args['attackscorethreshold']
        if 'step4cattackscorethreshold' in os.environ:
          ti.xcom_push(key="{}_attackscorethreshold".format(sname), value="_{}".format(os.environ['step4cattackscorethreshold']))
          attackscorethreshold=os.environ['step4cattackscorethreshold']
        else:
          ti.xcom_push(key="{}_attackscorethreshold".format(sname), value="_{}".format(default_args['attackscorethreshold']))

        patternscorethreshold=default_args['patternscorethreshold']
        if 'step4cpatternscorethreshold' in os.environ:
          ti.xcom_push(key="{}_patternscorethreshold".format(sname), value="_{}".format(os.environ['step4cpatternscorethreshold']))
          patternscorethreshold=os.environ['step4cpatternscorethreshold']
        else:
          ti.xcom_push(key="{}_patternscorethreshold".format(sname), value="_{}".format(default_args['patternscorethreshold']))

        rtmsfoldername=default_args['rtmsfoldername']
        if 'step4crtmsfoldername' in os.environ:
          ti.xcom_push(key="{}_rtmsfoldername".format(sname), value="{}".format(os.environ['step4crtmsfoldername']))
          rtmsfoldername=os.environ['step4crtmsfoldername']
        else:
          ti.xcom_push(key="{}_rtmsfoldername".format(sname), value="{}".format(default_args['rtmsfoldername']))
        os.environ["step4crtmsfoldername"] = rtmsfoldername
        try:
          f = open("/tmux/rtmsfoldername.txt", "w")
          f.write(rtmsfoldername)
          f.close()
        except Exception as e:
          pass

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        if 'step4crtmsmaxwindows' in os.environ:
           rtmsmaxwindows=os.environ['step4crtmsmaxwindows']
           default_args['rtmsmaxwindows']=rtmsmaxwindows
        else:
           rtmsmaxwindows = default_args['rtmsmaxwindows']
        ti.xcom_push(key="{}_rtmsmaxwindows".format(sname), value="_{}".format(rtmsmaxwindows))
        try:
          f = open("/tmux/rtmsmax.txt", "w")
          f.write(rtmsmaxwindows)
          f.close()
        except Exception as e:
          pass

        wn = windowname('preprocess3',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess3", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" {} {} \"{}\" \"{}\" {} {} {} \"{}\" {} \"{}\" {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,searchterms,rememberpastwindows,patternwindowthreshold,raw_data_topic,rtmsstream,rtmsscorethreshold,attackscorethreshold,patternscorethreshold,localsearchtermfolder,localsearchtermfolderinterval,rtmsfoldername,rtmsmaxwindows), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Preprocessing3 DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         maxrows =  sys.argv[5]
         default_args['maxrows'] = maxrows
         subprocess.Popen("/tmux/rtmstrunc.sh", shell=True)

         searchterms =  sys.argv[6]
         default_args['searchterms'] = searchterms
         rememberpastwindows =  sys.argv[7]
         default_args['rememberpastwindows'] = rememberpastwindows
         patternwindowthreshold =  sys.argv[8]
         default_args['patternwindowthreshold'] = patternwindowthreshold
         rawdatatopic =  sys.argv[9]
         default_args['raw_data_topic'] = rawdatatopic
         rtmsstream =  sys.argv[10]
         default_args['rtmsstream'] = rtmsstream

         rtmsscorethreshold =  sys.argv[11]
         default_args['rtmsscorethreshold'] = rtmsscorethreshold
         attackscorethreshold =  sys.argv[12]
         default_args['attackscorethreshold'] = attackscorethreshold
         patternscorethreshold =  sys.argv[13]
         default_args['patternscorethreshold'] = patternscorethreshold

         localsearchtermfolder =  sys.argv[14]
         default_args['localsearchtermfolder'] = localsearchtermfolder
         localsearchtermfolderinterval =  sys.argv[15]
         default_args['localsearchtermfolderinterval'] = localsearchtermfolderinterval
         rtmsfoldername =  sys.argv[16]
         default_args['rtmsfoldername'] = rtmsfoldername
         rtmsmaxwindows =  sys.argv[17]
         default_args['rtmsmaxwindows'] = rtmsmaxwindows

         tsslogging.locallogs("INFO", "STEP 4c: Preprocessing 3 started")
         try:
           shutil.rmtree("/rawdata/{}".format(rtmsfoldername),ignore_errors=True)
         except Exception as e:
            pass

         try:
          directory="/rawdata/{}".format(rtmsfoldername)
          if not os.path.exists(directory):
             os.makedirs(directory)
         except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4c: Cannot make directory /rawdata/{} in {} {}".format(rtmsfoldername,os.path.basename(__file__),e))

         startdirread()
         while True:
           try:
             processtransactiondata()
             time.sleep(1)
           except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4c: Preprocessing3 DAG in {} {}".format(os.path.basename(__file__),e))
            tsslogging.tsslogit("Preprocessing3 DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            break
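The `dopreprocessing` function above lets environment variables such as `step4cmaxrows` or `step4csearchterms` override the values in `default_args`. The pattern can be sketched as follows. The helper `resolve_param` is hypothetical and not part of the DAG; it assumes the override variable is named `step4c` followed by the parameter name, which holds for most parameters in the listing but not all (for example, `raw_data_topic` is overridden by `step4crawdatatopic`).

```python
import os

default_args = {"maxrows": "200", "rtmsscorethreshold": "0.6"}


def resolve_param(name, env=None):
    """Return the step4c<name> environment override if set, else the default."""
    env = os.environ if env is None else env
    return env.get("step4c{}".format(name), default_args[name])


# Simulated environment; in the DAG these values come from os.environ
fake_env = {"step4cmaxrows": "500"}
print(resolve_param("maxrows", fake_env))
print(resolve_param("rtmsscorethreshold", fake_env))
```

Here `maxrows` resolves to the override and `rtmsscorethreshold` falls back to its default, which is the same precedence the DAG applies before pushing each value to XCom.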

8.5.11.1. Core Parameters in Step 4c

Parameter	Description
rtmsstream	This is the Kafka topic to which you stream your text data in STEP 3 when using local files. If you are streaming directly from LogStash, just enter the Kafka topic name. You can also separate multiple topics with commas.
searchterms	These are the search terms you want to look for in the data streaming to rtmsstream. Multiple terms must be separated by commas. To specify AND, the first character must be @; for OR, use |. If you are cross-referencing entities, use --entity-- and TML will replace --entity-- with the actual entity in the raw_data_topic. NOTE: if you DO NOT include --entity--, TML will search the rtmsstream as usual. NOTE: You can apply different search terms to different topics by separating them with three tildes (~~~). For example, if rtmsstream=topic1,topic2 and searchterms=search1 ~~~ search2, then TML will apply search1 to topic1 and search2 to topic2. This is convenient for more complex and varied logs.
rememberpastwindows	This is the number of past sliding time windows you want TML to remember: This is where TML captures memory of past events.
patternwindowthreshold	This is the threshold for patterns in the data. For example, if you are looking for ‘authentication failures’ and patternwindowthreshold=10, then 10 or more occurrences of ‘authentication failures’ will affect the pattern score.
localsearchtermfolder	You can specify local folders containing search terms. These folders must exist under the /rawdata mapping you created when you started the TSS container; refer to TSS Docker Run. TML reads these folders at the interval, in seconds, set in localsearchtermfolderinterval. This is convenient for updating search terms in real time to manage evolving threats or frequently changing events.
localsearchtermfolderinterval	The number of seconds between reading the search terms files in the localsearchtermfolder. TML RTMS solution will update the search terms in real-time.
rtmsscorethreshold	The score threshold for RTMS, e.g. 0.8
rtmsscorethresholdtopic	This topic will contain all messages exceeding rtmsscorethreshold. This is convenient for setting up alerts on this topic.
attackscorethreshold	The score threshold for the Attack score, e.g. 0.8
attackscorethresholdtopic	This topic will contain all messages exceeding attackscorethreshold. This is convenient for setting up alerts on this topic.
patternscorethreshold	The score threshold for the Pattern score, e.g. 0.8
patternscorethresholdtopic	This topic will contain all messages exceeding patternscorethreshold. This is convenient for setting up alerts on this topic.
rtmsfoldername	The folder where RTMS stores the output of the log files analysed. The rtmsfoldername is a subfolder of the /rawdata TSS container folder. You MUST volume map a local folder to /rawdata when you start your TSS container; refer to TSS Docker Run. Also refer to RTMS for further details.
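To make the searchterms syntax concrete, the sketch below splits a searchterms string on ~~~, pairs each group with a topic from rtmsstream, reads the @ or | prefix, and substitutes --entity--. The function `parse_searchterms` is hypothetical and only mimics the rules described in the table; the actual parsing happens inside VIPER and may differ, and the behaviour when the number of groups and topics differ is not specified here (this sketch simply pairs them in order).

```python
def parse_searchterms(rtmsstream, searchterms, entity=None):
    """Hypothetical parser for the documented searchterms syntax."""
    topics = [t.strip() for t in rtmsstream.split(",")]
    groups = [g.strip() for g in searchterms.split("~~~")]
    parsed = []
    for topic, group in zip(topics, groups):
        if group.startswith("rgx:"):
            # A regex is treated as the only statement in its group
            parsed.append({"topic": topic, "logic": "REGEX", "terms": [group[4:]]})
            continue
        logic = "OR"  # documented default
        if group[:1] == "@":
            logic, group = "AND", group[1:]
        elif group[:1] == "|":
            group = group[1:]
        terms = [t.strip() for t in group.split(",") if t.strip()]
        if entity is not None:
            terms = [t.replace("--entity--", entity) for t in terms]
        parsed.append({"topic": topic, "logic": logic, "terms": terms})
    return parsed


result = parse_searchterms(
    "topic1,topic2",
    "@authentication failure,--entity-- password failure ~~~ |unknown user",
    entity="device-7",
)
for row in result:
    print(row)
```

In this example the first group is applied to topic1 with AND logic and the entity placeholder filled in, while the second group is applied to topic2 with OR logic.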

Important

Your log files are ingested in STEP 3: Produce to Kafka. Specifically, in STEP 3:

‘docfolder’ : ‘mylogs,mylogs2’ specifies the subfolders that contain your log files (in this example, mylogs and mylogs2).

You can specify different folder names and add as many files as you like to these folders; RTMS will automatically read and process them.

For more details refer here.

Tip

You can use regex statements in the search terms. This allows you to build powerful regex expressions to filter log files.

If using regex expressions, you must prefix the expression with rgx:. For example, rgx:p([a-z]+)ch

A regex expression should be the only statement between ~~~ separators; this is important if your regex contains a comma.
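Before adding a regex to searchterms, it can help to try it on a few sample log lines. The sketch below uses Python's `re` module and assumes that TML's regex engine treats this simple pattern the same way; the source does not state which engine is used, and the sample log lines are made up for illustration.

```python
import re

searchterm = "rgx:p([a-z]+)ch"

# Strip the rgx: prefix to get the raw pattern
pattern = re.compile(searchterm[len("rgx:"):])

log_lines = [
    "applying patch 42 to kernel",
    "user pinched the config",
    "authentication failure for admin",
]

# Keep only the lines where the pattern matches anywhere in the line
matches = [line for line in log_lines if pattern.search(line)]
print(matches)
```

The first two lines match (via "patch" and "pinch"), while the third contains no "p...ch" sequence and is filtered out.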

8.5.12. STEP 5: Entity Based Machine Learning : tml-system-step-5-kafka-machine-learning-dag

Tip

Watch the YouTube video to learn how to configure the Step 5 DAG. YouTube Video

8.5.12.1. Entity Based Machine Learning By TML

Another powerful feature of TML is performing machine learning at the entity level. See TML Performs Entity Level Machine Learning and Processing for a refresher. For example, if TML is processing real-time data from 1 million IoT devices, it can create 1 million individual machine learning models, one for each device. TML uses the following ML algorithms:

Note

All ML data are also written to “/rawdata/ml” folder in the container.

If you mapped the rawdata folder then you can access these files.

Algorithm	Description
Logistic Regression	Performs classification regression and predicts probabilities
Linear Regression	Performs linear regression using OLS algorithm
Gradient Boosting	Gradient boosting for non-linear real-time data
Ridge Regression	Ridge Regression for non-linear real-time data
Neural networks	Neural networks for non-linear real-time data
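The idea of one model per entity can be illustrated with a small sketch that fits a separate ordinary least squares line for each device. This is not how TML or HPDE is implemented; the functions `fit_ols` and `fit_per_entity` and the sample records are hypothetical, and the sketch assumes each entity has at least two distinct x values.

```python
from collections import defaultdict


def fit_ols(xs, ys):
    """Closed-form simple linear regression: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_entity(records):
    """Group (entity, x, y) records by entity and fit one model per entity."""
    grouped = defaultdict(lambda: ([], []))
    for entity, x, y in records:
        grouped[entity][0].append(x)
        grouped[entity][1].append(y)
    return {e: fit_ols(xs, ys) for e, (xs, ys) in grouped.items()}


records = [
    ("device-1", 1.0, 2.0), ("device-1", 2.0, 4.0), ("device-1", 3.0, 6.0),
    ("device-2", 1.0, 5.0), ("device-2", 2.0, 4.0), ("device-2", 3.0, 3.0),
]
models = fit_per_entity(records)
for entity, (intercept, slope) in models.items():
    print(entity, round(intercept, 3), round(slope, 3))
```

The two devices end up with different coefficients (an upward trend for device-1, a downward trend for device-2), which a single pooled model across both devices would not capture.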

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'myname' : 'Sebastian Maurice',   # <<< *** Change as needed
   'enabletls': '1',   # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< *** leave blank
   'producerid' : 'iotsolution',    # <<< *** Change as needed
   'preprocess_data_topic' : 'iot-preprocess', # << *** topic/data to use for training datasets - You created this in STEP 2
   'ml_data_topic' : 'ml-data', # topic to store the trained algorithms  - You created this in STEP 2
   'identifier' : 'TML solution',    # <<< *** Change as needed
   'companyname' : 'Your company', # <<< *** Change as needed
   'myemail' : 'Your email', # <<< *** Change as needed
   'mylocation' : 'Your location', # <<< *** Change as needed
   'brokerhost' : '', # <<< *** Leave as is
   'brokerport' : '-999', # <<< *** Leave as is
   'deploy' : '1', # <<< *** do not modify
   'modelruns': '100', # <<< *** Change as needed
   'offset' : '-1', # <<< *** Do not modify
   'islogistic' : '1',  # <<< *** Change as needed, 1=logistic, 0=not logistic
   'networktimeout' : '600', # <<< *** Change as needed
   'modelsearchtuner' : '90', # <<< *This parameter will attempt to fine tune the model search space - A number close to 100 means you will have fewer models but their predictive quality will be higher.
   'dependentvariable' : 'failure', # <<< *** Change as needed,
   'independentvariables': 'Power_preprocessed_AnomProb', # <<< *** Change as needed,
   'rollbackoffsets' : '1000', # <<< *** Change as needed,
   'consumeridtrainingdata2': '', # leave blank
   'partition_training' : '',  # leave blank
   'consumefrom' : '',  # leave blank
   'topicid' : '-1',  # leave as is
   'fullpathtotrainingdata' : '/Viper-ml/viperlogs/iotlogistic',  # <<< *** Change as needed - add name for foldername that stores the training datasets
   'processlogic' : 'classification_name=failure_prob:Power_preprocessed_AnomProb=55,n',  # <<< *** Change as needed, i.e. classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n:Current_preprocessed_AnomProb=55,n
   'array' : '0',  # leave as is
   'transformtype' : '', # Sets the model to: log-lin,lin-log,log-log
   'sendcoefto' : '',  # you can send coefficients to another topic for further processing -- MUST BE SET IN STEP 2
   'coeftoprocess' : '', # indicate the index of the coefficients to process, i.e. 0,1,2. For example, for 3 estimated parameters, 0=constant and 1,2 are the other estimated parameters
   'coefsubtopicnames' : '',  # Give the coefficients a name: constant,elasticity,elasticity2
   'viperconfigfile' : '/Viper-ml/viper.env', # Do not modify
   'HPDEADDR' : 'http://'
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 # This sets the lat/longs for the IoT devices so they can be mapped
 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HPDEHOST = ''
 HPDEPORT = ''
 HTTPADDR=""
 maintopic =  default_args['preprocess_data_topic']
 mainproducerid = default_args['producerid']

 def performSupervisedMachineLearning():

       viperconfigfile = default_args['viperconfigfile']
       # Set personal data
       companyname=default_args['companyname']
       myname=default_args['myname']
       myemail=default_args['myemail']
       mylocation=default_args['mylocation']

       # Enable SSL/TLS communication with Kafka
       enabletls=int(default_args['enabletls'])
       # If brokerhost is empty then this function will use the brokerhost address in your
       # VIPER.ENV in the field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerhost=default_args['brokerhost']
       # If this is -999 then this function uses the port address for Kafka in VIPER.ENV in the
       # field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerport=int(default_args['brokerport'])
       # If you are using a reverse proxy to reach VIPER then you can put it here - otherwise if
       # empty then no reverse proxy is being used
       microserviceid=default_args['microserviceid']

       #############################################################################################################
       #                         VIPER CALLS HPDE TO PERFORM REAL_TIME MACHINE LEARNING ON TRAINING DATA


       # deploy the algorithm to ./deploy folder - otherwise it will be in ./models folder
       deploy=int(default_args['deploy'])
       # number of model runs used to find the best algorithm
       modelruns=int(default_args['modelruns'])
       # Go to the last offset of the partition in partition_training variable
       offset=int(default_args['offset'])
       # 1 = logistic model (the dependent variable is discrete), 0 = not a logistic model
       islogistic=int(default_args['islogistic'])
       # set network timeout for communication between VIPER and HPDE in seconds
       # increase this number if you timeout
       networktimeout=int(default_args['networktimeout'])

       # This parameter will attempt to fine tune the model search space - a number close to 0 means you will have lots of
       # models but their quality may be low.  A number close to 100 means you will have fewer models but their predictive
       # quality will be higher.
       modelsearchtuner=int(default_args['modelsearchtuner'])

       #this is the dependent variable
       dependentvariable=default_args['dependentvariable']
       # Assign the independentvariable streams
       independentvariables=default_args['independentvariables'] #"Voltage_preprocessed_AnomProb,Current_preprocessed_AnomProb"

       rollbackoffsets=int(default_args['rollbackoffsets'])
       consumeridtrainingdata2=default_args['consumeridtrainingdata2']
       partition_training=default_args['partition_training']
       producerid=default_args['producerid']
       consumefrom=default_args['consumefrom']

       topicid=int(default_args['topicid'])
       fullpathtotrainingdata=default_args['fullpathtotrainingdata']

       # These are the conditions that set the dependent variable to 1 - if the conditions are not met it will be 0
       processlogic=default_args['processlogic'] #'classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n:Current_preprocessed_AnomProb=55,n'

       identifier=default_args['identifier']

       producetotopic = default_args['ml_data_topic']

       array=int(default_args['array'])
       transformtype=default_args['transformtype'] # Sets the model to: log-lin,lin-log,log-log
       sendcoefto=default_args['sendcoefto']  # you can send coefficients to another topic for further processing
       coeftoprocess=default_args['coeftoprocess']  # indicate the index of the coefficients to process i.e. 0,1,2
       coefsubtopicnames=default_args['coefsubtopicnames']  # Give the coefficients a name: constant,elasticity,elasticity2


      # Call HPDE to train the model
       result=maadstml.viperhpdetraining(VIPERTOKEN,VIPERHOST,VIPERPORT,consumefrom,producetotopic,
                                       companyname,consumeridtrainingdata2,producerid, HPDEHOST,
                                       viperconfigfile,enabletls,partition_training,
                                       deploy,modelruns,modelsearchtuner,HPDEPORT,offset,islogistic,
                                       brokerhost,brokerport,networktimeout,microserviceid,topicid,maintopic,
                                       independentvariables,dependentvariable,rollbackoffsets,fullpathtotrainingdata,processlogic,identifier)


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startml(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTML".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTML".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))
        HPDEADDR = default_args['HPDEADDR']

        HPDEHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOST".format(sname))
        HPDEPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORT".format(sname))
        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        ti = context['task_instance']
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_ml_data_topic".format(sname), value=default_args['ml_data_topic'])
        ti.xcom_push(key="{}_modelruns".format(sname), value="_{}".format(default_args['modelruns']))
        ti.xcom_push(key="{}_offset".format(sname), value="_{}".format(default_args['offset']))
        ti.xcom_push(key="{}_islogistic".format(sname), value="_{}".format(default_args['islogistic']))
        ti.xcom_push(key="{}_networktimeout".format(sname), value="_{}".format(default_args['networktimeout']))
        ti.xcom_push(key="{}_modelsearchtuner".format(sname), value="_{}".format(default_args['modelsearchtuner']))
        ti.xcom_push(key="{}_dependentvariable".format(sname), value=default_args['dependentvariable'])
        ti.xcom_push(key="{}_independentvariables".format(sname), value=default_args['independentvariables'])

        rollback=default_args['rollbackoffsets']
        if 'step5rollbackoffsets' in os.environ:
          ti.xcom_push(key="{}_rollbackoffsets".format(sname), value="_{}".format(os.environ['step5rollbackoffsets']))
          rollback=os.environ['step5rollbackoffsets']
        else:
          ti.xcom_push(key="{}_rollbackoffsets".format(sname), value="_{}".format(default_args['rollbackoffsets']))

        processlogic=default_args['processlogic']
        if 'step5processlogic' in os.environ:
          ti.xcom_push(key="{}_processlogic".format(sname), value="{}".format(os.environ['step5processlogic']))
          processlogic=os.environ['step5processlogic']
        else:
          ti.xcom_push(key="{}_processlogic".format(sname), value="{}".format(default_args['processlogic']))

        independentvariables=default_args['independentvariables']
        if 'step5independentvariables' in os.environ:
          ti.xcom_push(key="{}_independentvariables".format(sname), value="{}".format(os.environ['step5independentvariables']))
          independentvariables=os.environ['step5independentvariables']
        else:
          ti.xcom_push(key="{}_independentvariables".format(sname), value="{}".format(default_args['independentvariables']))


        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_consumefrom".format(sname), value=default_args['consumefrom'])
        ti.xcom_push(key="{}_fullpathtotrainingdata".format(sname), value=default_args['fullpathtotrainingdata'])
        ti.xcom_push(key="{}_transformtype".format(sname), value=default_args['transformtype'])
        ti.xcom_push(key="{}_sendcoefto".format(sname), value=default_args['sendcoefto'])
        ti.xcom_push(key="{}_coeftoprocess".format(sname), value=default_args['coeftoprocess'])
        ti.xcom_push(key="{}_coefsubtopicnames".format(sname), value=default_args['coefsubtopicnames'])
        ti.xcom_push(key="{}_HPDEADDR".format(sname), value=HPDEADDR)

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('ml',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-ml", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {}{} {} {} \"{}\" \"{}\"".format(fullpath,VIPERTOKEN, HTTPADDR, VIPERHOST, VIPERPORT[1:], HPDEADDR, HPDEHOST, HPDEPORT[1:],rollback,processlogic,independentvariables), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Machine Learning DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         HPDEHOST = sys.argv[5]
         HPDEPORT = sys.argv[6]
         rollbackoffsets =  sys.argv[7]
         default_args['rollbackoffsets'] = rollbackoffsets
         processlogic =  sys.argv[8]
         default_args['processlogic'] = processlogic
         independentvariables =  sys.argv[9]
         default_args['independentvariables'] = independentvariables
         subprocess.run("rm -rf {}".format(default_args['fullpathtotrainingdata']), shell=True)

         tsslogging.locallogs("INFO", "STEP 5: Machine learning started")

         while True:
          try:
           performSupervisedMachineLearning()
 #          time.sleep(10)
          except Exception as e:
           tsslogging.locallogs("ERROR", "STEP 5: Machine Learning DAG in {} {}".format(os.path.basename(__file__),e))
           tsslogging.tsslogit("Machine Learning DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
           break
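
The hand-off between startml and the __main__ block above relies on positional command-line arguments. The following is a minimal sketch of that mapping, not part of the DAG itself. All values are placeholders, the script path is hypothetical, and the leading underscore on the ports is an assumption that mirrors the "_{}" XCom convention used above. The standard-library shlex module is used to approximate how the shell splits the command sent to tmux.

```python
import shlex

# Placeholder values for illustration only; in the DAG they come from Airflow XComs.
fullpath = "/repo/tml-airflow/dags/step5.py"   # hypothetical script path
vipertoken = "abc123"
httpaddr, viperhost, viperport = "http://", "127.0.0.1", "_8000"
hpdeaddr, hpdehost, hpdeport = "http://", "127.0.0.1", "_8001"
rollback = "1000"
processlogic = "classification_name=failure_prob:Power_preprocessed_AnomProb=55,n"
independentvariables = "Power_preprocessed_AnomProb"

# Same format string as the tmux send-keys call in startml
command = "python {} 1 {} {}{} {} {}{} {} {} \"{}\" \"{}\"".format(
    fullpath, vipertoken, httpaddr, viperhost, viperport[1:],
    hpdeaddr, hpdehost, hpdeport[1:], rollback, processlogic, independentvariables)

# Drop the leading "python" so that argv[0] is the script path, as in sys.argv
argv = shlex.split(command)[1:]

names = ["script", "mode", "VIPERTOKEN", "VIPERHOST", "VIPERPORT", "HPDEHOST",
         "HPDEPORT", "rollbackoffsets", "processlogic", "independentvariables"]
for index, (name, value) in enumerate(zip(names, argv)):
    print("sys.argv[{}] {:<22} {}".format(index, name, value))
```

Because processlogic and independentvariables are wrapped in double quotes, each arrives as a single argument even though it contains commas and colons.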

8.5.12.2. Additional Details on Machine Learning

Entity based machine learning is a core function of TML. This section discusses some of the key default_args in the tml-system-step-5-kafka-machine-learning-dag.

Important

TML generates trained algorithms and stores them on disk in the ./models or ./deploy folder, and in the Kafka topic specified by the ml_data_topic key in default_args. TML automatically accesses these trained algorithms for predictions, for each entity specified by topicid. Everything is managed by the TML binary Viper (see 1. TML Components: Three Binaries).

TML manages the topicid, which represents individual entities in MariaDB. Note that a topicid is uniquely associated with a primary identifier for the device or entity, such as its Device Serial Number (DSN). So, as data streams from all devices, each message must contain a JSON key that indicates the DSN of the device. The TML binary Viper aggregates the data for each DSN and processes the data for each device in every sliding time window.
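
As an illustration of the per-entity aggregation described above, here is a minimal sketch that groups messages by DSN. It is not Viper's implementation. The message shape (metadata.dsn, metadata.property_name, datapoint.value) follows the JSON paths that appear in the Identifier field of the sample output in this section, the function name is hypothetical, and the values are illustrative.

```python
from collections import defaultdict

# Illustrative messages; each one carries the DSN of the device that produced it
messages = [
    {"metadata": {"dsn": "AC000W018740175", "property_name": "Voltage"},
     "datapoint": {"value": 120743}},
    {"metadata": {"dsn": "AC000W020485383", "property_name": "Power"},
     "datapoint": {"value": 51000}},
    {"metadata": {"dsn": "AC000W018740175", "property_name": "Voltage"},
     "datapoint": {"value": 120929}},
]

def group_by_entity(messages):
    """Collect the datapoint values of one window, keyed by device DSN."""
    grouped = defaultdict(list)
    for message in messages:
        dsn = message["metadata"]["dsn"]
        grouped[dsn].append(message["datapoint"]["value"])
    return dict(grouped)

for dsn, values in group_by_entity(messages).items():
    print(dsn, values)
```

Each DSN then has its own list of values, which is the unit on which entity-level processing and training operate.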

TML generates trained algorithms for each sliding time window. This means that, as new real-time data is captured in the sliding time windows, TML re-runs the algorithms for the new window to see if there is a better algorithm using the MAPE measure (reported as the forecast accuracy in the model output, where a higher value is better). If the MAPE in the previous sliding time window is higher than the MAPE in the next window, the older algorithm will be used in the next window; otherwise TML overwrites the older algorithm with the newer, better algorithm. NOTE: TML generates brand new algorithms for each sliding window; it is NOT simply updating the estimated parameters of ONE algorithm, as is common in conventional approaches.
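
The keep-or-overwrite rule described above can be sketched as follows. This is an illustration only, not Viper or HPDE code. It assumes the score behaves like the Forecastaccuracy field in the sample output (higher is better), and the function name, dict keys, and scores are hypothetical.

```python
def select_algorithm(current, candidate):
    """Return the algorithm to use for the next sliding time window.

    Both arguments are dicts with an 'accuracy' key (higher is better).
    'current' is None when no algorithm has been trained yet.
    """
    if current is None or candidate["accuracy"] > current["accuracy"]:
        return candidate   # the newer algorithm is better: overwrite
    return current         # the older algorithm is at least as good: keep it

# Hypothetical scores for three consecutive windows of one entity
windows = [
    {"name": "window-1 model", "accuracy": 0.70},
    {"name": "window-2 model", "accuracy": 0.65},
    {"name": "window-3 model", "accuracy": 0.82},
]

best = None
history = []
for candidate in windows:
    best = select_algorithm(best, candidate)
    history.append(best["name"])

print(history)
```

In this run the window-1 model survives the second window and is replaced only when the third window produces a better score.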

All algorithms are JSON-serialized files that are less than 1K in size. This makes it very efficient to store millions of algorithms on disk without consuming much storage.

All training and predictions happen in parallel using different instances of the Viper binary.

Here are the core parameters in the above dag 5:

Step 5 DAG parameter	Explanation
modelruns	This instructs HPDE to try to find the best trained algorithms out of many. For example, if modelruns=100, it will iterate over 100 models before it finds the best model out of these 100 models. It will perform hyperparameter tuning as well.
islogistic	TML can do classification and regression. If islogistic=1, then TML assumes the dependent variable is a binary variable with value 1 or 0, otherwise if islogistic=0, then it assumes the dependent variable is continuous.
modelsearchtuner	This parameter will attempt to fine tune the model search space - A number close to 100 means you will have fewer models but their predictive quality will be higher.
dependentvariable	You specify the json path of the dependent variable in your Json message. Refer to Json Path Example. If using preprocessed variables refer to Preprocessed Variable Naming Standard
independentvariables	You must specify the independent variables (separate multiple variables by a comma). Refer to the Json Path Example. If using preprocessed variables refer to Preprocessed Variable Naming Standard
topicid	The topicid is an internal directive for TML. If set to -1, it tells the TML Viper binary to process JSON messages by their unique identifier. Usually, leaving this at -1 is fine.
fullpathtotrainingdata	You must specify the full path to where the training dataset will be stored on disk. The format for the path is /Viper-ml/viperlogs/<choose foldername>, where you specify the foldername.
processlogic	This is the processlogic needed for the dependent variable if you are estimating a logistic model. Specifically, if the conditions in your logic are TRUE, the dependent variable will be set to 1, otherwise it will be 0. For example, classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n:Current_preprocessed_AnomProb=55,n means: if the preprocessed variable Voltage_preprocessed_AnomProb is greater than 55, and Current_preprocessed_AnomProb is greater than 55, then set the dependent variable failure_prob to 1, otherwise set it to 0. The values n and -n indicate no upper bound and no lower bound, respectively. If you want less than 55, then use classification_name=failure_prob:Voltage_preprocessed_AnomProb=-n,55:Current_preprocessed_AnomProb=-n,55. Note: classification_name must be specified; the name of the dependent variable, failure_prob, can be changed to any name you want. Performing real-time logistic regression is a very powerful way to perform probability predictions on real-time data generated by devices.
transformtype	You can specify a transformation of your machine learning model: log-lin, lin-log, or log-log. log-lin: take the log of the dependent variable, and leave the independent variables as is. lin-log: leave the dependent variable as is, but take the log of the independent variables. log-log: take the log of the dependent variable, and take the log of the independent variables.
sendcoefto	You can send the coefficients for each trained model to another Kafka topic. This topic MUST BE SET IN STEP 2.
coeftoprocess	You can specify which coefficients to process, i.e. 0,1,2. For example, with 3 estimated parameters, 0=constant and 1,2 are the other estimated parameters.
coefsubtopicnames	You can give names to the coefficients in your model: constant, elasticity, elasticity2
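
To make the three transformtype options in the table above concrete, here is a minimal sketch that applies them to a single observation. It is an illustration, not HPDE code. The function name is hypothetical, and the use of the natural logarithm is an assumption, since this document does not state which log base is used.

```python
import math

def apply_transform(transformtype, dependent, independents):
    """Apply the log-lin, lin-log or log-log transformation to one observation.

    dependent    : float, value of the dependent variable
    independents : list of floats, values of the independent variables
    Values must be positive wherever a log is taken.
    An empty transformtype leaves everything unchanged.
    """
    log_dependent = transformtype in ("log-lin", "log-log")
    log_independents = transformtype in ("lin-log", "log-log")

    # Natural log is an assumption made for this sketch
    y = math.log(dependent) if log_dependent else dependent
    xs = [math.log(x) if log_independents else x for x in independents]
    return y, xs

for transformtype in ("", "log-lin", "lin-log", "log-log"):
    print(repr(transformtype), apply_transform(transformtype, 100.0, [10.0, 2.0]))
```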

8.5.12.3. Classification Models: Details on the Processlogic field

Important

If you are estimating a classification model, and want to predict probabilities, then you must define the processlogic field.

The processlogic field defines the rules that classify the dependent variable as 1 or 0. The table below shows how to specify these rules for the variables you are using, or that were processed in STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag. We will set rules on the processed variables: Voltage and Current.

Tip

You should refer to Preprocessed Variable Naming Standard to properly specify the names of the processed variables: Voltage and Current. If Voltage and Current are processed with the anomaly probability processing type (i.e. AnomProb), then the new processed variables for Voltage and Current will be named:

Voltage_preprocessed_AnomProb
Current_preprocessed_AnomProb

Similarly, if processing any variable, this naming standard will apply.

For example, let's break down the following rule for the preprocessed variables Voltage and Current. This rule would be the value of the processlogic field in DAG 5 above:

classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n : Current_preprocessed_AnomProb=55,n

NOTE: Separate multiple rules by a colon (:). The colon acts as an “AND”. Specifically, if Voltage_preprocessed_AnomProb AND Current_preprocessed_AnomProb both satisfy their rules, then failure_prob is set to 1, otherwise, 0.

Variable/Rule	Upper Bound	Lower Bound	Explanation
classification_name	n/a	n/a	This simply tells TML that this is a classification model
failure_prob	n/a	n/a	This is simply the name for your generated classified variable. You can put any name you want.
Voltage_preprocessed_AnomProb=55,n	n	55	This sets the rule for Voltage_preprocessed_AnomProb: failure_prob is set to 1 IF the values of the variable Voltage_preprocessed_AnomProb are between 55 and n, where n signifies no upper bound. If the rule were Voltage_preprocessed_AnomProb=55,95, then failure_prob would be 1 if the value is between 55 and 95, inclusive.
Current_preprocessed_AnomProb=55,n	n	55	This sets the rule for Current_preprocessed_AnomProb: failure_prob is set to 1 IF the values of the variable Current_preprocessed_AnomProb are between 55 and n, where n signifies no upper bound. If the rule were Current_preprocessed_AnomProb=55,95, then failure_prob would be 1 if the value is between 55 and 95, inclusive.

Important

The 1s and 0s produced by the individual rules are then compared across the variables to see if they match. For example, if the rules for Voltage_preprocessed_AnomProb AND Current_preprocessed_AnomProb both evaluate to 1, then the failure_prob variable is 1, otherwise 0.

Tip

If the rule is Current_preprocessed_AnomProb=-n,55, then failure_prob is set to 1 if Current_preprocessed_AnomProb is less than 55, otherwise 0.

Both -n and n indicate that the variable has NO lower bound or upper bound, respectively. If you want a specific lower and upper bound, just replace -n, and n with exact numbers.
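
The rule syntax described in this section can be sketched as a small parser and evaluator. This is an illustration of the documented format only, not the logic inside Viper or HPDE. The function names are hypothetical, and treating both bounds as inclusive is an assumption based on the "between 55 and 95, inclusive" wording in the table above.

```python
import math

def parse_processlogic(processlogic):
    """Split a processlogic string into (target_name, {variable: (lower, upper)})."""
    parts = [p.strip() for p in processlogic.split(":")]
    key, target = [s.strip() for s in parts[0].split("=")]
    if key != "classification_name":
        raise ValueError("processlogic must start with classification_name=")

    rules = {}
    for part in parts[1:]:
        variable, bounds = [s.strip() for s in part.split("=")]
        lower, upper = [b.strip() for b in bounds.split(",")]
        rules[variable] = (
            -math.inf if lower == "-n" else float(lower),   # -n: no lower bound
            math.inf if upper == "n" else float(upper),     # n: no upper bound
        )
    return target, rules

def classify(rules, values):
    """Return 1 only if every variable satisfies its rule (the colon acts as AND)."""
    return int(all(lo <= values[var] <= hi for var, (lo, hi) in rules.items()))

target, rules = parse_processlogic(
    "classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n : "
    "Current_preprocessed_AnomProb=55,n")

both_high = {"Voltage_preprocessed_AnomProb": 60.0, "Current_preprocessed_AnomProb": 71.5}
one_low = {"Voltage_preprocessed_AnomProb": 60.0, "Current_preprocessed_AnomProb": 40.0}
print(target, classify(rules, both_high), classify(rules, one_low))
```

The first set of values satisfies both rules, so the classified variable is 1. The second fails the Current rule, so it is 0.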

8.5.12.4. Machine Learning Trained Model Sample JSON Output

Below is the JSON output after the TML binary HPDE has performed machine learning using the real-time data streams.

{
    "Algokey": "StreamConsumer_topicid59_json",
    "Algo": "StreamConsumer_topicid59_jsonlgt",
    "Forecastaccuracy": 0.747,
    "DependentVariable": "failure_prob",
    "Filename": "/Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59.csv",
    "Fieldnames": "Date,topicid59_Voltage_preprocessed_AnomProb,topicid59_Current_preprocessed_AnomProb",
    "TestResultsFile": "/Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59_json_predictions.csv",
    "Deployed": 1,
    "DeployedTo": "Local Machine Deploy Folder",
    "Created": "2024-08-15T22:05:55.692145224Z",
    "Fullpathtomodels": "/Viper-tml/viperlogs/iotlogistic",
    "Identifier": "Voltage~Line-Voltage-(mV)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Voltage),value:datapoint.value,identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords,Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=e951b524-7faa-11ec-4107-b4937c8d3c24(120743,51.16569,10.45153,Voltage,n/a,n/a,{});e9870b70-7faa-11ec-7911-7438f38e028a(120929,51.16569,10.45153,Voltage,n/a,n/a,{});e9b56d62-7faa-11ec-d0c0-c3d1d2b8ba2b(120824,51.16569,10.45153,Voltage,n/a,n/a,{})~latlong=~mainuid=AC000W018740175",
    "AccuracyThreshold": 0.51,
    "Minmax": "27.774:82.392,27.592:82.013",
    "MachineLearningAlgorithm": "Logistic Regression",
    "ParameterEstimates": "-2.8284930,0.8076427,2.7328265",
    "HasConstantTerm": 1,
    "Topicid": 59,
    "ConsumeridFrom": "StreamConsumer",
    "Producerid": "StreamProducer",
    "ConsumingFrom": "/Viper-tml/viperlogs/iotlogistic/trainingdata_topicid59_.json",
    "ProduceTo": "iot-trained-params-input",
    "Companyname": "OTICS Advanced Analytics",
    "BrokerhostPort": "127.0.0.1:9092",
    "Islogistic": 1,
    "HPDEHOST": "172.18.0.2:44269",
    "HPDEMACHINENAME": "329e7b30d9b8",
    "Modelruns": 100,
    "ModelSearchTuner": 90,
    "TrainingData_Partition": -1,
    "Transformtype": "",
    "Sendcoefto": "",
    "Coeftoprocess": "",
    "Coefsubtopicnames": "",
    "BytesWritten": 1912,
    "kafkakey": "OAA-KK6EoesoB8KX8mkL17D5y5ejN-N7Le",
    "Numberofmessages": 239,
    "Partition": 0,
    "Offset": 59
}

8.5.12.5. Machine Learning Trained Model Sample JSON Output: Explanations

JSON Field	Description
Algokey	This is the algorithm key: StreamConsumer_topicid59_json
Algo	This is the physical algorithm on disk: StreamConsumer_topicid59_jsonlgt
Forecastaccuracy	This is the forecast accuracy using MAPE: 0.747
DependentVariable	This is the computed discrete dependent variable: failure_prob
Filename	File name of the training dataset: /Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59.csv. The above path is inside the Docker container. You can mount this path as a volume to save it on your host machine.
Fieldnames	These are the independent variables: Date, topicid59_Voltage_preprocessed_AnomProb, topicid59_Current_preprocessed_AnomProb
TestResultsFile	The results of the predictions using the test dataset are saved here: /Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59_json_predictions.csv
Deployed	The model is deployed to the ./deploy folder if this is 1
DeployedTo	It is deployed to: Local Machine Deploy Folder
Created	The time the trained algorithm was generated: 2024-08-15T22:05:55.692145224Z
Fullpathtomodels	The full path to the model: /Viper-tml/viperlogs/iotlogistic. The ./models and ./deploy folders are relative to this path.
Identifier	Additional information about the data: Voltage~Line-Voltage-(mV)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Voltage),value:datapoint.value,identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords,Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=e951b524-7faa-11ec-4107-b4937c8d3c24(120743,51.16569,10.45153,Voltage,n/a,n/a,{});e9870b70-7faa-11ec-7911-7438f38e028a(120929,51.16569,10.45153,Voltage,n/a,n/a,{});e9b56d62-7faa-11ec-d0c0-c3d1d2b8ba2b(120824,51.16569,10.45153,Voltage,n/a,n/a,{})~latlong=~mainuid=AC000W018740175
AccuracyThreshold	The forecast accuracy of a model must be greater than this threshold: 0.51 (or 51%)
Minmax	The min:max values used to normalize each variable: 27.774:82.392,27.592:82.013
MachineLearningAlgorithm	The machine learning algorithm used: Logistic Regression
ParameterEstimates	The parameter estimates: -2.8284930,0.8076427, 2.7328265
HasConstantTerm	Indicates if it has a constant term: 1 - means it does
Topicid	Internal topicid associated with the uid: 59
ConsumeridFrom	The consumerid: StreamConsumer
Producerid	The producerid: StreamProducer
ConsumingFrom	The physical training dataset file in the container: /Viper-tml/viperlogs/iotlogistic/trainingdata_topicid59_.json
ProduceTo	Topic where the estimated parameters are saved: iot-trained-params-input
Companyname	Your company name
BrokerhostPort	Kafka brokerhostport: 127.0.0.1:9092 using On-Premise Kafka
Islogistic	Indicates if the model is logistic: 1 - means it is
HPDEHOST	Address where HPDE is listening for a connection from Viper: 172.18.0.2:44269
HPDEMACHINENAME	Machine name where the HPDE binary is running: 329e7b30d9b8
Modelruns	Number of models to iterate through before stopping: 100
ModelSearchTuner	Hyper parameter tuner: 90 - closer to 100 means higher quality models
TrainingData_Partition	Ignored
Transformtype	This is the log-lin, lin-log, log-log transformations if any
Sendcoefto	You can send the estimated coefficients to a topic
Coeftoprocess	The coefficient index to process
Coefsubtopicnames	The names of the coefficients
BytesWritten	The size of this json: 1912
kafkakey	The TML kafka key: OAA-KK6EoesoB8KX8mkL17D5y5ejN-N7Le
Numberofmessages	The number of rows in the training dataset: 239
Partition	The partition where this JSON is stored in Kafka: 0
Offset	The offset of this json in Kafka: 59
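
To show how the ParameterEstimates, HasConstantTerm, and Minmax fields relate to each other, here is a minimal sketch of a textbook logistic regression prediction on min-max scaled inputs. This document does not state that HPDE computes predictions exactly this way, so treat it as an assumption-laden illustration. In particular, it assumes that the first parameter is the constant term, that the remaining parameters follow the order of the independent variables, and that the Minmax pairs follow the same order. The function name is hypothetical.

```python
import math

def predict_probability(parameter_estimates, minmax, raw_values):
    """Textbook logistic prediction using fields from the model JSON.

    parameter_estimates : str, e.g. "-2.8284930,0.8076427,2.7328265"
    minmax              : str, e.g. "27.774:82.392,27.592:82.013"
    raw_values          : list of floats, one per independent variable
    """
    coefs = [float(c) for c in parameter_estimates.split(",")]
    ranges = [tuple(float(v) for v in pair.split(":")) for pair in minmax.split(",")]

    constant, slopes = coefs[0], coefs[1:]   # assumes HasConstantTerm == 1
    # Min-max scale each raw value to the 0-1 range
    scaled = [(x - lo) / (hi - lo) for x, (lo, hi) in zip(raw_values, ranges)]
    z = constant + sum(b * x for b, x in zip(slopes, scaled))
    return 1.0 / (1.0 + math.exp(-z))        # logistic (sigmoid) function

estimates = "-2.8284930,0.8076427,2.7328265"
minmax = "27.774:82.392,27.592:82.013"
print(predict_probability(estimates, minmax, [27.774, 27.592]))   # both variables at their minimum
print(predict_probability(estimates, minmax, [82.392, 82.013]))   # both variables at their maximum
```

Under these assumptions, the predicted probability rises as both anomaly probabilities move from their minimum to their maximum, because both slope estimates are positive.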

8.5.13. TML Physical Location of Machine Learning Models

All entity level machine learning models are stored in the container folder specified in fullpathtotrainingdata in Step 5.

Important

Step 6 task uses the trained models in this folder for entity level predictions.

Therefore, in Step 6 below, the pathtoalgos must be the same as fullpathtotrainingdata in Step 5.
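
A quick consistency check between the two DAG configurations can catch a mismatch before deployment. The sketch below is a hypothetical helper, not part of TML. It compares pathtoalgos with fullpathtotrainingdata, and also compares consumefrom in Step 6 with ml_data_topic in Step 5, as noted in the Step 6 default_args comments.

```python
def check_step5_step6(step5_args, step6_args):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []

    training_path = step5_args["fullpathtotrainingdata"].rstrip("/")
    algo_path = step6_args["pathtoalgos"].rstrip("/")
    if training_path != algo_path:
        problems.append("pathtoalgos '{}' != fullpathtotrainingdata '{}'".format(
            algo_path, training_path))

    if step6_args["consumefrom"] != step5_args["ml_data_topic"]:
        problems.append("consumefrom '{}' != ml_data_topic '{}'".format(
            step6_args["consumefrom"], step5_args["ml_data_topic"]))
    return problems

# Values taken from the Step 5 and Step 6 default_args in this section
step5 = {"fullpathtotrainingdata": "/Viper-ml/viperlogs/iotlogistic", "ml_data_topic": "ml-data"}
step6 = {"pathtoalgos": "/Viper-ml/viperlogs/iotlogistic", "consumefrom": "ml-data"}
print(check_step5_step6(step5, step6))
```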

There are 5 file outputs from STEP 5 stored in the folder fullpathtotrainingdata. For example, for Entity 53, associated with DSN:AC000W020485383, here are the output files:

Filename	Description
StreamConsumer_topicid53.csv	Training dataset
StreamConsumer_topicid53_json_.info	Information about the trained algorithm. This is shown below in Entity 53 Trained Algorithm Information.
StreamConsumer_topicid53_json_predictions.csv	The prediction data using the test data.
StreamConsumer_topicid53_jsonlgt	The ACTUAL algorithm used by Step 6 for predictions. This file is encrypted. This is the MOST important file.
StreamConsumer_topicid53_jsonlgt_.param	Parameter estimates.
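
The naming pattern in the table above can be expressed as a small helper that lists the files to expect for a given entity. This is a hypothetical convenience function inferred from the Entity 53 example only. The "lgt" suffix is taken from a logistic regression model, so the suffix for other algorithm types may differ.

```python
import posixpath

def expected_step5_outputs(folder, topicid, consumerid="StreamConsumer"):
    """Build the five output file paths for one entity, following the table above."""
    base = "{}_topicid{}".format(consumerid, topicid)
    names = [
        base + ".csv",                    # training dataset
        base + "_json_.info",             # information about the trained algorithm
        base + "_json_predictions.csv",   # predictions on the test data
        base + "_jsonlgt",                # the algorithm used by Step 6 (logistic example)
        base + "_jsonlgt_.param",         # parameter estimates
    ]
    return [posixpath.join(folder, name) for name in names]

for path in expected_step5_outputs("/Viper-ml/viperlogs/iotlogistic", 53):
    print(path)
```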

8.5.14. Entity 53 Trained Algorithm Information

The JSON below is the information on the trained algorithm: “Algo”: “StreamConsumer_topicid53_jsonlgt”
Its name is “MachineLearningAlgorithm”: “Logistic Regression”.
The independent variables are in the Fieldnames field.
The training dataset is in the file: /Viper-ml/viperlogs/iotlogistic/StreamConsumer_topicid53.csv

Note that the training dataset is normalized using a min-max scaler. The parameter estimates are in the field: “ParameterEstimates”

{
 "Algokey": "StreamConsumer_topicid53_json",
 "Algo": "StreamConsumer_topicid53_jsonlgt",
 "Forecastaccuracy": 1,
 "DependentVariable": "failure_prob",
 "Filename": "/Viper-ml/viperlogs/iotlogistic/StreamConsumer_topicid53.csv",
 "Fieldnames": "Date,topicid53_Power_preprocessed_AnomProb",
 "TestResultsFile": "/Viper-ml/viperlogs/iotlogistic/StreamConsumer_topicid53_json_predictions.csv",
 "Deployed": 1,
 "DeployedTo": "Local Machine Deploy Folder",
 "Created": "2025-01-19T22:39:58.766388441Z",
 "Fullpathtomodels": "/Viper-ml/viperlogs/iotlogistic",
 "Identifier": "Power~Power-(mW)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Power),value:datapoint.value,ide>
 "AccuracyThreshold": 0.55,
 "Minmax": "27.555:82.016",
 "MachineLearningAlgorithm": "Logistic Regression",
 "ParameterEstimates": "-3.4493501,9.3446499",
 "HasConstantTerm": 1
}

8.5.14.1. How TML Optimizes ML Models and Achieves High Forecast Accuracy

TML uses the binaries Viper and HPDE to optimize ML models for high forecast accuracy. All ML models estimated by Viper and HPDE are applied to data in each sliding time window.

The following describes how TML (Viper/HPDE) optimizes ML models for each sliding time window:

1. TML processes each sliding time window, which can be expanded to increase the size of the training datasets for the ML models.

More training data allows TML to learn the patterns more effectively, BUT because TML does ALL of this processing IN-MEMORY, too large a training dataset will slow down TML processing/ML.

2. TML applies several different algorithms to the streaming data:

Algorithm	Description
Logistic Regression	Performs classification and predicts probabilities
Linear Regression	Performs linear regression using the OLS algorithm
Gradient Boosting	Gradient boosting for non-linear real-time data
Ridge Regression	Ridge Regression for non-linear real-time data
Neural networks	Neural networks for non-linear real-time data

3. TML performs real-time data normalization: all data are put on the same scale, between 0 and 1. This prevents large variables (with large numbers) from dominating small variables (with small numbers, like decimals).

4. TML performs real-time hyperparameter tuning of the algorithms in 2 above. This is IMPORTANT to ensure the algorithms are properly calibrated for the best prediction accuracy (algorithm MAPE).

5. TML performs constant machine learning on the streamed data by constantly trying different algorithms for EVERY sliding time window. This is how TML is able to learn highly complex, NON-LINEAR data in real time. So if the underlying pattern changes in subsequent sliding time windows, these new patterns will be learned by TML immediately.
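
The normalization step described above can be illustrated with a minimal min-max scaling sketch. It shows the general technique only and is not HPDE's implementation. The function name and the sample values are hypothetical.

```python
def minmax_scale(values):
    """Scale a list of numbers to the 0-1 range using min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical streams on very different scales
voltage_mv = [100000.0, 150000.0, 200000.0]   # large numbers
ratio = [0.1, 0.2, 0.3]                       # small decimals

print(minmax_scale(voltage_mv))
print(minmax_scale(ratio))
```

After scaling, both streams span the same 0 to 1 range, so neither dominates the other purely because of its units.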

8.5.15. STEP 6: Entity Based Predictions: tml-system-step-6-kafka-predictions-dag

Tip

Watch the YouTube video to see how this dag is configured. YouTube Video

Note

All prediction data are also written to the “/rawdata/ml” folder in the container.

If you mapped the rawdata folder then you can access these files.

 import maadstml
 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import tsslogging
 import os
 import subprocess
 import random
 import time

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'myname' : 'Sebastian Maurice',   # <<< *** Change as needed
   'enabletls': '1',   # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< *** leave blank
   'producerid' : 'iotsolution',    # <<< *** Change as needed
   'preprocess_data_topic' : 'iot-preprocess', # << *** data for the independent variables - You created this in STEP 2
   'ml_prediction_topic' : 'iot-ml-prediction-results-output', # topic to store the predictions - You created this in STEP 2
   'description' : 'TML solution',    # <<< *** Change as needed
   'companyname' : 'Otics', # <<< *** Change as needed
   'myemail' : 'Your email', # <<< *** Change as needed
   'mylocation' : 'Your location', # <<< *** Change as needed
   'brokerhost' : '', # <<< *** Leave as is
   'brokerport' : '-999', # <<< *** Leave as is
   'streamstojoin' : 'Power_preprocessed_AnomProb', # << ** These are the streams in the preprocess_data_topic for these independent variables
   'inputdata' : '', # << ** You can specify independent variables manually - rather than consuming from the preprocess_data_topic stream
   'consumefrom' : 'ml-data', # << This is ml_data_topic in STEP 5 that contains the estimated parameters
   'mainalgokey' : '', # leave blank
   'offset' : '-1', # << ** input data will start from the end of the preprocess_data_topic and rollback maxrows
   'delay' : '60', # << network delay parameter
   'usedeploy' : '1', # << 1=use algorithms in ./deploy folder, 0=use ./models folder
   'networktimeout' : '6000', # << additional network parameter
   'maxrows' : '50',  # << ** the number of offsets to rollback - For example, if 50, you will get 50 predictions continuously
   'produceridhyperprediction' : '',  # << leave blank
   'consumeridtraininedparams' : '',  # << leave blank
   'groupid' : '',  # << leave blank
   'topicid' : '-1',   # << leave as is
   'pathtoalgos' : '/Viper-ml/viperlogs/iotlogistic', # << this is specified in fullpathtotrainingdata in STEP 5
   'array' : '0', # 0=do not save as array, 1=save as array
   'HPDEADDR' : 'http://' # Do not modify
 }
 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HPDEHOSTPREDICT=''
 HPDEPORTPREDICT=''
 HTTPADDR=""

 # that is a change 2
 # Set Global variable for Viper configuration file - change the folder path for your computer
 viperconfigfile="/Viper-predict/viper.env"

 mainproducerid = default_args['producerid']
 maintopic=default_args['preprocess_data_topic']
 predictiontopic=default_args['ml_prediction_topic']


 def performPrediction():


       # Set personal data
       companyname=default_args['companyname']
       myname=default_args['myname']
       myemail=default_args['myemail']
       mylocation=default_args['mylocation']

       # Enable SSL/TLS communication with Kafka
       enabletls=int(default_args['enabletls'])
       # If brokerhost is empty then this function will use the brokerhost address in your
       # VIPER.ENV in the field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerhost=default_args['brokerhost']
       # If this is -999 then this function uses the port address for Kafka in VIPER.ENV in the
       # field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerport=int(default_args['brokerport'])
       # If you are using a reverse proxy to reach VIPER then you can put it here - otherwise if
       # empty then no reverse proxy is being used
       microserviceid=default_args['microserviceid']

       description=default_args['description']

       # Note these are the same streams or independent variables that are in the machine learning python file
       streamstojoin=default_args['streamstojoin']  #"Voltage_preprocessed_AnomProb,Current_preprocessed_AnomProb"

       #############################################################################################################
       #                                     START HYPER-PREDICTIONS FROM ESTIMATED PARAMETERS
       # Use the topic created from function viperproducetotopicstream for new data for
       # independent variables
       inputdata=default_args['inputdata']

       # Consume from holds the algorithms
       consumefrom=default_args['consumefrom'] #"iot-trained-params-input"

       # if you know the algorithm key put it here - this will speed up the prediction
       mainalgokey=default_args['mainalgokey']
       # Offset=-1 means go to the last offset of hpdetraining_partition
       offset=int(default_args['offset']) #-1
       # wait 60 seconds for Kafka - if exceeded then VIPER will backout
       delay=int(default_args['delay'])
       # use the deployed algorithm - must exist in ./deploy folder
       usedeploy=int(default_args['usedeploy'])
       # Network timeout
       networktimeout=int(default_args['networktimeout'])
       # maxrows - the number of offsets to roll back the stream (the step6maxrows environment variable, if set, overrides it)

       if 'step6maxrows' in os.environ:
         maxrows=int(os.environ['step6maxrows'])
       else:
         maxrows=int(default_args['maxrows'])
       #Start predicting with new data streams
       produceridhyperprediction=default_args['produceridhyperprediction']
       consumeridtraininedparams=default_args['consumeridtraininedparams']
       groupid=default_args['groupid']
       topicid=int(default_args['topicid'])  # -1 to predict for current topicids in the stream

       # Path where the trained algorithms are stored in the machine learning python file
       pathtoalgos=default_args['pathtoalgos'] #'/Viper-tml/viperlogs/iotlogistic'
       array=int(default_args['array'])
       ml_prediction_topic = default_args['ml_prediction_topic']

       result6=maadstml.viperhpdepredict(VIPERTOKEN,VIPERHOST,VIPERPORT,consumefrom,ml_prediction_topic,
                                      companyname,consumeridtraininedparams,
                                      produceridhyperprediction, HPDEHOSTPREDICT,inputdata,maxrows,mainalgokey,
                                      -1,offset,enabletls,delay,HPDEPORTPREDICT,
                                      brokerhost,brokerport,networktimeout,usedeploy,microserviceid,
                                      topicid,maintopic,streamstojoin,array,pathtoalgos)



 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startpredictions(**context):

        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREDICT".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREDICT".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))
        HPDEADDR = default_args['HPDEADDR']

        HPDEHOSTPREDICT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOSTPREDICT".format(sname))
        HPDEPORTPREDICT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORTPREDICT".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
        ti = context['task_instance']
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname),value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_ml_prediction_topic".format(sname),value=default_args['ml_prediction_topic'])
        ti.xcom_push(key="{}_streamstojoin".format(sname),value=default_args['streamstojoin'])
        ti.xcom_push(key="{}_inputdata".format(sname),value=default_args['inputdata'])
        ti.xcom_push(key="{}_consumefrom".format(sname),value=default_args['consumefrom'])
        ti.xcom_push(key="{}_offset".format(sname),value="_{}".format(default_args['offset']))
        ti.xcom_push(key="{}_delay".format(sname),value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_usedeploy".format(sname),value="_{}".format(default_args['usedeploy']))
        ti.xcom_push(key="{}_networktimeout".format(sname),value="_{}".format(default_args['networktimeout']))

        maxrows=default_args['maxrows']
        if 'step6maxrows' in os.environ:
           ti.xcom_push(key="{}_maxrows".format(sname),value="_{}".format(os.environ['step6maxrows']))
           maxrows=os.environ['step6maxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname),value="_{}".format(default_args['maxrows']))
        ti.xcom_push(key="{}_topicid".format(sname),value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_pathtoalgos".format(sname),value=default_args['pathtoalgos'])
        ti.xcom_push(key="{}_HPDEADDR".format(sname), value=HPDEADDR)

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('predict',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-predict", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {}{} {} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],HPDEADDR,HPDEHOSTPREDICT,HPDEPORTPREDICT[1:],maxrows), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          repo=tsslogging.getrepo()
          try:
            tsslogging.tsslogit("Predictions DAG in {}".format(os.path.basename(__file__)), "INFO" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
          except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

          VIPERTOKEN=sys.argv[2]
          VIPERHOST=sys.argv[3]
          VIPERPORT=sys.argv[4]
          HPDEHOSTPREDICT=sys.argv[5]
          HPDEPORTPREDICT=sys.argv[6]
          maxrows =  sys.argv[7]
          default_args['maxrows'] = maxrows

          tsslogging.locallogs("INFO", "STEP 6: Predictions started")
          while True:
           try:
             performPrediction()
             time.sleep(1)
           except Exception as e:
             tsslogging.locallogs("ERROR", "STEP 6: Predictions DAG in {} {}".format(os.path.basename(__file__),e))
             tsslogging.tsslogit("Predictions DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
             tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
             break

Here are the core parameters in the Step 6 DAG above:

Step 6 DAG parameter	Explanation
preprocess_data_topic	This is the topic that contains the data for the independent variables. Note: this is NO different from conventional BATCH machine learning, where you train a model on batch data and then use new values of the independent variables to predict the dependent variable. In the real-time case, the values of the independent variables are streamed through this topic.
ml_prediction_topic	This topic will contain the predictions. The predictions can then be used for visualization in STEP 7.
description	You can provide a description for your solution here.
streamstojoin	This is where you specify the independent variables for your predictions. If you are preprocessing, the “new” preprocessed variables are given a standard naming convention - see Preprocessed Variable Naming Standard for details. For example, if your model uses the preprocessed variables Voltage and Current with AnomProb (see Preprocessing Types), then the preprocessed Voltage and Current streams are named: Voltage_preprocessed_AnomProb, Current_preprocessed_AnomProb.
inputdata	You can also enter the values for the independent variables manually in this variable. Specifically, if you do NOT want to join streams for the independent variables, but want to use different values, enter them here. Note: You can use either streamstojoin or inputdata, not BOTH. The values in inputdata MUST be in the same order as the variables in your model. For example, if your model is y = a + b, then inputdata=a_value,b_value, not inputdata=b_value,a_value, since the estimated coefficients are for a and b, in this precise order.
consumefrom	This is the topic from STEP 5 (ml_data_topic) that contains the trained algorithm with the estimated parameters. You need these estimated parameters for the predictions. This is exactly the same as in conventional machine learning.
mainalgokey	This is the AlgoKey generated by TML. It is a unique key identifying the algorithm for an entity. If you know the key, entering it here speeds up the prediction.
offset	This determines where to start consuming the data from the stream. For example, if offset=-1, then consumption of the data will start from the latest data in the stream variables specified in streamstojoin. The amount of data to consume is determined by the maxrows parameter.
maxrows	This determines the number of offsets to roll back the stream. For example, if maxrows=50 and the last offset is 1000, then Viper will start consuming data from offset 1000-50=950, up to the last offset of 1000.
delay	This is a network delay parameter that accommodates any delays in Kafka.
networktimeout	This variable accounts for any connection latency from Python
usedeploy	When algorithms are trained they are put in the ./models or ./deploy folder. If usedeploy=1, trained algorithms are read from the ./deploy folder; otherwise the algorithms in ./models are used.
topicid	This is an internal parameter that TML uses to keep track of entity ids. Setting this to -1 tells Viper to process individual entities.
pathtoalgos	This is the same path you specified in the key fullpathtotrainingdata in STEP 5. This is the location of the training datasets and algorithms. It is also important if you want to keep track of training datasets for auditing and governance.
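
The rules in the table can be checked with a few small helpers before the DAG is run. This is a sketch only: the helper names are made up for illustration, and clamping the start offset at 0 is an assumption rather than documented Viper behaviour. It mirrors the step6maxrows override in the DAG code, the maxrows rollback arithmetic, the preprocessed stream naming, and the streamstojoin/inputdata exclusivity.

```python
def resolve_maxrows(default_args, environ):
    """Mirror the DAG's rule: the step6maxrows environment variable wins."""
    if "step6maxrows" in environ:
        return int(environ["step6maxrows"])
    return int(default_args["maxrows"])

def rollback_window(last_offset, maxrows):
    """Offsets consumed when offset=-1: the last maxrows offsets.

    Clamping at 0 is an assumption made for this sketch.
    """
    return max(last_offset - maxrows, 0), last_offset

def preprocessed_stream_names(variables, preprocess_type):
    """Build a streamstojoin value such as Voltage_preprocessed_AnomProb."""
    return ",".join("{}_preprocessed_{}".format(v, preprocess_type)
                    for v in variables)

def check_inputs(default_args):
    """streamstojoin and inputdata are alternatives; reject configs with both."""
    if default_args["streamstojoin"] and default_args["inputdata"]:
        raise ValueError("Set either streamstojoin or inputdata, not both")

args = {"maxrows": "50", "streamstojoin": "", "inputdata": ""}
args["streamstojoin"] = preprocessed_stream_names(["Voltage", "Current"], "AnomProb")
check_inputs(args)

print(args["streamstojoin"])
print(rollback_window(1000, resolve_maxrows(args, {})))  # (950, 1000)
```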

8.6. Machine Learning Prediction Sample JSON Output

{
 "Hyperprediction": 0.347,
 "Probability1": 0.347,
 "Probability0": 0.653,
 "Algokey": "StreamConsumer_topicid1370_json",
 "Algo": "StreamConsumer_topicid1370_jsonlgt",
 "Usedeploy": 1,
 "Created": "2022-10-29T18:24:27.5145458-04:00",
 "Inputdata": "0.000,0.000,0.000,122022.000,0.000,0.000",
 "Fieldnames": "Date, topicid1370_Voltage_preprocessed_AnomProb, topicid1370_Current_preprocessed_AnomProb, topicid1370_Power_preprocessed_Trend, topicid1370_Voltage_preprocessed_Avg, topicid1370_Current_preprocessed_Avg,topicid1370_Power_preprocessed_Avg",
 "Topicid": 1370,
 "Fullpathtomodels": "c:/maads/golang/go/bin/viperlogs/iotlogistic/deploy",
 "Identifier": "Power~Power-(mW)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name ( (Power), value:datapoint.value, identifier:metadata.display_name, datetime:datapoint.updated_at,:allrecords, Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=7c54e7d8-7fab-11ec-1a0b-b4bd125d9af1(0);7ce0b024-7fab-11ec-9ac5-3ffbb1c36dfe(0);7ca71d1e-7fab-11ec-223f-87fb225a1c75(0);7cfe6880-7fab-11ec-ea23-17d1132d4605(0);7c7fdd12-7fab-11ec-41f5-50aa3db0fe21(0);7cc487c8-7fab-11ec-408e-149982099613(0)~latlong=46.151241,14.995463~mainuid=AC000W020486693",
 "Islogistic": "1",
 "Compression": "GZIP",
 "Produceto": "iot-ml-prediction-results-output",
 "Kafkacluster": "pkc-6ojv2.us-west4.gcp.confluent.cloud:9092",
 "Minmax": "35.487:104.175,35.144:103.602,0.000:0.000,0.000:0.000,0.000:0.000,0.000:0.000",
 "MachineLearningAlgorithm": "Logistic Regression",
 "ParameterEstimates": "-0.6322068,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000",
 "HasConstantTerm": "1"
}

Tip

Study these fields carefully: they are the inputs for the visualization and for any other downstream analysis.
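
For downstream use, a prediction message can be reduced to the handful of values a dashboard or alerting rule needs. The sketch below assumes the Identifier string keeps the "~"-separated layout of the sample, with latlong= and mainuid= segments. The Identifier in the sketch is deliberately shortened from the sample, and the summarize helper and its output keys are made up for illustration.

```python
import json

# Shortened copy of the sample message; only the fields used below are kept.
raw = """{
  "Hyperprediction": 0.347,
  "Topicid": 1370,
  "Identifier": "Power~Power-(mW)~iot-preprocess~latlong=46.151241,14.995463~mainuid=AC000W020486693"
}"""

def summarize(message):
    """Pull the device id, location and failure probability out of a prediction."""
    # Keep only the key=value segments of the '~'-separated Identifier.
    parts = dict(seg.split("=", 1)
                 for seg in message["Identifier"].split("~") if "=" in seg)
    lat, lon = (float(v) for v in parts["latlong"].split(","))
    return {
        "device": parts["mainuid"],
        "topicid": message["Topicid"],
        "lat": lat,
        "lon": lon,
        "failure_probability": message["Hyperprediction"],
    }

summary = summarize(json.loads(raw))
print(summary)
```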

Here is the table explaining the fields in the prediction JSON.

JSON Field	Description
Hyperprediction	This contains the predicted probability of failure for the device mainuid=AC000W020486693. A value of 0.347 means this device has a 34.7% chance of failure.
Probability1	Probability of Class 1: Failure: 0.347
Probability0	Probability of Class 0: No Failure: 0.653
Algokey	Internal algorithm key identifying this algorithm for this device: StreamConsumer_topicid1370_json. Internal ID 1370 is mapped to device ID AC000W020486693.
Algo	The algorithm used: StreamConsumer_topicid1370_jsonlgt, where lgt means logistic.
Usedeploy	Determines which folder the algorithm is read from: 1 means use the ./deploy folder.
Created	Creation time of this prediction, in local time with its UTC offset: 2022-10-29T18:24:27.5145458-04:00
Inputdata	Input data used in the model: 0.000,0.000,0.000,122022.000,0.000,0.000 - These are the values of the independent variables.
Fieldnames	These are the independent variable streams used in the model: Date, topicid1370_Voltage_preprocessed_AnomProb, topicid1370_Current_preprocessed_AnomProb, topicid1370_Power_preprocessed_Trend, topicid1370_Voltage_preprocessed_Avg, topicid1370_Current_preprocessed_Avg, topicid1370_Power_preprocessed_Avg
Topicid	The topicid associated with this device id: 1370
Fullpathtomodels	This is the full path to trained algorithm: c:/maads/golang/go/bin/viperlogs/iotlogistic/deploy
Identifier	This contains additional information about the JSON criteria used: Power~Power-(mW)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name ( (Power), value:datapoint.value, identifier:metadata.display_name, datetime:datapoint.updated_at,:allrecords, Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=7c54e7d8-7fab-11ec-1a0b-b4bd125d9af1(0);7ce0b024-7fab-11ec-9ac5-3ffbb1c36dfe(0);7ca71d1e-7fab-11ec-223f-87fb225a1c75(0);7cfe6880-7fab-11ec-ea23-17d1132d4605(0);7c7fdd12-7fab-11ec-41f5-50aa3db0fe21(0);7cc487c8-7fab-11ec-408e-149982099613(0)~latlong=46.151241,14.995463~mainuid=AC000W020486693
Islogistic	Indicates a logistic model if the value is 1.
Compression	Compression used in the data storage: GZIP
Produceto	The topic the predictions are produced to: iot-ml-prediction-results-output
Kafkacluster	This is the Kafka cluster used: pkc-6ojv2.us-west4.gcp.confluent.cloud:9092
Minmax	All values of the independent variable streams are transformed using minmax - here are the values for each independent variable (Fieldnames): 35.487:104.175,35.144:103.602,0.000:0.000,0.000:0.000,0.000:0.000,0.000:0.000
MachineLearningAlgorithm	The name of the machine learning algorithm: Logistic Regression
ParameterEstimates	The parameter estimates from the trained model: -0.6322068,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000
HasConstantTerm	Indicates if the model has a constant term: 1 - indicates it does.
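
As a sanity check, the sample's Hyperprediction can be reproduced from its ParameterEstimates with the standard logistic function. The sketch assumes the constant term is the first estimate (HasConstantTerm is 1, and there are seven estimates for six inputs). It also skips the min-max scaling that the Minmax field suggests TML applies, which makes no difference here because every slope estimate is zero.

```python
import math

# Values copied from the sample prediction JSON.
parameter_estimates = "-0.6322068,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000"
inputdata = "0.000,0.000,0.000,122022.000,0.000,0.000"

def logistic_probability(parameter_estimates, inputdata):
    """Probability of class 1 from a logistic model with a leading constant term."""
    coefs = [float(c) for c in parameter_estimates.split(",")]
    inputs = [float(x) for x in inputdata.split(",")]
    constant, slopes = coefs[0], coefs[1:]  # assumption: constant comes first
    if len(slopes) != len(inputs):
        raise ValueError("expected one slope per input value")
    logit = constant + sum(b * x for b, x in zip(slopes, inputs))
    return 1.0 / (1.0 + math.exp(-logit))

p1 = logistic_probability(parameter_estimates, inputdata)
print(round(p1, 3), round(1 - p1, 3))  # 0.347 0.653
```

The two printed values match Probability1 and Probability0 in the sample.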

8.6.1. STEP 7: Real-Time Visualization: tml-system-step-7-kafka-visualization-dag

Fields to visualize can be determined from Preprocessed Sample JSON Output, Machine Learning Prediction Sample JSON Output, and Machine Learning Trained Model Sample JSON Output.

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import subprocess
 import tsslogging
 import os
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'topic' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma - Viperviz will stream data from these topics to your browser
   'dashboardhtml': 'dashboard.html', # <<< *** name of your dashboard file: This one is ONLY for preprocessing
   'dashboardhtml-ml': 'dashboard-ml.html', # <<< *** This one is IF you include ML dag
   'topic-ml' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma
   'dashboardhtml-ai': 'dashboard-ai.html', # <<< *** This one is IF you include AI dag
   'topic-ai' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma
   'dashboardhtml-ml-ai': 'dashboard-ml-ai.html', # <<< *** This one is IF you include ML-AI dag
   'topic-ml-ai' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma
   'secure': '1',   # <<< *** 1=connection is encrypted, 0=no encryption
   'offset' : '-1',    # <<< *** -1 indicates to read from the last offset always
   'append' : '0',   # << ** Do not append new data in the browser
   'rollbackoffset' : '400', # *************** Rollback the data stream by rollbackoffset.  For example, if 500, then Viperviz will grab all of the data from the last offset - 500
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 def windowname(wtype,vipervizport,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "viperviz-{}-{}-{}={}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/vipervizwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{},{}\n".format(wn,vipervizport))

     return wn

 def startstreamingengine(**context):
         repo=tsslogging.getrepo()
         tsslogging.locallogs("INFO", "STEP 7: Visualization started")
         try:
           tsslogging.tsslogit("Visualization DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         sd = context['dag'].dag_id
         sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
         chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
         vipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERVIZPORT".format(sname))
         solutionvipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONVIPERVIZPORT".format(sname))
         tss = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_TSS".format(sname))

         if '_ml_ai_' in sd:
           topic = default_args['topic-ml-ai']
           dashboardhtml = default_args['dashboardhtml-ml-ai']
         elif '_ai_' in sd:
           topic = default_args['topic-ai']
           dashboardhtml = default_args['dashboardhtml-ai']
         elif '_ml_' in sd:
           topic = default_args['topic-ml']
           dashboardhtml = default_args['dashboardhtml-ml']
         else:
           topic = default_args['topic']
           dashboardhtml = default_args['dashboardhtml']

         secure = default_args['secure']
         offset = default_args['offset']
         append = default_args['append']
         rollbackoffset = default_args['rollbackoffset']

         ti = context['task_instance']
         ti.xcom_push(key="{}_topic".format(sname),value="{}".format(topic))
         ti.xcom_push(key="{}_dashboardhtml".format(sname),value="{}".format(dashboardhtml))
         ti.xcom_push(key="{}_secure".format(sname),value="_{}".format(secure))
         ti.xcom_push(key="{}_offset".format(sname),value="_{}".format(offset))
         ti.xcom_push(key="{}_append".format(sname),value="_{}".format(append))
         ti.xcom_push(key="{}_chip".format(sname),value=chip)
         ti.xcom_push(key="{}_rollbackoffset".format(sname),value="_{}".format(rollbackoffset))

         # start the viperviz on Vipervizport
         # STEP 5: START Visualization Viperviz
         vizgood=0
         for i in range(5):
           wn = windowname('visual',vipervizport,sname,sd)
           subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
           subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viperviz", "ENTER"])
           mainport=0
           if tss[1:] == "1":
             subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "/Viperviz/viperviz-linux-{} 0.0.0.0 {}".format(chip,vipervizport[1:]), "ENTER"])
             mainport=int(vipervizport[1:])
           else:
             subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "/Viperviz/viperviz-linux-{} 0.0.0.0 {}".format(chip,solutionvipervizport[1:]), "ENTER"])
             mainport=int(solutionvipervizport[1:])

           time.sleep(5)
           if tsslogging.testvizconnection(mainport)==1:
             tsslogging.locallogs("INFO", "STEP 7: /Viperviz/viperviz-linux-{} 0.0.0.0 {}".format(chip,mainport))
             vizgood=1
             break
           else:
              if i < 4:
                subprocess.call(["tmux", "kill-window", "-t", "{}".format(wn)])
                # $(...) is only expanded by a shell, so run the kill through one
                subprocess.call("kill -9 $(lsof -i:{} -t)".format(mainport), shell=True)
              tsslogging.locallogs("WARN", "STEP 7: Cannot make a connection to Viperviz on port {}.  Going to try again...".format(mainport))


         if vizgood==0:
           tsslogging.locallogs("ERROR", "STEP 7: Network issue.  Cannot make a connection to Viperviz on port {}".format(mainport))

8.7. Visualization DAG Parameter Explanation

DAG Parameter	Explanation
topic	This is the topic that Viperviz will consume from. For example, Viperviz will automatically connect to the topic iot-preprocess and start streaming to your browser. To consume from multiple topics, separate them with commas: iot-preprocess,iot-preprocess2,iot-preprocess3
topic-ml	Depending on the TML Solution Template you are using, you can specify different topics for the appropriate solution. topic-ml is for any solution template that is ML related or has “_ml_” in the solution name. This gives users flexibility in using different dashboards for different solutions.
topic-ai	Depending on the TML Solution Template you are using, you can specify different topics for the appropriate solution. topic-ai is for any solution template that is AI related or has “_ai_” in the solution name.
topic-ml-ai	Depending on the TML Solution Template you are using, you can specify different topics for the appropriate solution. topic-ml-ai is for any solution template that is ML and AI related or has “_ml_ai_” in the solution name.
dashboardhtml	This dashboard will use the topics in the topic field.
dashboardhtml-ml	This dashboard will use the topics in the topic-ml field.
dashboardhtml-ai	This dashboard will use the topics in the topic-ai field.
dashboardhtml-ml-ai	This dashboard will use the topics in the topic-ml-ai field.
secure	If set to 1, the connection is TLS secured; if 0, it is not.
vipervizport	This is the port you want the Viperviz binary to listen on. For example, if 9005, Viperviz will listen on Port 9005
offset	Indicate where in the stream to consume from. If -1, latest data is consumed.
append	If 0, data will not accumulate in your dashboard, if 1 it will accumulate.
chip	Viperviz can run on Windows/Mac/Linux. Use ‘amd64’ for Windows/Linux, use ‘arm64’ for Mac/Linux
rollbackoffset	This indicates the number of offsets to roll back from the latest offset (the end of the stream). If 500, Viperviz will grab all of the data starting from the last offset - 500.
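
The way the topic and dashboard pairs are chosen can be sketched as a small lookup that follows the if/elif chain in the DAG code. The order of the checks matters: a name containing “_ml_ai_” also contains “_ml_” and “_ai_”, so the most specific marker has to be tested first. The DAG ids below are hypothetical, and select_dashboard is an illustrative helper, not part of TML.

```python
default_args = {
    "topic": "iot-preprocess,iot-preprocess2",
    "dashboardhtml": "dashboard.html",
    "topic-ml": "iot-preprocess,iot-preprocess2",
    "dashboardhtml-ml": "dashboard-ml.html",
    "topic-ai": "iot-preprocess,iot-preprocess2",
    "dashboardhtml-ai": "dashboard-ai.html",
    "topic-ml-ai": "iot-preprocess,iot-preprocess2",
    "dashboardhtml-ml-ai": "dashboard-ml-ai.html",
}

# Most specific marker first, matching the order of the if/elif chain.
MARKERS = (("_ml_ai_", "-ml-ai"), ("_ai_", "-ai"), ("_ml_", "-ml"))

def solution_suffix(dag_id):
    """Return the default_args key suffix that applies to this DAG id."""
    for marker, suffix in MARKERS:
        if marker in dag_id:
            return suffix
    return ""

def select_dashboard(dag_id, args):
    """Return (list of topics, dashboard file) for a DAG id."""
    suffix = solution_suffix(dag_id)
    topics = [t.strip() for t in args["topic" + suffix].split(",")]
    return topics, args["dashboardhtml" + suffix]

# Hypothetical DAG ids, for illustration only.
for dag_id in ("solution_preprocessing_dag",
               "solution_preprocessing_ml_dag",
               "solution_preprocessing_ml_ai_dag"):
    print(dag_id, select_dashboard(dag_id, default_args))
```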

8.7.1. STEP 8: Deploy TML Solution to Docker : tml-system-step-8-deploy-solution-to-docker-dag

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
from datetime import datetime
from airflow.decorators import dag, task
import os
import subprocess
import tsslogging
import git
import time
import sys

sys.dont_write_bytecode = True

############################################################### DO NOT MODIFY BELOW ####################################################

def doparse(fname,farr):
      data = ''
      with open(fname, 'r', encoding='utf-8') as file:
        data = file.readlines()
        r=0
        for d in data:
            for f in farr:
                fs = f.split(";")
                if fs[0] in d:
                    data[r] = d.replace(fs[0],fs[1])
            r += 1
      with open(fname, 'w', encoding='utf-8') as file:
        file.writelines(data)

def dockerit(**context):
     if 'tssbuild' in os.environ:
        if os.environ['tssbuild']=="1":
            return
     try:

       sd = context['dag'].dag_id
       sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
       pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

       repo=tsslogging.getrepo()
       tsslogging.tsslogit("Docker DAG in {}".format(os.path.basename(__file__)), "INFO" )
       tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

       chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
       cname = os.environ['DOCKERUSERNAME']  + "/{}-{}".format(sname,chip)

       print("Containername=",cname)
       tsslogging.locallogs("INFO", "STEP 8: Starting docker push for: {}".format(cname))
       if os.environ['TSS'] == "1":
         try:
           f = open("/tmux/cname.txt", "w")
           f.write(cname)
           f.close()
         except Exception as e:
           pass

       ti = context['task_instance']
       ti.xcom_push(key="{}_containername".format(sname),value=cname)
       ti.xcom_push(key="{}_solution_dag_to_trigger".format(sname), value=sd)

       scid = tsslogging.getrepo('/tmux/cidname.txt')
       cid = scid # cid added

       key = "trigger-{}".format(sname)
       os.environ[key] = sd
       if os.environ['TSS'] == "1" and len(cid) > 1:
         print("[INFO] docker commit {} {}".format(cid,cname))
         subprocess.call("docker rmi -f $(docker images --filter 'dangling=true' -q --no-trunc)", shell=True)
         cbuf="docker commit {} {}".format(cid,cname)
         v=subprocess.call("docker commit {} {}".format(cid,cname), shell=True)

         status=tsslogging.optimizecontainer(cname,sname,sd)
         if status=="":
           tsslogging.locallogs("WARN", "STEP 8: There seems to be an issue optimizing the container.  Here is the commit command: {} - message={}.  Container may NOT be pushed.".format(cbuf,v))
         else:
           tsslogging.locallogs("INFO", "STEP 8: Docker Container created and optimized.  Will push it now.  Here is the commit command: {} - message={}".format(cbuf,v))

         #v=subprocess.call("docker push {}".format(cname), shell=True)
         proc=subprocess.Popen("docker push {}".format(cname), shell=True)
         time.sleep(3)
         proc.terminate()
         proc.wait()

       elif len(cid) <= 1:
              tsslogging.locallogs("ERROR", "STEP 8: There seems to be an issue with docker commit. Here is the command: docker commit {} {}".format(cid,cname))
              tsslogging.tsslogit("Deploying to Docker in {}".format(os.path.basename(__file__)), "ERROR" )
              tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

       os.environ['tssbuild']="1"

       doparse("/{}/tml-airflow/dags/tml-solutions/{}/docker_run_stop-{}.py".format(repo,pname,pname), ["--solution-name--;{}".format(sname)])
       doparse("/{}/tml-airflow/dags/tml-solutions/{}/docker_run_stop-{}.py".format(repo,pname,pname), ["--solution-dag--;{}".format(sd)])

     except Exception as e:
        print("[ERROR] Step 8: ",e)
        tsslogging.locallogs("ERROR", "STEP 8: Deploying to Docker in {}: {}".format(os.path.basename(__file__),e))
        tsslogging.tsslogit("Deploying to Docker in {}: {}".format(os.path.basename(__file__),e), "ERROR" )
        tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
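
Two details of the Step 8 DAG are easy to miss: the container name is built as DOCKERUSERNAME/solutionname-chip, and doparse rewrites "placeholder;replacement" pairs in the generated docker_run_stop script. The sketch below shows both on a temporary file. replace_placeholders is a condensed stand-in for doparse, not the original, and the user name, solution name, and file name are made up for the example.

```python
import os
import tempfile

def container_name(docker_username, solution_name, chip):
    """Same pattern the DAG uses: DOCKERUSERNAME/solutionname-chip."""
    return "{}/{}-{}".format(docker_username, solution_name, chip)

def replace_placeholders(path, rules):
    """Apply 'placeholder;replacement' rules to every line of a file."""
    with open(path, "r", encoding="utf-8") as fh:
        lines = fh.readlines()
    for rule in rules:
        placeholder, replacement = rule.split(";", 1)
        lines = [line.replace(placeholder, replacement) for line in lines]
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(lines)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the generated docker_run_stop script.
    script = os.path.join(tmp, "docker_run_stop-demo.py")
    with open(script, "w", encoding="utf-8") as fh:
        fh.write("solution = '--solution-name--'\ndag = '--solution-dag--'\n")

    replace_placeholders(script, ["--solution-name--;mysolution",
                                  "--solution-dag--;mysolution_dag"])
    with open(script, encoding="utf-8") as fh:
        result = fh.read()

name = container_name("mydockeruser", "mysolution", "amd64")
print(name)    # mydockeruser/mysolution-amd64
print(result)
```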

8.7.2. STEP 9: PrivateGPT and Qdrant Integration: tml-system-step-9-privategpt_qdrant-dag

Tip

Watch the YouTube video to learn how to configure the key parameters in the Step 9 DAG.

Also, it is advisable to pull the PrivateGPT containers before running Step 9.
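
The Step 9 DAG keeps its settings in `default_args`, and many of them can be overridden at run time through environment variables named `step9<parameter>` (for example `step9prompt` or `step9rollbackoffset`); an empty variable leaves the default in place. The following is a minimal sketch of that override pattern, not the DAG's actual code: `apply_step9_overrides` is a hypothetical helper, and it takes the environment as an argument so it can be tried without touching `os.environ`.

```python
import os

def apply_step9_overrides(defaults, env=None, prefix="step9"):
    """Return a copy of defaults with non-empty <prefix><key> values applied.

    Hypothetical helper for illustration only; the DAG itself checks
    each variable with explicit if-statements.
    """
    env = os.environ if env is None else env
    merged = dict(defaults)
    for key in defaults:
        value = env.get(prefix + key, "")
        if value != "":  # an empty or missing variable keeps the default
            merged[key] = value
    return merged

# Example values only; these are not the DAG's full parameter set.
defaults = {"prompt": "Are there any errors in the logs?",
            "rollbackoffset": "5",
            "temperature": "0.1"}
env = {"step9prompt": "List failed logins.", "step9temperature": ""}
print(apply_step9_overrides(defaults, env))
```

Not every parameter follows the prefix convention: in the DAG, `CUDA_VISIBLE_DEVICES` is read from the environment under its own name, and some parameters have no override at all, so treat the sketch as a pattern rather than a complete list.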

 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import os
 import tsslogging
 import sys
 import time
 import maadstml
 import subprocess
 import random
 import json
 import threading
 import re
 from binaryornot.check import is_binary
 docidstrarr = []

 sys.dont_write_bytecode = True

 ######################################################USER CHOSEN PARAMETERS ###########################################################
 default_args = {
  'owner': 'Sebastian Maurice',   # <<< *** Change as needed
  'pgptcontainername' : 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2', #'maadsdocker/tml-privategpt-no-gpu-amd64',  # enter a valid container https://hub.docker.com/r/maadsdocker/tml-privategpt-no-gpu-amd64
  'rollbackoffset' : '5',  # <<< *** Change as needed
  'offset' : '-1', # leave as is
  'enabletls' : '1', # change as needed
  'brokerhost' : '', # <<< *** Leave as is
  'brokerport' : '-999', # <<< *** Leave as is
  'microserviceid' : '',  # change as needed
  'topicid' : '-999', # leave as is
  'delay' : '100', # change as needed
  'companyname' : 'otics',  # <<< *** Change as needed
  'consumerid' : 'streamtopic',  # <<< *** Leave as is
  'consumefrom' : 'cisco-network-preprocess',    # <<< *** Change as needed
  'pgpt_data_topic' : 'cisco-network-privategpt',
  'producerid' : 'private-gpt',   # <<< *** Leave as is
  'identifier' : 'This is analysing TML output with privategpt',
  'pgpthost': 'http://127.0.0.1', # PrivateGPT container listening on this host
  'pgptport' : '8001', # PrivateGPT listening on this port
  'preprocesstype' : '', # Leave as is
  'partition' : '-1', # Leave as is
  'prompt': '[INST] Are there any errors in the logs? Give a detailed response including IP addresses and host machines.[/INST]', # Enter your prompt here
  'context' : 'This is network data from inbound and outbound packets. The data are \
 anomaly probabilities for cyber threats from analysis of inbound and outbound packets. If inbound or outbound \
 anomaly probabilities are less than 0.60, it is likely the risk of a cyber attack is also low. If it is above 0.60, then risk is mid to high.', # what is this data about? Provide context to PrivateGPT
  'jsonkeytogather' : 'hyperprediction', # enter key you want to gather data from to analyse with PrivateGpt i.e. Identifier or hyperprediction
  'keyattribute' : 'inboundpackets,outboundpackets', # change as needed
  'keyprocesstype' : 'anomprob',  # change as needed
  'hyperbatch' : '0', # Set to 1 to batch all of the hyperpredictions and send them to PrivateGPT together; set to 0 to send them one by one
  'vectordbcollectionname' : 'tml-llm-model-v2', # change as needed
  'concurrency' : '2', # change as needed Leave at 1
  'CUDA_VISIBLE_DEVICES' : '0', # change as needed
  'docfolder': 'mylogs,mylogs2',  # You can specify the sub-folder that contains TEXT or PDF files..this is a subfolder in the MAIN folder mapped to /rawdata
                    # if this field is NON-EMPTY, privateGPT will query these documents as the CONTEXT to answer your prompt
                    # separate multiple folders with a comma
  'docfolderingestinterval': '900', # how often you want TML to RE-LOAD the files in docfolder - enter the number of SECONDS, if 0 they are read ONCE
  'useidentifierinprompt': '1', # If 1, this uses the identifier in the TML json output and appends it to prompt, If 0, it uses the prompt only
  'searchterms': '192.168.--identifier--,authentication failure',
  'temperature' : '0.1', # Ranges between 0 and 1 and controls how conservative the LLM is: near 0 it is very conservative, near 1 it is more likely to hallucinate
  'vectorsearchtype' : 'Manhattan', # this is for the Qdrant Search algorithm.  it can be: Cosine, Euclid, Dot, or Manhattan
  'streamall': '1',
  'contextwindowsize': '8192', # Size of the context window.  This controls the number of tokens to process by LLM model
  'vectordimension': '768',
  'mitrejson': '/rawdata/mitre.json'
 }

 ############################################################### DO NOT MODIFY BELOW ####################################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""
 maintopic =  default_args['consumefrom']
 mainproducerid = default_args['producerid']
 GPTONLINE=0

 def checkresponse(response,ident):
     global GPTONLINE
     st="false"

     if "ERROR:" in response:
          return response,st,""

     GPTONLINE=1

     response = response.replace("null","-1").replace("\\n","").replace("\n","")
     r1=json.loads(response)
     c1=r1['choices'][0]['message']['content']
     c1=c1.replace('"','\\"').replace("'","\'").replace("\\n"," ").replace("&","and")
     c1 = re.sub(' +', ' ', c1)
     if '=' in c1 and ('Answer:' in c1 or 'A:' in c1):
       r1['choices'][0]['message']['content'] = "The analysis of the document(s) did not find a proper result."
       response = json.dumps(r1)
       return response,st,c1.strip()

     if default_args['searchterms'] != '':
           starr = default_args['searchterms'].split(",")
           for t in starr:
               if '--identifier--' in t:
                   t = t.replace("--identifier--",ident)
               if t in  c1:
                 st="true"
                 break

     return response,st,c1.strip()

 def stopcontainers():
    pgptcontainername = default_args['pgptcontainername']
    cfound=0
    subprocess.call("docker image ls > gptfiles.txt", shell=True)
    with open('gptfiles.txt', 'r', encoding='utf-8') as file:
         data = file.readlines()
         r=0
         for d in data:
           darr = d.split(" ")
           if '-privategpt-' in darr[0]:
             buf="docker stop $(docker ps -q --filter ancestor={} )".format(darr[0])
             if pgptcontainername in darr[0]:
                 cfound=1
             print(buf)
             subprocess.call(buf, shell=True)
    if cfound==0:
       print("INFO STEP 9: PrivateGPT container {} not found.  It may need to be pulled.".format(pgptcontainername))
       tsslogging.locallogs("WARN", "STEP 9: PrivateGPT container not found. It may need to be pulled if it does not start: docker pull {}".format(pgptcontainername))

 def llmattrs(pgptcontainername):
   if '-deepseek-medium' in pgptcontainername:
      return "DeepSeek-R1-Distill-Llama-8B-Q5_K_M.gguf","BAAI/bge-base-en-v1.5"
   elif pgptcontainername=='maadsdocker/tml-privategpt-with-gpu-nvidia-amd64':
      return "TheBloke/Mistral-7B-Instruct-v0.1-GGUF","BAAI/bge-small-en-v1.5"
   elif 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2' == pgptcontainername:
      return "mistralai/Mistral-7B-Instruct-v0.2","BAAI/bge-small-en-v1.5"
   elif 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v3' == pgptcontainername:
      return "mistralai/Mistral-7B-Instruct-v0.3","BAAI/bge-base-en-v1.5"
   elif 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v3-large' == pgptcontainername:
      return "mistralai/Mistral-7B-Instruct-v0.3","BAAI/bge-m3"

   return "",""

 def startpgptcontainer():
       print("Starting PGPT container: {}".format(default_args['pgptcontainername']))
       collection = default_args['vectordbcollectionname']
       concurrency = default_args['concurrency']
       pgptcontainername = default_args['pgptcontainername']
       pgptport = int(default_args['pgptport'])
       cuda = int(default_args['CUDA_VISIBLE_DEVICES'])
       temperature = default_args['temperature']  # named to match the variable used in the docker run commands below
       vectorsearchtype = default_args['vectorsearchtype']
       cw = default_args['contextwindowsize']
       vectordimension=default_args['vectordimension']

       stopcontainers()
       time.sleep(10)
       mainmodel,mainembedding="",""  # defaults so the return below also works for no-GPU containers
       if '-no-gpu-' in pgptcontainername:
           buf = "docker run -d -p {}:{} --net=host --env PORT={} --env GPU=0 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env temperature={} --env vectorsearchtype=\"{}\" {}".format(pgptport,pgptport,pgptport,collection,concurrency,cuda,temperature,vectorsearchtype,pgptcontainername)
       else:
         mainmodel,mainembedding=llmattrs(pgptcontainername)
         if os.environ['TSS'] == "1":
           buf = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z --env PORT={} --env TSS=1 --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env vectorsearchtype=\"{}\" --env contextwindowsize={} --env vectordimension={} --env mainmodel=\"{}\" --env mainembedding=\"{}\" {}".format(pgptport,pgptport,pgptport,collection,concurrency,cuda,temperature,vectorsearchtype,cw,vectordimension,mainmodel,mainembedding,pgptcontainername)
         else:
           buf = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z --env PORT={} --env TSS=0 --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env vectorsearchtype=\"{}\" --env contextwindowsize={} --env vectordimension={}  --env mainmodel=\"{}\" --env mainembedding=\"{}\" {}".format(pgptport,pgptport,pgptport,collection,concurrency,cuda,temperature,vectorsearchtype,cw,vectordimension,mainmodel,mainembedding,pgptcontainername)

       v=subprocess.call(buf, shell=True)
       print("INFO STEP 9: PrivateGPT container.  Here is the run command: {}, v={}".format(buf,v))
       tsslogging.locallogs("INFO", "STEP 9: PrivateGPT container.  Here is the run command: {}, v={}".format(buf,v))

       return v,buf,mainmodel,mainembedding

 def qdrantcontainer():
     v=0
     buf=""
     buf="docker stop $(docker ps -q --filter ancestor=qdrant/qdrant )"
     subprocess.call(buf, shell=True)
     time.sleep(4)
     if os.environ['TSS'] == "1":
       buf = "docker run -d -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant"
     else:
        buf = "docker run -d --network=bridge -v /var/run/docker.sock:/var/run/docker.sock:z -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant"

     v=subprocess.call(buf, shell=True)
     print("INFO STEP 9: Qdrant container.  Here is the run command: {}, v={}".format(buf,v))

     tsslogging.locallogs("INFO", "STEP 9: Qdrant container.  Here is the run command: {}, v={}".format(buf,v))

     return v,buf

 def pgptchat(prompt,context,docfilter,port,includesources,ip,endpoint):
   prompt=prompt.replace("&","and")

   print("Pgptchat=",prompt)
   response=maadstml.pgptchat(prompt,context,docfilter,port,includesources,ip,endpoint)
   return response

 def producegpttokafka(value,maintopic):
      inputbuf=value
      topicid=int(default_args['topicid'])
      producerid=default_args['producerid']
      identifier = default_args['identifier']

      # Maximum delay, in milliseconds, for VIPER to wait for Kafka to confirm the message was received and written to the topic
      delay=default_args['delay']
      enabletls=default_args['enabletls']

      try:
         result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,'',
                                             topicid,identifier)
         print(result)
      except Exception as e:
         print("ERROR:",e)

 def consumetopicdata():
       maintopic = default_args['consumefrom']
       rollbackoffsets = int(default_args['rollbackoffset'])
       enabletls = int(default_args['enabletls'])
       consumerid=default_args['consumerid']
       companyname=default_args['companyname']
       offset = int(default_args['offset'])
       brokerhost = default_args['brokerhost']
       brokerport = int(default_args['brokerport'])
       microserviceid = default_args['microserviceid']
       topicid = default_args['topicid']
       preprocesstype = default_args['preprocesstype']
       delay = int(default_args['delay'])
       partition = int(default_args['partition'])

       result=maadstml.viperconsumefromtopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,
                   consumerid,companyname,partition,enabletls,delay,
                   offset, brokerhost,brokerport,microserviceid,
                   topicid,rollbackoffsets,preprocesstype)

       return result

 def writetortmslogfile(fname,jsonbuf):
        print("fname=",fname)
        print("jsonbuf=",jsonbuf)
        try:
          f = open(fname, "w")
          f.write(jsonbuf +"\n")
          f.close()
        except Exception as e:
          pass

 def getsearchtext(res,context,prompt):
    privategptmessage = []
    messages = ""
    mainmessages=""
    cw = int(default_args['contextwindowsize'])

    for r in res['StreamTopicDetails']['TopicReads']:
       fname=r['Filename']
       messages=""
       for d in r['SearchTextFound']:
         messages = messages + str(d[15:].strip()) + ". "
         if len(messages) > cw:
           messages = messages[0:cw-1]
           break


       mainmessages = "{}. Here are the messages: {}. {}".format(context,messages,prompt)
       privategptmessage.append([mainmessages,"SearchTextFound",fname,json.dumps(r)])

    return privategptmessage

 def gatherdataforprivategpt(result):

    privategptmessage = []
    if 'step9prompt' in os.environ:
       if os.environ['step9prompt'] != '':
         prompt = os.environ['step9prompt']
         prompt=prompt.replace("&","and")
         default_args['prompt'] = prompt
       else:
        prompt = default_args['prompt']
        prompt=prompt.replace("&","and")
    else:
       prompt = default_args['prompt']
       prompt=prompt.replace("&","and")

    if 'step9context' in os.environ:
       if os.environ['step9context'] != '':
         context = os.environ['step9context']
         context=context.replace("&","and")
         default_args['context'] = context
       else:
         context = default_args['context']
         context=context.replace("&","and")
    else:
      context = default_args['context']
      context=context.replace("&","and")

    jsonkeytogather = default_args['jsonkeytogather']
    if default_args['docfolder'] != '':
        context = ''
        if default_args['useidentifierinprompt'] == "1":
           jsonkeytogather = "Identifier"

    if 'step9keyattribute' in os.environ:
      if os.environ['step9keyattribute'] != '':
        attribute = os.environ['step9keyattribute']
        default_args['keyattribute'] = attribute
      else:
        attribute = default_args['keyattribute']
    else:
     attribute = default_args['keyattribute']

    if 'step9keyprocesstype' in os.environ:
      if os.environ['step9keyprocesstype'] != '':
         processtype = os.environ['step9keyprocesstype']
         default_args['keyprocesstype'] = processtype
      else:
        processtype = default_args['keyprocesstype']
    else:
      processtype = default_args['keyprocesstype']

    if 'step9hyperbatch' in os.environ:
      if os.environ['step9hyperbatch'] != '':
         hyperbatch = os.environ['step9hyperbatch']
         default_args['hyperbatch'] = hyperbatch
      else:
        hyperbatch = default_args['hyperbatch']
    else:
      hyperbatch = default_args['hyperbatch']

    try:
      res=json.loads(result,strict=False)  # strict must be the boolean False; the string 'False' is truthy
    except Exception as e:
      print("Error=",e)
      tsslogging.tsslogit("PrivateGPT DAG could not parse the consumed JSON in {} {}".format(os.path.basename(__file__),e), "ERROR" )
      tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
      return


    message = ""
    found=0

    if jsonkeytogather == '':
      tsslogging.tsslogit("PrivateGPT DAG jsonkeytogather is empty in {}".format(os.path.basename(__file__)), "ERROR" )
      tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
      return

    if jsonkeytogather.lower()=="searchtextfound":
      privategptmessage=getsearchtext(res,context,prompt)
      return privategptmessage

    for r in res['StreamTopicDetails']['TopicReads']:
        if jsonkeytogather == 'Identifier' or jsonkeytogather == 'identifier':
          identarr=r['Identifier'].split("~")
          try:
            attribute = attribute.lower()
            aar = attribute.split(",")
            isin=any(x in r['Identifier'].lower() for x in aar)
            if isin:
              found=0
              for d in r['RawData']:
                 found=1
                 message = message  + str(d) + ', '
              if found:
                if context != '':
                   message = "{}.  Data: {}. {}".format(context,message,prompt)
                elif '--identifier--' in prompt:
                   prompt2 = prompt.replace('--identifier--',identarr[0])
                   message = "{}".format(prompt2)
                else:
                  message = "{}".format(prompt)
                privategptmessage.append([message,identarr[0]])
              message = ""
          except Exception as e:
            tsslogging.tsslogit("PrivateGPT DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
        else:
          isin1 = False
          isin2 = False
          found=0
          message = ""
          identarr=r['Identifier'].split("~")
          if processtype != '' and attribute != '':
            processtype = processtype.lower()
            ptypearr = processtype.split(",")
            isin1=any(x in r['Preprocesstype'].lower() for x in ptypearr)

            attribute = attribute.lower()
            aar = attribute.split(",")
            isin2=any(x in r['Identifier'].lower() for x in aar)

            if isin1 and isin2:
              buf = r[jsonkeytogather]
              if buf != '':
                found=1
                message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '
          elif processtype != '' and attribute == '':
            processtype = processtype.lower()
            ptypearr = processtype.split(",")
            isin1=any(x in r['Preprocesstype'].lower() for x in ptypearr)
            if isin1:
              buf = r[jsonkeytogather]
              if buf != '':
                found=1
                message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '
          elif processtype == '' and attribute != '':
            attribute = attribute.lower()
            aar = attribute.split(",")
            isin2=any(x in r['Identifier'].lower() for x in aar)
            if isin2:
              buf = r[jsonkeytogather]
              if buf != '':
                found=1
                message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '
          else:
            buf = r[jsonkeytogather]
            if buf != '':
              found=1
              message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '

          if found and hyperbatch=="0":
               if '--identifier--' in prompt:
                   prompt2 = prompt.replace('--identifier--',identarr[0])
                   message = "{}.  Data: {}.  {}".format(context,message,prompt2)
               else:
                   message = "{}.  Data: {}.  {}".format(context,message,prompt)
               privategptmessage.append([message,identarr[0]])


    if jsonkeytogather != 'Identifier' and found and hyperbatch=="1":
      message = "{}.  Data: {}.  {}".format(context,message,prompt)
      privategptmessage.append(message)


    return privategptmessage

 def startdirread():
   global GPTONLINE
   print("INFO startdirread")
   try:
     t = threading.Thread(name='child procs', target=ingestfiles)
     t.start()
   except Exception as e:
     print(e)

 def deleteembeddings(docids):
   pgptendpoint="/v1/ingest/"
   pgptip = default_args['pgpthost']
   pgptport = default_args['pgptport']
   maadstml.pgptdeleteembeddings(docids,pgptip,pgptport,pgptendpoint)


 def getingested(docname):
   pgptendpoint="/v1/ingest/list"
   pgptip = default_args['pgpthost']
   pgptport = default_args['pgptport']
   docids,docstr,docidsstr=maadstml.pgptgetingestedembeddings(docname,pgptip,pgptport,pgptendpoint)
   return docids,docstr,docidsstr

 def ingestfiles():
     global docidstrarr, GPTONLINE
     pgptendpoint="/v1/ingest"
     docidstrarr = []
     basefolder='/rawdata/'
     pgptip = default_args['pgpthost']
     pgptport = default_args['pgptport']
     buf = default_args['docfolder']

     bufarr=buf.split(",")
     while True:
      if GPTONLINE:
       docidstrarr = []
       for dirp in bufarr:
         # lock the directory
         dirp = basefolder + dirp
         if os.path.exists(dirp):
           with tsslogging.LockDirectory(dirp) as lock:
             newfd = os.dup(lock.dir_fd)
             files = [ os.path.join(dirp,f) for f in os.listdir(dirp) if os.path.isfile(os.path.join(dirp,f)) ]
             for mf in files:
                docids,docstr,docidstr=getingested(mf)
                deleteembeddings(docids)
                print("INFO Ingestfiles:",mf)

                if is_binary(mf):
                  maadstml.pgptingestdocs(mf,'binary',pgptip,pgptport,pgptendpoint)
                else:
                  try:
                     maadstml.pgptingestdocs(mf,'text',pgptip,pgptport,pgptendpoint)
                  except Exception as e:
                      print("ERROR:",e)

                docids,docstr,docidstr=getingested(mf)
                if len(docidstr) >=1:
                  docidstrarr.append(docidstr[0])

         else:
           print("WARN Directory Path: {} does not exist".format(dirp))
       if int(default_args['docfolderingestinterval'])==0:
         break
       time.sleep(int(default_args['docfolderingestinterval']))
       print("docidsstr=",docidstrarr)
      time.sleep(1)

 def sendtoprivategpt(maindata,docfolder):
    global docidstrarr
    counter = 0
    maxc = 300
    pgptendpoint="/v1/completions"

    prompt = default_args['prompt']
    prompt=prompt.replace("&","and")

    context = default_args['context']
    context=context.replace("&","and")

    mcontext = False
    usingqdrant = ''
    if docfolder != '':
      mcontext = True
      usingqdrant = 'Using documents in Qdrant VectorDB for context.'

    maintopic = default_args['pgpt_data_topic']
    if os.environ['TSS']=="1":
      mainip = default_args['pgpthost']
    else:
      mainip = "http://" + os.environ['qip']
      if os.environ['qip']=="":
           mainip=default_args['pgpthost']

    mainport = default_args['pgptport']

    if 'step9keyattribute' in os.environ:
      if os.environ['step9keyattribute'] != '':
        attribute = os.environ['step9keyattribute']
        default_args['keyattribute'] = attribute
      else:
        attribute = default_args['keyattribute']
    else:
     attribute = default_args['keyattribute']

    if 'step9hyperbatch' in os.environ:
      if os.environ['step9hyperbatch'] != '':
         hyperbatch = os.environ['step9hyperbatch']
         default_args['hyperbatch'] = hyperbatch
      else:
        hyperbatch = default_args['hyperbatch']
    else:
      hyperbatch = default_args['hyperbatch']

    for mess in maindata:
         if default_args['jsonkeytogather']=='Identifier' or hyperbatch=="0" or default_args['jsonkeytogather'].lower()=="searchtextfound":
            m = mess[0]
            m1 = mess[1]
         else:
            m = mess
            m1 = attribute #default_args['keyattribute']

         m=m.replace("&","and")
         response=pgptchat(m,mcontext,docidstrarr,mainport,False,mainip,pgptendpoint)
         response=response.strip()
         # Produce data to Kafka
         sf="false"
         response,sf,contentmessage=checkresponse(response,m1)
         tactic,technique,jbm=tsslogging.getmitre(response,default_args['mitrejson'])
         if usingqdrant != '':
            if default_args['streamall']=="0": # Only stream if search terms found in response
               if sf=="false":
                  response="ERROR:"
            m = m + ' (' + usingqdrant + ')'
         if 'ERROR:' not in response and contentmessage != "":
           if default_args['jsonkeytogather'].lower()=="searchtextfound":
              jmess = mess[3]
              response1 = jmess[:-1] + ",\"privateGPT_AI_response\":\"" + contentmessage.strip().rstrip().lstrip() + \
                        "\"," + "\"prompt\":\"" + prompt + "\",\"context\":\""+context + \
                        "\",\"pgptcontainer\":\"" + default_args['pgptcontainername'] + "\",\"pgpt_consumefrom\":\"" + \
                         default_args['consumefrom'] + "\", \"pgpt_data_topic\":\"" + default_args['pgpt_data_topic'] + \
                         "\",\"contextwindowsize\":" + default_args['contextwindowsize'] + ",\"temperature\":\""+default_args['temperature'] + \
                         "\",\"pgptrollbackoffset\":"+default_args['rollbackoffset'] + jbm + "}"
              writetortmslogfile(mess[2],response1)
           else:
              response1 = response[:-1] + "," + "\"prompt\":\"" + m.strip() + "\",\"identifier\":\"" + m1.strip() + "\",\"searchfound\":\"" + sf.strip() + "\"}"
           response1=response1.replace(";",":")
           producegpttokafka(response1,maintopic)
         else:
           counter += 1
           time.sleep(1)
           if counter > maxc:
              startpgptcontainer()
              qdrantcontainer()
              counter = 0
              tsslogging.tsslogit("PrivateGPT Step 9 DAG PrivateGPT Container restarting in {} {}".format(os.path.basename(__file__),response), "WARN" )
              tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startprivategpt(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        if 'step9rollbackoffset' in os.environ:
           if os.environ['step9rollbackoffset'] != '':
             default_args['rollbackoffset'] = os.environ['step9rollbackoffset']

        if 'step9prompt' in os.environ:
           if os.environ['step9prompt'] != '':
             default_args['prompt'] = os.environ['step9prompt']
        if 'step9context' in os.environ:
           if os.environ['step9context'] != '':
             default_args['context'] = os.environ['step9context']

        if 'step9contextwindowsize' in os.environ:
           if os.environ['step9contextwindowsize'] != '':
             default_args['contextwindowsize'] = os.environ['step9contextwindowsize']

        if 'step9pgptcontainername' in os.environ:
           if os.environ['step9pgptcontainername'] != '':
             default_args['pgptcontainername'] = os.environ['step9pgptcontainername']

        if 'step9keyattribute' in os.environ:
           if os.environ['step9keyattribute'] != '':
             default_args['keyattribute'] = os.environ['step9keyattribute']
        if 'step9keyprocesstype' in os.environ:
           if os.environ['step9keyprocesstype'] != '':
             default_args['keyprocesstype'] = os.environ['step9keyprocesstype']
        if 'step9hyperbatch' in os.environ:
           if os.environ['step9hyperbatch'] != '':
             default_args['hyperbatch'] = os.environ['step9hyperbatch']
        if 'step9vectordbcollectionname' in os.environ:
           if os.environ['step9vectordbcollectionname'] != '':
             default_args['vectordbcollectionname'] = os.environ['step9vectordbcollectionname']
        if 'step9concurrency' in os.environ:
           if os.environ['step9concurrency'] != '':
             default_args['concurrency'] = os.environ['step9concurrency']
        if 'CUDA_VISIBLE_DEVICES' in os.environ:
           if os.environ['CUDA_VISIBLE_DEVICES'] != '':
             default_args['CUDA_VISIBLE_DEVICES'] = os.environ['CUDA_VISIBLE_DEVICES']

        if 'step9docfolder' in os.environ:
           if os.environ['step9docfolder'] != '':
             default_args['docfolder'] = os.environ['step9docfolder']
        if 'step9docfolderingestinterval' in os.environ:
           if os.environ['step9docfolderingestinterval'] != '':
             default_args['docfolderingestinterval'] = os.environ['step9docfolderingestinterval']
        if 'step9useidentifierinprompt' in os.environ:
           if os.environ['step9useidentifierinprompt'] != '':
             default_args['useidentifierinprompt'] = os.environ['step9useidentifierinprompt']

        if 'step9searchterms' in os.environ:
           if os.environ['step9searchterms'] != '':
             default_args['searchterms'] = os.environ['step9searchterms']

        if 'step9temperature' in os.environ:
           if os.environ['step9temperature'] != '':
             default_args['temperature'] = os.environ['step9temperature']
        if 'step9vectorsearchtype' in os.environ:
           if os.environ['step9vectorsearchtype'] != '':
             default_args['vectorsearchtype'] = os.environ['step9vectorsearchtype']


        if 'step9pgpthost' in os.environ:
           if os.environ['step9pgpthost'] != '':
             default_args['pgpthost'] = os.environ['step9pgpthost']
        if 'step9pgptport' in os.environ:
           if os.environ['step9pgptport'] != '':
             default_args['pgptport'] = os.environ['step9pgptport']

        if 'step9vectordimension' in os.environ:
           if os.environ['step9vectordimension'] != '':
             default_args['vectordimension'] = os.environ['step9vectordimension']

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESSPGPT".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESSPGPT".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        ti = context['task_instance']
        ti.xcom_push(key="{}_consumefrom".format(sname), value=default_args['consumefrom'])
        ti.xcom_push(key="{}_pgpt_data_topic".format(sname), value=default_args['pgpt_data_topic'])
        ti.xcom_push(key="{}_pgptcontainername".format(sname), value=default_args['pgptcontainername'])
        ti.xcom_push(key="{}_offset".format(sname), value="_{}".format(default_args['offset']))
        ti.xcom_push(key="{}_rollbackoffset".format(sname), value="_{}".format(default_args['rollbackoffset']))

        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_enabletls".format(sname), value="_{}".format(default_args['enabletls']))
        ti.xcom_push(key="{}_partition".format(sname), value="_{}".format(default_args['partition']))

        ti.xcom_push(key="{}_prompt".format(sname), value=default_args['prompt'])
        ti.xcom_push(key="{}_context".format(sname), value=default_args['context'])
        ti.xcom_push(key="{}_jsonkeytogather".format(sname), value=default_args['jsonkeytogather'])
        ti.xcom_push(key="{}_keyattribute".format(sname), value=default_args['keyattribute'])
        ti.xcom_push(key="{}_keyprocesstype".format(sname), value=default_args['keyprocesstype'])

        ti.xcom_push(key="{}_vectordbcollectionname".format(sname), value=default_args['vectordbcollectionname'])

        ti.xcom_push(key="{}_concurrency".format(sname), value="_{}".format(default_args['concurrency']))
        ti.xcom_push(key="{}_cuda".format(sname), value="_{}".format(default_args['CUDA_VISIBLE_DEVICES']))
        ti.xcom_push(key="{}_pgpthost".format(sname), value=default_args['pgpthost'])
        ti.xcom_push(key="{}_pgptport".format(sname), value="_{}".format(default_args['pgptport']))
        ti.xcom_push(key="{}_hyperbatch".format(sname), value="_{}".format(default_args['hyperbatch']))

        ti.xcom_push(key="{}_docfolder".format(sname), value="{}".format(default_args['docfolder']))
        ti.xcom_push(key="{}_docfolderingestinterval".format(sname), value="_{}".format(default_args['docfolderingestinterval']))
        ti.xcom_push(key="{}_useidentifierinprompt".format(sname), value="_{}".format(default_args['useidentifierinprompt']))
        ti.xcom_push(key="{}_searchterms".format(sname), value="{}".format(default_args['searchterms']))
        ti.xcom_push(key="{}_streamall".format(sname), value="_{}".format(default_args['streamall']))
        ti.xcom_push(key="{}_temperature".format(sname), value="_{}".format(default_args['temperature']))
        ti.xcom_push(key="{}_vectorsearchtype".format(sname), value="{}".format(default_args['vectorsearchtype']))
        ti.xcom_push(key="{}_contextwindowsize".format(sname), value="_{}".format(default_args['contextwindowsize']))
        ti.xcom_push(key="{}_vectordimension".format(sname), value="_{}".format(default_args['vectordimension']))
        ti.xcom_push(key="{}_mitrejson".format(sname), value="{}".format(default_args['mitrejson']))

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        mainmodel,mainembedding=llmattrs(default_args['pgptcontainername'])
        ti.xcom_push(key="{}_mainmodel".format(sname), value="{}".format(mainmodel))
        ti.xcom_push(key="{}_mainembedding".format(sname), value="{}".format(mainembedding))

        wn = windowname('ai',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess-pgpt", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" {} {} {} {} \"{}\" \"{}\" {} {}".format(fullpath,VIPERTOKEN, HTTPADDR, VIPERHOST, VIPERPORT[1:],
                        default_args['vectordbcollectionname'],default_args['concurrency'],default_args['CUDA_VISIBLE_DEVICES'],default_args['rollbackoffset'],
                        default_args['prompt'],default_args['context'],default_args['keyattribute'],default_args['keyprocesstype'],
                        default_args['hyperbatch'],default_args['docfolder'],default_args['docfolderingestinterval'],
                        default_args['useidentifierinprompt'],default_args['searchterms'],default_args['streamall'],default_args['temperature'],
                        default_args['vectorsearchtype'], default_args['contextwindowsize'], default_args['pgptcontainername'],
                        default_args['pgpthost'],default_args['pgptport'],default_args['vectordimension']), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         vectordbcollectionname =  sys.argv[5]
         concurrency =  sys.argv[6]

         cuda =  sys.argv[7]
         rollbackoffset =  sys.argv[8]
         prompt =  sys.argv[9]
         context =  sys.argv[10]
         keyattribute =  sys.argv[11]
         keyprocesstype =  sys.argv[12]
         hyperbatch =  sys.argv[13]
         docfolder =  sys.argv[14]
         docfolderingestinterval =  sys.argv[15]
         useidentifierinprompt =  sys.argv[16]
         searchterms =  sys.argv[17]
         streamall =  sys.argv[18]
         temperature = sys.argv[19]
         vectorsearchtype = sys.argv[20]

         contextwindowsize = sys.argv[21]
         pgptcontainername = sys.argv[22]

         pgpthost = sys.argv[23]
         pgptport = sys.argv[24]
         vectordimension=sys.argv[25]

         default_args['vectordimension']=vectordimension

         default_args['rollbackoffset']=rollbackoffset
         default_args['prompt'] = prompt
         default_args['context'] = context

         default_args['keyattribute'] = keyattribute
         default_args['keyprocesstype'] = keyprocesstype
         default_args['hyperbatch'] = hyperbatch
         default_args['vectordbcollectionname'] = vectordbcollectionname
         default_args['concurrency'] = concurrency
         default_args['CUDA_VISIBLE_DEVICES'] = cuda

         default_args['docfolder'] = docfolder
         default_args['docfolderingestinterval'] = docfolderingestinterval
         default_args['useidentifierinprompt'] = useidentifierinprompt
         default_args['searchterms'] = searchterms
         default_args['streamall'] = streamall
         default_args['temperature'] = temperature
         default_args['vectorsearchtype'] = vectorsearchtype

         default_args['contextwindowsize'] = contextwindowsize
         default_args['pgptcontainername'] = pgptcontainername

         default_args['pgpthost'] = pgpthost
         default_args['pgptport'] = pgptport

         if "KUBE" not in os.environ:
           v,buf=qdrantcontainer()
           if buf != "":
            if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the Qdrant container.  Here is the run command - try to run it manually for testing: {}".format(buf))
            else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting Qdrant.  Here is the run command: {}".format(buf))

           time.sleep(5)  # wait for containers to start

           tsslogging.locallogs("INFO", "STEP 9: Starting privateGPT")
           v,buf,mainmodel,mainembedding=startpgptcontainer()
           if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the privateGPT container.  Here is the run command - try to run it manually for testing: {}".format(buf))
           else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting privateGPT.  Here is the run command: {}".format(buf))

           time.sleep(10)  # wait for containers to start
           tsslogging.getqip()
         elif  os.environ["KUBE"] == "0":
           v,buf=qdrantcontainer()
           if buf != "":
            if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the Qdrant container.  Here is the run command - try to run it manually for testing: {}".format(buf))
            else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting Qdrant.  Here is the run command: {}".format(buf))

           time.sleep(5)  # wait for containers to start

           tsslogging.locallogs("INFO", "STEP 9: Starting privateGPT")
           v,buf,mainmodel,mainembedding=startpgptcontainer()
           if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the privateGPT container.  Here is the run command - try to run it manually for testing: {}".format(buf))
           else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting privateGPT.  Here is the run command: {}".format(buf))

           time.sleep(10)  # wait for containers to start
           tsslogging.getqip()
         else:
           tsslogging.locallogs("INFO", "STEP 9: [KUBERNETES] Starting privateGPT - LOOKS LIKE THIS IS RUNNING IN KUBERNETES")
           tsslogging.locallogs("INFO", "STEP 9: [KUBERNETES] Make sure you have applied the private GPT YAML files and have the privateGPT Pod running")

         if docfolder != '':
           startdirread()
         count=0
         while True:
          try:
              # Get preprocessed data from Kafka
              result = consumetopicdata()
 #             print("Result=",result)
              if result != "" and result is not None:
              # Format the preprocessed data for PrivateGPT
                maindata = gatherdataforprivategpt(result)
              # Send the data to PrivateGPT and produce to Kafka
                if len(maindata) > 0:
                 sendtoprivategpt(maindata,docfolder)
 #             time.sleep(2)
              count=0
          except Exception as e:
           print("Error=",e)
           tsslogging.locallogs("ERROR", "STEP 9: PrivateGPT Step 9 DAG in {} {}  Aborting after 10 consecutive errors.".format(os.path.basename(__file__),e))
           tsslogging.tsslogit("PrivateGPT Step 9 DAG in {} {} Aborting after 10 consecutive errors.".format(os.path.basename(__file__),e), "ERROR" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
           time.sleep(5)
           count = count + 1
           if count > 10:
             break

8.8. STEP 9 DAG Core Parameter Explanation

Step 9 DAG parameter	Explanation
pgptcontainername	Enter the privateGPT container to use, for example maadsdocker/tml-privategpt-with-gpu-nvidia-amd64 or maadsdocker/tml-privategpt-no-gpu-amd64. Containers can be found on Docker Hub under the MAADSDOCKER account name.
rollbackoffset	Choose rollback offset
offset	Choose offset - usually leave at -1
enabletls	Set to 1 for TLS encryption, or 0 for no encryption.
consumefrom	Enter the topic to consume from
pgpt_data_topic	This is the topic that will store the privateGPT responses.
pgpthost	This is the host where privateGPT is running i.e. http://127.0.0.1
pgptport	This is the port privateGPT is listening on i.e. 8001
prompt	This is the prompt for privateGPT. For example: Do the device data show any malfunction or defects?
context	Provide the context for the data. For example, This is IoT data from devices. The data are anomaly probabilities for each IoT device. If voltage or current probabilities are low, it is likely the device is not working properly.
hyperbatch	Set to 1 if you want to send privateGPT a batch grouping of hyperpredictions, or set to 0 if you want to send privateGPT one hyperprediction at a time. For example, if doing anomaly predictions on each IoT device, set hyperbatch to 0 and TML will send individual hyperpredictions to privateGPT rather than a batch.
jsonkeytogather	This is the JSON key to use to gather the data for privateGPT. Normally, you have two options (only ONE value can be specified): hyperprediction: TML stores predictions and other outcomes in this key. Identifier: TML stores additional output details here.
keyattribute	This is the attribute you are analysing with TML, e.g. Voltage,current
keyprocesstype	This is the type of processing you are doing on the keyattribute i.e. anomprob, avg, trend etc. See Preprocessing Types for a complete list.
vectordbcollectionname	This is the name of the collection on Qdrant Vector DB
concurrency	The number of instances of privateGPT to run i.e. 2
CUDA_VISIBLE_DEVICES	If you have NVIDIA GPU enter the location here i.e. 0
docfolder	You can specify the sub-folder that contains TEXT or PDF files. This is a subfolder of the MAIN folder mapped to /rawdata. If this field is NON-EMPTY, privateGPT will query these documents as the CONTEXT to answer your prompt. Separate multiple folders with a comma.
docfolderingestinterval	How often you want TML to RE-LOAD the files in docfolder - enter the number of SECONDS
useidentifierinprompt	If 1, TML appends the identifier from the TML JSON output to the prompt. If 0, it uses the prompt only.
searchterms	If you are searching document embeddings, you can specify search terms such as ‘192.168.--identifier--,authentication failure’. TML searches each privateGPT response to the prompt to see whether it contains any of the search terms. This lets you raise alerts on responses that contain special terms, e.g. a hacking attempt.
streamall	This determines whether to stream all of the privateGPT responses or just the ones that contain search terms. If set to ‘1’, all responses are streamed, if ‘0’, only response containing search terms are streamed.
temperature	This determines how the LLM responds. It is a number between 0 and 1. Values near 0 make the responses very conservative; values near 1 make them more varied and more likely to contain hallucinations.
vectorsearchtype	This determines how similarity searches are performed in the Qdrant vector DB. You must choose one of the following: Cosine, Dot, Manhattan or Euclid. See Qdrant for more details
contextwindowsize	The size of the context window. This is the maximum number of tokens to send to privateGPT for processing. For example, if contextwindowsize is 8192, then a maximum of 8192 tokens can be sent to privateGPT for processing. You can increase this number, but it will consume more memory.
vectordimension	This is the size of the embedding array. It is specific to the embedding model being used, for example 384, 768 or 1024. See the figure below.
mitrejson	You can save mitre.json to your mapped /rawdata folder. RTMS will ask the AI to classify the messages in accordance with the MITRE ATT&CK classification matrix.
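
As a rough illustration of how searchterms and streamall interact, the sketch below decides whether a privateGPT response would be streamed. The helper name should_stream and the case-insensitive substring matching are assumptions made for this example; the matching logic inside TML may differ.

```python
def should_stream(response: str, searchterms: str, streamall: str) -> bool:
    """Hypothetical helper: decide whether a response is streamed."""
    # streamall='1' streams every response regardless of its content
    if streamall == "1":
        return True
    # searchterms is a comma-separated string; empty entries are ignored
    terms = [t.strip().lower() for t in searchterms.split(",") if t.strip()]
    text = response.lower()
    # streamall='0' streams only responses containing at least one term
    return any(term in text for term in terms)


if __name__ == "__main__":
    reply = "Authentication failure detected for 192.168.1.5"
    print(should_stream(reply, "authentication failure,hacking", "0"))
```

With streamall set to '0', a response that mentions none of the terms is dropped, which is what makes the search terms usable as an alert filter.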

8.9. Vector Dimensions

This shows the different dimensions for embedding models. See here for more details.
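
The vectordimension value has to match the length of the vectors produced by the embedding model, and vectorsearchtype selects how two such vectors are compared. The sketch below shows the math behind the four metric names for plain Python lists. The function name similarity and its default arguments are hypothetical, and this is not Qdrant's implementation.

```python
import math


def similarity(a, b, vectorsearchtype="Cosine", vectordimension=384):
    """Hypothetical helper: compare two embeddings with the chosen metric."""
    # Both vectors must have exactly the configured dimension
    if len(a) != vectordimension or len(b) != vectordimension:
        raise ValueError("vector length does not match vectordimension")

    dot = sum(x * y for x, y in zip(a, b))
    if vectorsearchtype == "Dot":
        return dot
    if vectorsearchtype == "Cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # Guard against division by zero for all-zero vectors
        return dot / norm if norm else 0.0
    if vectorsearchtype == "Euclid":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if vectorsearchtype == "Manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError("vectorsearchtype must be Cosine, Dot, Manhattan or Euclid")


if __name__ == "__main__":
    u, v = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    for metric in ("Cosine", "Dot", "Euclid", "Manhattan"):
        print(metric, similarity(u, v, metric, vectordimension=3))
```

Cosine and Dot are similarity scores (larger means closer), while Euclid and Manhattan are distances (smaller means closer).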

8.10. privateGPT Processing Explanation

Consider the following JSON. This JSON is the output from STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag

{
        "hyperprediction": "120714.692",
        "Maintopic": "iot-preprocess",
        "Topic": "topicid155_Voltage_preprocessed_Avg",
        "Type": "External",
        "ProducerId": "customjson",
        "TimeStamp": "2024-09-13 17:04:36",
        "Unixtime": 1726247076213196638,
        "kafkakey": "OAA-Tvw04fZB3lr7bDehMDMAmK1ug2p0jw",
        "Preprocesstype": "Avg",
        "WindowStartTime": "2022-01-27 19:55:07 +0000 UTC",
        "WindowEndTime": "2022-01-27 19:55:09 +0000 UTC",
        "WindowStartUnixTime": "1643313307000000000",
        "WindowEndUnixTime": "1643313309000000000",
        "Conditions": "",
        "Identifier": "Voltage~Line-Voltage-(mV)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Voltage),value:datapoint.value,identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords,Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,TML solution~Msgsjoined=06d99238-7fab-11ec-16dd-04357e6ea60c(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});06f7a066-7fab-11ec-b57e-c6fecac720c2(120456,41.60322,-73.08775,Voltage,n/a,n/a,{});071a7abe-7fab-11ec-d105-4ccdd61deb1a(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});0733212c-7fab-11ec-d162-80400f9d10d6(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});0758c90e-7fab-11ec-24d3-2c9b20193b60(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});0780e5a6-7fab-11ec-4416-1bf4bf386653(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});07a1965c-7fab-11ec-ab45-fb68b835cee7(120712,41.60322,-73.08775,Voltage,n/a,n/a,{});07b56970-7fab-11ec-2762-03c9c43b6eac(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});07ce4558-7fab-11ec-f91b-bce1f12d0bdc(120712,41.60322,-73.08775,Voltage,n/a,n/a,{});07ea1986-7fab-11ec-3b6d-d650f04215e1(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});08014156-7fab-11ec-924c-3d9a32b7def1(120915,41.60322,-73.08775,Voltage,n/a,n/a,{});08197cd0-7fab-11ec-5c87-5902076c89be(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});083c9760-7fab-11ec-f6e0-05d9b27e71d5(120812,41.60322,-73.08775,Voltage,n/a,n/a,{})~latlong=~mainuid=AC000W017810194",
        "PreprocessIdentifier": "",
        "Numberofmessages": 13,
        "RawData": [
                120609,
                120456,
                120812,
                120712,
                120915
        ],
        "MsgIdData": [
                "06d99238-7fab-11ec-16dd-04357e6ea60c(120609):{1}",
                "06f7a066-7fab-11ec-b57e-c6fecac720c2(120456):{1}",
                "071a7abe-7fab-11ec-d105-4ccdd61deb1a(120609):{1}",
                "0733212c-7fab-11ec-d162-80400f9d10d6(120609):{1}",
                "0758c90e-7fab-11ec-24d3-2c9b20193b60(120609):{1}",
                "0780e5a6-7fab-11ec-4416-1bf4bf386653(120812):{1}",
                "07a1965c-7fab-11ec-ab45-fb68b835cee7(120712):{1}",
                "07b56970-7fab-11ec-2762-03c9c43b6eac(120812):{1}",
                "07ce4558-7fab-11ec-f91b-bce1f12d0bdc(120712):{1}",
                "07ea1986-7fab-11ec-3b6d-d650f04215e1(120812):{1}",
                "08014156-7fab-11ec-924c-3d9a32b7def1(120915):{1}",
                "08197cd0-7fab-11ec-5c87-5902076c89be(120812):{1}",
                "083c9760-7fab-11ec-f6e0-05d9b27e71d5(120812):{1}"
        ],
        "Offset": 524247,
        "Consumerid": "StreamConsumer",
        "Generated": "2024-09-13T17:04:37.459+00:00",
        "Partition": 0
}

Important

It is important to note the format of this JSON as follows.

hyperprediction - all TML output is stored in this key. Its name can be given as the value of jsonkeytogather. The Step 9 DAG will gather all the data from this key and ask privateGPT the question in your prompt.
Identifier - additional details are stored in this key. The data used in the analysis are also stored in the RawData JSON array, which can likewise be gathered and presented to privateGPT for prompting.
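
The gathering step described above can be sketched as follows. The function name gather_value, the shortened sample message, and the case-insensitive key lookup are assumptions for illustration; they are not taken from the Step 9 DAG source.

```python
import json


def gather_value(message: str, jsonkeytogather: str = "hyperprediction"):
    """Hypothetical helper: return the value stored under jsonkeytogather."""
    record = json.loads(message)
    wanted = jsonkeytogather.lower()
    # Compare keys case-insensitively so 'identifier' also finds 'Identifier'
    for key, value in record.items():
        if key.lower() == wanted:
            return value
    return None


# Shortened version of the Step 4 output shown above (Identifier is truncated)
sample = json.dumps({
    "hyperprediction": "120714.692",
    "Topic": "topicid155_Voltage_preprocessed_Avg",
    "Preprocesstype": "Avg",
    "Identifier": "Voltage~Line-Voltage-(mV)~iot-preprocess",
    "RawData": [120609, 120456, 120812, 120712, 120915],
})

if __name__ == "__main__":
    print(gather_value(sample, "hyperprediction"))
    print(gather_value(sample, "Identifier"))
```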

Now,

keyattribute is the variable you are processing. It appears in “Topic”: “topicid155_Voltage_preprocessed_Avg”; here TML is taking the average of voltage from the devices. You can specify any name for the key attribute you are processing.

keyprocesstype is the type of processing you are doing, as listed in Preprocessing Types. It appears in “Preprocesstype”: “Avg”; here TML is taking the average of voltage from the devices. You can specify any processing type from the Preprocessing Types table.

Tip

You can separate multiple keyattribute, and keyprocesstype with a comma.
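
A minimal sketch of how comma-separated keyattribute and keyprocesstype values could be used to select messages is shown below. The function name matches and the matching rules (attribute as a substring of Topic, exact match on Preprocesstype, both case-insensitive) are assumptions for this example, not the DAG's confirmed behaviour.

```python
def matches(record: dict, keyattribute: str, keyprocesstype: str) -> bool:
    """Hypothetical helper: does this message match the configured filters?"""
    attrs = [a.strip().lower() for a in keyattribute.split(",") if a.strip()]
    types = [t.strip().lower() for t in keyprocesstype.split(",") if t.strip()]
    topic = record.get("Topic", "").lower()
    ptype = record.get("Preprocesstype", "").lower()
    # The attribute must appear in the topic name and the processing
    # type must be one of the configured types
    return any(a in topic for a in attrs) and ptype in types


if __name__ == "__main__":
    record = {"Topic": "topicid155_Voltage_preprocessed_Avg", "Preprocesstype": "Avg"}
    print(matches(record, "Voltage,current", "anomprob,avg"))
```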

Using processed data with privateGPT for further analysis is a powerful way to apply GenAI technology to real-time data streams at no API cost, since all API calls go to the privateGPT container running locally. In addition, no data are sent outside your environment, which keeps the solution secure and gives you 100% control of your data.

8.11. Using Qdrant VectorDB for Local Document Analysis

Users can search local documents to cross-reference the Identifier field in the privateGPT Processing Explanation

8.12. TML, PrivateGPT and Qdrant Example Scenarios

You can map local folders to the /rawdata folder and store your files (TEXT or PDF) as subfolders.

For example, with docfolder=’mylog1,mylog2’, these two folders would be subfolders of the local folder mapped to /rawdata.

The contents of these folders would be ingested into the Qdrant Vector DB.

These folders will automatically be reloaded every docfolderingestinterval seconds. For example, if you want to analyse log files and docfolderingestinterval=60, these folders will be ingested every 60 seconds.
If useidentifierinprompt is 1, then TML will add the Identifier as part of the prompt. For example, if you are analysing IP addresses for anomalies and compute an anomaly score, you can complement this score by looking into log files to see whether this IP address has authentication failures, which may indicate a hacking attempt from this IP address.
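
The docfolder and docfolderingestinterval settings described above can be sketched like this. The helper names resolve_docfolders and ingest_due are hypothetical, and the sketch only computes paths and timing; it does not perform any ingestion into Qdrant.

```python
import posixpath


def resolve_docfolders(docfolder: str, root: str = "/rawdata"):
    """Hypothetical helper: expand 'mylog1,mylog2' into container paths."""
    # posixpath is used because /rawdata is a path inside a Linux container
    return [posixpath.join(root, name.strip())
            for name in docfolder.split(",") if name.strip()]


def ingest_due(last_ingest: float, now: float, docfolderingestinterval: str) -> bool:
    """Hypothetical helper: has the re-load interval (in seconds) elapsed?"""
    return now - last_ingest >= int(docfolderingestinterval)


if __name__ == "__main__":
    print(resolve_docfolders("mylog1,mylog2"))
    print(ingest_due(last_ingest=0.0, now=60.0, docfolderingestinterval="60"))
```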

You can also add a placeholder for the identifier in the prompt by adding --identifier--. For example, prompt=’Does the following --identifier-- have any errors in the logs?’ TML will replace --identifier-- with the real-time IP address or value in the Identifier JSON field.
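
The prompt handling described above can be sketched as follows. The function name build_prompt is hypothetical, and the precedence chosen here (a placeholder is substituted in place, otherwise the identifier is appended when useidentifierinprompt is '1') is an assumption for illustration.

```python
PLACEHOLDER = "--identifier--"


def build_prompt(prompt: str, identifier: str, useidentifierinprompt: str) -> str:
    """Hypothetical helper: combine the prompt with the real-time identifier."""
    # A placeholder in the prompt is replaced in place
    if PLACEHOLDER in prompt:
        return prompt.replace(PLACEHOLDER, identifier)
    # Without a placeholder, '1' appends the identifier to the prompt
    if useidentifierinprompt == "1":
        return "{} {}".format(prompt, identifier)
    # '0' leaves the prompt untouched
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "Does the following --identifier-- have any errors in the logs?",
        "192.168.1.5",
        "1",
    ))
```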

This way, you can use TML, privateGPT and Qdrant for powerful analysis of documents, by cross-referencing and meshing information together to get greater real-time insights from your real-time data.

8.13. STEP 9b: Multi-Agent Agentic AI: tml-system-step-9b-agenticai-dag

This DAG applies multi-agent agentic AI to real-time data processing. See TML and Agentic AI for more information.

   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   from datetime import datetime, timezone
   from airflow.decorators import dag, task
   from langgraph_supervisor import create_supervisor
   from llama_index.core.indices.vector_store.base import VectorStoreIndex
   from llama_index.core.schema import Document  # Document is often found here
   from langgraph.prebuilt import create_react_agent
   from llama_index.embeddings.ollama import OllamaEmbedding
   from langchain_ollama import ChatOllama
   import importlib
   import json
   import pprint
   from llama_index.core.settings import Settings
   from datetime import datetime, timezone
   import os
   import tsslogging
   import sys
   import time
   import maadstml
   import subprocess
   import random
   import shutil  # needed by deletefoldercontents (shutil.rmtree)
   import threading
   import re
   from binaryornot.check import is_binary
   import base64
   import requests
   from json_repair import repair_json

   sys.dont_write_bytecode = True

   ######################################################USER CHOSEN PARAMETERS ###########################################################
   SMTP_SERVER=''
   SMTP_PORT=0
   SMTP_USERNAME=''
   SMTP_PASSWORD='' # this should be base64 encoded
   recipient=''

   if 'SMTP_SERVER' in os.environ:
      SMTP_SERVER=os.environ['SMTP_SERVER']
   if 'SMTP_PORT' in os.environ:
      SMTP_PORT=int(os.environ['SMTP_PORT'])
   if 'SMTP_USERNAME' in os.environ:
      SMTP_USERNAME=os.environ['SMTP_USERNAME']
   if 'SMTP_PASSWORD' in os.environ:
      SMTP_PASSWORD=os.environ['SMTP_PASSWORD']
      SMTP_PASSWORD=base64.b64decode(SMTP_PASSWORD)
      SMTP_PASSWORD = SMTP_PASSWORD.decode('utf-8')
   if 'recipient' in os.environ:
      recipient=os.environ['recipient']

   default_args = {
    'owner': 'Sebastian Maurice',   # <<< *** Change as needed
    'ollamacontainername' : 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools', #'maadsdocker/tml-privategpt-no-gpu-amd64',  # enter a valid container https://hub.docker.com/r/maadsdocker/tml-privategpt-no-gpu-amd64
    'rollbackoffset' : '5',  # <<< *** Change as needed
    'offset' : '-1', # leave as is
    'enabletls' : '1', # change as needed
    'brokerhost' : '', # <<< *** Leave as is
    'brokerport' : '-999', # <<< *** Leave as is
    'microserviceid' : '',  # change as needed
    'topicid' : '-999', # leave as is
    'delay' : '100', # change as needed
    'companyname' : 'otics',  # <<< *** Change as needed
    'consumerid' : 'streamtopic',  # <<< *** Leave as is
    'agenttopic' : '', # this topic contains the individual agent responses
    'agents_topic_prompt' : """
   <consumefrom - topic agent will monitor:prompt you want for the agent to answer->>consumefrom - topic2 agent will monitor<<-prompt you want for the agent to answer>
   """, # <topic agent will monitor:prompt you want for the agent>, separate multiple topic agents with ->>
    'teamlead_topic' : '', # Enter the team lead topic - all team lead responses will be written to this topic
    'teamleadprompt' : """
   Enter the prompt for the Team lead agent
   """, # Enter the team lead prompt
    'supervisor_topic' : '', # Enter the supervisor topic - all supervisor responses will be written to this topic
    'supervisorprompt' : '', # Enter the supervisor prompt
    'agenttoolfunctions' : """
   tool_function:agent_name:system_prompt;tool_function2:agent_name2:system_prompt2;....
    """,  # enter the tools : tool_function is the name of the functions in the agenttools python file
    'agent_team_supervisor_topic': '', # this topic will hold the responses from agents, team lead and supervisor
    'producerid' : 'agentic-ai',   # <<< *** Leave as is
    'identifier' : 'This is analysing TML output with Agentic AI',
    'mainip': 'http://127.0.0.1', # Ollama server container listening on this host
    'mainport' : '11434', # Ollama listening on this port
    'embedding': 'nomic-embed-text', # Embedding model
    'preprocesstype' : '', # Leave as is
    'partition' : '-1', # Leave as is
    'vectordbcollectionname' : 'tml-llm-model-v2', # change as needed
    'concurrency' : '2', # change as needed Leave at 1
    'CUDA_VISIBLE_DEVICES' : '0', # change as needed
    'temperature' : '0.1', # This value ranges between 0 and 1 and controls how conservative the LLM will be: near 0 it is very conservative, near 1 it is more likely to hallucinate
    #--------------------
    'ollama-model': 'llama3.1',
    'deletevectordbcount': '10',
    'vectordbpath': '/rawdata/vectordb',
    'contextwindow': '10000',
    'localmodelsfolder': '/mnt/c/maads/tml-airflow/rawdata/ollama'
   }

   ############################################################### DO NOT MODIFY BELOW ####################################################

   VIPERTOKEN=""
   VIPERHOST=""
   VIPERPORT=""
   HTTPADDR=""
   mainproducerid = default_args['producerid']

   def setollama(model):
       ###############  Ollama Model #################################
   #    model=default_args['ollama-model']
       temperature=float(default_args['temperature'])
       embeddingmodel=default_args['embedding'] #"nomic-embed-text"
       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])
       contextwindow=default_args['contextwindow']

   #    mainmodels = model.split(",") # agent,teamlead,supervisor

       if 'KUBE' in os.environ:
         if os.environ['KUBE'] == "1":
            default_args['mainip']="ollama-service"
            mainip=default_args['mainip']

       print("model====",model)
       gotllm=0
       for i in range(30):
         print("Checking if LLM loaded..wait")
         try:
           llm = ChatOllama(model=model, base_url=mainip+":"+str(mainport), temperature=temperature, num_ctx=int(contextwindow))
           gotllm=1
           print("LLM loaded")
           break
         except Exception as e:
           print("Error=",e)
           time.sleep(5)

       if gotllm==0:
           print("ERROR STEP 9b: Cannot load Ollama LLM model '{}' not found.".format(model))
           tsslogging.locallogs("ERROR", "STEP 9b: Cannot load Ollama LLM model '{}' not found.".format(model))
           return "",""

       try:
         ollama_emb = OllamaEmbedding(
           base_url=mainip+":"+str(mainport),
           model_name=embeddingmodel
         )
       except Exception as e:
         print("ERROR STEP 9b: Cannot load Ollama embedding '{}' not found.".format(embeddingmodel))
         tsslogging.locallogs("ERROR", "STEP 9b: Cannot load Ollama embedding '{}' not found.".format(embeddingmodel))
         return "",""

       Settings.embed_model = ollama_emb
       Settings.llm = llm

       return llm,ollama_emb


   def checkforloadedmodels(mainmodel):

       if 'KUBE' in os.environ:
         if os.environ['KUBE'] == "1":
            default_args['mainip']="ollama-service"
            mainip=default_args['mainip']

       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])

       OLLAMA_URL = f"{mainip}:{mainport}/api/tags"
       count = 0

       while True:
         try:
           response = requests.get(OLLAMA_URL)
           response.raise_for_status()
           data = response.json()
           # Assume 'models' key contains the list of available/loaded models
           loaded_models = [model for model in data.get("models", [])]
           print("loaded_models=",loaded_models)
           if mainmodel in json.dumps(loaded_models) or mainmodel+":latest" in json.dumps(loaded_models):
             print(f"Model {mainmodel} found")
             return 1
           else:
             pull_ollama_model(mainmodel) # pull the model
             time.sleep(5)
             count += 1
             if count > 600:
              break
             else:
               continue
         except Exception as e:
           print(f"Error querying Ollama server: {e} Will keep trying")
           time.sleep(5)
           count += 1
           if count > 20:
             break
           continue

       return 0


   def get_loaded_models():

       if 'KUBE' in os.environ:
         if os.environ['KUBE'] == "1":
            default_args['mainip']="ollama-service"
            mainip=default_args['mainip']

       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])
       mainmodel=default_args['ollama-model']
       mainmodel = mainmodel.split(",")[0] #check if one model is there
       OLLAMA_URL = f"{mainip}:{mainport}/api/tags"
       count = 0

       while True:
         try:
           response = requests.get(OLLAMA_URL)
           response.raise_for_status()
           data = response.json()
           # Assume 'models' key contains the list of available/loaded models
           loaded_models = [model for model in data.get("models", [])]
           print("loaded_models=",loaded_models)
           if mainmodel in json.dumps(loaded_models) or mainmodel+":latest" in json.dumps(loaded_models):
             print(f"Model {mainmodel} found")
             return 1
           else:
             time.sleep(5)
             count += 1
             if count > 600:
              break
             else:
               continue
         except Exception as e:
           print(f"Error querying Ollama server: {e} Will keep trying")
           time.sleep(5)
           count += 1
           if count > 20:
             break
           continue

       return 0

   def remove_escape_sequences(string):
       return string.encode('utf-8').decode('unicode_escape')

   def cleanstringjson(mainstr):

       mainstr = mainstr.replace("'","").replace('`',"").replace("\n","").replace("\\n","").replace("\t","").replace("\\t","").replace("\r","").replace("\\r","").replace("\\*","").replace("\\ ","").replace("\\\\","\\")


       a = list(mainstr.lower())
       b = "abcdefghijklmnopqrstuvwxyz-*123456789'{}`"
       i=0
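       for char in a:
           # Bounds checks avoid an IndexError when a backslash ends the string
           if char == "\\" and i+1 < len(a) and a[i+1] in b:
             a[i]=''
           if char == "\\" and i+2 < len(a) and a[i+1] == "\\" and a[i+2] == '"':
             a[i]=''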

           i=i+1

       mainstr=''.join(a)
       mainstr=re.sub(r'[\n\r]+', '', mainstr)

       mainstr = mainstr.translate({ord('\n'): None, ord('\r'): None})
       mainstr = " ".join(mainstr.splitlines())

       return mainstr

   def cleanstring(mainstr):

       mainstr = mainstr.replace('"',"").replace("'","").replace('`',"").replace("\n","").replace("\\n","").replace("\t","").replace("\\t","").replace("\r","").replace("\\r","").replace("\\*","").replace("\\ ","").replace("\\\\","\\").replace("\\1","1").replace("\\2","2").replace("\\3","3").replace("\\4","4").replace("\\5","5").replace("\\6","6").replace("\\7","7").replace("\\8","8").replace("\\9","9")
       mainstr = mainstr.splitlines()
       mainstr = " ".join(mainstr)

       a = list(mainstr.lower())
       b = "abcdefghijklmnopqrstuvwxyz-*123456789'{}`"
       i=0
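       for char in a:
           # Bounds checks avoid an IndexError when a backslash ends the string
           if char == "\\" and i+1 < len(a) and a[i+1] in b:
             a[i]=''
           if char == "\\" and i+2 < len(a) and a[i+1] == "\\" and a[i+2] == '"':
             a[i]=''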

           i=i+1

       mainstr=''.join(a)
       mainstr=re.sub(r'[\n\r]+', '', mainstr)

       mainstr = mainstr.translate({ord('\n'): None, ord('\r'): None})
       return mainstr

   ############## Delete folder content ########################
   def deletefoldercontents(dirpath,deletevectordbcnt):
       if deletevectordbcnt < int(default_args['deletevectordbcount']):
           deletevectordbcnt += 1
           return deletevectordbcnt
       else:
           # threshold reached: reset the counter, then clear the folder below
           deletevectordbcnt=0

       folder = dirpath
       for filename in os.listdir(folder):
           file_path = os.path.join(folder, filename)
           try:
               if os.path.isfile(file_path) or os.path.islink(file_path):
                   os.unlink(file_path)
               elif os.path.isdir(file_path):
                   shutil.rmtree(file_path)
           except Exception as e:
               print('Failed to delete %s. Reason: %s' % (file_path, e))
       return deletevectordbcnt
   ########################### Vector DB for Team Lead: Agent Responses ###############
   # this is for the team lead agent to consolidate information from individual agents
   ###################################################################################
   def loadtextdataintovectordb(responses,deletevectordbcnt,llm):

       vectordbpath = default_args['vectordbpath']

       directory_path="{}/tmlvectortextindex".format(vectordbpath)

       if not os.path.exists(directory_path):
          os.makedirs(directory_path)

       # delete previous folder content
       deletevectordbcnt=deletefoldercontents(directory_path,deletevectordbcnt)

       documents = [Document(text=t) for t in responses]

       #build index
       tml_index = VectorStoreIndex.from_documents(
           documents,
           embedding="local"
       )
       # persist index
       tml_index.storage_context.persist(persist_dir=directory_path)

       tml_text_engine = tml_index.as_query_engine(llm=llm,similarity_top_k=3)

       return tml_text_engine,deletevectordbcnt

   def pull_ollama_model(model_name):
       """
       Initiates an Ollama model pull using the Ollama API.

       Args:
           model_name (str): The name of the model to pull (e.g., "llama3").
       """
       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])

       url = f"{mainip}:{mainport}/api/pull"  # Default Ollama API endpoint
       headers = {"Content-Type": "application/json"}
       payload = {"name": model_name}

       try:
           response = requests.post(url, headers=headers, data=json.dumps(payload), stream=True)
           response.raise_for_status()  # Raise an exception for HTTP errors

           print(f"Initiating pull for model: {model_name}")
           for line in response.iter_lines():
               if line:
                   # Ollama streams one JSON object per line; print each status update
                   try:
                       data = json.loads(line.decode('utf-8'))
                       if 'status' in data:
                           print(f"Status: {data['status']}", end='\r')
                   except json.JSONDecodeError:
                       pass  # skip any line that is not valid JSON

           print(f"\nPull for model '{model_name}' completed.")

       except requests.exceptions.RequestException as e:
           print(f"Error pulling model '{model_name}': {e}")


   def stopcontainers():


      ollamacontainername = default_args['ollamacontainername']
      cfound=0
      subprocess.call("docker image ls > gptfiles.txt", shell=True)
      with open('gptfiles.txt', 'r', encoding='utf-8') as file:
           data = file.readlines()
           r=0
           for d in data:
             darr = d.split(" ")
             if '-privategpt-' in darr[0]:
               buf="docker stop $(docker ps -q --filter ancestor={} )".format(darr[0])
               if ollamacontainername in darr[0]:
                   cfound=1
                   # if ollama container found check if model is already loaded - if not  stop container
                   if get_loaded_models()==0:
                     print(buf)
                     subprocess.call(buf, shell=True)
                     return 0
                   break
      if cfound==0:
         print("INFO STEP 9b: Ollama container {} not found.  It may need to be pulled.".format(ollamacontainername))
         tsslogging.locallogs("WARN", "STEP 9b: Ollama container not found. It may need to be pulled if it does not start: docker pull {}".format(ollamacontainername))
         return 0

      return 1

   def startpgptcontainer():
         print("Starting Ollama container: {}".format(default_args['ollamacontainername']))
         collection = default_args['vectordbcollectionname']
         concurrency = default_args['concurrency']
         ollamacontainername = default_args['ollamacontainername']
         mainport = int(default_args['mainport'])
         cuda = int(default_args['CUDA_VISIBLE_DEVICES'])
         temperature = default_args['temperature']
         mainmodel=default_args['ollama-model']
         mainembedding=default_args['embedding']
         mainhost = default_args['mainip']

         mainmodels = mainmodel.split(",")
         mainmodel = " && ".join(mainmodels)

         ollamaserver = mainhost + ":" + str(mainport)
         localmodels=''
         if default_args['localmodelsfolder'] != '':
             localmodels = "-v " + default_args['localmodelsfolder'] + ":/root/.ollama:z"

         time.sleep(10)
         if os.environ['TSS'] == "1":
             buf = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z {} --env OLLAMA_LOAD_TIMEOUT=30m0s --env PORT={} --env TSS=1 --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env LLAMAMODEL=\"{}\" --env mainembedding=\"{}\" --env OLLAMASERVERPORT=\"{}\" {}".format(mainport,mainport,localmodels,mainport,collection,concurrency,cuda,temperature,mainmodel,mainembedding,ollamaserver,ollamacontainername)
         else:
             buf = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z {} --env OLLAMA_LOAD_TIMEOUT=30m0s --env PORT={} --env TSS=0 --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env LLAMAMODEL=\"{}\" --env mainembedding=\"{}\" --env OLLAMASERVERPORT=\"{}\" {}".format(mainport,mainport,localmodels,mainport,collection,concurrency,cuda,temperature,mainmodel,mainembedding,ollamaserver,ollamacontainername)


         if stopcontainers() == 1:
           return 1,buf,mainmodel,mainembedding

         v=subprocess.call(buf, shell=True)
         print("INFO STEP 9b: Ollama container.  Here is the run command: {}, v={}".format(buf,v))
         tsslogging.locallogs("INFO", "STEP 9b: Ollama container.  Here is the run command: {}, v={}".format(buf,v))

         return v,buf,mainmodel,mainembedding


   def producegpttokafka(value,maintopic):
        inputbuf=value.strip()
        topicid=int(default_args['topicid'])
        producerid=default_args['producerid']
        identifier = default_args['identifier']

        # Maximum delay, in milliseconds (for example 7000), that VIPER waits for Kafka to confirm the message was received and written to the topic
        delay=default_args['delay']
        enabletls=default_args['enabletls']

        inputbuf=cleanstringjson(inputbuf)


        try:
           result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,'',
                                               topicid,identifier)
           print(result)
        except Exception as e:
           print("ERROR:",e)

   def consumefromtopic(maintopic):

         rollbackoffsets = int(default_args['rollbackoffset'])
         enabletls = int(default_args['enabletls'])
         consumerid=default_args['consumerid']
         companyname=default_args['companyname']
         offset = int(default_args['offset'])
         brokerhost = default_args['brokerhost']
         brokerport = int(default_args['brokerport'])
         microserviceid = default_args['microserviceid']
         topicid = default_args['topicid']
         preprocesstype = default_args['preprocesstype']
         delay = int(default_args['delay'])
         partition = int(default_args['partition'])

         print("before viperconsume",VIPERHOST,VIPERPORT,maintopic)
         result=maadstml.viperconsumefromtopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,
                     consumerid,companyname,partition,enabletls,delay,
                     offset, brokerhost,brokerport,microserviceid,
                     topicid,rollbackoffsets,preprocesstype)
         return result


   def windowname(wtype,sname,dagname):
       randomNumber = random.randrange(10, 9999)
       wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
       with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
         file.writelines("{}\n".format(wn))

       return wn

   ############# Get the real-time data from the data streams #########################
   def getjsonsfromtopics(topics):

       print("in getjsonsfromtopics==",topics)

       topicsarr = topics.split("->>")
       topicjsons = []

       for t in topicsarr:
         t=t.strip()
         t2 = t.split("<<-")[0].strip()
         try:
           jsonvalue=consumefromtopic(t2)
         except Exception as e:
           print("error=",e)
           jsonvalue=""   # keep topicjsons aligned with topicsarr when a read fails
         topicjsons.append(jsonvalue)

       return topicjsons


   def extract_hyperpredictiondata(hjson):

       print("in extract")

       hyper_json = json.loads(hjson)
       hnum=0
       pt=""
       pv=""
       mainuid=""
       jbufs = ""

       if len(hyper_json['streamtopicdetails']['topicreads']) == 0:
        return ""

       for item in hyper_json['streamtopicdetails']['topicreads']:
           jbuf = ""

           if "preprocesstype" in item:
              ptypes = item['preprocesstype']
              pt = ptypes
              iden = item['identifier']
              idenarr = iden.split("~")
              pv = idenarr[0]
              hyperprediction = str(item['hyperprediction'])
              hnum=round(float(hyperprediction))

           if "islogistic" in item:
              pv="machine learning"
              if item['islogistic'] == "1":
                 pt = "probability prediction"
                 hyperprediction = str(item['hyperprediction'])
                 hnum = round(float(hyperprediction)*100)
              else:
                 hyperprediction = str(item['hyperprediction'])
                 hnum = round(float(hyperprediction))
                 pt = "prediction"


           if "identifier" in item:
               iden = item['identifier']
               idenarr = iden.split("~")
               mainuid = idenarr[-1]
               mainuid = mainuid.split("=")[1]


           jbuf = '{"hp":' + str(hnum) + ',"pt":"' + pt + '", "pv":"' + pv + '", "uid":"' + mainuid + '"}'
           jbufs = jbufs + jbuf +","


       hliststr = "[" + jbufs[:-1] + "]"
       hliststr=re.sub(r'[\n\r]+', '', hliststr)
       hliststr = hliststr.translate({ord('\n'): None, ord('\r'): None})
       print("hliststr==",hliststr)
       return hliststr

   def checkjson(cjson):

       model = default_args['ollama-model']
       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']

       cjson = cjson.strip()
       try:
        checkedjson = json.loads(cjson)  # check to see if json loads - if not its bad
       except Exception as e:
        print("Json error=",e)
        if cjson[-1] != '}':
           if "Model" not in cjson and "Embedding" not in cjson and "Temperature" not in cjson:
             cjson = cjson +'","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
           else:
             cjson = cjson + '"}'

        elif cjson[-2] != '"':
           if "Model" not in cjson and "Embedding" not in cjson and "Temperature" not in cjson:
             cjson = cjson[:-1] +'","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
           else:
             cjson = cjson[:-1] + '"}'

        # bad json: let repair_json attempt to fix whatever is still malformed
        cjson = repair_json(cjson, skip_json_loads=True )

       return cjson


   def agentquerytopics(usertopics,topicjsons,llm):
       topicsarr = usertopics.split("->>")
       bufresponse = ""
       bufarr = []
       agenttopic = default_args['agenttopic']

       model = default_args['ollama-model']
       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']

       md = model.split(",")
       model=md[0]

       if len(topicsarr) == 0:
           print("No topics data")
           return "",""

       responses = []
       for t,mainjson in zip(topicsarr,topicjsons):
         t=t.strip()
         t2  = t.split("<<-")
         mainjson=mainjson.lower()
         if "hyperprediction" in mainjson:
            mainjson=extract_hyperpredictiondata(mainjson)
            if mainjson == "":
              continue

         # default to the raw prompt so query_str is always defined for this topic
         query_str=t2[1]
         if "<<data>>" in query_str:
            query_str = query_str.replace("<<data>>", f"{mainjson}")
            print("query_string====",query_str)


       # Invoking with a string
         print("------before llm invoke===")
         response = llm.invoke(query_str)
         response=str(response.content)

         prompt=cleanstring(t2[1].strip())

         response=cleanstring(response)
         response=response.replace(";",",").replace(":","").replace("'","").replace('"',"")

         bufresponse  = '{"Date": "' + str(datetime.now(timezone.utc)) + '","Agent_Name": "Topic_Agent", "Topic": "'+t2[0].strip()+'","Prompt":"' + prompt + '","Response": "' + response.strip() + '","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
         bufresponse=checkjson(bufresponse)
         print("======bufresponse====",bufresponse)
         bufarr.append(bufresponse)

         producegpttokafka(bufresponse,agenttopic)

         responses.append(response)

       return responses,bufarr

   def teamleadqueryengine(tml_text_engine):
       bufresponse = ""

       model = default_args['ollama-model']
       md = model.split(",")
       if len(md)>1:
         model=md[1]

       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']

       # read the prompt from default_args; the original referenced an unassigned local
       teamleadprompt = default_args['teamleadprompt'].replace(";"," ")
       response = tml_text_engine.query(teamleadprompt )
       response=str(response)
   #    print("team repsose = ", response)
       prompt=cleanstring(teamleadprompt.strip())
       response=cleanstring(response.strip())
       response=response.replace(";",",").replace(":","").replace('"',"").replace("'","")
       bufresponse  = '{"Date": "' + str(datetime.now(timezone.utc)) + '","Agent_Name": "Team_Lead_Agent", "Topic": "'+default_args['teamlead_topic'] +'","Prompt":"' + prompt + '","Response": "' + response.strip() + '","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
       bufresponse=checkjson(bufresponse)

       producegpttokafka(bufresponse,default_args['teamlead_topic'])

       return response,bufresponse

   ################ Create Supervisor

   def createactionagents(llm,sname):
       print("in createactionagents")
       repo=tsslogging.getrepo()

       agents=[]
       filepath=f"/{repo}/tml-airflow/dags/tml-solutions/{sname}/agenttools.py"
       print("filepath===",filepath)
       module_name = "agenttools"

       spec = importlib.util.spec_from_file_location(module_name, filepath)
       dynamic_module = importlib.util.module_from_spec(spec)
       spec.loader.exec_module(dynamic_module)

       maintools=default_args['agenttoolfunctions'].strip()
       funcname=maintools.split("->>")

       for f in funcname:
          if len(f)>2:
            f=f.strip()
            fname=f.split("<<-")[0]
            print(fname)
            func_objects = []
            func_object = getattr(dynamic_module, fname)
            func_objects.append(func_object)

            aname=f.split("<<-")[1]
            aprompt=f.split("<<-")[2]

            agent = create_react_agent(
               model=llm,
               tools=func_objects,
               name=aname,
               prompt=aprompt

            )
            agents.append(agent)
       return agents


   def createasupervisor(agents,supervisorprompt,llm):
       print("in createasupervisor==",supervisorprompt)

       supervisorprompt = supervisorprompt.replace(";"," ")

       workflow = create_supervisor(
         agents,
         model=llm,
         prompt=supervisorprompt
       )
   # Compile and run
       app = workflow.compile()
       return app

   def invokesupervisor(app,maincontent):

       model = default_args['ollama-model']
       md = model.split(",")
       if len(md)>2:
         model=md[2]

       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']
       funcname = default_args['agenttoolfunctions']
       funcname = funcname.replace(";","==")
       maincontent=maincontent.replace(";",",")

       try:
           supervisormaincontent ="""
             Here is the team lead's assessment: {}.  Based on the Team Lead's assessment what is the appropriate action.
           """.format(maincontent)

           result = app.invoke({
             "messages": [
                 {
                     "role": "user",
                     "content": supervisormaincontent
                 }
             ]
           })
       except Exception as e:
         print("WARN STEP 9b: Agentic AI: unable to create supervisor agent")
         tsslogging.locallogs("WARN", "STEP 9b: Agentic AI: unable to create supervisor agent")
         return "error","error"

       lastmessage=""
       for chunk in app.stream(
           input=result,
           stream_mode="values",):
           if chunk["messages"][-1].content != "":
             lastmessage=chunk["messages"][-1].content

       lastmessage=str(lastmessage)
       lastmessage=cleanstring(lastmessage.strip())
       lastmessage=lastmessage.replace(";",",").replace("'","").replace('"',"").replace(":","")
       bufresponse  = '{"Date": "' + str(datetime.now(timezone.utc)) + '","Agent_Name": "Supervisor_Agent", "Topic": "' + default_args['supervisor_topic'] + '","Prompt":"' + supervisormaincontent + '","Response": "' + lastmessage.strip() + '","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'


       mainjson=[]
       mainstr=""
       for m in result["messages"]:
         mainjson.append(pprint.pformat(m))
        # mainstr = mainstr + json.dumps(str(m.json)) + ","

       mainjson=json.dumps({"supervisor_workflow_invocation": mainjson})
       mainjson=mainjson[:-1] + ",\"funcname\":" + json.dumps(funcname)+",\"supervisorprompt\":\""+supervisormaincontent+"\"}"
       mainjson=cleanstring(mainjson)
       mainjson=checkjson(mainjson)

       try:
         #print(mainjson)
         producegpttokafka(mainjson,default_args['supervisor_topic'])

         return mainjson,bufresponse
       except Exception as e:
         print("ERROR: invalid json")
         return "error","error"

   def formatcompletejson(bufresponses,teamlead_response,lastmessage):

       bufresponses = " ".join(str(bufresponses).splitlines())
       teamlead_response = " ".join(str(teamlead_response).splitlines())
       lastmessage = " ".join(str(lastmessage).splitlines())

       # split() with no argument collapses runs of whitespace; split(" ") was a no-op
       bufresponses = " ".join(bufresponses.split())
       teamlead_response = " ".join(teamlead_response.split())
       lastmessage = " ".join(lastmessage.split())

       bufresponses = bufresponses.replace("'","").replace("\n"," ").replace("\\n"," ").replace("\t", " ").replace("\r"," ").replace("#","").strip()
       teamlead_response = teamlead_response.replace("'","").replace("\n"," ").replace("\\n"," ").replace("\t", " ").replace("\r", " ").replace("#","").strip()
       lastmessage = lastmessage.replace("'","").replace("\n"," ").replace("\t", " ").replace("\\n"," ").replace("\r"," ").replace("#","").strip()

       print("bufresponses===",bufresponses)
       print("teambuf===",teamlead_response)
       print("supbuf===",lastmessage)

       # check if valid
       try:
         jvalid=json.loads(bufresponses)
       except Exception as e:
         bufresponses = '[{"Status": "no data found", "Model": "na", "Embedding": "na", "Temperature": "na", "Prompt": "na", "Response": "no data found", "Date": "' + str(datetime.now(timezone.utc)) + '", "Agent_Name": "", "Topic": "na"}]'

       try:
         jvalid=json.loads(teamlead_response)
       except Exception as e:
         teamlead_response =  '{"Status": "no data found", "Model": "na", "Embedding": "na", "Temperature": "na", "Prompt": "na", "Response": "no data found", "Date": "' + str(datetime.now(timezone.utc)) + '", "Agent_Name": "Team Lead agent", "Topic": "na"}'

       try:
         jvalid=json.loads(lastmessage)
       except Exception as e:
         lastmessage = '{"Status": "no data found", "Model": "na", "Embedding": "na", "Temperature": "na", "Prompt": "na", "Response": "Error - likely a Tool could not be run. Check your tools.", "Date": "' + str(datetime.now(timezone.utc)) + '", "Agent_Name": "Supervisor agent", "Topic": "na"}'


       mainjson = bufresponses[:-1] + "," + teamlead_response + "," + lastmessage + "]"
       mainjson = " ".join(mainjson.split())
       mainjson = " ".join(mainjson.splitlines())

       mainjson=re.sub(r'[\n\r]+', '', mainjson)

       mainjson = mainjson.replace("'","").replace("\n"," ").replace("\\n"," ").replace("\t", " ").replace("\r"," ").replace("\\r"," ").strip()

       mainjson = mainjson.translate({ord('\n'): None, ord('\r'): None})
       print("mainjson======",mainjson)

       return mainjson

   def startagenticai(**context):
          sd = context['dag'].dag_id
          sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
          pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

          # A non-empty environment variable named step9b<key> overrides default_args[<key>]
          overridekeys = ['rollbackoffset','ollama-model','deletevectordbcount','vectordbpath',
                          'temperature','vectordbcollectionname','ollamacontainername',
                          'CUDA_VISIBLE_DEVICES','mainip','mainport','embedding',
                          'agents_topic_prompt','agenttopic','teamlead_topic','teamleadprompt',
                          'supervisor_topic','agenttoolfunctions','agent_team_supervisor_topic',
                          'contextwindow','localmodelsfolder']
          for okey in overridekeys:
             envvalue = os.environ.get('step9b' + okey, '')
             if envvalue != '':
               default_args[okey] = envvalue

          VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
          VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESSAGENTICAI".format(sname))
          VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESSAGENTICAI".format(sname))
          HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))


          ti = context['task_instance']
          ti.xcom_push(key="{}_rollbackoffset".format(sname), value="_{}".format(default_args['rollbackoffset']))
          ti.xcom_push(key="{}_ollama-model".format(sname), value=default_args['ollama-model'])
          ti.xcom_push(key="{}_deletevectordbcount".format(sname), value="_{}".format(default_args['deletevectordbcount']))
          ti.xcom_push(key="{}_vectordbpath".format(sname), value="{}".format(default_args['vectordbpath']))
          ti.xcom_push(key="{}_temperature".format(sname), value="_{}".format(default_args['temperature']))
          ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
          ti.xcom_push(key="{}_enabletls".format(sname), value="_{}".format(default_args['enabletls']))
          ti.xcom_push(key="{}_partition".format(sname), value="_{}".format(default_args['partition']))
          ti.xcom_push(key="{}_vectordbcollectionname".format(sname), value=default_args['vectordbcollectionname'])
          ti.xcom_push(key="{}_ollamacontainername".format(sname), value=default_args['ollamacontainername'])
          ti.xcom_push(key="{}_mainip".format(sname), value=default_args['mainip'])
          ti.xcom_push(key="{}_mainport".format(sname), value="_{}".format(default_args['mainport']))
          ti.xcom_push(key="{}_embedding".format(sname), value=default_args['embedding'])
          ti.xcom_push(key="{}_agents_topic_prompt".format(sname), value=default_args['agents_topic_prompt'])
          ti.xcom_push(key="{}_teamlead_topic".format(sname), value=default_args['teamlead_topic'])
          ti.xcom_push(key="{}_teamleadprompt".format(sname), value=default_args['teamleadprompt'])
          ti.xcom_push(key="{}_supervisor_topic".format(sname), value=default_args['supervisor_topic'])
          ti.xcom_push(key="{}_supervisorprompt".format(sname), value=default_args['supervisorprompt'])

          at=default_args['agenttoolfunctions']
          at=at.replace(SMTP_PASSWORD,'')

          ti.xcom_push(key="{}_agenttoolfunctions".format(sname), value=at)

          ti.xcom_push(key="{}_agent_team_supervisor_topic".format(sname), value=default_args['agent_team_supervisor_topic'])
          ti.xcom_push(key="{}_concurrency".format(sname), value="_{}".format(default_args['concurrency']))
          ti.xcom_push(key="{}_cuda".format(sname), value="_{}".format(default_args['CUDA_VISIBLE_DEVICES']))
          ti.xcom_push(key="{}_agenttopic".format(sname), value="{}".format(default_args['agenttopic']))

          ti.xcom_push(key="{}_contextwindow".format(sname), value="_{}".format(default_args['contextwindow']))

          ti.xcom_push(key="{}_localmodelsfolder".format(sname), value="{}".format(default_args['localmodelsfolder']))

          repo=tsslogging.getrepo()
          if sname != '_mysolution_':
           fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
          else:
            fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

          wn = windowname('agenticai',sname,sd)
          subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess-agenticai", "ENTER"])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" {} {} {} {} \"{}\" \"{}\" {} {} \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" {} \"{}\" \"{}\"".format(fullpath,
                          VIPERTOKEN, HTTPADDR, VIPERHOST, VIPERPORT[1:],
                          default_args['rollbackoffset'],default_args['ollama-model'],default_args['deletevectordbcount'],default_args['vectordbpath'],
                          default_args['temperature'],default_args['topicid'],default_args['enabletls'],
                          default_args['partition'], default_args['vectordbcollectionname'], default_args['ollamacontainername'],
                          default_args['mainip'],default_args['mainport'],default_args['embedding'],
                          default_args['agents_topic_prompt'],default_args['teamlead_topic'],default_args['teamleadprompt'],
                          default_args['supervisor_topic'],default_args['supervisorprompt'],default_args['agenttoolfunctions'],
                          default_args['agent_team_supervisor_topic'],default_args['concurrency'],default_args['CUDA_VISIBLE_DEVICES'],
                          pname,default_args['contextwindow'],default_args['localmodelsfolder'],default_args['agenttopic']),"ENTER"])

   if __name__ == '__main__':
       if len(sys.argv) > 1:
          if sys.argv[1] == "1":
           repo=tsslogging.getrepo()

           VIPERTOKEN = sys.argv[2]
           VIPERHOST = sys.argv[3]
           VIPERPORT = sys.argv[4]

           rollbackoffset =  sys.argv[5]
           ollamamodel =  sys.argv[6]
           deletevectordb =  sys.argv[7]
           vectordbpath=sys.argv[8]
           temperature=sys.argv[9]

           topicid=sys.argv[10]
           enabletls=sys.argv[11]

           partition=sys.argv[12]
           vectordbcollectionname=sys.argv[13]
           ollamacontainername=sys.argv[14]
           mainip=sys.argv[15]
           mainport=sys.argv[16]
           embedding=sys.argv[17]
           agents_topic_prompt=sys.argv[18]
           teamlead_topic=sys.argv[19]
           teamleadprompt=sys.argv[20]
           supervisor_topic=sys.argv[21]
           supervisorprompt=sys.argv[22]
           agenttoolfunctions=sys.argv[23]

           agent_team_supervisor_topic=sys.argv[24]
           concurrency=sys.argv[25]
           cuda =  sys.argv[26]
           pname = sys.argv[27]
           contextwindow = sys.argv[28]
           localmodelsfolder = sys.argv[29]

           agenttopic = sys.argv[30]

          default_args['rollbackoffset']=rollbackoffset
          default_args['ollama-model']=ollamamodel
          default_args['deletevectordbcount']=deletevectordb
          default_args['vectordbpath']=vectordbpath
          default_args['temperature']=temperature
          default_args['topicid']=topicid
          default_args['enabletls']=enabletls
          default_args['partition']=partition
          default_args['vectordbcollectionname']=vectordbcollectionname
          default_args['ollamacontainername']=ollamacontainername
          default_args['mainip']=mainip
          default_args['mainport']=mainport
          default_args['embedding']=embedding
          default_args['agents_topic_prompt']=agents_topic_prompt
          default_args['teamlead_topic']=teamlead_topic
          default_args['teamleadprompt']=teamleadprompt
          default_args['supervisor_topic']=supervisor_topic
          default_args['supervisorprompt']=supervisorprompt
          default_args['agenttoolfunctions']=agenttoolfunctions
          default_args['agent_team_supervisor_topic']=agent_team_supervisor_topic
          default_args['concurrency']=concurrency
          default_args['CUDA_VISIBLE_DEVICES']=cuda
          default_args['contextwindow']=contextwindow
          default_args['localmodelsfolder']=localmodelsfolder
          default_args['agenttopic']=agenttopic

       if "KUBE" not in os.environ:

             tsslogging.locallogs("INFO", "STEP 9b: Starting Ollama container")
             v,buf,mainmodel,mainembedding=startpgptcontainer()
             if v==1:
               tsslogging.locallogs("WARN", "STEP 9b: There seems to be an issue starting the Ollama container.  Here is the run command - try to run it manually for testing: {}".format(buf))
             else:
               tsslogging.locallogs("INFO", "STEP 9b: Success starting Ollama container.  Here is the run command: {}".format(buf))

             time.sleep(10)  # wait for containers to start
       elif  os.environ["KUBE"] == "0":

             tsslogging.locallogs("INFO", "STEP 9b: Starting ollama server")
             v,buf,mainmodel,mainembedding=startpgptcontainer()
             if v==1:
               tsslogging.locallogs("WARN", "STEP 9b: There seems to be an issue starting the Ollama container.  Here is the run command - try to run it manually for testing: {}".format(buf))
             else:
               tsslogging.locallogs("INFO", "STEP 9b: Success starting Agentic AI.  Here is the run command: {}".format(buf))

             time.sleep(10)  # wait for containers to start
       else:
             tsslogging.locallogs("INFO", "STEP 9b: [KUBERNETES] Starting Agentic AI - LOOKS LIKE THIS IS RUNNING IN KUBERNETES")
             tsslogging.locallogs("INFO", "STEP 9b: [KUBERNETES] Make sure you have applied the Agentic AI YAML files and have the agentic AI Pod running")

       count=0

           # create the Supervisor and kick off action

   #    llmstatus = get_loaded_models()
    #   print("llmstatus==",llmstatus,pname)

       mainmodels=default_args['ollama-model']

       models = mainmodels.split(",")  #models must be agent,teamlead,supervisor
       embedding=None

       modelsarr = []
       for m in models:
          llmstatus = get_loaded_models()
          checkforloadedmodels(m)
          print("llmstatus==",llmstatus,pname)
          llm,embedding=setollama(m.strip())
          modelsarr.append(llm)


       if len(modelsarr) >2:
         #try:
         actionagents=createactionagents(modelsarr[2],pname)
         supervisorprompt = default_args['supervisorprompt']
         try:
           app=createasupervisor(actionagents,supervisorprompt,modelsarr[2])
         except Exception as e:
           print("Error=",e)
           tsslogging.locallogs("WARN", "STEP 9b unable to create agents {}".format(e))
       else:
          tsslogging.locallogs("WARN","STEP 9b unable to load LLM - Aborting")
          print("WARN", "STEP 9b unable to load LLM - Aborting")
          exit(0)

       deletevectordbcnt=0
       while True:
            deletevectordbcnt +=1
            try:
               agent_topics = default_args['agents_topic_prompt']
               topicjsons=getjsonsfromtopics(agent_topics)
               responses,bufresponses=agentquerytopics(agent_topics,topicjsons,modelsarr[0])
            #try:
               tml_text_engine,deletevectordbcnt=loadtextdataintovectordb(responses,deletevectordbcnt,modelsarr[1])
               teamlead_response,teambuf=teamleadqueryengine(tml_text_engine)
               mainjson,supbuf=invokesupervisor(app,teamlead_response)
               complete=formatcompletejson(bufresponses,teambuf,supbuf)

               if default_args['agent_team_supervisor_topic']!='':
                 producegpttokafka(complete,default_args['agent_team_supervisor_topic'])

               time.sleep(1)
            except Exception as e:
             print("Error=",e)
             if count == 0:
               tsslogging.locallogs("ERROR", "STEP 9b: Agentic AI Step 9b DAG in {} {}  Aborting after 10 consecutive errors.".format(os.path.basename(__file__),e))
               tsslogging.tsslogit("PrivateGPT Step 9b DAG in {} {} Aborting after 10 consecutive errors.".format(os.path.basename(__file__),e), "ERROR" )
               tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
             time.sleep(5)
             count = count + 1
             if count > 600:
               break
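
The listing above copies sys.argv[5] through sys.argv[30] into default_args one value at a time. As a minimal sketch (not the TSS implementation), the same mapping can be written with an ordered list of key names. The names ARG_KEYS and apply_cli_overrides are hypothetical; the key order is copied from the listing, and the None entry marks the position of pname, which is not stored in default_args.

```python
# Hypothetical helper; the order mirrors sys.argv[5]..sys.argv[30] in the listing above.
ARG_KEYS = [
    "rollbackoffset", "ollama-model", "deletevectordbcount", "vectordbpath",
    "temperature", "topicid", "enabletls", "partition",
    "vectordbcollectionname", "ollamacontainername", "mainip", "mainport",
    "embedding", "agents_topic_prompt", "teamlead_topic", "teamleadprompt",
    "supervisor_topic", "supervisorprompt", "agenttoolfunctions",
    "agent_team_supervisor_topic", "concurrency", "CUDA_VISIBLE_DEVICES",
    None,  # sys.argv[27] is pname, which is kept outside default_args
    "contextwindow", "localmodelsfolder", "agenttopic",
]


def apply_cli_overrides(argv, defaults, first_index=5):
    """Return (copy of defaults updated from argv, pname)."""
    values = argv[first_index:first_index + len(ARG_KEYS)]
    if len(values) != len(ARG_KEYS):
        raise ValueError(
            "expected {} positional values, got {}".format(len(ARG_KEYS), len(values))
        )
    updated = dict(defaults)
    pname = None
    for key, value in zip(ARG_KEYS, values):
        if key is None:
            pname = value
        else:
            updated[key] = value
    return updated, pname
```

Keeping the order in one list makes it easier to check that the tmux command line and the argument parsing stay in step when a parameter is added.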

8.14. STEP 9b DAG Core Parameter Explanation

Step 9b DAG parameter	Explanation
ollamacontainername	Use this Ollama container: maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools (https://hub.docker.com/r/maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools). This container runs your LLM locally.
rollbackoffset	This determines how much data to process.
agents_topic_prompt	This field tells each agent which topic to monitor and which prompt to answer. FORMAT: topic<<-prompt->> where topic is the topic the agent will monitor and prompt is what you want the agent to answer. For example: “testtopic<<-Do you see any issues in the real-time json data?->>” Separate multiple topic entries with ->>. You can also add <<data>> in the prompt as a placeholder for the data. For example: “testtopic<<-Do you see any issues in the real-time json data. Here is data <<data>>?->>”
teamlead_topic	This topic will contain all of the team lead responses.
teamleadprompt	Enter the prompt for the Team Lead agent.
supervisor_topic	All supervisor responses are stored in this topic.
supervisorprompt	Enter the prompt for the supervisor.
agenttoolfunctions	This is the key field that links the tools (Python functions) to the supervisor agent that will execute them. The tools are defined in STEP 9b: Agents’ Tools. FORMAT: tool_function<<-agent_name<<-system_prompt->>tool_function2<<-agent_name2<<-system_prompt2->>… For example, to connect the send_email tool, agenttoolfunctions is: “””send_email<<-send_email<<- You are an email-sending agent. Use smtp parameters to send emails when there is an anomaly in the data, make sure to indicate the device name in the mainuid field. do not write a smtp script, actually send the email using the SMTP parameters smtp_server=’{}’ smtp_port={} username=’{}’ password=’{}’ sender=’{}’ recipient=’{}’ subject=’’ body=’’->> “””.format(SMTP_SERVER,SMTP_PORT,SMTP_USERNAME,SMTP_PASSWORD,SMTP_USERNAME,recipient) Note: the delimiter <<- separates the tool function, agent name, and agent prompt; the delimiter ->> ends one tool function entry and starts the next. The variables SMTP_SERVER, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, and recipient should be defined as environment variables when starting the TSS container.
agent_team_supervisor_topic	This topic will contain responses from the individual agents, team lead, and supervisor. See Sample Output from TML Multi-Agentic AI Solution.
mainip	This is the IP to the Ollama container.
mainport	This is the port the Ollama server is listening on, e.g. 11434.
embedding	This is the embedding used in the Vector DB. The TML Multi-Agentic AI solution uses VectorStoreIndex (from llama_index.core.indices.vector_store.base import VectorStoreIndex). TML recommends the embedding nomic-embed-text.
temperature	This is the temperature for the Ollama model. A temperature of 0 means the LLM will be conservative; 1 means it may hallucinate.
ollama-model	The Ollama LLM models to use. Any Ollama model with tools training can be used. Note: In this field you need to specify a model for the topic agent, the team lead agent, and the supervisor agent, in that order. For example, ‘ollama-model’: ‘phi3:3.8b,phi3:3.8b,llama3.2:3b’ tells TML to use phi3:3.8b for both the topic agents and the team lead, and llama3.2:3b for the supervisor agent.
deletevectordbcount	This count determines how much data to save in the vector DB. A higher number keeps more data in the vector DB, which gives the LLM more memory on which to base its responses.
vectordbpath	This is the path to the vector store on disk.
contextwindow	Enter the context window for the LLM. This will vary for each LLM. Higher windows will require more VRAM.
localmodelsfolder	Enter the local path where LLM models will be saved. It is important to cache the LLM from Ollama to improve LLM loading times.
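
To make the agents_topic_prompt and ollama-model formats in the table concrete, here is a minimal Python sketch that splits both values. It is an illustration only, not the parser used by the Step 9b DAG; the function names parse_agents_topic_prompt and split_model_roles are hypothetical. The role order (agent, team lead, supervisor) follows the ollama-model row above.

```python
def parse_agents_topic_prompt(value):
    """Split 'topic<<-prompt->>topic2<<-prompt2->>' into (topic, prompt) pairs."""
    pairs = []
    for chunk in value.split("->>"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after a trailing ->>
        topic, sep, prompt = chunk.partition("<<-")
        if not sep:
            raise ValueError("missing '<<-' in entry: {!r}".format(chunk))
        pairs.append((topic.strip(), prompt.strip()))
    return pairs


def split_model_roles(value):
    """Map 'agent,teamlead,supervisor' model names to their roles."""
    models = [m.strip() for m in value.split(",")]
    if len(models) != 3:
        raise ValueError("expected three models: agent,teamlead,supervisor")
    return dict(zip(("agent", "teamlead", "supervisor"), models))


pairs = parse_agents_topic_prompt(
    "testtopic<<-Do you see any issues? Here is data <<data>>->>"
    "othertopic<<-Summarize the data->>"
)
roles = split_model_roles("phi3:3.8b,phi3:3.8b,llama3.2:3b")
print(pairs)
print(roles)
```

Note that the <<data>> placeholder does not collide with the delimiters, because neither <<- nor ->> appears inside it.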

8.15. Example of 9b Configuration Parameters

Below is an example configuration for the Step 9b DAG described above. In this example, the send_email function in the Agenttools.py file is connected to the supervisor agent. Note that the SMTP parameters are environment variables that are set when the solution container or TSS container is started.

default_args = {
 'owner': 'Sebastian Maurice',   # <<< *** Change as needed
 'ollamacontainername' : 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools', #'maadsdocker/tml-privategpt-no-gpu-amd64',  # enter a valid container https://hub.docker.com/r/maadsdocker/tml-privategpt-no-gpu-amd64
 'rollbackoffset' : '15',  # <<< *** Change as needed
 'offset' : '-1', # leave as is
 'enabletls' : '1', # change as needed
 'brokerhost' : '', # <<< *** Leave as is
 'brokerport' : '-999', # <<< *** Leave as is
 'microserviceid' : '',  # change as needed
 'topicid' : '-999', # leave as is
 'delay' : '100', # change as needed
 'companyname' : 'otics',  # <<< *** Change as needed
 'consumerid' : 'streamtopic',  # <<< *** Leave as is
 'agenttopic' : 'agent-responses', # this topic contains the individual agent responses
 'agents_topic_prompt' : """
        iot-preprocess<<-You are a precise data analysis assistant. Your task is to point out any anomalies or interesting insights that could help improve the performance and functioning of
        IoT device.  The json data are from IOT devices.  the hp field shows the data that are processed for the process variable (pv), using the process types (pt) like:
        avg or average, or trend analysis, or anomprob (i.e. anomaly probability) etc.  The device being processed is in the uid field of the json.
         here is the json data:

          <<data>>

         INSTRUCTIONS:
         1. Examine each number in the json array
         2. Provide a brief analysis of the results

         FORMAT YOUR RESPONSE:
         - Filtered results: [list the qualifying numbers with their "uid" fields]
         - Count of qualifying numbers: [number]
         - Analysis: [brief explanation of what the filter revealed]

         Be precise and concise in your response.->>
        iot-ml-prediction-results-output<<-You are a precise data analysis assistant. Your task is to filter and analyze numeric data based on specified criteria.

        TASK: Filter numbers from the given json array using the threshold: greater than 90

        Input JSON array:

             <<data>>

         INSTRUCTIONS:
         1. Examine each number in the json array
         2. Apply the filter condition: number > 90
         3. Return only numbers that meet the criteria with their "uid" fields
         4. If no numbers meet the criteria, explicitly state this
         5. Provide a brief analysis of the results

         FORMAT YOUR RESPONSE:
         - Filtered results: [list the qualifying numbers with their "uid" fields]
         - Count of qualifying numbers: [number]
         - Analysis: [brief explanation of what the filter revealed]

         Be precise and concise in your response.
""", # format: topic<<-prompt->> (the topic the agent will monitor, then the prompt you want for the agent)
 'teamlead_topic' : 'team-lead-responses', # Enter the team lead topic - all team lead responses will be written to this topic
 'teamleadprompt' : """
         Analyze the dataset containing IoT device monitoring records managed by individual agents.
         Review all data fields to determine whether there are any issues or major concerns requiring urgent attention.

         Focus on the following criteria:
         1. Each record contains a unique device identifier stored in the field "uid".
         2. Examine the failure probability for each device stored in the hp field.
         3. Categorize the probabilities as follows:
          - Low: 0% to 50%
          - Medium: 51% to 75%
          - High: 76% to 89%
          - Urgent: 90% to 100%

        Tasks:
        - Identify and highlight devices (by their "uid") that have **urgent failure probabilities** (≥ 90%).
        - For each flagged device, provide details and reasoning on why it may require immediate investigation.
        - Only include devices that meet the urgent threshold. Do not report on low, medium, or high categories unless relevant for context.
        - State clearly whether the identified issue is *urgent*.
        - Do not use or generate any code; perform a reasoning-based analysis directly from the provided data.

""", # Enter the team lead prompt
'supervisor_topic' : 'supervisor-responses', # Enter the supervisor topic - all supervisor responses will be written to this topic
'supervisorprompt' : """
        You are a team supervisor analyzing operational device data and recommending whether an alert email should be sent.
        You manage a send email expert and an average expert.
        For send email, use send_email agent.
        For average, use average agent.

       INSTRUCTIONS:
       1.Analyze the Team Lead assessment and determine the proper action:
       - If devices are marked urgent or failure probabilities exceed 90%, select "send_email".
       - If no urgent devices are found or probabilities remain below thresholds, then no action is needed.
""", # Enter the supervisor prompt
 'agenttoolfunctions' : """
        send_email<<-send_email<<- You are an email-sending agent. Use smtp parameters to send emails when there is an anomaly in the data, make sure to
                     indicate the device name in the mainuid field. do not write a smtp script, actually send the email using the SMTP parameters
                     smtp_server='{}'
                     smtp_port={}
                     username='{}'
                     password='{}'
                     sender='{}'
                     recipient='{}'
                     subject=''
                     body=''->>
        average<<-average<<-You are an average agent.  Take average of the device failure probabilities.
""".format(SMTP_SERVER,SMTP_PORT,SMTP_USERNAME,SMTP_PASSWORD,SMTP_USERNAME,recipient),  # enter the tools: tool_function is the name of a function in the agenttools Python file
 'agent_team_supervisor_topic': 'all-agents-responses', # this topic will hold the responses from agents, team lead and supervisor
'producerid' : 'agentic-ai',   # <<< *** Leave as is
 'identifier' : 'This is analysing TML output with Agentic AI',
 'mainip': 'http://127.0.0.1', # Ollama server container listening on this host
 'mainport' : '11434', # Ollama listening on this port
 'embedding': 'nomic-embed-text', # Embedding model
 'preprocesstype' : '', # Leave as is
 'partition' : '-1', # Leave as is
 'vectordbcollectionname' : 'tml-llm-model-v2', # change as needed
 'concurrency' : '2', # change as needed Leave at 1
 'CUDA_VISIBLE_DEVICES' : '0', # change as needed
 'temperature' : '0.1', # ranges between 0 and 1 and controls how conservative the LLM will be: 0 is very conservative, 1 may hallucinate
 #--------------------
 'ollama-model': 'phi3:3.8b,phi3:3.8b,llama3.2:3b', # maximum  3 models can be specified: agent,teamlead,supervisor
 'deletevectordbcount': '5',
 'vectordbpath': '/rawdata/vectordb',
 'contextwindow': '4096',
 'localmodelsfolder': '/mnt/c/maads/tml-airflow/rawdata/ollama'
}
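
The agenttoolfunctions value in the example above packs a tool function, an agent name, and a system prompt into each entry. As a minimal sketch of how such a string could be split (an assumption for illustration, not the parsing code used by TML), the hypothetical helper parse_agenttoolfunctions below breaks the value on ->> and then on <<-:

```python
from typing import NamedTuple


class ToolBinding(NamedTuple):
    tool_function: str
    agent_name: str
    system_prompt: str


def parse_agenttoolfunctions(value):
    """Split 'tool<<-agent<<-prompt->>tool2<<-agent2<<-prompt2' into bindings."""
    bindings = []
    for entry in value.split("->>"):
        entry = entry.strip()
        if not entry:
            continue  # ignore whitespace left after a trailing ->>
        parts = entry.split("<<-", 2)  # at most three fields per entry
        if len(parts) != 3:
            raise ValueError("expected tool<<-agent<<-prompt, got: {!r}".format(entry))
        tool, agent, prompt = (p.strip() for p in parts)
        bindings.append(ToolBinding(tool, agent, prompt))
    return bindings


bindings = parse_agenttoolfunctions(
    """
    send_email<<-send_email<<- You are an email-sending agent.->>
    average<<-average<<-You are an average agent.
    """
)
for b in bindings:
    print(b.tool_function, "->", b.agent_name)
```

Running a check like this on your own agenttoolfunctions string before deploying can catch a missing delimiter early.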

8.16. STEP 9b: Agents’ Tools

The code below lets users incorporate any tools they want into their TML multi-agentic solutions.

Note

If your tool requires special Python libraries, you can install them with the function install_package(package_name, importname).

This gives tremendous flexibility in integrating tools that the AI can execute in real-time; the send_email tool is included as an example.

You integrate the tools into your solution by configuring agenttoolfunctions in the Step 9b DAG.

# Agent Tool
from langchain_core.tools import tool
from email.mime.text import MIMEText
from email.message import EmailMessage
import smtplib
#from langchain_tavily import TavilySearch
import subprocess
import sys

"""
You must define all your tools here for your agents to execute
You can define as many agents tools you want

YOU MUST ALSO update funcname

funcname = ["web_search:search_agent:You are a search expert","add:math_expert:You are a math expert","max_agent:max_agent:You find the company with maximum employees"]

The format is funcname = ["<function name>,<function_name>:<agent name>:<prompt>","<function name>:<agent name>:<prompt>",...]

NOTE: You can assign multiple functions to agents - separate multiple functions by a comma
"""

# if your tool requires a package you can install it using the install_package function
# the function will check if package is already installed
def install_package(package_name, importname):
    """
    Installs a specified Python package using pip.
    """
    try:
        __import__(importname)
    except ImportError:
        print(f"Package '{package_name}' not found. Attempting to install...")
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
            print(f"Package '{package_name}' installed successfully.")
        except subprocess.CalledProcessError as e:
            print(f"Error installing package '{package_name}': {e}")

# importname is the module name passed to __import__, not a full import statement
#install_package("langchain-tavily","langchain_tavily")

# SendEmail by Agent
@tool
def send_email(smtp_server: str, smtp_port: int, username: str, password: str,
                    sender: str, recipient: str, subject: str, body: str) -> bool:
    """
    Sends an email reply via SMTP using the generated response.
    """

    recemails = recipient.split(",")

    try:
        # Build the message; set_content keeps the line breaks in the body
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = username
        msg["To"] = recipient
        msg.set_content(body)

        with smtplib.SMTP(smtp_server, int(smtp_port)) as server:
            server.starttls()
            server.login(username, password)
#            server.send_message(msg)
            server.sendmail(username, recemails, msg.as_string())

        return True
    except Exception as e:
        print("Failed to send email:", e)
        return False

#send_email({"smtp_server":"smtp.gmail.com","smtp_port":587,"username":SMTP_USERNAME,"password":SMTP_PASSWORD,"sender":SMTP_USERNAME,"recipient":recipientlist,"subject":"test","body":"test 2"})

# Example: Add two numbers
@tool
def add(a: float, b: float) -> float:
    '''Add two numbers.'''
    return a + b


@tool
def web_search(query: str) -> str:
    '''Search the web for information.'''
    return "Searched the web"

@tool
def max_agent(query: list) -> int:
    '''Find the company with the most employees.'''
    print(query)
    return max(query)

@tool
def average(query: list) -> float:
    '''Find the average.'''
    # use a separate name so the result does not shadow the function name
    result = 0.0
    if len(query) != 0:
      result = round(sum(query) / len(query), 2)
    return result
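
Before connecting send_email to the supervisor agent, it can help to check how the message and the recipient list are assembled without opening an SMTP connection. The sketch below is a dry-run helper under that assumption; build_alert_message is a hypothetical name, the addresses are placeholders, and only the Python standard library is used.

```python
from email.message import EmailMessage


def build_alert_message(sender, recipient, subject, body):
    """Build the email and the recipient list without sending anything."""
    # Comma-separated recipients, as in the send_email tool above
    recipients = [r.strip() for r in recipient.split(",") if r.strip()]
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(body)
    return msg, recipients


msg, recipients = build_alert_message(
    "alerts@example.com",
    "ops@example.com, oncall@example.com",
    "Device alert",
    "Device 1234 has an urgent failure probability.",
)
print(recipients)
print(msg.as_string())
```

The returned recipient list is what would be passed to server.sendmail, so empty entries from stray commas are dropped here.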

8.17. STEP 10: Create TML Solution Documentation: tml-system-step-10-documentation-dag

Note

TSS will automatically generate documentation for your solution at READTHEDOCS. Each TML solution you create will have its own documentation that will detail the solution parameters in the DAGs. This is another unique and powerful feature of the TSS. This enables you to share your documentation with others - almost instantly!

Tip

The TSS will develop the base documentation for your solution.

Note. Your documentation URL will be: https://<Your Solution Name>.readthedocs.io

Your Solution Name is the name you chose in Lets Start Building a TML Solution, plus the first 4 characters of your ReadTheDocs token. This project is committed under the tml-solutions folder in GitHub.
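
As a small illustration of the naming rule, the sketch below builds a documentation URL from a project name and a token. The function name docs_url is hypothetical, and the hyphen used to join the two parts is an assumption; check the URL that TSS reports for your solution.

```python
def docs_url(project_name, rtd_token, separator="-"):
    """Build the documentation URL from the project name and token prefix."""
    # The separator is an assumption, not confirmed by the TSS documentation
    suffix = rtd_token[:4]  # first 4 characters of the ReadTheDocs token
    solution_name = "{}{}{}".format(project_name, separator, suffix)
    return "https://{}.readthedocs.io".format(solution_name)


# Placeholder values, not a real project or token
print(docs_url("myproject", "abcd1234efgh"))
```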

Watch the YouTube video to see how to configure this DAG: YouTube Video

   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   from datetime import datetime
   from airflow.decorators import dag, task
   import os
   import sys
   import requests
   import json
   import subprocess
   import tsslogging
   import shutil
   from git import Repo
   import time
   sys.dont_write_bytecode = True

   ######################################################USER CHOSEN PARAMETERS ###########################################################
   default_args = {
    'conf_project' : 'Transactional Machine Learning (TML)',
    'conf_copyright' : '2024, Otics Advanced Analytics, Incorporated - For Support email support@otics.ca',
    'conf_author' : 'Sebastian Maurice',
    'conf_release' : '0.1',
    'conf_version' : '0.1.0',
    'dockerenv': '', # add any environmental variables for docker must be: variable1=value1, variable2=value2
    'dockerinstructions': '', # add instructions on how to run the docker container
   }

   ############################################################### DO NOT MODIFY BELOW ####################################################

   def triggerbuild(sname):

           URL = "https://readthedocs.org/api/v3/projects/{}/versions/latest/builds/".format(sname)
           TOKEN = os.environ['READTHEDOCS']
           HEADERS = {'Authorization': f'token {TOKEN}'}
           response = requests.post(URL, headers=HEADERS)
           print(response.json())

   def updatebranch(sname,branch):

           URL = "https://readthedocs.org/api/v3/projects/{}/".format(sname)
           TOKEN = os.environ['READTHEDOCS']
           HEADERS = {'Authorization': f'token {TOKEN}'}
           data={
               "name": "{}".format(sname),
               "repository": {
                   "url": "https://github.com/{}/{}".format(os.environ['GITUSERNAME'],sname),
                   "type": "git"
               },
               "default_branch": "{}".format(branch),
               "homepage": "http://template.readthedocs.io/",
               "programming_language": "py",
               "language": "en",
               "privacy_level": "public",
               "external_builds_privacy_level": "public",
               "tags": [
                   "automation",
                   "sphinx"
               ]
           }
           response = requests.patch(
               URL,
               json=data,
               headers=HEADERS,
           )

   def setupurls(projectname,producetype,sname):

       ptype=""
       if producetype=="LOCALFILE":
         ptype=producetype
       elif producetype=="REST":
         ptype="RESTAPI"
       elif producetype=="MQTT":
         ptype=producetype
       elif producetype=="gRPC":
         ptype=producetype


       stepurl1="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_1_getparams_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl2="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_2_kafka_createtopic_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl3="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_read_{}_step_3_kafka_producetotopic_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,ptype,projectname)
       stepurl4="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl4a="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4a_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl4b="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4b_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl4c="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4c_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl5="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_5_kafka_machine_learning_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl6="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_6_kafka_predictions_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl7="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_7_kafka_visualization_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl8="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_8_deploy_solution_to_docker_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl9="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_9_privategpt_qdrant_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl9b="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_9b_agenticai_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl10="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_10_documentation_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)

       print("stepurl1=",stepurl1)

       doparse("/{}/docs/source/details.rst".format(sname), ["--step1url--;{}".format(stepurl1)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step2url--;{}".format(stepurl2)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step3url--;{}".format(stepurl3)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4url--;{}".format(stepurl4)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4aurl--;{}".format(stepurl4a)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4burl--;{}".format(stepurl4b)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4curl--;{}".format(stepurl4c)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step5url--;{}".format(stepurl5)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step6url--;{}".format(stepurl6)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step7url--;{}".format(stepurl7)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step8url--;{}".format(stepurl8)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step9url--;{}".format(stepurl9)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step9burl--;{}".format(stepurl9b)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step10url--;{}".format(stepurl10)])

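   def doparse(fname,farr):
         # Replace placeholders in the file fname.
         # Each entry in farr has the form "placeholder;replacement".
         try:
          with open(fname, 'r', encoding='utf-8') as file:
           data = file.readlines()
          for r, line in enumerate(data):
              for f in farr:
                  fs = f.split(";")
                  if fs[0] in line:
                      # keep earlier replacements made on the same line
                      line = line.replace(fs[0],fs[1])
              data[r] = line
          with open(fname, 'w', encoding='utf-8') as file:
           file.writelines(data)
         except Exception as e:
            # best effort: report the problem and continue
            print("doparse: could not update {}: {}".format(fname, e))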

   def updateollamaandpgpt(op,ollamacontainername,concurrency,collection,temp,rollback,ollama,deletevector,vectordbpath,topicid,enabletls,partition,mainip,
                          mainport,embedding,agents_topic_prompt,teamlead_topic,teamleadprompt,supervisor_topic,supervisorprompt,agenttoolfunctions,agent_team_supervisor_topic,contextwindow,
                          pvectorsearchtype,ptemperature,pcollection,pconcurrency,pvectordimension,pcontextwindowsize,mainmodel,mainembedding,pgptcontainername):
         print("update==",op)
         if ollamacontainername is not None:
          doparse("/{}/ollama.yml".format(op), ["--ollamacontainername--;{}".format(ollamacontainername)])
          doparse("/{}/ollama.yml".format(op), ["--agenticai-kubeconcur--;{}".format(concurrency[1:])])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-kubecollection--;{}".format(collection)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-kubetemperature--;{}".format(temp)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-rollbackoffset--;{}".format(rollback)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-ollama-model--;{}".format(ollama)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-deletevectordbcount--;{}".format(deletevector)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-vectordbpath--;{}".format(vectordbpath)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-topicid--;{}".format(topicid)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-enabletls--;{}".format(enabletls)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-partition--;{}".format(partition)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-vectordbcollectionname--;{}".format(collection)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-ollamacontainername--;{}".format(ollamacontainername)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-mainip--;{}".format(mainip)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-mainport--;{}".format(mainport)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-embedding--;{}".format(embedding)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-agents_topic_prompt--;{}".format(agents_topic_prompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-teamlead_topic--;{}".format(teamlead_topic)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-teamleadprompt--;{}".format(teamleadprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-supervisor_topic--;{}".format(supervisor_topic)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-supervisorprompt--;{}".format(supervisorprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-agenttoolfunctions--;{}".format(agenttoolfunctions.strip().replace('\n','').replace("\\n","").replace("'","").replace(";","=="))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-agent_team_supervisor_topic--;{}".format(agent_team_supervisor_topic)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-contextwindow--;{}".format(contextwindow)])

         if pgptcontainername is not None:
          doparse("/{}/privategpt.yml".format(op), ["--kubevectorsearchtype--;{}".format(pvectorsearchtype)])
          doparse("/{}/privategpt.yml".format(op), ["--kubetemperature--;{}".format(ptemperature[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubecollection--;{}".format(pcollection)])
          doparse("/{}/privategpt.yml".format(op), ["--kubeconcur--;{}".format(pconcurrency[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubevectordimension--;{}".format(pvectordimension[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubecontextwindowsize--;{}".format(pcontextwindowsize[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubemainmodel--;{}".format(mainmodel)])
          doparse("/{}/privategpt.yml".format(op), ["--kubemainembedding--;{}".format(mainembedding)])
          doparse("/{}/privategpt.yml".format(op), ["--kubeprivategpt--;{}".format(pgptcontainername)])

   def copyymls(projectname,sname,ingressyml,solutionyml):
       # Pass the ingress and solution YAML to tsslogging.writeoutymls for the
       # project's ymls/<sname> folder in the repo, and return that folder.
       orepo=tsslogging.getrepo()
       op=f"/{orepo}/tml-airflow/dags/tml-solutions/{projectname}/ymls/{sname}"
       # makedirs also creates the intermediate ymls/ folder
       os.makedirs(op, exist_ok=True)

       tsslogging.writeoutymls(op,ingressyml,solutionyml,sname)
       return op

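   def generatedoc(**context):
       # Fill the --placeholder-- tokens in the solution's documentation
       # templates (conf.py and the .rst files under /<sname>/docs/source)
       # with values that earlier DAG tasks pushed to XCom.
       # istss1 defaults to 1 and is 0 only when TSS is set to another value.
       istss1=1
       if os.environ.get('TSS', "1") != "1":
           istss1=0

       # Skip documentation generation when tssdoc=1.
       if os.environ.get('tssdoc') == "1":
           return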

       sd = context['dag'].dag_id
       sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
   #    rtdsname = tsslogging.rtdprojects(sname,sd)

       kube=0
       step9prompt=''
       step9context=''
       step9keyattribute=''
       step9keyprocesstype=''
       step9hyperbatch=''
       step9vectordbcollectionname=''
       step9concurrency=''
       cudavisibledevices=''
       step9docfolder=''
       step9docfolderingestinterval=''
       step9useidentifierinprompt=''
       step5processlogic=''
       step5independentvariables=''
       step9searchterms=''
       step9streamall=''
       step9temperature=''
       step9vectorsearchtype=''
       step9pcontextwindowsize=''
       step9pgptcontainername=''
       step9pgpthost=''
       step9pgptport=''
       step9vectordimension=''
       step4crawdatatopic=''
       step4csearchterms=''
       step4crememberpastwindows=''
       step4cpatternwindowthreshold=''
       step4crtmsstream=''
       step4crtmsscorethreshold=''
       step4cattackscorethreshold=''
       step4cpatternscorethreshold=''
       step4clocalsearchtermfolder=''
       step4clocalsearchtermfolderinterval=''
       step4crtmsfoldername=''
       step3localfileinputfile=''
       step3localfiledocfolder=''
       step4crtmsmaxwindows=''
       rtmsoutputurl=""
       mloutputurl=""

       step2raw_data_topic=""
       step2preprocess_data_topic=""
       step4raw_data_topic=""
       step4preprocess_data_topic=''
       step4preprocesstypes=""
       step4jsoncriteria=""
       step4maxrows=""
       step4ajsoncriteria=""
       step4amaxrows=""
       step4apreprocesstypes=""
       step4araw_data_topic=""
       step4apreprocess_data_topic=""
       step4bpreprocesstypes=""
       step4bjsoncriteria=""
       step4bmaxrows=""
       step4braw_data_topic=""
       step4bpreprocess_data_topic=""

       step9brollback=""
       step9bdeletevectordbcount=""
       step9bvectordbpath=""
       step9btemperature=""
       step9bvectordbcollectionname=""
       step9bollamacontainername=""
       step9bCUDA_VISIBLE_DEVICES=""
       step9bmainip=""
       step9bmainport=""
       step9bembedding=""
       step9bagents_topic_prompt=""
       step9bteamlead_topic=""
       step9bteamleadprompt=""
       step9bsupervisor_topic=""
       step9bagenttoolfunctions=""
       step9bagent_team_supervisor_topic=""
       step9bconcurrency=""
       step9bollama=""
       step9btopicid=""
       step9benabletls=""
       step9bpartition=""
       step9bsupervisorprompt=""
       step9bcontextwindow=""
       step9blocalmodelsfolder=""
       step9bagenttopic=""

       # Nothing to build here when KUBE=1.
       if os.environ.get("KUBE") == "1":
           kube=1
           return

       tsslogging.locallogs("INFO", "STEP 10: Started to build the documentation")
       producinghost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
       producingport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
       preprocesshost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS".format(sname))
       preprocessport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS".format(sname))
       preprocesshost2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS2".format(sname))
       preprocessport2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS2".format(sname))

       mlhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTML".format(sname))
       mlport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTML".format(sname))
       predictionhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREDICT".format(sname))
       predictionport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREDICT".format(sname))
       dashboardhtml = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_dashboardhtml".format(sname))
       vipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERVIZPORT".format(sname))
       solutionvipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONVIPERVIZPORT".format(sname))
       airflowport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_AIRFLOWPORT".format(sname))
       mqttusername = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_MQTTUSERNAME".format(sname))
       kafkacloudusername = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_KAFKACLOUDUSERNAME".format(sname))
       projectname = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))
       externalport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_EXTERNALPORT".format(sname))
       solutionexternalport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONEXTERNALPORT".format(sname))

       solutionairflowport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONAIRFLOWPORT".format(sname))

       hpdehost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOST".format(sname))
       hpdeport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORT".format(sname))

       hpdepredicthost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOSTPREDICT".format(sname))
       hpdepredictport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORTPREDICT".format(sname))

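       # Values are interpolated into the sed expression unescaped, so a "/" or "&" in a value would break or alter the substitution.
       subprocess.call(["sed", "-i", "-e",  "s/--project--/{}/g".format(default_args['conf_project']), "/{}/docs/source/conf.py".format(sname)])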
       subprocess.call(["sed", "-i", "-e",  "s/--copyright--/{}/g".format(default_args['conf_copyright']), "/{}/docs/source/conf.py".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--author--/{}/g".format(default_args['conf_author']), "/{}/docs/source/conf.py".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--release--/{}/g".format(default_args['conf_release']), "/{}/docs/source/conf.py".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--version--/{}/g".format(default_args['conf_version']), "/{}/docs/source/conf.py".format(sname)])

       stitle = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutiontitle".format(sname))
       sdesc = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutiondescription".format(sname))
       brokerhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_brokerhost".format(sname))
       brokerport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_brokerport".format(sname))
       cloudusername = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_cloudusername".format(sname))
       cloudpassword = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_cloudpassword".format(sname))

       subprocess.call(["sed", "-i", "-e",  "s/--solutionname--/{}/g".format(sname), "/{}/docs/source/index.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--solutiontitle--/{}/g".format(stitle), "/{}/docs/source/index.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--solutiondescription--/{}/g".format(sdesc), "/{}/docs/source/index.rst".format(sname)])

       projecturl="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname)

       doparse("/{}/docs/source/index.rst".format(sname), ["--projectname--;{}".format(projectname)])

       subprocess.call(["sed", "-i", "-e",  "s/--solutionname--/{}/g".format(sname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--sname--/{}/g".format(sname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--stitle--/{}/g".format(stitle), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--sdesc--/{}/g".format(sdesc), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--brokerhost--/{}/g".format(brokerhost), "/{}/docs/source/details.rst".format(sname)])
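       # [1:] drops the first character of the pulled value (numeric settings appear to be stored in XCom with a one-character prefix).
       subprocess.call(["sed", "-i", "-e",  "s/--brokerport--/{}/g".format(brokerport[1:]), "/{}/docs/source/details.rst".format(sname)])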
       subprocess.call(["sed", "-i", "-e",  "s/--cloudusername--/{}/g".format(cloudusername), "/{}/docs/source/details.rst".format(sname)])

       subprocess.call(["sed", "-i", "-e",  "s/--solutiontitle--/{}/g".format(stitle), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--solutiondescription--/{}/g".format(sdesc), "/{}/docs/source/details.rst".format(sname)])


       companyname = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_companyname".format(sname))
       myname = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_myname".format(sname))
       myemail = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_myemail".format(sname))
       mylocation = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_mylocation".format(sname))
       replication = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_replication".format(sname))
       numpartitions = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_numpartitions".format(sname))
       enabletls = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_enabletls".format(sname))
       microserviceid = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_microserviceid".format(sname))
       raw_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_raw_data_topic".format(sname))
       step2raw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_preprocess_data_topic".format(sname))
       step2preprocess_data_topic=preprocess_data_topic
       ml_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_ml_data_topic".format(sname))
       prediction_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_prediction_data_topic".format(sname))

       subprocess.call(["sed", "-i", "-e",  "s/--companyname--/{}/g".format(companyname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--myname--/{}/g".format(myname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--myemail--/{}/g".format(myemail), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--mylocation--/{}/g".format(mylocation), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--replication--/{}/g".format(replication[1:]), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--numpartitions--/{}/g".format(numpartitions[1:]), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--enabletls--/{}/g".format(enabletls[1:]), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--microserviceid--/{}/g".format(microserviceid), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--ml_data_topic--/{}/g".format(ml_data_topic), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--prediction_data_topic--/{}/g".format(prediction_data_topic), "/{}/docs/source/details.rst".format(sname)])

       PRODUCETYPE = ""
       TOPIC = ""
       PORT = ""
       IDENTIFIER = ""
       HTTPADDR = ""
       FROMHOST = ""
       TOHOST = ""
       CLIENTPORT = ""
       snamertd = sname.replace("_", "-")
       PRODUCETYPE = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_PRODUCETYPE".format(sname))
       TOPIC = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TOPIC".format(sname))
       PORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_PORT".format(sname))
       IDENTIFIER = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_IDENTIFIER".format(sname))
       HTTPADDR = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_HTTPADDR".format(sname))
       FROMHOST = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_FROMHOST".format(sname))
       TOHOST = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TOHOST".format(sname))

       CLIENTPORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_CLIENTPORT".format(sname))
       TSSCLIENTPORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TSSCLIENTPORT".format(sname))
       TMLCLIENTPORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TMLCLIENTPORT".format(sname))

       setupurls(projectname,PRODUCETYPE,sname)

       if PRODUCETYPE=='LOCALFILE':
         inputfile = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_inputfile".format(sname))
         step3localfileinputfile=inputfile
         docfolderprocess = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_docfolder".format(sname))
         step3localfiledocfolder=docfolderprocess
         doctopic = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_doctopic".format(sname))
         chunks = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_chunks".format(sname))
         docingestinterval = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_docingestinterval".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--docfolderprocess--;{}".format(docfolderprocess)])
         doparse("/{}/docs/source/details.rst".format(sname), ["--doctopic--;{}".format(doctopic)])
         doparse("/{}/docs/source/details.rst".format(sname), ["--chunks--;{}".format(chunks[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--docingestinterval--;{}".format(docingestinterval[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--inputfile--;{}".format(inputfile)])

       subprocess.call(["sed", "-i", "-e",  "s/--PRODUCETYPE--/{}/g".format(PRODUCETYPE), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--TOPIC--/{}/g".format(TOPIC), "/{}/docs/source/details.rst".format(sname)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--PORT--;{}".format(PORT[1:])])
       doparse("/{}/docs/source/details.rst".format(sname), ["--HTTPADDR--;{}".format(HTTPADDR)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--FROMHOST--;{}".format(FROMHOST)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--TOHOST--;{}".format(TOHOST)])

       # Stamp every generated page with the same build time.
       builddatetime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
       for rstname in ("details", "index", "operating", "logs", "kube"):
           doparse("/{}/docs/source/{}.rst".format(sname, rstname), ["--datetime--;{}".format(builddatetime)])

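       # CLIENTPORT may be missing (None) when step 3 did not publish it; treat that as "Not Applicable".
       if CLIENTPORT and len(CLIENTPORT) > 1: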
         doparse("/{}/docs/source/details.rst".format(sname), ["--CLIENTPORT--;{}".format(CLIENTPORT[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TSSCLIENTPORT--;{}".format(TSSCLIENTPORT[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TMLCLIENTPORT--;{}".format(TMLCLIENTPORT[1:])])
       else:
         doparse("/{}/docs/source/details.rst".format(sname), ["--CLIENTPORT--;Not Applicable"])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TSSCLIENTPORT--;Not Applicable"])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TMLCLIENTPORT--;Not Applicable"])

       doparse("/{}/docs/source/details.rst".format(sname), ["--IDENTIFIER--;{}".format(IDENTIFIER)])

       subprocess.call(["sed", "-i", "-e",  "s/--ingestdatamethod--/{}/g".format(PRODUCETYPE), "/{}/docs/source/details.rst".format(sname)])

       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_raw_data_topic".format(sname))
       if raw_data_topic:
         step4raw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       if preprocess_data_topic:
         step4preprocess_data_topic=preprocess_data_topic
       preprocessconditions = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_preprocessconditions".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_usemysql".format(sname))
       preprocesstypes = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_preprocesstypes".format(sname))
       if preprocesstypes:
         step4preprocesstypes=preprocesstypes
       pathtotmlattrs = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_pathtotmlattrs".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_identifier".format(sname))
       jsoncriteria = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_jsoncriteria".format(sname))
       if jsoncriteria:
         step4jsoncriteria=jsoncriteria
       maxrows4 = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_maxrows".format(sname))
       if maxrows4:
         step4maxrows=maxrows4

       if preprocess_data_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocessconditions--/{}/g".format(preprocessconditions), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocesstypes--/{}/g".format(preprocesstypes), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--pathtotmlattrs--/{}/g".format(pathtotmlattrs), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--jsoncriteria--/{}/g".format(jsoncriteria), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows--/{}/g".format(maxrows4[1:]), "/{}/docs/source/details.rst".format(sname)])

       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_raw_data_topic".format(sname))
       if raw_data_topic:
         step4araw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       if preprocess_data_topic:
         step4apreprocess_data_topic=preprocess_data_topic
       preprocessconditions = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_preprocessconditions".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_usemysql".format(sname))
       preprocesstypes = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_preprocesstypes".format(sname))
       if preprocesstypes:
         step4apreprocesstypes=preprocesstypes
       pathtotmlattrs = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_pathtotmlattrs".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_identifier".format(sname))
       jsoncriteria = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_jsoncriteria".format(sname))
       if jsoncriteria:
         step4ajsoncriteria=jsoncriteria
       maxrows4 = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_maxrows".format(sname))
       if maxrows4:
         step4amaxrows=maxrows4

       if preprocess_data_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic1--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic1--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocessconditions1--/{}/g".format(preprocessconditions), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay1--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array1--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray1--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid1--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput1--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout1--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay1--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocesstypes1--/{}/g".format(preprocesstypes), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--pathtotmlattrs1--/{}/g".format(pathtotmlattrs), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier1--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--jsoncriteria1--/{}/g".format(jsoncriteria), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows1--/{}/g".format(maxrows4[1:]), "/{}/docs/source/details.rst".format(sname)])

       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_raw_data_topic".format(sname))
       if raw_data_topic:
         step4braw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       if preprocess_data_topic:
         step4bpreprocess_data_topic=preprocess_data_topic
       preprocessconditions = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_preprocessconditions".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_usemysql".format(sname))
       preprocesstypes = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_preprocesstypes".format(sname))
       if preprocesstypes:
         step4bpreprocesstypes=preprocesstypes
       pathtotmlattrs = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_pathtotmlattrs".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_identifier".format(sname))
       jsoncriteria = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_jsoncriteria".format(sname))
       if jsoncriteria:
         step4bjsoncriteria=jsoncriteria
       maxrows4b = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_maxrows".format(sname))
       if maxrows4b:
         step4bmaxrows=maxrows4b

       if preprocess_data_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic2--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic2--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocessconditions2--/{}/g".format(preprocessconditions), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay2--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array2--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray2--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid2--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput2--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout2--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay2--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocesstypes2--/{}/g".format(preprocesstypes), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--pathtotmlattrs2--/{}/g".format(pathtotmlattrs), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier2--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--jsoncriteria2--/{}/g".format(jsoncriteria), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows2--/{}/g".format(maxrows4b[1:]), "/{}/docs/source/details.rst".format(sname)])


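       # Step 4c: pull the preprocessing parameters published by step_4c_solution_task_preprocess through XCom.
       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_raw_data_topic".format(sname))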
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_usemysql".format(sname))
       searchterms = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_searchterms".format(sname))
       rememberpastwindows = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rememberpastwindows".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_identifier".format(sname))
       patternwindowthreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_patternwindowthreshold".format(sname))
       maxrows4c = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_maxrows".format(sname))
       rtmsstream = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsstream".format(sname))
       rtmsscorethresholdtopic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsscorethresholdtopic".format(sname))
       attackscorethresholdtopic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_attackscorethresholdtopic".format(sname))
       patternscorethresholdtopic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_patternscorethresholdtopic".format(sname))
       rtmsscorethreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsscorethreshold".format(sname))
       attackscorethreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_attackscorethreshold".format(sname))
       patternscorethreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_patternscorethreshold".format(sname))
       rtmsmaxwindows = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsmaxwindows".format(sname))
       if rtmsmaxwindows:
         step4crtmsmaxwindows=rtmsmaxwindows
         subprocess.call(["sed", "-i", "-e",  "s/--rtmsmaxwindows--/{}/g".format(rtmsmaxwindows[1:]), "/{}/docs/source/details.rst".format(sname)])

       localsearchtermfolder = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_localsearchtermfolder".format(sname))
       localsearchtermfolderinterval = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_localsearchtermfolderinterval".format(sname))
       rtmsfoldername = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsfoldername".format(sname))

       if searchterms:
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsscorethresholdtopic--;{}".format(rtmsscorethresholdtopic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--attackscorethresholdtopic--;{}".format(attackscorethresholdtopic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--patternscorethresholdtopic--;{}".format(patternscorethresholdtopic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsfoldername--;{}".format(rtmsfoldername)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsscorethreshold--;{}".format(rtmsscorethreshold[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--attackscorethreshold--;{}".format(attackscorethreshold[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--patternscorethreshold--;{}".format(patternscorethreshold[1:])])
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic3--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic3--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rtmsstream--/{}/g".format(rtmsstream), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay3--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array3--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray3--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid3--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput3--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout3--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay3--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rememberpastwindows--/{}/g".format(rememberpastwindows[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--patternwindowthreshold--/{}/g".format(patternwindowthreshold[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier3--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows3--/{}/g".format(maxrows4c[1:]), "/{}/docs/source/details.rst".format(sname)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmssearchterms--;{}".format(searchterms)])
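           # Backslashes are doubled so the string keeps its literal "\/" without an invalid-escape warning.
           rtmsoutputurl="https:\\/\\/github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/{}".format(os.environ["GITUSERNAME"], tsslogging.getrepo(),projectname,rtmsfoldername)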
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsoutputurl--;{}".format(rtmsoutputurl)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--localsearchtermfolder--;{}".format(localsearchtermfolder)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--localsearchtermfolderinterval--;{}".format(localsearchtermfolderinterval[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsfoldername--;{}".format(rtmsfoldername)])

           step4crawdatatopic=raw_data_topic
           step4csearchterms=searchterms
           step4crememberpastwindows=rememberpastwindows
           step4cpatternwindowthreshold=patternwindowthreshold
           step4crtmsstream=rtmsstream
           step4crtmsscorethreshold=rtmsscorethreshold
           step4cattackscorethreshold=attackscorethreshold
           step4cpatternscorethreshold=patternscorethreshold
           step4clocalsearchtermfolder=localsearchtermfolder
           step4clocalsearchtermfolderinterval=localsearchtermfolderinterval
           step4crtmsfoldername=rtmsfoldername

       # Step 5: pull the machine learning parameters published by step_5_solution_task_ml through XCom.
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_preprocess_data_topic".format(sname))
       ml_data_topic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_ml_data_topic".format(sname))
       modelruns = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_modelruns".format(sname))
       offset = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_offset".format(sname))
       islogistic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_islogistic".format(sname))
       networktimeout = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_networktimeout".format(sname))
       modelsearchtuner = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_modelsearchtuner".format(sname))
       dependentvariable = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_dependentvariable".format(sname))
       independentvariables = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_independentvariables".format(sname))
       if independentvariables:
         step5independentvariables = independentvariables

       rollbackoffsets = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_rollbackoffsets".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_topicid".format(sname))
       consumefrom = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_consumefrom".format(sname))
       fullpathtotrainingdata = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_fullpathtotrainingdata".format(sname))
       transformtype = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_transformtype".format(sname))
       sendcoefto = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_sendcoefto".format(sname))
       coeftoprocess = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_coeftoprocess".format(sname))
       coefsubtopicnames = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_coefsubtopicnames".format(sname))
       processlogic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_processlogic".format(sname))
       if fullpathtotrainingdata:
            step5sp=fullpathtotrainingdata.split("/")
            if len(step5sp)>0:
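              # Backslashes are doubled so the string keeps its literal "\/" without an invalid-escape warning.
              mloutputurl="https:\\/\\/github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/mldata/{}".format(os.environ["GITUSERNAME"], tsslogging.getrepo(),projectname,step5sp[-1])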
              doparse("/{}/docs/source/details.rst".format(sname), ["--mloutputurl--;{}".format(mloutputurl)])

       if processlogic:
         step5processlogic = processlogic

       if modelruns:
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--ml_data_topic--/{}/g".format(ml_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--modelruns--/{}/g".format(modelruns[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--offset--/{}/g".format(offset[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--islogistic--/{}/g".format(islogistic[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--networktimeout--/{}/g".format(networktimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--modelsearchtuner--/{}/g".format(modelsearchtuner[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--dependentvariable--/{}/g".format(dependentvariable), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--independentvariables--/{}/g".format(independentvariables), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rollbackoffsets--/{}/g".format(rollbackoffsets[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--consumefrom--/{}/g".format(consumefrom), "/{}/docs/source/details.rst".format(sname)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--fullpathtotrainingdata--;{}".format(fullpathtotrainingdata)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--processlogic--;{}".format(processlogic)])

           subprocess.call(["sed", "-i", "-e",  "s/--transformtype--/{}/g".format(transformtype), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--sendcoefto--/{}/g".format(sendcoefto), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--coeftoprocess--/{}/g".format(coeftoprocess), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--coefsubtopicnames--/{}/g".format(coefsubtopicnames), "/{}/docs/source/details.rst".format(sname)])

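       # Step 6: pull the prediction parameters published by step_6_solution_task_prediction through XCom.
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_preprocess_data_topic".format(sname))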
       ml_prediction_topic = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_ml_prediction_topic".format(sname))
       streamstojoin = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_streamstojoin".format(sname))
       inputdata = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_inputdata".format(sname))
       consumefrom2 = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_consumefrom".format(sname))
       offset = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_offset".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_delay".format(sname))
       usedeploy = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_usedeploy".format(sname))
       networktimeout = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_networktimeout".format(sname))
       maxrows = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_maxrows".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_topicid".format(sname))
       pathtoalgos = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_pathtoalgos".format(sname))

       if ml_prediction_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--ml_prediction_topic--/{}/g".format(ml_prediction_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--streamstojoin--/{}/g".format(streamstojoin), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--inputdata--/{}/g".format(inputdata), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--consumefrom2--/{}/g".format(consumefrom2), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--offset--/{}/g".format(offset[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--usedeploy--/{}/g".format(usedeploy[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--networktimeout--/{}/g".format(networktimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows--/{}/g".format(maxrows[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--pathtoalgos--;{}".format(pathtoalgos)])

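       # Step 7: pull the visualization parameters published by step_7_solution_task_visualization through XCom.
       topic = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_topic".format(sname))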
       secure = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_secure".format(sname))
       offset = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_offset".format(sname))
       append = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_append".format(sname))
       chip = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_chip".format(sname))
       rollbackoffset = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_rollbackoffset".format(sname))
       dashboardhtml = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_dashboardhtml".format(sname))

       # Step 8: pull the container name published by step_8_solution_task_containerize through XCom.
       containername = context['ti'].xcom_pull(task_ids='step_8_solution_task_containerize',key="{}_containername".format(sname))
       if containername:
           hcname = containername.split('/')[1]
           huser = containername.split('/')[0]
           hurl = "https://hub.docker.com/r/{}/{}".format(huser,hcname)
       else:
           containername="TBD"
           hurl=""  # no Docker Hub URL exists without a container name; keeps hurl defined for later use

       if vipervizport:
           subprocess.call(["sed", "-i", "-e",  "s/--vipervizport--/{}/g".format(vipervizport[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topic--/{}/g".format(topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--dashboardhtml--/{}/g".format(dashboardhtml), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--secure--/{}/g".format(secure[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--offset--/{}/g".format(offset[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--append--/{}/g".format(append[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--chip--/{}/g".format(chip), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rollbackoffset--/{}/g".format(rollbackoffset[1:]), "/{}/docs/source/details.rst".format(sname)])


       repo = tsslogging.getrepo()
       gitrepo="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}".format(os.environ['GITUSERNAME'],repo,projectname)
      # gitrepo = "\/{}\/tml-airflow\/dags\/tml-solutions\/{}".format(repo,sname)

       # gitrepo is a URL containing "/", which is not valid inside a sed s/.../.../ expression,
       # so the --gitrepo-- placeholder is filled in by doparse instead of sed.
       doparse("/{}/docs/source/operating.rst".format(sname), ["--gitrepo--;{}".format(gitrepo)])

       subprocess.call(["sed", "-i", "-e",  "s/--solutionname--/{}/g".format(sname), "/{}/docs/source/operating.rst".format(sname)])
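       # Use "|" as the sed delimiter because containername and hurl contain "/"; "\\n" passes a literal \n for sed to expand into a newline.
       subprocess.call(["sed", "-i", "-e",  "s|--dockercontainer--|{}\\n\\n{}|g".format(containername,hurl), "/{}/docs/source/operating.rst".format(sname)])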

       chipmain = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

       doparse("/{}/docs/source/operating.rst".format(sname), ["--justcontainer--;{}".format(containername)])

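       # Use the chip value from step 1 (chipmain); the step 7 value may be None when visualization did not run.
       doparse("/{}/docs/source/operating.rst".format(sname), ["--tsscontainer--;maadsdocker/tml-solution-studio-with-airflow-{}".format(chipmain)])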

       doparse("/{}/docs/source/operating.rst".format(sname), ["--chip--;{}".format(chipmain)])
       if istss1==0:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionairflowport--;{}".format(solutionairflowport[1:])])
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionairflowport--;{}".format("TBD")])

       doparse("/{}/docs/source/operating.rst".format(sname), ["--externalport--;{}".format(externalport[1:])])
       if istss1==0:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionexternalport--;{}".format(solutionexternalport[1:])])
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionexternalport--;{}".format("TBD")])

       # Step 9: pull the AI (privateGPT) parameters published by step_9_solution_task_ai through XCom.
       pconsumefrom = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_consumefrom".format(sname))
       pgpt_data_topic = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgpt_data_topic".format(sname))
       pgptcontainername = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgptcontainername".format(sname))
       pmainmodel=""
       pmainembedding=""
       if pgptcontainername is not None:
         step9pgptcontainername=pgptcontainername
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubeprivategpt--;{}".format(pgptcontainername)])
         mainmodel = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_mainmodel".format(sname))
         pmainmodel=mainmodel
         mainembedding = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_mainembedding".format(sname))
         pmainembedding=mainembedding
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubemainmodel--;{}".format(mainmodel)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubemainembedding--;{}".format(mainembedding)])

       poffset = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_offset".format(sname))
       prollbackoffset = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_rollbackoffset".format(sname))
       ptopicid = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_topicid".format(sname))
       penabletls = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_enabletls".format(sname))
       ppartition = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_partition".format(sname))
       pprompt = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_prompt".format(sname))
       pcontextwindowsize = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_contextwindowsize".format(sname))
       pvectordimension = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_vectordimension".format(sname))
       pmitrejson = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_mitrejson".format(sname))

       if pmitrejson:
          doparse("/{}/docs/source/details.rst".format(sname), ["--mitrejson--;{}".format(pmitrejson)])

       if pcontextwindowsize:
          step9pcontextwindowsize=pcontextwindowsize
          doparse("/{}/docs/source/details.rst".format(sname), ["--contextwindowsize--;{}".format(pcontextwindowsize[1:])])
          doparse("/{}/docs/source/kube.rst".format(sname), ["--kubecontextwindowsize--;{}".format(pcontextwindowsize[1:])])

       if pvectordimension:
          step9vectordimension=pvectordimension
          doparse("/{}/docs/source/details.rst".format(sname), ["--vectordimension--;{}".format(pvectordimension[1:])])
          doparse("/{}/docs/source/kube.rst".format(sname), ["--kubevectordimension--;{}".format(pvectordimension[1:])])

       if pprompt:
         step9prompt=pprompt
         step9prompt=step9prompt.strip().replace('\n','').replace("\\n","").replace(";",",").replace("''","")

       pdocfolder = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_docfolder".format(sname))
       if pdocfolder:
         step9docfolder=pdocfolder
         doparse("/{}/docs/source/details.rst".format(sname), ["--docfolder--;{}".format(pdocfolder)])

       pdocfolderingestinterval = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_docfolderingestinterval".format(sname))
       if pdocfolderingestinterval:
         step9docfolderingestinterval=pdocfolderingestinterval
         doparse("/{}/docs/source/details.rst".format(sname), ["--docfolderingestinterval--;{}".format(pdocfolderingestinterval[1:])])

       puseidentifierinprompt = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_useidentifierinprompt".format(sname))
       if puseidentifierinprompt:
         step9useidentifierinprompt=puseidentifierinprompt
         doparse("/{}/docs/source/details.rst".format(sname), ["--useidentifierinprompt--;{}".format(puseidentifierinprompt[1:])])

       pcontext = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_context".format(sname))
       if pcontext:
          step9context=pcontext
       pjsonkeytogather = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_jsonkeytogather".format(sname))
       pkeyattribute = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_keyattribute".format(sname))
       if pkeyattribute:
         step9keyattribute=pkeyattribute
       pconcurrency = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_concurrency".format(sname))
       if pconcurrency:
         step9concurrency=pconcurrency
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubeconcur--;{}".format(pconcurrency[1:])])

       pcuda = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_cuda".format(sname))
       if pcuda:
         cudavisibledevices=pcuda
       pcollection = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_vectordbcollectionname".format(sname))
       if pcollection:
         step9vectordbcollectionname=pcollection
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubecollection--;{}".format(pcollection)])

       pgpthost = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgpthost".format(sname))
       if pgpthost:
         step9pgpthost=pgpthost

       pgptport = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgptport".format(sname))
       if pgptport:
         step9pgptport=pgptport

       pprocesstype = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_keyprocesstype".format(sname))
       if pprocesstype:
         step9keyprocesstype=pprocesstype
       hyperbatch = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_hyperbatch".format(sname))
       if hyperbatch:
         step9hyperbatch=hyperbatch
       psearchterms = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_searchterms".format(sname))
       if psearchterms:
         step9searchterms=psearchterms
         doparse("/{}/docs/source/details.rst".format(sname), ["--searchterms--;{}".format(psearchterms)])
       pstreamall = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_streamall".format(sname))
       if pstreamall:
         step9streamall=pstreamall
         doparse("/{}/docs/source/details.rst".format(sname), ["--streamall--;{}".format(pstreamall[1:])])
       ptemperature = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_temperature".format(sname))
       if ptemperature:
         step9temperature=ptemperature
         doparse("/{}/docs/source/details.rst".format(sname), ["--temperature--;{}".format(ptemperature[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubetemperature--;{}".format(ptemperature[1:])])

       pvectorsearchtype = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_vectorsearchtype".format(sname))
       if pvectorsearchtype:
         step9vectorsearchtype=pvectorsearchtype
         doparse("/{}/docs/source/details.rst".format(sname), ["--vectorsearchtype--;{}".format(pvectorsearchtype)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubevectorsearchtype--;{}".format(pvectorsearchtype)])

       ollama= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_ollama-model".format(sname))
       if ollama is not None:  # Step 9b executed and published its parameters
         step9bollama=ollama
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-ollama-model--;{}".format(ollama)])
         rollback= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_rollbackoffset".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-rollbackoffset--;{}".format(rollback[1:])])
         step9brollback=rollback[1:]

         deletevector= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_deletevectordbcount".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-deletevectordbcount--;{}".format(deletevector[1:])])
         step9bdeletevectordbcount=deletevector[1:]

         vectordbpath= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_vectordbpath".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-vectordbpath--;{}".format(vectordbpath)])
         step9bvectordbpath=vectordbpath

         temp= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_temperature".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-temperature--;{}".format(temp[1:])])
         step9btemperature=temp[1:]

         topicid= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_topicid".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-topicid--;{}".format(topicid[1:])])
         step9btopicid=topicid[1:]

         enabletls= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_enabletls".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-enabletls--;{}".format(enabletls[1:])])
         step9benabletls=enabletls[1:]

         partition= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_partition".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-partition--;{}".format(partition[1:])])
         step9bpartition=partition[1:]

         collection= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_vectordbcollectionname".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-vectordbcollectionname--;{}".format(collection)])
         step9bvectordbcollectionname=collection

         ollamacontainername= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_ollamacontainername".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-ollamacontainername--;{}".format(ollamacontainername)])
         step9bollamacontainername=ollamacontainername

         mainip= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_mainip".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-mainip--;{}".format(mainip)])
         step9bmainip=mainip

         mainport= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_mainport".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-mainport--;{}".format(mainport[1:])])
         step9bmainport=mainport[1:]

         embedding= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_embedding".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-embedding--;{}".format(embedding)])
         step9bembedding=embedding

         agents_topic_prompt= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agents_topic_prompt".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agents_topic_prompt--;{}".format(agents_topic_prompt)])
         step9bagents_topic_prompt=agents_topic_prompt

         teamlead_topic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_teamlead_topic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-teamlead_topic--;{}".format(teamlead_topic)])
         step9bteamlead_topic=teamlead_topic

         teamleadprompt= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_teamleadprompt".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-teamleadprompt--;{}".format(teamleadprompt)])
         step9bteamleadprompt=teamleadprompt
         step9bteamleadprompt=step9bteamleadprompt.replace('\n',' ').replace("\\n","").strip().replace(";",",").replace("''","")

         supervisor_topic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_supervisor_topic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-supervisor_topic--;{}".format(supervisor_topic)])
         step9bsupervisor_topic=supervisor_topic

         supervisorprompt= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_supervisorprompt".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-supervisorprompt--;{}".format(supervisorprompt)])
         step9bsupervisorprompt=supervisorprompt
         step9bsupervisorprompt=step9bsupervisorprompt.replace('\n','').replace("\\n","").strip().replace(";",",").replace("''","")

         agenttoolfunctions= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agenttoolfunctions".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agenttoolfunctions--;{}".format(agenttoolfunctions)])
         step9bagenttoolfunctions=agenttoolfunctions
         step9bagenttoolfunctions=step9bagenttoolfunctions.replace('\n','').replace("\\n","").strip().replace(";",",").replace("''","")


         agent_team_supervisor_topic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agent_team_supervisor_topic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agent_team_supervisor_topic--;{}".format(agent_team_supervisor_topic)])
         step9bagent_team_supervisor_topic=agent_team_supervisor_topic

         agenttopic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agenttopic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agenttopic--;{}".format(agenttopic)])
         step9bagenttopic=agenttopic

         localmodelsfolder= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_localmodelsfolder".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-localmodelsfolder--;{}".format(localmodelsfolder)])
         step9blocalmodelsfolder=localmodelsfolder

         concurrency= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_concurrency".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-concurrency--;{}".format(concurrency[1:])])
         step9bconcurrency=concurrency[1:]

         cuda= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_cuda".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-cuda--;{}".format(cuda[1:])])
         step9bCUDA_VISIBLE_DEVICES=cuda[1:]

         contextwindow= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_contextwindow".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-contextwindow--;{}".format(contextwindow[1:])])
         step9bcontextwindow=contextwindow[1:]

         doparse("/{}/docs/source/kube.rst".format(sname), ["--ollamacontainername--;{}".format(ollamacontainername)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-kubeconcur--;{}".format(concurrency[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-kubecollection--;{}".format(collection)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-kubetemperature--;{}".format(temp[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-rollbackoffset--;{}".format(rollback[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-ollama-model--;{}".format(ollama)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-deletevectordbcount--;{}".format(deletevector[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-vectordbpath--;{}".format(vectordbpath)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-topicid--;{}".format(topicid[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-enabletls--;{}".format(enabletls[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-partition--;{}".format(partition[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-vectordbcollectionname--;{}".format(collection)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-ollamacontainername--;{}".format(ollamacontainername)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-mainip--;{}".format(mainip)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-mainport--;{}".format(mainport[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-contextwindow--;{}".format(contextwindow[1:])])

         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agenttopic--;{}".format(agenttopic)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-localmodelsfolder--;{}".format(localmodelsfolder)])

         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-embedding--;{}".format(embedding)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agents_topic_prompt--;{}".format(agents_topic_prompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-teamlead_topic--;{}".format(teamlead_topic)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-teamleadprompt--;{}".format(teamleadprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",",") )])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-supervisor_topic--;{}".format(supervisor_topic)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-supervisorprompt--;{}".format(supervisorprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agenttoolfunctions--;{}".format(agenttoolfunctions.strip().replace('\n','').replace("\\n","").replace("'","").replace(";","=="))])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agent_team_supervisor_topic--;{}".format(agent_team_supervisor_topic)])

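       # Build the extra "--env NAME=VALUE" lines for the docker run command from
       # the optional 'dockerenv' setting; its entries are separated by "***".
       ebuf=""
       if 'dockerenv' in default_args:
        if default_args['dockerenv'] != '':
          buf=default_args['dockerenv']
          darr = buf.split("***")
          ebuf="\n"
          for d in darr:
             v=d.split("=")
             if len(v)>1:
               if 'jsoncriteria' in v[0].strip():
                 # jsoncriteria values can contain "=" themselves, so keep everything after the first "="
                 d=d[d.index("=")+1:]
                 ebuf = ebuf + '          --env ' + v[0].strip() + '=\"' + d + '\" \\ \n'
               else:
                 ebuf = ebuf + '          --env ' + v[0].strip() + '=\"' + v[1].strip() + '\" \\ \n'
             else:
               # Entry without "=": emit the variable with an empty value
               ebuf = ebuf + '          --env ' + v[0].strip() + '=' + ' \\ \n'
          # Drop the trailing newline added by the last entry
          ebuf = ebuf[:-1]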
        if default_args['dockerinstructions'] != '':
          doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerinstructions--;{}".format(default_args['dockerinstructions'])])
        else:
          doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerinstructions--;{}".format("Please ask the developer of this solution.")])

       if len(CLIENTPORT) > 1:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--clientport--;{}".format(TMLCLIENTPORT[1:])])
         dockerrun = """docker run -d --net=host -p {}:{} -p {}:{} -p {}:{} -p {}:{} \\
             --env TSS=0 \\
             --env SOLUTIONNAME={} \\
             --env SOLUTIONDAG={} \\
             --env GITUSERNAME=<Enter Github Username> \\
             --env GITPASSWORD='<Enter Github Password>' \\
             --env GITREPOURL=<Enter Github Repo URL> \\
             --env SOLUTIONEXTERNALPORT={} \\
             -v /var/run/docker.sock:/var/run/docker.sock:z  \\
             -v /your_localmachine/foldername:/rawdata:z \\
             --env CHIP={} \\
             --env SOLUTIONAIRFLOWPORT={}  \\
             --env SOLUTIONVIPERVIZPORT={} \\
             --env DOCKERUSERNAME='' \\
             --env CLIENTPORT={}  \\
             --env EXTERNALPORT={} \\
             --env KAFKABROKERHOST=127.0.0.1:9092 \\
             --env KAFKACLOUDUSERNAME='<Enter API key>' \\
             --env KAFKACLOUDPASSWORD='<Enter API secret>' \\
             --env SASLMECHANISM=PLAIN \\
             --env VIPERVIZPORT={} \\
             --env MQTTUSERNAME='' \\
             --env MQTTPASSWORD='' \\
             --env AIRFLOWPORT={}  \\
             --env READTHEDOCS='<Enter Readthedocs token>' \\{}
             {}""".format(solutionexternalport[1:],solutionexternalport[1:],
                             solutionairflowport[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionvipervizport[1:],
                             TMLCLIENTPORT[1:],TMLCLIENTPORT[1:],sname,sd,
                             solutionexternalport[1:],chipmain,
                             solutionairflowport[1:],solutionvipervizport[1:],TMLCLIENTPORT[1:],
                             externalport[1:],vipervizport[1:],airflowport[1:],ebuf,containername)
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--clientport--;Not Applicable"])
         dockerrun = """docker run -d --net=host -p {}:{} -p {}:{} -p {}:{} \\
             --env TSS=0 \\
             --env SOLUTIONNAME={} \\
             --env SOLUTIONDAG={} \\
             --env GITUSERNAME=<Enter Github Username> \\
             --env GITPASSWORD='<Enter Github Password>' \\
             --env GITREPOURL=<Enter Github Repo URL> \\
             --env SOLUTIONEXTERNALPORT={} \\
             -v /var/run/docker.sock:/var/run/docker.sock:z \\
             -v /your_localmachine/foldername:/rawdata:z \\
             --env CHIP={} \\
             --env SOLUTIONAIRFLOWPORT={} \\
             --env SOLUTIONVIPERVIZPORT={} \\
             --env DOCKERUSERNAME='' \\
             --env EXTERNALPORT={} \\
             --env KAFKABROKERHOST=127.0.0.1:9092 \\
             --env KAFKACLOUDUSERNAME='<Enter API key>' \\
             --env KAFKACLOUDPASSWORD='<Enter API secret>' \\
             --env SASLMECHANISM=PLAIN \\
             --env VIPERVIZPORT={} \\
             --env MQTTUSERNAME='' \\
             --env MQTTPASSWORD='' \\
             --env AIRFLOWPORT={} \\
             --env READTHEDOCS='<Enter Readthedocs token>' \\{}
             {}""".format(solutionexternalport[1:],solutionexternalport[1:],
                             solutionairflowport[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionvipervizport[1:],
                             sname,sd,solutionexternalport[1:],chipmain,
                             solutionairflowport[1:],solutionvipervizport[1:],
                             externalport[1:],vipervizport[1:],airflowport[1:],ebuf,containername)

      # dockerrun = re.escape(dockerrun)
       v=subprocess.call(["sed", "-i", "-e",  "s/--dockerrun--/{}/g".format(dockerrun), "/{}/docs/source/operating.rst".format(sname)])

       if istss1==1:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerrun--;{}".format(dockerrun),"--dockercontainer--;{} ({})".format(containername, hurl)])
         doparse("/{}/docs/source/details.rst".format(sname), ["--dockerrun--;{}".format(dockerrun),"--dockercontainer--;{} ({})".format(containername, hurl)])
       else:
         try:
           with open("/tmux/step1solutionold.txt", "r") as f:
             msname=f.read()
             mbuf="Refer to the original solution container and documentation here: https://{}.readthedocs.io/en/latest/operating.html".format(msname.strip())
             doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerrun--;{}".format(dockerrun),"--dockercontainer--;{}".format(mbuf)])
         except Exception as e:
           pass

       step9rollbackoffset=-1
       step9llmmodel=''
       step9embedding=''
       step9vectorsize=''
       if pgptcontainername is not None:
           # The PrivateGPT run command is the same in TSS and solution mode except for the TSS flag
           pgpttss = "1" if os.environ['TSS'] == "1" else "0"
           privategptrun = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z --env PORT={} --env TSS={} --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env vectorsearchtype=\"{}\" --env contextwindowsize={} --env vectordimension={} {}".format(pgptport[1:],pgptport[1:],pgptport[1:],pgpttss,pcollection,pconcurrency[1:],pcuda[1:],ptemperature[1:], pvectorsearchtype, pcontextwindowsize[1:], pvectordimension[1:],pgptcontainername)

           step9llmmodel='Refer to: https://tml.readthedocs.io/en/latest/genai.html'
           step9embedding='Refer to: https://tml.readthedocs.io/en/latest/genai.html'
           step9vectorsize='Refer to: https://tml.readthedocs.io/en/latest/genai.html'

           doparse("/{}/docs/source/details.rst".format(sname), ["--llmmodel--;{}".format(step9llmmodel)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--embedding--;{}".format(step9embedding)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--vectorsize--;{}".format(step9vectorsize)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--pgptcontainername--;{}".format(pgptcontainername),"--privategptrun--;{}".format(privategptrun)])

           qdrantcontainer = "qdrant/qdrant"
           qdrantrun = "docker run -d -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant"
           doparse("/{}/docs/source/details.rst".format(sname), ["--qdrantcontainer--;{}".format(qdrantcontainer),"--qdrantrun--;{}".format(qdrantrun)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--consumefrom--;{}".format(pconsumefrom)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--pgpt_data_topic--;{}".format(pgpt_data_topic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--vectordbcollectionname--;{}".format(pcollection)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--offset--;{}".format(poffset[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--rollbackoffset--;{}".format(prollbackoffset[1:])])
           step9rollbackoffset=prollbackoffset[1:]
           doparse("/{}/docs/source/details.rst".format(sname), ["--topicid--;{}".format(ptopicid[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--enabletls--;{}".format(penabletls[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--partition--;{}".format(ppartition[1:])])
           pprompt=pprompt.replace("\\n"," ")
           doparse("/{}/docs/source/details.rst".format(sname), ["--prompt--;{}".format(pprompt)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--context--;{}".format(pcontext)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--jsonkeytogather--;{}".format(pjsonkeytogather)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--keyattribute--;{}".format(pkeyattribute)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--concurrency--;{}".format(pconcurrency[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--cuda--;{}".format(pcuda[1:])])
           if kube == 1:
               doparse("/{}/docs/source/details.rst".format(sname), ["--pgpthost--;{}".format('privategpt-service')])
           else:
               doparse("/{}/docs/source/details.rst".format(sname), ["--pgpthost--;{}".format(pgpthost)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--pgptport--;{}".format(pgptport[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--keyprocesstype--;{}".format(pprocesstype)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--hyperbatch--;{}".format(hyperbatch[1:])])

       snamerp=sname.replace("_","-")
       rbuf = "https://{}.readthedocs.io".format(snamerp)
       doparse("/{}/docs/source/details.rst".format(sname), ["--readthedocs--;{}".format(rbuf)])

       ############# VIZ URLS

       # Raw string: the backslashes escape "/" and "&" for the sed replacement below
       vizurl = r"http:\/\/localhost:{}\/{}?topic={}\&offset={}\&groupid=\&rollbackoffset={}\&topictype=prediction\&append={}\&secure={}".format(solutionvipervizport[1:],dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       vizurlkube = "http://localhost:{}/{}?topic={}&offset={}&groupid=&rollbackoffset={}&topictype=prediction&append={}&secure={}".format(solutionvipervizport[1:],dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       if 'gRPC' in PRODUCETYPE:
         vizurlkubeing = "http://tml.tss2/viz/{}?topic={}&offset={}&groupid=&rollbackoffset={}&topictype=prediction&append={}&secure={}".format(dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       else:
         vizurlkubeing = "http://tml.tss/viz/{}?topic={}&offset={}&groupid=&rollbackoffset={}&topictype=prediction&append={}&secure={}".format(dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])

       if istss1==0:
         subprocess.call(["sed", "-i", "-e",  "s/--visualizationurl--/{}/g".format(vizurl), "/{}/docs/source/operating.rst".format(sname)])
       else:
         subprocess.call(["sed", "-i", "-e",  "s/--visualizationurl--/{}/g".format("This will appear AFTER you run Your Solution Docker Container"), "/{}/docs/source/operating.rst".format(sname)])

       tssvizurl = r"http:\/\/localhost:{}\/{}?topic={}\&offset={}\&groupid=\&rollbackoffset={}\&topictype=prediction\&append={}\&secure={}".format(vipervizport[1:],dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       subprocess.call(["sed", "-i", "-e",  "s/--tssvisualizationurl--/{}/g".format(tssvizurl), "/{}/docs/source/operating.rst".format(sname)])

       tsslogfile = r"http:\/\/localhost:{}\/viperlogs.html?topic=viperlogs\&append=0".format(vipervizport[1:])
       subprocess.call(["sed", "-i", "-e",  "s/--tsslogfile--/{}/g".format(tsslogfile), "/{}/docs/source/operating.rst".format(sname)])

       solutionlogfile = r"http:\/\/localhost:{}\/viperlogs.html?topic=viperlogs\&append=0".format(solutionvipervizport[1:])
       if istss1==0:
         subprocess.call(["sed", "-i", "-e",  "s/--solutionlogfile--/{}/g".format(solutionlogfile), "/{}/docs/source/operating.rst".format(sname)])
       else:
         subprocess.call(["sed", "-i", "-e",  "s/--solutionlogfile--/{}/g".format("This will appear AFTER you run Your Solution Docker Container"), "/{}/docs/source/operating.rst".format(sname)])

       githublogs = r"https:\/\/github.com\/{}\/{}\/blob\/main\/tml-airflow\/logs\/logs.txt".format(os.environ['GITUSERNAME'],repo)
       subprocess.call(["sed", "-i", "-e",  "s/--githublogs--/{}/g".format(githublogs), "/{}/docs/source/operating.rst".format(sname)])
       #-----------------------
       subprocess.call(["sed", "-i", "-e",  "s/--githublogs--/{}/g".format(githublogs), "/{}/docs/source/logs.rst".format(sname)])
       tsslogging.locallogs("INFO", "STEP 10: Documentation successfully built on GitHub. The Readthedocs build is in progress and should complete in a few seconds")
       try:
          sf = ""
          with open('/dagslocalbackup/logs.txt', "r") as f:
               sf=f.read()
          doparse("/{}/docs/source/logs.rst".format(sname), ["--logs--;{}".format(sf)])
       except Exception as e:
         print("Cannot open file - ",e)
         pass

       #-------------------
       airflowurl = r"http:\/\/localhost:{}".format(airflowport[1:])
       subprocess.call(["sed", "-i", "-e",  "s/--airflowurl--/{}/g".format(airflowurl), "/{}/docs/source/operating.rst".format(sname)])

       # Readthedocs project URLs use hyphens where the solution name has underscores
       readthedocs = r"https:\/\/{}.readthedocs.io".format(sname.replace("_","-"))
       subprocess.call(["sed", "-i", "-e",  "s/--readthedocs--/{}/g".format(readthedocs), "/{}/docs/source/operating.rst".format(sname)])

       triggername = sd
       print("triggername=",triggername)
       doparse("/{}/docs/source/operating.rst".format(sname), ["--triggername--;{}".format(sd)])
       doparse("/{}/docs/source/operating.rst".format(sname), ["--airflowport--;{}".format(airflowport[1:])])
       doparse("/{}/docs/source/operating.rst".format(sname), ["--vipervizport--;{}".format(vipervizport[1:])])
       if istss1==0:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionvipervizport--;{}".format(solutionvipervizport[1:])])
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionvipervizport--;{}".format("TBD")])

       # Raw strings keep the "\-\-" escapes intact so the rendered docs show a literal "--"
       tssdockerrun = (r"docker run -d \-\-net=host \-\-env AIRFLOWPORT={} " \
                       " -v <change to your local folder>:/dagslocalbackup:z " \
                       " -v /var/run/docker.sock:/var/run/docker.sock:z " \
                       " -v /your_localmachine/foldername:/rawdata:z " \
                       r" \-\-env GITREPOURL={} " \
                       r" \-\-env CHIP={} \-\-env TSS=1 \-\-env SOLUTIONNAME=TSS " \
                       r" \-\-env EXTERNALPORT={} " \
                       r" \-\-env VIPERVIZPORT={} " \
                       r" \-\-env GITUSERNAME='{}' " \
                       r" \-\-env DOCKERUSERNAME='{}' " \
                       r" \-\-env MQTTUSERNAME='{}' " \
                       r" \-\-env KAFKACLOUDUSERNAME='{}' " \
                       r" \-\-env KAFKACLOUDPASSWORD='<Enter your API secret>' " \
                       r" \-\-env READTHEDOCS='<Enter your readthedocs token>' " \
                       r" \-\-env GITPASSWORD='<Enter personal access token>' " \
                       r" \-\-env DOCKERPASSWORD='<Enter your docker hub password>' " \
                       r" \-\-env MQTTPASSWORD='<Enter your mqtt password>' " \
                       r" \-\-env UPDATE=1 " \
                       " maadsdocker/tml-solution-studio-with-airflow-{}".format(airflowport[1:],os.environ['GITREPOURL'],
                               chip,externalport[1:],vipervizport[1:],
                               os.environ['GITUSERNAME'],os.environ['DOCKERUSERNAME'],mqttusername,kafkacloudusername,chip))

       doparse("/{}/docs/source/operating.rst".format(sname), ["--tssdockerrun--;{}".format(tssdockerrun)])

       producinghost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
       producingport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONEXTERNALPORT".format(sname))
       preprocesshost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS".format(sname))
       preprocessport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS".format(sname))
       preprocesshost2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS2".format(sname))
       preprocessport2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS2".format(sname))

       preprocesshostpgpt = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESSPGPT".format(sname))
       preprocessportpgpt = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESSPGPT".format(sname))

       mlhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTML".format(sname))
       mlport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTML".format(sname))
       predictionhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREDICT".format(sname))
       predictionport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREDICT".format(sname))

       hpdehost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOST".format(sname))
       hpdeport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORT".format(sname))

       hpdepredicthost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOSTPREDICT".format(sname))
       hpdepredictport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORTPREDICT".format(sname))

       tmlbinaries = ("VIPERHOST_PRODUCE={}, VIPERPORT_PRODUCE={}, "
                          "VIPERHOST_PREPROCESS={}, VIPERPORT_PREPROCESS={}, "
                          "VIPERHOST_PREPROCESS2={}, VIPERPORT_PREPROCESS2={}, "
                          "VIPERHOST_PREPROCESS_PGPT={}, VIPERPORT_PREPROCESS_PGPT={}, "
                          "VIPERHOST_ML={}, VIPERPORT_ML={}, "
                          "VIPERHOST_PREDICT={}, VIPERPORT_PREDICT={}, "
                          "HPDEHOST={}, HPDEPORT={}, "
                          "HPDEHOST_PREDICT={}, HPDEPORT_PREDICT={}".format(producinghost,producingport[1:],preprocesshost,preprocessport[1:],
                                                                               preprocesshost2,preprocessport2[1:],
                                                                                preprocesshostpgpt,preprocessportpgpt[1:],
                                                                                 mlhost,mlport[1:],predictionhost,predictionport[1:],
                                                                                 hpdehost,hpdeport[1:],hpdepredicthost,hpdepredictport[1:] ))


       subprocess.call(["sed", "-i", "-e",  "s/--tmlbinaries--/{}/g".format(tmlbinaries), "/{}/docs/source/operating.rst".format(sname)])
       ########################## Kubernetes

       doparse("/{}/docs/source/kube.rst".format(sname), ["--solutionnamefile--;{}.yml".format(sname)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--solutionname--;{}".format(sname)])
       # Assemble the kubectl apply command. kafka.yml is only included when Kafka runs locally;
       # the file order matches the original per-case commands.
       kfiles = []
       if '127.0.0.1' in brokerhost:
         kfiles.append("kafka.yml")
       kfiles += ["secrets.yml", "mysql-storage.yml", "mysql-db-deployment.yml"]
       if pgptcontainername is not None:
         kfiles += ["qdrant.yml", "privategpt.yml"]
         if ollama is not None:
           kfiles.append("ollama.yml")
         kfiles.append("{}.yml".format(sname))
       else:
         kfiles.append("{}.yml".format(sname))
         if ollama is not None:
           kfiles.append("ollama.yml")
       kcmd = "kubectl apply " + " ".join("-f {}".format(f) for f in kfiles)
       doparse("/{}/docs/source/kube.rst".format(sname), ["--kubectl--;{}".format(kcmd)])


       if maxrows4:
         step4maxrows=maxrows4[1:]
       else:
         step4maxrows=-1

       if maxrows4b:
         step4bmaxrows=maxrows4b[1:]
       else:
         step4bmaxrows=-1

       if maxrows4c:
         step4cmaxrows=maxrows4c[1:]
       else:
         step4cmaxrows=-1

       if rollbackoffsets:
         step5rollbackoffsets=rollbackoffsets[1:]
       else:
         step5rollbackoffsets=-1

       if maxrows:
         step6maxrows=maxrows[1:]
       else:
         step6maxrows=-1

       kubebroker='kafka-service:9092'
       if 'KUBEBROKERHOST' in os.environ:
          kubebroker = os.environ['KUBEBROKERHOST']
       kafkabroker='127.0.0.1:9092'
       if 'KAFKABROKERHOST' in os.environ:
          kafkabroker = os.environ['KAFKABROKERHOST']

       step1solutiontitle=stitle
       step1description=sdesc
       try:
         with open("/tmux/cname.txt", "r") as f:
           containername=f.read()
       except Exception as e:
           pass

   #    step9bagenttoolfunctions=""
       step9bagents_topic_prompt=step9bagents_topic_prompt.replace("\\n","").replace('\n','').strip().replace(";","==").replace("'","")
       if len(CLIENTPORT) > 1:
         kcmd2=tsslogging.genkubeyaml(sname,containername,TMLCLIENTPORT[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionexternalport[1:],
                          sd,os.environ['GITUSERNAME'],os.environ['GITREPOURL'],chipmain,os.environ['DOCKERUSERNAME'],
                          externalport[1:],kafkacloudusername,mqttusername,airflowport[1:],vipervizport[1:],
                          step4maxrows,step4bmaxrows,step5rollbackoffsets,step6maxrows,step1solutiontitle,step1description,
                          step9rollbackoffset,kubebroker,kafkabroker,PRODUCETYPE,step9prompt,step9context,step9keyattribute,step9keyprocesstype,
                          step9hyperbatch[1:],step9vectordbcollectionname,step9concurrency[1:],cudavisibledevices[1:],
                          step9docfolder,step9docfolderingestinterval[1:],step9useidentifierinprompt[1:],step5processlogic,
                          step5independentvariables,step9searchterms,step9streamall[1:],step9temperature[1:],step9vectorsearchtype,
                          step9llmmodel,step9embedding,step9vectorsize,step4cmaxrows,step4crawdatatopic,step4csearchterms,step4crememberpastwindows[1:],
                          step4cpatternwindowthreshold[1:],step4crtmsstream,projectname,step4crtmsscorethreshold[1:],step4cattackscorethreshold[1:],
                          step4cpatternscorethreshold[1:],step4clocalsearchtermfolder,step4clocalsearchtermfolderinterval[1:],step4crtmsfoldername,
                          step3localfileinputfile,step3localfiledocfolder,step4crtmsmaxwindows[1:],step9pcontextwindowsize[1:],
                          step9pgptcontainername,step9pgpthost,step9pgptport[1:],step9vectordimension[1:],
                          step2raw_data_topic,step2preprocess_data_topic,step4raw_data_topic,step4preprocesstypes,
                          step4jsoncriteria,step4ajsoncriteria,step4amaxrows[1:],step4apreprocesstypes,step4araw_data_topic,
                          step4apreprocess_data_topic,step4bpreprocesstypes,step4bjsoncriteria,step4braw_data_topic,
                          step4bpreprocess_data_topic,step4preprocess_data_topic,
                          step9brollback,
                          step9bdeletevectordbcount,
                          step9bvectordbpath,
                          step9btemperature,
                          step9bvectordbcollectionname,
                          step9bollamacontainername,
                          step9bCUDA_VISIBLE_DEVICES,
                          step9bmainip,
                          step9bmainport,
                          step9bembedding,
                          step9bagents_topic_prompt,
                          step9bteamlead_topic,
                          step9bteamleadprompt,
                          step9bsupervisor_topic,
                          step9bagenttoolfunctions,
                          step9bagent_team_supervisor_topic,step9bcontextwindow,step9blocalmodelsfolder, step9bagenttopic)
       else:
         kcmd2=tsslogging.genkubeyamlnoext(sname,containername,TMLCLIENTPORT[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionexternalport[1:],
                          sd,os.environ['GITUSERNAME'],os.environ['GITREPOURL'],chipmain,os.environ['DOCKERUSERNAME'],
                          externalport[1:],kafkacloudusername,mqttusername,airflowport[1:],vipervizport[1:],
                          step4maxrows,step4bmaxrows,step5rollbackoffsets,step6maxrows,step1solutiontitle,step1description,step9rollbackoffset,
                          kubebroker,kafkabroker,step9prompt,step9context,step9keyattribute,step9keyprocesstype,
                          step9hyperbatch[1:],step9vectordbcollectionname,step9concurrency[1:],cudavisibledevices[1:],
                          step9docfolder,step9docfolderingestinterval[1:],step9useidentifierinprompt[1:],step5processlogic,
                          step5independentvariables,step9searchterms,step9streamall[1:],step9temperature[1:],step9vectorsearchtype,
                          step9llmmodel,step9embedding,step9vectorsize,step4cmaxrows,step4crawdatatopic,step4csearchterms,step4crememberpastwindows[1:],
                          step4cpatternwindowthreshold[1:],step4crtmsstream,projectname,step4crtmsscorethreshold[1:],step4cattackscorethreshold[1:],
                          step4cpatternscorethreshold[1:],step4clocalsearchtermfolder,step4clocalsearchtermfolderinterval[1:],step4crtmsfoldername,
                          step3localfileinputfile,step3localfiledocfolder,step4crtmsmaxwindows[1:],step9pcontextwindowsize[1:],
                          step9pgptcontainername,step9pgpthost,step9pgptport[1:],step9vectordimension[1:],
                          step2raw_data_topic,step2preprocess_data_topic,step4raw_data_topic,step4preprocesstypes,
                          step4jsoncriteria,step4ajsoncriteria,step4amaxrows[1:],step4apreprocesstypes,step4araw_data_topic,
                          step4apreprocess_data_topic,step4bpreprocesstypes,step4bjsoncriteria,step4braw_data_topic,
                          step4bpreprocess_data_topic,step4preprocess_data_topic,
                          step9brollback,
                          step9bdeletevectordbcount,
                          step9bvectordbpath,
                          step9btemperature,
                          step9bvectordbcollectionname,
                          step9bollamacontainername,
                          step9bCUDA_VISIBLE_DEVICES,
                          step9bmainip,
                          step9bmainport,
                          step9bembedding,
                          step9bagents_topic_prompt,
                          step9bteamlead_topic,
                          step9bteamleadprompt,
                          step9bsupervisor_topic,
                          step9bagenttoolfunctions,
                          step9bagent_team_supervisor_topic,step9bcontextwindow,step9blocalmodelsfolder, step9bagenttopic)

       doparse("/{}/docs/source/kube.rst".format(sname), ["--solutionnamecode--;{}".format(kcmd2)])

       kpfwd="kubectl port-forward deployment/{} {}:{}".format(sname,solutionvipervizport[1:],solutionvipervizport[1:])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--kube-portforward--;{}".format(kpfwd)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--visualizationurl--;{}".format(vizurlkube)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--visualizationurling--;{}".format(vizurlkubeing)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--nginxname--;{}".format(sname)])

       if len(CLIENTPORT) > 1:
         if 'gRPC' in PRODUCETYPE:
           kcmd3=tsslogging.ingressgrpc(sname)
         else:
           kcmd3=tsslogging.ingress(sname)
       else:   # localfile being processed
         kcmd3=tsslogging.ingressnoext(sname)

       doparse("/{}/docs/source/kube.rst".format(sname), ["--ingress--;{}".format(kcmd3)])

       ###########################
       try:
         tmuxwindows = "None"
         with open("/tmux/pythonwindows_{}.txt".format(sname), 'r', encoding='utf-8') as file:
           data = file.readlines()
           data.append("viper-produce")
           data.append("viper-preprocess")
           data.append("viper-preprocess-pgpt")
           data.append("viper-preprocess-agenticai")
           data.append("viper-ml")
           data.append("viper-predict")
           tmuxwindows = ", ".join(data)
           tmuxwindows = tmuxwindows.replace("\n","")
           print("tmuxwindows=",tmuxwindows)
       except Exception as e:
          pass

       doparse("/{}/docs/source/operating.rst".format(sname), ["--tmuxwindows--;{}".format(tmuxwindows)])
       #try:
       if os.environ['TSS'] == "1":
         doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TSS Development Environment Container"])
       else:
          if "KUBE" not in os.environ:
            doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TML Solution Container"])
          else:
            if os.environ["KUBE"] == "0":
              doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TML Solution Container"])
            else:
              doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TML Solution Container (RUNNING IN KUBERNETES)"])

       # Kick off shell script
       #tsslogging.git_push("/{}".format(sname),"For solution details GOTO: https://{}.readthedocs.io".format(sname),sname)


       rtd = context['ti'].xcom_pull(task_ids='step_10_solution_task_document',key="{}_RTD".format(sname))
        #try:
       sp=f"{sname}/docs/source"
       orepo=tsslogging.getrepo()
       op=f"/{orepo}/tml-airflow/dags/tml-solutions/{projectname}"
       files,opath=tsslogging.dorst2pdf(sp,op)
       tsslogging.mergepdf(opath,files,f"{sname}")

       gb="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/pdf_documentation/{}.pdf".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,sname)
       print("INFO: Your PDF Documentation will be found here: {}".format(gb))

       # gityml
       gityml="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/ymls/{}".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,sname)
       doparse("/{}/docs/source/kube.rst".format(sname), ["--gityml--;{}".format(gityml)])

       oppt=copyymls(projectname,sname,kcmd3,kcmd2)
       updateollamaandpgpt(oppt,step9bollamacontainername,step9bconcurrency,step9bvectordbcollectionname,step9btemperature,step9brollback,step9bollama,step9bdeletevectordbcount,step9bvectordbpath,step9btopicid,step9benabletls,step9bpartition,step9bmainip,
                          step9bmainport,step9bembedding,step9bagents_topic_prompt,step9bteamlead_topic,step9bteamleadprompt,step9bsupervisor_topic,step9bsupervisorprompt,step9bagenttoolfunctions,step9bagent_team_supervisor_topic,step9bcontextwindow,
                          pvectorsearchtype,ptemperature,pcollection,pconcurrency,pvectordimension,pcontextwindowsize,pmainmodel,pmainembedding,pgptcontainername)

       subprocess.call("/tmux/gitp.sh {} 'For solution details GOTO: https://{}.readthedocs.io'".format(sname,snamertd), shell=True)

        #except Exception as e:
         # print("Error=",e)
       try:
        if rtd is None:
           URL = 'https://readthedocs.org/api/v3/projects/'
           TOKEN = os.environ['READTHEDOCS']
           HEADERS = {'Authorization': f'token {TOKEN}'}
           data={
               "name": "{}".format(sname),
               "repository": {
                   "url": "https://github.com/{}/{}".format(os.environ['GITUSERNAME'],sname),
                   "type": "git"
               },
               "homepage": "http://template.readthedocs.io/",
               "programming_language": "py",
               "language": "en",
               "privacy_level": "public",
               "external_builds_privacy_level": "public",
               "tags": [
                   "automation",
                   "sphinx"
               ]
           }
           response = requests.post(
               URL,
               json=data,
               headers=HEADERS,
           )
           print(response.json())
           tsslogging.tsslogit(response.json())
           os.environ['tssdoc']="1"
        time.sleep(10)
        updatebranch(sname,"main")
        triggerbuild(sname)
        ti = context['task_instance']
        ti.xcom_push(key="{}_RTD".format(sname), value="DONE")
        print("INFO: Your Documentation will be found here: https://{}.readthedocs.io/en/latest".format(snamertd))
       except Exception as e:
        print("ERROR=",e)

Json Key	Explanation
conf_project	This is the project name that will be used in Readthedocs documentation
conf_copyright	This is the copyright information that will be used in Readthedocs documentation
conf_author	This is the author name that will be used in Readthedocs documentation
conf_release	This is the release number for your Readthedocs documentation
conf_version	This is the version number that will be used in Readthedocs documentation
dockerenv	Ideally, TML solution containers run in Kubernetes. However, if you or other users run this container directly, you can specify the Docker environment variables that can be modified at runtime. The format must be variable1=value1***variable2=value2..., using THREE (3) stars to separate each variable and value pair.
dockerinstructions	You can specify instructions for users on how to run your container.

8.18. Example Of Setting Docker Instructions in Step 10

default_args = {
 'conf_project' : 'Transactional Machine Learning (TML)',
 'conf_copyright' : '2024, Otics Advanced Analytics, Incorporated - For Support email support@otics.ca',
 'conf_author' : 'Sebastian Maurice',
 'conf_release' : '0.1',
 'conf_version' : '0.1.0',
 'dockerenv': 'step4cmaxrows=100***step4crawdatatopic=iot-preprocess***step4csearchterms=rgx:p([a-z]+)ch ~~~ |authentication failure,--entity-- password failure ***\
 step4crememberpastwindows=500***step4cpatternwindowthreshold=30***step4crtmsscorethreshold=0.6***step4cattackscorethreshold=0.6***\
 step4cpatternscorethreshold=0.6***step4crtmsstream=rtms-stream-mylogs***step4clocalsearchtermfolder=|mysearchfile1,|mysearchfile2***\
 step4clocalsearchtermfolderinterval=60***step4crtmsfoldername=rtms2***step3localfiledocfolder=mylogs,mylogs2***step4crtmsmaxwindows=1000000', # add any environmental variables for docker must be: variable1=value1***variable2=value2
 'dockerinstructions': """To run this docker container Enter the following CORE parameters:

      1. KAFKABROKERHOST=127.0.0.1:9092 - this uses the Local Kafka installed in your TML solution container.
         You can specify a Kafka Cloud URL if using AWS MSK or Confluent Kafka Cloud, simply replace this field.

      2. Enter KAFKACLOUDUSERNAME and  KAFKACLOUDPASSWORD IF using Kafka Cloud from AWS MSK
         and Confluent, if using local kafka (127.0.0.1:9092), these MUST be empty.

      3. SASLMECHANISM=PLAIN is set for Local Kafka and Confluent Kafka Cloud.
         If using AWS MSK, this MUST be changed to SCRAM512.

      4. Enter GITUSERNAME

      5. Enter GITPASSWORD

      6. Enter READTHEDOCS

      7. Update volume mapping: /your_localmachine/foldername:/rawdata:z

      8. IF YOU ARE DISTRIBUTING THIS CONTAINER TO OTHERS THEN SEND THEM THIS DOCKER RUN BUT THEY WILL NEED TO ENTER THE ABOVE CORE PARAMETERS.
         TO MAKE IT EASY FOR OTHERS TO RUN YOUR SOLUTION YOU CAN USE THE TSSTMLDEMO GITHUB AND READTHEDOCS ACCOUNT - UPDATE THE FOLLOWING:

      9.  GITUSERNAME=tsstmldemo

      10. GITREPOURL=https://github.com/tsstmldemo/tsstmldemo

      11. GITPASSWORD=<Will be retrieved from OS IF using tsstmldemo>

      12. READTHEDOCS=aefa71df39ad764ac2785b3167b77e8c1d7c553a

      13. step4cmaxrows=100 this means the number of offsets to rollback.  Change to higher or lower number.
          Higher number more data will be processed and more memory consumed.

      14. step4crawdatatopic=iot-preprocess, this is the Step 4 preprocessing topic of the entities.
          If this is an empty string, no entities are cross-referenced with the log files.  Only log files will be processed.

      15. step4csearchterms=rgx:p([a-z]+)ch ~~~ |authentication failure,--entity-- password failure, these are
          the fixed search terms.  You can specify dynamic search terms in the field step4clocalsearchtermfolder

      16. step4crememberpastwindows=500, this is the past, short-term windows for TML to remember.
          TML RTMS will go back 500 sliding time windows.

      17. step4cpatternwindowthreshold=30, this is the maximum pattern threshold before raising an alarm.

      18. step4crtmsscorethreshold=0.6, this is the RTMS score threshold.  This is used to send
          messages that exceed this RTMS threshold to its own rtms topic.

      19. step4cattackscorethreshold=0.6, this is the Attack score threshold.  This is used to send messages
          that exceed this attack threshold to its own attack topic.

      20. step4cpatternscorethreshold=0.6, this is the Pattern score threshold.  This is used to send
          messages that exceed this pattern threshold to its own pattern topic.

      21. step4crtmsstream=rtms-stream-mylogs, this is the kafka topic that stores ALL the results from RTMS.

      22. step4clocalsearchtermfolder=|mysearchfile1,|mysearchfile2, this is name of the folders that
          contain text files for searches. A | for OR, and @ for AND.  TML will read the search terms
          in real-time and immediately start applying them to the streamed data.

      23. step4clocalsearchtermfolderinterval=60, this is the number in seconds that the files
          in the folders specified in step4clocalsearchtermfolder, will be read.  So, 60 means,
          read files every 60 seconds.

      24. step4crtmsfoldername=rtms2, TML RTMS will output logs of the search results to GitHub.
          This is convenient for testing and validation.  NOTE: Only the latest 950 files will
          be sent to GitHub because GitHub has a maximum file limit of 1000.

      25. step3localfiledocfolder=mylogs,mylogs2, these are the folders that contain your
          text log files.  These are read in the STEP 3 LOCALFILE task.

      26. step4crtmsmaxwindows=1000000, this is the maximum number of windows for LONG-TERM
          pattern matching.  Here, TML will go-back 1,000,000 sliding time windows,
          which in effect could be months of analysis.  You can easily increase this number.

      - PLEASE NOTE: THE GITHUB AND READTHEDOCS ACCOUNTS ARE PUBLIC AND SHARED WITH OTHERS.

      - THEY ARE MEANT ONLY FOR QUICK DEMOS.  IDEALLY, PERSONAL GITHUB AND READTHEDOCS ACCOUNTS SHOULD BE USED."""
}
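To make the dockerenv format concrete, here is a minimal sketch of how a string in the variable1=value1***variable2=value2 form could be split into individual variables. The function name parse_dockerenv is hypothetical and is not part of TML; it only illustrates the three-star separator described above.

```python
def parse_dockerenv(dockerenv: str) -> dict:
    """Split a 'name=value***name=value' string into a dict.

    Hypothetical helper for illustration only; TML does its own parsing.
    """
    variables = {}
    for pair in dockerenv.split("***"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing separator or stray whitespace
        # Split on the first '=' only, so values may contain '=' themselves.
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        variables[name.strip()] = value.strip()
    return variables


example = ("step4cmaxrows=100***step4crawdatatopic=iot-preprocess***"
           "step4crtmsscorethreshold=0.6")
print(parse_dockerenv(example))
```

Splitting on the first "=" only is a deliberate choice here, because values such as search terms may contain further punctuation.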

8.19. Creating Your Own DAG

Note

This is for advanced TML developers who are also advanced Python developers.

You can easily create your own custom DAG and add it to the solution templates. Follow these guidelines:

Create a project first - see Lets Start Building a TML Solution
Go to your project folder in TSS - as shown in figure below
Create and SAVE your DAG

Tip

You should copy a previously written TML DAG and then simply modify it for your needs.
Your new DAG will be in the project folder.

Important

Make sure you click Git Workspaces to commit your DAG to Github, as shown in the figure below.

Now add your new DAG to one of the solution templates. Simply click one of the solution templates.

Let us choose the solution DAG solution_template_processing_dag-myawesometmlsolution.py. Import your new DAG into the template by adding an import statement for it. Here you can create step 11 for your new DAG called "mynewdag":

step11 = importlib.import_module("tml-solutions.myawesometmlsolution.mynewdag")

Now, connect your new DAG to the solution process flow - as shown in figure below:

Note

This task assumes you have a function named mycooldag in your Python script tml-solutions.myawesometmlsolution.mynewdag.py. TSS will now also run the sensor_H task you just created.
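The import uses importlib.import_module because the package name tml-solutions contains a hyphen, which a plain import statement cannot express. The sketch below recreates the folder layout in a temporary directory (a stand-in for the real dags folder, created here only for illustration) and loads a mycooldag function from it. The body of mycooldag is made up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the package layout. This temporary folder is a
# hypothetical stand-in for the real tml-airflow/dags folder.
dags_root = Path(tempfile.mkdtemp())
package_dir = dags_root / "tml-solutions" / "myawesometmlsolution"
package_dir.mkdir(parents=True)
(package_dir / "mynewdag.py").write_text(
    "def mycooldag(**context):\n"
    "    return 'mycooldag ran'\n"
)

# Make the folder importable and refresh the import system's caches.
sys.path.insert(0, str(dags_root))
importlib.invalidate_caches()

# "import tml-solutions...." would be a SyntaxError because of the hyphen,
# so the module is loaded by its dotted name given as a string.
step11 = importlib.import_module("tml-solutions.myawesometmlsolution.mynewdag")

# Look up the callable that the new task is expected to run.
task_callable = getattr(step11, "mycooldag")
result = task_callable()
print(result)
```

Looking the function up with getattr right after the import fails early, with a clear AttributeError, if the script does not define the expected function name.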

To run your new solution - click DAGs in the top-menu.

You should see your new STEP 11. If so, CONGRATULATIONS! You just created a new/custom TML solution.

8.20. Github Push Issues

You may sometimes encounter an issue pushing to Github in the UI. If this happens, you can issue a +gitresetpull or +gitresetpush, as shown in the figure below:

Note

This usually happens if there is a commit from another process.

Note that +gitresetpull fetches all of the commits and adds them to the main branch.

+gitresetpush rebases the commit onto the head of the main branch, commits the changes, and pushes them to the main branch.

After a +gitresetpull, you can push your changes.

8.21. Example TML Solution Container Reference Architecture

The above image shows a typical TML solution container.

Attention

Every TML solution runs in a Docker container
Linux is installed in the container
TMUX (terminal multiplexer) is used to structure TML solution components in their own task windows to make it easier to maintain and operationalize TML solutions
Apache Kafka is installed (Cloud Kafka can easily be used)
MariaDB is used as a configuration database for TML solutions
Solution-specific Python scripts are installed and run the TML solution
TML dashboard code (HTML/JavaScript) runs in the container
Java is installed

8.22. Lets Start Building a TML Solution

Here is the TML solution creation process. Each step is detailed in the sections that follow:

PROCESS STEPS

Process STEP 0. Go into tml-airflow folder

Start the TSS container (TSS Docker Run Command) and go into the TSS Code Editor: TSS Code Editor.

Process STEP 1. Type the name of your project

You must choose a name for your TML project. No spaces or special characters, just text.

NOTE: Four characters from your READTHEDOCS token will be automatically appended to your project name.

Process STEP 2. Click the folder: myawesometmlproject-3f10

Click your new project folder to see the TML DAGs that were copied into it.

NOTE: We are just using myawesometmlproject as an example. You can choose any name you want.

Process STEP 3. Make Parameter Modifications to Your Project’s TML DAGs

Simply update the parameters to your TML DAGs. You do not need to write any code.

Process STEP 4. Choose the Solution Template You Want to Run

You must select a solution template. These templates build and run the entire end-to-end TML solution and make modifications to your TML DAGs.

Process STEP 5. Run Your Solution

You can now run your solution.

Process STEP 6: Go To the Solution Documentation

Your solution documentation is automatically generated for you.

Process STEP 7: Your Solution Docker Run Command

You can now run your solution container.

Process STEP 8: Stream Your Solution Dashboard

Stream your real-time dashboard.

Process STEP 9: TML Solution Built in Less than 2 Minutes

Congratulations! You just built a real-time solution in less than 2 minutes

8.23. STEP 0. Go into tml-airflow folder

Tip

Watch the video that shows how to easily create, delete, copy and stop TML projects: Youtube Video

Assuming you have the TSS container running (following the steps in TSS Docker Run Command) and have logged in (using the instructions in How To Use the TML Solution Container), go into the DAG code editor, then:

8.24. STEP 0. tml-airflow -> dags -> tml-solutions

You will see the following as shown in figure below

8.25. STEP 1. Click the file: CREATETMLPROJECT.txt - you will see the following as shown in figure below:

8.26. STEP 1. Type the name of your project

8.26.1. Creating a Project

Important

You should use lowercase letters. DO NOT ENTER ANY SPACES. Enter any name, such as myawesometmlproject, then PRESS SAVE.

Note

All projects will be "appended" with parts of your READTHEDOCS token. This is to ensure project uniqueness on READTHEDOCS.
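The naming rules above can be checked before you save the project name. The sketch below is a hypothetical helper, not TSS code: it accepts only lowercase letters and digits, then appends a suffix. Which four characters of the READTHEDOCS token TSS uses is not stated here, so the suffix is passed in as a parameter; 3f10 is taken from the example folder name used in this section.

```python
import re


def build_project_name(name: str, token_part: str) -> str:
    """Validate a project name and append the token-derived suffix.

    Hypothetical helper: TSS performs the real naming. The suffix is an
    input because the exact token characters used are an assumption.
    """
    if not re.fullmatch(r"[a-z0-9]+", name):
        raise ValueError(
            f"invalid project name {name!r}: use lowercase letters and "
            "digits only, with no spaces or special characters"
        )
    return f"{name}-{token_part}"


print(build_project_name("myawesometmlproject", "3f10"))
```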

8.27. STEP 1. You just created a TML Project and committed to Github. Congratulations!

To confirm everything went OK, go to your Github account.

For example, under /raspberrypi/tml-airflow/dags/tml-solutions/ you should see a folder for myawesometmlproject-3f10

8.28. Deleting a Project

Tip

If you want to DELETE this project simply type a - (minus) in front of it (as shown below):

-myawesometmlproject

The TSS will delete the entire project and commit the changes to Github.

NOTE: If you deleted a previous project and re-created it you should CLEAR your TSS browser CACHE.

Warning

All information/code related to this project will be deleted and may not be recoverable.

8.29. STEP 2. Click the folder: myawesometmlproject-3f10

You will see the figure below - VOILA!

8.30. STEP 2. Confirm Your New Project Was Created in TSS and Committed to Github

To confirm the new DAGs for myawesometmlproject were created properly, in TSS click DAGs (top menu item)

Then enter the filter myawesometmlproject and press Enter.

You should see all your DAGs, as in the figure below (if they do not show up, just wait 30 seconds or so):

Important

What did you just do?

You copied TML TEMPLATE DAGs to your own solution folder - for your own TML solution build.

If you want to create another TML solution - just repeat STEPS 1-3 with a new project name.

Tip

A new project could take 30 seconds or more to show up on the main Airflow screen.

Please be patient. If there are no errors - it will show up.

8.30.1. Stopping a Running Project

To stop a running project, type a '.' followed by the project name, for example .myproject, in CREATETMLPROJECT.txt.

8.30.2. Copying A Previous Project

Tip

If you want to copy from a previous TML project and rename to a new project then:

In STEP 3 type myawesometmlproject>myawesometmlproject2, the character “>” means copy myawesometmlproject to myawesometmlproject2 (as shown in figure below)
Hit Save
Voila! You just copied an older project to a new one and saved the time of entering parameters in the DAGs.

To confirm the new project was properly copied repeat STEPS 4 - 6. You should see your myawesometmlproject2-3f10 committed to Github:

Important

The documentation link WILL ONLY be functional AFTER you run your project in TSS.

Here are your new DAGs:

Tip

Check the logs for status updates: Go to /raspberrypi/tml-airflow/logs/logs.txt

Tip

For details on the editor go to Codemirror

8.31. STEP 3. Make Parameter Modifications to Your Project’s TML DAGs

TML Dags inside your project:

8.32. STEP 4. Choose the Solution Template You Want to Run

You have several solution templates to choose from (see TML Solution Templates). Choose one based on the functions you want your solution to perform (see The Solution Template Naming Conventions).

Attention

After you create a project in STEP 1 above, these templates will be copied under your project.

DO NOT MODIFY the original templates, create a project first, then work on the renamed templates under your project name.

This ensures proper versioning of projects and project integrity. It also allows you to see the differences between multiple projects.

Important

This solution reads a local file. All local files are in the /rawdata folder in the container. If you want to read your own local file, you MUST map a local folder to the rawdata folder. For further details refer to Producing Data Using a Local File.

8.32.1. Project Solution Template Run

As an example, let us choose solution_preprocessing_dag-myawesometmlsolution-3f10

Tip

When you create your own project (this example calls it myawesometmlsolution), all of the DAGs and solution templates are copied, renamed and committed to Github. The template chosen here is a copy of DAG 8, the solution template solution_template_processing_dag.py, which is copied, renamed and moved under your project folder myawesometmlsolution-3f10. Go to TSS to see it, as in STEP 3.

Also, this project folder will automatically be committed to your Github folder - see figure below.

Now, as per STEP 3, make parameter modifications to your project’s TML DAGs as needed. This DAG uses a local file for ingesting data; the template name tells you this, see The Solution Template Naming Conventions.

8.32.1.1. Parameter Changes to TML DAGs

Here are the step-by-step changes to the TML DAGs.

tml_read_LOCALFILE_step_3_kafka_producetotopic_dag-myawesometmlsolution-3f10.py: Change the inputfile field to point to your local data file:
- This example sets 'inputfile' : '/rawdata/IoTData.txt'. The IoTData.txt file is provided for demonstration inside the TSS container in the /rawdata folder.
- SAVE the file
tml_system_step_1_getparams_dag-myawesometmlsolution-3f10.py: Most of the parameters are set for you. But, if you are using KAFKA CLOUD you may want to set:

'brokerhost' : '127.0.0.1', # <<<<************* THIS WILL ACCESS LOCAL KAFKA - YOU CAN CHANGE TO CLOUD KAFKA HOST

'brokerport' : '9092', # <<<<************* LOCAL AND CLOUD KAFKA listen on PORT 9092

'cloudusername' : '', # <<<< THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API KEY - LEAVE BLANK

'cloudpassword' : '', # <<<< THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API SECRET - LEAVE BLANK

To see what all the other parameters mean, go here DAG STEP 1: Parameter Explanation

For our demonstration we will use the existing values in the DAG.

tml_system_step_2_kafka_createtopic_dag-myawesometmlsolution-3f10.py: Now create all the Kafka topics for your solution. Specifically:

'raw_data_topic' : 'iot-raw-data', # Separate multiple topics with comma <<< ****** You change topic names as needed

'preprocess_data_topic' : 'iot-preprocess,iot-preprocess2', # Separate multiple topics with comma <<< ****** You change topic names as needed

'ml_data_topic' : 'ml-data', # Separate multiple topics with comma <<< ****** You change topic names as needed

'prediction_data_topic' : 'prediction-data', # Separate multiple topics with comma <<< ****** You change topic names as needed

'pgpt_data_topic' : 'cisco-network-privategpt', # PrivateGPT will produce responses to this topic - change as needed

'replication' : '1', # Leave at 1 for on-prem Kafka

'numpartitions': '1', # Increase partition as needed.

All topics will be created for your solution in Kafka.

Important

If using Kafka Cloud you will need to set

'replication' : '3', # Change to a minimum of 3 for replication factor
'numpartitions': '1', # Increase partition as needed.

For more explanation on parameters go here DAG STEP 2: Parameter Explanation
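As a rough illustration of how the Step 2 settings combine, the sketch below expands the comma-separated topic fields into one entry per topic and applies the replication rule described above (1 for on-prem Kafka, at least 3 for Kafka Cloud). The helper expand_topics and its kafka_cloud flag are hypothetical and are not part of TML; the DAG itself creates the topics.

```python
# Values copied from the Step 2 settings; expand_topics() is a
# hypothetical helper, not part of TML.
step2_args = {
    "raw_data_topic": "iot-raw-data",
    "preprocess_data_topic": "iot-preprocess,iot-preprocess2",
    "ml_data_topic": "ml-data",
    "prediction_data_topic": "prediction-data",
    "replication": "1",
    "numpartitions": "1",
}


def expand_topics(args, kafka_cloud=False):
    """Return one {name, partitions, replication} entry per topic."""
    replication = int(args["replication"])
    if kafka_cloud:
        # Kafka Cloud needs a replication factor of at least 3.
        replication = max(replication, 3)
    partitions = int(args["numpartitions"])

    topics = []
    for key, value in args.items():
        if not key.endswith("_topic"):
            continue  # skip replication, numpartitions and other settings
        for name in value.split(","):
            name = name.strip()
            if name:
                topics.append({"name": name,
                               "partitions": partitions,
                               "replication": replication})
    return topics


for topic in expand_topics(step2_args):
    print(topic)
```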

tml_system_step_4_kafka_preprocess_dag-myawesometmlsolution-3f10.py: Modify the preprocessing JSONCRITERIA.

Refer to JSON PROCESSING for more explanation. The following jsoncriteria is being used.

'jsoncriteria' : 'uid=metadata.dsn,filter:allrecords~\
subtopics=metadata.property_name~\
values=datapoint.value~\
identifiers=metadata.display_name~\
datetime=datapoint.updated_at~\
msgid=datapoint.id~\
latlong=lat:long', # <<< **** Specify your json criteria. Here is an example of a multiline json -
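To see how the multiline jsoncriteria value is laid out, the sketch below splits it into its fields. It assumes only what is visible above: fields are separated by "~", each field is a name=value pair, and the trailing backslashes are Python line continuations, so the value is a single line with no embedded newlines. The helper parse_jsoncriteria is hypothetical; refer to JSON PROCESSING for what each field means.

```python
# The backslash at the end of each line continues the string literal, so
# the continuation lines must start in column 0 to avoid stray spaces.
jsoncriteria = 'uid=metadata.dsn,filter:allrecords~\
subtopics=metadata.property_name~\
values=datapoint.value~\
identifiers=metadata.display_name~\
datetime=datapoint.updated_at~\
msgid=datapoint.id~\
latlong=lat:long'


def parse_jsoncriteria(criteria):
    """Split 'name=value~name=value' into a dict (illustration only)."""
    fields = {}
    for part in criteria.split("~"):
        name, sep, value = part.strip().partition("=")
        if sep:
            fields[name] = value
    return fields


for name, value in parse_jsoncriteria(jsoncriteria).items():
    print(f"{name:12} -> {value}")
```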

Note

Since this is preprocessing ONLY, we are skipping the Machine Learning and AI DAGs (DAGs 5, 6 and 9).

tml_system_step_7_kafka_visualization_dag-myawesometmlsolution-3f10.py

For further details on how to create your own dashboards refer to Creating Your Own Dashboards

As an example, TSS has several dashboards out of the box - dashboard.html is being used here.

Other dashboards are:

iot-failure-seneca.html
iot-failure-machinelearning-uoft.html
tml-cisco-network-privategpt-monitor.html

You can inspect these dashboards by going to <repo>/tml-airflow/dashboard in Github, and use them as a starting point to create your own.

8.33. STEP 5. Run Your Solution

The figures below show the VERY SIMPLE steps of running your solution template DAG:

Then click the START button on top right.

If the solution ran successfully you will see all green lights.

8.34. STEP 6: Go To the Solution Documentation

Your solution documentation is automatically generated for you:

Important

Go to the URL: https://myawesometmlsolution-3f10.readthedocs.io/

Tip

To find the documentation URL, go to your Github folder /tml-airflow/dags/tml-solutions/myawesometmlsolution-3f10

The url is in the commit message as shown in figure below.

8.35. STEP 7: Your Solution Docker Run Command

Your solution Docker container is also automatically built and pushed to Docker Hub:

Your Solution docker run command is in the documentation. You can now take this Docker container and scale it with Kubernetes as you wish.

8.36. STEP 8: Stream Your Solution Dashboard

Click the Operating Details and Run Your Dashboard

And, here is your real-time dashboard - auto-generated!

8.37. STEP 9: TML Solution Built in Less than 2 Minutes

CONGRATULATIONS! YOU JUST BUILT AN END-TO-END REAL-TIME SOLUTION IN LESS THAN 2 MINUTES!

8.38. Project Action Commands Summary

Go to the TSS and select from the top menu: Admin -> Dags Code Editor

Navigate to the File: root/tml-airflow/dags/tml-solutions/CREATETMLPROJECT.txt then perform the following as you wish:

Action Type	Syntax	Explanation
Add Project	No symbol needed	Just Type project name. No spaces, or special characters, just alphanumerics in CREATETMLPROJECT.txt
Delete Project	-	Type - then project name. For example, -myproject in CREATETMLPROJECT.txt
Copy From a Previous Project	>	Type > between projects. For example, oldproject>newproject in CREATETMLPROJECT.txt
Stop a Running Project	.	Type . then your currently running project. For example, .myproject in CREATETMLPROJECT.txt
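The four action types in the table can be told apart by their leading or separating symbol. The sketch below shows one way a line from CREATETMLPROJECT.txt could be classified. The function parse_project_action is hypothetical and only mirrors the syntax in the table; it is not how TSS is implemented.

```python
def parse_project_action(line: str) -> dict:
    """Classify one CREATETMLPROJECT.txt entry (illustration only)."""
    text = line.strip()
    if not text:
        raise ValueError("empty project entry")
    if text.startswith("-"):
        # Leading '-' deletes the named project.
        return {"action": "delete", "project": text[1:]}
    if text.startswith("."):
        # Leading '.' stops the named running project.
        return {"action": "stop", "project": text[1:]}
    if ">" in text:
        # 'old>new' copies a previous project to a new one.
        source, _, target = text.partition(">")
        return {"action": "copy", "source": source, "target": target}
    # No symbol: add a new project with this name.
    return {"action": "add", "project": text}


for entry in ["myproject", "-myproject", "oldproject>newproject", ".myproject"]:
    print(entry, "->", parse_project_action(entry))
```

The leading symbols are checked before ">" so that each entry maps to exactly one action.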

Tip

Also see Copying TML Project(s) From Others Git Repo for copying projects between TML users.