7. TML Solution Building

7.1. Why Do I Need TML?

TML is the world’s only technology that can perform entity-based machine learning, in-memory, on real-time data integrated with Apache Kafka. Anywhere you need to process real-time data, you need TML. It can be used in any industry globally.

Important

TML offers several advantages over conventional stream processing. In addition to being:

  • the FASTEST and EASIEST way to build advanced, scalable, secure, and cost-effective, real-time solutions, with GenAI, for the Enterprise,

  • in roughly TWO (2) minutes with

    • automated documentation,

    • automated Docker builds, and

    • automated code commits to GitHub,

    • with tight integration with Apache Airflow and Apache Kafka

More Reasons:

  1. Stream processing technologies such as AWS Kinesis or Spark Streaming do not perform in-memory, entity-based machine learning or processing of real-time data. TML does.

  2. Stream processing technologies are very expensive. Because TML consists of 3 binaries, they can be operated like microservices with very little cost overhead (if any). Real-time data is processed in-memory, so no external databases are needed for machine learning, which reduces storage, compute, and network transfer costs.

  3. Stream processing solutions still use SQL to process data. TML uses JSON processing, in-memory, which is faster, cheaper, and easier to manage.

  4. Performing machine learning with stream processing is difficult and costly, and it does not provide entity-based machine learning. TML performs in-memory machine learning at the entity level for each device that produces real-time data. This makes it very effective at learning the behaviour of each individual device and predicting its future behaviour more accurately.

  5. Stream processing technologies still require lots of code. TML solutions are low-code or no-code using the TML Solution Studio (TSS). The TSS uses DAGs that allow users to quickly configure their TML solutions, automatically deploy them with Docker, automatically generate the documentation for the solution, and commit code to GitHub repos.

  6. TML is integrated with GenAI using PrivateGPT and Qdrant vector DB. This integration makes it the first solution that provides fast AI integrated with real-time data processing and machine learning at the entity level.

  7. To ingest data from devices, TML offers pre-built client Python code. Users can easily use gRPC, REST API, or MQTT to ingest data directly from devices and stream it to Kafka. Refer to STEP 3: Produce to Kafka Topics for more details.
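To make the idea of entity-level, in-memory JSON processing from the list above more concrete, here is a minimal sketch in plain Python. It is not TML's implementation or API: the field names `entity_id` and `value` and the helper `summarize_by_entity` are hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict


def summarize_by_entity(json_lines):
    """Group JSON messages by entity and compute per-entity statistics in memory."""
    values = defaultdict(list)
    for line in json_lines:
        msg = json.loads(line)  # each message is one JSON document
        values[msg["entity_id"]].append(float(msg["value"]))
    # One summary per entity (for example, per device), not one for the whole stream
    return {
        entity: {
            "count": len(v),
            "min": min(v),
            "max": max(v),
            "avg": sum(v) / len(v),
        }
        for entity, v in values.items()
    }


# Hypothetical messages from two devices
messages = [
    '{"entity_id": "device-1", "value": 20.0}',
    '{"entity_id": "device-2", "value": 5.0}',
    '{"entity_id": "device-1", "value": 22.0}',
]
summary = summarize_by_entity(messages)
print(summary["device-1"])
```

Each device gets its own statistics, which is the sense in which processing happens "at the entity level"; no database is involved, only in-memory data structures.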

7.2. Where Is TML Used?

Note

TML is used by companies and people around the world to process real-time data. Because TML is free for students and researchers, it is used by thousands of students in universities and colleges around the world as an official part of the curriculum in IoT, Cybersecurity, Machine Learning, Data Science, and Big Data Management courses.

7.3. TML Solutions Can Be Built In 10 Steps Using Pre-Written DAGs (Directed Acyclic Graphs)

Users simply make configuration changes to the DAGs and build the solution. TML Studio will even automatically containerize your complete solution and auto-generate online documentation.

7.4. Where Do I Start?

Attention

START HERE: The fastest way to build TML solutions with your real-time data is to use the TML Solution Studio (TSS) Container

7.5. 10 Pre-Written Apache Airflow DAGs To Speed Up TML Solution Builds

The figure below shows the TML solution process with DAGs (explained in detail below). The entire TML solution build process is highly efficient: advanced, scalable, real-time TML solutions can be built in a few hours with GenAI integrations!

_images/tssprocess.png

7.5.1. DAG Solution Process Explanation

Note

The above process shows Ten (10) DAGs that are used to build advanced, scalable, real-time TML solutions with no-code - just configurations to the DAGs.

  1. The build process starts with setting up system parameters for the Initial TML Solution Setup. Users simply need to provide configuration information in the following DAG:

    STEP 1: Get TML Core Params: tml_system_step_1_getparams_dag

  2. The next step is to create all your topics in Kafka - these topics will store all your input and output data. This is done in:

    STEP 2: Create Kafka Topics: tml_system_step_2_kafka_createtopic_dag

Your initial TML setup is complete.

Next, you want to start generating and producing data to the topics you created, and choose an Ingest Real-Time Data Method. TML provides you with FOUR (4) methods to stream your own data from any device. This is done in the following DAGs - you need to CHOOSE ONE method:

  3. STEP 3: Produce to Kafka Topics

3a. MQTT: STEP 3a: Produce Data Using MQTT: tml-read-MQTT-step-3-kafka-producetotopic-dag

3b. REST API: STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag

3c. gRPC: STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag

3d. Local File: STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

You are also provided with CLIENT files for MQTT, REST API, and gRPC - these clients connect to the SERVERS in 3a, 3b, and 3c:

3a.i: STEP 3a.i: MQTT CLIENT

3b.i: STEP 3b.i: REST API CLIENT

3c.i: STEP 3c.i: gRPC API CLIENT

You are also provided with an MQTT method - use it if you are using an MQTT broker for machine-to-machine communication.
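Because exactly one ingest method is chosen, the choice can be thought of as a lookup from method to DAG. The sketch below is illustrative only: the DAG names are copied from the list above, but the `choose_ingest_dag` helper is hypothetical and is not part of TML or TSS.

```python
# DAG names are taken from the STEP 3 list above; the helper itself is hypothetical.
INGEST_DAGS = {
    "MQTT": "tml-read-MQTT-step-3-kafka-producetotopic-dag",
    "RESTAPI": "tml-read-RESTAPI-step-3-kafka-producetotopic-dag",
    "gRPC": "tml-read-gRPC-step-3-kafka-producetotopic-dag",
    "LOCALFILE": "tml-read-LOCALFILE-step-3-kafka-producetotopic-dag",
}


def choose_ingest_dag(method):
    """Return the STEP 3 DAG for exactly one ingest method (case-insensitive)."""
    lookup = {name.lower(): dag for name, dag in INGEST_DAGS.items()}
    try:
        return lookup[method.lower()]
    except KeyError:
        raise ValueError(
            "Unknown ingest method {!r}; choose one of {}".format(
                method, sorted(INGEST_DAGS)
            )
        ) from None


print(choose_ingest_dag("mqtt"))
```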

After you have chosen an ingest data method and are producing data, you are ready to Preprocess Real-Time Data - the next DAG performs this function:

  4. STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag - Preprocessing is a very quick way to start generating insights from your real-time data in a few minutes. All preprocessing is done in-memory and no external databases are needed, only Kafka. After you have preprocessed your data, you can use this preprocessed data for machine learning - the next DAG performs this function.

4a. STEP 4a: Preprocessing Data: tml-system-step-4a-kafka-preprocess-dag - This preprocessing step uses jsoncriteria to extract data from Step 4.

4b. STEP 4b: Preprocessing 2 Data: tml-system-step-4b-kafka-preprocess-dag - This second preprocessing step is an important step that uses the preprocessed data for additional processing in machine learning. In the conventional machine learning sense, STEP 4 is like “feature engineering” and STEP 4b uses the engineered features for a much deeper understanding of the data streaming variables.

4c. STEP 4c: Preprocessing 3 Data: tml-system-step-4c-kafka-preprocess-dag - This is the third preprocessing step. It allows users to incorporate TEXT files with machine learning outputs and to incorporate “past memory” with sliding time windows. Users can control how TML maintains memory of past sliding time windows. For details see How TML Maintains Past Memory of Events Using Sliding Time Windows.
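The idea of keeping "past memory" with a sliding time window, mentioned in STEP 4c, can be illustrated with a small generic sketch. This is not how TML implements it (see the linked section for that); the class name `SlidingTimeWindow` and its parameters are assumptions made for the example, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingTimeWindow:
    """Remember only the events from the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict everything that has slid out of the window (t - window, t]
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def values(self):
        """Values still remembered, oldest first."""
        return [value for _, value in self.events]


window = SlidingTimeWindow(window_seconds=10)
for ts, reading in [(0, "a"), (4, "b"), (9, "c"), (12, "d")]:
    window.add(ts, reading)
print(window.values())
```

A larger `window_seconds` keeps more of the past in memory; a smaller one forgets sooner.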

  5. STEP 5: Entity Based Machine Learning : tml-system-step-5-kafka-machine-learning-dag - This is another powerful DAG that automatically starts building entity-based machine learning models for your real-time data. Note that TML will continuously build ML models as new data streams in. All machine learning is done in-memory and no external databases are needed, only Kafka. As these models are trained on your real-time data, the next DAG performs predictions.

  6. STEP 6: Entity Based Predictions: tml-system-step-6-kafka-predictions-dag - These predictions are generated automatically, in parallel with the machine learning training process in DAG 5. As predictions are being generated, you can stream these predictions to a real-time dashboard - the next DAG performs this function.

  7. STEP 7: Real-Time Visualization: tml-system-step-7-kafka-visualization-dag - The visualization data are streamed directly from the TML solution container over websockets to the client browser; this eliminates any need for third-party visualization software. Now that you have built the ENTIRE TML SOLUTION END-TO-END, you are ready to deploy it to Docker - the next DAG performs this function.

  8. STEP 8: Deploy TML Solution to Docker : tml-system-step-8-deploy-solution-to-docker-dag - The TML Docker container is automatically built for you and pushed to Docker Hub. If you have chosen to integrate GPT into your solution, you can initiate the PrivateGPT and Qdrant containers - the next DAG performs this function.

  9. STEP 9: PrivateGPT and Qdrant Integration: tml-system-step-9-privategpt_qdrant-dag - This DAG integrates your real-time solution seamlessly with GenAI using the privateGPT container; see TML and Generative AI.

9b. STEP 9b: Multi-Agentic Agentic A: tml-system-step-9b-agenticai-dag - This DAG seamlessly integrates Multi-Agentic AI with your real-time solution; see TML and Agentic AI.

  10. STEP 10: Create TML Solution Documentation: tml-system-step-10-documentation-dag.
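STEPs 5 and 6 above describe models that are built per entity and updated continuously while predictions are produced alongside training. The sketch below illustrates that pattern only. It uses an exponentially weighted moving average as a stand-in model, which is an assumption for the example and not the algorithm TML uses; the class and method names are hypothetical.

```python
class EntityModels:
    """One tiny model per entity: an exponentially weighted moving average."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest observation
        self.state = {}     # entity -> current model state, kept in memory

    def train(self, entity, value):
        """Update only this entity's model with one new observation."""
        if entity not in self.state:
            self.state[entity] = float(value)
        else:
            previous = self.state[entity]
            self.state[entity] = self.alpha * value + (1 - self.alpha) * previous

    def predict(self, entity):
        """Predict the next value for an entity, or None if it was never seen."""
        return self.state.get(entity)


models = EntityModels(alpha=0.5)
stream = [("device-1", 10.0), ("device-2", 100.0), ("device-1", 20.0)]
for entity, value in stream:
    models.train(entity, value)                # training step (as in STEP 5)
    print(entity, models.predict(entity))      # prediction step (as in STEP 6)
```

Because each entity has its own state, a noisy device does not affect the predictions made for any other device.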

YOU ARE DONE! You just built an advanced, scalable, end-to-end real-time solution and deployed it to Docker, integrated with AI and with online documentation. ENJOY!

DAGs (Directed Acyclic Graphs) are an easy way to build powerful real-time TML solutions quickly. Users are provided with the following DAGs:

Note

The numbers in the DAGs indicate the solution process step. For example, step 2 is dependent on step 1.
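Since the step number is embedded in each DAG name, the process order can be recovered from the names themselves. The following is a small sketch of that idea; the regular expression and the `step_key` helper are assumptions for illustration and are not part of TSS. A plain string sort would not work here, because it would place step 10 before step 2.

```python
import re

# Matches "step-4b-" or "step_1_" and captures the number and optional letter
STEP_PATTERN = re.compile(r"step[-_](\d+)([a-z]?)[-_]")


def step_key(dag_id):
    """Return a sortable (number, letter) key for a DAG id."""
    match = STEP_PATTERN.search(dag_id)
    if match is None:
        raise ValueError("No step number in DAG id: " + dag_id)
    return (int(match.group(1)), match.group(2))


dag_ids = [
    "tml-system-step-10-documentation-dag",
    "tml-system-step-4b-kafka-preprocess-dag",
    "tml_system_step_2_kafka_createtopic_dag",
    "tml-system-step-4-kafka-preprocess-dag",
    "tml_system_step_1_getparams_dag",
]
ordered = sorted(dag_ids, key=step_key)
for dag_id in ordered:
    print(dag_id)
```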

7.5.2. DAG Table

DAG Name

STEP 1: Get TML Core Params: tml_system_step_1_getparams_dag

Description: This DAG will get the core TML connection and tokens needed for operations.

STEP 2: Create Kafka Topics: tml_system_step_2_kafka_createtopic_dag

Description: This DAG will create all the necessary topics in Kafka (on-prem or Cloud) for your TML solution.

STEP 3a: Produce Data Using MQTT: tml-read-MQTT-step-3-kafka-producetotopic-dag

Description: This DAG is an MQTT server and will listen for a connection from a client. You use this if your TML solution ingests data from an MQTT system like HiveMQ and streams it to Kafka.

STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

Description: This DAG will read a local CSV file for data and stream it to Kafka.

STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag

Description: This DAG is a gRPC server and will listen for a connection from a gRPC client. You use this if your TML solution ingests data from devices and you want to leverage a gRPC connection and stream the data to Kafka.

STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag

Description: This DAG is a REST API server and will listen for a connection from a REST client. You use this if your TML solution ingests data from devices and you want to leverage a REST connection and stream the data to Kafka.

STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag

Description: This DAG performs entity-level preprocessing on the real-time data. There are over 35 different preprocessing types in TML.

STEP 4b: Preprocessing 2 Data: tml-system-step-4b-kafka-preprocess-dag

Description: This DAG performs entity-level preprocessing on the feature-engineered variables from STEP 4. The processed variables are named in a standard way, following the procedure in Preprocessed Variable Naming Standard.

STEP 4c: Preprocessing 3 Data: tml-system-step-4c-kafka-preprocess-dag

Description: Step 4c is a very powerful task that will incorporate real-time memory using sliding time windows: for details see How TML Maintains Past Memory of Events Using Sliding Time Windows. This is the RTMS solution: https://tml.readthedocs.io/en/latest/rtms.html

STEP 5: Entity Based Machine Learning : tml-system-step-5-kafka-machine-learning-dag

Description: This DAG performs entity-level machine learning on the real-time data.

STEP 6: Entity Based Predictions: tml-system-step-6-kafka-predictions-dag

Description: This DAG performs predictions using the trained algorithms for every entity.

STEP 7: Real-Time Visualization: tml-system-step-7-kafka-visualization-dag

Description: This DAG streams the output to a real-time dashboard.

STEP 8: Deploy TML Solution to Docker : tml-system-step-8-deploy-solution-to-docker-dag

Description: This DAG automatically deploys the entire TML solution to a Docker container and pushes it to Docker Hub.

STEP 9: PrivateGPT and Qdrant Integration: tml-system-step-9-privategpt_qdrant-dag

Description: This DAG integrates your real-time solution seamlessly with GenAI using the privateGPT container; see TML and Generative AI. This is a very powerful, secure, and low-cost way of harnessing the power of AI for fast AI analysis of your streaming data. No data is sent outside your network; the privateGPT container runs locally.

STEP 9b: Multi-Agentic Agentic A: tml-system-step-9b-agenticai-dag

Description: This DAG integrates your real-time solution seamlessly with Multi-Agentic AI; see TML and Agentic AI. This is a very powerful, secure, and low-cost way of harnessing the power of Multi-Agentic AI for fast agent-based analysis of your streaming data. No data is sent outside your network; the agentic AI solution container runs locally.

STEP 10: Create TML Solution Documentation: tml-system-step-10-documentation-dag

Description: This DAG will automatically create the documentation for your solution on readthedocs.io.

7.5.3. STEP 1: Get TML Core Params: tml_system_step_1_getparams_dag

Below is the complete definition of the tml_system_step_1_getparams_dag. Users only need to configure the code highlighted in the USER CHOSEN PARAMETERS section.

Tip

For details on the parameters below refer to MAADS-VIPER Environmental Variable Configuration (Viper.env)

Watch the YouTube video on dag configurations: YouTube video
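Several port parameters in the listing below accept -1 to mean "choose a free port". The listing delegates this to tsslogging.getfreeport(), whose internals are not shown here. As a rough sketch of how such a convention can be resolved in plain Python, under the assumption that asking the operating system for a free port is acceptable (the `resolve_port` helper is hypothetical):

```python
import socket


def resolve_port(configured):
    """Return a fixed port, or pick a free one when the value is '-1'."""
    if str(configured).strip() != "-1":
        return int(configured)
    # Binding to port 0 asks the operating system for any currently free port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


print(resolve_port("4040"))  # fixed port from the configuration
print(resolve_port("-1"))    # a free port chosen by the operating system
```

Note that a port found this way is only known to be free at the moment of the check; another process could claim it before it is used.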

   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   from datetime import datetime
   from airflow.decorators import dag, task
   import os
   import sys
   import tsslogging
   import time
   import subprocess
   import shutil
   import glob

   sys.dont_write_bytecode = True
   ######################################################USER CHOSEN PARAMETERS ###########################################################
   default_args = {
    'owner': 'Sebastian Maurice',  # <<< ******** change as needed
    'brokerhost' : '127.0.0.1',  # <<<<***************** THIS WILL ACCESS LOCAL KAFKA - YOU CAN CHANGE TO CLOUD KAFKA HOST
    'brokerport' : '9092',     # <<<<***************** LOCAL AND CLOUD KAFKA listen on PORT 9092
    'cloudusername' : '',  # <<<< --THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API KEY  - LEAVE BLANK
    'cloudpassword' : '',  # <<<< --THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API SECRET - LEAVE BLANK
    'solutionname': '_mysolution_',   # <<< *** DO NOT MODIFY - THIS WILL BE AUTOMATICALLY UPDATED
    'solutiontitle': 'My Solution Title', # <<< *** Provide a descriptive title for your solution
    'solutionairflowport' : '4040', # << If -1, TSS will choose a free port randomly, or set this to a fixed number
    'solutionexternalport' : '5050', # << If -1, TSS will choose a free port randomly, or set this to a fixed number
    'solutionvipervizport' : '6060', # << If -1, TSS will choose a free port randomly, or set this to a fixed number
    'description': 'This is an awesome real-time solution built by TSS',   # <<< *** Provide a description of your solution
    'HTTPADDR' : 'https://',
    'COMPANYNAME' : 'My company',
    'WRITELASTCOMMIT' : '0',   ## <<<<<<<<< ******************** FOR DETAILS ON BELOW PARAMETER SEE: https://tml.readthedocs.io/en/latest/viper.html
    'NOWINDOWOVERLAP' : '0',
    'NUMWINDOWSFORDUPLICATECHECK' : '5',
    'DATARETENTIONINMINUTES' : '1440',
    'USEHTTP' : '0',
    'ONPREM' : '0',
    'WRITETOVIPERDB' : '0',
    'VIPERDEBUG' : '2',
    'MAXOPENREQUESTS' : '10',
    'LOGSTREAMTOPIC' : 'viperlogs',
    'LOGSTREAMTOPICPARTITIONS' : '1',
    'LOGSTREAMTOPICREPLICATIONFACTOR' : '3',
    'LOGSENDTOEMAILS' : '',
    'LOGSENDTOEMAILSSUBJECT' : '[VIPER]',
    'LOGSENDTOEMAILFOOTER' : 'This e-mail is auto-generated by Transactional Machine Learning (TML) Technology Binaries: Viper, HPDE or Viperviz.  For more information please contact your TML Administrator.  Or, e-mail info@otics.ca for any questions or concerns regarding this e-mail. If you received this e-mail in error please delete it and inform your TML Admin or e-mail info@otics.ca, website: https://www.otics.ca.  Thank you for using TML Data Stream Processing and Real-Time Transactional Machine Learning technologies.',
    'LOGSENDINTERVALMINUTES' : '500',
    'LOGSENDINTERVALONLYERROR' : '1',
    'MAXTRAININGROWS' : '300',
    'MAXPREDICTIONROWS' : '50',
    'MAXPREPROCESSMESSAGES' : '5000',
    'MAXPERCMESSAGES' : '5000',
    'MAXCONSUMEMESSAGES' : '5000',
    'MAXVIPERVIZROLLBACKOFFSET' : '',
    'MAXVIPERVIZCONNECTIONS' : '10',
    'MAXURLQUERYSTRINGBYTES' : '10000',
    'MYSQLMAXLIFETIMEMINUTES' : '4',
    'MYSQLMAXCONN' : '4',
    'MYSQLMAXIDLE' : '10',
    'MYSQLHOSTNAME' : '127.0.0.1:3306',
    'KUBEMYSQLHOSTNAME' : 'mysql-service:3306', # this is the mysql service in kubernetes
    'MYSQLDB' : 'tmlids',
    'MYSQLUSER' : 'root',
    'SASLMECHANISM' : 'PLAIN',
    'MINFORECASTACCURACY' : '55',
    'COMPRESSIONTYPE' : 'gzip',
    'MAILSERVER' : '', #i.e.  smtp.broadband.rogers.com,
    'MAILPORT' : '', #i.e. 465,
    'FROMADDR' : '',
    'SMTP_USERNAME' : '',
    'SMTP_PASSWORD' : '',
    'SMTP_SSLTLS' : 'true',
    'SSL_CLIENT_CERT_FILE' : 'client.cer.pem',
    'SSL_CLIENT_KEY_FILE' : 'client.key.pem',
    'SSL_SERVER_CERT_FILE' : 'server.cer.pem',
    'KUBERNETES' : '0',
   }

   ############################################################### DO NOT MODIFY BELOW ####################################################

   def reinitbinaries(sname):
       pywindowfiles=glob.glob("/tmux/pythonwindows_*")

       for f in pywindowfiles:
           try:
             with open(f, 'r', encoding='utf-8') as file:
               data = file.readlines()
               for d in data:
                 if d != "":
                   d=d.rstrip()
                   v=subprocess.call(["tmux", "kill-window", "-t", "{}".format(d)])
             os.remove(f)
           except Exception as e:
            print("ERROR=",e)
            pass

       vizwindowfiles=glob.glob("/tmux/vipervizwindows_*")

       for f in vizwindowfiles:
           try:
             with open(f, 'r', encoding='utf-8') as file:
                data = file.readlines()
                for d in data:
                    d=d.rstrip()
                    dsw = d.split(",")[0]
                    dsp = d.split(",")[1]
                    if dsw != "":
                      subprocess.call(["tmux", "kill-window", "-t", "{}".format(dsw)])
                      # $(...) is shell syntax, so the command must run through a shell
                      v=subprocess.call("kill -9 $(lsof -i:{} -t)".format(dsp), shell=True)
                      time.sleep(1)
             os.remove(f)
           except Exception as e:
            pass

       # copy folders
       shutil.copytree("/tss_readthedocs", "/{}".format(sname),dirs_exist_ok=True)
       #remove local logs
       try:
         os.remove('/dagslocalbackup/logs.txt')
       except Exception as e:
         pass

   def updateviperenv():
       # update ALL
       os.environ['tssbuild']="0"
       os.environ['tssdoc']="0"

       cloudusername = ""
       cloudpassword = ""

       if 'KAFKACLOUDUSERNAME' in os.environ:
             cloudusername = os.environ['KAFKACLOUDUSERNAME']
       if 'KAFKACLOUDPASSWORD' in os.environ:
             cloudpassword = os.environ['KAFKACLOUDPASSWORD']
       if 'KAFKABROKERHOST' in os.environ:
             default_args['brokerhost'] = os.environ['KAFKABROKERHOST']
             default_args['brokerport']=''
       if 'SASLMECHANISM' in os.environ:
          default_args['SASLMECHANISM']=os.environ['SASLMECHANISM']

       if '127.0.0.1' in default_args['brokerhost']:
         cloudusername = ""
         cloudpassword = ""
         if 'KUBE' in os.environ:
            if os.environ['KUBE'] == "1":
             if 'KAFKABROKERHOST' in os.environ:
                 default_args['brokerhost'] = os.environ['KAFKABROKERHOST']
                 default_args['brokerport']=''
             if "KUBEBROKERHOST" in os.environ:
                 buf = os.environ['KUBEBROKERHOST']
                 sp = buf.split(":")
                 default_args['brokerhost']=sp[0]
                 default_args['brokerport']=sp[1]
             else:
                default_args['brokerhost']="kafka-service"

       filepaths = ['/Viper-produce/viper.env','/Viper-preprocess/viper.env','/Viper-preprocess1/viper.env','/Viper-preprocess-pgpt/viper.env','/Viper-preprocess-agenticai/viper.env','/Viper-preprocess2/viper.env','/Viper-preprocess3/viper.env','/Viper-ml/viper.env','/Viper-predict/viper.env','/Viperviz/viper.env']
       for mainfile in filepaths:
        with open(mainfile, 'r', encoding='utf-8') as file:
          data = file.readlines()
        r=0
        for d in data:
          if d[0] == '#':
             r += 1
             continue

          if 'KAFKA_CONNECT_BOOTSTRAP_SERVERS' in d:
            if default_args['brokerport'] == '':
              data[r] = "KAFKA_CONNECT_BOOTSTRAP_SERVERS={}\n".format(default_args['brokerhost'])
            else:
              data[r] = "KAFKA_CONNECT_BOOTSTRAP_SERVERS={}:{}\n".format(default_args['brokerhost'],default_args['brokerport'])
          if 'CLOUD_USERNAME' in d:
            data[r] = "CLOUD_USERNAME={}\n".format(cloudusername)
          if 'CLOUD_PASSWORD' in d:
            data[r] = "CLOUD_PASSWORD={}\n".format(cloudpassword)
          if 'WRITELASTCOMMIT' in d:
            data[r] = "WRITELASTCOMMIT={}\n".format(default_args['WRITELASTCOMMIT'])
          if 'NOWINDOWOVERLAP' in d:
            data[r] = "NOWINDOWOVERLAP={}\n".format(default_args['NOWINDOWOVERLAP'])
          if 'NUMWINDOWSFORDUPLICATECHECK' in d:
            data[r] = "NUMWINDOWSFORDUPLICATECHECK={}\n".format(default_args['NUMWINDOWSFORDUPLICATECHECK'])
          if 'USEHTTP' in d:
            data[r] = "USEHTTP={}\n".format(default_args['USEHTTP'])
          if 'ONPREM' in d:
            data[r] = "ONPREM={}\n".format(default_args['ONPREM'])
          if 'WRITETOVIPERDB' in d:
            data[r] = "WRITETOVIPERDB={}\n".format(default_args['WRITETOVIPERDB'])
          if 'VIPERDEBUG' in d:
            data[r] = "VIPERDEBUG={}\n".format(default_args['VIPERDEBUG'])
          if 'MAXOPENREQUESTS' in d:
            data[r] = "MAXOPENREQUESTS={}\n".format(default_args['MAXOPENREQUESTS'])
          if 'LOGSTREAMTOPIC' in d:
            data[r] = "LOGSTREAMTOPIC={}\n".format(default_args['LOGSTREAMTOPIC'])
          if 'LOGSTREAMTOPICPARTITIONS' in d:
            data[r] = "LOGSTREAMTOPICPARTITIONS={}\n".format(default_args['LOGSTREAMTOPICPARTITIONS'])
          if 'LOGSTREAMTOPICREPLICATIONFACTOR' in d:
            data[r] = "LOGSTREAMTOPICREPLICATIONFACTOR={}\n".format(default_args['LOGSTREAMTOPICREPLICATIONFACTOR'])
          if 'LOGSENDTOEMAILS' in d:
            data[r] = "LOGSENDTOEMAILS={}\n".format(default_args['LOGSENDTOEMAILS'])
          if 'LOGSENDTOEMAILSSUBJECT' in d:
            data[r] = "LOGSENDTOEMAILSSUBJECT={}\n".format(default_args['LOGSENDTOEMAILSSUBJECT'])
          if 'LOGSENDTOEMAILFOOTER' in d:
            data[r] = "LOGSENDTOEMAILFOOTER={}\n".format(default_args['LOGSENDTOEMAILFOOTER'])
          if 'LOGSENDINTERVALMINUTES' in d:
            data[r] = "LOGSENDINTERVALMINUTES={}\n".format(default_args['LOGSENDINTERVALMINUTES'])
          if 'LOGSENDINTERVALONLYERROR' in d:
            data[r] = "LOGSENDINTERVALONLYERROR={}\n".format(default_args['LOGSENDINTERVALONLYERROR'])
          if 'MAXTRAININGROWS' in d:
            data[r] = "MAXTRAININGROWS={}\n".format(default_args['MAXTRAININGROWS'])
          if 'MAXPREDICTIONROWS' in d:
            data[r] = "MAXPREDICTIONROWS={}\n".format(default_args['MAXPREDICTIONROWS'])
          if 'MAXPREPROCESSMESSAGES' in d:
            data[r] = "MAXPREPROCESSMESSAGES={}\n".format(default_args['MAXPREPROCESSMESSAGES'])
          if 'MAXPERCMESSAGES' in d:
            data[r] = "MAXPERCMESSAGES={}\n".format(default_args['MAXPERCMESSAGES'])
          if 'MAXCONSUMEMESSAGES' in d:
            data[r] = "MAXCONSUMEMESSAGES={}\n".format(default_args['MAXCONSUMEMESSAGES'])
          if 'MAXVIPERVIZROLLBACKOFFSET' in d:
            data[r] = "MAXVIPERVIZROLLBACKOFFSET={}\n".format(default_args['MAXVIPERVIZROLLBACKOFFSET'])
          if 'MAXVIPERVIZCONNECTIONS' in d:
            data[r] = "MAXVIPERVIZCONNECTIONS={}\n".format(default_args['MAXVIPERVIZCONNECTIONS'])
          if 'MAXURLQUERYSTRINGBYTES' in d:
            data[r] = "MAXURLQUERYSTRINGBYTES={}\n".format(default_args['MAXURLQUERYSTRINGBYTES'])
          if 'MYSQLMAXLIFETIMEMINUTES' in d:
            data[r] = "MYSQLMAXLIFETIMEMINUTES={}\n".format(default_args['MYSQLMAXLIFETIMEMINUTES'])
          if 'MYSQLMAXCONN' in d:
            data[r] = "MYSQLMAXCONN={}\n".format(default_args['MYSQLMAXCONN'])
          if 'MYSQLMAXIDLE' in d:
            data[r] = "MYSQLMAXIDLE={}\n".format(default_args['MYSQLMAXIDLE'])
          if 'SASLMECHANISM' in d:
            data[r] = "SASLMECHANISM={}\n".format(default_args['SASLMECHANISM'])
          if 'MINFORECASTACCURACY' in d:
            data[r] = "MINFORECASTACCURACY={}\n".format(default_args['MINFORECASTACCURACY'])
          if 'COMPRESSIONTYPE' in d:
            data[r] = "COMPRESSIONTYPE={}\n".format(default_args['COMPRESSIONTYPE'])
          if 'MAILSERVER' in d:
            data[r] = "MAILSERVER={}\n".format(default_args['MAILSERVER'])
          if 'MAILPORT' in d:
            data[r] = "MAILPORT={}\n".format(default_args['MAILPORT'])
          if 'FROMADDR' in d:
            data[r] = "FROMADDR={}\n".format(default_args['FROMADDR'])
          if 'SMTP_USERNAME' in d:
            data[r] = "SMTP_USERNAME={}\n".format(default_args['SMTP_USERNAME'])
          if 'SMTP_PASSWORD' in d:
            data[r] = "SMTP_PASSWORD={}\n".format(default_args['SMTP_PASSWORD'])
          if 'SMTP_SSLTLS' in d:
            data[r] = "SMTP_SSLTLS={}\n".format(default_args['SMTP_SSLTLS'])
          if 'SSL_CLIENT_CERT_FILE' in d:
            data[r] = "SSL_CLIENT_CERT_FILE={}\n".format(default_args['SSL_CLIENT_CERT_FILE'])
          if 'SSL_CLIENT_KEY_FILE' in d:
            data[r] = "SSL_CLIENT_KEY_FILE={}\n".format(default_args['SSL_CLIENT_KEY_FILE'])
          if 'SSL_SERVER_CERT_FILE' in d:
            data[r] = "SSL_SERVER_CERT_FILE={}\n".format(default_args['SSL_SERVER_CERT_FILE'])
          if 'KUBERNETES' in d:
            data[r] = "KUBERNETES={}\n".format(default_args['KUBERNETES'])
          if 'COMPANYNAME' in d:
            data[r] = "COMPANYNAME={}\n".format(default_args['COMPANYNAME'])
          if 'MYSQLHOSTNAME' in d:
            if "KUBE" in os.environ:
              if os.environ["KUBE"] == "1":
               data[r] = "MYSQLHOSTNAME={}\n".format(default_args['KUBEMYSQLHOSTNAME'])
              else:
               data[r] = "MYSQLHOSTNAME={}\n".format(default_args['MYSQLHOSTNAME'])
            else:
              data[r] = "MYSQLHOSTNAME={}\n".format(default_args['MYSQLHOSTNAME'])
          if 'MYSQLDB' in d:
            data[r] = "MYSQLDB={}\n".format(default_args['MYSQLDB'])
          if 'MYSQLUSER' in d:
            data[r] = "MYSQLUSER={}\n".format(default_args['MYSQLUSER'])

          r += 1
        with open(mainfile, 'w', encoding='utf-8') as file:
         file.writelines(data)

       subprocess.call("/tmux/starttml.sh", shell=True)
       time.sleep(3)

   def getparams(**context):
     args = default_args
     VIPERHOST = ""
     VIPERPORT = ""
     HTTPADDR = args['HTTPADDR']
     HPDEHOST = ""
     HPDEPORT = ""
     VIPERTOKEN = ""
     HPDEHOSTPREDICT = ""
     HPDEPORTPREDICT = ""

     tsslogging.locallogs("INFO", "STEP 1: Build started")

     try:
       if os.environ['TSS']=="1":
        if 'READTHEDOCS' in os.environ:
         if  len(os.environ['READTHEDOCS']) < 4:
           sys.exit()
         f = open("/tmux/rd4.txt", "w")
         rd=os.environ['READTHEDOCS']
         f.write(rd[:4])
         f.close()
        else:
          sys.exit()
     except Exception as e:
       pass

     if os.environ['TSS']=="1":
       try:
         shutil.rmtree("/rawdata/rtms")
       except Exception as e:
          pass
       try:
          with open("/tmux/step5.txt", "r") as f:
              dirbuf=f.read()
              shutil.rmtree(dirbuf)
       except Exception as e:
         pass

     sd = context['dag'].dag_id
     pname = args['solutionname']
     sname = tsslogging.rtdsolution(pname,sd)
     try:
       f = open("/tmux/step1projectname.txt", "w")
       f.write(pname)
       f.close()
     except Exception as e:
       pass

     try:
       f = open("/tmux/step1solution.txt", "w")
       f.write(sname)
       f.close()
     except Exception as e:
       pass

     if 'step1description' in os.environ:
       desc = os.environ['step1description']
     else:
       desc = args['description']

     if 'step1solutiontitle' in os.environ:
       stitle = os.environ['step1solutiontitle']
     else:
       stitle = args['solutiontitle']

     brokerhost = args['brokerhost']
     brokerport = args['brokerport']
     reinitbinaries(sname)
     updateviperenv()

     with open("/Viper-produce/admin.tok", "r") as f:
       VIPERTOKEN=f.read()

     if VIPERHOST=="":
       with open('/Viper-produce/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOST = output.split(",")[0]
         VIPERPORT = output.split(",")[1]
       with open('/Viper-preprocess/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS = output.split(",")[0]
         VIPERPORTPREPROCESS = output.split(",")[1]
       with open('/Viper-preprocess1/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS1 = output.split(",")[0]
         VIPERPORTPREPROCESS1 = output.split(",")[1]
       with open('/Viper-preprocess2/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS2 = output.split(",")[0]
         VIPERPORTPREPROCESS2 = output.split(",")[1]
       with open('/Viper-preprocess3/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESS3 = output.split(",")[0]
         VIPERPORTPREPROCESS3 = output.split(",")[1]
       with open('/Viper-preprocess-pgpt/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESSPGPT = output.split(",")[0]
         VIPERPORTPREPROCESSPGPT = output.split(",")[1]
       with open('/Viper-preprocess-agenticai/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREPROCESSAGENTICAI = output.split(",")[0]
         VIPERPORTPREPROCESSAGENTICAI = output.split(",")[1]
       with open('/Viper-ml/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTML = output.split(",")[0]
         VIPERPORTML = output.split(",")[1]
       with open('/Viper-predict/viper.txt', 'r') as f:
         output = f.read()
         VIPERHOSTPREDICT = output.split(",")[0]
         VIPERPORTPREDICT = output.split(",")[1]
       with open('/Hpde/hpde.txt', 'r') as f:
         output = f.read()
         HPDEHOST = output.split(",")[0]
         HPDEPORT = output.split(",")[1]
       with open('/Hpde-predict/hpde.txt', 'r') as f:
         output = f.read()
         HPDEHOSTPREDICT = output.split(",")[0]
         HPDEPORTPREDICT = output.split(",")[1]


     if 'CHIP' in os.environ:
        chip = os.environ['CHIP']
        chip = chip.lower()
     else:
         chip = 'amd64'

     if 'VIPERVIZPORT' in os.environ:
         if os.environ['VIPERVIZPORT'] != '' and os.environ['VIPERVIZPORT'] != '-1':
              vipervizport = int(os.environ['VIPERVIZPORT'])
         else:
              vipervizport=tsslogging.getfreeport()
     else:
              vipervizport=tsslogging.getfreeport()

     #   Check the solution airflow port and see if the user modified the port in Kubernetes
     if default_args['solutionairflowport'] != '-1':
         solutionairflowport = int(default_args['solutionairflowport'])
         if 'KUBE' in os.environ:
             # '-1' in the environment means no override was supplied
             if os.environ['KUBE'] == '1' and os.environ['SOLUTIONAIRFLOWPORT'] != '-1':
                 solutionairflowport = int(os.environ['SOLUTIONAIRFLOWPORT'])
     else:
         if 'KUBE' in os.environ:
             if os.environ['KUBE'] == "0":
                 solutionairflowport = tsslogging.getfreeport()
             elif os.environ['SOLUTIONAIRFLOWPORT'] != '-1':
                 solutionairflowport = int(os.environ['SOLUTIONAIRFLOWPORT'])
             else:
                 solutionairflowport = tsslogging.getfreeport()
         else:
             solutionairflowport = tsslogging.getfreeport()

     #   Check the solution external port and see if the user modified the port in Kubernetes
     if default_args['solutionexternalport'] != '-1':
         solutionexternalport = int(default_args['solutionexternalport'])
         if 'KUBE' in os.environ:
             # '-1' in the environment means no override was supplied
             if os.environ['KUBE'] == '1' and os.environ['SOLUTIONEXTERNALPORT'] != '-1':
                 solutionexternalport = int(os.environ['SOLUTIONEXTERNALPORT'])
     else:
         if 'KUBE' in os.environ:
             if os.environ['KUBE'] == "0":
                 solutionexternalport = tsslogging.getfreeport()
             elif os.environ['SOLUTIONEXTERNALPORT'] != '-1':
                 solutionexternalport = int(os.environ['SOLUTIONEXTERNALPORT'])
             else:
                 solutionexternalport = tsslogging.getfreeport()
         else:
             solutionexternalport = tsslogging.getfreeport()

     #   Check the solution visualization port and see if the user modified the port in Kubernetes
     if default_args['solutionvipervizport'] != '-1':
         solutionvipervizport = int(default_args['solutionvipervizport'])
         if 'KUBE' in os.environ:
             # '-1' in the environment means no override was supplied
             if os.environ['KUBE'] == '1' and os.environ['SOLUTIONVIPERVIZPORT'] != '-1':
                 solutionvipervizport = int(os.environ['SOLUTIONVIPERVIZPORT'])
     else:
         if 'KUBE' in os.environ:
             if os.environ['KUBE'] == "0":
                 solutionvipervizport = tsslogging.getfreeport()
             elif os.environ['SOLUTIONVIPERVIZPORT'] != '-1':
                 solutionvipervizport = int(os.environ['SOLUTIONVIPERVIZPORT'])
             else:
                 solutionvipervizport = tsslogging.getfreeport()
         else:
             solutionvipervizport = tsslogging.getfreeport()

     if 'AIRFLOWPORT' in  os.environ:
         airflowport = os.environ['AIRFLOWPORT']
     else:
         airflowport = tsslogging.getfreeport()

     externalport=VIPERPORT
     if 'EXTERNALPORT' in  os.environ:
         if os.environ['EXTERNALPORT'] != "-1":
           externalport = os.environ['EXTERNALPORT']

     tss = os.environ['TSS']
     task_instance = context['task_instance']

     if tss == "1":
       task_instance.xcom_push(key="{}_SOLUTIONEXTERNALPORT".format(sname),value="_{}".format(solutionexternalport))
       task_instance.xcom_push(key="{}_SOLUTIONVIPERVIZPORT".format(sname),value="_{}".format(solutionvipervizport))
       task_instance.xcom_push(key="{}_SOLUTIONAIRFLOWPORT".format(sname),value="_{}".format(solutionairflowport))
     else:
       task_instance.xcom_push(key="{}_SOLUTIONEXTERNALPORT".format(sname),value="_{}".format(os.environ['SOLUTIONEXTERNALPORT']))
       task_instance.xcom_push(key="{}_SOLUTIONVIPERVIZPORT".format(sname),value="_{}".format(os.environ['SOLUTIONVIPERVIZPORT']))
       task_instance.xcom_push(key="{}_SOLUTIONAIRFLOWPORT".format(sname),value="_{}".format(os.environ['SOLUTIONAIRFLOWPORT']))
      # killports()

     if 'MQTTUSERNAME' in os.environ:
       task_instance.xcom_push(key="{}_MQTTUSERNAME".format(sname),value=os.environ['MQTTUSERNAME'])
     else:
       task_instance.xcom_push(key="{}_MQTTUSERNAME".format(sname),value="")

     if 'MQTTPASSWORD' in os.environ:
       task_instance.xcom_push(key="{}_MQTTPASSWORD".format(sname),value=os.environ['MQTTPASSWORD'])
     else:
       task_instance.xcom_push(key="{}_MQTTPASSWORD".format(sname),value="")

     if 'KAFKACLOUDUSERNAME' in os.environ:
       task_instance.xcom_push(key="{}_KAFKACLOUDUSERNAME".format(sname),value=os.environ['KAFKACLOUDUSERNAME'])
     else:
       task_instance.xcom_push(key="{}_KAFKACLOUDUSERNAME".format(sname),value="")

     if 'KAFKACLOUDPASSWORD' in os.environ:
       task_instance.xcom_push(key="{}_KAFKACLOUDPASSWORD".format(sname),value=os.environ['KAFKACLOUDPASSWORD'])
     else:
       task_instance.xcom_push(key="{}_KAFKACLOUDPASSWORD".format(sname),value="")

     task_instance.xcom_push(key="{}_TSS".format(sname),value="_{}".format(tss))

     task_instance.xcom_push(key="{}_EXTERNALPORT".format(sname),value="_{}".format(externalport))
     task_instance.xcom_push(key="{}_AIRFLOWPORT".format(sname),value="_{}".format(airflowport))

     task_instance.xcom_push(key="{}_VIPERVIZPORT".format(sname),value="_{}".format(vipervizport))
     task_instance.xcom_push(key="{}_VIPERTOKEN".format(sname),value=VIPERTOKEN)
     task_instance.xcom_push(key="{}_VIPERHOST".format(sname),value=VIPERHOST)
     task_instance.xcom_push(key="{}_VIPERPORT".format(sname),value="_{}".format(VIPERPORT))
     task_instance.xcom_push(key="{}_VIPERHOSTPRODUCE".format(sname),value=VIPERHOST)
     task_instance.xcom_push(key="{}_VIPERPORTPRODUCE".format(sname),value="_{}".format(VIPERPORT))
     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS".format(sname),value=VIPERHOSTPREPROCESS)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS".format(sname),value="_{}".format(VIPERPORTPREPROCESS))
     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS1".format(sname),value=VIPERHOSTPREPROCESS1)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS1".format(sname),value="_{}".format(VIPERPORTPREPROCESS1))

     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS2".format(sname),value=VIPERHOSTPREPROCESS2)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS2".format(sname),value="_{}".format(VIPERPORTPREPROCESS2))
     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESS3".format(sname),value=VIPERHOSTPREPROCESS3)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESS3".format(sname),value="_{}".format(VIPERPORTPREPROCESS3))

     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESSPGPT".format(sname),value=VIPERHOSTPREPROCESSPGPT)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESSPGPT".format(sname),value="_{}".format(VIPERPORTPREPROCESSPGPT))

     task_instance.xcom_push(key="{}_VIPERHOSTPREPROCESSAGENTICAI".format(sname),value=VIPERHOSTPREPROCESSAGENTICAI)
     task_instance.xcom_push(key="{}_VIPERPORTPREPROCESSAGENTICAI".format(sname),value="_{}".format(VIPERPORTPREPROCESSAGENTICAI))

     task_instance.xcom_push(key="{}_VIPERHOSTML".format(sname),value=VIPERHOSTML)
     task_instance.xcom_push(key="{}_VIPERPORTML".format(sname),value="_{}".format(VIPERPORTML))
     task_instance.xcom_push(key="{}_VIPERHOSTPREDICT".format(sname),value=VIPERHOSTPREDICT)
     task_instance.xcom_push(key="{}_VIPERPORTPREDICT".format(sname),value="_{}".format(VIPERPORTPREDICT))
     task_instance.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)
     task_instance.xcom_push(key="{}_HPDEHOST".format(sname),value=HPDEHOST)
     task_instance.xcom_push(key="{}_HPDEPORT".format(sname),value="_{}".format(HPDEPORT))
     task_instance.xcom_push(key="{}_HPDEHOSTPREDICT".format(sname),value=HPDEHOSTPREDICT)
     task_instance.xcom_push(key="{}_HPDEPORTPREDICT".format(sname),value="_{}".format(HPDEPORTPREDICT))
     task_instance.xcom_push(key="{}_solutionname".format(sd),value=sname)
     task_instance.xcom_push(key="{}_projectname".format(sd),value=pname)
     task_instance.xcom_push(key="{}_solutiondescription".format(sname),value=desc)
     task_instance.xcom_push(key="{}_solutiontitle".format(sname),value=stitle)

     task_instance.xcom_push(key="{}_containername".format(sname),value='')
     task_instance.xcom_push(key="{}_brokerhost".format(sname),value=brokerhost)
     task_instance.xcom_push(key="{}_brokerport".format(sname),value="_{}".format(brokerport))
     task_instance.xcom_push(key="{}_chip".format(sname),value=chip)

     tsslogging.locallogs("INFO", "STEP 1: completed - TML system parameters successfully gathered")
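
The host and port lookups in Step 1 all follow one pattern: open a text file that holds host,port and split it on the comma. As a minimal sketch (not part of the TSS code), that pattern could be wrapped in a helper. The name read_host_port is hypothetical, and stripping whitespace is an added assumption that the files may end with a newline.

```python
import os
import tempfile


def read_host_port(path):
    """Return (host, port) as strings from a file containing 'host,port'."""
    with open(path, "r") as f:
        content = f.read().strip()
    # Only the first two comma-separated fields are used, as in the DAG code
    host, port = content.split(",")[:2]
    return host.strip(), port.strip()


# Usage with a throwaway file standing in for a file such as /Viper-ml/viper.txt
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "viper.txt")
    with open(demo_path, "w") as f:
        f.write("127.0.0.1,8000\n")
    demo_host, demo_port = read_host_port(demo_path)

print(demo_host, demo_port)
```

A helper like this would raise a ValueError when a file holds fewer than two fields, which surfaces a malformed file early instead of failing later with an IndexError.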

7.5.3.1. DAG STEP 1: Parameter Explanation

Json Key

Description

owner

Change as needed.

start_date

Date of solution creation

brokerhost

This is the IP address for Kafka. If Kafka is running on localhost, use ‘127.0.0.1’; otherwise enter the Kafka Cloud cluster address. If you are using multiple brokers, separate them with commas and leave brokerport empty.

brokerport

The default port for Kafka on-premise or in the cloud is ‘9092’.

cloudusername

If you are running Kafka on-premise on 127.0.0.1, leave this blank. If you are using Kafka Cloud, this is the API KEY.

cloudpassword

If you are running Kafka on-premise on 127.0.0.1, leave this blank. If you are using Kafka Cloud, this is the API SECRET.

solutionairflowport

This is your solution Airflow port. If -1, TSS will choose a free port randomly; set this to a fixed number to prevent the port from changing.

solutionexternalport

This is the external port that you WILL need to stream external data to your TML solution when using REST or gRPC; you will need this port in the REST and gRPC clients. If -1, TSS will choose a free port randomly; set this to a fixed number to prevent the port from changing.

solutionvipervizport

This is your solution dashboard port. If -1, TSS will choose a free port randomly; set this to a fixed number to prevent the port from changing.

ingestdatamethod

You must choose how you will ingest your data.

Choose ONE Method from:

  1. localfile

  2. mqtt

  3. rest

  4. grpc

solutionname

DO NOT MODIFY. THIS WILL BE AUTOMATICALLY UPDATED when you create your solution. Refer to Lets Start Building a TML Solution.

solutiontitle

Provide a descriptive title for your solution

description

Describe your solution in one-line.

retries

Change as needed; 1 is usually fine.

KUBEMYSQLHOSTNAME

If deploying in Kubernetes - the MySql service will be used.
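
The rule described for solutionairflowport, solutionexternalport and solutionvipervizport (a fixed number is kept, -1 means a free port is chosen, and Kubernetes can override through an environment variable) can be sketched as one function. This is an illustrative reading of the Step 1 code, not the TSS implementation: resolve_port and get_free_port are hypothetical names, get_free_port only stands in for tsslogging.getfreeport, and treating a missing environment variable as -1 is an assumption.

```python
import socket


def get_free_port():
    """Ask the operating system for an unused TCP port (stand-in helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


def resolve_port(configured, env, env_key, free_port=get_free_port):
    """Resolve a solution port.

    configured: string from default_args; '-1' means no fixed port
    env:        mapping such as os.environ
    env_key:    for example 'SOLUTIONAIRFLOWPORT'
    """
    kube = env.get("KUBE")
    override = env.get(env_key, "-1")  # assumption: missing means no override
    if configured != "-1":
        # A fixed port is kept unless Kubernetes supplies its own value
        if kube == "1" and override != "-1":
            return int(override)
        return int(configured)
    # No fixed port: use the environment value unless KUBE is "0" or unset
    if kube is not None and kube != "0" and override != "-1":
        return int(override)
    return free_port()


print(resolve_port("9000", {}, "SOLUTIONAIRFLOWPORT"))
```

Passing the environment as a plain mapping keeps the function easy to check without touching os.environ.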

7.5.4. STEP 2: Create Kafka Topics: tml_system_step_2_kafka_createtopic_dag

Below is the complete definition of the tml_system_step_2_kafka_createtopic_dag that creates all the topics for your solution. Users only need to configure the code highlighted in the USER CHOSEN PARAMETERS.

Tip

Watch the YouTube video for Step 2 DAG configurations: YouTube Video

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import maadstml
 import sys
 import tsslogging
 import os
 import subprocess

 sys.dont_write_bytecode = True

 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
  'owner' : 'Sebastian Maurice', # <<< ********** You change as needed
  'companyname': 'Otics',  # <<< ********** You change as needed
   'myname' : 'Sebastian',  # <<< ********** You change as needed
   'myemail' : 'Sebastian.Maurice',  # <<< ********** You change as needed
   'mylocation' : 'Toronto',  # <<< ********** You change as needed
   'replication' : '1',  # <<< ********** You change as needed
   'numpartitions': '1',  # <<< ********** You change as needed
   'enabletls': '1',  # <<< ********** You change as needed
   'brokerhost' : '',  # <<< ********** Leave as is
   'brokerport' : '-999',  # <<< ********** Leave as is
   'microserviceid' : '',  # <<< ********** You change as needed
   'raw_data_topic' : 'iot-raw-data', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'preprocess_data_topic' : 'iot-preprocess,iot-preprocess2', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'ml_data_topic' : 'ml-data', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'prediction_data_topic' : 'prediction-data', # Separate multiple topics with comma <<< ********** You change topic names as needed
   'pgpt_data_topic' : 'cisco-network-privategpt',  #  PrivateGPT will produce responses to this topic - change as  needed
   'description' : 'Topics to store iot data',
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 def deletetopics(topic):

     if 'KUBE' in os.environ:
        if os.environ['KUBE'] == "1":
          return
     buf = "/Kafka/kafka_2.13-3.0.0/bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic {} --delete".format(topic)

     proc=subprocess.run(buf, shell=True)
     #proc.terminate()
     #proc.wait()

     repo=tsslogging.getrepo()
     tsslogging.tsslogit("Deleting topic {} in {}".format(topic,os.path.basename(__file__)), "INFO" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

 def setupkafkatopics(**context):
  # Set personal data

   tsslogging.locallogs("INFO", "STEP 2: Create topics started")

   args = default_args
   companyname=args['companyname']
   myname=args['myname']
   myemail=args['myemail']
   mylocation=args['mylocation']
   description=args['description']

   # Replication factor for Kafka redundancy
   replication=int(args['replication'])
   # Number of partitions for joined topic
   numpartitions=int(args['numpartitions'])
   # Enable SSL/TLS communication with Kafka
   enabletls=int(args['enabletls'])
   # If brokerhost is empty then this function will use the brokerhost address in your VIPER.ENV
   brokerhost=args['brokerhost']
   # If this is -999 then this function uses the port address for Kafka in VIPER.ENV in the
   # field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
   brokerport=int(args['brokerport'])
   # If you are using a reverse proxy to reach VIPER then you can put it here - otherwise if
   # empty then no reverse proxy is being used
   microserviceid=args['microserviceid']

   if 'step2raw_data_topic' in os.environ:
      args['raw_data_topic']=os.environ['step2raw_data_topic']

   if 'step2preprocess_data_topic' in os.environ:
      args['preprocess_data_topic']=os.environ['step2preprocess_data_topic']

   raw_data_topic=args['raw_data_topic']
   preprocess_data_topic=args['preprocess_data_topic']
   ml_data_topic=args['ml_data_topic']
   prediction_data_topic=args['prediction_data_topic']

   sd = context['dag'].dag_id
   sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))

   VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
   VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
   VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
   mainbroker = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_brokerhost".format(sname))
   HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

   ti = context['task_instance']
   ti.xcom_push(key="{}_companyname".format(sname), value=companyname)
   ti.xcom_push(key="{}_myname".format(sname), value=myname)
   ti.xcom_push(key="{}_myemail".format(sname), value=myemail)
   ti.xcom_push(key="{}_mylocation".format(sname), value=mylocation)
   ti.xcom_push(key="{}_replication".format(sname), value="_{}".format(replication))
   ti.xcom_push(key="{}_numpartitions".format(sname), value="_{}".format(numpartitions))
   ti.xcom_push(key="{}_enabletls".format(sname), value="_{}".format(enabletls))
   ti.xcom_push(key="{}_microserviceid".format(sname), value=microserviceid)
   ti.xcom_push(key="{}_raw_data_topic".format(sname), value=raw_data_topic)
   ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=preprocess_data_topic)
   ti.xcom_push(key="{}_ml_data_topic".format(sname), value=ml_data_topic)
   ti.xcom_push(key="{}_prediction_data_topic".format(sname), value=prediction_data_topic)



   #############################################################################################################
   #                         CREATE TOPIC TO STORE TRAINED PARAMS FROM ALGORITHM

   topickeys = ['raw_data_topic','preprocess_data_topic','ml_data_topic','prediction_data_topic','pgpt_data_topic']
   VIPERHOSTMAIN = "{}{}".format(HTTPADDR,VIPERHOST)
   ptarr = ""
   for k in topickeys:
     producetotopic=args[k]
     description=args['description']
     if producetotopic != "":
       ptarr = ptarr + producetotopic.strip() + ","
     topicsarr = producetotopic.split(",")
     for topic in topicsarr:
         if topic != '' and "127.0.0.1" in mainbroker:
           try:
             deletetopics(topic)
           except Exception as e:
             print("ERROR: ",e)
             continue

   if '127.0.0.1' in mainbroker:
         replication=1

     #for topic in topicsarr:
   if ptarr != '':
      ptarr=ptarr[:-1]
      print("Creating topic=",ptarr)
      try:
         result=maadstml.vipercreatetopic(VIPERTOKEN,VIPERHOSTMAIN,VIPERPORT[1:],ptarr,companyname,
                                  myname,myemail,mylocation,description,enabletls,
                                  brokerhost,brokerport,numpartitions,replication,
                                  microserviceid=microserviceid)
      except Exception as e:
        tsslogging.locallogs("ERROR", "STEP 2: Cannot create topic {} in {} - {}".format(ptarr,os.path.basename(__file__),e))

        repo=tsslogging.getrepo()
        tsslogging.tsslogit("Cannot create topic {} in {} - {}".format(ptarr,os.path.basename(__file__),e), "ERROR" )
        tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

   tsslogging.locallogs("INFO", "STEP 2: Completed")
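
Steps 1 and 2 pass numeric values through XCom as strings with a leading underscore, for example "_{}".format(VIPERPORT), and strip the underscore on use, for example VIPERPORT[1:]. The sketch below shows that round trip in isolation. The helper names are hypothetical and are not part of tsslogging or maadstml.

```python
def encode_xcom_number(value):
    """Prefix a numeric value with '_' before pushing it to XCom."""
    return "_{}".format(value)


def decode_xcom_number(value):
    """Strip the leading '_' from a value pulled from XCom."""
    if not value.startswith("_"):
        raise ValueError("expected a value with a leading underscore: {!r}".format(value))
    return value[1:]


pushed = encode_xcom_number(8000)
pulled = decode_xcom_number(pushed)
print(pushed, pulled)
```

Checking for the underscore before slicing guards against silently dropping the first digit of a value that was pushed without the prefix.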

7.5.4.1. DAG STEP 2: Parameter Explanation

owner

Change as needed

companyname

Change as needed

myname

Change as needed

myemail

Change as needed

mylocation

Change as needed

replication

If using on-premise Kafka at address 127.0.0.1, this should be 1. If using Kafka Cloud, this MUST be a minimum of 3.

numpartitions

Number of partitions for the topics; a minimum of 3 partitions is usually fine.

enabletls

Set to 1 for TLS encryption, 0 for no encryption

brokerhost

The setting in Step 1 is fine

brokerport

The setting in Step 1 is fine

microserviceid

If you are using a microservice behind a load balancer, e.g. NGINX, you can specify the route here.

raw_data_topic

This is the topic your solution will produce raw data to; see STEP 3: Produce to Kafka Topics.

preprocess_data_topic

This is where all the preprocessed data will be stored. Separate multiple topics with a comma.

ml_data_topic

This is where the ML estimated parameters are stored.

prediction_data_topic

This is where all the predictions will be stored.

description

Description for the topics.

start_date

Solution start date

retries

DAG retries, i.e. 1 is usually fine
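
The replication and numpartitions guidance above can be expressed as a small pre-flight check. This is only a sketch under stated assumptions: check_topic_settings is a hypothetical helper, and "local Kafka" is detected the same way the Step 2 code does it, by looking for 127.0.0.1 in the broker address.

```python
def check_topic_settings(mainbroker, replication, numpartitions):
    """Return a list of warnings; an empty list means the settings look consistent."""
    warnings = []
    local = "127.0.0.1" in mainbroker
    if local and replication != 1:
        warnings.append("on-premise Kafka at 127.0.0.1: replication should be 1")
    if not local and replication < 3:
        warnings.append("Kafka Cloud: replication must be at least 3")
    if numpartitions < 1:
        # Kafka requires at least one partition per topic
        warnings.append("numpartitions must be at least 1")
    return warnings


print(check_topic_settings("127.0.0.1", 1, 3))
print(check_topic_settings("my-cluster.example.com", 1, 3))
```

The broker address my-cluster.example.com is a placeholder, not a real cluster.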

7.5.5. STEP 3: Produce to Kafka Topics

Important

You must CHOOSE how you want to ingest data and produce to a Kafka topic.

A TML solution provides 4 (FOUR) ways to ingest data and produce to a topic: MQTT, gRPC, RESTAPI, LOCALFILE. The following DAGs in the table are SERVER files. These server files wait for connections from the client files. For further convenience, client files are provided to access the server DAGs below.

Tip

For the LOCALFILE, REST, MQTT, and gRPC client examples, the data file can be downloaded from GitHub:

https://github.com/smaurice101/raspberrypi/tree/main/tml-airflow/data

Also, watch this YouTube video that describes the four ingestion methods: YouTube

7.5.5.1. Four Ways to Ingest Data Into Your TML Solution Container

[Figure: four ways to ingest data into your TML solution container (_images/fourways.png)]

Data Ingest DAG Name

Client File Name

Description

tml-read-MQTT-step-3-kafka-producetotopic-dag

An on_message(client, userdata, msg) event is triggered by the MQTT broker. This DAG will automatically handle the on_message event and produce the data to Kafka.

This DAG is an MQTT server and will listen for a connection from a client. Use this if your TML solution ingests data from an MQTT system like HiveMQ and streams it to Kafka.

tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

You can process a local file and stream the data to Kafka.

This DAG will read a local CSV file for data and stream it to Kafka.

tml-read-gRPC-step-3-kafka-producetotopic-dag

NOTE: For this client you will also need tml_grpc_pb2_grpc and tml_grpc_pb2.

This DAG is a gRPC server and will listen for a connection from a gRPC client. Use this if your TML solution ingests data from devices and you want to leverage a gRPC connection and stream the data to Kafka.

tml-read-RESTAPI-step-3-kafka-producetotopic-dag

This is one of the most popular APIs.

This DAG is a RESTAPI server and will listen for a connection from a REST client. Use this if your TML solution ingests data from devices and you want to leverage a REST connection and stream the data to Kafka.
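
To make the choice concrete, the four ingestdatamethod values from Step 1 can be paired with the server DAG names in the table above. The mapping below is inferred from those names only; how TSS selects the DAG internally is not shown in this section, and ingest_dag_for is a hypothetical helper.

```python
# Pairing of ingestdatamethod values with the server DAG names listed above
INGEST_DAGS = {
    "localfile": "tml-read-LOCALFILE-step-3-kafka-producetotopic-dag",
    "mqtt": "tml-read-MQTT-step-3-kafka-producetotopic-dag",
    "rest": "tml-read-RESTAPI-step-3-kafka-producetotopic-dag",
    "grpc": "tml-read-gRPC-step-3-kafka-producetotopic-dag",
}


def ingest_dag_for(method):
    """Return the server DAG name for an ingestdatamethod value."""
    key = method.strip().lower()
    if key not in INGEST_DAGS:
        raise ValueError(
            "ingestdatamethod must be one of: {}".format(", ".join(sorted(INGEST_DAGS)))
        )
    return INGEST_DAGS[key]


print(ingest_dag_for("mqtt"))
```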

7.5.5.2. STEP 3a: Produce Data Using MQTT: tml-read-MQTT-step-3-kafka-producetotopic-dag

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import paho.mqtt.client as paho
 from paho import mqtt
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random
 import json

 sys.dont_write_bytecode = True
 ##################################################  MQTT SERVER #####################################
 # This is a MQTT server that will handle connections from a client.  It will handle connections
 # from an MQTT client for on_message, on_connect, and on_subscribe

 # If Connecting to HiveMQ cluster you will need USERNAME/PASSWORD and mqtt_enabletls = 1
 # USERNAME/PASSWORD should be set in your DOCKER RUN command of the TSS container

 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',
   'enabletls': '1',
   'microserviceid' : '',
   'producerid' : 'iotsolution',
   'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
   'identifier' : 'TML solution',
   'mqtt_broker' : '', # <<<****** Enter MQTT broker i.e. test.mosquitto.org
   'mqtt_port' : '', # <<<******** Enter MQTT port i.e. 1883, 8883    (for HiveMQ cluster)
   'mqtt_subscribe_topic' : '', # <<<******** enter name of MQTT to subscribe to i.e. tml/iot
   'mqtt_enabletls': '0', # set 1=TLS, 0=no TLS
   'delay' : '7000', # << ******* Maximum time (7000 milliseconds) VIPER waits for Kafka to confirm the message was received and written to the topic
   'topicid' : '-999', # <<< ********* do not modify
 }

 ######################################## DO NOT MODIFY BELOW #############################################


 # VIPER connection settings; these are filled in by startproducing() or from the command line
 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""
 VIPERHOSTFROM=""
 # Callbacks for the MQTT events: print the connection result and the subscription, and forward messages
 def on_connect(client, userdata, flags, rc, properties=None):
   print("CONNACK received with code %s." % rc)

 # print which topic was subscribed to
 def on_subscribe(client, userdata, mid, granted_qos, properties=None):
   print("Subscribed: " + str(mid) + " " + str(granted_qos))

 def on_message(client, userdata, msg):
   data=json.loads(msg.payload.decode("utf-8"))
   datad = json.dumps(data)
   readdata(datad)

 def mqttserverconnect():

  repo = tsslogging.getrepo()
  tsslogging.tsslogit("MQTT producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
  tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

  username = ""
  password = ""
  if 'MQTTUSERNAME' in os.environ:
        username = os.environ['MQTTUSERNAME']
  if 'MQTTPASSWORD' in os.environ:
        password = os.environ['MQTTPASSWORD']

  try:
    client = paho.Client(paho.CallbackAPIVersion.VERSION2)
    mqttBroker = default_args['mqtt_broker']
    mqttport = int(default_args['mqtt_port'])
    if default_args['mqtt_enabletls'] == "1":
      client.tls_set(tls_version=mqtt.client.ssl.PROTOCOL_TLS)
      client.username_pw_set(username, password)
  except Exception as e:
    tsslogging.locallogs("ERROR", "Cannot connect to MQTT broker in {} - {}".format(os.path.basename(__file__),e))

    tsslogging.tsslogit("ERROR: Cannot connect to MQTT broker in {} - {}".format(os.path.basename(__file__),e), "ERROR" )
    tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
    print("ERROR: Cannot connect to MQTT broker")
    return

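  try:
    client.connect(mqttBroker,mqttport)
  except Exception as e:
    tsslogging.locallogs("ERROR", "Cannot connect to MQTT broker in {} - {}".format(os.path.basename(__file__),e))
    print("ERROR: Cannot connect to MQTT broker")
    return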

  if client:
    print("Connected")
    tsslogging.locallogs("INFO", "MQTT connection established...")
    client.on_subscribe = on_subscribe
    client.on_message = on_message
    b=client.subscribe(default_args['mqtt_subscribe_topic'], qos=1)
    if 'MQTT_ERR_SUCCESS' not in str(b):
            print("ERROR Making a connection to HiveMQ:",b)
            tsslogging.locallogs("ERROR", "Cannot connect to MQTT broker in {} - {}".format(os.path.basename(__file__),str(b)))
            tsslogging.tsslogit("CANNOT Connect to MQTT Broker in {}".format(os.path.basename(__file__)), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
    else:
      client.on_connect = on_connect
      client.loop_forever()
  else:
     print("Cannot Connect")
     tsslogging.locallogs("ERROR", "Cannot connect to MQTT broker in {}".format(os.path.basename(__file__)))
     tsslogging.tsslogit("CANNOT Connect to MQTT Broker in {}".format(os.path.basename(__file__)), "ERROR" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")


 def producetokafka(value, tmlid, identifier,producerid,maintopic,substream,args):
  inputbuf=value
  topicid=int(args['topicid'])

  # Maximum delay (in milliseconds) for VIPER to wait for Kafka to confirm the message was received and written to the topic
  delay=int(args['delay'])
  enabletls = int(args['enabletls'])
  identifier = args['identifier']

  try:
     result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,substream,
                                         topicid,identifier)
  except Exception as e:
     print("ERROR:",e)


 def readdata(valuedata):
   # Main Kafka topic to store the real-time data
   maintopic = default_args['topics']
   producerid = default_args['producerid']
   try:
       producetokafka(valuedata, "", "",producerid,maintopic,"",default_args)
       # change time to speed up or slow down data
       #time.sleep(0.15)
   except Exception as e:
       print(e)
       pass

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startproducing(**context):
        global VIPERTOKEN
        global VIPERHOST
        global VIPERPORT
        global HTTPADDR
        global VIPERHOSTFROM

        tsslogging.locallogs("INFO", "STEP 3: producing data started")

        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        hs,VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
        ti = context['task_instance']
        ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='MQTT')
        ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])
        buf = default_args['mqtt_broker'] + ":" + default_args['mqtt_port']
        ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="")
        buf="MQTT Subscription Topic: " + default_args['mqtt_subscribe_topic']
        ti.xcom_push(key="{}_IDENTIFIER".format(sname),value=buf)
        ti.xcom_push(key="{}_FROMHOST".format(sname),value="{},{}".format(hs,VIPERHOSTFROM))
        ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

        ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="_{}".format(default_args['mqtt_port']))
        ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="_{}".format(default_args['mqtt_port']))

        ti.xcom_push(key="{}_PORT".format(sname),value="_{}".format(VIPERPORT))
        ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('produce',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOSTFROM,VIPERPORT[1:]), "ENTER"])


 if __name__ == '__main__':

     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          VIPERTOKEN = sys.argv[2]
          VIPERHOST = sys.argv[3]
          VIPERPORT = sys.argv[4]

          mqttserverconnect()

Note

No separate MQTT client is required, because MQTT is machine-to-machine communication: if a machine is writing to an MQTT broker, the DAG above automatically receives an on_message(client, userdata, msg) event and streams the data to Kafka. This is a powerful way to use TML with MQTT to process real-time data instantly.
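
The core of the on_message step is small: decode the payload, parse it as JSON, and hand the result to the producer. The sketch below isolates that step without an MQTT connection. It is an illustration, not the DAG's code: handle_payload is a hypothetical helper, the produce callback stands in for readdata, and skipping malformed payloads is an added assumption (the DAG's on_message does not catch parse errors).

```python
import json


def handle_payload(payload, produce):
    """Decode a raw MQTT payload and forward it; return True if it was forwarded."""
    try:
        record = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Malformed messages are skipped instead of stopping the listener
        return False
    produce(json.dumps(record))
    return True


sent = []  # stands in for the Kafka topic
ok = handle_payload(b'{"device": "d1", "temp": 21.5}', sent.append)
bad = handle_payload(b"not json", sent.append)
print(ok, bad, sent)
```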

7.5.5.3. DAG STEP 3a: Parameter Explanation

Json Key

Explanation

owner

Change as needed

enabletls

Set to 1 for TLS encryption, 0 for no encryption

microserviceid

Enter the route if using a load balancer, e.g. NGINX

producerid

Enter a name i.e. ‘iotsolution’

topics

The topic to store the raw data; you created it in SYSTEM STEP 2.

identifier

Some identifier for the data, e.g. ‘TML solution data’

mqtt_broker

Enter the address of the MQTT broker, e.g. test.mosquitto.org

mqtt_port

Enter MQTT port i.e. 1883

mqtt_subscribe_topic

Enter the name of the MQTT topic to subscribe to, e.g. tml/iot

mqtt_enabletls

Set to 1 to enable TLS or 0 for no TLS. If you are using a HiveMQ cluster or some other MQTT cloud cluster, this is usually set to 1. If you are using a cloud cluster, a USERNAME/PASSWORD is also usually needed. Set the MQTTUSERNAME and MQTTPASSWORD on the Docker RUN command of your TSS container: TSS Docker Run Command

delay

Maximum delay for VIPER to wait for

Kafka to return confirmation message

is received and written to topic

topicid

Leave at -999

start_date

Solution start date

retries

DAG retries
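
As an illustration, the parameters above might be filled in as shown below. This is a sketch under stated assumptions: the start_date and retries values are placeholders, the broker settings are the examples from the table, and check_args is a hypothetical helper that is not part of TML.

```python
from datetime import datetime

default_args = {
    "owner": "Sebastian Maurice",         # change as needed
    "enabletls": "1",                     # 1 = TLS encryption, 0 = none
    "microserviceid": "",                 # route, if behind a load balancer
    "producerid": "iotsolution",
    "topics": "iot-raw-data",             # raw data topic from SYSTEM STEP 2
    "identifier": "TML solution data",
    "mqtt_broker": "test.mosquitto.org",
    "mqtt_port": "1883",
    "mqtt_subscribe_topic": "tml/iot",
    "mqtt_enabletls": "0",                # usually 1 for cloud clusters
    "delay": "7000",                      # milliseconds
    "topicid": "-999",                    # leave at -999
    "start_date": datetime(2023, 1, 1),   # placeholder value (assumption)
    "retries": 1,                         # placeholder value (assumption)
}


def check_args(args: dict) -> list:
    """Hypothetical helper: return a list of problems found in the settings."""
    problems = []
    for flag in ("enabletls", "mqtt_enabletls"):
        if args[flag] not in ("0", "1"):
            problems.append(f"{flag} must be '0' or '1'")
    if not args["mqtt_port"].isdigit():
        problems.append("mqtt_port must be numeric")
    if args["topicid"] != "-999":
        problems.append("topicid should stay at -999")
    return problems


print(check_args(default_args))
```

A check like this can catch a mistyped port or flag before the DAG is deployed.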

7.5.5.4. STEP 3a.i: MQTT CLIENT

tml_client_MQTT_step_3_kafka_producetotopic.py

 import paho.mqtt.client as paho
 from paho import mqtt
 import time
 import sys
 from datetime import datetime

 default_args = {
   'mqtt_broker' : 'b526253c5560459da5337e561c142369.s1.eu.hivemq.cloud', # <<<****** Enter MQTT broker i.e. test.mosquitto.org
   'mqtt_port' : '8883', # <<<******** Enter MQTT port i.e. 1883
   'mqtt_subscribe_topic' : 'tml/iot', # <<<******** Enter name of MQTT topic to subscribe to i.e. encyclopedia/#
   'mqtt_enabletls' : '1', # << Enable TLS if connecting to a cloud cluster like HiveMQ
 }


 sys.dont_write_bytecode = True
 ##################################################  MQTT CLIENT #####################################
 # This is an MQTT client that connects to the MQTT broker and publishes data to it.
 # The MQTT DAG (STEP 3a) subscribes to the same topic and receives each message via on_message.

 ######################################## USER CHOSEN PARAMETERS ########################################


 def mqttconnection():
      username="<Enter MQTT username>"
      password="<Enter MQTT password>"

      client = paho.Client(paho.CallbackAPIVersion.VERSION2)
      mqttBroker = default_args['mqtt_broker']
      mqttport = int(default_args['mqtt_port'])
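      # Enable TLS only when mqtt_enabletls is '1' (e.g. a cloud cluster like HiveMQ)
      if default_args['mqtt_enabletls'] == '1':
        client.tls_set(tls_version=mqtt.client.ssl.PROTOCOL_TLS)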
      client.username_pw_set(username, password)
      client.connect(mqttBroker,mqttport)

      client.subscribe(default_args['mqtt_subscribe_topic'], qos=1)
      return client

 def publishtomqttbroker(client,line):

      b=client.publish(topic=default_args['mqtt_subscribe_topic'], payload=line, qos=1, retain=False)
      # publish() returns an MQTTMessageInfo; rc is MQTT_ERR_SUCCESS when the message was accepted
      if b.rc == paho.MQTT_ERR_SUCCESS:
         print(line)
         client.loop()
      else:
         print("ERROR publishing to the MQTT broker:",b)

 def readdatafile(client,inputfile):

   ##############################################################
   # NOTE: You can send any "EXTERNAL" data through this API
   # It is reading a localfile as an example
   ############################################################

   try:
     file1 = open(inputfile, 'r')
     print("Data Producing to Kafka Started:",datetime.now())
   except Exception as e:
     print("ERROR: Something went wrong ",e)
     return
   k = 0
   while True:
     line = file1.readline()
     line = line.replace(";", " ")
     print("line=",line)
     # add lat/long/identifier
     k = k + 1
     try:
       if line == "":
         #break
         file1.seek(0)
         k=0
         print("Reached End of File - Restarting")
         print("Read End:",datetime.now())
         continue
       publishtomqttbroker(client,line)
       # change time to speed up or slow down data
       time.sleep(.15)
     except Exception as e:
       print(e)
       time.sleep(.15)
       pass

 client=mqttconnection()
 inputfile = "IoTDatasample.txt"
 readdatafile(client,inputfile)
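
The replay behaviour of readdatafile (read a line, clean it, and start again at the end of the file) can be sketched in isolation as follows. This is a simplified illustration, not the client's code: replay_lines is a hypothetical helper, it is bounded by max_lines instead of looping forever, it loads the file into memory, and unlike the client it drops blank lines and trailing newlines.

```python
import itertools
import os
import tempfile


def replay_lines(path, max_lines):
    """Return max_lines cleaned lines from path, wrapping around at end of file."""
    with open(path, "r") as handle:
        # Same cleaning step as the client: ';' separators become spaces.
        lines = [line.replace(";", " ").rstrip("\n") for line in handle if line.strip()]
    # cycle() restarts from the first line once the last one is reached.
    return list(itertools.islice(itertools.cycle(lines), max_lines))


# Write a two-line sample file so the sketch runs without IoTDatasample.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a;1\nb;2\n")
    sample_path = tmp.name

replayed = replay_lines(sample_path, 5)
os.remove(sample_path)
print(replayed)
```

In the client, each returned line would be passed to publishtomqttbroker with a short time.sleep between messages to control the data rate.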

7.5.5.5. MQTT Reference Architecture

_images/mqttimg.png

If using a HiveMQ cluster:

_images/hivemq.png

7.5.5.6. STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag
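
The DAG below exposes POST routes such as /api/v1/createtopic, which reads the JSON keys topics, numpartitions, replication, description, and enabletls. A client request for that route might be built as sketched here. The host, the port (9002, taken from the rest_port default), and the second topic name are assumptions, and build_createtopic_request is a hypothetical helper. The request is built but not sent, because sending needs a running server.

```python
import json
import urllib.request


def build_createtopic_request(base_url, topics, numpartitions=3, replication=1,
                              description="user topic", enabletls=1):
    """Hypothetical helper: build a POST request for the createtopic route."""
    payload = {
        "topics": topics,                # comma-separated topic names
        "numpartitions": numpartitions,
        "replication": replication,
        "description": description,
        "enabletls": enabletls,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/createtopic",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address: adjust host and port to match your deployment.
request = build_createtopic_request("http://localhost:9002", "iot-raw-data,iot-preprocess")
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The defaults in the helper mirror the defaults the route applies when a key is missing from the posted JSON.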

   import maadstml
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   import json
   from datetime import datetime, timezone
   from airflow.decorators import dag, task
   from flask import Flask, request, jsonify
   from gevent.pywsgi import WSGIServer
   import sys
   import tsslogging
   import os
   import subprocess
   import time
   import random
   import shlex
   from typing import Dict, Any
   import re
   import threading
   from fastapi import FastAPI
   from fastapi.middleware.cors import CORSMiddleware
   import uvicorn
   from typing import List
   #import nest_asyncio
   #nest_asyncio.apply()

   lock = threading.Lock()
   mqtt_lock = threading.Lock()


   sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
   import scadaglobals as sg
   import scada_modbus as cv
   import mqtt_loop as mq

   VIPERTOKEN = "" #os.environ['VIPERTOKEN']
   VIPERHOST = "" #os.environ['VIPERHOST']
   VIPERPORT = "" #os.environ['VIPERPORT']
   HTTPADDR = ""
   sys.dont_write_bytecode = True
   ##################################################  REST API SERVER #####################################
   # This is a REST API server that will handle connections from a client
   # There are two endpoints you can use to stream data to this server:
   # 1. jsondataline -  You can POST a single JSON from your client app. Your JSON will be streamed to the Kafka topic.
   # 2. jsondataarray -  You can POST JSON arrays from your client app. Your JSON will be streamed to the Kafka topic.


   ######################################## USER CHOSEN PARAMETERS ########################################
   default_args = {
     'owner' : 'Sebastian Maurice',
     'enabletls': '1',
     'microserviceid' : '',
     'producerid' : 'iotsolution',
     'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
     'identifier' : 'TML solution',
     'tss_rest_port' : '9001',  # <<< ***** replace with port number, i.e. this is listening on port 9001
     'rest_port' : '9002',  # <<< ***** replace with port number, i.e. this is listening on port 9002
     'delay' : '7000', # << ******* 7000 millisecond maximum delay for VIPER to wait for Kafka to confirm the message was received and written to the topic
     'topicid' : '-999', # <<< ********* do not modify
   }

   ######################################## DO NOT MODIFY BELOW #############################################

   def writeviperlogs(errortype,message,VIPERTOKEN, VIPERHOST, VIPERPORT):

     args = default_args
     dt = datetime.now(timezone.utc)
     timestamp = dt.strftime("[%a, %d %b %Y %H:%M:%S UTC]")

     vmsg=f"{timestamp} {errortype.upper()} [{message}]"
     Logjson = json.dumps({
         "MESSAGE": str(vmsg),
         "SERVICE": "TML-Plugin",
         "HOST": VIPERHOST,
         "PORT": str(VIPERPORT),
         "KAFKA_CONNECT_BOOTSTRAP_SERVERS": "Kafka Broker"
     })

     #Logjson=f'{"MESSAGE":"{vmsg}","SERVICE": "TML-Plugin", "HOST": "{VIPERHOST}","PORT": "{str(VIPERPORT)}","KAFKA_CONNECT_BOOTSTRAP_SERVERS": "Kafka Broker"}'

   #  print("Logjson=",Logjson)
     producetokafka(Logjson, "", "","plugin-producer","viperlogs","",args,VIPERTOKEN, VIPERHOST, VIPERPORT)

   def producetokafka(value, tmlid, identifier,producerid,maintopic,substream,args,VIPERTOKEN, VIPERHOST, VIPERPORT):
        inputbuf=value
        topicid=int(args['topicid'])

        # Maximum delay (in milliseconds) for VIPER to wait for Kafka to confirm the message was received and written to the topic
        delay=int(args['delay'])
        enabletls = int(args['enabletls'])
        identifier = args['identifier']

        try:
           result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,substream,
                                               topicid,identifier)
           print("produce result========",result)
        except Exception as e:
           print("ERROR:",e)


   # Check if tmux window exists BEFORE creating
   def tmuxsession(windowinstance,steps):

       chip='amd64'
       mainos='linux'
       cdir=''
       isnew1=0
       isnew2=0
       viperrun=''
       viperport=-1

       if 'CHIP' in os.environ:
         chip=os.environ['CHIP']

       chip=chip.lower()
       windowinstance=windowinstance.replace("_","-")

       # start the binary
       if steps=="4":
          cdir="/Viper-preprocess"
          viperrun=f"/Viper-preprocess/viper-{mainos}-{chip}"
       if steps=="5":
          cdir="/Viper-ml"
          viperrun=f"/Viper-ml/viper-{mainos}-{chip}"
       if steps=="6":
          cdir="/Viper-predict"
          viperrun=f"/Viper-predict/viper-{mainos}-{chip}"
       if steps=="9":
          cdir="/Viper-preprocess-pgpt"
          viperrun=f"/Viper-preprocess-pgpt/viper-{mainos}-{chip}"
       if steps=="9b":
          cdir="/Viper-preprocess-agenticai"
          viperrun=f"/Viper-preprocess-agenticai/viper-{mainos}-{chip}"

       if windowinstance != 'default':
         check_result = subprocess.run(
             ["tmux", "has-session", "-t", f"plugin_{windowinstance}"],
             capture_output=True
         )
         check_result2 = subprocess.run(
             ["tmux", "has-session", "-t", f"plugin_{windowinstance}_{steps}"],
             capture_output=True
         )

         if check_result.returncode != 0:
             # Window doesn't exist - create it
             subprocess.run(["tmux", "new-session", "-d", "-s", f"plugin_{windowinstance}"])
             subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}", f"cd /{cdir}", "ENTER"], capture_output=True, text=True)
             isnew1=1
         else:
            subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}", "C-c"])

         if check_result2.returncode != 0:
             # Window doesn't exist - create it
             subprocess.run(["tmux", "new-session", "-d", "-s", f"plugin_{windowinstance}_{steps}"])
             isnew2=1
         else:
             subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}_{steps}", "C-c"])

       with open(f"{cdir}/viper.txt", 'r', encoding='utf-8') as file:
           line = file.readline()
           oldviperport=line.split(",")[1]

       if windowinstance!='default':
         subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}_{steps}", f"cd /{cdir}", "ENTER"], capture_output=True, text=True)
         subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}_{steps}", viperrun, "ENTER"], capture_output=True, text=True)

       if isnew2:
         time.sleep(5)

       with open(f"{cdir}/viper.txt", 'r', encoding='utf-8') as file:
           line = file.readline()
           viperport=line.split(",")[1]

       return oldviperport,viperport,f"plugin_{windowinstance}_{steps}",f"plugin_{windowinstance}"
       #start the script
     #  subprocess.run(["tmux", "send-keys", "-t", f"plugin_{windowinstance}", new_pythonrun, "ENTER"], capture_output=True, text=True)


   def flatten_for_shell(arg_list):
       """Flatten lists and remove newlines from strings"""
       flat_args = []
       for arg in arg_list:
           if isinstance(arg, list):
               # Strip newlines/spaces from each list item before joining
               cleaned_items = [str(x).replace('\n', '').replace('\r', '').strip() for x in arg]
               joined = ' '.join(cleaned_items)
               flat_args.append(f'"{joined}"')
           else:
               # Strip newlines from single args too
               arg_str = str(arg).replace('\n', '').replace('\r', '').strip()
               if ' ' in arg_str or ',' in arg_str:
                   flat_args.append(f'"{arg_str}"')
               else:
                   if arg_str.isdigit():
                     flat_args.append(arg_str)
                   else:
                     flat_args.append(f'"{arg_str}"')

       return ' '.join(flat_args)

   def stopstart(step,stepsarr,windowinstance='default'):

     print("Stopstart")
     pythonrun=''

     print("windowinstance==",windowinstance)
     print("step==",isinstance(step,str),step)
     step=str(step)

     if step=="4":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)
           args[-4] = stepsarr[-5]    # raw_data_topic
           args[-3] = stepsarr[-4]    # preprocesstypes
           args[-2] = stepsarr[-3]    # jsoncriteria
           args[-1] = stepsarr[-2]    # preprocess_data_topic

           args[-6] = viperport    # viper port
           args[-5] = stepsarr[-1]    # rollbackoffset

           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")
     elif step=="5":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)
           args[-11] = viperport  # viper port
           args[-8] = stepsarr[-8]
           args[-7] = stepsarr[-7]
           args[-6] = stepsarr[-6]
           args[-5] = stepsarr[-5]
           args[-4] = stepsarr[-4]
           args[-3] = stepsarr[-3]
           args[-2] = stepsarr[-2]
           args[-1] = stepsarr[-1]
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")

     elif step=="6":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)
           args[-10] = viperport  # viper port
           args[-7] = stepsarr[-7]
           args[-6] = stepsarr[-6]
           args[-5] = stepsarr[-5]
           args[-4] = stepsarr[-4]
           args[-3] = stepsarr[-3]
           args[-2] = stepsarr[-2]
           args[-1] = stepsarr[-1]
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")
     elif step=="9":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)

           args[-24] = viperport  # viper port
           args[-23] = stepsarr[-18]   #vectorcollectionname
           args[-22] = stepsarr[-17]   #consumefrom
           args[-21] = stepsarr[-16]   #pgpt data topic
           args[-18] = stepsarr[-15]    #rollback
           args[-17] = stepsarr[-14]    #prompt
           args[-16] = stepsarr[-13]    #context
           args[-15] = stepsarr[-12]   #keyattribute
           args[-14] = stepsarr[-11]   #keyprocess

           args[-13] = stepsarr[-10]    #hyperbatch
           args[-12] = stepsarr[-9]     #docfolder
           args[-11] = stepsarr[-8]    #docingestinterval

           args[-7] = stepsarr[-7]    #temp
           args[-6] = stepsarr[-6]    #vectorsearch
           args[-5] = stepsarr[-5]    ##context window
           args[-4] = stepsarr[-4]    #pgptcontainername
           args[-3] = stepsarr[-3]    #pgpthost
           args[-2] = stepsarr[-2]    #pgptport
           args[-1] = stepsarr[-1]    #vectordimension
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")
     elif step=="9b":
       oldviperport,viperport,vwn,swn=tmuxsession(windowinstance,step)
       if windowinstance=='default':
         viperport=oldviperport

       with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           pythonrun = lines[2].strip()  # Index 2 = 3rd line
           wn = lines[1].strip()
           args = shlex.split(pythonrun)

           args[-27] = viperport  # viper port
           args[-26] = stepsarr[-17]
           args[-25] = stepsarr[-16]
           args[-23] = stepsarr[-15]
           args[-22] = stepsarr[-14]
           args[-18] = stepsarr[-13]
           args[-17] = stepsarr[-12]
           args[-14] = stepsarr[-11]
           args[-13] = stepsarr[-10]
           args[-12] = stepsarr[-9]
           args[-11] = stepsarr[-8]
           args[-10] = stepsarr[-7]
           args[-9] = stepsarr[-6]
           args[-8] = stepsarr[-5]
           args[-7] = stepsarr[-4]
           args[-3] = stepsarr[-3]
           args[-2] = stepsarr[-2]
           args[-1] = stepsarr[-1]
           new_pythonrun = flatten_for_shell(args) #shlex.join(flatten_for_shell(args))
           print(f"new_pythonrun: {new_pythonrun}")

     new_pythonrun=new_pythonrun.replace("<<n>>",'\n')
     if windowinstance=='default':
       subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "{}".format(new_pythonrun), "ENTER"],capture_output=True, text=True)
     else:
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(swn), "{}".format(new_pythonrun), "ENTER"],capture_output=True, text=True)

       #subprocess.run(["tmux", "new", "-d", "-s", "{}".format(windowinstance)])
       #subprocess.run(["tmux", "send-keys", "-t", "{}".format(windowinstance), "{}".format(new_pythonrun), "ENTER"],capture_output=True, text=True)

   def terminatetmuxwindows(step,wn):
     # Get all tmux sessions
     wt=""
     if wn == 'all':
       result = subprocess.run(['tmux', 'list-sessions'], capture_output=True, text=True)
       sessions = result.stdout.strip().split('\n')

       for session in sessions:
           if session.startswith('plugin_'):
               session_name = session.split(':')[0]
               subprocess.run(['tmux', 'kill-session', '-t', session_name])

               print(f"Killed tmux session: {session_name}")

               wt = wt + session_name + ","
       # Keep the trailing comma so the window names appended below stay comma-separated
       with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
       with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn
     elif wn=='default':
       if step=="4":
         with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="5":
         with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="6":
         with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="9b":
         with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="9":
         with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt=wn
       if step=="0":
         with open("/tmux/step4_preprocess.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
         with open("/tmux/step5_ml.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
         with open("/tmux/step6_predictions.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
         with open("/tmux/step9_ai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn + ","
         with open("/tmux/step9b_agenticai.txt", 'r', encoding='utf-8') as file:
           lines = file.readlines()
           wn = lines[1].strip()
           subprocess.run(["tmux", "send-keys", "-t", wn, "C-c"])
           wt = wt + wn
     else:
          subprocess.run(['tmux', 'kill-session', '-t', f"plugin_{wn}_{step}"])
          subprocess.run(['tmux', 'kill-session', '-t', f"plugin_{wn}"])
          wt = wn
     return wt

   def gettmlsystemsparams():
       repo=tsslogging.getrepo()

     ############################################### API Routes ########################################

       if VIPERHOST != "":
           #app = Flask(__name__)
           app = FastAPI()

           app.add_middleware(
                 CORSMiddleware,
                 allow_origins=["*"],  # Allow all for dev
                 allow_credentials=True,
                 allow_methods=["*"],
                 allow_headers=["*"],
           )

   #-------------------------------- TERMINATE WINDOW -----------------------------------------------------
           @app.post('/api/v1/terminatewindow')
           def windowterminate(jdata: dict):
   #          jdata = request.get_json()
             if not jdata:
               return "Missing windows", 400

             step = jdata.get('step','')
             windowname = jdata.get('windowname','')

             if windowname != '':
                  wd=terminatetmuxwindows(step,windowname)
                  return {
                       'status': f"success: windows terminated: {wd}",
                  }

             return {
                 'status': 'success: no windows terminated',
             }

   #-------------------------------- CREATETOPIC -----------------------------------------------------
           @app.post('/api/v1/createtopic')
           def storecreatetopic(jdata: dict):
   #          jdata = request.get_json()
             if not jdata or not jdata.get('topics'):
               return "Missing topics", 400

             topics = jdata.get('topics')
             numpartitions = int(jdata.get('numpartitions',3))
             replication = int(jdata.get('replication',1))
             description = jdata.get('description','user topic')

             enabletls = int(jdata.get('enabletls',1))
             ptarr = [t.strip() for t in topics.split(",") if t.strip()]
             brokerhost=''
             brokerport=''
             try:
               for pt in ptarr:
                 if len(pt)>0:
                   result=maadstml.vipercreatetopic(VIPERTOKEN,VIPERHOST,VIPERPORT,pt,'companyname',
                                    'myname','myemail','mylocation',description,enabletls,
                                    brokerhost,brokerport,numpartitions,replication,'')
                   print(result)
                   writeviperlogs("INFO",f"Creating Topic: {pt}",VIPERTOKEN,VIPERHOST,VIPERPORT)
               return {
                 'status': 'success',
                 'topics': topics,
                 'partitions': numpartitions,
                 'replication': replication,
                 'description': description
               }
             except Exception as e:
               writeviperlogs("ERROR",f"Creating Topic failed: {pt}: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
               return {
                 'status': f"error: {e}",
                 'topics': topics,
                 'partitions': numpartitions,
                 'replication': replication,
                 'description': description
               }


   #-------------------------------- PREPROCESS -----------------------------------------------------
           @app.post('/api/v1/preprocess')
           def storepreprocess(jdata: dict):
   #          jdata = request.get_json()
             if not jdata or not jdata.get('rawdatatopic'):
               return "Missing preprocess or invalid preprocess", 400

             step = str(jdata.get('step','') )
             try:
              if step=='4':
               step4raw_data_topic = jdata.get('rawdatatopic','')
               step4preprocess_data_topic = jdata.get('preprocessdatatopic','')
               step4preprocesstypes = jdata.get('preprocesstypes','')
               step4jsoncriteria = jdata.get('jsoncriteria','')
               rollbackoffset = jdata.get('rollbackoffsets',200)

               windowinstance = jdata.get("windowinstance","default")
               step4arr = [step4raw_data_topic,step4preprocesstypes,step4jsoncriteria,step4preprocess_data_topic,rollbackoffset]
               stopstart(step,step4arr,windowinstance)

              elif step=='4c':
                maxrows = jdata.get('maxrows',10)
                searchterms = jdata.get('searchterms','')
                rememberpastwindows = jdata.get('rememberpastwindows',5)
                patternwindowthreshold = jdata.get('patternwindowthreshold',30)
                raw_data_topic = jdata.get('raw_data_topic','')
                rtmsstream = jdata.get('rtmsstream','')
                rtmsscorethreshold = jdata.get('rtmsscorethreshold',0.6)
                attackscorethreshold = jdata.get('attackscorethreshold',0.6)
                patternscorethreshold = jdata.get('patternscorethreshold',0.6)
                localsearchtermfolder = jdata.get('localsearchtermfolder','')
                localsearchtermfolderinterval = jdata.get('localsearchtermfolderinterval','')
                rtmsfoldername = jdata.get('rtmsfoldername','')
                rtmsmaxwindows = jdata.get('rtmsmaxwindows',10000)
                windowinstance = jdata.get("windowinstance","default")
                step4carr = [maxrows,searchterms,rememberpastwindows,patternwindowthreshold,raw_data_topic,rtmsstream,rtmsscorethreshold,attackscorethreshold,patternscorethreshold,
                            localsearchtermfolder,localsearchtermfolderinterval,rtmsfoldername,rtmsmaxwindows]
                stopstart(step,step4carr,windowinstance)

              return {
                 'status': 'success',
                 'step4raw_data_topic': jdata.get('rawdatatopic',''),
                 'step4preprocess_data_topic': jdata.get('preprocessdatatopic',''),
                 'step4preprocesstypes': jdata.get('preprocesstypes',''),
                 'step4jsoncriteria': jdata.get('jsoncriteria',''),
                 'rollbackoffset': jdata.get('rollbackoffsets',200),
                 'windowinstance': jdata.get("windowinstance","default")
                 }
             except Exception as e:
              writeviperlogs("ERROR",f"Preprocessing failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
              return {
                 'status': f"error:{e}",
                 'step4raw_data_topic': jdata.get('rawdatatopic',''),
                 'step4preprocess_data_topic': jdata.get('preprocessdatatopic',''),
                 'step4preprocesstypes': jdata.get('preprocesstypes',''),
                 'step4jsoncriteria': jdata.get('jsoncriteria',''),
                 'rollbackoffset': jdata.get('rollbackoffsets',200),
                 'windowinstance': jdata.get("windowinstance","default")
                 }


   #-------------------------------- MACHINE LEARNING -----------------------------------------------------
           @app.post('/api/v1/ml')
           def storeml(jdata: dict):
   #          jdata = request.get_json()
             if not jdata:
               return "Missing ml or invalid ml", 400

             step = str(jdata.get('step','') )
             try:
               if step=="5":
                trainingdatafolder = jdata.get('trainingdatafolder','')
                ml_data_topic = jdata.get('ml_data_topic','')
                preprocess_data_topic = jdata.get('preprocess_data_topic','')
                islogistic = jdata.get('islogistic',0)
                dependentvariable = jdata.get('dependentvariable','failure')
                independentvariables = jdata.get('independentvariables','')
                processlogic = jdata.get('processlogic','')
                rollbackoffsets = jdata.get('rollbackoffsets',50)
                windowinstance = jdata.get('windowinstance','default')
                step5arr = [rollbackoffsets,processlogic,independentvariables,dependentvariable,
                            islogistic,preprocess_data_topic,ml_data_topic,trainingdatafolder]
                stopstart(step,step5arr,windowinstance)
                return {
                 'status': "success",
                 'trainingdatafolder': jdata.get('trainingdatafolder',''),
                 'ml_data_topic': jdata.get('ml_data_topic',''),
                 'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                 'islogistic': jdata.get('islogistic',0),
                 'dependentvariable': jdata.get('dependentvariable','failure'),
                 'independentvariables': jdata.get('independentvariables',''),
                 'processlogic': jdata.get('processlogic',''),
                 'rollbackoffsets': jdata.get('rollbackoffsets',50),
                 'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"Machine learning failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                 'trainingdatafolder': jdata.get('trainingdatafolder',''),
                 'ml_data_topic': jdata.get('ml_data_topic',''),
                 'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                 'islogistic': jdata.get('islogistic',0),
                 'dependentvariable': jdata.get('dependentvariable','failure'),
                 'independentvariables': jdata.get('independentvariables',''),
                 'processlogic': jdata.get('processlogic',''),
                 'rollbackoffsets': jdata.get('rollbackoffsets',50),
                 'windowinstance': jdata.get("windowinstance","default")
                 }

   #-------------------------------- PREDICTIONS -----------------------------------------------------
           @app.post('/api/v1/predict')
           def predictdata(jdata: dict):
   #          jdata = request.get_json()
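             if not jdata:
               # A (body, status) tuple is a Flask idiom; FastAPI needs an HTTPException to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing or invalid prediction request")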

             step = str(jdata.get('step','') )

             try:
               if step=="6":
                pathtoalgos = jdata.get('pathtoalgos','')
                maxrows = jdata.get('rollbackoffsets',50)
                consumefrom = jdata.get('consumefrom','')
                inputdata = jdata.get('inputdata','')
                streamstojoin = jdata.get('streamstojoin','')
                ml_prediction_topic = jdata.get('ml_prediction_topic','')
                preprocess_data_topic = jdata.get('preprocess_data_topic','')
                windowinstance = jdata.get('windowinstance','default')
                step6arr = [maxrows,preprocess_data_topic,ml_prediction_topic,streamstojoin,inputdata,consumefrom,pathtoalgos]
                stopstart(step,step6arr,windowinstance)
                return {
                 'status': "success",
                  'pathtoalgos': jdata.get('pathtoalgos',''),
                  'maxrows': jdata.get('rollbackoffsets',50),
                  'consumefrom': jdata.get('consumefrom',''),
                  'inputdata': jdata.get('inputdata',''),
                  'streamstojoin': jdata.get('streamstojoin',''),
                  'ml_prediction_topic': jdata.get('ml_prediction_topic',''),
                  'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                  'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"Predictions failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                  'pathtoalgos': jdata.get('pathtoalgos',''),
                  'maxrows': jdata.get('rollbackoffsets',50),
                  'consumefrom': jdata.get('consumefrom',''),
                  'inputdata': jdata.get('inputdata',''),
                  'streamstojoin': jdata.get('streamstojoin',''),
                  'ml_prediction_topic': jdata.get('ml_prediction_topic',''),
                  'preprocess_data_topic': jdata.get('preprocess_data_topic',''),
                  'windowinstance': jdata.get('windowinstance','default')
                 }

   #-------------------------------- AI -----------------------------------------------------
           @app.post('/api/v1/ai')
           def aidata(jdata: dict):
   #          jdata = request.get_json()
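             if not jdata:
               # A (body, status) tuple is a Flask idiom; FastAPI needs an HTTPException to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing or invalid AI request")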

             step = str(jdata.get('step','') )
             try:
               if step=="9":
                vectordimension = jdata.get('vectordimension','768')
                contextwindowsize= jdata.get('contextwindowsize','8192')
                vectorsearchtype= jdata.get('vectorsearchtype','Manhattan')
                temperature= float(jdata.get('temperature','0.1'))
                docfolderingestinterval= jdata.get('docfolderingestinterval','900')
                docfolder= jdata.get('docfolder','')
                vectordbcollectionname= jdata.get('vectordbcollectionname','tml-pgpt')
                hyperbatch= jdata.get('hyperbatch','0')
                keyprocesstype= jdata.get('keyprocesstype','')
                keyattribute= jdata.get('keyattribute','hyperprediction')
                context= jdata.get('context','')
                prompt= jdata.get('prompt','')
                pgptport= jdata.get('pgptport','8001')
                pgpthost= jdata.get('pgpthost','http://127.0.0.1')
                pgpt_data_topic = jdata.get('pgpt_data_topic','')
                consumefrom = jdata.get('consumefrom','')
                rollbackoffset = jdata.get('rollbackoffset','5')
                pgptcontainername = jdata.get('pgptcontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2')
                windowinstance = jdata.get('windowinstance','default')

                step9arr = [vectordbcollectionname,consumefrom,pgpt_data_topic, rollbackoffset, prompt,context,keyattribute,keyprocesstype,
                            hyperbatch,docfolder,docfolderingestinterval, temperature,vectorsearchtype,contextwindowsize,pgptcontainername, pgpthost,pgptport,vectordimension]

                stopstart(step,step9arr,windowinstance)

                return {
                 'status': "success",
                  'vectordimension': jdata.get('vectordimension','768'),
                  'contextwindowsize': jdata.get('contextwindowsize','8192'),
                  'vectorsearchtype': jdata.get('vectorsearchtype','Manhattan'),
                  'temperature': jdata.get('temperature','0.1'),
                  'docfolderingestinterval': jdata.get('docfolderingestinterval','900'),
                  'docfolder': jdata.get('docfolder',''),
                  'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-pgpt'),
                  'hyperbatch': jdata.get('hyperbatch','0'),
                  'keyprocesstype': jdata.get('keyprocesstype',''),
                  'keyattribute': jdata.get('keyattribute','hyperprediction'),
                  'context': jdata.get('context',''),
                  'prompt': jdata.get('prompt',''),
                  'pgptport': jdata.get('pgptport','8001'),
                  'pgpthost': jdata.get('pgpthost','http://127.0.0.1'),
                  'pgpt_data_topic': jdata.get('pgpt_data_topic',''),
                  'consumefrom': jdata.get('consumefrom',''),
                  'rollbackoffset': jdata.get('rollbackoffset','5'),
                  'pgptcontainername': jdata.get('pgptcontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2'),
                  'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"AI failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                  'vectordimension': jdata.get('vectordimension','768'),
                  'contextwindowsize': jdata.get('contextwindowsize','8192'),
                  'vectorsearchtype': jdata.get('vectorsearchtype','Manhattan'),
                  'temperature': jdata.get('temperature','0.1'),
                  'docfolderingestinterval': jdata.get('docfolderingestinterval','900'),
                  'docfolder': jdata.get('docfolder',''),
                  'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-pgpt'),
                  'hyperbatch': jdata.get('hyperbatch','0'),
                  'keyprocesstype': jdata.get('keyprocesstype',''),
                  'keyattribute': jdata.get('keyattribute','hyperprediction'),
                  'context': jdata.get('context',''),
                  'prompt': jdata.get('prompt',''),
                  'pgptport': jdata.get('pgptport','8001'),
                  'pgpthost': jdata.get('pgpthost','http://127.0.0.1'),
                  'pgpt_data_topic': jdata.get('pgpt_data_topic',''),
                  'consumefrom': jdata.get('consumefrom',''),
                  'rollbackoffset': jdata.get('rollbackoffset','5'),
                  'pgptcontainername': jdata.get('pgptcontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2'),
                  'windowinstance': jdata.get('windowinstance','default')
                 }

   #-------------------------------- AGENTIC AI -----------------------------------------------------
           @app.post('/api/v1/agenticai')
           def agenticaidata(jdata: dict):
   #          jdata = request.get_json()
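             if not jdata:
               # A (body, status) tuple is a Flask idiom; FastAPI needs an HTTPException to send a 400
               from fastapi import HTTPException
               raise HTTPException(status_code=400, detail="Missing or invalid agentic AI request")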

             step = str(jdata.get('step','') )

             try:
               if step=="9b":
                maxrows = jdata.get('rollbackoffsets',10)
                ollamamodel= jdata.get('ollama-model','phi3:3.8b,phi3:3.8b,llama3.2:3b') #agent - team lead - supervisor
                vectordbpath= jdata.get('vectordbpath','/rawdata/vectordb')
                temperature= float(jdata.get('temperature','0.1'))
                vectordbcollectionname= jdata.get('vectordbcollectionname','tml-llm-model')
                ollamacontainername= jdata.get('ollamacontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools')
                embedding= jdata.get('embedding','nomic-embed-text')
                agents_topic_prompt= jdata.get('agents_topic_prompt','')
                teamlead_topic= jdata.get('teamlead_topic','team-lead-responses')
                teamleadprompt= jdata.get('teamleadprompt','')
                supervisor_topic= jdata.get('supervisor_topic','supervisor-responses')
                supervisorprompt= jdata.get('supervisorprompt','')
                agenttoolfunctions= jdata.get('agenttoolfunctions','')
                agent_team_supervisor_topic= jdata.get('agent_team_supervisor_topic','all-agents-responses')
                contextwindow = jdata.get('contextwindow','4096')
                localmodelsfolder = jdata.get('localmodelsfolder','/rawdata/ollama')
                agenttopic = jdata.get('agenttopic','agent-responses')
                windowinstance = jdata.get('windowinstance','default')
                step9barr = [maxrows,ollamamodel,vectordbpath,temperature,vectordbcollectionname,ollamacontainername,embedding,agents_topic_prompt,teamlead_topic,teamleadprompt,
                            supervisor_topic,supervisorprompt,agenttoolfunctions,agent_team_supervisor_topic,contextwindow,localmodelsfolder,agenttopic]
                stopstart(step,step9barr,windowinstance)

                return {
                 'status': "success",
                 'rollbackoffset': jdata.get('rollbackoffsets',10),
                 'ollamamodel': jdata.get('ollama-model','phi3:3.8b,phi3:3.8b,llama3.2:3b'), #agent - team lead - supervisor
                 'vectordbpath': jdata.get('vectordbpath','/rawdata/vectordb'),
                 'temperature': jdata.get('temperature','0.1'),
                 'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-llm-model'),
                 'ollamacontainername': jdata.get('ollamacontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools'),
                 'embedding': jdata.get('embedding','nomic-embed-text'),
                 'agents_topic_prompt': jdata.get('agents_topic_prompt',''),
                 'teamlead_topic': jdata.get('teamlead_topic','team-lead-responses'),
                 'teamleadprompt': jdata.get('teamleadprompt',''),
                 'supervisor_topic': jdata.get('supervisor_topic','supervisor-responses'),
                 'supervisorprompt': jdata.get('supervisorprompt',''),
                 'agenttoolfunctions': jdata.get('agenttoolfunctions',''),
                 'agent_team_supervisor_topic': jdata.get('agent_team_supervisor_topic','all-agents-responses'),
                 'contextwindow': jdata.get('contextwindow','4096'),
                 'localmodelsfolder': jdata.get('localmodelsfolder','/rawdata/ollama'),
                 'agenttopic': jdata.get('agenttopic','agent-responses'),
                 'windowinstance': jdata.get('windowinstance','default')
                 }
             except Exception as e:
                writeviperlogs("ERROR",f"Agentic AI failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                return {
                 'status': f"error:{e}",
                 'rollbackoffset': jdata.get('rollbackoffsets',10),
                 'ollamamodel': jdata.get('ollama-model','phi3:3.8b,phi3:3.8b,llama3.2:3b'), #agent - team lead - supervisor
                 'vectordbpath': jdata.get('vectordbpath','/rawdata/vectordb'),
                 'temperature': jdata.get('temperature','0.1'),
                 'vectordbcollectionname': jdata.get('vectordbcollectionname','tml-llm-model'),
                 'ollamacontainername': jdata.get('ollamacontainername','maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools'),
                 'embedding': jdata.get('embedding','nomic-embed-text'),
                 'agents_topic_prompt': jdata.get('agents_topic_prompt',''),
                 'teamlead_topic': jdata.get('teamlead_topic','team-lead-responses'),
                 'teamleadprompt': jdata.get('teamleadprompt',''),
                 'supervisor_topic': jdata.get('supervisor_topic','supervisor-responses'),
                 'supervisorprompt': jdata.get('supervisorprompt',''),
                 'agenttoolfunctions': jdata.get('agenttoolfunctions',''),
                 'agent_team_supervisor_topic': jdata.get('agent_team_supervisor_topic','all-agents-responses'),
                 'contextwindow': jdata.get('contextwindow','4096'),
                 'localmodelsfolder': jdata.get('localmodelsfolder','/rawdata/ollama'),
                 'agenttopic': jdata.get('agenttopic','agent-responses'),
                 'windowinstance': jdata.get('windowinstance','default')
                 }

   #-------------------------------- CONSUME -----------------------------------------------------
           @app.post('/api/v1/consume')
           def consumedata(jdata: dict):
   #          jdata = request.get_json()
             osdu = str(jdata.get('osdu','false')).lower()  # accept JSON booleans as well as 'true'/'false' strings
             kind = jdata.get('kind','tml')

             if not jdata or not jdata.get('topic'):
               if osdu=='false':
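                 # A (body, status) tuple is a Flask idiom; FastAPI needs an HTTPException to send a 400
                 from fastapi import HTTPException
                 raise HTTPException(status_code=400, detail="Missing topic or invalid consume request")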
               else:
                 return {
                     "kind": f"{kind}",
                     "id": "consume-error",
                     "error": {
                         "code": 400,
                         "message": "Missing topic or invalid consume request",
                         "reason": "Topic parameter required"
                     }
                 }
             forward_statuses = []
             maintopic = jdata.get('topic','')
             forwardurl = jdata.get('forwardurl','')
             legal = jdata.get('legal','tml-legal')

             forward_headers = {'Content-Type': 'application/json'}

             if maintopic != '':
              try:
               rollbackoffsets = int(jdata.get('rollbackoffsets',100))
               enabletls = int(jdata.get('enabletls',1))
               consumerid='tmlconsumerplugin'
               companyname='companyname'
               offset = int(jdata.get('offset',-1))
               brokerhost = ''
               brokerport = -999
               microserviceid = ''
               topicid = jdata.get('topicid','-999')
               preprocesstype = ''
               delay = 100
               partition = -1

               result=maadstml.viperconsumefromtopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,
                           consumerid,companyname,partition,enabletls,delay,
                           offset, brokerhost,brokerport,microserviceid,
                           topicid,rollbackoffsets,preprocesstype)
               now_iso = datetime.utcnow().isoformat() + "Z"
               result = json.loads(result)
               if osdu=='false':
                   response =  {
                       'status': 'consumed',
                       'topic': maintopic,
                       'Messages': result,  # viperconsumefromtopic output
                       'consumer_id': consumerid
                   }
               else:
                   response = {
                       "kind": f"{kind}",
                       "id": f"osdu:tml:consume:{maintopic}:{int(time.time())}",
                       "data": {
                           "Topic": maintopic,
                           "ConsumerID": consumerid,
                           "CompanyName": companyname,
                           "Messages": result,  # Your viperconsumefromtopic output
                           "Partition": partition,
                           "Offset": offset,
                           "RollbackOffsets": rollbackoffsets,
                           "meta": {
                               "dataPartitionId": "tml-id",
                               "createTime": f"{now_iso}",
                               "modificationTime": f"{now_iso}",
                               "acl": {
                                   "viewers": ["data.default.viewers@tml.group"],
                                   "owners": ["data.default.owners@tml.group"]
                               },
                               "legal": {
                                   "legaltags": f"{legal}",
                                   "status": "compliant"
                               }
                           }
                       }
                   }

               if forwardurl == '':
                   #print("response=",response)
                   return response
               else:
                  farr = [fw.strip() for fw in forwardurl.split(",") if fw.strip()]  # Clean whitespace, skip empty entries
                  for fw in farr:
                    try:
                      fwdresponse = requests.post(
                       f"{fw}",
                        json=response,
                        headers={'Content-Type': 'application/json', 'data-partition-id': 'tml-id'}, timeout=30 )
                      forward_statuses.append({
                         'url': fw.strip(),
                         'status': fwdresponse.status_code,
                         'success': fwdresponse.ok
                      })
                    except Exception as e:
                       forward_statuses.append({'url': fw.strip(), 'error': str(e)})
                       writeviperlogs("ERROR",f"Forwarding URL failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)

                  response['forward_statuses'] = forward_statuses
                  return response
              except Exception as e:
                  print("Error=",e)
                  writeviperlogs("ERROR",f"Consume failed: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
                  return {"error": f"Consumption failed: {e}"}
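
   # Example (a sketch, not part of the plugin): calling /api/v1/consume from a client.
   # The address and topic name are hypothetical; the JSON keys are the ones read by
   # consumedata() above.
   #
   #   import requests
   #   reply = requests.post(
   #       "http://localhost:9001/api/v1/consume",   # hypothetical address of the plugin server
   #       json={"topic": "my-topic", "rollbackoffsets": 10, "osdu": "false"},
   #       timeout=30,
   #   )
   #   print(reply.json().get("status"))   # 'consumed' in the non-OSDU success response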


   ##################### INDUSTRIAL API ##############################################################
   #-------------------------------- SCADA/MODBUS -----------------------------------------------------
           @app.post("/api/v1/scada_modbus_read")
           def start_vessel_read(req: dict):
               # Plain def (not async): the thread join below blocks, and FastAPI runs sync handlers in a worker thread

               #req = request.get_json()
               job_id = str(time.time())

               scada_cfg = {
                   "host": req.get("scada_host", "127.0.0.1"),
                   "port": req.get("scada_port", 2502),
                   "unit_id": req.get("slave_id", 1),
               }

               with lock:  # serialize start/stop requests across handler threads
                   # Signal any running read loop to stop, then wait up to one read interval (+1 s) for it to exit
                   if sg.read_thread and sg.read_thread.is_alive():
                       if sg.read_job:
                           sg.read_job["stop"] = True
                       sg.read_thread.join(timeout=float(req.get("read_interval_seconds", 0.3))+1.0)


                   sg.read_job = {"stop": False, "job_id": job_id}
                   sg.read_thread = threading.Thread(
                       target=cv.modbus_read_loop,
                       args=(
                           scada_cfg,
                           req.get("read_interval_seconds", 0.3),
                           req.get("callback_url",""),
                           req.get("max_reads",-1),
                           req.get("fields", []),
                           req.get("scaling", {}),
                           req.get("start_register", 40001) - 40001,
                           req.get("sendtotopic", ""),
                           job_id,
                           VIPERTOKEN,
                           VIPERHOST,
                           VIPERPORT,
                           default_args,
                           req.get("vessel_names", {}),
                           req.get("createvariables", ""),  # taken from the request
                       ),
                       daemon=True,
                   )
                   sg.read_thread.start()

               return {
                   "message": "SCADA Vessel read started",
                   "job_id": job_id,
                   "config_from_request": {
                       "fields": len(req.get("fields", [])),
                       "has_createvariables": bool(req.get("createvariables"))
                   }
               }
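
   # Example request body for /api/v1/scada_modbus_read (illustrative only: the keys are
   # those read by start_vessel_read() above, the values are hypothetical):
   #
   #   payload = {
   #       "scada_host": "127.0.0.1",
   #       "scada_port": 2502,
   #       "slave_id": 1,
   #       "read_interval_seconds": 0.3,
   #       "start_register": 40001,           # the handler subtracts 40001, giving offset 0
   #       "max_reads": -1,                   # the handler's default
   #       "fields": ["operatingPressure"],   # hypothetical field name
   #       "sendtotopic": "scada-data",       # hypothetical Kafka topic
   #   }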


           @app.post("/api/v1/vessel_data")
           def vessel_data_callback(data: dict):
   #            data = request.get_json()

               # DYNAMIC: Handle ANY data structure from callback
               vessel = data.get('vessel', data)  # Nested OR flat
               if not isinstance(vessel, dict):
                   vessel = {}  # tolerate a null or non-object 'vessel' value

               # DYNAMIC: Find vessel identifier (vesselIndex OR name of the first field)
               vessel_id = vessel.get('vesselIndex', next(iter(vessel), 'N/A'))

               # DYNAMIC: Find pressure field (first numeric field whose name contains 'pressure')
               pressure = 0
               for key, val in vessel.items():
                   if isinstance(val, (int, float)) and 'pressure' in key.lower():
                      pressure = val
                      break

               print(f"📨 Job {data.get('job_id', 'N/A')} | Vessel {vessel_id}: {pressure:.1f}")
               print(f"   Total fields: {len(vessel)}")

               # DYNAMIC: Show computed vars (anything not in original fields list)
               original_fields = data.get('fields', [])
               computed_fields = {k: v for k, v in vessel.items()
                                 if k not in original_fields and isinstance(v, (int, float))}

               for field, value in list(computed_fields.items())[:3]:
                   print(f"   {field}: {value:.0f}")

               print(json.dumps(data))
               return data  # FastAPI serializes the dict; returning json.dumps(data) would double-encode it


           @app.post("/api/v1/scada_read_stop")
           def stop_vessel_read():
               if sg.read_job:
                   sg.read_job["stop"] = True
               return {"message": "Stop signal sent"}

           @app.get("/api/v1/scada_status")
           def status():
               return {
                   "running": bool(sg.read_job) and not sg.read_job.get("stop", True)
               }

   ################################# MQTT #############################################################

           @app.post("/api/v1/mqtt_subscribe")
           def start_mqtt_subscribe(req: dict):

            try:
             job_id = str(time.time())
             mqtt_cfg = {
               "broker": req.get("mqtt_broker", ""),
               "port": int(req.get("mqtt_port", "8883")),
               "topic": req.get("mqtt_subscribe_topic", ""),
               "sendtotopic": req.get("sendtotopic",""),
               "username": os.environ.get('MQTTUSERNAME', ''),
               "password": os.environ.get('MQTTPASSWORD', ''),
               "enable_tls": req.get("mqtt_enabletls","1"),
               "VIPERTOKEN": VIPERTOKEN,  # same enclosing-scope values the other handlers use (FastAPI has no Flask-style app.config)
               "VIPERHOST":  VIPERHOST,
               "VIPERPORT": VIPERPORT,
               "default_args": default_args,
             }

             with mqtt_lock:  # lock protecting the shared MQTT globals
               # Stop the existing MQTT thread, if any, before starting a new one
               if sg.mqtt_thread and sg.mqtt_thread.is_alive():
                 sg.mqtt_job["stop"] = True
                 if sg.mqtt_client is not None:
                   sg.mqtt_client.disconnect()
   #              sg.mqtt_thread.join(timeout=2.0)

               sg.mqtt_job = {"stop": False, "job_id": job_id}
               sg.mqtt_thread = threading.Thread(
                  target=mq.mqttserverconnect_threaded,
                  args=(mqtt_cfg, job_id),
                  daemon=False  # non-daemon: the process stays alive until the subscriber thread exits
                )
               sg.mqtt_thread.start()

             return {
               "message": "MQTT subscription started",
               "job_id": job_id
             }

            except Exception as e:
               print("❌ MQTT SUBSCRIBE ERROR:", str(e))
               return {"error": f"MQTT subscribe failed: {str(e)}"}
   ####################################################################################################

           @app.post('/api/v1/jsondataline')
           def storejsondataline(jdata: dict):
   #          jdata = request.get_json()
             topic = jdata.get('sendtotopic','')
             jdata = json.dumps(jdata)
             readdata(jdata,VIPERTOKEN,VIPERHOST,VIPERPORT,topic)
             return "ok"

           @app.post('/api/v1/jsondataarray')
           def storejsondataarray(jdata: List[dict]):
   #          jdata = request.get_json()

             for item in jdata:
                topic = item.get('sendtotopic','')
                item = json.dumps(item)
                readdata(item,VIPERTOKEN,VIPERHOST,VIPERPORT,topic)
             return "ok"
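
   # Example (sketch): posting a batch to /api/v1/jsondataarray. Each item may carry its own
   # 'sendtotopic'; readdata() falls back to default_args['topics'] when it is empty or absent.
   # The address, topic, and field names below are hypothetical.
   #
   #   import requests
   #   requests.post("http://localhost:9001/api/v1/jsondataarray",
   #                 json=[{"sendtotopic": "iot-raw", "deviceid": "d1", "temperature": 21.4},
   #                       {"deviceid": "d2", "temperature": 19.8}],
   #                 timeout=30)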

   ####################################################################################################
           @app.post('/api/v1/health')
           def tmux_health_check_json() -> Dict[str, Any]:
               def run_tmux(cmd):
                   try:
                       result = subprocess.run(['tmux'] + cmd, capture_output=True, text=True, timeout=10)
                       return result.stdout.strip()
                   except Exception:  # e.g. tmux is not installed or the command timed out
                       return ""

               result = {
                   "timestamp": datetime.now().isoformat(),
                   "sessions": [],
                   "summary": {
                       "total_plugin_windows": 0,
                       "error_count": 0,
                       "healthy": True
                   }
               }

               # Get clean session list
               sessions_raw = run_tmux(['ls', '-F', '#{session_name}']) or run_tmux(['list-sessions', '-F', '#{session_name}'])
               sessions = [s.strip() for s in sessions_raw.split('\n') if s.strip()]

               crash_patterns = [r'panic[:\s]', r'fatal\s+error', r'segmentation.*fault',
                                r'SIGSEGV', r'runtime\s+error', r'goroutine\s+panic',
                                r'signal:.*killed', r'signal:.*abrt']

               for session_name in sessions:
                   # Plugin sessions are those whose name starts with plugin_
                   is_plugin_session = session_name.startswith('plugin_')
                   session_name_user ="n/a"
                   if is_plugin_session:
                     session_name_user=session_name.split("_")[1]

                   session_data = {
                       "name": session_name,
                       "user_session": session_name_user,
                       "is_plugin_session": is_plugin_session,
                       "plugin_windows": [],
                       "status": "healthy",
                       "plugin_window_count": 0
                   }

                   # Get windows for this session
                   windows_raw = run_tmux(['list-windows', '-t', session_name,
                                          '-F', '#{window_index}:#{window_name}'])
                   windows = [w for w in windows_raw.split('\n') if ':' in w]

                   # The plugin_ name filter below is disabled, so every window of every
                   # session is inspected and counted as a plugin window
                   plugin_windows = []
                   for win in windows:
                       win_index, win_name = win.split(':', 1)
                       #if win_name.startswith('plugin_') or is_plugin_session:
                       plugin_windows.append((win_index, win_name))

                   # Process plugin windows
                   for win_index, win_name in plugin_windows:
                       result["summary"]["total_plugin_windows"] += 1
                       session_data["plugin_window_count"] += 1

                       # -p prints the captured pane to stdout; without it tmux stores the text in a paste buffer
                       pane_content = run_tmux(['capture-pane', '-p', '-t', f'{session_name}:{win_index}.0',
                                              '-S', '-1000', '-e', '-q'])

                       crashes = [line.strip() for line in pane_content.split('\n')
                                 if any(re.search(p, line, re.IGNORECASE) for p in crash_patterns)]

                       window_data = {
                           "index": win_index,
                           "name": win_name,
                           "status": "healthy" if not crashes else "crashed",
                           "crash_lines": crashes[:5]
                       }

                       if crashes:
                           result["summary"]["error_count"] += 1
                           session_data["status"] = "unhealthy"
                           result["summary"]["healthy"] = False

                       session_data["plugin_windows"].append(window_data)

                   # Report any session that has at least one window or is a plugin session
                   if session_data["plugin_window_count"] > 0 or is_plugin_session:
                       result["sessions"].append(session_data)

               writeviperlogs("INFO",f"{result}",VIPERTOKEN,VIPERHOST,VIPERPORT)

               return result



   ####################################################################################################
           #app.run(port=default_args['rest_port']) # for dev
           if os.environ['TSS']=="0":
             try:
               #http_server = WSGIServer(('', int(default_args['rest_port'])), app)

               uvicorn.run(
                 app,  # the ASGI app object
                 host="0.0.0.0",
                 port=int(default_args['rest_port']),
                 log_level="info",
                 reload=False  # Disable reload in production
               )

             except Exception as e:
              tsslogging.locallogs("ERROR", "STEP 3: Cannot start uvicorn server in {} - {}".format(os.path.basename(__file__),e))

              tsslogging.tsslogit("ERROR: Cannot start uvicorn server in {}".format(os.path.basename(__file__)), "ERROR" )
    #          tsslogging.git_push("/{}".format(repo),"Entry from {} - {}".format(os.path.basename(__file__),e),"origin")
              print("ERROR: Cannot start uvicorn server")
              writeviperlogs("ERROR",f"Cannot start TML Plugin server: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
              return
           else:
             try:
               print("Listening")
               writeviperlogs("INFO","TML Plugin Server Started",VIPERTOKEN,VIPERHOST,VIPERPORT)
               #http_server = WSGIServer(('', int(default_args['tss_rest_port'])), app)

               uvicorn.run(
                  app,  # the ASGI app object
                  host="0.0.0.0",
                  port=int(default_args['tss_rest_port']),
                  log_level="info",
                  reload=False  # Disable reload in production
               )
             except Exception as e:
              tsslogging.locallogs("ERROR", "STEP 3: Cannot start uvicorn server in {} - {}".format(os.path.basename(__file__),e))
              tsslogging.tsslogit("ERROR: Cannot start uvicorn server in {}".format(os.path.basename(__file__)), "ERROR" )
   #           tsslogging.git_push("/{}".format(repo),"Entry from {} - {}".format(os.path.basename(__file__),e),"origin")
              print("ERROR: Cannot start uvicorn server")
              writeviperlogs("ERROR",f"Cannot start plugin server: {e}",VIPERTOKEN,VIPERHOST,VIPERPORT)
              return

           # uvicorn.run() blocks until the server shuts down, so this line runs only after the server exits
           tsslogging.locallogs("INFO", "STEP 3: RESTAPI HTTP Server started ... successfully")
   #        http_server.serve_forever()

        #return [VIPERTOKEN,VIPERHOST,VIPERPORT]

   def readdata(valuedata,VIPERTOKEN, VIPERHOST, VIPERPORT,topic=''):
         args = default_args

         # MAin Kafka topic to store the real-time data
         if topic=='':
           maintopic = args['topics']
         else:
           maintopic = topic

         producerid = args['producerid']
         try:
             producetokafka(valuedata, "", "",producerid,maintopic,"",args,VIPERTOKEN, VIPERHOST, VIPERPORT)
             # change time to speed up or slow down data
             #time.sleep(0.15)
         except Exception as e:
             print(e)
             pass

   def windowname(wtype,sname,dagname):
       randomNumber = random.randrange(10, 9999)
       wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
       with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
         file.writelines("{}\n".format(wn))

       return wn

   def startproducing(**context):
          global VIPERTOKEN, VIPERHOST, VIPERPORT, HTTPADDR
          sd = context['dag'].dag_id
          sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
          pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

          VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
          VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
          VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
          HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

          tsslogging.locallogs("INFO", "STEP 3: producing data started")

          chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

          repo=tsslogging.getrepo()
          if sname != '_mysolution_':
           fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
          else:
            fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

          hs,VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
          ti = context['task_instance']
          ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='REST')
          ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])
          if os.environ['TSS']=="0":
            ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['rest_port']))
          else:
            ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['tss_rest_port']))

          ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="_{}".format(default_args['tss_rest_port']))
          ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="_{}".format(default_args['rest_port']))

          ti.xcom_push(key="{}_IDENTIFIER".format(sname),value=default_args['identifier'])
          ti.xcom_push(key="{}_FROMHOST".format(sname),value="{},{}".format(hs,VIPERHOSTFROM))
          ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

          ti.xcom_push(key="{}_PORT".format(sname),value="_{}".format(VIPERPORT))
          ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)

          wn = windowname('produce',sname,sd)
          subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOSTFROM,VIPERPORT[1:]), "ENTER"])

   if __name__ == '__main__':

       if len(sys.argv) > 1:
          if sys.argv[1] == "1":
            VIPERTOKEN = sys.argv[2]
            VIPERHOST = sys.argv[3]
            VIPERPORT = sys.argv[4]
            os.environ['VIPERTOKEN']=VIPERTOKEN
            os.environ['VIPERHOST']=VIPERHOST
            os.environ['VIPERPORT']=VIPERPORT

            gettmlsystemsparams()

7.5.5.7. STEP 3b: Parameter Explanation

Parameter

Explanation

owner

Specifies the owner of the DAG.

enabletls

Set to 1 for encryption, or 0 for no encryption.

microserviceid

If you are using a load balancer, set this to the microservice ID; otherwise leave it blank.

producerid

Specifies an identifier name for the producer, e.g. iotsolution.

topics

Specifies the name of the topic to store data into. Note: This is the raw_data_topic in the STEP 2 DAG.

identifier

Specifies an identifying name for the solution, e.g. TML solution.

tss_rest_port

This is the port for TSS dev testing. Point your REST API client (rest_port) to this port.

rest_port

This is the TML solution port. Point your client rest_port here when running the TML solution in its own container. The tss_rest_port and rest_port are different numbers but serve the same purpose: tss_rest_port is for DEV, rest_port is for the container.

delay

System delay parameter, in milliseconds, used when VIPER streams to Kafka.

topicid

Monitors all device entities. Leave at -999.
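The choice between tss_rest_port and rest_port can be sketched as follows. This is a minimal illustration, assuming (as in the STEP 3b DAG code above) that the TSS environment variable equals "0" when the solution runs in its own container. The helper name select_rest_port and the port values are hypothetical.

```python
import os

# Hypothetical subset of default_args; the port values are placeholders.
default_args = {
    "tss_rest_port": "9001",  # port used for TSS dev testing
    "rest_port": "9002",      # port used when the solution runs in its own container
}

def select_rest_port(args, env=None):
    """Return the port the REST server should listen on.

    Assumes TSS == "0" means the solution runs in its own container,
    and any other value means it runs inside TSS (DEV).
    """
    env = os.environ if env is None else env
    if env.get("TSS") == "0":
        return int(args["rest_port"])
    return int(args["tss_rest_port"])

print(select_rest_port(default_args, {"TSS": "0"}))  # container port
print(select_rest_port(default_args, {"TSS": "1"}))  # TSS dev port
```

Unlike the DAG code, this sketch falls back to tss_rest_port when TSS is not set instead of raising a KeyError; that fallback is a design choice of the example, not documented TML behaviour.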

7.5.5.8. STEP 3b.i: REST API CLIENT

tml-client-RESTAPI-step-3-kafka-producetotopic.py

import requests
import sys
from datetime import datetime
import time
import json

sys.dont_write_bytecode = True

# defining the api-endpoint
rest_port = "9002"  # <<< ***** Change Port to match the Server Rest_PORT
httpaddr = "http:" # << Change to https or http

# Modify the apiroute: jsondataline, or jsondataarray
# 1. jsondataline: You can send one JSON message at a time
# 2. jsondataarray: You can send a JSON array

apiroute = "jsondataline"

# USE THIS ENDPOINT IF TML RUNNING IN DOCKER CONTAINER
# DOCKER CONTAINER ENDPOINT
#API_ENDPOINT = "{}//localhost:{}/{}".format(httpaddr,rest_port,apiroute)

# USE THIS ENDPOINT IF TML RUNNING IN KUBERNETES
# KUBERNETES ENDPOINT
API_ENDPOINT = "{}//tml.tss/ext/{}".format(httpaddr,apiroute)

def send_tml_data(data):
  # data to be sent to api
  headers = {'Content-type': 'application/json'}
  print(API_ENDPOINT)
  r = requests.post(url=API_ENDPOINT, data=json.dumps(data), headers=headers)

  # extracting response text
  return r.text


def readdatafile(inputfile):

  ##############################################################
  # NOTE: You can send any "EXTERNAL" data through this API
  # It is reading a localfile as an example
  ############################################################

  try:
    file1 = open(inputfile, 'r')
    print("Data Producing to Kafka Started:",datetime.now())
  except Exception as e:
    print("ERROR: Something went wrong ",e)
    return
  k = 0
  while True:
    line = file1.readline()
    line = line.replace(";", " ")
    print("line=",line)
    # add lat/long/identifier
    k = k + 1
    try:
      if line == "":
        #break
        file1.seek(0)
        k=0
        print("Reached End of File - Restarting")
        print("Read End:",datetime.now())
        continue
      ret = send_tml_data(line)
      print(ret)
      # change time to speed up or slow down data
      time.sleep(.1)
    except Exception as e:
      print(e)
      time.sleep(0.1)
      pass

def start():
      inputfile = "IoTData.txt"
      readdatafile(inputfile)

if __name__ == '__main__':
    start()

7.5.5.9. STEP 3b.i: REST API CLIENT: Explanation

The REST API client runs outside the TML solution container. The client API lets you connect to your internal systems or devices and stream their data directly to the TML server producer, which receives the data from the REST API client and produces it to Kafka.

Important

The REST API client runs outside the TML solution container. This is a simple and convenient way to stream any type of JSON data from any device in your environment.

Client Core Variables

Explanation

rest_port

This is the same rest_port JSON field as in STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag

apiroute

This indicates how you are sending your JSON message. You have two options:

  1. jsondataline: You can send one JSON message at a time in each API call

  2. jsondataarray: You can send a JSON array in each API call

Note: Your JSON must be valid JSON. Just store your JSON in datajson

API_ENDPOINT

API_ENDPOINT = "http://localhost:{}/{}".format(rest_port,apiroute)

This connects to the endpoint defined in STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag

httpaddr

This sets the URL scheme prefix: http: or https:.

readdatafile(inputfile)

This function is only for demo purposes. You can send any data you want using this API.

start()

This function starts the process. Note: You can modify this function as you wish to stream your data repeatedly.

send_tml_data(data)

This is the main function that streams your data to STEP 3b: Produce Data Using RESTAPI: tml-read-RESTAPI-step-3-kafka-producetotopic-dag
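To make the two apiroute options concrete, the sketch below assembles the URL and JSON body for each route without sending anything. The helper build_request and the sample readings are hypothetical, and the assumption that jsondataarray accepts a plain JSON list of messages should be checked against your server.

```python
import json

def build_request(httpaddr, host, rest_port, apiroute, payload):
    """Return (url, body, headers) for a POST to the TML REST producer.

    Hypothetical helper: it only assembles the request, mirroring how
    API_ENDPOINT is formatted in the client above.
    """
    if apiroute not in ("jsondataline", "jsondataarray"):
        raise ValueError("apiroute must be jsondataline or jsondataarray")
    url = "{}//{}:{}/{}".format(httpaddr, host, rest_port, apiroute)
    body = json.dumps(payload)
    headers = {"Content-type": "application/json"}
    return url, body, headers

# One JSON message per call (the sample fields are made up).
url, body, headers = build_request(
    "http:", "localhost", "9002", "jsondataline",
    {"deviceid": "d1", "temp": 21.5})
print(url)

# Several messages in one call; assumes the route accepts a JSON list.
url_arr, body_arr, _ = build_request(
    "http:", "localhost", "9002", "jsondataarray",
    [{"deviceid": "d1", "temp": 21.5}, {"deviceid": "d2", "temp": 19.0}])
print(url_arr)
```

The returned values can be passed to requests.post(url=url, data=body, headers=headers), in the same way send_tml_data does.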

7.5.5.10. REST API Reference Architecture

_images/restimg.png

7.5.5.11. STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag

 import asyncio
 import signal
 from google.protobuf.json_format import MessageToJson
 from grpc_reflection.v1alpha import reflection
 import maadstml
 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import grpc
 from concurrent import futures
 import time
 import tml_grpc_pb2_grpc as pb2_grpc
 import tml_grpc_pb2 as pb2

 import tsslogging
 import sys
 import os
 import subprocess
 import random
 import json
 import nest_asyncio
 nest_asyncio.apply()
 #from grpc.experimental import aio
 sys.dont_write_bytecode = True
 ##################################################  gRPC SERVER ###############################################
 # This is a gRPCserver that will handle connections from a client
 # There are two endpoints you can use to stream data to this server:
 # 1. jsondataline -  You can POST a single JSON message from your client app. Your JSON will be streamed to the Kafka topic.
 # 2. jsondataarray -  You can POST JSON arrays from your client app. Your JSON will be streamed to the Kafka topic.

 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice', # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< ***** leave blank
   'producerid' : 'iotsolution',  # <<< *** Change as needed
   'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
   'identifier' : 'TML solution',  # <<< *** Change as needed
   'tss_gRPC_Port' : '9001',  # <<< ***** replace with gRPC port i.e. this gRPC server listening on port 9001
   'gRPC_Port' : '9002',  # <<< ***** replace with gRPC port i.e. this gRPC server listening on port 9002
   'delay' : '7000', # << ******* 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'topicid' : '-999', # <<< ********* do not modify
 }

 ######################################## DO NOT MODIFY BELOW #############################################


 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""
 VIPERHOSTFROM=""


 class TmlprotoService(pb2_grpc.TmlprotoServicer):

   def __init__(self, *args, **kwargs):
     pass

   async def GetServerResponse(self, request, context):

     maintopic = default_args['topics']
     producerid = default_args['producerid']


     if request != None:
      try:
       message = json.dumps(json.loads(request.message))
       inputbuf=f"{message}"
       print("inputbuf=",inputbuf)

       topicid=default_args['topicid']

       # Add a 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
       enabletls = int(default_args['enabletls'])
       identifier = default_args['identifier']
       delay = int(default_args['delay'])
       try:
         result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,'',
                                             topicid,identifier)
         return pb2.MessageResponse(message="Success producing message",received=True)
       except Exception as e:
         return pb2.MessageResponse(message="Failed to produce message, err={} message={}".format(e,inputbuf),received=False)
      except Exception as e:
       # inputbuf may be unassigned if JSON parsing failed, so report the raw request message
       return pb2.MessageResponse(message="Failed to produce message, err={} message={}".format(e,request.message),received=False)


     return pb2.MessageResponse(message="Failed to produce message",received=False)

 async def serve():


     tsslogging.locallogs("INFO", "STEP 3: producing data started")
     repo=tsslogging.getrepo()
     tsslogging.tsslogit("gRPC producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
     mainport=0
     server_options = [
         ("grpc.keepalive_time_ms", 20000),
         ("grpc.keepalive_timeout_ms", 10000),
         ("grpc.http2.min_ping_interval_without_data_ms", 5000),
         ("grpc.max_connection_idle_ms", 10000),
         ("grpc.max_connection_age_ms", 30000),
         ("grpc.max_connection_age_grace_ms", 5000),
         ("grpc.http2.max_pings_without_data", 5),
         ("grpc.keepalive_permit_without_calls", 1),
     ]

     try:
         server = grpc.aio.server(futures.ThreadPoolExecutor(),options=server_options)
 #        server = grpc.server(futures.ThreadPoolExecutor(max_workers=100))
         SERVICE_NAMES = (
           pb2.DESCRIPTOR.services_by_name["Tmlproto"].full_name,
           reflection.SERVICE_NAME,
         )
         reflection.enable_server_reflection(SERVICE_NAMES, server)

         pb2_grpc.add_TmlprotoServicer_to_server(TmlprotoService(), server)
         if os.environ['TSS']=="0":
 #          server_creds = grpc.alts_server_credentials()
           with open('/{}/tml-airflow/certs/server.key'.format(repo), 'rb') as f:
             server_key = f.read()
           with open('/{}/tml-airflow/certs/server.crt'.format(repo), 'rb') as f:
            server_cert = f.read()
           server_creds = grpc.ssl_server_credentials( [(server_key, server_cert)] )
           mainport=int(default_args['gRPC_Port'])
           server.add_secure_port("[::]:{}".format(int(default_args['gRPC_Port'])), server_creds)

         else:
           server.add_insecure_port("[::]:{}".format(int(default_args['tss_gRPC_Port'])))
           mainport=int(default_args['tss_gRPC_Port'])
     except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 3: Cannot connect to gRPC server in {} - {}".format(os.path.basename(__file__),e))

            tsslogging.tsslogit("ERROR: Cannot connect to gRPC server in {} - {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            print("ERROR: Cannot connect to gRPC server in:",e)
            return

     tsslogging.locallogs("INFO", "STEP 3: gRPC server started .. waiting for connections")
     await server.start()
     print("gRPC server started - listening on port ",mainport)
     await server.wait_for_termination()

 async def shutdown_server(server) -> None:
     #logging.info ("Shutting down server...")
     await server.stop(None)

 def handle_sigterm(sig, frame) -> None:
     asyncio.create_task(shutdown_server(server))

 async def handle_sigint() -> None:
     loop = asyncio.get_running_loop()
     for sig in (signal.SIGINT, signal.SIGTERM):
         loop.add_signal_handler(sig, loop.stop)

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startproducing(**context):
        global VIPERTOKEN
        global VIPERHOST
        global VIPERPORT
        global HTTPADDR
        global VIPERHOSTFROM

        tsslogging.locallogs("INFO", "STEP 3: producing data started")

        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
        repo=tsslogging.getrepo()

        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        hs,VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
        ti = context['task_instance']
        ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='gRPC')
        ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])

        if os.environ['TSS']=="0":
         ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['gRPC_Port']))
        else:
         ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="_{}".format(default_args['tss_gRPC_Port']))

        ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="_{}".format(default_args['tss_gRPC_Port']))
        ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="_{}".format(default_args['gRPC_Port']))

        ti.xcom_push(key="{}_IDENTIFIER".format(sname),value=default_args['identifier'])

        ti.xcom_push(key="{}_FROMHOST".format(sname),value="{},{}".format(hs,VIPERHOSTFROM))
        ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

        ti.xcom_push(key="{}_PORT".format(sname),value=VIPERPORT)
        ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)

        wn = windowname('produce',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOSTFROM,VIPERPORT[1:]), "ENTER"])

        tsslogging.locallogs("INFO", "STEP 3: producing data completed")

 if __name__ == '__main__':

     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          VIPERTOKEN = sys.argv[2]
          VIPERHOST = sys.argv[3]
          VIPERPORT = sys.argv[4]
 #         serve()

          server = None
          signal.signal(signal.SIGTERM, handle_sigterm)
          try:
             print("Starting asyncio event loop")
             asyncio.get_event_loop().run_until_complete(serve())
          except KeyboardInterrupt:
            pass

7.5.5.12. STEP 3c: Parameter Explanation

Parameter

Explanation

owner

Specifies the owner of the DAG.

enabletls

Set to 1 for encryption, or 0 for no encryption.

microserviceid

If you are using a load balancer, set this to the microservice ID; otherwise leave it blank.

producerid

Specifies an identifier name for the producer, e.g. iotsolution.

topics

Specifies the name of the topic to store data into. Note: This is the raw_data_topic in the STEP 2 DAG.

identifier

Specifies an identifying name for the solution, e.g. TML solution.

tss_gRPC_Port

This is the port for TSS dev testing. Point your gRPC API client (self.server_port) to this port.

gRPC_Port

This is the TML solution port. Point your gRPC client port here when running the TML solution in its own container. The tss_gRPC_Port and gRPC_Port are different numbers but serve the same purpose: tss_gRPC_Port is for DEV, gRPC_Port is for the container.

delay

Maximum time, in milliseconds, that VIPER waits for Kafka to confirm that a message was received and written to the topic.

topicid

Monitors all device entities. Leave at -999.
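Before starting the gRPC DAG it can help to sanity-check the parameters described above. The sketch below is a hypothetical pre-flight check: the function name check_grpc_params and its rules are read off the parameter descriptions and are not part of the TML code base.

```python
def check_grpc_params(args):
    """Return a list of problems found in a default_args-style dict.

    Hypothetical pre-flight check; an empty list means no problem was found.
    """
    problems = []
    # enabletls is documented as 1 (encrypted) or 0 (not encrypted).
    if args.get("enabletls") not in ("0", "1"):
        problems.append("enabletls must be '0' or '1'")
    # Ports and delay are stored as strings and later converted with int().
    for key in ("tss_gRPC_Port", "gRPC_Port", "delay"):
        if not str(args.get(key, "")).isdigit():
            problems.append("{} must be a whole number".format(key))
    # The two ports are described as different numbers.
    if args.get("tss_gRPC_Port") == args.get("gRPC_Port"):
        problems.append("tss_gRPC_Port and gRPC_Port should differ")
    if not args.get("topics"):
        problems.append("topics must name a Kafka topic")
    return problems

# Values copied from the default_args shown in the STEP 3c DAG.
sample_args = {
    "enabletls": "1",
    "topics": "iot-raw-data",
    "tss_gRPC_Port": "9001",
    "gRPC_Port": "9002",
    "delay": "7000",
    "topicid": "-999",
}
print(check_grpc_params(sample_args))
```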

7.5.5.13. STEP 3c.i: gRPC API CLIENT

tml_client_gRPC_step_3_kafka_producetotopic.py

 import grpc
 import tml_grpc_pb2_grpc as pb2_grpc
 import tml_grpc_pb2 as pb2
 import sys
 from datetime import datetime
 import time
 import os
 import subprocess
 import base64
 import json
 # Set kubernetes = 1 if TML solution running in kubernetes
 # Set kubernetes = 0 if TML solution running in docker
 import warnings
 #warnings.filterwarnings("error")
 host='tml.tss:443'

 sys.dont_write_bytecode = True

 # NOTE YOU WILL NEED TO INSTALL grpcurl in Linux

 def sendgrpcurl(mjson):
     #first encode the json
     mainjson = '{"message":' + json.dumps(mjson) + '}'

    # mainjson=pb2.Message(message=mjson)
     sent=0
     while sent==0:
             cmd="grpcurl -insecure -keepalive-time 10 -import-path . -proto tml_grpc.proto -d '{}' {} tmlproto.Tmlproto/GetServerResponse 2>/dev/null".format(mainjson,host)
            # print("CMD=",cmd.replace("\n",""))
             cmd=cmd.replace("\n","")
             print(cmd)
             proc = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE)
             out, err = proc.communicate()
             proc.terminate()
             proc.wait()

             if out.decode('utf-8')=="":
                sent=0
             else:
                print(out.decode('utf-8'))
                sent=1
                break


 def readdata(inputfile):

       ##############################################################
       # NOTE: You can send any "EXTERNAL" data through this API
       # It is reading a localfile as an example
       ############################################################

       try:
         file1 = open(inputfile, 'r')
         print("Data Producing to Kafka Started:",datetime.now())
       except Exception as e:
         print("ERROR: Something went wrong ",e)
         return
       k = 0
       while True:
         line = file1.readline()
         line = line.replace(";", " ")
     #    print("line2=",line)
         # add lat/long/identifier
         k = k + 1
         try:
           if line == "":
             #break
             file1.seek(0)
             k=0
             print("Reached End of File - Restarting")
             print("Read End:",datetime.now())
             continue
           sendgrpcurl(line.rstrip())
           time.sleep(.0)
         except Exception as e:
           print("Main loop error=",e)
           time.sleep(.5)
           pass

 if __name__ == '__main__':
     try:

       inputfile = "IoTData.txt"
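       result = None  # default so the print below does not raise NameError while the next line is commented out
       #result = readdata(inputfile) ##### UNCOMMENT TO READ FILE
       print(f'{result}')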
     except Exception as e:
       print("ERROR: ",e)

7.5.5.14. STEP 3c.i: gRPC API CLIENT: Explanation

The gRPC API client runs outside the TML solution container. The client API lets you connect to your internal systems or devices and stream their data directly to the TML server producer, which receives the data from the gRPC API client and produces it to Kafka.

Important

The gRPC API client runs outside the TML solution container. This is a simple and convenient way to stream any type of JSON data from any device in your environment.

Client Core Variables

Explanation

gRPC imports

You will need the gRPC imports used by the client above (tml_grpc_pb2_grpc and tml_grpc_pb2). Simply download and place these files in the same folder as your gRPC client.

grpcurl

The client makes grpcurl calls to the TML server through the NGINX secure proxy on port 443. You must have the grpcurl tool installed: see Using gRPcurl to Write Data to the TML gRPC Server

connection parameters

You need to set:

  1. self.host = 'tml.tss'

  2. self.server_port = 443

In the client shown above, both values are combined in host = 'tml.tss:443'. This is the gRPC_Port in STEP 3c: Produce Data Using gRPC: tml-read-gRPC-step-3-kafka-producetotopic-dag

sendgrpcurl

Pass your JSON message to this function (the line variable in the client above). You can send any JSON message using this gRPC client to the gRPC TML server.
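The grpcurl call made by sendgrpcurl can also be expressed as an argument list, which avoids shell quoting problems when a message contains quotes. The sketch below is an illustration only: build_grpcurl_args is a hypothetical helper, the flags are copied from the command string in the client above, and the sample message is made up.

```python
import json
import shlex

def build_grpcurl_args(message, host="tml.tss:443", proto="tml_grpc.proto"):
    """Return the grpcurl command as an argument list.

    Hypothetical helper: the flags mirror the command string used by
    sendgrpcurl above. The message is wrapped as {"message": "<text>"}.
    """
    payload = json.dumps({"message": message})
    return [
        "grpcurl", "-insecure",
        "-keepalive-time", "10",
        "-import-path", ".",
        "-proto", proto,
        "-d", payload,
        host,
        "tmlproto.Tmlproto/GetServerResponse",
    ]

args = build_grpcurl_args('{"deviceid": "d1", "temp": 21.5}')
print(shlex.join(args))  # shell-safe rendering, for logging only
```

With grpcurl installed, the list can be run with subprocess.run(args, capture_output=True, text=True), without shell=True.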

7.5.5.15. gRPC Reference Architecture

_images/grpcimg.png

7.5.5.16. STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import json
 import time
 import random
 import threading
 from contextlib import contextmanager
 from contextlib import ExitStack
 import re

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice', # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< *** leave blank
   'producerid' : 'iotsolution',   # <<< *** Change as needed
   'topics' : 'iot-raw-data', # *************** This is one of the topic you created in SYSTEM STEP 2
   'identifier' : 'TML solution',   # <<< *** Change as needed
   'inputfile' : '',#'/rawdatademo/cisco_network_data.txt',  # <<< ***** replace ?  to input file name to read. NOTE this data file should be JSON messages per line and stored in the HOST folder mapped to /rawdata folder
   'delay' : '7000', # << ******* 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'topicid' : '-999', # <<< ********* do not modify
   'sleep' : 0.15, # << Control how fast data streams - if 0 - the data will stream as fast as possible - BUT this may cause connection reset by peer
   'docfolder' : 'mylogs,mylogs2', # You can read TEXT files or any file in these folders that are inside the volume mapped to /rawdata
   'doctopic' : 'rtms-stream-mylogs,rtms-stream-mylogs2',  # This is the topic that will contain the docfolder file data
   'chunks' :3000, # if 0 the files in docfolder are read line by line, otherwise they are read by chunks i.e. 512
   'docingestinterval' : 0, # specify the frequency in seconds to read files in docfolder - if 0 the files are read ONCE
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 # VIPER connection parameters (empty by default)
 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""

 def read_in_chunks(file_object, chunk_size=1024):
     """Lazy function (generator) to read a file piece by piece.
     Default chunk size: 1k."""
     while True:
         try:
           if chunk_size != 0:
             data = file_object.read(chunk_size).decode('utf-8')
             if len(data)>0 and data[-1] != ' ':
                  ct=0
                  for c in reversed(data):
                    if c == ' ':
                         break
                    ct = ct +1
                  if ct < len(data):
                    file_object.seek(file_object.tell()-ct)
                    data = data[:len(data)-ct]
           else:
             data = file_object.readline().decode('utf-8')
           data=data.replace('"','').replace("'","").replace("\\n"," ").replace('\n'," ").replace("\\r"," ").replace('\r'," ").replace(';'," ").replace('&'," ").strip()
           if not data:
                break
           yield data
         except Exception as e:
            break

 def readallfiles(fd,tr,cs=1024):
   args=default_args
   producerid='userfilestream'
   print("fd=",fd.name)
   for piece in read_in_chunks(fd,cs):
         piece=re.sub(' +', ' ', piece)
         pj='{"RTMSMessage":"' + piece + '"}'

         producetokafka(pj, "", "",producerid,tr,"",args)
   return []

 def ingestfiles():
     args = default_args
     buf = default_args['docfolder']
     chunks = int(default_args['chunks'])
     maintopic = default_args['doctopic']
     producerid='userfilestream'
     interval=int(default_args['docingestinterval'])

     #gather files in the folders
     dirbuf = buf.split(",")
     # check if user wants to split folders to separate topics
     maintopicbuf = maintopic.split(",")
     if len(maintopicbuf) > 1:
       if len(dirbuf) != len(maintopicbuf):
         tsslogging.locallogs("ERROR", "STEP 3: Produce LOCALFILE in {}: you specified multiple doctopics, so their number must match the number of docfolder entries".format(os.path.basename(__file__)))
         return
     elif len(maintopicbuf) == 1 and len(dirbuf) > 1:
        for i in range(len(dirbuf)-1):
          maintopicbuf.append(maintopic)
     else:
        return

     while True:
        for dr,tr in zip(dirbuf,maintopicbuf):
          filenames = []
          if os.path.isdir("/rawdata/{}".format(dr)):
            a = [os.path.join("/rawdata/{}".format(dr), f) for f in os.listdir("/rawdata/{}".format(dr)) if
            os.path.isfile(os.path.join("/rawdata/{}".format(dr), f))]
            filenames.extend(a)
            print("filename=",filenames)
            if len(filenames) > 0:
              with ExitStack() as stack:
                files = [stack.enter_context(open(i, "rb")) for i in filenames]
                contents = [readallfiles(file,tr,chunks) for file in files]
        if interval==0:
          break
        else:
         time.sleep(interval)

 def startdirread():
   # Return if any required key is missing; the lookups below would otherwise raise KeyError
   if any(k not in default_args for k in ('docfolder','doctopic','chunks','docingestinterval')):
      return

   if default_args['docfolder'] != '' and default_args['doctopic'] != '':
     print("INFO startdirread")
     try:
       t = threading.Thread(name='child procs', target=ingestfiles)
       t.start()
     except Exception as e:
       print(e)

 def producetokafka(value, tmlid, identifier,producerid,maintopic,substream,args):
  inputbuf=value
  topicid=int(args['topicid'])

  # Add a 7000 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
  delay = int(args['delay'])
  enabletls = int(args['enabletls'])
  identifier = args['identifier']

  try:
     result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,substream,
                                         topicid,identifier)
 #    print("result=",result)
  except Exception as e:
     print("ERROR:",e)

 def readdata():

   repo = tsslogging.getrepo()
   tsslogging.tsslogit("Localfile producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
   tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

   args = default_args
   inputfile=args['inputfile']

   # MAin Kafka topic to store the real-time data
   maintopic = args['topics']
   producerid = args['producerid']

   startdirread()

   if maintopic=='' or inputfile=='':
      return
   k=0
   try:
     file1 = open(inputfile, 'r')
     print("Data Producing to Kafka Started:",datetime.now())
   except Exception as e:
     tsslogging.locallogs("ERROR", "Localfile producing DAG in {} - {}".format(os.path.basename(__file__),e))

     tsslogging.tsslogit("Localfile producing DAG in {}".format(os.path.basename(__file__)), "INFO" )
     tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
     return

   tsslogging.locallogs("INFO", "STEP 3: reading local file..successfully")

   while True:
     line = file1.readline()
     line = line.replace(";", " ")
     print("line=",line)
     # add lat/long/identifier
     k = k + 1
     try:
       if line == "":
         #break
         file1.seek(0)
         k=0
         print("Reached End of File - Restarting")
         print("Read End:",datetime.now())
         continue
       producetokafka(line.strip(), "", "",producerid,maintopic,"",args)
       # change time to speed up or slow down data
       time.sleep(args['sleep'])
     except Exception as e:
       print(e)
       pass

   file1.close()

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startproducing(**context):

   tsslogging.locallogs("INFO", "STEP 3: producing data started")

   sd = context['dag'].dag_id

   sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
   pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))
   VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
   VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
   VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
   HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

   VIPERHOSTFROM=tsslogging.getip(VIPERHOST)
   ti = context['task_instance']
   ti.xcom_push(key="{}_PRODUCETYPE".format(sname),value='LOCALFILE')
   ti.xcom_push(key="{}_TOPIC".format(sname),value=default_args['topics'])
   ti.xcom_push(key="{}_CLIENTPORT".format(sname),value="")
   ti.xcom_push(key="{}_IDENTIFIER".format(sname),value="{},{}".format(default_args['identifier'],default_args['inputfile']))

   ti.xcom_push(key="{}_FROMHOST".format(sname),value=VIPERHOSTFROM)
   ti.xcom_push(key="{}_TOHOST".format(sname),value=VIPERHOST)

   ti.xcom_push(key="{}_TSSCLIENTPORT".format(sname),value="")
   ti.xcom_push(key="{}_TMLCLIENTPORT".format(sname),value="")

   ti.xcom_push(key="{}_PORT".format(sname),value="_{}".format(VIPERPORT))
   ti.xcom_push(key="{}_HTTPADDR".format(sname),value=HTTPADDR)

   # Apply the environment override first so the value passed to the subprocess matches the value pushed to XCom
   if 'step3localfileinputfile' in os.environ:
        default_args['inputfile']=os.environ['step3localfileinputfile']
   inputfile=default_args['inputfile']
   ti.xcom_push(key="{}_inputfile".format(sname),value=inputfile)

   docfolder=''
   if 'docfolder' in default_args and 'doctopic' in default_args:
     docfolder=default_args['docfolder']
     ti.xcom_push(key="{}_docfolder".format(sname),value=default_args['docfolder'])
     ti.xcom_push(key="{}_doctopic".format(sname),value=default_args['doctopic'])
     ti.xcom_push(key="{}_chunks".format(sname),value="_{}".format(default_args['chunks']))
     ti.xcom_push(key="{}_docingestinterval".format(sname),value="_{}".format(default_args['docingestinterval']))
   else:
     ti.xcom_push(key="{}_docfolder".format(sname),value='')
     ti.xcom_push(key="{}_doctopic".format(sname),value='')
     ti.xcom_push(key="{}_chunks".format(sname),value='')
     ti.xcom_push(key="{}_docingestinterval".format(sname),value='')

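   if 'step3localfiledocfolder' in os.environ:
        default_args['docfolder']=os.environ['step3localfiledocfolder']
        # Keep the value passed to the subprocess in sync with the override
        docfolder=default_args['docfolder']
        ti.xcom_push(key="{}_docfolder".format(sname),value=default_args['docfolder'])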

   chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

   repo=tsslogging.getrepo()

   if sname != '_mysolution_':
      fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
   else:
      fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

   wn = windowname('produce',sname,sd)
   subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
   subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-produce", "ENTER"])
   subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],inputfile,docfolder), "ENTER"])

 if __name__ == '__main__':

     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          VIPERTOKEN = sys.argv[2]
          VIPERHOST = sys.argv[3]
          VIPERPORT = sys.argv[4]
          inputfile = sys.argv[5]
          default_args['inputfile']=inputfile
          docfolder = sys.argv[6]
          default_args['docfolder']=docfolder
          readdata()

7.5.5.17. Core Parameter Explanation

Note

The parameters docfolder and doctopic are needed for https://tml.readthedocs.io/en/latest/tmlbuilds.html#step-4c-preprocesing-3-data-tml-system-step-4c-kafka-preprocess-dag. For details on correlating past information in real time using sliding time windows, refer to: How TML Maintains Past Memory of Events Using Sliding Time Windows

Parameter

Explanation

inputfile

This is the container path to your local file, for example /rawdata/<your filename>. When you start TSS you must map a volume to the /rawdata folder so that TSS can read your local file. This is explained below in the section: Producing Data Using a Local File

docfolder

Specify the folder name(s) you want TML to read. For example, if docfolder=mylogs, TML assumes the container path /rawdata/mylogs, which is mapped to your local machine. All text in this folder will be read.

doctopic

This is the Kafka topic that will contain the data from the files in docfolder. NOTE: You can specify different folder names to go to different topics. For example, if doctopic=topic1,topic2 and docfolder=folder1,folder2, TML will stream files in folder1 -> topic1 and files in folder2 -> topic2. This is convenient if you have lots of logs and want to analyse them separately.

chunks

This specifies how to read the files: line by line or in chunks. If chunks=0, the files are read and streamed to Kafka line by line; if chunks=512, the files are read and streamed to Kafka in chunks of 512.

docingestinterval

This specifies how frequently the files in docfolder are re-read. If docingestinterval=0, they are read ONCE; if non-zero, e.g. docingestinterval=120, they are read every 120 seconds.
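
As a rough illustration of the positional pairing described for docfolder and doctopic, the sketch below maps each folder to its topic and resolves the container path under /rawdata. The helper name pair_folders_to_topics is hypothetical and not part of TML, and the requirement that both lists have the same length is an assumption; TML's own parsing may differ.

```python
import posixpath

def pair_folders_to_topics(docfolder, doctopic, root="/rawdata"):
    """Hypothetical helper: pair comma-separated folders with topics by position."""
    folders = [f.strip() for f in docfolder.split(",") if f.strip()]
    topics = [t.strip() for t in doctopic.split(",") if t.strip()]
    # Assumption: every folder needs exactly one matching topic.
    if len(folders) != len(topics):
        raise ValueError("docfolder and doctopic must list the same number of entries")
    return {posixpath.join(root, f): t for f, t in zip(folders, topics)}

mapping = pair_folders_to_topics("folder1,folder2", "topic1,topic2")
for path, topic in mapping.items():
    print(path, "->", topic)
```

With docfolder=folder1,folder2 and doctopic=topic1,topic2, the sketch pairs /rawdata/folder1 with topic1 and /rawdata/folder2 with topic2.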

7.5.5.18. Producing Data Using a Local File

Important

If you are producing data by reading from a local file, you must map a volume on your host system to the /rawdata folder in the container when you run the TSS Docker Run command, then change inputfile to /rawdata/<your filename>. To do this, add -v <path to a local folder on your machine>:/rawdata to the docker run command:

  1. -v /your_localmachine/foldername:/rawdata:z

For example, your TSS Docker Run command should look similar to this (replace /your_localmachine/foldername with the actual folder path):

docker run -d --net="host" \
--env CHIP="AMD64" \
--env MAINHOST=127.0.0.1 \
--env TSS=1 \
--env SOLUTIONNAME=TSS \
--env AIRFLOWPORT=9000 \
--env VIPERVIZPORT=9005 \
--env EXTERNALPORT=-1 \
-v /var/run/docker.sock:/var/run/docker.sock:z \
-v /<your local dagsbackup folder>:/dagslocalbackup:z \
-v /your_localmachine/foldername:/rawdata:z \
--env READTHEDOCS='<Token>' \
--env GITREPOURL='<your git hub repo>' \
--env GITUSERNAME='<your github username>' \
--env GITPASSWORD='<Personal Access Token>' \
--env DOCKERUSERNAME='<your docker hub account>' \
--env DOCKERPASSWORD='<password>' \
--env MQTTUSERNAME='<enter MQTT username>' \
--env MQTTPASSWORD='<enter MQTT password>' \
--env KAFKACLOUDUSERNAME='' \
--env KAFKACLOUDPASSWORD='<Enter your API secret>' \
--env UPDATE=1 \
maadsdocker/tml-solution-studio-with-airflow-amd64

Then,

  1. Specify the file you want to read by updating ‘inputfile’ : ‘/rawdata/?’ (replace ? with your filename) in STEP 3d: Produce Data Using LOCALFILE: tml-read-LOCALFILE-step-3-kafka-producetotopic-dag

7.5.5.19. Local File Reference Architecture

_images/localfileimg.png

7.5.6. STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag

Note

All preprocessed data is also written to the “/rawdata/preprocess” folder in the container.

If you mapped the rawdata folder, you can access these files from your host machine.

7.5.6.1. Preprocessing Types

TML preprocesses real-time data for every entity along each sliding time window. This is a quick and powerful way to accelerate insights from real-time data with very little effort. TML provides over 35 different preprocessing types:

Tip

Watch the YouTube video on how to configure the parameters in this DAG: YouTube Video

Preprocessing Type

Description

anomprob

This will determine the probability

that there is an anomaly for each

entity in the sliding time windows

anomprobx-y

where X and Y are numbers or “n”; “n” means examine all anomalies for recurring patterns.

This will find the anomalies in the data while ignoring set patterns. It allows you to check whether the anomalies in the streams are truly anomalies and not some recurring pattern. For example, if an IoT device shuts off and turns on again routinely, this may be picked up as an anomaly when in fact it is normal behaviour.

To ignore these cases, ANOMPROB2-5 tells Viper to check anomalies against patterns of 2-5 peaks. If the stream has two classes, such as 0 and 1000, and they show a pattern, then they should not be considered an anomaly: class=0 is the device shutting down, and class=1000 is the device turning back on. With ANOMPROB3-10, Viper will check for patterns of classes 3 to 10 to see if they recur routinely. This is very helpful to reduce false positives and false negatives.

autocorr

This will determine the autocorrelation

in the data for each entity in the

sliding time windows

avg

This will determine the average

value for each entity in the sliding

time windows

std

This will determine the standard deviation

value for each entity in the sliding

time windows

datacleanstd#_#

This is a powerful function for data cleaning.

It uses a Standard Deviation Filter (often referred to as Z-Score filtering).

In data science and AI, this is a standard technique used to

automatically remove “outliers” or “noise” from a dataset to ensure

your model is looking at reliable trends rather than anomalies.

It also allows users to eliminate extreme values before the analysis

begins.

The code defines an “envelope” or a safe zone as:

  • upperLimit: Mean + (Tolerance * StdDev)

  • lowerLimit: Mean - (Tolerance * StdDev)

where Tolerance = #, Mean=mean of all data in the sliding time window,

StdDev=standard deviation of all data in the sliding time window.

For example, if you specify datacleanstd3:

then TML defines the envelope as:

  • upperLimit: Mean + (3 * StdDev)

  • lowerLimit: Mean - (3 * StdDev)

Any data point inside this envelope (inclusive) is considered “safe”. Any point outside this envelope is considered an outlier or noise and will be removed from analysis.

You can specify any reasonable number:

  • datacleanstd5,

    • upperLimit: Mean + (5 * StdDev)

    • lowerLimit: Mean - (5 * StdDev)

  • datacleanstd10,

    • upperLimit: Mean + (10 * StdDev)

    • lowerLimit: Mean - (10 * StdDev)

  • etc.

Or, to delete extreme values first you can specify:

  • datacleanstd5_10000: this will delete any value less than -10000 or greater than 10000, and then perform the Z-score filtering.

This function ensures you have clean data in your analysis

and machine learning/AI.

datacleanmad_#

This is another powerful function for data cleaning.

It uses Mean Absolute Deviation (MAD) to clean the data.

You can choose to delete extreme values first, e.g.

datacleanmad_10000

datacleaniqr_#

This is another powerful function for data cleaning.

It uses the Interquartile Range (IQR) to clean the data.

You can choose to delete extreme values first, e.g.

datacleaniqr_10000

avgtimediff

This will determine the average time

in seconds between the first and last

timestamp for each entity in sliding windows;

timestamps should be in this layout: 2006-01-02T15:04:05.

consistency

This will check if the data all have

consistent data types. Returns 1 for

consistent data types, 0 otherwise for

each entity in sliding windows

count

This will count the number of numeric

data points in the sliding time

windows for each entity

countstr

This will count the number of string

values in the sliding time windows for

each entity

cv

This will determine the coefficient of variation for each entity in sliding windows

dataage_[UTC offset]_[timetype]

dataage can be used to check how long ago the data in the data stream was last updated, relative to the current local time. You can specify the UTC offset to adjust the current time to match the timezone of the data stream, and timetype as millisecond, second, minute, hour, or day.

For example, with dataage_1_minute, this processtype compares the last timestamp in the data stream to the current local time at UTC offset +1 and returns the time difference in minutes. This is a very powerful processtype for data quality and data assurance programs across any number of data streams.

diff

This will find the difference between

the highest and lowest points in

the sliding time windows for each entity

diffmargin

This will find the percentage difference

between the highest and lowest points

in the sliding time windows for each entity

entropy

This will determine the entropy in the

data for each entity in the sliding

time windows; will compute the amount

of information in the data stream.

geodiff

This will determine the distance in kilometres between two latitude and longitude points for each entity in sliding windows

gm (geometric mean)

This will determine the geometric

mean for each entity in sliding windows

hm (harmonic mean)

This will determine the harmonic

mean for each entity in sliding windows

iqr

This will compute the interquartile

range between Q1 and Q3 for each

entity in sliding windows

kurtosis

This will determine the kurtosis

for each entity in sliding windows

mad

This will determine the mean absolute

deviation for each entity in sliding windows

max

This will determine the maximum

value for each entity in the sliding

time windows

median

This will find the median of

the numeric points in the sliding

time windows for each entity

meanci95

Returns a 95% confidence interval:

mean, low, high for each entity in

sliding windows.

meanci99

Returns a 99% confidence interval:

mean, low, high for each entity in

sliding windows.

midhinge

This will determine the average

of the first and third quartiles

for each entity in sliding windows

min

This will determine the minimum

value for each entity in the sliding

time windows

outliers

This will find the outliers of the

numeric points in the sliding time

windows for each entity

outliersx-y

where X and Y are numbers or “n”; “n” means examine all outliers for recurring patterns.

This will find the outliers in the data while ignoring set patterns. It allows you to check whether the outliers in the streams are truly outliers and not some recurring pattern. For example, if an IoT device shuts off and turns on again routinely, this may be picked up as an outlier when in fact it is normal behaviour.

To ignore these cases, OUTLIER2-5 tells Viper to check outliers against patterns of 2-5 peaks. If the stream has two classes, such as 0 and 1000, and they show a pattern, then they should not be considered an outlier: class=0 is the device shutting down, and class=1000 is the device turning back on. With OUTLIER3-10, Viper will check for patterns of classes 3 to 10 to see if they recur routinely. This is very helpful to reduce false positives and false negatives.

raw

Will not process data stream for

each entity in sliding windows.

skewness

This will determine the skewness

for each entity in sliding windows

spikedetect

This will determine if there are any spikes in the data using a z-score method with lag = 5, threshold = 3.5 (standard deviations), and influence = 0.5, for each entity in sliding windows

sum

This will find the sum of the numeric

points in the sliding time windows

for each entity

timediff

This will determine, in seconds,

the time difference between the

first and last timestamp for each

entity in sliding windows; timestamps should be in this layout: 2006-01-02T15:04:05.

trend

This will determine the trend

value for each entity in the sliding

time windows. If the trend value is

less than zero then

data in the sliding time window is decreasing,

if trend value is greater than zero then

it is increasing.

trimean

This will determine the average of

the median and the midhinge for each

entity in sliding windows

unique

This will determine if there are unique

numeric values in the data for each

entity in sliding windows. Returns 1

if no data duplication (unique), 0

otherwise.

uniquestr

This will determine if there are

unique string values in the data

for each entity in sliding windows.

Checks string data for duplication.

Returns 1 if no

data duplication (unique), 0 otherwise.

variance

This will find the variance of the

numeric points in the sliding time

windows for each entity

varied

This will determine if there is variation

in the data in the sliding time windows

for each entity.
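
To make a few of the table entries concrete, the sketch below computes min, max, diff, avg, median, iqr, midhinge, and trimean for one sliding window per entity. It is only an illustration in plain Python: the function window_stats, the entity ids, and the sample values are hypothetical, and the quartile rule (the "inclusive" method of the standard statistics module) is an assumption, since TML's exact quartile method is not stated here.

```python
import statistics

def window_stats(values):
    """Compute a few of the statistics above for one entity's sliding window."""
    ordered = sorted(values)
    # Quartiles via the 'inclusive' method; this choice is an assumption.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    midhinge = (q1 + q3) / 2          # average of first and third quartiles
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "diff": ordered[-1] - ordered[0],
        "avg": statistics.fmean(ordered),
        "median": q2,
        "iqr": q3 - q1,
        "midhinge": midhinge,
        "trimean": (q2 + midhinge) / 2,  # average of median and midhinge
    }

# Hypothetical sliding windows keyed by entity (device) id.
windows = {
    "device-1": [2, 4, 4, 5, 9, 10, 20, 30, 50],
    "device-2": [7, 7, 7, 7, 7],
}
results = {entity: window_stats(vals) for entity, vals in windows.items()}
for entity, stats in results.items():
    print(entity, stats)
```

Each entity gets its own result, which mirrors the "for each entity in sliding windows" wording used throughout the table.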

7.5.7. Data Cleaning

Ensuring high data quality is critical for machine learning.

Users can autoclean the data using three methods:

Data Cleaning Preprocessing Type

Description

datacleanstd#_#

This is a powerful function for data cleaning.

It uses a Standard Deviation Filter (often referred to as Z-Score filtering).

In data science and AI, this is a standard technique used to

automatically remove “outliers” or “noise” from a dataset to ensure

your model is looking at reliable trends rather than anomalies.

It also allows users to eliminate extreme values before the analysis

begins.

The code defines an “envelope” or a safe zone as:

  • upperLimit: Mean + (Tolerance * StdDev)

  • lowerLimit: Mean - (Tolerance * StdDev)

where Tolerance = #, Mean=mean of all data in the sliding time window,

StdDev=standard deviation of all data in the sliding time window.

For example, if you specify datacleanstd3:

then TML defines the envelope as:

  • upperLimit: Mean + (3 * StdDev)

  • lowerLimit: Mean - (3 * StdDev)

Any data point inside this envelope (inclusive) is considered “safe”. Any point outside this envelope is considered an outlier or noise and will be removed from analysis.

You can specify any reasonable number:

  • datacleanstd5,

    • upperLimit: Mean + (5 * StdDev)

    • lowerLimit: Mean - (5 * StdDev)

  • datacleanstd10,

    • upperLimit: Mean + (10 * StdDev)

    • lowerLimit: Mean - (10 * StdDev)

  • etc.

Or, to delete extreme values first you can specify:

  • datacleanstd5_10000: this will delete any value less than -10000 or greater than 10000, and then perform the Z-score filtering.

This function ensures you have clean data in your analysis

and machine learning/AI.

datacleanmad_#

This is another powerful function for data cleaning.

It uses Mean Absolute Deviation (MAD) to clean the data.

You can choose to delete extreme values first, e.g.

datacleanmad_10000

datacleaniqr_#

This is another powerful function for data cleaning.

It uses the Interquartile Range (IQR) to clean the data.

You can choose to delete extreme values first, e.g.

datacleaniqr_10000

Note

Deleting extreme values can be important because sensor data may contain very extreme values that appear normal when the above algorithms have nothing else to compare them against. These extreme values may be due to a sensor malfunction. In this case, deleting extreme values like 999999999 is sensible.
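
The envelope logic for datacleanstd#_# can be sketched as follows. This is not TML's implementation: the helper clean_std is hypothetical, the readings are made up, and the use of the population standard deviation is an assumption (TML may use the sample form).

```python
import statistics

def clean_std(values, tolerance, extreme=None):
    """Sketch of a datacleanstd-style filter (hypothetical helper, not TML's code)."""
    if extreme is not None:
        # Optional first pass: drop values below -extreme or above +extreme.
        values = [v for v in values if -extreme <= v <= extreme]
    if len(values) < 2:
        return list(values)
    mean = statistics.fmean(values)
    # Population standard deviation is an assumption here.
    std = statistics.pstdev(values)
    lower = mean - tolerance * std
    upper = mean + tolerance * std
    # Points inside the envelope (inclusive) are kept.
    return [v for v in values if lower <= v <= upper]

window = [10, 11, 9, 10, 12, 10, 11, 999999999]  # made-up sensor readings
kept_without_cut = clean_std(window, 3)               # like datacleanstd3
kept_with_cut = clean_std(window, 3, extreme=10000)   # like datacleanstd3_10000
print(kept_without_cut)
print(kept_with_cut)
```

In this made-up window, the single extreme reading inflates the mean and standard deviation so much that it stays inside the 3-standard-deviation envelope and survives the filter. Removing it first with the extreme-value cut lets the filter work on the realistic readings, which is the situation the note above describes.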

7.5.8. STEP 4: Preprocessing Data DAG: tml-system-step-4-kafka-preprocess-dag

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '',  # <<< *** leave blank
   'producerid' : 'iotsolution',   # <<< *** Change as needed
   'raw_data_topic' : 'iot-raw-data', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'preprocess_data_topic' : 'iot-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'maxrows' : '800', # <<< ********** Number of offsets to roll back the data stream - i.e. roll back the stream by 800 offsets
   'offset' : '-1', # <<< Rollback from the end of the data streams
   'brokerhost' : '',   # <<< *** Leave as is
   'brokerport' : '-999',  # <<< *** Leave as is
   'preprocessconditions' : '', ## <<< Leave blank
   'delay' : '70', # Add a 70 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'array' : '0', # do not modify
   'saveasarray' : '1', # do not modify
   'topicid' : '-999', # do not modify
   'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
   'asynctimeout' : '120', # <<< 120 seconds for connection timeout
   'timedelay' : '0', # <<< connection delay
   'tmlfilepath' : '', # leave blank
   'usemysql' : '1', # do not modify
   'streamstojoin' : '', # leave blank
   'identifier' : 'IoT device performance and failures', # <<< ** Change as needed
   'preprocesstypes' : 'anomprob,trend,avg', # <<< **** MAIN PREPROCESS TYPES CHANGE AS NEEDED refer to https://tml-readthedocs.readthedocs.io/en/latest/
   'pathtotmlattrs' : 'oem=n/a,lat=n/a,long=n/a,location=n/a,identifier=n/a', # Change as needed
   'jsoncriteria' : 'uid=metadata.dsn,filter:allrecords~\
 subtopics=metadata.property_name~\
 values=datapoint.value~\
 identifiers=metadata.display_name~\
 datetime=datapoint.updated_at~\
 msgid=datapoint.id~\
 latlong=lat:long' # <<< **** Specify your json criteria. Here is an example of a multiline json --  refer to https://tml-readthedocs.readthedocs.io/en/latest/
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""

 def processtransactiondata():
  global VIPERTOKEN
  global VIPERHOST
  global VIPERPORT
  global HTTPADDR
  preprocesstopic = default_args['preprocess_data_topic']
  maintopic =  default_args['raw_data_topic']
  mainproducerid = default_args['producerid']

 #############################################################################################################
   #                                    PREPROCESS DATA STREAMS


   # Number of offsets to roll back each data stream - change this to a larger number if you want more data
   # For supervised machine learning you need a minimum of 30 data points in each stream
  maxrows=int(default_args['maxrows'])

   # Go to the last offset of each stream: if lastoffset=1000 and maxrows=800, this function will roll back the
   # streams to offset=1000-800=200
  offset=int(default_args['offset'])
   # Topic containing the raw data to preprocess
  topic=maintopic
   # producerid of the topic
  producerid=mainproducerid
   # use the host in Viper.env file
  brokerhost=default_args['brokerhost']
   # use the port in Viper.env file
  brokerport=int(default_args['brokerport'])
   # if load balancing, enter the microserviceid to route the HTTP request to a specific machine
  microserviceid=default_args['microserviceid']


   # You can preprocess with the following functions: MAX, MIN, SUM, AVG, COUNT, DIFF,OUTLIERS
   # here we will take max values of the arcturus-humidity, we will Diff arcturus-temperature, and average arcturus-Light_Intensity
   # NOTE: The number of process logic functions MUST match the streams - the operations will be applied in the same order
 #
  preprocessconditions=default_args['preprocessconditions']

  # Maximum delay (in milliseconds) for VIPER to wait for Kafka to confirm the message was received and written to the topic
  delay=int(default_args['delay'])
  # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
  enabletls=int(default_args['enabletls'])
  array=int(default_args['array'])
  saveasarray=int(default_args['saveasarray'])
  topicid=int(default_args['topicid'])

  rawdataoutput=int(default_args['rawdataoutput'])
  asynctimeout=int(default_args['asynctimeout'])
  timedelay=int(default_args['timedelay'])

  jsoncriteria = default_args['jsoncriteria']

  tmlfilepath=default_args['tmlfilepath']
  usemysql=int(default_args['usemysql'])

  streamstojoin=default_args['streamstojoin']
  identifier = default_args['identifier']

  # if dataage - use:dataage_utcoffset_timetype
  preprocesstypes=default_args['preprocesstypes']
  pathtotmlattrs=default_args['pathtotmlattrs']

  try:
     result=maadstml.viperpreprocesscustomjson(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,jsoncriteria,rawdataoutput,maxrows,enabletls,delay,brokerhost,
                                       brokerport,microserviceid,topicid,streamstojoin,preprocesstypes,preprocessconditions,identifier,
                                       preprocesstopic,array,saveasarray,timedelay,asynctimeout,usemysql,tmlfilepath,pathtotmlattrs)
     #print(result)
     return result
  except Exception as e:
     print(e)
     return e

 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def dopreprocessing(**context):
        tsslogging.locallogs("INFO", "STEP 4: Preprocessing started")
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        if 'step4raw_data_topic' in os.environ:
            default_args['raw_data_topic']=os.environ['step4raw_data_topic']
        if 'step4preprocesstypes' in os.environ:
            default_args['preprocesstypes']=os.environ['step4preprocesstypes']
        if 'step4jsoncriteria' in os.environ:
            default_args['jsoncriteria']=os.environ['step4jsoncriteria']
        if 'step4preprocess_data_topic'  in os.environ:
            default_args['preprocess_data_topic']=os.environ['step4preprocess_data_topic']

        ti = context['task_instance']
        ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_preprocessconditions".format(sname), value=default_args['preprocessconditions'])
        ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
        ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
        ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
        ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
        ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
        ti.xcom_push(key="{}_preprocesstypes".format(sname), value=default_args['preprocesstypes'])
        ti.xcom_push(key="{}_pathtotmlattrs".format(sname), value=default_args['pathtotmlattrs'])
        ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])
        ti.xcom_push(key="{}_jsoncriteria".format(sname), value=default_args['jsoncriteria'])

        maxrows=default_args['maxrows']
        if 'step4maxrows' in os.environ:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4maxrows']))
          maxrows=os.environ['step4maxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))


        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
          fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('preprocess',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" \"{}\" \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,default_args['raw_data_topic'],default_args['preprocesstypes'],default_args['jsoncriteria'],default_args['preprocess_data_topic']), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Preprocessing DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         maxrows =  sys.argv[5]
         default_args['maxrows'] = maxrows
         default_args['raw_data_topic'] =  sys.argv[6]
         default_args['preprocesstypes'] =  sys.argv[7]
         default_args['jsoncriteria'] =  sys.argv[8]
         default_args['preprocess_data_topic'] =  sys.argv[9]

         tsslogging.locallogs("INFO", "STEP 4: Preprocessing started")

         while True:
           try:
             processtransactiondata()
             time.sleep(1)
           except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4: Preprocessing DAG in {} {}".format(os.path.basename(__file__),e))
            tsslogging.tsslogit("Preprocessing DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            break
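
The Step 4 task above checks os.environ for step4maxrows before falling back to default_args['maxrows']. That override pattern can be sketched generically as shown here. The helper name apply_env_overrides and the fake_env dictionary are hypothetical and are not part of TML; only the step4maxrows variable name comes from the listing above.

```python
import os

def apply_env_overrides(defaults, step_prefix, keys, environ=None):
    """Return a copy of defaults where <step_prefix><key> env vars, if set, win."""
    environ = os.environ if environ is None else environ
    merged = dict(defaults)  # never mutate the caller's dictionary
    for key in keys:
        env_name = "{}{}".format(step_prefix, key)  # e.g. "step4" + "maxrows"
        if env_name in environ:
            merged[key] = environ[env_name]
    return merged

# Illustrative values only: a stand-in for default_args and for os.environ.
defaults = {"maxrows": "50", "preprocesstypes": "avg"}
fake_env = {"step4maxrows": "200"}

merged = apply_env_overrides(defaults, "step4", ["maxrows", "preprocesstypes"], environ=fake_env)
print(merged)
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.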

7.5.8.1. Preprocessed Variable Naming Standard

Important

When a raw variable is preprocessed, TML renames it using this standard:

[Variable Name]_preprocessed_[Process Type]

For example, say you want to perform an AnomProb on the variable Voltage. The new preprocessed variable name will be: Voltage_preprocessed_AnomProb

If you want to take the min of Voltage, then the new preprocessed variable name will be: Voltage_preprocessed_Min

This naming standard is very important when you want to perform machine learning on the "preprocessed" variable.
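
As a minimal sketch of the naming standard, the helper below builds a preprocessed variable name from a raw variable name and one or more process types. The function name preprocessed_name is hypothetical. The chained case mirrors the Topic value in the sample output, where an already preprocessed variable is preprocessed again.

```python
def preprocessed_name(variable, *process_types):
    """Append '_preprocessed_<type>' once for each process type, in order."""
    name = variable
    for process_type in process_types:
        name = "{}_preprocessed_{}".format(name, process_type)
    return name

print(preprocessed_name("Voltage", "AnomProb"))         # Voltage_preprocessed_AnomProb
print(preprocessed_name("Voltage", "Min"))              # Voltage_preprocessed_Min
print(preprocessed_name("Current", "AnomProb", "Avg"))  # Current_preprocessed_AnomProb_preprocessed_Avg
```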

7.5.8.2. Preprocessed Sample JSON Output

{
 "hyperprediction": "0.980",
 "Maintopic": "iot-preprocess2",
 "Topic": "topicid287_Current_preprocessed_AnomProb_preprocessed_Avg",
 "Type": "External",
 "ProducerId": "ProducerId-OAA--s0Ee-sqUX8QqLfdtivZSKRHoMShBe",
 "TimeStamp": "2024-08-15 19:49:24",
 "Unixtime": 1723751364617162000,
 "kafkakey": "OAA-tFTP8Ym6BHy-bnw2X5XdSUoUSOjns7",
 "Preprocesstype": "Avg",
 "WindowStartTime": "2024-08-15 19:49:08.36546688 +0000 UTC",
 "WindowEndTime": "2024-08-15 19:49:21.600164096 +0000 UTC",
 "WindowStartUnixTime": "1723751348365466880",
 "WindowEndUnixTime": "1723751361600164096",
 "Conditions": "",
 "Identifier": "Current~Current-(mA)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name
  (Current), value:datapoint.value, identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords, Joinedidentifiers:
  ~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=dd4dfbbc-7fb3-11ec-e36d-
  28c9ca7b5376(145,34.04893,-111.09373,Current,n/a,n/a,{}); dd781c12-7fb3-11ec-fa99-012971124b46(0,34.04893,-111.09373,Current,n/a,n/a,{});dd94c90c-7fb3-11ec-
  727b-6d558b1c7fe4(0,34.04893,-111.09373,Current,n/a,n/a,{}); ddb6f676-7fb3-11ec-5c48-b5377c00ff05(0,34.04893,-111.09373,Current,n/a,n/a,{});dde3be22-7fb3-
  11ec-4c2e-f10dea945ccd(0,34.04893,-111.09373,Current,n/a,n/a,{}); ddf6a5e6-7fb3-11ec-c25b-509766b7a301(0,34.04893,-111.09373,Current,n/a,n/a,{});de11b6d8-
  7fb3-11ec-77c8-a93cc4b538b6(0,34.04893,-111.09373,Current,n/a,n/a,{}); de2850f0-7fb3-11ec-5b6a-ac3b205641e0(0,34.04893,-111.09373,Current,n/a,n/a,
  {});de405510-7fb3-11ec-bba7-9b0ce93d49d2(0,34.04893,-111.09373,Current,n/a,n/a,{}); de4ee062-7fb3-11ec-3252-
  7c7e46faf86b(0,34.04893,-111.09373,Current,n/a,n/a,{})~latlong=~mainuid=AC000W020496398",
  "PreprocessIdentifier": "IoT Data preprocess",
  "Numberofmessages": 6,
  "Offset": 27041,
  "Consumerid": "StreamConsumer",
  "Generated": "2024-08-15T19:49:55.619+00:00",
  "Partition": 0
  }
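
The WindowStartUnixTime and WindowEndUnixTime values in the sample above line up with WindowStartTime and WindowEndTime when read as nanoseconds since the Unix epoch. A small sketch, using only the Python standard library, converts them to UTC datetimes and computes the window length. The helper name ns_to_utc is hypothetical.

```python
from datetime import datetime, timezone

def ns_to_utc(ns):
    """Convert Unix time in nanoseconds (int or numeric string) to an aware UTC datetime."""
    seconds, remainder_ns = divmod(int(ns), 1_000_000_000)
    # datetime only keeps microseconds, so the last three digits are dropped.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=remainder_ns // 1000
    )

start = ns_to_utc("1723751348365466880")  # WindowStartUnixTime from the sample
end = ns_to_utc("1723751361600164096")    # WindowEndUnixTime from the sample
window_seconds = (end - start).total_seconds()

print(start.isoformat())
print(end.isoformat())
print(window_seconds)
```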

7.5.8.3. Preprocessed Sample JSON Output: Explanations

Important

Study these fields carefully: you will rely on them for visualization and for other downstream analysis.

JSON Field: Description

hyperprediction: The preprocessed value for the Preprocesstype (Avg). In this case, the value is 0.980.

Maintopic: The topic being consumed: iot-preprocess2

Topic: The topic name for the preprocessed variable. For example, topicid287_Current_preprocessed_AnomProb_preprocessed_Avg means entity id 287 was processed (287 is an internal number associated with device serial number AC000W020496398).

Type: An internal parameter.

ProducerId: An internal parameter: ProducerId-OAA--s0Ee-sqUX8QqLfdtivZSKRHoMShBe

TimeStamp: The UTC timestamp at which the calculation was created: 2024-08-15 19:49:24

Unixtime: The Unix time of the calculation: 1723751364617162000

kafkakey: The TML Kafka key that identifies the message as coming from TML: OAA-tFTP8Ym6BHy-bnw2X5XdSUoUSOjns7

Preprocesstype: The preprocess type used: Avg

WindowStartTime: The start of the sliding time window: 2024-08-15 19:49:08.36546688 +0000 UTC

WindowEndTime: The end of the sliding time window: 2024-08-15 19:49:21.600164096 +0000 UTC

WindowStartUnixTime: The start of the sliding time window in Unix time: 1723751348365466880

WindowEndUnixTime: The end of the sliding time window in Unix time: 1723751361600164096

Conditions: Contains any preprocess conditions.

Identifier: Stores all the data used in the Avg calculation of the Current variable. It is delimited by "~". If you parse the "Msgsjoined" field you can get the RAW data. In dd4dfbbc-7fb3-11ec-e36d-28c9ca7b5376(145,34.04893,-111.09373,Current,n/a,n/a,{}), the first alphanumeric string, dd4dfbbc-7fb3-11ec-e36d-28c9ca7b5376, is the msgid; the second value, 145, is the Current value used in the calculation; then come the latitude (34.04893) and longitude (-111.09373), the variable being processed (Current), and any additional information. Another important field is mainuid=AC000W020496398; mainuid is the entity identifier in the UID field of the JSON criteria (JSON PROCESSING).

In summary, TML processed (took the average of) 6 messages from this one device (with DSN=AC000W020496398) for the Current stream, in the sliding time window starting at 2024-08-15 19:49:08 and ending at 2024-08-15 19:49:21.

The full Identifier value is:

"Current~Current-(mA)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Current), value:datapoint.value, identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords, Joinedidentifiers: ~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=dd4dfbbc-7fb3-11ec-e36d-28c9ca7b5376(145,34.04893,-111.09373,Current,n/a,n/a,{}); dd781c12-7fb3-11ec-fa99-012971124b46(0,34.04893,-111.09373,Current,n/a,n/a,{});dd94c90c-7fb3-11ec-727b-6d558b1c7fe4(0,34.04893,-111.09373,Current,n/a,n/a,{}); ddb6f676-7fb3-11ec-5c48-b5377c00ff05(0,34.04893,-111.09373,Current,n/a,n/a,{});dde3be22-7fb3-11ec-4c2e-f10dea945ccd(0,34.04893,-111.09373,Current,n/a,n/a,{}); ddf6a5e6-7fb3-11ec-c25b-509766b7a301(0,34.04893,-111.09373,Current,n/a,n/a,{});de11b6d8-7fb3-11ec-77c8-a93cc4b538b6(0,34.04893,-111.09373,Current,n/a,n/a,{}); de2850f0-7fb3-11ec-5b6a-ac3b205641e0(0,34.04893,-111.09373,Current,n/a,n/a,{});de405510-7fb3-11ec-bba7-9b0ce93d49d2(0,34.04893,-111.09373,Current,n/a,n/a,{}); de4ee062-7fb3-11ec-3252-7c7e46faf86b(0,34.04893,-111.09373,Current,n/a,n/a,{})~latlong=~mainuid=AC000W020496398"

PreprocessIdentifier: The preprocess identifier: IoT Data preprocess

Numberofmessages: The number of messages used in the Avg calculation: 6

Offset: The Kafka offset where this message is stored: 27041

Consumerid: The id of the consumer: StreamConsumer

Generated: The timestamp when this message was consumed: 2024-08-15T19:49:55.619+00:00

Partition: The Kafka partition this message was stored in: 0
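
The Identifier description can be turned into a short parsing sketch. The code below assumes the layout seen in the sample: "~"-separated parts, a Msgsjoined= part holding records of the form msgid(value,lat,long,variable,...), and a mainuid= part. The function name parse_identifier and the shortened sample string are illustrative only; the real field may contain variations this sketch does not handle.

```python
import re

# One record looks like: <msgid>(<value>,<lat>,<long>,<variable>,...)
RECORD = re.compile(r"([0-9a-f-]+)\(([^)]*)\)")

def parse_identifier(identifier):
    """Return (mainuid, records) extracted from a '~'-delimited Identifier string."""
    fields = {}
    for part in identifier.split("~"):
        key, sep, value = part.partition("=")
        if sep and key in ("Msgsjoined", "mainuid"):
            fields[key] = value
    records = []
    for msgid, body in RECORD.findall(fields.get("Msgsjoined", "")):
        cols = body.split(",")
        records.append({
            "msgid": msgid,
            "value": float(cols[0]),
            "lat": float(cols[1]),
            "long": float(cols[2]),
            "variable": cols[3],
        })
    return fields.get("mainuid"), records

# Shortened version of the sample Identifier (two of the joined messages).
sample = (
    "Current~Current-(mA)~iot-preprocess~uid:metadata.dsn"
    "~Msgsjoined=dd4dfbbc-7fb3-11ec-e36d-28c9ca7b5376(145,34.04893,-111.09373,Current,n/a,n/a,{});"
    " dd781c12-7fb3-11ec-fa99-012971124b46(0,34.04893,-111.09373,Current,n/a,n/a,{})"
    "~latlong=~mainuid=AC000W020496398"
)

mainuid, records = parse_identifier(sample)
print(mainuid, len(records), records[0]["value"])
```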

7.5.9. STEP 4a: Preprocessing Data: tml-system-step-4a-kafka-preprocess-dag

Note

Step 4a is similar to Step 4b; the only difference is that it allows for jsoncriteria.

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

from datetime import datetime
from airflow.decorators import dag, task
import sys
import maadstml
import tsslogging
import os
import subprocess
import time
import random

sys.dont_write_bytecode = True
######################################## USER CHOSEN PARAMETERS ########################################
default_args = {
  'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
  'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
  'microserviceid' : '',  # <<< *** leave blank
  'producerid' : 'iotsolution',   # <<< *** Change as needed
  'raw_data_topic' : 'rtms-pgpt-ai', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
  'preprocess_data_topic' : 'rtms-pgpt-ai-mitre', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
  'maxrows' : '50', # <<< ********** Number of offsets to roll back the data stream, e.g. '50' rolls the stream back by 50 offsets
  'offset' : '-1', # <<< Rollback from the end of the data streams
  'brokerhost' : '',   # <<< *** Leave as is
  'brokerport' : '-999',  # <<< *** Leave as is
  'preprocessconditions' : '', ## <<< Leave blank
  'delay' : '70', # Add a 70 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
  'array' : '0', # do not modify
  'saveasarray' : '1', # do not modify
  'topicid' : '-999', # do not modify
  'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
  'asynctimeout' : '120', # <<< 120 seconds for connection timeout
  'timedelay' : '0', # <<< connection delay
  'tmlfilepath' : '', # leave blank
  'usemysql' : '1', # do not modify
  'streamstojoin' : '', # Change as needed - THESE VARIABLES ARE CREATED BY TML IN tml_system_step_4_kafka_preprocess2_dag.py
  'identifier' : 'Mitre ATTCK', # <<< ** Change as needed
  'preprocesstypes' : 'avg', # <<< **** MAIN PREPROCESS TYPES CHANGE AS NEEDED refer to https://tml-readthedocs.readthedocs.io/en/latest/
  'pathtotmlattrs' : 'oem=n/a,lat=n/a,long=n/a,location=n/a,identifier=n/a', # Change as needed
  'jsoncriteria' : 'uid=tactic,filter:allrecords~\
subtopics=technique,technique,technique~\
values=FinalAttackScore,FinalPatternScore,RTMSSCORE~\
identifiers=FinalAttackScore,FinalPatternScore,RTMSSCORE~\
datetime=TimeStamp~\
msgid=Entity,PartitionOffsetFound,NumAttackWindowsFound,NumPatternWindowsFound,SearchEntity,rtmsfolder,CurrentRTMSMAXWINDOW~\
latlong=' # <<< **** Specify your json criteria. Here is an example of a multiline json --  refer to https://tml-readthedocs.readthedocs.io/en/latest/
}

######################################## DO NOT MODIFY BELOW #############################################

VIPERTOKEN=""
VIPERHOST=""
VIPERPORT=""
HTTPADDR=""

def processtransactiondata():
         global VIPERTOKEN
         global VIPERHOST
         global VIPERPORT
         global HTTPADDR
         preprocesstopic = default_args['preprocess_data_topic']
         maintopic =  default_args['raw_data_topic']
         mainproducerid = default_args['producerid']

        #############################################################################################################
          #                                    PREPROCESS DATA STREAMS


          # Roll back each data stream by 10 percent - change this to a larger number if you want more data
          # For supervised machine learning you need a minimum of 30 data points in each stream
         maxrows=int(default_args['maxrows'])

          # Go to the last offset of each stream: If lastoffset=500, then this function will rollback the
          # streams to offset=500-50=450
         offset=int(default_args['offset'])
          # Max wait time for Kafka to response on milliseconds - you can increase this number if
          #maintopic to produce the preprocess data to
         topic=maintopic
          # producerid of the topic
         producerid=mainproducerid
          # use the host in Viper.env file
         brokerhost=default_args['brokerhost']
          # use the port in Viper.env file
         brokerport=int(default_args['brokerport'])
          # If load balancing, enter the microserviceid to route the HTTP request to a specific machine
         microserviceid=default_args['microserviceid']


          # You can preprocess with the following functions: MAX, MIN, SUM, AVG, COUNT, DIFF,OUTLIERS
          # here we will take max values of the arcturus-humidity, we will Diff arcturus-temperature, and average arcturus-Light_Intensity
          # NOTE: The number of process logic functions MUST match the streams - the operations will be applied in the same order
        #
         preprocessconditions=default_args['preprocessconditions']

         # Maximum delay in milliseconds (default_args['delay'], 70 above) for VIPER to wait for Kafka to confirm the message was received and written to the topic
         delay=int(default_args['delay'])
         # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
         enabletls=int(default_args['enabletls'])
         array=int(default_args['array'])
         saveasarray=int(default_args['saveasarray'])
         topicid=int(default_args['topicid'])

         rawdataoutput=int(default_args['rawdataoutput'])
         asynctimeout=int(default_args['asynctimeout'])
         timedelay=int(default_args['timedelay'])

         jsoncriteria = default_args['jsoncriteria']

         tmlfilepath=default_args['tmlfilepath']
         usemysql=int(default_args['usemysql'])

         streamstojoin=default_args['streamstojoin']
         identifier = default_args['identifier']

         # if dataage - use:dataage_utcoffset_timetype
         preprocesstypes=default_args['preprocesstypes']

         try:
                result=maadstml.viperpreprocessproducetotopicstream(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,maxrows,enabletls,delay,brokerhost,
                                                  brokerport,microserviceid,topicid,streamstojoin,preprocesstypes,preprocessconditions,identifier,
                                                  preprocesstopic,jsoncriteria)
                #print(result)
         except Exception as e:
                print("ERROR:",e)

def windowname(wtype,sname,dagname):
    randomNumber = random.randrange(10, 9999)
    wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
    with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
      file.writelines("{}\n".format(wn))

    return wn

def dopreprocessing(**context):
       tsslogging.locallogs("INFO", "STEP 4a: Preprocessing started")
       sd = context['dag'].dag_id
       sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
       pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

       VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
       VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS1".format(sname))
       VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS1".format(sname))
       HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

       chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

       if 'step4ajsoncriteria' in os.environ:
          default_args['jsoncriteria']=os.environ['step4ajsoncriteria']
       if 'step4apreprocesstypes' in os.environ:
          default_args['preprocesstypes']=os.environ['step4apreprocesstypes']
       if 'step4araw_data_topic' in os.environ:
         default_args['raw_data_topic']=os.environ['step4araw_data_topic']
       if 'step4apreprocess_data_topic' in os.environ:
          default_args['preprocess_data_topic']=os.environ['step4apreprocess_data_topic']

       ti = context['task_instance']
       ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
       ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
       ti.xcom_push(key="{}_preprocessconditions".format(sname), value=default_args['preprocessconditions'])
       ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
       ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
       ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
       ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
       ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
       ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
       ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
       ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
       ti.xcom_push(key="{}_preprocesstypes".format(sname), value=default_args['preprocesstypes'])
       ti.xcom_push(key="{}_pathtotmlattrs".format(sname), value=default_args['pathtotmlattrs'])
       ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])
       ti.xcom_push(key="{}_jsoncriteria".format(sname), value=default_args['jsoncriteria'])

       maxrows=default_args['maxrows']
       if 'step4amaxrows' in os.environ:
         ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4amaxrows']))
         maxrows=os.environ['step4amaxrows']
       else:
         ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))


       repo=tsslogging.getrepo()
       if sname != '_mysolution_':
        fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
       else:
         fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

       wn = windowname('preprocess1',sname,sd)
       subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess1", "ENTER"])
       subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" \"{}\" \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,default_args['jsoncriteria'],default_args['preprocesstypes'],default_args['raw_data_topic'],default_args['preprocess_data_topic']), "ENTER"])

if __name__ == '__main__':
    if len(sys.argv) > 1:
       if sys.argv[1] == "1":
        repo=tsslogging.getrepo()
        try:
          tsslogging.tsslogit("Preprocessing DAG in {}".format(os.path.basename(__file__)), "INFO" )
          tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
        except Exception as e:
            #git push -f origin main
            os.chdir("/{}".format(repo))
            subprocess.call("git push -f origin main", shell=True)

        VIPERTOKEN = sys.argv[2]
        VIPERHOST = sys.argv[3]
        VIPERPORT = sys.argv[4]
        maxrows =  sys.argv[5]
        default_args['maxrows'] = maxrows

        default_args['jsoncriteria'] =  sys.argv[6]
        default_args['preprocesstypes'] =  sys.argv[7]
        default_args['raw_data_topic'] =  sys.argv[8]
        default_args['preprocess_data_topic'] =  sys.argv[9]

        tsslogging.locallogs("INFO", "STEP 4a: Preprocessing started")

        while True:
          try:
            processtransactiondata()
            time.sleep(1)
          except Exception as e:
           tsslogging.locallogs("ERROR", "STEP 4a: Preprocessing DAG in {} {}".format(os.path.basename(__file__),e))
           tsslogging.tsslogit("Preprocessing DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
           break
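
The jsoncriteria value in default_args above is a single string: "~" separates the sections, "=" separates each section name from its value, and commas separate the entries inside a section. The sketch below splits the Step 4a example into a dictionary so the sections are easier to inspect. The function name parse_jsoncriteria is hypothetical, and the splitting rules are inferred from this one example rather than taken from a TML specification.

```python
def parse_jsoncriteria(criteria):
    """Split a '~'-delimited jsoncriteria string into {section: [entries]}."""
    sections = {}
    for section in criteria.split("~"):
        key, _, value = section.partition("=")
        # Empty sections such as "latlong=" become an empty list.
        sections[key.strip()] = [entry for entry in value.split(",") if entry]
    return sections

# Same content as the Step 4a default_args['jsoncriteria'] example.
criteria = (
    "uid=tactic,filter:allrecords~"
    "subtopics=technique,technique,technique~"
    "values=FinalAttackScore,FinalPatternScore,RTMSSCORE~"
    "identifiers=FinalAttackScore,FinalPatternScore,RTMSSCORE~"
    "datetime=TimeStamp~"
    "msgid=Entity,PartitionOffsetFound,NumAttackWindowsFound,NumPatternWindowsFound,"
    "SearchEntity,rtmsfolder,CurrentRTMSMAXWINDOW~"
    "latlong="
)

parsed = parse_jsoncriteria(criteria)
for name, entries in parsed.items():
    print(name, len(entries), entries)
```

In this example the subtopics, values and identifiers sections each hold three entries; whether TML requires those counts to match is not stated in this section.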

7.5.10. STEP 4b: Preprocessing 2 Data: tml-system-step-4b-kafka-preprocess-dag

Tip

Watch the YouTube video that discusses how to configure this DAG, which processes the preprocessed variables created in Step 4: YouTube Video

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '',  # <<< *** leave blank
   'producerid' : 'iotsolution',   # <<< *** Change as needed
   'raw_data_topic' : 'iot-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'preprocess_data_topic' : 'iot-preprocess2', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'maxrows' : '350', # <<< ********** Number of offsets to roll back the data stream, e.g. '350' rolls the stream back by 350 offsets
   'offset' : '-1', # <<< Rollback from the end of the data streams
   'brokerhost' : '',   # <<< *** Leave as is
   'brokerport' : '-999',  # <<< *** Leave as is
   'preprocessconditions' : '', ## <<< Leave blank
   'delay' : '70', # Add a 70 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'array' : '0', # do not modify
   'saveasarray' : '1', # do not modify
   'topicid' : '-1', # do not modify
   'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
   'asynctimeout' : '120', # <<< 120 seconds for connection timeout
   'timedelay' : '0', # <<< connection delay
   'tmlfilepath' : '', # leave blank
   'usemysql' : '1', # do not modify
   'streamstojoin' : 'Voltage_preprocessed_AnomProb,Current_preprocessed_AnomProb', # Change as needed - THESE VARIABLES ARE CREATED BY TML IN tml_system_step_4_kafka_preprocess2_dag.py
   'identifier' : 'IoT device performance and failures', # <<< ** Change as needed
   'preprocesstypes' : 'avg,avg', # <<< **** MAIN PREPROCESS TYPES CHANGE AS NEEDED refer to https://tml-readthedocs.readthedocs.io/en/latest/
   'pathtotmlattrs' : 'oem=n/a,lat=n/a,long=n/a,location=n/a,identifier=n/a', # Change as needed
   'jsoncriteria' : '', # <<< **** Specify your json criteria. Here is an example of a multiline json --  refer to https://tml-readthedocs.readthedocs.io/en/latest/
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""

 def processtransactiondata():
          global VIPERTOKEN
          global VIPERHOST
          global VIPERPORT
          global HTTPADDR
          preprocesstopic = default_args['preprocess_data_topic']
          maintopic =  default_args['raw_data_topic']
          mainproducerid = default_args['producerid']

         #############################################################################################################
           #                                    PREPROCESS DATA STREAMS


           # Roll back each data stream by 10 percent - change this to a larger number if you want more data
           # For supervised machine learning you need a minimum of 30 data points in each stream
          maxrows=int(default_args['maxrows'])

           # Go to the last offset of each stream: If lastoffset=500, then this function will rollback the
           # streams to offset=500-50=450
          offset=int(default_args['offset'])
           # Max wait time for Kafka to response on milliseconds - you can increase this number if
           #maintopic to produce the preprocess data to
          topic=maintopic
           # producerid of the topic
          producerid=mainproducerid
           # use the host in Viper.env file
          brokerhost=default_args['brokerhost']
           # use the port in Viper.env file
          brokerport=int(default_args['brokerport'])
           # If load balancing, enter the microserviceid to route the HTTP request to a specific machine
          microserviceid=default_args['microserviceid']


           # You can preprocess with the following functions: MAX, MIN, SUM, AVG, COUNT, DIFF,OUTLIERS
           # here we will take max values of the arcturus-humidity, we will Diff arcturus-temperature, and average arcturus-Light_Intensity
           # NOTE: The number of process logic functions MUST match the streams - the operations will be applied in the same order
         #
          preprocessconditions=default_args['preprocessconditions']

          # Maximum delay in milliseconds (default_args['delay'], 70 above) for VIPER to wait for Kafka to confirm the message was received and written to the topic
          delay=int(default_args['delay'])
          # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
          enabletls=int(default_args['enabletls'])
          array=int(default_args['array'])
          saveasarray=int(default_args['saveasarray'])
          topicid=int(default_args['topicid'])

          rawdataoutput=int(default_args['rawdataoutput'])
          asynctimeout=int(default_args['asynctimeout'])
          timedelay=int(default_args['timedelay'])

          jsoncriteria = default_args['jsoncriteria']

          tmlfilepath=default_args['tmlfilepath']
          usemysql=int(default_args['usemysql'])

          streamstojoin=default_args['streamstojoin']
          identifier = default_args['identifier']

          # if dataage - use:dataage_utcoffset_timetype
          preprocesstypes=default_args['preprocesstypes']

          pathtotmlattrs=default_args['pathtotmlattrs']

          try:
                 result=maadstml.viperpreprocessproducetotopicstream(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,maxrows,enabletls,delay,brokerhost,
                                                   brokerport,microserviceid,topicid,streamstojoin,preprocesstypes,preprocessconditions,identifier,preprocesstopic)
                 #print(result)
          except Exception as e:
                 print("ERROR:",e)


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def dopreprocessing(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS2".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS2".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        if 'step4bpreprocesstypes' in os.environ:
           default_args['preprocesstypes']=os.environ['step4bpreprocesstypes']

        if 'step4bjsoncriteria' in os.environ:
           default_args['jsoncriteria']=os.environ['step4bjsoncriteria']

        if 'step4braw_data_topic' in os.environ:
           default_args['raw_data_topic']=os.environ['step4braw_data_topic']

        if 'step4bpreprocess_data_topic' in os.environ:
          default_args['preprocess_data_topic']=os.environ['step4bpreprocess_data_topic']

        ti = context['task_instance']
        ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_preprocessconditions".format(sname), value=default_args['preprocessconditions'])
        ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
        ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
        ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
        ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
        ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
        ti.xcom_push(key="{}_preprocesstypes".format(sname), value=default_args['preprocesstypes'])
        ti.xcom_push(key="{}_pathtotmlattrs".format(sname), value=default_args['pathtotmlattrs'])
        ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])
        ti.xcom_push(key="{}_jsoncriteria".format(sname), value=default_args['jsoncriteria'])

        maxrows=default_args['maxrows']
        if 'step4bmaxrows' in os.environ:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4bmaxrows']))
          maxrows=os.environ['step4bmaxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('preprocess2',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess2", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" \"{}\" \"{}\" \"{}\"".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,default_args['preprocesstypes'],default_args['jsoncriteria'],default_args['raw_data_topic'],default_args['preprocess_data_topic']), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Preprocessing2 DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         maxrows =  sys.argv[5]
         default_args['maxrows'] = maxrows

         default_args['preprocesstypes'] =  sys.argv[6]
         default_args['jsoncriteria'] =  sys.argv[7]
         default_args['raw_data_topic'] =  sys.argv[8]
         default_args['preprocess_data_topic'] =  sys.argv[9]

         tsslogging.locallogs("INFO", "STEP 4b: Preprocessing 2 started")

         while True:
           try:
             processtransactiondata()
             time.sleep(1)
           except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4b: Preprocessing2 DAG in {} {}".format(os.path.basename(__file__),e))
            tsslogging.tsslogit("Preprocessing2 DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            break

7.5.11. STEP 4c: Preprocessing 3 Data: tml-system-step-4c-kafka-preprocess-dag

Important

Step 4c is a powerful task that incorporates real-time memory using sliding time windows. For details, see How TML Maintains Past Memory of Events Using Sliding Time Windows.

Users can cross-reference entities with TXT files. This lets you combine machine learning outputs with text data to gain a deeper understanding of each entity. For example, you can analyse log files for unusual search terms such as authentication failures or unknown users.
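
As a rough mental model of the sliding-window memory mentioned above, the sketch below keeps the most recent N windows in a bounded queue and counts how often a term appears across them. It is a conceptual illustration only: the class name WindowMemory, its methods, and the counting rule are assumptions made for this example and do not describe TML's internal implementation.

```python
from collections import deque


class WindowMemory:
    """Hypothetical helper: remembers the most recent `max_windows` windows."""

    def __init__(self, max_windows):
        # deque(maxlen=...) drops the oldest window automatically when full
        self.windows = deque(maxlen=max_windows)

    def add_window(self, messages):
        self.windows.append(list(messages))

    def count(self, term):
        # Case-insensitive count of messages containing `term`
        # across all remembered windows
        term = term.lower()
        return sum(term in msg.lower() for window in self.windows for msg in window)


memory = WindowMemory(max_windows=3)
memory.add_window(["authentication failure for user a", "ok"])
memory.add_window(["ok", "ok"])
memory.add_window(["authentication failure for user b"])
print(memory.count("authentication failure"))  # 2

memory.add_window(["ok"])  # the oldest window is now forgotten
print(memory.count("authentication failure"))  # 1
```

In this toy model, a larger window limit (compare rememberpastwindows in the listing below) simply means older events stay visible for longer.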

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random
 import base64
 import threading
 import shutil

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'owner' : 'Sebastian Maurice',  # <<< *** Change as needed
   'enabletls': '1', # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '',  # <<< *** leave blank
   'producerid' : 'rtmssolution',   # <<< *** Change as needed
   'raw_data_topic' : 'iot-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'preprocess_data_topic' : 'rtms-preprocess', # *************** INCLUDE ONLY ONE TOPIC - This is one of the topic you created in SYSTEM STEP 2
   'maxrows' : '200', # <<< ********** Number of offsets to roll back the data stream - i.e. roll back the stream by 200 offsets
   'offset' : '-1', # <<< Rollback from the end of the data streams
   'brokerhost' : '',   # <<< *** Leave as is
   'brokerport' : '-999',  # <<< *** Leave as is
   'delay' : '70', # Add a 70 millisecond maximum delay for VIPER to wait for Kafka to return confirmation message is received and written to topic
   'array' : '0', # do not modify
   'saveasarray' : '1', # do not modify
   'topicid' : '-999', # do not modify
   'rawdataoutput' : '1', # <<< 1 to output raw data used in the preprocessing, 0 do not output
   'asynctimeout' : '120', # <<< 120 seconds for connection timeout
   'timedelay' : '0', # <<< connection delay
   'tmlfilepath' : '', # leave blank
   'usemysql' : '1', # do not modify
   'rtmsstream' : 'rtms-stream-mylogs', # Change as needed - STREAM containing log file data (or other data) for RTMS
                                                     # If entitystream is empty, TML uses the preprocess type only.
   'identifier' : 'RTMS Past Memory of Events', # <<< ** Change as needed
   'searchterms' : 'rgx:p([a-z]+)ch ~~~ |authentication failure,--entity-- password failure ~~~ |unknown--entity--', # Main search terms: the first character of each group is @ for AND or | for OR (default is OR)
                                                              # Must include --entity-- if correlating with entity - this will be replaced
                                                              # dynamically with the entities found in raw_data_topic
   'localsearchtermfolder': '|mysearchfile1,|mysearchfile2', # Specify folder(s) of files containing search terms - each term must be on a new line - use comma
                                # to apply each folder to the rtmsstream topic
                                # Use @=AND, |=OR to specify whether the terms in the files should be AND'ed or OR'ed
                                # For example, with @mysearchfolder1,|mysearchfolder2 all terms in mysearchfolder1 are AND'ed
                                # and all terms in mysearchfolder2 are OR'ed
   'localsearchtermfolderinterval': '60', # This is the number of seconds between reading the localsearchtermfolder.  For example, if 60,
                                        # The files will be read every 60 seconds - and searchterms will be updated
   'rememberpastwindows' : '500', # Past windows to remember
   'patternwindowthreshold' : '30', # check for the number of patterns for the items in searchterms
   'rtmsscorethreshold': '0.6',  # RTMS score threshold i.e. '0.8'
   'rtmsscorethresholdtopic': 'rtmstopic',   # All rtms score greater than rtmsscorethreshold will be streamed to this topic
   'attackscorethreshold': '0.6',   # Attack score threshold i.e. '0.8'
   'attackscorethresholdtopic': 'attacktopic',   # All attack score greater than attackscorethreshold will be streamed to this topic
   'patternscorethreshold': '0.6',   # Pattern score threshold i.e. '0.8'
   'patternscorethresholdtopic': 'patterntopic',   # All pattern scores greater than patternscorethreshold will be streamed to this topic
   'rtmsfoldername': 'rtms',
   'rtmsmaxwindows': '10000'
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""

 def processtransactiondata():
          global VIPERTOKEN
          global VIPERHOST
          global VIPERPORT
          global HTTPADDR
          preprocesstopic = default_args['preprocess_data_topic']
          maintopic =  default_args['raw_data_topic']
          mainproducerid = default_args['producerid']

         #############################################################################################################
           #                                    PREPROCESS DATA STREAMS


           # Number of offsets to roll back each data stream - increase this if you want more data
           # For supervised machine learning you need a minimum of 30 data points in each stream
          maxrows=int(default_args['maxrows'])

           # offset=-1 starts from the end of each stream: if the last offset is 500 and maxrows=200,
           # the streams are rolled back to offset=500-200=300
          offset=int(default_args['offset'])
           # Main topic (raw_data_topic) containing the data to preprocess
          topic=maintopic
           # producerid of the topic
          producerid=mainproducerid
           # use the host in Viper.env file
          brokerhost=default_args['brokerhost']
           # use the port in Viper.env file
          brokerport=int(default_args['brokerport'])
           #if load balancing enter the microserviceid to route the HTTP to a specific machine
          microserviceid=default_args['microserviceid']

          # Maximum delay, in milliseconds, for VIPER to wait for Kafka to confirm the message is received and written to the topic (set by 'delay' in default_args)
          delay=int(default_args['delay'])
          # USE TLS encryption when sending to Kafka Cloud (GCP/AWS/Azure)
          enabletls=int(default_args['enabletls'])
          array=int(default_args['array'])
          saveasarray=int(default_args['saveasarray'])
          topicid=int(default_args['topicid'])

          rawdataoutput=int(default_args['rawdataoutput'])
          asynctimeout=int(default_args['asynctimeout'])
          timedelay=int(default_args['timedelay'])
          tmlfilepath=default_args['tmlfilepath']
          usemysql=int(default_args['usemysql'])

          rtmsstream=default_args['rtmsstream']
          identifier = default_args['identifier']
          searchterms=default_args['searchterms']
          rememberpastwindows = default_args['rememberpastwindows']
          patternwindowthreshold = default_args['patternwindowthreshold']

          rtmsscorethreshold = default_args['rtmsscorethreshold']
          rtmsscorethresholdtopic = default_args['rtmsscorethresholdtopic']
          attackscorethreshold = default_args['attackscorethreshold']
          attackscorethresholdtopic = default_args['attackscorethresholdtopic']
          patternscorethreshold = default_args['patternscorethreshold']
          patternscorethresholdtopic = default_args['patternscorethresholdtopic']
          rtmsmaxwindows=default_args['rtmsmaxwindows']

          searchterms = str(base64.b64encode(searchterms.encode('utf-8')))
          try:
                 result=maadstml.viperpreprocessrtms(VIPERTOKEN,VIPERHOST,VIPERPORT,topic,producerid,offset,maxrows,enabletls,delay,brokerhost,
                                                   brokerport,microserviceid,topicid,rtmsstream,searchterms,rememberpastwindows,identifier,
                                                   preprocesstopic,patternwindowthreshold,array,saveasarray,rawdataoutput,
                                                   rtmsscorethreshold,rtmsscorethresholdtopic,attackscorethreshold,
                                                   attackscorethresholdtopic,patternscorethreshold,patternscorethresholdtopic,rtmsmaxwindows)
 #                print(result)
          except Exception as e:
                 print("ERROR:",e)


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 # add any non-file search terms to the file search terms
 def updatesearchterms(searchtermsfile,regx):
     # check if search terms exist
     stcurr = default_args['searchterms']
     stcurrfile = searchtermsfile
     mainsearchterms=""

     if len(regx) > 0:
         for r in regx:
            mainsearchterms = mainsearchterms + r + "~~~"

     if stcurr != "":
        stcurrarr = stcurr.split("~~~")
        stcurrarrfile = stcurrfile.split("~~~")
        for a in stcurrarr:
           stcurrarrfile.append(a)
        stcurrarrfile = set(stcurrarrfile)
        mainsearchterms = mainsearchterms + '~~~'.join(stcurrarrfile)
        #mainsearchterms = mainsearchterms[:-1]
     else:
        stcurrarrfile = stcurrfile.split("~~~")
        stcurrarrfile = set(stcurrarrfile)
        mainsearchterms = mainsearchterms + '~~~'.join(stcurrarrfile)
        #mainsearchterms = mainsearchterms[:-1]


     return  mainsearchterms

 def ingestfiles():
     buf = default_args['localsearchtermfolder']
     interval=int(default_args['localsearchtermfolderinterval'])
     searchtermsfile = ""

     dirbuf = buf.split(",")
     if len(dirbuf) == 0:
        return

     while True:
      try:
       lg=""
       buf = default_args['localsearchtermfolder']
       interval=int(default_args['localsearchtermfolderinterval'])
       searchtermsfile = ""
       dirbuf = buf.split(",")
       rgx = []
       for dr in dirbuf:
          filenames = []
          linebuf=""
          ibx = []
          if dr != "":
             if dr[0]=='@':
               dr = dr[1:]
               lg="@"
             elif dr[0]=='|':
               dr = dr[1:]
               lg="|"
             else:
               lg="|"

          if os.path.isdir("/rawdata/{}".format(dr)):
            a = [os.path.join("/rawdata/{}".format(dr), f) for f in os.listdir("/rawdata/{}".format(dr)) if
            os.path.isfile(os.path.join("/rawdata/{}".format(dr), f))]
            filenames.extend(a)

          if len(filenames) > 0:
            filenames = set(filenames)

            for fdr in filenames:
              with open(fdr) as f:
               lines = [line.rstrip('\n').strip() for line in f]
               lines = set(lines)
               # check regex
               for m in lines:
                 if len(m) > 0:
                   if 'rgx:' in m and m[:4]=="rgx:":
                     rgx.append(m)
                   elif '~~~' in m and m[:3]=="~~~":
                     ibx.append(m)
                   else:
                     m=m.replace(",", " ")
                     if m[0] != "~":
                       linebuf = linebuf + m + ","

          if linebuf != "":
            linebuf = linebuf[:-1]
            searchtermsfile = searchtermsfile + lg + linebuf +"~~~"
          if len(ibx)>0:
             ibxs = ''.join(ibx)
             ibxs=ibxs[3:]
             searchtermsfile = searchtermsfile + ibxs +"~~~"

       if searchtermsfile != "":
         searchtermsfile = searchtermsfile[:-3]
         searchtermsfile=updatesearchterms(searchtermsfile,rgx)
         default_args['searchterms']=searchtermsfile
         print("INFO:", searchtermsfile)

       if interval==0:
         break
       else:
        time.sleep(interval)
      except Exception as e:
        print("ERROR: ingesting files:",e)
        continue


 def startdirread():
   if 'localsearchtermfolder' not in default_args:
      return

   if default_args['localsearchtermfolder'] != '' and default_args['localsearchtermfolderinterval'] != '':
     print("INFO startdirread")
     try:
       t = threading.Thread(name='child procs', target=ingestfiles)
       t.start()
     except Exception as e:
       print(e)

 def dopreprocessing(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS3".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS3".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        ti = context['task_instance']
        ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_delay".format(sname), value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_array".format(sname), value="_{}".format(default_args['array']))
        ti.xcom_push(key="{}_saveasarray".format(sname), value="_{}".format(default_args['saveasarray']))
        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_rawdataoutput".format(sname), value="_{}".format(default_args['rawdataoutput']))
        ti.xcom_push(key="{}_asynctimeout".format(sname), value="_{}".format(default_args['asynctimeout']))
        ti.xcom_push(key="{}_timedelay".format(sname), value="_{}".format(default_args['timedelay']))
        ti.xcom_push(key="{}_usemysql".format(sname), value="_{}".format(default_args['usemysql']))
        ti.xcom_push(key="{}_identifier".format(sname), value=default_args['identifier'])

        ti.xcom_push(key="{}_rtmsscorethresholdtopic".format(sname), value=default_args['rtmsscorethresholdtopic'])
        ti.xcom_push(key="{}_attackscorethresholdtopic".format(sname), value=default_args['attackscorethresholdtopic'])
        ti.xcom_push(key="{}_patternscorethresholdtopic".format(sname), value=default_args['patternscorethresholdtopic'])

        localsearchtermfolder=default_args['localsearchtermfolder']
        if 'step4clocalsearchtermfolder' in os.environ:
          ti.xcom_push(key="{}_localsearchtermfolder".format(sname), value=os.environ['step4clocalsearchtermfolder'])
          localsearchtermfolder=os.environ['step4clocalsearchtermfolder']
        else:
         ti.xcom_push(key="{}_localsearchtermfolder".format(sname), value=default_args['localsearchtermfolder'])

        localsearchtermfolderinterval=default_args['localsearchtermfolderinterval']
        if 'step4clocalsearchtermfolderinterval' in os.environ:
          ti.xcom_push(key="{}_localsearchtermfolderinterval".format(sname), value=os.environ['step4clocalsearchtermfolderinterval'])
          localsearchtermfolderinterval=os.environ['step4clocalsearchtermfolderinterval']
        else:
         ti.xcom_push(key="{}_localsearchtermfolderinterval".format(sname), value="_{}".format(default_args['localsearchtermfolderinterval']))

        rtmsstream=default_args['rtmsstream']
        if 'step4crtmsstream' in os.environ:
          ti.xcom_push(key="{}_rtmsstream".format(sname), value=os.environ['step4crtmsstream'])
          rtmsstream=os.environ['step4crtmsstream']
        else:
          ti.xcom_push(key="{}_rtmsstream".format(sname), value=default_args['rtmsstream'])

        maxrows=default_args['maxrows']
        if 'step4cmaxrows' in os.environ:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(os.environ['step4cmaxrows']))
          maxrows=os.environ['step4cmaxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname), value="_{}".format(default_args['maxrows']))

        searchterms=default_args['searchterms']
        if 'step4csearchterms' in os.environ:
          ti.xcom_push(key="{}_searchterms".format(sname), value="{}".format(os.environ['step4csearchterms']))
          searchterms=os.environ['step4csearchterms']
        else:
          ti.xcom_push(key="{}_searchterms".format(sname), value=default_args['searchterms'])

        raw_data_topic=default_args['raw_data_topic']
        if 'step4crawdatatopic' in os.environ:
          ti.xcom_push(key="{}_raw_data_topic".format(sname), value="{}".format(os.environ['step4crawdatatopic']))
          raw_data_topic=os.environ['step4crawdatatopic']
        else:
          ti.xcom_push(key="{}_raw_data_topic".format(sname), value=default_args['raw_data_topic'])

        rememberpastwindows=default_args['rememberpastwindows']
        if 'step4crememberpastwindows' in os.environ:
          ti.xcom_push(key="{}_rememberpastwindows".format(sname), value="_{}".format(os.environ['step4crememberpastwindows']))
          rememberpastwindows=os.environ['step4crememberpastwindows']
        else:
          ti.xcom_push(key="{}_rememberpastwindows".format(sname), value="_{}".format(default_args['rememberpastwindows']))

        patternwindowthreshold=default_args['patternwindowthreshold']
        if 'step4cpatternwindowthreshold' in os.environ:
          ti.xcom_push(key="{}_patternwindowthreshold".format(sname), value="_{}".format(os.environ['step4cpatternwindowthreshold']))
          patternwindowthreshold=os.environ['step4cpatternwindowthreshold']
        else:
          ti.xcom_push(key="{}_patternwindowthreshold".format(sname), value="_{}".format(default_args['patternwindowthreshold']))

        rtmsscorethreshold=default_args['rtmsscorethreshold']
        if 'step4crtmsscorethreshold' in os.environ:
          ti.xcom_push(key="{}_rtmsscorethreshold".format(sname), value="_{}".format(os.environ['step4crtmsscorethreshold']))
          rtmsscorethreshold=os.environ['step4crtmsscorethreshold']
        else:
          ti.xcom_push(key="{}_rtmsscorethreshold".format(sname), value="_{}".format(default_args['rtmsscorethreshold']))

        attackscorethreshold=default_args['attackscorethreshold']
        if 'step4cattackscorethreshold' in os.environ:
          ti.xcom_push(key="{}_attackscorethreshold".format(sname), value="_{}".format(os.environ['step4cattackscorethreshold']))
          attackscorethreshold=os.environ['step4cattackscorethreshold']
        else:
          ti.xcom_push(key="{}_attackscorethreshold".format(sname), value="_{}".format(default_args['attackscorethreshold']))

        patternscorethreshold=default_args['patternscorethreshold']
        if 'step4cpatternscorethreshold' in os.environ:
          ti.xcom_push(key="{}_patternscorethreshold".format(sname), value="_{}".format(os.environ['step4cpatternscorethreshold']))
          patternscorethreshold=os.environ['step4cpatternscorethreshold']
        else:
          ti.xcom_push(key="{}_patternscorethreshold".format(sname), value="_{}".format(default_args['patternscorethreshold']))

        rtmsfoldername=default_args['rtmsfoldername']
        if 'step4crtmsfoldername' in os.environ:
          ti.xcom_push(key="{}_rtmsfoldername".format(sname), value="{}".format(os.environ['step4crtmsfoldername']))
          rtmsfoldername=os.environ['step4crtmsfoldername']
        else:
          ti.xcom_push(key="{}_rtmsfoldername".format(sname), value="{}".format(default_args['rtmsfoldername']))
        os.environ["step4crtmsfoldername"] = rtmsfoldername
        try:
          f = open("/tmux/rtmsfoldername.txt", "w")
          f.write(rtmsfoldername)
          f.close()
        except Exception as e:
          pass

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        if 'step4crtmsmaxwindows' in os.environ:
           rtmsmaxwindows=os.environ['step4crtmsmaxwindows']
           default_args['rtmsmaxwindows']=rtmsmaxwindows
        else:
           rtmsmaxwindows = default_args['rtmsmaxwindows']
        ti.xcom_push(key="{}_rtmsmaxwindows".format(sname), value="_{}".format(rtmsmaxwindows))
        try:
          f = open("/tmux/rtmsmax.txt", "w")
          f.write(rtmsmaxwindows)
          f.close()
        except Exception as e:
          pass

        wn = windowname('preprocess3',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess3", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {} \"{}\" {} {} \"{}\" \"{}\" {} {} {} \"{}\" {} \"{}\" {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],maxrows,searchterms,rememberpastwindows,patternwindowthreshold,raw_data_topic,rtmsstream,rtmsscorethreshold,attackscorethreshold,patternscorethreshold,localsearchtermfolder,localsearchtermfolderinterval,rtmsfoldername,rtmsmaxwindows), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Preprocessing3 DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         maxrows =  sys.argv[5]
         default_args['maxrows'] = maxrows
         subprocess.Popen("/tmux/rtmstrunc.sh", shell=True)

         searchterms =  sys.argv[6]
         default_args['searchterms'] = searchterms
         rememberpastwindows =  sys.argv[7]
         default_args['rememberpastwindows'] = rememberpastwindows
         patternwindowthreshold =  sys.argv[8]
         default_args['patternwindowthreshold'] = patternwindowthreshold
         rawdatatopic =  sys.argv[9]
         default_args['raw_data_topic'] = rawdatatopic
         rtmsstream =  sys.argv[10]
         default_args['rtmsstream'] = rtmsstream

         rtmsscorethreshold =  sys.argv[11]
         default_args['rtmsscorethreshold'] = rtmsscorethreshold
         attackscorethreshold =  sys.argv[12]
         default_args['attackscorethreshold'] = attackscorethreshold
         patternscorethreshold =  sys.argv[13]
         default_args['patternscorethreshold'] = patternscorethreshold

         localsearchtermfolder =  sys.argv[14]
         default_args['localsearchtermfolder'] = localsearchtermfolder
         localsearchtermfolderinterval =  sys.argv[15]
         default_args['localsearchtermfolderinterval'] = localsearchtermfolderinterval
         rtmsfoldername =  sys.argv[16]
         default_args['rtmsfoldername'] = rtmsfoldername
         rtmsmaxwindows =  sys.argv[17]
         default_args['rtmsmaxwindows'] = rtmsmaxwindows

         tsslogging.locallogs("INFO", "STEP 4c: Preprocessing 3 started")
         try:
           shutil.rmtree("/rawdata/{}".format(rtmsfoldername),ignore_errors=True)
         except Exception as e:
            pass

         try:
          directory="/rawdata/{}".format(rtmsfoldername)
          if not os.path.exists(directory):
             os.makedirs(directory)
         except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4c: Cannot make directory /rawdata/{} in {} {}".format(rtmsfoldername,os.path.basename(__file__),e))

         startdirread()
         while True:
           try:
             processtransactiondata()
             time.sleep(1)
           except Exception as e:
            tsslogging.locallogs("ERROR", "STEP 4c: Preprocessing3 DAG in {} {}".format(os.path.basename(__file__),e))
            tsslogging.tsslogit("Preprocessing3 DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
            break
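
The dopreprocessing function above repeats one pattern for most settings: if an environment variable named step4c<setting> exists, its value overrides the one in default_args. The helper below condenses that pattern. The name resolve_setting is hypothetical and is not part of the TML codebase; the environment variable names are the ones read in the listing.

```python
import os

# Subset of the defaults from the listing above
default_args = {"maxrows": "200", "rtmsstream": "rtms-stream-mylogs"}


def resolve_setting(name, env_name, environ=os.environ):
    """Return the environment override if present, else the default_args value."""
    return environ.get(env_name, default_args[name])


# Simulate an override by passing an explicit mapping instead of os.environ
fake_env = {"step4cmaxrows": "500"}
print(resolve_setting("maxrows", "step4cmaxrows", fake_env))        # 500
print(resolve_setting("rtmsstream", "step4crtmsstream", fake_env))  # rtms-stream-mylogs
```

This is why a value set through the container environment takes precedence over the value edited in the DAG file.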

7.5.11.1. Core Parameters in Step 4c

==============================  ==============================================================
Parameter                       Description
==============================  ==============================================================
rtmsstream                      The Kafka topic to which you stream your text data in STEP 3,
                                if using a local file. If you are streaming directly from
                                LogStash, just enter the Kafka topic name. You can also
                                separate multiple topics with a comma.

searchterms                     The search terms to look for in the data streaming to
                                rtmsstream. Multiple terms must be separated by a comma.
                                To specify AND, the first character must be @; for OR,
                                use |.

                                If you are cross-referencing entities, use --entity-- and
                                TML will replace --entity-- with the actual entity in the
                                raw_data_topic. NOTE: if you DO NOT include --entity--,
                                TML will search the rtmsstream as usual.

                                NOTE: You can specify search terms for different topics by
                                separating them with ~~~ (the ~ character THREE (3) times).
                                For example, if rtmsstream=topic1,topic2 and
                                searchterms=search1 ~~~ search2, then TML will apply
                                search1 to topic1 and search2 to topic2. This is
                                convenient for more complex and varied logs.

rememberpastwindows             The number of past sliding time windows you want TML to
                                remember. This is where TML captures memory of past events.

patternwindowthreshold          The threshold for patterns in the data. For example, if you
                                are looking for ‘authentication failures’ and
                                patternwindowthreshold=10, then 10 or more occurrences of
                                ‘authentication failures’ will affect the pattern score.

localsearchtermfolder           Folders containing search terms. These are local folders
                                that must exist under the /rawdata mapping you set up when
                                you started the TSS container: refer to TSS Docker Run.
                                TML reads these folders at the interval, in seconds, set
                                in localsearchtermfolderinterval. This is convenient for
                                updating search terms in real-time to manage evolving
                                threats or frequently changing events.

localsearchtermfolderinterval   The number of seconds between reads of the search term
                                files in the localsearchtermfolder. The TML RTMS solution
                                will update the search terms in real-time.

rtmsscorethreshold              The score threshold for RTMS, e.g. 0.8

rtmsscorethresholdtopic         This topic will contain all messages exceeding
                                rtmsscorethreshold. This is convenient for setting up
                                alerts on this topic.

attackscorethreshold            The score threshold for the Attack score, e.g. 0.8

attackscorethresholdtopic       This topic will contain all messages exceeding
                                attackscorethreshold. This is convenient for setting up
                                alerts on this topic.

patternscorethreshold           The score threshold for the Pattern score, e.g. 0.8

patternscorethresholdtopic      This topic will contain all messages exceeding
                                patternscorethreshold. This is convenient for setting up
                                alerts on this topic.

rtmsfoldername                  The folder where RTMS stores the output of the log files
                                analysed. The rtmsfoldername is a subfolder of the /rawdata
                                TSS container folder: you MUST volume map a local folder
                                to /rawdata when you start your TSS container. Refer to
                                TSS Docker Run. Also refer to RTMS for further details.
==============================  ==============================================================
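
To make the searchterms syntax concrete, the sketch below splits a value on ~~~, reads the leading @ (AND) or | (OR) operator of each group, and substitutes --entity--. The function parse_searchterms and the sample entity device42 are hypothetical; this illustrates the syntax described in the table and is not how the TML binaries parse it.

```python
def parse_searchterms(searchterms, entity):
    """Split a searchterms string into one (operator, terms) pair per ~~~ group."""
    groups = []
    for group in searchterms.split("~~~"):
        group = group.strip()
        if group.startswith("rgx:"):
            # Regex groups are kept whole so commas inside the pattern survive
            groups.append(("REGEX", [group[4:]]))
            continue
        operator = "OR"  # OR is the default when no prefix is given
        if group.startswith("@"):
            operator, group = "AND", group[1:]
        elif group.startswith("|"):
            group = group[1:]
        terms = [
            term.strip().replace("--entity--", entity)
            for term in group.split(",")
            if term.strip()
        ]
        groups.append((operator, terms))
    return groups


example = (
    "rgx:p([a-z]+)ch ~~~ |authentication failure,"
    "--entity-- password failure ~~~ |unknown--entity--"
)
for operator, terms in parse_searchterms(example, entity="device42"):
    print(operator, terms)
```

With rtmsstream=topic1,topic2, the first group would apply to topic1 and the second to topic2, as described in the table.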

Important

Your log files are ingested in STEP 3: Produce to Kafka. Specifically, in STEP 3:

‘docfolder’ : ‘mylogs,mylogs2’ specifies the subfolders that contain your log files; in this example, mylogs and mylogs2.

You can specify different folder names and add as many files to these folder(s) as you like; RTMS will automatically read and process them.

For more details refer here.

Tip

You can use regex statements in the search terms. This allows you to build powerful regex expressions to filter log files.

If using regex expressions, you must prefix the expression with rgx:. For example, rgx:p([a-z]+)ch

A regex expression should be the only statement between ~~~ separators; this is important if your regex contains a comma.
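
As a quick check of what a pattern such as p([a-z]+)ch matches, the snippet below strips the rgx: prefix and applies the pattern with Python's re module. Python regex semantics are assumed purely for illustration; the regex engine used by TML may differ in details, and the sample log lines are made up.

```python
import re

term = "rgx:p([a-z]+)ch"
pattern = re.compile(term[len("rgx:"):])  # drop the rgx: prefix

# Made-up log lines for illustration
lines = ["applied patch 1.2", "user punch-in at 09:00", "disk healthy"]
matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['applied patch 1.2', 'user punch-in at 09:00']
```

Testing a pattern this way before adding it to searchterms, or to a search term file, can help catch expressions that match more than intended.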

7.5.12. STEP 5: Entity Based Machine Learning : tml-system-step-5-kafka-machine-learning-dag

Tip

Watch the YouTube video to learn how to configure this Step 5 dag. YouTube Video

7.5.12.1. Entity Based Machine Learning By TML

Another powerful feature of TML is performing machine learning at the entity level. See TML Performs Entity Level Machine Learning and Processing for a refresher. For example, if TML is processing real-time data from 1 million IoT devices, it can create 1 million individual machine learning models, one for each device. TML uses the following ML algorithms:

Note

All ML data are also written to “/rawdata/ml” folder in the container.

If you mapped the rawdata folder then you can access these files.

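===================  ==================================================
Algorithm            Description
===================  ==================================================
Logistic Regression  Performs classification and predicts probabilities

Linear Regression    Performs linear regression using the OLS algorithm

Gradient Boosting    Gradient boosting for non-linear real-time data

Ridge Regression     Ridge Regression for non-linear real-time data

Neural networks      Neural networks for non-linear real-time data
===================  ==================================================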
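
The idea of one model per entity can be sketched as follows: records are grouped by entity, and a separate ordinary least squares line is fitted for each group. The data, the entity names, and the functions fit_ols and fit_per_entity are invented for this illustration; TML's own training runs inside its binaries and is configured through the DAG parameters shown below.

```python
from collections import defaultdict


def fit_ols(points):
    """Fit y = a + b*x by ordinary least squares; returns (a, b).

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_entity(records):
    """Group (entity, x, y) records by entity and fit one model per entity."""
    grouped = defaultdict(list)
    for entity, x, y in records:
        grouped[entity].append((x, y))
    return {entity: fit_ols(points) for entity, points in grouped.items()}


records = [
    ("device-1", 0.0, 1.0), ("device-1", 1.0, 3.0), ("device-1", 2.0, 5.0),
    ("device-2", 0.0, 10.0), ("device-2", 1.0, 9.0), ("device-2", 2.0, 8.0),
]
for entity, (intercept, slope) in fit_per_entity(records).items():
    print(entity, round(intercept, 3), round(slope, 3))
```

Each device ends up with its own coefficients, so a trend in one device does not distort the model of another.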

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import maadstml
 import tsslogging
 import os
 import subprocess
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'myname' : 'Sebastian Maurice',   # <<< *** Change as needed
   'enabletls': '1',   # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< *** leave blank
   'producerid' : 'iotsolution',    # <<< *** Change as needed
   'preprocess_data_topic' : 'iot-preprocess', # << *** topic/data to use for training datasets - You created this in STEP 2
   'ml_data_topic' : 'ml-data', # topic to store the trained algorithms  - You created this in STEP 2
   'identifier' : 'TML solution',    # <<< *** Change as needed
   'companyname' : 'Your company', # <<< *** Change as needed
   'myemail' : 'Your email', # <<< *** Change as needed
   'mylocation' : 'Your location', # <<< *** Change as needed
   'brokerhost' : '', # <<< *** Leave as is
   'brokerport' : '-999', # <<< *** Leave as is
   'deploy' : '1', # <<< *** do not modify
   'modelruns': '100', # <<< *** Change as needed
   'offset' : '-1', # <<< *** Do not modify
   'islogistic' : '1',  # <<< *** Change as needed, 1=logistic, 0=not logistic
   'networktimeout' : '600', # <<< *** Change as needed
   'modelsearchtuner' : '90', # <<< *This parameter will attempt to fine tune the model search space - A number close to 100 means you will have fewer models but their predictive quality will be higher.
   'dependentvariable' : 'failure', # <<< *** Change as needed,
   'independentvariables': 'Power_preprocessed_AnomProb', # <<< *** Change as needed,
   'rollbackoffsets' : '1000', # <<< *** Change as needed,
   'consumeridtrainingdata2': '', # leave blank
   'partition_training' : '',  # leave blank
   'consumefrom' : '',  # leave blank
   'topicid' : '-1',  # leave as is
   'fullpathtotrainingdata' : '/Viper-ml/viperlogs/iotlogistic',  #  # <<< *** Change as needed - add name for foldername that stores the training datasets
   'processlogic' : 'classification_name=failure_prob:Power_preprocessed_AnomProb=55,n',  # <<< *** Change as needed, i.e. classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n:Current_preprocessed_AnomProb=55,n
   'array' : '0',  # leave as is
   'transformtype' : '', # Sets the model to: log-lin,lin-log,log-log
   'sendcoefto' : '',  # you can send coefficients to another topic for further processing -- MUST BE SET IN STEP 2
   'coeftoprocess' : '', # indicate the index of the coefficients to process i.e. 0,1,2 For example, for 3 estimated parameters 0=constant, 1,2 are the other estimated parameters
   'coefsubtopicnames' : '',  # Give the coefficients a name: constant,elasticity,elasticity2
   'viperconfigfile' : '/Viper-ml/viper.env', # Do not modify
   'HPDEADDR' : 'http://'
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 # This sets the lat/longs for the IoT devices so they can be mapped
 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HPDEHOST = ''
 HPDEPORT = ''
 HTTPADDR=""
 maintopic =  default_args['preprocess_data_topic']
 mainproducerid = default_args['producerid']

 def performSupervisedMachineLearning():

       viperconfigfile = default_args['viperconfigfile']
       # Set personal data
       companyname=default_args['companyname']
       myname=default_args['myname']
       myemail=default_args['myemail']
       mylocation=default_args['mylocation']

       # Enable SSL/TLS communication with Kafka
       enabletls=int(default_args['enabletls'])
       # If brokerhost is empty then this function will use the brokerhost address in your
       # VIPER.ENV in the field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerhost=default_args['brokerhost']
       # If this is -999 then this function uses the port address for Kafka in VIPER.ENV in the
       # field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerport=int(default_args['brokerport'])
       # If you are using a reverse proxy to reach VIPER then you can put it here - otherwise if
       # empty then no reverse proxy is being used
       microserviceid=default_args['microserviceid']

       #############################################################################################################
       #                         VIPER CALLS HPDE TO PERFORM REAL_TIME MACHINE LEARNING ON TRAINING DATA


       # deploy the algorithm to ./deploy folder - otherwise it will be in ./models folder
       deploy=int(default_args['deploy'])
       # number of models runs to find the best algorithm
       modelruns=int(default_args['modelruns'])
       # Go to the last offset of the partition in partition_training variable
       offset=int(default_args['offset'])
       # If 0, this is not a logistic model; if 1, the dependent variable is discrete (1 or 0)
       islogistic=int(default_args['islogistic'])
       # set network timeout for communication between VIPER and HPDE in seconds
       # increase this number if you timeout
       networktimeout=int(default_args['networktimeout'])

       # This parameter will attempt to fine tune the model search space - a number close to 0 means you will have lots of
       # models but their quality may be low.  A number close to 100 means you will have fewer models but their predictive
       # quality will be higher.
       modelsearchtuner=int(default_args['modelsearchtuner'])

       #this is the dependent variable
       dependentvariable=default_args['dependentvariable']
       # Assign the independentvariable streams
       independentvariables=default_args['independentvariables'] #"Voltage_preprocessed_AnomProb,Current_preprocessed_AnomProb"

       rollbackoffsets=int(default_args['rollbackoffsets'])
       consumeridtrainingdata2=default_args['consumeridtrainingdata2']
       partition_training=default_args['partition_training']
       producerid=default_args['producerid']
       consumefrom=default_args['consumefrom']

       topicid=int(default_args['topicid'])
       fullpathtotrainingdata=default_args['fullpathtotrainingdata']

      # These are the conditions that set the dependent variable to 1 - if the conditions are not met it will be 0
       processlogic=default_args['processlogic'] #'classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n:Current_preprocessed_AnomProb=55,n'

       identifier=default_args['identifier']

       producetotopic = default_args['ml_data_topic']

       array=int(default_args['array'])
       transformtype=default_args['transformtype'] # Sets the model to: log-lin,lin-log,log-log
       sendcoefto=default_args['sendcoefto']  # you can send coefficients to another topic for further processing
       coeftoprocess=default_args['coeftoprocess']  # indicate the index of the coefficients to process i.e. 0,1,2
       coefsubtopicnames=default_args['coefsubtopicnames']  # Give the coefficients a name: constant,elasticity,elasticity2


      # Call HPDE to train the model
       result=maadstml.viperhpdetraining(VIPERTOKEN,VIPERHOST,VIPERPORT,consumefrom,producetotopic,
                                       companyname,consumeridtrainingdata2,producerid, HPDEHOST,
                                       viperconfigfile,enabletls,partition_training,
                                       deploy,modelruns,modelsearchtuner,HPDEPORT,offset,islogistic,
                                       brokerhost,brokerport,networktimeout,microserviceid,topicid,maintopic,
                                       independentvariables,dependentvariable,rollbackoffsets,fullpathtotrainingdata,processlogic,identifier)


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startml(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTML".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTML".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))
        HPDEADDR = default_args['HPDEADDR']

        HPDEHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOST".format(sname))
        HPDEPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORT".format(sname))
        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

        ti = context['task_instance']
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname), value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_ml_data_topic".format(sname), value=default_args['ml_data_topic'])
        ti.xcom_push(key="{}_modelruns".format(sname), value="_{}".format(default_args['modelruns']))
        ti.xcom_push(key="{}_offset".format(sname), value="_{}".format(default_args['offset']))
        ti.xcom_push(key="{}_islogistic".format(sname), value="_{}".format(default_args['islogistic']))
        ti.xcom_push(key="{}_networktimeout".format(sname), value="_{}".format(default_args['networktimeout']))
        ti.xcom_push(key="{}_modelsearchtuner".format(sname), value="_{}".format(default_args['modelsearchtuner']))
        ti.xcom_push(key="{}_dependentvariable".format(sname), value=default_args['dependentvariable'])
        ti.xcom_push(key="{}_independentvariables".format(sname), value=default_args['independentvariables'])

        rollback=default_args['rollbackoffsets']
        if 'step5rollbackoffsets' in os.environ:
          ti.xcom_push(key="{}_rollbackoffsets".format(sname), value="_{}".format(os.environ['step5rollbackoffsets']))
          rollback=os.environ['step5rollbackoffsets']
        else:
          ti.xcom_push(key="{}_rollbackoffsets".format(sname), value="_{}".format(default_args['rollbackoffsets']))

        processlogic=default_args['processlogic']
        if 'step5processlogic' in os.environ:
          ti.xcom_push(key="{}_processlogic".format(sname), value="{}".format(os.environ['step5processlogic']))
          processlogic=os.environ['step5processlogic']
        else:
          ti.xcom_push(key="{}_processlogic".format(sname), value="{}".format(default_args['processlogic']))

        independentvariables=default_args['independentvariables']
        if 'step5independentvariables' in os.environ:
          ti.xcom_push(key="{}_independentvariables".format(sname), value="{}".format(os.environ['step5independentvariables']))
          independentvariables=os.environ['step5independentvariables']
        else:
          ti.xcom_push(key="{}_independentvariables".format(sname), value="{}".format(default_args['independentvariables']))


        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_consumefrom".format(sname), value=default_args['consumefrom'])
        ti.xcom_push(key="{}_fullpathtotrainingdata".format(sname), value=default_args['fullpathtotrainingdata'])
        ti.xcom_push(key="{}_transformtype".format(sname), value=default_args['transformtype'])
        ti.xcom_push(key="{}_sendcoefto".format(sname), value=default_args['sendcoefto'])
        ti.xcom_push(key="{}_coeftoprocess".format(sname), value=default_args['coeftoprocess'])
        ti.xcom_push(key="{}_coefsubtopicnames".format(sname), value=default_args['coefsubtopicnames'])
        ti.xcom_push(key="{}_HPDEADDR".format(sname), value=HPDEADDR)

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('ml',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-ml", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {}{} {} {} \"{}\" \"{}\"".format(fullpath,VIPERTOKEN, HTTPADDR, VIPERHOST, VIPERPORT[1:], HPDEADDR, HPDEHOST, HPDEPORT[1:],rollback,processlogic,independentvariables), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()
         try:
           tsslogging.tsslogit("Machine Learning DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         HPDEHOST = sys.argv[5]
         HPDEPORT = sys.argv[6]
         rollbackoffsets =  sys.argv[7]
         default_args['rollbackoffsets'] = rollbackoffsets
         processlogic =  sys.argv[8]
         default_args['processlogic'] = processlogic
         independentvariables =  sys.argv[9]
         default_args['independentvariables'] = independentvariables
         subprocess.run("rm -rf {}".format(default_args['fullpathtotrainingdata']), shell=True)

         tsslogging.locallogs("INFO", "STEP 5: Machine learning started")

         while True:
          try:
           performSupervisedMachineLearning()
 #          time.sleep(10)
          except Exception as e:
           tsslogging.locallogs("ERROR", "STEP 5: Machine Learning DAG in {} {}".format(os.path.basename(__file__),e))
           tsslogging.tsslogit("Machine Learning DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
           break
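
The startml function above checks three environment variables (step5rollbackoffsets, step5processlogic, step5independentvariables) before falling back to the values in default_args. That override pattern can be sketched as follows. The helper name resolve_setting is hypothetical and is not part of the TML code; it only illustrates the lookup order.

```python
import os

def resolve_setting(name, defaults, environ=os.environ, prefix="step5"):
    """Return the environment override for `name` if set, else the default.

    Hypothetical helper: mirrors the lookup order used in startml, where
    e.g. 'step5rollbackoffsets' in the environment wins over
    default_args['rollbackoffsets'].
    """
    env_key = "{}{}".format(prefix, name)
    if env_key in environ:
        return environ[env_key]
    return defaults[name]

# Example values copied from default_args in the DAG above.
example_defaults = {
    "rollbackoffsets": "1000",
    "independentvariables": "Power_preprocessed_AnomProb",
}

# No override set: the default is used.
no_override = resolve_setting("rollbackoffsets", example_defaults, environ={})
# Override set: the environment value is used.
with_override = resolve_setting(
    "rollbackoffsets", example_defaults, environ={"step5rollbackoffsets": "500"}
)
```

Setting the environment variable before the DAG runs therefore changes the training behaviour without editing the DAG file.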

7.5.12.2. Additional Details on Machine Learning

Entity based machine learning is a core function of TML. This section discusses some of the key default_args in the tml-system-step-5-kafka-machine-learning-dag. These are as follows.

Important

TML generates trained algorithms and stores them on disk in the ./models or ./deploy folder, and in the Kafka topic specified by the ml_data_topic key in default_args. For predictions, TML automatically accesses the trained algorithm for each entity specified by topicid. Everything is managed by the TML binary Viper (see 1. TML Components: Three Binaries).

TML manages the topicid, which represents individual entities, in MariaDB. A topicid is uniquely associated with a primary identifier for the device or entity, such as its Device Serial Number (DSN). So as data streams from all devices, each message must contain a JSON key that indicates the DSN of the device. The TML binary Viper aggregates the data for each DSN and processes the data for each device in every sliding time window.
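
To make the per-entity aggregation concrete, here is a minimal sketch of grouping JSON messages by DSN. It is not how Viper is implemented; it only assumes that each message carries the DSN at the JSON path metadata.dsn and the reading at datapoint.value, the paths that appear in the Identifier field of the sample output in this section. The function name group_by_dsn is hypothetical.

```python
import json
from collections import defaultdict

def group_by_dsn(raw_messages):
    """Group datapoint values by device serial number (DSN).

    Assumes each message is a JSON string with the DSN at metadata.dsn
    and the reading at datapoint.value (illustrative assumption).
    """
    groups = defaultdict(list)
    for raw in raw_messages:
        msg = json.loads(raw)
        dsn = msg["metadata"]["dsn"]
        groups[dsn].append(msg["datapoint"]["value"])
    return dict(groups)

# Three messages from two devices in one sliding time window (made-up values).
window = [
    '{"metadata": {"dsn": "AC000W018740175"}, "datapoint": {"value": 120743}}',
    '{"metadata": {"dsn": "AC000W020485383"}, "datapoint": {"value": 98000}}',
    '{"metadata": {"dsn": "AC000W018740175"}, "datapoint": {"value": 120929}}',
]
per_device = group_by_dsn(window)
```

Each key in per_device corresponds to one entity, which is the unit TML trains and predicts on.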

TML generates trained algorithms for each sliding time window. As new real-time data is captured in the sliding time windows, TML re-runs the algorithms on the new window to see whether there is a better algorithm, using the MAPE-based forecast accuracy. If the forecast accuracy of the algorithm from the previous sliding time window is higher than that of the algorithm from the next window, the older algorithm is used in the next window; otherwise TML overwrites the older algorithm with the newer, better algorithm. NOTE: TML generates brand new algorithms for sliding windows; it is NOT simply updating the estimated parameters of ONE algorithm, as is common in conventional approaches.
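
The keep-or-overwrite rule described above can be sketched as a small comparison. This is an illustration, not Viper's code: the function name select_algorithm is hypothetical, and it assumes each algorithm is represented by a dictionary with the Forecastaccuracy field shown in the sample JSON output, where a higher value is better.

```python
def select_algorithm(previous, candidate):
    """Return the algorithm to use for the next sliding time window.

    The previous algorithm is kept only when its forecast accuracy is
    strictly higher than the candidate's; otherwise the candidate
    replaces it. `previous` may be None for the very first window.
    """
    if previous is None:
        return candidate
    if previous["Forecastaccuracy"] > candidate["Forecastaccuracy"]:
        return previous
    return candidate

# Made-up accuracy values for two consecutive windows.
window_1 = {"Algo": "StreamConsumer_topicid59_jsonlgt", "Forecastaccuracy": 0.747}
window_2 = {"Algo": "StreamConsumer_topicid59_jsonlgt", "Forecastaccuracy": 0.70}
window_3 = {"Algo": "StreamConsumer_topicid59_jsonlgt", "Forecastaccuracy": 0.81}

kept = select_algorithm(window_1, window_2)      # older algorithm is kept
replaced = select_algorithm(kept, window_3)      # newer algorithm wins
```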

All algorithms are JSON-serialized files that are less than 1K in size. This makes it very efficient to store millions of algorithms on disk without consuming much storage.

All training and predictions happen in parallel using different instances of the Viper binary.

Here are the core parameters in the above dag 5:

Step 5 DAG parameter

Explanation

modelruns

Instructs HPDE to search for the best trained algorithm among many candidates. For example, if modelruns=100, it iterates over 100 models and then selects the best of these 100 models. It performs hyperparameter tuning as well.

islogistic

TML can do classification and regression. If islogistic=1, TML assumes the dependent variable is a binary variable with value 1 or 0; if islogistic=0, it assumes the dependent variable is continuous.

modelsearchtuner

This parameter attempts to fine-tune the model search space. A number close to 0 means you will have lots of models but their quality may be low. A number close to 100 means you will have fewer models but their predictive quality will be higher.

dependentvariable

Specify the JSON path of the dependent variable in your JSON message. Refer to Json Path Example. If using preprocessed variables, refer to Preprocessed Variable Naming Standard.

independentvariables

You must specify the independent variables (separate multiple variables by a comma). Refer to the Json Path Example. If using preprocessed variables, refer to Preprocessed Variable Naming Standard.

topicid

The topicid is an internal directive for TML. If set to -1, it tells the TML Viper binary to process JSON messages by their unique identifier. Usually, leaving this at -1 is fine.

fullpathtotrainingdata

You must specify the full path where the training dataset will be stored on disk. The format of the path is /Viper-ml/viperlogs/<choose foldername>, where you specify the foldername.

processlogic

This is the logic that defines the dependent variable if you are estimating a logistic model. Specifically, if the conditions in your logic are TRUE, the dependent variable is set to 1, otherwise it is set to 0. For example, classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n:Current_preprocessed_AnomProb=55,n means: if the preprocessed variable Voltage_preprocessed_AnomProb is greater than 55, and Current_preprocessed_AnomProb is greater than 55, then set the dependent variable failure_prob to 1, otherwise set it to 0. The values n and -n indicate no upper bound and no lower bound, respectively. If you want less than 55, use classification_name=failure_prob:Voltage_preprocessed_AnomProb=-n,55:Current_preprocessed_AnomProb=-n,55. Note: classification_name must be specified; the name of the dependent variable, failure_prob, can be changed to any name you want. Real-time logistic regression is a very powerful way to perform probability predictions on real-time data generated by devices.

transformtype

You can specify a transformation of your machine learning model: log-lin, lin-log, or log-log. log-lin: take the log of the dependent variable, and leave the independent variables as is. lin-log: leave the dependent variable as is, but take the log of the independent variables. log-log: take the log of the dependent variable and of the independent variables.

sendcoefto

You can send the coefficients of each trained model to another Kafka topic. This topic MUST BE SET IN STEP 2.

coeftoprocess

You can specify which coefficients to process, i.e. 0,1,2. For example, for 3 estimated parameters, 0=constant and 1,2 are the other estimated parameters.

coefsubtopicnames

You can give names to the coefficients in your model: constant, elasticity, elasticity2
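
As an illustration of the three transformtype options in the table above, the sketch below applies the corresponding log transformations to one observation. It assumes the natural logarithm and strictly positive values; the source does not state which log base HPDE uses, and the function name apply_transform is hypothetical.

```python
import math

def apply_transform(y, xs, transformtype=""):
    """Transform one observation according to transformtype.

    y  : dependent variable value
    xs : list of independent variable values
    Assumes natural log and positive inputs (illustrative assumption).
    """
    log_y = transformtype in ("log-lin", "log-log")   # log of dependent variable
    log_x = transformtype in ("lin-log", "log-log")   # log of independent variables
    y_out = math.log(y) if log_y else y
    xs_out = [math.log(x) for x in xs] if log_x else list(xs)
    return y_out, xs_out

# An empty transformtype (the default in the DAG) leaves the data unchanged.
unchanged = apply_transform(10.0, [2.0, 4.0], "")
log_lin = apply_transform(10.0, [2.0, 4.0], "log-lin")
log_log = apply_transform(10.0, [2.0, 4.0], "log-log")
```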

7.5.12.3. Classification Models: Details on the Processlogic field

Important

If you are estimating a classification model, and want to predict probabilities, then you must define the processlogic field.

The processlogic defines the rules that classify the dependent variable as 1 or 0. The table below shows how to specify these rules for the variables you are using or that were processed in STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag. We will set rules on the processed variables Voltage and Current.

Tip

You should refer to Preprocessed Variable Naming Standard to properly specify the names of the processed variables Voltage and Current. If Voltage and Current are processed with the anomaly probability processing type (i.e. AnomProb), then the new processed variables for Voltage and Current will be named:

  1. Voltage_preprocessed_AnomProb

  2. Current_preprocessed_AnomProb

Similarly, if processing any variable, this naming standard will apply.

For example, let's break down the following rule for the preprocessed variables Voltage and Current. This rule would be the value of the processlogic field in DAG 5 above:

classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n : Current_preprocessed_AnomProb=55,n

NOTE: Separate multiple rules by a colon (:). The colon acts as an “AND”. Specifically, if Voltage_preprocessed_AnomProb AND Current_preprocessed_AnomProb both satisfy their rules, then failure_prob is set to 1, otherwise, 0.

Variable/Rule

Upper Bound

Lower Bound

Explanation

classification_name

n/a

n/a

This tells TML that this is a classification model.

failure_prob

n/a

n/a

This is the name of your generated classified variable. You can use any name you want.

Voltage_preprocessed_AnomProb=55,n

n

55

This sets the rule for Voltage_preprocessed_AnomProb and sets failure_prob to 1 IF the values of the variable Voltage_preprocessed_AnomProb are between 55 and n, where n signifies no upper bound. If the rule were Voltage_preprocessed_AnomProb=55,95, then failure_prob would be 1 if the value is between 55 and 95, inclusive.

Current_preprocessed_AnomProb=55,n

n

55

This sets the rule for Current_preprocessed_AnomProb and sets failure_prob to 1 IF the values of the variable Current_preprocessed_AnomProb are between 55 and n, where n signifies no upper bound. If the rule were Current_preprocessed_AnomProb=55,95, then failure_prob would be 1 if the value is between 55 and 95, inclusive.

Important

Each rule evaluates to 1 or 0, and the results are then compared across the variables. For example, if the rules for Voltage_preprocessed_AnomProb AND Current_preprocessed_AnomProb both evaluate to 1, then the failure_prob variable is 1, otherwise 0.

Tip

If the rule is Current_preprocessed_AnomProb=-n,55, then failure_prob is set to 1 if Current_preprocessed_AnomProb is less than 55, otherwise 0.

Both -n and n indicate that the variable has NO lower bound or upper bound, respectively. If you want a specific lower and upper bound, just replace -n, and n with exact numbers.
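
The rule syntax described in this section can be illustrated with a small parser. This is only a sketch of the documented semantics, not the code Viper runs: the function names parse_processlogic and classify are hypothetical, and bounds are treated as inclusive, following the "between 55 and 95, inclusive" wording in the table.

```python
import math

def parse_processlogic(processlogic):
    """Parse a processlogic string into (target_name, {variable: (lower, upper)}).

    Rules are separated by colons; 'n' means no upper bound and '-n'
    means no lower bound.
    """
    parts = [p.strip() for p in processlogic.split(":") if p.strip()]
    head_key, target = [s.strip() for s in parts[0].split("=", 1)]
    if head_key != "classification_name":
        raise ValueError("processlogic must start with classification_name=<name>")
    rules = {}
    for part in parts[1:]:
        variable, bounds = [s.strip() for s in part.split("=", 1)]
        lower_text, upper_text = [s.strip() for s in bounds.split(",")]
        lower = -math.inf if lower_text == "-n" else float(lower_text)
        upper = math.inf if upper_text == "n" else float(upper_text)
        rules[variable] = (lower, upper)
    return target, rules

def classify(record, rules):
    """Return 1 if every rule is satisfied (the colon acts as AND), else 0."""
    return int(all(lo <= record[var] <= hi for var, (lo, hi) in rules.items()))

target, rules = parse_processlogic(
    "classification_name=failure_prob:Voltage_preprocessed_AnomProb=55,n : "
    "Current_preprocessed_AnomProb=55,n"
)
both_high = classify(
    {"Voltage_preprocessed_AnomProb": 60.0, "Current_preprocessed_AnomProb": 70.0}, rules
)
one_low = classify(
    {"Voltage_preprocessed_AnomProb": 60.0, "Current_preprocessed_AnomProb": 40.0}, rules
)
```

In this example both_high is classified as a failure and one_low is not, because the second record fails the Current rule.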

7.5.12.4. Machine Learning Trained Model Sample JSON Output

Below is the JSON output after the TML binary HPDE has performed machine learning using the real-time data streams.

{
    "Algokey": "StreamConsumer_topicid59_json",
    "Algo": "StreamConsumer_topicid59_jsonlgt",
    "Forecastaccuracy": 0.747,
    "DependentVariable": "failure_prob",
    "Filename": "/Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59.csv",
    "Fieldnames": "Date,topicid59_Voltage_preprocessed_AnomProb,topicid59_Current_preprocessed_AnomProb",
    "TestResultsFile": "/Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59_json_predictions.csv",
    "Deployed": 1,
    "DeployedTo": "Local Machine Deploy Folder",
    "Created": "2024-08-15T22:05:55.692145224Z",
    "Fullpathtomodels": "/Viper-tml/viperlogs/iotlogistic",
    "Identifier": "Voltage~Line-Voltage-(mV)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Voltage),value:datapoint.value,identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords,Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=e951b524-7faa-11ec-4107-b4937c8d3c24(120743,51.16569,10.45153,Voltage,n/a,n/a,{});e9870b70-7faa-11ec-7911-7438f38e028a(120929,51.16569,10.45153,Voltage,n/a,n/a,{});e9b56d62-7faa-11ec-d0c0-c3d1d2b8ba2b(120824,51.16569,10.45153,Voltage,n/a,n/a,{})~latlong=~mainuid=AC000W018740175",
    "AccuracyThreshold": 0.51,
    "Minmax": "27.774:82.392,27.592:82.013",
    "MachineLearningAlgorithm": "Logistic Regression",
    "ParameterEstimates": "-2.8284930,0.8076427,2.7328265",
    "HasConstantTerm": 1,
    "Topicid": 59,
    "ConsumeridFrom": "StreamConsumer",
    "Producerid": "StreamProducer",
    "ConsumingFrom": "/Viper-tml/viperlogs/iotlogistic/trainingdata_topicid59_.json",
    "ProduceTo": "iot-trained-params-input",
    "Companyname": "OTICS Advanced Analytics",
    "BrokerhostPort": "127.0.0.1:9092",
    "Islogistic": 1,
    "HPDEHOST": "172.18.0.2:44269",
    "HPDEMACHINENAME": "329e7b30d9b8",
    "Modelruns": 100,
    "ModelSearchTuner": 90,
    "TrainingData_Partition": -1,
    "Transformtype": "",
    "Sendcoefto": "",
    "Coeftoprocess": "",
    "Coefsubtopicnames": "",
    "BytesWritten": 1912,
    "kafkakey": "OAA-KK6EoesoB8KX8mkL17D5y5ejN-N7Le",
    "Numberofmessages": 239,
    "Partition": 0,
    "Offset": 59
}

7.5.12.5. Machine Learning Trained Model Sample JSON Output: Explanations

JSON Field

Description

Algokey

The algorithm key: StreamConsumer_topicid59_json

Algo

The physical algorithm on disk: StreamConsumer_topicid59_jsonlgt

Forecastaccuracy

The forecast accuracy using MAPE: 0.747

DependentVariable

The computed discrete dependent variable: failure_prob

Filename

File name of the training dataset: /Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59.csv. This path is inside the Docker container. You can map this path as a volume to save it on your host machine.

Fieldnames

The independent variables: Date,topicid59_Voltage_preprocessed_AnomProb,topicid59_Current_preprocessed_AnomProb

TestResultsFile

The results of the predictions using the test dataset are saved here: /Viper-tml/viperlogs/iotlogistic/StreamConsumer_topicid59_json_predictions.csv

Deployed

The model is deployed to the ./deploy folder if this is 1

DeployedTo

Where the model is deployed: Local Machine Deploy Folder

Created

The time the trained algorithm was generated: 2024-08-15T22:05:55.692145224Z

Fullpathtomodels

The full path to the model: /Viper-tml/viperlogs/iotlogistic. The ./models and ./deploy folders are relative to this path.

Identifier

Additional information about the data: Voltage~Line-Voltage-(mV)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Voltage),value:datapoint.value,identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords,Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=e951b524-7faa-11ec-4107-b4937c8d3c24(120743,51.16569,10.45153,Voltage,n/a,n/a,{});e9870b70-7faa-11ec-7911-7438f38e028a(120929,51.16569,10.45153,Voltage,n/a,n/a,{});e9b56d62-7faa-11ec-d0c0-c3d1d2b8ba2b(120824,51.16569,10.45153,Voltage,n/a,n/a,{})~latlong=~mainuid=AC000W018740175

AccuracyThreshold

The forecast accuracy of any model must be greater than this threshold: 0.51 (or 51%)

Minmax

The normalization of the variables: 27.774:82.392,27.592:82.013

MachineLearningAlgorithm

The machine learning algorithm used: Logistic Regression

ParameterEstimates

The parameter estimates: -2.8284930,0.8076427,2.7328265

HasConstantTerm

Indicates whether the model has a constant term: 1 means it does

Topicid

Internal topicid associated with the uid: 59

ConsumeridFrom

The consumerid: StreamConsumer

Producerid

The producerid: StreamProducer

ConsumingFrom

The physical training dataset file in the container: /Viper-tml/viperlogs/iotlogistic/trainingdata_topicid59_.json

ProduceTo

Topic where the estimated parameters are saved: iot-trained-params-input

Companyname

Your company name

BrokerhostPort

Kafka broker host and port: 127.0.0.1:9092, using On-Premise Kafka

Islogistic

Indicates whether the model is logistic: 1 means it is

HPDEHOST

Address where HPDE is listening for a connection from Viper: 172.18.0.2:44269

HPDEMACHINENAME

Machine name where the HPDE binary is running: 329e7b30d9b8

Modelruns

Number of models to iterate through before stopping: 100

ModelSearchTuner

Hyperparameter tuner: 90 (closer to 100 means higher quality models)

TrainingData_Partition

Ignored

Transformtype

The log-lin, lin-log, or log-log transformation, if any

Sendcoefto

The topic the estimated coefficients are sent to, if any

Coeftoprocess

The coefficient indexes to process

Coefsubtopicnames

The names of the coefficients

BytesWritten

The size of this JSON: 1912

kafkakey

The TML Kafka key: OAA-KK6EoesoB8KX8mkL17D5y5ejN-N7Le

Numberofmessages

The number of rows in the training dataset: 239

Partition

The partition where this JSON is stored in Kafka: 0

Offset

The offset of this JSON in Kafka: 59
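
To show how the ParameterEstimates, Minmax, and HasConstantTerm fields could fit together, the sketch below computes a logistic probability from the sample values above. It rests on several assumptions that the source does not confirm: that each Minmax pair is min:max for one independent variable (in Fieldnames order, excluding Date), that inputs are min-max scaled before the coefficients are applied, and that the first parameter estimate is the constant. The function name predict_probability is hypothetical, and Viper/HPDE may compute predictions differently.

```python
import math

def predict_probability(model, inputs):
    """Logistic probability from a trained-model JSON (illustrative only).

    model  : dict with ParameterEstimates, Minmax, HasConstantTerm
    inputs : raw independent variable values, one per Minmax pair
    """
    coefs = [float(c) for c in model["ParameterEstimates"].split(",")]
    ranges = [tuple(float(v) for v in pair.split(":"))
              for pair in model["Minmax"].split(",")]
    # Assumed min-max scaling of each input to the 0-1 range.
    scaled = [(x - lo) / (hi - lo) for x, (lo, hi) in zip(inputs, ranges)]
    if model["HasConstantTerm"] == 1:
        z = coefs[0] + sum(b * x for b, x in zip(coefs[1:], scaled))
    else:
        z = sum(b * x for b, x in zip(coefs, scaled))
    return 1.0 / (1.0 + math.exp(-z))

# Values copied from the sample JSON output above.
model = {
    "ParameterEstimates": "-2.8284930,0.8076427,2.7328265",
    "Minmax": "27.774:82.392,27.592:82.013",
    "HasConstantTerm": 1,
}
p_low = predict_probability(model, [27.774, 27.592])    # both inputs at their minimum
p_high = predict_probability(model, [82.392, 82.013])   # both inputs at their maximum
```

Under these assumptions the predicted failure probability rises as the anomaly probabilities of Voltage and Current rise, since both slope coefficients are positive.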

7.5.13. TML Physical Location of Machine Learning Models

All entity level machine learning models are stored in the container folder specified in fullpathtotrainingdata in Step 5.

Important

Step 6 task uses the trained models in this folder for entity level predictions.

Therefore, in Step 6 below, the pathtoalgos must be the same as fullpathtotrainingdata in Step 5.

There are 5 file outputs from STEP 5 stored in the folder fullpathtotrainingdata. For example, for Entity 53, associated with DSN:AC000W020485383, here are the output files:

Filename

Description

StreamConsumer_topicid53.csv

Training dataset

StreamConsumer_topicid53_json_.info

Information about the trained algorithm. This is shown below in Entity 53 Trained Algorithm Information.

StreamConsumer_topicid53_json_predictions.csv

The prediction data using the test data.

StreamConsumer_topicid53_jsonlgt

The ACTUAL algorithm used by Step 6 for predictions. This file is encrypted. This is the MOST important file.

StreamConsumer_topicid53_jsonlgt_.param

Parameter estimates.
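
The naming pattern in the table above can be expressed as a small helper that builds the five expected paths for an entity. The helper step5_output_files is hypothetical and simply follows the Entity 53 example; in particular, the lgt suffix and the StreamConsumer prefix are copied from that example and may differ for other model types or consumer ids.

```python
import posixpath

def step5_output_files(folder, topicid, consumerid="StreamConsumer", suffix="lgt"):
    """Build the expected Step 5 output paths for one entity (topicid).

    Naming is inferred from the Entity 53 example; `suffix` and
    `consumerid` are assumptions taken from that example.
    """
    base = "{}_topicid{}".format(consumerid, topicid)
    return {
        "training_data": posixpath.join(folder, base + ".csv"),
        "info": posixpath.join(folder, base + "_json_.info"),
        "predictions": posixpath.join(folder, base + "_json_predictions.csv"),
        "algorithm": posixpath.join(folder, base + "_json" + suffix),
        "parameters": posixpath.join(folder, base + "_json" + suffix + "_.param"),
    }

files = step5_output_files("/Viper-ml/viperlogs/iotlogistic", 53)
```

A check like this can help confirm that the folder passed as pathtoalgos in Step 6 actually contains the algorithm file for the entity you expect.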

7.5.14. Entity 53 Trained Algorithm Information

  1. The JSON below is the information on the trained algorithm: “Algo”: “StreamConsumer_topicid53_jsonlgt”

  2. Its name is “MachineLearningAlgorithm”: “Logistic Regression”.

  3. The independent variables are in the Fieldnames.

  4. The training dataset is in the filename: /Viper-ml/viperlogs/iotlogistic/StreamConsumer_topicid53.csv

Note that the training dataset is normalized using a minmax scaler. The parameter estimates are in the field: “ParameterEstimates”

{
 "Algokey": "StreamConsumer_topicid53_json",
 "Algo": "StreamConsumer_topicid53_jsonlgt",
 "Forecastaccuracy": 1,
 "DependentVariable": "failure_prob",
 "Filename": "/Viper-ml/viperlogs/iotlogistic/StreamConsumer_topicid53.csv",
 "Fieldnames": "Date,topicid53_Power_preprocessed_AnomProb",
 "TestResultsFile": "/Viper-ml/viperlogs/iotlogistic/StreamConsumer_topicid53_json_predictions.csv",
 "Deployed": 1,
 "DeployedTo": "Local Machine Deploy Folder",
 "Created": "2025-01-19T22:39:58.766388441Z",
 "Fullpathtomodels": "/Viper-ml/viperlogs/iotlogistic",
 "Identifier": "Power~Power-(mW)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Power),value:datapoint.value,ide...",
 "AccuracyThreshold": 0.55,
 "Minmax": "27.555:82.016",
 "MachineLearningAlgorithm": "Logistic Regression",
 "ParameterEstimates": "-3.4493501,9.3446499",
 "HasConstantTerm": 1
}

7.5.14.1. How TML Optimizes ML Models and Achieves High Forecast Accuracy

TML uses the binaries Viper and HPDE to optimize ML models for high forecast accuracy. All ML models estimated by Viper and HPDE are applied to data in each sliding time window.

The following describes how TML (Viper/HPDE) optimizes ML models for each sliding time window:

  1. TML processes each sliding time window, which can be expanded to increase the training dataset for the ML models.

  2. More training data allows TML to learn the patterns effectively, BUT because TML does ALL of this processing IN-MEMORY, too large a training dataset will slow down TML processing/ML.

  3. TML applies several different algorithms to the streaming data:

Algorithm

Description

Logistic Regression

Performs classification and predicts probabilities

Linear Regression

Performs linear regression using the OLS algorithm

Gradient Boosting

Gradient boosting for non-linear real-time data

Ridge Regression

Ridge Regression for non-linear real-time data

Neural networks

Neural networks for non-linear real-time data

  4. TML performs real-time data normalization: all data are put on the same scale, between 0 and 1. This prevents large variables (with large numbers) from dominating small variables (with small numbers, like decimals).

  5. TML performs real-time hyperparameter tuning of the algorithms listed in the table above. This is IMPORTANT to ensure the algorithms are properly calibrated for the best prediction accuracy (algorithm MAPE).

  6. TML performs constant machine learning on the streamed data by constantly trying different algorithms for EVERY sliding time window. This is how TML is able to learn highly complex, NON-LINEAR data in real time. So if the underlying pattern changes in subsequent sliding time windows, these new patterns will be learned by TML immediately.
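
The idea of trying several algorithms on every sliding time window can be sketched with two deliberately simple candidate models. This is a toy illustration, not HPDE's model search: the candidates (mean and last value), the holdout split, and all function names are assumptions made for the example. It scores candidates with the textbook MAPE error, where lower is better; how TML converts its MAPE measure into the Forecastaccuracy it reports is not specified in the source.

```python
def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mean_model(history):
    """Candidate 1: forecast the mean of the training values."""
    return sum(history) / len(history)

def last_value_model(history):
    """Candidate 2: forecast the most recent training value."""
    return history[-1]

CANDIDATES = {"mean": mean_model, "last_value": last_value_model}

def sliding_windows(stream, size, step=1):
    """Yield consecutive windows of `size` values from the stream."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]

def best_model_for_window(window, holdout=2):
    """Fit every candidate on the window and return (name, mape) of the best."""
    train, test = window[:-holdout], window[-holdout:]
    scores = {}
    for name, model in CANDIDATES.items():
        forecast = model(train)
        scores[name] = mape(test, [forecast] * len(test))
    best = min(scores, key=scores.get)
    return best, scores[best]

# Made-up readings: the winning candidate changes as the pattern changes.
stream = [10.0, 12.0, 14.0, 14.0, 14.0, 20.0, 10.0, 15.0, 15.0]
winners = [best_model_for_window(w)[0] for w in sliding_windows(stream, size=5)]
```

Because the search is repeated for each window, a different candidate can win whenever the underlying pattern shifts, which is the behaviour item 6 describes.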

7.5.15. STEP 6: Entity Based Predictions: tml-system-step-6-kafka-predictions-dag

Tip

Watch the YouTube video to see how this DAG is configured.

Note

All prediction data are also written to the “/rawdata/ml” folder in the container.

If you mapped the rawdata folder, you can access these files.

 import maadstml
 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import tsslogging
 import os
 import subprocess
 import random
 import time

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'myname' : 'Sebastian Maurice',   # <<< *** Change as needed
   'enabletls': '1',   # <<< *** 1=connection is encrypted, 0=no encryption
   'microserviceid' : '', # <<< *** leave blank
   'producerid' : 'iotsolution',    # <<< *** Change as needed
   'preprocess_data_topic' : 'iot-preprocess', # << *** data for the independent variables - You created this in STEP 2
   'ml_prediction_topic' : 'iot-ml-prediction-results-output', # topic to store the predictions - You created this in STEP 2
   'description' : 'TML solution',    # <<< *** Change as needed
   'companyname' : 'Otics', # <<< *** Change as needed
   'myemail' : 'Your email', # <<< *** Change as needed
   'mylocation' : 'Your location', # <<< *** Change as needed
   'brokerhost' : '', # <<< *** Leave as is
   'brokerport' : '-999', # <<< *** Leave as is
   'streamstojoin' : 'Power_preprocessed_AnomProb', # << ** These are the streams in the preprocess_data_topic for these independent variables
   'inputdata' : '', # << ** You can specify independent variables manually - rather than consuming from the preprocess_data_topic stream
   'consumefrom' : 'ml-data', # << This is ml_data_topic in STEP 5 that contains the estimated parameters
   'mainalgokey' : '', # leave blank
   'offset' : '-1', # << ** input data will start from the end of the preprocess_data_topic and rollback maxrows
   'delay' : '60', # << network delay parameter
   'usedeploy' : '1', # << 1=use algorithms in ./deploy folder, 0=use ./models folder
   'networktimeout' : '6000', # << additional network parameter
   'maxrows' : '50',  # << ** the number of offsets to rollback - For example, if 50, you will get 50 predictions continuously
   'produceridhyperprediction' : '',  # << leave blank
   'consumeridtraininedparams' : '',  # << leave blank
   'groupid' : '',  # << leave blank
   'topicid' : '-1',   # << leave as is
   'pathtoalgos' : '/Viper-ml/viperlogs/iotlogistic', # << this is specified in fullpathtotrainingdata in STEP 5
   'array' : '0', # 0=do not save as array, 1=save as array
   'HPDEADDR' : 'http://' # Do not modify
 }
 ######################################## DO NOT MODIFY BELOW #############################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HPDEHOSTPREDICT=''
 HPDEPORTPREDICT=''
 HTTPADDR=""

 # Set global variable for the Viper configuration file - change the folder path for your computer
 viperconfigfile="/Viper-predict/viper.env"

 mainproducerid = default_args['producerid']
 maintopic=default_args['preprocess_data_topic']
 predictiontopic=default_args['ml_prediction_topic']


 def performPrediction():


       # Set personal data
       companyname=default_args['companyname']
       myname=default_args['myname']
       myemail=default_args['myemail']
       mylocation=default_args['mylocation']

       # Enable SSL/TLS communication with Kafka
       enabletls=int(default_args['enabletls'])
       # If brokerhost is empty then this function will use the brokerhost address in your
       # VIPER.ENV in the field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerhost=default_args['brokerhost']
       # If this is -999 then this function uses the port address for Kafka in VIPER.ENV in the
       # field 'KAFKA_CONNECT_BOOTSTRAP_SERVERS'
       brokerport=int(default_args['brokerport'])
       # If you are using a reverse proxy to reach VIPER then you can put it here - otherwise if
       # empty then no reverse proxy is being used
       microserviceid=default_args['microserviceid']

       description=default_args['description']

       # Note these are the same streams or independent variables that are in the machine learning python file
       streamstojoin=default_args['streamstojoin']  #"Voltage_preprocessed_AnomProb,Current_preprocessed_AnomProb"

       #############################################################################################################
       #                                     START HYPER-PREDICTIONS FROM ESTIMATED PARAMETERS
       # Use the topic created from function viperproducetotopicstream for new data for
       # independent variables
       inputdata=default_args['inputdata']

       # Consume from holds the algorithms
       consumefrom=default_args['consumefrom'] #"iot-trained-params-input"

       # if you know the algorithm key put it here - this will speed up the prediction
       mainalgokey=default_args['mainalgokey']
       # Offset=-1 means go to the last offset of hpdetraining_partition
       offset=int(default_args['offset']) #-1
       # wait 60 seconds for Kafka - if exceeded then VIPER will backout
       delay=int(default_args['delay'])
       # use the deployed algorithm - must exist in ./deploy folder
       usedeploy=int(default_args['usedeploy'])
       # Network timeout
       networktimeout=int(default_args['networktimeout'])
       # maxrows - the number of offsets to roll back the stream (the step6maxrows environment variable overrides it)

       if 'step6maxrows' in os.environ:
         maxrows=int(os.environ['step6maxrows'])
       else:
         maxrows=int(default_args['maxrows'])
       #Start predicting with new data streams
       produceridhyperprediction=default_args['produceridhyperprediction']
       consumeridtraininedparams=default_args['consumeridtraininedparams']
       groupid=default_args['groupid']
       topicid=int(default_args['topicid'])  # -1 to predict for current topicids in the stream

       # Path where the trained algorithms are stored in the machine learning python file
       pathtoalgos=default_args['pathtoalgos'] #'/Viper-tml/viperlogs/iotlogistic'
       array=int(default_args['array'])
       ml_prediction_topic = default_args['ml_prediction_topic']

       result6=maadstml.viperhpdepredict(VIPERTOKEN,VIPERHOST,VIPERPORT,consumefrom,ml_prediction_topic,
                                      companyname,consumeridtraininedparams,
                                      produceridhyperprediction, HPDEHOSTPREDICT,inputdata,maxrows,mainalgokey,
                                      -1,offset,enabletls,delay,HPDEPORTPREDICT,
                                      brokerhost,brokerport,networktimeout,usedeploy,microserviceid,
                                      topicid,maintopic,streamstojoin,array,pathtoalgos)



 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startpredictions(**context):

        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREDICT".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREDICT".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))
        HPDEADDR = default_args['HPDEADDR']

        HPDEHOSTPREDICT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOSTPREDICT".format(sname))
        HPDEPORTPREDICT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORTPREDICT".format(sname))

        chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
        ti = context['task_instance']
        ti.xcom_push(key="{}_preprocess_data_topic".format(sname),value=default_args['preprocess_data_topic'])
        ti.xcom_push(key="{}_ml_prediction_topic".format(sname),value=default_args['ml_prediction_topic'])
        ti.xcom_push(key="{}_streamstojoin".format(sname),value=default_args['streamstojoin'])
        ti.xcom_push(key="{}_inputdata".format(sname),value=default_args['inputdata'])
        ti.xcom_push(key="{}_consumefrom".format(sname),value=default_args['consumefrom'])
        ti.xcom_push(key="{}_offset".format(sname),value="_{}".format(default_args['offset']))
        ti.xcom_push(key="{}_delay".format(sname),value="_{}".format(default_args['delay']))
        ti.xcom_push(key="{}_usedeploy".format(sname),value="_{}".format(default_args['usedeploy']))
        ti.xcom_push(key="{}_networktimeout".format(sname),value="_{}".format(default_args['networktimeout']))

        maxrows=default_args['maxrows']
        if 'step6maxrows' in os.environ:
           ti.xcom_push(key="{}_maxrows".format(sname),value="_{}".format(os.environ['step6maxrows']))
           maxrows=os.environ['step6maxrows']
        else:
          ti.xcom_push(key="{}_maxrows".format(sname),value="_{}".format(default_args['maxrows']))
        ti.xcom_push(key="{}_topicid".format(sname),value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_pathtoalgos".format(sname),value=default_args['pathtoalgos'])
        ti.xcom_push(key="{}_HPDEADDR".format(sname), value=HPDEADDR)

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        wn = windowname('predict',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-predict", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} {}{} {} {}".format(fullpath,VIPERTOKEN,HTTPADDR,VIPERHOST,VIPERPORT[1:],HPDEADDR,HPDEHOSTPREDICT,HPDEPORTPREDICT[1:],maxrows), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
          repo=tsslogging.getrepo()
          try:
            tsslogging.tsslogit("Predictions DAG in {}".format(os.path.basename(__file__)), "INFO" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
          except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

          VIPERTOKEN=sys.argv[2]
          VIPERHOST=sys.argv[3]
          VIPERPORT=sys.argv[4]
          HPDEHOSTPREDICT=sys.argv[5]
          HPDEPORTPREDICT=sys.argv[6]
          maxrows =  sys.argv[7]
          default_args['maxrows'] = maxrows

          tsslogging.locallogs("INFO", "STEP 6: Predictions started")
          while True:
           try:
             performPrediction()
             time.sleep(1)
           except Exception as e:
             tsslogging.locallogs("ERROR", "STEP 6: Predictions DAG in {} {}".format(os.path.basename(__file__),e))
             tsslogging.tsslogit("Predictions DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
             tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
             break

Here are the core parameters in the Step 6 DAG above:

Step 6 DAG parameter

Explanation

preprocess_data_topic

This is the topic that contains the data for the independent variables. Note: this is NOT different from conventional BATCH machine learning, where you train a model on batch data and then use new values of the independent variables to predict the dependent variable. In the real-time case, the values for the independent variables are streamed through this topic.

ml_prediction_topic

This topic will contain the predictions. The predictions can then be used for visualization in STEP 7.

description

You can provide a description for your solution here.

streamstojoin

This is where you specify the independent variables for your predictions. Specifically, if you are preprocessing, the “new” preprocessed variables will be given a standard naming convention - see Preprocessed Variable Naming Standard for details. For example, if you used the preprocessed variables Voltage and Current in your model, and used AnomProb (see Preprocessing Types), then the names of the preprocessed Voltage and Current streams will be: Voltage_preprocessed_AnomProb, Current_preprocessed_AnomProb.

inputdata

You can also manually enter the values for the independent variables in this variable. Specifically, if you do NOT want to join streams for the independent variables, but want to use different values, enter them here. Note: You can use either streamstojoin or inputdata, not BOTH. The values in the inputdata field MUST be in the same order as the variables in your model. For example, if your model is y = a + b, then inputdata=a_value,b_value, not inputdata=b_value,a_value, since the estimated coefficients are for a and b, in this precise order.

consumefrom

This is the topic from STEP 5 (ml_data_topic) that contains the trained algorithm with the estimated parameters. You need these estimated parameters for the predictions. This is exactly the same as in conventional machine learning.

mainalgokey

This is the AlgoKey generated by TML. It is a unique key identifying the algorithm for the entities.

offset

This determines where to start consuming data from the stream. For example, if offset=-1, then consumption starts from the latest data in the stream variables specified in streamstojoin. The amount of data to consume is determined by the maxrows parameter.

maxrows

This determines the number of offsets to roll back the stream. For example, if maxrows=50 and the last offset is 1000, then Viper will start consuming data from offset 1000-50=950, up to the last offset of 1000.

delay

This is a network delay parameter that accommodates any delays in Kafka.

networktimeout

This variable accounts for any connection latency from Python.

usedeploy

When algorithms are trained they are put in the ./models or ./deploy folder. If usedeploy=1, then trained algorithms will be read from the ./deploy folder; otherwise models from ./models will be used.

topicid

This is an internal parameter that TML uses to keep track of entity IDs. Setting this to -1 tells Viper to process individual entities.

pathtoalgos

This is the same path you specified in the key fullpathtotrainingdata in STEP 5. This is the location of the training datasets and algorithms. It is also important if you want to keep track of training datasets for auditing and governance.
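The interaction between offset, maxrows and inputdata can be sketched in a few lines of Python. The helper names below (resolve_maxrows, rollback_window, build_inputdata) are hypothetical and are not part of maadstml; the step6maxrows override mirrors the DAG code above, while clamping the starting offset at 0 is an assumption about how short streams are handled.

```python
import os
from typing import List, Mapping, Tuple


def resolve_maxrows(default_args: Mapping[str, str],
                    environ: Mapping[str, str] = os.environ) -> int:
    """Mirror the DAG: step6maxrows in the environment overrides default_args."""
    if "step6maxrows" in environ:
        return int(environ["step6maxrows"])
    return int(default_args["maxrows"])


def rollback_window(last_offset: int, maxrows: int) -> Tuple[int, int]:
    """With offset=-1, consume from (last_offset - maxrows) up to last_offset.

    Clamping the start at 0 is an assumption for streams shorter than maxrows.
    """
    start = max(last_offset - maxrows, 0)
    return start, last_offset


def build_inputdata(values: Mapping[str, float], model_order: List[str]) -> str:
    """Join manual input values in the order the model was trained on."""
    return ",".join(str(values[name]) for name in model_order)


maxrows = resolve_maxrows({"maxrows": "50"}, environ={})
window = rollback_window(1000, maxrows)
inputdata = build_inputdata({"b": 2.5, "a": 1.0}, model_order=["a", "b"])
print(window, inputdata)
```

The last call shows why ordering matters: the values are emitted in model order (a, then b) regardless of the order in which they were supplied.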

7.6. Machine Learning Prediction Sample JSON Output

{
 "Hyperprediction": 0.347,
 "Probability1": 0.347,
 "Probability0": 0.653,
 "Algokey": "StreamConsumer_topicid1370_json",
 "Algo": "StreamConsumer_topicid1370_jsonlgt",
 "Usedeploy": 1,
 "Created": "2022-10-29T18:24:27.5145458-04:00",
 "Inputdata": "0.000,0.000,0.000,122022.000,0.000,0.000",
 "Fieldnames": "Date, topicid1370_Voltage_preprocessed_AnomProb, topicid1370_Current_preprocessed_AnomProb, topicid1370_Power_preprocessed_Trend, topicid1370_Voltage_preprocessed_Avg, topicid1370_Current_preprocessed_Avg,topicid1370_Power_preprocessed_Avg",
 "Topicid": 1370,
 "Fullpathtomodels": "c:/maads/golang/go/bin/viperlogs/iotlogistic/deploy",
 "Identifier": "Power~Power-(mW)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name ( (Power), value:datapoint.value, identifier:metadata.display_name, datetime:datapoint.updated_at,:allrecords, Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=7c54e7d8-7fab-11ec-1a0b-b4bd125d9af1(0);7ce0b024-7fab-11ec-9ac5-3ffbb1c36dfe(0);7ca71d1e-7fab-11ec-223f-87fb225a1c75(0);7cfe6880-7fab-11ec-ea23-17d1132d4605(0);7c7fdd12-7fab-11ec-41f5-50aa3db0fe21(0);7cc487c8-7fab-11ec-408e-149982099613(0)~latlong=46.151241,14.995463~mainuid=AC000W020486693",
 "Islogistic": "1",
 "Compression": "GZIP",
 "Produceto": "iot-ml-prediction-results-output",
 "Kafkacluster": "pkc-6ojv2.us-west4.gcp.confluent.cloud:9092",
 "Minmax": "35.487:104.175,35.144:103.602,0.000:0.000,0.000:0.000,0.000:0.000,0.000:0.000",
 "MachineLearningAlgorithm": "Logistic Regression",
 "ParameterEstimates": "-0.6322068,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000",
 "HasConstantTerm": "1"
}

Tip

Study these fields carefully: they are what you will use for visualization and for other downstream analysis.

Here is the table explaining the fields in the prediction JSON.

JSON Field

Description

Hyperprediction

This contains the predicted probability of failure for the device: mainuid=AC000W020486693. A value of 0.347 means this device has a 34.7% chance of failure.

Probability1

Probability of Class 1 (Failure): 0.347

Probability0

Probability of Class 0 (No Failure): 0.653

Algokey

Internal algorithm key identifying the algorithm for this device: StreamConsumer_topicid1370_json. Internal ID 1370 is mapped to device ID AC000W020486693.

Algo

The algorithm used: StreamConsumer_topicid1370_jsonlgt, where lgt denotes logistic.

Usedeploy

Determines which folder the algorithm is read from: 1 means use the ./deploy folder.

Created

Creation time of this prediction, including its UTC offset: 2022-10-29T18:24:27.5145458-04:00

Inputdata

Input data used in the model: 0.000,0.000,0.000,122022.000,0.000,0.000 - These are the values of the independent variables.

Fieldnames

These are the independent variable streams used in the model: Date, topicid1370_Voltage_preprocessed_AnomProb, topicid1370_Current_preprocessed_AnomProb, topicid1370_Power_preprocessed_Trend, topicid1370_Voltage_preprocessed_Avg, topicid1370_Current_preprocessed_Avg, topicid1370_Power_preprocessed_Avg

Topicid

The topicid associated with this device ID: 1370

Fullpathtomodels

This is the full path to the trained algorithm: c:/maads/golang/go/bin/viperlogs/iotlogistic/deploy

Identifier

This contains additional information about the JSON criteria used: Power~Power-(mW)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name ( (Power), value:datapoint.value, identifier:metadata.display_name, datetime:datapoint.updated_at,:allrecords, Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,IoT device performance and failures~Msgsjoined=7c54e7d8-7fab-11ec-1a0b-b4bd125d9af1(0);7ce0b024-7fab-11ec-9ac5-3ffbb1c36dfe(0);7ca71d1e-7fab-11ec-223f-87fb225a1c75(0);7cfe6880-7fab-11ec-ea23-17d1132d4605(0);7c7fdd12-7fab-11ec-41f5-50aa3db0fe21(0);7cc487c8-7fab-11ec-408e-149982099613(0)~latlong=46.151241,14.995463~mainuid=AC000W020486693

Islogistic

A value of 1 indicates that this is a logistic model: 1

Compression

Compression used in the data storage: GZIP

Produceto

The topic the predictions are produced to: iot-ml-prediction-results-output

Kafkacluster

This is the Kafka cluster used: pkc-6ojv2.us-west4.gcp.confluent.cloud:9092

Minmax

All values of the independent variable streams are transformed using minmax. Here are the min:max values for each independent variable (in Fieldnames order): 35.487:104.175,35.144:103.602,0.000:0.000,0.000:0.000,0.000:0.000,0.000:0.000

MachineLearningAlgorithm

The name of the machine learning algorithm: Logistic Regression

ParameterEstimates

The parameter estimates from the trained model: -0.6322068,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000

HasConstantTerm

Indicates whether the model has a constant term: 1 indicates it does.
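As an illustration of how these fields might be consumed downstream, the sketch below parses a trimmed copy of the sample message and pairs each input stream with its value and min:max range. The helper names (summarize_prediction, logistic_probability) are hypothetical, the Identifier string is shortened for brevity, and two points are assumptions rather than documented TML behaviour: that the first entry in ParameterEstimates is the constant term, and that the prediction is the standard logistic function of the constant plus the weighted, scaled inputs.

```python
import json
import math
from typing import Dict, List

# Trimmed copy of the sample message above (Identifier shortened for brevity).
raw_message = """
{
 "Hyperprediction": 0.347,
 "Inputdata": "0.000,0.000,0.000,122022.000,0.000,0.000",
 "Fieldnames": "Date, topicid1370_Voltage_preprocessed_AnomProb, topicid1370_Current_preprocessed_AnomProb, topicid1370_Power_preprocessed_Trend, topicid1370_Voltage_preprocessed_Avg, topicid1370_Current_preprocessed_Avg,topicid1370_Power_preprocessed_Avg",
 "Identifier": "Power~Power-(mW)~iot-preprocess~latlong=46.151241,14.995463~mainuid=AC000W020486693",
 "Minmax": "35.487:104.175,35.144:103.602,0.000:0.000,0.000:0.000,0.000:0.000,0.000:0.000",
 "ParameterEstimates": "-0.6322068,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000",
 "HasConstantTerm": "1"
}
"""


def summarize_prediction(msg: Dict) -> Dict:
    """Extract the device id and pair each stream with its value and range."""
    names = [n.strip() for n in msg["Fieldnames"].split(",")]
    # Date is listed in Fieldnames but has no entry in Inputdata or Minmax.
    streams = [n for n in names if n != "Date"]
    values = [float(v) for v in msg["Inputdata"].split(",")]
    ranges = [tuple(float(x) for x in pair.split(":"))
              for pair in msg["Minmax"].split(",")]
    return {
        "device": msg["Identifier"].rsplit("mainuid=", 1)[-1],
        "probability": float(msg["Hyperprediction"]),
        "inputs": dict(zip(streams, values)),
        "minmax": dict(zip(streams, ranges)),
    }


def logistic_probability(msg: Dict, scaled_inputs: List[float]) -> float:
    """Recompute a logistic probability from ParameterEstimates.

    Assumes the constant term comes first when HasConstantTerm is 1.
    """
    estimates = [float(e) for e in msg["ParameterEstimates"].split(",")]
    if int(msg["HasConstantTerm"]) == 1:
        intercept, slopes = estimates[0], estimates[1:]
    else:
        intercept, slopes = 0.0, estimates
    z = intercept + sum(b * x for b, x in zip(slopes, scaled_inputs))
    return 1.0 / (1.0 + math.exp(-z))


message = json.loads(raw_message)
summary = summarize_prediction(message)
# All slope estimates are zero in this sample, so the input scaling is irrelevant.
recomputed = logistic_probability(message, [0.0] * len(summary["inputs"]))
print(summary["device"], summary["probability"], round(recomputed, 3))
```

Because every slope estimate in the sample is zero, the recomputed value depends only on the constant term, and under the assumptions above it agrees with the reported Hyperprediction to three decimal places.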

7.6.1. STEP 7: Real-Time Visualization: tml-system-step-7-kafka-visualization-dag

The fields to visualize can be taken from the Preprocessed Sample JSON Output, the Machine Learning Prediction Sample JSON Output, and the Machine Learning Trained Model Sample JSON Output.

 from airflow import DAG
 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator

 from datetime import datetime
 from airflow.decorators import dag, task
 import sys
 import subprocess
 import tsslogging
 import os
 import time
 import random

 sys.dont_write_bytecode = True
 ######################################## USER CHOSEN PARAMETERS ########################################
 default_args = {
   'topic' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma - Viperviz will stream data from these topics to your browser
   'dashboardhtml': 'dashboard.html', # <<< *** name of your dashboard file: This one is ONLY for preprocessing
   'dashboardhtml-ml': 'dashboard-ml.html', # <<< *** This one is IF you include the ML dag
   'topic-ml' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma
   'dashboardhtml-ai': 'dashboard-ai.html', # <<< *** This one is IF you include the AI dag
   'topic-ai' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma
   'dashboardhtml-ml-ai': 'dashboard-ml-ai.html', # <<< *** This one is IF you include the ML-AI dag
   'topic-ml-ai' : 'iot-preprocess,iot-preprocess2',    # <<< *** Separate multiple topics by a comma
   'secure': '1',   # <<< *** 1=connection is encrypted, 0=no encryption
   'offset' : '-1',    # <<< *** -1 indicates to read from the last offset always
   'append' : '0',   # << ** Do not append new data in the browser
   'rollbackoffset' : '400', # *************** Roll back the data stream by rollbackoffset.  For example, if 500, then Viperviz will grab all of the data from the last offset - 500
 }

 ######################################## DO NOT MODIFY BELOW #############################################

 def windowname(wtype,vipervizport,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "viperviz-{}-{}-{}={}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/vipervizwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{},{}\n".format(wn,vipervizport))

     return wn

 def startstreamingengine(**context):
         repo=tsslogging.getrepo()
         tsslogging.locallogs("INFO", "STEP 7: Visualization started")
         try:
           tsslogging.tsslogit("Visualization DAG in {}".format(os.path.basename(__file__)), "INFO" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
         except Exception as e:
             #git push -f origin main
             os.chdir("/{}".format(repo))
             subprocess.call("git push -f origin main", shell=True)

         sd = context['dag'].dag_id
         sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
         chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
         vipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERVIZPORT".format(sname))
         solutionvipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONVIPERVIZPORT".format(sname))
         tss = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_TSS".format(sname))

         if '_ml_ai_' in sd:
           topic = default_args['topic-ml-ai']
           dashboardhtml = default_args['dashboardhtml-ml-ai']
         elif '_ai_' in sd:
           topic = default_args['topic-ai']
           dashboardhtml = default_args['dashboardhtml-ai']
         elif '_ml_' in sd:
           topic = default_args['topic-ml']
           dashboardhtml = default_args['dashboardhtml-ml']
         else:
           topic = default_args['topic']
           dashboardhtml = default_args['dashboardhtml']

         secure = default_args['secure']
         offset = default_args['offset']
         append = default_args['append']
         rollbackoffset = default_args['rollbackoffset']

         ti = context['task_instance']
         ti.xcom_push(key="{}_topic".format(sname),value="{}".format(topic))
         ti.xcom_push(key="{}_dashboardhtml".format(sname),value="{}".format(dashboardhtml))
         ti.xcom_push(key="{}_secure".format(sname),value="_{}".format(secure))
         ti.xcom_push(key="{}_offset".format(sname),value="_{}".format(offset))
         ti.xcom_push(key="{}_append".format(sname),value="_{}".format(append))
         ti.xcom_push(key="{}_chip".format(sname),value=chip)
         ti.xcom_push(key="{}_rollbackoffset".format(sname),value="_{}".format(rollbackoffset))

         # start the viperviz on Vipervizport
         # STEP 5: START Visualization Viperviz
         vizgood=0
         for i in range(5):
           wn = windowname('visual',vipervizport,sname,sd)
           subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
           subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viperviz", "ENTER"])
           mainport=0
           if tss[1:] == "1":
             subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "/Viperviz/viperviz-linux-{} 0.0.0.0 {}".format(chip,vipervizport[1:]), "ENTER"])
             mainport=int(vipervizport[1:])
           else:
             subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "/Viperviz/viperviz-linux-{} 0.0.0.0 {}".format(chip,solutionvipervizport[1:]), "ENTER"])
             mainport=int(solutionvipervizport[1:])

           time.sleep(5)
           if tsslogging.testvizconnection(mainport)==1:
             tsslogging.locallogs("INFO", "STEP 7: /Viperviz/viperviz-linux-{} 0.0.0.0 {}".format(chip,mainport))
             vizgood=1
             break
           else:
              if i < 4:
                subprocess.call(["tmux", "kill-window", "-t", "{}".format(wn)])
                # Run through a shell so that $(lsof ...) is expanded before kill is invoked
                subprocess.call("kill -9 $(lsof -i:{} -t)".format(mainport), shell=True)
              tsslogging.locallogs("WARN", "STEP 7: Cannot make a connection to Viperviz on port {}.  Going to try again...".format(mainport))


         if vizgood==0:
           tsslogging.locallogs("ERROR", "STEP 7: Network issue.  Cannot make a connection to Viperviz on port {}".format(mainport))

7.7. Visualization DAG Parameter Explanation

DAG Parameter

Explanation

topic

This is the topic that Viperviz will consume from. For example, Viperviz will automatically connect to the topic iot-preprocess and start streaming to your browser. If you want to consume from multiple topics, separate them with commas, for example topic: iot-preprocess,iot-preprocess2,iot-preprocess3

topic-ml

Based on the TML Solution Template you are using, you can specify different topics for the appropriate solution. topic-ml is for any solution template that is ML related or has “_ml_” in the solution name. This gives users flexibility in using different dashboards for different solutions.

topic-ai

Based on the TML Solution Template you are using, you can specify different topics for the appropriate solution. topic-ai is for any solution template that is AI related or has “_ai_” in the solution name.

topic-ml-ai

Based on the TML Solution Template you are using, you can specify different topics for the appropriate solution. topic-ml-ai is for any solution template that is ML and AI related or has “_ml_ai_” in the solution name.

dashboardhtml

This dashboard will use the topics in the topic field.

dashboardhtml-ml

This dashboard will use the topics in the topic-ml field.

dashboardhtml-ai

This dashboard will use the topics in the topic-ai field.

dashboardhtml-ml-ai

This dashboard will use the topics in the topic-ml-ai field.

secure

If set to 1, the connection is secured with TLS; if 0, it is not.

vipervizport

This is the port you want the Viperviz binary to listen on. For example, if 9005, Viperviz will listen on port 9005.

offset

Indicates where in the stream to consume from. If -1, the latest data is consumed.

append

If 0, data will not accumulate in your dashboard; if 1, it will accumulate.

chip

Viperviz can run on Windows/Mac/Linux. Use ‘amd64’ for Windows/Linux; use ‘arm64’ for Mac/Linux.

rollbackoffset

This indicates the number of offsets to roll back from the latest offset (the end of the stream). If 500, then Viperviz will grab all of the data starting from the last offset - 500.
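The way the topic and dashboard pairs are chosen can be sketched as a small standalone function. It mirrors the if/elif chain in the Step 7 DAG above, which matches on the DAG id; the function name select_topic_and_dashboard and the example DAG ids are hypothetical. The most specific marker is checked first because a name containing “_ml_ai_” also contains “_ai_”.

```python
from typing import Mapping, Tuple

# Most specific marker first: "_ml_ai_" also contains "_ai_".
MARKERS = (("_ml_ai_", "-ml-ai"), ("_ai_", "-ai"), ("_ml_", "-ml"))


def select_topic_and_dashboard(dag_id: str,
                               args: Mapping[str, str]) -> Tuple[str, str]:
    """Return the (topic, dashboard) pair matching the markers in dag_id."""
    for marker, suffix in MARKERS:
        if marker in dag_id:
            return args["topic" + suffix], args["dashboardhtml" + suffix]
    return args["topic"], args["dashboardhtml"]


# Illustrative values only; the topic names are not taken from a real solution.
args = {
    "topic": "iot-preprocess",
    "dashboardhtml": "dashboard.html",
    "topic-ml": "iot-ml-prediction-results-output",
    "dashboardhtml-ml": "dashboard-ml.html",
    "topic-ai": "iot-ai-output",
    "dashboardhtml-ai": "dashboard-ai.html",
    "topic-ml-ai": "iot-ml-ai-output",
    "dashboardhtml-ml-ai": "dashboard-ml-ai.html",
}

for dag_id in ("solution_preprocessing_dag",
               "solution_preprocessing_ml_dag",
               "solution_preprocessing_ml_ai_dag"):
    print(dag_id, select_topic_and_dashboard(dag_id, args))
```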

7.7.1. STEP 8: Deploy TML Solution to Docker: tml-system-step-8-deploy-solution-to-docker-dag

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
from datetime import datetime
from airflow.decorators import dag, task
import os
import subprocess
import tsslogging
import git
import time
import sys

sys.dont_write_bytecode = True

############################################################### DO NOT MODIFY BELOW ####################################################

def doparse(fname,farr):
      data = ''
      with open(fname, 'r', encoding='utf-8') as file:
        data = file.readlines()
        r=0
        for d in data:
            for f in farr:
                fs = f.split(";")
                if fs[0] in d:
                    data[r] = d.replace(fs[0],fs[1])
            r += 1
      with open(fname, 'w', encoding='utf-8') as file:
        file.writelines(data)

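def dockerit(**context):
     if 'tssbuild' in os.environ:
        if os.environ['tssbuild']=="1":
            return
     # Resolve the repo before the try block so the except handler below can always reference it
     repo=tsslogging.getrepo()
     try: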

       sd = context['dag'].dag_id
       sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
       pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

       repo=tsslogging.getrepo()
       tsslogging.tsslogit("Docker DAG in {}".format(os.path.basename(__file__)), "INFO" )
       tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

       chip = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))
       cname = os.environ['DOCKERUSERNAME']  + "/{}-{}".format(sname,chip)

       print("Containername=",cname)
       tsslogging.locallogs("INFO", "STEP 8: Starting docker push for: {}".format(cname))
       if os.environ['TSS'] == "1":
         try:
           f = open("/tmux/cname.txt", "w")
           f.write(cname)
           f.close()
         except Exception as e:
           pass

       ti = context['task_instance']
       ti.xcom_push(key="{}_containername".format(sname),value=cname)
       ti.xcom_push(key="{}_solution_dag_to_trigger".format(sname), value=sd)

       scid = tsslogging.getrepo('/tmux/cidname.txt')
       cid = scid # cid added

       key = "trigger-{}".format(sname)
       os.environ[key] = sd
       if os.environ['TSS'] == "1" and len(cid) > 1:
         print("[INFO] docker commit {} {}".format(cid,cname))
         subprocess.call("docker rmi -f $(docker images --filter 'dangling=true' -q --no-trunc)", shell=True)
         cbuf="docker commit {} {}".format(cid,cname)
         v=subprocess.call("docker commit {} {}".format(cid,cname), shell=True)

         status=tsslogging.optimizecontainer(cname,sname,sd)
         if status=="":
           tsslogging.locallogs("WARN", "STEP 8: There seems to be an issue optimizing the container.  Here is the commit command: {} - message={}.  Container may NOT be pushed.".format(cbuf,v))
         else:
           tsslogging.locallogs("INFO", "STEP 8: Docker Container created and optimized.  Will push it now.  Here is the commit command: {} - message={}".format(cbuf,v))

         #v=subprocess.call("docker push {}".format(cname), shell=True)
         proc=subprocess.Popen("docker push {}".format(cname), shell=True)
         time.sleep(3)
         proc.terminate()
         proc.wait()

       elif len(cid) <= 1:
              tsslogging.locallogs("ERROR", "STEP 8: There seems to be an issue with docker commit. Here is the command: docker commit {} {}".format(cid,cname))
              tsslogging.tsslogit("Deploying to Docker in {}".format(os.path.basename(__file__)), "ERROR" )
              tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")

       os.environ['tssbuild']="1"

       doparse("/{}/tml-airflow/dags/tml-solutions/{}/docker_run_stop-{}.py".format(repo,pname,pname), ["--solution-name--;{}".format(sname)])
       doparse("/{}/tml-airflow/dags/tml-solutions/{}/docker_run_stop-{}.py".format(repo,pname,pname), ["--solution-dag--;{}".format(sd)])

     except Exception as e:
        print("[ERROR] Step 8: ",e)
        tsslogging.locallogs("ERROR", "STEP 8: Deploying to Docker in {}: {}".format(os.path.basename(__file__),e))
        tsslogging.tsslogit("Deploying to Docker in {}: {}".format(os.path.basename(__file__),e), "ERROR" )
        tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
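
As a reading aid, the sketch below isolates how the Step 8 listing above derives the image name that it commits and pushes: the `DOCKERUSERNAME` environment variable, followed by the solution name and the chip joined with a hyphen. The helper names `build_image_name` and `build_commit_command` and the sample values are hypothetical; they are not part of the TSS code.

```python
import os


def build_image_name(sname, chip, env=os.environ):
    """Mirror the Step 8 naming: <DOCKERUSERNAME>/<solution name>-<chip>."""
    return env["DOCKERUSERNAME"] + "/{}-{}".format(sname, chip)


def build_commit_command(cid, cname):
    """Build the same 'docker commit' string that Step 8 logs and runs."""
    return "docker commit {} {}".format(cid, cname)


if __name__ == "__main__":
    # Sample values only; a real run reads DOCKERUSERNAME from the environment.
    fake_env = {"DOCKERUSERNAME": "exampleuser"}
    name = build_image_name("mysolution", "amd64", env=fake_env)
    print(name)
    print(build_commit_command("abc123", name))
```

Passing the environment as an argument keeps the helper testable without touching the real `DOCKERUSERNAME` value.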

7.7.2. STEP 9: PrivateGPT and Qdrant Integration: tml-system-step-9-privategpt_qdrant-dag

Tip

Watch the YouTube video to learn how to configure the key parameters in the Step 9 DAG.

Also, pull the PrivateGPT containers before running Step 9.
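
A minimal sketch of pulling the images ahead of time, assuming Docker is installed and on the `PATH`. The two image names are the ones that appear in the Step 9 listing (`pgptcontainername` and the Qdrant image); the helper names `pull_command` and `pull_image` and the `dry_run` flag are hypothetical and only illustrate the idea.

```python
import shlex
import subprocess


def pull_command(image):
    """Build the argument list for 'docker pull <image>'."""
    return ["docker", "pull", image]


def pull_image(image, dry_run=True):
    """Pull an image, or only print the command when dry_run is True."""
    cmd = pull_command(image)
    if dry_run:
        print("Would run:", shlex.join(cmd))
        return 0
    # Returns the docker exit code; 0 means the pull succeeded.
    return subprocess.call(cmd)


if __name__ == "__main__":
    # Image names taken from the Step 9 listing; change them to match your setup.
    pull_image("maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2")
    pull_image("qdrant/qdrant")
```

Call `pull_image(..., dry_run=False)` to perform the pull. Doing this before Step 9 avoids the "container not found" warning that `stopcontainers()` logs when the image is missing.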

 from airflow.operators.python import PythonOperator
 from airflow.operators.bash import BashOperator
 from datetime import datetime
 from airflow.decorators import dag, task
 import os
 import tsslogging
 import sys
 import time
 import maadstml
 import subprocess
 import random
 import json
 import threading
 import re
 from binaryornot.check import is_binary
 docidstrarr = []

 sys.dont_write_bytecode = True

 ######################################################USER CHOSEN PARAMETERS ###########################################################
 default_args = {
  'owner': 'Sebastian Maurice',   # <<< *** Change as needed
  'pgptcontainername' : 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2', #'maadsdocker/tml-privategpt-no-gpu-amd64',  # enter a valid container https://hub.docker.com/r/maadsdocker/tml-privategpt-no-gpu-amd64
  'rollbackoffset' : '5',  # <<< *** Change as needed
  'offset' : '-1', # leave as is
  'enabletls' : '1', # change as needed
  'brokerhost' : '', # <<< *** Leave as is
  'brokerport' : '-999', # <<< *** Leave as is
  'microserviceid' : '',  # change as needed
  'topicid' : '-999', # leave as is
  'delay' : '100', # change as needed
  'companyname' : 'otics',  # <<< *** Change as needed
  'consumerid' : 'streamtopic',  # <<< *** Leave as is
  'consumefrom' : 'cisco-network-preprocess',    # <<< *** Change as needed
  'pgpt_data_topic' : 'cisco-network-privategpt',
  'producerid' : 'private-gpt',   # <<< *** Leave as is
  'identifier' : 'This is analysing TML output with privategpt',
  'pgpthost': 'http://127.0.0.1', # PrivateGPT container listening on this host
  'pgptport' : '8001', # PrivateGPT listening on this port
  'preprocesstype' : '', # Leave as is
  'partition' : '-1', # Leave as is
  'prompt': '[INST] Are there any errors in the logs? Give a detailed response including IP addresses and host machines.[/INST]', # Enter your prompt here
  'context' : 'This is network data from inbound and outbound packets. The data are \
 anomaly probabilities for cyber threats from analysis of inbound and outbound packets. If inbound or outbound \
 anomaly probabilities are less than 0.60, it is likely the risk of a cyber attack is also low. If it is above 0.60, then the risk is mid to high.', # what is this data about? Provide context to PrivateGPT
  'jsonkeytogather' : 'hyperprediction', # enter key you want to gather data from to analyse with PrivateGpt i.e. Identifier or hyperprediction
  'keyattribute' : 'inboundpackets,outboundpackets', # change as needed
  'keyprocesstype' : 'anomprob',  # change as needed
  'hyperbatch' : '0', # Set to 1 to batch all of the hyperpredictions and send them to PrivateGPT together; set to 0 to send them one by one
  'vectordbcollectionname' : 'tml-llm-model-v2', # change as needed
  'concurrency' : '2', # change as needed Leave at 1
  'CUDA_VISIBLE_DEVICES' : '0', # change as needed
  'docfolder': 'mylogs,mylogs2',  # You can specify the sub-folder that contains TEXT or PDF files..this is a subfolder in the MAIN folder mapped to /rawdata
                    # if this field is NON-EMPTY, privateGPT will query these documents as the CONTEXT to answer your prompt
                    # separate multiple folders with a comma
  'docfolderingestinterval': '900', # how often you want TML to RE-LOAD the files in docfolder - enter the number of SECONDS, if 0 they are read ONCE
  'useidentifierinprompt': '1', # If 1, this uses the identifier in the TML json output and appends it to prompt, If 0, it uses the prompt only
  'searchterms': '192.168.--identifier--,authentication failure',
  'temperature' : '0.1', # Ranges between 0 and 1 and controls how conservative the LLM is: near 0 it is most conservative, near 1 it is more likely to hallucinate
  'vectorsearchtype' : 'Manhattan', # this is for the Qdrant Search algorithm.  it can be: Cosine, Euclid, Dot, or Manhattan
  'streamall': '1',
  'contextwindowsize': '8192', # Size of the context window.  This controls the number of tokens to process by LLM model
  'vectordimension': '768',
  'mitrejson': '/rawdata/mitre.json'
 }

 ############################################################### DO NOT MODIFY BELOW ####################################################

 VIPERTOKEN=""
 VIPERHOST=""
 VIPERPORT=""
 HTTPADDR=""
 maintopic =  default_args['consumefrom']
 mainproducerid = default_args['producerid']
 GPTONLINE=0

 def checkresponse(response,ident):
     global GPTONLINE
     st="false"

     if "ERROR:" in response:
          return response,st,""

     GPTONLINE=1

     response = response.replace("null","-1").replace("\\n","").replace("\n","")
     r1=json.loads(response)
     c1=r1['choices'][0]['message']['content']
     c1=c1.replace('"','\\"').replace("'","\'").replace("\\n"," ").replace("&","and")
     c1 = re.sub(' +', ' ', c1)
     if '=' in c1 and ('Answer:' in c1 or 'A:' in c1):
       r1['choices'][0]['message']['content'] = "The analysis of the document(s) did not find a proper result."
       response = json.dumps(r1)
       return response,st,c1.strip()

     if default_args['searchterms'] != '':
           starr = default_args['searchterms'].split(",")
           for t in starr:
               if '--identifier--' in t:
                   t = t.replace("--identifier--",ident)
               if t in  c1:
                 st="true"
                 break

     return response,st,c1.strip()

 def stopcontainers():
    pgptcontainername = default_args['pgptcontainername']
    cfound=0
    subprocess.call("docker image ls > gptfiles.txt", shell=True)
    with open('gptfiles.txt', 'r', encoding='utf-8') as file:
         data = file.readlines()
         r=0
         for d in data:
           darr = d.split(" ")
           if '-privategpt-' in darr[0]:
             buf="docker stop $(docker ps -q --filter ancestor={} )".format(darr[0])
             if pgptcontainername in darr[0]:
                 cfound=1
             print(buf)
             subprocess.call(buf, shell=True)
    if cfound==0:
       print("INFO STEP 9: PrivateGPT container {} not found.  It may need to be pulled.".format(pgptcontainername))
       tsslogging.locallogs("WARN", "STEP 9: PrivateGPT container not found. It may need to be pulled if it does not start: docker pull {}".format(pgptcontainername))

 def llmattrs(pgptcontainername):
   if '-deepseek-medium' in pgptcontainername:
      return "DeepSeek-R1-Distill-Llama-8B-Q5_K_M.gguf","BAAI/bge-base-en-v1.5"
   elif pgptcontainername=='maadsdocker/tml-privategpt-with-gpu-nvidia-amd64':
      return "TheBloke/Mistral-7B-Instruct-v0.1-GGUF","BAAI/bge-small-en-v1.5"
   elif 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v2' == pgptcontainername:
      return "mistralai/Mistral-7B-Instruct-v0.2","BAAI/bge-small-en-v1.5"
   elif 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v3' == pgptcontainername:
      return "mistralai/Mistral-7B-Instruct-v0.3","BAAI/bge-base-en-v1.5"
   elif 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-v3-large' == pgptcontainername:
      return "mistralai/Mistral-7B-Instruct-v0.3","BAAI/bge-m3"

   return "",""

 def startpgptcontainer():
       print("Starting PGPT container: {}".format(default_args['pgptcontainername']))
       collection = default_args['vectordbcollectionname']
       concurrency = default_args['concurrency']
       pgptcontainername = default_args['pgptcontainername']
       pgptport = int(default_args['pgptport'])
       cuda = int(default_args['CUDA_VISIBLE_DEVICES'])
       temperature = default_args['temperature']  # named to match its use in the docker run commands below
       vectorsearchtype = default_args['vectorsearchtype']
       cw = default_args['contextwindowsize']
       vectordimension=default_args['vectordimension']
       mainmodel,mainembedding = "",""  # defaults so the return below also works for the no-GPU branch

       stopcontainers()
       time.sleep(10)
       if '-no-gpu-' in pgptcontainername:
           buf = "docker run -d -p {}:{} --net=host --env PORT={} --env GPU=0 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env temperature={} --env vectorsearchtype=\"{}\" {}".format(pgptport,pgptport,pgptport,collection,concurrency,cuda,temperature,vectorsearchtype,pgptcontainername)
       else:
         mainmodel,mainembedding=llmattrs(pgptcontainername)
         if os.environ['TSS'] == "1":
           buf = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z --env PORT={} --env TSS=1 --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env vectorsearchtype=\"{}\" --env contextwindowsize={} --env vectordimension={} --env mainmodel=\"{}\" --env mainembedding=\"{}\" {}".format(pgptport,pgptport,pgptport,collection,concurrency,cuda,temperature,vectorsearchtype,cw,vectordimension,mainmodel,mainembedding,pgptcontainername)
         else:
           buf = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z --env PORT={} --env TSS=0 --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env vectorsearchtype=\"{}\" --env contextwindowsize={} --env vectordimension={}  --env mainmodel=\"{}\" --env mainembedding=\"{}\" {}".format(pgptport,pgptport,pgptport,collection,concurrency,cuda,temperature,vectorsearchtype,cw,vectordimension,mainmodel,mainembedding,pgptcontainername)

       v=subprocess.call(buf, shell=True)
       print("INFO STEP 9: PrivateGPT container.  Here is the run command: {}, v={}".format(buf,v))
       tsslogging.locallogs("INFO", "STEP 9: PrivateGPT container.  Here is the run command: {}, v={}".format(buf,v))

       return v,buf,mainmodel,mainembedding

 def qdrantcontainer():
     v=0
     buf=""
     buf="docker stop $(docker ps -q --filter ancestor=qdrant/qdrant )"
     subprocess.call(buf, shell=True)
     time.sleep(4)
     if os.environ['TSS'] == "1":
       buf = "docker run -d -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant"
     else:
        buf = "docker run -d --network=bridge -v /var/run/docker.sock:/var/run/docker.sock:z -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant"

     v=subprocess.call(buf, shell=True)
     print("INFO STEP 9: Qdrant container.  Here is the run command: {}, v={}".format(buf,v))

     tsslogging.locallogs("INFO", "STEP 9: Qdrant container.  Here is the run command: {}, v={}".format(buf,v))

     return v,buf

 def pgptchat(prompt,context,docfilter,port,includesources,ip,endpoint):
   prompt=prompt.replace("&","and")

   print("Pgptchat=",prompt)
   response=maadstml.pgptchat(prompt,context,docfilter,port,includesources,ip,endpoint)
   return response

 def producegpttokafka(value,maintopic):
      inputbuf=value
      topicid=int(default_args['topicid'])
      producerid=default_args['producerid']
      identifier = default_args['identifier']

      # Maximum delay, in milliseconds, that VIPER waits for Kafka to confirm the message was received and written to the topic (taken from default_args['delay'])
      delay=default_args['delay']
      enabletls=default_args['enabletls']

      try:
         result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,'',
                                             topicid,identifier)
         print(result)
      except Exception as e:
         print("ERROR:",e)

 def consumetopicdata():
       maintopic = default_args['consumefrom']
       rollbackoffsets = int(default_args['rollbackoffset'])
       enabletls = int(default_args['enabletls'])
       consumerid=default_args['consumerid']
       companyname=default_args['companyname']
       offset = int(default_args['offset'])
       brokerhost = default_args['brokerhost']
       brokerport = int(default_args['brokerport'])
       microserviceid = default_args['microserviceid']
       topicid = default_args['topicid']
       preprocesstype = default_args['preprocesstype']
       delay = int(default_args['delay'])
       partition = int(default_args['partition'])

       result=maadstml.viperconsumefromtopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,
                   consumerid,companyname,partition,enabletls,delay,
                   offset, brokerhost,brokerport,microserviceid,
                   topicid,rollbackoffsets,preprocesstype)

       return result

 def writetortmslogfile(fname,jsonbuf):
        print("fname=",fname)
        print("jsonbuf=",jsonbuf)
        try:
          f = open(fname, "w")
          f.write(jsonbuf +"\n")
          f.close()
        except Exception as e:
          pass

 def getsearchtext(res,context,prompt):
    privategptmessage = []
    messages = ""
    mainmessages=""
    cw = int(default_args['contextwindowsize'])

    for r in res['StreamTopicDetails']['TopicReads']:
       fname=r['Filename']
       messages=""
       for d in r['SearchTextFound']:
         messages = messages + str(d[15:].strip()) + ". "
         if len(messages) > cw:
           messages = messages[0:cw-1]
           break


       mainmessages = "{}. Here are the messages: {}. {}".format(context,messages,prompt)
       privategptmessage.append([mainmessages,"SearchTextFound",fname,json.dumps(r)])

    return privategptmessage

 def gatherdataforprivategpt(result):

    privategptmessage = []
    if 'step9prompt' in os.environ:
       if os.environ['step9prompt'] != '':
         prompt = os.environ['step9prompt']
         prompt=prompt.replace("&","and")
         default_args['prompt'] = prompt
       else:
        prompt = default_args['prompt']
        prompt=prompt.replace("&","and")
    else:
       prompt = default_args['prompt']
       prompt=prompt.replace("&","and")

    if 'step9context' in os.environ:
       if os.environ['step9context'] != '':
         context = os.environ['step9context']
         context=context.replace("&","and")
         default_args['context'] = context
       else:
         context = default_args['context']
         context=context.replace("&","and")
    else:
      context = default_args['context']
      context=context.replace("&","and")

    jsonkeytogather = default_args['jsonkeytogather']
    if default_args['docfolder'] != '':
        context = ''
        if default_args['useidentifierinprompt'] == "1":
           jsonkeytogather = "Identifier"

    if 'step9keyattribute' in os.environ:
      if os.environ['step9keyattribute'] != '':
        attribute = os.environ['step9keyattribute']
        default_args['keyattribute'] = attribute
      else:
        attribute = default_args['keyattribute']
    else:
     attribute = default_args['keyattribute']

    if 'step9keyprocesstype' in os.environ:
      if os.environ['step9keyprocesstype'] != '':
         processtype = os.environ['step9keyprocesstype']
         default_args['keyprocesstype'] = processtype
      else:
        processtype = default_args['keyprocesstype']
    else:
      processtype = default_args['keyprocesstype']

    if 'step9hyperbatch' in os.environ:
      if os.environ['step9hyperbatch'] != '':
         hyperbatch = os.environ['step9hyperbatch']
         default_args['hyperbatch'] = hyperbatch
      else:
        hyperbatch = default_args['hyperbatch']
    else:
      hyperbatch = default_args['hyperbatch']

    try:
      res=json.loads(result,strict=False)  # strict=False permits control characters inside JSON strings
    except Exception as e:
      print("Error=",e)
      tsslogging.tsslogit("PrivateGPT DAG could not parse the consumed JSON in {} {}".format(os.path.basename(__file__),e), "ERROR" )
      tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
      return


    message = ""
    found=0

    if jsonkeytogather == '':
      # No exception object exists here, so the message reports only the file name
      tsslogging.tsslogit("PrivateGPT DAG jsonkeytogather is empty in {}".format(os.path.basename(__file__)), "ERROR" )
      tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
      return

    if jsonkeytogather.lower()=="searchtextfound":
      privategptmessage=getsearchtext(res,context,prompt)
      return privategptmessage

    for r in res['StreamTopicDetails']['TopicReads']:
        if jsonkeytogather == 'Identifier' or jsonkeytogather == 'identifier':
          identarr=r['Identifier'].split("~")
          try:
            attribute = attribute.lower()
            aar = attribute.split(",")
            isin=any(x in r['Identifier'].lower() for x in aar)
            if isin:
              found=0
              for d in r['RawData']:
                 found=1
                 message = message  + str(d) + ', '
              if found:
                if context != '':
                   message = "{}.  Data: {}. {}".format(context,message,prompt)
                elif '--identifier--' in prompt:
                   prompt2 = prompt.replace('--identifier--',identarr[0])
                   message = "{}".format(prompt2)
                else:
                  message = "{}".format(prompt)
                privategptmessage.append([message,identarr[0]])
              message = ""
          except Exception as e:
            tsslogging.tsslogit("PrivateGPT DAG in {} {}".format(os.path.basename(__file__),e), "ERROR" )
            tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
        else:
          isin1 = False
          isin2 = False
          found=0
          message = ""
          identarr=r['Identifier'].split("~")
          if processtype != '' and attribute != '':
            processtype = processtype.lower()
            ptypearr = processtype.split(",")
            isin1=any(x in r['Preprocesstype'].lower() for x in ptypearr)

            attribute = attribute.lower()
            aar = attribute.split(",")
            isin2=any(x in r['Identifier'].lower() for x in aar)

            if isin1 and isin2:
              buf = r[jsonkeytogather]
              if buf != '':
                found=1
                message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '
          elif processtype != '' and attribute == '':
            processtype = processtype.lower()
            ptypearr = processtype.split(",")
            isin1=any(x in r['Preprocesstype'].lower() for x in ptypearr)
            if isin1:
              buf = r[jsonkeytogather]
              if buf != '':
                found=1
                message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '
          elif processtype == '' and attribute != '':
            attribute = attribute.lower()
            aar = attribute.split(",")
            isin2=any(x in r['Identifier'].lower() for x in aar)
            if isin2:
              buf = r[jsonkeytogather]
              if buf != '':
                found=1
                message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '
          else:
            buf = r[jsonkeytogather]
            if buf != '':
              found=1
              message = message  + "{} (Identifier={})".format(buf,identarr[0]) + ', '

          if found and hyperbatch=="0":
               if '--identifier--' in prompt:
                   prompt2 = prompt.replace('--identifier--',identarr[0])
                   message = "{}.  Data: {}.  {}".format(context,message,prompt2)
               else:
                   message = "{}.  Data: {}.  {}".format(context,message,prompt)
               privategptmessage.append([message,identarr[0]])


    if jsonkeytogather != 'Identifier' and found and hyperbatch=="1":
      message = "{}.  Data: {}.  {}".format(context,message,prompt)
      privategptmessage.append(message)


    return privategptmessage

 def startdirread():
   global GPTONLINE
   print("INFO startdirread")
   try:
     t = threading.Thread(name='child procs', target=ingestfiles)
     t.start()
   except Exception as e:
     print(e)

 def deleteembeddings(docids):
   pgptendpoint="/v1/ingest/"
   pgptip = default_args['pgpthost']
   pgptport = default_args['pgptport']
   maadstml.pgptdeleteembeddings(docids,pgptip,pgptport,pgptendpoint)


 def getingested(docname):
   pgptendpoint="/v1/ingest/list"
   pgptip = default_args['pgpthost']
   pgptport = default_args['pgptport']
   docids,docstr,docidsstr=maadstml.pgptgetingestedembeddings(docname,pgptip,pgptport,pgptendpoint)
   return docids,docstr,docidsstr

 def ingestfiles():
     global docidstrarr, GPTONLINE
     pgptendpoint="/v1/ingest"
     docidstrarr = []
     basefolder='/rawdata/'
     pgptip = default_args['pgpthost']
     pgptport = default_args['pgptport']
     buf = default_args['docfolder']

     bufarr=buf.split(",")
     while True:
      if GPTONLINE:
       docidstrarr = []
       for dirp in bufarr:
         # lock the directory
         dirp = basefolder + dirp
         if os.path.exists(dirp):
           with tsslogging.LockDirectory(dirp) as lock:
             newfd = os.dup(lock.dir_fd)
             files = [ os.path.join(dirp,f) for f in os.listdir(dirp) if os.path.isfile(os.path.join(dirp,f)) ]
             for mf in files:
                docids,docstr,docidstr=getingested(mf)
                deleteembeddings(docids)
                print("INFO Ingestfiles:",mf)

                if is_binary(mf):
                  maadstml.pgptingestdocs(mf,'binary',pgptip,pgptport,pgptendpoint)
                else:
                  try:
                     maadstml.pgptingestdocs(mf,'text',pgptip,pgptport,pgptendpoint)
                  except Exception as e:
                      print("ERROR:",e)

                docids,docstr,docidstr=getingested(mf)
                if len(docidstr) >=1:
                  docidstrarr.append(docidstr[0])

         else:
           print("WARN Directory Path: {} does not exist".format(dirp))
       if int(default_args['docfolderingestinterval'])==0:
         break
       time.sleep(int(default_args['docfolderingestinterval']))
       print("docidsstr=",docidstrarr)
      time.sleep(1)

 def sendtoprivategpt(maindata,docfolder):
    global docidstrarr
    counter = 0
    maxc = 300
    pgptendpoint="/v1/completions"

    prompt = default_args['prompt']
    prompt=prompt.replace("&","and")

    context = default_args['context']
    context=context.replace("&","and")

    mcontext = False
    usingqdrant = ''
    if docfolder != '':
      mcontext = True
      usingqdrant = 'Using documents in Qdrant VectorDB for context.'

    maintopic = default_args['pgpt_data_topic']
    if os.environ['TSS']=="1":
      mainip = default_args['pgpthost']
    else:
      mainip = "http://" + os.environ['qip']
      if os.environ['qip']=="":
           mainip=default_args['pgpthost']

    mainport = default_args['pgptport']

    if 'step9keyattribute' in os.environ:
      if os.environ['step9keyattribute'] != '':
        attribute = os.environ['step9keyattribute']
        default_args['keyattribute'] = attribute
      else:
        attribute = default_args['keyattribute']
    else:
     attribute = default_args['keyattribute']

    if 'step9hyperbatch' in os.environ:
      if os.environ['step9hyperbatch'] != '':
         hyperbatch = os.environ['step9hyperbatch']
         default_args['hyperbatch'] = hyperbatch
      else:
        hyperbatch = default_args['hyperbatch']
    else:
      hyperbatch = default_args['hyperbatch']

    for mess in maindata:
         if default_args['jsonkeytogather']=='Identifier' or hyperbatch=="0" or default_args['jsonkeytogather'].lower()=="searchtextfound":
            m = mess[0]
            m1 = mess[1]
         else:
            m = mess
            m1 = attribute #default_args['keyattribute']

         m=m.replace("&","and")
         response=pgptchat(m,mcontext,docidstrarr,mainport,False,mainip,pgptendpoint)
         response=response.strip()
         # Produce data to Kafka
         sf="false"
         response,sf,contentmessage=checkresponse(response,m1)
         tactic,technique,jbm=tsslogging.getmitre(response,default_args['mitrejson'])
         if usingqdrant != '':
            if default_args['streamall']=="0": # Only stream if search terms found in response
               if sf=="false":
                  response="ERROR:"
            m = m + ' (' + usingqdrant + ')'
         if 'ERROR:' not in response and contentmessage != "":
           if default_args['jsonkeytogather'].lower()=="searchtextfound":
              jmess = mess[3]
              response1 = jmess[:-1] + ",\"privateGPT_AI_response\":\"" + contentmessage.strip().rstrip().lstrip() + \
                        "\"," + "\"prompt\":\"" + prompt + "\",\"context\":\""+context + \
                        "\",\"pgptcontainer\":\"" + default_args['pgptcontainername'] + "\",\"pgpt_consumefrom\":\"" + \
                         default_args['consumefrom'] + "\", \"pgpt_data_topic\":\"" + default_args['pgpt_data_topic'] + \
                         "\",\"contextwindowsize\":" + default_args['contextwindowsize'] + ",\"temperature\":\""+default_args['temperature'] + \
                         "\",\"pgptrollbackoffset\":"+default_args['rollbackoffset'] + jbm + "}"
              writetortmslogfile(mess[2],response1)
           else:
              response1 = response[:-1] + "," + "\"prompt\":\"" + m.strip() + "\",\"identifier\":\"" + m1.strip() + "\",\"searchfound\":\"" + sf.strip() + "\"}"
           response1=response1.replace(";",":")
           producegpttokafka(response1,maintopic)
         else:
           counter += 1
           time.sleep(1)
           if counter > maxc:
              startpgptcontainer()
              qdrantcontainer()
              counter = 0
              tsslogging.tsslogit("PrivateGPT Step 9 DAG PrivateGPT Container restarting in {} {}".format(os.path.basename(__file__),response), "WARN" )
              tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")


 def windowname(wtype,sname,dagname):
     randomNumber = random.randrange(10, 9999)
     wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
     with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
       file.writelines("{}\n".format(wn))

     return wn

 def startprivategpt(**context):
        sd = context['dag'].dag_id
        sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
        pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

        if 'step9rollbackoffset' in os.environ:
           if os.environ['step9rollbackoffset'] != '':
             default_args['rollbackoffset'] = os.environ['step9rollbackoffset']

        if 'step9prompt' in os.environ:
           if os.environ['step9prompt'] != '':
             default_args['prompt'] = os.environ['step9prompt']
        if 'step9context' in os.environ:
           if os.environ['step9context'] != '':
             default_args['context'] = os.environ['step9context']

        if 'step9contextwindowsize' in os.environ:
           if os.environ['step9contextwindowsize'] != '':
             default_args['contextwindowsize'] = os.environ['step9contextwindowsize']

        if 'step9pgptcontainername' in os.environ:
           if os.environ['step9pgptcontainername'] != '':
             default_args['pgptcontainername'] = os.environ['step9pgptcontainername']

        if 'step9keyattribute' in os.environ:
           if os.environ['step9keyattribute'] != '':
             default_args['keyattribute'] = os.environ['step9keyattribute']
        if 'step9keyprocesstype' in os.environ:
           if os.environ['step9keyprocesstype'] != '':
             default_args['keyprocesstype'] = os.environ['step9keyprocesstype']
        if 'step9hyperbatch' in os.environ:
           if os.environ['step9hyperbatch'] != '':
             default_args['hyperbatch'] = os.environ['step9hyperbatch']
        if 'step9vectordbcollectionname' in os.environ:
           if os.environ['step9vectordbcollectionname'] != '':
             default_args['vectordbcollectionname'] = os.environ['step9vectordbcollectionname']
        if 'step9concurrency' in os.environ:
           if os.environ['step9concurrency'] != '':
             default_args['concurrency'] = os.environ['step9concurrency']
        if 'CUDA_VISIBLE_DEVICES' in os.environ:
           if os.environ['CUDA_VISIBLE_DEVICES'] != '':
             default_args['CUDA_VISIBLE_DEVICES'] = os.environ['CUDA_VISIBLE_DEVICES']

        if 'step9docfolder' in os.environ:
           if os.environ['step9docfolder'] != '':
             default_args['docfolder'] = os.environ['step9docfolder']
        if 'step9docfolderingestinterval' in os.environ:
           if os.environ['step9docfolderingestinterval'] != '':
             default_args['docfolderingestinterval'] = os.environ['step9docfolderingestinterval']
        if 'step9useidentifierinprompt' in os.environ:
           if os.environ['step9useidentifierinprompt'] != '':
             default_args['useidentifierinprompt'] = os.environ['step9useidentifierinprompt']

        if 'step9searchterms' in os.environ:
           if os.environ['step9searchterms'] != '':
             default_args['searchterms'] = os.environ['step9searchterms']

        if 'step9temperature' in os.environ:
           if os.environ['step9temperature'] != '':
             default_args['temperature'] = os.environ['step9temperature']
        if 'step9vectorsearchtype' in os.environ:
           if os.environ['step9vectorsearchtype'] != '':
             default_args['vectorsearchtype'] = os.environ['step9vectorsearchtype']


        if 'step9pgpthost' in os.environ:
           if os.environ['step9pgpthost'] != '':
             default_args['pgpthost'] = os.environ['step9pgpthost']
        if 'step9pgptport' in os.environ:
           if os.environ['step9pgptport'] != '':
             default_args['pgptport'] = os.environ['step9pgptport']

        if 'step9vectordimension' in os.environ:
           if os.environ['step9vectordimension'] != '':
             default_args['vectordimension'] = os.environ['step9vectordimension']

        VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
        VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESSPGPT".format(sname))
        VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESSPGPT".format(sname))
        HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))

        ti = context['task_instance']
        ti.xcom_push(key="{}_consumefrom".format(sname), value=default_args['consumefrom'])
        ti.xcom_push(key="{}_pgpt_data_topic".format(sname), value=default_args['pgpt_data_topic'])
        ti.xcom_push(key="{}_pgptcontainername".format(sname), value=default_args['pgptcontainername'])
        ti.xcom_push(key="{}_offset".format(sname), value="_{}".format(default_args['offset']))
        ti.xcom_push(key="{}_rollbackoffset".format(sname), value="_{}".format(default_args['rollbackoffset']))

        ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
        ti.xcom_push(key="{}_enabletls".format(sname), value="_{}".format(default_args['enabletls']))
        ti.xcom_push(key="{}_partition".format(sname), value="_{}".format(default_args['partition']))

        ti.xcom_push(key="{}_prompt".format(sname), value=default_args['prompt'])
        ti.xcom_push(key="{}_context".format(sname), value=default_args['context'])
        ti.xcom_push(key="{}_jsonkeytogather".format(sname), value=default_args['jsonkeytogather'])
        ti.xcom_push(key="{}_keyattribute".format(sname), value=default_args['keyattribute'])
        ti.xcom_push(key="{}_keyprocesstype".format(sname), value=default_args['keyprocesstype'])

        ti.xcom_push(key="{}_vectordbcollectionname".format(sname), value=default_args['vectordbcollectionname'])

        ti.xcom_push(key="{}_concurrency".format(sname), value="_{}".format(default_args['concurrency']))
        ti.xcom_push(key="{}_cuda".format(sname), value="_{}".format(default_args['CUDA_VISIBLE_DEVICES']))
        ti.xcom_push(key="{}_pgpthost".format(sname), value=default_args['pgpthost'])
        ti.xcom_push(key="{}_pgptport".format(sname), value="_{}".format(default_args['pgptport']))
        ti.xcom_push(key="{}_hyperbatch".format(sname), value="_{}".format(default_args['hyperbatch']))

        ti.xcom_push(key="{}_docfolder".format(sname), value="{}".format(default_args['docfolder']))
        ti.xcom_push(key="{}_docfolderingestinterval".format(sname), value="_{}".format(default_args['docfolderingestinterval']))
        ti.xcom_push(key="{}_useidentifierinprompt".format(sname), value="_{}".format(default_args['useidentifierinprompt']))
        ti.xcom_push(key="{}_searchterms".format(sname), value="{}".format(default_args['searchterms']))
        ti.xcom_push(key="{}_streamall".format(sname), value="_{}".format(default_args['streamall']))
        ti.xcom_push(key="{}_temperature".format(sname), value="_{}".format(default_args['temperature']))
        ti.xcom_push(key="{}_vectorsearchtype".format(sname), value="{}".format(default_args['vectorsearchtype']))
        ti.xcom_push(key="{}_contextwindowsize".format(sname), value="_{}".format(default_args['contextwindowsize']))
        ti.xcom_push(key="{}_vectordimension".format(sname), value="_{}".format(default_args['vectordimension']))
        ti.xcom_push(key="{}_mitrejson".format(sname), value="{}".format(default_args['mitrejson']))

        repo=tsslogging.getrepo()
        if sname != '_mysolution_':
         fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
        else:
          fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

        mainmodel,mainembedding=llmattrs(default_args['pgptcontainername'])
        ti.xcom_push(key="{}_mainmodel".format(sname), value="{}".format(mainmodel))
        ti.xcom_push(key="{}_mainembedding".format(sname), value="{}".format(mainembedding))

        wn = windowname('ai',sname,sd)
        subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess-pgpt", "ENTER"])
        subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" {} {} {} {} \"{}\" \"{}\" {} {}".format(fullpath,VIPERTOKEN, HTTPADDR, VIPERHOST, VIPERPORT[1:],
                        default_args['vectordbcollectionname'],default_args['concurrency'],default_args['CUDA_VISIBLE_DEVICES'],default_args['rollbackoffset'],
                        default_args['prompt'],default_args['context'],default_args['keyattribute'],default_args['keyprocesstype'],
                        default_args['hyperbatch'],default_args['docfolder'],default_args['docfolderingestinterval'],
                        default_args['useidentifierinprompt'],default_args['searchterms'],default_args['streamall'],default_args['temperature'],
                        default_args['vectorsearchtype'], default_args['contextwindowsize'], default_args['pgptcontainername'],
                        default_args['pgpthost'],default_args['pgptport'],default_args['vectordimension']), "ENTER"])

 if __name__ == '__main__':
     if len(sys.argv) > 1:
        if sys.argv[1] == "1":
         repo=tsslogging.getrepo()

         VIPERTOKEN = sys.argv[2]
         VIPERHOST = sys.argv[3]
         VIPERPORT = sys.argv[4]
         vectordbcollectionname =  sys.argv[5]
         concurrency =  sys.argv[6]

         cuda =  sys.argv[7]
         rollbackoffset =  sys.argv[8]
         prompt =  sys.argv[9]
         context =  sys.argv[10]
         keyattribute =  sys.argv[11]
         keyprocesstype =  sys.argv[12]
         hyperbatch =  sys.argv[13]
         docfolder =  sys.argv[14]
         docfolderingestinterval =  sys.argv[15]
         useidentifierinprompt =  sys.argv[16]
         searchterms =  sys.argv[17]
         streamall =  sys.argv[18]
         temperature = sys.argv[19]
         vectorsearchtype = sys.argv[20]

         contextwindowsize = sys.argv[21]
         pgptcontainername = sys.argv[22]

         pgpthost = sys.argv[23]
         pgptport = sys.argv[24]
         vectordimension=sys.argv[25]

         default_args['vectordimension']=vectordimension

         default_args['rollbackoffset']=rollbackoffset
         default_args['prompt'] = prompt
         default_args['context'] = context

         default_args['keyattribute'] = keyattribute
         default_args['keyprocesstype'] = keyprocesstype
         default_args['hyperbatch'] = hyperbatch
         default_args['vectordbcollectionname'] = vectordbcollectionname
         default_args['concurrency'] = concurrency
         default_args['CUDA_VISIBLE_DEVICES'] = cuda

         default_args['docfolder'] = docfolder
         default_args['docfolderingestinterval'] = docfolderingestinterval
         default_args['useidentifierinprompt'] = useidentifierinprompt
         default_args['searchterms'] = searchterms
         default_args['streamall'] = streamall
         default_args['temperature'] = temperature
         default_args['vectorsearchtype'] = vectorsearchtype

         default_args['contextwindowsize'] = contextwindowsize
         default_args['pgptcontainername'] = pgptcontainername

         default_args['pgpthost'] = pgpthost
         default_args['pgptport'] = pgptport

         if "KUBE" not in os.environ:
           v,buf=qdrantcontainer()
           if buf != "":
            if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the Qdrant container.  Here is the run command - try to run it manually for testing: {}".format(buf))
            else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting Qdrant.  Here is the run command: {}".format(buf))

           time.sleep(5)  # wait for containers to start

           tsslogging.locallogs("INFO", "STEP 9: Starting privateGPT")
           v,buf,mainmodel,mainembedding=startpgptcontainer()
           if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the privateGPT container.  Here is the run command - try to run it manually for testing: {}".format(buf))
           else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting privateGPT.  Here is the run command: {}".format(buf))

           time.sleep(10)  # wait for containers to start
           tsslogging.getqip()
         elif  os.environ["KUBE"] == "0":
           v,buf=qdrantcontainer()
           if buf != "":
            if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the Qdrant container.  Here is the run command - try to run it manually for testing: {}".format(buf))
            else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting Qdrant.  Here is the run command: {}".format(buf))

           time.sleep(5)  # wait for containers to start

           tsslogging.locallogs("INFO", "STEP 9: Starting privateGPT")
           v,buf,mainmodel,mainembedding=startpgptcontainer()
           if v==1:
             tsslogging.locallogs("WARN", "STEP 9: There seems to be an issue starting the privateGPT container.  Here is the run command - try to run it manually for testing: {}".format(buf))
           else:
             tsslogging.locallogs("INFO", "STEP 9: Success starting privateGPT.  Here is the run command: {}".format(buf))

           time.sleep(10)  # wait for containers to start
           tsslogging.getqip()
         else:
           tsslogging.locallogs("INFO", "STEP 9: [KUBERNETES] Starting privateGPT - LOOKS LIKE THIS IS RUNNING IN KUBERNETES")
           tsslogging.locallogs("INFO", "STEP 9: [KUBERNETES] Make sure you have applied the private GPT YAML files and have the privateGPT Pod running")

         if docfolder != '':
           startdirread()
         count=0
         while True:
          try:
              # Get preprocessed data from Kafka
              result = consumetopicdata()
 #             print("Result=",result)
              if result != "" and result is not None:
              # Format the preprocessed data for PrivateGPT
                maindata = gatherdataforprivategpt(result)
              # Send the data to PrivateGPT and produce to Kafka
                if len(maindata) > 0:
                 sendtoprivategpt(maindata,docfolder)
 #             time.sleep(2)
              count=0
          except Exception as e:
           print("Error=",e)
           tsslogging.locallogs("ERROR", "STEP 9: PrivateGPT Step 9 DAG in {} {}  Aborting after 10 consecutive errors.".format(os.path.basename(__file__),e))
           tsslogging.tsslogit("PrivateGPT Step 9 DAG in {} {} Aborting after 10 consecutive errors.".format(os.path.basename(__file__),e), "ERROR" )
           tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
           time.sleep(5)
           count = count + 1
           if count > 10:
             break
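The Step 9 DAG above overrides entries in default_args from environment variables named step9<key>, one hand-written check per key. A minimal sketch of the same pattern as a loop follows. The helper name apply_step9_overrides and the key list are illustrative assumptions and are not part of the TML code base; only keys whose overrides are visible in the listing above are included.

```python
import os

# Keys whose step9<key> overrides appear in the Step 9 DAG listing above.
# The list is illustrative and not exhaustive.
STEP9_KEYS = [
    "docfolder", "docfolderingestinterval", "useidentifierinprompt",
    "searchterms", "temperature", "vectorsearchtype",
    "pgpthost", "pgptport", "vectordimension",
]

def apply_step9_overrides(default_args, environ=os.environ, keys=STEP9_KEYS, prefix="step9"):
    """Hypothetical helper: copy non-empty step9<key> variables into default_args."""
    for key in keys:
        value = environ.get(prefix + key, "")
        if value != "":  # empty variables are ignored, as in the DAG
            default_args[key] = value
    return default_args

# A plain dict stands in for os.environ so the example is reproducible.
args = {"temperature": "0.1", "pgptport": "8001", "docfolder": ""}
env = {"step9temperature": "0.3", "step9pgptport": "", "step9docfolder": "mylogs"}
print(apply_step9_overrides(args, environ=env))
# {'temperature': '0.3', 'pgptport': '8001', 'docfolder': 'mylogs'}
```

Note that all values stay strings, which matches how the DAG passes them on the command line and through XCom.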

7.8. STEP 9 DAG Core Parameter Explanation

Step 9 DAG parameter

Explanation

pgptcontainername

Enter the privateGPT container to use. For example:

  • maadsdocker/tml-privategpt-with-gpu-nvidia-amd64

  • maadsdocker/tml-privategpt-no-gpu-amd64

Containers can be found on Docker Hub under the maadsdocker account.

rollbackoffset

Choose rollback offset

offset

Choose offset - usually leave at -1

enabletls

Set to 1 for TLS encryption, or 0 for no encryption.

consumefrom

Enter the topic to consume from

pgpt_data_topic

This is the topic that will store the privateGPT responses.

pgpthost

This is the host where privateGPT is running, e.g. http://127.0.0.1

pgptport

This is the port privateGPT is listening on, e.g. 8001

prompt

This is the prompt for privateGPT. For example,

Do the device data show any malfunction or defects?

context

Provide the context for the data. For example,

This is IoT data from devices. The data are anomaly probabilities

for each IoT device. If voltage or current probabilities are low,

it is likely the device is not working properly.

hyperbatch

Set to 1 if you want to send privateGPT a batch of hyperpredictions,

or set to 0 if you want to send privateGPT one hyperprediction

at a time. For example, if doing anomaly predictions on each IoT

device, set hyperbatch to 0 and TML will send each hyperprediction

to privateGPT individually; set it to 1 to send them in a batch.

jsonkeytogather

This is the JSON key to use to gather the data for privateGPT.

Normally, you have two options (only ONE value can be specified):

  1. hyperprediction: TML will store predictions and other outcomes

    in this variable.

  2. Identifier: TML will store additional output details here.

keyattribute

This is the attribute you are analysing with TML, e.g. Voltage, Current

keyprocesstype

This is the type of processing you are doing on the keyattribute,

e.g. anomprob, avg, trend etc. See Preprocessing Types for

a complete list.

vectordbcollectionname

This is the name of the collection on Qdrant Vector DB

concurrency

The number of instances of privateGPT to run, e.g. 2

CUDA_VISIBLE_DEVICES

If you have an NVIDIA GPU, enter its device index here, e.g. 0

docfolder

The sub-folder that contains your TEXT or PDF files. This is a

sub-folder of the MAIN folder mapped to /rawdata. If this field is NON-EMPTY,

privateGPT will query these documents as the CONTEXT to answer your prompt.

Separate multiple folders with a comma.

docfolderingestinterval

How often you want TML to RE-LOAD the files in docfolder - enter the number of SECONDS

useidentifierinprompt

If 1, TML takes the Identifier from the TML JSON output and appends it to the prompt. If 0,

it uses the prompt only.

searchterms

If you are searching document embeddings, you can specify search

terms like: ‘192.168.--identifier--,authentication failure’, etc.

TML then searches each privateGPT response to see

whether any of the search terms appear in it.

This is very powerful, because you can raise alerts on responses

that contain terms of interest, e.g. a hacking attempt.

streamall

This determines whether to stream all of the privateGPT responses

or just the ones that contain search terms.

If set to ‘1’, all responses are streamed; if ‘0’,

only responses containing search terms are streamed.

temperature

This determines how the LLM responds; it is a number

between 0 and 1.

If 0, the response will be very conservative.

The closer to 1, the more varied the responses and the more likely the LLM is to hallucinate.

vectorsearchtype

This determines how similarity searches are performed

in the Qdrant vector DB. You must choose one of the

following: Cosine, Dot, Manhattan or Euclid.

See the Qdrant documentation for more details.

contextwindowsize

The size of the context window. This is the maximum

number of tokens to send to privateGPT for processing. For example,

if contextwindowsize is 8192, then a maximum of 8192 tokens can be sent to

privateGPT for processing. You can increase this number, but it will

consume more memory.

vectordimension

This is the size of the embedding array.

It is specific to the embedding model being used.

For example, 384, 768, 1024 etc. see the figure below.

mitrejson

You can use the mitre.json

and save it to your mapped /rawdata folder.

RTMS will ask AI to classify the messages in accordance

with the MITRE ATT&CK classification matrix.
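Several of the parameters above accept only a restricted set of values (0/1 flags, a temperature between 0 and 1, one of four vectorsearchtype names). The sketch below checks a parameter dictionary against those documented ranges before a DAG is deployed. The function name validate_step9_args and the error messages are hypothetical and are not part of TML; treating concurrency, contextwindowsize and vectordimension as positive whole numbers is an assumption based on the examples in the table.

```python
VALID_SEARCH_TYPES = {"Cosine", "Dot", "Manhattan", "Euclid"}
FLAG_KEYS = ("enabletls", "hyperbatch", "useidentifierinprompt", "streamall")
# Assumption: these parameters are expected to be positive whole numbers.
POSITIVE_INT_KEYS = ("concurrency", "contextwindowsize", "vectordimension")

def validate_step9_args(args):
    """Hypothetical helper: return a list of problems found in a Step 9 parameter dict."""
    problems = []
    for key in FLAG_KEYS:
        if key in args and str(args[key]) not in ("0", "1"):
            problems.append("{} must be 0 or 1".format(key))
    for key in POSITIVE_INT_KEYS:
        if key in args:
            text = str(args[key])
            if not text.isdecimal() or int(text) < 1:
                problems.append("{} must be a positive whole number".format(key))
    if "temperature" in args:
        try:
            temperature = float(args["temperature"])
        except (TypeError, ValueError):
            problems.append("temperature must be a number")
        else:
            if not 0.0 <= temperature <= 1.0:
                problems.append("temperature must be between 0 and 1")
    if "vectorsearchtype" in args and args["vectorsearchtype"] not in VALID_SEARCH_TYPES:
        problems.append("vectorsearchtype must be one of Cosine, Dot, Manhattan, Euclid")
    return problems

good = {"enabletls": "1", "hyperbatch": "0", "temperature": "0.1",
        "vectorsearchtype": "Cosine", "vectordimension": "384"}
bad = {"hyperbatch": "2", "temperature": "1.5",
       "vectorsearchtype": "cosine", "vectordimension": "0"}

print(validate_step9_args(good))
for problem in validate_step9_args(bad):
    print(problem)
```

Values are compared as strings because the DAG stores every parameter as a string in default_args.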

7.9. Vector Dimensions

This figure shows the vector dimensions of different embedding models.

[Figure: vector dimensions of embedding models (_images/vecdim.png)]
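To make the link between vectordimension and vectorsearchtype concrete, the sketch below computes the four measures named for vectorsearchtype (Cosine, Dot, Manhattan, Euclid) on two small vectors, after checking that each vector has the expected dimension. It is a pure-Python illustration of the underlying formulas, not of how Qdrant implements them; the function names and the 4-element vectors are assumptions chosen for readability (real embeddings have hundreds of elements).

```python
import math

def check_dimension(vector, vectordimension):
    """Raise if a vector does not have the dimension the collection expects."""
    if len(vector) != vectordimension:
        raise ValueError("expected {} elements, got {}".format(vectordimension, len(vector)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product of the two vectors divided by the product of their lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

vectordimension = 4          # toy value for the example
a = [1.0, 0.0, 2.0, 0.0]
b = [2.0, 0.0, 4.0, 0.0]     # same direction as a, twice the length

for vector in (a, b):
    check_dimension(vector, vectordimension)

print("Dot      :", dot(a, b))
print("Cosine   :", cosine(a, b))
print("Euclid   :", euclid(a, b))
print("Manhattan:", manhattan(a, b))
```

For Cosine and Dot a larger value means the vectors are more similar, whereas for Euclid and Manhattan a smaller value means they are closer. Here the two vectors point in the same direction, so the cosine value is (up to rounding) 1 even though the Euclidean and Manhattan distances are non-zero.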

7.10. privateGPT Processing Explanation

Consider the following JSON, which is the output from STEP 4: Preprocessing Data: tml-system-step-4-kafka-preprocess-dag

{
        "hyperprediction": "120714.692",
        "Maintopic": "iot-preprocess",
        "Topic": "topicid155_Voltage_preprocessed_Avg",
        "Type": "External",
        "ProducerId": "customjson",
        "TimeStamp": "2024-09-13 17:04:36",
        "Unixtime": 1726247076213196638,
        "kafkakey": "OAA-Tvw04fZB3lr7bDehMDMAmK1ug2p0jw",
        "Preprocesstype": "Avg",
        "WindowStartTime": "2022-01-27 19:55:07 +0000 UTC",
        "WindowEndTime": "2022-01-27 19:55:09 +0000 UTC",
        "WindowStartUnixTime": "1643313307000000000",
        "WindowEndUnixTime": "1643313309000000000",
        "Conditions": "",
        "Identifier": "Voltage~Line-Voltage-(mV)~iot-preprocess~uid:metadata.dsn,subtopic:metadata.property_name (Voltage),value:datapoint.value,identifier:metadata.display_name,datetime:datapoint.updated_at,:allrecords,Joinedidentifiers:~oem:n/a~lat:n/a~long:n/a~location:n/a~identifier:n/a,TML solution~Msgsjoined=06d99238-7fab-11ec-16dd-04357e6ea60c(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});06f7a066-7fab-11ec-b57e-c6fecac720c2(120456,41.60322,-73.08775,Voltage,n/a,n/a,{});071a7abe-7fab-11ec-d105-4ccdd61deb1a(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});0733212c-7fab-11ec-d162-80400f9d10d6(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});0758c90e-7fab-11ec-24d3-2c9b20193b60(120609,41.60322,-73.08775,Voltage,n/a,n/a,{});0780e5a6-7fab-11ec-4416-1bf4bf386653(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});07a1965c-7fab-11ec-ab45-fb68b835cee7(120712,41.60322,-73.08775,Voltage,n/a,n/a,{});07b56970-7fab-11ec-2762-03c9c43b6eac(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});07ce4558-7fab-11ec-f91b-bce1f12d0bdc(120712,41.60322,-73.08775,Voltage,n/a,n/a,{});07ea1986-7fab-11ec-3b6d-d650f04215e1(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});08014156-7fab-11ec-924c-3d9a32b7def1(120915,41.60322,-73.08775,Voltage,n/a,n/a,{});08197cd0-7fab-11ec-5c87-5902076c89be(120812,41.60322,-73.08775,Voltage,n/a,n/a,{});083c9760-7fab-11ec-f6e0-05d9b27e71d5(120812,41.60322,-73.08775,Voltage,n/a,n/a,{})~latlong=~mainuid=AC000W017810194",
        "PreprocessIdentifier": "",
        "Numberofmessages": 13,
        "RawData": [
                120609,
                120456,
                120812,
                120712,
                120915
        ],
        "MsgIdData": [
                "06d99238-7fab-11ec-16dd-04357e6ea60c(120609):{1}",
                "06f7a066-7fab-11ec-b57e-c6fecac720c2(120456):{1}",
                "071a7abe-7fab-11ec-d105-4ccdd61deb1a(120609):{1}",
                "0733212c-7fab-11ec-d162-80400f9d10d6(120609):{1}",
                "0758c90e-7fab-11ec-24d3-2c9b20193b60(120609):{1}",
                "0780e5a6-7fab-11ec-4416-1bf4bf386653(120812):{1}",
                "07a1965c-7fab-11ec-ab45-fb68b835cee7(120712):{1}",
                "07b56970-7fab-11ec-2762-03c9c43b6eac(120812):{1}",
                "07ce4558-7fab-11ec-f91b-bce1f12d0bdc(120712):{1}",
                "07ea1986-7fab-11ec-3b6d-d650f04215e1(120812):{1}",
                "08014156-7fab-11ec-924c-3d9a32b7def1(120915):{1}",
                "08197cd0-7fab-11ec-5c87-5902076c89be(120812):{1}",
                "083c9760-7fab-11ec-f6e0-05d9b27e71d5(120812):{1}"
        ],
        "Offset": 524247,
        "Consumerid": "StreamConsumer",
        "Generated": "2024-09-13T17:04:37.459+00:00",
        "Partition": 0
}

Important

Note the format of this JSON:

  1. hyperprediction - all TML output is stored in this key. It can be used as the value of jsonkeytogather: the Step 9 DAG will gather all the data from this key and ask privateGPT the question in your prompt.

  2. Identifier - additional details are stored in this key. In addition, the data used in the analysis are stored in the RawData JSON array, which can also be gathered and presented to privateGPT for prompting.

Now,

keyattribute is the variable you are processing. It appears in "Topic": "topicid155_Voltage_preprocessed_Avg"; here TML is taking the average of Voltage from the devices. You can specify the name of any attribute you are processing.

keyprocesstype is the type of processing you are doing, as listed in Preprocessing Types. It appears in "Preprocesstype": "Avg"; here TML is taking the average of Voltage from the devices. You can specify any processing type from the Preprocessing Types table.

Tip

You can specify multiple keyattribute and keyprocesstype values by separating them with commas.

Using processed data with privateGPT in this way is a powerful means of applying GenAI to real-time data streams with no per-call API cost, since all API calls go to the privateGPT container running locally. In addition, no data are sent outside your environment, which keeps the solution secure and gives you full control of your data.
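The roles of jsonkeytogather, keyattribute and keyprocesstype can be illustrated with a short sketch. It assumes that a message is selected when its Topic contains one of the keyattribute names and its Preprocesstype equals one of the keyprocesstype names; the actual matching rules inside the Step 9 DAG are not shown in this section, so treat this as an approximation. The function name gather_values is hypothetical, the first message is a shortened copy of the JSON above, and the second message is invented for the example.

```python
import json

def gather_values(messages, jsonkeytogather, keyattribute, keyprocesstype):
    """Hypothetical sketch: collect jsonkeytogather from messages matching attribute and process type."""
    attributes = [a.strip().lower() for a in keyattribute.split(",") if a.strip()]
    processtypes = [p.strip().lower() for p in keyprocesstype.split(",") if p.strip()]
    gathered = []
    for raw in messages:
        message = json.loads(raw)
        topic = message.get("Topic", "").lower()
        processtype = message.get("Preprocesstype", "").lower()
        # Assumption: the attribute name appears inside the Topic string.
        if any(a in topic for a in attributes) and processtype in processtypes:
            if jsonkeytogather in message:
                gathered.append(message[jsonkeytogather])
    return gathered

messages = [
    # Shortened copy of the Step 4 output shown above.
    json.dumps({"hyperprediction": "120714.692",
                "Topic": "topicid155_Voltage_preprocessed_Avg",
                "Preprocesstype": "Avg"}),
    # Invented message for illustration only.
    json.dumps({"hyperprediction": "0.42",
                "Topic": "topicid155_Current_preprocessed_AnomProb",
                "Preprocesstype": "AnomProb"}),
]

print(gather_values(messages, "hyperprediction", "Voltage", "Avg"))
print(gather_values(messages, "hyperprediction", "Voltage,Current", "Avg,AnomProb"))
```

The first call returns only the Voltage average; the second call uses comma-separated values and returns both hyperpredictions.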

7.11. Using Qdrant VectorDB for Local Document Analysis

Users can search local documents to cross-reference the Identifier field described in the privateGPT Processing Explanation.

7.12. TML, PrivateGPT and Qdrant Example Scenarios

  1. You can map a local folder to the /rawdata folder and store your files (TEXT or PDF) in its subfolders.

    • For example: docfolder='mylog1,mylog2'; these two folders would be subfolders of the local folder mapped to /rawdata.

    • The contents of these folders are ingested into the Qdrant Vector DB.

    • These folders are automatically re-loaded every docfolderingestinterval seconds. For example, if you want to analyse log files and docfolderingestinterval=60, these folders are ingested every 60 seconds.

  2. If useidentifierinprompt is 1, then TML will add the Identifier as part of the prompt. For example, if you are analysing IP addresses for anomalies and computing an anomaly score, you can complement this score by looking into log files to see whether the IP address has authentication failures, which may indicate a HACKING attempt.

    • You can also add a placeholder for the identifier in the prompt by adding --identifier--. For example, prompt=Does the following --identifier-- have any errors in the logs? TML will replace --identifier-- with the real-time IP address or value in the Identifier JSON field.

This way, you can use TML, privateGPT and Qdrant for powerful analysis of documents, by cross-referencing and meshing information together to get greater real-time insights from your real-time data.
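The prompt and search-term behaviour described in this scenario can be sketched as follows. This is an illustration under stated assumptions, not the TML implementation: the function names are hypothetical, the rule that the --identifier-- placeholder takes precedence over appending is an assumption, search terms are matched case-insensitively as plain substrings, and the sample response text is invented.

```python
def build_prompt(prompt, identifier, useidentifierinprompt):
    """Insert the identifier into the prompt (placeholder first, otherwise append)."""
    if "--identifier--" in prompt:
        return prompt.replace("--identifier--", identifier)
    if useidentifierinprompt == "1":
        return "{} {}".format(prompt, identifier)
    return prompt

def matched_terms(response, searchterms):
    """Return the comma-separated search terms that occur in the response."""
    terms = [t.strip() for t in searchterms.split(",") if t.strip()]
    lowered = response.lower()
    return [t for t in terms if t.lower() in lowered]

def should_stream(response, searchterms, streamall):
    """streamall='1' streams everything; '0' streams only responses with a matching term."""
    return streamall == "1" or len(matched_terms(response, searchterms)) > 0

identifier = "192.168.1.10"  # invented value standing in for the Identifier field
prompt = build_prompt("Does the following --identifier-- have any errors in the logs?",
                      identifier, "1")
print(prompt)

response = "Yes, 192.168.1.10 shows repeated Authentication Failure entries."  # invented
print(matched_terms(response, "authentication failure,hacking"))
print(should_stream(response, "authentication failure,hacking", "0"))
print(should_stream("No issues found.", "authentication failure,hacking", "0"))
```

With streamall set to '0', only the first response would be streamed, because it is the only one that contains a search term.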

7.13. STEP 9b: Multi-Agentic AI: tml-system-step-9b-agenticai-dag

This DAG applies multi-agentic AI to real-time data processing. See TML and Agentic AI for more information.

   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   from datetime import datetime, timezone
   from airflow.decorators import dag, task
   from langgraph_supervisor import create_supervisor
   from llama_index.core.indices.vector_store.base import VectorStoreIndex
   from llama_index.core.schema import Document  # Document is often found here
   from langgraph.prebuilt import create_react_agent
   from llama_index.embeddings.ollama import OllamaEmbedding
   from langchain_ollama import ChatOllama
   import importlib
   import json
   import pprint
   from llama_index.core.settings import Settings
   import os
   import tsslogging
   import sys
   import time
   import maadstml
   import subprocess
   import random
   import threading
   import re
   from binaryornot.check import is_binary
   import base64
   import requests
   from json_repair import repair_json

   sys.dont_write_bytecode = True

   ######################################################USER CHOSEN PARAMETERS ###########################################################
   SMTP_SERVER=''
   SMTP_PORT=0
   SMTP_USERNAME=''
   SMTP_PASSWORD='' # this should be base64 encoded
   recipient=''

   if 'SMTP_SERVER' in os.environ:
      SMTP_SERVER=os.environ['SMTP_SERVER']
   if 'SMTP_PORT' in os.environ:
      SMTP_PORT=int(os.environ['SMTP_PORT'])
   if 'SMTP_USERNAME' in os.environ:
      SMTP_USERNAME=os.environ['SMTP_USERNAME']
   if 'SMTP_PASSWORD' in os.environ:
      SMTP_PASSWORD=os.environ['SMTP_PASSWORD']
      SMTP_PASSWORD=base64.b64decode(SMTP_PASSWORD)
      SMTP_PASSWORD = SMTP_PASSWORD.decode('utf-8')
   if 'recipient' in os.environ:
      recipient=os.environ['recipient']

   default_args = {
    'owner': 'Sebastian Maurice',   # <<< *** Change as needed
    'ollamacontainername' : 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools', #'maadsdocker/tml-privategpt-no-gpu-amd64',  # enter a valid container https://hub.docker.com/r/maadsdocker/tml-privategpt-no-gpu-amd64
    'rollbackoffset' : '5',  # <<< *** Change as needed
    'offset' : '-1', # leave as is
    'enabletls' : '1', # change as needed
    'brokerhost' : '', # <<< *** Leave as is
    'brokerport' : '-999', # <<< *** Leave as is
    'microserviceid' : '',  # change as needed
    'topicid' : '-999', # leave as is
    'delay' : '100', # change as needed
    'companyname' : 'otics',  # <<< *** Change as needed
    'consumerid' : 'streamtopic',  # <<< *** Leave as is
    'agenttopic' : '', # this topic contains the individual agent responses
    'agents_topic_prompt' : """
   <consumefrom - topic agent will monitor:prompt you want for the agent to answer->>consumefrom - topic2 agent will monitor<<-prompt you want for the agent to answer>
   """, # <topic agent will monitor:prompt you want for the agent>, separate multiple topic agents with ->>
    'teamlead_topic' : '', # Enter the team lead topic - all team lead responses will be written to this topic
    'teamleadprompt' : """
   Enter the prompt for the Team lead agent
   """, # Enter the team lead prompt
    'supervisor_topic' : '', # Enter the supervisor topic - all supervisor responses will be written to this topic
    'supervisorprompt' : '', # Enter the supervisor prompt
    'agenttoolfunctions' : """
   tool_function:agent_name:system_prompt;tool_function2:agent_name2:system_prompt2;....
    """,  # enter the tools: tool_function is the name of a function in the agenttools python file
    'agent_team_supervisor_topic': '', # this topic will hold the responses from agents, team lead and supervisor
    'producerid' : 'agentic-ai',   # <<< *** Leave as is
    'identifier' : 'This is analysing TML output with Agentic AI',
    'mainip': 'http://127.0.0.1', # Ollama server container listening on this host
    'mainport' : '11434', # Ollama listening on this port
    'embedding': 'nomic-embed-text', # Embedding model
    'preprocesstype' : '', # Leave as is
    'partition' : '-1', # Leave as is
    'vectordbcollectionname' : 'tml-llm-model-v2', # change as needed
    'concurrency' : '2', # change as needed Leave at 1
    'CUDA_VISIBLE_DEVICES' : '0', # change as needed
    'temperature' : '0.1', # This value ranges between 0 and 1 and controls how conservative the LLM will be: at 0 it is very conservative, near 1 it is more likely to hallucinate
    #--------------------
    'ollama-model': 'llama3.1',
    'deletevectordbcount': '10',
    'vectordbpath': '/rawdata/vectordb',
    'contextwindow': '10000',
    'localmodelsfolder': '/mnt/c/maads/tml-airflow/rawdata/ollama'
   }

   ############################################################### DO NOT MODIFY BELOW ####################################################

   VIPERTOKEN=""
   VIPERHOST=""
   VIPERPORT=""
   HTTPADDR=""
   mainproducerid = default_args['producerid']

   def setollama(model):
       ###############  Ollama Model #################################
   #    model=default_args['ollama-model']
       temperature=float(default_args['temperature'])
       embeddingmodel=default_args['embedding'] #"nomic-embed-text"
       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])
       contextwindow=default_args['contextwindow']

   #    mainmodels = model.split(",") # agent,teamlead,supervisor

       if 'KUBE' in os.environ:
         if os.environ['KUBE'] == "1":
            default_args['mainip']="ollama-service"
            mainip=default_args['mainip']

       print("model====",model)
       gotllm=0
       for i in range(30):
         print("Checking if LLM loaded..wait")
         try:
           llm = ChatOllama(model=model, base_url=mainip+":"+str(mainport), temperature=temperature, num_ctx=int(contextwindow))
           gotllm=1
           print("LLM loaded")
           break
         except Exception as e:
           print("Error=",e)
           time.sleep(5)

       if gotllm==0:
           print("ERROR STEP 9b: Cannot load Ollama LLM model '{}' not found.".format(model))
           tsslogging.locallogs("ERROR", "STEP 9b: Cannot load Ollama LLM model '{}' not found.".format(model))
           return "",""

       try:
         ollama_emb = OllamaEmbedding(
           base_url=mainip+":"+str(mainport),
           model_name=embeddingmodel
         )
       except Exception as e:
         print("ERROR STEP 9b: Cannot load Ollama embedding '{}' not found.".format(embeddingmodel))
         tsslogging.locallogs("ERROR", "STEP 9b: Cannot load Ollama embedding '{}' not found.".format(embeddingmodel))
         return "",""

       Settings.embed_model = ollama_emb
       Settings.llm = llm

       return llm,ollama_emb


   def checkforloadedmodels(mainmodel):

       if 'KUBE' in os.environ:
         if os.environ['KUBE'] == "1":
            default_args['mainip']="ollama-service"
            mainip=default_args['mainip']

       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])

       OLLAMA_URL = f"{mainip}:{mainport}/api/tags"
       count = 0

       while True:
         try:
           response = requests.get(OLLAMA_URL)
           response.raise_for_status()
           data = response.json()
           # Assume 'models' key contains the list of available/loaded models
           loaded_models = [model for model in data.get("models", [])]
           print("loaded_models=",loaded_models)
           if mainmodel in json.dumps(loaded_models) or mainmodel+":latest" in json.dumps(loaded_models):
             print(f"Model {mainmodel} found")
             return 1
           else:
             pull_ollama_model(mainmodel) # pull the model
             time.sleep(5)
             count += 1
             if count > 600:
              break
             else:
               continue
         except Exception as e:
           print(f"Error querying Ollama server: {e} Will keep trying")
           time.sleep(5)
           count += 1
           if count > 20:
             break
           continue

       return 0


   def get_loaded_models():

       if 'KUBE' in os.environ:
         if os.environ['KUBE'] == "1":
            default_args['mainip']="ollama-service"
            mainip=default_args['mainip']

       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])
       mainmodel=default_args['ollama-model']
       mainmodel = mainmodel.split(",")[0] #check if one model is there
       OLLAMA_URL = f"{mainip}:{mainport}/api/tags"
       count = 0

       while True:
         try:
           response = requests.get(OLLAMA_URL)
           response.raise_for_status()
           data = response.json()
           # Assume 'models' key contains the list of available/loaded models
           loaded_models = [model for model in data.get("models", [])]
           print("loaded_models=",loaded_models)
           if mainmodel in json.dumps(loaded_models) or mainmodel+":latest" in json.dumps(loaded_models):
             print(f"Model {mainmodel} found")
             return 1
           else:
             time.sleep(5)
             count += 1
             if count > 600:
              break
             else:
               continue
         except Exception as e:
           print(f"Error querying Ollama server: {e} Will keep trying")
           time.sleep(5)
           count += 1
           if count > 20:
             break
           continue

       return 0

   def remove_escape_sequences(string):
       return string.encode('utf-8').decode('unicode_escape')

   def cleanstringjson(mainstr):

       mainstr = mainstr.replace("'","").replace('`',"").replace("\n","").replace("\\n","").replace("\t","").replace("\\t","").replace("\r","").replace("\\r","").replace("\\*","").replace("\\ ","").replace("\\\\","\\")


       a = list(mainstr.lower())
       b = "abcdefghijklmnopqrstuvwxyz-*123456789'{}`"
       i=0
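       for char in a:
           # guard the look-ahead so a trailing backslash cannot raise IndexError
           if char == "\\" and i+1 < len(a) and a[i+1] in b:
             a[i]=''
           if char == "\\" and i+2 < len(a) and a[i+1] == "\\" and a[i+2] == '"':
             a[i]=''

           i=i+1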

       mainstr=''.join(a)
       mainstr=re.sub(r'[\n\r]+', '', mainstr)

       mainstr = mainstr.translate({ord('\n'): None, ord('\r'): None})
       mainstr = " ".join(mainstr.splitlines())

       return mainstr

   def cleanstring(mainstr):

       mainstr = mainstr.replace('"',"").replace("'","").replace('`',"").replace("\n","").replace("\\n","").replace("\t","").replace("\\t","").replace("\r","").replace("\\r","").replace("\\*","").replace("\\ ","").replace("\\\\","\\").replace("\\1","1").replace("\\2","2").replace("\\3","3").replace("\\4","4").replace("\\5","5").replace("\\6","6").replace("\\7","7").replace("\\8","8").replace("\\9","9")
       mainstr = mainstr.splitlines()
       mainstr = " ".join(mainstr)

       a = list(mainstr.lower())
       b = "abcdefghijklmnopqrstuvwxyz-*123456789'{}`"
       i=0
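       for char in a:
           # guard the look-ahead so a trailing backslash cannot raise IndexError
           if char == "\\" and i+1 < len(a) and a[i+1] in b:
             a[i]=''
           if char == "\\" and i+2 < len(a) and a[i+1] == "\\" and a[i+2] == '"':
             a[i]=''

           i=i+1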

       mainstr=''.join(a)
       mainstr=re.sub(r'[\n\r]+', '', mainstr)

       mainstr = mainstr.translate({ord('\n'): None, ord('\r'): None})
       return mainstr

   ############## Delete folder content ########################
   def deletefoldercontents(dirpath,deletevectordbcnt):
       if deletevectordbcnt < int(default_args['deletevectordbcount']):
           deletevectordbcnt += 1
           return deletevectordbcnt
       else:
           # threshold reached: reset the counter, then clear the folder below
           deletevectordbcnt=0

       folder = dirpath
       for filename in os.listdir(folder):
           file_path = os.path.join(folder, filename)
           try:
               if os.path.isfile(file_path) or os.path.islink(file_path):
                   os.unlink(file_path)
               elif os.path.isdir(file_path):
                   shutil.rmtree(file_path)
           except Exception as e:
               print('Failed to delete %s. Reason: %s' % (file_path, e))
       return deletevectordbcnt
   ########################### Vector DB for Team Lead: Agent Responses ###############
   # this is for the team lead agent to consolidate information from individual agents
   ###################################################################################
   def loadtextdataintovectordb(responses,deletevectordbcnt,llm):

       vectordbpath = default_args['vectordbpath']

       directory_path="{}/tmlvectortextindex".format(vectordbpath)

       if not os.path.exists(directory_path):
          os.makedirs(directory_path)

       # delete previous folder content
       deletevectordbcnt=deletefoldercontents(directory_path,deletevectordbcnt)

       documents = [Document(text=t) for t in responses]

       #build index
       tml_index = VectorStoreIndex.from_documents(
           documents,
           embedding="local"
       )
       # persist index
       tml_index.storage_context.persist(persist_dir=directory_path)

       tml_text_engine = tml_index.as_query_engine(llm=llm,similarity_top_k=3)

       return tml_text_engine,deletevectordbcnt

   def pull_ollama_model(model_name):
       """
       Initiates an Ollama model pull using the Ollama API.

       Args:
           model_name (str): The name of the model to pull (e.g., "llama3").
       """
       mainip=default_args['mainip']
       mainport=int(default_args['mainport'])

       url = f"{mainip}:{mainport}/api/pull"  # Default Ollama API endpoint
       headers = {"Content-Type": "application/json"}
       payload = {"name": model_name}

       try:
           response = requests.post(url, headers=headers, data=json.dumps(payload), stream=True)
           response.raise_for_status()  # Raise an exception for HTTP errors

           print(f"Initiating pull for model: {model_name}")
           for line in response.iter_lines():
               if line:
                   # Ollama streams newline-delimited JSON, so parse one object per line
                   try:
                       data = json.loads(line.decode('utf-8'))
                       if 'status' in data:
                           print(f"Status: {data['status']}", end='\r')
                   except json.JSONDecodeError:
                       pass # Skip any line that is not valid JSON

           print(f"\nPull for model '{model_name}' completed.")

       except requests.exceptions.RequestException as e:
           print(f"Error pulling model '{model_name}': {e}")


   def stopcontainers():


      ollamacontainername = default_args['ollamacontainername']
      cfound=0
      subprocess.call("docker image ls > gptfiles.txt", shell=True)
      with open('gptfiles.txt', 'r', encoding='utf-8') as file:
           data = file.readlines()
           r=0
           for d in data:
             darr = d.split(" ")
             if '-privategpt-' in darr[0]:
               buf="docker stop $(docker ps -q --filter ancestor={} )".format(darr[0])
               if ollamacontainername in darr[0]:
                   cfound=1
                   # if ollama container found check if model is already loaded - if not  stop container
                   if get_loaded_models()==0:
                     print(buf)
                     subprocess.call(buf, shell=True)
                     return 0
                   break
      if cfound==0:
         print("INFO STEP 9b: Ollama container {} not found.  It may need to be pulled.".format(ollamacontainername))
         tsslogging.locallogs("WARN", "STEP 9b: Ollama container not found. It may need to be pulled if it does not start: docker pull {}".format(ollamacontainername))
         return 0

      return 1

   def startpgptcontainer():
         print("Starting Ollama container: {}".format(default_args['ollamacontainername']))
         collection = default_args['vectordbcollectionname']
         concurrency = default_args['concurrency']
         ollamacontainername = default_args['ollamacontainername']
         mainport = int(default_args['mainport'])
         cuda = int(default_args['CUDA_VISIBLE_DEVICES'])
         temp = default_args['temperature']
         mainmodel=default_args['ollama-model']
         mainembedding=default_args['embedding']
         mainhost = default_args['mainip']

         mainmodels = mainmodel.split(",")
         mainmodel = " && ".join(mainmodels)

         ollamaserver = mainhost + ":" + str(mainport)
         localmodels=''
         if default_args['localmodelsfolder'] != '':
             localmodels = "-v " + default_args['localmodelsfolder'] + ":/root/.ollama:z"

         time.sleep(10)
         # TSS flag passed to the container; defaults to 0 when the TSS variable is unset
         tss = "1" if os.environ.get('TSS') == "1" else "0"
         # use the local temp value; temperature is not defined inside this function
         buf = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z {} --env OLLAMA_LOAD_TIMEOUT=30m0s --env PORT={} --env TSS={} --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env LLAMAMODEL=\"{}\" --env mainembedding=\"{}\" --env OLLAMASERVERPORT=\"{}\" {}".format(mainport,mainport,localmodels,mainport,tss,collection,concurrency,cuda,temp,mainmodel,mainembedding,ollamaserver,ollamacontainername)


         if stopcontainers() == 1:
           return 1,buf,mainmodel,mainembedding

         v=subprocess.call(buf, shell=True)
         print("INFO STEP 9b: Ollama container.  Here is the run command: {}, v={}".format(buf,v))
         tsslogging.locallogs("INFO", "STEP 9b: Ollama container.  Here is the run command: {}, v={}".format(buf,v))

         return v,buf,mainmodel,mainembedding


   def producegpttokafka(value,maintopic):
        inputbuf=value.strip()
        topicid=int(default_args['topicid'])
        producerid=default_args['producerid']
        identifier = default_args['identifier']

        # Maximum delay (in milliseconds) that VIPER waits for Kafka to confirm the message was received and written to the topic
        delay=default_args['delay']
        enabletls=default_args['enabletls']

        inputbuf=cleanstringjson(inputbuf)


        try:
           result=maadstml.viperproducetotopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,producerid,enabletls,delay,'','', '',0,inputbuf,'',
                                               topicid,identifier)
           print(result)
        except Exception as e:
           print("ERROR:",e)

   def consumefromtopic(maintopic):

         rollbackoffsets = int(default_args['rollbackoffset'])
         enabletls = int(default_args['enabletls'])
         consumerid=default_args['consumerid']
         companyname=default_args['companyname']
         offset = int(default_args['offset'])
         brokerhost = default_args['brokerhost']
         brokerport = int(default_args['brokerport'])
         microserviceid = default_args['microserviceid']
         topicid = default_args['topicid']
         preprocesstype = default_args['preprocesstype']
         delay = int(default_args['delay'])
         partition = int(default_args['partition'])

         print("before viperconsume",VIPERHOST,VIPERPORT,maintopic)
         result=maadstml.viperconsumefromtopic(VIPERTOKEN,VIPERHOST,VIPERPORT,maintopic,
                     consumerid,companyname,partition,enabletls,delay,
                     offset, brokerhost,brokerport,microserviceid,
                     topicid,rollbackoffsets,preprocesstype)
         return result


   def windowname(wtype,sname,dagname):
       randomNumber = random.randrange(10, 9999)
       wn = "python-{}-{}-{},{}".format(wtype,randomNumber,sname,dagname)
       with open("/tmux/pythonwindows_{}.txt".format(sname), 'a', encoding='utf-8') as file:
         file.writelines("{}\n".format(wn))

       return wn

   ############# Get the real-time data from the data streams #########################
   def getjsonsfromtopics(topics):

       print("in getjsonsfromtopics==",topics)

       topicsarr = topics.split("->>")
       topicjsons = []

       for t in topicsarr:
         t=t.strip()
         t2 = t.split("<<-")[0].strip()
         try:
           jsonvalue=consumefromtopic(t2)
         except Exception as e:
           print("error=",e)
           # keep the list aligned with the topics even when a read fails
           jsonvalue=""
         topicjsons.append(jsonvalue)

       return topicjsons


   def extract_hyperpredictiondata(hjson):

       print("in extract")

       hyper_json = json.loads(hjson)
       hnum=0
       pt=""
       pv=""
       mainuid=""
       jbufs = ""

       if len(hyper_json['streamtopicdetails']['topicreads']) == 0:
        return ""

       for item in hyper_json['streamtopicdetails']['topicreads']:
           jbuf = ""

           if "preprocesstype" in item:
              ptypes = item['preprocesstype']
              pt = ptypes
              iden = item['identifier']
              idenarr = iden.split("~")
              pv = idenarr[0]
              hyperprediction = str(item['hyperprediction'])
              hnum=round(float(hyperprediction))

           if "islogistic" in item:
              pv="machine learning"
              if item['islogistic'] == "1":
                 pt = "probability prediction"
                 hyperprediction = str(item['hyperprediction'])
                 hnum = round(float(hyperprediction)*100)
              else:
                 hyperprediction = str(item['hyperprediction'])
                 hnum = round(float(hyperprediction))
                 pt = "prediction"


           if "identifier" in item:
               iden = item['identifier']
               idenarr = iden.split("~")
               mainuid = idenarr[-1]
               mainuid = mainuid.split("=")[1]


           jbuf = '{"hp":' + str(hnum) + ',"pt":"' + pt + '", "pv":"' + pv + '", "uid":"' + mainuid + '"}'
           jbufs = jbufs + jbuf +","


       hliststr = "[" + jbufs[:-1] + "]"
       hliststr=re.sub(r'[\n\r]+', '', hliststr)
       hliststr = hliststr.translate({ord('\n'): None, ord('\r'): None})
       print("hliststr==",hliststr)
       return hliststr

   def checkjson(cjson):

       model = default_args['ollama-model']
       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']

       cjson = cjson.strip()
       try:
        checkedjson = json.loads(cjson)  # check to see if json loads - if not its bad
       except Exception as e:
        print("Json error=",e)
        if cjson[-1] != '}':
           if "Model" not in cjson and "Embedding" not in cjson and "Temperature" not in cjson:
             cjson = cjson +'","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
           else:
             cjson = cjson + '"}'

        elif cjson[-2] != '"':
           if "Model" not in cjson and "Embedding" not in cjson and "Temperature" not in cjson:
             cjson = cjson[:-1] +'","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
           else:
             cjson = cjson[:-1] + '"}'

        # the string is still not valid JSON, so let json_repair attempt a best-effort fix
        cjson = repair_json(cjson, skip_json_loads=True )

       return cjson


   def agentquerytopics(usertopics,topicjsons,llm):
       topicsarr = usertopics.split("->>")
       bufresponse = ""
       bufarr = []
       agenttopic = default_args['agenttopic']

       model = default_args['ollama-model']
       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']

       md = model.split(",")
       model=md[0]

       if len(topicsarr) == 0:
           print("No topics data")
           return "",""

       responses = []
       for t,mainjson in zip(topicsarr,topicjsons):
         t=t.strip()
         t2  = t.split("<<-")
         mainjson=mainjson.lower()
         if "hyperprediction" in mainjson:
            mainjson=extract_hyperpredictiondata(mainjson)
            if mainjson == "":
              continue

         # substitute the topic data into the prompt; a prompt without <<data>> is used as-is
         query_str = t2[1].replace("<<data>>", f"{mainjson}")
         print("query_string====",query_str)

         # Invoking with a string
         print("------before llm invoke===")
         response = llm.invoke(query_str)
         response=str(response.content)

         prompt=cleanstring(t2[1].strip())

         response=cleanstring(response)
         response=response.replace(";",",").replace(":","").replace("'","").replace('"',"")

         bufresponse  = '{"Date": "' + str(datetime.now(timezone.utc)) + '","Agent_Name": "Topic_Agent", "Topic": "'+t2[0].strip()+'","Prompt":"' + prompt + '","Response": "' + response.strip() + '","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
         bufresponse=checkjson(bufresponse)
         print("======bufresponse====",bufresponse)
         bufarr.append(bufresponse)

         producegpttokafka(bufresponse,agenttopic)

         responses.append(response)

       return responses,bufarr

   def teamleadqueryengine(tml_text_engine):
       bufresponse = ""

       model = default_args['ollama-model']
       md = model.split(",")
       if len(md)>1:
         model=md[1]

       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']

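       # read the module-level teamleadprompt (set in __main__) into a local name;
       # reassigning teamleadprompt here would raise UnboundLocalError
       leadprompt = teamleadprompt.replace(";"," ")
       response = tml_text_engine.query(leadprompt)
       response=str(response)
   #    print("team response = ", response)
       prompt=cleanstring(leadprompt.strip())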
       response=cleanstring(response.strip())
       response=response.replace(";",",").replace(":","").replace('"',"").replace("'","")
       bufresponse  = '{"Date": "' + str(datetime.now(timezone.utc)) + '","Agent_Name": "Team_Lead_Agent", "Topic": "'+default_args['teamlead_topic'] +'","Prompt":"' + prompt + '","Response": "' + response.strip() + '","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'
       bufresponse=checkjson(bufresponse)

       producegpttokafka(bufresponse,default_args['teamlead_topic'])

       return response,bufresponse

   ################ Create Supervisor

   def createactionagents(llm,sname):
       print("in createactionagents")
       repo=tsslogging.getrepo()

       agents=[]
       filepath=f"/{repo}/tml-airflow/dags/tml-solutions/{sname}/agenttools.py"
       print("filepath===",filepath)
       module_name = "agenttools"

       spec = importlib.util.spec_from_file_location(module_name, filepath)
       dynamic_module = importlib.util.module_from_spec(spec)
       spec.loader.exec_module(dynamic_module)

       maintools=default_args['agenttoolfunctions'].strip()
       funcname=maintools.split("->>")

       for f in funcname:
          if len(f)>2:
            f=f.strip()
            fname=f.split("<<-")[0]
            print(fname)
            func_objects = []
            func_object = getattr(dynamic_module, fname)
            func_objects.append(func_object)

            aname=f.split("<<-")[1]
            aprompt=f.split("<<-")[2]

            agent = create_react_agent(
               model=llm,
               tools=func_objects,
               name=aname,
               prompt=aprompt

            )
            agents.append(agent)
       return agents


   def createasupervisor(agents,supervisorprompt,llm):
       print("in createasupervisor==",supervisorprompt)

       supervisorprompt = supervisorprompt.replace(";"," ")

       workflow = create_supervisor(
         agents,
         model=llm,
         prompt=supervisorprompt
       )
   # Compile and run
       app = workflow.compile()
       return app

   def invokesupervisor(app,maincontent):

       model = default_args['ollama-model']
       md = model.split(",")
       if len(md)>2:
         model=md[2]

       temperature = float(default_args['temperature'])
       embeddingmodel = default_args['embedding']
       funcname = default_args['agenttoolfunctions']
       funcname = funcname.replace(";","==")
       maincontent=maincontent.replace(";",",")

       try:
           supervisormaincontent ="""
             Here is the team lead's assessment: {}.  Based on the Team Lead's assessment what is the appropriate action.
           """.format(maincontent)

           result = app.invoke({
             "messages": [
                 {
                     "role": "user",
                     "content": supervisormaincontent
                 }
             ]
           })
       except Exception as e:
         print("WARN STEP 9b: Agentic AI: unable to invoke supervisor agent: {}".format(e))
         tsslogging.locallogs("WARN", "STEP 9b: Agentic AI: unable to invoke supervisor agent: {}".format(e))
         return "error","error"

       lastmessage=""
       for chunk in app.stream(
           input=result,
           stream_mode="values",):
           if chunk["messages"][-1].content != "":
             lastmessage=chunk["messages"][-1].content

       lastmessage=str(lastmessage)
       lastmessage=cleanstring(lastmessage.strip())
       lastmessage=lastmessage.replace(";",",").replace("'","").replace('"',"").replace(":","")
       bufresponse  = '{"Date": "' + str(datetime.now(timezone.utc)) + '","Agent_Name": "Supervisor_Agent", "Topic": "' + default_args['supervisor_topic'] + '","Prompt":"' + supervisormaincontent + '","Response": "' + lastmessage.strip() + '","Model": "' + model + '","Embedding":"' + embeddingmodel + '", "Temperature":"' + str(temperature) +'"}'


       mainjson=[]
       mainstr=""
       for m in result["messages"]:
         mainjson.append(pprint.pformat(m))
        # mainstr = mainstr + json.dumps(str(m.json)) + ","

       mainjson=json.dumps({"supervisor_workflow_invocation": mainjson})
       mainjson=mainjson[:-1] + ",\"funcname\":" + json.dumps(funcname)+",\"supervisorprompt\":\""+supervisormaincontent+"\"}"
       mainjson=cleanstring(mainjson)
       mainjson=checkjson(mainjson)

       try:
         #print(mainjson)
         producegpttokafka(mainjson,default_args['supervisor_topic'])

         return mainjson,bufresponse
       except Exception as e:
         print("ERROR: invalid json")
         return "error","error"

   def formatcompletejson(bufresponses,teamlead_response,lastmessage):

       bufresponses = " ".join(str(bufresponses).splitlines())
       teamlead_response = " ".join(str(teamlead_response).splitlines())
       lastmessage = " ".join(str(lastmessage).splitlines())

       # collapse runs of whitespace into single spaces
       bufresponses = " ".join(bufresponses.split())
       teamlead_response = " ".join(teamlead_response.split())
       lastmessage = " ".join(lastmessage.split())

       bufresponses = bufresponses.replace("'","").replace("\n"," ").replace("\\n"," ").replace("\t", " ").replace("\r"," ").replace("#","").strip()
       teamlead_response = teamlead_response.replace("'","").replace("\n"," ").replace("\\n"," ").replace("\t", " ").replace("\r", " ").replace("#","").strip()
       lastmessage = lastmessage.replace("'","").replace("\n"," ").replace("\t", " ").replace("\\n"," ").replace("\r"," ").replace("#","").strip()

       print("bufresponses===",bufresponses)
       print("teambuf===",teamlead_response)
       print("supbuf===",lastmessage)

       # check if valid
       try:
         jvalid=json.loads(bufresponses)
       except Exception as e:
         bufresponses = '[{"Status": "no data found", "Model": "na", "Embedding": "na", "Temperature": "na", "Prompt": "na", "Response": "no data found", "Date": "' + str(datetime.now(timezone.utc)) + '", "Agent_Name": "", "Topic": "na"}]'

       try:
         jvalid=json.loads(teamlead_response)
       except Exception as e:
         teamlead_response =  '{"Status": "no data found", "Model": "na", "Embedding": "na", "Temperature": "na", "Prompt": "na", "Response": "no data found", "Date": "' + str(datetime.now(timezone.utc)) + '", "Agent_Name": "Team Lead agent", "Topic": "na"}'

       try:
         jvalid=json.loads(lastmessage)
       except Exception as e:
         lastmessage = '{"Status": "no data found", "Model": "na", "Embedding": "na", "Temperature": "na", "Prompt": "na", "Response": "Error - likely a Tool could not be run. Check your tools.", "Date": "' + str(datetime.now(timezone.utc)) + '", "Agent_Name": "Supervisor agent", "Topic": "na"}'


       mainjson = bufresponses[:-1] + "," + teamlead_response + "," + lastmessage + "]"
       mainjson = " ".join(mainjson.split())
       mainjson = " ".join(mainjson.splitlines())

       mainjson=re.sub(r'[\n\r]+', '', mainjson)

       mainjson = mainjson.replace("'","").replace("\n"," ").replace("\\n"," ").replace("\t", " ").replace("\r"," ").replace("\\r"," ").strip()

       mainjson = mainjson.translate({ord('\n'): None, ord('\r'): None})
       print("mainjson======",mainjson)

       return mainjson

   def startagenticai(**context):
          sd = context['dag'].dag_id
          sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
          pname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))

          if 'step9brollbackoffset' in os.environ:
             if os.environ['step9brollbackoffset'] != '':
               default_args['rollbackoffset'] = os.environ['step9brollbackoffset']

          if 'step9bollama-model' in os.environ:
             if os.environ['step9bollama-model'] != '':
               default_args['ollama-model'] = os.environ['step9bollama-model']
          if 'step9bdeletevectordbcount' in os.environ:
             if os.environ['step9bdeletevectordbcount'] != '':
               default_args['deletevectordbcount'] = os.environ['step9bdeletevectordbcount']

          if 'step9bvectordbpath' in os.environ:
             if os.environ['step9bvectordbpath'] != '':
               default_args['vectordbpath'] = os.environ['step9bvectordbpath']

          if 'step9btemperature' in os.environ:
             if os.environ['step9btemperature'] != '':
               default_args['temperature'] = os.environ['step9btemperature']

          if 'step9bvectordbcollectionname' in os.environ:
             if os.environ['step9bvectordbcollectionname'] != '':
               default_args['vectordbcollectionname'] = os.environ['step9bvectordbcollectionname']
          if 'step9bollamacontainername' in os.environ:
             if os.environ['step9bollamacontainername'] != '':
               default_args['ollamacontainername'] = os.environ['step9bollamacontainername']
          if 'step9bCUDA_VISIBLE_DEVICES' in os.environ:
             if os.environ['step9bCUDA_VISIBLE_DEVICES'] != '':
               default_args['CUDA_VISIBLE_DEVICES'] = os.environ['step9bCUDA_VISIBLE_DEVICES']

          if 'step9bmainip' in os.environ:
             if os.environ['step9bmainip'] != '':
               default_args['mainip'] = os.environ['step9bmainip']
          if 'step9bmainport' in os.environ:
             if os.environ['step9bmainport'] != '':
               default_args['mainport'] = os.environ['step9bmainport']

          if 'step9bembedding' in os.environ:
             if os.environ['step9bembedding'] != '':
               default_args['embedding'] = os.environ['step9bembedding']
          if 'step9bagents_topic_prompt' in os.environ:
             if os.environ['step9bagents_topic_prompt'] != '':
               default_args['agents_topic_prompt'] = os.environ['step9bagents_topic_prompt']

          if 'step9bagenttopic' in os.environ:
             if os.environ['step9bagenttopic'] != '':
               default_args['agenttopic'] = os.environ['step9bagenttopic']

          if 'step9bteamlead_topic' in os.environ:
             if os.environ['step9bteamlead_topic'] != '':
               default_args['teamlead_topic'] = os.environ['step9bteamlead_topic']
          if 'step9bteamleadprompt' in os.environ:
             if os.environ['step9bteamleadprompt'] != '':
               default_args['teamleadprompt'] = os.environ['step9bteamleadprompt']
          if 'step9bsupervisor_topic' in os.environ:
             if os.environ['step9bsupervisor_topic'] != '':
               default_args['supervisor_topic'] = os.environ['step9bsupervisor_topic']
          if 'step9bagenttoolfunctions' in os.environ:
             if os.environ['step9bagenttoolfunctions'] != '':
               default_args['agenttoolfunctions'] = os.environ['step9bagenttoolfunctions']
          if 'step9bagent_team_supervisor_topic' in os.environ:
             if os.environ['step9bagent_team_supervisor_topic'] != '':
               default_args['agent_team_supervisor_topic'] = os.environ['step9bagent_team_supervisor_topic']
          if 'step9bcontextwindow' in os.environ:
             if os.environ['step9bcontextwindow'] != '':
               default_args['contextwindow'] = os.environ['step9bcontextwindow']

          if 'step9blocalmodelsfolder' in os.environ:
             if os.environ['step9blocalmodelsfolder'] != '':
               default_args['localmodelsfolder'] = os.environ['step9blocalmodelsfolder']

          VIPERTOKEN = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERTOKEN".format(sname))
          VIPERHOST = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESSAGENTICAI".format(sname))
          VIPERPORT = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESSAGENTICAI".format(sname))
          HTTPADDR = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HTTPADDR".format(sname))


          ti = context['task_instance']
          ti.xcom_push(key="{}_rollbackoffset".format(sname), value="_{}".format(default_args['rollbackoffset']))
          ti.xcom_push(key="{}_ollama-model".format(sname), value=default_args['ollama-model'])
          ti.xcom_push(key="{}_deletevectordbcount".format(sname), value="_{}".format(default_args['deletevectordbcount']))
          ti.xcom_push(key="{}_vectordbpath".format(sname), value="{}".format(default_args['vectordbpath']))
          ti.xcom_push(key="{}_temperature".format(sname), value="_{}".format(default_args['temperature']))
          ti.xcom_push(key="{}_topicid".format(sname), value="_{}".format(default_args['topicid']))
          ti.xcom_push(key="{}_enabletls".format(sname), value="_{}".format(default_args['enabletls']))
          ti.xcom_push(key="{}_partition".format(sname), value="_{}".format(default_args['partition']))
          ti.xcom_push(key="{}_vectordbcollectionname".format(sname), value=default_args['vectordbcollectionname'])
          ti.xcom_push(key="{}_ollamacontainername".format(sname), value=default_args['ollamacontainername'])
          ti.xcom_push(key="{}_mainip".format(sname), value=default_args['mainip'])
          ti.xcom_push(key="{}_mainport".format(sname), value="_{}".format(default_args['mainport']))
          ti.xcom_push(key="{}_embedding".format(sname), value=default_args['embedding'])
          ti.xcom_push(key="{}_agents_topic_prompt".format(sname), value=default_args['agents_topic_prompt'])
          ti.xcom_push(key="{}_teamlead_topic".format(sname), value=default_args['teamlead_topic'])
          ti.xcom_push(key="{}_teamleadprompt".format(sname), value=default_args['teamleadprompt'])
          ti.xcom_push(key="{}_supervisor_topic".format(sname), value=default_args['supervisor_topic'])
          ti.xcom_push(key="{}_supervisorprompt".format(sname), value=default_args['supervisorprompt'])

          at=default_args['agenttoolfunctions']
          at=at.replace(SMTP_PASSWORD,'')

          ti.xcom_push(key="{}_agenttoolfunctions".format(sname), value=at)

          ti.xcom_push(key="{}_agent_team_supervisor_topic".format(sname), value=default_args['agent_team_supervisor_topic'])
          ti.xcom_push(key="{}_concurrency".format(sname), value="_{}".format(default_args['concurrency']))
          ti.xcom_push(key="{}_cuda".format(sname), value="_{}".format(default_args['CUDA_VISIBLE_DEVICES']))
          ti.xcom_push(key="{}_agenttopic".format(sname), value="{}".format(default_args['agenttopic']))

          ti.xcom_push(key="{}_contextwindow".format(sname), value="_{}".format(default_args['contextwindow']))

          ti.xcom_push(key="{}_localmodelsfolder".format(sname), value="{}".format(default_args['localmodelsfolder']))

          repo=tsslogging.getrepo()
          if sname != '_mysolution_':
           fullpath="/{}/tml-airflow/dags/tml-solutions/{}/{}".format(repo,pname,os.path.basename(__file__))
          else:
            fullpath="/{}/tml-airflow/dags/{}".format(repo,os.path.basename(__file__))

          wn = windowname('agenticai',sname,sd)
          subprocess.run(["tmux", "new", "-d", "-s", "{}".format(wn)])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "cd /Viper-preprocess-agenticai", "ENTER"])
          subprocess.run(["tmux", "send-keys", "-t", "{}".format(wn), "python {} 1 {} {}{} {} \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" {} {} {} {} \"{}\" \"{}\" {} {} \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" \"{}\" {} \"{}\" \"{}\"".format(fullpath,
                          VIPERTOKEN, HTTPADDR, VIPERHOST, VIPERPORT[1:],
                          default_args['rollbackoffset'],default_args['ollama-model'],default_args['deletevectordbcount'],default_args['vectordbpath'],
                          default_args['temperature'],default_args['topicid'],default_args['enabletls'],
                          default_args['partition'], default_args['vectordbcollectionname'], default_args['ollamacontainername'],
                          default_args['mainip'],default_args['mainport'],default_args['embedding'],
                          default_args['agents_topic_prompt'],default_args['teamlead_topic'],default_args['teamleadprompt'],
                          default_args['supervisor_topic'],default_args['supervisorprompt'],default_args['agenttoolfunctions'],
                          default_args['agent_team_supervisor_topic'],default_args['concurrency'],default_args['CUDA_VISIBLE_DEVICES'],
                          pname,default_args['contextwindow'],default_args['localmodelsfolder'],default_args['agenttopic']),"ENTER"])

   if __name__ == '__main__':
       if len(sys.argv) > 1:
          if sys.argv[1] == "1":
           repo=tsslogging.getrepo()

           VIPERTOKEN = sys.argv[2]
           VIPERHOST = sys.argv[3]
           VIPERPORT = sys.argv[4]

           rollbackoffset =  sys.argv[5]
           ollamamodel =  sys.argv[6]
           deletevectordb =  sys.argv[7]
           vectordbpath=sys.argv[8]
           temperature=sys.argv[9]

           topicid=sys.argv[10]
           enabletls=sys.argv[11]

           partition=sys.argv[12]
           vectordbcollectionname=sys.argv[13]
           ollamacontainername=sys.argv[14]
           mainip=sys.argv[15]
           mainport=sys.argv[16]
           embedding=sys.argv[17]
           agents_topic_prompt=sys.argv[18]
           teamlead_topic=sys.argv[19]
           teamleadprompt=sys.argv[20]
           supervisor_topic=sys.argv[21]
           supervisorprompt=sys.argv[22]
           agenttoolfunctions=sys.argv[23]

           agent_team_supervisor_topic=sys.argv[24]
           concurrency=sys.argv[25]
           cuda =  sys.argv[26]
           pname = sys.argv[27]
           contextwindow = sys.argv[28]
           localmodelsfolder = sys.argv[29]

           agenttopic = sys.argv[30]

          default_args['rollbackoffset']=rollbackoffset
          default_args['ollama-model']=ollamamodel
          default_args['deletevectordbcount']=deletevectordb
          default_args['vectordbpath']=vectordbpath
          default_args['temperature']=temperature
          default_args['topicid']=topicid
          default_args['enabletls']=enabletls
          default_args['partition']=partition
          default_args['vectordbcollectionname']=vectordbcollectionname
          default_args['ollamacontainername']=ollamacontainername
          default_args['mainip']=mainip
          default_args['mainport']=mainport
          default_args['embedding']=embedding
          default_args['agents_topic_prompt']=agents_topic_prompt
          default_args['teamlead_topic']=teamlead_topic
          default_args['teamleadprompt']=teamleadprompt
          default_args['supervisor_topic']=supervisor_topic
          default_args['supervisorprompt']=supervisorprompt
          default_args['agenttoolfunctions']=agenttoolfunctions
          default_args['agent_team_supervisor_topic']=agent_team_supervisor_topic
          default_args['concurrency']=concurrency
          default_args['CUDA_VISIBLE_DEVICES']=cuda
          default_args['contextwindow']=contextwindow
          default_args['localmodelsfolder']=localmodelsfolder
          default_args['agenttopic']=agenttopic

       if "KUBE" not in os.environ:

             tsslogging.locallogs("INFO", "STEP 9b: Starting Ollama container")
             v,buf,mainmodel,mainembedding=startpgptcontainer()
             if v==1:
               tsslogging.locallogs("WARN", "STEP 9b: There seems to be an issue starting the Ollama container.  Here is the run command - try to run it manually for testing: {}".format(buf))
             else:
               tsslogging.locallogs("INFO", "STEP 9b: Success starting Ollama container.  Here is the run command: {}".format(buf))

             time.sleep(10)  # wait for containers to start
       elif  os.environ["KUBE"] == "0":

             tsslogging.locallogs("INFO", "STEP 9b: Starting ollama server")
             v,buf,mainmodel,mainembedding=startpgptcontainer()
             if v==1:
               tsslogging.locallogs("WARN", "STEP 9b: There seems to be an issue starting the Ollama container.  Here is the run command - try to run it manually for testing: {}".format(buf))
             else:
               tsslogging.locallogs("INFO", "STEP 9b: Success starting Agentic AI.  Here is the run command: {}".format(buf))

             time.sleep(10)  # wait for containers to start
       else:
             tsslogging.locallogs("INFO", "STEP 9b: [KUBERNETES] Starting Agentic AI - LOOKS LIKE THIS IS RUNNING IN KUBERNETES")
             tsslogging.locallogs("INFO", "STEP 9b: [KUBERNETES] Make sure you have applied the Agentic AI YAML files and have the agentic AI Pod running")

       count=0

           # create the Supervisor and kick off action

   #    llmstatus = get_loaded_models()
    #   print("llmstatus==",llmstatus,pname)

       mainmodels=default_args['ollama-model']

       models = mainmodels.split(",")  #models must be agent,teamlead,supervisor
       embedding=None

       modelsarr = []
       for m in models:
          llmstatus = get_loaded_models()
          checkforloadedmodels(m)
          print("llmstatus==",llmstatus,pname)
          llm,embedding=setollama(m.strip())
          modelsarr.append(llm)


       if len(modelsarr) >2:
         #try:
         actionagents=createactionagents(modelsarr[2],pname)
         supervisorprompt = default_args['supervisorprompt']
         try:
           app=createasupervisor(actionagents,supervisorprompt,modelsarr[2])
         except Exception as e:
           print("Error=",e)
           tsslogging.locallogs("WARN", "STEP 9b unable to create agents {}".format(e))
       else:
          tsslogging.locallogs("WARN","STEP 9b unable to load LLM - Aborting")
          print("WARN", "STEP 9b unable to load LLM - Aborting")
          exit(0)

       deletevectordbcnt=0
       while True:
            deletevectordbcnt +=1
            try:
               agent_topics = default_args['agents_topic_prompt']
               topicjsons=getjsonsfromtopics(agent_topics)
               responses,bufresponses=agentquerytopics(agent_topics,topicjsons,modelsarr[0])
            #try:
               tml_text_engine,deletevectordbcnt=loadtextdataintovectordb(responses,deletevectordbcnt,modelsarr[1])
               teamlead_response,teambuf=teamleadqueryengine(tml_text_engine)
               mainjson,supbuf=invokesupervisor(app,teamlead_response)
               complete=formatcompletejson(bufresponses,teambuf,supbuf)

               if default_args['agent_team_supervisor_topic']!='':
                 producegpttokafka(complete,default_args['agent_team_supervisor_topic'])

               time.sleep(1)
            except Exception as e:
             print("Error=",e)
             if count == 0:
               # Logged on the first error only; the loop keeps retrying and stops once count exceeds 600
               tsslogging.locallogs("ERROR", "STEP 9b: Agentic AI Step 9b DAG in {} {}  Aborting after more than 600 errors.".format(os.path.basename(__file__),e))
               tsslogging.tsslogit("Agentic AI Step 9b DAG in {} {} Aborting after more than 600 errors.".format(os.path.basename(__file__),e), "ERROR" )
               tsslogging.git_push("/{}".format(repo),"Entry from {}".format(os.path.basename(__file__)),"origin")
             time.sleep(5)
             count = count + 1
             if count > 600:
               break
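
The main loop above indexes the loaded models by position: modelsarr[0] answers the topic prompts, modelsarr[1] is passed to the vector DB step that feeds the team lead, and modelsarr[2] drives the supervisor. As a minimal sketch of that convention (the names ModelRoles and split_model_roles are hypothetical and are not part of the DAG), the 'ollama-model' string could be validated before any model is loaded:

```python
from typing import NamedTuple


class ModelRoles(NamedTuple):
    agent: str       # answers the per-topic prompts
    teamlead: str    # used when loading agent responses into the vector DB
    supervisor: str  # drives the supervisor and its tool agents


def split_model_roles(ollama_model: str) -> ModelRoles:
    """Split 'agent,teamlead,supervisor' into named roles (hypothetical helper)."""
    names = [m.strip() for m in ollama_model.split(",") if m.strip()]
    if len(names) < 3:
        raise ValueError(
            "ollama-model needs three comma-separated models "
            "(agent,teamlead,supervisor), got: {!r}".format(ollama_model)
        )
    # Only the first three entries are used, matching modelsarr[0..2] in the DAG
    return ModelRoles(*names[:3])


roles = split_model_roles("phi3:3.8b,phi3:3.8b,llama3.2:3b")
print(roles.agent, roles.teamlead, roles.supervisor)
```

Unlike the DAG, which logs a warning and exits when fewer than three models load, this sketch raises an exception so the caller decides how to react.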

7.14. STEP 9b DAG Core Parameter Explanation

Step 9b DAG parameter

Explanation

ollamacontainername

The Ollama container to use. This container runs your LLM locally.

rollbackoffset

This determines how much data to process.

agents_topic_prompt

This is the field where you tell each agent which topic to monitor and which prompt to answer.

FORMAT: topic the agent will monitor<<-prompt you want the agent to answer->>

For example: "testtopic<<-Do you see any issues in the real-time json data?->>"

Separate multiple topic entries with ->>

You can also add <<data>> to the prompt as a placeholder for the topic data. For example:

"testtopic<<-Do you see any issues in the real-time json data? Here is the data: <<data>>->>"

teamlead_topic

This topic will contain all of the team lead responses.

teamleadprompt

Enter the prompt for the Team Lead agent.

supervisor_topic

All supervisor responses are stored in this topic.

supervisorprompt

Enter the prompt for the supervisor.

agenttoolfunctions

This is the key field that links the tools (Python functions) to the supervisor agent that will execute them. The tools are defined in STEP 9b: Agents’ Tools.

FORMAT: <tool_function<<-agent_name<<-system_prompt->>tool_function2<<-agent_name2<<-system_prompt2->>….>

For example, to connect the send_email tool, agenttoolfunctions is:

"""send_email<<-send_email<<- You are an email-sending agent. Use smtp parameters
to send emails when there is an anomaly in the data, make sure to
indicate the device name in the mainuid field. do not write a
smtp script, actually send the email using the SMTP parameters
smtp_server='{}' smtp_port={} username='{}' password='{}' sender='{}' recipient='{}' subject='' body=''->>
""".format(SMTP_SERVER,SMTP_PORT,SMTP_USERNAME,SMTP_PASSWORD,SMTP_USERNAME,recipient)

Note: the delimiter <<- separates the tool function, the agent name, and the agent prompt; the delimiter ->> ends one tool function entry and starts the next.

The variables SMTP_SERVER, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, and recipient should be defined as environment variables when starting the TSS container.

agent_team_supervisor_topic

This topic will contain the responses from the individual agents, the team lead, and the supervisor. See Sample Output from TML Multi-Agentic AI Solution.

mainip

This is the address of the Ollama container, e.g. http://127.0.0.1

mainport

This is the port the Ollama server listens on, e.g. 11434

embedding

This is the embedding model used for the vector DB.

The TML Multi-Agentic AI solution uses VectorStoreIndex (from llama_index.core.indices.vector_store.base import VectorStoreIndex).

TML recommends the embedding: nomic-embed-text

temperature

This is the temperature for the Ollama model.

A temperature near 0 makes the LLM more conservative and deterministic; a temperature near 1 makes its output more random, so it may hallucinate.

ollama-model

The Ollama LLM models to use. Any Ollama model trained for tool use can be used.

Note: in this field you must specify a model for each of the topic agent, the team lead agent, and the supervisor agent, in that order.

For example: 'ollama-model': 'phi3:3.8b,phi3:3.8b,llama3.2:3b'

This tells TML to use phi3:3.8b for both the topic agents and the team lead, and llama3.2:3b for the supervisor agent.

deletevectordbcount

This count determines how much data is kept in the vector DB. A higher number keeps more data in the vector DB, which gives the LLM more memory on which to base its responses.

vectordbpath

This is the path to the vector store on disk.

contextwindow

Enter the context window for the LLM. This will vary for each LLM. Larger windows require more VRAM.

localmodelsfolder

Enter the local path where LLM models will be saved. Caching the models from Ollama locally improves LLM loading times.
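
To make the agents_topic_prompt format concrete, here is a minimal sketch of how such a string could be split into (topic, prompt) pairs and how the <<data>> placeholder could be filled. The helpers parse_agents_topic_prompt and fill_data are hypothetical and written only for illustration; they are not the functions the Step 9b DAG uses internally.

```python
def parse_agents_topic_prompt(spec: str) -> list:
    """Split 'topic<<-prompt->>topic2<<-prompt2' into (topic, prompt) pairs."""
    pairs = []
    for entry in spec.split("->>"):
        entry = entry.strip()
        if not entry:
            continue  # a trailing '->>' leaves an empty entry
        topic, sep, prompt = entry.partition("<<-")
        if not sep:
            raise ValueError("missing '<<-' in entry: {!r}".format(entry))
        pairs.append((topic.strip(), prompt.strip()))
    return pairs


def fill_data(prompt: str, data: str) -> str:
    """Replace the <<data>> placeholder with the JSON text read from the topic."""
    return prompt.replace("<<data>>", data)


spec = (
    "testtopic<<-Do you see any issues? Here is the data: <<data>>->>"
    "othertopic<<-Summarize the data.->>"
)
for topic, prompt in parse_agents_topic_prompt(spec):
    print(topic, "=>", fill_data(prompt, '{"uid": "device-1", "hp": 93.5}'))
```

Because <<data>> contains neither of the two delimiters (<<- and ->>), it can appear anywhere inside a prompt without confusing the split.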

7.15. Example of 9b Configuration Parameters

Below is an example configuration for the Step 9b DAG above. In this example, we connect the send_email function in the Agenttools.py file to the supervisor agent. Note that the SMTP parameters are environment variables that are set when the solution container or TSS container is started.

default_args = {
 'owner': 'Sebastian Maurice',   # <<< *** Change as needed
 'ollamacontainername' : 'maadsdocker/tml-privategpt-with-gpu-nvidia-amd64-llama3-tools', #'maadsdocker/tml-privategpt-no-gpu-amd64',  # enter a valid container https://hub.docker.com/r/maadsdocker/tml-privategpt-no-gpu-amd64
 'rollbackoffset' : '15',  # <<< *** Change as needed
 'offset' : '-1', # leave as is
 'enabletls' : '1', # change as needed
 'brokerhost' : '', # <<< *** Leave as is
 'brokerport' : '-999', # <<< *** Leave as is
 'microserviceid' : '',  # change as needed
 'topicid' : '-999', # leave as is
 'delay' : '100', # change as needed
 'companyname' : 'otics',  # <<< *** Change as needed
 'consumerid' : 'streamtopic',  # <<< *** Leave as is
 'agenttopic' : 'agent-responses', # this topic contains the individual agent responses
 'agents_topic_prompt' : """
        iot-preprocess<<-You are a precise data analysis assistant. Your task is to point out any anomalies or interesting insights that could help improve the performance and functioning of
        IoT device.  The json data are from IOT devices.  the hp field shows the data that are processed for the process variable (pv), using the process types (pt) like:
        avg or average, or trend analysis, or anomprob (i.e. anomaly probability) etc.  The device being processed is in the uid field of the json.
         here is the json data:

          <<data>>

         INSTRUCTIONS:
         1. Examine each number in the json array
         2. Provide a brief analysis of the results

         FORMAT YOUR RESPONSE:
         - Filtered results: [list the qualifying numbers with their "uid" fields]
         - Count of qualifying numbers: [number]
         - Analysis: [brief explanation of what the filter revealed]

         Be precise and concise in your response.->>
        iot-ml-prediction-results-output<<-You are a precise data analysis assistant. Your task is to filter and analyze numeric data based on specified criteria.

        TASK: Filter numbers from the given json array using the threshold: greater than 90

        Input JSON array:

             <<data>>

         INSTRUCTIONS:
         1. Examine each number in the json array
         2. Apply the filter condition: number > 90
         3. Return only numbers that meet the criteria with their "uid" fields
         4. If no numbers meet the criteria, explicitly state this
         5. Provide a brief analysis of the results

         FORMAT YOUR RESPONSE:
         - Filtered results: [list the qualifying numbers with their "uid" fields]
         - Count of qualifying numbers: [number]
         - Analysis: [brief explanation of what the filter revealed]

         Be precise and concise in your response.
""", # format: <topic the agent will monitor<<-prompt you want the agent to answer->>
 'teamlead_topic' : 'team-lead-responses', # Enter the team lead topic - all team lead responses will be written to this topic
 'teamleadprompt' : """
         Analyze the dataset containing IoT device monitoring records managed by individual agents.
         Review all data fields to determine whether there are any issues or major concerns requiring urgent attention.

         Focus on the following criteria:
         1. Each record contains a unique device identifier stored in the field "uid".
         2. Examine the failure probability for each device stored in the hp field.
         3. Categorize the probabilities as follows:
          - Low: 0% to 50%
          - Medium: 51% to 75%
          - High: 76% to 89%
          - Urgent: 90% to 100%

        Tasks:
        - Identify and highlight devices (by their "uid") that have **urgent failure probabilities** (≥ 90%).
        - For each flagged device, provide details and reasoning on why it may require immediate investigation.
        - Only include devices that meet the urgent threshold. Do not report on low, medium, or high categories unless relevant for context.
        - State clearly whether the identified issue is *urgent*.
        - Do not use or generate any code; perform a reasoning-based analysis directly from the provided data.

""", # Enter the team lead prompt
'supervisor_topic' : 'supervisor-responses', # Enter the supervisor topic - all supervisor responses will be written to this topic
'supervisorprompt' : """
        You are a team supervisor analyzing operational device data and recommending whether an alert email should be sent.
        You manage a send email expert and an average expert.
        For send email, use send_email agent.
        For average, use average agent.

       INSTRUCTIONS:
       1.Analyze the Team Lead assessment and determine the proper action:
       - If devices are marked urgent or failure probabilities exceed 90%, select "send_email".
       - If no urgent devices are found or probabilities remain below thresholds, then no action is needed.
""", # Enter the supervisor prompt
 'agenttoolfunctions' : """
        send_email<<-send_email<<- You are an email-sending agent. Use smtp parameters to send emails when there is an anomaly in the data, make sure to
                     indicate the device name in the mainuid field. do not write a smtp script, actually send the email using the SMTP parameters
                     smtp_server='{}'
                     smtp_port={}
                     username='{}'
                     password='{}'
                     sender='{}'
                     recipient='{}'
                     subject=''
                     body=''->>
        average<<-average<<-You are an average agent.  Take the average of the device failure probabilities.
""".format(SMTP_SERVER,SMTP_PORT,SMTP_USERNAME,SMTP_PASSWORD,SMTP_USERNAME,recipient),  # enter the tools: tool_function is the name of a function in the agenttools Python file
 'agent_team_supervisor_topic': 'all-agents-responses', # this topic will hold the responses from agents, team lead and supervisor
'producerid' : 'agentic-ai',   # <<< *** Leave as is
 'identifier' : 'This is analysing TML output with Agentic AI',
 'mainip': 'http://127.0.0.1', # Ollama server container listening on this host
 'mainport' : '11434', # Ollama listening on this port
 'embedding': 'nomic-embed-text', # Embedding model
 'preprocesstype' : '', # Leave as is
 'partition' : '-1', # Leave as is
 'vectordbcollectionname' : 'tml-llm-model-v2', # change as needed
 'concurrency' : '2', # change as needed Leave at 1
 'CUDA_VISIBLE_DEVICES' : '0', # change as needed
 'temperature' : '0.1', # ranges between 0 and 1 and controls how conservative the LLM will be: near 0 it is very conservative, near 1 it may hallucinate
 #--------------------
 'ollama-model': 'phi3:3.8b,phi3:3.8b,llama3.2:3b', # specify 3 models, in this order: agent,teamlead,supervisor
 'deletevectordbcount': '5',
 'vectordbpath': '/rawdata/vectordb',
 'contextwindow': '4096',
 'localmodelsfolder': '/mnt/c/maads/tml-airflow/rawdata/ollama'
}
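
The agenttoolfunctions value in the example above packs several tool bindings into one string. As a minimal sketch of how that string decomposes (ToolBinding and parse_agenttoolfunctions are hypothetical names used only for illustration, not the DAG's own parser), each entry can be split on ->> and then on <<-:

```python
from typing import NamedTuple


class ToolBinding(NamedTuple):
    functions: list      # tool function names handled by this agent
    agent_name: str
    system_prompt: str


def parse_agenttoolfunctions(spec: str) -> list:
    """Split 'func<<-agent<<-prompt->>func2<<-agent2<<-prompt2' into bindings."""
    bindings = []
    for entry in spec.split("->>"):
        entry = entry.strip()
        if not entry:
            continue  # a trailing '->>' leaves an empty entry
        parts = entry.split("<<-", 2)
        if len(parts) != 3:
            raise ValueError("expected func<<-agent<<-prompt, got: {!r}".format(entry))
        functions, agent_name, prompt = (p.strip() for p in parts)
        bindings.append(ToolBinding(
            # Assumption: several functions for one agent are comma-separated
            functions=[f.strip() for f in functions.split(",") if f.strip()],
            agent_name=agent_name,
            # Collapse the line breaks and indentation of a triple-quoted string
            system_prompt=" ".join(prompt.split()),
        ))
    return bindings


spec = """
    send_email<<-send_email<<- You are an email-sending agent.
                 smtp_server='smtp.example.com'->>
    average<<-average<<-You are an average agent.
"""
for binding in parse_agenttoolfunctions(spec):
    print(binding.agent_name, binding.functions, binding.system_prompt)
```

The comma-separated function list is an assumption taken from the note in the Agenttools docstring; smtp.example.com is a placeholder value.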

7.16. STEP 9b: Agents’ Tools

The code below lets users incorporate any tools they want into their TML multi-agentic solutions.

Note

If your tool requires special Python libraries, you can easily install them using the function def install_package(package_name, importname).

This gives tremendous flexibility in integrating tools that the AI can execute in real-time; the send_email tool is included as an example.

You integrate the tools into your solution by configuring agenttoolfunctions in the Step 9b DAG.

# Agent Tool
from langchain_core.tools import tool
from email.mime.text import MIMEText
from email.message import EmailMessage
import smtplib
#from langchain_tavily import TavilySearch
import subprocess
import sys

"""
You must define all your tools here for your agents to execute
You can define as many agents tools you want

YOU MUST ALSO update funcname

funcname = ["web_search:search_agent:You are a search expert","add:math_expert:You are a math expert","maxagent:max_agent:You find the company with maximum employees"]

The format is funcname = ["<function name>,<function_name>:<agent name>:<prompt>","<function name>:<agent name>:<prompt>",...]

NOTE: You can assign multiple functions to agents - separate multiple functions by a comma
"""

# if your tool requires a package you can install it using the install_package function
# the function will check if package is already installed
def install_package(package_name, importname):
    """
    Installs a specified Python package using pip.
    """
    try:
        __import__(importname)
    except ImportError:
        print(f"Package '{package_name}' not found. Attempting to install...")
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
            print(f"Package '{package_name}' installed successfully.")
        except subprocess.CalledProcessError as e:
            print(f"Error installing package '{package_name}': {e}")

#install_package("langchain-tavily","langchain_tavily")  # importname must be a module name because it is passed to __import__

# SendEmail by Agent
@tool
def send_email(smtp_server: str, smtp_port: int, username: str, password: str,
                    sender: str, recipient: str, subject: str, body: str) -> bool:
    """
    Sends an email reply via SMTP using the generated response.
    """

    # Allow a comma-separated list of recipients
    recemails = [r.strip() for r in recipient.split(",") if r.strip()]

    try:
        # Build the message; set_content() below preserves line breaks in the body
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = username
        msg["To"] = recipient
        msg.set_content(body)

        with smtplib.SMTP(smtp_server, int(smtp_port)) as server:
            server.starttls()
            server.login(username, password)
#            server.send_message(msg)
            server.sendmail(username, recemails, msg.as_string())

        return True
    except Exception as e:
        print("Failed to send email:", e)
        return False

#send_email({"smtp_server":"smtp.gmail.com","smtp_port":587,"username":SMTP_USERNAME,"password":SMTP_PASSWORD,"sender":SMTP_USERNAME,"recipient":recipientlist,"subject":"test","body":"test 2"})

# Example: Add two numbers
@tool
def add(a: float, b: float) -> float:
    '''Add two numbers.'''
    return a + b


@tool
def web_search(query: str) -> str:
    '''Search the web for information.'''
    return "Searched the web"

@tool
def max_agent(query: list) -> int:
    '''Find the company with the most employees.'''
    print(query)
    return max(query)

@tool
def average(query: list) -> float:
    '''Find the average.'''
    avg = 0.0  # returned unchanged for an empty list
    if len(query) != 0:
      avg = round(sum(query) / len(query), 2)
    return avg
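
install_package takes two names because the name used with pip and the name used in an import statement can differ (for example langchain-tavily versus langchain_tavily). As a minimal sketch of the "already installed?" check, using only the standard library, the hypothetical helper is_importable below tests an import name without importing the module itself:

```python
import importlib.util


def is_importable(import_name: str) -> bool:
    """Return True if a module with this import name can be located."""
    try:
        return importlib.util.find_spec(import_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package is missing or the module has no spec
        return False


print(is_importable("json"))                      # standard library module
print(is_importable("surely_not_installed_xyz"))  # hypothetical missing module
```

A tool file could call such a check first and fall back to install_package only when it returns False.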

7.17. STEP 10: Create TML Solution Documentation: tml-system-step-10-documentation-dag

Note

TSS automatically generates documentation for your solution on ReadTheDocs. Each TML solution you create has its own documentation that details the solution parameters in the DAGs. This is another unique and powerful feature of the TSS: it enables you to share your documentation with others almost instantly!

Tip

The TSS will develop the base documentation for your solution.

Note: Your documentation URL will be: https://<Your Solution Name>.readthedocs.io

Your Solution Name is the name you chose in Lets Start Building a TML Solution, plus the first 4 characters of your ReadTheDocs token. This project is committed under the tml-solutions folder in GitHub.
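
As a small worked example of the naming rule, the sketch below builds the documentation URL from a solution name and a token. The helper docs_url is hypothetical, and joining the name and the 4-character suffix with a hyphen is an assumption inferred from project names such as tml-multi-agenticai-iot-3f10; check the URL that TSS reports for your own solution.

```python
def docs_url(solution_name: str, readthedocs_token: str) -> str:
    """Build the ReadTheDocs URL for a solution (hyphen separator is an assumption)."""
    suffix = readthedocs_token[:4]  # first 4 characters of the ReadTheDocs token
    return "https://{}-{}.readthedocs.io".format(solution_name, suffix)


# The token value here is a made-up placeholder
print(docs_url("tml-multi-agenticai-iot", "3f10xxxxxxxx"))
```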

Watch the YouTube video to see how to configure this DAG: YouTube Video

   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.operators.bash import BashOperator
   from datetime import datetime
   from airflow.decorators import dag, task
   import os
   import sys
   import requests
   import json
   import subprocess
   import tsslogging
   import shutil
   from git import Repo
   import time
   sys.dont_write_bytecode = True

   ######################################################USER CHOSEN PARAMETERS ###########################################################
   default_args = {
    'conf_project' : 'Transactional Machine Learning (TML)',
    'conf_copyright' : '2024, Otics Advanced Analytics, Incorporated - For Support email support@otics.ca',
    'conf_author' : 'Sebastian Maurice',
    'conf_release' : '0.1',
    'conf_version' : '0.1.0',
    'dockerenv': '', # add any environmental variables for docker must be: variable1=value1, variable2=value2
    'dockerinstructions': '', # add instructions on how to run the docker container
   }

   ############################################################### DO NOT MODIFY BELOW ####################################################

   def triggerbuild(sname):

           URL = "https://readthedocs.org/api/v3/projects/{}/versions/latest/builds/".format(sname)
           TOKEN = os.environ['READTHEDOCS']
           HEADERS = {'Authorization': f'token {TOKEN}'}
           response = requests.post(URL, headers=HEADERS)
           print(response.json())

   def updatebranch(sname,branch):

           URL = "https://readthedocs.org/api/v3/projects/{}/".format(sname)
           TOKEN = os.environ['READTHEDOCS']
           HEADERS = {'Authorization': f'token {TOKEN}'}
           data={
               "name": "{}".format(sname),
               "repository": {
                   "url": "https://github.com/{}/{}".format(os.environ['GITUSERNAME'],sname),
                   "type": "git"
               },
               "default_branch": "{}".format(branch),
               "homepage": "http://template.readthedocs.io/",
               "programming_language": "py",
               "language": "en",
               "privacy_level": "public",
               "external_builds_privacy_level": "public",
               "tags": [
                   "automation",
                   "sphinx"
               ]
           }
           response = requests.patch(
               URL,
               json=data,
               headers=HEADERS,
           )

   def setupurls(projectname,producetype,sname):

       ptype=""
       if producetype=="LOCALFILE":
         ptype=producetype
       elif producetype=="REST":
         ptype="RESTAPI"
       elif producetype=="MQTT":
         ptype=producetype
       elif producetype=="gRPC":
         ptype=producetype


       stepurl1="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_1_getparams_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl2="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_2_kafka_createtopic_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl3="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_read_{}_step_3_kafka_producetotopic_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,ptype,projectname)
       stepurl4="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl4a="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4a_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl4b="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4b_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl4c="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_4c_kafka_preprocess_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl5="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_5_kafka_machine_learning_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl6="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_6_kafka_predictions_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl7="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_7_kafka_visualization_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl8="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_8_deploy_solution_to_docker_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl9="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_9_privategpt_qdrant_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl9b="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_9b_agenticai_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)
       stepurl10="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/tml_system_step_10_documentation_dag-{}.py".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,projectname)

       print("stepurl1=",stepurl1)

       doparse("/{}/docs/source/details.rst".format(sname), ["--step1url--;{}".format(stepurl1)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step2url--;{}".format(stepurl2)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step3url--;{}".format(stepurl3)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4url--;{}".format(stepurl4)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4aurl--;{}".format(stepurl4a)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4burl--;{}".format(stepurl4b)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step4curl--;{}".format(stepurl4c)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step5url--;{}".format(stepurl5)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step6url--;{}".format(stepurl6)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step7url--;{}".format(stepurl7)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step8url--;{}".format(stepurl8)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step9url--;{}".format(stepurl9)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step9burl--;{}".format(stepurl9b)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--step10url--;{}".format(stepurl10)])

   def doparse(fname,farr):
         data = ''
         try:
          with open(fname, 'r', encoding='utf-8') as file:
           data = file.readlines()
           r=0
           for d in data:
               for f in farr:
                   fs = f.split(";")
                   if fs[0] in d:
                       data[r] = d.replace(fs[0],fs[1])
               r += 1
          with open(fname, 'w', encoding='utf-8') as file:
           file.writelines(data)
         except Exception as e:
            pass

   def updateollamaandpgpt(op,ollamacontainername,concurrency,collection,temp,rollback,ollama,deletevector,vectordbpath,topicid,enabletls,partition,mainip,
                          mainport,embedding,agents_topic_prompt,teamlead_topic,teamleadprompt,supervisor_topic,supervisorprompt,agenttoolfunctions,agent_team_supervisor_topic,contextwindow,
                          pvectorsearchtype,ptemperature,pcollection,pconcurrency,pvectordimension,pcontextwindowsize,mainmodel,mainembedding,pgptcontainername):
         print("update==",op)
         if ollamacontainername != None:
          doparse("/{}/ollama.yml".format(op), ["--ollamacontainername--;{}".format(ollamacontainername)])
          doparse("/{}/ollama.yml".format(op), ["--agenticai-kubeconcur--;{}".format(concurrency[1:])])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-kubecollection--;{}".format(collection)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-kubetemperature--;{}".format(temp)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-rollbackoffset--;{}".format(rollback)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-ollama-model--;{}".format(ollama)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-deletevectordbcount--;{}".format(deletevector)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-vectordbpath--;{}".format(vectordbpath)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-topicid--;{}".format(topicid)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-enabletls--;{}".format(enabletls)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-partition--;{}".format(partition)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-vectordbcollectionname--;{}".format(collection)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-ollamacontainername--;{}".format(ollamacontainername)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-mainip--;{}".format(mainip)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-mainport--;{}".format(mainport)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-embedding--;{}".format(embedding)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-agents_topic_prompt--;{}".format(agents_topic_prompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-teamlead_topic--;{}".format(teamlead_topic)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-teamleadprompt--;{}".format(teamleadprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-supervisor_topic--;{}".format(supervisor_topic)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-supervisorprompt--;{}".format(supervisorprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-agenttoolfunctions--;{}".format(agenttoolfunctions.strip().replace('\n','').replace("\\n","").replace("'","").replace(";","=="))])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-agent_team_supervisor_topic--;{}".format(agent_team_supervisor_topic)])
          doparse("/{}/ollama.yml".format(op),  ["--agenticai-contextwindow--;{}".format(contextwindow)])

         if pgptcontainername is not None:
          doparse("/{}/privategpt.yml".format(op), ["--kubevectorsearchtype--;{}".format(pvectorsearchtype)])
          doparse("/{}/privategpt.yml".format(op), ["--kubetemperature--;{}".format(ptemperature[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubecollection--;{}".format(pcollection)])
          doparse("/{}/privategpt.yml".format(op), ["--kubeconcur--;{}".format(pconcurrency[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubevectordimension--;{}".format(pvectordimension[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubecontextwindowsize--;{}".format(pcontextwindowsize[1:])])
          doparse("/{}/privategpt.yml".format(op), ["--kubemainmodel--;{}".format(mainmodel)])
          doparse("/{}/privategpt.yml".format(op), ["--kubemainembedding--;{}".format(mainembedding)])
          doparse("/{}/privategpt.yml".format(op), ["--kubeprivategpt--;{}".format(pgptcontainername)])

   def copyymls(projectname,sname,ingressyml,solutionyml):
       # Write the ingress and solution YAMLs into the project's ymls/<sname> folder
       # in the repo and return that folder's path.
       orepo=tsslogging.getrepo()
       # os.makedirs also creates the intermediate "ymls" directory
       op=f"/{orepo}/tml-airflow/dags/tml-solutions/{projectname}/ymls/{sname}"
       os.makedirs(op, exist_ok=True)

       tsslogging.writeoutymls(op,ingressyml,solutionyml,sname)
       return op

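   def generatedoc(**context):
       # Step 10: fill the documentation templates under /<sname>/docs/source with
       # the parameters that the earlier solution tasks pushed to XCom.
       # istss1 is 1 unless the TSS environment variable is set to something other than "1"
       istss1 = 1 if os.environ.get('TSS', "1") == "1" else 0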

       if 'tssdoc' in os.environ:
           if os.environ['tssdoc']=="1":
               return

       sd = context['dag'].dag_id
       sname=context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutionname".format(sd))
   #    rtdsname = tsslogging.rtdprojects(sname,sd)

       kube=0
       step9prompt=''
       step9context=''
       step9keyattribute=''
       step9keyprocesstype=''
       step9hyperbatch=''
       step9vectordbcollectionname=''
       step9concurrency=''
       cudavisibledevices=''
       step9docfolder=''
       step9docfolderingestinterval=''
       step9useidentifierinprompt=''
       step5processlogic=''
       step5independentvariables=''
       step9searchterms=''
       step9streamall=''
       step9temperature=''
       step9vectorsearchtype=''
       step9pcontextwindowsize=''
       step9pgptcontainername=''
       step9pgpthost=''
       step9pgptport=''
       step9vectordimension=''
       step4crawdatatopic=''
       step4csearchterms=''
       step4crememberpastwindows=''
       step4cpatternwindowthreshold=''
       step4crtmsstream=''
       step4crtmsscorethreshold=''
       step4cattackscorethreshold=''
       step4cpatternscorethreshold=''
       step4clocalsearchtermfolder=''
       step4clocalsearchtermfolderinterval=''
       step4crtmsfoldername=''
       step3localfileinputfile=''
       step3localfiledocfolder=''
       step4crtmsmaxwindows=''
       rtmsoutputurl=""
       mloutputurl=""

       step2raw_data_topic=""
       step2preprocess_data_topic=""
       step4raw_data_topic=""
       step4preprocess_data_topic=''
       step4preprocesstypes=""
       step4jsoncriteria=""
       step4maxrows=""
       step4ajsoncriteria=""
       step4amaxrows=""
       step4apreprocesstypes=""
       step4araw_data_topic=""
       step4apreprocess_data_topic=""
       step4bpreprocesstypes=""
       step4bjsoncriteria=""
       step4bmaxrows=""
       step4braw_data_topic=""
       step4bpreprocess_data_topic=""

       step9brollback=""
       step9bdeletevectordbcount=""
       step9bvectordbpath=""
       step9btemperature=""
       step9bvectordbcollectionname=""
       step9bollamacontainername=""
       step9bCUDA_VISIBLE_DEVICES=""
       step9bmainip=""
       step9bmainport=""
       step9bembedding=""
       step9bagents_topic_prompt=""
       step9bteamlead_topic=""
       step9bteamleadprompt=""
       step9bsupervisor_topic=""
       step9bagenttoolfunctions=""
       step9bagent_team_supervisor_topic=""
       step9bconcurrency=""
       step9bollama=""
       step9btopicid=""
       step9benabletls=""
       step9bpartition=""
       step9bsupervisorprompt=""
       step9bcontextwindow=""
       step9blocalmodelsfolder=""
       step9bagenttopic=""

       if "KUBE" in os.environ:
             if os.environ["KUBE"] == "1":
                kube=1
                return

       tsslogging.locallogs("INFO", "STEP 10: Started to build the documentation")
       producinghost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODCE".format(sname))
       producingport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPRODUCE".format(sname))
       preprocesshost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS".format(sname))
       preprocessport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS".format(sname))
       preprocesshost2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS2".format(sname))
       preprocessport2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS2".format(sname))

       mlhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTML".format(sname))
       mlport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTML".format(sname))
       predictionhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREDICT".format(sname))
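       # Assumes step 1 pushes the port under "<sname>_VIPERPORTPREDICT", matching the other host/port key pairs
       predictionport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREDICT".format(sname))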
       dashboardhtml = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_dashboardhtml".format(sname))
       vipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERVIZPORT".format(sname))
       solutionvipervizport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONVIPERVIZPORT".format(sname))
       airflowport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_AIRFLOWPORT".format(sname))
       mqttusername = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_MQTTUSERNAME".format(sname))
       kafkacloudusername = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_KAFKACLOUDUSERNAME".format(sname))
       projectname = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_projectname".format(sd))
       externalport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_EXTERNALPORT".format(sname))
       solutionexternalport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONEXTERNALPORT".format(sname))

       solutionairflowport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONAIRFLOWPORT".format(sname))

       hpdehost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOST".format(sname))
       hpdeport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORT".format(sname))

       hpdepredicthost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOSTPREDICT".format(sname))
       hpdepredictport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORTPREDICT".format(sname))

       subprocess.call(["sed", "-i", "-e",  "s/--project--/{}/g".format(default_args['conf_project']), "/{}/docs/source/conf.py".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--copyright--/{}/g".format(default_args['conf_copyright']), "/{}/docs/source/conf.py".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--author--/{}/g".format(default_args['conf_author']), "/{}/docs/source/conf.py".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--release--/{}/g".format(default_args['conf_release']), "/{}/docs/source/conf.py".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--version--/{}/g".format(default_args['conf_version']), "/{}/docs/source/conf.py".format(sname)])

       stitle = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutiontitle".format(sname))
       sdesc = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_solutiondescription".format(sname))
       brokerhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_brokerhost".format(sname))
       brokerport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_brokerport".format(sname))
       cloudusername = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_cloudusername".format(sname))
       cloudpassword = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_cloudpassword".format(sname))

       subprocess.call(["sed", "-i", "-e",  "s/--solutionname--/{}/g".format(sname), "/{}/docs/source/index.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--solutiontitle--/{}/g".format(stitle), "/{}/docs/source/index.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--solutiondescription--/{}/g".format(sdesc), "/{}/docs/source/index.rst".format(sname)])

       projecturl="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname)

       doparse("/{}/docs/source/index.rst".format(sname), ["--projectname--;{}".format(projectname)])

       subprocess.call(["sed", "-i", "-e",  "s/--solutionname--/{}/g".format(sname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--sname--/{}/g".format(sname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--stitle--/{}/g".format(stitle), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--sdesc--/{}/g".format(sdesc), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--brokerhost--/{}/g".format(brokerhost), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--brokerport--/{}/g".format(brokerport[1:]), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--cloudusername--/{}/g".format(cloudusername), "/{}/docs/source/details.rst".format(sname)])

       subprocess.call(["sed", "-i", "-e",  "s/--solutiontitle--/{}/g".format(stitle), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--solutiondescription--/{}/g".format(sdesc), "/{}/docs/source/details.rst".format(sname)])


       companyname = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_companyname".format(sname))
       myname = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_myname".format(sname))
       myemail = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_myemail".format(sname))
       mylocation = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_mylocation".format(sname))
       replication = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_replication".format(sname))
       numpartitions = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_numpartitions".format(sname))
       enabletls = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_enabletls".format(sname))
       microserviceid = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_microserviceid".format(sname))
       raw_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_raw_data_topic".format(sname))
       step2raw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_preprocess_data_topic".format(sname))
       step2preprocess_data_topic=preprocess_data_topic
       ml_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_ml_data_topic".format(sname))
       prediction_data_topic = context['ti'].xcom_pull(task_ids='step_2_solution_task_createtopic',key="{}_prediction_data_topic".format(sname))

       subprocess.call(["sed", "-i", "-e",  "s/--companyname--/{}/g".format(companyname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--myname--/{}/g".format(myname), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--myemail--/{}/g".format(myemail), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--mylocation--/{}/g".format(mylocation), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--replication--/{}/g".format(replication[1:]), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--numpartitions--/{}/g".format(numpartitions[1:]), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--enabletls--/{}/g".format(enabletls[1:]), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--microserviceid--/{}/g".format(microserviceid), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--ml_data_topic--/{}/g".format(ml_data_topic), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--prediction_data_topic--/{}/g".format(prediction_data_topic), "/{}/docs/source/details.rst".format(sname)])

       PRODUCETYPE = ""
       TOPIC = ""
       PORT = ""
       IDENTIFIER = ""
       HTTPADDR = ""
       FROMHOST = ""
       TOHOST = ""
       CLIENTPORT = ""
       snamertd = sname.replace("_", "-")
       PRODUCETYPE = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_PRODUCETYPE".format(sname))
       TOPIC = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TOPIC".format(sname))
       PORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_PORT".format(sname))
       IDENTIFIER = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_IDENTIFIER".format(sname))
       HTTPADDR = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_HTTPADDR".format(sname))
       FROMHOST = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_FROMHOST".format(sname))
       TOHOST = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TOHOST".format(sname))

       CLIENTPORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_CLIENTPORT".format(sname))
       TSSCLIENTPORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TSSCLIENTPORT".format(sname))
       TMLCLIENTPORT = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_TMLCLIENTPORT".format(sname))

       setupurls(projectname,PRODUCETYPE,sname)

       if PRODUCETYPE=='LOCALFILE':
         inputfile = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_inputfile".format(sname))
         step3localfileinputfile=inputfile
         docfolderprocess = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_docfolder".format(sname))
         step3localfiledocfolder=docfolderprocess
         doctopic = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_doctopic".format(sname))
         chunks = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_chunks".format(sname))
         docingestinterval = context['ti'].xcom_pull(task_ids='step_3_solution_task_producetotopic',key="{}_docingestinterval".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--docfolderprocess--;{}".format(docfolderprocess)])
         doparse("/{}/docs/source/details.rst".format(sname), ["--doctopic--;{}".format(doctopic)])
         doparse("/{}/docs/source/details.rst".format(sname), ["--chunks--;{}".format(chunks[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--docingestinterval--;{}".format(docingestinterval[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--inputfile--;{}".format(inputfile)])

       subprocess.call(["sed", "-i", "-e",  "s/--PRODUCETYPE--/{}/g".format(PRODUCETYPE), "/{}/docs/source/details.rst".format(sname)])
       subprocess.call(["sed", "-i", "-e",  "s/--TOPIC--/{}/g".format(TOPIC), "/{}/docs/source/details.rst".format(sname)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--PORT--;{}".format(PORT[1:])])
       doparse("/{}/docs/source/details.rst".format(sname), ["--HTTPADDR--;{}".format(HTTPADDR)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--FROMHOST--;{}".format(FROMHOST)])
       doparse("/{}/docs/source/details.rst".format(sname), ["--TOHOST--;{}".format(TOHOST)])

       # Stamp every generated page with the same build time
       nowstr = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
       for rstfile in ("details", "index", "operating", "logs", "kube"):
           doparse("/{}/docs/source/{}.rst".format(sname, rstfile), ["--datetime--;{}".format(nowstr)])

       # xcom_pull returns None when the key was never pushed, so guard before len()
       if CLIENTPORT and len(CLIENTPORT) > 1:
         doparse("/{}/docs/source/details.rst".format(sname), ["--CLIENTPORT--;{}".format(CLIENTPORT[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TSSCLIENTPORT--;{}".format(TSSCLIENTPORT[1:])])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TMLCLIENTPORT--;{}".format(TMLCLIENTPORT[1:])])
       else:
         doparse("/{}/docs/source/details.rst".format(sname), ["--CLIENTPORT--;Not Applicable"])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TSSCLIENTPORT--;Not Applicable"])
         doparse("/{}/docs/source/details.rst".format(sname), ["--TMLCLIENTPORT--;Not Applicable"])

       doparse("/{}/docs/source/details.rst".format(sname), ["--IDENTIFIER--;{}".format(IDENTIFIER)])

       subprocess.call(["sed", "-i", "-e",  "s/--ingestdatamethod--/{}/g".format(PRODUCETYPE), "/{}/docs/source/details.rst".format(sname)])

       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_raw_data_topic".format(sname))
       if raw_data_topic:
         step4raw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       if preprocess_data_topic:
         step4preprocess_data_topic=preprocess_data_topic
       preprocessconditions = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_preprocessconditions".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_usemysql".format(sname))
       preprocesstypes = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_preprocesstypes".format(sname))
       if preprocesstypes:
         step4preprocesstypes=preprocesstypes
       pathtotmlattrs = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_pathtotmlattrs".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_identifier".format(sname))
       jsoncriteria = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_jsoncriteria".format(sname))
       if jsoncriteria:
         step4jsoncriteria=jsoncriteria
       maxrows4 = context['ti'].xcom_pull(task_ids='step_4_solution_task_preprocess',key="{}_maxrows".format(sname))
       if maxrows4:
         step4maxrows=maxrows4

       if preprocess_data_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocessconditions--/{}/g".format(preprocessconditions), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocesstypes--/{}/g".format(preprocesstypes), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--pathtotmlattrs--/{}/g".format(pathtotmlattrs), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--jsoncriteria--/{}/g".format(jsoncriteria), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows--/{}/g".format(maxrows4[1:]), "/{}/docs/source/details.rst".format(sname)])

       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_raw_data_topic".format(sname))
       if raw_data_topic:
         step4araw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       if preprocess_data_topic:
         step4apreprocess_data_topic=preprocess_data_topic
       preprocessconditions = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_preprocessconditions".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_usemysql".format(sname))
       preprocesstypes = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_preprocesstypes".format(sname))
       if preprocesstypes:
         step4apreprocesstypes=preprocesstypes
       pathtotmlattrs = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_pathtotmlattrs".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_identifier".format(sname))
       jsoncriteria = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_jsoncriteria".format(sname))
       if jsoncriteria:
         step4ajsoncriteria=jsoncriteria
       maxrows4 = context['ti'].xcom_pull(task_ids='step_4a_solution_task_preprocess',key="{}_maxrows".format(sname))
       if maxrows4:
         step4amaxrows=maxrows4

       if preprocess_data_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic1--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic1--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocessconditions1--/{}/g".format(preprocessconditions), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay1--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array1--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray1--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid1--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput1--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout1--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay1--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocesstypes1--/{}/g".format(preprocesstypes), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--pathtotmlattrs1--/{}/g".format(pathtotmlattrs), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier1--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--jsoncriteria1--/{}/g".format(jsoncriteria), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows1--/{}/g".format(maxrows4[1:]), "/{}/docs/source/details.rst".format(sname)])

       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_raw_data_topic".format(sname))
       if raw_data_topic:
          step4braw_data_topic=raw_data_topic
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       if preprocess_data_topic:
           step4bpreprocess_data_topic=preprocess_data_topic
       preprocessconditions = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_preprocessconditions".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_usemysql".format(sname))
       preprocesstypes = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_preprocesstypes".format(sname))
       if preprocesstypes:
          step4bpreprocesstypes=preprocesstypes
       pathtotmlattrs = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_pathtotmlattrs".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_identifier".format(sname))
       jsoncriteria = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_jsoncriteria".format(sname))
       if jsoncriteria:
          step4bjsoncriteria=jsoncriteria
       maxrows4b = context['ti'].xcom_pull(task_ids='step_4b_solution_task_preprocess',key="{}_maxrows".format(sname))
       if maxrows4b:
          step4bmaxrows=maxrows4b

       if preprocess_data_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic2--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic2--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocessconditions2--/{}/g".format(preprocessconditions), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay2--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array2--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray2--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid2--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput2--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout2--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay2--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocesstypes2--/{}/g".format(preprocesstypes), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--pathtotmlattrs2--/{}/g".format(pathtotmlattrs), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier2--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--jsoncriteria2--/{}/g".format(jsoncriteria), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows2--/{}/g".format(maxrows4b[1:]), "/{}/docs/source/details.rst".format(sname)])


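       # Step 4c: pull the search-term and RTMS parameters that step_4c_solution_task_preprocess pushed to XCom
       raw_data_topic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_raw_data_topic".format(sname))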
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_preprocess_data_topic".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_delay".format(sname))
       array = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_array".format(sname))
       saveasarray = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_saveasarray".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_topicid".format(sname))
       rawdataoutput = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rawdataoutput".format(sname))
       asynctimeout = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_asynctimeout".format(sname))
       timedelay = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_timedelay".format(sname))
       usemysql = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_usemysql".format(sname))
       searchterms = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_searchterms".format(sname))
       rememberpastwindows = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rememberpastwindows".format(sname))
       identifier = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_identifier".format(sname))
       patternwindowthreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_patternwindowthreshold".format(sname))
       maxrows4c = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_maxrows".format(sname))
       rtmsstream = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsstream".format(sname))
       rtmsscorethresholdtopic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsscorethresholdtopic".format(sname))
       attackscorethresholdtopic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_attackscorethresholdtopic".format(sname))
       patternscorethresholdtopic = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_patternscorethresholdtopic".format(sname))
       rtmsscorethreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsscorethreshold".format(sname))
       attackscorethreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_attackscorethreshold".format(sname))
       patternscorethreshold = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_patternscorethreshold".format(sname))
       rtmsmaxwindows = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsmaxwindows".format(sname))
       if rtmsmaxwindows:
         step4crtmsmaxwindows=rtmsmaxwindows
         subprocess.call(["sed", "-i", "-e",  "s/--rtmsmaxwindows--/{}/g".format(rtmsmaxwindows[1:]), "/{}/docs/source/details.rst".format(sname)])

       localsearchtermfolder = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_localsearchtermfolder".format(sname))
       localsearchtermfolderinterval = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_localsearchtermfolderinterval".format(sname))
       rtmsfoldername = context['ti'].xcom_pull(task_ids='step_4c_solution_task_preprocess',key="{}_rtmsfoldername".format(sname))

       if searchterms:
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsscorethresholdtopic--;{}".format(rtmsscorethresholdtopic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--attackscorethresholdtopic--;{}".format(attackscorethresholdtopic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--patternscorethresholdtopic--;{}".format(patternscorethresholdtopic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsfoldername--;{}".format(rtmsfoldername)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsscorethreshold--;{}".format(rtmsscorethreshold[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--attackscorethreshold--;{}".format(attackscorethreshold[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--patternscorethreshold--;{}".format(patternscorethreshold[1:])])
           subprocess.call(["sed", "-i", "-e",  "s/--raw_data_topic3--/{}/g".format(raw_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic3--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rtmsstream--/{}/g".format(rtmsstream), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay3--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--array3--/{}/g".format(array[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--saveasarray3--/{}/g".format(saveasarray[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid3--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rawdataoutput3--/{}/g".format(rawdataoutput[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--asynctimeout3--/{}/g".format(asynctimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--timedelay3--/{}/g".format(timedelay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rememberpastwindows--/{}/g".format(rememberpastwindows[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--patternwindowthreshold--/{}/g".format(patternwindowthreshold[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--identifier3--/{}/g".format(identifier), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows3--/{}/g".format(maxrows4c[1:]), "/{}/docs/source/details.rst".format(sname)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmssearchterms--;{}".format(searchterms)])
           rtmsoutputurl="https:\\/\\/github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/{}".format(os.environ["GITUSERNAME"], tsslogging.getrepo(),projectname,rtmsfoldername)
           doparse("/{}/docs/source/details.rst".format(sname), ["--rtmsoutputurl--;{}".format(rtmsoutputurl)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--localsearchtermfolder--;{}".format(localsearchtermfolder)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--localsearchtermfolderinterval--;{}".format(localsearchtermfolderinterval[1:])])

           step4crawdatatopic=raw_data_topic
           step4csearchterms=searchterms
           step4crememberpastwindows=rememberpastwindows
           step4cpatternwindowthreshold=patternwindowthreshold
           step4crtmsstream=rtmsstream
           step4crtmsscorethreshold=rtmsscorethreshold
           step4cattackscorethreshold=attackscorethreshold
           step4cpatternscorethreshold=patternscorethreshold
           step4clocalsearchtermfolder=localsearchtermfolder
           step4clocalsearchtermfolderinterval=localsearchtermfolderinterval
           step4crtmsfoldername=rtmsfoldername

       # Step 5: pull the machine learning parameters that step_5_solution_task_ml pushed to XCom
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_preprocess_data_topic".format(sname))
       ml_data_topic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_ml_data_topic".format(sname))
       modelruns = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_modelruns".format(sname))
       offset = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_offset".format(sname))
       islogistic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_islogistic".format(sname))
       networktimeout = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_networktimeout".format(sname))
       modelsearchtuner = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_modelsearchtuner".format(sname))
       dependentvariable = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_dependentvariable".format(sname))
       independentvariables = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_independentvariables".format(sname))
       if independentvariables:
         step5independentvariables = independentvariables

       rollbackoffsets = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_rollbackoffsets".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_topicid".format(sname))
       consumefrom = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_consumefrom".format(sname))
       fullpathtotrainingdata = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_fullpathtotrainingdata".format(sname))
       transformtype = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_transformtype".format(sname))
       sendcoefto = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_sendcoefto".format(sname))
       coeftoprocess = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_coeftoprocess".format(sname))
       coefsubtopicnames = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_coefsubtopicnames".format(sname))
       processlogic = context['ti'].xcom_pull(task_ids='step_5_solution_task_ml',key="{}_processlogic".format(sname))
       if fullpathtotrainingdata:
            step5sp=fullpathtotrainingdata.split("/")
            if len(step5sp)>0:
              mloutputurl="https:\\/\\/github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/mldata/{}".format(os.environ["GITUSERNAME"], tsslogging.getrepo(),projectname,step5sp[-1])
              doparse("/{}/docs/source/details.rst".format(sname), ["--mloutputurl--;{}".format(mloutputurl)])

       if processlogic:
         step5processlogic = processlogic

       if modelruns:
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--ml_data_topic--/{}/g".format(ml_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--modelruns--/{}/g".format(modelruns[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--offset--/{}/g".format(offset[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--islogistic--/{}/g".format(islogistic[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--networktimeout--/{}/g".format(networktimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--modelsearchtuner--/{}/g".format(modelsearchtuner[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--dependentvariable--/{}/g".format(dependentvariable), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--independentvariables--/{}/g".format(independentvariables), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rollbackoffsets--/{}/g".format(rollbackoffsets[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--consumefrom--/{}/g".format(consumefrom), "/{}/docs/source/details.rst".format(sname)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--fullpathtotrainingdata--;{}".format(fullpathtotrainingdata)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--processlogic--;{}".format(processlogic)])

           subprocess.call(["sed", "-i", "-e",  "s/--transformtype--/{}/g".format(transformtype), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--sendcoefto--/{}/g".format(sendcoefto), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--coeftoprocess--/{}/g".format(coeftoprocess), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--coefsubtopicnames--/{}/g".format(coefsubtopicnames), "/{}/docs/source/details.rst".format(sname)])

       # Step 6: pull the prediction parameters that step_6_solution_task_prediction pushed to XCom
       preprocess_data_topic = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_preprocess_data_topic".format(sname))
       ml_prediction_topic = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_ml_prediction_topic".format(sname))
       streamstojoin = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_streamstojoin".format(sname))
       inputdata = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_inputdata".format(sname))
       consumefrom2 = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_consumefrom".format(sname))
       offset = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_offset".format(sname))
       delay = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_delay".format(sname))
       usedeploy = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_usedeploy".format(sname))
       networktimeout = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_networktimeout".format(sname))
       maxrows = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_maxrows".format(sname))
       topicid = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_topicid".format(sname))
       pathtoalgos = context['ti'].xcom_pull(task_ids='step_6_solution_task_prediction',key="{}_pathtoalgos".format(sname))

       if ml_prediction_topic:
           subprocess.call(["sed", "-i", "-e",  "s/--preprocess_data_topic--/{}/g".format(preprocess_data_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--ml_prediction_topic--/{}/g".format(ml_prediction_topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--streamstojoin--/{}/g".format(streamstojoin), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--inputdata--/{}/g".format(inputdata), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--consumefrom2--/{}/g".format(consumefrom2), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--offset--/{}/g".format(offset[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--delay--/{}/g".format(delay[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--usedeploy--/{}/g".format(usedeploy[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--networktimeout--/{}/g".format(networktimeout[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--maxrows--/{}/g".format(maxrows[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topicid--/{}/g".format(topicid[1:]), "/{}/docs/source/details.rst".format(sname)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--pathtoalgos--;{}".format(pathtoalgos)])

       # Step 7: pull the visualization parameters that step_7_solution_task_visualization pushed to XCom
       topic = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_topic".format(sname))
       secure = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_secure".format(sname))
       offset = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_offset".format(sname))
       append = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_append".format(sname))
       chip = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_chip".format(sname))
       rollbackoffset = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_rollbackoffset".format(sname))
       dashboardhtml = context['ti'].xcom_pull(task_ids='step_7_solution_task_visualization',key="{}_dashboardhtml".format(sname))

       containername = context['ti'].xcom_pull(task_ids='step_8_solution_task_containerize',key="{}_containername".format(sname))
       if containername:
           hcname = containername.split('/')[1]
           huser = containername.split('/')[0]
           hurl = "https://hub.docker.com/r/{}/{}".format(huser,hcname)
       else:
           containername="TBD"
           hurl="TBD"  # keep hurl defined for the --dockercontainer-- substitution further down

       if vipervizport:
           subprocess.call(["sed", "-i", "-e",  "s/--vipervizport--/{}/g".format(vipervizport[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--topic--/{}/g".format(topic), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--dashboardhtml--/{}/g".format(dashboardhtml), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--secure--/{}/g".format(secure[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--offset--/{}/g".format(offset[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--append--/{}/g".format(append[1:]), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--chip--/{}/g".format(chip), "/{}/docs/source/details.rst".format(sname)])
           subprocess.call(["sed", "-i", "-e",  "s/--rollbackoffset--/{}/g".format(rollbackoffset[1:]), "/{}/docs/source/details.rst".format(sname)])


       repo = tsslogging.getrepo()
       gitrepo="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}".format(os.environ['GITUSERNAME'],repo,projectname)
      # gitrepo = "\/{}\/tml-airflow\/dags\/tml-solutions\/{}".format(repo,sname)

       # gitrepo contains "/" characters, so "|" is used as the sed delimiter instead of "/"
       v=subprocess.call(["sed", "-i", "-e",  "s|--gitrepo--|{}|g".format(gitrepo), "/{}/docs/source/operating.rst".format(sname)])
       print("V=",v)
       doparse("/{}/docs/source/operating.rst".format(sname), ["--gitrepo--;{}".format(gitrepo)])

       subprocess.call(["sed", "-i", "-e",  "s/--solutionname--/{}/g".format(sname), "/{}/docs/source/operating.rst".format(sname)])
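       # containername and hurl contain "/", so "|" is the sed delimiter; the escaped \n is passed through
       # for sed to expand into a newline (this assumes GNU sed)
       subprocess.call(["sed", "-i", "-e",  "s|--dockercontainer--|{}\\n\\n{}|g".format(containername,hurl), "/{}/docs/source/operating.rst".format(sname)])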

       chipmain = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_chip".format(sname))

       doparse("/{}/docs/source/operating.rst".format(sname), ["--justcontainer--;{}".format(containername)])

       doparse("/{}/docs/source/operating.rst".format(sname), ["--tsscontainer--;maadsdocker/tml-solution-studio-with-airflow-{}".format(chip)])

       doparse("/{}/docs/source/operating.rst".format(sname), ["--chip--;{}".format(chipmain)])
       if istss1==0:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionairflowport--;{}".format(solutionairflowport[1:])])
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionairflowport--;{}".format("TBD")])

       doparse("/{}/docs/source/operating.rst".format(sname), ["--externalport--;{}".format(externalport[1:])])
       if istss1==0:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionexternalport--;{}".format(solutionexternalport[1:])])
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionexternalport--;{}".format("TBD")])

       # Step 9: pull the AI parameters that step_9_solution_task_ai pushed to XCom
       pconsumefrom = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_consumefrom".format(sname))
       pgpt_data_topic = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgpt_data_topic".format(sname))
       pgptcontainername = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgptcontainername".format(sname))
       pmainmodel=""
       pmainembedding=""
       if pgptcontainername is not None:
         step9pgptcontainername=pgptcontainername
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubeprivategpt--;{}".format(pgptcontainername)])
         mainmodel = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_mainmodel".format(sname))
         pmainmodel=mainmodel
         mainembedding = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_mainembedding".format(sname))
         pmainembedding=mainembedding
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubemainmodel--;{}".format(mainmodel)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubemainembedding--;{}".format(mainembedding)])

       poffset = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_offset".format(sname))
       prollbackoffset = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_rollbackoffset".format(sname))
       ptopicid = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_topicid".format(sname))
       penabletls = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_enabletls".format(sname))
       ppartition = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_partition".format(sname))
       pprompt = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_prompt".format(sname))
       pcontextwindowsize = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_contextwindowsize".format(sname))
       pvectordimension = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_vectordimension".format(sname))
       pmitrejson = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_mitrejson".format(sname))

       if pmitrejson:
          doparse("/{}/docs/source/details.rst".format(sname), ["--mitrejson--;{}".format(pmitrejson)])

       if pcontextwindowsize:
          step9pcontextwindowsize=pcontextwindowsize
          doparse("/{}/docs/source/details.rst".format(sname), ["--contextwindowsize--;{}".format(pcontextwindowsize[1:])])
          doparse("/{}/docs/source/kube.rst".format(sname), ["--kubecontextwindowsize--;{}".format(pcontextwindowsize[1:])])

       if pvectordimension:
          step9vectordimension=pvectordimension
          doparse("/{}/docs/source/details.rst".format(sname), ["--vectordimension--;{}".format(pvectordimension[1:])])
          doparse("/{}/docs/source/kube.rst".format(sname), ["--kubevectordimension--;{}".format(pvectordimension[1:])])

       if pprompt:
         step9prompt=pprompt
         step9prompt=step9prompt.strip().replace('\n','').replace("\\n","").replace(";",",").replace("''","")

       pdocfolder = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_docfolder".format(sname))
       if pdocfolder:
         step9docfolder=pdocfolder
         doparse("/{}/docs/source/details.rst".format(sname), ["--docfolder--;{}".format(pdocfolder)])

       pdocfolderingestinterval = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_docfolderingestinterval".format(sname))
       if pdocfolderingestinterval:
         step9docfolderingestinterval=pdocfolderingestinterval
         doparse("/{}/docs/source/details.rst".format(sname), ["--docfolderingestinterval--;{}".format(pdocfolderingestinterval[1:])])

       puseidentifierinprompt = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_useidentifierinprompt".format(sname))
       if puseidentifierinprompt:
         step9useidentifierinprompt=puseidentifierinprompt
         doparse("/{}/docs/source/details.rst".format(sname), ["--useidentifierinprompt--;{}".format(puseidentifierinprompt[1:])])

       pcontext = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_context".format(sname))
       if pcontext:
          step9context=pcontext
       pjsonkeytogather = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_jsonkeytogather".format(sname))
       pkeyattribute = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_keyattribute".format(sname))
       if pkeyattribute:
         step9keyattribute=pkeyattribute
       pconcurrency = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_concurrency".format(sname))
       if pconcurrency:
         step9concurrency=pconcurrency
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubeconcur--;{}".format(pconcurrency[1:])])

       pcuda = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_cuda".format(sname))
       if pcuda:
         cudavisibledevices=pcuda
       pcollection = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_vectordbcollectionname".format(sname))
       if pcollection:
         step9vectordbcollectionname=pcollection
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubecollection--;{}".format(pcollection)])

       pgpthost = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgpthost".format(sname))
       if pgpthost:
         step9pgpthost=pgpthost

       pgptport = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_pgptport".format(sname))
       if pgptport:
         step9pgptport=pgptport

       pprocesstype = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_keyprocesstype".format(sname))
       if pprocesstype:
         step9keyprocesstype=pprocesstype
       hyperbatch = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_hyperbatch".format(sname))
       if hyperbatch:
         step9hyperbatch=hyperbatch
       psearchterms = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_searchterms".format(sname))
       if psearchterms:
         step9searchterms=psearchterms
         doparse("/{}/docs/source/details.rst".format(sname), ["--searchterms--;{}".format(psearchterms)])
       pstreamall = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_streamall".format(sname))
       if pstreamall:
         step9streamall=pstreamall
         doparse("/{}/docs/source/details.rst".format(sname), ["--streamall--;{}".format(pstreamall[1:])])
       ptemperature = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_temperature".format(sname))
       if ptemperature:
         step9temperature=ptemperature
         doparse("/{}/docs/source/details.rst".format(sname), ["--temperature--;{}".format(ptemperature[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubetemperature--;{}".format(ptemperature[1:])])

       pvectorsearchtype = context['ti'].xcom_pull(task_ids='step_9_solution_task_ai',key="{}_vectorsearchtype".format(sname))
       if pvectorsearchtype:
         step9vectorsearchtype=pvectorsearchtype
         doparse("/{}/docs/source/details.rst".format(sname), ["--vectorsearchtype--;{}".format(pvectorsearchtype)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--kubevectorsearchtype--;{}".format(pvectorsearchtype)])

       ollama= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_ollama-model".format(sname))
       if ollama is not None: # Step 9b executing
         step9bollama=ollama
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-ollama-model--;{}".format(ollama)])
         rollback= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_rollbackoffset".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-rollbackoffset--;{}".format(rollback[1:])])
         step9brollback=rollback[1:]

         deletevector= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_deletevectordbcount".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-deletevectordbcount--;{}".format(deletevector[1:])])
         step9bdeletevectordbcount=deletevector[1:]

         vectordbpath= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_vectordbpath".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-vectordbpath--;{}".format(vectordbpath)])
         step9bvectordbpath=vectordbpath

         temp= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_temperature".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-temperature--;{}".format(temp[1:])])
         step9btemperature=temp[1:]

         topicid= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_topicid".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-topicid--;{}".format(topicid[1:])])
         step9btopicid=topicid[1:]

         enabletls= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_enabletls".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-enabletls--;{}".format(enabletls[1:])])
         step9benabletls=enabletls[1:]

         partition= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_partition".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-partition--;{}".format(partition[1:])])
         step9bpartition=partition[1:]

         collection= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_vectordbcollectionname".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-vectordbcollectionname--;{}".format(collection)])
         step9bvectordbcollectionname=collection

         ollamacontainername= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_ollamacontainername".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-ollamacontainername--;{}".format(ollamacontainername)])
         step9bollamacontainername=ollamacontainername

         mainip= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_mainip".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-mainip--;{}".format(mainip)])
         step9bmainip=mainip

         mainport= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_mainport".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-mainport--;{}".format(mainport[1:])])
         step9bmainport=mainport[1:]

         embedding= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_embedding".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-embedding--;{}".format(embedding)])
         step9bembedding=embedding

         agents_topic_prompt= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agents_topic_prompt".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agents_topic_prompt--;{}".format(agents_topic_prompt)])
         step9bagents_topic_prompt=agents_topic_prompt

         teamlead_topic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_teamlead_topic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-teamlead_topic--;{}".format(teamlead_topic)])
         step9bteamlead_topic=teamlead_topic

         teamleadprompt= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_teamleadprompt".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-teamleadprompt--;{}".format(teamleadprompt)])
         step9bteamleadprompt=teamleadprompt
         step9bteamleadprompt=step9bteamleadprompt.replace('\n',' ').replace("\\n","").strip().replace(";",",").replace("''","")

         supervisor_topic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_supervisor_topic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-supervisor_topic--;{}".format(supervisor_topic)])
         step9bsupervisor_topic=supervisor_topic

         supervisorprompt= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_supervisorprompt".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-supervisorprompt--;{}".format(supervisorprompt)])
         step9bsupervisorprompt=supervisorprompt
         step9bsupervisorprompt=step9bsupervisorprompt.replace('\n','').replace("\\n","").strip().replace(";",",").replace("''","")

         agenttoolfunctions= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agenttoolfunctions".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agenttoolfunctions--;{}".format(agenttoolfunctions)])
         step9bagenttoolfunctions=agenttoolfunctions
         step9bagenttoolfunctions=step9bagenttoolfunctions.replace('\n','').replace("\\n","").strip().replace(";",",").replace("''","")


         agent_team_supervisor_topic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agent_team_supervisor_topic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agent_team_supervisor_topic--;{}".format(agent_team_supervisor_topic)])
         step9bagent_team_supervisor_topic=agent_team_supervisor_topic

         agenttopic= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_agenttopic".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-agenttopic--;{}".format(agenttopic)])
         step9bagenttopic=agenttopic

         localmodelsfolder= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_localmodelsfolder".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-localmodelsfolder--;{}".format(localmodelsfolder)])
         step9blocalmodelsfolder=localmodelsfolder

         concurrency= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_concurrency".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-concurrency--;{}".format(concurrency[1:])])
         step9bconcurrency=concurrency[1:]

         cuda= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_cuda".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-cuda--;{}".format(cuda[1:])])
         step9bCUDA_VISIBLE_DEVICES=cuda[1:]

         contextwindow= context['ti'].xcom_pull(task_ids='step_9b_solution_task_agenticai',key="{}_contextwindow".format(sname))
         doparse("/{}/docs/source/details.rst".format(sname), ["--agenticai-contextwindow--;{}".format(contextwindow[1:])])
         step9bcontextwindow=contextwindow[1:]

         doparse("/{}/docs/source/kube.rst".format(sname), ["--ollamacontainername--;{}".format(ollamacontainername)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-kubeconcur--;{}".format(concurrency[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-kubecollection--;{}".format(collection)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-kubetemperature--;{}".format(temp[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-rollbackoffset--;{}".format(rollback[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-ollama-model--;{}".format(ollama)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-deletevectordbcount--;{}".format(deletevector[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-vectordbpath--;{}".format(vectordbpath)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-topicid--;{}".format(topicid[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-enabletls--;{}".format(enabletls[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-partition--;{}".format(partition[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-vectordbcollectionname--;{}".format(collection)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-ollamacontainername--;{}".format(ollamacontainername)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-mainip--;{}".format(mainip)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-mainport--;{}".format(mainport[1:])])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-contextwindow--;{}".format(contextwindow[1:])])

         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agenttopic--;{}".format(agenttopic)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-localmodelsfolder--;{}".format(localmodelsfolder)])

         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-embedding--;{}".format(embedding)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agents_topic_prompt--;{}".format(agents_topic_prompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-teamlead_topic--;{}".format(teamlead_topic)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-teamleadprompt--;{}".format(teamleadprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",",") )])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-supervisor_topic--;{}".format(supervisor_topic)])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-supervisorprompt--;{}".format(supervisorprompt.strip().replace('\n','').replace("\\n","").replace("'","").replace(";",","))])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agenttoolfunctions--;{}".format(agenttoolfunctions.strip().replace('\n','').replace("\\n","").replace("'","").replace(";","=="))])
         doparse("/{}/docs/source/kube.rst".format(sname), ["--agenticai-agent_team_supervisor_topic--;{}".format(agent_team_supervisor_topic)])

       ebuf=""
       if 'dockerenv' in default_args:
        if default_args['dockerenv'] != '':
          buf=default_args['dockerenv']
          darr = buf.split("***")
          ebuf="\n"
          for d in darr:
             v=d.split("=")
             if len(v)>1:
               if 'jsoncriteria' in v[0].strip():
                 d=d[d.index("=")+1:]
                 # End each line with a bare backslash: a space after it would break shell line continuation
                 ebuf = ebuf + '          --env ' + v[0].strip() + '=\"' + d + '\" \\\n'
               else:
                 ebuf = ebuf + '          --env ' + v[0].strip() + '=\"' + v[1].strip() + '\" \\\n'
             else:
               ebuf = ebuf + '          --env ' + v[0].strip() + '=' + ' \\\n'
          ebuf = ebuf[:-1]
        if default_args['dockerinstructions'] != '':
          doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerinstructions--;{}".format(default_args['dockerinstructions'])])
        else:
          doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerinstructions--;{}".format("Please ask the developer of this solution.")])

       if len(CLIENTPORT) > 1:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--clientport--;{}".format(TMLCLIENTPORT[1:])])
         dockerrun = """docker run -d --net=host -p {}:{} -p {}:{} -p {}:{} -p {}:{} \\
             --env TSS=0 \\
             --env SOLUTIONNAME={} \\
             --env SOLUTIONDAG={} \\
             --env GITUSERNAME=<Enter Github Username> \\
             --env GITPASSWORD='<Enter Github Password>' \\
             --env GITREPOURL=<Enter Github Repo URL> \\
             --env SOLUTIONEXTERNALPORT={} \\
             -v /var/run/docker.sock:/var/run/docker.sock:z  \\
             -v /your_localmachine/foldername:/rawdata:z \\
             --env CHIP={} \\
             --env SOLUTIONAIRFLOWPORT={}  \\
             --env SOLUTIONVIPERVIZPORT={} \\
             --env DOCKERUSERNAME='' \\
             --env CLIENTPORT={}  \\
             --env EXTERNALPORT={} \\
             --env KAFKABROKERHOST=127.0.0.1:9092 \\
             --env KAFKACLOUDUSERNAME='<Enter API key>' \\
             --env KAFKACLOUDPASSWORD='<Enter API secret>' \\
             --env SASLMECHANISM=PLAIN \\
             --env VIPERVIZPORT={} \\
             --env MQTTUSERNAME='' \\
             --env MQTTPASSWORD='' \\
             --env AIRFLOWPORT={}  \\
             --env READTHEDOCS='<Enter Readthedocs token>' \\{}
             {}""".format(solutionexternalport[1:],solutionexternalport[1:],
                             solutionairflowport[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionvipervizport[1:],
                             TMLCLIENTPORT[1:],TMLCLIENTPORT[1:],sname,sd,
                             solutionexternalport[1:],chipmain,
                             solutionairflowport[1:],solutionvipervizport[1:],TMLCLIENTPORT[1:],
                             externalport[1:],vipervizport[1:],airflowport[1:],ebuf,containername)
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--clientport--;Not Applicable"])
         dockerrun = """docker run -d --net=host -p {}:{} -p {}:{} -p {}:{} \\
             --env TSS=0 \\
             --env SOLUTIONNAME={} \\
             --env SOLUTIONDAG={} \\
             --env GITUSERNAME=<Enter Github Username> \\
             --env GITPASSWORD='<Enter Github Password>' \\
             --env GITREPOURL=<Enter Github Repo URL> \\
             --env SOLUTIONEXTERNALPORT={} \\
             -v /var/run/docker.sock:/var/run/docker.sock:z \\
             -v /your_localmachine/foldername:/rawdata:z \\
             --env CHIP={} \\
             --env SOLUTIONAIRFLOWPORT={} \\
             --env SOLUTIONVIPERVIZPORT={} \\
             --env DOCKERUSERNAME='' \\
             --env EXTERNALPORT={} \\
             --env KAFKABROKERHOST=127.0.0.1:9092 \\
             --env KAFKACLOUDUSERNAME='<Enter API key>' \\
             --env KAFKACLOUDPASSWORD='<Enter API secret>' \\
             --env SASLMECHANISM=PLAIN \\
             --env VIPERVIZPORT={} \\
             --env MQTTUSERNAME='' \\
             --env MQTTPASSWORD='' \\
             --env AIRFLOWPORT={} \\
             --env READTHEDOCS='<Enter Readthedocs token>' \\{}
             {}""".format(solutionexternalport[1:],solutionexternalport[1:],
                             solutionairflowport[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionvipervizport[1:],
                             sname,sd,solutionexternalport[1:],chipmain,
                             solutionairflowport[1:],solutionvipervizport[1:],
                             externalport[1:],vipervizport[1:],airflowport[1:],ebuf,containername)

      # dockerrun = re.escape(dockerrun)
       v=subprocess.call(["sed", "-i", "-e",  "s/--dockerrun--/{}/g".format(dockerrun), "/{}/docs/source/operating.rst".format(sname)])

       if istss1==1:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerrun--;{}".format(dockerrun),"--dockercontainer--;{} ({})".format(containername, hurl)])
         doparse("/{}/docs/source/details.rst".format(sname), ["--dockerrun--;{}".format(dockerrun),"--dockercontainer--;{} ({})".format(containername, hurl)])
       else:
         try:
           with open("/tmux/step1solutionold.txt", "r") as f:
             msname=f.read()
             mbuf="Refer to the original solution container and documentation here: https://{}.readthedocs.io/en/latest/operating.html".format(msname.strip())
             doparse("/{}/docs/source/operating.rst".format(sname), ["--dockerrun--;{}".format(dockerrun),"--dockercontainer--;{}".format(mbuf)])
         except Exception as e:
           pass

       step9rollbackoffset=-1
       step9llmmodel=''
       step9embedding=''
       step9vectorsize=''
       if pgptcontainername is not None:
           # The run command differs only in the TSS flag, which mirrors the TSS environment variable
           tssflag = 1 if os.environ['TSS'] == "1" else 0
           privategptrun = "docker run -d -p {}:{} --net=host --gpus all -v /var/run/docker.sock:/var/run/docker.sock:z --env PORT={} --env TSS={} --env GPU=1 --env COLLECTION={} --env WEB_CONCURRENCY={} --env CUDA_VISIBLE_DEVICES={} --env TOKENIZERS_PARALLELISM=false --env temperature={} --env vectorsearchtype=\"{}\" --env contextwindowsize={} --env vectordimension={} {}".format(pgptport[1:],pgptport[1:],pgptport[1:],tssflag,pcollection,pconcurrency[1:],pcuda[1:],ptemperature[1:], pvectorsearchtype, pcontextwindowsize[1:], pvectordimension[1:],pgptcontainername)

           step9llmmodel='Refer to: https://tml.readthedocs.io/en/latest/genai.html'
           step9embedding='Refer to: https://tml.readthedocs.io/en/latest/genai.html'
           step9vectorsize='Refer to: https://tml.readthedocs.io/en/latest/genai.html'

           doparse("/{}/docs/source/details.rst".format(sname), ["--llmmodel--;{}".format(step9llmmodel)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--embedding--;{}".format(step9embedding)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--vectorsize--;{}".format(step9vectorsize)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--pgptcontainername--;{}".format(pgptcontainername),"--privategptrun--;{}".format(privategptrun)])

           qdrantcontainer = "qdrant/qdrant"
           qdrantrun = "docker run -d -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant"
           doparse("/{}/docs/source/details.rst".format(sname), ["--qdrantcontainer--;{}".format(qdrantcontainer),"--qdrantrun--;{}".format(qdrantrun)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--consumefrom--;{}".format(pconsumefrom)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--pgpt_data_topic--;{}".format(pgpt_data_topic)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--vectordbcollectionname--;{}".format(pcollection)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--offset--;{}".format(poffset[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--rollbackoffset--;{}".format(prollbackoffset[1:])])
           step9rollbackoffset=prollbackoffset[1:]
           doparse("/{}/docs/source/details.rst".format(sname), ["--topicid--;{}".format(ptopicid[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--enabletls--;{}".format(penabletls[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--partition--;{}".format(ppartition[1:])])
           pprompt=pprompt.replace("\\n"," ")
           doparse("/{}/docs/source/details.rst".format(sname), ["--prompt--;{}".format(pprompt)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--context--;{}".format(pcontext)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--jsonkeytogather--;{}".format(pjsonkeytogather)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--keyattribute--;{}".format(pkeyattribute)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--concurrency--;{}".format(pconcurrency[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--cuda--;{}".format(pcuda[1:])])
           if kube == 1:
               doparse("/{}/docs/source/details.rst".format(sname), ["--pgpthost--;{}".format('privategpt-service')])
           else:
               doparse("/{}/docs/source/details.rst".format(sname), ["--pgpthost--;{}".format(pgpthost)])

           doparse("/{}/docs/source/details.rst".format(sname), ["--pgptport--;{}".format(pgptport[1:])])
           doparse("/{}/docs/source/details.rst".format(sname), ["--keyprocesstype--;{}".format(pprocesstype)])
           doparse("/{}/docs/source/details.rst".format(sname), ["--hyperbatch--;{}".format(hyperbatch[1:])])

       snamerp=sname.replace("_","-")
       rbuf = "https://{}.readthedocs.io".format(snamerp)
       doparse("/{}/docs/source/details.rst".format(sname), ["--readthedocs--;{}".format(rbuf)])

       ############# VIZ URLS

       # Raw string: the slashes and ampersands stay backslash-escaped for the sed replacement below
       vizurl = r"http:\/\/localhost:{}\/{}?topic={}\&offset={}\&groupid=\&rollbackoffset={}\&topictype=prediction\&append={}\&secure={}".format(solutionvipervizport[1:],dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       vizurlkube = "http://localhost:{}/{}?topic={}&offset={}&groupid=&rollbackoffset={}&topictype=prediction&append={}&secure={}".format(solutionvipervizport[1:],dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       if 'gRPC' in PRODUCETYPE:
         vizurlkubeing = "http://tml.tss2/viz/{}?topic={}&offset={}&groupid=&rollbackoffset={}&topictype=prediction&append={}&secure={}".format(dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       else:
         vizurlkubeing = "http://tml.tss/viz/{}?topic={}&offset={}&groupid=&rollbackoffset={}&topictype=prediction&append={}&secure={}".format(dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])

       if istss1==0:
         subprocess.call(["sed", "-i", "-e",  "s/--visualizationurl--/{}/g".format(vizurl), "/{}/docs/source/operating.rst".format(sname)])
       else:
         subprocess.call(["sed", "-i", "-e",  "s/--visualizationurl--/{}/g".format("This will appear AFTER you run Your Solution Docker Container"), "/{}/docs/source/operating.rst".format(sname)])

       tssvizurl = r"http:\/\/localhost:{}\/{}?topic={}\&offset={}\&groupid=\&rollbackoffset={}\&topictype=prediction\&append={}\&secure={}".format(vipervizport[1:],dashboardhtml,topic,offset[1:],rollbackoffset[1:],append[1:],secure[1:])
       subprocess.call(["sed", "-i", "-e",  "s/--tssvisualizationurl--/{}/g".format(tssvizurl), "/{}/docs/source/operating.rst".format(sname)])

       tsslogfile = r"http:\/\/localhost:{}\/viperlogs.html?topic=viperlogs\&append=0".format(vipervizport[1:])
       subprocess.call(["sed", "-i", "-e",  "s/--tsslogfile--/{}/g".format(tsslogfile), "/{}/docs/source/operating.rst".format(sname)])

       solutionlogfile = r"http:\/\/localhost:{}\/viperlogs.html?topic=viperlogs\&append=0".format(solutionvipervizport[1:])
       if istss1==0:
         subprocess.call(["sed", "-i", "-e",  "s/--solutionlogfile--/{}/g".format(solutionlogfile), "/{}/docs/source/operating.rst".format(sname)])
       else:
         subprocess.call(["sed", "-i", "-e",  "s/--solutionlogfile--/{}/g".format("This will appear AFTER you run Your Solution Docker Container"), "/{}/docs/source/operating.rst".format(sname)])

       githublogs = r"https:\/\/github.com\/{}\/{}\/blob\/main\/tml-airflow\/logs\/logs.txt".format(os.environ['GITUSERNAME'],repo)
       subprocess.call(["sed", "-i", "-e",  "s/--githublogs--/{}/g".format(githublogs), "/{}/docs/source/operating.rst".format(sname)])
       #-----------------------
       subprocess.call(["sed", "-i", "-e",  "s/--githublogs--/{}/g".format(githublogs), "/{}/docs/source/logs.rst".format(sname)])
       tsslogging.locallogs("INFO", "STEP 10: Documentation successfully built on GitHub. Readthedocs build in progress and should complete in a few seconds")
       try:
          sf = ""
          with open('/dagslocalbackup/logs.txt', "r") as f:
               sf=f.read()
          doparse("/{}/docs/source/logs.rst".format(sname), ["--logs--;{}".format(sf)])
       except Exception as e:
         print("Cannot open file - ",e)
         pass

       #-------------------
       airflowurl = r"http:\/\/localhost:{}".format(airflowport[1:])
       subprocess.call(["sed", "-i", "-e",  "s/--airflowurl--/{}/g".format(airflowurl), "/{}/docs/source/operating.rst".format(sname)])

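       # Use the hyphenated project name (snamerp), matching the Readthedocs URL written to details.rst above
       readthedocs = r"https:\/\/{}.readthedocs.io".format(snamerp)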
       subprocess.call(["sed", "-i", "-e",  "s/--readthedocs--/{}/g".format(readthedocs), "/{}/docs/source/operating.rst".format(sname)])

       triggername = sd
       print("triggername=",triggername)
       doparse("/{}/docs/source/operating.rst".format(sname), ["--triggername--;{}".format(sd)])
       doparse("/{}/docs/source/operating.rst".format(sname), ["--airflowport--;{}".format(airflowport[1:])])
       doparse("/{}/docs/source/operating.rst".format(sname), ["--vipervizport--;{}".format(vipervizport[1:])])
       if istss1==0:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionvipervizport--;{}".format(solutionvipervizport[1:])])
       else:
         doparse("/{}/docs/source/operating.rst".format(sname), ["--solutionvipervizport--;{}".format("TBD")])

       # Raw strings keep the literal \-\- sequences without relying on invalid escape sequences
       tssdockerrun = (r"docker run -d \-\-net=host \-\-env AIRFLOWPORT={} "
                       " -v <change to your local folder>:/dagslocalbackup:z "
                       " -v /var/run/docker.sock:/var/run/docker.sock:z "
                       " -v /your_localmachine/foldername:/rawdata:z "
                       r" \-\-env GITREPOURL={} "
                       r" \-\-env CHIP={} \-\-env TSS=1 \-\-env SOLUTIONNAME=TSS "
                       r" \-\-env EXTERNALPORT={} "
                       r" \-\-env VIPERVIZPORT={} "
                       r" \-\-env GITUSERNAME='{}' "
                       r" \-\-env DOCKERUSERNAME='{}' "
                       r" \-\-env MQTTUSERNAME='{}' "
                       r" \-\-env KAFKACLOUDUSERNAME='{}' "
                       r" \-\-env KAFKACLOUDPASSWORD='<Enter your API secret>' "
                       r" \-\-env READTHEDOCS='<Enter your readthedocs token>' "
                       r" \-\-env GITPASSWORD='<Enter personal access token>' "
                       r" \-\-env DOCKERPASSWORD='<Enter your docker hub password>' "
                       r" \-\-env MQTTPASSWORD='<Enter your mqtt password>' "
                       r" \-\-env UPDATE=1 "
                       " maadsdocker/tml-solution-studio-with-airflow-{}".format(airflowport[1:],os.environ['GITREPOURL'],
                               chip,externalport[1:],vipervizport[1:],
                               os.environ['GITUSERNAME'],os.environ['DOCKERUSERNAME'],mqttusername,kafkacloudusername,chip))

       doparse("/{}/docs/source/operating.rst".format(sname), ["--tssdockerrun--;{}".format(tssdockerrun)])

       producinghost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPRODUCE".format(sname))
       producingport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_SOLUTIONEXTERNALPORT".format(sname))
       preprocesshost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS".format(sname))
       preprocessport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS".format(sname))
       preprocesshost2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESS2".format(sname))
       preprocessport2 = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESS2".format(sname))

       preprocesshostpgpt = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREPROCESSPGPT".format(sname))
       preprocessportpgpt = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREPROCESSPGPT".format(sname))

       mlhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTML".format(sname))
       mlport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTML".format(sname))
       predictionhost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERHOSTPREDICT".format(sname))
       predictionport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_VIPERPORTPREDICT".format(sname))

       hpdehost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOST".format(sname))
       hpdeport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORT".format(sname))

       hpdepredicthost = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEHOSTPREDICT".format(sname))
       hpdepredictport = context['ti'].xcom_pull(task_ids='step_1_solution_task_getparams',key="{}_HPDEPORTPREDICT".format(sname))

       tmlbinaries = ("VIPERHOST_PRODUCE={}, VIPERPORT_PRODUCE={}, "
                          "VIPERHOST_PREPROCESS={}, VIPERPORT_PREPROCESS={}, "
                          "VIPERHOST_PREPROCESS2={}, VIPERPORT_PREPROCESS2={}, "
                          "VIPERHOST_PREPROCESS_PGPT={}, VIPERPORT_PREPROCESS_PGPT={}, "
                          "VIPERHOST_ML={}, VIPERPORT_ML={}, "
                          "VIPERHOST_PREDICT={}, VIPERPORT_PREDICT={}, "
                          "HPDEHOST={}, HPDEPORT={}, "
                          "HPDEHOST_PREDICT={}, HPDEPORT_PREDICT={}".format(producinghost,producingport[1:],preprocesshost,preprocessport[1:],
                                                                               preprocesshost2,preprocessport2[1:],
                                                                                preprocesshostpgpt,preprocessportpgpt[1:],
                                                                                 mlhost,mlport[1:],predictionhost,predictionport[1:],
                                                                                 hpdehost,hpdeport[1:],hpdepredicthost,hpdepredictport[1:] ))


       subprocess.call(["sed", "-i", "-e",  "s/--tmlbinaries--/{}/g".format(tmlbinaries), "/{}/docs/source/operating.rst".format(sname)])
       ########################## Kubernetes

       doparse("/{}/docs/source/kube.rst".format(sname), ["--solutionnamefile--;{}.yml".format(sname)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--solutionname--;{}".format(sname)])
       # Build the manifest list once; the file order matches the original per-case commands
       kubefiles = []
       if '127.0.0.1' in brokerhost:
         kubefiles.append("kafka.yml")  # a local broker needs the bundled Kafka manifest
       kubefiles += ["secrets.yml", "mysql-storage.yml", "mysql-db-deployment.yml"]
       if pgptcontainername is not None:
         kubefiles += ["qdrant.yml", "privategpt.yml"]
         if ollama is not None:
           kubefiles.append("ollama.yml")
         kubefiles.append("{}.yml".format(sname))
       else:
         kubefiles.append("{}.yml".format(sname))
         if ollama is not None:
           kubefiles.append("ollama.yml")

       kcmd = "kubectl apply " + " ".join("-f {}".format(kf) for kf in kubefiles)
       doparse("/{}/docs/source/kube.rst".format(sname), ["--kubectl--;{}".format(kcmd)])


       if maxrows4:
         step4maxrows=maxrows4[1:]
       else:
         step4maxrows=-1

       if maxrows4b:
         step4bmaxrows=maxrows4b[1:]
       else:
         step4bmaxrows=-1

       if maxrows4c:
         step4cmaxrows=maxrows4c[1:]
       else:
         step4cmaxrows=-1

       if rollbackoffsets:
         step5rollbackoffsets=rollbackoffsets[1:]
       else:
         step5rollbackoffsets=-1

       if maxrows:
         step6maxrows=maxrows[1:]
       else:
         step6maxrows=-1

       kubebroker='kafka-service:9092'
       if 'KUBEBROKERHOST' in os.environ:
          kubebroker = os.environ['KUBEBROKERHOST']
       kafkabroker='127.0.0.1:9092'
       if 'KAFKABROKERHOST' in os.environ:
          kafkabroker = os.environ['KAFKABROKERHOST']

       step1solutiontitle=stitle
       step1description=sdesc
       try:
         with open("/tmux/cname.txt", "r") as f:
           containername=f.read()
       except Exception as e:
           pass

   #    step9bagenttoolfunctions=""
       step9bagents_topic_prompt=step9bagents_topic_prompt.replace("\\n","").replace('\n','').strip().replace(";","==").replace("'","")
       if len(CLIENTPORT) > 1:
         kcmd2=tsslogging.genkubeyaml(sname,containername,TMLCLIENTPORT[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionexternalport[1:],
                          sd,os.environ['GITUSERNAME'],os.environ['GITREPOURL'],chipmain,os.environ['DOCKERUSERNAME'],
                          externalport[1:],kafkacloudusername,mqttusername,airflowport[1:],vipervizport[1:],
                          step4maxrows,step4bmaxrows,step5rollbackoffsets,step6maxrows,step1solutiontitle,step1description,
                          step9rollbackoffset,kubebroker,kafkabroker,PRODUCETYPE,step9prompt,step9context,step9keyattribute,step9keyprocesstype,
                          step9hyperbatch[1:],step9vectordbcollectionname,step9concurrency[1:],cudavisibledevices[1:],
                          step9docfolder,step9docfolderingestinterval[1:],step9useidentifierinprompt[1:],step5processlogic,
                          step5independentvariables,step9searchterms,step9streamall[1:],step9temperature[1:],step9vectorsearchtype,
                          step9llmmodel,step9embedding,step9vectorsize,step4cmaxrows,step4crawdatatopic,step4csearchterms,step4crememberpastwindows[1:],
                          step4cpatternwindowthreshold[1:],step4crtmsstream,projectname,step4crtmsscorethreshold[1:],step4cattackscorethreshold[1:],
                          step4cpatternscorethreshold[1:],step4clocalsearchtermfolder,step4clocalsearchtermfolderinterval[1:],step4crtmsfoldername,
                          step3localfileinputfile,step3localfiledocfolder,step4crtmsmaxwindows[1:],step9pcontextwindowsize[1:],
                          step9pgptcontainername,step9pgpthost,step9pgptport[1:],step9vectordimension[1:],
                          step2raw_data_topic,step2preprocess_data_topic,step4raw_data_topic,step4preprocesstypes,
                          step4jsoncriteria,step4ajsoncriteria,step4amaxrows[1:],step4apreprocesstypes,step4araw_data_topic,
                          step4apreprocess_data_topic,step4bpreprocesstypes,step4bjsoncriteria,step4braw_data_topic,
                          step4bpreprocess_data_topic,step4preprocess_data_topic,
                          step9brollback,
                          step9bdeletevectordbcount,
                          step9bvectordbpath,
                          step9btemperature,
                          step9bvectordbcollectionname,
                          step9bollamacontainername,
                          step9bCUDA_VISIBLE_DEVICES,
                          step9bmainip,
                          step9bmainport,
                          step9bembedding,
                          step9bagents_topic_prompt,
                          step9bteamlead_topic,
                          step9bteamleadprompt,
                          step9bsupervisor_topic,
                          step9bagenttoolfunctions,
                          step9bagent_team_supervisor_topic,step9bcontextwindow,step9blocalmodelsfolder, step9bagenttopic)
       else:
         kcmd2=tsslogging.genkubeyamlnoext(sname,containername,TMLCLIENTPORT[1:],solutionairflowport[1:],solutionvipervizport[1:],solutionexternalport[1:],
                          sd,os.environ['GITUSERNAME'],os.environ['GITREPOURL'],chipmain,os.environ['DOCKERUSERNAME'],
                          externalport[1:],kafkacloudusername,mqttusername,airflowport[1:],vipervizport[1:],
                          step4maxrows,step4bmaxrows,step5rollbackoffsets,step6maxrows,step1solutiontitle,step1description,step9rollbackoffset,
                          kubebroker,kafkabroker,step9prompt,step9context,step9keyattribute,step9keyprocesstype,
                          step9hyperbatch[1:],step9vectordbcollectionname,step9concurrency[1:],cudavisibledevices[1:],
                          step9docfolder,step9docfolderingestinterval[1:],step9useidentifierinprompt[1:],step5processlogic,
                          step5independentvariables,step9searchterms,step9streamall[1:],step9temperature[1:],step9vectorsearchtype,
                          step9llmmodel,step9embedding,step9vectorsize,step4cmaxrows,step4crawdatatopic,step4csearchterms,step4crememberpastwindows[1:],
                          step4cpatternwindowthreshold[1:],step4crtmsstream,projectname,step4crtmsscorethreshold[1:],step4cattackscorethreshold[1:],
                          step4cpatternscorethreshold[1:],step4clocalsearchtermfolder,step4clocalsearchtermfolderinterval[1:],step4crtmsfoldername,
                          step3localfileinputfile,step3localfiledocfolder,step4crtmsmaxwindows[1:],step9pcontextwindowsize[1:],
                          step9pgptcontainername,step9pgpthost,step9pgptport[1:],step9vectordimension[1:],
                          step2raw_data_topic,step2preprocess_data_topic,step4raw_data_topic,step4preprocesstypes,
                          step4jsoncriteria,step4ajsoncriteria,step4amaxrows[1:],step4apreprocesstypes,step4araw_data_topic,
                          step4apreprocess_data_topic,step4bpreprocesstypes,step4bjsoncriteria,step4braw_data_topic,
                          step4bpreprocess_data_topic,step4preprocess_data_topic,
                          step9brollback,
                          step9bdeletevectordbcount,
                          step9bvectordbpath,
                          step9btemperature,
                          step9bvectordbcollectionname,
                          step9bollamacontainername,
                          step9bCUDA_VISIBLE_DEVICES,
                          step9bmainip,
                          step9bmainport,
                          step9bembedding,
                          step9bagents_topic_prompt,
                          step9bteamlead_topic,
                          step9bteamleadprompt,
                          step9bsupervisor_topic,
                          step9bagenttoolfunctions,
                          step9bagent_team_supervisor_topic,step9bcontextwindow,step9blocalmodelsfolder, step9bagenttopic)

       doparse("/{}/docs/source/kube.rst".format(sname), ["--solutionnamecode--;{}".format(kcmd2)])

       kpfwd="kubectl port-forward deployment/{} {}:{}".format(sname,solutionvipervizport[1:],solutionvipervizport[1:])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--kube-portforward--;{}".format(kpfwd)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--visualizationurl--;{}".format(vizurlkube)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--visualizationurling--;{}".format(vizurlkubeing)])
       doparse("/{}/docs/source/kube.rst".format(sname), ["--nginxname--;{}".format(sname)])

       if len(CLIENTPORT) > 1:
         if 'gRPC' in PRODUCETYPE:
           kcmd3=tsslogging.ingressgrpc(sname)
         else:
           kcmd3=tsslogging.ingress(sname)
       else:   # localfile being processed
         kcmd3=tsslogging.ingressnoext(sname)

       doparse("/{}/docs/source/kube.rst".format(sname), ["--ingress--;{}".format(kcmd3)])

       ###########################
       try:
         tmuxwindows = "None"
         with open("/tmux/pythonwindows_{}.txt".format(sname), 'r', encoding='utf-8') as file:
           data = file.readlines()
           data.append("viper-produce")
           data.append("viper-preprocess")
           data.append("viper-preprocess-pgpt")
           data.append("viper-preprocess-agenticai")
           data.append("viper-ml")
           data.append("viper-predict")
           tmuxwindows = ", ".join(data)
           tmuxwindows = tmuxwindows.replace("\n","")
           print("tmuxwindows=",tmuxwindows)
       except Exception as e:
          pass

       doparse("/{}/docs/source/operating.rst".format(sname), ["--tmuxwindows--;{}".format(tmuxwindows)])
       #try:
       if os.environ['TSS'] == "1":
         doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TSS Development Environment Container"])
       else:
          if "KUBE" not in os.environ:
            doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TML Solution Container"])
          else:
            if os.environ["KUBE"] == "0":
              doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TML Solution Container"])
            else:
              doparse("/{}/docs/source/operating.rst".format(sname), ["--tssgen--;TML Solution Container (RUNNING IN KUBERNETES)"])

       # Kick off shell script
       #tsslogging.git_push("/{}".format(sname),"For solution details GOTO: https://{}.readthedocs.io".format(sname),sname)


       rtd = context['ti'].xcom_pull(task_ids='step_10_solution_task_document',key="{}_RTD".format(sname))
        #try:
       sp=f"{sname}/docs/source"
       orepo=tsslogging.getrepo()
       op=f"/{orepo}/tml-airflow/dags/tml-solutions/{projectname}"
       files,opath=tsslogging.dorst2pdf(sp,op)
       tsslogging.mergepdf(opath,files,f"{sname}")

       gb="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/pdf_documentation/{}.pdf".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,sname)
       print("INFO: Your PDF Documentation will be found here: {}".format(gb))

       # gityml
       gityml="https://github.com/{}/{}/tree/main/tml-airflow/dags/tml-solutions/{}/ymls/{}".format(os.environ['GITUSERNAME'],tsslogging.getrepo(),projectname,sname)
       doparse("/{}/docs/source/kube.rst".format(sname), ["--gityml--;{}".format(gityml)])

       oppt=copyymls(projectname,sname,kcmd3,kcmd2)
       updateollamaandpgpt(oppt,step9bollamacontainername,step9bconcurrency,step9bvectordbcollectionname,step9btemperature,step9brollback,step9bollama,step9bdeletevectordbcount,step9bvectordbpath,step9btopicid,step9benabletls,step9bpartition,step9bmainip,
                          step9bmainport,step9bembedding,step9bagents_topic_prompt,step9bteamlead_topic,step9bteamleadprompt,step9bsupervisor_topic,step9bsupervisorprompt,step9bagenttoolfunctions,step9bagent_team_supervisor_topic,step9bcontextwindow,
                          pvectorsearchtype,ptemperature,pcollection,pconcurrency,pvectordimension,pcontextwindowsize,pmainmodel,pmainembedding,pgptcontainername)

       subprocess.call("/tmux/gitp.sh {} 'For solution details GOTO: https://{}.readthedocs.io'".format(sname,snamertd), shell=True)

        #except Exception as e:
         # print("Error=",e)
       try:
        if rtd == None:
           URL = 'https://readthedocs.org/api/v3/projects/'
           TOKEN = os.environ['READTHEDOCS']
           HEADERS = {'Authorization': f'token {TOKEN}'}
           data={
               "name": "{}".format(sname),
               "repository": {
                   "url": "https://github.com/{}/{}".format(os.environ['GITUSERNAME'],sname),
                   "type": "git"
               },
               "homepage": "http://template.readthedocs.io/",
               "programming_language": "py",
               "language": "en",
               "privacy_level": "public",
               "external_builds_privacy_level": "public",
               "tags": [
                   "automation",
                   "sphinx"
               ]
           }
           response = requests.post(
               URL,
               json=data,
               headers=HEADERS,
           )
           print(response.json())
           tsslogging.tsslogit(response.json())
           os.environ['tssdoc']="1"
        time.sleep(10)
        updatebranch(sname,"main")
        triggerbuild(sname)
        ti = context['task_instance']
        ti.xcom_push(key="{}_RTD".format(sname), value="DONE")
        print("INFO: Your Documentation will be found here: https://{}.readthedocs.io/en/latest".format(snamertd))
       except Exception as e:
        print("ERROR=",e)
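The four-way branch in the listing above that builds the `kubectl apply` command follows one rule: a local broker adds `kafka.yml`, PrivateGPT adds `qdrant.yml` and `privategpt.yml`, and Ollama adds `ollama.yml`. The sketch below restates that rule in compact form. It is an illustration only, not the TSS implementation; `build_kubectl_apply` and its arguments are hypothetical names, and it always lists the solution manifest last, whereas the listing places it before `ollama.yml` in the Ollama-only case.

```python
def build_kubectl_apply(sname, brokerhost, use_pgpt=False, use_ollama=False):
    """Return a `kubectl apply` command listing the manifests a solution needs."""
    manifests = []
    # A local broker (127.0.0.1) means Kafka is deployed alongside the solution.
    if "127.0.0.1" in brokerhost:
        manifests.append("kafka.yml")
    # These manifests appear in every branch of the original listing.
    manifests += ["secrets.yml", "mysql-storage.yml", "mysql-db-deployment.yml"]
    if use_pgpt:
        manifests += ["qdrant.yml", "privategpt.yml"]
    if use_ollama:
        manifests.append("ollama.yml")
    # Assumption: the solution manifest is always applied last.
    manifests.append(f"{sname}.yml")
    return "kubectl apply " + " ".join(f"-f {m}" for m in manifests)


cmd_local = build_kubectl_apply("mysolution", "127.0.0.1", use_pgpt=True)
cmd_cloud = build_kubectl_apply("mysolution", "broker.example.com")
print(cmd_local)
print(cmd_cloud)
```

Collecting the manifests in a list keeps each optional component to a single `if`, so adding another component does not double the number of branches.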

Json Key: Explanation

conf_project: This is the project name that will be used in Readthedocs documentation.

conf_copyright: This is the copyright information that will be used in Readthedocs documentation.

conf_author: This is the author name that will be used in Readthedocs documentation.

conf_release: This is the release number for your Readthedocs documentation.

conf_version: This is the version number that will be used in Readthedocs documentation.

dockerenv: Ideally, TML solution containers run in Kubernetes. But if you or other users run this container with Docker, you can specify the Docker environment variables that can be modified at runtime. The format must be variable1=value1***variable2=value2***…; use THREE (3) stars to separate variable=value pairs.

dockerinstructions: You can specify instructions for users on how to run your container.
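To make the `dockerenv` format concrete, here is a minimal sketch that splits such a string into name/value pairs. The function name `parse_dockerenv` is hypothetical and this is not how TSS itself reads the value; it only illustrates the three-star separator rule described above.

```python
def parse_dockerenv(dockerenv):
    """Split 'variable1=value1***variable2=value2' into a dict."""
    pairs = {}
    for item in dockerenv.split("***"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, so a value may itself contain '='.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in dockerenv entry: {item!r}")
        pairs[name.strip()] = value.strip()
    return pairs


env = parse_dockerenv(
    "step4cmaxrows=100***step4crawdatatopic=iot-preprocess***step4crtmsscorethreshold=0.6"
)
print(env)
```

Because the separator is three stars, values can safely contain commas, pipes and single stars, which the search-term settings in the next section rely on.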

7.18. Example Of Setting Docker Instructions in Step 10

default_args = {
 'conf_project' : 'Transactional Machine Learning (TML)',
 'conf_copyright' : '2024, Otics Advanced Analytics, Incorporated - For Support email support@otics.ca',
 'conf_author' : 'Sebastian Maurice',
 'conf_release' : '0.1',
 'conf_version' : '0.1.0',
 'dockerenv': 'step4cmaxrows=100***step4crawdatatopic=iot-preprocess***step4csearchterms=rgx:p([a-z]+)ch ~~~ |authentication failure,--entity-- password failure ***\
 step4crememberpastwindows=500***step4cpatternwindowthreshold=30***step4crtmsscorethreshold=0.6***step4cattackscorethreshold=0.6***\
 step4cpatternscorethreshold=0.6***step4crtmsstream=rtms-stream-mylogs***step4clocalsearchtermfolder=|mysearchfile1,|mysearchfile2***\
 step4clocalsearchtermfolderinterval=60***step4crtmsfoldername=rtms2***step3localfiledocfolder=mylogs,mylogs2***step4crtmsmaxwindows=1000000', # add any environmental variables for docker must be: variable1=value1***variable2=value2
 'dockerinstructions': """To run this docker container Enter the following CORE parameters:

      1. KAFKABROKERHOST=127.0.0.1:9092 - this uses the Local Kafka installed in your TML solution container.
         You can specify a Kafka Cloud URL if using AWS MSK or Confluent Kafka Cloud, simply replace this field.

      2. Enter KAFKACLOUDUSERNAME and  KAFKACLOUDPASSWORD IF using Kafka Cloud from AWS MSK
         and Confluent, if using local kafka (127.0.0.1:9092), these MUST be empty.

      3. SASLMECHANISM=PLAIN is set for Local Kafka and Confluent Kafka Cloud.
         If using AWS MSK, this MUST be changed to SCRAM512.

      4. Enter GITUSERNAME

      5. Enter GITPASSWORD

      6. Enter READTHEDOCS

      7. Update volume mapping: /your_localmachine/foldername:/rawdata:z

      8. IF YOU ARE DISTRIBUTING THIS CONTAINER TO OTHERS THEN SEND THEM THIS DOCKER RUN BUT THEY WILL NEED TO ENTER THE ABOVE CORE PARAMETERS.
         TO MAKE IT EASY FOR OTHERS TO RUN YOUR SOLUTION YOU CAN USE THE TSSTMLDEMO GITHUB AND READTHEDOCS ACCOUNT - UPDATE THE FOLLOWING:

      9.  GITUSERNAME=tsstmldemo

      10. GITREPOURL=https://github.com/tsstmldemo/tsstmldemo

      11. GITPASSWORD=<Will be retrieved from OS IF using tsstmldemo>

      12. READTHEDOCS=aefa71df39ad764ac2785b3167b77e8c1d7c553a

      13. step4cmaxrows=100 this means the number of offsets to rollback.  Change to higher or lower number.
          Higher number more data will be processed and more memory consumed.

      14. step4crawdatatopic=iot-preprocess, this is the Step 4 preprocessing topic of the entities.
          If this is an empty string, no entities are cross-referenced with the log files.  Only log files will be processed.

      15. step4csearchterms=rgx:p([a-z]+)ch ~~~ |authentication failure,--entity-- password failure, these are
          the fixed search terms.  You can specify dynamic search terms in the field step4clocalsearchtermfolder

      16. step4crememberpastwindows=500, this is the past, short-term windows for TML to remember.
          TML RTMS will go back 500 sliding time windows.

      17. step4cpatternwindowthreshold=30, this is the maximum pattern threshold before raising an alarm.

      18. step4crtmsscorethreshold=0.6, this is the RTMS score threshold.  This is used to send
          messages that exceed this RTMS threshold to its own rtms topic.

      19. step4cattackscorethreshold=0.6, this is the Attack score threshold.  This is used to send messages
          that exceed this attack threshold to its own attack topic.

      20. step4cpatternscorethreshold=0.6, this is the Pattern score threshold.  This is used to send
          messages that exceed this pattern threshold to its own pattern topic.

      21. step4crtmsstream=rtms-stream-mylogs, this is the kafka topic that stores ALL the results from RTMS.

      22. step4clocalsearchtermfolder=|mysearchfile1,|mysearchfile2, this is name of the folders that
          contain text files for searches. A | for OR, and @ for AND.  TML will read the search terms
          in real-time and immediately start applying them to the streamed data.

      23. step4clocalsearchtermfolderinterval=60, this is the number in seconds that the files
          in the folders specified in step4clocalsearchtermfolder, will be read.  So, 60 means,
          read files every 60 seconds.

      24. step4crtmsfoldername=rtms2, TML RTMS will output logs of the search results to GitHub.
          This is convenient for testing and validation.  NOTE: Only the latest 950 files will
          be sent to GitHub because GitHub has a maximum file limit of 1000.

      25. step3localfiledocfolder=mylogs,mylogs2, these are the folders that contain your
          text log files.  These are read in the STEP 3 LOCALFILE task.

      26. step4crtmsmaxwindows=1000000, this is the maximum number of windows for LONG-TERM
          pattern matching.  Here, TML will go-back 1,000,000 sliding time windows,
          which in effect could be months of analysis.  You can easily increase this number.

      - PLEASE NOTE: THE GITHUB AND READTHEDOCS ACCOUNTS ARE PUBLIC AND SHARED ACCOUNTS BY OTHERS.

      - THEY ARE MEANT ONLY FOR QUICK DEMOS.  IDEALLY, PERSONAL GITHUB AND READTHEDOCS ACCOUNTS SHOULD BE USED."""
}
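The core parameters listed in `dockerinstructions` end up as environment variables and a volume mapping on a `docker run` command. The sketch below shows one way to assemble such a command from a dictionary. It is a simplified illustration, not the command TSS generates: `docker_run_command` and the image name `myuser/mysolution` are hypothetical, and a real TML run command carries more options than shown here.

```python
import shlex


def docker_run_command(image, env, host_folder):
    """Assemble a `docker run` command line from env values and a volume."""
    args = ["docker", "run"]
    for name, value in env.items():
        # Each parameter becomes one --env NAME=VALUE option.
        args += ["--env", f"{name}={value}"]
    # Map the host folder onto /rawdata, as instruction 7 describes.
    args += ["-v", f"{host_folder}:/rawdata:z"]
    args.append(image)
    # shlex.join quotes any argument containing spaces or shell characters.
    return shlex.join(args)


cmd = docker_run_command(
    "myuser/mysolution",  # hypothetical image name
    {"KAFKABROKERHOST": "127.0.0.1:9092", "SASLMECHANISM": "PLAIN"},
    "/your_localmachine/foldername",
)
print(cmd)
```

Quoting matters for values such as `step4csearchterms`, which contain spaces and pipe characters that a shell would otherwise interpret.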

7.19. Creating Your Own DAG

Note

This is for advanced TML developers who are also advanced Python developers.

You can easily create your own custom DAG and add it to the solution templates. Follow these guidelines.

  1. Create a project first - see Lets Start Building a TML Solution

  2. Go to your project folder in TSS - as shown in figure below

    _images/customdag1.png
  3. Create and SAVE your DAG

    Tip

    You should copy a previously written TML Dag and then simply modify it for your needs.

  4. Your new DAG will be in the project folder.

    Important

    Make sure you click Git Workspaces to commit your DAG to Github. As shown in the figure below.

_images/customdag2.png
  5. Now add your new DAG to one of the solution templates. Simply click one of the solution templates.

    _images/customdag3.png

Let's choose the solution DAG solution_template_processing_dag-myawesometmlsolution.py. Import your new DAG into the template by adding an import statement for it. Here you can create step 11 for your new DAG called "mynewdag":

step11 = importlib.import_module("tml-solutions.myawesometmlsolution.mynewdag")

_images/customdag4.png
  6. Now, connect your new DAG to the solution process flow - as shown in figure below:

Note

This task assumes you have a function named mycooldag in your Python script tml-solutions.myawesometmlsolution.mynewdag.py. TSS will now also run the sensor_H task you just created.

_images/customdag5.png
  7. To run your new solution - click DAGs in the top-menu.

    You should see your new STEP 11. If so, CONGRATULATIONS! You just created a new/custom TML solution.

_images/customdag6.png
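The import step above can be tried outside TSS. The following sketch builds a throwaway copy of the folder layout, writes a stub `mynewdag.py` exposing `mycooldag`, and loads it with `importlib.import_module`. The stub body and the temporary directory are assumptions made for illustration; a real custom DAG would contain your own task logic.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the folder layout so the example runs anywhere.
root = Path(tempfile.mkdtemp())
package = root / "tml-solutions" / "myawesometmlsolution"
package.mkdir(parents=True)
(root / "tml-solutions" / "__init__.py").write_text("")
(package / "__init__.py").write_text("")

# mynewdag.py: the custom step, exposing the function the template will call.
(package / "mynewdag.py").write_text(
    "def mycooldag(**context):\n"
    "    # Custom logic goes here; this stub only reports that it ran.\n"
    "    return 'mycooldag finished'\n"
)

# The folder name contains a hyphen, so a plain `import` statement cannot be
# used; importlib.import_module accepts the dotted name as a string instead.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
step11 = importlib.import_module("tml-solutions.myawesometmlsolution.mynewdag")

result = step11.mycooldag()
print(result)
```

This is why the solution templates load each step with `importlib` rather than an `import` statement: `tml-solutions` is not a valid Python identifier.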

7.20. Github Push Issues

You may sometimes encounter an issue pushing to Github in the UI. If this happens, you can issue a +gitresetpull or +gitresetpush as shown in the figure below:

Note

This usually happens if there is a commit from another process.

Note that +gitresetpull will fetch all of the commits and add them to the main branch.

+gitresetpush will rebase the commit to the head of the main branch, commit the changes and push them to the main branch.

_images/gitreset2.png
_images/gitreset.png

After the +gitresetpull, you can then push your changes.

_images/gitresetpush.png

7.21. Example TML Solution Container Reference Architecture

_images/solutioncontainer.png

The above image shows a typical TML solution container

Attention

  • Every TML solution runs in a Docker container

  • Linux is installed in the container

  • TMUX (terminal multiplexer) is used to structure TML solution components in their own task windows to make it easier to maintain and operationalize TML solutions

  • Apache Kafka is installed (Cloud Kafka can easily be used)

  • MariaDB is used as a configuration database for TML solutions

  • specific solution python scripts are installed and run the TML solution

  • TML dashboard code (html/javascript) runs in the container

  • Java is installed

7.22. Lets Start Building a TML Solution

Here is the TML solution creation process, which is detailed below:

_images/tmlcreateprocess.png

PROCESS STEPS

Process STEP 0. Go into tml-airflow folder

Start the TSS container (TSS Docker Run Command) and go into the TSS Code Editor: TSS Code Editor.

Process STEP 1. Type the name of your project

You must choose a name for your TML project. No spaces or special characters, just text.

NOTE: Four characters from your READTHEDOCS token will be automatically appended to your project name.

Process STEP 2. Click the folder: myawesometmlproject-3f10

Click your new project folder to open it.

NOTE: We are just using myawesometmlproject as an example. You can choose any name you want.

Process STEP 3. Make Parameter Modifications to Your Project’s TML DAGs

Simply update the parameters to your TML DAGs. You do not need to write any code.

Process STEP 4. Choose the Solution Template You Want to Run

You must select a solution template. These templates build and run the entire end-to-end TML solution and make modifications to your TML DAGs.

Process STEP 5. Run Your Solution

You can now run your solution.

Process STEP 6: Go To the Solution Documentation

Your solution documentation is automatically generated for you.

Process STEP 7: Your Solution Docker Run Command

You can now run your solution container.

Process STEP 8: Stream Your Solution Dashboard

Stream your real-time dashboard.

Process STEP 9: TML Solution Built in Less than 2 Minutes

Congratulations! You just built a real-time solution in less than 2 minutes

7.23. STEP 0. Go into tml-airflow folder

Tip

Watch the video that shows how to easily create, delete, copy and stop TML projects: Youtube Video

Assuming you have the TSS container running (following the steps in TSS Docker Run Command) and you are logged in (using the instructions in How To Use the TML Solution Container), go into the DAG code editor:

_images/sol11.png

7.24. STEP 0. tml-airflow -> dags -> tml-solutions

You will see the following as shown in figure below

_images/sol1.png

7.25. STEP 1. Click the file: CREATETMLPROJECT.txt - you will see the following as shown in figure below:

_images/sol2.png

7.26. STEP 1. Type the name of your project

7.26.1. Creating a Project

Important

You should use lowercase letters. DO NOT ENTER ANY SPACES - Enter any name like myawesometmlproject then PRESS SAVE

_images/sol3.png

Note

All projects will be “appended” with parts of your READTHEDOCS token. This is to ensure project uniqueness on READTHEDOCS.

7.27. STEP 1. You just created a TML Project and committed to Github. Congratulations!

To confirm everything went ok go to the Github account:

i.e. /raspberrypi/tml-airflow/dags/tml-solutions/ you should see a folder for myawesometmlproject-3f10

_images/sol4.png

7.28. Deleting a Project

Tip

If you want to DELETE this project simply type a - (minus) in front of it (as shown below):

-myawesometmlproject

The TSS will delete the entire project and commit the changes to Github.

NOTE: If you deleted a previous project and re-created it you should CLEAR your TSS browser CACHE.

Warning

All information/code related to this project will be deleted and may not

be recoverable.

_images/deleteproject.png

7.29. STEP 2. Click the folder: myawesometmlproject-3f10

You will see the figure below - VOILA!

_images/sol5.png

7.30. STEP 2. Confirm Your New Project Was Created in TSS and Committed to Github

To confirm the new DAGs for myawesometmlproject were created properly, in TSS click DAGs (top menu item)

Then enter the filter myawesometmlproject and press Enter.

You should see all your DAGs (note if they don’t show up just wait 30 seconds or so) - you should see figure below:

_images/sol6.png

Important

What did you just do?

You copied TML TEMPLATE DAGs to your own solution folder - for your own TML solution build.

If you want to create another TML solution - just repeat STEPS 1-3 with a new project name.

Tip

New project could take 30 seconds or more to show up on the main Airflow screen.

Please be patient. If there are no errors - it will show up.

7.30.1. Stopping a Running Project

To stop a running project, type a ‘.’ followed by the project name.

_images/dotproject.png

7.30.2. Copying A Previous Project

Tip

If you want to copy from a previous TML project and rename to a new project then:

  1. In STEP 3 type myawesometmlproject>myawesometmlproject2, the character “>” means copy myawesometmlproject to myawesometmlproject2 (as shown in figure below)

  2. Hit Save

  3. Voila! You just copied an older project to a new one and saved the time of entering parameters in the DAGs.

_images/sol7.png

To confirm the new project was properly copied repeat STEPS 4 - 6. You should see your myawesometmlproject2-3f10 committed to Github:

_images/sol8.png
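The entries you can type into CREATETMLPROJECT.txt follow a small set of conventions: a plain name creates a project, a leading `-` deletes it, a leading `.` stops it, and `old>new` copies it. The sketch below classifies an entry accordingly. It is an illustration of those conventions only, not the TSS parser; `parse_project_entry` is a hypothetical name, and the rule that names are lowercase letters and digits is an assumption drawn from the guidance above.

```python
import re

# Assumption: project names are lowercase letters and digits only.
NAME_RE = re.compile(r"^[a-z][a-z0-9]*$")


def parse_project_entry(entry):
    """Classify one CREATETMLPROJECT.txt entry as create, delete, stop or copy."""
    entry = entry.strip()
    if ">" in entry:
        # old>new copies an existing project to a new name.
        source, _, target = entry.partition(">")
        action, names = "copy", [source, target]
    elif entry.startswith("-"):
        action, names = "delete", [entry[1:]]
    elif entry.startswith("."):
        action, names = "stop", [entry[1:]]
    else:
        action, names = "create", [entry]
    for name in names:
        if not NAME_RE.match(name):
            raise ValueError(f"invalid project name: {name!r}")
    return action, names


for line in ["myawesometmlproject", "-myawesometmlproject",
             ".myawesometmlproject", "myawesometmlproject>myawesometmlproject2"]:
    print(parse_project_entry(line))
```

Rejecting names with spaces or uppercase letters up front mirrors the naming advice given when creating a project.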

Important

The documentation link WILL ONLY be functional AFTER you run your project in TSS.

Here are your new DAGs:

_images/sol9.png
_images/sol55.png

Tip

Check the logs for status updates: Go to /raspberrypi/tml-airflow/logs/logs.txt

_images/sol10.png

Tip

For details on the editor go to Codemirror

7.31. STEP 3. Make Parameter Modifications to Your Project’s TML DAGs

_images/soldags2.png

TML Dags inside your project:

_images/tmldags.png

7.32. STEP 4. Choose the Solution Template You Want to Run

You have several solution templates to choose from; see TML Solution Templates. Choose the functions you want your solution to perform; see The Solution Template Naming Conventions.

Attention

After you create a project in STEP 1 above, these templates will be copied under your project.

DO NOT MODIFY the original templates, create a project first, then work on the renamed templates under your project name.

This ensures proper versioning of projects and project integrity. It also allows you to see the differences between multiple projects.

Important

This solution reads a local file. All local files are in the /rawdata folder in the container. If you want to read your own local file, you MUST map a local folder to the rawdata folder. For further details refer to here Producing Data Using a Local File

7.32.1. Project Solution Template Run

As an example, let's choose solution_preprocessing_dag-myawesometmlsolution-3f10

Tip

When you create your own project (this example uses myawesometmlsolution), all of the DAGs and solution templates are copied, renamed and committed to Github. The template used here is a copy of DAG 8, Solution Template: solution_template_processing_dag.py, renamed and moved under your project folder myawesometmlsolution-3f10. Go to TSS to see it, as in STEP 3.

Also, this project folder will automatically be committed to your Github folder - see figure below.

_images/sd2.png

Now, as per STEP 3, make parameter modifications to your project’s TML DAGs as needed. This DAG uses a local file for ingesting data. How do I know this? See The Solution Template Naming Conventions

7.32.1.1. Parameter Changes to TML DAGs

Here are the step-by-step changes to the TML DAGs.

  1. tml_read_LOCALFILE_step_3_kafka_producetotopic_dag-myawesometmlsolution-3f10.py: Change the inputfile field to point to your local data file:

    • I added 'inputfile' : '/rawdata/IoTData.txt' - the IoTData.txt file is provided to you for demonstration inside the TSS container in the /rawdata folder.

    • SAVE the file

    _images/p1.png
  2. tml_system_step_1_getparams_dag-myawesometmlsolution-3f10.py: Most of the parameters are set for you. But, if you are using KAFKA CLOUD you may want to set:

  • brokerhost : '127.0.0.1', # <<<<************* THIS WILL ACCESS LOCAL KAFKA - YOU CAN CHANGE TO CLOUD KAFKA HOST

  • brokerport : '9092', # <<<<************* LOCAL AND CLOUD KAFKA listen on PORT 9092

  • cloudusername : '', # <<<< --THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API KEY - LEAVE BLANK

  • cloudpassword : '', # <<<< --THIS WILL BE UPDATED FOR YOU IF USING KAFKA CLOUD WITH API SECRET - LEAVE BLANK

_images/p2.png

To see what all the other parameters mean, go here DAG STEP 1: Parameter Explanation

For our demonstration we will use the existing values in the DAG.

  3. tml_system_step_2_kafka_createtopic_dag-myawesometmlsolution-3f10.py: Now create all the Kafka topics for your solution. Specifically,

  • 'raw_data_topic' : 'iot-raw-data', # Separate multiple topics with comma <<< ****** You change topic names as needed

  • 'preprocess_data_topic' : 'iot-preprocess,iot-preprocess2', # Separate multiple topics with comma <<< ****** You change topic names as needed

  • 'ml_data_topic' : 'ml-data', # Separate multiple topics with comma <<< ****** You change topic names as needed

  • 'prediction_data_topic' : 'prediction-data', # Separate multiple topics with comma <<< ****** You change topic names as needed

  • 'pgpt_data_topic' : 'cisco-network-privategpt', # PrivateGPT will produce responses to this topic - change as needed

  • 'replication' : '1', # Leave at 1 for on-prem Kafka

  • 'numpartitions': '1', # Increase partitions as needed.

_images/p3.png

All topics will be created for your solution in Kafka.

Important

If using Kafka Cloud, you will need to set:

  • ‘replication’ : ‘3’, Change to a minimum of 3 for replication factor

  • ‘numpartitions’: ‘1’, Increase partition as needed.

For more explanation of the parameters, see DAG STEP 2: Parameter Explanation.
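
The topic fields above hold comma-separated topic names, and the replication factor differs between on-prem Kafka and Kafka Cloud. The sketch below shows one way to expand those fields into a flat topic list and to raise the replication factor to the minimum of 3 noted for Kafka Cloud. The names `topic_params`, `all_topics`, and `for_kafka_cloud` are hypothetical helpers for illustration; they are not functions provided by TML.

```python
# Topic fields copied from the listing above; the dictionary name is hypothetical.
topic_params = {
    "raw_data_topic": "iot-raw-data",
    "preprocess_data_topic": "iot-preprocess,iot-preprocess2",
    "ml_data_topic": "ml-data",
    "prediction_data_topic": "prediction-data",
    "pgpt_data_topic": "cisco-network-privategpt",
    "replication": "1",    # on-prem Kafka
    "numpartitions": "1",
}


def all_topics(params):
    """Expand every *_topic field into a flat list of topic names."""
    names = []
    for key, value in params.items():
        if key.endswith("_topic"):
            names.extend(t.strip() for t in value.split(",") if t.strip())
    return names


def for_kafka_cloud(params):
    """Return a copy with the replication factor raised to at least 3."""
    updated = dict(params)
    if int(updated["replication"]) < 3:
        updated["replication"] = "3"
    return updated


if __name__ == "__main__":
    print(all_topics(topic_params))
    print(for_kafka_cloud(topic_params)["replication"])
```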

  4. tml_system_step_4_kafka_preprocess_dag-myawesometmlsolution-3f10.py: Modify the preprocessing JSONCRITERIA.

    Refer to JSON PROCESSING for more explanation. The following jsoncriteria is being used.

_images/p4.png
'jsoncriteria' : 'uid=metadata.dsn,filter:allrecords~\
subtopics=metadata.property_name~\
values=datapoint.value~\
identifiers=metadata.display_name~\
datetime=datapoint.updated_at~\
msgid=datapoint.id~\
latlong=lat:long', # <<< **** Specify your json criteria. Here is an example of a multiline json -
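
To make the layout of the jsoncriteria string easier to see, here is a small Python sketch that splits it into its `key=value` parts. It assumes only what is visible in the example above: parts are separated by `~` and each part has the form `key=value`. The function `parse_jsoncriteria` is a hypothetical helper and does not reproduce how TML itself interprets the criteria; refer to JSON PROCESSING for the actual semantics.

```python
# The same criteria as above, written with implicit string concatenation
# instead of backslash line continuations.
JSONCRITERIA = (
    "uid=metadata.dsn,filter:allrecords~"
    "subtopics=metadata.property_name~"
    "values=datapoint.value~"
    "identifiers=metadata.display_name~"
    "datetime=datapoint.updated_at~"
    "msgid=datapoint.id~"
    "latlong=lat:long"
)


def parse_jsoncriteria(criteria):
    """Split a jsoncriteria string on '~' and then on the first '='."""
    fields = {}
    for part in criteria.split("~"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    for key, value in parse_jsoncriteria(JSONCRITERIA).items():
        print(f"{key:12} -> {value}")
```

Printing the parts this way is a quick check that every part has a key and that no `~` separator was lost when editing the multiline string.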

Note

Since this is preprocessing ONLY we are skipping the Machine Learning and AI DAGs - DAGS 5, 6 and 9.

  5. tml_system_step_7_kafka_visualization_dag-myawesometmlsolution-3f10.py

    For further details on how to create your own dashboards, refer to Creating Your Own Dashboards.

    As an example, TSS has several dashboards out of the box - dashboard.html is being used here.

_images/p5.png

Other dashboards are:

  • iot-failure-seneca.html

  • iot-failure-machinelearning-uoft.html

  • tml-cisco-network-privategpt-monitor.html

You can inspect these dashboards in the <repo>/tml-airflow/dashboard folder of your GitHub repo and use them as a starting point to create your own.

7.33. STEP 5: Run Your Solution

The figures below show the VERY SIMPLE steps of running your solution template DAG:

_images/p51.png

Then click the START button on the top right.

_images/p52.png

If the solution ran successfully, you will see all green lights.

_images/p53.png

7.34. STEP 6: Go To the Solution Documentation

Your solution documentation is automatically generated for you:

Tip

To find the documentation URL, go to /tml-airflow/dags/tml-solutions/myawesometmlsolution-3f10 in your GitHub repo.

The URL is in the commit message, as shown in the figure below.

_images/sp5.png

7.35. STEP 7: Your Solution Docker Run Command

Your solution Docker container is also automatically built and pushed to Docker Hub:

_images/sp6.png

The docker run command for your solution is in the documentation. You can now take this Docker container and scale it with Kubernetes as you wish.

_images/sp8.png

7.36. STEP 8: Stream Your Solution Dashboard

Click the Operating Details and Run Your Dashboard

_images/sp1.png
_images/sp2.png
_images/sp3.png

And, here is your real-time dashboard - auto-generated!

_images/sp4.png

7.37. STEP 9: TML Solution Built in Less than 2 Minutes

CONGRATULATIONS! YOU JUST BUILT AN END-TO-END REAL-TIME SOLUTION IN LESS THAN 2 MINUTES!

_images/sp7.png

7.38. Project Action Commands Summary

Go to the TSS and select from the top menu: Admin -> Dags Code Editor

Navigate to the file root/tml-airflow/dags/tml-solutions/CREATETMLPROJECT.txt, then perform any of the following actions:

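============================  ================  ============================================================
Action Type                   Syntax            Explanation
============================  ================  ============================================================
Add Project                   No symbol needed  Just type the project name. No spaces or special characters, just alphanumerics, in CREATETMLPROJECT.txt
Delete Project                -                 Type - then the project name. For example, -myproject in CREATETMLPROJECT.txt
Copy From a Previous Project  >                 Type > between projects. For example, oldproject>newproject in CREATETMLPROJECT.txt
Stop a Running Project        .                 Type . then your currently running project. For example, .myproject in CREATETMLPROJECT.txt
============================  ================  ============================================================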
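
The action syntax above can be summarized as a small classifier. The Python sketch below reads one CREATETMLPROJECT.txt entry and reports which action its prefix implies. It is an illustration of the syntax only: `parse_project_action` is a hypothetical helper, not the TSS implementation, and the alphanumeric check on new project names is an assumption drawn from the "no spaces or special characters" rule.

```python
def parse_project_action(line):
    """Classify one CREATETMLPROJECT.txt entry by its leading symbol."""
    entry = line.strip()
    if not entry:
        raise ValueError("empty entry")
    if entry.startswith("-"):
        # -myproject deletes a project
        return {"action": "delete", "project": entry[1:]}
    if entry.startswith("."):
        # .myproject stops a running project
        return {"action": "stop", "project": entry[1:]}
    if ">" in entry:
        # oldproject>newproject copies a previous project
        source, target = entry.split(">", 1)
        return {"action": "copy", "source": source, "project": target}
    # No symbol: add a project. Assumed rule: alphanumerics only.
    if not entry.isalnum():
        raise ValueError(f"project name must be alphanumeric: {entry!r}")
    return {"action": "add", "project": entry}


if __name__ == "__main__":
    for sample in ["myproject", "-myproject", "oldproject>newproject", ".myproject"]:
        print(sample, "->", parse_project_action(sample))
```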

Tip

Also see Copying TML Project(s) From Others Git Repo for copying projects between TML users.