Cover page
Image Name: Gravitational lensing.
Gravitational lensing is a phenomenon described by Albert Einstein's general theory of relativity, where massive
objects, such as clusters of galaxies or point particles, bend light from a distant source as it travels toward an
observer. There are two main types of gravitational lensing: strong lensing and weak lensing. Strong gravitational
lensing results in the appearance of multiple images of a background galaxy, stretched or magnified as seen in
the image. (Image source: Hubble Space Telescope)
Managing Editor: Ninan Sajeeth Philip
Chief Editor: Abraham Mulamootil
Editorial Board: K Babu Joseph, Ajit K Kembhavi, Geetha Paul, Arun Kumar Aniyan
Correspondence: The Chief Editor, airis4D, Thelliyoor - 689544, India
Journal Publisher Details
Publisher : airis4D, Thelliyoor 689544, India
Website : www.airis4d.com
Email : nsp@airis4d.com
Phone : +919497552476
Editorial
by Fr Dr Abraham Mulamoottil
airis4D, Vol.2, No.1, 2024
www.airis4d.com
Entering its second year, airis4D continues to explore cutting-edge topics. In this edition, we delve into
AI safety and biodiversity conservation, shedding light on crucial global concerns. As we embark on this new
journey, we aim to provoke insightful discussions and pave the way for a brighter, more informed future.
The edition introduces Quantum Machine Learning, merging neural networks with quantum mechanics.
It explores quantum neural networks' potential in pattern recognition by storing patterns in quantum superpositions,
allowing for significantly larger storage capacities than classical networks. Grover's search and adiabatic
quantum computing enhance memory recall. Challenges arise due to qubit limitations, environmental noise, and
scalability issues. Comparatively, classical Hopfield networks face constraints in storage, spurious states, and
representing asymmetric patterns. Quantum associative memory, akin to Hopfield networks, employs quantum
principles for superior storage and retrieval. Patterns in quantum superpositions are equally weighted, aiding
accurate matching and retrieval, especially with partial or noisy inputs.
The article Application of NLP in Medical Field by Jinsu Ann Mathew discusses the transformative
impact of Natural Language Processing (NLP) in the medical field, highlighting its role in various aspects of
healthcare. NLP, described as a behind-the-scenes wizard, is portrayed as a language expert for computers,
enhancing efficiency and accuracy in medical processes.
Linn Abraham’s Probabilities in Machine Learning underscores the role of randomness and probability
in machine learning (ML). It highlights their applications in algorithms like Random Forest, Naive Bayes
Classifier, and Gaussian processes. The article differentiates between physical and evidential interpretations of
probability, delving into classical, empirical, subjective, and axiomatic interpretations. It explores probability
density functions (PDFs) and their relevance in transitioning from discrete to continuous scenarios. Additionally,
it emphasizes the importance of Bayes' theorem in updating beliefs based on evidence, illustrating practical
examples of its application in Bayesian inference. In summary, the article offers a broad perspective on
probabilities in ML, covering applications, interpretations, PDFs, and the practical use of Bayes’ theorem in
refining beliefs with new evidence.
In Particle Paths in General Relativity by Ajit Kembhavi, the discussion centers on the trajectories of
particles within gravitational fields, comparing Newtonian mechanics with Einstein's general theory of relativity.
The article delves into the shape of particle orbits, highlighting differences between Newtonian and general
relativistic effective potentials. It elucidates how particles move along geodesics in general relativity and
examines bound orbits, unstable circular orbits, and the precession of periastron. The article also explains the
precession of Mercury's perihelion, verifies Einstein's predictions regarding gravitational waves, and explores
the presence of supermassive objects such as the black hole Sagittarius A*. Overall,
the article meticulously explores how general relativity’s complex space-time curvature alters particle orbits,
validates Einstein's predictions, and sheds light on celestial phenomena, including precession effects and the
existence of supermassive objects like black holes.
Sindhu G's article Unveiling the Mysteries of Supergiant Fast X-ray Transients explores the mysteries
of Supergiant Fast X-ray Transients (SFXTs), a fascinating class of celestial objects discovered through the
INTEGRAL mission. SFXTs are high-mass X-ray binaries that differ significantly from traditional Supergiant
X-ray Binaries (SGXBs) in terms of luminosity variations and dynamic range. The launch of the International
Gamma-Ray Astrophysics Laboratory revealed a substantial number of high mass X-ray binary systems with
supergiant companion stars.
Biodiversity & Conservation by Geetha Paul explores the breadth of life on Earth, encompassing species,
ecosystems, and genetic variations, crucial for sustaining life and providing resources like air, water, and
food. The article delves into the levels of biodiversity, including ecosystem, species, and genetic diversity.
It emphasizes the importance of biodiversity in providing food, medicine, climate regulation, and mental
health benefits while highlighting the threats it faces from human activities like habitat destruction and climate
change. The piece also discusses biodiversity hotspots and methods of conservation, both in-situ (like biosphere
reserves, national parks) and ex-situ (zoos, botanical gardens), underlining the urgency of protecting biodiversity
for ecosystem stability, human well-being, and the economy.
In the article AI Safety by Arun Aniyan, concerns about the potential risks associated with Artificial
Intelligence (AI) are explored. It traces the roots of these concerns back to popular culture references such as
movies like Terminator and The Matrix, which portray AI as a threat to humanity. The article highlights the
apprehensions expressed by notable figures like Elon Musk and Geoffrey Hinton regarding the progress of AI
technologies, specifically citing the emergence of sophisticated machine learning models like Large Language
Models (LLMs). It also mentions practices advocated by industry leaders like Google, OpenAI, Meta, and
Microsoft for developing safer AI systems. The conclusion reiterates the global concern for AI safety, noting
regulatory measures initiated by governments worldwide. It expresses optimism for a secure future with AI,
indicating that the portrayed machine apocalypse might remain confined to fiction, given the increasing focus
on ethics and safety in AI development.
Join airis4D on our voyage through the shifting terrains of technology, sustainability, and innovation, steer-
ing toward a more conscientious and influential tomorrow. Together, let's traverse the frontiers of imagination
and knowledge, nurturing a future where ideas thrive and endless possibilities await.
Contents
Editorial ii
I Artificial Intelligence and Machine Learning 1
1 Introduction to Quantum Machine Learning 2
1.1 Limitations of Hopfield Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Quantum Associative Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Application of NLP in Medical Field 5
2.1 Clinical Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Information Extraction from EHR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Clinical Decision Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Automated Report Summarisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 Virtual Health Assistance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6 Medical Coding and Billing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3 Probabilities in Machine Learning 10
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Use in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Interpretations of probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Interpreting Probability Density Functions (PDF) . . . . . . . . . . . . . . . . . . . . . . . . 11
3.5 Bayes Theorem: Evidence should not determine beliefs but update them . . . . . . . . . . . . 12
II Astronomy and Astrophysics 15
1 Particle Paths in General Relativity 16
1.1 The Effective Potential in General Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2 Bound Orbits in the Schwarzschild Effective Potential . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Precession of the Perihelion of Mercury . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2 Unveiling the Mysteries of Supergiant Fast X-ray Transients 23
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Supergiant Fast X-ray Transients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Formation and Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Importance of the Study of SFXTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 The Future Unfold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
III Biosciences 28
1 An Introduction to Biodiversity and its Conservation 29
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.2 Levels of Biodiversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.3 Importance of Biodiversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.4 Hotspots of Biodiversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.5 Biodiversity Estimations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.6 Estimation of Biodiversity Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.7 Conservation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.8 Need for Conserving Biodiversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
IV General 37
1 AI Safety 38
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.2 Origins of Safety Concerns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.3 Methods to Resolve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Part I
Artificial Intelligence and Machine Learning
Introduction to Quantum Machine Learning
by Blesson George
airis4D, Vol.2, No.1, 2024
www.airis4d.com
The discussion in this article delves into the fusion of artificial neural networks and quantum mechanics,
exploring the potential of quantum neural networks in pattern recognition and associative memories. We will be
introduced to the concept of storing patterns in quantum superpositions, leading to exponentially larger storage
capacities compared to classical Hopfield networks. The article also discusses the use of Grover’s search
and adiabatic quantum computing in achieving global optimization for quantum associative memories. The
utilization of Grover’s search algorithm, renowned for its quantum speedup in searching unsorted databases,
emerges as a key player in enhancing the efficiency of quantum associative memories. Furthermore, the
application of adiabatic quantum computing, with its potential for solving optimization problems, becomes a
focal point in achieving global optimization for quantum associative memories. This intersection of quantum
algorithms and neural network architectures propels the capabilities of pattern recognition and memory recall
to unprecedented levels.
However, as we delve deeper into the realm of quantum neural networks, it is imperative to confront the
inherent challenges and limitations that accompany such cutting-edge technologies. One of the primary chal-
lenges lies in the constraints imposed by the number of qubits, the fundamental units of quantum information.
The delicate nature of quantum states and the susceptibility to environmental noise present formidable hurdles
in maintaining the coherence required for meaningful computation. Additionally, the issue of scalability raises
questions about the feasibility of deploying quantum neural networks on a large scale, limiting their immediate
practical applications.
1.1 Limitations of Hopfield Networks
Hopfield networks are a type of recurrent artificial neural network, named after John Hopfield, that serve
as a form of associative memory. These networks are characterized by their ability to store patterns and later
retrieve them based on partial or noisy input. The storage and retrieval process in a Hopfield network is achieved
through the use of interconnected nodes, or neurons, where each neuron can be in one of two states (commonly
denoted as -1 and 1). The connections between neurons, represented by weights, are symmetric and are used to
store the patterns in the network. When presented with partial or noisy input, the network undergoes an iterative
process where the neurons update their states based on the weighted sum of the inputs and the current states
of the other neurons. This iterative process continues until the network reaches a stable state, at which point it
outputs the retrieved pattern. Hopfield networks have been widely used in various applications such as pattern
recognition, optimization, and associative memory tasks due to their ability to store and retrieve patterns even
in the presence of incomplete or noisy information.
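To make the storage and recall dynamics described above concrete, here is a minimal NumPy sketch of a classical Hopfield network (not taken from the article; the network size, number of patterns and amount of corruption are arbitrary illustrative choices). It stores binary patterns with the Hebbian rule and tries to recover one of them from a corrupted copy.

import numpy as np

def train_hopfield(patterns):
    """Build a symmetric weight matrix from +/-1 patterns using the Hebbian rule."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)        # strengthen connections between co-active neurons
    np.fill_diagonal(W, 0)         # no self-connections
    return W / len(patterns)

def recall(W, state, sweeps=10):
    """Iteratively update neuron states; the network settles towards a stored pattern."""
    state = state.copy()
    for _ in range(sweeps):
        for i in np.random.permutation(len(state)):    # asynchronous updates
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 32))           # three random 32-neuron patterns
W = train_hopfield(patterns)

noisy = patterns[0].copy()
noisy[rng.choice(32, size=5, replace=False)] *= -1     # flip 5 of the 32 neurons

print("original pattern recovered:", np.array_equal(recall(W, noisy), patterns[0]))

Even this toy version illustrates the capacity problem discussed below: once the number of stored patterns exceeds roughly 0.14 times the number of neurons, recall begins to fail and spurious states appear.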
Hopfield networks, while serving as valuable models for associative memory tasks, exhibit several lim-
itations that impact their practical utility. Their storage capacity is constrained, leading to a susceptibility
to spurious states as more patterns are stored. The symmetric weight matrix, a defining feature of Hopfield
networks, limits their ability to accurately represent and retrieve asymmetric patterns. Convergence issues,
dependence on initial conditions, and a sensitivity to noise compromise the reliability of pattern retrieval, while
the network’s binary neuron states and limited dynamic range hinder its ability to handle nuanced information.
Incremental learning poses a challenge, necessitating network-wide retraining with the introduction of new
patterns. Scalability is another concern, with difficulties arising as the network size increases.
1.2 Quantum Associative Memory
A quantum associative memory is analogous to a Hopfield network in the sense that it serves as a form of
associative memory, similar to its classical counterpart. Just as a classical Hopfield network stores and retrieves
patterns based on partial or noisy input, a quantum associative memory also stores patterns but leverages the
principles of quantum mechanics to do so. In a quantum associative memory, patterns are stored in a quantum
superposition, allowing for a significantly larger storage capacity compared to classical Hopfield networks. When
presented with a query or partial pattern, the quantum associative memory uses quantum algorithms, such as a
modified version of Grover’s search, to retrieve the closest match from the stored quantum superposition. This
retrieval process takes advantage of the quantum properties of entanglement and superposition to efficiently
search for and retrieve patterns.
In a quantum associative memory, the stored patterns are represented as quantum states, with each qubit in
the state representing a particular feature or attribute of the pattern. These qubits are entangled, meaning that
the state of one qubit is dependent on the state of the others.
When the associative memory is in a superposition with equal probability for all of its entangled qubits,
it means that each possible combination of qubit states is equally likely to occur in the overall superposition.
This equal weighting ensures that each stored pattern has an equivalent influence on the resulting quantum state,
without any particular pattern dominating the superposition.
This characteristic is significant because it allows for a balanced representation of all stored patterns within
the quantum memory. As a result, when performing operations such as pattern matching or retrieval, each
stored pattern has an equal opportunity to contribute to the outcome, regardless of the number of patterns stored
in the memory.
In the context of associative memory, the pattern being referenced can either be a complete match to one of
the stored patterns or a partial match. When a complete match occurs, the input pattern aligns perfectly with one
of the stored patterns in the memory. This scenario results in the retrieval of the exact stored pattern associated
with the input, allowing for precise recognition and recall. On the other hand, a partial pattern refers to an input
that only partially matches one or more of the stored patterns. In this case, the input may share similarities
with multiple stored patterns but does not perfectly align with any single pattern. The goal in this scenario is
to retrieve the closest match or matches from the stored patterns, even if the input is not an exact replica of any
single stored pattern. This capability is particularly useful in scenarios where the input data may be noisy or
incomplete, allowing the system to still provide meaningful outputs based on the partial information available.
In the context of quantum associative memory, the ability to handle partial patterns is crucial, as it allows
for robust pattern recognition and recall even in the presence of imperfect or incomplete input data.
1.2.1 Grover’s Algorithm
Grover's algorithm is a quantum algorithm that provides a significant speedup for searching unsorted
databases compared to classical algorithms. In the context of quantum pattern recognition and associative
memory, Grover’s algorithm can be utilized to efficiently search for and retrieve patterns from a quantum
superposition representing stored patterns.
Grover's algorithm is designed to operate on quantum superpositions where all the states have equal
amplitudes or weights. In other words, the algorithm is most effective when applied to quantum states that are
uniformly weighted, meaning that each state in the superposition has an equal probability of being measured.
This characteristic is crucial in the context of pattern recognition and associative memory, as it ensures
that Grover's algorithm can effectively search through the superposition of stored patterns without favoring
any particular pattern. By operating on uniformly weighted superpositions, Grover's algorithm can efficiently
amplify the probability amplitudes of the target states, leading to an increased likelihood of successfully
identifying the desired pattern or patterns within the superposition.
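A small classical simulation of the amplitude bookkeeping can make this concrete. The sketch below is not a real quantum implementation and is not from the article; the search-space size and marked index are arbitrary. Starting from the uniform superposition, it applies the oracle phase flip followed by the inversion-about-the-mean diffusion step, and shows how the probability of the marked state grows to nearly one after about (π/4)√N iterations.

import numpy as np

N = 256                       # size of the search space
marked = 42                   # index of the "stored pattern" we want to retrieve

amps = np.full(N, 1 / np.sqrt(N))     # uniform superposition: equal amplitudes

def grover_iteration(amps, marked):
    """One Grover step: phase-flip the marked amplitude, then reflect about the mean."""
    amps = amps.copy()
    amps[marked] *= -1                # oracle
    return 2 * amps.mean() - amps     # diffusion (inversion about the average)

steps = int(round(np.pi / 4 * np.sqrt(N)))     # about 12 iterations for N = 256
for _ in range(steps):
    amps = grover_iteration(amps, marked)

print(f"after {steps} iterations, P(marked state) = {amps[marked]**2:.4f}")   # close to 1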
1.3 Summary
The article explores the fusion of artificial neural networks with quantum mechanics, emphasizing the
potential of quantum neural networks in pattern recognition and associative memories. It introduces the
concept of storing patterns in quantum superpositions, offering exponentially larger storage capacities than
classical Hopfield networks. The utilization of Grover’s search algorithm and adiabatic quantum computing is
discussed for achieving global optimization in quantum associative memories. Despite the promising prospects,
the implementation of quantum neural networks faces challenges such as qubit constraints, sensitivity to
environmental noise, and scalability issues. Transitioning to the limitations of Hopfield networks, these classical
models exhibit constraints in storage capacity, susceptibility to spurious states, and challenges in representing
asymmetric patterns. The discussion on quantum associative memory highlights its analogy to Hopfield
networks, with the quantum version leveraging quantum superpositions for enhanced storage and retrieval
capabilities. The equal weighting of patterns in the quantum superposition ensures balanced representation,
offering advantages in pattern matching and retrieval, especially in scenarios involving partial or noisy inputs.
References
[1] Wittek, P. (2014). Quantum machine learning: what quantum computing means to data mining. Academic
Press.
[2] Rebentrost, Patrick, et al. "Quantum Hopfield neural network." Physical Review A 98.4 (2018): 042308.
About the Author
Dr. Blesson George presently serves as an Assistant Professor of Physics at CMS College
Kottayam, Kerala. His research pursuits encompass the development of machine learning algorithms, along
with the utilization of machine learning techniques across diverse domains.
Application of NLP in Medical Field
by Jinsu Ann Mathew
airis4D, Vol.2, No.1, 2024
www.airis4d.com
In our world dominated by technology, Natural Language Processing (NLP) is shaking up the healthcare
scene by teaching computers to grasp and interpret medical language. Consider the notes your doctor scribbled
during your recent visit: NLP steps in to swiftly analyze this information, turning it into clear insights from the
complex world of medical data. This not only speeds up the decision-making process for healthcare providers
but also makes their judgments more accurate when it comes to patient care.
Picture this: NLP is the behind-the-scenes wizard that decodes your medical records and makes sense
of intricate research papers. It’s like having a language expert for computers, making healthcare a whole lot
smarter, one small step at a time. The influence of NLP in healthcare is nothing short of revolutionary, and
its everyday enchantment is reshaping the medical experience for the better. In this article, we’ll take a closer
look at the various ways Natural Language Processing (NLP) is making a difference in the field of medicine.
We’ll explore how this advanced technology tackles challenges, streamlines processes for better efficiency, and
contributes to groundbreaking advancements in patient outcomes. From the crucial task of diagnosing diseases
to the intricate process of deciphering complex medical literature, NLP is actively driving a significant shift in
how healthcare is not only delivered but also experienced.
2.1 Clinical Documentation
Clinical documentation is the systematic recording of patient information in a healthcare setting. Tradition-
ally done on paper, the process has been revolutionized by the transition to Electronic Health Records (EHR) (Figure 1).
It encompasses a vast array of data, including patient demographics, medical history, diagnoses, treatments, and
outcomes. The primary objectives of clinical documentation are to facilitate communication among healthcare
providers, ensure accurate and comprehensive patient records, adhere to regulatory standards, support billing
processes, and contribute to research and quality improvement initiatives. The move to digital formats, facili-
tated by EHR systems, has improved accessibility, streamlined workflows, and enhanced the overall management
of patient health information.
The primary role of NLP in clinical documentation is to enhance the efficiency and accuracy of capturing
patient information. Traditionally, healthcare providers manually entered data, which was time-consuming and
susceptible to errors. NLP automates the process by deciphering the meaning of free-text narratives, identifying
key details such as symptoms, diagnoses, treatment plans, and outcomes. It is a powerful tool for unlocking
insights from the vast amount of textual data stored in EHRs. It supports healthcare providers in making
well-informed decisions, streamlining workflows, and ultimately contributing to better patient outcomes. The
integration of NLP in clinical documentation exemplifies the potential of artificial intelligence to revolutionize
data management in healthcare, making the process more efficient, accurate, and conducive to improved patient
care.
Figure 1: Clinical documentation (image courtesy: https://www.billing-coding.com/full-article.cfm?articleID=6066)
2.2 Information Extraction from EHR
Information extraction involves the automated process of retrieving, identifying, and organizing relevant
information from Electronic Health Records(EHR). This is especially important when dealing with lots of
messy information in electronic health records. It means using advanced computer tricks to read and pull out
the important information from the unorganized text in health records. Let’s make it clearer with an example:
Consider a physician's note within an EHR that reads: "The patient was prescribed 20mg of aspirin once
daily for hypertension."
In this scenario, the goal is to use NLP to extract specific information such as the medication name, dosage,
frequency, and the medical condition it is prescribed for. The NLP algorithm first processes the text, breaking
it down into tokens and identifying grammatical structures. It recognizes that "aspirin" is likely a medication,
"20mg" is the dosage, "once daily" is the frequency, and "hypertension" is the medical condition.
Then, NLP establishes relationships between the identified entities. For instance, it connects "aspirin" to
"20mg" as the prescribed dosage and associates "aspirin" with "hypertension" as the medical condition for
which it is prescribed.
The output of the NLP process is structured information, such as:
Medication: Aspirin
Dosage: 20mg
Frequency: Once daily
Medical Condition: Hypertension
This structured information can then be integrated back into the EHR or other healthcare systems. The
entire process, from processing the free-text note to extracting structured information, is automated through
NLP, significantly improving the speed and accuracy of information extraction from EHRs. This not only aids
in streamlining healthcare workflows but also supports healthcare providers in making informed decisions based
on comprehensive and standardized patient data.
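Production systems use trained clinical NLP models and medical terminologies, but the basic idea of turning a free-text sentence into structured fields can be sketched with a simple rule-based pattern. The following Python example is only an illustration, not the method used by any particular EHR system; the regular expression is deliberately narrow and matches the sample note above.

import re

note = "The patient was prescribed 20mg of aspirin once daily for hypertension."

# A toy pattern for one prescription sentence: dosage, drug, frequency and indication.
pattern = re.compile(
    r"prescribed\s+(?P<dosage>\d+\s?mg)\s+of\s+(?P<medication>\w+)\s+"
    r"(?P<frequency>once daily|twice daily)\s+for\s+(?P<condition>\w+)",
    re.IGNORECASE,
)

match = pattern.search(note)
if match:
    structured = {
        "Medication": match["medication"].capitalize(),
        "Dosage": match["dosage"],
        "Frequency": match["frequency"].capitalize(),
        "Medical Condition": match["condition"].capitalize(),
    }
    for field, value in structured.items():
        print(f"{field}: {value}")   # Medication: Aspirin, Dosage: 20mg, ...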
Figure 2: Clinical decision support (image courtesy: https://www.labroots.com/tag/clinical-decision-support)
2.3 Clinical Decision Support
Clinical Decision Support integrates NLP to analyze and understand complex medical texts, such as patient
records, research articles, and clinical guidelines. By doing so, it provides healthcare professionals with relevant
information, alerts, and recommendations to enhance the decision-making process during patient care (Figure
2). NLP enables the system to extract insights from unstructured data, offering valuable support to clinicians
as they navigate through vast amounts of information. Consider a scenario where a physician is reviewing a
patient’s electronic health record (EHR) to determine the best treatment for a particular condition. The physician
comes across a detailed note about the patient’s symptoms, previous treatments, and responses to medications.
Here, NLP can be applied to extract key information from this narrative, such as the patient's allergic reactions,
past successful treatments, and any critical indicators that may influence the current decision-making process.
Once the relevant information is extracted, the Clinical Decision Support system, powered by NLP, can
provide immediate insights. For instance, if the patient has a known allergy to a certain medication mentioned
in the note, the system can generate an alert, prompting the physician to consider alternative treatment options.
Moreover, NLP can analyze the latest medical literature and clinical guidelines to suggest evidence-based
recommendations tailored to the patient’s specific circumstances.
In this way, Clinical Decision Support with NLP acts as a virtual assistant, offering timely and personalized
guidance to healthcare providers. It helps them navigate through the complexities of patient data and medical
knowledge, ultimately leading to more effective and informed decisions that positively impact patient outcomes.
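At its simplest, the alerting step in the scenario above reduces to comparing what the NLP layer has extracted against the proposed treatment. The sketch below is purely illustrative: the patient data, drug names and matching rule are invented for the example, and real decision-support systems work against curated allergy and drug-interaction knowledge bases.

# Illustrative output of an upstream NLP extraction step (not a real patient record).
extracted = {
    "allergies": {"penicillin", "sulfa drugs"},
    "current_medications": {"metformin"},
}
proposed = {"drug": "amoxicillin", "drug_class": "penicillin"}

def check_for_alerts(extracted, proposed):
    """Return alert messages if the proposed drug or its class matches a known allergy."""
    alerts = []
    for allergy in extracted["allergies"]:
        if allergy in (proposed["drug"], proposed["drug_class"]):
            alerts.append(
                f"ALERT: patient is allergic to {allergy}; "
                f"reconsider prescribing {proposed['drug']}."
            )
    return alerts

for message in check_for_alerts(extracted, proposed):
    print(message)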
2.4 Automated Report Summarisation
Automated report summarisation in the medical field, facilitated by Natural Language Processing (NLP),
streamlines the daunting task of sifting through extensive medical documents. NLP algorithms break down
complex texts, identifying crucial details such as patient demographics, medical conditions, and treatment
outcomes. These algorithms apply summarization techniques, selecting essential sentences or generating
concise summaries, making it possible for healthcare professionals and researchers to quickly access and
comprehend critical information without delving into lengthy documents.
In practical terms, imagine a medical practitioner reviewing a comprehensive patient record filled with
details about symptoms, diagnostic tests, and past treatments. Utilizing NLP, an automated summarization
system can distill this information into a concise summary, highlighting the patient's current health status, key
diagnostic findings, and recommended treatment strategies. This condensed overview not only saves valuable
time but also empowers healthcare professionals to make well-informed decisions efficiently. Automated report
summarization, driven by NLP, thus emerges as a valuable tool in the medical field, enhancing accessibility to
crucial information and supporting expedited decision-making processes.
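Modern summarisers are built on large language models, but the idea of "selecting essential sentences" can be illustrated with a classical frequency-based extractive approach. In the sketch below the report text, stopword list and scoring rule are all illustrative assumptions: each sentence is scored by how often its words occur in the whole document, and the top-scoring sentences are kept in their original order.

import re
from collections import Counter

report = (
    "The patient presented with chest pain and shortness of breath. "
    "An ECG showed no acute changes. "
    "Troponin levels were within normal limits. "
    "The patient was discharged with a recommendation for outpatient stress testing."
)

STOPWORDS = {"the", "a", "an", "with", "and", "of", "for", "was", "were", "no", "in"}

def summarise(text, n_sentences=2):
    """Score sentences by the document-wide frequency of their words; keep the top n."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())),
        reverse=True,
    )
    keep = set(ranked[:n_sentences])
    return " ".join(s for s in sentences if s in keep)   # original order preserved

print(summarise(report))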
2.5 Virtual Health Assistance
Virtual health assistance refers to the integration of digital technologies, often in the form of virtual
assistants or chatbots, to provide healthcare-related information, support, and services through online platforms.
These virtual health assistants serve as accessible and interactive interfaces that users can engage with to seek
medical advice, schedule appointments, receive medication reminders, and access relevant health information.
They contribute to the digitization of healthcare services, offering convenient and immediate assistance to users
from the comfort of their homes.
Natural Language Processing (NLP) is a key technology in virtual health assistance, enabling these digital
systems to understand, interpret, and respond to human language in a natural and conversational manner. One
of the significant advantages of incorporating NLP into virtual health assistance is the creation of user-friendly
interfaces. Users can express their health concerns, ask questions, and seek advice using natural language,
mimicking a conversation with a healthcare professional. NLP algorithms process this language, allowing
virtual health assistants to provide tailored responses, offer relevant information, and guide users through
various healthcare tasks. This user-friendly interaction promotes engagement and accessibility, especially for
individuals who may not be familiar with technical or medical terminology. The combination of virtual health
assistance and NLP contributes to the democratization of healthcare information and services. Users, regardless
of their technical proficiency or medical knowledge, can easily access and navigate virtual health platforms.
NLP-driven virtual assistants bridge the gap between individuals and healthcare resources, offering on-demand
support, information retrieval, and assistance in managing various aspects of health and wellness.
2.6 Medical Coding and Billing
Natural Language Processing (NLP) has revolutionized medical coding and billing processes by automating
and refining the extraction of critical information from clinical documentation. In medical coding, where precise
codes are assigned to represent diagnoses, procedures, and services, NLP algorithms excel at interpreting and
understanding the complex language used in healthcare documents. These algorithms analyze unstructured
data, such as physician notes or operative reports, to automatically identify key details needed for accurate
coding, saving time and reducing the risk of errors associated with manual coding.
NLP in medical coding also addresses the challenge of standardizing medical terminology. Healthcare
professionals may use diverse expressions or synonyms to describe conditions and procedures, and NLP assists
in mapping these variations to standardized code sets. This standardization ensures consistency across coding
practices and facilitates seamless communication between healthcare providers, insurers, and regulatory entities.
The ability of NLP to process and interpret natural language enhances the precision and reliability of the coding
process, contributing to improved overall coding quality.
Beyond coding efficiency, NLP plays a vital role in optimizing the billing process. By swiftly converting
clinical narratives into billable codes, NLP accelerates the generation of claims, reducing billing cycle times and
supporting healthcare organizations in maintaining a streamlined and revenue-efficient operation. Additionally,
NLP’s capacity for fraud detection and compliance monitoring aids in identifying irregularities, reinforcing
ethical billing practices, and ensuring adherence to coding standards and regulations within the healthcare
industry.
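At its core, the terminology-standardization step described above is a mapping from many surface phrases to one canonical code. The toy example below uses a hand-made synonym table and a handful of ICD-10-style codes purely for illustration; real coders and coding engines work against the full ICD-10 or SNOMED CT terminologies and far more sophisticated matching.

# Illustrative synonym-to-code map; real systems use full ICD-10 / SNOMED CT terminologies.
SYNONYM_TO_CODE = {
    "hypertension": ("I10", "Essential (primary) hypertension"),
    "high blood pressure": ("I10", "Essential (primary) hypertension"),
    "type 2 diabetes": ("E11.9", "Type 2 diabetes mellitus without complications"),
    "t2dm": ("E11.9", "Type 2 diabetes mellitus without complications"),
}

def map_to_codes(note):
    """Return the billing codes whose synonym phrases appear in a free-text note."""
    text = note.lower()
    return {code: label for phrase, (code, label) in SYNONYM_TO_CODE.items() if phrase in text}

note = "Patient with a history of high blood pressure and T2DM, seen for routine follow-up."
for code, label in map_to_codes(note).items():
    print(f"{code}: {label}")   # I10 and E11.9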
2.7 Conclusion
In conclusion, the application of Natural Language Processing (NLP) in the medical field represents a
transformative leap towards a more efficient, accurate, and patient-centric healthcare landscape. From its role
in clinical documentation, where it deciphers the intricacies of medical language, to its impact on coding and
billing processes, streamlining administrative tasks, NLP has emerged as a powerful ally in the healthcare
domain.
The ability of NLP to navigate through vast volumes of unstructured data, interpret natural language queries,
and generate meaningful insights has revolutionized how healthcare professionals interact with information. As
we journey through this era of technological advancement, NLP stands out as a catalyst for progress, enabling
quicker and more informed decision-making, improving patient outcomes, and contributing to the overall
evolution of healthcare delivery.
References
Natural language processing in healthcare
Applications of NLP in healthcare: how AI is transforming the industry
Why Is Natural Language Processing Needed In Healthcare?
Clinical Documentation Improvement A Fresh Perspective
About the Author
Jinsu Ann Mathew is a research scholar in Natural Language Processing and Chemical Informatics.
Her interests include applying basic scientific research on computational linguistics, practical applications of
human language technology, and interdisciplinary work in computational physics.
Probabilities in Machine Learning
by Linn Abraham
airis4D, Vol.2, No.1, 2024
www.airis4d.com
3.1 Introduction
The concepts of randomness and probability are interlinked and at the same time quite important for a
fruitful understanding of Machine Learning (ML). You are bound to encounter them on your journey of being
a researcher in ML, if you have not already. However if you think you understand both these concepts at least
at the level of a definition, think again. The field of probability, for one, becomes more complicated the further
you try to understand it. It can also be counter-intuitive at times. You might be surprised to find that there is no
straightforward answer to the question of what is meant by “probabilities”. There are in fact entire schools of
thought when it comes to the interpretation of what probabilities are and how to use them. Two of the more
popular ones are known as ‘Frequentism’ and ‘Bayesianism’.
Christopher Bishop, in his highly acclaimed textbook, mentions probability theory, decision theory and
information theory as three important tools when it comes to learning machine learning. This article motivates
the need to pursue probability in machine learning by listing some common applications. Then it tries to give
an overview of the difficulties in understanding probabilities by listing some of the different interpretations of
probabilities. After that we introduce two concepts that are crucial to understanding probabilities: the concept of
a probability density function and Bayes' theorem, which is one of the most important concepts in probability
theory. The last two discussions are heavily inspired by the works of 3Blue1Brown.
3.2 Use in Machine Learning
Randomness and probabilistic methods come up a lot in machine learning. Here are a few of the more
common ones. There are learning algorithms that make use of randomness and probability concepts such as
the Random Forest, the Naive Bayes Classifier or even Bayesian Neural Networks. Genetic algorithms
are optimization algorithms inspired by randomness and natural selection. Gaussian processes are a type of
probabilistic model used for regression and classification tasks. Bootstrapping is a resampling technique used
in ensemble methods like bagging. Expectation Maximization or EM is an optimization algorithm used in
unsupervised learning especially in clustering and Gaussian Mixture Models (GMM). Dropout is a common
regularization technique used in neural networks that drops random neurons during the training process.
Randomness is also important during the training process for initialization of model weights, train-test
splitting of datasets, in optimization methods such as stochastic gradient descent, and even in random
search during hyper-parameter tuning. Often in ML, we try to interpret the outputs of activation functions etc.
as probabilities. A related aspect is how we design or interpret the common evaluation metrics that come up
in discriminative models, like sensitivity, specificity, etc. This is one area where we need to use concepts from
probability, such as Bayes' theorem, to properly understand the results.
3.3 Interpretations of probability
3.3.1 Two types of probabilities
There are two ways in which probabilities are used, which may be called physical (objective or frequency)
and evidential (Bayesian) probabilities. Physical probabilities are associated with physical systems that are
considered random in nature such as rolling dice or radioactive decay. There are events associated with such
systems that happen at a fixed rate or relative frequency. Physical probabilities are invoked to explain such stable
frequencies. This is the kind of probability that most people are introduced to during their primary education.
The other kind of probability, evidential probability, can be considered as a degree of belief in a statement as
supported by available evidence. It can be associated with any statement, even one without any kind of randomness
involved. This is a kind of probability that is used very informally in daily life but has more importance in
academia and research. The following are the four main interpretations of probabilities.
3.3.2 Classical Interpretation (A priori)
This interpretation is attributed to Pierre-Simon Laplace who assumed by what has been called the
“principle of insufficient reason”, that all possible outcomes of an event are equally likely if there are no reasons
to assume otherwise. The probability of an event is then the ratio of the number of outcomes favourable to the event
to the total number of possible outcomes.
3.3.3 Empirical Interpretation (A posteriori)
This interpretation is sometimes called Frequentist. However the term is also used to refer to “physical
probability” as a whole when contrasting it with Bayesian probabilities. According to this, the probability of an
event is the relative frequency of occurrence after repeating the process a large number of times under similar
conditions.
3.3.4 Subjective Interpretation
As mentioned above probabilities can be used as a measure of the degree of belief of an individual who is
assessing the uncertainty of a situation.
3.3.5 Axiomatic Interpretation
The mathematics of probability can be developed on an entirely axiomatic basis that is independent of any
interpretation.
3.4 Interpreting Probability Density Functions (PDF)
Probability density functions are ubiquitous in statistics and probability theory. To motivate the need for
such a function we need to shift our thinking from a discrete context to a continuous context. When you consider
experiments such as tossing a coin or rolling a die, the outcomes you are interested in are often discrete.
For example, the probability of getting two heads when a coin is tossed 5 times. However one can imagine a case
where the outcomes can take any value in a continuous range. In such a case no matter how small a value you
assign to each of these outcomes, you would run into a problem. The sum of all these would add up to infinity.
This means that in such a scenario it no longer makes sense to talk about probabilities of individual values.
However if we consider a histogram where the height of a bin represents the probability that the outcome is
within a range given by the bin width, then we no longer have a problem. It is the area of each bin that gives
the probability associated with that range. When you have bins of smaller and smaller widths you have a more
refined knowledge of the probabilities in each individual bin. And you also end up with a smooth curve. But
what is the height of this curve that we have ended up with? Since the width of a bin times the height is a
probability, the quantity on the y-axis would have a dimension of probability over length. Hence what we have
can be called a probability density. Since the probabilities are represented by areas of the individual rectangles,
the total area under the curve adds up to 1.
When trying to interpret the probability density function as mentioned, you might run into a problem.
The probability of getting an exact value, which is the area of a very thin slice becomes zero by definition. To
understand what is happening here, note that in the discrete case the probability of the outcome falling in
one out of a collection of values was just the sum of the individual probabilities. In the continuous case, the
probability of your outcome taking a range of values is no longer the sum of individual probabilities. Instead the
probabilities associated with the ranges themselves are the fundamental primitive objects we are dealing with
here. In order to make more sense of this you might have to learn a bit of measure theory, a branch of mathematics
which provides the foundational framework for the rigorous mathematical development of probability theory.
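The claims of this section can be checked numerically with a few lines of NumPy. The sketch below is an added illustration using the standard normal density as an arbitrary example; it approximates areas under the curve by summing height times bin width, so that the total area comes out as 1, the probability of a finite range is a finite area, and a zero-width slice carries zero probability.

import numpy as np

def pdf(x):
    """Standard normal probability density function."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-6, 6, 120_001)     # a fine grid of "bins"
dx = x[1] - x[0]                    # bin width
density = pdf(x)

total_area = np.sum(density * dx)                        # height * width summed over all bins
in_range = np.sum(density[(x >= 0) & (x <= 1)] * dx)     # area between x = 0 and x = 1
point_prob = pdf(0.0) * 0.0                              # a zero-width slice has zero area

print(f"total area under the curve : {total_area:.4f}")  # ~1.0000
print(f"P(0 <= X <= 1)             : {in_range:.4f}")    # ~0.3413
print(f"P(X = 0), zero-width slice : {point_prob:.4f}")  # 0.0000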
3.5 Bayes Theorem: Evidence should not determine beliefs but update them
3.5.1 When to use Bayes theorem
One of the most important formulas in probability theory is the Bayes formula. Before we try to understand
what the formula is telling us let us see when it would be useful to actually apply the formula. Suppose you are
shown the following description of a person called Steve: “Steve is very shy and withdrawn, invariably helpful
but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order
and structure, and a passion for detail.” If you are then asked which of the following you think more likely,
“Steve is a librarian” or “Steve is a farmer”, which one might you choose?
This example is part of a study conducted by the psychologists Daniel Kahneman and Amos Tversky. Their
study showed that most people go with the option of Steve being a librarian. Although it is true that these traits
align with the stereotypical view of a librarian than a farmer, the authors of the study show that it is irrational
to come to that judgement. Their point was that almost no one thought of using the information about the ratio
of farmers to librarians in their judgements. The question is not whether the participants had information about
those statistics but whether they thought of estimating it.
3.5.2 How to use Bayes theorem
Let’s say you did estimate the ratio of farmers to librarians and came up with the number 20:1. So let us
start with a sample size of 210 people, of which 200 are farmers and 10 are librarians. Next you think about
all the librarians you know and come to the estimate that 40 percent of all librarians fit the description and 10
percent of all farmers fit the same description. That is you are still saying that a librarian is 4 times more likely
to fit the description than a farmer. However, if you now try to find the probability that a random person who
fits the description is a librarian, it is given by
$$\frac{4 \text{ librarians who fit the description}}{4 \text{ librarians} + 20 \text{ farmers who fit the description}},$$
which is only about 16 percent, whereas the probability that the random person is a farmer is about 84
percent. The key takeaway here is that new evidence should not determine your beliefs in a vacuum; it should
update prior beliefs.
So the way to understand such a problem is to start with all the possibilities (210). Seeing the evidence then restricts
the space of possibilities (4 + 20).
3.5.3 Motivating the Bayes formula
How can we generalize from this example to reach a formula? The general scenario is the following.
You have some hypothesis (or model) like the hypothesis that “Steve is a Librarian” or “The coin is fair”. You
have some evidence (data or observation) like the description about Steve that you obtained. You are mostly
interested in knowing the probability that your hypothesis is true given the evidence. This is what you computed
quite easily in the example above: the probability that “Steve is a Librarian” given the description about Steve,
which we can now denote P(H|E).
How did we arrive at the numerator which was 4? By first calculating the probability that the hypothesis
holds without considering any of the evidence. This is called the prior and denoted P (H). In our case we
calculated the prior by taking the ratio of all librarians to farmers which came to be 1/21. After that we used
the proportion of librarians that fit the description. This is the probability of seeing the evidence given the
hypothesis is true. That is, if it is given that Steve is a librarian, we then limit our space of possibilities to the
space of all librarians and ask how likely it is for a random librarian to have this description. This is also called
the likelihood and in our case it was 4/10. Now by multiplying the total number of people times the prior times
the likelihood we obtain the numerator.
In the denominator we again have the same term, but also an additional term which contains the likelihood
of observing the evidence if the hypothesis is not true, which in our case was 1/10. Along with this we need to
multiply the prior corresponding to the case where our hypothesis is not true and the total number of people.
So finally putting all these words into an equation we have,
$$P(H|E) = \frac{P(H)\,P(E|H)}{P(H)\,P(E|H) + P(\neg H)\,P(E|\neg H)}$$
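Plugging the numbers from the Steve example into this formula reproduces the result obtained above by counting people. The short Python check below is an added illustration, not part of the original article.

def posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Bayes' theorem: P(H|E) = P(H)P(E|H) / [P(H)P(E|H) + P(not H)P(E|not H)]."""
    numerator = prior_h * likelihood_e_given_h
    return numerator / (numerator + (1 - prior_h) * likelihood_e_given_not_h)

# 1 librarian for every 20 farmers; 40% of librarians and 10% of farmers fit the description.
p = posterior(prior_h=1 / 21, likelihood_e_given_h=0.4, likelihood_e_given_not_h=0.1)
print(f"P(librarian | description) = {p:.3f}")   # 0.167, the "roughly 16 percent" above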
References
[1] Christopher M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics.
Springer, New York, 2006. ISBN 978-0-387-31073-2.
[2] Probability interpretations. https://en.wikipedia.org/wiki/Probability_interpretations.
[3] 3blue1brown. https://www.3blue1brown.com/, a.
[4] Binomial distributions: Probabilities of probabilities, part 1. https://www.3blue1brown.com/lessons/binomial-distributions, b.
[5] Bayes theorem. https://www.3blue1brown.com/lessons/bayes-theorem, c.
About the Author
Linn Abraham is a researcher in Physics, specializing in A.I. applications to astronomy. He is
currently involved in the development of CNN-based computer vision tools for the classification of astronomical
sources from PanSTARRS optical images. He has used data from several large astronomical surveys, including
SDSS, CRTS, ZTF and PanSTARRS, for his research.
Part II
Astronomy and Astrophysics
Particle Paths in General Relativity
by Ajit Kembhavi
airis4D, Vol.2, No.1, 2024
www.airis4d.com
In Black Hole Stories-4, we have considered the possible trajectories of a particle of mass m in the
gravitational field of a body with mass M, which exerts a gravitational force on the particle. We found that
the shape of the trajectory, i.e. its orbit, depends on the energy and angular momentum of the particle, which
are both constant. When the total particle energy is negative we have elliptical orbits, a particle with positive
energy has a hyperbolic orbit, while a particle with zero total energy has a parabolic orbit. In Black Hole
Stories-5 (BH5) we considered how we could make a transition from Newtonian gravity and mechanics to the
more complex situation of Einstein's theory of gravitation, which is the general theory of relativity. In Einstein's
theory the space-time is curved, and we have to reinterpret basic notions like the distance r of the particle
from the centre of the coordinates. We also considered how conserved, i.e. constant quantities like energy and
angular momentum can be defined in general relativity. In the present story, we will use the results from BH-4
and BH-5 to study the shape of particle orbits in the theory.
1.1 The Effective Potential in General Relativity
In BH-4, we used the conservation of the energy E and the angular momentum L to write down two
equations for the Newtonian case:
$$\frac{1}{2}\left(\frac{dr}{dt}\right)^2 = E - V_e(r),$$
where the effective potential $V_e(r)$ is given by
$$V_e(r) = -\frac{GM}{r} + \frac{L^2}{2r^2}.$$
We now need similar equations for general relativity. The situation we are addressing is the gravitational
influence exerted by a point particle of mass M, which has no other properties except the mass. Such a mass
represents a black hole with no spin. As described in BH5, in this case we have to use the Schwarzschild solution.
We have learned in BH5 that in general relativity, particles in a gravitational field move along geodesics. The
equations of geodesics for the Schwarzschild case have to be worked out, which can be a complex undertaking.
But the net result is simple, and the equations describing the geodesics, corresponding to the two equations
above, are given by
$$\frac{1}{2}\left(\frac{dr}{d\tau}\right)^2 = E - V_{e,S}(r),$$
where the effective potential $V_{e,S}$ for the Schwarzschild case is given by
$$V_{e,S}(r) = -\frac{GM}{r} + \frac{L^2}{2r^2} - \frac{GML^2}{r^3}.$$
The corresponding equation for the angular coordinate $\phi$ is
$$\frac{d\phi}{d\tau} = \frac{L}{r^2}.$$
Figure 1: The effective potential for the Newtonian and Schwarzschild case.
In the above equations we have set the speed of light c = 1 to simplify them, as is the custom in relativistic
calculations. The correct value of c can be reintroduced for numerical results. As explained in BH4, because
the angular momentum L is conserved, the orbit is in a plane, so the other angular coordinate θ is constant,
which we can take to be θ = 0. These new equations are similar to the earlier Newtonian ones, but there are some
important differences. The time t used in the Newtonian case has been replaced by the proper time τ, which is
the time as measured by a clock moving with the particle m. The effective potential $V_{e,S}$ has an extra term
$-GML^2/r^3$, which significantly changes the shape of the potential for small values of r, as can be seen from Figure 1.
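The difference in shape between the two potentials is easy to reproduce numerically. The sketch below is added here for illustration; it uses geometrized units G = M = c = 1, an arbitrary angular momentum, and assumes the standard Schwarzschild form with the $-GML^2/r^3$ term. It evaluates both potentials and locates the stable and unstable circular orbits from the turning points of $V_{e,S}$.

import numpy as np

# Geometrized units G = M = c = 1, so r is measured in units of GM/c^2.
L = 4.0                                   # illustrative specific angular momentum
r = np.linspace(2.1, 60.0, 5000)

V_newt = -1.0 / r + L**2 / (2 * r**2)                  # Newtonian effective potential
V_schw = -1.0 / r + L**2 / (2 * r**2) - L**2 / r**3    # Schwarzschild effective potential

# Turning points of V_schw: dV/dr = 0 gives r^2 - L^2 r + 3 L^2 = 0 (with G = M = c = 1);
# the larger root is the stable circular orbit at the minimum (red dot in Figure 1),
# the smaller root the unstable circular orbit at the maximum (blue dot).
r_stable = (L**2 + np.sqrt(L**4 - 12 * L**2)) / 2
r_unstable = (L**2 - np.sqrt(L**4 - 12 * L**2)) / 2

print(f"stable circular orbit at   r = {r_stable:.1f} GM/c^2")      # 12.0 for L = 4
print(f"unstable circular orbit at r = {r_unstable:.1f} GM/c^2")    # 4.0 for L = 4
print(f"at r = 2.1: V_schw = {V_schw[0]:.2f} (plunging) vs V_newt = {V_newt[0]:.2f} (barrier)")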
The Schwarzschild effective potential $V_{e,S}$ has three terms: the Newtonian gravitational attraction term
$-GM/r$, the repulsive centrifugal force term $L^2/2r^2$ and the attractive general relativistic term $-GML^2/r^3$. The Newtonian
potential $V_e$ has only the first two terms. As can be seen from Figure 1, the shapes of the Newtonian and general
relativistic effective potentials are quite different at small values of r. As described in BH4, $V_e$ has a minimum at
the position of the green dot on it, so a potential well is created, and bound elliptical orbits, like the orbit of the
Earth around the Sun, become possible. The energy of such an orbit corresponds to the dashed black line. At
small values of r the potential $V_e$ rises steeply, going to infinity at r = 0. So a particle with even a very small
angular momentum would meet the barrier and move back to large values of r. Only a particle with zero angular
momentum can ever reach r = 0.
The Schwarzschild potential also has a minimum, indicated by a red dot. So bound orbits, where the
particle m always stays close to M, are possible. Such an orbit corresponds to the dashed blue line in the figure.
But the shape of $V_{e,S}$ is quite different for small values of r. As r decreases, $V_{e,S}$ passes through a maximum,
turns over and plunges towards infinitely negative values. As a result, given any angular momentum L, for
sufficiently large incoming energy E, the orbit of the particle goes over the maximum and the particle plunges
to r=0, never to return. Such an orbit corresponds to the solid blue line in the figure, drawn from large values
of r to r=0. Examples of some orbits in the Schwarzschild potential are shown in Figure 2 and Figure 3.
Figure 2: Examples of orbits in the Schwarzschild effective potential. Figure courtesy physicsforum.com
In Figure 2, the upper left panel shows $V_{e,S}$ for a given value of the angular momentum L and energy E of the
particle m. The solid black line represents the energy of the particle, and the point r at which it meets the
potential is the minimum distance from r = 0 that the particle can approach. The orbit of the particle in space,
obtained by solving the equation for ϕ, is shown in the upper right panel. The particle comes from infinity, swings
around M and then goes to infinity again. This is similar to the corresponding orbit in the Newtonian case,
which we have described in detail in BH4. In the left lower panel, the energy of the particle, again indicated
by the solid black line, is such that the particle passes over the maximum of the potential and reaches r=0. The
shape of the orbit in space, shown in the lower right panel indicates that in this case the particle now spirals in
towards r = 0 and eventually falls into the black hole. Such an orbit is not possible in the Newtonian case.
1.2 Bound Orbits in the Schwarzschild Effective Potential
When a particle is trapped in the gravitational potential well seen in Figure 1, it has a bound orbit, that is it
always remains at a finite distance from the centre of the gravitational field. In such a case the energy of particle
m is negative, so that it cannot escape to infinity. In the Newtonian case bound orbits are elliptical in shape in
general, with the gravitating particle located at one focus of the ellipse. In the special case when the energy
of m is such that it is at the minimum of the potential indicated by the green dot, the orbit is circular in shape,
since r is constant. In the Schwarzschild case, a circular orbit is again possible when the energy of the particle
is such that it is placed at the red dot in Figure 1. But now a circular orbit is also possible when the energy of
the particle places it at the maximum of the potential indicated by the blue dot. Here r is again constant, so the
orbit is circular. But now the orbit is unstable, in the sense that if the particle is pushed a bit towards lower or
higher values of r, it will go into another orbit.
In the Newtonian case, bound orbits are closed, in the sense that when a particle completes one circuit
around the centre, it arrives at precisely the point it started from. This results in an elliptical orbit which
is stable in space. That is not true for the Schwarzschild case, where because of the presence of the general
relativistic gravitational term, there is a slow precession of the orbit.
Figure 3: A precessing orbit. This figure, by Kaushal Sharma, is from the book Gravitational Waves, A New Window to The Universe by Ajit Kembhavi & Pushpa Khare, published by Springer.
This is best understood by considering a
specific point along the orbit known as the periastron, which is the point in space at which the mass m is closest
to M. In the Newtonian case, the position of the periastron remains constant and m returns to it after every
orbit. But in the Schwarzschild case the position of the periastron shifts slowly, with the effect appearing like
the slow rotation of an elliptical orbit. An example of a precessing orbit is shown in Figure 3. Here the mass M
and m appear to be extended, but their size is so small compared to the extent of the orbit that the masses can
be considered to be point particles, so the figure applies to a point particle in orbit around a black hole. The
labelled points in the figure are the positions of the periastron for successive orbits. The appearance over time
is that of a rotating ellipse.
1.3 Precession of the Perihelion of Mercury
When Albert Einstein announced his General Theory of Relativity, he made three predictions on the basis
of his equations: the gravitational redshift of light, the gravitational bending of light and the precession of the
perihelion of Mercury. The word perihelion has the same meaning as periastron. It indicates the closest distance
of approach to the Sun by a planet in the Solar system. The planet Mercury is the closest of all planets to the Sun.
If it were the only planet in the system, then by Newton's theory, it should be moving in a perfect ellipse around the
Sun. But observations show that the orbit is actually precessing around the Sun, with a small precession rate of
about 575 arcseconds/century. It has been known since the late 19th century that all but 43 arcseconds/century
of this precession can be attributed to the gravitational effect of the other planets in the Solar system on the orbit
of Mercury. This very small residual precession, which amounts to one degree of precession in 83 centuries,
is exactly explained by general relativity as being due to the third term in the above equation for (dr/dτ). The
explanation was considered to be a great triumph for the theory. The orbits of the planets beyond Mercury too
undergo precession due to the gravitational force of the other planets, but the general relativistic effect is too
small to be observed, because of the greater distance of these planets from the Sun.
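The size of the relativistic effect can be estimated from the standard result that the periastron advance per orbit is 6πGM/(c^2 a (1 - e^2)), where a and e are the semi-major axis and eccentricity of the orbit. The short Python sketch below evaluates this for Mercury; the orbital elements plugged in are approximate textbook values supplied here purely for illustration, not figures taken from the article.

import math

# Perihelion advance per orbit from general relativity (radians per orbit):
#   delta_phi = 6 * pi * G * M / (c**2 * a * (1 - e**2))
G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m s^-1
M_sun = 1.989e30       # kg

a = 5.791e10           # Mercury's semi-major axis, m (approximate)
e = 0.2056             # Mercury's orbital eccentricity (approximate)
P_days = 87.97         # Mercury's orbital period, days (approximate)

dphi = 6 * math.pi * G * M_sun / (c**2 * a * (1 - e**2))   # radians per orbit

orbits_per_century = 100 * 365.25 / P_days
arcsec_per_century = dphi * orbits_per_century * (180 / math.pi) * 3600
print(f"GR perihelion precession of Mercury ~ {arcsec_per_century:.1f} arcsec/century")
# Prints a value close to 43 arcsec/century, the residual explained by general relativity.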
There are other systems in which the precession of the periastron is much greater than for Mercury. One
of these is the binary pulsar PSR B1913+16, which was discovered by Joseph Taylor and Russell Hulse in 1974.
It consists of a binary system of two neutron stars which are in an elliptical orbit around each other. One of
the neutron stars is a radio pulsar, the radio observations of which allow the orbital parameters to be measured
accurately. The neutron stars are very compact but massive, with a radius of ~10 km and a mass of ~1.4 times the
mass of the Sun. The periastron for the orbit (which is the closest distance between the two neutron stars)
is ~770,000 km, which is very much smaller than the perihelion distance of Mercury, which is 46 million km. We
can therefore expect that the precession of the periastron for the binary neutron system is very large compared
to its value for Mercury, and indeed it is found to be ~4.2 degrees per year, which is ~8600 times the perihelion
precession for Mercury.
There is an important consequence of the discovery of the binary pulsar. In 1916, a year after the publication
of the general theory of relativity, Einstein predicted the existence of gravitational waves. The compact, massive
binary neutron star system is an ideal engine for the production of these waves. Emission of the gravitational
waves leads to loss of energy from the system, which therefore shrinks in size with time, and the orbital period
decreases at a certain rate. The observed pattern of decrease of the period very closely matches the
prediction of general relativity, thus establishing the reality of gravitational waves. The direct detection of the
waves by the LIGO detectors of course had to wait until 2015. For their discovery of the binary pulsar, Hulse
and Taylor were awarded the Nobel Prize in physics in 1993.
Another interesting example of periastron precession is provided by a supermassive object which is located at the
centre of our Galaxy, the Milky Way. It has been known for a long time that the centre of our galaxy is coincident
with a very compact radio source known as Sagittarius A* (Sgr A*). Since this source shows very little motion
over the years, it was suspected that it has a very large mass. If that is true, then the presence of the mass should
be influencing the motion of stars in its vicinity. Over a period spanning almost two decades, two groups of
astronomers, one led by Andrea Ghez at the University of California in Los Angeles, and the other by Reinhard
Genzel of the Max Planck Institute for Extraterrestrial Physics in Munich, made precision measurements of the
positions of stars around Sagittarius A*. The orbit that they determined for a particular star labelled S2 is shown
in Figure 4.
The observations established that the orbit of S2 is a near perfect ellipse, with Sgr A* located at one focus
of the ellipse, as per Kepler's laws, with an orbital period of 16.05 yr. Using Newtonian mechanics, the mass of
Sgr A* was determined to be 4.3 million Solar masses. The periastron distance is about 18 billion km. The
very close distance of approach to the large mass means that the object should be very compact, and it can be
argued that the object is a supermassive black hole. A careful look at the upper portion of the ellipse in Figure
4 shows that even though a whole orbit has been completed, the ellipse does not close on itself. The small
deviation is significantly larger than the measurement errors. It was believed for some time that the deviation is
caused by a small motion of the compact object, which would perturb the elliptical orbit, leading to the observed
deviation. But later observations with improved precision have shown that the deviation can be ascribed
to a general relativistic periastron precession of the orbit of S2, at the rate of 12.1 arcminutes per orbit, which
amounts to 0.0125 degrees per year. This is much larger than the perihelion precession of Mercury, but smaller
than the precession for the binary pulsar. An artist's impression of the precession is shown in Figure 5. For their
work, Andrea Ghez and Reinhard Genzel were awarded the Nobel Prize in physics in 2020. They
shared half the prize, while the other half was awarded to Roger Penrose for his theoretical work on black holes.
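As a quick arithmetic check on the rates quoted in this article, the per-orbit precession of S2 can be converted to degrees per year using its 16.05-year period and compared with the figures for Mercury and the binary pulsar; a minimal sketch using only numbers given in the text:

# Convert the quoted precession values to a common unit (degrees per year).
s2_per_orbit_arcmin = 12.1                        # periastron precession of S2 per orbit
s2_period_yr = 16.05                              # orbital period of S2
s2_deg_per_yr = (s2_per_orbit_arcmin / 60.0) / s2_period_yr
print(f"S2: {s2_deg_per_yr:.4f} deg/yr")          # ~0.0126 deg/yr, consistent with the quoted 0.0125

mercury_deg_per_yr = 43.0 / 3600.0 / 100.0        # 43 arcsec per century
pulsar_deg_per_yr = 4.2                           # PSR B1913+16
print(f"Mercury: {mercury_deg_per_yr:.2e} deg/yr, binary pulsar: {pulsar_deg_per_yr} deg/yr")

S2's rate comes out roughly a hundred times Mercury's and a few hundred times smaller than the binary pulsar's, consistent with the comparison made in the text.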
Figure 4: The orbit of star S2 around the compact object at Sgr A*. The blue points are for the observations
by the Max Planck group and the red points are for data collected by the group at the University of California.
Figure 5: Artist's impression of the precession of the orbit of S2 around the supermassive compact object. The
effect is exaggerated for clarity. Image courtesy ESO/L. Calcada.
Next Story: We will describe the orbits of photons around black holes in our next story.
About the Author
Professor Ajit Kembhavi is an emeritus Professor at Inter University Centre for Astronomy and
Astrophysics and is also the Principal Investigator of the Pune Knowledge Cluster. He was the former director
of the Inter University Centre for Astronomy and Astrophysics (IUCAA), Pune, and a former vice president of the
International Astronomical Union. In collaboration with IUCAA, he pioneered astronomy outreach activities from the late
80s to promote astronomy research in Indian universities. The Speak with an Astronomer monthly interactive
program to answer questions based on his article will allow young enthusiasts to gain profound knowledge about
the topic.
Unveiling the Mysteries of Supergiant Fast
X-ray Transients
by Sindhu G
airis4D, Vol.2, No.1, 2024
www.airis4d.com
2.1 Introduction
In the dynamic realm of celestial phenomena, a captivating discovery has emerged in recent years: the
enigmatic class of Supergiant Fast X-ray Transients (SFXTs). These extraordinary celestial objects, brought
to light through the insights of the INTEGRAL (INTErnational Gamma-Ray Astrophysics Laboratory) mission, have ignited
scientific curiosity and opened new avenues for exploring the high-energy universe. Before delving into SFXTs,
it’s crucial to understand the context of their origin within the broader category of high mass X-ray binary
systems (HMXBs). HMXBs are binary star systems consisting of a massive early-type star and a compact
object, such as a neutron star or black hole, which accretes material from its companion. The majority of
recognized high mass X-ray binary systems are Be/X-ray binaries, systems in which a neutron star accretes
material from the disk surrounding a Be star. The second significant categorization among high mass X-ray
binaries involves supergiant X-ray binaries, where one of the celestial bodies is a supergiant, and the other is a
compact object, such as a neutron star or a black hole.
The launch of the International Gamma-Ray Astrophysics Laboratory (Fig: 1) in October 2002 led to
a significant shift, signifying the identification of a noteworthy quantity of high mass X-ray binary systems
featuring supergiant companion stars. This achievement was realized through the surveillance of the Galactic
center and the Galactic plane utilizing the onboard IBIS/ISGRI instruments. The majority of these sources
were documented by Bird et al. (2007) and Bodaghee et al. (2007). These investigations brought to light two
prominent characteristics that distinguished them from previously known Supergiant X-ray Binaries (SGXBs).
Firstly, many of these sources exhibited significant intrinsic absorption levels, surpassing the interstellar norm.
Secondly, some of these newly discovered sources displayed a transient nature, occasionally featuring fast X-ray
transient activity lasting only a few hours. Consequently, the INTEGRAL supergiant high mass X-ray binary
systems seem to fall into two categories: one comprising significantly obscured persistent sources (as discussed
in, for instance, Chaty & Rahoui 2007), and the other consisting of transient sources referred to as Supergiant
Fast X-ray Transients (SFXTs, as coined by Negueruela et al. 2006). Over the past few years, there has been a
notable surge in interest surrounding SFXTs, primarily driven by their distinct characteristics and the intricate
challenges they present in understanding accretion processes and X-ray emission.
Figure 1: INTEGRAL, ESA's International Gamma-Ray Astrophysics Laboratory. Credit: ESA
2.2 Supergiant Fast X-ray Transients
Supergiant fast X-ray transients (SFXTs) are a class of high-mass X-ray binaries that exhibit distinct
characteristics compared to classical supergiant HMXBs. Unlike the latter, which display luminosity variations
of 10–50 times over time scales ranging from a few hundred to thousands of seconds, SFXTs (Fig: 2) showcase a
remarkable dynamic range up to 10^5 times larger. We should take into account that traditional Supergiant X-ray
Binaries (SGXBs) are consistently bright X-ray sources, whereas Supergiant Fast X-ray Transients (SFXTs)
are transient sources characterized by extremely short duty cycles. Despite being generally sub-luminous in
comparison to classical supergiant HMXBs, SFXTs can reach a dynamic range of up to six orders of magnitude,
with quiescent levels around 10^32 erg s^-1 and peak luminosities reaching up to 10^38 erg s^-1 during rare
outbursts that can last several days, marked by brief, intense flares lasting only a few hours.
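The quoted dynamic range is simply the ratio of peak to quiescent luminosity; a one-line check in Python using the representative values given above:

import math

L_quiescent = 1e32   # erg/s, typical quiescent level quoted above
L_peak = 1e38        # erg/s, peak luminosity during rare outbursts
print(f"dynamic range ~ {L_peak / L_quiescent:.0e}, "
      f"i.e. {math.log10(L_peak / L_quiescent):.0f} orders of magnitude")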
In contrast to the longer timescales observed in Be/X-ray binaries, during SFXT outbursts, the hard X-ray
spectra follow power laws with high-energy cut-offs, resembling those of HMXBs. It is commonly assumed
that SFXTs may host a neutron star (NS), akin to classical systems with accreting NS. The precise physics
underlying SFXT outbursts remains unknown, but potential explanations include the influence of the supergiant
companion's wind properties or intrinsic characteristics of the compact object, such as an NS, possibly involving
mechanisms that inhibit accretion. The latest advancement involves a subsonic settling accretion pattern coupled
with magnetic reconnections occurring between the neutron star (NS) and the supergiant's magnetic field, which
is transported by the wind. The supergiant's powerful winds play a crucial role in the drama. Imagine a hurricane
of hot, ionized gas constantly streaming off the star’s surface. This stellar wind provides the fuel for the X-ray
spectacle. As the wind encounters the gravitational pull of the compact object, it gets channeled into an accretion
disk, a swirling maelstrom of matter that heats up to millions of degrees and emits intense X-rays.
Some of the key features of SFXTs include:
Hard X-ray emissions: SFXTs manifest hard X-ray emissions, showcasing an average 2-10 keV luminosity
during outbursts ranging from approximately 10^32 to 10^34 erg/s.
Extreme transience: SFXTs display brief outbursts, markedly shorter than those observed in typical Be/X-
ray binaries.
Figure 2: Artist's impression of a supergiant fast X-ray transient. Credit: ESA
Figure 3: A typical outburst from a SFXT. INTEGRAL lightcurve for IGR J17544-2619 during the flare on
2003 September 17. Credit: I. Negueruela et al.
Quiescent X-ray activity: Even during periods outside of outbursts, SFXTs demonstrate low-level X-ray
activity, indicating the ongoing accretion of matter during quiescent phases.
Pulsations: Certain SFXTs, such as AX J1841.0-0536, have been observed to exhibit pulsations with periods
spanning from a few seconds to minutes.
2.3 Formation and Evolution
Understanding the formation and evolution of SFXTs remains a complex puzzle. Nevertheless, it is
probable that all SFXT members are high-mass X-ray binaries (HMXBs) harboring a neutron star. The highly
transient nature of SFXTs is believed to be influenced by factors such as clumpy supergiant winds, accretion
barriers, orbital geometries, and wind anisotropies.
Observing the eruption of a Supergiant Fast X-ray Transient (SFXT) is a truly remarkable spectacle. The
X-ray luminosity experiences an astonishing surge, escalating by a factor of a million and casting a brilliant glow
of high-energy radiation over the surrounding region. Typically enduring for a few hours to days, the outburst
gradually subsides, fading away into the cosmic night. These flares transcend mere random flashes; frequently,
they unveil complex structures featuring multiple peaks and dips, offering glimpses into the intricate interplay
of forces at work.
2.4 Importance of the Study of SFXTs
The exploration of SFXTs has reaped the benefits of diverse observational campaigns, including those
carried out with instruments such as the Swift satellite. These observations have afforded researchers the
opportunity to systematically monitor SFXTs, studying their behavior during non-outburst periods. This has
yielded crucial insights into the intricacies of their accretion processes and the properties of X-ray emissions.
Delving into the study of SFXTs is instrumental in enhancing our understanding of the intricate physics governing
accretion onto compact objects, deciphering the behaviors exhibited by supergiant stars, and unraveling the
evolutionary pathways of massive binary systems. The examination of SFXTs during outburst events serves
as a means to explore the conditions within the accretion disk and scrutinize theoretical models. Furthermore,
SFXTs hold the potential to be precursors to gamma-ray bursts, some of the most powerful explosions observed
in the universe.
2.5 The Future Unfolds
In spite of significant strides in comprehending SFXTs, several unresolved questions persist, including the
origin of their intense flaring X-ray activity and the role played by their supergiant companions in the accretion
process. Ongoing research endeavors and observational studies are geared towards addressing these inquiries,
aiming to deepen our understanding of the distinctive characteristics and behavior of Supergiant Fast X-ray
Binaries (SFXTs). The field of SFXT research is still in its early stages, harboring many enigmas waiting
to be unraveled. Future missions, such as the Athena observatory, hold the promise of delivering enhanced
X-ray vision, enabling a more intricate examination of these outbursts. Through sustained observations and
advancements in theoretical frameworks, we anticipate unraveling the secrets of these cosmic dancers, thereby
shedding light on the tumultuous yet delicate interplay between supergiants and their compact companions.
References:
High-Mass X-ray binary: Classification, Formation, and Evolution
Optical/infrared observations unveiling the formation, nature and evolution of High-Mass X-ray Binaries
A catalogue of high-mass X-ray binaries in the Galaxy: from the INTEGRAL to the Gaia era
IGR J18483-0311: a new intermediate supergiant fast X-ray transient
Integral reveals new class of 'supergiant X-ray binary stars'
Formation and evolution of compact stellar X-ray sources
The INTEGRAL mission
SUPERGIANT FAST X-RAY TRANSIENTS: A NEW CLASS OF HIGH MASS X-RAY BINARIES
UNVEILED BY INTEGRAL
Advances in Understanding High-Mass X-ray Binaries with INTEGRAL and Future Directions
The 100-month Swift catalogue of supergiant fast X–ray transients
Supergiant Fast X-ray Transients: an INTEGRAL view
THE THIRD IBIS/ISGRI SOFT GAMMA-RAY SURVEY CATALOG
A description of sources detected by INTEGRAL during the first 4 years of observations
Multi-wavelength observations of Galactic hard X-ray sources discovered by INTEGRAL. I. The nature
of the companion star
About the Author
Sindhu G is a research scholar in Physics doing research in Astronomy & Astrophysics. Her research
mainly focuses on the classification of variable stars using different machine learning algorithms. She also works
on period prediction for different types of variable stars, especially eclipsing binaries, and on the study of optical
counterparts of X-ray binaries.
Part III
Biosciences
An Introduction to Biodiversity and its
Conservation
by Geetha Paul
airis4D, Vol.2, No.1, 2024
www.airis4d.com
[Image courtesy]
1.1 Introduction
Biodiversity (the diversity of life), derived from "bio" meaning life and "diversity" meaning variety, encompasses
the wide range of living species on Earth, including plants, animals, microorganisms, and the ecosystems in
which they exist. It is essential for the processes that support all life on Earth, providing us with the air we
breathe, the water we drink, and the food we eat. Biodiversity is defined as the variety and variability among
all living species on Earth, including plants, animals, microorganisms, and the ecosystems in which they live.
So far, we have identified around 1.6 million species but that is probably only a small fraction of the forms
of life on Earth. Ecosystems are the communities of living species that interact with one another and their
physical environment. Biodiversity encompasses the diversity within species and between different species
within terrestrial, freshwater, and marine ecosystems. Ecosystems require a balanced and diverse number of
species to thrive. Biodiversity is essential for the processes that support all life on Earth, including humans, as
it provides us with the air we breathe, the water we drink, and the food we eat. It also plays a critical role in
sustaining human populations across the globe, providing us with medicine and shelter. However, biodiversity
is under threat due to human activities such as habitat destruction, pollution, and climate change. It is vital for
us to conserve biodiversity to ensure the continued survival of all living things on our planet.
1.2 Levels of Biodiversity
There are three different levels in Biodiversity. They are Ecosystem diversity, Species diversity and Genetic
diversity.
Figure 1: An illustration of three different levels in biodiversity.
[Image courtesy]
1.2.1 Ecosystem diversity
It is the diversity in different habitats, niches, species interactions and assemblage of species living in the
same area and interacting with an environment. The ecosystem also shows variation with respect to physical
parameters like moisture, temperature, altitude, precipitation etc.
1.2.2 Species diversity
It is the diversity between the different species present in an ecosystem and the relative abundance of each of those
species. A discrete group of organisms of the same kind constitutes a species, and the sum of the varieties of all
living organisms at the species level is known as species diversity.
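In quantitative terms, species diversity is often summarised from survey counts as the species richness (the number of species recorded) together with the relative abundance of each species. A minimal Python sketch, using made-up survey counts purely for illustration:

# Hypothetical survey counts of individuals per species in one ecosystem.
survey_counts = {"species_A": 120, "species_B": 45, "species_C": 30, "species_D": 5}

richness = len(survey_counts)                       # number of distinct species
total = sum(survey_counts.values())
relative_abundance = {sp: n / total for sp, n in survey_counts.items()}

print(f"species richness: {richness}")
for sp, p in relative_abundance.items():
    print(f"  {sp}: {p:.2%} of individuals")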
1.2.3 Genetic diversity
It is the diversity that exists within individual species, manifesting in various varieties that exhibit slight differences resulting
from unique combinations of genes. Genes, as the fundamental units of hereditary information, are passed from
one generation to the next, facilitating the adaptation of species populations to environmental changes.
1.3 Importance of Biodiversity
Biodiversity is essential for the processes that support all life on Earth, including humans. Without a wide
range of animals, plants and microorganisms, we cannot have the healthy ecosystems that we rely on to provide
us with the air we breathe and the food we eat. And people also value nature. About 80,000 plant species are edible,
and about 90% of present-day food crops have been domesticated from the wild. About 75% of the world's
population depend on plants and plant extracts for drugs, medicines and latex. Many plants, like Tulsi,
Lotus and Peepal, are considered sacred and holy. Forest wood has been used for ages as fuel. Fossil fuels are
also products of Biodiversity. Pollinators such as birds, bees, and other insects are estimated to be responsible
for a third of the world’s crop production. Without pollinators, we would not have apples, cherries, blueberries,
almonds, and many other foods we eat. Invertebrates, such as soil microbes, are crucial for maintaining soil
health and fertility. Soil is teeming with microbes that are vital for liberating nutrients that plants need to
grow, which are then also passed to us when we eat them. We also rely on plants and animals for food, while
trees, bushes, wetlands, and wild grasslands naturally slow down water and help soil absorb rainfall,
reducing flooding. Trees and other plants clean the air we breathe and help us tackle the global challenge of
climate change by absorbing carbon dioxide. Coral reefs and mangrove forests act as natural defenses protecting
coastlines from waves and storms. Spending time in nature is increasingly understood to lead to improvements
in people’s physical and mental health. Simply having green spaces and trees in cities has been shown to
decrease hospital admissions, reduce stress, and lower blood pressure.
1.4 Hotspots of Biodiversity
Biodiversity hotspots are regions with a high concentration of indigenous species and are considered a
high priority for conservation due to their high endemism and significant vulnerability. An area is
designated as a hotspot when it contains at least 0.5% of the world's plant species as endemics. There are 25 such hotspots of
biodiversity on a global level, out of which two are present in India (the Eastern Himalayas, and the Western Ghats and
Sri Lanka). Hotspots are determined based on specific criteria, such as exceptionally high biodiversity levels,
Sri Lanka). Hotspots are determined based on specific criteria, such as exceptionally high biodiversity levels,
the presence of endemic species, the degree of existing threats, and ecological significance. These precise
criteria enable the targeted identification of areas that require conservation efforts to protect their crucial role in
maintaining global biodiversity.
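The designation criterion mentioned above can be written as a simple check. The sketch below is only illustrative: the 0.5% endemism figure is the one quoted in this section, the 70% habitat-loss threshold is the figure given later in the article, and the total number of described plant species and the example inputs are rough, hypothetical working numbers.

# Simplified hotspot check based on the criteria described in the text:
# endemics amounting to at least 0.5% of the world's plant species, plus severe habitat loss.
WORLD_PLANT_SPECIES = 300_000        # rough working figure, for illustration only

def is_biodiversity_hotspot(endemic_plant_species: int, habitat_loss_fraction: float) -> bool:
    endemism_ok = endemic_plant_species >= 0.005 * WORLD_PLANT_SPECIES   # roughly 1,500 species
    threat_ok = habitat_loss_fraction >= 0.70                            # 70% loss of original habitat
    return endemism_ok and threat_ok

# Hypothetical region with 2,000 endemic plant species and 75% habitat loss.
print(is_biodiversity_hotspot(2000, 0.75))   # True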
Figure 2: Regions of Biodiversity Hotspots
[Image courtesy]
1.5 Biodiversity Estimations
Measuring biodiversity is a complex task due to the vast number of species and their varying levels of
abundance. The most common approach is to count species, but it has been estimated that 84% of species
may still be unidentified. Scientists use different sampling techniques, surveys, and technologies, ranging from
simple hand-held magnifying lenses to satellite images and DNA sampling. For larger animals, plants, and
ecosystems, well-established measures such as field surveys and the Living Planet Index are used. However,
for smaller creatures like invertebrates and microbes, our understanding of their populations and the number of
species is much less, although DNA sampling is advancing rapidly in this area. Biodiversity can be seen as the
irreducible complexity of all life, and no single objective measure is possible, only measures relative to some
particular purpose. Habitat complexity has been considered a key driver of biodiversity and other ecological
phenomena for nearly a century. Various frameworks and metrics, such as the Global Biodiversity Score and the
UK Biodiversity Net Gain metric, are used to assess the state of biodiversity and provide guidance on measuring
biodiversity loss and positive impact.
1.6 Estimation of Biodiversity Loss
Biodiversity loss has been most pronounced on islands and in specific locations around the tropics, where
distinctive species often evolve in isolation from the rest of the world. The introduction of alien species along
with hunting and the clearing of vegetation by humans on small, isolated islands account for around 80% of
known extinctions. Wider problems such as climate change, pollution, over-exploitation, and land use change -
often to make way for agriculture - are causing biodiversity to decline in other areas such as in the oceans and
rainforests.
Figure 3: Biodiversity loss (Habitat loss)
[Image courtesy]
The primary drivers of biodiversity loss are influenced by the exponential growth of the human population,
increased consumption as people strive for more affluent lifestyles, and reduced resource efficiency.
Biodiversity loss also affects larger islands. On Madagascar, for example, deforestation, mining and climate
change are causing significant habitat loss and threatening native species. Similarly, Australia lost 5-10% of its
biodiversity between 1996 and 2008, while high levels of deforestation to make way for agricultural plantations
have particularly affected the species-rich rainforests of Indonesia.
Figure 4: Riverside deforestation in Australia.
[Image courtesy]
Figure 5: Bleached coral seascape. A sea turtle swimming over a bleached coral seascape near Heron Island.
[Image courtesy]
1.7 Conservation
The conservation of biodiversity involves the protection, management, and sustainable use of the variety
of life on Earth. It can be approached through various methods and strategies, including in-situ and ex-situ
conservation.
1.7.1 In-situ conservation
It refers to the preservation of species in their natural habitats, ensuring the protection of both the species
and their ecosystems. This approach has the advantage of preserving species within their natural environment
and maintaining essential ecological processes. Different methods of In-situ conservation include biosphere
reserves, national parks, wildlife sanctuaries, biodiversity hotspots, gene sanctuary, and sacred groves.
Biosphere reserves: National government-nominated sites covering large ecosystems (up to 5000 sq km),
protecting traditional lifestyles and habitats. Open to tourists and researchers. Examples: Sundarban, Nanda
Devi, Nokrek, and Manas in India.
National Parks: Government-maintained reserves (100-500 sq km) solely dedicated to wildlife and
environmental conservation. Human activities are prohibited. Examples: Kanha, Gir, Kaziranga.
Wildlife Sanctuaries: Protected areas for wild animal conservation. Some human activities are allowed,
and tourist visits permitted. Examples: Ghana Bird Sanctuary, Abohar Wildlife Sanctuary, Mudumalai.
Biodiversity Hotspots: Areas with a minimum of 1500 endemic plant species and at least 70% habitat loss. Protected for
wildlife, local lifestyle, and domesticated plants and animals. Examples: The Himalayas, Western Ghats, North
East, Nicobar Islands.
Gene Sanctuary: Reserved for plant conservation, India's only gene sanctuary is in the Garo Hills, Meghalaya.
Sacred Groves: Wildlife-conservation areas protected by communities based on religious beliefs.
1.7.2 Ex-situ conservation
On the other hand, ex-situ conservation involves the breeding and maintenance of endangered species in
artificial environments such as zoos, nurseries, and botanical gardens. While this method provides a safeguard
for species outside their natural habitat, it may not fully replicate the complex interactions found in the wild.
Ex Situ Conservation offers several advantages in safeguarding endangered species. This approach allows
for meticulous control over crucial life-sustaining conditions such as climate, food availability, and veterinary
care. The implementation of artificial breeding methods enhances successful reproduction, leading to the
generation of more offspring. Additionally, it provides a protective shield against poaching while enabling
efficient population management. The application of advanced gene techniques further contributes to the
expansion of species populations, facilitating subsequent reintroduction into their natural habitats.
Figure 6: Shows the different types of Biodiversity conservation methods
[Image courtesy]
The conservation of biodiversity is crucial due to the various threats it faces, including habitat loss, over-
exploitation of resources, climate change, pollution, and invasive species. Biodiversity loss can have far-reaching
consequences, impacting ecosystem stability, human well-being, and the economy.
To protect biodiversity, efforts such as spending time in nature, educating children about wildlife, and
reducing consumption patterns are important at both individual and societal levels. By understanding the value
of the natural world and making conscious choices, individuals can contribute to the conservation of biodiversity.
1.8 Need for Conserving Biodiversity
It is believed that an area with higher species abundance has a more stable environment compared to an
area with lower species abundance. We can further appreciate the necessity of biodiversity by considering our degree
of dependence on the environment. We depend directly on various species of plants for our various needs.
Similarly, we depend on various species of animals and microbes for different reasons.
Biodiversity is being lost due to the loss of habitat, over-exploitation of resources, climatic changes,
pollution, invasive exotic species, diseases, hunting, etc. Since it provides us with several economic and ethical
benefits and adds aesthetic value, it is very important to conserve biodiversity.
References
Rafferty, J. P. (2023, December 11). Biodiversity loss. Encyclopedia Britannica. https://www.britannica.com/science/biodiversity-loss
https://www.bbau.ac.in/Docs/FoundationCourse/TM/Biodiversity%20Hotspots%20in%20India.pdf
https://royalsociety.org/topics-policy/projects/biodiversity
https://en.wikipedia.org/wiki/Biodiversity
https://www.worldwildlife.org/pages/what-is-biodiversity
https://education.nationalgeographic.org/resource/biodiversity/
https://www.vedantu.com/biology/conservation-of-biodiversity
https://www.sciencedirect.com/topics/earth-and-planetary-sciences/conservation-of-biodiversity
About the Author
Geetha Paul is one of the directors of airis4D. She leads the Biosciences Division. Her research
interests extend from Cell & Molecular Biology to Environmental Sciences, Odonatology, and Aquatic Biology.
Part IV
General
AI Safety
by Arun Aniyan
airis4D, Vol.2, No.1, 2024
www.airis4d.com
1.1 Introduction
In 1984, a movie directed by James Cameron hit the theatres with a story that was unprecedented and became
the theme for many apocalyptic movies. The Terminator, starring Arnold Schwarzenegger, was the story of a
cybernetic assassin sent back from the future to the present to kill a person. The cybernetic assassin was a
product of the human progress made in Artificial Intelligence (AI). In 1999, another movie called The Matrix,
starring Keanu Reeves, came out, narrating another story in which humanity was doomed by AI. There have been
several movies that have made AI the protagonist that is set to wipe out humans.
Figure 1: A cybernetic droid from the movie Terminator set out to kill humans. This is the common image
used to depict dangerous AI technologies.[Image Credit: Google]
Ever since we started to hear about AI, there has been a fear and (mis)conception that it is set to doom
humanity in the end. Many prominent figures in academia and society, including Elon Musk and even Geoffrey
Hinton, have started to raise concerns about the progress and future of Machine Learning (ML) and AI technologies.
The concerns regarding AI growing to a sentient level where it becomes a threat to humans are rooted in
the remarkable progress made in machine learning and related technologies. Large Language Models
(LLMs) like ChatGPT and Bard are examples from which at least the general public gets an impression of AI threats.
This area is even widely debated within the academic communities.
1.2 Origins of Safety Concerns
Safety-related concerns about machine learning and AI technologies became prominent only recently, going back
to 2016, when huge leaps in model training were achieved in the area of deep neural networks. Apart from
single-type inputs, networks for multimodal data were also developed. Broadly speaking, there are three areas
around which AI safety concerns are spread.
1.2.1 Bias in Data
Any machine learning model, however sophisticated its design, is only as good as its data. One of the
fundamental principles in data science, "Garbage in, garbage out", is the guiding principle in training
any machine learning model.
When training any machine learning model, the objective is to learn a “world model” of the problem at
hand. Using a large sample of data, the model learns the statistical correlations of the samples through what is
called an “error surface”. The training strategy navigates the model through the error surface to minimize the
model error in learning to generalize the world model. This simply implies the error surface of the world model
is defined by the samples used for the training. Moreover, it is accurate to state that the quality of the samples
dictates the model's generalization ability.
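The navigation of the error surface described here is, in most practical cases, some variant of gradient descent: the parameters are nudged repeatedly in the direction that reduces the error on the training samples. A minimal one-parameter sketch in Python (the data and learning rate are invented purely for illustration):

import numpy as np

# Toy "world model": fit y = w * x to data by minimising the mean squared error.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])      # roughly y = 2x, with a little noise

w = 0.0                                  # initial parameter
lr = 0.01                                # learning rate
for step in range(200):
    error = w * x - y                    # residuals on the training samples
    grad = 2 * np.mean(error * x)        # gradient of the mean squared error w.r.t. w
    w -= lr * grad                       # move downhill on the error surface
print(f"learned w ~ {w:.3f}")            # close to 2, the slope implied by the data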
The basic rule for preparing and selecting the training samples is that they should be representative of the
problem. For example, consider training a classification model to classify different vegetables from an image. The training
samples should be chosen to include all the sets of vegetables that the model needs to know. There should
be images at different angles, lighting, backgrounds, orientation, etc in the training samples for each class of
objects. Additionally, there should be more or less the same number of samples for each class. These are basic
precautions required for compiling the sample set.
The bias (knowingly or unknowingly) introduced into the data will affect the model’s worldview in the
task it is trained for. Let us take the previous example where vegetable data is collected for the classification
task. Say that for the class of tomatoes, we only have ripe, red tomatoes. The model will tend to learn that
tomatoes are always red and ripe. So when a tomato that is not ripe and is slightly greenish-yellow in colour
is presented to the model, it will most likely be classified incorrectly as some other vegetable. This learning mistake
is due to the bias in training data. This can happen in many different scenarios and affect the model output in
unimaginable ways.
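One of these precautions is easy to automate: before training, count the samples per class (and per relevant attribute, such as the ripeness of the tomatoes in the example above) and flag anything under-represented. A minimal sketch in Python, with hypothetical labels:

from collections import Counter

def report_class_balance(labels, warn_ratio=0.5):
    """Print per-class counts and warn when a class has fewer than
    warn_ratio times the average number of samples per class."""
    counts = Counter(labels)
    avg = sum(counts.values()) / len(counts)
    for cls, n in sorted(counts.items()):
        flag = "  <-- under-represented" if n < warn_ratio * avg else ""
        print(f"{cls:>14}: {n}{flag}")

# Hypothetical training labels for the vegetable classifier discussed above.
labels = ["tomato_ripe"] * 500 + ["tomato_unripe"] * 20 + ["carrot"] * 480 + ["cabbage"] * 470
report_class_balance(labels)    # flags tomato_unripe as under-represented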
In practical applications, the bias in data can produce unexpected and even unethical results. For example,
there was a use case to detect criminals from security camera footage. The model started to pick up people with
dark skin tones as potential criminals. This later caused outrage, and the company had to shut down its product.
This was basically due to bias in the training data that was collected. As a result of the collection
bias, the dataset had a larger number of criminal samples with dark skin tones as compared to criminals with light
skin tones. This automatically posed ethical issues, simply because of data bias. A classic example is
shown in Figure 2.
Figure 2: Example of data bias in a model result intended to estimate crime danger. This application generated
incorrect predictions and was later taken down.[Image Credit: Google]
It is therefore important to make sure that the data used for training is free of any bias. This was what initially
brought safety into focus as an issue with AI models.
1.2.2 Adversarial Effects
Machine learning models, especially deep neural networks, learn millions of parameters from data.
The higher dimensional structure that they learn is difficult to comprehend, and in most cases it is
impossible to explain clearly what the model has learned.
In 2017, a paper was published showing that the prediction output of a convolutional neural network
could be changed to a different class by altering just a single pixel of the image. If the model
accurately identifies a cat in an image, changing one random pixel can cause the model to misidentify the image
as a different class. The figure shows an illustration of the same.
This in essence means that a deep learning model can be fooled to generate incorrect predictions by
introducing noise into the data. The noise added can be of any nature. This effect will make a model highly
unreliable and can pose a risk in cases of critical applications. For example, if there is a medical system that
makes use of a deep learning model, the results can be intentionally corrupted by additive noise. Such results
can have unforeseen consequences for the application, making the model and system unsafe. Such effects are called
adversarial effects and can cause unexpected behavior in machine learning models.
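The single-pixel attack can be illustrated with a toy random search: repeatedly change one pixel and check whether the predicted class flips. The sketch below is a deliberate simplification of the published attack (which uses differential evolution rather than random search), and the predict function here is only a stand-in; in practice it would be a trained convolutional network.

import numpy as np

def one_pixel_attack(image, predict, n_trials=1000, rng=None):
    """Random search for a single-pixel change that flips the predicted class.
    `predict` maps an image of shape (H, W, C) to a class label; here it stands
    in for a trained model. Returns the perturbed image if a flip is found."""
    rng = rng or np.random.default_rng(0)
    original_class = predict(image)
    h, w, c = image.shape
    for _ in range(n_trials):
        candidate = image.copy()
        y, x = rng.integers(h), rng.integers(w)
        candidate[y, x] = rng.integers(0, 256, size=c)   # overwrite one pixel
        if predict(candidate) != original_class:
            return candidate
    return None

# Toy stand-in "model": classifies by mean brightness. Real attacks target deep
# networks, where such flips can occur even for visually imperceptible changes.
def toy_predict(img):
    return "bright" if img.mean() > 127 else "dark"

image = np.full((8, 8, 3), 128, dtype=np.uint8)          # hypothetical input image
adversarial = one_pixel_attack(image, toy_predict)
print("flip found" if adversarial is not None else "no flip found")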
Adversarial effects potentially cause machine learning systems to crash or even fail to perform what they
intend to do. A well-known issue related to adversarial effects with object detection models is their failure to
detect objects when there are bright reflections coming from objects. This is a serious issue if a system is set up
to monitor the security of a site with such a model. The system can be fooled and there is automatically a safety
issue.
Adversarial effects are most common with black box models, where the learned parameters are hard to
interpret and it is thus impossible to explain in detail why a particular outcome was generated. One classic example is
shown in Figure 3. In this case, the model incorrectly detected a person (a nuisance detection) with low
confidence at a specific time. Later, when the lighting conditions changed by a small amount, the confidence
of the same nuisance detection increased. The only explanation in this case is an adversarial effect caused by a
change in a few pixels.
(a) Incorrect detection at a specific time. (b) Increased score of the same detection.
Figure 3: Illustration of an adversarial effect in which a nuisance detection at one time has an increased score at a
different time, with a few pixel values changing due to a slight change in lighting.
These situations pose a serious issue that directly translates into the safety of AI systems. In this case, the
danger is not directly caused by the AI system, but by interfering with the models externally. Even though the
adversarial effects exist with many AI models, research has found methods to mitigate them with better training
strategies.
1.2.3 Model Trust
ML/AI models are good at performing tasks but not jobs. This means that a model can do one thing very well,
but not multiple related tasks that form a job. Performing a job requires processing an additional piece of information
called "context". For humans, context about a set of tasks provides the information required to do a job.
Processing contextual information is not as straightforward as processing a piece of data. There are relations
among different data points which represent an "intent".
In all cases of ML applications, the model itself does not know the intent of the user/usage. This means the
model does not know the ethics and morals of the application intent. A classic example is the first version
of ChatGPT, where one could ask how to make a bomb and the model would readily answer. Later, the model
was retrained to refrain from generating such answers. But even with the latest version of ChatGPT, there are certain
methods to generate the answer one requires. Figure 4 shows an example where a user asks for a URL and the
model responds that it cannot provide it. But on "pressuring" the model, it generates the URL.
Recent versions of ChatGPT and similar models have been trained with filters to not generate dangerous
and profane content. Even though this does not provide any information on the context and intent of the user, it
provides a kill switch to prevent the model from being misused. At present, more research effort is being put
into providing models with some form of contextual awareness.
1.3 Methods to Resolve
The previous section discussed three major points that broadly contribute to safety concerns related to AI.
In terms of building safer AI systems, many big players in the industry and academic space have put forward
different solutions. Following are some of them.
Figure 4: Example chat showing how ChatGPT can give answers when forced to generate a response. [Image Credit: Reddit]
AI Alignment One of the key aspects of making AI systems safe is orienting design decisions towards
human requirements and ethics. The term AI Alignment was coined to describe steering all such
development towards human values. More than a technique, this is a philosophical
approach that is to be followed by institutions when developing AI systems. Sometimes these methods
are broadly classified as human-centric AI.
Multiple metrics Even though there are standard metrics to evaluate different machine learning models, it
is advisable to use multiple evaluation metrics based on the use case. The evaluation metrics also should
include human/user feedback. One of the major methods that is often suggested is called 'human in the
loop' training and evaluation. This enables human feedback to train and tweak the model to perform as
per the requirements of the use case (a minimal sketch of multi-metric evaluation is given after this list).
Explainable models Using models that can explain their predictions will help a lot in understanding how
they have learned the data. This will help us understand the biases in the data and also the weak points of the
model. Simple models like decision trees are easy to explain. Deep neural networks are generally black
box models and are hard to explain. There is considerable research happening in terms of explaining
black box models.
Data evaluation A model, however sophisticated and state-of-the-art it may be, is only as good as the data
used to train it. The data that is used for training must be free of mistakes and incorrect labels and
should be representative of the problem. In addition to these properties, a good dataset will be free of
biases and skews. For production-grade models, the training set is evaluated with new data and checked
for drifts.
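As a minimal illustration of the multiple-metrics point above, the sketch below scores a set of hypothetical predictions with several standard classification metrics from scikit-learn rather than accuracy alone; in practice, human feedback would be collected alongside such scores.

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical ground-truth labels and model predictions for a binary task.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Accuracy alone can hide class-specific failure modes; report several metrics.
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1 score :", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))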
The above are general practices to be followed for safe and trustworthy models. The big
players in the industry, such as Google, OpenAI, Meta, and Microsoft, have laid the foundations for developing
safe AI systems, and these are being followed by most of the industry practitioners who develop AI products.
1.4 Conclusion
AI Safety is a matter of concern with the increased application of such systems worldwide. With the
increased awareness of AI-based technologies and related systems available to the public, governments across
the globe have come up with regulatory methods to ensure safe AI system development. Earlier this year, the
US government and the EU introduced regulations to keep ethics and safety measures a priority
for AI system development. Other countries across the globe are taking similar actions.
Going forward the future will be safe and bright with AI. The machine apocalypse imagined in stories will
mostly be confined to novels and movies.
Reference
Safe AI practices
Trustworthy AI
Su, Jiawei, Danilo Vasconcellos Vargas, and Kouichi Sakurai. ”One pixel attack for fooling deep neural
networks.” IEEE Transactions on Evolutionary Computation 23.5 (2019): 828-841.
Google SAIF
Key Concepts in AI Safety
Stanford AI Safety
About the Author
Dr. Arun Aniyan leads the R&D for Artificial Intelligence at DeepAlert Ltd, UK. He comes from
an academic background and has experience in designing machine learning products for different domains. His
major interest is knowledge representation and computer vision.
About airis4D
Artificial Intelligence Research and Intelligent Systems (airis4D) is an AI and Bio-sciences Research Centre.
The Centre aims to create new knowledge in the field of Space Science, Astronomy, Robotics, Agri Science,
Industry, and Biodiversity to bring Progress and Plenitude to the People and the Planet.
Vision
Humanity is in the 4th Industrial Revolution era, which operates on a cyber-physical production system. Cutting-
edge research and development in science and technology to create new knowledge and skills become the key to
the new world economy. Most of the resources for this goal can be harnessed by integrating biological systems
with intelligent computing systems offered by AI. The future survival of humans, animals, and the ecosystem
depends on how efficiently the realities and resources are responsibly used for abundance and wellness. Artificial
intelligence Research and Intelligent Systems pursue this vision and look for the best actions that ensure an
abundant environment and ecosystem for the planet and the people.
Mission Statement
The 4D in airis4D represents the mission to Dream, Design, Develop, and Deploy Knowledge with the fire of
commitment and dedication towards humanity and the ecosystem.
Dream
To promote the unlimited human potential to dream the impossible.
Design
To nurture the human capacity to articulate a dream and logically realise it.
Develop
To assist the talents to materialise a design into a product, a service, a knowledge that benefits the community
and the planet.
Deploy
To realise and educate humanity that a knowledge that is not deployed makes no difference by its absence.
Campus
Situated in a lush green village campus in Thelliyoor, Kerala, India, airis4D was established under the auspices
of the SEED Foundation (Susthiratha, Environment, Education Development Foundation), a not-for-profit company
for promoting Education, Research, Engineering, Biology, Development, etc.
The whole campus is powered by solar energy and has a rainwater harvesting facility to provide sufficient water supply
for up to three months of drought. The computing facility on the campus is accessible from anywhere through a
dedicated optical fibre internet connection, 24×7.
There is a freshwater stream that originates from the nearby hills and flows through the middle of the campus.
The campus is a noted habitat for the biodiversity of tropical fauna and flora. airis4D carries out periodic and
systematic water quality and species diversity surveys in the region to ensure its richness. It is our pride that
the site has consistently been environment-friendly and rich in biodiversity. airis4D also grows fruit plants
that can feed birds and provides water bodies to help them survive the drought.