Cover page
Pillars of Creation
(Image source: https://www.planetary.org/space-images/jwst-pillars-of-creation)
This is an image of the Eagle Nebula, at a distance of 6,500 light years, as seen through the James Webb Space Telescope. This star-forming region was made famous when the Hubble Space Telescope imaged it in 1995. It is where new stars are born! The dust cloud that looks like a monster consists of microscopic dust particles and hydrogen gas, which form the raw materials for the newborn stars! The shining heads are composed of newborn stars. Image courtesy: NASA, ESA, CSA, STScI; Joseph DePasquale (STScI), Anton M. Koekemoer (STScI), Alyssa Pagan (STScI)
Managing Editor Chief Editor Editorial Board Correspondence
Ninan Sajeeth Philip Abraham Mulamootil K Babu Joseph The Chief Editor
Ajit K Kembhavi airis4D
Geetha Paul Thelliyoor - 689544
Arun Kumar Aniyan India
Journal Publisher Details
Publisher : airis4D, Thelliyoor 689544, India
Website : www.airis4d.com
Email : nsp@airis4d.com
Phone : +919497552476
Editorial
by Fr Dr Abraham Mulamoottil
airis4D, Vol.1, No.5, 2023
www.airis4d.com
This edition of airis4D discusses various topics, including Bayesian inference, Natural Language Processing
(NLP), Solar Flares, Cataclysmic Variable Stars, the Barcode of Life Project, and The Ten Years of Science.
Bayesian inference is an essential statistical inference technique that updates the probability of a hypothesis
as new data becomes available. The technique is useful in machine learning for making predictions based
on prior beliefs, and the posterior distribution of predictors can be updated based on new evidence. Blesson
George rightly points out that Bayesian inference is an important technique in statistics, useful in various fields,
including medicine, business, economics, and social sciences.
Jinsu Ann Mathew's article on Natural Language Processing (NLP) provides a good overview of the field, including the three related technologies NLP, NLU, and NLG. NLU enables machines to comprehend and
interpret human language, while NLG uses the information gathered from NLU to generate a natural-sounding
and situation-appropriate response. The article notes that without NLU, machines would struggle to process
human language, limiting their ability to extract insights and value from this vast source of information. The
article's discussion of NLP is brief, but it is clear that it is an essential technology that enables machines to
understand and generate human language.
The article on solar flares provides an interesting historical perspective on the topic, tracing the evolution
of scientific understanding of the Sun from its worship as a deity to a scientific object of study. Linn Abraham
notes that solar flares emit intense bursts of electromagnetic radiation that can cause broadcast interference and
power outages on Earth, making it important to understand them to protect power grids, satellites, and astronauts
during spacewalks. The article also asks whether we need to predict these outbursts, arguing that understanding solar flares is crucial for developing strategies to mitigate their effects.
Sindhu G's article on cataclysmic variable stars provides an informative overview of these fascinating
objects, which are binary star systems that undergo sudden and dramatic changes in their properties. The article
notes that they are actively researched by astronomers to better understand their properties and evolution. While
the article is brief, it provides an excellent introduction to the topic and is likely to spark the reader's interest in
learning more about these objects.
Geetha Paul’s article on the Barcode of Life project highlights the project’s ambitious goal of creating a
comprehensive, digital library of DNA barcodes for every species on Earth. The project's potential applications
are vast, including identifying species in ecology, agriculture, medicine, and food safety, among others. The
article notes that the main challenge is the sheer scale of the project, with an estimated 8.7 million
species on Earth. However, the Barcode of Life project has already made significant progress in documenting
and preserving the world’s biodiversity.
Robin Jacob Roy calls on readers to understand and prepare for the dangers posed by tropical cyclones. Tropical cyclones are fueled by heat and moisture from warm ocean surfaces and are characterized by a low-pressure center,
strong winds, and heavy rain. Favorable environmental factors are needed for a tropical cyclone to develop,
including warm ocean waters, unstable atmosphere, moist air, distance from the equator, and minimal wind
shear. A tropical cyclone's structure includes the eye, eyewall, spiral rainbands, upper-level outflow, and sea
surface. Tropical cyclones have various hazards, including storm surges, flooding, powerful winds, tornadoes,
and lightning, and preparation for these dangers is crucial. Once a tropical cyclone reaches sustained winds of at
least 74 mph, it is classified as a hurricane or typhoon, depending on the region. When these hazards combine
and interact with one another, the potential for loss of life and damage to property increases substantially.
The article “Bringing Industrial Practices to Academia” by Arun Aniyan talks about how the work culture in
academia differs from that of the industry and how bringing the discipline and work methodology of the industry
can improve the work style and throughput of academic work. The article introduces the Agile methodology
and explains its formal framework for product development. The Agile methodology emphasizes defining the
problem, scope, and requirements of a product in a strict and disciplined manner, which can help in achieving
the desired outcome in academia as well. It also emphasizes dividing the solution into independent parts to set
specific milestones for execution.
“The Ten Years of Science” by Ninan Sajeeth Philip discusses significant scientific developments across
various fields in the past ten years, including astronomy, genetics and biotechnology, microbiome research, and
neuroscience. The past decade has witnessed remarkable progress in these fields, leading to new insights into
the nature of the universe and transforming our understanding of the world.
Overall, the articles cover a variety of topics and should spark the reader's interest in learning more about them. The
writing is clear and concise, making it accessible to a broad audience.
Contents
Editorial ii
I Artificial Intelligence and Machine Learning 1
1 Bayesian Inference in Machine Learning 2
1.1 Introduction to Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Bayesian Inference in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Advantages and Limitations of Bayesian Inference in Machine Learning . . . . . . . . . . . . 5
2 Unpacking NLP: Understanding the Roles of NLU and NLG 6
2.1 How NLU and NLG fit into NLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Natural Language Understanding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Natural Language Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
II Astronomy and Astrophysics 11
1 Exploring the Explosive Side of the Sun 12
1.1 The Sun and Life on Earth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2 Historical Record of Solar Explosions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Do We Need to Predict These Outbursts? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 How the Modern World is Becoming More Vulnerable to Solar Flares? . . . . . . . . . . . . . 14
1.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Cataclysmic Variable Stars 16
2.1 What are Cataclysmic Variable stars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Some types of Cataclysmic Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
III Biosciences 23
1 Barcode of Life: A Global Biodiversity Challenge 24
1.1 NEXT GENERATION SEQUENCING (NGS) . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.2 DNA barcoding as a new tool for food traceability . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3 Highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.4 Strengths, Weaknesses, Opportunities, and Threats for DNA barcoding, resulting from the
SWOT analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
IV Climate 33
1 Tropical Cyclones 34
1.1 Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.2 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.3 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.4 Naming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.5 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
V General 39
1 Bringing Industrial Practices to Academia 40
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.2 Agile Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.3 Process of Agile Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.4 Reward function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2 The Ten Years of Science 46
2.1 Development across sciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.2 Astronomy Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.3 Gravitational Wave Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 Exoplanets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Part I
Artificial Intelligence and Machine Learning
Bayesian Inference in Machine Learning
by Blesson George
airis4D, Vol.1, No.5, 2023
www.airis4d.com
1.1 Introduction to Bayesian Inference
Statistical inference is the process of using data analysis techniques, statistical methods, and probability
theory to draw conclusions or inferences about a population based on a data sample. In many instances, it is
impractical or impossible to acquire data from an entire population; as a result, only a representative sample
is collected. Statistical inference enables us to draw conclusions about the larger population based on this
sample data. Often, inference permits the derivation of the properties of an underlying probability distribution.
Numerous disciplines, including medicine, business, economics, social sciences, etc., utilise statistical inference
frequently. The two main categories of statistical inference are estimation and hypothesis testing. In estimation,
sample statistics are used to estimate population parameters, and in hypothesis testing, sample data are used to
determine the probability that a hypothesis is true.
Bayesian inference is a method of statistical inference in which Bayes theorem is applied to revise the
probability of a hypothesis as additional evidence or data becomes available. It is based on Bayes theorem,
which describes the relationship between the prior probability of a hypothesis and the likelihood of the observed
data given the hypothesis.
Thomas Bayes, an English statistician of the 18th century, formulated the Bayes theorem, which is given
in its most popular form as
P(A|B) = P(B|A) P(A) / P(B)
where
P(A|B) is the conditional probability of event A occurring given that B is true. It is defined as the 'posterior' probability. The term 'posterior' is used to distinguish the updated probability from the prior probability, which is the probability assigned to the parameter before observing any new data. The posterior probability is a fundamental concept in Bayesian inference, which is a statistical framework that allows us to update our beliefs or knowledge about a phenomenon based on new evidence or data.
P(B|A) is the conditional probability of event B occurring given that A is true. It is known as the likelihood, which is the probability of observing a certain set of data given a specific parameter value. It represents the degree to which the data support or favour the hypothesis and measures the compatibility between the observed data and the hypothesis.
P(A) and P(B) are the probabilities of A and B occurring independently of one another (the marginal probability). P(A) is known as the prior. The prior is the initial probability assigned to a parameter before observing any new data. It represents our prior belief or degree of uncertainty about the parameter based on our background knowledge, assumptions, or previous studies.
Bayes theorem for hypothesis testing is typically expressed as follows:
P(H|E) = P(E|H) P(H) / P(E)
where P(H|E) gives the probability that a hypothesis is true given the evidence, P(H) is the prior probability of the hypothesis being true, P(E|H) defines the probability of seeing the evidence if the hypothesis is true, and P(E) gives the probability of observing the evidence.
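To make the update concrete, here is a minimal numerical sketch in Python; the prior, sensitivity, and false-positive rate are illustrative values chosen for this example, not figures from the article:

# Bayes' theorem as a simple update: P(H|E) = P(E|H) P(H) / P(E),
# with P(E) expanded by the law of total probability.
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e

# Illustrative numbers: a rare condition (P(H) = 0.01) and a test that is
# 95% sensitive with a 10% false-positive rate.
print(posterior(0.01, 0.95, 0.10))  # ~0.088: the evidence lifts P(H) from 1% to about 9%

Even fairly strong evidence leaves the posterior modest here, because the prior probability of the hypothesis is small.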
1.1.1 Bayesian inference and frequentist inference
Bayesian inference and frequentist inference are two distinct methods of statistical inference, the process
of deriving conclusions from data.
Bayesian inference, which is based on Bayes theorem, entails revising prior beliefs about a hypothesis
or parameter based on new data to calculate a posterior probability. The posterior distribution represents
an updated hypothesis or parameter belief. Bayesian inference permits us to incorporate prior knowledge,
uncertainty, and hypotheses into our analysis, and to make probabilistic predictions and decisions based on the
available evidence.
On the other hand, frequentist inference relies on the frequentist interpretation of probability, which defines
probability as the long-run relative frequency of an event. Frequentist inference entails estimating the probability
of an event or parameter based on observed data, presuming that the observed data are a random sample from a
larger population. Frequentist inference does not rely on prior distributions or hypotheses and typically involves
evaluating hypotheses and calculating confidence intervals.
The treatment of prior knowledge and uncertainty is a crucial distinction between Bayesian and frequentist
inferences. In contrast to frequentist inference, Bayesian inference explicitly incorporates prior knowledge and
uncertainty into the analysis. This can result in differing conclusions or decisions based on the same data,
depending on the analyst's preconceived notions and assumptions.
1.2 Bayesian Inference in Machine Learning
Bayesian inference can be utilised for a variety of tasks in machine learning, including parameter estimation,
model selection, regression, classification, and clustering.
Bayesian inference can also be used for model selection and comparison, where different models or
hypotheses are compared based on their posterior probabilities. This allows for a principled way of selecting
the best model or hypothesis given the available data and prior knowledge.
One advantage of Bayesian inference in machine learning is its ability to incorporate prior knowledge or
assumptions into the analysis, which can improve the accuracy and robustness of the results. Bayesian inference
can also provide a natural way of handling uncertainty and making probabilistic predictions or decisions.
1.2.1 Some Bayesian inference models for machine learning
1.2.1.1 Naive Bayes
Naive Bayes is a simple but effective probabilistic algorithm used in machine learning for classification
tasks. It is based on Bayes theorem, which states that the probability of a hypothesis (or class) given some
observed data can be computed using the prior probability of the hypothesis and the likelihood of the data
given the hypothesis. Naive Bayes assumes that the features (or variables) are conditionally independent given
the class, which means that the presence or absence of one feature does not affect the likelihood of another
feature. Despite this strong independence assumption, Naive Bayes can perform well in practice and is often
used as a baseline model for text classification, spam filtering, and other applications where the feature space is
high-dimensional and sparse.
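As a rough illustration, the sketch below trains a multinomial Naive Bayes text classifier with scikit-learn; the tiny corpus and its spam/ham labels are invented purely for demonstration:

# Bag-of-words counts feed a multinomial Naive Bayes model, which applies
# Bayes' theorem under the conditional-independence assumption.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting agenda attached",
         "free lottery winner claim prize", "project report due tomorrow"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["claim your free prize"]))        # expected: ['spam']
print(model.predict_proba(["claim your free prize"]))  # posterior class probabilities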
1.2.1.2 Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is a classic technique used in supervised learning for dimensionality
reduction and classification tasks. Bayesian inference can be used to estimate the parameters of LDA and make
predictions based on new data.
In LDA, the goal is to find a linear combination of the input features that maximally separates the
classes, while also minimizing the within-class variance. The resulting linear discriminants can be used for
dimensionality reduction and classification of new data. Bayesian inference can be used to estimate the mean
and covariance matrix of the input features for each class, as well as the prior probabilities of each class. This
allows for a principled way of handling uncertainty and incorporating prior knowledge into the analysis.
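A brief scikit-learn sketch of this idea is given below; the Iris data set and the choice of two discriminant axes are just convenient illustrations:

# LDA estimates per-class means, a shared covariance matrix and class priors,
# then assigns each sample to the class with the largest posterior probability.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)       # project onto the two discriminant axes

print(X_reduced.shape)                    # (150, 2)
print(lda.predict(X[:5]))                 # predicted classes for the first samples
print(lda.predict_proba(X[:5]).round(3))  # posterior class probabilities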
1.2.1.3 Linear regression
In Bayesian linear regression, for instance, the objective is to estimate the coefficients of a linear model
using training data and a prior distribution on the coefficients. From a Bayesian perspective, linear regression
is formulated using probability distributions rather than point estimates. The response is assumed to be drawn
from a probability distribution rather than a distinct value.
The objective of Bayesian Linear Regression is not to determine the singular ”best” value of the model
parameters, but rather the posterior distribution of the model parameters. Not only is the response derived from
a probability distribution, but it is also presumed that the model parameters are drawn from a distribution. The
posterior probability of the model parameters is dependent on the inputs and outputs of training. Using Bayes’
theorem, the posterior distribution of the coefficients can be computed by multiplying the prior distribution
by the likelihood of the data. The posterior distribution can then be utilised to make predictions or quantify
uncertainty.
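The sketch below shows the closed-form posterior for a simple case with a Gaussian prior on the coefficients and a known noise level; all numbers, including the prior and noise precisions, are illustrative assumptions:

# Conjugate Bayesian linear regression: with a Gaussian prior N(0, (1/alpha) I)
# on the weights and Gaussian noise of precision beta, the posterior over the
# weights is Gaussian with the covariance and mean computed below.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])  # intercept + one feature
true_w = np.array([1.0, 2.0])
y = X @ true_w + rng.normal(0.0, 1.0, 50)                   # noisy observations

alpha, beta = 1.0, 1.0                                      # assumed prior and noise precisions
cov = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)     # posterior covariance
mean = beta * cov @ X.T @ y                                 # posterior mean

print("posterior mean:", mean.round(2))                     # close to the true weights
print("posterior std :", np.sqrt(np.diag(cov)).round(3))    # uncertainty in each weight

Rather than a single point estimate, the result is a full distribution over the coefficients, from which predictive distributions for new inputs can be derived.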
1.2.1.4 Markov Chain Monte Carlo (MCMC) methods
Markov Chain Monte Carlo (MCMC) methods are a class of algorithms used in Bayesian inference to
simulate and approximate the posterior distribution of the model parameters. MCMC methods use a Markov
chain to generate a sequence of samples from the posterior distribution, where each sample is dependent on the
previous sample and is drawn from a proposal distribution. Bayesian inference in MCMC methods involves
using the simulated samples to estimate the posterior distribution of the parameters of interest.
Bayesian inference in MCMC methods can be used for a variety of applications, including parameter
estimation, model selection, and uncertainty quantification. MCMC methods are particularly useful when the
posterior distribution is complex or high-dimensional and cannot be computed analytically. MCMC methods
can also handle non-linear and non-Gaussian models, as well as incorporate prior knowledge or constraints on
the model parameters.
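A bare-bones example of the idea is the random-walk Metropolis sampler below, which draws from the posterior of a coin's bias after observing 7 heads in 10 tosses under a uniform prior; the proposal width and sample counts are arbitrary choices made for illustration:

# Random-walk Metropolis: propose a move, accept it with probability
# min(1, posterior(proposal)/posterior(current)), and record the chain.
import numpy as np

rng = np.random.default_rng(1)
heads, tosses = 7, 10

def log_posterior(theta):
    if not 0.0 < theta < 1.0:
        return -np.inf                          # outside the prior's support
    return heads * np.log(theta) + (tosses - heads) * np.log(1.0 - theta)

samples, theta = [], 0.5
for _ in range(20000):
    proposal = theta + rng.normal(0.0, 0.1)     # symmetric proposal
    if np.log(rng.random()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal                        # accept the move
    samples.append(theta)

posterior_draws = np.array(samples[2000:])      # discard burn-in
print(posterior_draws.mean().round(3))          # near the analytic mean 8/12 = 0.667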
1.3 Advantages and Limitations of Bayesian Inference in Machine Learning
The advantages of Bayesian inference methodologies are: Bayesian Inference offers a framework for
incorporating prior information and assumptions into a model. This can enhance the model’s accuracy and
reduce the danger of overfitting. In addition, Bayesian Inference provides a method for quantifying uncertainty
in model predictions, which is useful for risk assessment and decision making. Bayesian Inference is also a
flexible machine learning tool because it can be applied to a wide variety of models and data types.
Computational complexity is one of the limitations of Bayesian Inference in machine learning: Bayesian Inference can be computationally costly, particularly for complex models and large data sets. Prior selection is another: the choice of prior distribution can have a substantial impact on the results of Bayesian Inference, and selecting an appropriate prior can be difficult.
References
Bayes Theorem in Machine learning
Introduction to Bayesian Logistic Regression
Bayesian Inference: The Best 5 Models and 10 Best Practices for Machine Learning
Bayes theorem
Bayesian Analysis: Advantages and Disadvantages
About the Author
Blesson George is currently working as Assistant Professor of Physics at CMS College Kottayam,
Kerala. His research interests include developing machine learning algorithms and application of machine
learning techniques in protein studies.
Unpacking NLP: Understanding the Roles of
NLU and NLG
by Jinsu Ann Mathew
airis4D, Vol.1, No.5, 2023
www.airis4d.com
Imagine you are talking to a machine that understands you perfectly, like a trusted friend or family
member. You can tell it what you need, and it responds in a way that makes sense, almost like it can read your
mind. This may sound like science fiction, but it's actually becoming a reality, thanks to advances in natural
language technologies like NLP, NLU, and NLG. It is important to note that these acronyms should not be used
interchangeably, as NLP(Natural Language Processing), NLG(Natural Language Generation), and NLU(Natural
Language Understanding) are related, yet distinct, technologies that serve as the building blocks for a machine's
language skills. These technologies allow machines to understand, interpret, and generate human language. At
a high level, NLU and NLG are just components of NLP. In this article, we will explore the fascinating world of
natural language technologies and how they are bringing us closer to a future where we can communicate with
machines just like we do with each other.
2.1 How NLU and NLG fit into NLP
NLP, a subfield of artificial intelligence, involves the interaction between computers and human languages.
Within NLP, NLU is responsible for enabling machines to comprehend and interpret human language. NLU
achieves this by breaking down words and sentences into individual components, considering context, and
identifying the meaning behind the words used. NLG, on the other hand, utilizes the information gathered from
NLU to generate a natural-sounding and situation-appropriate response. NLG can even incorporate idioms and
colloquialisms, giving the conversation a more human-like feel. Working together, these components of NLP
allow machines to understand and generate human language (Figure 1).
2.2 Natural Language Understanding
Natural Language Understanding (NLU) refers to a computer’s ability to comprehend and interpret human
language. Before a computer can process unstructured text into a machine-readable format, it needs to understand
the peculiarities of natural language. These complexities include nuances in grammar, syntax, and semantics
that are often challenging for machines to decipher. NLU provides algorithms and models that help computers
analyze and comprehend the meaning of natural language. Without NLU, machines would struggle to process
human language, limiting their ability to extract insights and value from this vast source of information.
Figure 1: Relation between NLP, NLU and NLG
Figure 2: Tear vs Tear (image courtesy: https://grammarist.com/heteronyms/tear-vs-tear/)
For example, consider the following sentence: I saw a tear in her eye as she tried to tear the paper.
In this sentence, the word ”tear” has two different meanings. The first ”tear” refers to a drop of water coming
out of someone's eye when they cry, while the second ”tear” means to rip or cut something apart (Figure 2).
The context of the sentence makes it clear which meaning is intended: the tear in her eye indicates that the
first meaning of ”tear” is being used, while the fact that she is trying to tear the paper suggests that the second
meaning of ”tear” is intended.
Here, NLU algorithms would analyze the surrounding words and phrases to determine whether the word
”tear” is being used as a verb or a noun. By considering the context in which the word is used, NLU technology
could accurately identify the intended meaning of the word ”tear” in each instance, allowing machines to
understand the subtle nuances of language and interpret the sentence correctly.
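One simple, partial way to see this in code is part-of-speech tagging, which is only a small slice of NLU but already separates the two uses of the word; the sketch assumes NLTK is installed together with its 'punkt' and 'averaged_perceptron_tagger' data packages:

# A POS tagger should mark the first "tear" as a noun (NN) and the
# second as a verb (VB), based on the surrounding context.
import nltk

sentence = "I saw a tear in her eye as she tried to tear the paper."
for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
    if word.lower() == "tear":
        print(word, tag)   # expected: NN for the first occurrence, VB for the second

Full NLU goes well beyond tagging, of course, but the example shows how context drives the interpretation of an ambiguous word.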
2.2.1 Intent recognition and Entity recognition
Two fundamental concepts of NLU are intent and entity recognition.
Intent recognition is a crucial component of NLU (Natural Language Understanding) that involves identifying the user's underlying purpose or objective in the input text. This process not only involves identifying
the words and phrases used in the input text, but also interpreting the context and tone to determine the user’s
sentiment. As such, intent recognition is considered the first and most important step in NLU, as it sets the
foundation for the subsequent processing and analysis of the text.
Entity recognition involves identifying the specific pieces of information or entities in a message and
extracting the most important information about them. Entities can be thought of as parameters or variables that
help to define the context of the users input. There are two main types of entities that can be identified through
entity recognition: named entities and numeric entities. Named entities refer to specific entities that have a
Figure 3: Illustration of a customer intent and several entities extracted from a conversation (image courtesy: https://www.artefact.com/blog/nlu-benchmark-for-intent-detection-and-named-entity-recognition-in-call-center-conversations/)
proper name, such as people, companies, locations, and organizations. Numeric entities, on the other hand, are
entities that are recognized as numbers, currencies, and percentages. By identifying and extracting the entities
in a user’s input, NLU systems can gain a deeper understanding of the user’s needs and provide more relevant
and accurate responses.
To gain a clear understanding of the concepts of intent and entity, let us consider the following sentence.
”Hi, I’m calling to know if the small black bag is available in NYC?”. The intent of the sentence is to
inquire about the availability of a specific item - the small black bag - in a particular location, which is NYC.
The entity in this sentence is the specific item being inquired about, which is the ”small black bag.” This is a
named entity that belongs to the category of products or items. Another entity in the sentence is the location,
which is ”NYC” or New York City. This is also a named entity that belongs to the category of locations (Figure 3).
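As a hedged sketch of how such entities might be pulled out in practice, the snippet below uses spaCy's small pretrained English model for named-entity recognition and a deliberately naive keyword rule for the intent; the article itself does not prescribe any particular library, and real intent classifiers are trained on labelled utterances:

# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Hi, I'm calling to know if the small black bag is available in NYC?"

for ent in nlp(text).ents:
    print(ent.text, ent.label_)       # e.g. "NYC" -> GPE (a location entity)

# Toy intent rule, for illustration only.
intent = "check_availability" if "available" in text.lower() else "unknown"
print(intent)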
NLU is important for a wide range of applications, including chatbots, virtual assistants, and customer
service systems. By accurately understanding the intent and entities of a user’s input, these systems can provide
more effective and personalized responses.
2.3 Natural Language Generation
Natural language generation (NLG) is the process of generating text that appears to be written by a human,
without the need for a human to actually write it. This means that NLG systems can generate text that sounds like
it was written by a person, rather than a machine. The purpose of NLG is to help computers communicate more
effectively with humans, by making the text they produce more understandable and relatable. Sophisticated
NLG software is capable of analyzing vast amounts of structured or unstructured data and extracting meaningful
insights. By using advanced algorithms and machine learning models, NLG systems can identify patterns,
correlations, and relationships within the data that might not be immediately obvious to human analysts.
Once the data has been analyzed, NLG software can generate human-like language output that summarizes
the key findings and insights. This text can be customized to suit the needs of the user, and can be presented in
a variety of formats, such as reports, dashboards, or visualizations.
Figure 4: Stages of NLG (image courtesy: https://research.aimultiple.com/nlg/)
2.3.1 Stages of NLG
NLG involves a series of stages, with each step refining the data to generate natural-sounding language.
The process consists of six stages(Figure 4), which are:
1. Content Analysis: In this stage, the input data is analyzed to determine what should be included in the
final content. For example, an NLG system analyzing a weather forecast might identify the key information,
such as temperature, wind speed, and precipitation, and determine how they are related.
2. Data Understanding: The NLG system then interprets the data and identifies patterns and relationships
using machine learning algorithms, such as recognizing that high humidity levels are often associated with
increased chances of precipitation.
3. Document Structuring: Once the data has been analyzed and understood, the NLG system creates a
plan for the document and chooses a narrative structure based on the type of data being interpreted. For example,
it might decide on a headline, an opening paragraph, and a series of subheadings to structure the forecast.
4. Sentence Aggregation: In this stage, relevant sentences or parts of sentences are combined in ways
that accurately summarize the topic, such as generating a sentence like ”Expect scattered thunderstorms in the
afternoon with temperatures reaching the mid-80s.”
5. Grammatical Structuring: The NLG system applies grammatical rules to generate natural-sounding
text, such as determining the syntactical structure of a sentence and rewriting it in a grammatically correct
manner like ”There is a 40% chance of rain tomorrow.”
6. Language Presentation: The final output is generated based on a template or format the user or
programmer has selected. For example, the NLG system might output the information in a visually appealing
and easy-to-read format, such as a table or chart, with icons or symbols representing different weather conditions.
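As a toy illustration of the final presentation step, the snippet below turns a small structured weather record into a sentence using simple templates; the field names and thresholds are invented, and real NLG systems are considerably more sophisticated:

# Template-based surface realisation of structured data.
weather = {"city": "Kochi", "temp_max_c": 33, "rain_chance_pct": 40, "wind_kmph": 18}

def realise(data):
    parts = [f"In {data['city']}, expect a high of {data['temp_max_c']} degrees Celsius"]
    if data["rain_chance_pct"] >= 30:
        parts.append(f"with a {data['rain_chance_pct']}% chance of rain")
    parts.append(f"and winds of about {data['wind_kmph']} km/h.")
    return " ".join(parts)

print(realise(weather))
# In Kochi, expect a high of 33 degrees Celsius with a 40% chance of rain and winds of about 18 km/h.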
In summary, NLP, NLU, and NLG are technologies that are rapidly evolving and gaining importance in
various industries. NLP involves the analysis and manipulation of natural language data, NLU focuses on
machine comprehension and interpretation of human language, and NLG is concerned with the creation of
human-like language by machines. These technologies have the potential to revolutionize the way humans and
machines interact and communicate, leading to greater efficiency, accuracy, and automation in various fields
such as healthcare, finance, and customer service.
References
NLP, NLU, and NLG: What's The Difference? A Comprehensive Guide, Nahla Davies, KDnuggets, June
2022
What Is Natural Language Understanding (NLU)?, Rachel Wolff, MonkeyLearn, January 2021
NLP vs. NLU vs. NLG: the differences between three natural language processing concepts, Eda
Kavlakoglu, IBM Blog, November 2020
Natural Language Understanding (NLU), TechTarget
Understanding NLU benchmarks for intent detection and named-entity recognition in call centre conver-
sations, Artefact Data Digest, November 2020
Natural Language Generation (NLG) in 2023, Cem Dilmegan, AI Multiple, January 2023
Natural Language Generation (NLG), Ivy Wigmore, TechTarget
What is Natural Language Generation?, AI, Data & Analytics Network, July 2022
About the Author
Jinsu Ann Mathew is a research scholar in Natural Language Processing and Chemical Informatics.
Her interests include applying basic scientific research on computational linguistics, practical applications of
human language technology, and interdisciplinary work in computational physics.
Part II
Astronomy and Astrophysics
Exploring the Explosive Side of the Sun
by Linn Abraham
airis4D, Vol.1, No.5, 2023
www.airis4d.com
1.1 The Sun and Life on Earth
Energy is a fundamental requirement for life. The most common characteristics of life, namely organization, metabolism, response to stimuli, reproduction, adaptation and evolution, all require energy. Almost all life forms on Earth depend on the Sun for their energy (there are organisms that can derive their energy by
oxidizing inorganic compounds such as hydrogen sulfide, ammonia or methane. They are mostly found in
environments without sunlight such as deep-sea hydrothermal vents, hot springs, and caves). It is no wonder
then that almost all the early civilizations and cultures worshiped the Sun as a deity. The warmth and light of the Sun were a welcome relief from the cold and dark nights that frightened early humans. The Sun's regular rising
and setting provided a sense of order and predictability to the natural world, which was often seen as chaotic
and unpredictable. The brightness and intensity of the Sun gave it a divine glow that we have come to associate
with many other deities. Without any apparent source of energy the Sun was a mystery to the ancients and this
made it worthy to be worshiped by countless generations of people.
1.2 Historical Record of Solar Explosions
Today, worshiping the Sun might sound like a ridiculous thing to do. The scientific revolution brought
about a paradigm shift in our perception of the Sun. The Italian astronomer Galileo Galilei made the first
recorded observations of sunspots. Galileo's observations showed that the Sun had dark spots on its surface that
appeared to change in size and shape over time. These observations were groundbreaking, as they challenged
the prevailing view of the Sun as a perfect, unblemished celestial body, and paved the way for further research
into the Sun's physical properties and behavior. Samuel Heinrich Schwabe discovered the sunspot cycle in 1844. This is a periodic variation in the number of sunspots observed. The average period is close to 11 years, with the actual period varying between 7 and 17 years. The fact that there can be sudden explosions on
the surface of the sun was discovered on 1 September 1859 by Richard Carrington, an English astronomer.
Carrington was studying a sunspot with his telescope. As is usually the case with astronomers who study the
sun, he projected an image of the sun onto a screen and observed it. He was probably surprised to find that 'two patches of intensely bright and white light broke out in the middle of the sunspot'.
Figure 1: Sun Blasts an M6.6 Flare. On Feb. 13th at 1738 UT, sunspot 1158 unleashed the strongest solar flare of the year so far, an M6.6-category blast. Image Credit: NASA/SDO/AIA
Figure 2: The Earth is dwarfed by a giant explosion on the Sun that ejects huge amounts of hot gas into space.
Here the sizes of the Earth and the Sun are shown to scale, but not the distance between them. Image Credit:
NASA
1.3 Do We Need to Predict These Outbursts?
About 18 hours after he saw the flare, the magnetic observing station in Kew reported a sudden violent
change in the Earth's magnetic field, known as a geomagnetic storm. The storm was so strong that it disrupted
telegraph communications around the world and caused auroras to be visible at lower latitudes than usual. One
of the earliest known geomagnetic storms was the ”Bastille Day Event” of July 14, 1770. This storm was
observed by French astronomer Jean-Andre Deluc, who noted that a compass needle in his possession was
behaving erratically. Other observers in Europe and North America also reported unusual auroras and magnetic
disturbances around the same time. Carrington was the first to suggest, albeit indirectly, that an explosion on the Sun could cause abnormal things to happen on the Earth. He made this suggestion at a meeting of the Royal Astronomical Society about two months after the discovery. Several other solar storms
have been observed ever since. The greatest solar storm of the Space Age was observed in 1989 which led to a
blackout in the Canadian province of Quebec due to the outage of their power grid. Interestingly, the first flare
recorded by any human being (dubbed ”The Carrington Event”) is also the strongest flare ever to have been
observed. The evidence for this was unearthed by researchers who found measurements of the storm as far away
from the geomagnetic poles as India. This flare must have been at least three times as strong as the 1989 flare
that caused havoc in Quebec.
1.4 How the Modern World is Becoming More Vulnerable to Solar Flares?
The fact that the Carrington event was not recorded as a catastrophic event in history whereas the Quebec
solar flare made major headlines shows how the world has become increasingly dependent on technology. Not
just any technology but technology that is vulnerable to solar flares. The connection between solar flares and
modern technology can be understood with Michael Faraday’s observation of magnetic fields inducing electric
currents. Our current understanding of the sun shows that it must be a ball of hot burning gas. But the gas is
mostly in the inner layers, whereas the outer layers exist as plasma due to the very high temperatures. In 1958
Eugene Parker proposed the theory of the Solar Wind where he found that the high temperature of the corona
should cause a continuous wind of gas and plasma to blow through the entire solar system. Solar flares cause a
sudden increase in the influx of these particles. The Earth's magnetic field mitigates the effects of sudden solar
disturbances to some extent. However this protective shield is weakest near the geomagnetic poles because
of a lower density of magnetic field lines. The polar auroras are a harmless consequence of this fact. One
geomagnetic pole is located in the Canadian Arctic. This explains why the major blackout occurred in the
Canadian province of Quebec. The interaction of the electrically charged particles with the Earth's magnetic field is capable of causing induced currents in our electrical grids and subsequently causing them to fail. There are
several other ways in which solar flares affect our lives. Radio communications are an essential part of modern
technologies. The ionosphere is what makes radio communication with distant places on the Earth possible, which would otherwise be impossible due to the Earth's curvature. Solar flares disturb the ionosphere and thus
cause disturbances in radio communication. The region around the north geomagnetic pole is a frequent route for passenger flights between Europe and North America. After a major solar flare, airline companies have to reroute flights, or else the pilots lose radio contact with ground stations. Artificial satellites are another piece of technology that
has become ubiquitous to life on Earth. We rely on them for everything from television broadcasting to daily
navigation (GPS satellites). A major solar flare can roast the electronics on board if precautions are not put in
place. All this points to an increased need to understand the dynamics of our Sun. Knowing more about solar activity and sunspots is critical. Advance warning of solar flares is important for disaster
mitigation.
Figure 3: Comet NEOWISE is visible in an aurora-filled sky in this photo taken on July 14, 2020, by Aurorasaurus Ambassador Donna Lach. Image Credit: NASA
1.5 References
1. Choudhuri, Arnab Rai, Nature’s Third Cycle: A Story of Sunspots, Oxford, New York, Oxford University
Press, 2015.
2. Karttunen, Hannu, Pekka Kröger, Heikki Oja, Markku Poutanen, and Karl Johan Donner, eds. Fundamental Astronomy. Berlin, Heidelberg: Springer Berlin Heidelberg, 2017. https://doi.org/10.1007/978-3-662-53045-0.
3. Did you know that? 12 interesting facts about the Sun.
4. List of solar storms
About the Author
Linn Abraham is a researcher in Physics, specializing in A.I. applications to astronomy. He is
currently involved in the development of CNN based Computer Vision tools for classifications of astronomical
sources from PanSTARRS optical images. He has used data from several large astronomical surveys, including
SDSS, CRTS, ZTF and PanSTARRS for his research.
Cataclysmic Variable Stars
by Sindhu G
airis4D, Vol.1, No.5, 2023
www.airis4d.com
2.1 What are Cataclysmic Variable stars
Variable stars have been studied by astronomers for many years, and they provide important information
about the nature and behavior of stars, as well as their evolution and physical properties. We already know that
a variable star is a star whose brightness, as observed from Earth, changes over time. This article explains what
cataclysmic variable stars (Figure 1) are. A cataclysmic variable star, often abbreviated as a CV star, is a type of variable star that undergoes sudden and dramatic changes in its properties, leading to significant increases in
brightness and energy output. Cataclysmic variables are one type of intrinsic variable stars. Intrinsic variables
are types of variable stars where the variability is caused by changes in the physical properties of the stars
themselves.
Figure 1: Cataclysmic Variable Stars (Image Courtesy: MIT)
Cataclysmic variable (CV) stars are binary systems that typically consist of a normal star (often referred to
as the secondary or donor star) and a white dwarf (the primary star). The primary star in a cataclysmic variable
is typically a white dwarf, which is the remnant of a star that has exhausted its nuclear fuel and has collapsed
to a small, dense state. The companion star can be a main-sequence star, a subgiant, or even another white
dwarf. In cataclysmic variable stars, the gravitational pull of the white dwarf is so strong that it distorts the
shape of the companion star. This distortion can cause the companion star to transfer material onto the white
dwarf through accretion, where material from the companion star forms an accretion disk around the white
dwarf. The accretion disk is a swirling disk of gas and dust that forms around the white dwarf as material
from the companion star accumulates and accretes onto the white dwarf’s surface. The study of mass transfer
and accretion in cataclysmic variables provides valuable insights into the physics of binary star systems and the
evolution of compact objects such as white dwarfs.
Figure 2: Cataclysmic Variable Stars (Image Courtesy: NASA)
In Figure 2, the companion star is shown on the left, with its mass transferring material onto the accretion
disk around the white dwarf (shown on the right). The hot spot is the region where the material from the
companion star impacts the accretion disk, creating a bright spot in the system. The white dwarf is shown at
the center. The interactions between the companion star, the accretion disk, and the white dwarf in cataclysmic
variable stars are complex and dynamic, leading to the observed variability and outbursts in their brightness.
Due to the high density of the white dwarf, the gravitational potential energy released during the accretion
process is enormous. Some of this energy is converted into X-rays, during the accretion process. This X-ray
emission is one of the ways cataclysmic variable stars can be detected and studied, as it provides important
information about the accretion process, the properties of the accretion disk, and the physical conditions in the
vicinity of the white dwarf. Cataclysmic variable stars are generally faint in X-rays. As a result, only cataclysmic
variables that are located relatively close to our Sun, within a few hundred light-years, have been studied in
X-rays so far, while there may be millions of cataclysmic variables in our galaxy. Studying cataclysmic variables
in X-rays provides valuable insights into the high-energy processes and dynamics occurring in these systems,
shedding light on their complex behavior and evolution.
Cataclysmic variable stars are typically small in size, with the entire binary system being comparable in
size to the Earth-Moon system. The orbital periods of cataclysmic variables are usually short, ranging from 1 to
10 hours. This means that the two stars in the binary system are relatively close to each other and orbit around
their common center of mass at a rapid pace. Cataclysmic variable stars are fascinating objects that continue to
be actively researched by astronomers to better understand their properties and evolution.
2.2 Discovery
Cataclysmic variables are indeed a class of astronomical objects that are commonly observed by amateur
astronomers. Cataclysmic variables are known for their irregular and sudden changes in brightness, particularly
during outbursts, which can make them easily detectable even with modest amateur telescopes. Cataclysmic
variables often have short orbital periods, which means that their brightness can change significantly within
hours or days, providing an exciting and dynamic observing experience for amateur astronomers. Cataclysmic
variables are often easily detectable by amateur astronomers due to their distinctive characteristics. They are
typically blue in color, exhibit rapid and strong variability in their brightness, and often have peculiar emission
lines in their spectra. These features, along with their ultraviolet and X-ray emissions, make them relatively
straightforward to identify compared to other celestial objects.
In terms of occurrence, around six galactic novae, which are a type of cataclysmic variable, are discovered
in our own galaxy each year. However, observations in other galaxies suggest that the actual occurrence rate
could be higher, between 20 and 50 per year. The discrepancy between the observed and predicted occurrence
rates could be due to factors such as interstellar dust obscuration, lack of observers in certain regions of the sky
(such as the southern hemisphere), and challenges in observing during daylight or full moon periods.
Amateur astronomers play a valuable role in monitoring cataclysmic variables and contributing to our
understanding of these objects. Their observations, when combined with data from professional observatories,
can help uncover important information about the behavior, properties, and evolution of cataclysmic variables,
as well as shed light on their occurrence rates and other astrophysical processes associated with these intriguing
celestial objects.
2.3 Some types of Cataclysmic Variables
2.3.1 Supernovae
A supernova (Figure 3 and Figure 4) is a cataclysmic stage in the life of a star, characterized by a sudden
and dramatic increase in brightness. Supernovae are one of the most energetic and dramatic events in the
universe. The increase in brightness can be significant, with a typical supernova becoming many magnitudes
brighter, often reaching an absolute magnitude of around -15 or even brighter. There are several ways in which
a star can undergo a supernova, including the core collapse of a massive star or the runaway nuclear fusion
in a white dwarf. In the case of a massive star, when it has exhausted the nuclear fuel in its core, the core
collapses under gravity, resulting in an intense release of energy in the form of a supernova explosion. The
outer layers of the star are expelled into space at high velocities, and the core may either collapse to form a
dense neutron star or, if the core is massive enough, it may collapse further to form a black hole. In the case
of a white dwarf, which is the remnant core of a low to medium mass star that has exhausted its nuclear fuel,
a supernova can occur if it accretes enough mass from a companion star to trigger runaway nuclear fusion.
This can result in a thermonuclear explosion that completely destroys the white dwarf, releasing a tremendous
amount of energy in the process. The peak luminosity of a supernova can be incredibly bright, comparable to
the brightness of an entire galaxy, and it can outshine its host galaxy for a short period of time. However, the
brightness of a supernova fades over weeks or months as the ejected material expands and cools. Supernovae
are important astronomical events that provide insights into the processes of stellar evolution, nucleosynthesis,
and the dynamics of the universe. The remnants of supernovae, such as neutron stars and black holes, can also
play significant roles in the evolution of galaxies and the universe at large.
Figure 3: Oldest Recorded Supernova (Image Courtesy: X-ray: NASA/CXC/SAO & ESA; Infrared: NASA/JPL-Caltech/B. Williams (NCSU))
Figure 4: Supernova 1987A in the Large Magellanic Cloud. (Image Courtesy: Anglo-Australian Observatory)
2.3.2 Luminous red nova
Luminous red novae (Figure 5) are a type of stellar explosion that is caused by the merger of two stars, and
they are distinct from classical novae. Luminous red novae are known for their characteristic red appearance
and slow decline following the initial outburst.
Figure 5: V838 Monocerotis a possible luminous red nova (Image Courtesy: NASA/ ESA)
2.3.3 Novae
A nova (Figure 6) occurs in a close binary system. Novae are known for their rapid and often unpredictable increase in brightness. The typical range of brightness increase, or ”nova outburst”, is generally between 7 and
16 magnitudes. The behavior of novae, including the rapid rise in brightness followed by a steady decline back
to the pre-nova magnitude over a period of weeks to months, suggests that the event causing the nova does not
typically result in the destruction of the original star. This is consistent with the widely accepted model for
novae, which involves an accreting white dwarf in a close binary system. In this model, the white dwarf accretes
material from its companion star over a long period of time, ranging from thousands to hundreds of thousands of
years, until there is sufficient material to trigger a thermonuclear explosion that then blasts the shell of material
off into space.
Figure 6: Nova KT Eridani(Image Courtesy: Kevin Heider)
2.3.4 Recurrent Novae
Recurrent novae (Figure 7) are similar to novae in that they exhibit outbursts with a change in magnitude
(brightness) typically ranging from 7 to 16 magnitudes, and the period of outburst can be up to about 200 days.
Recurrent novae are characterized by multiple outbursts over recorded observations.
Figure 7: Recurrent Novae, T Pyxidis Light Echoes(Image Courtesy: NASA/ESA)
2.3.5 Dwarf novae
Figure 8: Dwarf Novae, Z Camelopardalis(Image Courtesy: NASA/JPL-Caltech/T. Pyle(SSC)/R. Hurt(SSC))
Dwarf novae (Figure 8) are a type of variable star that exhibit sudden and significant increases in brightness
by 2 to 5 magnitudes over a few days, with intervals of weeks or months between outbursts. They are intrinsically
faint stars and are characterized by a binary system with a white dwarf as one of the component stars. There are
three subtypes of dwarf nova that have been identified: U Geminorum, Z Camelopardalis, and SU Ursae Majoris
stars. The most widely accepted model to explain the outbursts of dwarf novae is the disk instability model.
According to this model, the accretion disk around the white dwarf undergoes thermal instabilities, leading
to sudden increases in brightness during outbursts. Unlike classical novae, there is no significant ejection of
material in dwarf novae events.
References
Understanding Variable Stars, John R Percy, Cambridge University Press.
GCVS variability types, Samus N.N., Kazarovets E.V., Durlevich O.V., Kireeva N.N., Pastukhova
E.N.,General Catalogue of Variable Stars: Version GCVS 5.1,Astronomy Reports, 2017, vol. 61, No. 1,
pp. 80-88 2017ARep...61...80S
Cataclysmic variable star
Cataclysmic Variables
Variable star
About the Author
Sindhu G is a research scholar in Physics doing research in Astronomy & Astrophysics. Her research mainly
focuses on the classification of variable stars using different machine learning algorithms. She is also working on period prediction for different types of variable stars, especially eclipsing binaries, and on the study of optical counterparts of X-ray binaries.
Part III
Biosciences
Barcode of Life: A Global Biodiversity
Challenge
by Geetha Paul
airis4D, Vol.1, No.5, 2023
www.airis4d.com
Biodiversity, the variety of life on Earth, is fundamental to the well-being of our planet and all its inhabitants.
However, the rapid pace of human development and the resulting destruction of natural habitats have led to a
dramatic decline in biodiversity over the past few decades. In response, scientists and conservationists around
the world have been working to document and protect the planet's species before they disappear forever. One of
the most promising initiatives in this effort is the ‘Barcode of Life’ (BOLD) project.
The Barcode of Life project is a global biodiversity challenge that aims to create a curated, comprehensive,
digital library of DNA barcodes for every species on Earth. DNA barcoding involves analysing a specific region
of an organism’s DNA to identify it with a unique code, similar to a product barcode. This code can then be
used to quickly and accurately identify the species, even when only a small fragment or specimen is available.
The BOLD project was launched in 2003 by Paul Hebert, a biologist at the University of Guelph in Canada.
Since then, it has grown to include hundreds of institutions and thousands of researchers from around the world.
The project has already generated over 10 million DNA barcode records, representing more than 260,000
species.
The potential applications of the Barcode of Life project are vast. DNA barcoding can be used to identify
species in a wide range of fields, including ecology, agriculture, medicine, and food safety and quality. For example, it can be used to identify the species of fish being sold in a market, to determine which insects are responsible for crop damage, or to diagnose infectious diseases. In addition, the Barcode of Life project can
help in conservation efforts by enabling researchers to identify and monitor threatened and endangered species
more efficiently.
The BOLD project also serves as a platform for collaboration among scientists, conservationists, and
policymakers from around the world. By bringing together experts from different fields, the project fosters
interdisciplinary research and helps to bridge gaps in knowledge and resources.
However, there are also challenges associated with the Barcode of Life project. One of the main challenges
is the sheer scale of the project. With an estimated 8.7 million species on Earth, creating a complete DNA
barcode library is a monumental task. In addition, there are technical and logistical challenges associated with
collecting and analysing DNA samples from such a diverse range of organisms. Despite these challenges,
the Barcode of Life project has already made significant progress in documenting and preserving the world’s
biodiversity. As the project continues to grow, it has the potential to revolutionise the way we understand and
protect life on Earth. The Barcode of Life follows a standardised procedure for generating DNA barcodes, which
includes the following steps:
Figure 1: Diagram showing the steps of DNA barcoding coupled with next-generation sequencing (NGS) for determination of biodiversity composition in complex environmental samples. Image courtesy: https://www.researchgate.net/figure/a-Diagram-showing-the-steps-of-DNA-barcoding-coupled-with-next-generation-sequening-NGS
Figure 2: Schematic drawing of the steps involved in the creation of reference libraries of DNA barcodes. Note the links with involved institutions such as natural history museums, and related initiatives like the Encyclopedia of Life. (With permission from the Consortium for the Barcode of Life) Image courtesy: https://www.mdpi.com/1424-2818/13/7/313
The various steps involved in the barcoding pipeline are as follows:
1. Specimen collection and identification: Collecting a specimen from the field and identifying it to the
species level is the first step in generating a DNA barcode. Specimens are identified using traditional
taxonomic methods such as morphology studies.
2. DNA extraction: DNA is extracted from the collected specimen using standard laboratory techniques. The
DNA extraction protocol depends on the nature of the specimen, the type of tissue, and the preservation
method used.
3. PCR amplification of barcode region: The barcode region is amplified using the polymerase chain reaction
(PCR) with universal primers designed to target the barcode region of interest. The barcode region is
typically a short fragment of a standard mitochondrial gene, such as cytochrome c oxidase I (COI) in
animals.
4. Sequencing: The PCR products are sequenced using Sanger sequencing or next-generation sequencing
(NGS) technologies. Sanger sequencing is the traditional method of sequencing and can generate high-
quality, long reads, while NGS provides high-throughput sequencing of short reads. The PCR products
are prepared into a library that can be sequenced using NGS technologies such as Illumina, PacBio,
or Oxford Nanopore. The prepared library contains many different barcode sequences from different
specimens.
5. Data analysis and quality control: The raw sequencing data are processed and analysed using bioinformatics tools to obtain the DNA barcode sequence and to identify the species present in the sample. The sequence is then compared to a reference library to confirm the species identification (a minimal sketch of this comparison follows the list). Quality control measures are applied to ensure the accuracy and reliability of the barcode sequence.
6. Interpretation of results: Finally, the results are interpreted to understand the biodiversity composition of
the sample. The number of unique sequences identified in the sample can be used to estimate the species
richness, while the abundance of each sequence can be used to estimate the relative abundance of each
species.
7. Data submission and storage: The generated barcode sequence and associated metadata are submitted
to a centralised database, such as the Barcode of Life Data Systems (BOLD) or GenBank, for long-term
storage and public access.
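To make the identification step concrete, here is a minimal Python sketch of how a query COI sequence might be matched against a small reference library by naive percent identity. The sequences, species names, and the 97% threshold are invented for illustration; real pipelines rely on proper alignments and dedicated tools such as BLAST or the BOLD identification engine.

# Minimal sketch: assign a query COI barcode to the closest reference species
# by naive percent identity. Sequences and the 97% cut-off are illustrative.

def percent_identity(a: str, b: str) -> float:
    """Fraction of matching bases over the compared length (no real alignment)."""
    n = min(len(a), len(b))
    matches = sum(1 for x, y in zip(a[:n], b[:n]) if x == y)
    return 100.0 * matches / n

# Hypothetical reference library: species name -> COI fragment
reference_library = {
    "Species A": "ATGGCATTAGCCGGAATAGTAGGAACTTCT",
    "Species B": "ATGGCTTTAGCAGGTATAGTTGGGACATCA",
}

def identify(query: str, threshold: float = 97.0):
    best_species, best_score = None, 0.0
    for species, ref_seq in reference_library.items():
        score = percent_identity(query, ref_seq)
        if score > best_score:
            best_species, best_score = species, score
    if best_score >= threshold:
        return best_species, best_score
    return "no confident match", best_score

query = "ATGGCATTAGCCGGAATAGTAGGAACTTCT"   # e.g. a sequenced specimen
print(identify(query))                      # -> ('Species A', 100.0)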
The Barcode of Life procedure provides a standardised, cost-effective, and efficient method for generating
DNA barcodes for all eukaryotic species. It has many applications in various fields, including biodiversity
conservation, ecology, forensics, medicine and biosecurity.
Overall, DNA barcoding coupled with Next Generation Sequencing (NGS) provides a powerful tool for
studying biodiversity composition in complex environmental samples. Billions of DNA strands get sequenced
simultaneously using NGS, whereas with Sanger sequencing only one strand is sequenced at a time. The Human Genome Project took more than a decade to sequence the human genome for the first time; with Next Generation Sequencing, a whole human genome can now be sequenced in about a day. Rapid human resequencing of this kind is possible largely because the Human Genome Project created a human reference DNA sequence against which the reads can be aligned. The basic principle behind NGS is that
DNA can be cut into smaller pieces and sequenced. Next Generation Sequencing (NGS) is used to sequence
both DNA and RNA. First, samples get collected, and the DNA or RNA gets purified. Next, the DNA or
RNA gets checked to ensure its purity. RNA first needs to be reverse-transcribed into DNA before it can get
sequenced. A library then gets prepared from the DNA. A library is the collection of short DNA fragments
from a long stretch of DNA. Libraries get made by cutting the DNA into short pieces of a specific size. This
cutting gets done using high-frequency sound waves or enzymes. Then sequences of DNA called adapters get
added to each end of a DNA fragment. These adapters contain the information needed for sequencing. They
also include an index to identify the sample. Finally, any non-bound adapters get removed, and the library is
complete. Depending on the application, there can be a PCR (Polymerase Chain Reaction) step to increase the amount of library.
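As a rough illustration of library preparation, the toy Python sketch below cuts a DNA string into fixed-size fragments and attaches adapter and index sequences to each end. The adapter, index, and fragment size used here are made-up placeholders, not real Illumina sequences.

# Toy sketch of NGS library preparation: fragment the DNA and add adapters.
# Adapter, index and fragment size below are placeholders, not real sequences.

ADAPTER_5 = "AGTC"      # hypothetical 5' adapter
ADAPTER_3 = "GACT"      # hypothetical 3' adapter
SAMPLE_INDEX = "TTAG"   # hypothetical sample index used later for demultiplexing

def prepare_library(dna: str, fragment_size: int = 12):
    """Cut DNA into pieces of roughly fragment_size and attach adapters and index."""
    fragments = [dna[i:i + fragment_size] for i in range(0, len(dna), fragment_size)]
    library = [ADAPTER_5 + SAMPLE_INDEX + frag + ADAPTER_3 for frag in fragments]
    return library

genomic_dna = "ATGCGTACGTTAGCCATGGCATTAGCCGGAATAGTAGGA"
for read in prepare_library(genomic_dna):
    print(read)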
1.1 Next Generation Sequencing (NGS)
Figure 3: The figure shows the libraries obtained by cutting the DNA into short pieces of a specific size. Image courtesy: https://www.clevalab.com/post/a-step-by-step-guide-to-ngs
A successful library will be of the correct size and at a high enough concentration for sequencing. The next step is to attach the library to the flow cell.
Figure 4: The main sequencing instruments used in NGS are from Illumina. These instruments use a method called sequencing by synthesis. Image courtesy: https://www.ibiology.org/techniques/next-generation-sequencing
Figure 5: Attaching the library to the flow cell for sequencing by synthesis. Image courtesy: https://www.clevalab.com/post/a-step-by-step-guide-to-ngs
The sequencing occurs on a glass surface of a flow cell. Short pieces of DNA called oligonucleotides are
bound to the surface of the flow cell. These oligonucleotides match the adapter sequences of the library. First,
the library gets denatured to form single DNA strands. Then this library gets added to the flow cell, where each fragment attaches to one of the two oligo types. The strand that attaches to the oligo is the forward strand. Next, the reverse
strand gets made, and the forward strand gets washed away. The library is now bound to the flow cell.
If sequencing started now, the fluorescent signal would be too low for detection. So each unique library
fragment needs to get amplified to form clusters. This clonal amplification is done by a PCR that happens at a single
temperature. Annealing, extension and melting occur by changing the flow cell solution. First, the strands
bind to the second oligo on the flow cell to form a bridge. The strands get copied. Then these double-stranded
fragments get denatured. This denaturation gets done by adding another solution to the flow cell. This copying
and denaturing repeats over and over. Localised clusters get made, and finally, the reverse strands get cut. These
strands get washed away, leaving the forward strand ready for sequencing.
Figure 6: Clonal amplification where localised clusters are made. Image courtesy: https://www.clevalab.com/post/a-step-by-step-guide-to-ngs
Figure 7: Sequencing of the DNA strand and addition of fluorescent G,C,T,A nucleotides. Image courtesy:
https://www.clevalab.com/post/a-step-by-step-guide-to-ngs
The sequencing primer binds to the forward strands. Next, fluorescent nucleotides G, C, T and A get
added to the flow cell along with DNA Polymerase. Each nucleotide has a different colour fluorescent tag
and a terminator. So only one nucleotide can get sequenced at a time. First, the complementary base binds
to the sequence. Then the camera reads and records the colour of each cluster. Next, a new solution flows in
and removes the terminators. The nucleotides and DNA Polymerase flow in again, and another nucleotide gets
sequenced. These read cycles continue for the number of reads set on the sequencer. Once complete, these
read sequences get washed away. Then the first index gets sequenced, then washed away. If only a single read
is needed, the sequencing ends here. But, for paired-end sequencing, the second index is sequenced, as well as
the reverse strand of the library. There is no primer for the second index read. Instead, a bridge gets created so
that the second oligo acts as the primer. The second index is then sequenced. These two index reads use unique
dual indexes. These allow the use of up to 384 samples in the same flow cell. Next, the reverse strands get made,
and the forward strands are cut and washed away. The reverse strands are then sequenced.
Figure 8: Filtering of bad reads. Image courtesy: https://www.clevalab.com/post/a-step-by-step-guide-to-ngs
Figure 9: Aligning and Making Sense of the Reads. Image courtesy: https://www.clevalab.com/post/a-step-by-step-guide-to-ngs
Once the sequencing is complete, any bad reads get filtered out. These include the clusters that overlap,
lead or lag with sequencing, or are of low intensity. The clusters cannot overlap on a patterned flow cell, but there
can be more than one library fragment per nanowell. These polyclonal wells will also get filtered out. Next,
the reads passing the filter get demultiplexed. Demultiplexing uses the attached indexes to identify and sort
reads from each sample. Finally, the reads get mapped to the reference genome. The different reads align to the
reference genome, overlapping each other. Paired-end sequencing creates two sequencing reads from the same
library fragment. During sequence alignment, the algorithm knows that these reads belong together. Longer
stretches of DNA or RNA can get analysed with greater confidence that the alignment is correct.
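A minimal sketch of the demultiplexing step is given below, assuming each read carries a short index that maps back to a sample; the index-to-sample table and the reads are invented for illustration.

# Minimal demultiplexing sketch: sort reads into samples using their index.
# The index-to-sample mapping and reads are invented for illustration.

from collections import defaultdict

index_to_sample = {"TTAG": "sample_1", "CCGA": "sample_2"}

# Each read is (index_sequence, read_sequence)
reads = [
    ("TTAG", "ATGCGTACGTTA"),
    ("CCGA", "GCCATGGCATTA"),
    ("TTAG", "GCCGGAATAGTA"),
    ("AAAA", "NNNNNNNNNNNN"),   # unknown index -> routed to "undetermined"
]

demultiplexed = defaultdict(list)
for index, seq in reads:
    sample = index_to_sample.get(index, "undetermined")
    demultiplexed[sample].append(seq)

for sample, seqs in demultiplexed.items():
    print(sample, len(seqs), "reads")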
The next step is aligning and making sense of the reads.
Read depth is an essential metric in sequencing. It is the number of reads covering a given nucleotide position.
Average read depth is the average depth across the region sequenced. For whole genome sequencing, a 30x
average read depth is good. A 1500x average read depth is suitable for detecting rare mutation events in cancers.
Another essential metric is coverage. The aim is to have no missing areas across the target DNA.
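The toy Python sketch below shows how per-base read depth, average read depth, and coverage might be computed from a handful of aligned reads; the read positions and the 20-base reference are invented values.

# Toy calculation of per-base read depth and coverage over a reference region.
# Read positions and reference length are invented for illustration.

reference_length = 20
# Aligned reads given as (start, end) positions on the reference (0-based, end exclusive)
aligned_reads = [(0, 8), (2, 10), (5, 15), (12, 20)]

depth = [0] * reference_length
for start, end in aligned_reads:
    for pos in range(start, end):
        depth[pos] += 1

average_depth = sum(depth) / reference_length          # the "30x"-style figure in practice
coverage = sum(1 for d in depth if d > 0) / reference_length

print("per-base depth:", depth)
print(f"average read depth: {average_depth:.1f}x")
print(f"coverage: {coverage:.0%}")   # fraction of bases with at least one read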
1.2 DNA barcoding as a new tool for food traceability
Food safety and quality are a major concern nowadays. Any case of food alteration, especially when reported by the media, has a great impact on public opinion. There is an increasing demand for better quality controls, which is directing scientific research towards the development of reliable molecular tools for food analysis. DNA barcoding is a widely used molecular-based system that can identify biological specimens, and it is used for the identification of both raw materials and processed food. In the review by Galimberti et al. (2013, see References), the results of several studies are critically analysed in order to evaluate the effectiveness of DNA barcoding in food traceability and to delineate some best practices for applying DNA barcoding throughout the industrial pipeline. The use of DNA barcoding for food safety and in the identification of commercial fraud is also discussed.
1.3 Highlights
Food quality: from the field to the table.
From molecular-based approaches to DNA barcoding.
DNA barcoding to identify and certify food raw material.
DNA barcoding as a traceability tool during food industrial processing.
Food safety and commercial frauds.
1.4 Strengths, Weaknesses, Opportunities, and Threats for DNA barcoding,
resulting from the SWOT analysis.
Figure 10: Major Strengths, Weaknesses, Opportunities, and Threats for DNA barcoding, resulting from the
SWOT analysis. Image courtesy: https://www.mdpi.com/1424-2818/13/7/313
In conclusion, the Barcode of Life project is a global biodiversity challenge with the ambitious goal of
creating a comprehensive, digital library of DNA barcodes for every species on Earth. By providing a rapid
and accurate means of species identification, the project has the potential to transform our understanding of the
planet’s biodiversity and support conservation efforts worldwide. While there are challenges associated with the
project, the Barcode of Life initiative represents an important step forward in our efforts to protect the natural
world.
References
Hebert, P. D. N., Cywinska, A., Ball, S. L., & deWaard, J. R. (2003). Biological identifications through DNA barcodes. Proceedings of the Royal Society of London B, 270(1512), 313-321. https://doi.org/10.1098/rspb.2002.2218. https://pubmed.ncbi.nlm.nih.gov/12614582/
Galimberti, A., De Mattia, F., Losa, A., Bruni, I., Federici, S., Casiraghi, M., Martellos, S., & Labra, M. (2013). DNA barcoding as a new tool for food traceability. Food Research International, 50(1), 55-63. https://doi.org/10.1016/j.foodres.2012.09.036
The Contribution of the Barcode of Life Initiative to the Discovery and Monitoring of Biodiversity. https://www.researchgate.net/publication/228231851
Grant, D. M., Brodnicke, O. B., Evankow, A. M., Ferreira, A. O., Fontes, J. T., Hansen, A. K., Jensen, M. R., Kalaycı, T. E., Leeper, A., Patil, S. K., Prati, S., Reunamo, A., Roberts, A. J., Shigdel, R., Tyukosova, V., Bendiksby, M., Blaalid, R., Costa, F. O., Hollingsworth, P. M., Stur, E., & Ekrem, T. (2021). The Future of DNA Barcoding: Reflections from Early Career Researchers. Diversity, 13, 313. https://www.mdpi.com/1424-2818/13/7/313
About the Author
Geetha Paul is one of the directors of airis4D. She leads the Biosciences Division. Her research interests extend from Cell & Molecular Biology to Environmental Sciences, Odonatology, and Aquatic Biology.
Part IV
Climate
Tropical Cyclones
by Robin Jacob Roy
airis4D, Vol.1, No.5, 2023
www.airis4d.com
A tropical cyclone is a rapidly rotating powerful weather system characterized by a low-pressure center, a
closed low-level atmospheric circulation, strong winds, and a spiral arrangement of thunderstorms that produce
heavy rain and squalls. Tropical cyclones pose a major threat to both human life and property, even during
their early stages of development. These weather systems are composed of various hazards that can each cause
significant impacts on their own, including storm surges, flooding, powerful winds, tornadoes, and lightning.
When these hazards combine and interact with one another, the potential for loss of life and damage to property
increases substantially. As a result, it is important to understand and prepare for the dangers posed by tropical
cyclones. Figure 1 shows an image of Hurricane Ian, a powerful Category 5 Atlantic hurricane and the deadliest to strike the state of Florida since the 1935 Labor Day hurricane.
1.1 Formation
Tropical cyclones are powered by the heat and moisture from the warm ocean surface, which causes the air
to rise rapidly, creating an area of low pressure. As the warm air rises, it cools and condenses, forming clouds
and releasing heat, which fuels the storm’s growth. The rotation of the Earth then causes the storm to spin, with
winds picking up speed as it intensifies. Once a tropical cyclone reaches sustained winds of at least 74 mph
(119 km/h), it is classified as a hurricane or typhoon, depending on the region where it forms. These storms can
cause significant damage to coastal communities and are closely monitored by meteorologists around the world.
Before a tropical cyclone can develop, various favorable environmental factors need to be present. These include the following (a toy checklist based on these thresholds is sketched after the list):
Warm ocean waters (at least 27°C) throughout a depth of about 46 m.
An atmosphere which cools fast enough with height such that it is potentially unstable to moist convection.
Relatively moist air near the mid-level of the troposphere (4,900 m).
Generally, a minimum distance of at least 480 km from the equator.
A pre-existing near-surface disturbance.
Vertical wind shear, which refers to the difference in wind speed at various altitudes, should be minimal
(less than approximately 37 km/h) from the surface to the upper troposphere for a tropical cyclone to
form.
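As a toy illustration of how these thresholds could be screened in observational data, the Python sketch below checks the listed conditions for a single, invented set of observations; the humidity cut-off is an assumed proxy for "relatively moist", and real cyclogenesis indices are far more sophisticated.

# Toy checklist of the favourable conditions for tropical cyclogenesis listed above.
# The example observation values are invented; real genesis indices are more complex.

def favourable_for_cyclogenesis(obs: dict) -> bool:
    checks = [
        obs["sst_celsius"] >= 27.0,             # warm ocean water through ~46 m depth
        obs["mid_level_humidity_pct"] >= 50.0,  # assumed proxy for "relatively moist" mid-troposphere
        obs["distance_from_equator_km"] >= 480.0,
        obs["has_surface_disturbance"],
        obs["vertical_wind_shear_kmh"] < 37.0,
        obs["unstable_to_moist_convection"],
    ]
    return all(checks)

observation = {
    "sst_celsius": 28.5,
    "mid_level_humidity_pct": 65.0,
    "distance_from_equator_km": 900.0,
    "has_surface_disturbance": True,
    "vertical_wind_shear_kmh": 20.0,
    "unstable_to_moist_convection": True,
}
print(favourable_for_cyclogenesis(observation))   # True for this invented case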
1.2 Structure
Figure 1: Hurricane Ian pictured from the International Space Station as it orbited 415 km above the Caribbean
Sea east of Belize on September 26, 2022. Source: NASA.
Figure 2: Cross-section of a tropical cyclone in the Northern Hemisphere Source: Wikimedia Commons.
Tropical cyclones are complex, multi-level, and multi-dimensional systems characterized by strong winds,
heavy rainfall, and low atmospheric pressure. The structure of a tropical cyclone can be divided into several
parts, each with its own characteristics and functions.
1. Eye: The center of a tropical cyclone is called the ”eye.” It is an area of low pressure, where the winds
are light, and the skies are clear or partly cloudy. The size of the eye varies from a few kilometers to more
than 50 kilometers in diameter in the most intense cyclones. The pressure in the eye is typically around
10 percent lower than the pressure outside the cyclone.
2. Eyewall: The eyewall is the area surrounding the eye, where the strongest winds and heaviest rainfall are
found. The eyewall is where the most intense thunderstorms occur, with updrafts and downdrafts that can
reach over 20 kilometers in height. The eyewall is the most dangerous part of a tropical cyclone, where
the winds can exceed 200 kilometers per hour and the rainfall can be more than 100 millimeters per hour.
3. Spiral Rainbands: The spiral rainbands extend outward from the eyewall and are responsible for much
of the rainfall associated with tropical cyclones. They are curved bands of clouds that rotate around the
center of the storm. These bands can be hundreds of kilometers long and tens of kilometers wide. The
rainfall within these bands can cause flash flooding and landslides.
4. Upper-level outflow: Tropical cyclones draw in warm, moist air from the surface, which rises and cools as
it reaches higher altitudes. This rising air creates a low-pressure area at the surface, which draws in more
warm, moist air. The air that rises within the cyclone spreads out at the upper levels of the atmosphere,
forming a high-pressure area, called the ”upper-level outflow.” This outflow helps to ventilate the cyclone
and maintain its circulation.
5. Sea surface: Tropical cyclones develop over warm ocean waters, where the sea surface temperature is
typically above 26.5°C. The warm water provides the energy for the cyclone to form and intensify, as it
evaporates and rises to form clouds and thunderstorms. The strong winds associated with the cyclone
also stir up the ocean, causing large waves and dangerous storm surges.
Figure 2 illustrates a cross-section of a typical hurricane that occurs in the Northern Hemisphere. Overall,
the structure of a tropical cyclone is a complex and dynamic system, with multiple components interacting with
each other to maintain its circulation and intensity. The interaction between the different parts of the cyclone is
critical to its development and behavior, and understanding this interaction is essential for predicting the track
and intensity of the storm.
1.3 Classification
Around the world, tropical cyclones are classified in different ways, based on the location (tropical cyclone
basins), the structure of the system and its intensity. For example, tropical storms that occur in the Caribbean
Sea, Gulf of Mexico, North Atlantic Ocean, and the eastern and central North Pacific Ocean are referred to
as ”hurricanes.” In the western North Pacific, they are called ”typhoons.” Meanwhile, in the Bay of Bengal
and Arabian Sea, they are known as ”cyclones.” In the western South Pacific and southeast Indian Ocean, they are referred to as ”severe tropical cyclones,” while in the southwest Indian Ocean, they are called ”tropical cyclones.”
The Saffir-Simpson Hurricane Wind Scale is a 1 to 5 categorization based on the hurricane's intensity at the indicated time. It is a widely recognized system used to measure the strength of hurricanes in the Atlantic
and northern Pacific Oceans. The scale was first introduced by Herbert Saffir and Robert Simpson in 1971 and
later revised in 2012.
The scale measures hurricanes based on their sustained wind speed, and it is divided into five categories.
Category 1 hurricanes have sustained wind speeds of 74-95 mph, while Category 5 hurricanes have sustained
wind speeds of 157 mph or higher. Categories 2, 3, and 4 fall in between these two extremes.
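A small Python helper makes the categorisation concrete. The text above gives only the Category 1 and Category 5 limits, so the Category 2-4 boundaries below are the commonly cited National Hurricane Center values and should be read as assumptions here.

# Saffir-Simpson category from sustained wind speed in mph.
# Category 1 and 5 limits come from the text; the intermediate boundaries
# are the commonly cited NHC values, included here as assumptions.

def saffir_simpson_category(wind_mph: float):
    if wind_mph < 74:
        return None          # below hurricane strength (tropical storm or depression)
    if wind_mph <= 95:
        return 1
    if wind_mph <= 110:
        return 2
    if wind_mph <= 129:
        return 3
    if wind_mph <= 156:
        return 4
    return 5

for speed in (70, 80, 120, 160):
    print(speed, "mph ->", saffir_simpson_category(speed))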
In addition to wind speed, the scale also considers the potential damage that a hurricane can cause. For
example, Category 3 hurricanes can cause significant damage to homes, buildings, and infrastructure, while
Category 5 hurricanes can cause catastrophic damage, with widespread power outages, structural damage, and
loss of life.
The Saffir-Simpson Hurricane Wind Scale is an essential tool for predicting and preparing for hurricane
events. It helps emergency managers and communities to understand the potential impact of a hurricane and to
make decisions about evacuations, emergency response, and other measures to protect life and property.
1.4 Naming
Tropical cyclones have the potential to last for more than a week, leading to the possibility of multiple
cyclones occurring simultaneously. To prevent confusion, each cyclone is assigned a name by weather forecasters. Naming conventions for tropical cyclones vary by region, with the Atlantic and Southern hemisphere
(Indian Ocean and South Pacific) using alphabetical order to alternate between male and female names. In the
Northern Indian Ocean, a new naming system was implemented in 2000, with names listed in alphabetical order
by country and using neutral gender. Typically, the list of names is proposed by National Meteorological and
Hydrological Services (NMHSs) of the World Meteorological Organization (WMO) members within a specific
region and approved by the respective tropical cyclone regional bodies during annual or biennial sessions.
Naming storms, or tropical cyclones, has been a longstanding practice aimed at quickly identifying storms
in warning messages. Using names is believed to be easier to remember than technical terms or numbers, and can
increase community preparedness and interest in storm warnings. The use of short, distinctive names in written
and spoken communications has proven to be quicker and less prone to error than older identification methods.
Initially, storms were named arbitrarily, but later on, feminine names were used for storms in the mid-1900s,
and a more organized system was adopted using alphabetically arranged names. Since 1953, Atlantic tropical
storms have been named from lists originated by the National Hurricane Center and updated by an international
committee of the World Meteorological Organization. Lists initially featured only womens names, but mens
names were introduced in 1979 and are alternated with women’s names. The lists are rotated, with each list
used again after six years. In the event that a storm’s name is deemed inappropriate due to its impact, it is
stricken from the list, and a new name is selected to replace it. Examples of infamous storm names that have
been stricken from the list include Mangkhut, Irma, Maria, Haiyan, Sandy, Katrina, Mitch, and Tracy.
1.5 Forecasting
Meteorologists worldwide utilize modern technology like weather radars, satellites, and computers to
track tropical cyclones from their development stage. Forecasting tropical cyclones can be challenging because of sudden changes in their course and intensity. Nonetheless, advanced technologies like numerical weather prediction models help meteorologists forecast a tropical cyclone's intensity, movement, speed, and
where and when it might hit land. National Meteorological Services of the countries involved release official
warnings based on these forecasts. The WMO Tropical Cyclone Programme provides information on these
hazards, while the WMO Severe Weather Information Centre gives real-time tropical cyclone advisories. The
WMO framework enables the timely and widespread dissemination of information about tropical cyclones.
International cooperation and coordination help monitor tropical cyclones from their initial stage of formation.
The WMO coordinates these activities through its Tropical Cyclone Programme. WMO-designated Regional
Specialized Meteorological Centres with activity specialization in tropical cyclones and Tropical Cyclone
Warning Centres detect, track, monitor and forecast all tropical cyclones in their respective regions. These
centres offer advisory information and guidance to National Meteorological and Hydrological Services in
real-time.
References:
Tropical Cyclones
Tropical Cyclone Naming
Saffir-Simpson Hurricane Wind Scale
Mapping the effects of Hurricanes and Cyclones.
About the Author
Robin is a researcher in Physics specializing in the applications of Machine Learning for Remote
Sensing. He is particularly interested in using Computer Vision to address challenges in the fields of Biodiversity,
Protein studies, and Astronomy. He is currently working on classifying satellite images with Landsat and Sentinel
data.
Part V
General
Bringing Industrial Practices to Academia
by Arun Aniyan
airis4D, Vol.1, No.5, 2023
www.airis4d.com
1.1 Introduction
Workstyle and work culture deeply impact an individual’s personal life. Workstyle is a major factor that
decides the number of fruitful outcomes that one can generate over a period of time. Generally known as
throughput, it also decides the personal satisfaction of one’s work and also drives the path to professional
growth.
In academia, one's throughput and efficiency are measured by the number of high-impact publications generated over the course of a year. This cannot be fully achieved by one's individual effort these days. Therefore,
people work on multiple projects in a collaborative fashion which helps in generating more (shared) outputs
and publications. This is a successful model but also has a lot of dependencies when the collaborators are
involved in multiple projects. Even though there is a leader for a collaborative project, timely execution is often
a challenge in academia. The Ph.D. Comics cartoon shown in Figure 1 is a good example of the situation that arises when one has a foot in multiple projects and other engagements.
Figure 1: PhD Comics [Image Credit : phdcomics.com]
The work pace in academia, especially in research, is generally relaxed. Pressure builds only over time and peaks when one's funding runs out. With graduate students, there are often additional psychological factors that build up along with this.
Industry, in contrast, has a completely different culture and work pace. There the work pressure is generally higher than in academia but flat over time. Although the monetary returns are much higher than in academia, the
1.2 Agile Research
quality may not be as high as an academic result. The work methodology matters a lot in the industry and that
discipline is what makes it thrive and produce tangible output. Both time and money are the driving factors for
constantly generating output.
Bringing the planning methodology and discipline of industrial work into academia can have a huge impact on the work style and throughput of academic work.
1.2 Agile Research
In industry, every development is conceived as a “Product”. All projects are designed to deliver a product
at the end of its execution. There is a definition of a product that can be applied across different industries.
Product is defined as the result of a process that provides a basic utility and is functional to serve a purpose.
This essentially means every solution ends up as a product. The key features of a product are the following.
A product does not have fancy capabilities but features.
Features may be added later but are not a basic requirement.
Product definition does not change during the process.
These key features also lay some foundations on how to develop a product. Designing and delivering
a product in the industrial setup follows a strict and disciplined method. Even though there are multiple methodologies, one of the most popular recent ones is called the ”Agile” methodology. The following section explains the formal framework for how a product is designed in an agile methodology.
1.2.1 Problem Definition
The first step is to define the problem. This involves clear articulation of the final solution and the means to
attain that objective. There must be utmost clarity about the final form of the solution and similar to a storyline,
there must be a well-defined starting and a final acceptance condition. Without this clarity, the next steps cannot
be defined. The problem is defined in a very strict manner such that the definition itself will generate the rest of
the requirements.
1.2.2 Scope
A solution for a problem can have different levels of scope. There might be a solution that is very future-
proof and one that can cater to a limited set of use cases. The scope of the product/solution has to be defined
along with the solution definition. This will determine the effort required to build the solution. The scope
also helps to put specific boundaries around the solution which dictate its initial abilities. It will contain the
outcomes which will fit a definitive timeline. The initial scope may have limited abilities, but these can be iterated on and improved in the next version.
1.2.3 Requirements
Once the problem is defined along with its solution and a specific scope, the next thing automatically dictated is the set of requirements to build the solution. It is similar to saying that, for a specific dish, the recipe requires a particular set of ingredients. The scope will automatically dictate the minimum necessities for the different
stages of implementation. When working as a team, the requirements will also specify “what” is required from
“whom”. Since the scope of the solution is already set, the requirement will not and should not change during
the execution of the project. It is also important to note that the requirements cover all aspects of the solution.
1.2.4 Timeline
The scope and requirements already give direction for the project execution. The next piece is defining the
timeline. Often in industry, the timeline is decided by the budget of the project with other commercial reasons.
In academia, the duration of the funding and the deadline one might need to submit a thesis or report can be the
guiding principle.
In agile methodology, the whole solution is divided into independent individual parts before execution.
The independence of each part allows for setting specific milestones for the project and also testing along the
execution process. Figure 2 shows a representation of how the timeline is split across milestones.
Figure 2: Project execution is broken into different modules with milestones across the time period of execution.
When the timeline is designed, care is always taken that it considers the following aspects.
Know-how of the team members about the subject.
Availability of the team members.
Upcoming leaves and holidays in the time period.
Immediate availability of execution resources such as computational power and data.
The project lead also makes sure that the timeline caters for any unexpected events that may happen during the course of the project. To cater for such situations, a minimum buffer of 35% is applied to all timeline calculations. The timeline is initially set in days, assuming that a person works for 9 hours a day. It will also include research and study time if a critical piece of technology required for the solution is new to the team members.
The general principle of designing the timeline is to be extremely realistic about the execution of a task.
The time required should not be underestimated or even overestimated. It must be the most realistic value with
some buffer.
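As a rough sketch of such a calculation, the Python snippet below converts invented per-module effort estimates into a calendar estimate with the 35% buffer and the 9-hour working day mentioned above.

# Rough timeline estimate in the spirit described above: realistic per-module
# effort plus a 35% buffer. Module names and effort values are invented.

BUFFER = 0.35          # minimum buffer applied to all timeline calculations
HOURS_PER_DAY = 9      # assumed working hours per day, as in the text

modules = {            # estimated effort per module, in working days
    "literature_and_study": 5,
    "data_preparation": 8,
    "model_development": 12,
    "testing_and_validation": 6,
    "writing_up": 4,
}

raw_days = sum(modules.values())
buffered_days = raw_days * (1 + BUFFER)

print(f"raw effort: {raw_days} days ({raw_days * HOURS_PER_DAY} hours)")
print(f"with 35% buffer: {buffered_days:.1f} days")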
1.2.5 Risk
Every project or solution has some risks associated with it. These may involve risks related to cost, the relevance of the solution at the time of delivery, certain pieces of technology becoming obsolete, or even the funding source drying up. Such risks are agreed upon by all the team members and also by the receiver of the solution. The project always starts with an agreement on such risks and on countermeasures to tackle them.
1.3 Process of Agile Execution
In the agile methodology, it is the process of project execution that makes sure that intermediate milestones
are met and the solution is delivered on time with the required quality. The key highlight of the agile methodology
is its cyclic process which guarantees a refined output and is never blocked during execution. Figure 3 shows a
representation of the agile process.
A general trend across all projects is that the initial solution that was envisioned during the problem
definition and scope design may not always work, or may have unforeseen deficiencies.
Figure 3: The different stages of an agile development methodology.
In a traditional execution method, such scenarios may even stall the further steps of the project. The agile methodology is generally
immune to such risky situations. For each individual module to achieve a milestone detailed in the timeline, that
module will undergo the process of design, development, and testing. Once the testing phase of that specific
individual module is reached, it will decide whether at that point the milestone can be achieved or not. If the
testing fails, the milestone is redesigned with appropriate corrective measures, developed, and again tested. This
is a rapid process and can cater to both improvements and course corrections along the way. To show a broader
picture of the agile advantage, see Figure 4.
Figure 4: Comparison of waterfall vs agile method for project execution.[Image Credit: Google-Images]
The traditional approach to development is called the waterfall method. As the name suggests, the outcome
is only visible at the end of the timeline. The biggest flaw of this method is that there is no intermediate
testing and no room for course correction along the timeline. In contrast, the agile method, which breaks the whole solution into individual independent components, makes intermediate testing and course correction possible. This guarantees product quality and timely delivery of the solution. It is also noted that each
intermediate milestone period is referred to as a “sprint”. Every sprint makes sure there is an output that can be
measured.
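The design-develop-test cycle can be caricatured in a few lines of Python; the module names, the random test outcome, and the cap on redesign attempts are all invented for illustration, and real sprints obviously track far richer state.

# Caricature of the agile cycle: each module is designed, developed and tested,
# and redesigned with corrections if its test fails. Everything here is invented
# for illustration.

import random

def design(module):   return f"design({module})"
def develop(design_): return f"build[{design_}]"
def test(build):      return random.random() > 0.3   # pretend 70% of builds pass

def run_sprints(modules, max_attempts=3):
    for module in modules:                 # one milestone per module
        for attempt in range(1, max_attempts + 1):
            build = develop(design(module))
            if test(build):
                print(f"{module}: milestone reached on attempt {attempt}")
                break
            print(f"{module}: test failed, redesigning (attempt {attempt})")
        else:
            print(f"{module}: escalate - milestone needs re-scoping")

run_sprints(["data pipeline", "model", "evaluation", "paper section"])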
1.4 Reward function
No work methodology is successful without the healthy mental state of the team members. The team
members should feel a sense of accomplishment and work satisfaction in the process. A sense of achievement at
the end of the development cycle will generate plenty of energy to work on future projects. Work methodologies
have to make sure the team members of a project get professionally rewarded when they go through the process
of development.
Consider the traditional waterfall methodology applied to developing a car, as shown in Figure 5. Let's say the development is spread over four steps. Step 1 makes the wheels, and step 2 joins them. In the third step, the engine parts and some body parts are fitted. With step 4 the car is completed. The most evident feature of this development life cycle is that the team members only get a sense of satisfaction at the end of the development. And, as we have seen, the waterfall mechanism carries potential risks of failure, making the final reward uncertain.
Figure 5: Development of a car with waterfall methodology. [Image Credit: Google-Images]
This is where the agile methodology makes sure that every milestone increases the level of happiness by a small amount, so that the team members need not wait until the end to get rewarded. Figure 6 illustrates this. With the agile methodology, if one needs to develop a car, one would start with the smallest proof of concept and progress in an incremental fashion. So one first makes a skateboard, which is still not a car but lets you move. Then one progresses to a bicycle, then a motorcycle, and finally a car. The main advantage is that the team members get an incremental sense of happiness over time, and this keeps up the work spirit.
This is a key driving factor in the case of academic work. When a graduate student works on a research project, their spirits are high in the beginning, often taper off over time, and then there is a final spike at the end. This is basically the effect of the waterfall methodology, and it often demotivates young minds. But if the agile style is followed, there is always an intermediate milestone that lifts people's spirits, and there is no loss of enthusiasm over time.
1.5 Conclusion
This article mainly elaborated on the benefits of following an agile work style for improving efficiency as well as for generating tangible outcomes. Mirroring these techniques may not be as straightforward in academia, but it is still doable to a large extent.
When it comes to academic research, the final product is always a publication. One way to think about
articulating each milestone would be to have each section of the publication completed at each stage. So for
each section to have content, a specific set of milestones in terms of work has to be achieved. This can be
incrementally done, and finally the publication is ready to be sent for submission. Therefore, here the product is always a paper, and the implementation path is to fill in each section with the actual work done. This always gives room for retrospection and improvement. Most importantly, it guarantees an increase in happiness over time along with reasonable output.
Figure 6: Product execution reward function with waterfall and agile method. [Image Credit: Google-Images]
In academic research there are often many unknown factors which are difficult to measure. Agile methodologies give no clear direction on how to handle such uncertainties, and this is a challenge in fully implementing the method in academic research. But this is where the general academic practice of building proofs of concept and then testing them will help.
Overall, there is good benefit in applying agile methods in research: they improve the efficiency, happiness, and throughput of academic work.
About the Author
Dr. Arun Aniyan leads R&D for Artificial Intelligence at DeepAlert Ltd, UK. He comes from an academic background and has experience in designing machine learning products for different domains. His major interests are knowledge representation and computer vision.
The Ten Years of Science
by Ninan Sajeeth Philip
airis4D, Vol.1, No.5, 2023
www.airis4d.com
2.1 Development across sciences
Significant scientific developments have occurred across various fields in the past ten years. From astronomy
to microbiomes, discoveries and technological advancements have opened up new avenues for research and
exploration. We have witnessed a remarkable explosion of scientific advancements that have transformed our
understanding of the world and revolutionised our lives. These developments have expanded our knowledge and
improved the quality of life with medical, energy, and communications breakthroughs.
Astronomy is one of the most exciting areas of scientific progress in the past decade. With new telescopes,
probes, and observational techniques, astronomers have discovered many new planets, stars, and galaxies. From
the detection of gravitational waves to the discovery of the first interstellar object, the past decade has been a
golden age for space exploration, offering us new insights into the origins and nature of the universe.
But the advances in science have not been limited to the cosmos. The past decade has also seen tremendous progress in genetics and biotechnology, with the development of CRISPR-Cas9 technology, which has
revolutionised gene editing and made gene therapy a reality. The ability to manipulate genes has opened up new
possibilities for treating genetic disorders and developing treatments.
Another area of scientific progress in the past decade has been the study of microbiomes, the complex
communities of microorganisms that live on and within us. Research has shown that these microbiomes play
a critical role in our health and well-being, influencing everything from our immune system to our mental
health. Advances in microbiome research have opened up new avenues for developing personalised medicine
and treating a range of diseases.
The past decade has also seen tremendous progress in neuroscience, with new insights into the workings
of the human brain and the development of brain-computer interfaces that can potentially transform the lives
of people with severe disabilities. With these advances in mind, the future of science and technology is bright,
with the potential for continued breakthroughs in medicine, energy, and communications that will transform our
lives and our world.
This article series will overview some of the most significant scientific developments in the past decade.
2.2 Astronomy Research
Astronomy has witnessed unprecedented progress in the past decade, with discoveries and technological
advancements that have opened up new avenues for research and exploration. These developments have
significantly expanded our understanding of the universe and have provided us with new insights into the nature
of the cosmos.
2.3 Gravitational Wave Detection
One of the most significant astronomical developments in the past decade was the detection of gravitational
waves, predicted by Albert Einstein's general theory of relativity over a century ago. According to the theory,
when massive black holes and neutron stars merge, they can produce ripples in space-time that propagate across
the universe. If we have a sensitive interferometer, such ripples in space-time can be observed as fringes
appearing in a specific pattern. The details about the colliding bodies can then be inferred from the nature of
the observed fringe.
2.3.1 Challenge
The interferometer was known even before Einstein's general relativity was formulated. In fact, the famous Michelson–Morley experiment of 1887 inspired Einstein to develop the theory of relativity. The Michelson interferometer looks like what is shown in Figure 1.
Figure 1: The schematic of the Michelson Interferometer.
It uses a laser, a coherent light source, whose beam is split into two components by a semitransparent mirror so that one part goes to mirror M1 and the other to mirror M2. These portions of the interferometer are called its arms; the longer they are, the better the interferometer's sensitivity. When the two beams are reflected back, they are both collected by the detector. You might have observed colour fringes when a thin film of oil reflects sunlight over clear water. The phenomenon is called interference: depending on the wavelength of the light reflected off the two surfaces of the oil film and on the film's thickness, the waves may cancel or enhance each other. Since sunlight has seven colours and the oil film has varying thickness depending on how it spreads over the water surface, different wavelengths get cancelled and enhanced at each point. This gives the film such a beautiful colour pattern. But a laser beam is monochromatic, meaning it has
only one wavelength. That means that when the beams interfere, the fringes that are formed will be black and white: black where they cancel and white where they enhance. Now, if the arms are of the same length and the mirrors are exactly perpendicular to each other, the light waves reflected from their centres will be identical in their motion (in reality there is a phase difference between reflected and refracted light, but for simplicity we ignore it here). We say they are in the same phase. So they will enhance each other and create a white spot at the centre of the detector. But off-centre points lie on slightly longer paths for the beams of light, and hence there will be a phase difference between them when they reach the detector. We will get a dark spot in regions where the phases are opposite. Since these points are symmetric about the centre, what one observes is a set of circular black and white rings caused by the successive in-phase and out-of-phase light waves.
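To make the fringe picture quantitative, the short Python sketch below evaluates the standard two-beam interference intensity, I = I_max cos^2(pi dL / wavelength), for a few path differences dL; the 633 nm wavelength is an illustrative choice and not the laser actually used in LIGO.

# Two-beam interference intensity as a function of path difference.
# I(dL) = I_max * cos^2(pi * dL / wavelength): bright when the waves are in
# phase (dL = 0, one wavelength, ...), dark when they are out of phase
# (dL = half a wavelength, ...). The 633 nm wavelength is only an example.

import math

WAVELENGTH = 633e-9   # metres (a common HeNe laser line, used here as an example)
I_MAX = 1.0

def fringe_intensity(path_difference_m: float) -> float:
    return I_MAX * math.cos(math.pi * path_difference_m / WAVELENGTH) ** 2

for dl in (0.0, WAVELENGTH / 4, WAVELENGTH / 2, WAVELENGTH):
    print(f"dL = {dl:.3e} m -> intensity {fringe_intensity(dl):.2f}")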
Since gravitational waves are very feeble, they are often hidden in random noise, making them extremely difficult to detect. To detect such feeble signals, our interferometers have to be extremely sensitive. That means they should have very sensitive mirrors and sufficient arm lengths to produce measurable interference patterns. In fact, the arm lengths required to detect even supermassive black hole mergers would be a few kilometres. Making such an interferometer is a technological challenge even today. It is so sensitive that even a person walking at some distance from the interferometer can create patterns! How, then, will we know which pattern is a real gravitational wave?
The theory of relativity tells us the nature of the gravitational wave ripples produced by each type of merger. If we convert them into sound waves, they sound like a chirp. I would recommend that readers listen to the chirp produced during black hole mergers here: https://www.ligo.caltech.edu/video/ligo20160615v2. The actual pattern would look as shown in Figure 2.
Figure 2: LIGO measurement of the gravitational waves at the Livingston (right) and Hanford (left) detectors, compared with the theoretically predicted values. Image courtesy: https://en.wikipedia.org/wiki/First_observation_of_gravitational_waves
That is a clue, but because the gravitational waves might be completely hidden in the ambient noise, it is almost impossible to pick them out directly. Scientists developed what is called a matched filter that can filter out and
enhance probable gravitational wave signals from the noisy signals received at the interferometer. To make sure that they are actually seeing a gravitational wave, scientists built two interferometers at two different locations. A real gravitational wave should produce simultaneous detections of the pattern in both detectors, and the more detectors there are, the more reliable the detection.
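The core idea of a matched filter can be sketched with NumPy: correlate the noisy data stream against a known template (here a toy "chirp") and look for a correlation peak. The signal, noise level, and injection point below are synthetic and purely illustrative of the principle.

# Toy matched filter: slide a known chirp template over noisy data and look for
# the correlation peak. Signal, noise level and template are synthetic.

import numpy as np

rng = np.random.default_rng(0)

# A toy "chirp": a sinusoid whose frequency increases with time.
t = np.linspace(0, 1, 400)
template = np.sin(2 * np.pi * (5 + 15 * t) * t)

# Bury the chirp in noise at a known location (sample 300).
data = rng.normal(0, 1.0, 1000)
data[300:300 + template.size] += 0.5 * template

# Matched filtering amounts to cross-correlating the data with the template
# (equivalently, convolving the data with the time-reversed template).
snr_like = np.correlate(data, template, mode="valid")
peak = int(np.argmax(np.abs(snr_like)))
print("template injected at 300, matched filter peak at", peak)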
In 2015, scientists at the Laser Interferometer Gravitational-Wave Observatory (LIGO) announced the
first-ever detection of gravitational waves resulting from two colliding black holes 1.3 billion light-years away.
This discovery confirmed the existence of gravitational waves and marked a significant milestone in studying
the universe.
The success of the detection was not just a success for general relativity. It was also a demonstration of the level of measurement precision modern instruments can offer. The displacement that the chirp signal produced on the LIGO mirrors was so tiny that it is equivalent to measuring the distance to our nearest star, about four light years away, to within the width of a human hair! Even Einstein never expected that we would be able to attain such measurement precision.
2.4 Exoplanets
The past decade has also seen significant progress in understanding the formation and evolution of planets
and planetary systems. The discovery of exoplanets - planets orbiting stars outside our solar system - has
exploded in the past decade. When a planet moves around a star, if its orbit happens to cross the line of sight of the observer, it causes a dip in the amount of light observed. A planet produces a box-shaped dip, as shown in Figure 3. This is called the transit method; see https://exoplanets.nasa.gov/faq/31/whats-a-transit/ for an illustration. Using the transit method, which detects the slight dip in light as a planet passes in front of its host star, the Kepler space telescope discovered thousands of exoplanets. The TESS space telescope, launched in
2018, has continued this work and discovered numerous exoplanets in its first few years of operation. The James
Webb Space Telescope has just started revolutionising our ability to study exoplanets and their atmospheres,
giving us insight into the potential for life beyond our solar system.
Figure 3: The schematic of the transit method for exoplanet discovery. Image courtesy: https://astronomy.com/sitefiles/resources/image.aspx?item=%7BFAAA90E5-5808-4275-9EA8-3AD601DD31A4%7D
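The box-shaped dip of the transit method is easy to simulate. The Python sketch below builds a toy light curve with a 1% drop in flux during each transit; the depth, period, and duration are invented, though a depth of about 1% is roughly what a Jupiter-sized planet produces around a Sun-like star.

# Toy transit light curve: relative flux drops by (Rp/Rstar)^2 while the planet
# crosses the stellar disc. Depth, period and duration here are invented.

import numpy as np

depth = 0.01            # ~1% dip, roughly a Jupiter-sized planet around a Sun-like star
period_days = 10.0      # orbital period
duration_days = 0.2     # transit duration
time = np.arange(0.0, 30.0, 0.01)           # 30 days of observations

phase = time % period_days
in_transit = phase < duration_days           # box-shaped transit at the start of each orbit
flux = np.ones_like(time) - depth * in_transit

# A real search would add noise and fold the light curve on trial periods;
# here we just report the observed dip.
print("minimum relative flux:", flux.min())              # -> 0.99
print("number of transits observed:", int(np.ceil(30.0 / period_days)))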
In 2017, the discovery of the first interstellar object, ’Oumuamua, provided astronomers with a rare glimpse of material from beyond our solar system. This cigar-shaped object was the first known interstellar visitor observed passing through the inner solar system. Its unusual shape and trajectory indicate that it originated in another star system.
Another significant development in the past decade has been the launch of new telescopes and observatories.
The Atacama Large Millimeter Array (ALMA), a radio telescope located in Chile, has opened a unique window into the universe, allowing astronomers to observe the cosmos at millimetre and submillimetre wavelengths with unprecedented sensitivity and resolution. The Event
Horizon Telescope (EHT) has produced the first-ever image of a black hole at the centre of the Messier 87
galaxy using a global network of telescopes.
Finally, the past decade has also seen significant progress in studying dark matter and dark energy, the
mysterious substances that comprise most of the universe. The Large Hadron Collider (LHC) in Switzerland
has conducted experiments to search for dark matter particles. Meanwhile, the European Space Agency's Euclid mission is scheduled for launch on a Falcon 9 in July 2023. Following a travel time of 30 days, it will be stabilised to
travel a Lissajous path of large amplitude (about 1 million kilometres) around the Sun-Earth Lagrangian point
L2. It will study dark energy and dark matter through observations of the universe’s structure and evolution.
All these developments have made the past decade a golden era for astronomy research. The story is becoming even more exciting as new and bigger telescopes are built and commissioned. However, the universe still remains a mystery to mankind, and astronomy is still in its infancy.
[Will continue in next issue]
About the Author
Professor Ninan Sajeeth Philip is a Visiting Professor at the Inter-University Centre for Astronomy
and Astrophysics (IUCAA), Pune. He is also an Adjunct Professor of AI in Applied Medical Sciences [BCMCH,
Thiruvalla] and a Senior Advisor for the Pune Knowledge Cluster (PKC). He is the Dean and Director of airis4D
and has a teaching experience of 33+ years in Physics. His area of specialisation is AI and ML.
About airis4D
Artificial Intelligence Research and Intelligent Systems (airis4D) is an AI and Bio-sciences Research Centre.
The Centre aims to create new knowledge in the field of Space Science, Astronomy, Robotics, Agri Science,
Industry, and Biodiversity to bring Progress and Plenitude to the People and the Planet.
Vision
Humanity is in the 4th Industrial Revolution era, which operates on a cyber-physical production system. Cutting-
edge research and development in science and technology to create new knowledge and skills become the key to
the new world economy. Most of the resources for this goal can be harnessed by integrating biological systems
with intelligent computing systems offered by AI. The future survival of humans, animals, and the ecosystem
depends on how efficiently the realities and resources are responsibly used for abundance and wellness. Artificial
intelligence Research and Intelligent Systems pursue this vision and look for the best actions that ensure an
abundant environment and ecosystem for the planet and the people.
Mission Statement
The 4D in airis4D represents the mission to Dream, Design, Develop, and Deploy Knowledge with the fire of
commitment and dedication towards humanity and the ecosystem.
Dream
To promote the unlimited human potential to dream the impossible.
Design
To nurture the human capacity to articulate a dream and logically realise it.
Develop
To assist talents in materialising a design into a product, a service, or knowledge that benefits the community and the planet.
Deploy
To realise and educate humanity that a knowledge that is not deployed makes no difference by its absence.
Campus
Situated on a lush green village campus in Thelliyoor, Kerala, India, airis4D was established under the auspices of the SEED Foundation (Susthiratha, Environment, Education Development Foundation), a not-for-profit company for promoting Education, Research, Engineering, Biology, Development, etc.
The whole campus is powered by Solar power and has a rain harvesting facility to provide sufficient water supply
for up to three months of drought. The computing facility in the campus is accessible from anywhere through a
dedicated optical fibre internet connectivity 24×7.
There is a freshwater stream that originates from the nearby hills and flows through the middle of the campus.
The campus is a noted habitat for the biodiversity of tropical fauna and flora. airis4D carries out periodic and systematic water quality and species diversity surveys in the region to ensure its richness. It is our pride that the site has consistently been environment-friendly and rich in biodiversity. airis4D also grows fruit plants that can feed birds and maintains water bodies to help them survive the drought.