Cover page
Image Name: Peering Into the Tendrils of NGC 604 with NASA's Webb
Image credit: NASA, ESA, CSA, STScI
At the center of the image is a nebula on the black background of space. The nebula is composed of wispy filaments of light blue clouds. At the center-right of the blue clouds is a large cavernous bubble. The bottom left edge of this cavernous bubble is filled with hues of pink and white gas. Hundreds of dim stars fill the area surrounding the nebula. For more information, see:
https://www.flickr.com/photos/nasawebbtelescope/53577720515/in/album-72177720313923911
Managing Editor: Ninan Sajeeth Philip
Chief Editor: Abraham Mulamoottil
Editorial Board: K Babu Joseph, Ajit K Kembhavi, Geetha Paul, Arun Kumar Aniyan, Sindhu G
Correspondence: The Chief Editor, airis4D, Thelliyoor - 689544, India
Journal Publisher Details
Publisher: airis4D, Thelliyoor 689544, India
Website: www.airis4d.com
Email: nsp@airis4d.com
Phone: +919497552476
Editorial
by Fr Dr Abraham Mulamoottil
airis4D, Vol.2, No.5, 2024
www.airis4d.com
We are continuing with our monthly interactive
program called “Speak with an Astronomer”, which
is inspired by Ajit Kembhavi’s article “Black Hole
Stories-8, Rotating Black Holes”. This program pro-
vides young enthusiasts with the chance to explore the
topic in depth and acquire profound knowledge.
Blesson George’s article “Types of Attention Mod-
els” explores attention networks, focusing on three
types of attention mechanisms critical to their func-
tionality. These mechanisms include global and local
attention, which differ in their scope of focus on in-
put data, and soft and hard attention, which describe
the method of attention application. Additionally, self-
attention, a mechanism allowing models to prioritize
input parts independently, is discussed. Global atten-
tion considers the entire input sequence, while local
attention focuses on specific subsets. Soft attention
dynamically allocates attention weights, while hard
attention involves stochastic selection. These mech-
anisms enhance model interpretability and efficiency
within neural network architectures.
In “Black Hole Stories-8 Rotating Black Holes”
by Ajit Kembhavi, the focus shifts to black holes with
mass and angular momentum, or spin. Unlike in the
case of Schwarzschild black holes, which have only
mass, spinning black holes have a more complex space-
time structure. The Kerr metric, discovered by Roy
Kerr in 1963, describes the geometry of spinning black
holes, revealing intricate features like the ergosphere.
The article explores the properties of the Schwarzschild
metric, the nature of geodesics around black holes, and
the Kerr metric’s special cases when the spin or mass
approaches zero. These insights provide a deeper understanding of the dynamics of rotating black holes and their impact on astrophysics.
In “Beginner's Guide to Machine Learning in Python - Part 2” by Linn Abraham, Python's suitability for ma-
chine learning is discussed, emphasizing its readability
and the availability of libraries like NumPy and Pan-
das. The article explores the fundamentals of machine
learning, comparing it with deep learning and high-
lighting the importance of frameworks like TensorFlow
and PyTorch. It also covers practical aspects such as
GPU utilization, data preprocessing, and model evalua-
tion. Overall, the article provides a concise overview of
implementing machine learning workflows in Python,
hinting at future discussions on advanced topics.
In “Unlocking the Mysteries of Star Clusters: Ce-
lestial Ensembles of Cosmic Wonder” by Sindhu G,
star clusters are explored as groupings of stars bound
by gravity, varying in size and composition. The article
delves into the formation of star clusters, highlighting
types such as globular clusters, open clusters, and em-
bedded clusters. It discusses the distinct properties and
significance of each type, offering insights into stellar
evolution, galactic dynamics, and the history of the
universe. Additionally, the article touches upon the
challenges and methods of observing these clusters,
showcasing their role in advancing our understanding
of the cosmos.
“X-ray Astronomy: Through Missions” by Aro-
mal P traces the history of X-ray astronomy from its
beginnings with balloon experiments in the early 20th
century to the development of rocket missions and
satellites for X-ray observations. The article highlights
key milestones, such as the discovery of solar X-rays
in 1949 and the detection of X-rays from outside the
solar system in 1962 with the launch of an Air Force
Aerobee rocket. The significance of these discover-
ies led to further exploration through rocket launches
and balloon experiments, eventually paving the way
for dedicated X-ray astronomical satellites. The dis-
cussion emphasizes the evolution of technology and
the contributions of scientists like Riccardo Giacconi
and Herbert Gursky in shaping our understanding of
the X-ray universe.
“Radio Galaxies: An Introduction” by Kshitij
Thorat provides an overview of radio galaxies, which
emit a significant portion of their light in the radio
bands due to large-scale jets and lobes. These jets,
believed to originate from supermassive black holes at
their centers, extend over vast distances, making radio
galaxies some of the largest objects in the universe.
The article discusses the structure of radio galaxies us-
ing Cygnus A as an example, highlighting features like
jets, lobes, and hotspots. It explains the process be-
hind the emission of radio waves and explores different
types of radio galaxies based on their structures. The
significance of radio galaxies in understanding galac-
tic activity and structure formation is also emphasized.
Additionally, the article mentions radio telescopes like
the Giant Metrewave Radio Telescope (GMRT) and
the Square Kilometre Array (SKA), which contribute
to studying radio galaxies in detail.
Atharva Pathak in “Genetically Engineered Warriors: India's New Hope in Cancer Treatment” explores India's breakthrough in cancer treatment with CAR T-
cell therapy. This revolutionary approach, exemplified
by NexCAR19, offers accessible treatment for B-cell
cancers. The article also highlights the role of AI/ML
in cancer care, emphasizing their potential in diag-
nosis, treatment, and drug discovery. Despite chal-
lenges, the article concludes optimistically, showcas-
ing the transformative impact of scientific innovation
on cancer treatment in India.
Geetha Paul’s article delves into DNA sequenc-
ing, particularly Next-Generation Sequencing (NGS),
highlighting its significance in deciphering genetic in-
formation. It explains NGS’s high-throughput capabil-
ities and the steps involved in the sequencing process,
from sample extraction to data analysis. Emphasizing
the importance of quality control and bioinformatics,
the article underscores NGS’s transformative potential
in advancing biological research. Overall, it offers a
concise overview of NGS and its implications for sci-
entific discovery.
Jinsu Ann Mathew’s article explores the signif-
icance of DNA methylation and introduces bisulfite
sequencing as a technique to uncover hidden methy-
lation patterns. DNA methylation, crucial for gene
regulation, is often concealed in standard sequencing
methods. Bisulfite conversion, a chemical process, re-
veals these patterns by distinguishing methylated from
unmethylated cytosines. The technique involves DNA
isolation, bisulfite conversion, PCR amplification, and
DNA sequencing. By interpreting the sequenced DNA,
researchers discern the original methylation status, aid-
ing in understanding gene regulation, development, and
disease. Bisulfite sequencing emerges as a powerful
tool offering insights into epigenetic modifications and
their role in cellular processes.
Contents
Editorial ii
I Artificial Intelligence and Machine Learning 1
1 Types of Attention Models 2
1.1 Global and Local Attention Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Beginner's Guide to Machine Learning in Python - Part 2 6
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Why Python? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Why Machine Learning? What Problems Does It Solve? . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Under-the-hood of a Deep Learning Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
II Astronomy and Astrophysics 9
1 Black Hole Stories-8
Rotating Black Holes 10
1.1 Black Holes With Spin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 The Schwarzschild Metric: A Brief Recapitulation . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 The Kerr Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Special Cases of the Kerr Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 X-ray Astronomy: Through Missions 13
3 Radio Galaxies: An Introduction 16
4 Unlocking the Mysteries of Star Clusters: Celestial Ensembles of Cosmic Wonder 18
4.1 Star Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Globular Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.3 Open Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.4 Embedded Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5 Super Star Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
III Biosciences 24
1 Genetically Engineered Warriors: India’s New Hope in Cancer Treatment 25
1.1 Supercharging the Immune System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.2 A Breakthrough for India . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3 Looking Ahead: A Brighter Future . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4 Unleashing the Power of AI and ML in Cancer and Medicine . . . . . . . . . . . . . . . . . . . . 26
1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 DNA Sequencing
Next-Generation Sequencing (NGS) 29
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Next Generation Sequencing (NGS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Step 1: Sample Isolation and Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4 Step 2: Library Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Step 3: Sequencing Reaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 NGS Data Analysis Using Bioinformatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3 How Bisulfite Sequencing Reveals Hidden Messages? 35
3.1 What is Bisulphite Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 DNA Isolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Bisulphite Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 PCR Amplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5 Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Part I
Artificial Intelligence and Machine Learning
Types of Attention Models
by Blesson George
airis4D, Vol.2, No.5, 2024
www.airis4d.com
In our previous episodes, we delved into the fasci-
nating world of attention networks, examining their
powerful capabilities and the fundamental building
blocks that define their structure. We thoroughly an-
alyzed the different components that contribute to the
networks’ unique abilities to process and interpret data
effectively.
Continuing our exploration, this issue will focus
on expanding our understanding by discussing three
specific types of attention mechanisms that are criti-
cal to the functionality and versatility of these models.
These types are categorized based on distinct opera-
tional features and methodologies they employ: global
and local attention, which differ in the scope of focus
they apply to input data; hard and soft attention, which
describe the method and flexibility of the attention ap-
plication; and self-attention, a mechanism that allows
models to weigh and prioritize different parts of the
input independently.
1.1 Global and Local Attention
Models
Attention networks are a type of neural network
architecture that allows models to focus on specific
parts of the input data while making predictions or
decisions. These networks are designed to dynamically
weigh the importance of different elements in the input,
enabling the model to selectively attend to relevant
information. By incorporating attention mechanisms,
the network can learn to assign varying degrees of
importance to different parts of the input sequence,
enhancing its ability to capture complex relationships
Figure 1: Global and local attention networks. In contrast to global attention, which
evaluates all intermediate hidden states across an entire
input sequence, local attention narrows its focus to a
select, fixed-size subset of these states. This approach
typically centers the attention around a specific point
or follows a predefined alignment, limiting the scope to
only the most relevant parts of the input. By doing so,
local attention significantly reduces computational de-
mands, making it more efficient, especially for longer
sequences. However, this efficiency comes at the cost
of potentially overlooking useful context outside the se-
lected window. Therefore, the choice between global
and local attention often balances between computa-
tional efficiency and the richness of contextual infor-
mation utilized. Image Credit: Nagahisarchoghaei, Mohammad, et al., “An empirical survey on explainable AI technologies: Recent trends, use-cases, and categories from technical and application perspectives,” Electronics 12.5 (2023): 1092.
and dependencies within the data.
The development of attention networks was driven
by the need to address the limitations of traditional neu-
ral network architectures, such as the inability to ef-
fectively handle long-range dependencies and capture
intricate patterns in sequential data. By introducing
attention mechanisms, researchers aimed to improve
the interpretability and performance of deep learning
models, particularly in tasks involving natural language
processing, machine translation, and image recogni-
tion. Attention networks enable models to focus on
specific parts of the input sequence, allowing for more
precise and context-aware predictions.
Global and local attention mechanisms are two
common variants of attention networks that serve dis-
tinct purposes in enhancing model interpretability. Global
attention mechanisms consider the entire input sequence
when assigning attention weights, allowing the model
to capture long-range dependencies and relationships
across the entire input. In contrast, local attention
mechanisms focus on a specific subset of the input se-
quence, providing a more fine-grained and localized
view of the data. By incorporating both global and lo-
cal attention mechanisms, models can effectively bal-
ance between capturing broad contextual information
and focusing on specific details within the input data,
leading to more accurate and insightful predictions.
Global and local attention mechanisms are fun-
damental components of neural networks that enhance
model interpretability by allowing selective focus on
different parts of the input sequence. Global atten-
tion considers the entire input sequence when assigning
attention weights, capturing long-range dependencies
and relationships across the data. The global attention
weight α_i for each element in the input sequence is calculated as

α_i = exp(ϵ_i) / Σ_{j=1}^{n} exp(ϵ_j),    (1.1)

where ϵ_i represents the relevance score of the i-th element in
the input sequence. This calculation ensures that the
model weighs the importance of each element based on
its relevance score, providing a comprehensive view of
the input data.
In contrast, local attention mechanisms concen-
trate on specific subsets of the input sequence, offering
a more localized perspective. By employing a window-
based approach, local attention focuses on a fixed-size
window of elements around a central position. The
attention weights for local attention are computed sim-
ilarly to global attention but with constraints on the
range of elements considered. This strategy enables
the model to emphasize specific regions of the input
sequence, capturing detailed information while man-
aging computational complexity effectively.
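To make the two scopes concrete, here is a minimal NumPy sketch (the scores, window size and centre position are invented for illustration): global attention applies the softmax of Eq. (1.1) over every position, while a local variant applies it only within a fixed window.

# Minimal sketch of global vs. local attention weights (illustrative scores).
import numpy as np

def softmax(e):
    e = np.exp(e - e.max())        # subtract max for numerical stability
    return e / e.sum()

scores = np.array([0.1, 2.0, 0.3, 1.5, 0.2])    # relevance scores, i.e. the ϵ_i

# Global attention: softmax over the entire input sequence, as in Eq. (1.1).
global_weights = softmax(scores)

# Local attention: softmax restricted to a window around a centre position.
centre, half_width = 3, 1
lo = max(0, centre - half_width)
hi = min(len(scores), centre + half_width + 1)
local_weights = np.zeros_like(scores)
local_weights[lo:hi] = softmax(scores[lo:hi])

print(global_weights.sum(), local_weights.sum())  # both sum to 1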
1.1.1 Soft and Hard Attention Models
Soft attention in image caption generation is a
technique that trains a model to dynamically focus on
various parts of an image when generating captions.
This model is fully differentiable, which means it can
be seamlessly integrated with gradient-based learning
methods like backpropagation, facilitating straightfor-
ward training and enhancing the model’s interpretabil-
ity. In soft attention mechanisms, every part of the
input data, such as different regions of an image, is
assigned a weight calculated typically through a soft-
max function. These weights are fractional and col-
lectively add up to one, ensuring a comprehensive and
smooth distribution of attention across the entire input.
As a result, a context vector is formed by computing
a weighted sum of the features, where each feature's influence on the final output is proportionate to its as-
signed weight, thus allowing every part of the image to
contribute to the generated caption based on its calcu-
lated relevance.
Conversely, hard attention operates on a stochas-
tic mechanism, where the focus areas within the input
are randomly sampled during each step of the caption
generation process. This selection process is based on
a probability distribution that emerges from the fea-
tures of the data, making hard attention inherently ran-
dom and making each selection unique. Because hard
attention involves making discrete choices—focusing
intently on certain parts while completely disregarding
others—it lacks differentiability. This characteristic
complicates its integration with conventional training
methods like backpropagation. Instead, hard attention
models often require alternative training strategies such
as maximizing an approximate variational lower bound
or employing algorithms like REINFORCE, which rely
on reinforcement learning principles or Monte Carlo
methods to estimate gradients.
Both Luong et al. and Xu et al. have exten-
sively discussed these concepts in their respective pa-
pers. They differentiate between the two models by
highlighting that soft attention calculates the context
vector as a weighted sum of all encoder hidden states.
In contrast, hard attention, as utilized particularly in
image captioning scenarios, uses attention scores to
select a single hidden state or feature vector (typically
generated by a Convolutional Neural Network, CNN).
The challenge with hard attention arises when select-
ing this state; functions like argmax might be used
for selection due to their ability to pinpoint the index
with the maximum score. However, such functions
are not differentiable—minor adjustments in network
weights during training do not alter the selected in-
dex—necessitating the use of more complex computa-
tional techniques to effectively train the model. This
delineation clearly shows how the soft attention mech-
anism, with its smooth and inclusive focus across all
inputs, contrasts sharply with the selective and com-
putationally intensive nature of hard attention in the
context of image caption generation.
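The contrast can be put into a short sketch (feature shapes and values are invented for illustration): soft attention forms a differentiable weighted sum of all feature vectors, while hard attention samples a single one, which is why gradients cannot flow through the selection.

# Sketch of soft vs. hard attention over CNN-style feature vectors
# (5 image regions, 8-dimensional features; all values illustrative).
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))              # one feature vector per region
scores = rng.normal(size=5)                     # attention scores
weights = np.exp(scores) / np.exp(scores).sum() # softmax weights, sum to one

soft_context = weights @ features               # differentiable weighted sum

hard_index = rng.choice(5, p=weights)           # stochastic selection of one region...
hard_context = features[hard_index]             # ...not differentiable w.r.t. the scores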
1.2 Conclusion
In this comprehensive exploration of attention net-
works, we have uncovered the nuanced differences and
applications of various attention mechanisms within
neural network architectures. From the broad-reaching
global attention that captures extensive contextual in-
formation across entire input sequences, to the pre-
cision of local attention focusing on specific regions,
these mechanisms significantly enhance the interpretabil-
ity and efficiency of models. We delved deeper into the
distinctions between soft and hard attention models,
highlighting their respective advantages and limitations
in terms of differentiability and computational demand.
Soft attention's integrability with gradient-based learn-
ing stands in contrast to the stochastic and computation-
ally intensive nature of hard attention, which requires
more complex training techniques such as reinforce-
ment learning.
The discussion illustrates how attention mecha-
nisms are pivotal in addressing the challenges of tradi-
tional neural networks, particularly in managing long-
range dependencies and processing large and complex
datasets efficiently. By enabling selective focus, these
networks do not merely react to the most prominent
features but intelligently weigh all parts of the input to
generate contextually rich outputs. As we continue to
push the boundaries of what is possible with machine
learning, attention networks represent a critical step
toward more dynamic, flexible, and powerful artificial
intelligence systems. This journey into the intricacies
of attention models not only enhances our understand-
ing but also opens up new avenues for innovation in
various domains, including natural language process-
ing, computer vision, and beyond.
References
1. Nagahisarchoghaei, Mohammad, et al., “An empirical survey on explainable AI technologies: Recent trends, use-cases, and categories from technical and application perspectives,” Electronics 12.5 (2023): 1092.
2. Luong, Minh-Thang, Hieu Pham, and Christo-
pher D. Manning. ”Effective approaches to attention-
based neural machine translation.” arXiv preprint
arXiv:1508.04025 (2015).
3. Xu, Kelvin, et al. ”Show, attend and tell: Neural
image caption generation with visual attention.”
International conference on machine learning.
PMLR, 2015.
4. Different types of Attention in Neural Networks
About the Author
Dr. Blesson George presently serves as an
Assistant Professor of Physics at CMS College Kot-
tayam, Kerala. His research pursuits encompass the
development of machine learning algorithms, along
with the utilization of machine learning techniques
across diverse domains.
Beginner's Guide to Machine Learning in
Python - Part 2
by Linn Abraham
airis4D, Vol.2, No.5, 2024
www.airis4d.com
2.1 Introduction
In the first part of this series we got a brief overview
of the different stages in a machine learning project.
We started out with setting up the environment, the
hardware and software requirements. In this brief ar-
ticle we go a bit more in-depth to see the steps involved in a machine learning workflow, especially the moving parts involved in a successful training session. The article also mentions the considerations to be weighed when making choices regarding language, platforms, libraries, etc.
2.2 Why Python?
A programming language is fundamentally a tool
which helps us convey an idea to the machine. Thus
it should be immaterial which language is used for
any particular task. However, there are some practical considerations that make us prefer one language over
others. What are some advantages and disadvantages
that Python has when it comes to machine learning?
2.2.1 Software libraries
Coding a deep neural network from scratch in
Python is possible but is heavily advised against. When
one makes heavy use of software libraries one saves
time by not reinventing the wheel. The disadvantage to
this approach is that the code is no longer in one’s con-
trol and subject to change. This change is inevitable
in the domain of technology. In science where repro-
ducibility is critical it might not be desirable to have
your code break. Thus the first step to working in
Python is often to create an environment (also called
a virtual environment) that is isolated from the system
Python and to version control the code. All external
libraries are installed within this virtual environment.
Version control of code is often done with Git, together with a 'requirements.txt' file that lists the version of each external software library used in your code (also
called a dependency). The virtual environment is also
useful to manage dependencies when working on dif-
ferent projects that might require different versions of each of the dependencies.
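As an illustration, the setup can even be scripted from Python itself. Here is a minimal sketch (the package names are examples, and the env/bin paths assume Linux or macOS; on Windows the scripts live under env\Scripts):

# Create an isolated environment and pin dependencies in requirements.txt.
import subprocess
import venv

venv.create("env", with_pip=True)             # equivalent to: python -m venv env
subprocess.run(["env/bin/pip", "install", "numpy", "pandas"], check=True)
with open("requirements.txt", "w") as f:      # record exact dependency versions
    subprocess.run(["env/bin/pip", "freeze"], stdout=f, check=True)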
2.2.2 Wrapper code
An overlooked advantage of Python is its read-
ability. Python code is often said to be almost like
pseudo code and hence very readable. The downside of this is that it is slower than languages such as C or Fortran. However, this is not a serious drawback in practice, since there
exists a lot of Python wrapper code that just provides
an interface to code written in a faster language that
does the actual heavy lifting. We use the Python code
to pass inputs and to receive the outputs. This is very
often encountered in machine learning where a lot of
the actual heavy lifting is done by faster languages like
C and C++.
The basics of a programming language can be
learnt in a fairly short amount of time. The rest of the time is spent understanding code written by others and troubleshooting its usage. Much of that time goes into discussion forums like Stack Overflow, trying to understand the error messages spit out by the code you are fixing and looking at other people's solutions. Thus learning Python involves learning to use a lot of different tools: code editing software, virtual environments, version control software and so on.
2.2.3 Some useful libraries
Depending on the kind of data one wants to deal with, there are many Python libraries that one cannot avoid using. NumPy adds support for numerical arrays. Pandas adds support for arrays that can hold more than just numbers, such as strings, and also enables indexing of arrays using strings. SciPy adds support for scientific functions. Matplotlib is a very rich library that supports almost any kind of data visualization you can think of. PIL allows reading images into Python. Astropy adds support for astronomy-related functions.
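A short sketch of what each of these libraries looks like in use (assuming they are installed; the FITS file name is hypothetical):

# One-line tastes of each library mentioned above.
import numpy as np
import pandas as pd
from scipy import ndimage
import matplotlib.pyplot as plt
from PIL import Image
from astropy.io import fits

arr = np.arange(9).reshape(3, 3)                            # NumPy: numerical arrays
df = pd.DataFrame({"star": ["a", "b"], "mag": [1.2, 3.4]})  # Pandas: mixed-type, string-indexed data
smooth = ndimage.gaussian_filter(arr.astype(float), 1)      # SciPy: scientific functions
plt.imshow(smooth); plt.savefig("smooth.png")               # Matplotlib: visualization
img = Image.new("L", (16, 16))                              # PIL: image handling
# hdul = fits.open("observation.fits")                      # Astropy: FITS files (hypothetical file)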
2.3 Why Machine Learning? What
Problems Does It Solve?
Most things that we as humans learn cannot be put into a sequence of instructions to be followed word for word by any person or machine. Think about how you learnt to walk, speak, identify plants, birds and animals, or distinguish new faces from familiar ones. ML is a way of harnessing this power of the human brain to solve problems without explicit instructions, and of applying it to niche problems in every walk of life, mostly to solve a single designed problem with curated data.
Remember that it is no magic bullet either. It helps us find patterns in data which are difficult for the average human to spot, by delegating the effort to computers.
It fails when the problem itself has no patterns - think
why ML cannot help you to hack the share market. It
fails when there are patterns but the data you have is
not enough to capture the variance.
2.3.1 Machine learning vs Deep learning
Deep learning refers to a special class of machine
learning techniques. Although there is no strict boundary here, there are some clues that help us distinguish between the two. Most deep learning techniques make
use of neural networks. Often there are layers of these
networks stacked on top of each other that makes them
“deep”. One advantage that comes with using neu-
ral networks versus traditional learning algorithms is
that they are quite versatile and do not require data to
be transformed to the specific requirements of the un-
derlying algorithm. However, this is often where the algorithms lose their interpretability; hence the coinage that neural networks are "black boxes".
2.3.2 CUDA and the GPU revolution
When PCs transformed from being mostly text
based to being heavily dependent on graphics, people developed GPUs, processors specialized for matrix manipulations. Remember that a screen is simply a matrix
of pixel values. The real breakthrough in deep learn-
ing occurred when researchers found a use for these in
training neural networks. Nvidia was the GPU-making company that opened up the use of its GPUs for anyone interested in such computations by introducing a platform called CUDA. The Python deep learning libraries that we are going to be introduced to make use of the CUDA platform in order to run code on the GPUs.
2.3.3 Tensorflow or PyTorch
There are currently two frameworks which are
commonly used to implement deep neural networks in Python: TensorFlow (often used together with Keras), which was initially developed by Google, and PyTorch, which was initially developed by Facebook. It is mostly a matter
of personal taste regarding which one to use. The
scikit-learn library has a lot of the non deep learning
algorithms as well as a lot of utility functions that can
be used during the training of deep neural networks.
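As a taste of what one of these frameworks looks like in use, here is a minimal sketch in TensorFlow/Keras (the layer sizes, input dimension and class count are invented for illustration):

# A tiny classifier in TensorFlow/Keras: 10 input features, 3 classes.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3, activation="softmax"),   # one output neuron per class
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

A PyTorch version would look similar, except that the training loop is written out explicitly rather than hidden behind a fit() call.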
2.3.4 Vision or Speech
Two major areas of application in deep learning are
computer vision and natural language processing. This
can probably be attributed to the fact that vision and
language are two traits that are the hallmarks of our in-
telligence. This also translates into two different formats of digital data: images and text. Computer vision
techniques are developed to make use of data that has
a fixed grid shape like images. NLP techniques are
developed to deal with data of variable input size. A
sentence has no restriction in the number of words it
should have. Depending on the kind of data and prob-
lem at hand, we need to look into models developed in
either of these application fields. For example, since
images are a big part of astronomical surveys, models
used in Computer Vision applications like Convolu-
tional Neural Networks or CNNs are often helpful for
solving problems. However, if the data at hand is a time
series signal you may have to look into techniques like
transformers that were developed by people interested in Natural Language Processing applications.
2.4 Under-the-hood of a Deep
Learning Network
Stochastic Gradient Descent or SGD is the engine
of modern deep learning techniques. To get an idea
of how it works let us consider a supervised image
classification problem. This means that the input is an
image and the output is a class label. Since strings or text data are not natural outputs in such cases, we encode the class by attaching n neurons at the end of the network, where n corresponds to the number of classes in the problem.
All the outputs are restricted to a fixed range like (0,1) using non-linear functions like the sigmoid.
Then the last layer neuron with the highest output value
can be the predicted class label.
Most practical datasets are too big to be com-
pletely held in a computer's memory. This is why
we need generators that load the data into memory
in batches. All the weights, i.e. parameter values in
the network are randomly initialized. A single batch
of data is forward passed through the network. A loss
function is used to get feedback regarding how much
the predictions differ from the expected output. The
errors are backpropagated to the initial layers using
gradients. The weights are adjusted and the loop con-
tinues.
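To make the loop concrete, here is a toy sketch in plain NumPy (a linear model with a mean-squared-error loss stands in for a real network, and random arrays stand in for a data generator; all numbers are illustrative):

# Toy SGD loop: forward pass, loss gradient, weight update, batch by batch.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))            # stand-in dataset
y = rng.normal(size=(100, 1))
W = rng.normal(size=(4, 1)) * 0.01       # randomly initialized weights

for epoch in range(5):
    for i in range(0, len(X), 20):        # "generator": 20-sample batches
        xb, yb = X[i:i+20], y[i:i+20]
        pred = xb @ W                     # forward pass
        grad = 2 * xb.T @ (pred - yb) / len(xb)   # gradient of the MSE loss
        W -= 0.1 * grad                   # weight update (learning rate 0.1)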
Soon the need arises to have controlled sets for testing the performance of a trained model. Ideally, we should not make decisions about model parameters, etc. based on this test set. If we do, information from the test set leaks back into our model and our test set is no longer unbiased or fair. This is why we often have a train/validation/test split, where the validation set, which is like a second test set, is used for improving the model parameters.
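In practice the split is often produced with a utility function; a sketch assuming scikit-learn (the 60/20/20 proportions and dummy arrays are illustrative):

# Hold out a test set once, then carve a validation set out of the rest.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)
# Tune hyperparameters against (X_val, y_val); touch (X_test, y_test) only once.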
This constitutes the basic workflow of a deep learning training session. But there are a lot more things to be done. How do we properly assess the learning process during the training session itself? Once the training is done, how can we assess it? What if the datasets are imbalanced? Do traditional metrics like accuracy work in evaluating the performance? If your dataset is small, is there statistical significance to your results? And finally, even if the model is producing good results, how can you be sure that it is picking up the patterns you see and not some other hidden bias? All these can be the content of a future article in this series. Watch out!
About the Author
Linn Abraham is a researcher in Physics,
specializing in A.I. applications to astronomy. He is
currently involved in the development of CNN based
Computer Vision tools for prediction of solar flares
from images of the Sun, morphological classification of galaxies from optical image surveys, and radio galaxy source extraction from radio observations.
Part II
Astronomy and Astrophysics
Black Hole Stories-8
Rotating Black Holes
by Ajit Kembhavi
airis4D, Vol.2, No.5, 2024
www.airis4d.com
So far in our Black Hole Stories, we have con-
sidered Schwarzschild black holes, which have only
one parameter, which is mass. In the present story
we will consider black holes which have mass as well
as angular momentum or spin. The space-time struc-
ture around spinning black holes is more complicated
than the simple Schwarzschild geometry. That reflects
in the shape of trajectories of particles and photons
around them, and the structure of the singularity and
the event horizon. There are also features like the er-
gosphere which only exist when spin is present. We
will describe some of these properties in this story and
the next one.
1.1 Black Holes With Spin
Karl Schwarzschild discovered the first exact so-
lution of Einstein's equations in 1916, just a year after
the equations were first published. His solution de-
scribes the space-time structure, i.e. the gravitational
field around a point particle with mass and no other
properties. As we have seen through our stories, such
a solution corresponds to a black hole. One family
of such black holes, known as stellar mass black holes,
are formed when stars much more massive than the Sun
complete their evolution and explode, leaving behind a
black hole, which can have mass in the range of a few
times the mass of the Sun to several tens of times the
mass of the Sun. Another family of black holes, known
as supermassive black holes, have mass ranging from
about a million times the Solar mass to many billions
of Solar masses. Such black holes are believed to form
in the collapse of very large clouds of gas. They are
located in the centres of galaxies and their mass can
steadily increase for billions of years after formation
due to capture of gas and stars from the surrounding
galaxy.
Stars and gas clouds always have angular momen-
tum which causes them to rotate. Some of this angular
momentum can be lost during the processes which lead
to the formation of the black hole, but it is natural to
expect that at least part of the angular momentum will
remain with the collapsing object. Therefore, black
holes should have non-zero spin, which will have an
effect on their space-time structure. The exact solution
for a spinning black hole was discovered by Roy Kerr
in 1963. This seminal work has enabled a full study of the very complex geometry of such a black hole, and has important implications for astrophysics,
which became clear only decades after the discovery
of the solution.
1.2 The Schwarzschild Metric: A Brief Recapitulation
Here we will briefly summarise some proper-
ties of the Schwarzschild metric and of the Schwarzschild
black hole, which we have described in some detail in
Stories 5 and 1. The space-time the metric describes
is around a point particle of a given mass M. Since
the particle has no direction dependent properties, the
space-time around it is spherically symmetric, so it is
best described in terms of the coordinates t, r, θ, φ.
As described in Story 5, t is the time coordinate, and
r, θ, φ indicate the position of a point in space. The
two angular coordinates θ and φ are similar to the two angles of the spherical polar coordinates used to describe flat 3-dimensional space, but the radial co-
ordinate r is somewhat different. Because the space is
curved, r no longer is the distance from the origin, but
it helps to fix the position in space. The mass M is
located at the origin r=0. If we take a fixed value of
r and vary the angular coordinates over their ranges, a
spherical surface is generated. The area of this sphere
is 4πr², as in flat space.
The spherical surface with radius R_S = 2GM/c² is
known as the event horizon. This has the property
that no particle or light ray can travel from inside the
event horizon to the outside. The region inside the
event horizon is cut off from the rest of the Universe
and therefore we have a black hole. At the position of
the point mass M, the matter density is infinitely large
and so is the curvature of space-time, and so we have
a space-time singularity. The outside world cannot
see the singularity because of the event horizon. It is
possible for matter and light to fall into the black hole from outside the event horizon.
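For a sense of scale, a quick numerical check (using standard values of G, c and the solar mass, which are not quoted in the article): for one solar mass,

R_S = 2GM_⊙/c² ≈ 2 × (6.67 × 10⁻¹¹) × (1.99 × 10³⁰) / (3.00 × 10⁸)² m ≈ 2.95 km,

so the Sun would have to be compressed to within about 3 km of its centre to become a black hole.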
As described in Story 5, the motion of particles
with mass in a gravitational field is described by time-
like geodesics, while that of a light ray is described by
a null geodesic. There are two symmetries associated
with the Schwarzschild metric: it is independent of
time and is spherically symmetric. Therefore the en-
ergy and angular momentum of a particle or light ray in
orbit around a Schwarzschild black hole are conserved,
that is they remain constant. It is therefore possible to
analyse the nature of the geodesics in a simple manner.
In Story 6, we have described how the nature of time
like geodesics is studied using an effective potential
V_eff. For a particle with a given angular momentum,
the effective potential depends only on the radial coor-
dinate r. In general it has a maximum and minimum,
which produces a potential well. Depending on its en-
ergy, (1) a particle can come in from large distances,
swing around the centre and recede again to large dis-
tances, (2) it can fall into the black hole, or (3) move
in a bound orbit around the black hole with shape cor-
responding to a precessing ellipse. When the energy
of the particle is equal to the minimum of the effective
potential, the orbit is circular in shape. As described
in Story 8, the behaviour of light rays, i.e. photons, is
somewhat different. They can have orbits as in (1) and
(2), but the only bound orbits occur at a fixed value of
r = 1.5R_S. These orbits are circular and unstable.
1.3 The Kerr Metric
The Kerr metric provides the structure of space-
time around a particle which has mass and angular
momentum or spin. The angular momentum defines a
direction around which the particle spins. That is easy
to visualise for an extended body like the Earth, but
the same physics applies to a point particle too. Be-
cause the mass and angular momentum are constant,
the metric is constant in time. The spin axis is also
a symmetry axis, in the sense that the metric remains
the same for all points in a plane perpendicular to the
spin axis (this and other such concepts can be mathe-
matically defined for the curved space-time of general
relativity, but I am using simple expressions for qual-
itative understanding). Roy Kerr obtained an exact
solution for Einstein’s equations for the special case of
a spinning, massive particle.
It is convenient to express the Kerr solution in
terms of a coordinate system t, r, θ, φ known as Boyer-
Lindquist coordinates. Here t is the time coordinate as
usual; the other three coordinates have the appearance
of the spherical polar coordinates used in Schwarzschild
metric, but the appearance is deceptive. For example,
the coordinate r does not have the same meaning as in
the Schwarzschild case. There a surface with r constant
has a spherical shape with area 4πr², though r is not the
distance from the origin, which is at r = 0. This inter-
pretation is no longer applicable in the Boyer-Lindquist
coordinates. The angle φ goes round the axis defined
by the direction of the spin, while the interpretation of
angle θ is the familiar one only in the special cases we
will consider below.
The Kerr metric depends on two parameters, the
mass of the black hole M and a parameter a which is related to the angular momentum J of the black hole:

a = J/(Mc),

where c is the speed of light. While M can be chosen to have any value, it turns out that there is a maximum value of the parameter a permitted, a_max = GM/c², which leads to a maximum value of the angular momentum J:

J_max = GM²/c.
A black hole with this maximum spin value is
known as an extreme Kerr black hole. We will see later
how extreme black holes can develop in astrophysical
situations.
1.4 Special Cases of the Kerr Metric
The structure of the Kerr metric is rather complex,
and as mentioned above, even the interpretation of the
coordinates is not straightforward. It therefore helps to
consider special cases to gain insight into the nature of
the metric.
The Kerr metric depends on two parameters, mass M and spin parameter a. If a → 0, then the angular momentum J → 0, and we expect to recover the
Schwarzschild metric which depends only on the mass.
That is found to be correct, and in this approximation
of vanishing spin the coordinates r, θ, φ acquire their
usual meaning of spherical polar coordinates as appli-
cable to the Schwarzschild metric.
The other interesting approximation is of vanish-
ing mass, M → 0. In this case there is no gravitating
mass left, and we expect that the structure of space-
time should be the flat space-time of special relativity.
It is indeed possible to transform to Cartesian coordi-
nates x, y, z in which we recover the usual geometry of
flat space. It is interesting to know that in this case, the
Boyer-Lindquist coordinate r=0 corresponds to a ring
of radius a in the xy plane defined by z = 0. This is an example of the complex nature of the metric and the
Boyer-Lindquist coordinates, and has implications for
the structure of the singularity and the event horizon.
There are also interesting concepts associated with the
Kerr geometry, like frame dragging and the ergosphere,
which are not present in the Schwarzschild metric. We
will consider these in the next story.
About the Author
Professor Ajit Kembhavi is an emeritus
Professor at Inter University Centre for Astronomy
and Astrophysics and is also the Principal Investiga-
tor of the Pune Knowledge Cluster. He is a former director of the Inter University Centre for Astronomy and Astrophysics (IUCAA), Pune, and a former vice president of the International Astronomical Union. In collaboration
with IUCAA, he pioneered astronomy outreach ac-
tivities from the late 80s to promote astronomy re-
search in Indian universities. The Speak with an
Astronomer monthly interactive program to answer
questions based on his article will allow young enthu-
siasts to gain profound knowledge about the topic.
X-ray Astronomy: Through Missions
by Aromal P
airis4D, Vol.2, No.5, 2024
www.airis4d.com
“Science does not have a moral dimension. It is
like a knife. If you give it to a surgeon or a murderer,
each will use it differently.”
Wernher von Braun
Beginning: Rockets and Balloons
Cosmic rays were discovered by Victor Hess after a series of balloon experiments conducted in 1912, and the discovery gave the scientific community the insight that there were many things beyond the atmosphere that were unknown to humankind. The quest for this new knowledge accelerated thereafter. Most early efforts were focused on military uses, and the first and second world wars accelerated those studies with a mainly military focus. When the second world war ended in 1945 and the world had seen enough bloodshed, nations started intellectual wars!
Those who gained the unknown knowledge became more powerful. The missiles used in war became rockets for scientific expeditions. After World War II, the US military offered various institutes the chance to carry their scientific experiments on rockets developed by Wernher von Braun, the famous aerospace engineer who was part of the German military and later joined NASA.
Herbert Friedman used this opportunity to study the Sun's UV and X-rays. Friedman used combinations of filters and gas mixtures to develop photomultiplier tubes that are sensitive in a narrow frequency range. With the help of a V-2 rocket launched in 1949 from White Sands, for the first time in the history of humankind an X-ray instrument reached above the atmosphere to
Figure 1: Friedman and the adaptation of the tube used
in a Geiger-Mueller counter. Credits: Public Domain
detect X-ray photons emitted from the Sun's corona. After decades of effort, and with further development in technology using the more advanced Aerobee rockets, Friedman and his colleagues obtained the first X-ray images of the Sun using a pinhole camera. Friedman was also the first to fly a Bragg spectrometer for measuring hard X-rays.
Even though solar X-rays were discovered in 1949, there wasn't much progress in detecting X-rays from any other sources. The Cold War between the USA and the Soviet Union then paved the way for the rapid development of X-ray astronomy. After the "Sputnik Shock" of 1957, when the Soviet Union was leading the space race, more funds were allotted to space programs in the USA as well. In September 1959, Bruno Rossi, the chairman of the board of American Science and Engineering (AS&E), suggested to Riccardo Giacconi, head of the Space Science Division of AS&E, that he develop
Figure 2: Discovery of X-rays from Scorpius X-1.
Credit: Giacconi et al. 1962
a research program on X-ray astronomy. Giacconi submitted two proposals to the newly formed NASA: one for developing an X-ray telescope, and one for a rocket mission to study X-rays from the Moon and the Crab nebula. NASA accepted the first and rejected the second, as NASA officials thought it impossible to detect X-rays from the Moon. Giacconi sent the rejected proposal to the Air Force Cambridge Research Laboratory and received funding for a series of rocket launches that changed the entire fate of X-ray astronomy.
On June 18, 1962, an Air Force Aerobee rocket
was launched from the White Sands Missile Range in
New Mexico with an array of X-ray sensors on board.
Three large-area Geiger counters made up the setup. Every Geiger counter had seven separate mica windows, each of 20 cm², arranged in one face
of the counter. The detectors were sensitive to X-rays in the range of 2 to 8 Å.
counter intended to lower the cosmic-ray background
surrounded each Geiger counter. Upon analysing the data from the detectors, Riccardo Giacconi, Herbert Gursky, Frank R. Paolini and Bruno B. Rossi found evidence for X-rays from outside the solar system. The source was named Scorpius X-1, and it marked the beginning of X-ray astronomy.
After the discovery of X-rays from Scorpius X-1, further studies were carried out by scientists, and many rockets, reaching altitudes of around 200 km, were launched for X-ray observations.
Figure 3: Atmospheric absorption as a function of the
wavelength (bottom axis). The solid lines indicate the
fraction of the atmosphere, expressed in unit of 1 atmo-
sphere pressure (right vertical axis) or in terms of alti-
tude (left vertical axis), at which half of the incoming
celestial radiation is absorbed by the atmosphere.(Credit:
High Energy Astrophysics Group, University of Tübingen)
Some 45 rockets were launched to carry out X-ray observations before the 1970s. One of the main problems faced was that there was not enough time to observe the variability of a source during a rocket experiment: a maximum flight time of 20 minutes is not sufficient to study variability in the sources. Balloon experiments were therefore carried out to observe X-ray sources with long exposures. A balloon can reach a maximum height of only about 35 km, but it can be used to take hours-long observations. Balloon experiments were conducted in different parts of the world; the Tata Institute of Fundamental Research, Mumbai, also hosted several balloon experiments to study X-rays.
We cannot control balloons or rockets once they are launched. Rocket experiments were restricted by the total exposure time and balloon experiments by the altitude, so the science community needed a permanent solution: high-altitude observations with long exposures, controllable from Earth. The solution to the riddle was setting up a satellite dedicated to X-ray astronomical observations. Discussions started in the early 1960s, the first satellites were launched in 1970, and our understanding of the cosmos then changed drastically. We will discuss the satellite missions that changed our views about the X-ray universe in the coming articles.
Reference
Santangelo, Andrea, Madonia, Rosalia, and Piraino, Santina, "A Chronological History of X-ray Astronomy Missions," Handbook of X-ray and Gamma-ray Astrophysics. ISBN 9789811645440
Giacconi, Riccardo, Gursky, Herbert, Paolini, Frank R., and Rossi, Bruno B., "Evidence for X Rays From Sources Outside the Solar System," Phys. Rev. Lett. 9, 439 (1962). DOI: 10.1103/PhysRevLett.9.439
About the Author
Aromal P is a research scholar in the Department of Astronomy, Astrophysics and Space Engineering (DAASE) at IIT Indore. His research mainly focuses on neutron stars and black holes.
Radio Galaxies: An Introduction
by Kshitij Thorat
airis4D, Vol.2, No.5, 2024
www.airis4d.com
Most of us are familiar with the night sky as a
carpet of stars and planets, which we can see with our
own eyes. With a small, 6-inch telescope, you might
even spy fainter details and objects not visible to the
eye, like moons of Jupiter, nebulae and even close-by
galaxies, if you have a clear sky.
With larger telescopes, you can look at the de-
tails of far-away objects, many of them beyond our
galaxy, the Milky Way. However, our eyes are typically
sensitive to the so-called “visible spectrum”, ranging
roughly from red wavelengths at one end to purple at the other. In contrast, celestial objects can
shine in different bands, like ultraviolet, infrared, X-
rays and radio waves. While we can't see this light, we
can use specialised telescopes which are able to do this
and thus give us a view of the sky literally in a different
light.
Among the celestial objects which lie beyond our
own galaxy, the Milky Way, radio galaxies are some of
the most fascinating. Very briefly, radio galaxies are
galaxies in which a large part of the emitted light comes
in the radio bands via large-scale jets and "lobes" (there
are other kinds of galaxies in which this radio emission
comes from remnants of dead stars and the light com-
ing from the process of star-formation, but we’ll not
focus on this class in this article). The jets associated
with radio galaxies are truly awesome in scale, typically spanning hundreds of thousands of light-years, and at their largest millions of light-years, making them some of the largest objects in the Universe.
Where do these jets come from and how do they
form? This is actually an area of active research, but
the consensus is that the jets come from the centre of
the galaxy, where a supermassive black hole resides.
It is now thought that most of the galaxies have su-
permassive black holes (SMBHs henceforth) in their
centres, just like the Milky Way has one (Sagittarius
A*). Not all galaxies are radio galaxies, though, in-
cluding our own. What separates the SMBHs which
give rise to jets is their "activeness" - some of them are
accreting - eating - the matter surrounding them; this
process sometimes gives rise to the spectacular radio
jets we see in radio galaxies.
Fig 1 shows a radio galaxy - perhaps the most well-
studied and famous radio galaxy - Cygnus A. Cygnus
A is, in fact, one of the first radio sources discovered
by radio astronomers almost a century ago. Note that
all the details you see in the image are made from
radio telescope observations at 1.4 GHz and rendered
in pseudocolour (Cygnus A is really not orange!). As
you can see from the figure, Cygnus A shows a clear
pair of jets emanating from a central bright, pointlike
“core”, going in opposite directions and forming fluffy,
diffuse structures called “lobes”. The total size of these
jets is 500,000 light-years! For comparison, the size of
our solar system, expressed as the distance between the
Sun and Pluto, is barely around 4-6 light hours. As
such, these jets extend far, far beyond the extent of the
galaxy as seen in the visible band. These jets eventually
terminate in bright “hotspots”, which are sites of shock
formation. The core, on the other hand, marks the
position of the SMBH sitting inside the galaxy’s heart
from which these jets arise.
The basic picture behind the light coming from
radio galaxies is thought to be the following: the jets,
which are formed from highly relativistic particles,
Image Credits: Legacy Astronomical Images, “Cygnus A,” NRAO/AUI Archives,
https://www.nrao.edu/archives/items/show/33386.
Figure 1: Cygnus A, an archetype of powerful radio
galaxies. The thin "jets" start from the bright "core" in the image and stop at the brighter points at the end,
or “hotspots”. The hotspots are surrounded by diffuse,
hazier clouds, the “lobes”. At a distance of almost 700
million light-years, Cygnus A is one of the brightest
objects in the radio sky.
travelling at an appreciable fraction of the speed of light, spiral through magnetic fields, which generates the so-called "synchrotron radiation" (as we know, accelerating charged particles emit radiation).
The hotspots in which these jets terminate form
sites from which the particles can flow back towards
the galaxy and form the lobes.
Such structures are features of many radio galax-
ies, but of course, radio galaxies can have a variety of
structures, including the so-called X-shaped, S-shaped,
Z-shaped, Bent-tailed types of radio galaxies depend-
ing on the exact process which gives rise to the jets and
the interplay of the jets with the environment in which
the radio galaxy resides.
These beautiful galaxies can be viewed with radio
telescopes, which are made up of dishes or antennas.
In particular, detailed images of radio galaxies can be
made using radio interferometers like the Giant Me-
trewave Radio Telescope (GMRT) near Pune and the
upcoming Square Kilometre Array (SKA), an interna-
tional project, to which India contributes significantly.
Remembering that the jets start in the activity of the accreting SMBH near the galaxy's core, we can see that the larger structure of a radio galaxy in fact forms a sort of signpost to the ongoing activity at the heart of the galaxy, far easier to see than the actual SMBH generating it.
Additionally, the jets themselves extend, as we
have seen, far beyond the visible extent of the galaxy
and can interact with other galaxies as well! The jets
can, variously, suppress the ongoing process of build-
ing galaxies through the process of star formation or
can further enhance it, making radio galaxies a key
player in the structure formation of our Cosmos.
Further Reading:
1. Radio galaxies: the mysterious, secretive "beasts" of the Universe
2. Radio galaxy article on Wikipedia
3. Hotspots in Cygnus A: an active galactic nucleus
4. Synchrotron Radiation
About the Author
Dr Kshitij Thorat is a senior lecturer at the
University of Pretoria in South Africa. His research
interests revolve around extragalactic radio galaxies,
their lifecycles and their interactions with their envi-
ronments.
Unlocking the Mysteries of Star Clusters:
Celestial Ensembles of Cosmic Wonder
by Sindhu G
airis4D, Vol.2, No.5, 2024
www.airis4d.com
4.1 Star Clusters
A star cluster refers to a grouping of stars that
are bound together by gravitational forces. These clus-
ters can vary in size and composition, ranging from
small gatherings of a few dozen stars to massive con-
glomerations containing thousands or even millions of
stars. Star clusters are formed from the same cloud of
gas and dust, typically within a galaxy, and they often
share similar ages and chemical compositions.
Let’s take a brief look at how a star cluster forms.
Stars emerge from clouds of gas and dust under pre-
cise conditions. Gravity triggers the collapse of this
primarily hydrogen gas and dust. As the cloud con-
denses and pressure mounts, its core heats up, forming
a protostar. This protostar continues to accrete matter,
evolving into a fully-fledged star. This stellar birth pro-
cess typically spans about a million years. Once born,
some stars can persist for over 10 billion years. Often,
when conditions favor the formation of one star, multi-
ple stars form, creating a cluster. Over time, stars may
depart the cluster through dispersion or ejection, while
others perish within it. Additionally, various factors
such as ultraviolet light, stellar winds, and supernovae
can expel gas and dust from the cluster, impeding new
star formation.
Star clusters serve as important tools for astronomers
to study various aspects of stellar evolution, galactic
dynamics, and the history of the universe. Star clusters
visible to the naked eye include the Pleiades, Hyades,
and 47 Tucanae. Three primary types of star clusters
exist: globular clusters, open clusters, and stellar asso-
ciations. Each category possesses distinct properties
that offer astronomers diverse insights.
4.2 Globular Clusters
Globular clusters are densely packed groups of
stars, typically containing hundreds of thousands to
millions of stars bound together by gravity. These
clusters are some of the oldest objects in the universe,
with ages spanning billions of years. Their spheri-
cal shape and tightly packed arrangement make them
distinct from other types of star clusters. One remark-
able aspect of globular clusters is their stellar popu-
lations. The stars within these clusters are typically
old and metal-poor, meaning they formed early in the
universe’s history and contain elements heavier than
helium in relatively low abundance. Studying these
ancient stars provides valuable insights into the early
stages of galactic evolution and the conditions present
in the early universe.
Globular clusters contain minimal free dust or gas,
thereby prohibiting new star formation within them.
Stellar densities in the inner regions of a globular clus-
ter are significantly higher when compared to regions
such as those surrounding the Sun. Globular clus-
ters also serve as natural laboratories for studying stel-
lar dynamics and evolution. The interactions between
stars within the cluster, such as gravitational encoun-
ters and binary star systems, can have profound effects
on their evolution. By observing these interactions,
astronomers can gain a better understanding of stellar
evolution and the processes that shape the universe.
Moreover, globular clusters are essential for mea-
suring the age and distance of the galaxies in which
they reside. Since they contain some of the oldest stars
in the universe, determining the age of globular clusters
provides valuable constraints on the age of their host
galaxies. Additionally, the brightness of these clusters
allows astronomers to calculate distances to galaxies
with remarkable precision.
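One standard route from brightness to distance is the distance modulus,
m - M = 5 log10(d / 10 pc): once a star's absolute magnitude M is known
(RR Lyrae variables, common in globular clusters, serve as such standard
candles) and its apparent magnitude m is measured, the distance follows.
A minimal Python sketch with invented magnitudes:

m, M = 15.4, 0.5                     # hypothetical apparent and absolute magnitudes
d_pc = 10 * 10 ** ((m - M) / 5)      # distance modulus solved for distance
print(f"distance ~ {d_pc:,.0f} pc")  # ~9,550 parsecs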
When seen with the unaided eye, globular clus-
ters resemble faint smudges of light amidst the dark-
ness of space. However, when observed through a
telescope, their true essence emerges: thousands to
millions of stars coalesce into a spherical configura-
tion, featuring a luminous and densely packed core.
In the Milky Way, they are situated within both the
halo and the bulge regions. The stars within these
clusters remain confined and do not disperse beyond
their boundaries. Our Milky Way hosts approximately
200 globular clusters, notable examples being 47 Tuc,
M4, and Omega Centauri (Figure 2), although there
is ongoing debate regarding whether the latter may
actually be a captured dwarf spheroidal galaxy. Con-
versely, the Andromeda galaxy boasts approximately
400 globular clusters, while the M87 galaxy hosts over
10,000, as reported by the Harvard and Smithsonian
Center for Astrophysics. Some of the most luminous
globular clusters can be seen without the aid of a tele-
scope; among them, Omega Centauri shines brightest
and was even noted in ancient times, initially cataloged
as a single star prior to the advent of telescopes. In the
northern hemisphere, the brightest globular cluster is
M13, located in the constellation of Hercules.
4.3 Open Cluster
Open clusters consist of tens to a few thousand
stars originating from the same massive molecular
cloud, exhibiting similar ages and chemical compo-
sitions. These clusters are loosely bound by mutual
gravitational forces and are commonly located within
spiral and irregular galaxies. Open clusters lack a de-
fined shape. Unlike globular clusters, open clusters are
Image credit: ESA/Hubble and NASA/A. Sarajedini
Figure 1: Globular star cluster NGC 6717 is located
about 20,000 light-years from Earth.
Figure 2: Globular star cluster Omega Centauri which
is located about 15,790 light-years from Earth. Image
Credit: NASA/ESA/Hubble SM4 ERO Team
Figure 3: The globular cluster NGC 6397. Image
Credit: NASA, ESA, and T. Brown and S. Casertano
(STScI)
Figure 4: Messier 68, a loose globular cluster. Image
Credit: ESA/Hubble/ NASA
smaller and less densely populated, encompassing stars
of varying ages, from young to older ones. They serve
as vital subjects for studying stellar evolution due to
their uniform properties, facilitating the determination
of characteristics such as distance, age, metallicity, and
velocity, which can be more challenging with isolated
stars.
Stars within open clusters exhibit greater disper-
sion, rendering these clusters relatively unstable, with
stars prone to dispersing over the course of a few mil-
lion years. As open clusters with fewer stars are less
tightly bound by gravity, it is relatively simple for their
stars to drift away from the cluster when influenced by
external forces, such as interactions with giant molec-
ular clouds. However, this isn’t the sole mechanism
through which open clusters shed stars. During the or-
bits of stars within the cluster, close encounters can oc-
cur, leading to gravitational interactions. In instances
of close encounters involving multiple stars, one star
may be expelled from the cluster at a high velocity. If
this velocity surpasses a certain threshold, the star can
escape the gravitational pull of the cluster entirely.
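That threshold is the cluster's escape velocity, v = sqrt(2GM/r). A rough
Python sketch, with an assumed cluster mass and radius rather than values
from this article, shows why open clusters are so fragile:

import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M = 500 * 1.989e30   # assumed cluster mass: 500 solar masses, in kg
r = 2 * 3.086e16     # assumed cluster radius: 2 parsecs, in metres

v_esc = math.sqrt(2 * G * M / r)                    # escape velocity at radius r
print(f"escape velocity ~ {v_esc / 1e3:.1f} km/s")  # ~1.5 km/s

An escape velocity of only a kilometre or two per second means that even a
gentle gravitational kick from a close encounter can unbind a star.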
Typically observed in regions of active star forma-
tion within spiral and irregular galaxies, open clusters
offer valuable insights into the processes of star birth
and evolution. Within the Milky Way galaxy alone,
over 1,100 open clusters have been identified, with nu-
merous others presumed to exist, significantly enrich-
ing our understanding of the universe. In the Milky
Way, these clusters can be spotted in our galaxy’s disk,
both in and between its spiral arms. The most promi-
nent open clusters are the Pleiades and Hyades in Tau-
rus.
4.4 Embedded Clusters
Embedded star clusters represent a type of star
cluster still enveloped by the molecular clouds from
which they originated. They are the youngest variety
of star cluster, housing recently formed and forming
stars that remain concealed by the gas and dust of their
parent molecular cloud. Typically, embedded clus-
ters serve as active regions of star formation, hosting
stars of similar ages and compositions. Embedded
Figure 5: M47 is an open cluster in the constellation
Puppis. Image Credit: NOIRLab / NSF / AURA
Figure 6: This mosaic from NASA's WISE Telescope
is of the Soul Nebula. It is an open cluster of stars sur-
rounded by a cloud of dust and gas located about 6,500
light-years from Earth in the constellation Cassiopeia,
near the Heart Nebula. Image Credit: NASA/JPL-
Caltech/UCLA
Figure 7: The Hubble Space Telescope spied this open
star cluster, named NGC 299, in the southern constel-
lation of Tucana (the Toucan), about 200,000 light-
years away. Image Credit: ESA/Hubble/NASA
Figure 8: The Jewel Box cluster, one of the best south-
ern sky open clusters to observe with a small telescope.
Image Credit: M. Bessell
Figure 9: X-ray view of Orion showing the Trapezium
embedded cluster. Image Credit: NASA/CXC/Penn
State/E Feigelson/K.Getman et al.
clusters are believed to be fundamental units in the
process of star formation, as a substantial portion of
stars emerge within them. Over time, as the molec-
ular cloud dissipates, embedded clusters evolve into
open clusters. Due to heavy obscuration by dust and
gas, embedded clusters are challenging to observe in
visible light. However, infrared and X-ray observa-
tions can penetrate the cloud material, unveiling the
stars within. Renowned examples of embedded clus-
ters include the Trapezium cluster within the Orion
Nebula, L1688 within the Rho Ophiuchi cloud com-
plex, as well as clusters within the Trifid Nebula and
Eagle Nebula. Recent studies employing simulations
have offered insights into the initial evolution and three-
dimensional structure of embedded clusters, revealing
that their morphology can rapidly change and may not
necessarily reflect their long-term evolution.
4.5 Super Star Cluster
A super star cluster (SSC) represents a notably
massive young open cluster, often regarded as a pre-
cursor to globular clusters. These clusters stand out
for their elevated luminosity and mass in comparison
to other young star clusters. Typically, super star clus-
Figure 10: A few young stars shine through dense
clouds of gas and dust in the Orion Nebula's Trapezium
embedded cluster, 1,500 light-years from Earth. The
left image is taken in visible light; the right image is
taken in infrared light. Image Credit: NASA, C.R.
O’Dell and S.K. Wong (Rice University)
ters harbor a significant population of young, massive
stars that generate ionization within a surrounding HII
region or even an "ultra-dense HII region" (UDHII)
within the Milky Way Galaxy or other galaxies. They
commonly inhabit regions of intense star formation,
such as areas influenced by galactic interactions or
mergers. Crucial to comprehending massive star for-
mation, super star clusters are thought to transition into
globular clusters as they age. To observe them effec-
tively, radio and infrared imaging prove superior due to
the high extinction levels in certain visible light wave-
lengths. Super star clusters generally boast masses sur-
passing 10^5 solar masses, with radii around 5 parsecs
and ages roughly estimated at 100 million years. They
exhibit notable electron densities and pressures asso-
ciated with the HII regions enveloping them. While
observed within the Milky Way Galaxy, super star clus-
ters are more abundantly identified in distant regions
of the universe, substantially contributing to our under-
standing of both star formation and galactic evolution.
Westerlund 1 (Figure 11) is a compact young super
star cluster about 3.8 kpc (12,000 ly) away from Earth.
References:
Star Clusters: Inside the Universe’s Stellar Col-
lections
Star Clusters
What are star clusters?
Figure 11: Westerlund 1. Image Credit:
2MASS/UMass/IPAC-Caltech/NASA/NSF
Star Clusters
Star cluster
Globular cluster
Open cluster
Embedded cluster
Hubble’s Star Clusters
Embedded Clusters
Early Evolution and 3D Structure of Embedded
Star Clusters
About the Author
Sindhu G is a research scholar in Physics
doing research in Astronomy & Astrophysics. Her
research mainly focuses on classification of variable
stars using different machine learning algorithms. She
is also doing the period prediction of different types
of variable stars, especially eclipsing binaries and on
the study of optical counterparts of X-ray binaries.
Part III
Biosciences
Genetically Engineered Warriors: India’s
New Hope in Cancer Treatment
by Atharva Pathak
airis4D, Vol.2, No.5, 2024
www.airis4d.com
Cancer, once considered an unbeatable foe, is fac-
ing a new challenger in India: CAR T-cell therapy.
This revolutionary treatment harnesses the power of a
patient’s immune system to fight the disease. As Dr.
Siddhartha Mukherjee, a renowned oncologist, said,
”Immunotherapy is fundamentally changing how we
approach cancer”. Let’s delve into how this innovative
therapy works and what it holds for the future of cancer
care in India.
1.1 Supercharging the Immune
System
Imagine training an army to recognize and destroy
your enemy’s troops. That’s the essence of CAR T-cell
therapy. Here’s a breakdown of the process:
1. Extraction: Doctors extract T cells, a type of
white blood cell crucial for fighting infections,
from the patient's blood.
2. Engineering: In a lab, scientists genetically mod-
ify the T cells with a particular receptor called
CAR (Chimeric Antigen Receptor). Think of
CAR as a helmet with a targeting sight.
3. Expansion: The engineered T cells are multi-
plied in large numbers.
4. Reinfusion: The powerful, CAR-equipped T cells
are infused into the patient’s bloodstream.
The CAR on the T cells acts like a homing bea-
con, allowing them to recognize and latch onto cancer
cells with specific surface proteins. Once attached, the
Credits: https://www.cancer.gov/publications/dictionaries/
cancer-terms/def/car-t-cell-therapy
Figure 1: The Fight Within
T cells unleash a targeted attack, destroying the can-
cer cells. This specificity is what makes CAR T-cell
therapy so promising.
1.2 A Breakthrough for India
Developed by a team of researchers at the Indian In-
stitute of Technology Bombay (IITB), in collaboration
with Tata Memorial Hospital, NexCAR19 is India's
first indigenous CAR T-cell therapy. This is a signifi-
cant achievement, as CAR T-cell therapies have tradi-
tionally been expensive.
Figure 2: T cells (pink) attack a cancer cell (yellow)
in this scanning electron micrograph image.
Credit: Steve Gschmeissner/SPL & Nature
Credits: ImmunoACT website
[https://www.immunoact.com/nexcar19]
"Accessible and affordable CAR-T cell therapy
provides a new hope for the whole of humankind",
said President Droupadi Murmu at the launch of Nex-
CAR19 in April 2024. This therapy offers a poten-
tial lifeline for patients with B-cell cancers, such as
leukaemia and lymphoma, where conventional treat-
ments have failed.
1.3 Looking Ahead: A Brighter
Future
The success of NexCAR19 is a stepping stone
for further advancements in CAR T-cell therapy in
India. Researchers are exploring ways to target dif-
ferent types of cancers and personalize the treatment
for each patient’s unique needs. Additionally, making
the manufacturing process more efficient could reduce
Credit: ImmunoACT, Nature
Figure 3: A member of the ImmunoACT team pre-
pares the NexCAR19 cancer treatment.
treatment costs, making it accessible to a broader pop-
ulation. A single treatment of NexCAR19, manufac-
tured by Mumbai-based ImmunoACT, costs between
US$30,000 and $40,000. The first CAR-T therapy was
approved in the United States in 2017, and commercial
CAR-T therapies currently cost between $370,000 and
$530,000, not including hospital fees and drugs to treat
side effects.
1.4 Unleashing the Power of AI and
ML in Cancer and Medicine
In the ever-evolving landscape of healthcare, Ar-
tificial Intelligence (AI) and Machine Learning (ML)
have emerged as revolutionary tools, offering new hope
and possibilities in the fight against cancer and other
diseases. These technologies are transforming how we
diagnose, treat, and manage illnesses, ushering in a
new era of personalized medicine.
One of the most significant contributions of AI
and ML in medicine is in cancer detection and diag-
nosis. These technologies can analyze vast amounts of
medical data, including images, genetic information,
and patient records, to identify patterns and anomalies
that may indicate the presence of cancer. This ability
has led to the development of more accurate and ef-
ficient diagnostic tools, such as AI-powered imaging
systems that can detect cancerous lesions with remark-
able precision.
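As a purely illustrative sketch of the underlying idea, and not of the
clinical systems described above, the following lines train a basic
classifier on the public Wisconsin breast-cancer dataset bundled with
scikit-learn:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 30 numeric tumour features with benign/malignant labels
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)  # a deliberately simple classifier
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")

Real diagnostic systems are trained and validated far more carefully, but
the pattern, learning a decision rule from labelled medical data, is the
same.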
"Artificial intelligence will transform the practice
of medicine. It will enable us to provide truly person-
alized care and make healthcare more accessible and
affordable for everyone”, says Fei-Fei Li, Co-Director
of the Stanford Institute for Human-Centered AI.
Moreover, AI and ML are revolutionizing cancer
treatment by enabling the development of targeted ther-
apies. By analyzing genetic data from tumors, these
technologies can identify specific mutations that drive
cancer growth, allowing for the creation of drugs that
target these mutations with greater precision. This
approach, known as precision medicine, has shown
promising results in improving treatment outcomes and
reducing side effects.
In addition to diagnosis and treatment, AI and ML
are also transforming cancer research. These technolo-
gies can analyze large datasets to uncover new insights
into the underlying causes of cancer, leading to the
discovery of new biomarkers and therapeutic targets.
This knowledge is crucial for developing innovative
therapies and improving our understanding of cancer
biology.
Despite the remarkable progress made possible by
AI and ML, challenges remain. One major challenge
is the integration of these technologies into existing
healthcare systems. This requires addressing issues
related to data privacy, regulatory compliance, and the
need for healthcare professionals to be trained in the
use of AI and ML tools.
”The real challenge is not whether machines can
think but whether men do”, says B. F. Skinner, Amer-
ican psychologist.
Looking ahead, several exciting developments are
on the horizon. One promising area is the use of AI
and ML in predicting patient outcomes and tailoring
treatment plans accordingly. By analyzing a patient’s
medical history, genetic profile, and other factors, these
technologies can help clinicians make more informed
decisions about the best course of action for each indi-
vidual.
Another emerging trend is the use of AI and ML
in drug discovery. These technologies can analyze vast
libraries of chemical compounds to identify potential
drug candidates, significantly accelerating the drug de-
velopment process. This approach has the potential
to bring new and more effective treatments to market
faster than ever before.
In conclusion, AI and ML are revolutionizing the
field of cancer and medicine, offering new hope and
possibilities for patients and healthcare providers alike.
While challenges remain, the future looks bright, with
new technologies and approaches on the horizon that
promise to further transform healthcare and improve
patient outcomes.
1.5 Conclusion
India's entry into CAR T-cell therapy marks a new
era in cancer treatment. This revolutionary approach
holds immense promise for offering patients a renewed
chance at life. As Nelson Mandela said, ”Hope is a
powerful thing. It can make a start of what seems im-
possible”. With continued research and development,
CAR T-cell therapy has the potential to become a pow-
erful weapon in India's fight against cancer.
References:
Press Information Bureau, Government of India
[pib.gov.in]
The New Indian Express [newindianexpress.com]
National Cancer Institute Website [cancer.gov]
Nature Article https://www.nature.com/articles/
d41586-024-00809-y
Li, Fei-Fei. ”How AI Can Save Our Humanity.”
TED Talk, 2018.
Skinner, B. F. ”Beyond Freedom and Dignity.”
Hackett Publishing Company, 1971.
About the Author
Atharva Pathak currently works as a Soft-
ware Engineer & Data Manager for the Pune Knowl-
edge Cluster, a project under the Office of the Principal
Scientific Advisor, Govt. of India, supported by
IUCAA, Pune. Before this, he was an Astronomer
at the Inter-University Centre for Astronomy & Astro-
physics (IUCAA). He has also worked on various free-
lance projects, developing websites and applications
and localising different software.
He is also a life member of Jyotirvidya Parisanstha,
India's oldest association of amateur astronomers,
and looks after the IOTA-India Occultation section
as webmaster and data curator.
DNA Sequencing
Next-Generation Sequencing (NGS)
by Geetha Paul
airis4D, Vol.2, No.5, 2024
www.airis4d.com
2.1 Introduction
DNA sequencing is a fundamental laboratory tech-
nique utilised to ascertain the precise sequence of nu-
cleotides, or bases, within a DNA molecule. The se-
quence of these bases—typically denoted by the first
letters of their chemical names: A (adenine), T (thymine),
C (cytosine), and G (guanine)—encapsulates the bio-
logical information crucial for cellular development
and functioning. Deciphering the DNA sequence is
pivotal for unravelling the functionality of genes and
other genomic components. DNA sequencing resem-
bles interpreting printed text: storing data analogous to
written words, learning its language, and comprehend-
ing its significance. In the past, literacy was limited,
leaving many unable to read, while today, advance-
ments have made information more accessible.
Similarly, technological breakthroughs in DNA
sequencing have democratised access to our genetic
code, empowering broader understanding and explo-
ration. Yet, the ongoing challenge remains in fully
unlocking the implications of this genetic information
for our health and well-being. Various methods are
available for DNA sequencing, each characterised by
unique attributes. Ongoing advancements in genomics
continue to drive the exploration and development of
novel sequencing techniques.
2.2 Next Generation Sequencing
(NGS)
NGS is a type of DNA sequencing technology that
uses parallel sequencing of multiple small DNA frag-
ments to determine the sequence. This "high-throughput"
technology has allowed a dramatic increase in the speed
(and a decrease in the cost) at which an individual’s
genome can be sequenced. Next-generation sequenc-
ing (NGS), or high-throughput sequencing, represents
a robust platform capable of concurrently sequencing
thousands to millions of DNA molecules. This technol-
ogy encompasses various modern sequencing method-
ologies designed to meet the growing demand for cost-
effective sequencing. Sequencing DNA means deter-
mining the order of the four chemical building blocks
- called ”bases” - that make up the DNA molecule.
The sequence tells scientists the kind of genetic in-
formation that is carried in a particular DNA seg-
ment. The technology is used to determine the order
of nucleotides in entire genomes or targeted regions
of DNA or RNA. Driven by the imperative for lower-
cost sequencing solutions, high-throughput sequencing
methods have been developed to generate thousands or
millions of sequences in a single run. This advance-
ment aims to surpass the limitations of conventional
dye-terminator techniques (a technique in which each
of the four dideoxynucleotide chain terminators is la-
belled with a fluorescent dye, each emitting light at a
different wavelength).
Image Courtesy: https://microbenotes.com/next-generation-sequencing-ngs/
Figure 1: Diagrammatic representation of the Next
Generation Sequencing workflow, Step 1. DNA ex-
traction, Step 2. The fragmented DNA binds with the
adapter for Library Preparation, Step 3. Sequencing,
and Step 4. Analysis
Next-generation sequencing
(NGS) is used to sequence both DNA and RNA. Bil-
lions of DNA strands get sequenced simultaneously
using NGS. Meanwhile, with Sanger sequencing, only
one strand is sequenced at a time. The advent of these
cutting-edge technologies has drastically accelerated
the pace and reduced the expense of DNA and RNA
sequencing compared to traditional Sanger sequencing
methods. Consequently, NGS has catalysed ground-
breaking advancements in genomics and molecular bi-
ology research.
In cases of low quantities of nucleic acids (e.g.,
when using single cells as the source), isolated DNA
and RNA may be amplified using polymerases ap-
propriate for whole genome amplification (WGA) and
whole transcriptome amplification (WTA), respectively,
to increase the amount of starting template before NGS
library preparation. WGA and WTA can help obtain
more sequencing reads, better coverage, improved sen-
sitivity, and better variant detection from limited sam-
ple amounts. Phi29 DNA polymerase is commonly
used for WGA because of its high processivity, re-
duced bias, high fidelity, and ability to synthesise DNA
isothermally at a low temperature.
Next-generation sequencing (NGS) can be con-
ducted on samples containing DNA or RNA, includ-
ing cell cultures, fresh-frozen tissues, formalin-fixed
paraffin-embedded (FFPE) tissues, blood, saliva, and
bone marrow. Different extraction protocols tailored to
the specific sample type are available, each optimised
to maximise the yield and quality of nucleic acids ob-
tained.
The four steps of next-generation sequencing (NGS)
include nucleic acid isolation and extraction, library
preparation, clonal amplification and sequencing, and
data analysis. Nucleic acid extraction and isolation are
vital first steps in next-generation sequencing.
2.3 Step 1: Sample Isolation and
Extraction
Nucleic acid extraction is a fundamental initial
step in the NGS workflow, regardless of whether you're
sequencing genomic DNA (gDNA), total RNA, or var-
ious RNA types. Choosing an isolation method or kit
that facilitates proper cell and tissue lysis is crucial.
This ensures the attainment of sufficient yield, purity,
and quality necessary for subsequent library prepara-
tion steps. Yield: The isolation or extraction method
should yield nanograms (ng) to micrograms (µg) of
DNA or RNA, which is crucial for library prepara-
tion. Maximum yield is essential, especially from lim-
ited or archived sources like cell-free DNA (cfDNA)
and formalin-fixed, paraffin-embedded (FFPE) sam-
ples. Purity: Isolated nucleic acids must be devoid of
compounds that might inhibit enzymes during library
preparation. Common inhibitors include reagents from
nucleic acid isolation (e.g., phenol, ethanol) or contam-
inants from biological samples (e.g., heparin, humic
acid). The chosen method should effectively remove
or minimise these contaminants. Quality: Integrity
and quality of isolated nucleic acids are vital. Most
of the DNA should be of high molecular weight and
intact for gDNA. RNA should be minimally degraded,
maintaining heterogeneity and representing the origi-
nal sample's nucleic acid populations. With FFPE sam-
ples, where DNA and RNA are fragmented, appropri-
ate isolation methods or kits should be selected to en-
sure sufficient yield and quality for sequencing. Yield,
purity, and quality of isolated nucleic acids should be
assessed before proceeding to NGS library preparation.
The following are methods commonly used to examine
these attributes: UV spectrophotometric assays mea-
sure A260, the A260:A280 ratio, and the A260:A230 ratio to help
assess sample purity and yield. Fluorometric assays
help quantify specific types of nucleic acids (e.g., ss-
DNA, dsDNA, small RNA). Gel-based or microfluidic
electrophoresis helps determine fragment size, distri-
bution, and quantity.
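As a small illustration of how these numbers are used (the absorbance
readings below are invented, and the conversion factor of roughly 50
ng/µL per A260 unit applies to double-stranded DNA):

a260, a280, a230 = 0.75, 0.40, 0.35  # hypothetical absorbance readings

concentration = a260 * 50  # ~50 ng/uL per A260 unit for dsDNA
print(f"yield ~ {concentration:.0f} ng/uL")
print(f"A260:A280 = {a260 / a280:.2f}  (~1.8 suggests protein-free DNA)")
print(f"A260:A230 = {a260 / a230:.2f}  (~2.0-2.2 suggests low organic carryover)")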
2.4 Step 2: Library Preparation
Library preparation from RNA or DNA samples
involves three primary steps. After isolation and pu-
rification, nucleic acids are prepared for processing
and reading by the sequencer. These prepared, ready-to-sequence
samples are commonly called ”libraries” because they
represent a sequenceable collection of molecules. Al-
though the library preparation procedure may vary de-
pending on the methods and reagents used, the general
steps for Illumina systems are as follows: Nucleic Acid
Fragmentation or Amplification: In this initial step,
target sequences are amplified to generate a pool of
fragments of appropriate size. If RNA is the start-
ing material, a reverse transcription step is required
to convert RNA into cDNA. The nucleic acid sample
is fragmented into small pieces suitable for massively
parallel sequencing. The optimal range of fragment
sizes depends on the sequencers and sequencing appli-
cations.
The Illumina platform utilises solid-phase ampli-
fication in which each fragment in the library first an-
neals to the primers on the sequencing chip (known
as the flow cell) via the adapters. Through a series
of amplification reactions known as bridge amplifica-
tion [4] (Figure 2A), each fragment forms a cluster of
identical molecules called clonal clusters (Figure 2B);
therefore, every cluster represents one primary library
molecule. Note that clonal amplification on a pat-
terned flow cell with predefined arrays employs a dif-
ferent method called exclusion amplification (ExAmp)
chemistry. The ExAmp technology involves the instan-
taneous amplification of a DNA fragment after binding
to the primer on the patterned flow cell, excluding other
DNA fragments from forming a polyclonal cluster [5].
This process of clonal amplification should not be
confused with library amplification, which is carried
Image courtesy: https://www.thermofisher.com/in/en/home/life-science/cloning/cloning-learning-
center/invitrogen-school-of-molecular-biology/next-generation-sequencing/illumina-workflow.
Figure 2: Amplification steps. (A) Bridge amplifica-
tion. (1) The complementary strand of a DNA frag-
ment in the library is synthesised from the flow cell’s
priming oligo. (2) After removal of the original strand,
the complementary strand folds over and anneals with
the other type of flow cell oligo. A double-stranded
bridge is formed after the synthesis of its complemen-
tary strand. (3) The double-stranded bridge is dena-
tured, forming two single strands attached to the flow
cell. (4) The process of bridge amplification repeats,
and (5) more clones of double-stranded bridges are
formed. (B) Cluster generation. The double-stranded
clonal bridges are denatured (only one strand is shown
here for simplicity), the reverse strands are removed,
and the forward strands remain as clusters for sequenc-
ing.
Image courtesy: https://www.thermofisher.com/in/en/home/life-science/cloning/cloning-learning-
center/invitrogen-school-of-molecular-biology/next-generation-sequencing/Illumina-workflow.
Figure 3: Sequencing by cyclic reversible termina-
tion, in which nucleotides incorporated by a DNA
polymerase into the complementary DNA strand of
the clonal clusters are detected one base at a time.
out to increase library input before loading onto a flow
cell.
Adapter Ligation: Sequencing adapters are added
to the DNA or cDNA fragments following amplifica-
tion. These adapters contain sequences that will inter-
act with the NGS platform. Adapters, such as P5 and
P7, contain oligonucleotide sequences complementary
to the priming oligos on the sequencing chips. The ends
of nucleic acid fragments are ligated with adapters to
enable sequencing. Since Illumina adapters are spe-
cific to the sequencing platform, they are not inter-
changeable. If multiple samples are to be sequenced
simultaneously, unique identifiers or barcodes can be
ligated to each amplicon. This allows for pooling nu-
merous libraries into a single sequencing run, which
can then be "demultiplexed" during data analysis to as-
sign reads to their respective samples.
(Image courtesy:
https://www.thermofisher.com/in/en/home/life-science/cloning/cloning-learning-center/invitrogen-
school-of-molecular-biology/next-generation-sequencing/Illumina-workflow.)
Figure 4: Workflow of NGS library preparation for
Illumina systems.
The Illumina sequencing technology employs flu-
orescent dye-labelled dNTPs with a reversible termina-
tor to capture fluorescent signals in each cycle, utilising
a process known as cyclic reversible termination. In
each cycle, only one of the four fluorescent dNTPs is
incorporated by the DNA polymerase, based on com-
plementarity, after which unbound dNTPs are washed
away. Images of the clusters are captured following the
incorporation of each nucleotide. The incorporated nu-
cleotide’s emission wavelength and fluorescence inten-
sity are measured to identify the base contained in each
cluster during that cycle. After imaging, the fluores-
cent dye and the terminator are cleaved and released,
marking the completion of one cycle. Subsequently,
the next cycle of synthesis, imaging, and deprotection
commences. This sequential process allows each base
to be sequenced one cycle at a time. To achieve a read
length of "n" bases, this cycle is repeated "n" times.
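A toy Python sketch of this per-cycle read-out (the channel intensities
are invented, and real base calling is far more involved): in each cycle
the brightest of the four dye channels names the incorporated base.

# Invented per-cycle intensities of the four dye channels for one cluster.
cycles = [
    {"A": 0.1, "C": 0.9, "G": 0.1, "T": 0.2},  # cycle 1
    {"A": 0.8, "C": 0.1, "G": 0.2, "T": 0.1},  # cycle 2
    {"A": 0.1, "C": 0.1, "G": 0.9, "T": 0.3},  # cycle 3
]
read = "".join(max(c, key=c.get) for c in cycles)  # brightest channel per cycle
print(read)  # "CAG": one base per cycle, so n cycles give an n-base read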
Library Quantitation: A sequencing library rep-
resents a pool of DNA fragments with adapters attached
to their ends after preparation. Prepared libraries must
be quantified (and normalised as needed) to load an op-
timal concentration of molecules onto the sequencers
for sequencing. This quality control step ensures con-
sistent data output, quality, and efficient use of sequenc-
ing chips. Fluorometric spectroscopy and real-time
PCR are standard methods used for library quantifica-
tion.
2.5 Step 3: Sequencing Reaction
Parallel sequencing is carried out on a next-generation sequenc-
ing (NGS) platform. The prepared library is loaded
onto the sequencer, which then ”reads” the nucleotides
individually. The number of reads generated varies
depending on the specific sequencing platform and
Image courtesy: https://irepertoire.com/ngs-overview-from-sample-to-sequencer-to-results/
Figure 5: Sequencing workflow in Illumina sequencer.
Library fragment undergoes hybridisation with spe-
cific primers, forming clusters, which are then ampli-
fied to generate millions to billions of clonal clusters.
Following cluster formation, fluorescently labelled nu-
cleotides synthesise a complementary strand for each
fragment. With the addition of each tagged nucleotide,
the flow cell undergoes imaging, capturing the emitted
fluorescence from each cluster. The wavelength and
intensity of the fluorescent emission are subsequently
analysed to identify the sequence of the templates.
kit employed. Various methods of NGS have been
developed, including pyrosequencing, sequencing by
ligation (SOLiD), sequencing by synthesis (SBS - Illu-
mina), and Ion Torrent sequencing. Illumina sequenc-
ing is the most prevalent among these, contributing
to approximately 90% of the world’s sequencing data
(as per Illumina's website). While all NGS platforms
perform sequencing of millions of small fragments of
DNA or cDNA, there are several different sequencing
technologies. Illumina pioneered the most prevalent
and successful sequencing technology. Illumina se-
quencers use a glass flow cell coated with millions
of oligonucleotides complementary to the sequenc-
ing adaptors. Each library fragment hybridises with
these primers during sequencing, forming clusters that
are amplified to generate millions to billions of clonal clus-
ters. Subsequently, fluorescently labelled nucleotides
are utilised to synthesise a complementary strand for
each fragment. After adding each tagged nucleotide,
the flow cell undergoes imaging, and the emitted flu-
orescence from each cluster is recorded. The wave-
length and intensity of the fluorescent emission are
then utilised to identify the sequence of the templates.
2.6 NGS Data Analysis Using
Bioinformatics
The final step in the NGS workflow involves pro-
cessing, analysis, and interpretation of the sequencing
data generated. Bioinformatic tools play a crucial role
in converting raw sequencing data into meaningful re-
sults. However, due to the vast amount of data gener-
ated by NGS (gigabases of raw data), the availability
and capability of computing power to process and anal-
yse such large datasets pose significant challenges to
the workflow.
This step of the NGS workflow can be broadly cat-
egorised into three stages. The applications and goals
of NGS experiments often determine how the data are
processed and analysed and which bioinformatic tools
are utilised.
Stages of NGS Data Analysis
1. Pre-processing: In this stage, raw sequencing
data undergoes pre-processing to remove low-
quality reads, adapter sequences, and other arte-
facts. Quality control checks are performed
to ensure the reliability of the data. Typical
tasks include read trimming, quality filtering,
and adapter removal (a minimal read-trimming
sketch follows this list).
2. Alignment and Mapping: Once pre-processed,
the sequencing reads are aligned or mapped to a
reference genome or transcriptome. This step in-
volves identifying the genomic or transcriptome
locations where the reads originated. Various
alignment algorithms and tools are employed for
this purpose, considering factors such as read
length, sequencing technology, and genome com-
plexity.
3. Variant Calling and Analysis: After alignment,
variant calling is performed to identify genetic
variations such as single nucleotide polymor-
phisms (SNPs), insertions, deletions, and struc-
tural variants. Statistical algorithms and filters
are applied to distinguish true variants from se-
quencing errors and artefacts. Following variant
calling, downstream analysis may include func-
tional annotation, pathway analysis, and inter-
pretation of the biological significance of de-
tected variants.
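Here is the minimal read-trimming sketch promised above. The read, the
Phred quality scores, and the threshold are all invented for illustration;
production pipelines rely on dedicated tools for this step.

def trim_read(bases, quals, threshold=20):
    """Trim from the 3' end while the Phred quality is below the threshold."""
    end = len(bases)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return bases[:end], quals[:end]

bases = "ACGTACGTAC"
quals = [38, 37, 36, 35, 30, 28, 22, 15, 10, 8]  # invented Phred scores
print(trim_read(bases, quals))  # ('ACGTACG', [38, 37, 36, 35, 30, 28, 22])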
In conclusion, Next-Generation Sequencing (NGS) is
a transformative technique capable of producing vast
volumes of data, offering the potential for ground-
breaking biological discoveries. While the NGS work-
flow encompasses numerous intricate processes and
considerations, grasping the fundamental principles
of its key steps is pivotal. This understanding aids
in meticulously planning NGS experiments, ensuring
high-quality data acquisition and attaining significant
outcomes. By comprehending the core principles un-
derlying NGS methodologies, researchers can navigate
the complexities of sample preparation, sequencing,
and data analysis more precisely. This, in turn, en-
hances the reliability and robustness of the results ob-
tained from NGS experiments, facilitating the elucida-
tion of novel biological insights and advancing scien-
tific knowledge. Ultimately, with a solid grasp of NGS
fundamentals, researchers can harness the full poten-
tial of this powerful technology to unlock the mysteries
of the genome and beyond.
References
Schroeder A, Mueller O, Stocker S et al. (2006)
The RIN: an RNA integrity number for assigning in-
tegrity values to RNA measurements. BMC Mol Biol.7:3.
Thermo Fisher Scientific, Inc. (2018) Qubit RNA
IQ Assay: a fast and easy fluorometric RNA quality
assessment. (Application note)
Stepanauskas R, Fergusson EA, Brown J et al.
(2017) Improved genome recovery and integrated cell-
size analyses of individual uncultured microbial cells
and viral particles. Nat Commun8(1):84.
Illumina, Inc. (2017) An Introduction to Next-
Generation Sequencing Technology. (Brochure)
Illumina, Inc. Patterned Flow Cell Technology.
(Website)
Bentley DR, Balasubramanian S, Swerdlow HP et
al. (2008) Accurate whole human genome sequencing
using reversible terminator chemistry. Nature456(7218):53–
59.
Illumina, Inc. (2018) Illumina CMOS Chip and
One-Channel SBS Chemistry. (Technical note)
https://microbenotes.com/next-generation-sequencing-ngs/
https://www.genome.gov/genetics-glossary/Genetic-Code
https://irepertoire.com/ngs-overview-from-sample-to-sequencer-to-results/
https://www.thermofisher.com/in/en/home/industrial/
spectroscopy-elemental-isotope-analysis/molecular-spectroscopy/
fluorometers.html
https://www.thermofisher.com/in/en/home/life-science/dna-
rna-purification-analysis/nucleic-acid-gel-electrophoresis/e-
gel-electrophoresis-system/e-gel-precast-agarose-gels.html
About the Author
Geetha Paul is one of the directors of
airis4D. She leads the Biosciences Division. Her
research interests extends from Cell & Molecular Bi-
ology to Environmental Sciences, Odonatology, and
Aquatic Biology.
How Bisulfite Sequencing Reveals Hidden
Messages?
by Jinsu Ann Mathew
airis4D, Vol.2, No.5, 2024
www.airis4d.com
DNA methylation is a crucial part of how our
genes work. It happens when certain parts of our DNA
get tagged with small chemicals, usually at spots called
CpG-rich regions. These tags can affect how genes are
turned on or off, which is really important for under-
standing how our bodies function.
Now, here’s the tricky part: regular sequencing
methods can't directly show us where these tags are.
They can only tell us the basic A, T, G, and C build-
ing blocks of DNA without detailing whether they’re
tagged with these chemicals or not.
But there's a cool technique called bisulfite con-
version that changes all that. It’s like a magic trick that
reveals the hidden methylation patterns in our DNA.
By combining this technique with sequencing, scien-
tists can finally see where these methylation tags are
and how they affect our genes.
In this article, we’ll dive into the world of bisulfite
sequencing, breaking down how it works in simple
terms and why it's so important for understanding DNA
methylation.
3.1 What is Bisulfite Sequencing?
Bisulfite sequencing is a powerful technique
used to study DNA methylation, an essential epige-
netic modification that influences gene expression and
various cellular processes. Methylation is typically in-
vestigated in gene promoter regions, with a focus on
CpG dinucleotides. Within these regions, methylation
occurs through the addition of a methyl group to the
(Image courtesy: https://geneticeducation.co.in/what-is-bisulfite-sequencing-beginners-to-advance-guide/)
Figure 1: Conversion of Cytosine to 5-methylcytosine
C5 carbon of the cytosine nucleotide, resulting in the
formation of 5-methylcytosine (Figure 1).
The principle behind bisulfite sequencing lies in
the chemical conversion of cytosine bases. Sodium
bisulfite, a chemical agent, specifically targets and
chemically modifies cytosine residues in DNA. Im-
portantly, it converts unmethylated cytosines to uracil
while leaving methylated cytosines unchanged. This
chemical conversion provides a way to differentiate
between methylated and unmethylated cytosines.
After bisulfite treatment, the modified DNA un-
dergoes polymerase chain reaction (PCR) amplifica-
tion. PCR selectively amplifies the DNA regions of
interest, which contain the converted cytosines (uracil)
and any remaining methylated cytosines. This step
generates multiple copies of the DNA fragments for
subsequent sequencing analysis.
(Image courtesy:
https://geneticeducation.co.in/what-is-bisulfite-sequencing-beginners-to-advance-guide/)
Figure 2: Illustration of the complete bisulfite se-
quencing process.
The PCR-amplified DNA fragments are then sub-
jected to DNA sequencing. During sequencing, the
converted cytosines (originally unmethylated) are
read as thymines (T), while the methylated
cytosines, which were protected from bisulfite con-
version, are read as cytosines (C). By comparing the
sequenced DNA with the original reference sequence,
researchers can identify the locations of methylated
cytosines (Figure 2).
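The logic of that comparison is simple enough to sketch in a few lines of
Python; the sequence and the single methylated position below are invented
for illustration:

reference = "ACGTCGAC"
methylated_sites = {4}  # hypothetical: only the C at index 4 is methylated

def bisulfite_read(ref, methylated):
    """Unmethylated C -> U -> sequenced as T; methylated C is protected."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(ref)
    )

read = bisulfite_read(reference, methylated_sites)
for i, (ref_base, read_base) in enumerate(zip(reference, read)):
    if ref_base == "C":  # compare each reference C with what was sequenced
        print(i, "methylated" if read_base == "C" else "unmethylated")
# positions 1 and 7 print "unmethylated"; position 4 prints "methylated"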
Steps in Bisulfite Sequencing
3.2 DNA Isolation
DNA isolation, a fundamental technique in molec-
ular biology, involves extracting the genetic material
from a cell. This purified DNA serves as the founda-
tion for various downstream applications like genetic
testing or gene cloning. The process typically follows
a multi-step approach:
Cell Lysis and Breakdown: The initial step dis-
rupts the cell wall and membrane, releasing the cel-
lular contents including DNA. This can be achieved
mechanically using homogenization, with the action of
specific enzymes, or with detergents that dissolve the
cell membrane.
Purification and Isolation: Following cell dis-
ruption, unwanted molecules like proteins and RNA
are removed. This often involves enzymatic diges-
tion to break down these contaminants. Finally, the
DNA is separated from the remaining cellular debris.
Techniques like alcohol precipitation or chromatogra-
phy can be employed for this purpose, resulting in a
purified and concentrated DNA sample.
(Image courtesy: https://www.researchgate.net/figure/Bi-molecular-hybridization-and-
denaturation-of-DNA fig2 253962134)
Figure 3: Denaturation of DNA
3.3 Bisulfite Conversion
Bisulfite conversion is a chemical treatment used
to investigate DNA methylation patterns at single-nucleotide
resolution. This process involves the treatment of ge-
nomic DNA with sodium bisulfite, a compound that
chemically modifies unmethylated cytosine residues,
while leaving methylated cytosines unchanged. Fol-
lowing are the steps involved:
Denaturation: Exposing the Cytosines: The
first step involves breaking apart the double-stranded
DNA (dsDNA) into single strands (Figure 3). This is
achieved through denaturation, typically by applying
heat or chemicals. This step is critical because bisul-
fite conversion only works on single-stranded DNA.
The presence of the complementary strand in dsDNA
physically protects cytosines from the conversion pro-
cess.
Chemical Conversion: Unmasking Unmethy-
lated Cytosines: With the DNA single-stranded, the
sample is then incubated with sodium bisulfite at a
specific temperature. This chemical reacts with un-
methylated cytosines (C) in the DNA, causing them to
deaminate and transform into uracil (U). Importantly,
methylated cytosines (5-methylcytosine) remain unaf-
fected by sodium bisulfite.
Purification: Preparing for Analysis: The final
step involves desalting and desulfonation. This crucial
cleaning process removes all the leftover sodium bisul-
fite and any unconverted single-stranded DNA frag-
ments. The remaining purified DNA now contains
uracil (U) where there were originally unmethylated
cytosines, while the methylated cytosines retain their
original form (C).
3.4 PCR Amplification
Polymerase Chain Reaction (PCR) amplification
in bisulfite sequencing plays a pivotal role in selectively
amplifying the regions of interest within the bisulfite-
treated DNA sample. Bisulfite conversion transforms
unmethylated cytosines (C) into uracil (U). But, reg-
ular DNA polymerases used in PCR can only read
and incorporate the standard DNA bases (A, C, G, T).
They can’t directly work with uracil (U). This creates
a roadblock for amplifying the bisulfite-treated DNA,
as it now contains uracil where unmethylated cytosines
originally resided.
To overcome this hurdle, bisulfite sequencing em-
ploys specialized polymerases. These enzymes are
aptly named "bisulfite-converted DNA compatible" poly-
merases. They possess the unique ability to recognize
and incorporate adenine (A) opposite uracil (U) during
PCR. Through a series of heating, cooling, and exten-
sion cycles, the targeted region is amplified, including
both the converted (originally unmethylated) and un-
converted (originally methylated) sections. With each
cycle, the target DNA fragments are exponentially am-
plified, resulting in a substantial increase in the number
of DNA copies (Figure 4). After PCR amplification
is complete, the resulting PCR products containing
the bisulfite-converted DNA fragments are analyzed
to confirm successful amplification.
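The arithmetic behind this exponential growth is worth seeing once: under
ideal conditions each cycle doubles the target, so n cycles multiply the
starting material by 2^n. A toy example with an assumed starting copy
number:

initial_copies = 100  # assumed starting template count
for n in (10, 20, 30):
    print(f"after {n} cycles: {initial_copies * 2**n:,} copies")
# after 30 cycles: 107,374,182,400 copies, around 10^11 molecules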
3.5 Sequencing
Following PCR amplification in bisulfite sequenc-
ing, DNA sequencing serves as the final step to trans-
late the methylation information into a readable format.
Unlike standard sequencing that identifies the classical
A, C, G, and T bases, bisulfite sequencing requires a
(Image courtesy:https://www.researchgate.net/figure/The-exponential-amplification-of-DNA-in-
PCR fig4 236065209)
Figure 4: Amplification of DNA in PCR.
careful interpretation due to the prior conversion step.
The key lies in remembering that bisulfite treat-
ment converts unmethylated cytosines (C) to uracil (U).
During PCR amplification, this uracil gets incorpo-
rated as thymine (T) into the newly synthesized DNA
strands. Therefore, analyzing the final sequenced DNA
provides a map of the original methylation pattern.
By comparing the sequenced DNA fragments to the
reference genome, researchers can discern whether a
cytosine was originally methylated (if it remains a cy-
tosine) or unmethylated (if it now reads as thymine).
Through this analysis, methylation profiles and maps
are generated, providing valuable insights into DNA
methylation patterns and their role in gene regulation,
development, and disease.
3.6 Conclusion
In conclusion, bisulfite conversion offers a pow-
erful tool for investigating DNA methylation, a key
epigenetic modification that influences gene expres-
sion and cellular function. By selectively converting
unmethylated cytosines to uracil, this technique allows
researchers to create a map of methylation patterns
across a specific DNA region. Through subsequent
PCR amplification and DNA sequencing, the original
methylation status can be determined. Bisulfite conver-
sion plays a vital role in various research areas, includ-
ing understanding gene regulation in development and
disease, and offers valuable insights for advancing our
understanding of how the epigenome shapes cellular
processes.
References
What is Bisulfite Sequencing?- Beginners to Ad-
vance Guide
DNA methylation detection: Bisulfite genomic
sequencing analysis
Bisulfite sequencing
Brush Up: What Is Bisulfite Sequencing and
How Do Researchers Use It to Study DNA Methy-
lation?
BS-Seq/Bisulfite-seq/WGBS
Principles and Workflow of Whole Genome Bisul-
fite Sequencing
About the Author
Jinsu Ann Mathew is a research scholar
in Natural Language Processing and Chemical Infor-
matics. Her interests include applying basic scientific
research on computational linguistics, practical appli-
cations of human language technology, and interdis-
ciplinary work in computational physics.
About airis4D
Artificial Intelligence Research and Intelligent Systems (airis4D) is an AI and Bio-sciences Research Centre.
The Centre aims to create new knowledge in the field of Space Science, Astronomy, Robotics, Agri Science,
Industry, and Biodiversity to bring Progress and Plenitude to the People and the Planet.
Vision
Humanity is in the 4th Industrial Revolution era, which operates on a cyber-physical production system. Cutting-
edge research and development in science and technology to create new knowledge and skills become the key to
the new world economy. Most of the resources for this goal can be harnessed by integrating biological systems
with intelligent computing systems offered by AI. The future survival of humans, animals, and the ecosystem
depends on how efficiently the realities and resources are responsibly used for abundance and wellness. Artificial
intelligence Research and Intelligent Systems pursue this vision and look for the best actions that ensure an
abundant environment and ecosystem for the planet and the people.
Mission Statement
The 4D in airis4D represents the mission to Dream, Design, Develop, and Deploy Knowledge with the fire of
commitment and dedication towards humanity and the ecosystem.
Dream
To promote the unlimited human potential to dream the impossible.
Design
To nurture the human capacity to articulate a dream and logically realise it.
Develop
To assist the talents to materialise a design into a product, a service, a knowledge that benefits the community
and the planet.
Deploy
To realise and educate humanity that a knowledge that is not deployed makes no difference by its absence.
Campus
Situated in a lush green village campus in Thelliyoor, Kerala, India, airis4D was established under the auspices
of SEED Foundation (Susthiratha, Environment, Education Development Foundation), a not-for-profit company
for promoting Education, Research, Engineering, Biology, Development, etc.
The whole campus is powered by Solar power and has a rain harvesting facility to provide sufficient water supply
for up to three months of drought. The computing facility in the campus is accessible from anywhere through a
dedicated optical fibre internet connectivity 24×7.
There is a freshwater stream that originates from the nearby hills and flows through the middle of the campus.
The campus is a noted habitat for the biodiversity of tropical fauna and flora. airis4D carries out periodic and
systematic water quality and species diversity surveys in the region to ensure its richness. It is our pride that
the site has consistently been environment-friendly and rich in biodiversity. airis4D also grows fruit plants
that can feed birds and maintains water bodies to survive the drought.