Cover page
Image Name: Peering Into the Tendrils of NGC 604 with NASA's Webb
Image credit: NASA, ESA, CSA, STScI
At the center of the image is a nebula on the black background of space. The nebula is composed of wispy filaments of light blue clouds. At the center-right of the blue clouds is a large cavernous bubble. The bottom left edge of this cavernous bubble is filled with hues of pink and white gas. Hundreds of dim stars fill the area surrounding the nebula. For more information, see:
https://www.flickr.com/photos/nasawebbtelescope/53577720515/in/album-72177720313923911
Managing Editor: Ninan Sajeeth Philip
Chief Editor: Abraham Mulamoottil
Editorial Board: K Babu Joseph, Ajit K Kembhavi, Geetha Paul, Arun Kumar Aniyan, Sindhu G
Correspondence: The Chief Editor, airis4D, Thelliyoor - 689544, India
Journal Publisher Details
Publisher: airis4D, Thelliyoor 689544, India
Website: www.airis4d.com
Email: nsp@airis4d.com
Phone: +919497552476
Editorial
by Fr Dr Abraham Mulamoottil
airis4D, Vol.2, No.5, 2024
www.airis4d.com
We are continuing with our monthly interactive
program called “Speak with an Astronomer”, which
is inspired by Ajit Kembhavi’s article “Black Hole
Stories-8, Rotating Black Holes”. This program pro-
vides young enthusiasts with the chance to explore the
topic in depth and acquire profound knowledge.
Blesson George’s article “Types of Attention Mod-
els” explores attention networks, focusing on three
types of attention mechanisms critical to their func-
tionality. These mechanisms include global and local
attention, which differ in their scope of focus on in-
put data, and soft and hard attention, which describe
the method of attention application. Additionally, self-
attention, a mechanism allowing models to prioritize
input parts independently, is discussed. Global atten-
tion considers the entire input sequence, while local
attention focuses on specific subsets. Soft attention
dynamically allocates attention weights, while hard
attention involves stochastic selection. These mech-
anisms enhance model interpretability and efficiency
within neural network architectures.
In “Black Hole Stories-8 Rotating Black Holes”
by Ajit Kembhavi, the focus shifts to black holes with
mass and angular momentum, or spin. Unlike in the
case of Schwarzschild black holes, which have only
mass, spinning black holes have a more complex space-
time structure. The Kerr metric, discovered by Roy
Kerr in 1963, describes the geometry of spinning black
holes, revealing intricate features like the ergosphere.
The article explores the properties of the Schwarzschild
metric, the nature of geodesics around black holes, and
the Kerr metric’s special cases when the spin or mass
approaches zero. These insights provide a deeper understanding of the dynamics of rotating black holes and their impact on astrophysics.
In “Beginner's Guide to Machine Learning in Python - Part 2” by Linn Abraham, Python's suitability for ma-
chine learning is discussed, emphasizing its readability
and the availability of libraries like NumPy and Pan-
das. The article explores the fundamentals of machine
learning, comparing it with deep learning and high-
lighting the importance of frameworks like TensorFlow
and PyTorch. It also covers practical aspects such as
GPU utilization, data preprocessing, and model evalua-
tion. Overall, the article provides a concise overview of
implementing machine learning workflows in Python,
hinting at future discussions on advanced topics.
In “Unlocking the Mysteries of Star Clusters: Ce-
lestial Ensembles of Cosmic Wonder” by Sindhu G,
star clusters are explored as groupings of stars bound
by gravity, varying in size and composition. The article
delves into the formation of star clusters, highlighting
types such as globular clusters, open clusters, and em-
bedded clusters. It discusses the distinct properties and
significance of each type, offering insights into stellar
evolution, galactic dynamics, and the history of the
universe. Additionally, the article touches upon the
challenges and methods of observing these clusters,
showcasing their role in advancing our understanding
of the cosmos.
“X-ray Astronomy: Through Missions” by Aro-
mal P traces the history of X-ray astronomy from its
beginnings with balloon experiments in the early 20th
century to the development of rocket missions and
satellites for X-ray observations. The article highlights
key milestones, such as the discovery of solar X-rays
in 1949 and the detection of X-rays from outside the
solar system in 1962 with the launch of an Air Force
Aerobee rocket. The significance of these discover-
ies led to further exploration through rocket launches
and balloon experiments, eventually paving the way
for dedicated X-ray astronomical satellites. The dis-
cussion emphasizes the evolution of technology and
the contributions of scientists like Riccardo Giacconi
and Herbert Gursky in shaping our understanding of
the X-ray universe.
“Radio Galaxies: An Introduction” by Kshitij
Thorat provides an overview of radio galaxies, which
emit a significant portion of their light in the radio
bands due to large-scale jets and lobes. These jets,
believed to originate from supermassive black holes at
their centers, extend over vast distances, making radio
galaxies some of the largest objects in the universe.
The article discusses the structure of radio galaxies us-
ing Cygnus A as an example, highlighting features like
jets, lobes, and hotspots. It explains the process be-
hind the emission of radio waves and explores different
types of radio galaxies based on their structures. The
significance of radio galaxies in understanding galac-
tic activity and structure formation is also emphasized.
Additionally, the article mentions radio telescopes like
the Giant Metrewave Radio Telescope (GMRT) and
the Square Kilometre Array (SKA), which contribute
to studying radio galaxies in detail.
Atharva Pathak in “Genetically Engineered Warriors: India's New Hope in Cancer Treatment” explores India's breakthrough in cancer treatment with CAR T-
cell therapy. This revolutionary approach, exemplified
by NexCAR19, offers accessible treatment for B-cell
cancers. The article also highlights the role of AI/ML
in cancer care, emphasizing their potential in diag-
nosis, treatment, and drug discovery. Despite chal-
lenges, the article concludes optimistically, showcas-
ing the transformative impact of scientific innovation
on cancer treatment in India.
Geetha Paul’s article delves into DNA sequenc-
ing, particularly Next-Generation Sequencing (NGS),
highlighting its significance in deciphering genetic in-
formation. It explains NGS’s high-throughput capabil-
ities and the steps involved in the sequencing process,
from sample extraction to data analysis. Emphasizing
the importance of quality control and bioinformatics,
the article underscores NGS’s transformative potential
in advancing biological research. Overall, it offers a
concise overview of NGS and its implications for sci-
entific discovery.
Jinsu Ann Mathew’s article explores the signif-
icance of DNA methylation and introduces bisulfite
sequencing as a technique to uncover hidden methy-
lation patterns. DNA methylation, crucial for gene
regulation, is often concealed in standard sequencing
methods. Bisulfite conversion, a chemical process, re-
veals these patterns by distinguishing methylated from
unmethylated cytosines. The technique involves DNA
isolation, bisulfite conversion, PCR amplification, and
DNA sequencing. By interpreting the sequenced DNA,
researchers discern the original methylation status, aid-
ing in understanding gene regulation, development, and
disease. Bisulfite sequencing emerges as a powerful
tool offering insights into epigenetic modifications and
their role in cellular processes.
Contents
Editorial ii
I Artificial Intelligence and Machine Learning 1
1 Types of Attention Models 2
1.1 Global and Local Attention Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Beginner's Guide to Machine Learning in Python - Part 2 6
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Why Python? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Why Machine Learning? What Problems Does It Solve? . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Under-the-hood of a Deep Learning Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
II Astronomy and Astrophysics 9
1 Black Hole Stories-8
Rotating Black Holes 10
1.1 Black Holes With Spin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 The Schwarzschild Metric: A Brief Recapitulation . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 The Kerr Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Special Cases of the Kerr Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 X-ray Astronomy: Through Missions 13
3 Radio Galaxies: An Introduction 16
4 Unlocking the Mysteries of Star Clusters: Celestial Ensembles of Cosmic Wonder 18
4.1 Star Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Globular Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.3 Open Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.4 Embedded Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5 Super Star Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
III Biosciences 24
1 Genetically Engineered Warriors: India’s New Hope in Cancer Treatment 25
1.1 Supercharging the Immune System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.2 A Breakthrough for India . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3 Looking Ahead: A Brighter Future . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4 Unleashing the Power of AI and ML in Cancer and Medicine . . . . . . . . . . . . . . . . . . . . 26
1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 DNA Sequencing
Next-Generation Sequencing (NGS) 29
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Next Generation Sequencing (NGS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Step 1: Sample Isolation and Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4 Step 2: Library Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Step 3: Sequencing Reaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 NGS Data Analysis Using Bioinformatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3 How Bisulfite Sequencing Reveals Hidden Messages? 35
3.1 What is Bisulphite Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 DNA Isolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Bisulphite Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 PCR Amplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5 Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Part I
Artificial Intelligence and Machine Learning
Types of Attention Models
by Blesson George
airis4D, Vol.2, No.5, 2024
www.airis4d.com
In our previous episodes, we delved into the fasci-
nating world of attention networks, examining their
powerful capabilities and the fundamental building
blocks that define their structure. We thoroughly an-
alyzed the different components that contribute to the
networks’ unique abilities to process and interpret data
effectively.
Continuing our exploration, this issue will focus
on expanding our understanding by discussing three
specific types of attention mechanisms that are criti-
cal to the functionality and versatility of these models.
These types are categorized based on distinct opera-
tional features and methodologies they employ: global
and local attention, which differ in the scope of focus
they apply to input data; hard and soft attention, which
describe the method and flexibility of the attention ap-
plication; and self-attention, a mechanism that allows
models to weigh and prioritize different parts of the
input independently.
1.1 Global and Local Attention
Models
Attention networks are a type of neural network
architecture that allows models to focus on specific
parts of the input data while making predictions or
decisions. These networks are designed to dynamically
weigh the importance of different elements in the input,
enabling the model to selectively attend to relevant
information. By incorporating attention mechanisms,
the network can learn to assign varying degrees of
importance to different parts of the input sequence,
enhancing its ability to capture complex relationships
Figure 1: Global and local attention networks. In contrast to global attention, which
evaluates all intermediate hidden states across an entire
input sequence, local attention narrows its focus to a
select, fixed-size subset of these states. This approach
typically centers the attention around a specific point
or follows a predefined alignment, limiting the scope to
only the most relevant parts of the input. By doing so,
local attention significantly reduces computational de-
mands, making it more efficient, especially for longer
sequences. However, this efficiency comes at the cost
of potentially overlooking useful context outside the se-
lected window. Therefore, the choice between global
and local attention often balances between computa-
tional efficiency and the richness of contextual infor-
mation utilized. Image Credit: Nagahisarchoghaei, Mohammad, et al., “An empirical survey on explainable AI technologies: Recent trends, use-cases, and categories from technical and application perspectives,” Electronics 12.5 (2023): 1092.
and dependencies within the data.
The development of attention networks was driven
by the need to address the limitations of traditional neu-
ral network architectures, such as the inability to ef-
fectively handle long-range dependencies and capture
intricate patterns in sequential data. By introducing
attention mechanisms, researchers aimed to improve
the interpretability and performance of deep learning
models, particularly in tasks involving natural language
processing, machine translation, and image recogni-
tion. Attention networks enable models to focus on
specific parts of the input sequence, allowing for more
precise and context-aware predictions.
Global and local attention mechanisms are two
common variants of attention networks that serve dis-
tinct purposes in enhancing model interpretability. Global
attention mechanisms consider the entire input sequence
when assigning attention weights, allowing the model
to capture long-range dependencies and relationships
across the entire input. In contrast, local attention
mechanisms focus on a specific subset of the input se-
quence, providing a more fine-grained and localized
view of the data. By incorporating both global and lo-
cal attention mechanisms, models can effectively bal-
ance between capturing broad contextual information
and focusing on specific details within the input data,
leading to more accurate and insightful predictions.
Global and local attention mechanisms are fun-
damental components of neural networks that enhance
model interpretability by allowing selective focus on
different parts of the input sequence. Global atten-
tion considers the entire input sequence when assigning
attention weights, capturing long-range dependencies
and relationships across the data. The global attention
weight α_i for each element in the input sequence is calculated as

α_i = exp(ϵ_i) / Σ_{j=1}^{n} exp(ϵ_j),    (1.1)

where ϵ_i represents the relevance score of the i-th element in
the input sequence. This calculation ensures that the
model weighs the importance of each element based on
its relevance score, providing a comprehensive view of
the input data.
In contrast, local attention mechanisms concen-
trate on specific subsets of the input sequence, offering
a more localized perspective. By employing a window-
based approach, local attention focuses on a fixed-size
window of elements around a central position. The
attention weights for local attention are computed sim-
ilarly to global attention but with constraints on the
range of elements considered. This strategy enables
the model to emphasize specific regions of the input
sequence, capturing detailed information while man-
aging computational complexity effectively.
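To make the two scopes concrete, here is a minimal NumPy sketch (the scores, window size and centre position are invented for illustration): global attention applies the softmax of Eq. (1.1) over every position, while a local variant applies it only within a fixed window.

# Minimal sketch of global vs. local attention weights (illustrative scores).
import numpy as np

def softmax(e):
    e = np.exp(e - e.max())        # subtract max for numerical stability
    return e / e.sum()

scores = np.array([0.1, 2.0, 0.3, 1.5, 0.2])    # relevance scores, i.e. the ϵ_i

# Global attention: softmax over the entire input sequence, as in Eq. (1.1).
global_weights = softmax(scores)

# Local attention: softmax restricted to a window around a centre position.
centre, half_width = 3, 1
lo = max(0, centre - half_width)
hi = min(len(scores), centre + half_width + 1)
local_weights = np.zeros_like(scores)
local_weights[lo:hi] = softmax(scores[lo:hi])

print(global_weights.sum(), local_weights.sum())  # both sum to 1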
1.1.1 Soft and Hard Attention Models
Soft attention in image caption generation is a
technique that trains a model to dynamically focus on
various parts of an image when generating captions.
This model is fully differentiable, which means it can
be seamlessly integrated with gradient-based learning
methods like backpropagation, facilitating straightfor-
ward training and enhancing the model’s interpretabil-
ity. In soft attention mechanisms, every part of the
input data, such as different regions of an image, is
assigned a weight calculated typically through a soft-
max function. These weights are fractional and col-
lectively add up to one, ensuring a comprehensive and
smooth distribution of attention across the entire input.
As a result, a context vector is formed by computing
a weighted sum of the features, where each feature's influence on the final output is proportionate to its as-
signed weight, thus allowing every part of the image to
contribute to the generated caption based on its calcu-
lated relevance.
Conversely, hard attention operates on a stochas-
tic mechanism, where the focus areas within the input
are randomly sampled during each step of the caption
generation process. This selection process is based on
a probability distribution that emerges from the fea-
tures of the data, making hard attention inherently ran-
dom and making each selection unique. Because hard
attention involves making discrete choices—focusing
intently on certain parts while completely disregarding
others—it lacks differentiability. This characteristic
complicates its integration with conventional training
methods like backpropagation. Instead, hard attention
models often require alternative training strategies such
as maximizing an approximate variational lower bound
or employing algorithms like REINFORCE, which rely
on reinforcement learning principles or Monte Carlo
methods to estimate gradients.
Both Luong et al. and Xu et al. have exten-
sively discussed these concepts in their respective pa-
pers. They differentiate between the two models by
highlighting that soft attention calculates the context
vector as a weighted sum of all encoder hidden states.
In contrast, hard attention, as utilized particularly in
image captioning scenarios, uses attention scores to
select a single hidden state or feature vector (typically
generated by a Convolutional Neural Network, CNN).
The challenge with hard attention arises when select-
ing this state; functions like argmax might be used
for selection due to their ability to pinpoint the index
with the maximum score. However, such functions
are not differentiable—minor adjustments in network
weights during training do not alter the selected in-
dex—necessitating the use of more complex computa-
tional techniques to effectively train the model. This
delineation clearly shows how the soft attention mech-
anism, with its smooth and inclusive focus across all
inputs, contrasts sharply with the selective and com-
putationally intensive nature of hard attention in the
context of image caption generation.
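The contrast can be put into a short sketch (feature shapes and values are invented for illustration): soft attention forms a differentiable weighted sum of all feature vectors, while hard attention samples a single one, which is why gradients cannot flow through the selection.

# Sketch of soft vs. hard attention over CNN-style feature vectors
# (5 image regions, 8-dimensional features; all values illustrative).
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))              # one feature vector per region
scores = rng.normal(size=5)                     # attention scores
weights = np.exp(scores) / np.exp(scores).sum() # softmax weights, sum to one

soft_context = weights @ features               # differentiable weighted sum

hard_index = rng.choice(5, p=weights)           # stochastic selection of one region...
hard_context = features[hard_index]             # ...not differentiable w.r.t. the scores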
1.2 Conclusion
In this comprehensive exploration of attention net-
works, we have uncovered the nuanced differences and
applications of various attention mechanisms within
neural network architectures. From the broad-reaching
global attention that captures extensive contextual in-
formation across entire input sequences, to the pre-
cision of local attention focusing on specific regions,
these mechanisms significantly enhance the interpretabil-
ity and efficiency of models. We delved deeper into the
distinctions between soft and hard attention models,
highlighting their respective advantages and limitations
in terms of differentiability and computational demand.
Soft attention's integrability with gradient-based learn-
ing stands in contrast to the stochastic and computation-
ally intensive nature of hard attention, which requires
more complex training techniques such as reinforce-
ment learning.
The discussion illustrates how attention mecha-
nisms are pivotal in addressing the challenges of tradi-
tional neural networks, particularly in managing long-
range dependencies and processing large and complex
datasets efficiently. By enabling selective focus, these
networks do not merely react to the most prominent
features but intelligently weigh all parts of the input to
generate contextually rich outputs. As we continue to
push the boundaries of what is possible with machine
learning, attention networks represent a critical step
toward more dynamic, flexible, and powerful artificial
intelligence systems. This journey into the intricacies
of attention models not only enhances our understand-
ing but also opens up new avenues for innovation in
various domains, including natural language process-
ing, computer vision, and beyond.
References
1. Nagahisarchoghaei, Mohammad, et al., “An empirical survey on explainable AI technologies: Recent trends, use-cases, and categories from technical and application perspectives,” Electronics 12.5 (2023): 1092.
2. Luong, Minh-Thang, Hieu Pham, and Christo-
pher D. Manning. ”Effective approaches to attention-
based neural machine translation.” arXiv preprint
arXiv:1508.04025 (2015).
3. Xu, Kelvin, et al. ”Show, attend and tell: Neural
image caption generation with visual attention.”
International conference on machine learning.
PMLR, 2015.
4. Different types of Attention in Neural Networks
About the Author
Dr. Blesson George presently serves as an
Assistant Professor of Physics at CMS College Kot-
tayam, Kerala. His research pursuits encompass the
development of machine learning algorithms, along
with the utilization of machine learning techniques
across diverse domains.
Beginner's Guide to Machine Learning in
Python - Part 2
by Linn Abraham
airis4D, Vol.2, No.5, 2024
www.airis4d.com
2.1 Introduction
In the first part of this series we got a brief overview
of the different stages in a machine learning project.
We started out with setting up the environment, the
hardware and software requirements. In this brief ar-
ticle we go a bit more in-depth to see the steps involved in a machine learning workflow, especially the moving parts involved in a successful training session. The article also mentions the considerations to be weighed when making choices regarding language, platforms, libraries, etc.
2.2 Why Python?
A programming language is fundamentally a tool
which helps us convey an idea to the machine. Thus
it should be immaterial which language is used for
any particular task. However, there are some practical considerations that make us prefer one language over
others. What are some advantages and disadvantages
that Python has when it comes to machine learning?
2.2.1 Software libraries
Coding a deep neural network from scratch in
Python is possible but is heavily advised against. When
one makes heavy use of software libraries one saves
time by not reinventing the wheel. The disadvantage to
this approach is that the code is no longer in one’s con-
trol and subject to change. This change is inevitable
in the domain of technology. In science where repro-
ducibility is critical it might not be desirable to have
your code break. Thus the first step to working in
Python is often to create an environment (also called
a virtual environment) that is isolated from the system
Python and to version control the code. All external
libraries are installed within this virtual environment.
Version control of code is often done with Git, together with a 'requirements.txt' file that lists the version of each external software library used in your code (also
called a dependency). The virtual environment is also
useful to manage dependencies when working on dif-
ferent projects that might require different versions of each of the dependencies.
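As an illustration, the setup can even be scripted from Python itself. Here is a minimal sketch (the package names are examples, and the env/bin paths assume Linux or macOS; on Windows the scripts live under env\Scripts):

# Create an isolated environment and pin dependencies in requirements.txt.
import subprocess
import venv

venv.create("env", with_pip=True)             # equivalent to: python -m venv env
subprocess.run(["env/bin/pip", "install", "numpy", "pandas"], check=True)
with open("requirements.txt", "w") as f:      # record exact dependency versions
    subprocess.run(["env/bin/pip", "freeze"], stdout=f, check=True)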
2.2.2 Wrapper code
An overlooked advantage of Python is its read-
ability. Python code is often said to be almost like
pseudo code and hence very readable. The downside of this is that it is slower than languages such as C or Fortran. However, this is not a serious drawback in practice, since there
exists a lot of Python wrapper code that just provides
an interface to code written in a faster language that
does the actual heavy lifting. We use the Python code
to pass inputs and to receive the outputs. This is very
often encountered in machine learning where a lot of
the actual heavy lifting is done by faster languages like
C and C++.
The basics of a programming language can be
learnt in a fairly short amount of time. The rest of the time is spent understanding code written by others and troubleshooting its usage. Much of that time goes into discussion forums like Stack Overflow, trying to understand the error messages spit out by the code you are fixing and looking at other people's solutions. Thus learning Python involves learning to use a lot of different tools: code editing software, virtual environments, version control software and so on.
2.2.3 Some useful libraries
Depending on the kind of data one wants to deal with, there are many Python libraries that one cannot avoid using. NumPy adds support for numerical arrays. Pandas adds support for arrays that can hold more than just numbers, such as strings, and also enables indexing of arrays using strings. SciPy adds support for scientific functions. Matplotlib is a very rich library that supports almost any kind of data visualization you can think of. PIL allows reading images into Python. Astropy adds support for astronomy-related functions.
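A short sketch of what each of these libraries looks like in use (assuming they are installed; the FITS file name is hypothetical):

# One-line tastes of each library mentioned above.
import numpy as np
import pandas as pd
from scipy import ndimage
import matplotlib.pyplot as plt
from PIL import Image
from astropy.io import fits

arr = np.arange(9).reshape(3, 3)                            # NumPy: numerical arrays
df = pd.DataFrame({"star": ["a", "b"], "mag": [1.2, 3.4]})  # Pandas: mixed-type, string-indexed data
smooth = ndimage.gaussian_filter(arr.astype(float), 1)      # SciPy: scientific functions
plt.imshow(smooth); plt.savefig("smooth.png")               # Matplotlib: visualization
img = Image.new("L", (16, 16))                              # PIL: image handling
# hdul = fits.open("observation.fits")                      # Astropy: FITS files (hypothetical file)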
2.3 Why Machine Learning? What
Problems Does It Solve?
Most things that we as humans learn cannot be put into a sequence of instructions to be followed word for word by any person or machine. Think about how you learnt to walk, speak, identify plants, birds and animals, or distinguish new faces from familiar ones. ML is a way of harnessing this power of the human brain to solve problems without explicit instructions, and of applying it to niche problems in every walk of life, mostly to solve a single designed problem with curated data.
Remember that it is no magic bullet either. It helps us find patterns in data which are difficult for the average human to spot, by delegating the effort to computers.
It fails when the problem itself has no patterns - think
why ML cannot help you to hack the share market. It
fails when there are patterns but the data you have is
not enough to capture the variance.
2.3.1 Machine learning vs Deep learning
Deep learning refers to a special class of machine
learning techniques. Although there is no strict boundary here, there are some clues that help us distinguish between the two. Most deep learning techniques make
use of neural networks. Often there are layers of these
networks stacked on top of each other that makes them
“deep”. One advantage that comes with using neu-
ral networks versus traditional learning algorithms is
that they are quite versatile and do not require data to
be transformed to the specific requirements of the un-
derlying algorithm. However, this is often where the algorithms lose their interpretability; hence the coinage that neural networks are "black boxes".
2.3.2 CUDA and the GPU revolution
When PCs transformed from being mostly text
based to being heavily dependent on graphics, people developed GPUs, processors specialized for matrix manipulations. Remember that a screen is simply a matrix
of pixel values. The real breakthrough in deep learn-
ing occurred when researchers found a use for these in
training neural networks. Nvidia was the GPU-making company that opened up the use of its GPUs for anyone interested in such computations by introducing a platform called CUDA. The Python deep learning libraries that we are going to be introduced to make use of the CUDA platform in order to run code on the GPUs.
2.3.3 Tensorflow or PyTorch
There are currently two frameworks which are
commonly used to implement deep neural networks in Python: TensorFlow (often used together with Keras), which was initially developed by Google, and PyTorch, which was initially developed by Facebook. It is mostly a matter
of personal taste regarding which one to use. The
scikit-learn library has a lot of the non deep learning
algorithms as well as a lot of utility functions that can
be used during the training of deep neural networks.
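As a taste of what one of these frameworks looks like in use, here is a minimal sketch in TensorFlow/Keras (the layer sizes, input dimension and class count are invented for illustration):

# A tiny classifier in TensorFlow/Keras: 10 input features, 3 classes.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3, activation="softmax"),   # one output neuron per class
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

A PyTorch version would look similar, except that the training loop is written out explicitly rather than hidden behind a fit() call.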
2.3.4 Vision or Speech
Two major areas of application in deep learning are
computer vision and natural language processing. This
can probably be attributed to the fact that vision and
language are two traits that are the hallmarks of our in-
telligence. This also translates into two different formats of digital data: images and text. Computer vision
techniques are developed to make use of data that has
a fixed grid shape like images. NLP techniques are
developed to deal with data of variable input size. A
sentence has no restriction in the number of words it
should have. Depending on the kind of data and prob-
lem at hand, we need to look into models developed in
either of these application fields. For example, since
images are a big part of astronomical surveys, models
used in Computer Vision applications like Convolu-
tional Neural Networks or CNNs are often helpful for
solving problems. However, if the data at hand is a time
series signal you may have to look into techniques like
transformers that were developed by people interested in Natural Language Processing applications.
2.4 Under-the-hood of a Deep
Learning Network
Stochastic Gradient Descent or SGD is the engine
of modern deep learning techniques. To get an idea
of how it works let us consider a supervised image
classification problem. This means that the input is an
image and the output is a class label. Since strings or text data are not natural outputs in such cases, we encode the class by attaching n neurons at the end of the network, where n corresponds to the number of classes in the problem.
All the outputs are restricted to a fixed range like (0,1) using non-linear functions like the sigmoid.
Then the last layer neuron with the highest output value
can be the predicted class label.
Most practical datasets are too big to be com-
pletely held in a computer's memory. This is why
we need generators that load the data into memory
in batches. All the weights, i.e. parameter values in
the network are randomly initialized. A single batch
of data is forward passed through the network. A loss
function is used to get feedback regarding how much
the predictions differ from the expected output. The
errors are backpropagated to the initial layers using
gradients. The weights are adjusted and the loop con-
tinues.
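To make the loop concrete, here is a toy sketch in plain NumPy (a linear model with a mean-squared-error loss stands in for a real network, and random arrays stand in for a data generator; all numbers are illustrative):

# Toy SGD loop: forward pass, loss gradient, weight update, batch by batch.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))            # stand-in dataset
y = rng.normal(size=(100, 1))
W = rng.normal(size=(4, 1)) * 0.01       # randomly initialized weights

for epoch in range(5):
    for i in range(0, len(X), 20):        # "generator": 20-sample batches
        xb, yb = X[i:i+20], y[i:i+20]
        pred = xb @ W                     # forward pass
        grad = 2 * xb.T @ (pred - yb) / len(xb)   # gradient of the MSE loss
        W -= 0.1 * grad                   # weight update (learning rate 0.1)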
Soon the need arises to have controlled sets for testing the performance of a trained model. Ideally, we should not make decisions about model parameters, etc. based on this test set. If we do, information from the test set leaks back into our model and our test set is no longer unbiased or fair. This is why we often have a train/validation/test split, where the validation set, which is like a second test set, is used for improving the model parameters.
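In practice the split is often produced with a utility function; a sketch assuming scikit-learn (the 60/20/20 proportions and dummy arrays are illustrative):

# Hold out a test set once, then carve a validation set out of the rest.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)
# Tune hyperparameters against (X_val, y_val); touch (X_test, y_test) only once.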
This constitutes the basic workflow of a deep learning training session. But there are a lot more things to be done. How do we properly assess the learning process during the training session itself? Once the training is done, how can we assess it? What if the datasets are imbalanced? Do traditional metrics like accuracy work in evaluating the performance? If your dataset is small, is there statistical significance to your results? And finally, even if the model is producing good results, how can you be sure that it is picking up the patterns you see and not some other hidden bias? All these can be the content of a future article in this series. Watch out!
About the Author
Linn Abraham is a researcher in Physics,
specializing in A.I. applications to astronomy. He is
currently involved in the development of CNN based
Computer Vision tools for prediction of solar flares
from images of the Sun, morphological classification of galaxies from optical image surveys, and radio galaxy source extraction from radio observations.
Part II
Astronomy and Astrophysics
Black Hole Stories-8
Rotating Black Holes
by Ajit Kembhavi
airis4D, Vol.2, No.5, 2024
www.airis4d.com
So far in our Black Hole Stories, we have con-
sidered Schwarzschild black holes, which have only
one parameter, which is mass. In the present story
we will consider black holes which have mass as well
as angular momentum or spin. The space-time struc-
ture around spinning black holes is more complicated
than the simple Schwarzschild geometry. That reflects
in the shape of trajectories of particles and photons
around them, and the structure of the singularity and
the event horizon. There are also features like the er-
gosphere which only exist when spin is present. We
will describe some of these properties in this story and
the next one.
1.1 Black Holes With Spin
Karl Schwarzschild discovered the first exact so-
lution of Einstein's equations in 1916, just a year after
the equations were first published. His solution de-
scribes the space-time structure, i.e. the gravitational
field around a point particle with mass and no other
properties. As we have seen through our stories, such
a solution corresponds to a black hole. One family
of such black holes, known as stellar mass black holes,
are formed when stars much more massive than the Sun
complete their evolution and explode, leaving behind a
black hole, which can have mass in the range of a few
times the mass of the Sun to several tens of times the
mass of the Sun. Another family of black holes, known
as supermassive black holes, have mass ranging from
about a million times the Solar mass to many billions
of Solar masses. Such black holes are believed to form
in the collapse of very large clouds of gas. They are
located in the centres of galaxies and their mass can
steadily increase for billions of years after formation
due to capture of gas and stars from the surrounding
galaxy.
Stars and gas clouds always have angular momen-
tum which causes them to rotate. Some of this angular
momentum can be lost during the processes which lead
to the formation of the black hole, but it is natural to
expect that at least part of the angular momentum will
remain with the collapsing object. Therefore, black
holes should have non-zero spin, which will have an
effect on their space-time structure. The exact solution
for a spinning black hole was discovered by Roy Kerr
in 1963. This seminal work has enabled a full study of the very complex geometry of such a black hole, and has important implications for astrophysics,
which became clear only decades after the discovery
of the solution.
1.2 The Schwarzschild Metric: A Brief Recapitulation
Here we will briefly summarise some proper-
ties of the Schwarzschild metric and of the Schwarzschild
black hole, which we have described in some detail in
Stories 5 and 1. The space-time the metric describes
is around a point particle of a given mass M. Since
the particle has no direction dependent properties, the
space-time around it is spherically symmetric, so it is
best described in terms of the coordinates t, r, θ, φ.
As described in Story 5, t is the time coordinate, and
r, θ, φ indicate the position of a point in space. The
two angular coordinates θ and φ are similar to the two angles of the spherical polar coordinates used to describe flat 3-dimensional space, but the radial co-
ordinate r is somewhat different. Because the space is
curved, r no longer is the distance from the origin, but
it helps to fix the position in space. The mass M is
located at the origin r=0. If we take a fixed value of
r and vary the angular coordinates over their ranges, a
spherical surface is generated. The area of this sphere
is 4πr², as in flat space.
The spherical surface with radius R_S = 2GM/c² is
known as the event horizon. This has the property
that no particle or light ray can travel from inside the
event horizon to the outside. The region inside the
event horizon is cut off from the rest of the Universe
and therefore we have a black hole. At the position of
the point mass M, the matter density is infinitely large
and so is the curvature of space-time, and so we have
a space-time singularity. The outside world cannot
see the singularity because of the event horizon. It is
possible for matter and light to fall into the black hole from outside the event horizon.
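For a sense of scale, a quick numerical check (using standard values of G, c and the solar mass, which are not quoted in the article): for one solar mass,

R_S = 2GM_⊙/c² ≈ 2 × (6.67 × 10⁻¹¹) × (1.99 × 10³⁰) / (3.00 × 10⁸)² m ≈ 2.95 km,

so the Sun would have to be compressed to within about 3 km of its centre to become a black hole.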
As described in Story 5, the motion of particles
with mass in a gravitational field is described by time-
like geodesics, while that of a light ray is described by
a null geodesic. There are two symmetries associated
with the Schwarzschild metric: it is independent of
time and is spherically symmetric. Therefore the en-
ergy and angular momentum of a particle or light ray in
orbit around a Schwarzschild black hole are conserved,
that is they remain constant. It is therefore possible to
analyse the nature of the geodesics in a simple manner.
In Story 6, we have described how the nature of time
like geodesics is studied using an effective potential
V_eff. For a particle with a given angular momentum,
the effective potential depends only on the radial coor-
dinate r. In general it has a maximum and minimum,
which produces a potential well. Depending on its en-
ergy, (1) a particle can come in from large distances,
swing around the centre and recede again to large dis-
tances, (2) it can fall into the black hole, or (3) move
in a bound orbit around the black hole with shape cor-
responding to a precessing ellipse. When the energy
of the particle is equal to the minimum of the effective
potential, the orbit is circular in shape. As described
in Story 8, the behaviour of light rays, i.e. photons, is
somewhat different. They can have orbits as in (1) and
(2), but the only bound orbits occur at a fixed value of
r = 1.5R_S. These orbits are circular and unstable.
1.3 The Kerr Metric
The Kerr metric provides the structure of space-
time around a particle which has mass and angular
momentum or spin. The angular momentum defines a
direction around which the particle spins. That is easy
to visualise for an extended body like the Earth, but
the same physics applies to a point particle too. Be-
cause the mass and angular momentum are constant,
the metric is constant in time. The spin axis is also
a symmetry axis, in the sense that the metric remains
the same for all points in a plane perpendicular to the
spin axis (this and other such concepts can be mathe-
matically defined for the curved space-time of general
relativity, but I am using simple expressions for qual-
itative understanding). Roy Kerr obtained an exact
solution for Einstein’s equations for the special case of
a spinning, massive particle.
It is convenient to express the Kerr solution in
terms of a coordinate system t, r, θ, φ known as Boyer-
Lindquist coordinates. Here t is the time coordinate as
usual; the other three coordinates have the appearance
of the spherical polar coordinates used in Schwarzschild
metric, but the appearance is deceptive. For example,
the coordinate r does not have the same meaning as in
the Schwarzschild case. There a surface with r constant
has a spherical shape with area 4πr², though r is not the
distance from the origin, which is at r = 0. This inter-
pretation is no longer applicable in the Boyer-Lindquist
coordinates. The angle φ goes round the axis defined
by the direction of the spin, while the interpretation of
angle θ is the familiar one only in the special cases we
will consider below.
The Kerr metric depends on two parameters, the
mass of the black hole M and a parameter a which is related to the angular momentum J of the black hole:

a = J/(Mc),

where c is the speed of light. While M can be chosen to have any value, it turns out that there is a maximum value of the parameter a permitted, a_max = GM/c², which leads to a maximum value of the angular momentum J:

J_max = GM²/c.
A black hole with this maximum spin value is
known as an extreme Kerr black hole. We will see later
how extreme black holes can develop in astrophysical
situations.
1.4 Special Cases of the Kerr Metric
The structure of the Kerr metric is rather complex,
and as mentioned above, even the interpretation of the
coordinates is not straightforward. It therefore helps to
consider special cases to gain insight into the nature of
the metric.
The Kerr metric depends on two parameters, mass M and spin parameter a. If a → 0, then the angular momentum J → 0, and we expect to recover the
Schwarzschild metric which depends only on the mass.
That is found to be correct, and in this approximation
of vanishing spin the coordinates r, θ, φ acquire their
usual meaning of spherical polar coordinates as appli-
cable to the Schwarzschild metric.
The other interesting approximation is of vanish-
ing mass, M → 0. In this case there is no gravitating
mass left, and we expect that the structure of space-
time should be the flat space-time of special relativity.
It is indeed possible to transform to Cartesian coordi-
nates x, y, z in which we recover the usual geometry of
flat space. It is interesting to know that in this case, the
Boyer-Lindquist coordinate r=0 corresponds to a ring
of radius a in the xy plane defined by z = 0. This is an example of the complex nature of the metric and the
Boyer-Lindquist coordinates, and has implications for
the structure of the singularity and the event horizon.
There are also interesting concepts associated with the
Kerr geometry, like frame dragging and the ergosphere,
which are not present in the Schwarzschild metric. We
will consider these in the next story.
About the Author
Professor Ajit Kembhavi is an emeritus
Professor at Inter University Centre for Astronomy
and Astrophysics and is also the Principal Investiga-
tor of the Pune Knowledge Cluster. He is a former director of the Inter University Centre for Astronomy and Astrophysics (IUCAA), Pune, and a former vice president of the International Astronomical Union. In collaboration
with IUCAA, he pioneered astronomy outreach ac-
tivities from the late 80s to promote astronomy re-
search in Indian universities. The Speak with an
Astronomer monthly interactive program to answer
questions based on his article will allow young enthu-
siasts to gain profound knowledge about the topic.
X-ray Astronomy: Through Missions
by Aromal P
airis4D, Vol.2, No.5, 2024
www.airis4d.com
“Science does not have a moral dimension. It is
like a knife. If you give it to a surgeon or a murderer,
each will use it differently.”
Wernher von Braun
Beginning: Rockets and Balloons
Cosmic rays were discovered by Victor Hess after a series of balloon experiments conducted in 1912, and the discovery gave the scientific community the insight that there were many things beyond the atmosphere that were unknown to humankind. The quest for this new knowledge accelerated thereafter. Most early efforts were focused on military uses, and the first and second world wars accelerated those studies with a mainly military focus. When the second world war ended in 1945 and the world had seen enough bloodshed, nations started intellectual wars!
Those who gained the unknown knowledge became more powerful. The missiles used in war became rockets for scientific expeditions. After World War II, the US military offered various institutes the chance to carry their scientific experiments on rockets developed by Wernher von Braun, the famous aerospace engineer who was part of the German military and later joined NASA.
Herbert Friedman used this opportunity to study the Sun's UV and X-rays. Friedman used combinations of filters and gas mixtures to develop photomultiplier tubes that are sensitive in a narrow frequency range. With the help of a V-2 rocket launched in 1949 from White Sands, for the first time in the history of humankind an X-ray instrument reached above the atmosphere to
Figure 1: Friedman and the adaptation of the tube used
in a Geiger-Mueller counter. Credits: Public Domain
detect X-ray photons emitted from the Sun's corona. After decades of effort, and with further development in technology using the more advanced Aerobee rockets, Friedman and his colleagues obtained the first X-ray images of the Sun using a pinhole camera. Friedman was also the first to fly a Bragg spectrometer for measuring hard X-rays.
Even though solar X-rays were discovered in 1949, there wasn't much progress in detecting X-rays from any other sources. The Cold War between the USA and the Soviet Union then paved the way for the rapid development of X-ray astronomy. After the "Sputnik Shock" of 1957, when the Soviet Union was leading the space race, more funds were allotted to space programs in the USA as well. In September 1959, Bruno Rossi, the chairman of the board of American Science and Engineering (AS&E), suggested to Riccardo Giacconi, head of the Space Science Division of AS&E, that he develop
Figure 2: Discovery of X-rays from Scorpius X-1.
Credit: Giacconi et al. 1962
a research program on X-ray astronomy. Giacconi submitted two proposals to the newly formed NASA: one for developing an X-ray telescope, and one for a rocket mission to study X-rays from the Moon and the Crab nebula. NASA accepted the first and rejected the second, as NASA officials thought it impossible to detect X-rays from the Moon. Giacconi sent the rejected proposal to the Air Force Cambridge Research Laboratory and received funding for a series of rocket launches that changed the entire fate of X-ray astronomy.
On June 18, 1962, an Air Force Aerobee rocket
was launched from the White Sands Missile Range in
New Mexico with an array of X-ray sensors on board.
Three large-area Geiger counters made up the setup. Every Geiger counter had seven separate mica windows, each of 20 cm², arranged in one face
of the counter. The detectors were sensitive to X-rays in the range of 2 to 8 Å.
counter intended to lower the cosmic-ray background
surrounded each Geiger counter. Upon analysing the data from the detectors, Riccardo Giacconi, Herbert Gursky, Frank R. Paolini and Bruno B. Rossi found evidence for X-rays from outside the solar system. The source was named Scorpius X-1, and it marked the beginning of X-ray astronomy.
After the discovery of X-rays from Scorpius X-1, further studies were carried out by scientists, and many rockets, reaching altitudes of around 200 km, were launched for X-ray observations.
Figure 3: Atmospheric absorption as a function of the
wavelength (bottom axis). The solid lines indicate the
fraction of the atmosphere, expressed in unit of 1 atmo-
sphere pressure (right vertical axis) or in terms of alti-
tude (left vertical axis), at which half of the incoming
celestial radiation is absorbed by the atmosphere.(Credit:
High Energy Astrophysics Group, University of Tübingen)
Some 45 rockets were launched to carry out X-ray observations before the 1970s. One of the main problems faced was that there was not enough time to observe the variability of a source during a rocket experiment: a maximum flight time of 20 minutes is not sufficient to study variability in the sources. Balloon experiments were therefore carried out to observe X-ray sources with long exposures. A balloon can reach a maximum height of only about 35 km, but it can be used to take hours-long observations. Balloon experiments were conducted in different parts of the world; the Tata Institute of Fundamental Research, Mumbai, also hosted several balloon experiments to study X-rays.
We cannot control balloons or rockets once they are launched. Rocket experiments were restricted by the total exposure time and balloon experiments by the altitude, so the science community needed a permanent solution: high-altitude observations with long exposures, controllable from Earth. The solution to the riddle was setting up a satellite dedicated to X-ray astronomical observations. Discussions started in the early 1960s, the first satellites were launched in 1970, and our understanding of the cosmos then changed drastically. We will discuss the satellite missions that changed our views about the X-ray universe in the coming articles.
Reference
Santangelo, Andrea, Madonia, Rosalia, and Piraino, Santina, "A Chronological History of X-ray Astronomy Missions," Handbook of X-ray and Gamma-ray Astrophysics. ISBN 9789811645440
Giacconi, Riccardo, Gursky, Herbert, Paolini, Frank R., and Rossi, Bruno B., "Evidence for X Rays From Sources Outside the Solar System," Phys. Rev. Lett. 9, 439 (1962). DOI: 10.1103/PhysRevLett.9.439
About the Author
Aromal P is a research scholar in the Department of Astronomy, Astrophysics and Space Engineering (DAASE) at IIT Indore. His research mainly focuses on neutron stars and black holes.
Radio Galaxies: An Introduction
by Kshitij Thorat
airis4D, Vol.2, No.5, 2024
www.airis4d.com
Most of us are familiar with the night sky as a
carpet of stars and planets, which we can see with our
own eyes. With a small, 6-inch telescope, you might
even spy fainter details and objects not visible to the
eye, like moons of Jupiter, nebulae and even close-by
galaxies, if you have a clear sky.
With larger telescopes, you can look at the de-
tails of far-away objects, many of them beyond our
galaxy, the Milky Way. However, our eyes are typically
sensitive to the so-called “visible spectrum”, ranging
roughly from red wavelengths at one end to purple at the other. In contrast, celestial objects can
shine in different bands, like ultraviolet, infrared, X-
rays and radio waves. While we can't see this light, we
can use specialised telescopes which are able to do this
and thus give us a view of the sky literally in a different
light.
Among the celestial objects which lie beyond our
own galaxy, the Milky Way, radio galaxies are some of
the most fascinating. Very briefly, radio galaxies are
galaxies in which a large part of the emitted light comes
in the radio bands via large-scale jets and "lobes" (there
are other kinds of galaxies in which this radio emission
comes from remnants of dead stars and the light com-
ing from the process of star-formation, but we’ll not
focus on this class in this article). The jets associated
with radio galaxies are truly awesome in scale, typically spanning hundreds of thousands of light-years, and at their largest millions of light-years, making them some of the largest objects in the Universe.
Where do these jets come from and how do they
form? This is actually an area of active research, but
the consensus is that the jets come from the centre of
the galaxy, where a supermassive black hole resides.
It is now thought that most of the galaxies have su-
permassive black holes (SMBHs henceforth) in their
centres, just like the Milky Way has one (Sagittarius
A*). Not all galaxies are radio galaxies, though, in-
cluding our own. What separates the SMBHs which
give rise to jets is their "activeness" - some of them are
accreting - eating - the matter surrounding them; this
process sometimes gives rise to the spectacular radio
jets we see in radio galaxies.
Fig 1 shows a radio galaxy - perhaps the most well-
studied and famous radio galaxy - Cygnus A. Cygnus
A is, in fact, one of the first radio sources discovered
by radio astronomers almost a century ago. Note that
all the details you see in the image are made from
radio telescope observations at 1.4 GHz and rendered
in pseudocolour (Cygnus A is really not orange!). As
you can see from the figure, Cygnus A shows a clear
pair of jets emanating from a central bright, pointlike
“core”, going in opposite directions and forming fluffy,
diffuse structures called “lobes”. The total size of these
jets is 500,000 light-years! For comparison, the size of
our solar system, expressed as the distance between the
Sun and Pluto, is barely around 4-6 light hours. As
such, these jets extend far, far beyond the extent of the
galaxy as seen in the visible band. These jets eventually
terminate in bright “hotspots”, which are sites of shock
formation. The core, on the other hand, marks the
position of the SMBH sitting inside the galaxy’s heart
from which these jets arise.
The basic picture behind the light coming from
radio galaxies is thought to be the following: the jets,
which are formed from highly relativistic particles,
Image Credits: Legacy Astronomical Images, “Cygnus A,” NRAO/AUI Archives,
https://www.nrao.edu/archives/items/show/33386.
Figure 1: Cygnus A, an archetype of powerful radio
galaxies. The thin "jets" start from the bright "core" in the image and stop at the brighter points at the end,
or “hotspots”. The hotspots are surrounded by diffuse,
hazier clouds, the “lobes”. At a distance of almost 700
million light-years, Cygnus A is one of the brightest
objects in the radio sky.
travelling at an appreciable fraction of the speed of light, spiral through magnetic fields, which generates the so-called "synchrotron radiation" (as we know, accelerating charged particles emit radiation).
The hotspots in which these jets terminate form
sites from which the particles can flow back towards
the galaxy and form the lobes.
Such structures are features of many radio galax-
ies, but of course, radio galaxies can have a variety of
structures, including the so-called X-shaped, S-shaped,
Z-shaped, Bent-tailed types of radio galaxies depend-
ing on the exact process which gives rise to the jets and
the interplay of the jets with the environment in which
the radio galaxy resides.
These beautiful galaxies can be viewed with radio
telescopes, which are made up of dishes or antennas.
In particular, detailed images of radio galaxies can be
made using radio interferometers like the Giant Me-
trewave Radio Telescope (GMRT) near Pune and the
upcoming Square Kilometre Array (SKA), an interna-
tional project, to which India contributes significantly.
Remembering that the jets start in the activity of the accreting SMBH near the galaxy's core, we can see that the larger structure of a radio galaxy in fact forms a sort of signpost to the ongoing activity at the heart of the galaxy, far easier to see than the actual SMBH generating it.
Additionally, the jets themselves extend, as we
have seen, far beyond the visible extent of the galaxy
and can interact with other galaxies as well! The jets
can, variously, suppress the ongoing process of build-
ing galaxies through the process of star formation or
can further enhance it, making radio galaxies a key
player in the structure formation of our Cosmos.
Further Reading:
1. Radio galaxies: the mysterious, secretive "beasts" of the Universe
2. Radio galaxy article on Wikipedia
3. Hotspots in Cygnus A: an active galactic nucleus
4. Synchrotron Radiation
About the Author
Dr Kshitij Thorat is a senior lecturer at the
University of Pretoria in South Africa. His research
interests revolve around extragalactic radio galaxies,
their lifecycles and their interactions with their envi-
ronments.
Unlocking the Mysteries of Star Clusters:
Celestial Ensembles of Cosmic Wonder
by Sindhu G
airis4D, Vol.2, No.5, 2024
www.airis4d.com
4.1 Star Clusters
A star cluster refers to a grouping of stars that
are bound together by gravitational forces. These clus-
ters can vary in size and composition, ranging from
small gatherings of a few dozen stars to massive con-
glomerations containing thousands or even millions of
stars. Star clusters are formed from the same cloud of
gas and dust, typically within a galaxy, and they often
share similar ages and chemical compositions.
Let’s take a brief look at how a star cluster forms.
Stars emerge from clouds of gas and dust under pre-
cise conditions. Gravity triggers the collapse of this
primarily hydrogen gas and dust. As the cloud con-
denses and pressure mounts, its core heats up, forming
a protostar. This protostar continues to accrete matter,
evolving into a fully-fledged star. This stellar birth pro-
cess typically spans about a million years. Once born,
some stars can persist for over 10 billion years. Often,
when conditions favor the formation of one star, multi-
ple stars form, creating a cluster. Over time, stars may
depart the cluster through dispersion or ejection, while
others perish within it. Additionally, various factors
such as ultraviolet light, stellar winds, and supernovae
can expel gas and dust from the cluster, impeding new
star formation.
Star clusters serve as important tools for astronomers
to study various aspects of stellar evolution, galactic
dynamics, and the history of the universe. Star clusters
visible to the naked eye include the Pleiades, Hyades,
and 47 Tucanae. Three primary types of star clusters
exist: globular clusters, open clusters, and stellar asso-
ciations. Each category possesses distinct properties
that offer astronomers diverse insights.
4.2 Globular Clusters
Globular clusters are densely packed groups of
stars, typically containing hundreds of thousands to
millions of stars bound together by gravity. These
clusters are some of the oldest objects in the universe,
with ages spanning billions of years. Their spheri-
cal shape and tightly packed arrangement make them
distinct from other types of star clusters. One remark-
able aspect of globular clusters is their stellar popu-
lations. The stars within these clusters are typically
old and metal-poor, meaning they formed early in the
universe’s history and contain elements heavier than
helium in relatively low abundance. Studying these
ancient stars provides valuable insights into the early
stages of galactic evolution and the conditions present
in the early universe.
Globular clusters contain minimal free dust or gas,
thereby prohibiting new star formation within them.
Stellar densities in the inner regions of a globular clus-
ter are significantly higher when compared to regions
such as those surrounding the Sun. Globular clus-
ters also serve as natural laboratories for studying stel-
lar dynamics and evolution. The interactions between
stars within the cluster, such as gravitational encoun-
ters and binary star systems, can have profound effects
on their evolution. By observing these interactions,
astronomers can gain a better understanding of stellar
evolution and the processes that shape the universe.
Moreover, globular clusters are essential for mea-
suring the age and distance of the galaxies in which
they reside. Since they contain some of the oldest stars
in the universe, determining the age of globular clusters
provides valuable constraints on the age of their host
galaxies. Additionally, the brightness of these clusters
allows astronomers to calculate distances to galaxies
with remarkable precision.
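One standard route from brightness to distance is the distance modulus,
m - M = 5 log10(d / 10 pc): once a star's absolute magnitude M is known
(RR Lyrae variables, common in globular clusters, serve as such standard
candles) and its apparent magnitude m is measured, the distance follows.
A minimal Python sketch with invented magnitudes:

m, M = 15.4, 0.5                     # hypothetical apparent and absolute magnitudes
d_pc = 10 * 10 ** ((m - M) / 5)      # distance modulus solved for distance
print(f"distance ~ {d_pc:,.0f} pc")  # ~9,550 parsecs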
When seen with the unaided eye, globular clus-
ters resemble faint smudges of light amidst the dark-
ness of space. However, when observed through a
telescope, their true essence emerges: thousands to
millions of stars coalesce into a spherical configura-
tion, featuring a luminous and densely packed core.
In the Milky Way, they are situated within both the
halo and the bulge regions. The stars within these
clusters remain confined and do not disperse beyond
their boundaries. Our Milky Way hosts approximately
200 globular clusters, notable examples being 47 Tuc,
M4, and Omega Centauri (Figure 2), although there
is ongoing debate regarding whether the latter may
actually be a captured dwarf spheroidal galaxy. Con-
versely, the Andromeda galaxy boasts approximately
400 globular clusters, while the M87 galaxy hosts over
10,000, as reported by the Harvard and Smithsonian
Center for Astrophysics. Some of the most luminous
globular clusters can be seen without the aid of a tele-
scope; among them, Omega Centauri shines brightest
and was even noted in ancient times, initially cataloged
as a single star prior to the advent of telescopes. In the
northern hemisphere, the brightest globular cluster is
M13, located in the constellation of Hercules.
4.3 Open Cluster
Open clusters consist of tens to a few thousand
stars originating from the same massive molecular
cloud, exhibiting similar ages and chemical compo-
sitions. These clusters are loosely bound by mutual
gravitational forces and are commonly located within
spiral and irregular galaxies. Open clusters lack a de-
fined shape. Unlike globular clusters, open clusters are
Image credit: ESA/Hubble and NASA/A. Sarajedini
Figure 1: Globular star cluster NGC 6717 is located
about 20,000 light-years from Earth.
Figure 2: Globular star cluster Omega Centauri which
is located about 15,790 light-years from Earth. Image
Credit: NASA/ESA/Hubble SM4 ERO Team
Figure 3: The globular cluster NGC 6397. Image
Credit: NASA, ESA, and T. Brown and S. Casertano
(STScI)
Figure 4: Messier 68, a loose globular cluster. Image
Credit: ESA/Hubble/ NASA
smaller and less densely populated, encompassing stars
of varying ages, from young to older ones. They serve
as vital subjects for studying stellar evolution due to
their uniform properties, facilitating the determination
of characteristics such as distance, age, metallicity, and
velocity, which can be more challenging with isolated
stars.
Stars within open clusters exhibit greater disper-
sion, rendering these clusters relatively unstable, with
stars prone to dispersing over the course of a few mil-
lion years. As open clusters with fewer stars are less
tightly bound by gravity, it is relatively simple for their
stars to drift away from the cluster when influenced by
external forces, such as interactions with giant molec-
ular clouds. However, this isn’t the sole mechanism
through which open clusters shed stars. During the or-
bits of stars within the cluster, close encounters can oc-
cur, leading to gravitational interactions. In instances
of close encounters involving multiple stars, one star
may be expelled from the cluster at a high velocity. If
this velocity surpasses a certain threshold, the star can
escape the gravitational pull of the cluster entirely.
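That threshold is the cluster's escape velocity, v = sqrt(2GM/r). A rough
Python sketch, with an assumed cluster mass and radius rather than values
from this article, shows why open clusters are so fragile:

import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M = 500 * 1.989e30   # assumed cluster mass: 500 solar masses, in kg
r = 2 * 3.086e16     # assumed cluster radius: 2 parsecs, in metres

v_esc = math.sqrt(2 * G * M / r)                    # escape velocity at radius r
print(f"escape velocity ~ {v_esc / 1e3:.1f} km/s")  # ~1.5 km/s

An escape velocity of only a kilometre or two per second means that even a
gentle gravitational kick from a close encounter can unbind a star.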
Typically observed in regions of active star forma-
tion within spiral and irregular galaxies, open clusters
offer valuable insights into the processes of star birth
and evolution. Within the Milky Way galaxy alone,
over 1,100 open clusters have been identified, with nu-
merous others presumed to exist, significantly enrich-
ing our understanding of the universe. In the Milky
Way, these clusters can be spotted in our galaxy’s disk,
both in and between its spiral arms. The most promi-
nent open clusters are the Pleiades and Hyades in Tau-
rus.
4.4 Embedded Clusters
Embedded star clusters represent a type of star
cluster still enveloped by the molecular clouds from
which they originated. They are the youngest variety
of star cluster, housing recently formed and forming
stars that remain concealed by the gas and dust of their
parent molecular cloud. Typically, embedded clus-
ters serve as active regions of star formation, hosting
stars of similar ages and compositions. Embedded
Figure 5: M47 is an open cluster in the constellation
Puppis. Image Credit: NOIRLab / NSF / AURA
Figure 6: This mosaic from NASA's WISE Telescope
is of the Soul Nebula. It is an open cluster of stars sur-
rounded by a cloud of dust and gas located about 6,500
light-years from Earth in the constellation Cassiopeia,
near the Heart Nebula. Image Credit: NASA/JPL-
Caltech/UCLA
Figure 7: The Hubble Space Telescope spied this open
star cluster, named NGC 299, in the southern constel-
lation of Tucana (the Toucan), about 200,000 light-
years away. Image Credit: ESA/Hubble/NASA
Figure 8: The Jewel Box cluster, one of the best south-
ern sky open clusters to observe with a small telescope.
Image Credit: M. Bessell
Figure 9: X-ray view of Orion showing the Trapezium
embedded cluster. Image Credit: NASA/CXC/Penn
State/E Feigelson/K.Getman et al.
clusters are believed to be fundamental units in the
process of star formation, as a substantial portion of
stars emerge within them. Over time, as the molec-
ular cloud dissipates, embedded clusters evolve into
open clusters. Due to heavy obscuration by dust and
gas, embedded clusters are challenging to observe in
visible light. However, infrared and X-ray observa-
tions can penetrate the cloud material, unveiling the
stars within. Renowned examples of embedded clus-
ters include the Trapezium cluster within the Orion
Nebula, L1688 within the Rho Ophiuchi cloud com-
plex, as well as clusters within the Trifid Nebula and
Eagle Nebula. Recent studies employing simulations
have offered insights into the initial evolution and three-
dimensional structure of embedded clusters, revealing
that their morphology can rapidly change and may not
necessarily reflect their long-term evolution.
4.5 Super Star Cluster
A super star cluster (SSC) represents a notably
massive young open cluster, often regarded as a pre-
cursor to globular clusters. These clusters stand out
for their elevated luminosity and mass in comparison
to other young star clusters. Typically, super star clus-
Figure 10: A few young stars shine through dense
clouds of gas and dust in the Orion Nebula's Trapezium
embedded cluster, 1,500 light-years from Earth. The
left image is taken in visible light; the right image is
taken in infrared light. Image Credit: NASA, C.R.
O’Dell and S.K. Wong (Rice University)
ters harbor a significant population of young, massive
stars that generate ionization within a surrounding HII
region or even an "ultra-dense HII region" (UDHII)
within the Milky Way Galaxy or other galaxies. They
commonly inhabit regions of intense star formation,
such as areas influenced by galactic interactions or
mergers. Crucial to comprehending massive star for-
mation, super star clusters are thought to transition into
globular clusters as they age. To observe them effec-
tively, radio and infrared imaging prove superior due to
the high extinction levels in certain visible light wave-
lengths. Super star clusters generally boast masses sur-
passing 10^5 solar masses, with radii around 5 parsecs
and ages roughly estimated at 100 million years. They
exhibit notable electron densities and pressures asso-
ciated with the HII regions enveloping them. While
observed within the Milky Way Galaxy, super star clus-
ters are more abundantly identified in distant regions
of the universe, substantially contributing to our under-
standing of both star formation and galactic evolution.
Westerlund 1 (Figure 11) is a compact young super
star cluster about 3.8 kpc (12,000 ly) away from Earth.
References:
Star Clusters: Inside the Universe’s Stellar Col-
lections
Star Clusters
What are star clusters?
Figure 11: Westerlund 1. Image Credit:
2MASS/UMass/IPAC-Caltech/NASA/NSF
Star Clusters
Star cluster
Globular cluster
Open cluster
Embedded cluster
Hubble’s Star Clusters
Embedded Clusters
Early Evolution and 3D Structure of Embedded
Star Clusters
About the Author
Sindhu G is a research scholar in Physics
doing research in Astronomy & Astrophysics. Her
research mainly focuses on classification of variable
stars using different machine learning algorithms. She
is also doing the period prediction of different types
of variable stars, especially eclipsing binaries and on
the study of optical counterparts of X-ray binaries.
Part III
Biosciences
Genetically Engineered Warriors: India’s
New Hope in Cancer Treatment
by Atharva Pathak
airis4D, Vol.2, No.5, 2024
www.airis4d.com
Cancer, once considered an unbeatable foe, is fac-
ing a new challenger in India: CAR T-cell therapy.
This revolutionary treatment harnesses the power of a
patient’s immune system to fight the disease. As Dr.
Siddhartha Mukherjee, a renowned oncologist, said,
”Immunotherapy is fundamentally changing how we
approach cancer”. Let’s delve into how this innovative
therapy works and what it holds for the future of cancer
care in India.
1.1 Supercharging the Immune
System
Imagine training an army to recognize and destroy
your enemy’s troops. That’s the essence of CAR T-cell
therapy. Here’s a breakdown of the process:
1. Extraction: Doctors extract T cells, a type of
white blood cell crucial for fighting infections,
from the patient's blood.
2. Engineering: In a lab, scientists genetically mod-
ify the T cells with a particular receptor called
CAR (Chimeric Antigen Receptor). Think of
CAR as a helmet with a targeting sight.
3. Expansion: The engineered T cells are multi-
plied in large numbers.
4. Reinfusion: The powerful, CAR-equipped T cells
are infused into the patient’s bloodstream.
The CAR on the T cells acts like a homing bea-
con, allowing them to recognize and latch onto cancer
cells with specific surface proteins. Once attached, the
Credits: https://www.cancer.gov/publications/dictionaries/
cancer-terms/def/car-t-cell-therapy
Figure 1: The Fight Within
T cells unleash a targeted attack, destroying the can-
cer cells. This specificity is what makes CAR T-cell
therapy so promising.
1.2 A Breakthrough for India
Developed by a team of researchers at the Indian In-
stitute of Technology Bombay (IITB), in collaboration
with Tata Memorial Hospital, NexCAR19 is India's
first indigenous CAR T-cell therapy. This is a signifi-
cant achievement, as CAR T-cell therapies have tradi-
tionally been expensive.
Figure 2: T cells (pink) attack a cancer cell (yellow)
in this scanning electron micrograph image.
Credit: Steve Gschmeissner/SPL & Nature
Credits: ImmunoACT website
[https://www.immunoact.com/nexcar19]
"Accessible and affordable CAR-T cell therapy
provides a new hope for the whole of humankind",
said President Droupadi Murmu at the launch of Nex-
CAR19 in April 2024. This therapy offers a poten-
tial lifeline for patients with B-cell cancers, such as
leukaemia and lymphoma, where conventional treat-
ments have failed.
1.3 Looking Ahead: A Brighter
Future
The success of NexCAR19 is a stepping stone
for further advancements in CAR T-cell therapy in
India. Researchers are exploring ways to target dif-
ferent types of cancers and personalize the treatment
for each patient’s unique needs. Additionally, making
the manufacturing process more efficient could reduce
Credit: ImmunoACT, Nature
Figure 3: A member of the ImmunoACT team pre-
pares the NexCAR19 cancer treatment.
treatment costs, making it accessible to a broader pop-
ulation. A single treatment of NexCAR19, manufac-
tured by Mumbai-based ImmunoACT, costs between
US$30,000 and $40,000. The first CAR-T therapy was
approved in the United States in 2017, and commercial
CAR-T therapies currently cost between $370,000 and
$530,000, not including hospital fees and drugs to treat
side effects.
1.4 Unleashing the Power of AI and
ML in Cancer and Medicine
In the ever-evolving landscape of healthcare, Ar-
tificial Intelligence (AI) and Machine Learning (ML)
have emerged as revolutionary tools, offering new hope
and possibilities in the fight against cancer and other
diseases. These technologies are transforming how we
diagnose, treat, and manage illnesses, ushering in a
new era of personalized medicine.
One of the most significant contributions of AI
and ML in medicine is in cancer detection and diag-
nosis. These technologies can analyze vast amounts of
medical data, including images, genetic information,
and patient records, to identify patterns and anomalies
that may indicate the presence of cancer. This ability
has led to the development of more accurate and ef-
ficient diagnostic tools, such as AI-powered imaging
systems that can detect cancerous lesions with remark-
able precision.
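As a purely illustrative sketch of the underlying idea, and not of the
clinical systems described above, the following lines train a basic
classifier on the public Wisconsin breast-cancer dataset bundled with
scikit-learn:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 30 numeric tumour features with benign/malignant labels
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)  # a deliberately simple classifier
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")

Real diagnostic systems are trained and validated far more carefully, but
the pattern, learning a decision rule from labelled medical data, is the
same.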
"Artificial intelligence will transform the practice
of medicine. It will enable us to provide truly person-
alized care and make healthcare more accessible and
affordable for everyone”, says Fei-Fei Li, Co-Director
of the Stanford Institute for Human-Centered AI.
Moreover, AI and ML are revolutionizing cancer
treatment by enabling the development of targeted ther-
apies. By analyzing genetic data from tumors, these
technologies can identify specific mutations that drive
cancer growth, allowing for the creation of drugs that
target these mutations with greater precision. This
approach, known as precision medicine, has shown
promising results in improving treatment outcomes and
reducing side effects.
In addition to diagnosis and treatment, AI and ML
are also transforming cancer research. These technolo-
gies can analyze large datasets to uncover new insights
into the underlying causes of cancer, leading to the
discovery of new biomarkers and therapeutic targets.
This knowledge is crucial for developing innovative
therapies and improving our understanding of cancer
biology.
Despite the remarkable progress made possible by
AI and ML, challenges remain. One major challenge
is the integration of these technologies into existing
healthcare systems. This requires addressing issues
related to data privacy, regulatory compliance, and the
need for healthcare professionals to be trained in the
use of AI and ML tools.
”The real challenge is not whether machines can
think but whether men do”, says B. F. Skinner, Amer-
ican psychologist.
Looking ahead, several exciting developments are
on the horizon. One promising area is the use of AI
and ML in predicting patient outcomes and tailoring
treatment plans accordingly. By analyzing a patient’s
medical history, genetic profile, and other factors, these
technologies can help clinicians make more informed
decisions about the best course of action for each indi-
vidual.
Another emerging trend is the use of AI and ML
in drug discovery. These technologies can analyze vast
libraries of chemical compounds to identify potential
drug candidates, significantly accelerating the drug de-
velopment process. This approach has the potential
to bring new and more effective treatments to market
faster than ever before.
In conclusion, AI and ML are revolutionizing the
field of cancer and medicine, offering new hope and
possibilities for patients and healthcare providers alike.
While challenges remain, the future looks bright, with
new technologies and approaches on the horizon that
promise to further transform healthcare and improve
patient outcomes.
1.5 Conclusion
India's entry into CAR T-cell therapy marks a new
era in cancer treatment. This revolutionary approach
holds immense promise for offering patients a renewed
chance at life. As Nelson Mandela said, ”Hope is a
powerful thing. It can make a start of what seems im-
possible”. With continued research and development,
CAR T-cell therapy has the potential to become a pow-
erful weapon in India's fight against cancer.
References:
Press Information Bureau, Government of India
[pib.gov.in]
The New Indian Express [newindianexpress.com]
National Cancer Institute Website [cancer.gov]
Nature Article https://www.nature.com/articles/
d41586-024-00809-y
Li, Fei-Fei. ”How AI Can Save Our Humanity.”
TED Talk, 2018.
Skinner, B. F. ”Beyond Freedom and Dignity.”
Hackett Publishing Company, 1971.
About the Author
Atharva Pathak currently works as a Soft-
ware Engineer & Data Manager for the Pune Knowl-
edge Cluster, a project under the Office of the Principal
Scientific Advisor, Govt. of India, supported by
IUCAA, Pune. Before this, he was an Astronomer
at the Inter-University Centre for Astronomy & Astro-
physics (IUCAA). He has also worked on various free-
lance projects, developing websites and applications
and localising different software.
He is also a life member of Jyotirvidya Parisanstha,
India's oldest association of amateur astronomers,
and looks after the IOTA-India Occultation section
as webmaster and data curator.
DNA Sequencing
Next-Generation Sequencing (NGS)
by Geetha Paul
airis4D, Vol.2, No.5, 2024
www.airis4d.com
2.1 Introduction
DNA sequencing is a fundamental laboratory tech-
nique utilised to ascertain the precise sequence of nu-
cleotides, or bases, within a DNA molecule. The se-
quence of these bases—typically denoted by the first
letters of their chemical names: A (adenine), T (thymine),
C (cytosine), and G (guanine)—encapsulates the bio-
logical information crucial for cellular development
and functioning. Deciphering the DNA sequence is
pivotal for unravelling the functionality of genes and
other genomic components. DNA sequencing resem-
bles interpreting printed text: storing data analogous to
written words, learning its language, and comprehend-
ing its significance. In the past, literacy was limited,
leaving many unable to read, while today, advance-
ments have made information more accessible.
Similarly, technological breakthroughs in DNA
sequencing have democratised access to our genetic
code, empowering broader understanding and explo-
ration. Yet, the ongoing challenge remains in fully
unlocking the implications of this genetic information
for our health and well-being. Various methods are
available for DNA sequencing, each characterised by
unique attributes. Ongoing advancements in genomics
continue to drive the exploration and development of
novel sequencing techniques.
2.2 Next Generation Sequencing
(NGS)
NGS is a type of DNA sequencing technology that
uses parallel sequencing of multiple small DNA frag-
ments to determine the sequence. This "high-throughput"
technology has allowed a dramatic increase in the speed
(and a decrease in the cost) at which an individual’s
genome can be sequenced. Next-generation sequenc-
ing (NGS), or high-throughput sequencing, represents
a robust platform capable of concurrently sequencing
thousands to millions of DNA molecules. This technol-
ogy encompasses various modern sequencing method-
ologies designed to meet the growing demand for cost-
effective sequencing. Sequencing DNA means deter-
mining the order of the four chemical building blocks
- called ”bases” - that make up the DNA molecule.
The sequence tells scientists the kind of genetic in-
formation that is carried in a particular DNA seg-
ment. The technology is used to determine the order
of nucleotides in entire genomes or targeted regions
of DNA or RNA. Driven by the imperative for lower-
cost sequencing solutions, high-throughput sequencing
methods have been developed to generate thousands or
millions of sequences in a single run. This advance-
ment aims to surpass the limitations of conventional
dye-terminator techniques (a technique in which each
of the four dideoxynucleotide chain terminators is la-
belled with a fluorescent dye, each emitting light at a
different wavelength).
Image Courtesy: https://microbenotes.com/next-generation-sequencing-ngs/
Figure 1: Diagrammatic representation of the Next
Generation Sequencing workflow, Step 1. DNA ex-
traction, Step 2. The fragmented DNA binds with the
adapter for Library Preparation, Step 3. Sequencing,
and Step 4. Analysis
Next-generation sequencing
(NGS) is used to sequence both DNA and RNA. Bil-
lions of DNA strands get sequenced simultaneously
using NGS. Meanwhile, with Sanger sequencing, only
one strand is sequenced at a time. The advent of these
cutting-edge technologies has drastically accelerated
the pace and reduced the expense of DNA and RNA
sequencing compared to traditional Sanger sequencing
methods. Consequently, NGS has catalysed ground-
breaking advancements in genomics and molecular bi-
ology research.
In cases of low quantities of nucleic acids (e.g.,
when using single cells as the source), isolated DNA
and RNA may be amplified using polymerases ap-
propriate for whole genome amplification (WGA) and
whole transcriptome amplification (WTA), respectively,
to increase the amount of starting template before NGS
library preparation. WGA and WTA can help obtain
more sequencing reads, better coverage, improved sen-
sitivity, and better variant detection from limited sam-
ple amounts. Phi29 DNA polymerase is commonly
used for WGA because of its high processivity, re-
duced bias, high fidelity, and ability to synthesise DNA
isothermally at a low temperature.
Next-generation sequencing (NGS) can be con-
ducted on samples containing DNA or RNA, includ-
ing cell cultures, fresh-frozen tissues, formalin-fixed
paraffin-embedded (FFPE) tissues, blood, saliva, and
bone marrow. Different extraction protocols tailored to
the specific sample type are available, each optimised
to maximise the yield and quality of nucleic acids ob-
tained.
The four steps of next-generation sequencing (NGS)
include nucleic acid isolation and extraction, library
preparation, clonal amplification and sequencing, and
data analysis. Nucleic acid extraction and isolation are
vital first steps in next-generation sequencing.
2.3 Step 1: Sample Isolation and
Extraction
Nucleic acid extraction is a fundamental initial
step in the NGS workflow, regardless of whether you're
sequencing genomic DNA (gDNA), total RNA, or var-
ious RNA types. Choosing an isolation method or kit
that facilitates proper cell and tissue lysis is crucial.
This ensures the attainment of sufficient yield, purity,
and quality necessary for subsequent library prepara-
tion steps. Yield: The isolation or extraction method
should yield nanograms (ng) to micrograms (µg) of
DNA or RNA, which is crucial for library prepara-
tion. Maximum yield is essential, especially from lim-
ited or archived sources like cell-free DNA (cfDNA)
and formalin-fixed, paraffin-embedded (FFPE) sam-
ples. Purity: Isolated nucleic acids must be devoid of
compounds that might inhibit enzymes during library
preparation. Common inhibitors include reagents from
nucleic acid isolation (e.g., phenol, ethanol) or contam-
inants from biological samples (e.g., heparin, humic
acid). The chosen method should effectively remove
or minimise these contaminants. Quality: Integrity
and quality of isolated nucleic acids are vital. Most
of the DNA should be of high molecular weight and
intact for gDNA. RNA should be minimally degraded,
maintaining heterogeneity and representing the origi-
nal sample's nucleic acid populations. With FFPE sam-
ples, where DNA and RNA are fragmented, appropri-
ate isolation methods or kits should be selected to en-
sure sufficient yield and quality for sequencing. Yield,
purity, and quality of isolated nucleic acids should be
assessed before proceeding to NGS library preparation.
The following are methods commonly used to examine
these attributes: UV spectrophotometric assays mea-
sure A260, the A260:A280 ratio, and the A260:A230 ratio to help
assess sample purity and yield. Fluorometric assays
help quantify specific types of nucleic acids (e.g., ss-
DNA, dsDNA, small RNA). Gel-based or microfluidic
electrophoresis helps determine fragment size, distri-
bution, and quantity.
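As a small illustration of how these numbers are used (the absorbance
readings below are invented, and the conversion factor of roughly 50
ng/µL per A260 unit applies to double-stranded DNA):

a260, a280, a230 = 0.75, 0.40, 0.35  # hypothetical absorbance readings

concentration = a260 * 50  # ~50 ng/uL per A260 unit for dsDNA
print(f"yield ~ {concentration:.0f} ng/uL")
print(f"A260:A280 = {a260 / a280:.2f}  (~1.8 suggests protein-free DNA)")
print(f"A260:A230 = {a260 / a230:.2f}  (~2.0-2.2 suggests low organic carryover)")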
2.4 Step 2: Library Preparation
Library preparation from RNA or DNA samples
involves three primary steps. After isolation and pu-
rification, nucleic acids are prepared for processing
and reading by the sequencer. These prepared, ready-to-sequence
samples are commonly called ”libraries” because they
represent a sequenceable collection of molecules. Al-
though the library preparation procedure may vary de-
pending on the methods and reagents used, the general
steps for Illumina systems are as follows: Nucleic Acid
Fragmentation or Amplification: In this initial step,
target sequences are amplified to generate a pool of
fragments of appropriate size. If RNA is the start-
ing material, a reverse transcription step is required
to convert RNA into cDNA. The nucleic acid sample
is fragmented into small pieces suitable for massively
parallel sequencing. The optimal range of fragment
sizes depends on the sequencers and sequencing appli-
cations.
The Illumina platform utilises solid-phase ampli-
fication in which each fragment in the library first an-
neals to the primers on the sequencing chip (known
as the flow cell) via the adapters. Through a series
of amplification reactions known as bridge amplifica-
tion [4] (Figure 2A), each fragment forms a cluster of
identical molecules called clonal clusters (Figure 2B);
therefore, every cluster represents one primary library
molecule. Note that clonal amplification on a pat-
terned flow cell with predefined arrays employs a dif-
ferent method called exclusion amplification (ExAmp)
chemistry. The ExAmp technology involves the instan-
taneous amplification of a DNA fragment after binding
to the primer on the patterned flow cell, excluding other
DNA fragments from forming a polyclonal cluster [5].
This process of clonal amplification should not be
confused with library amplification, which is carried
Image courtesy: https://www.thermofisher.com/in/en/home/life-science/cloning/cloning-learning-
center/invitrogen-school-of-molecular-biology/next-generation-sequencing/illumina-workflow.
Figure 2: Amplification steps. (A) Bridge amplifica-
tion. (1) The complementary strand of a DNA frag-
ment in the library is synthesised from the flow cell’s
priming oligo. (2) After removal of the original strand,
the complementary strand folds over and anneals with
the other type of flow cell oligo. A double-stranded
bridge is formed after the synthesis of its complemen-
tary strand. (3) The double-stranded bridge is dena-
tured, forming two single strands attached to the flow
cell. (4) The process of bridge amplification repeats,
and (5) more clones of double-stranded bridges are
formed. (B) Cluster generation. The double-stranded
clonal bridges are denatured (only one strand is shown
here for simplicity), the reverse strands are removed,
and the forward strands remain as clusters for sequenc-
ing.
Image courtesy: https://www.thermofisher.com/in/en/home/life-science/cloning/cloning-learning-
center/invitrogen-school-of-molecular-biology/next-generation-sequencing/Illumina-workflow.
Figure 3: Sequencing by cyclic reversible termina-
tion, in which nucleotides incorporated by a DNA
polymerase into the complementary DNA strand of
the clonal clusters are detected one base at a time.
out to increase library input before loading onto a flow
cell.
Adapter Ligation: Sequencing adapters are added
to the DNA or cDNA fragments following amplifica-
tion. These adapters contain sequences that will inter-
act with the NGS platform. Adapters, such as P5 and
P7, contain oligonucleotide sequences complementary
to the priming oligos on the sequencing chips. The ends
of nucleic acid fragments are ligated with adapters to
enable sequencing. Since Illumina adapters are spe-
cific to the sequencing platform, they are not inter-
changeable. If multiple samples are to be sequenced
simultaneously, unique identifiers or barcodes can be
ligated to each amplicon. This allows for pooling nu-
merous libraries into a single sequencing run, which
can then be "demultiplexed" during data analysis to as-
sign reads to their respective samples.
(Image courtesy:
https://www.thermofisher.com/in/en/home/life-science/cloning/cloning-learning-center/invitrogen-
school-of-molecular-biology/next-generation-sequencing/Illumina-workflow.)
Figure 4: Workflow of NGS library preparation for
Illumina systems.
The Illumina sequencing technology employs flu-
orescent dye-labelled dNTPs with a reversible termina-
tor to capture fluorescent signals in each cycle, utilising
a process known as cyclic reversible termination. In
each cycle, only one of the four fluorescent dNTPs is
incorporated by the DNA polymerase, based on com-
plementarity, after which unbound dNTPs are washed
away. Images of the clusters are captured following the
incorporation of each nucleotide. The incorporated nu-
cleotide’s emission wavelength and fluorescence inten-
sity are measured to identify the base contained in each
cluster during that cycle. After imaging, the fluores-
cent dye and the terminator are cleaved and released,
marking the completion of one cycle. Subsequently,
the next cycle of synthesis, imaging, and deprotection
commences. This sequential process allows each base
to be sequenced one cycle at a time. To achieve a read
length of "n" bases, this cycle is repeated "n" times.
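A toy Python sketch of this per-cycle read-out (the channel intensities
are invented, and real base calling is far more involved): in each cycle
the brightest of the four dye channels names the incorporated base.

# Invented per-cycle intensities of the four dye channels for one cluster.
cycles = [
    {"A": 0.1, "C": 0.9, "G": 0.1, "T": 0.2},  # cycle 1
    {"A": 0.8, "C": 0.1, "G": 0.2, "T": 0.1},  # cycle 2
    {"A": 0.1, "C": 0.1, "G": 0.9, "T": 0.3},  # cycle 3
]
read = "".join(max(c, key=c.get) for c in cycles)  # brightest channel per cycle
print(read)  # "CAG": one base per cycle, so n cycles give an n-base read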
Library Quantitation: A sequencing library rep-
resents a pool of DNA fragments with adapters attached
to their ends after preparation. Prepared libraries must
be quantified (and normalised as needed) to load an op-
timal concentration of molecules onto the sequencers
for sequencing. This quality control step ensures con-
sistent data output, quality, and efficient use of sequenc-
ing chips. Fluorometric spectroscopy and real-time
PCR are standard methods used for library quantifica-
tion.
2.5 Step 3: Sequencing Reaction
Parallel sequencing is carried out on a next-generation sequenc-
ing (NGS) platform. The prepared library is loaded
onto the sequencer, which then ”reads” the nucleotides
individually. The number of reads generated varies
depending on the specific sequencing platform and
Image courtesy: https://irepertoire.com/ngs-overview-from-sample-to-sequencer-to-results/
Figure 5: Sequencing workflow in Illumina sequencer.
Library fragment undergoes hybridisation with spe-
cific primers, forming clusters, which are then ampli-
fied to generate millions to billions of clonal clusters.
Following cluster formation, fluorescently labelled nu-
cleotides synthesise a complementary strand for each
fragment. With the addition of each tagged nucleotide,
the flow cell undergoes imaging, capturing the emitted
fluorescence from each cluster. The wavelength and
intensity of the fluorescent emission are subsequently
analysed to identify the sequence of the templates.
kit employed. Various methods of NGS have been
developed, including pyrosequencing, sequencing by
ligation (SOLiD), sequencing by synthesis (SBS - Illu-
mina), and Ion Torrent sequencing. Illumina sequenc-
ing is the most prevalent among these, contributing
to approximately 90% of the world’s sequencing data
(as per Illumina's website). While all NGS platforms
perform sequencing of millions of small fragments of
DNA or cDNA, there are several different sequencing
technologies. Illumina pioneered the most prevalent
and successful sequencing technology. Illumina se-
quencers use a glass flow cell coated with millions
of oligonucleotides complementary to the sequenc-
ing adaptors. Each library fragment hybridises with
these primers during sequencing, forming clusters that
are amplified to generate millions to billions of clonal clus-
ters. Subsequently, fluorescently labelled nucleotides
are utilised to synthesise a complementary strand for
each fragment. After adding each tagged nucleotide,
the flow cell undergoes imaging, and the emitted flu-
orescence from each cluster is recorded. The wave-
length and intensity of the fluorescent emission are
then utilised to identify the sequence of the templates.
2.6 NGS Data Analysis Using
Bioinformatics
The final step in the NGS workflow involves pro-
cessing, analysis, and interpretation of the sequencing
data generated. Bioinformatic tools play a crucial role
in converting raw sequencing data into meaningful re-
sults. However, due to the vast amount of data gener-
ated by NGS (gigabases of raw data), the availability
and capability of computing power to process and anal-
yse such large datasets pose significant challenges to
the workflow.
This step of the NGS workflow can be broadly cat-
egorised into three stages. The applications and goals
of NGS experiments often determine how the data are
processed and analysed and which bioinformatic tools
are utilised.
Stages of NGS Data Analysis
1. Pre-processing: In this stage, raw sequencing
data undergoes pre-processing to remove low-
quality reads, adapter sequences, and other arte-
facts. Quality control checks are performed
to ensure the reliability of the data. Typical
tasks include read trimming, quality filtering,
and adapter removal (a minimal read-trimming
sketch follows this list).
2. Alignment and Mapping: Once pre-processed,
the sequencing reads are aligned or mapped to a
reference genome or transcriptome. This step in-
volves identifying the genomic or transcriptome
locations where the reads originated. Various
alignment algorithms and tools are employed for
this purpose, considering factors such as read
length, sequencing technology, and genome com-
plexity.
3. Variant Calling and Analysis: After alignment,
variant calling is performed to identify genetic
variations such as single nucleotide polymor-
phisms (SNPs), insertions, deletions, and struc-
tural variants. Statistical algorithms and filters
are applied to distinguish true variants from se-
quencing errors and artefacts. Following variant
calling, downstream analysis may include func-
tional annotation, pathway analysis, and inter-
pretation of the biological significance of de-
tected variants.
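Here is the minimal read-trimming sketch promised above. The read, the
Phred quality scores, and the threshold are all invented for illustration;
production pipelines rely on dedicated tools for this step.

def trim_read(bases, quals, threshold=20):
    """Trim from the 3' end while the Phred quality is below the threshold."""
    end = len(bases)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return bases[:end], quals[:end]

bases = "ACGTACGTAC"
quals = [38, 37, 36, 35, 30, 28, 22, 15, 10, 8]  # invented Phred scores
print(trim_read(bases, quals))  # ('ACGTACG', [38, 37, 36, 35, 30, 28, 22])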
In conclusion, Next-Generation Sequencing (NGS) is
a transformative technique capable of producing vast
volumes of data, offering the potential for ground-
breaking biological discoveries. While the NGS work-
flow encompasses numerous intricate processes and
considerations, grasping the fundamental principles
of its key steps is pivotal. This understanding aids
in meticulously planning NGS experiments, ensuring
high-quality data acquisition and attaining significant
outcomes. By comprehending the core principles un-
derlying NGS methodologies, researchers can navigate
the complexities of sample preparation, sequencing,
and data analysis more precisely. This, in turn, en-
hances the reliability and robustness of the results ob-
tained from NGS experiments, facilitating the elucida-
tion of novel biological insights and advancing scien-
tific knowledge. Ultimately, with a solid grasp of NGS
fundamentals, researchers can harness the full poten-
tial of this powerful technology to unlock the mysteries
of the genome and beyond.
References
Schroeder A, Mueller O, Stocker S et al. (2006)
The RIN: an RNA integrity number for assigning in-
tegrity values to RNA measurements. BMC Mol Biol.7:3.
Thermo Fisher Scientific, Inc. (2018) Qubit RNA
IQ Assay: a fast and easy fluorometric RNA quality
assessment. (Application note)
Stepanauskas R, Fergusson EA, Brown J et al.
(2017) Improved genome recovery and integrated cell-
size analyses of individual uncultured microbial cells
and viral particles. Nat Commun8(1):84.
Illumina, Inc. (2017) An Introduction to Next-
Generation Sequencing Technology. (Brochure)
Illumina, Inc. Patterned Flow Cell Technology.
(Website)
Bentley DR, Balasubramanian S, Swerdlow HP et
al. (2008) Accurate whole human genome sequencing
using reversible terminator chemistry. Nature456(7218):53–
59.
Illumina, Inc. (2018) Illumina CMOS Chip and
One-Channel SBS Chemistry. (Technical note)
https://microbenotes.com/next-generation-sequencing-ngs/
https://www.genome.gov/genetics-glossary/Genetic-Code
https://irepertoire.com/ngs-overview-from-sample-to-sequencer-to-results/
https://www.thermofisher.com/in/en/home/industrial/
spectroscopy-elemental-isotope-analysis/molecular-spectroscopy/
fluorometers.html
https://www.thermofisher.com/in/en/home/life-science/dna-
rna-purification-analysis/nucleic-acid-gel-electrophoresis/e-
gel-electrophoresis-system/e-gel-precast-agarose-gels.html
About the Author
Geetha Paul is one of the directors of
airis4D. She leads the Biosciences Division. Her
research interests extends from Cell & Molecular Bi-
ology to Environmental Sciences, Odonatology, and
Aquatic Biology.
How Bisulfite Sequencing Reveals Hidden
Messages?
by Jinsu Ann Mathew
airis4D, Vol.2, No.5, 2024
www.airis4d.com
DNA methylation is a crucial part of how our
genes work. It happens when certain parts of our DNA
get tagged with small chemicals, usually at spots called
CpG-rich regions. These tags can affect how genes are
turned on or off, which is really important for under-
standing how our bodies function.
Now, here’s the tricky part: regular sequencing
methods can't directly show us where these tags are.
They can only tell us the basic A, T, G, and C build-
ing blocks of DNA without detailing whether they’re
tagged with these chemicals or not.
But there's a cool technique called bisulfite con-
version that changes all that. It’s like a magic trick that
reveals the hidden methylation patterns in our DNA.
By combining this technique with sequencing, scien-
tists can finally see where these methylation tags are
and how they affect our genes.
In this article, we’ll dive into the world of bisulfite
sequencing, breaking down how it works in simple
terms and why it's so important for understanding DNA
methylation.
3.1 What is Bisulfite Sequencing?
Bisulfite sequencing is a powerful technique
used to study DNA methylation, an essential epige-
netic modification that influences gene expression and
various cellular processes. Methylation is typically in-
vestigated in gene promoter regions, with a focus on
CpG dinucleotides. Within these regions, methylation
occurs through the addition of a methyl group to the
(Image courtesy: https://geneticeducation.co.in/what-is-bisulfite-sequencing-beginners-to-advance-guide/)
Figure 1: Conversion of Cytosine to 5-methylcytosine
C5 carbon of the cytosine nucleotide, resulting in the
formation of 5-methylcytosine (Figure 1).
The principle behind bisulfite sequencing lies in
the chemical conversion of cytosine bases. Sodium
bisulfite, a chemical agent, specifically targets and
chemically modifies cytosine residues in DNA. Im-
portantly, it converts unmethylated cytosines to uracil
while leaving methylated cytosines unchanged. This
chemical conversion provides a way to differentiate
between methylated and unmethylated cytosines.
After bisulfite treatment, the modified DNA un-
dergoes polymerase chain reaction (PCR) amplifica-
tion. PCR selectively amplifies the DNA regions of
interest, which contain the converted cytosines (uracil)
and any remaining methylated cytosines. This step
generates multiple copies of the DNA fragments for
subsequent sequencing analysis.
(Image courtesy:
https://geneticeducation.co.in/what-is-bisulfite-sequencing-beginners-to-advance-guide/)
Figure 2: Illustration of the complete bisulfite se-
quencing process.
The PCR-amplified DNA fragments are then sub-
jected to DNA sequencing. During sequencing, the
converted cytosines (originally unmethylated) are
read as thymines (T), while the methylated
cytosines, which were protected from bisulfite con-
version, are read as cytosines (C). By comparing the
sequenced DNA with the original reference sequence,
researchers can identify the locations of methylated
cytosines (Figure 2).
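The logic of that comparison is simple enough to sketch in a few lines of
Python; the sequence and the single methylated position below are invented
for illustration:

reference = "ACGTCGAC"
methylated_sites = {4}  # hypothetical: only the C at index 4 is methylated

def bisulfite_read(ref, methylated):
    """Unmethylated C -> U -> sequenced as T; methylated C is protected."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(ref)
    )

read = bisulfite_read(reference, methylated_sites)
for i, (ref_base, read_base) in enumerate(zip(reference, read)):
    if ref_base == "C":  # compare each reference C with what was sequenced
        print(i, "methylated" if read_base == "C" else "unmethylated")
# positions 1 and 7 print "unmethylated"; position 4 prints "methylated"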
Steps in Bisulfite Sequencing
3.2 DNA Isolation
DNA isolation, a fundamental technique in molec-
ular biology, involves extracting the genetic material
from a cell. This purified DNA serves as the founda-
tion for various downstream applications like genetic
testing or gene cloning. The process typically follows
a multi-step approach:
Cell Lysis and Breakdown: The initial step dis-
rupts the cell wall and membrane, releasing the cel-
lular contents including DNA. This can be achieved
mechanically using homogenization, with the action of
specific enzymes, or with detergents that dissolve the
cell membrane.
Purification and Isolation: Following cell dis-
ruption, unwanted molecules like proteins and RNA
are removed. This often involves enzymatic diges-
tion to break down these contaminants. Finally, the
DNA is separated from the remaining cellular debris.
Techniques like alcohol precipitation or chromatogra-
phy can be employed for this purpose, resulting in a
purified and concentrated DNA sample.
(Image courtesy: https://www.researchgate.net/figure/Bi-molecular-hybridization-and-
denaturation-of-DNA fig2 253962134)
Figure 3: Denaturation of DNA
3.3 Bisulfite Conversion
Bisulfite conversion is a chemical treatment used
to investigate DNA methylation patterns at single-nucleotide
resolution. This process involves the treatment of ge-
nomic DNA with sodium bisulfite, a compound that
chemically modifies unmethylated cytosine residues,
while leaving methylated cytosines unchanged. Fol-
lowing are the steps involved:
Denaturation: Exposing the Cytosines: The
first step involves breaking apart the double-stranded
DNA (dsDNA) into single strands (Figure 3). This is
achieved through denaturation, typically by applying
heat or chemicals. This step is critical because bisul-
fite conversion only works on single-stranded DNA.
The presence of the complementary strand in dsDNA
physically protects cytosines from the conversion pro-
cess.
Chemical Conversion: Unmasking Unmethy-
lated Cytosines: With the DNA single-stranded, the
sample is then incubated with sodium bisulfite at a
specific temperature. This chemical reacts with un-
methylated cytosines (C) in the DNA, causing them to
deaminate and transform into uracil (U). Importantly,
methylated cytosines (5-methylcytosine) remain unaf-
fected by sodium bisulfite.
Purification: Preparing for Analysis: The final
step involves desalting and desulfonation. This crucial
cleaning process removes all the leftover sodium bisul-
fite and any unconverted single-stranded DNA frag-
ments. The remaining purified DNA now contains
uracil (U) where there were originally unmethylated
cytosines, while the methylated cytosines retain their
original form (C).
3.4 PCR Amplification
Polymerase Chain Reaction (PCR) amplification
in bisulfite sequencing plays a pivotal role in selectively
amplifying the regions of interest within the bisulfite-
treated DNA sample. Bisulfite conversion transforms
unmethylated cytosines (C) into uracil (U). But, reg-
ular DNA polymerases used in PCR can only read
and incorporate the standard DNA bases (A, C, G, T).
They can’t directly work with uracil (U). This creates
a roadblock for amplifying the bisulfite-treated DNA,
as it now contains uracil where unmethylated cytosines
originally resided.
To overcome this hurdle, bisulfite sequencing em-
ploys specialized polymerases. These enzymes are
aptly named "bisulfite-converted DNA compatible" poly-
merases. They possess the unique ability to recognize
and incorporate adenine (A) opposite uracil (U) during
PCR. Through a series of heating, cooling, and exten-
sion cycles, the targeted region is amplified, including
both the converted (originally unmethylated) and un-
converted (originally methylated) sections. With each
cycle, the target DNA fragments are exponentially am-
plified, resulting in a substantial increase in the number
of DNA copies (Figure 4). After PCR amplification
is complete, the resulting PCR products containing
the bisulfite-converted DNA fragments are analyzed
to confirm successful amplification.
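The arithmetic behind this exponential growth is worth seeing once: under
ideal conditions each cycle doubles the target, so n cycles multiply the
starting material by 2^n. A toy example with an assumed starting copy
number:

initial_copies = 100  # assumed starting template count
for n in (10, 20, 30):
    print(f"after {n} cycles: {initial_copies * 2**n:,} copies")
# after 30 cycles: 107,374,182,400 copies, around 10^11 molecules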
3.5 Sequencing
Following PCR amplification in bisulfite sequenc-
ing, DNA sequencing serves as the final step to trans-
late the methylation information into a readable format.
Unlike standard sequencing that identifies the classical
A, C, G, and T bases, bisulfite sequencing requires a
(Image courtesy:https://www.researchgate.net/figure/The-exponential-amplification-of-DNA-in-
PCR fig4 236065209)
Figure 4: Amplification of DNA in PCR.
careful interpretation due to the prior conversion step.
The key lies in remembering that bisulfite treat-
ment converts unmethylated cytosines (C) to uracil (U).
During PCR amplification, this uracil gets incorpo-
rated as thymine (T) into the newly synthesized DNA
strands. Therefore, analyzing the final sequenced DNA
provides a map of the original methylation pattern.
By comparing the sequenced DNA fragments to the
reference genome, researchers can discern whether a
cytosine was originally methylated (if it remains a cy-
tosine) or unmethylated (if it now reads as thymine).
Through this analysis, methylation profiles and maps
are generated, providing valuable insights into DNA
methylation patterns and their role in gene regulation,
development, and disease.
3.6 Conclusion
In conclusion, bisulfite conversion offers a pow-
erful tool for investigating DNA methylation, a key
epigenetic modification that influences gene expres-
sion and cellular function. By selectively converting
unmethylated cytosines to uracil, this technique allows
researchers to create a map of methylation patterns
across a specific DNA region. Through subsequent
PCR amplification and DNA sequencing, the original
methylation status can be determined. Bisulfite conver-
sion plays a vital role in various research areas, includ-
ing understanding gene regulation in development and
disease, and offers valuable insights for advancing our
understanding of how the epigenome shapes cellular
processes.
References
What is Bisulfite Sequencing?- Beginners to Ad-
vance Guide
DNA methylation detection: Bisulfite genomic
sequencing analysis
Bisulfite sequencing
Brush Up: What Is Bisulfite Sequencing and
How Do Researchers Use It to Study DNA Methy-
lation?
BS-Seq/Bisulfite-seq/WGBS
Principles and Workflow of Whole Genome Bisul-
fite Sequencing
About the Author
Jinsu Ann Mathew is a research scholar
in Natural Language Processing and Chemical Infor-
matics. Her interests include applying basic scientific
research on computational linguistics, practical appli-
cations of human language technology, and interdis-
ciplinary work in computational physics.
About airis4D
Artificial Intelligence Research and Intelligent Systems (airis4D) is an AI and Bio-sciences Research Centre.
The Centre aims to create new knowledge in the field of Space Science, Astronomy, Robotics, Agri Science,
Industry, and Biodiversity to bring Progress and Plenitude to the People and the Planet.
Vision
Humanity is in the 4th Industrial Revolution era, which operates on a cyber-physical production system. Cutting-
edge research and development in science and technology to create new knowledge and skills become the key to
the new world economy. Most of the resources for this goal can be harnessed by integrating biological systems
with intelligent computing systems offered by AI. The future survival of humans, animals, and the ecosystem
depends on how efficiently the realities and resources are responsibly used for abundance and wellness. Artificial
intelligence Research and Intelligent Systems pursue this vision and look for the best actions that ensure an
abundant environment and ecosystem for the planet and the people.
Mission Statement
The 4D in airis4D represents the mission to Dream, Design, Develop, and Deploy Knowledge with the fire of
commitment and dedication towards humanity and the ecosystem.
Dream
To promote the unlimited human potential to dream the impossible.
Design
To nurture the human capacity to articulate a dream and logically realise it.
Develop
To assist the talents to materialise a design into a product, a service, a knowledge that benefits the community
and the planet.
Deploy
To realise and educate humanity that a knowledge that is not deployed makes no difference by its absence.
Campus
Situated in a lush green village campus in Thelliyoor, Kerala, India, airis4D was established under the auspices
of SEED Foundation (Susthiratha, Environment, Education Development Foundation), a not-for-profit company
for promoting Education, Research, Engineering, Biology, Development, etc.
The whole campus is powered by Solar power and has a rain harvesting facility to provide sufficient water supply
for up to three months of drought. The computing facility in the campus is accessible from anywhere through a
dedicated optical fibre internet connectivity 24×7.
There is a freshwater stream that originates from the nearby hills and flows through the middle of the campus.
The campus is a noted habitat for the biodiversity of tropical fauna and flora. airis4D carries out periodic and
systematic water quality and species diversity surveys in the region to ensure its richness. It is our pride that
the site has consistently been environment-friendly and rich in biodiversity. airis4D also grows fruit plants
that can feed birds and maintains water bodies to survive the drought.