Using PyVO to find and acquire HEASARC data#
Learning Goals#
By the end of this tutorial, you will be able to:
Access NuSTAR data using the VO Python client pyvo.
Find and download data for a specific object.
Introduction#
This notebook is a tutorial on how to access HEASARC data using the virtual observatory (VO)
Python client pyvo.
We handle the case of a user searching for data on a specific astronomical object from a specific high-energy mission observation table.
We will find all NuSTAR observations of 3C 105 that have an exposure of less than 10 ks.
Inputs#
The name of the object to identify observations of, in this case 3C 105.
Outputs#
NuSTAR observation files for the selected object.
Runtime#
As of 9th February 2026, this notebook takes ~60 s to run to completion on Fornax using the ‘Default Astrophysics’ image and the ‘small’ server with 8 GB RAM / 2 cores.
Imports#
import glob
import os
import pyvo
from astropy.coordinates import SkyCoord
Global Setup#
Functions#
Constants#
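The later cells rely on two constants, SRC_NAME and ROOT_DATA_DIR. The following is a minimal sketch of how they could be defined. The source name comes from the tutorial itself, while the data directory is an assumption inferred from the download paths shown at the end of the notebook; adjust it to suit your own system.

```python
import os

# Name of the object to search for (taken from the tutorial text)
SRC_NAME = "3C 105"

# ASSUMPTION: a download directory under the user's home directory; the
# exact location is hypothetical and only mirrors the paths printed later
ROOT_DATA_DIR = os.path.join(os.path.expanduser("~"), "project", "_data", "NuSTAR")
```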
Configuration#
1. Finding the observations#
This part assumes we know the ID of the VO service. Generally these are of
the form: ivo://nasa.heasarc/{table_name}.
We assume that we already know the name of the NuSTAR ‘master’ table that lists all NuSTAR observations - ‘numaster’.
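As a small illustration of the identifier pattern described above, the sketch below builds the service ID from a table name. The helper name heasarc_ivoid is hypothetical and not part of pyvo.

```python
# Authority part of the identifier, as given in the pattern above
HEASARC_AUTHORITY = "ivo://nasa.heasarc"


def heasarc_ivoid(table_name):
    """Build a HEASARC VO service ID of the form ivo://nasa.heasarc/{table_name}."""
    return f"{HEASARC_AUTHORITY}/{table_name.strip()}"


print(heasarc_ivoid("numaster"))
```

For the NuSTAR master table, this produces the identifier used in the rest of the tutorial.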
If you don’t know the name of the table, you can search the VO registry using
the pyvo.registry.search() function:
pyvo.registry.search("nustar master")
<DALResultsTable length=1>
ivoid ...
...
object ...
--------------------------- ...
ivo://nasa.heasarc/numaster ...
The search service#
First, we look up the registry record for the VO service ID, and then retrieve the cone search service object from that record:
# First, set up the VO object we need to access the numaster table
nu_services = pyvo.regsearch(ivoid="ivo://nasa.heasarc/numaster")[0]
# Retrieve the cone search service object
cs_service = nu_services.get_service("conesearch")
We can examine the attributes and methods of the cone search service object using
Python’s built-in dir() function:
dir(cs_service)
['__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getstate__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_baseurl',
'_capability_description',
'_get_metadata',
'_session',
'baseurl',
'capability_description',
'columns',
'create_query',
'describe',
'description',
'search']
We can also view the docstring of the cone search service object, including the list of
possible input parameters, using Python’s built-in help() function:
help(cs_service)
Help on SCSService in module pyvo.dal.scs object:
class SCSService(pyvo.dal.query.DALService)
| SCSService(baseurl, *, capability_description=None, session=None)
|
| a representation of a Cone Search service
|
| Method resolution order:
| SCSService
| pyvo.dal.query.DALService
| builtins.object
|
| Methods defined here:
|
| __init__(self, baseurl, *, capability_description=None, session=None)
| instantiate a Cone Search service
|
| Parameters
| ----------
| baseurl : str
| the base URL for submitting search queries to the service.
| session : object
| optional session to use for network requests
|
| create_query(self, pos=None, radius=None, *, verbosity=None, **keywords)
| create a query object that constraints can be added to and then
| executed. The input arguments will initialize the query with the
| given values.
|
| Parameters
| ----------
| pos : astropy.coordinates.SkyCoord
| a SkyCoord instance defining the position of the center of the
| circular search region.
| converted if it's a iterable containing scalars,
| assuming icrs degrees.
| radius : `~astropy.units.Quantity` or float
| a Quantity instance defining the radius of the circular search
| region, in degrees.
| converted if it is another unit.
| verbosity : int
| an integer value that indicates the volume of columns
| to return in the result table. 0 means the minimum
| set of columns, 3 means as many columns as are available.
| **keywords :
| additional case insensitive parameters can be given via arbitrary
| case insensitive keyword arguments. Where there is overlap
| with the parameters set by the other arguments to
| this function, these keywords will override.
|
| Returns
| -------
| SCSQuery
| the query instance
|
| See Also
| --------
| SCSQuery
|
| describe(self)
| describe the general information about the DAL service
|
| search(self, pos, radius=1.0, *, verbosity=2, **keywords)
| submit a simple Cone Search query that requests objects or observations
| whose positions fall within some distance from a search position.
|
| Parameters
| ----------
| pos : astropy.coordinates.SkyCoord
| a SkyCoord instance defining the position of the center of the
| circular search region.
| converted if it's a iterable containing scalars,
| assuming icrs degrees.
| radius : `~astropy.units.Quantity` or float
| a Quantity instance defining the radius of the circular search
| region, in degrees.
| converted if it is another unit.
| verbosity : int
| an integer value that indicates the volume of columns
| to return in the result table. 0 means the minimum
| set of columns, 3 means as many columns as are available.
| **keywords :
| additional case insensitive parameters can be given via arbitrary
| case insensitive keyword arguments. Where there is overlap
| with the parameters set by the other arguments to
| this function, these keywords will override.
|
| Returns
| -------
| SCSResults
| a container holding a table of matching catalog records
|
| Raises
| ------
| DALServiceError
| for errors connecting to or communicating with the service
| DALQueryError
| if the service responds with an error,
| including a query syntax error.
|
| See Also
| --------
| SCSResults
| pyvo.dal.DALServiceError
| pyvo.dal.DALQueryError
|
| ----------------------------------------------------------------------
| Readonly properties defined here:
|
| columns
| the available columns on this service
|
| description
| the service description.
|
| ----------------------------------------------------------------------
| Methods inherited from pyvo.dal.query.DALService:
|
| __repr__(self) -> str
| Return repr(self).
|
| ----------------------------------------------------------------------
| Readonly properties inherited from pyvo.dal.query.DALService:
|
| baseurl
| the base URL identifying the location of the service and where
| queries are submitted (read-only)
|
| capability_description
| The service description.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from pyvo.dal.query.DALService:
|
| __dict__
| dictionary for instance variables
|
| __weakref__
| list of weak references to the object
Finding the data#
Next, we will use the search function in cs_service to search for observations
around our source. We’ve already set up a constant for the source name, in
the ‘Global Setup: Constants’ section:
SRC_NAME
'3C 105'
The search function takes the sky position as input, either as a list
of [RA, DEC] or as an Astropy SkyCoord sky coordinate object.
# Find the coordinates of the source
pos = SkyCoord.from_name(SRC_NAME)
# Show the retrieved coordinates
pos
<SkyCoord (ICRS): (ra, dec) in deg
(61.81861303, 3.7071705)>
Now we run a cone search on the NuSTAR observation summary table (numaster), centered on the position of our source:
search_result = cs_service.search(pos)
We can quickly examine the output of the search by converting it to an Astropy table and displaying it by putting it at the end of the cell:
# Convert the result to an Astropy Table and render it
search_result.to_table()
| __row | name | ra | dec | time | obsid | status | exposure_a | observation_mode | obs_type | processing_date | public_date | issue_flag | Search_Offset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | deg | deg | d | | | s | | | d | d | | |
| object | object | float64 | float64 | float64 | object | object | float64 | object | object | float64 | int32 | int16 | float64 |
| 3578 | 3C105 | 61.8022 | 3.6837 | 56338.0876 | 60061044002 | archived | 4807 | SCIENCE | EGS | 59168.6000 | 57112 | 0 | 1.7172 |
| 3579 | 3C105 | 61.8059 | 3.6858 | 56339.1640 | 60061044006 | archived | 5583 | SCIENCE | EGS | 59168.5000 | 57112 | 0 | 1.4912 |
| 3580 | 3C105 | 61.8034 | 3.6859 | 56338.6258 | 60061044004 | archived | 6208 | SCIENCE | EGS | 59168.6000 | 57112 | 0 | 1.5680 |
| 3581 | 3C105 | 61.8020 | 3.6872 | 57826.4661 | 60261003004 | archived | 20703 | SCIENCE | ELS | 59110.3000 | 57836 | 0 | 1.5573 |
| 3583 | 3C105 | 61.8410 | 3.7349 | 57621.2994 | 60261003002 | archived | 20737 | SCIENCE | ELS | 59113.7000 | 57625 | 0 | 2.1365 |
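The Search_Offset column has no unit listed, but its values are consistent with the angular separation, in arcminutes, between the search position and each observation's pointing. The sketch below checks this for the first row using the haversine formula and only the standard library. The function name is hypothetical, and the arcminute interpretation is an inference from the numbers rather than something stated by the service.

```python
import math


def angular_separation_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine) of two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Convert the separation from radians to degrees, then to arcminutes
    return math.degrees(2 * math.asin(math.sqrt(a))) * 60


# Source position from SkyCoord.from_name, and the pointing of ObsID 60061044002
src_ra, src_dec = 61.81861303, 3.7071705
obs_ra, obs_dec = 61.8022, 3.6837

offset = angular_separation_arcmin(src_ra, src_dec, obs_ra, obs_dec)
print(f"{offset:.2f} arcmin")
```

The result agrees with the tabulated Search_Offset of 1.7172 to within the rounding of the coordinates shown in the table.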
2. Applying observation selection criteria#
The search results table has several entries, each representing a different NuSTAR observation.
We can filter the results to only include observations that we’re interested in. As a slightly arbitrary example, we can select only those observations with an exposure less than 10 ks.
Due to the current design of the Python object returned by the cs_service.search(pos)
call, we have to loop through the results to filter them, rather than applying a
boolean mask as we might for Astropy Table or Pandas DataFrame objects:
obs_to_explore = [row for row in search_result if row["exposure_a"] <= 10000]
obs_to_explore
[('3578', '3C105', '61.8022', '3.6837', '56338.0876', '60061044002', 'archived', '4807.3775', 'SCIENCE', 'EGS', '59168.6', '57112', '0', '1.7172299582982673'),
('3579', '3C105', '61.8059', '3.6858', '56339.164', '60061044006', 'archived', '5582.6419', 'SCIENCE', 'EGS', '59168.5', '57112', '0', '1.4911511920888738'),
('3580', '3C105', '61.8034', '3.6859', '56338.6258', '60061044004', 'archived', '6208.4217', 'SCIENCE', 'EGS', '59168.6', '57112', '0', '1.5679511763430094')]
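To make the selection easy to follow without querying the service, here is a sketch that applies the same comparison to plain dictionaries. The exposure values are copied from the exposure_a column of the search results table; the helper name and the dictionary layout are illustrative assumptions.

```python
# Exposure times in seconds, copied from the search results table
records = [
    {"obsid": "60061044002", "exposure_a": 4807},
    {"obsid": "60061044006", "exposure_a": 5583},
    {"obsid": "60061044004", "exposure_a": 6208},
    {"obsid": "60261003004", "exposure_a": 20703},
    {"obsid": "60261003002", "exposure_a": 20737},
]

EXPOSURE_LIMIT_S = 10_000  # 10 ks expressed in seconds


def select_short_exposures(rows, limit_s):
    """Keep rows whose exposure_a does not exceed limit_s (same test as the cell above)."""
    return [row for row in rows if row["exposure_a"] <= limit_s]


short_obsids = [row["obsid"] for row in select_short_exposures(records, EXPOSURE_LIMIT_S)]
print(short_obsids)
```

Only the three observations with exposures of a few kiloseconds pass the cut, which matches the filtered list shown above.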
3. Identifying where to download observation data files#
Extracting links to the data#
The exposure selection resulted in three observations (this may change as more observations are collected). Let’s try to download them for analysis.
To see what data products are available for these three observations, we use the VO’s datalinks. A datalink is a way to query and retrieve data products related to a search result.
The results of a datalink call will depend on the specific observation. To see the type of products that are available for our observations, we start by looking at one of them.
# Retrieve a single observation
obs = obs_to_explore[0]
# Fetch the datalink that will allow us to access the data associated
# with this observation
dlink = obs.getdatalink()
# Convert the return into a table, and select three summary columns to be printed
dlink.to_table()[["ID", "access_url", "content_type"]]
| ID | access_url | content_type |
|---|---|---|
| object | object | object |
| ivo://nasa.heasarc/numaster?60061044002 | https://heasarc.gsfc.nasa.gov/xamin/bib?table=numaster&id=60061044002 | text/html |
| ivo://nasa.heasarc/numaster?60061044002 | https://heasarc.gsfc.nasa.gov/xamin/vo/datalink?datalink_key&id=ivo://nasa.heasarc/numaster?60061044002/nustar.obs | application/x-votable+xml;content=datalink |
| ivo://nasa.heasarc/numaster?60061044002 | https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/ | directory |
Filtering the data links#
Three products are available for our selected observation. From the content_type
column, we see that one is a directory containing the observation files, and its
access_url entry gives the direct URL to the data. The other two products are a
document listing publications related to the selected observation, and another
datalink service for housekeeping data.
We can now loop through our selected observations in obs_to_explore, and extract
the url addresses with content_type equal to directory.
Note that an empty datalink product indicates that no public data is available for that observation, likely because it is in proprietary mode.
links = []
for obs in obs_to_explore:
    dlink = obs.getdatalink()
    dlink_to_dir = [dl for dl in dlink if dl["content_type"] == "directory"]
    # if we have no directory product, the data is likely not public yet
    if len(dlink_to_dir) == 0:
        continue
    link = dlink_to_dir[0]["access_url"]
    links.append(link)
We can take a look at the relevant data links we just retrieved:
links
['https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/',
'https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044006/',
'https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044004/']
Downloading the observations#
We can download the data directories using wget (or curl):
# Use wget to download the data when outside SciServer
wget_cmd = (
    f"wget -q -nH --no-check-certificate --no-parent --cut-dirs=6 "
    f"-r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks "
    f"-P {ROOT_DATA_DIR} {{}}"
)
for link in links:
    os.system(wget_cmd.format(link))
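The -nH and --cut-dirs=6 options control where the files land under the download directory: the host name is dropped, and so are the first six directory levels of the remote path. The sketch below approximates that trimming for one of the data links. The function name is hypothetical, and wget's own treatment of empty path components (such as the double slash in these links) is assumed rather than verified here.

```python
from urllib.parse import urlparse


def local_subpath(url, cut_dirs):
    """Approximate the path wget keeps after dropping the host and cut_dirs directories."""
    # Ignore empty components, e.g. those produced by '//' or a trailing '/'
    parts = [part for part in urlparse(url).path.split("/") if part]
    return "/".join(parts[cut_dirs:])


link = "https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/"
print(local_subpath(link, 6))
```

With six levels removed (FTP, nustar, data, obs, 00, 6), only the ObsID directory remains, so each observation is stored in its own directory directly under the download location.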
Note
All HEASARC data is available locally when working on SciServer, mounted at /FTP/, so
you could replace this download step with a copy command. The data link strings
could be split on ‘FTP’, and then have ‘/FTP/’ prepended, to get the SciServer local path.
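A minimal sketch of the conversion described in this note, assuming the data links all contain a single ‘FTP’ path component as in the examples above. The function name is hypothetical, and the final normalization step (which collapses the double slash and drops the trailing slash) is an optional addition.

```python
import posixpath


def sciserver_local_path(link):
    """Convert a HEASARC data link into the matching path under /FTP/."""
    # Keep everything after the first 'FTP' in the URL
    remote_part = link.split("FTP", 1)[1]
    # Prepend the local mount point, then tidy up repeated and trailing slashes
    return posixpath.normpath("/FTP/" + remote_part.lstrip("/"))


link = "https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/"
print(sciserver_local_path(link))
```

The returned path could then be passed to a copy command, for example shutil.copytree, in place of the wget call.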
We can now examine the directory containing the downloaded data:
glob.glob(os.path.join(ROOT_DATA_DIR, "**/**"))
['/home/jovyan/project/_data/NuSTAR/60061044002/auxil',
'/home/jovyan/project/_data/NuSTAR/60061044002/event_cl',
'/home/jovyan/project/_data/NuSTAR/60061044002/event_uf',
'/home/jovyan/project/_data/NuSTAR/60061044002/hk',
'/home/jovyan/project/_data/NuSTAR/60061044002/nu60061044002.cat.gz',
'/home/jovyan/project/_data/NuSTAR/60061044002/pipe.log',
'/home/jovyan/project/_data/NuSTAR/60061044006/auxil',
'/home/jovyan/project/_data/NuSTAR/60061044006/event_cl',
'/home/jovyan/project/_data/NuSTAR/60061044006/event_uf',
'/home/jovyan/project/_data/NuSTAR/60061044006/hk',
'/home/jovyan/project/_data/NuSTAR/60061044006/nu60061044006.cat.gz',
'/home/jovyan/project/_data/NuSTAR/60061044006/pipe.log',
'/home/jovyan/project/_data/NuSTAR/60061044004/auxil',
'/home/jovyan/project/_data/NuSTAR/60061044004/event_cl',
'/home/jovyan/project/_data/NuSTAR/60061044004/event_uf',
'/home/jovyan/project/_data/NuSTAR/60061044004/hk',
'/home/jovyan/project/_data/NuSTAR/60061044004/nu60061044004.cat.gz',
'/home/jovyan/project/_data/NuSTAR/60061044004/pipe.log']
About this notebook#
Author: Abdu Zoghbi, HEASARC Staff Scientist
Author: David Turner, HEASARC Staff Scientist
Updated On: 2026-02-09
Additional Resources#
Contact the HEASARC helpdesk for further assistance.
Acknowledgements#
References#
Taghizadeh-Popp M., Kim J. W., Lemson G. et al. (2020) - SciServer: A science platform for astronomy and beyond