Astrophysics internships with algorithms

Half of my summer adventures were spent at astrophysics internships. I received my second summer internship at the end of last spring semester, this time at Brookhaven National Laboratory (BNL) as part of the Department of Energy's Science Undergraduate Laboratory Internship (SULI) program. The classes and labs I have taken at Florida Tech helped me land my first internship last summer at the Air Force Research Laboratory (AFRL), where I learned the programming language Interactive Data Language (IDL) to study solar flare data. Using IDL not only helped me join Dr. Petit's research group here at Florida Tech for my student design project, but it also made me a very competitive candidate for this past summer. I say this because I got six internship offers after adding IDL to my resume, and my mentor at BNL, Laurie Gregory, said she chose me over other applicants because of it (so learn IDL if you can, or any other programming language for that matter; it's needed everywhere!).

Laurie, Richard, and me on the roof of our work building, setting up a sun photometer for some sample data. The photometer is the gold instrument, pointing at the ground at the moment and not collecting data, while the giant box I'm standing in front of houses the control panel.


Astrophysics internships

BNL is located on Long Island, New York, and as I mentioned above, I worked with Laurie Gregory and her colleague, Richard Wagener, in the Data Management and Software Engineering group with data from the Atmospheric Radiation Measurement (ARM) research facility. More specifically, I was analyzing data collected from sun photometers around the world to begin the process of automating the detection of faulty data caused by instrument errors. The raw data from the various instruments had already been analyzed by NASA; one such analysis uses light readings from the multiple filters in the photometers to calculate the aerosol optical thickness (AOT).

The AOT tells us the amount of aerosols present in the atmosphere by measuring how much light is scattered off of them. Aerosols are particles floating around in our atmosphere, anything from dust, smoke, sea salt, and volcanic ash to many types of human-made pollution. The scattering changes how much energy makes it to the Earth's surface, particularly when light is scattered back out into space. Since the Industrial Revolution, human-made aerosols have spiked drastically, and an area near a facility that gives off a lot of smoke can experience overall cooling over time because it consistently gets less light than surrounding areas. Uneven cooling across the globe will affect future climate change, but no one is sure exactly how, so instruments such as sun photometers have been placed all over the world to measure and monitor various aspects of the atmosphere, including aerosols and their optical thickness.

Because of the way the AOT calculation is normalized to the zenith (straight up from wherever you're standing), an obstruction in one of the photometers results in what is called a "cosine curve" (called that because the normalization introduces a factor of cos(θ), where θ is the solar zenith angle, not because the trace is literally a mathematical cosine curve). My project was to write an algorithm to automatically detect these cosine curves.
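For the curious, here is a rough sketch of where that cos(θ) comes from, assuming a standard Beer–Lambert style retrieval (a simplification on my part; the actual NASA processing has more steps). The direct-sun signal falls off with the amount of atmosphere it passes through:

I = I0 exp(−τm), where the airmass m ≈ 1/cos(θ),

so the retrieved optical thickness is τ = −cos(θ) ln(I/I0). If an obstruction blocks a constant fraction b of the light, the photometer sees (1−b)I instead, and the retrieval returns

τ′ = −cos(θ) ln((1−b)I/I0) = τ − ln(1−b) cos(θ),

that is, the true AOT plus an extra term proportional to cos(θ(t)). That extra term is exactly the shape the fit described below looks for.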

An example of good and bad data collected from the sun photometers. A normal, sunny day should look like the figure on the left (some clouds would result in spikes in the graph). But the figure on the right is an example of bad data, caused by something blocking one of the tubes of the photometer.


I detected the cosines by reading in the data, plotting it, and fitting it with a linear regression, which in this case looked like

y = A0 + A1·t + A2·cos(θ(t))

A0, A1, and A2 are the coefficients calculated by the IDL function REGRESS: A0 is the constant term, A1 the coefficient of the linear term, and A2 the coefficient of the cosine term. In an ideal data set, all the A2 values would be zero, meaning there is no cosine curve. The cosine coefficients were then used to make a histogram for each channel that had collected data in a given set.
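For anyone who wants a concrete picture, here is a minimal IDL sketch of the two steps, the fit and the histogram. The variable names (t, theta, aot, a2_all) and the bin size are placeholders of my own; the real pipeline reads the ARM data files and loops over days and channels.

; Hypothetical sketch: t (time), theta (solar zenith angle, in radians)
; and aot (the retrieved AOT values) are assumed to be loaded already.
n = n_elements(t)
x = dblarr(2, n)                     ; REGRESS wants a [Nterms, Npoints] array
x[0, *] = t                          ; linear term
x[1, *] = cos(theta)                 ; cosine term
coeffs = regress(x, aot, CONST=a0)   ; a0 = A0, coeffs[0] = A1, coeffs[1] = A2

; Repeating the fit day by day builds up an array of cosine coefficients
; per channel (a2_all, another placeholder), which is then binned:
counts = histogram(a2_all, BINSIZE=0.005, LOCATIONS=bins)
plot, bins, counts, PSYM=10, XTITLE='cosine coefficient A2', YTITLE='count'

In a clean stretch of data that histogram should be sharply peaked at zero; a peak that drifts away from zero, like in the left-hand figure below, is the red flag.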


Examples of histograms of the returned cosine coefficients for each channel, both from the sun photometer located at the Eastern North Atlantic (ENA) site. The one on the left shows coefficients from data between November 2014 and March 2015; the channels exhibit a visible shift in the cosine coefficients, indicating a problem (both should be centered at zero, since we want zero coefficients to be the majority). The one on the right shows coefficients from data between April 2015 and June 2015, the period after the problem was fixed, and the channels are now closer to what is expected of good data.

The above examples were used to test whether my algorithm was working correctly, because this problem at the ENA site had already been fixed before my internship started. The research group hopes to use my algorithm to catch problems like this faster so they can be corrected; the problem at ENA was not fixed until four or five months of bad data had been collected. This is only the beginning, however. A cosine curve is just one of many signatures that can show up in the data and flag a problem with the photometer, and the ideal would be to get the computer to find such patterns by itself without being told what to look for (a.k.a. machine learning). The data management group is well on its way to automating the detection of data quality problems, and will hopefully soon be able to leave it all to the computer instead of manually checking each plot every day.

I enjoyed working with Laurie and Richard, and am very happy to have gotten an internship at BNL. It gave me more experience with IDL and Linux, and is a great addition to my undergraduate experience!
