Submitting author: @amanchokshi (Aman Chokshi)
Repository: https://github.com/amanchokshi/EMBERS
Version: v0.7.0
Editor: @mbobra
Reviewers: @teuben, @mbobra
Archive: Pending
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/5d0d8c8aa9cfa96790cf4109d1fc00c7"><img src="https://joss.theoj.org/papers/5d0d8c8aa9cfa96790cf4109d1fc00c7/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/5d0d8c8aa9cfa96790cf4109d1fc00c7/status.svg)](https://joss.theoj.org/papers/5d0d8c8aa9cfa96790cf4109d1fc00c7)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
@teuben & @mbobra, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mbobra know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @teuben, @hasantahir it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1017/pasa.2018.30 is OK
MISSING DOIs
- None
INVALID DOIs
- None
👋 @teuben @hasantahir Thank you so much for agreeing to review this submission! You'll see your reviewer checklists above. If you'd like more information on any of these items, please reference the review criteria. Also, we encourage open communication between the reviewers, submitting author, and editor. So please feel free to ask questions here on this review thread. I'm happy to help!
/ooo September 14 until September 18
:+1: Marked @mbobra as OOO from Monday, September 14th 2020 to Friday, September 18th 2020. :calendar:
@teuben @hasantahir How is the review going? Do you need any help? Please let me know, I'm happy to help!
@teuben Please respond and let me know if you are still able to review.
@whedon remove @hasantahir as reviewer
OK, @hasantahir is no longer a reviewer
@whedon add @mbobra as reviewer
OK, @mbobra is now a reviewer
👋 @amanchokshi I am going through the checklist and your package looks really nice. It has been an easy review so far. I have a couple comments:
Functionality
Running python -m test_tool, I get the error: No module named test_tool.
Documentation
From the online documentation:
EMBERS is a python package designed to enable polarised measurements of radio telescope antenna beam-patterns using satellites.
From the JOSS paper:
EMBERS is a python package which provides a modular framework for radio telescopes and interferometric arrays such as the MWA, HERA, and the upcoming SKA to accurately measure the all sky beam responses of their antennas using weather and communication satellites.
Could you perhaps clarify and expand the statement of need in the documentation? Also, the documentation doesn't mention the target audience for this software (e.g. astronomers, systems engineers?). Could you please include this?
Software Paper
@mbobra Thanks Monica. I've addressed most of your comments.
In Embers by Example, the second and third code blocks are meant for people unfamiliar with python. test_tool is an example package which doesn't exist, but is used to demonstrate that the tools can be run using either python -m test_tool or simply test_tool. The following code block shows that help can be accessed for each tool using the --help flag. I've made the preceding text clearer by saying that this is an example tool.
I've updated the statement of need in the online docs to match the JOSS paper and have included the target audience.
Added that all previous experiments were validations of the methodology, and this was the first large-scale test. There was no previously available software suitable for the large data volume. I've also updated the caption of Fig. 1 with much more detail on what all the plots mean, including the grey regions.
Unfortunately, Chokshi et al., in prep is under collaboration review right now, and will probably take a couple of months before it's published. The other relevant paper is Line et al., which I've cited. It's about one of the previous validation experiments and is less relevant, as it deals with a much smaller scale and does not use the EMBERS package.
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@amanchokshi Thanks for your updates, I really appreciate it. A couple more comments:
Examples
I'm working through the examples and am still a bit confused.
One example in the Embers by Example docs says:
To get a quick preview of the raw RF data, we create waterfall plots. The following code creates a waterfall plot of sample data provided with EMBERS
$ waterfall_single
--------------------------------------------------
No input data provided, using packaged sample data
>>> waterfall_single --help, for more options
--------------------------------------------------
Waterfall plot saved to ./embers_out/rf_tools/S06XX_2019-10-10-02:30.png
But this doesn't work exactly as written for me; I have to do a
from embers.rf_tools import rf_data
rf_data.single_waterfall
But then I'm stuck, because I don't see any example rf_tiles to use with rf_data.single_waterfall().
Would it be possible to create a couple examples that a user can execute from start to finish? You have a really nice example gallery, so by no means am I asking to re-do all of it, but I think users may be confused about how exactly to get started. Some sample data may be a nice way to quickly show users how they can use the package.
References
Unfortunately, Chokshi et al., in prep is under collaboration review right now, and will probably take a couple of months before it's published. The other relevant paper is Line et al., which I've cited. It's about one of the previous validation experiments and is less relevant, as it deals with a much smaller scale and does not use the EMBERS package.
I totally understand; does the journal allow you to put Chokshi et al. on a preprint service, such as arXiv? Readers need to have access to the references, so Chokshi et al. must be accessible or replaced with a reference to something else. If you presented the work as a poster or talk at a conference before, then you can put your poster or slides on Zenodo and get a DOI for it and reference that instead. Let me know if you want help with that.
@mbobra thanks Monica, it seems like you've caught a bug in my code. I created a new virtual env and installed EMBERS following my examples. I had the same issue. Apparently my sample data is not being installed with pip. I will try and get it to work or modify my instructions.
For now, what works:
git clone https://github.com/amanchokshi/EMBERS.git
cd EMBERS
pip install --editable .
waterfall_single
I will talk to my supervisor and see when I can upload a draft of my paper to arXiv. By the end of this review process I may be able to update the paper with an arXiv DOI. If not, I can definitely upload a conference talk to Zenodo and cite that.
@mbobra I've fixed the bug. If you update embers to the latest version 0.7.3, the examples should now work as shown in the docs
@amanchokshi I'm sorry -- I'm probably missing something simple. I'm looking at the Embers by Example guide, which says:
$ mkdir ~/EMBERS
$ cd ~/EMBERS
So I did that. Then I skipped the stuff about the test tool. Then the next code block says:
waterfall_single
But I'm still confused 😟 Where are the sample data? How do I access these data? What import statements do I need to make before I can run waterfall_single? Are there any example notebooks I can look at?
I was looking at a recently published JOSS paper called PyCS3 (which I picked at random). They provide a tutorial and example notebooks, which users can download and run. I don't want to be overly prescriptive -- so PyCS3's approach is by no means the only way to provide examples, but I personally find it easy to understand. Perhaps this can be fixed by a few explanatory sentences if I am missing something simple.
References
I will talk to my supervisor and see when I can upload a draft of my paper to arXiv. By the end of this review process I may be able to update the paper with an arXiv DOI. If not, I can definitely upload a conference talk to Zenodo and cite that.
Yes, another possibility (if you want this JOSS paper and your other peer-reviewed paper to act as companion papers) is to pause publication of this JOSS paper until the other one is accepted for publication. Then you can cite the DOI for that paper in this one and vice versa.
@mbobra
waterfall_single is a CLI tool which makes it easy to interact with the underlying single_waterfall function. Internally, what waterfall_single does:
import pkg_resources
from embers.rf_tools.rf_data import single_waterfall
# If no data provided, use package sample data
rf_file = pkg_resources.resource_filename("embers.kindle", "data/rf_data/S06XX/2019-10-10/S06XX_2019-10-10-02:30.txt")
out_dir = "embers_out/rf_tools"
single_waterfall(rf_file, out_dir)
In Embers by Example I have added a link which takes you to the API docs of the relevant function. If you think this is sufficient, I can add similar links to all the examples.
Alternatively, I could create a notebook, with accompanying example data.
@amanchokshi
Ah, this is awesome! 🚀 Yes I do think having one or two simple notebooks, exactly like the code block above, will really help folks get started.
I checked off the functionality box above; after the notebooks, I'll check off the examples. Then all we have to do is sort out the references and we're done!
@mbobra Thanks Monica! I'll let you know when I've completed the notebooks. It may take a couple of days.
Hi @mbobra, there have been quite significant updates to the Embers by Example section of the documentation.
Can you begin by updating to version 0.7.4 and deleting all the old files you may have created?
The instructions begin with downloading one day of sample data from a new data repository, which all the subsequent examples use. Each section now has instructions on how to use the cli-tool, with an equivalent example code-block below. I've also linked the functions used, which will take you to the relevant section of the EMBERS API.
Certain examples will take a little while to run and I expect that you can run all the examples in less than 3 hours. It's important to run them in the sequence presented, because data products created early on are used at later stages.
Hope things work this time and are much more clear 😁
Just a suggestion on this test though: is there no way you could make this test run in a few minutes? If a cosmological simulation takes 3 hours, you can either take fewer steps or a smaller grid, but it should functionally give some answer (that can be compared to something given as a baseline result). I'm not too happy that I would need to wait 3 hours on a test.
@teuben I agree that it would be nice if the tests run faster. This is not a requirement for passing review, though (unless the software makes performance claims). I advise opening an issue on the repo about the runtime. And I'm happy to co-sign that. But this is not a blocking item.
@amanchokshi Thank you for all this hard work. I'm happy to recommend Embers for publication pending an accessible reference to Chokshi et al., in prep.
Hi @teuben, the tests for embers are automated, and take ~20 minutes on Travis.
The examples shown in Embers by Example do take longer, but are meant to be a guide to show new users how to begin interacting with Embers. I've reduced the amount of sample data as much as possible (only 1 day of data, instead of 6 months), but some of the later functions are statistical in nature and require multiple satellite passes, along with their corresponding data files, to accurately calibrate the instrument.
Thanks @mbobra, I've talked to my supervisor regarding the reference to Chokshi et al., in prep, and we've decided to remove the reference from the JOSS paper. The other reference, Line et al., outlines the premise of the experimental setup, and provides sufficient scientific background for this work.
I've removed Chokshi et al., in prep, from the paper and have reworded things a bit. Hope that's okay.
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I've been adding some comments in the issue tracker of the repo; the author is working on those. I'm about halfway through the examples, a bit more work than I thought, so I will take a night's sleep on it. More tomorrow.
@teuben Thanks a lot Peter!
@amanchokshi As I'm running through the examples, you mentioned there is a quick test, presumably via pytest.ini and the tests/ directory. I don't see an example command line in the README or elsewhere showing how to run this, so "beginners" will be baffled. Can you add that somewhere (and remind me here as well, as I don't remember)?
@amanchokshi apart from some minor things I know you are looking into, my last hurdle on the checkmark list is performance.
There are no good comments on performance that I could find (in fact, in the Example there should be some warnings in more places, but you are working on the aggressive CPU usage already). I have not looked at the code, but let me try to explain how I would like to work towards an answer as to why this is so slow. We have N files, each with M lines. Yet, the example you mentioned took 40 core-hours. Can this be understood in perhaps an O(N) analysis? Are there no loops in python? Is everything done in good numpy notation, without looping?
There is of course the potential speedup using XLA/numba/dask, which is mentioned in one of the issues, but before resorting to this, it would be nice if we could understand in simple terms why it takes as long as it does.
I've added instructions to run the automated tests at the bottom of the examples page - Testing EMBERS. It should take no more than 2 minutes to run the tests. To run the tests you need to clone the directory and install a few additional dependencies:
git clone https://github.com/amanchokshi/EMBERS.git
cd EMBERS
# Set up a virtual environment
python -m venv embers-env
source embers-env/bin/activate
pip install .
pip install -r requirements_dev.txt
# Run automated tests
pytest
@teuben I've been playing around with align_batch and the --max_cores option to try and get a handle on the parallel performance. On my laptop, I've tried running all the sample data with between 1 and 4 CPU cores. I used the GNU time tool to gauge the performance.
The system cpu usage vs number of cpu cores is as expected:
The time vs number of cpu cores seems to fall off at a higher number of cores. This is probably partially due to running things on my laptop, which has limited resources free.
Below are the raw terminal outputs which I used to generate the two figures:
$ time align_batch --out_dir=tmp --max_cores=1
>>> Aligned files saved to: tmp
>>> align_batch --out_dir=tmp --max_cores=1 879.14s user 47.24s system 104% cpu 14:45.86 total
$ time align_batch --out_dir=tmp --max_cores=2
>>> Aligned files saved to: tmp
>>> align_batch --out_dir=tmp --max_cores=2 999.20s user 72.56s system 212% cpu 8:23.51 total
$ time align_batch --out_dir=tmp --max_cores=3
>>> Aligned files saved to: tmp
>>> align_batch --out_dir=tmp --max_cores=3 1012.36s user 129.45s system 309% cpu 6:08.94 total
$ time align_batch --out_dir=tmp --max_cores=4
>>> Aligned files saved to: tmp
>>> align_batch --out_dir=tmp --max_cores=4 1143.28s user 137.17s system 401% cpu 5:18.70 total
In the code, I have a for loop which selects pairs of data files, which will be temporally aligned, smoothed using a savgol filter and interpolated to a lower frequency. Beyond the initial for loop to select the files, the interpolation and filtering is achieved using scipy acting on numpy arrays.
I don't really know that much about code optimization, but I think that numpy and scipy may already be optimized to grab available cores. If that's the case, it could explain the drop-off we see in time vs number of cores.
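For illustration, here is a minimal sketch of this kind of process-based fan-out (align_pair here is a simplified, hypothetical stand-in for the real per-pair work in align_batch, not the actual EMBERS implementation):

from concurrent.futures import ProcessPoolExecutor

def align_pair(pair):
    # Hypothetical stand-in for one unit of work: read a (reference, tile)
    # file pair, interpolate, savgol-filter and write the aligned output
    ref_file, tile_file = pair
    return ref_file, tile_file

if __name__ == "__main__":
    file_pairs = [("rf0XX_2019-10-10-00:00.txt", "S06XX_2019-10-10-00:00.txt")]
    # A process pool sidesteps the GIL; max_workers plays the role of --max_cores
    with ProcessPoolExecutor(max_workers=4) as executor:
        for result in executor.map(align_pair, file_pairs):
            print("aligned:", result)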
The above tests used the 1440 data files available in the sample data repository. The complete data set includes 202,000 files with ~17,000 lines of data each, amounting to almost 300 GB of data.
This is a very useful analysis to see how it scales with the processor! Nicely done. Theoretically, for ideal parallel behavior, the user and system time should not depend on max_cores (NP), and the elapsed time goes as 1/NP. So, although we do see that the % ((user+system)/elapsed) scales quite nicely, you can see the user time goes from 879" to 1143", a 30% "loss". You can see the system time went from 47" to 137", a factor of 3, far worse. This could point to I/O contention, more than CPUs clashing. Modern i7 laptop cpus are pretty good in general, but terrible when it comes to dealing with parallel tasks when 100% of the threads are used. You didn't try more than 4; I'm guessing there are 4 cores, each with 2 threads, the most common situation these days, so CPU-wise you didn't get to the worst point. But the system time scales much worse. In my experience, running between 4 and 8 actually doesn't improve the elapsed time; in fact, more often than not it reverses and goes slower.
There is still the lingering question of whether we can understand a single-core performance of 15 minutes. Each txt file in tiles_data has about 17k lines. I only see one day (2019-10-10) in that tree, and yes, there are thus 1440 files. So in other words, this is like an N=1440 x M=17000 matrix, or 25M points. The O(N) analysis means that I would like to understand how this problem scales with M and N. And then looking inside, there are only ~25M points. What needs to be done that takes 15 minutes; is that something that makes sense? Another way to think about it: on a 6 GFlop machine, it amounts to ~200k floating point operations per point. This is the point I don't understand about EMBERS, and I think it would add value to the paper (or a README somewhere) to say something about it.
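To make that estimate concrete, a quick back-of-envelope sketch in python (using only the assumptions above: a 15-minute single-core run, 1440 x 17000 points, a 6 GFlop machine):

runtime_s = 15 * 60                    # 900 s single-core run
points = 1440 * 17_000                 # ~24.5M data points
flops_total = 6e9 * runtime_s          # ~5.4e12 floating point operations
print(f"{flops_total / points:,.0f} flops per point")   # ~220,000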
I've added instructions to run the automated tests at the bottom of the examples page - Testing EMBERS. It should take no more than 2 minutes to run the tests. To run the tests you need to clone the directory and install a few additional dependencies:
git clone https://github.com/amanchokshi/EMBERS.git
cd EMBERS
# Set up a virtual environment
python -m venv embers-env
source embers-env/bin/activate
pip install .
pip install -r requirements_dev.txt
# Run automated tests
pytest
this ran fine for me. I had two failures (and a few deprecation warnings):
tests/test_mwa_utils/test_mwa_pointings.py::test_clean_meta_json_start FAILED
tests/test_tile_maps/test_ref_fee_healpix.py::test_create_model_last FAILED
The deprecation warnings come from a package which I use - mwa-pb - and it has required me to freeze some of the dependencies in requirements.txt. I'm not sure why those two tests are failing. I cloned a fresh copy of EMBERS and tried the tests again and all of them seem to pass. Could it be a python version thing? I'm stumped
The automated tests on travis are all passing too. Check out Latest Test and All Past Tests
Regarding the performance, I've re-run align_batch using only a single core, with a varying number of files. This probes your question of how the problem scales with N. The results seem to indicate a linear relationship between the number of files and processing time.
I'm quite out of my depth with this kind of code testing, but let me try and clarify exactly what the code does. Each data file has ~17k lines, each of which contains 112 floating point numbers corresponding to the 112 frequency channels available. The 17k lines correspond to 17k timesteps at which power was sampled. A pair of such files is selected.
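For a sense of scale, a rough sketch of the sizes involved (assuming float64 arrays in memory; the 202,000-file and 300 GB figures are from above):

lines, channels = 17_000, 112              # timesteps x frequency channels per file
bytes_in_memory = lines * channels * 8     # float64 array
print(f"~{bytes_in_memory / 1e6:.0f} MB per file in memory")   # ~15 MB

files, total_gb = 202_000, 300             # full campaign, per the thread
print(f"~{total_gb * 1e3 / files:.2f} MB per file on disk")    # ~1.49 MB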
I've been looking into the performance as a function of the number of lines in the files. I did this by modifying embers.rf_tools.align_data.savgol_interp to take a fraction and only read the first fraction of lines in the files. The align_batch function is built around the savgol_interp function, parallelizing it and writing the outputs to disk.
I repeated the test for each fraction 10 times, and used the median value as an estimate of time. If we increase the number of loops, I expect the plot to stabilise further.
My inference of the above plot is that almost half of the processing time for each set of files comes from reading in the data, beyond which the time increases linearly with number of lines.
Below is the code which I used to generate this analysis:
import math
from time import time
import numpy as np
from embers.rf_tools.rf_data import read_data
from matplotlib import pyplot as plt
from scipy import interpolate
from scipy.signal import savgol_filter
def savgol_interp(
    ref,
    tile,
    savgol_window_1=None,
    savgol_window_2=None,
    polyorder=None,
    interp_type=None,
    interp_freq=None,
    fraction=1,
):
    """Interpolate a power array followed by savgol smoothing."""

    # Read time and power arrays from data files
    ref_power, ref_time = read_data(ref)
    tile_power, tile_time = read_data(tile)

    ref_length = ref_time.shape[0]
    tile_length = tile_time.shape[0]

    # Number of lines in different files will differ due to varied
    # recording rates of different receivers
    # print(f"Number of lines in rf0XX file: {ref_length}")
    # print(f"Number of lines in S06XX file: {tile_length}")

    # Select the first [fraction] of the number of lines
    ref_len_new = int(round(fraction * ref_length))
    tile_len_new = int(round(fraction * tile_length))

    ref_power = ref_power[:ref_len_new, ::]
    ref_time = ref_time[:ref_len_new]
    tile_power = tile_power[:tile_len_new, ::]
    tile_time = tile_time[:tile_len_new]

    # Round up/down to nearest integer of time
    start_time = math.ceil(max(ref_time[0], tile_time[0]))
    stop_time = math.floor(min(ref_time[-1], tile_time[-1]))

    # Array of times at which to evaluate the interpolated data
    time_array = np.arange(start_time, stop_time, (1 / interp_freq))

    # Mathematical interpolation functions
    f = interpolate.interp1d(ref_time, ref_power, axis=0, kind=interp_type)
    g = interpolate.interp1d(tile_time, tile_power, axis=0, kind=interp_type)

    # New power array, evaluated at the desired frequency
    ref_ali = f(time_array)
    tile_ali = g(time_array)

    # Savgol level 1. Capture nulls / small scale structure
    ref_ali = savgol_filter(ref_ali, savgol_window_1, polyorder, axis=0)
    tile_ali = savgol_filter(tile_ali, savgol_window_1, polyorder, axis=0)

    # Savgol level 2. Smooth noise
    ref_ali = savgol_filter(ref_ali, savgol_window_2, polyorder, axis=0)
    tile_ali = savgol_filter(tile_ali, savgol_window_2, polyorder, axis=0)

    return (ref_ali, tile_ali, time_array, ref_power, tile_power, ref_time, tile_time)


if __name__ == "__main__":

    ref = "tiles_data/rf0XX/2019-10-10/rf0XX_2019-10-10-00:00.txt"
    tile = "tiles_data/S06XX/2019-10-10/S06XX_2019-10-10-00:00.txt"

    fractions = np.linspace(0.02, 1, 50)
    med_time = []

    for f in fractions:
        # Repeat measurements 10 times and take the median
        time_array = []
        for i in range(10):
            start = time()
            si_out = savgol_interp(
                ref,
                tile,
                savgol_window_1=11,
                savgol_window_2=15,
                polyorder=2,
                interp_type="cubic",
                interp_freq=1,
                fraction=f,
            )
            end = time()
            time_array.append(end - start)
        med_time.append(np.median(time_array))

    fig = plt.figure(figsize=(6, 4))
    plt.style.use("seaborn")
    plt.plot(fractions, med_time, linewidth=2.1, marker="o", color="maroon")
    plt.xlabel("Fraction of Number of lines")
    plt.ylabel("Median Time [seconds]")
    plt.tight_layout()
    plt.savefig("lines_time.png")
The deprecation warnings come from a package which I use - mwa-pb - and it has required me to freeze some of the dependencies in requirements.txt. I'm not sure why those two tests are failing. I cloned a fresh copy of EMBERS and tried the tests again and all of them seem to pass. Could it be a python version thing? I'm stumped
The automated tests on travis are all passing too. Check out Latest Test and All Past Tests
I'm stumped too. I'm going to ignore this for the review, as the examples seem to run (ok, I do have one failure still, the ephem_chrono task).
Hi Peter, I'll try and summarise what I've found in my performance tests:
- O(N): performance scales linearly with the number of files, on a single core
- O(M): performance scales linearly with the number of lines of data, with a significant overhead reading in the data. By my estimate, more than half the time processing a single file of ~17k lines can be attributed to reading in the file. This is ~0.27 seconds per file. The processing itself is done with scipy interpolation + filtering, which scales linearly with array size
thanks Aman, this is indeed very useful stuff to understand. 50% in the I/O is quite a lot. These are 2kb files, that should read in 0.02ms on a typical 100MB/s disk. So a significant amount of time it does something else. Maybe the decoding? Even the readlines() routine isn't able to perform at formal speed. I measured indeed about 250ms for reading one file. When not processing any line, so returning nothing, it still took 5-10 ms. I might play with this, I didn't see any obvious ways to speed this up.
Oops, I messed up my units. The files are 2MB, so that should be around 10ms, which corresponds to my test of just processing the lines, no decoding. So the decoding is expensive, more like 200 ms, 20 times as expensive as reading the line.
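A minimal way to separate the raw read from the decode cost (a sketch using the package's read_data; the path assumes the sample data layout):

import time
from embers.rf_tools.rf_data import read_data

path = "tiles_data/S06XX/2019-10-10/S06XX_2019-10-10-00:00.txt"

t0 = time.perf_counter()
raw = open(path, "rb").read()        # raw bytes: pure disk/cache I/O
t1 = time.perf_counter()
power, times = read_data(path)       # EMBERS parser: I/O plus decoding
t2 = time.perf_counter()

print(f"raw read:    {(t1 - t0) * 1e3:.1f} ms")
print(f"read+decode: {(t2 - t1) * 1e3:.1f} ms")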
To close up the review, and assuming you can add a few words on the performance, the only puzzle remaining is why I could not get ephem_chrono to work. It seems a few other tools after that also fail for me, possibly because of this? I also had a case where leaving out the --max_cores=4 would cause it to fail with the error "ValueError: max_workers must be greater than 0"
I think I know where the error in --max_cores comes from. I'll test it out and ping you. The ephem_chrono problem is a puzzle, but I'll try it again once I've finished the max_core issue
The batch scripts should now work after you update EMBERS to the latest version - 0.8.2. I had added a bad default value to the max_cores argparse option.
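For context, the class of bug was roughly the following (a reconstruction, not the actual EMBERS code): an argparse default that reaches the process pool as an invalid worker count, and one way to guard against it:

import argparse
import os
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # A default of 0 passed straight through would raise
    # "ValueError: max_workers must be greater than 0"
    parser.add_argument("--max_cores", type=int, default=None)
    args = parser.parse_args()

    max_cores = args.max_cores or os.cpu_count()   # fall back to all cores
    with ProcessPoolExecutor(max_workers=max_cores) as executor:
        print(f"pool started with {max_cores} workers")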
I ran ephem_chrono again and things worked. Do you have a folder called TLE in the sat_utils directory? This should have been copied over when you cloned the sample data and moved the sample data to the correct dirs.
Before running ephem_batch:
tree -d sat_utils
sat_utils
└── TLE
After running ephem_batch:
tree -d sat_utils
sat_utils
├── TLE
├── ephem_data
└── ephem_plots
The above data is required for ephem_chrono to work. Finally, after ephem_chrono runs successfully:
tree -d sat_utils
sat_utils
├── TLE
├── ephem_chrono
├── ephem_data
└── ephem_plots
Can you delete all the data in the sat_utils directory, except the TLE folder, then run ephem_batch followed by ephem_chrono?
I've created a new Performance page in the documentation. This outlines our investigation into parallel and single core performance and states the I/O limitations. I've also added a link to the performance page in the examples.
I've removed all directories (except TLE) in embers_out/sat_utils; the TLE directory has the following 73 files:
21576.txt 25113.txt 25117.txt 25159.txt 25415.txt 25419.txt 25477.txt 25481.txt 25982.txt 25986.txt 33062.txt 40069.txt 40089.txt 41180.txt 41184.txt 41188.txt 44020.txt 44025.txt 44387.txt
23545.txt 25114.txt 25118.txt 25338.txt 25416.txt 25420.txt 25478.txt 25482.txt 25983.txt 28654.txt 33063.txt 40086.txt 40090.txt 41181.txt 41185.txt 41189.txt 44022.txt 44026.txt
23546.txt 25115.txt 25119.txt 25413.txt 25417.txt 25475.txt 25479.txt 25980.txt 25984.txt 33060.txt 33064.txt 40087.txt 40091.txt 41182.txt 41186.txt 44018.txt 44023.txt 44027.txt
25112.txt 25116.txt 25158.txt 25414.txt 25418.txt 25476.txt 25480.txt 25981.txt 25985.txt 33061.txt 33065.txt 40088.txt 41179.txt 41183.txt 41187.txt 44019.txt 44024.txt 44028.txt
I then ran ephem_batch --max_cores=1 and only noticed some warnings:
sat_ephemeris.py:286: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory.
Running ephem_chrono immediately returns (though it has the coffee message)
$ ephem_chrono
Saving chronological Ephem files to: ./embers_out/sat_utils/ephem_chrono
Grab a coffee, this may take more than a couple of minutes!
It does create all the json files, but they just contain [], and the log file has 0 lines.
I am doing this in a clean directory, where I only see embers_out and tiles_data. The code is somewhere else, which I installed using "pip install -e .". I am using a "venv", which is a directory inside that EMBERS source tree.
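(A quick hypothetical check to quantify the empty outputs, for anyone reproducing this:)

import json
from pathlib import Path

# Count chrono json files that came out empty (just "[]")
chrono = sorted(Path("embers_out/sat_utils/ephem_chrono").glob("*.json"))
empty = [f for f in chrono if json.loads(f.read_text()) == []]
print(f"{len(empty)} of {len(chrono)} chrono files are empty")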
This is bizarre. Let's try a couple of long shots. Can you check whether the files in the TLE directory contain anything? Perhaps they've been corrupted somehow?
You should see something like this:
$ tail embers_out/sat_utils/TLE/21576.txt
1 21576U 91050C 19282.72964883 .00000014 00000-0 18223-4 0 9990
2 21576 98.6203 331.5014 0002898 326.0593 34.0410 14.41572163482935
1 21576U 91050C 19283.90959667 +.00000017 +00000-0 +19271-4 0 9992
2 21576 098.6201 332.6919 0002839 321.6987 038.4000 14.41572293482883
Have you been running all the other examples in the same way? With EMBERS installed using pip install -e . in the source tree?
I've attempted to replicate the way you're running things. I have my source directory at ~/Repos/EMBERS, within which I've created a virtual environment and run pip install --editable .
If I do a pip list when this virtual environment is active, I see that embers version 0.8.2 is installed from source:
Package Version Location
----------------------------- ----------- -----------------------------------
embers 0.8.2 /Users/amanchokshi/Repos/EMBERS/src
I've then created a new directory ~/Repos/em_test/ which just contains tiles_data and embers_out/sat_utils/TLE
Keeping the original virtual env active, I run the following
$ ephem_batch
$ ephem_chrono
Checking the contents of the first file in the ephem_chrono dir:
$ tail embers_out/sat_utils/ephem_chrono/2019-10-10-00:00.json
1.5377585726597032,
1.5366094819324243,
1.5354642595022225,
1.534322886123595,
1.5331853425510376,
1.5320516095390462,
1.5309216678421183
]
}
]%
Have I deviated from your setup in any way?
PS: It would usually take a while to process the data (~10 mins) if we were working with a larger data set. The sample data only contains a couple of satellite passes each. I'll remove the coffee message in the next EMBERS version.
So many matplotlib figures should not be opened by ephem_batch; instead, figures should be saved to embers_out/sat_utils/ephem_plots. Does ephem_batch create this directory and .png figures for you?
Can you update EMBERS to V0.8.3, or git pull and build from source again? I've explicitly closed the figures after each plot is saved. This shouldn't be necessary, but may help.
I've also removed the coffee message, which was unnecessary.
Indeed my embers_out/sat_utils/TLE/21576.txt looks like yours. I've been able to run all steps before this fatal one just fine; all the figures are like yours.
You have also done what I did, pip-installing it with the source in place. I use a pretty standard anaconda3 as the base for my python, not the system python. I've actually reinstalled that one.
I will retrace the steps, but a question before I do this: can I reinstall and just do ephem_batch? It doesn't need the lengthy align_batch, does it?
A suggestion on your side might be to have some kind of -v flag for all your scripts, which makes them verbose, to perhaps help to see some expected intermediate output, helping you to see what's going on here. In my own system I have debug=LEVEL, which will create more and more output the higher the level is. Other packages have -vvv instead of -v to increase the verbosity level :-) Even for myself I find it useful, as I forget my own code, and a debug level could help with that.
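One common pattern for such a flag (a sketch, not EMBERS code) maps repeated -v flags onto python logging levels:

import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="count", default=0,
                    help="increase verbosity (-v, -vv)")
args = parser.parse_args()

# 0 -> WARNING, 1 -> INFO, 2+ -> DEBUG
level = [logging.WARNING, logging.INFO, logging.DEBUG][min(args.verbose, 2)]
logging.basicConfig(level=level, format="%(levelname)s: %(message)s")

logging.info("intermediate output, visible with -v")
logging.debug("detailed trace, visible with -vv")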
I noted that 3 files in TLE have 0 length: 44019.txt 44020.txt 44025.txt - the rest have anywhere from 20 to 78 lines. There are 73 files in here.
I do see 70 plots in the ephem_plots directory (not 73, since 3 of my TLEs were 0 length).
I've updated the code again. Ran ephem_chrono, but still no json files.
You don't need to run align_batch again. The only thing that any of the ephem scripts should need are the TLE files. It's also okay that 3 of the TLE files are empty; those satellites have very sparse coverage and on that day had no passes visible from the site.
I've been using pyenv to install python versions on my system. Currently I've used pyenv to set my default version to 3.8.3. I used pyenv virtualenv to create my virtual environments. I don't think this should be an issue.
Are you on a python version >= 3.6? I have used f-strings, so 3.5 will have problems.
Just a thought: is the directory with tiles_data and embers_out/sat_utils/TLE on your external VFAT formatted drive? The names of the json files created have a colon : in them. Could that be tripping things up?
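(A quick hypothetical probe of whether the working filesystem accepts colons in file names:)

from pathlib import Path

# Names like S06XX_2019-10-10-02:30.png are legal on ext4,
# but a VFAT/exFAT volume will reject the colon
probe = Path("colon:test.tmp")
try:
    probe.touch()
    probe.unlink()
    print("filesystem accepts ':' in file names")
except OSError as err:
    print("filesystem rejects ':' in file names:", err)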
python 3.8.5
the whole tree with embers_out and tiles_data is on a regular ext4 formatted drive.
The json files look fine in their name, e.g.
-rw-rw-r-- 1 teuben teuben 2 Nov 19 09:00 embers_out/sat_utils/ephem_chrono/2019-10-10-00:00.json
Let's rule out problems in ephem_batch so that we can focus on ephem_chrono. Can you try and see if the log seems similar?
$ tail embers_out/sat_utils/ephem_data/ephem_batch.log
INFO: ephem_batch: Saved sky coverage plot of satellite [25481] to ./embers_out/sat_utilsephem_plots/25481.png
Saved ephemeris of satellite [25481] to ./embers_out/sat_utilsephem_data/25481.npz
INFO: ephem_batch: Saved sky coverage plot of satellite [23546] to ./embers_out/sat_utilsephem_plots/23546.png
Saved ephemeris of satellite [23546] to ./embers_out/sat_utilsephem_data/23546.npz
INFO: ephem_batch: Saved sky coverage plot of satellite [25479] to ./embers_out/sat_utilsephem_plots/25479.png
Saved ephemeris of satellite [25479] to ./embers_out/sat_utilsephem_data/25479.npz
INFO: ephem_batch: Saved sky coverage plot of satellite [23545] to ./embers_out/sat_utilsephem_plots/23545.png
Saved ephemeris of satellite [23545] to ./embers_out/sat_utilsephem_data/23545.npz
INFO: ephem_batch: Saved sky coverage plot of satellite [25478] to ./embers_out/sat_utilsephem_plots/25478.png
Saved ephemeris of satellite [25478] to ./embers_out/sat_utilsephem_data/25478.npz
Also that the first plot embers_out/sat_utils/ephem_plots/21576.png looks like this:
And finally that the size of the ephem_data directory matches mine:
$ du -sh embers_out/sat_utils/ephem_data
>>> 2.2M embers_out/sat_utils/ephem_data
log file looks ok, but has a different order of the files downloaded.
The file has 143 lines (73 for files, 70 for successful ones)
the plot looks similar, though mine has 41 passes, yours has 5.
my du -sh embers_out/sat_utils/ephem_data
17M embers_out/sat_utils/ephem_data
e.g. the one you plotted, where I had 41 passes:
-rw-rw-r-- 1 teuben teuben 146855 Nov 19 08:26 21576.npz
It has 70 files (3 missing due to 0 length TLE files)
It's really strange that you have 41 satellite passes. Can you try this
$ ephem_single --sat=21576 --out_dir=tmp
$ tree tmp
tmp
├── ephem_data
│ └── 21576.npz
└── ephem_plots
└── 21576.png
The .npz data file should be only 21K in size. If the plot has more than 5 passes, it means that something is wrong with the TLE files. This would explain why ephem_chrono is making empty files.
It's a long shot, but can you check if the contents of the TLE file are the same as mine:
$ more embers_out/sat_utils/TLE/21576.txt
1 21576U 91050C 19282.72964883 .00000014 00000-0 18223-4 0 9990
2 21576 98.6203 331.5014 0002898 326.0593 34.0410 14.41572163482935
1 21576U 91050C 19283.90959667 +.00000017 +00000-0 +19271-4 0 9992
2 21576 098.6201 332.6919 0002839 321.6987 038.4000 14.41572293482883
ls -l tmp/ephem_/
-rw-rw-r-- 1 teuben teuben 146855 Nov 19 10:18 tmp/ephem_data/21576.npz
-rw-rw-r-- 1 teuben teuben 171585 Nov 19 10:18 tmp/ephem_plots/21576.png
so much bigger. The png shows 41 passes.
mine is clearly longer
more embers_out/sat_utils/TLE/21576.txt
1 21576U 91050C 19274.12296826 .00000022 00000-0 20647-4 0 9992
2 21576 98.6215 322.8165 0003370 355.9403 4.1757 14.41571172481698
1 21576U 91050C 19274.74764681 +.00000022 +00000-0 +20705-4 0 9994
2 21576 098.6214 323.4469 0003330 354.0023 006.1125 14.41571283481565
1 21576U 91050C 19275.64996028 .00000018 00000-0 19615-4 0 9999
2 21576 98.6213 324.3575 0003275 351.1025 9.0105 14.41571404481914
1 21576U 91050C 19276.69109109 .00000011 00000-0 17383-4 0 9994
2 21576 98.6211 325.4081 0003216 347.5805 12.5303 14.41571520482061
1 21576U 91050C 19277.73222188 .00000003 00000-0 14745-4 0 9993
2 21576 98.6209 326.4587 0003154 343.9674 16.1414 14.41571600482214
1 21576U 91050C 19278.70394390 -.00000000 +00000-0 +13856-4 0 9996
2 21576 098.6208 327.4393 0003103 340.6734 019.4337 14.41571694482138
1 21576U 91050C 19279.12039615 .00000001 00000-0 14345-4 0 9993
2 21576 98.6207 327.8595 0003082 339.1436 20.9627 14.41571750482416
1 21576U 91050C 19279.74507450 +.00000006 +00000-0 +15778-4 0 9994
2 21576 098.6207 328.4898 0003052 336.8952 023.2099 14.41571851482289
1 21576U 91050C 19280.36975290 .00000009 00000-0 16766-4 0 9996
2 21576 98.6206 329.1202 0003016 334.7424 25.3617 14.41571918482593
1 21576U 91050C 19280.71679643 +.00000012 +00000-0 +17538-4 0 9990
2 21576 098.6206 329.4703 0002999 333.5040 026.5995 14.41571968482421
1 21576U 91050C 19281.68851827 .00000014 00000-0 18394-4 0 9996
2 21576 98.6205 330.4508 0002950 329.8875 30.2144 14.41572084482783
1 21576U 91050C 19282.72964883 .00000014 00000-0 18223-4 0 9990
2 21576 98.6203 331.5014 0002898 326.0593 34.0410 14.41572163482935
The data you have doesn't match what's currently on github. I initially had too much data on the sample data repo and had to reduce it to make the examples run in a reasonable time.
Can you pull down the sample data repo again, or just download the TLE dir from the sample data repo?
The new TLE files should be sufficient to get ephem_chrono to work.
Can you check whether all the files in tiles_data have the time-stamp 2019-10-10 in them? If they do, you have the correct data and the wrong TLE files. If they don't, you'll unfortunately have to pull the whole sample data repo again.
Sorry for the hassle. Hopefully it's smooth sailing from here 🤞
I redid the data repo instructions. I now have a short embers_out/sat_utils/TLE/21576.txt file, with 4 lines.
The tiles_data are all (1440 I see) from 2019-10-10
I'm afraid that even with a new align_batch and ephem_batch, I still get a blank ephem_chrono.
I think I'm going to redo the whole workflow.
Maybe you could make a bash script with just the important stuff and let it run?
#!/bin/bash
align_batch --max_cores=1
ephem_batch --max_cores=1
ephem_chrono
Although I had done exactly those steps before, now it's producing non-zero json files. I see 48 json files, of which the first 16 have a good size, up until 2019-10-10-07:30. The ones from 2019-10-10-08:00.json to 2019-10-10-23:30.json are blank.
/ooo November 23 until November 30
If we can fix this last item, we're done with the review!