Almanac.httparchive.org: Build the development environment to maintain the Almanac

Created on 23 May 2019 · 21 Comments · Source: HTTPArchive/almanac.httparchive.org

AI(@HTTPArchive/developers): Design the tech stack for the Almanac.

The Almanac user experience will be entirely static and stateless, so a solution as simple as GitHub Pages could work. I think we want a bit more control over the backend (response headers, SSR templates), so I'm leaning towards a setup similar to https://github.com/HTTPArchive/httparchive.org, in which we build on App Engine. Thoughts?

TO DO:

  • [x] Create Python App Engine development environment with Flask
  • [x] Create an initial base template, styles
  • [x] Extend the base template to create a temporary splash page to be deployed on almanac.httparchive.org while the project is under construction

All 21 comments

I'm not very familiar with Python, but I'll do my best to pick it up quickly. I think the first step (for me, at least) would be to get HTTP Archive running locally and kick the tires, just to make sure I understand the reference application. I agree that we would definitely want greater control over the backend; I don't know how you would do this without it. I also agree with server-side rendering as much as possible.

If I can ask a couple of questions about the front end and UX: firstly, is using something like Gatsby out of the question? Secondly, for some of the dynamic parts of the application (I'm just assuming there will be some), would it be possible to use something lightweight (like lit-html / lit-element) that has good SVG rendering support?

I see that Highcharts was used in HTTP Archive. Is the idea to use that again? Will users of the Almanac be able to interrogate (filter, etc.) the data, or will it be more static, infographic-style visualisation?

(Sorry, there were more than two questions. I'll probably have more. )

These are all great questions, keep them coming.

Firstly, Is using something like gatsby out of the question?

I'm not ruling anything out. I also realize that the development team will have diverse experiences so I do want to pick a tech stack that allows everyone to iterate quickly and easily. The Python stack is just one I've used before and I'm comfortable with. If we go with something outside of my comfort zone I'll have to rely on others to take the lead.

Secondly, for some of the dynamic parts of the application (I'm just assuming there will be some), would it be possible to use something lightweight (like lit-html / lit-element) that has good SVG rendering support?
Will users of the almanac be able to interrogate (filter etc) the data, or to have more static / infographic / visualisation type data.

Yeah the data visualization will be somewhat interactive. It still remains to be seen what that might look/feel like. I do think SVG is one of our best options here, although I don't have much experience building SVG charts/visualizations from scratch. I'll have to look into lit-html for this. What's your experience like with it?

I see that highcharts was used in HttpArchive, is the idea to use that again?

We're not tied to it and I suspect that it might be overkill for many of our use cases. For example, one stat might be "99% of websites use Foobar". In this case the visualization might just be really big text that says "99%". For the more complex data/chart types (e.g. a pie chart), I wouldn't want us to reinvent the wheel to get it visualized, but I do appreciate having lightweight, custom-built components.
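As a rough illustration of the "really big text" idea, a stat like this could be rendered on the server as a small inline SVG, with no client-side JavaScript involved. This is only a sketch: the helper name `render_stat_svg`, its sizes, and its markup are hypothetical and do not come from the HTTP Archive codebase.

```python
from html import escape


def render_stat_svg(value, caption, width=320, height=160):
    """Return an SVG string showing one large value above a short caption.

    Hypothetical helper for illustration; sizes and markup are assumptions.
    """
    # Escape both strings so arbitrary text cannot inject markup.
    safe_value = escape(value)
    safe_caption = escape(caption)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" role="img" '
        f'viewBox="0 0 {width} {height}" width="{width}" height="{height}">'
        f'<title>{safe_value} {safe_caption}</title>'
        f'<text x="50%" y="55%" text-anchor="middle" font-size="72">'
        f'{safe_value}</text>'
        f'<text x="50%" y="85%" text-anchor="middle" font-size="16">'
        f'{safe_caption}</text>'
        '</svg>'
    )


svg = render_stat_svg("99%", "of websites use Foobar")
```

The resulting string could be dropped straight into a server-rendered template, which keeps simple stats free of any charting library.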

I think it’s MUCH more important that you’re comfortable with the solution. Besides, I now have a reason to learn python properly...

I’m pretty comfortable with lit-html (using it every day), and unless I’m mistaken @tjmonsi is too. It’s really nice at doing simple visualizations / composing / generating svg. If you need something more complicated then highcharts or d3 would probably be better than doing it by hand.

This would only be in the case that the UX would require it. If it can be done server side / statically, it probably should?

Ok cool, in that case it's a good idea for you to try to get the httparchive.org site running locally. I could even find some good first issues for you to work on to get comfortable with it.

This would only be in the case that the UX would require it. If it can be done server side / statically, it probably should?

Yes, I want to do as little as possible on the client side for simplicity and performance.

@rviscomi I'm happy to help on this. I'm starting to use Python on my country dashboard and it might be related to this one.

My current setup:

  • python to generate the data in CSV and JSON then upload it to Google Storage
  • hugo for static website to serve the data and visualize the data.
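The data-generation half of a setup like that could look roughly like the sketch below, which serializes the same rows to CSV and JSON using only the standard library. The sample rows and the function names `to_csv` and `to_json` are made up for illustration, and the upload to Google Storage is deliberately left out.

```python
import csv
import io
import json


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows to indented JSON text."""
    return json.dumps(rows, indent=2, sort_keys=True)


# Made-up sample rows; not real measurements.
rows = [
    {"country": "AA", "sites": 10},
    {"country": "BB", "sites": 25},
]
csv_text = to_csv(rows, ["country", "sites"])
json_text = to_json(rows)
```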

Sounds great, glad to have you on board @tyohan! The Python setup in our case will be mostly for serving static content, using Flask. Would be great to exchange ideas.

Maybe we could use Vue for this? We could combine it with ES6 Modules. Therefore we could keep the website static but interactive, and not have to add Node to the mix.

My familiarity with Vue is as a client side rendering library. Is that accurate? I do like it but I think a static site like this would be better to build server side for performance.

@arswaw are you interested in joining the development team? 🤞

Would we need a server were it not for SSR?

I don't know yet if I will be available to join the team.

I think a server would be desirable for having control over things like cache policies, CSP, and redirects.
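For anyone wondering what that control might look like in Flask, here is a minimal sketch. The header values, the `/latest` URL, and the function names are placeholders assumed for illustration, not the policies the project settled on.

```python
from flask import Flask, redirect

app = Flask(__name__)


@app.after_request
def add_response_headers(response):
    # Placeholder policies, assumed for illustration only.
    # setdefault leaves any header a view has already set untouched.
    response.headers.setdefault("Cache-Control", "public, max-age=600")
    response.headers.setdefault("Content-Security-Policy", "default-src 'self'")
    return response


@app.route("/")
def index():
    return "Hello World"


@app.route("/latest")  # hypothetical URL
def latest():
    # A server-side redirect, one of the things a plain static host
    # may not let us configure.
    return redirect("/", code=302)
```

The `after_request` hook runs for every response, including the redirect, so the policies live in one place.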

AWS Cloudfront could do that without the need to manage a server.

I'm only concerned about adding more complexity than is needed.

I like the idea of a static site generator such as Gatsby. If we used gatsby-mdx - authors could write in markdown and we could embed react components for interactive components as needed. This could be a really performant solution and allow for easy editing and development, after the initial setup. What do you guys think?

That said, it may be premature to figure out our solution top to bottom without knowing our content or design. :)

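+1 to authoring in markdown. The only complication I can think of now is support for the data viz. I've had success with Flask-Markdown (https://pythonhosted.org/Flask-Markdown/) before, which is how we render the FAQ page (https://httparchive.org/faq) from the faq.md file (https://github.com/HTTPArchive/httparchive.org/blob/master/docs/faq.md).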

Not sure about React components though. I'm looking to do as little on the client as possible.

I'm inclined to reuse the same stack as the httparchive.org repo (App Engine, Flask) because a lot of the setup will be the same and anyone familiar with one repo will be able to help in the other. I'm not saying it's the simplest solution but I'm comfortable enough with it that I can be more useful ramping others up. It's also really nice that we could reuse our existing GCP billing account. (paying for things is hard when you don't have an org credit card!)

I would also like a Flask / App Engine site with as little JS as possible. I remember Paul Lewis did something similar with the Chrome Dev Summit site, which uses a small script to load HTML from the Flask / App Engine backend and swap the page content so that it feels like a single-page app. If JS is off, it falls back to loading full pages from the server. It didn't need any clunky library or framework.

I think it would be good to audit the site's interactivity so that we can minimize the JS we load (only the reusable interactive components need to have JS).


Thanks for the discussion everyone. I've created a Hello World application built with Flask in the src directory.

@HTTPArchive/developers please make sure you can run it locally using the instructions in the readme and let me know if you run into any issues.

The next step will be to create some basic templates to deploy live as a "coming soon" splash screen.

I can confirm that I can get it running. Not sure if python version matters, but I had to use pip3 instead of pip (pretty sure that's a 'me' problem)?

Another (really trivial) thing, but the paths in the README.md file assume that it is in the repo's root, which at the moment it isn't. So commands like pip install -r src/requirements.txt should be pip install -r requirements.txt.

~A few~ Many questions:

  • Can I fix the paths?
  • Do you want us to PR or can we just push to master (not sure about the way-of-working just yet)?
  • Do you have any idea of what you'd like on the 'coming soon' page that I can run with?
  • Is it reasonable to assume that we'll want to at minimum have a route for each chapter, so can we go ahead and create those so long?
  • Do you want us to investigate ways of merging MD authoring with the potential data-vis requirements?
  • How are we going to track who does what? How would you prefer we work? Via issues?

Sorry for the bombardment! :)

Great questions. For now since it's a very minor change go ahead and push
the path fix to master. I'll get back to you soon about the rest.


I've created a team wiki page where we can memorialize some of these FAQs. Feel free to help move them over 😄

I can confirm that I can get it running. Not sure if python version matters, but I had to use pip3 instead of pip (pretty sure that's a 'me' problem)?

Here are my versions:

(env) $ python --version
Python 2.7.12
(env) $ pip --version
pip 19.0.3

GVR just tweeted a reminder about Python 2 reaching end-of-life next year, so it would probably be wise to rely on Python 3.

  • Do you want us to PR or can we just push to master (not sure about the way-of-working just yet)?

Incremental improvements to the README/documentation can be pushed directly to master. Structural changes to the app itself should go through PRs and have at least one LGTM from any member of the dev team. Architectural decisions should be made in issues like this one, so we have alignment before code is written.

  • Do you have any idea of what you'd like on the 'coming soon' page that I can run with?

Something basic like this:

Web Almanac 2019

Coming Soon
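A minimal sketch of how a splash page like that could extend a base template in Flask. The templates are held in memory via Jinja's `DictLoader` purely to keep the example self-contained; a real app would more likely keep them as files in a templates directory. The template names and markup are assumptions.

```python
from flask import Flask, render_template
from jinja2 import DictLoader

app = Flask(__name__)

# In-memory templates keep the sketch self-contained.
# Template names and markup are assumptions for illustration.
app.jinja_loader = DictLoader({
    "base.html": (
        "<!doctype html>\n"
        "<html lang=\"en\">\n"
        "<head><title>{% block title %}Web Almanac{% endblock %}</title></head>\n"
        "<body><main>{% block content %}{% endblock %}</main></body>\n"
        "</html>"
    ),
    "splash.html": (
        "{% extends 'base.html' %}\n"
        "{% block title %}Web Almanac 2019{% endblock %}\n"
        "{% block content %}"
        "<h1>Web Almanac 2019</h1><p>Coming Soon</p>"
        "{% endblock %}"
    ),
})


@app.route("/")
def splash():
    # Renders the splash page inside the shared base layout.
    return render_template("splash.html")
```

Later pages could extend the same `base.html` and override only the blocks they need.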

  • Is it reasonable to assume that we'll want to at minimum have a route for each chapter, so can we go ahead and create those so long?

I think chapters would share a single route. Similar to how HTTP Archive reports have a single report route. For example:

@app.route('/<year>/<chapter_id>')

This would route URLs like https://almanac.httparchive.org/2019/javascript.

It might be too early to start building the routes if we don't have anything to render yet. We could manually return a 404.
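To make the single-route idea concrete, here is a small sketch of a shared chapter route that returns a 404 for anything not yet available. The `PUBLISHED` registry and the placeholder response text are hypothetical; only the route pattern and the example URL come from the comment above.

```python
from flask import Flask, abort

app = Flask(__name__)

# Hypothetical registry of chapters that are ready to render.
# Anything missing from it gets a 404.
PUBLISHED = {"2019": {"javascript"}}


@app.route("/<year>/<chapter_id>")
def chapter(year, chapter_id):
    if chapter_id not in PUBLISHED.get(year, set()):
        abort(404)
    # Placeholder body; a real view would render a chapter template.
    return f"Chapter {chapter_id} ({year})"
```

With this shape, adding a chapter means adding data rather than adding a new route.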

  • Do you want us to investigate ways of merging MD authoring with the potential data-vis requirements?

Yes, that would be a big help!

  • How are we going to track who does what? How would you prefer we work? Via issues?

Yes, we'll use issues for each step of the implementation and assign a developer to each one. So for example, this issue has a todo list of 3 items. We could create sub-issues to track each remaining item.

Sorry for the bombardment! :)

Keep them coming! 😄

The Hello World application worked for me!

FYI, this is my first really collaborative GitHub project with other people, so I may be asking lots of questions as I try to figure stuff out.

@KJLarson great to hear it's working for you! Getting a new project set up is usually the biggest hurdle, so it should be smooth sailing from here.

Everyone here is coming from various backgrounds and experience levels, so chances are there's going to be someone else with the exact same question as you. Don't hesitate to ask about anything and I know we'd all be glad to answer.

The coming soon page has been deployed at https://almanac.httparchive.org. Thanks @mikegeyser! This issue can be closed now. I'll be filing new issues for all other development team tasks.
