Citrics — Part 2

Samuel Swank
5 min read · Oct 22, 2020

The Story of the Web App That Helps Digital Nomads Find Their Next Home — Continued

Visualizations Shown in App for Locale, Salt Lake City, UT

Where We Left Off

In a previous article, I discussed the planning process and initial implementation of the Citrics web application. In particular, I related the need to clean up the API, removing superfluous routes so that it would be more readable for the current web development team and the team that will take over the project this November. I also related my idea to use route referencing and to cache in on the power of our PostgreSQL database, the former reducing the number of redundancies in our code, the latter reducing the number of calculations performed by the API.

Post-Implementation — A World of Difference

Cleaning Up the API Routes

I believe my reader will agree, after viewing the before-and-after screenshots of our API, that the current API is much cleaner.

Left: Before, Right: After

Note on the left that some routes are divided into viz and view types, such as /rent_viz/{city}_{statecode} and /rent_viz_view/{cityname}_{statecode}. In the original API, the former would return a JSON object representing the Plotly graph, and the latter would return an embedded .png image of the graph. Considering that the only key difference between the two was the return statement, at my suggestion, the data scientist on our team who wrote the script simply combined the two, with view added to the viz route as an additional optional parameter, ending the script with an if-else block like so:

# io and StreamingResponse (from fastapi.responses) are imported at the top of the script.
if view:
    # Render the Plotly figure as a static .png image and stream it back.
    img = fig.to_image(format="png")
    return StreamingResponse(
        io.BytesIO(img),
        media_type="image/png"
    )
else:
    # Otherwise return the figure as a JSON object for the front end to render.
    return fig.to_json()
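To make the change concrete, the combined route's signature might look something like the sketch below, with view exposed as an optional query parameter; the router and helper names here are assumptions rather than the team's exact code.

# A sketch of the combined route, assuming a FastAPI APIRouter named `router`;
# build_rent_figure is a hypothetical helper that returns a Plotly figure.
import io
from fastapi import APIRouter
from fastapi.responses import StreamingResponse

router = APIRouter()

@router.get("/rent_viz/{cityname}_{statecode}")
async def rent_viz(cityname: str, statecode: str, view: bool = False):
    fig = build_rent_figure(cityname, statecode)
    if view:
        # view=true returns the rendered .png instead of the Plotly JSON.
        img = fig.to_image(format="png")
        return StreamingResponse(io.BytesIO(img), media_type="image/png")
    return fig.to_json()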

Route Referencing

Implementing a system in which one route references another allowed us to prevent generating further clutter in the API and its associated directory. Under the previous system, if I had developed a

weather_pred/{city}_{state}

route, and associated

weather_pred/viz/{city}_{state}

and

weather_pred/viz/view/{city}_{state}

routes, I would have wound up placing each in a different file, with the latter two repeating the former's code and differing from one another by only a few lines and a return statement. Fortunately, thanks to FastAPI's elegant design, route referencing allows the three routes, now condensed into two as described above, to be stored in a single Python script as shown below. FastAPI does this by simply calling one route's function inside another route, preceded by await.

Route Referencing in weather_pred.py
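In outline, the pattern looks something like this; the route and helper names below are illustrative rather than the exact contents of weather_pred.py:

# A rough sketch of route referencing, assuming an APIRouter named `router`;
# the helper functions are hypothetical, not the project's actual code.
from fastapi import APIRouter

router = APIRouter()

@router.get("/weather_pred/{cityname}_{statecode}")
async def weather_pred(cityname: str, statecode: str):
    # Returns the cached predictions for the locale, computing them if needed.
    return fetch_or_compute_predictions(cityname, statecode)  # hypothetical helper

@router.get("/weather_pred/viz/{cityname}_{statecode}")
async def weather_pred_viz(cityname: str, statecode: str):
    # Reference the base route's function instead of repeating its code.
    predictions = await weather_pred(cityname, statecode)
    fig = build_prediction_figure(predictions)  # hypothetical Plotly helper
    return fig.to_json()  # or the optional .png view, as shown earlier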

Database Caching

Notice in the predictive route above how the PostgreSQL database connection is contained in an object stored in another file. This allows for the full versatility of the psycopg2 library while still maintaining the security of the database credentials. It also avoids the verbosity of explicitly declaring the connection in the weather_pred.py script.
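That connection object might be set up roughly as follows; the module name, class, and environment variable below are illustrative assumptions, not the project's actual code.

# db.py -- a sketch of keeping the psycopg2 connection in its own module so
# credentials stay out of the route scripts; the module, class, and
# environment variable names here are assumptions.
import os
import psycopg2

class Database:
    def __init__(self):
        # Credentials come from the environment rather than being hard-coded.
        self.connection = psycopg2.connect(os.environ["DATABASE_URL"])

    def query(self, sql, params=None):
        with self.connection.cursor() as cur:
            cur.execute(sql, params or ())
            return cur.fetchall()

    def execute(self, sql, params=None):
        with self.connection.cursor() as cur:
            cur.execute(sql, params or ())
        self.connection.commit()

db = Database()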

The predictive route shown above is implemented in the manner described in my previous article. The function first checks the database's weather_pred table to see whether predictions have already been calculated. If the database has predictions for the city, they are simply returned as-is, with no unnecessary calculation. If it does not, the predictions are calculated, cached, and returned to the user. An example visualization of this output is shown in the featured image above (see the bottom graph).
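In outline, that check-then-compute logic amounts to something like the following sketch, reusing the hypothetical db object from the previous sketch; the weather_pred columns and the modeling helper are assumptions, not the project's actual schema.

# A minimal sketch of the cache-or-compute pattern; table columns are assumed.
def fetch_or_compute_predictions(cityname: str, statecode: str):
    cached = db.query(
        "SELECT month, predicted_temp FROM weather_pred WHERE city = %s AND statecode = %s",
        (cityname, statecode),
    )
    if cached:
        # Predictions already exist, so return them without recomputing.
        return cached
    # Otherwise fit the time series model for this locale, cache the results...
    predictions = fit_and_forecast(cityname, statecode)  # hypothetical modeling helper
    for month, temp in predictions:
        db.execute(
            "INSERT INTO weather_pred (city, statecode, month, predicted_temp) VALUES (%s, %s, %s, %s)",
            (cityname, statecode, month, temp),
        )
    # ...and return them to the caller.
    return predictions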

Looking Forward

Potential Memory Issues

During the last two weeks of my tenure on this project, while writing another predictive script for population predictions based on Census Bureau data, we began to run into memory usage problems. Though we were never fully able to diagnose the problem, the fact that none of the models in any of the routes were pickled may have been a contributing factor. This is largely because, in working with time series data, rather than a single model per route, each model is unique to the locale for which predictions are being requested. A potential solution is shown in the flow chart below.

Pickling Plan

Essentially, each predictive route would be split into two separate routes. One route would handle the model pickling, assigning the .pkl file a name that identifies the locale and the time the predictions were made, for easy retrieval by the predictions route. The .pkl files would need to be stored on an online file hosting service due to their size and quantity. The API would then need to access that hosting service not only for storage, but also for retrieval in the predictions route, which would treat it much the way it treats the database.
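A rough sketch of the naming scheme is below, using plain pickle and a local file as a stand-in for the hosting service; the function name and filename format are assumptions.

# A sketch of pickling one locale's fitted model under a name that encodes the
# locale and the time the predictions were made; the filename format is assumed.
import pickle
from datetime import datetime, timezone

def pickle_model(model, cityname: str, statecode: str) -> str:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    filename = f"{cityname}_{statecode}_{stamp}.pkl"  # e.g. salt_lake_city_UT_20201022T153000.pkl
    with open(filename, "wb") as f:
        pickle.dump(model, f)
    # Under the proposed plan this file would be uploaded to an online file
    # hosting service, and the predictions route would fetch the most recent
    # file whose name matches the locale.
    return filename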

Data Sourcing

A major limitation of the API's current deployment is that the bulk of the data used in the project was taken from static .csv files, so the data the API serves quickly becomes outdated. To keep it current, a method would need to be devised involving the use of more third-party APIs. In some cases, as with the bls (Bureau of Labor Statistics) and census routes, the source organizations' own APIs already update these data periodically.

With dynamic data, however, we introduce greater complexity, especially with regard to time series modeling. One approach would be to treat each table or column in the PostgreSQL database as something of a ring buffer, wherein old information is periodically discarded and new information is inserted into the appropriate tables at the appropriate times, ensuring up-to-date predictions rather than stale ones based on old data.
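Sketched with psycopg2, with an assumed observations table and retention window, that periodic refresh might look something like this:

# A rough sketch of the ring-buffer idea: insert the newest observations for a
# city, then drop everything older than the retention window. The table name,
# columns, and window size are assumptions for illustration.
def refresh_observations(conn, city_id, new_rows, window_months=120):
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO weather_obs (city_id, observed_at, value) VALUES (%s, %s, %s)",
            [(city_id, observed_at, value) for observed_at, value in new_rows],
        )
        # Keep only the most recent `window_months` rows for this city.
        cur.execute(
            """
            DELETE FROM weather_obs
            WHERE city_id = %s
              AND observed_at NOT IN (
                  SELECT observed_at FROM weather_obs
                  WHERE city_id = %s
                  ORDER BY observed_at DESC
                  LIMIT %s
              )
            """,
            (city_id, city_id, window_months),
        )
    conn.commit()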
