January 29, 2020

Simple ETL with bash, jq, and SQLite

I run a couple simple ETL pipelines for side projects, and after experimenting with Lambda to kick off Python jobs (and finding it frustrating), I've settled on simple bash scripts (along with the awesome jq util) run via cron on a DigitalOcean instance. Not only is it more stable, but I've found it to be much faster. Recently, a thread on HN convinced me to try adding SQLite to the mix, and I've been happy with the results.

For my Twitter timeline scrape, I run a script every five minutes that scrapes the last 200 timeline tweets (via twurl), converts the JSON into a CSV, and imports that CSV into a SQLite db. A unique key on the id col prevents duplicates, and so far the setup has proven solid.

The trick is using jq's @csv filter (along with the -r flag to not escape output as JSON) to output the tweets to CSV:

echo 'id,created_at,user_id,screen_name,is_retweet,text' > tweets.csv
jq -r '.[] | [.id, .created_at, .user.id, .user.screen_name, .retweeted_status != null, .text] | @csv' data/tweets.json >> tweets.csv

Then all that remains is to import the CSV into your SQLite db:

sqlite3 tweets.sqlite -cmd '.mode csv' '.import tweets.csv tweets'

You can pass a single command to sqlite3's CLI, but when you need to chain multiple commands (as importing from CSV requires .mode csv), use the -cmd flag for each of the first N-1 commands (listed commands will be executed in order).

The HN thread gives some tips (recommended SQLite pragmas, etc.) on how to use this with larger pipelines, and I look forward to using this with my transit scraper (1000s of vehicle locations updating every minute).

November 29, 2018

Facebook's responsibility in an election crisis

PBS Frontline interviewed Alex Stamos, ex-Facebook chief security officer, on the company's response in wake of 2016 election. Some interesting quotes:

[One] of my big fears is that we’re going to see other U.S. adversaries—Iran, North Korea, China—jump into the information warfare space in 2018, and especially in 2020.

Prescient statement, as this interview was conducted in September, a month before the intelligence community warned of Russian, Iranian, and Chinese interference leading up to the midterms.

Do you think Facebook has earned the trust to be able to say, “Trust us; we’ve got this”?

I’m not going to answer that. I’m sorry—that’s just—everybody can make that decision for themselves.

To me, that reads "no".

If the GRU pulled the same playbook in 2018, if right now WikiLeaks came out with the email inboxes of the five most vulnerable Democratic Senate candidates, nothing would be different in 2018 than it was in 2016, and I think we’ve got to start to think about when that happens, because that is very much a possibility if not in the midterms, in the presidential election.

Stamos makes the point that government regulation is the answer, but as Zuckerberg had to explain the internet to our aged Senators, I wouldn't wait on government to take the lead. Tech companies understand their own platforms better than anyone else; as such, they are the most capable of understanding the consequences of under-moderation. If you build a platform, the burden of moderation falls upon you.

August 29, 2016

Defining page titles in react-router route config

I’ve been using react-helmet to set page titles in Nomos, but it’s felt weird to me to litter my JSX templates with tags. Thinking it would be cleaner (and would allow me to re-use titles, e.g. for breadcrumbs) if I pulled the page titles into the route config, I created a higher-order component that I now just wrap my container components in.

I use this by defining a title on my route’s config, which can be either a string or a function (which receives the component’s props as an argument). Something like:

<Route
  path='/documents/:documentId'
  component={DocumentContainer}
  title={props => props.documentName}
/>
March 25, 2016

Combining iOS SMS message histories from multiple devices

A couple months back, I gave up on my iPhone 6 (too unwieldy) and began using my old iPhone 5s. Regrettably, I didn’t restore the 5s from a backup, so its SMS history began as a clean slate. Two months later, in preparation for the timely iPhone SE release, I find myself wanting to merge the two SMS histories. Unfortunately, no such way existed, officially or otherwise... until now.

With some Ruby/SQLite hackery, I managed to create ios-sms-merge - you’ll need to download the kludgy iBackupBot to import/export from your iOS backups, but with persistence and this script, you can now combine your SMS histories.

December 13, 2015

Using Slack at home

After reading about Facebook Messenger’s complete disregard for user privacy, I moved communication with the girlfriend over to Slack. We are apparently not the first to do so.

We’ve been happy with the experience. Search, totally absent from FB Messenger, is great. Multi-channeled communication is quite useful for home communication (we separate messages by topic: #home, #groceries, or just #general chat). Additionally, we’ve spun up a chatbot to manage shared lists (e.g., groceries) and tell us when our morning bus is arriving.

It would be great to have more contextual awareness in the Slack app, however. Location, especially, would be nice to have for more relevant chatbot responses. “Where’s my bus?” is a lot more natural than “Where’s the 28 southbound on 34th street?”

I’ve shared my hubot scripts at https://github.com/licyeus/home-hubot.

All content, unless otherwise noted, is licensed under a Creative Commons Attribution 4.0 International License.