•••
Update (June 2019): Much of this post remains true, and is still relevant ~six months later (slow IO perf, WSL feeling like a shim), but there are some major improvements just around the corner.
Specifically, with WSL2 moving to a VM-based architecture, a lot of the perf woes are scheduled to disappear. With VS Code’s Remote extension, the “two halves of the same system” problem - where you have to duplicate your toolchain - is effectively gone (this has been my favorite improvement so far, by a long shot). On the terminal front, we’re almost there: Alacritty still (unfortunately) struggles with Unicode glyph rendering on Windows, but Microsoft has open-sourced their own Windows Terminal, and it’s actually really good, even in this preview state.
I’d say that, six months after writing this post, that WSL (as it exists in June 2019) is not a replacement for every dev environment just yet. But there’s been meaningful steps to make it better, and I’m fighting the “shim” less and less now with WSL2 & the remote extension. macOS is still likely the best ‘default’ choice for many, but it’s good to have options.
It’s been ~5 months since I’ve used macOS proper, after 13+ years of personal use and a handful of work-use. This began when I started using my Windows “gaming” desktop & WSL (Windows Subsystem for Linux) for maintaining OSS projects & other dev-work—in-between dungeons or rounds of Overwatch—purely out of the convenience of being on the same machine.
It came to a head when I realized my 12” MacBook was collecting dust, that I wasn’t using it at work (ChromeOS + Crostini), and when I saw the Surface Pro 6 on sale. I decidd to see if I could live with WSL closer to full-time, and critically, go without macOS. And so I put it up on Craigslist, sold it that weekend, and unpacked the Surface Pro a week later.
I did it partially as an experiment: Windows has been seen some significant improvements as an OSS development over the last couple of years. Could I use it for writing Go, [an increasing amount of] data science / SQL / ML explorations, and testing new cloud infrastructure? Could it really compete with the macOS developer experience, which although not perfect, is pretty darned good? I figured it wouldn’t hurt to try out, seeing as I was most of the way there already: and I figured it’d be a worthwhile process to document for other developers curious about WSL.
If you’re considering the switch, or are just curious as to what it’s like—including how WSL integrates with Windows, what tool choices you have, and importantly, what you’re going to miss from macOS—then read on.
Side-note: I wrote a short guide around my original WSL-based setup a while ago. Some of this article revises the tool choices I made at the time; the rest of it talks around the general Windows + WSL-experience and how it compares to macOS.
“The Shim”
In short: you effectively have “1.5” computers to deal with, and it feels like it at times.
Linux & Windows co-exist via the WSL layer, and although it’s generally pretty great (if not technically impressive), there are parts where the facade peels back to reveal some less-than-great interactions.
Jessie Frazelle wrote a great post on how WSL internals work (Windows <-> Linux syscall translation), and touches on some of the challenges I speak to below.
The first, and most obvious, is the way the filesystems interact. You can write to Windows from WSL - e.g. /mnt/c/Users/Matt/Dropbox/
writes to my Dropbox and works as expected, but you can’t read/write files from Windows -> WSL. Thus, accessing Windows from WSL is the “happy” path: anything you download via Chrome, in your Dropbox, on an external drive, etc - is accessible via /mnt/<driveletter>
. It’s when you’ve cloned a git repo, use wget/curl -O
to pull something down, or are iterating on a $language package in WSL and want to use a Windows-native tool that you’re destined to shuffle things around. I’ve symlinked my core working folders back into the Windows filesystem to make this part a little more livable - e.g. ln -s $USERPROFILE/repos $HOME/repos
.
You notice this filesystem gap the most when dealing with Windows-native editors but WSL-based toolchains: in my case, that’s VS Code on Windows and the Go toolchain inside WSL. VS Code doesn’t know how to look for your toolchain & packages inside WSL, and so you either need to live inside of Windows (losing your Linux tooling), install VS Code inside of WSL, which means losing the ability to open files outside of WSL + native Windows integration. The ‘partial’ solution is to use a shared $GOPATH
within the Windows filesystem, which at least means your packages only need to be fetched once, but you’ll need to be wary of potential differences should a package change implementation across OS’ (inc. the standard lib!). This is far less of a solution for systems programmers. There’s an open issue for this as it relates to vscode-go, but it still speaks to the “1.5 computers” problem I mentioned earlier.
Overall? It’s usable, you learn to live with it, but it adds friction to my day-to-day.
Terminal Emulators
I’ve bounced between a few terminal emulators here. None are perfect, and all of them make me yearn for iTerm2 on macOS. I wish it was better.
The situation is improving though, and with the ConPTY API in the October 2018 Windows 10 build (1809) making it much easier to integrate existing terminal emulators, it can only improve.
What I’ve tried so far:
- Cmder (ConEmu): fast & configurable, but poor Unicode support, tmux glitches & some emulation/escaping issues. Some improvements coming via ConPTY.
- Hyper.js: Cross-platform due to Electron underpinnings, lots of third-party plugins. Same underlying emulator as VS Code (xterm.js), but tends to be very slow launch, spawn new shells, and doesn’t keep up with lots of terminal output. I used Hyper for most of this year because despite the perf issues, it was the least buggy.
- wsltty (Mintty): Fast. Moderately configurable, but config DSL is a pain & docs are lacking. Not a bad option for most, and is the only one with mouse support for tmux out-of-the-box.
- Terminus: Similar to Hyper.js in that’s it’s Electron-based, but faster, and easier to configure. Good font rendering, doesn’t break under tmux, and has a solid tab UI. It’s still innately limited to its Electron roots in that it can be slow to launch, but handles high velocity output much better than Hyper.
- Alacritty: A (very) fast, minimalist cross-OS emulator with a well-documented configuration. Windows support relies on winpty-agent, and font rendering (esp. Unicode fallback) is far from perfect. There is upcoming support for the aforementioned ConPTY API is in the works, and font changes coming.
I’m using Terminus for now, but I’m hopeful about Alacritty becoming my default terminal by end of year. Terminus is “good enough despite the bugs”, which has been a good way to sum up how most tools work under WSL.
Automation & Package Management
There were (are) myriad ways to bootstrap a new Mac: usually some combination of Homebrew, a shell script calling defaults write
to set preferences, and installation of your dotfiles. Certainly, there are ways to do this on Windows—but something lightweight that doesn’t involve directly hacking at registry keys via PowerShell and has a solid community to crib from has been historically lacking.
Thankfully, there are ways to do this on Windows now: both the OS-level configuration as well as desktop package management (via Chocolatey). The answer is Boxstarter, which is a wrapper around Chocolatey itself, as well as a number of convenience functions for modifying Windows Explorer settings, enabling WSL, and removing the (honestly pretty horrible amount of) bundled applications that Windows comes with. Why does my first-party Microsoft hardware comes with a FitBit app and Candy Crush? (rhetorical; it’s $$$).
Here’s a snippet of what my Boxstarter script looks like:
# Pre
Disable-UAC
# Set PC name
$computername = "junior"
if ($env:computername -ne $computername) {
Rename-Computer -NewName $computername
}
# Set DNS upstreams
Set-DNSClientServerAddress -InterfaceIndex $(Get-NetAdapter | Where-object {$_.Name -like "*Wi-Fi*" } | Select-Object -ExpandProperty InterfaceIndex) -ServerAddresses "8.8.8.8", "1.1.1.1", "2001:4860:4860::8888", "2001:4860:4860::8844"
# Set environment variables
setx GOPATH "$env:USERPROFILE\go"
setx WSLENV "$env:WSLENV`:GOPATH/p:USERPROFILE/p"
# Install applications
choco install -y sysinternals
choco install -y vscode
choco install -y googlechrome.dev
choco install -y 1password
choco install -y docker-for-windows
choco install -y cmdermini
choco install -y discord
choco install -y spotify
choco install -y dropbox
choco install -y adobereader
choco install -y 7zip.install
choco install -y firacode
# WSL
choco install -y Microsoft-Hyper-V-All -source windowsFeatures
choco install -y Microsoft-Windows-Subsystem-Linux -source windowsfeatures
Invoke-WebRequest -Uri https://aka.ms/wsl-ubuntu-1804 -OutFile ~/Ubuntu.appx -UseBasicParsing
Add-AppxPackage -Path ~/Ubuntu.appx
RefreshEnv
Ubuntu1804 install --root
Ubuntu1804 run apt update
Ubuntu1804 run apt upgrade
# System-level configuration
Disable-BingSearch
Disable-GameBarTips
Set-WindowsExplorerOptions -EnableShowHiddenFilesFoldersDrives -EnableShowProtectedOSFiles -EnableShowFileExtensions
Set-TaskbarOptions -Size Small -Dock Bottom -Combine Full -Lock
Set-TaskbarOptions -Size Small -Dock Bottom -Combine Full -AlwaysShowIconsOn
You’ll still going to need to write some PowerShell for more advanced things (i.e. setting DNS servers), but you might also consider that a blessing, given it’s power.
Within WSL I’m using Linuxbrew, a fork of Homebrew (and which is on-track to merge with it) in cases where I need more cutting-edge packages beyond the Ubuntu repositories. Using the same brew install
workflow as I’m used to on macOS is pretty nice, and makes it a friendlier development environment without having to add package-specific repositories or build from source.
Docker
Not much has changed from last time: it works, with a few minor problems.
The docker
CLI inside WSL can talk to Docker for Windows (the daemon), so you get Hyper-V benefits there. The catch is that the CLI doesn’t know how to validate the certificates used by the daemon, and thus you either need to disable TLS for connections over localhost (bad), or do a cert-generation dance and edit the Docker for Window config file by hand to use these new certs. It’d be great if the Docker daemon did this for you, so you could just set DOCKER_CERT_PATH=/mnt/c/ProgramData/Docker/pki
and have things work securely.
As a reminder, you don’t get Hyper-V support without Windows Pro, which impacts both Linux Containers on Windows and Windows Containers on Windows (unless you want to use VirtualBox).
What I Miss
I miss FileVault and Apple’s push towards securing the device, especially with their recent Secure Enclave-based improvements: a benefit of verticalizing, really. Windows’ BitLocker continues to be untrustworthy, and I’d be far more worried about a lost Windows machine vs. a lost macOS machine. BitLocker is also awkwardly positioned as a Windows 10 Pro only feature, which in 2018, is very much the wrong thing to nickle-and-dime users over. It’s frustrating to buy a Surface Pro and then have to dole out $99 for the Windows Pro upgrade.
macOS’ community of power-user tooling is also unsurpassed: the aforementioned Alfred App as a powerful search tool, great screen-capture tools, Preview.app (the Windows PDF editor landscape is not good), Quick Look, some fantastic design tools, Automator (still good!), easy keyboard shortcut customization (no RegEdit or third-party tools), consistent keyboard shortcuts, upper quartile battery life due to tight software-hardware integration, and a single filesystem no matter whether you’re in a Cocoa app on macOS or a cross-compiled GNU tool inside iTerm2. There’s room for improvement here in both Windows-itself & WSL-land, but much of it is around developer community, and that’s a hard win.
I also want to say that I don’t share the “macOS” is dead sentiment that others do, and that hasn’t been the driver for the change. It’s just that some alternatives have finally started to close the gap, both in terms of software experience & hardware quality/support, and I was in the position to experiment with them.
Why Not All-In on Linux?
I’ll keep this short: I still depend on Lightroom, writing tools (Notion, Evernote prior), a solid default desktop environment, first-party hardware support (be it a MacBook or Surface) & battery life, and most of all, my time. I respect those who’ve invested the time into maintaining & automating a full Linux environment they can use daily, but I just don’t have the time for that investment nor am I ready to make the trade-offs required for it. To each their own.
So, Are You Going to Stick with WSL?
Before I answer: I’d love to see a few things improve, and although I think they will, some improvements will be challenging given that the WSL and Windows environments are distinct. Specificallly:
- Better interaction between filesystems; if I could access my WSL root partition via a (default, NFS) mount in Windows, then I’d have access both ways. Something like
//wsl/
or //linux
would be fantastic. For contrast, the Linux container environment within ChromeOS (“Crostini”) exposes your files into the native ChromeOS environment, and thus makes working on data across both OS’ a less disruptive process.
- Improved VS Code interactions with WSL-based tools: pointing at language servers and file paths within the WSL environment would be key to this
- A continued march towards a solid terminal emulator or two; I’m hopeful here thanks to the ConPTY changes. Microsoft contributing resources here would likely benefit the viability of WSL.
So, am I going to continue to use WSL as a dev environment?
The answer is a (reserved) yes, because most of the dev-work I do in it is OSS, exploratory or web-based, with tools that I mostly control. If I’d been dealing with the heavily Dockerized environment at my old job, and writing/debugging lots of Lua, the answer might be closer to “no”.
WSL needs another six months of tools development (ConPTY being core to that), and although I’d thought that 6+ months ago, and had hoped the experience would be a little more polished now, at least Microsoft has continued to invest resources into it. I’m not quite convinced that a Linux toolchain makes my life easier than the Darwin-based one in macOS, but here I am.
Still, try asking me again in another 6 months?
•••
FiveThityEight recently released a dataset of what is believed to be ~3 million tweets associated with “Russian trolls”. These tweets are designed to spread misinformation (let’s not mince words: lies), and ultimately influence voters. If you haven’t read the linked article, I highly suggest you do that before continuing on.
Exploring a ~700MB+ CSV file isn’t hugely practical (it’s since been sharded into < 100MB chunks), and so I’ve made the tweets available as a public dataset via Google’s BigQuery analytics engine. BigQuery has a sizeable free tier of 1TB per month, which should allow a fair bit of exploration, even if you’re a student or if paid services present a challenge for you.
Note: This isn’t a BigQuery & SQL tutorial: for that, take a look at the documentation.
If you’re already familiar with BigQuery & accessing public datasets, then you can simply run the below to start exploring the data:
#standardSQL
SELECT
author,
COUNT(*) AS tweets,
followers
FROM
`silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY
author,
followers
ORDER BY
tweets DESC,
followers DESC
For everyone else: read on.
Accessing the Dataset
We’re going to use the BigQuery web UI, so navigate to the BigQuery interface and select the project you want to access it from. You’ll see the fivethirtyeight_russian_troll_tweets
table appear on the left-hand-side, in the Resource tab. From there, you can inspect the table russian_troll_tweets
, look at the schema (also pasted below), and see a preview of the data.
name |
type |
mode |
external_author_id |
FLOAT |
NULLABLE |
author |
STRING |
NULLABLE |
content |
STRING |
NULLABLE |
region |
STRING |
NULLABLE |
language |
STRING |
NULLABLE |
publish_date |
TIMESTAMP |
NULLABLE |
harvested_date |
TIMESTAMP |
NULLABLE |
following |
INTEGER |
NULLABLE |
followers |
INTEGER |
NULLABLE |
updates |
INTEGER |
NULLABLE |
post_type |
STRING |
NULLABLE |
account_type |
STRING |
NULLABLE |
new_june_2018 |
INTEGER |
NULLABLE |
retweet |
INTEGER |
NULLABLE |
account_category |
STRING |
NULLABLE |
So with the data above, what can we do? We can look at how these tweets were amplified (updates), what language the tweet was posted in (what audience was it for?), and the direct audience of the account (followers). We don’t get details on the followers themselves however, which makes it hard to know how impactful the reach was: is it trolls/bots followed by other trolls, or members of the general Twitter populace?
Analyzing It
OK, let’s take a quick look at the data to get you thinking about it. We’ll answer:
- Was there a specific account with a non-negligible fraction of tweets?
- Which months saw the most activity?
- Which tweets were the most amplified in each language?
-- Was there a specific account with a non-negligible fraction of tweets?
SELECT
author,
COUNT(*) AS count,
FORMAT("%.2f", COUNT(*) / (
SELECT
COUNT(*)
FROM
`silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`) * 100) AS percent
FROM
`silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY
author
ORDER BY
percent DESC
LIMIT
10
The EXQUOTE
account was definitely a sizeable contributor, although there’s not an order-of-magnitude difference across the top 10.
author |
count |
percent |
EXQUOTE |
59652 |
2.01 |
SCREAMYMONKEY |
44041 |
1.48 |
WORLDNEWSPOLI |
36974 |
1.24 |
AMELIEBALDWIN |
35371 |
1.19 |
TODAYPITTSBURGH |
33602 |
1.13 |
SPECIALAFFAIR |
32588 |
1.10 |
SEATTLE_POST |
30800 |
1.04 |
FINDDIET |
29038 |
0.98 |
KANSASDAILYNEWS |
28890 |
0.97 |
ROOMOFRUMOR |
28360 |
0.95 |
-- Which months saw the most activity?
SELECT
FORMAT("%d-%d", EXTRACT(month
FROM
publish_date), EXTRACT(year
FROM
publish_date) ) AS date,
COUNT(*) AS count
FROM
`silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY
date
ORDER BY
count DESC
LIMIT
10
Unsuprisingly here, we see October 2016 (just prior to the election on Nov 8th) feature prominently, as well August 2017, in which the North Korean conversation escalated immensely.
date |
count |
8-2017 |
191528 |
12-2016 |
155560 |
10-2016 |
152115 |
7-2015 |
145504 |
4-2017 |
136013 |
1-2017 |
135811 |
11-2015 |
132306 |
3-2017 |
128483 |
11-2016 |
123374 |
8-2015 |
119454 |
-- Which tweets were the most amplified (likes, retweets) by language?
SELECT
language,
content,
updates
FROM (
SELECT
language,
content,
updates,
RANK() OVER (PARTITION BY language ORDER BY updates DESC) AS tweet_rank
FROM
`silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY
language,
updates,
content ) troll_tweets
WHERE
tweet_rank = 1
GROUP BY
language,
content,
updates
ORDER BY
updates DESC
LIMIT
10
I’ll leave analyzing these tweets as an exercise to the reader, but they certainly appear to prey on the hot button issues in a few places. Also note that I’ve truncated the output here, for brevity. Also be mindful of any links you follow here: I have not vetted them.
language |
truncated_content |
updates |
English |
‘@JustinTrudeau Mr. Trudeau, Canadian citizens dem |
166113 |
Turkish |
KARMA, KARMA, KARMA!!! https://t.co/Eh5XUyILeJ |
165833 |
Catalan |
‘@HCDotNet Excellent! 🇺🇸👠🠻😆’ |
165751 |
Farsi (Persian) |
Shameful https://t.co/rll2JrUzRI |
165468 |
Dutch |
Trump’s tweets. #ThingsITrustMoreThanCNN https:/ |
165407 |
Norwegian |
#2018PredictionsIn5Words Pro-Trump landslide |
165371 |
Vietnamese |
So sad. @TitosVodka rocks!! https://t.co/sWtLlZxL5 |
164288 |
Lithuanian |
Stump for Trump @Stump4TrumpPac https://t.co/S0NS9 |
164082 |
Estonian |
#QAnon @Q #FOLLOWTHEWHITERABBIT 🠇 #FLYSIDFLY# |
163448 |
Croatian |
‘@FoxNews @rayann2320 @POTUS Bravo Mr President!!’ |
163126 |
Wrap
There’s a lot of data to explore here, but it’s also worth keeping in mind that three (3) million tweets is only a small fraction of tweets associated with this kind of content, and this kind of bounded data collection may have some subjectivity to it.
If you have any questions about the dataset itself, you should open an issue on FiveThirtyEight’s GitHub repository. As for questions about exploring it via BigQuery: feel free to tweet @elithrar with your questions or explorations!
•••
In building my sentiment analysis service, I needed a way to get data into BigQuery + Data Studio so I could analyze trends against pricing data. My service (on App Engine) uses Firestore as its primary data store as an append-only log of all analysis runs to date.
The flexible schema (especially during development), solid Go client library & performance story were major draws, but one of the clear attractions was being able to trigger an external Firebase Function (Cloud Function) on Firestore events. Specifically, I wanted to get the results of each analysis run into BigQuery so I could run queries & set up Data Studio visualizations as-needed.
I wrote a quick function that:
- Triggers on each
onCreate
event to Firestore
- Pulls out the relevant fields I wanted to analyze in BigQuery: counts, aggregates and the search query used
- Inserts them into the configured BigQuery dataset & table.
With that data in BigQuery, I’m able pull it into Data Studio, generate charts & analyze trends over time.
Creating the Function
If you haven’t created a Firebase Function before, there’s a great Getting Started guide that steps you through installing the SDK, logging in, and creating the scaffolding for your Function.
Note: Firebase Functions initially need to be created & deployed via the Firebase CLI, although it sounds like Google will support the Firebase-specific event types within Cloud Functions & the gcloud SDK (CLI) in the not-too-distant future.
Within index.js
, we’ll require the necessary libraries, and export our sentimentsToBQ
function. This function has a Firestore trigger: specifically, it triggers when any document that matches /sentiment/{sentimentID}
is created (onCreate
). The {sentimentID}
part is effectively a wildcard: it means “any document under this path”.
const functions = require("firebase-functions")
const BigQuery = require("@google-cloud/bigquery")
exports.sentimentsToBQ = functions.firestore
.document("/sentiments/{sentimentID}")
.onCreate(event => {
console.log(`new create event for document ID: ${event.data.id}`)
// Set via: firebase functions:config:set centiment.{dataset,table}
let config = functions.config()
let datasetName = config.centiment.dataset || "centiment"
let tableName = config.centiment.table || "sentiments"
let bigquery = new BigQuery()
We can use the Firebase CLI to override the config variables that define our dataset & table names as needed via firebase functions:config:set centiment.dataset "centiment"
- useful if we want to change the destination table during a migration/copy.
let dataset = bigquery.dataset(datasetName)
dataset.exists().catch(err => {
console.error(
`dataset.exists: dataset ${datasetName} does not exist: ${JSON.stringify(
err
)}`
)
return err
})
let table = dataset.table(tableName)
table.exists().catch(err => {
console.error(
`table.exists: table ${tableName} does not exist: ${JSON.stringify(err)}`
)
return err
})
We check that the destination dataset & table exist - if they don’t, we return an error. In some cases you may want to create them on-the-fly, but here we expect that they exist with a specific schema.
let document = event.data.data()
document.id = event.data.id
let row = {
insertId: event.data.id,
json: {
id: event.data.id,
count: document.count,
fetchedAt: document.fetchedAt,
lastSeenID: document.lastSeenID,
score: document.score,
variance: document.variance,
stdDev: document.stdDev,
searchTerm: document.searchTerm,
query: document.query,
topic: document.topic,
},
}
The event.data.data()
method returns the current state of the Firestore document, which is what we want to insert. The previous state of the document can also be accessed via event.data.previous.data()
, which could be useful if we were logging specific deltas (say, a field changes by >= 10%) or otherwise tracking per-field changes within a document.
Note that we define an insertId
to prevent duplicate rows in the event the function fails to stream the data and has to retry. The insertId
is simply the auto-generated ID that Firestore provides, which is exactly what we want to de-duplicate a record on should it potentially be inserted twice, as our application treats Firestore as an append-only log. If we were expecting multiple writes to a record every minute, and wanted to stream those to BigQuery as distinct documents, we would need to use a different approach.
Beyond that, we compose an object with explicit columnName
<=> fieldName
mappings, based on our BigQuery schema. We don’t need every possible field from Firestore - only the ones we want to run analyses on. Further, since Firestore has a flexible schema, new fields added to our Firestore documents may not exist in our BigQuery schema.
The last part of our function is responsible for actually inserting the row into BigQuery: we call table.insert
and set raw: true
in the options, since we’re passing a row directly:
return table.insert(row, { raw: true }).catch(err => {
console.error(`table.insert: ${JSON.stringify(err)}`)
return err
})
As table.insert
is a Promise, we should return the Promise itself, which will either resolve (success) or reject (failure). Because we don’t need to do any post-processing in the success case, we only explicitly handle the rejection, logging the error and returning it to signal completion. Not returning the Promise would cause the function to return early, and potentially prevent execution or error handling of our table.insert
. Not good!
Deploying
Deploying our function is straightforward:
# Deploys our function by name
$ firebase deploy --only functions:sentimentsToBQ
=== Deploying to 'project-name'...
i deploying functions
i functions: ensuring necessary APIs are enabled...
✔ functions: all necessary APIs are enabled
i functions: preparing _functions directory for uploading...
i functions: packaged _functions (41.74 KB) for uploading
✔ functions: _functions folder uploaded successfully
i functions: current functions in project: sentimentsToBQ
i functions: uploading functions in project: sentimentsToBQ
i functions: updating function sentimentsToBQ...
✔ functions[sentimentsToBQ]: Successful update operation.
Deployment takes about 10 - 15 seconds, but I’d recommend using the local emulator to ensure the functions behaves as expected.
Querying in BigQuery
So how do we query our data? We use the BigQuery console or the bq
CLI. We’ll use the command line tool here, but the query is still the same:
bq query --nouse_legacy_sql 'SELECT * FROM `centiment.sentiments` ORDER BY fetchedAt LIMIT 5;'
Waiting on bqjob_r1af4578a67b94241_000001618c40385c_1 ... (1s)
Current status: DONE
+----------------------+---------+---------------------+-------+
| id | topic | score | count |
+----------------------+---------+---------------------+-------+
| PSux4gwOsHyUGqqdsdEI | bitcoin | 0.10515464281605692 | 97 |
| ug8Zm5sSZ2dtJXPIQWKj | bitcoin | 0.0653061231180113 | 98 |
| 63Qo2gRgsG7Cz2zywKOO | bitcoin | 0.09264705932753926 | 68 |
| Y5sraBzPrhBzsmOyHcm3 | bitcoin | 0.06601942062956613 | 103 |
| r3XApKXJ6feglUcyG1db | bitcoin | 0.13238095435358221 | 105 |
+----------------------+---------+---------------------+-------+
# Note that I've reduced the number of columns returned so it fits in the blog post
We can now see the results that we originally wrote to Firestore, and run aggregations, analyses and/or export them to other formats as needed.
The Code
For the record, here’s the full function as it is in production at the time of writing:
const functions = require("firebase-functions")
const BigQuery = require("@google-cloud/bigquery")
exports.sentimentsToBQ = functions.firestore
.document("/sentiments/{sentimentID}")
.onCreate(event => {
console.log(`new create event for document ID: ${event.data.id}`)
// Set via: firebase functions:config:set centiment.{dataset,table}
let config = functions.config()
let datasetName = config.centiment.dataset || "centiment"
let tableName = config.centiment.table || "sentiments"
let bigquery = new BigQuery()
let dataset = bigquery.dataset(datasetName)
dataset.exists().catch(err => {
console.error(
`dataset.exists: dataset ${datasetName} does not exist: ${JSON.stringify(
err
)}`
)
return err
})
let table = dataset.table(tableName)
table.exists().catch(err => {
console.error(
`table.exists: table ${tableName} does not exist: ${JSON.stringify(
err
)}`
)
return err
})
let document = event.data.data()
document.id = event.data.id
let row = {
insertId: event.data.id,
json: {
id: event.data.id,
count: document.count,
fetchedAt: document.fetchedAt,
lastSeenID: document.lastSeenID,
score: document.score,
variance: document.variance,
stdDev: document.stdDev,
searchTerm: document.searchTerm,
query: document.query,
topic: document.topic,
},
}
return table.insert(row, { raw: true }).catch(err => {
console.error(`table.insert: ${JSON.stringify(err)}`)
return err
})
})
•••
I recently put together a Windows machine for gaming, and although I still do most of my development on macOS due to a great third-party ecosystem, BSD underpinnings & better programming language support, I decided to see what development life was like on Windows in 2018.
As a spoiler: it’s not perfect, but it’s definitely usable day-to-day. If you’re developing applications that don’t rely on OS-level differences (e.g. not systems programming), you can certainly use a Windows + Windows Subsystem for Linux (WSL) as your only setup. If you’re working with container-based applications, then it becomes even more usable.
I’m going to walk through a setup that gets you up & running with a few staples, namely:
First Things First
You’ll need to enable and install the Windows for Linux Subsystem. Basic familarity with the Linux CLI is also useful here: although this is a step-by-step guide, knowing how to edit text files with vim
or nano
is going to be helpful.
Hyper (your terminal)
Hyper is a fairly new terminal application, and although it’s not as polished as the venerable iTerm on macOS, it gets the job done. It uses the same underpinnings as the integrated terminal in VSCode (xterm.js), which means it sees regular releases and bug-fixes.
Out of the box, Hyper will use the Windows command prompt (cmd.exe) or Powershell (powershell.exe). In order to have it use your WSL shell, you’ll need to make a quick adjustment.
In Hyper, head to Edit > Preferences and modify the following keys:
shell: 'wsl.exe',
// for setting shell arguments (i.e. for using interactive shellArgs: ['-i'])
// by default ['--login'] will be used
shellArgs: [],
Note that if you have multiple Linux distributions installed via WSL, and you don’t want Hyper to use your default, you can set the value for shell
to (e.g.) 'ubuntu.exe'
.
Hyper is extremely configurable, and the awesome-hyper repository over on GitHub includes a long list of themes, plugins and tweaks.
zsh + ohmyzsh (your shell)
We’re also going to set up zsh as our default shell, alongside Oh My Zsh for it’s built-ins, themes and plugins.
First, confirm that zsh
is available and installed (it should be, by default):
And then change your default shell to zsh:
~ chsh -s /usr/bin/zsh
# Enter your password, and hit enter
# Confirm the change
~ echo $SHELL
/usr/bin/zsh
We can now install oh-my-zsh -
# As per the instructions here: https://github.com/robbyrussell/oh-my-zsh#basic-installation
# NOTE: Don't just install any old program by piping a file into sh. Anything your user can do, the script can do. Make sure you at least trust the source of the script.
~ sh -c "$(curl -fsSL https://raw.githubusercontent.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"
Once complete, you can begin tweaking things as per the README https://github.com/robbyrussell/oh-my-zsh#using-oh-my-zsh
tmux
tmux, if you’re not familiar, is a terminal multiplexer. Think of it as a way to run multiple shells quickly-and-easily, either in a grid-like fashion, or via a “tab” paradigm (or both). It’s extremely useful for multi-tasking: edit code or configs in one pane, watch results in another, and tail -f
a log in a third.
The tmux version (2.1) available under Ubuntu 16.04 is getting on, and thus we’ll be building our version (2.6, at the time of writing) from source.
# Fetch the latest version of tmux from this page - e.g.
curl -so tmux-2.6.tar.gz https://github.com/tmux/tmux/releases/download/2.6/tmux-2.6.tar.gz
# Unpack it
~ tar xvf tmux-2.6.tar.gz
~ cd tmux-2.6.tar.gz
# Install the dependencies we need
~ sudo apt-get install build-essential libevent-dev libncurses-dev
# Configure, make & install tmux itself
~ ./configure && make
~ sudo make install
# Confirm it works
~ tmux
We’ll also want zsh to create (or use an existing) tmux session if available, so that we’re always in tmux. Let’s modify .zshrc
to achieve that:
# open .zshrc in your preferred editor - e.g. vim
alias tmux="tmux -2 -u" │
if which tmux 2>&1 >/dev/null; then │
test -z "$TMUX" && (tmux attach || tmux new-session) │
fi
We’ll now make sure zsh uses this updated config:
Visual Studio Code
We have a standalone terminal w/ zsh + Oh My Zsh installed. Let’s make sure VSCode uses it for those times we’re using its integrated terminal. We’ll also want it to launch Hyper as our external terminal application, rather than cmd.exe or Powershell.
Open up VSCode’s preferences via File > Preferences > Settings (Ctrl+,) and update the following keys:
"terminal.external.windowsExec": "%userprofile%\\AppData\\Local\\hyper\\Hyper.exe",
"terminal.integrated.shell.windows": "wsl.exe"
Note: VSCode extensions that rely on background daemons or language servers to provide static analysis, formatting and other features will still use (require) the Windows-based version of these tools by default. There’s an open issue tracking this for Go, but it’s not a solved problem yet.
Docker
We’re also going to install Docker, via Docker for Windows (the daemon) and the Docker CLI (the client, effectively) within our WSL environment. This allows us to make use of Hyper-V and maintain good performance from our containerized applications, and avoid the minefield that is VirtualBox.
Once you’ve installed Docker for Windows—which may require rebooting to install Hyper-V, if not already enabled—you’ll also need to allow connections from legacy clients in the Docker settings. Check “Expose daemon on tcp://localhost:2375 without TLS”.
Note that this reduces the security of your setup slightly: other services already running on your machine could MitM connections between the Docker daemon. This does not expose the daemon to the local network, but there does not appear to be a way to retain TLS authentication between WSL and Docker for Windows yet.
# Install our dependencies
~ sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
# Add the Docker repository
~ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
~ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) edge"
~ sudo apt-get update
# Install Docker Community Edition
~ sudo apt-get install -y docker-ce
# Add your user to the Docker group
~ sudo usermod -aG docker $USER
We’ll also need to tell our Docker client (inside WSL) how to connect to our Docker daemon (Docker on Windows).
# Persist this to shell config
~ echo "export DOCKER_HOST=tcp://0.0.0.0:2375" >> $HOME/.zshrc
~ source ~/.zshrc
# Check that Docker can connect to the daemon (should not get an error)
~ docker images
If you see any errors about not being able to find the Docker host, make sure that Docker for Windows is running, that you’ve allowed legacy connections in settings, and that echo $DOCKER_HOST
correctly returns tcp://0.0.0.0:2375
in the same shell as you’re running the above commands in.
Now, let’s verify that you can run a container and connect to an exposed port:
~ docker run -d -p 8080:80 openresty/openresty:latest
4e0714050e8cc7feac0183a687840bdab67bbcc2dce21ae7170b52683a548de3
~ curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to OpenResty!</title>
...
Perfect!
Note: The guide by Nick Janetakis covers more of the details, including getting working mount points up-and-running.
What Else?
It’s worth noting that with Ubuntu 16.04.3 being an LTS release, software versions in the official repositories can be fairly out of date. If you’re relying on later versions of tools, you’ll need to either add their official package repositories (preferred; easier to track updates), install a binary build (good, but rarely self-updating), or build from source (slower, no automatic updates).
As additional tips:
- Yarn (the JS package manager) provides an official package repository, making it easy to keep it up-to-date.
- Ubuntu 16.04’s repositories only have Go 1.6 (3 versions behind as of Dec 2017), and thus you’ll need to download the binaries - keeping in mind you’ll need to manually manage updates to newer Go patch releases and major versions yourself.
- Similarly with Redis, 3.0.6 is available in the official repository. Redis 4.0 included some big changes (inc. the module system, eviction changes, etc) and thus you’ll need to build from source
This is reflective of my experience setting up WSL on Windows 10, and I’ll aim to keep it up-to-date as WSL improves over time—esp. around running later versions of Ubuntu. If you have questions or feedback, ping me on Twitter @elithrar.
•••
Update: I’ve updated the travis.yml
config to reflect Go 1.11.
GitHub has a great Releases feature that allows you surface—and users to download—tagged releases of your projects.
By default, Releases will provide links to a ZIP and a tarball of the source code for that tag. But for projects with binary releases, manually building and then uploading binaries (perhaps for multiple platforms!) is time-consuming and fragile. Making binary releases available automatically is great for the users of a project too: they can use it without having to deal with toolchains (e.g. installing Go) and environments. Making software usable by non-developer is an important goal for many projects.
We can use TravisCI + GitHub Releases to do all of the work for us with a fairly straightforward configuration, so let’s take a look at how to release Go binaries automatically.
Configuration
Here’s the full .travis.yml from a small utility library I wrote at my day job. This will:
- Always build on the latest Go version - “go: 1.x” and sets an env variable. We’ll use this to only build binaries using the latest Go version.
- Build as far back as 1.7
- Builds, but doesn’t fail the entire run, on “tip” (e.g. Go’s master branch, which breaks from time-to-time)
It then runs a fairly straightforward build script using Go’s existing tooling: gofmt (style), go vet (correctness), and then any tests with the race detector enabled.
The final step—and the reason why you’re probably reading this post!—is invoking gox to build binaries for Linux, Darwin (macOS) & Windows, and setting the “Rev” variable to the git commit it was built from. The latter is super useful for debugging or supporting users when combined with a –version command-line flag. We also only release on tagged commits via tags: true so that we’re only releasing binaries with intent. Tests are otherwise automatically run on every branch (inc. Pull Requests).
language: go
sudo: false
matrix:
include:
# "1.x" always refers to the latest Go version, inc. the patch release.
# e.g. "1.x" is 1.11 until 1.11.1 is available.
- go: 1.x
env: LATEST=true
- go: 1.7.x
- go: 1.8.x
- go: 1.9.x
- go: 1.10.x
- go: 1.11.x
- go: tip
allow_failures:
- go: tip
before_install:
# gox simplifies building for multiple architectures
- go get github.com/mitchellh/gox
install:
- # skip
script:
- go get -t -v ./...
- diff -u <(echo -n) <(gofmt -d .)
- go vet $(go list ./... | grep -v /vendor/)
- go test -v -race ./...
# Only build binaries from the latest Go release.
- if [ "${LATEST}" = "true" ]; then gox -os="linux darwin windows" -arch="amd64" -output="logshare.." -ldflags "-X main.Rev=`git rev-parse --short HEAD`" -verbose ./...; fi
deploy:
provider: releases
skip_cleanup: true
api_key:
# Your *encrypted* GitHub key, as the output of the Travis CI CLI tool.
secure: wHqq6Em56Dhkq4GHqdTXfNWB1NU2ixD0/z88Hu31MFXc+Huz5p6np0PUNBOvO9jSFpSzrSGFpsD5lkExAU9rBOI9owSRiEHpR1krIFbMmCboNqNr1uXxzxam9NWLgH8ltL2LNX3hp5teYnNpE4EhIDsGqORR4BrgXfH4eK7mvj/93kDRF2Wxt1slRh9VlxPSQEUxJ1iQNy3lbZ6U2+wouD8TaaJFgzPtueMyyIj2ASQdSlWMRyCVXJPKKgbRd5jLo2XHAWmmDb9mC8u8RS5QlB1klJjGCOl7gNC0KHYknHk6sUVpgIdnmszQBdVMlrZ6yToFDSFI28pj0PDmpb3KFfLauatyQ/bOfDoJFQQWgxyy30du89PawLmqeMoIXUQoA8IWF3nl/YhD+xsLCL1UH3kZdVZStwS/EhMcKqXBPn/AFi1Vbh7m+OMJAVvZp3xnFDe/H8tymczOWy4vDnyfXZQagLMsTouS/SosCFjjeL/Rdz6AEcQRq5bYAiQBhjVwlobNxZSMXWatNSaGz3z78dPEx9qfHnKixmBTacrJd6NlBhWH1kvg1c7TT2zlPxt6XTtsq7Ts/oKNF2iXXhw8HuzZv1idCiWfxobdajZE3EY+8akR060ktT4KEgRmCC/0h6ncPVT0Vaba1XZvbjlraol/p3tswXgGodPsKL87AgM=
file:
# The names of the binaries to output, based on the -output template passed to gox.
- logshare.windows.amd64.exe
- logshare.darwin.amd64
- logshare.linux.amd64
on:
# What to repository to build
repo: username/reponame
# Only build binaries for tagged commits
tags: true
condition: $LATEST = true
Note: It’s critical that you follow TravisCI’s documentation on how to securely encrypt your API key—e.g. don’t paste your raw key into this file, ever. TravisCI’s documentation and CLI tool make this straightforward.
Wrap
Pretty easy, right? If you’re already using Travis CI to test your Go projects, extending your configuration to release binaries on tagged versions is only a few minutes of work.
Further Reading