Last Tuesday we announced an exciting set of changes to Google BigQuery, making your
experience easier, faster, and more powerful. In addition to new features and improvements like
table wildcard functions, views, and parallel exports, BigQuery now features increased
streaming capacity, lower pricing, and more.
1000x increase in streaming capacity
Last September we
announced the ability to stream data into BigQuery for instant analysis, with an
ingestion limit of 100 rows per second. While developers have enjoyed and exploited this
capability, they've asked for more capacity. You can now stream up to 100,000 rows per second,
per table, into BigQuery - 1,000 times more than before.
Table wildcard functions
Users often partition their big tables into smaller units for data lifecycle and optimization
purposes. For example, instead of keeping one yearly table, they might split it into monthly or
even daily tables. The downside of partitioning is that queries then need to access multiple
tables. This would be easier if there were a way to tell BigQuery "process all the tables
between March 3rd and March 25th" or "read every table whose name starts with an 'a'". With
this release, you can: BigQuery now offers table wildcard functions to easily query tables
that match common parameters. TABLE_DATE_RANGE() queries all tables that overlap with a time
range (based on the table names), while TABLE_QUERY() accepts regular expressions to select
the tables to analyze.
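As a sketch, assuming daily tables named like mydataset.events_20140303 (all dataset and table names here are hypothetical), the two functions could be used like this:

```sql
-- Count rows across every daily table between March 3rd and March 25th, 2014
-- (assumes hypothetical tables named mydataset.events_YYYYMMDD).
SELECT COUNT(*)
FROM TABLE_DATE_RANGE([mydataset.events_],
                      TIMESTAMP('2014-03-03'),
                      TIMESTAMP('2014-03-25'));

-- Count rows across every table in the dataset whose name starts with 'a'.
SELECT COUNT(*)
FROM TABLE_QUERY(mydataset, 'REGEXP_MATCH(table_id, r"^a")');
```

In the second query, the string argument is an expression evaluated once per table, with table_id bound to each table's name.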
Improved SQL
BigQuery adopted SQL as its query language because it's one of the best-known, simplest, and
most powerful ways to analyze data. Nevertheless, BigQuery used to impose some restrictions
relative to traditional SQL-92, like having to write multiple sub-queries instead of a simpler
multi-join. Not anymore: BigQuery now supports multi-join and CROSS JOIN, and improves its SQL
capabilities with more flexible alias support, fewer ORDER BY restrictions, more window
functions, smarter PARTITION BY, and more.
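A minimal sketch of the new join support, using hypothetical table and column names:

```sql
-- A single query joining three tables, which previously would have
-- required nested sub-queries (all names hypothetical).
SELECT o.order_id, c.name, r.region_name
FROM [mydataset.orders] o
JOIN [mydataset.customers] c ON o.customer_id = c.customer_id
JOIN [mydataset.regions] r ON c.region_id = r.region_id;

-- CROSS JOIN pairs every row of one table with every row of another.
SELECT d.date, p.product
FROM [mydataset.dates] d
CROSS JOIN [mydataset.products] p;
```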
Views
A notable new feature is the ability to save queries as views and use them as building blocks
for more complex queries. To define a view, you can save a query in the browser tool, use the
API, or use the newest version of the BigQuery command-line tool (available by downloading the
Google Cloud SDK).
User-defined metadata
Now you can annotate each dataset, table, and field with descriptions that are displayed
within BigQuery. This way, the people you share your datasets with will have an easier time
identifying them.
JSON parsing functions
BigQuery is optimized for structured data: before loading data into BigQuery, you should first
define a table with the right columns. This is not always easy, as JSON schemas can be
flexible and in constant flux. BigQuery now lets you store JSON-encoded objects in string
fields, and you can use the JSON_EXTRACT and JSON_EXTRACT_SCALAR functions to easily parse
them later using JSONPath-like expressions.
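A brief sketch, assuming a hypothetical events table with a STRING column named payload that holds JSON-encoded objects such as '{"user": {"name": "alice"}, "items": [1, 2]}':

```sql
-- Pull individual values out of the JSON string with JSONPath-like paths
-- (table and column names are hypothetical).
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.user.name') AS user_name,  -- a scalar, e.g. alice
  JSON_EXTRACT(payload, '$.items') AS items_json             -- a JSON fragment, e.g. [1,2]
FROM [mydataset.events];
```

JSON_EXTRACT_SCALAR returns the value as a plain string, while JSON_EXTRACT returns the matched JSON fragment itself.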
Fast parallel exports
BigQuery is a great place to store all your data and have it ready for instant analysis using
SQL queries. But sometimes SQL is not enough, and you might want to analyze your data with
external tools. That's why we developed the new fast parallel exports: you define how many
workers will consume the data, and BigQuery exports it to multiple files optimized for that
number of workers.
Check the exporting data documentation, or stay tuned for the upcoming documentation on the
Hadoop connector to BigQuery.
Massive price reductions
At Google Cloud Platform Live, we announced a massive price reduction: storage costs are going
down 68%, from 8 cents per gigabyte per month to only 2.6 cents, while querying costs are
going down 85%, from 3.5 cents per gigabyte to only 0.5 cents. Previously announced streaming
costs are now reduced by 90%. And finally, we announced the ability to purchase reserved
processing capacity, for even lower prices and the ability to precisely predict costs. And
you always have the option to burst using on-demand capacity.