Data warehousing on Hadoop - one important DON'T and a few DOs

Quickie

Big Data & Analytics

Room 3

Wednesday at 13:10 - 13:25

Have your lunch and, at the same time, learn why you should attend my evening BOF session. Does that sound like a good deal?

The brave new world of Big Data has been around for a while and its tools have been successfully applied to solve different problems. But is it a silver bullet? Is it really completely new? Can you forget the old truths of design, architecture and project management?

The goal of the StraDa project is to gather the log files from hundreds of Roche diagnostic instruments spread around the world and transform these TBs of data into a data warehouse.

We built it, but it wasn't a straightforward task.

Please come to learn what our biggest mistake was, and then come back at 19:30 for the full-blown session.

Marek Grzenkowicz
Marek Grzenkowicz uses Hadoop and Python to build BI solutions for Big Data. He works with datasets ranging from hyper-structured machine data to unstructured text data and changes the toolkit depending on the use case - he is comfortable with the Cloudera Hadoop ecosystem but he also develops custom solutions to run natural language processing algorithms at scale.

He started working for Roche Global IT Solutions 5 years ago as an ETL specialist and became a full-stack Business Intelligence developer over time.

All in all, he has more than 10 years of IT experience, with different technologies (VB6, .NET, SharePoint, SQL Server, PowerCenter, Tableau) and in different positions (developer, administrator, team lead).
