EOG-2114 Data warehousing on Hadoop - after a few months in production | Devoxx

Data warehousing on Hadoop - after a few months in production

BOF (Bird of a Feather)

Big Data & Analytics

Room 3

Wednesday at 19:30 - 20:30

Medical laboratory instruments produce immense volumes of log files. Until recently, these were used only for troubleshooting and maintenance. However, hidden inside are insights that could allow laboratory managers to streamline and optimize the diagnostic process.

The goal of the StraDa project is to gather the log files from hundreds of Roche diagnostic instruments spread around the world, transform these terabytes of data into actionable information, and make it available to business users.

Yet another data warehouse? Sounds reasonably easy? Well, so we thought. And we were wrong.

Please come to learn about some of the mistakes we made and problems we encountered, so you can avoid them.

I will have slides, but I don’t need to follow them. Ask questions! Challenge me! Share your own experience!

Marek Grzenkowicz

Marek Grzenkowicz uses Hadoop and Python to build BI solutions for Big Data. He works with datasets ranging from hyper-structured machine data to unstructured text, and adapts his toolkit to the use case: he is comfortable with the Cloudera Hadoop ecosystem, but he also develops custom solutions to run natural language processing algorithms at scale.

He started working for Roche Global IT Solutions 5 years ago as an ETL specialist and became a full-stack Business Intelligence developer over time.

All in all, he has more than 10 years of IT experience with a range of technologies (VB6, .NET, SharePoint, SQL Server, PowerCenter, Tableau) and in a variety of roles (developer, administrator, team lead).