PBI - Kimiafarma CCV 1

Rakamin Academy
27 May 2025 07:35

Summary

TL;DR: In this video, Rihan, a Data Scientist and Data Management Supervisor at Kimia Farma, introduces key concepts in data development, focusing on the differences between data lakes, data warehouses, and data marts. He explains how each repository serves a different purpose: holding raw data, cleansed data, and data prepared for a specific use. The video also covers why data must be transformed and structured for analytics, with a detailed explanation of star and snowflake schemas. Finally, Rihan shares practical steps for creating a data mart, emphasizing collaboration with business users and careful analysis of data requirements.

Takeaways

  • Data lakes store raw, unprocessed data from various sources, both internal (e.g., company databases) and external (e.g., weather, demographics).
  • Data warehouses store clean, processed data from multiple sources, organized by specific use cases for analysis.
  • Data marts are subsets of data warehouses, focused on providing data for specific teams or departments.
  • Data is stored in a data lake without transformation, while data in a data warehouse has been cleansed and aggregated for specific use cases.
  • Data marts involve further aggregation and grouping of data warehouse data to serve particular business needs.
  • Data lakes are typically accessed by core big data teams, while data warehouses are used by both big data teams and departmental data analysts.
  • Data marts are accessible to departmental data analysts or other users who need focused, specific data.
  • In the data flow, raw data enters the data lake, undergoes transformation, and is then organized into a data warehouse. Data marts are created from data warehouses for more specific uses.
  • Star schema and snowflake schema are common ways of structuring data in data warehouses: a star schema has a central transactional (fact) table surrounded by denormalized dimension tables, while a snowflake schema further normalizes those dimension tables.
  • When creating a data mart, it's essential to work closely with business users to understand data requirements and objectives for visualization or reporting.
  • Collaboration between data analysts, system analysts, and data engineers is crucial to identify the right data sources and ensure the data is properly integrated into the system.
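
The flow described in the takeaways (raw data in the lake, cleansed data in the warehouse, aggregated data in a mart) can be illustrated with a minimal Python sketch. The records, field names, and the functions `to_warehouse` and `to_sales_mart` are hypothetical and chosen only for illustration; they are not taken from the video or from Kimia Farma's systems.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in a data lake:
# untyped strings, inconsistent spelling, and missing values left in place.
raw_lake = [
    {"branch": "Jakarta ", "product": "Paracetamol", "qty": "2", "price": "5000"},
    {"branch": "jakarta", "product": "Paracetamol", "qty": "1", "price": "5000"},
    {"branch": "Bandung", "product": "Vitamin C", "qty": None, "price": "12000"},
    {"branch": "Bandung", "product": "Vitamin C", "qty": "3", "price": "12000"},
]

def to_warehouse(rows):
    """Cleanse raw rows: drop incomplete records, normalize text, cast types."""
    clean = []
    for row in rows:
        if row["qty"] is None or row["price"] is None:
            continue  # incomplete record stays in the lake only
        clean.append({
            "branch": row["branch"].strip().title(),
            "product": row["product"].strip(),
            "qty": int(row["qty"]),
            "price": int(row["price"]),
        })
    return clean

def to_sales_mart(rows):
    """Aggregate warehouse rows into revenue per branch for one team."""
    revenue = defaultdict(int)
    for row in rows:
        revenue[row["branch"]] += row["qty"] * row["price"]
    return dict(revenue)

warehouse = to_warehouse(raw_lake)
sales_mart = to_sales_mart(warehouse)
print(sales_mart)  # {'Jakarta': 15000, 'Bandung': 36000}
```

In practice each stage would live in a separate storage system rather than in memory; the sketch only shows how the amount of processing grows from lake to warehouse to mart.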

Q & A

  • What is the main purpose of the video?

    -The main purpose of the video is to explain concepts related to data development, focusing on the differences between data lake, data warehouse, and data mart, as well as their application in Kimia Farma's data management process.

  • What is a data lake and how does it differ from a data warehouse?

    -A data lake is a storage repository for raw data from various sources, often unprocessed and uncleaned. In contrast, a data warehouse stores cleaned and processed data, organized for analysis based on specific use cases.

  • How is data in a data lake typically transformed?

    -Data in a data lake is generally not transformed. It is stored as-is from the source, maintaining its raw form without any cleansing or processing.

  • What is the role of a data mart in the data management process?

    -A data mart is a subset of a data warehouse, designed for specific teams or departments, providing data tailored to particular needs or use cases.

  • Who typically has access to the different data repositories?

    -Data lakes are mainly accessed by big data teams due to the raw nature of the data. Data warehouses are accessible to big data teams and data analysts across departments, while data marts are accessible to data analysts and users needing specific data for their work.

  • What are the main differences between the star schema and snowflake schema?

    -In the star schema, a central transactional (fact) table is surrounded by dimension tables with a denormalized structure, making it simple to query and understand. In the snowflake schema, the dimension tables are further normalized into related sub-tables, leading to less redundancy and more efficient storage, though the structure is more complex and queries need more joins.

  • What is the benefit of using a star schema?

    -The star schema offers benefits like simplicity in querying, clarity in relational structure, and ease of adjustments when adding new columns or data.

  • Why might an organization choose a snowflake schema over a star schema?

    -An organization might choose a snowflake schema for its lower storage requirements and reduced redundancy, which can simplify maintenance in large data environments.

  • What are the steps involved in creating a data mart?

    -The steps include defining data requirements with business users, checking the source data, ensuring it exists in the data warehouse, analyzing if one or multiple tables are needed, and finally designing the appropriate schema (star or snowflake) for implementation.

  • What is the importance of collaboration between data analysts and other teams during the data mart creation process?

    -Collaboration ensures the data mart accurately reflects business needs, that the right data is used, and that it is stored in the appropriate format. It helps align technical and business requirements for successful implementation.
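
The difference between the star and snowflake schemas discussed above can be sketched with Python's built-in `sqlite3` module. All table names, columns, and rows here are hypothetical examples rather than the schema shown in the video; the fact table is shared, and only the product dimension differs between the two variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: the product dimension is denormalized (category stored inline).
CREATE TABLE star_dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT
);
-- Snowflake schema: the category is normalized into its own table.
CREATE TABLE snow_dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE snow_dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES snow_dim_category(category_id)
);
-- Central transactional (fact) table used by both variants.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER,
    qty INTEGER
);
INSERT INTO star_dim_product VALUES
    (1, 'Paracetamol', 'Analgesic'),
    (2, 'Ibuprofen', 'Analgesic'),
    (3, 'Vitamin C', 'Supplement');
INSERT INTO snow_dim_category VALUES (10, 'Analgesic'), (20, 'Supplement');
INSERT INTO snow_dim_product VALUES
    (1, 'Paracetamol', 10), (2, 'Ibuprofen', 10), (3, 'Vitamin C', 20);
INSERT INTO fact_sales VALUES (1, 1, 2), (2, 2, 1), (3, 3, 5);
""")

# Star: one join from the fact table to the dimension.
star_query = """
SELECT d.category_name, SUM(f.qty)
FROM fact_sales f
JOIN star_dim_product d ON d.product_id = f.product_id
GROUP BY d.category_name ORDER BY d.category_name
"""

# Snowflake: an extra join to reach the normalized category table.
snowflake_query = """
SELECT c.category_name, SUM(f.qty)
FROM fact_sales f
JOIN snow_dim_product p ON p.product_id = f.product_id
JOIN snow_dim_category c ON c.category_id = p.category_id
GROUP BY c.category_name ORDER BY c.category_name
"""

star_result = conn.execute(star_query).fetchall()
snowflake_result = conn.execute(snowflake_query).fetchall()
print(star_result)       # [('Analgesic', 3), ('Supplement', 5)]
print(snowflake_result)  # [('Analgesic', 3), ('Supplement', 5)]
```

Both queries return the same totals. The star variant repeats the category name on every product row but needs one join, while the snowflake variant stores each category once at the cost of an additional join.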


Related Tags
Data Science, Data Management, Data Lakes, Data Warehouse, Data Marts, Kimia Farma, Big Data, Business Intelligence, Data Analysis, Star Schema