Pentaho Data Integration prepares and blends data to create a complete picture of your business that drives actionable insights. The complete data integration platform delivers accurate, "analytics-ready" data to end users from any source. With visual tools to eliminate coding and complexity, Pentaho puts Big Data and all data sources at the fingertips of business and IT users alike.

Simple Visual Designer for Drag and Drop Development
Empower developers with visual tools to minimize coding and achieve greater productivity.

Drag and Drop Visual Design Approach
- Graphical extract-transform-load (ETL) tool to load and process big data sources in familiar ways.
- Rich library of pre-built components to access and transform data from a full spectrum of sources.
- Visual interface to call custom code, analyze images and video files to create meaningful metadata.
- Dynamic transformations, using variables to determine field mappings, validation and enrichment rules (a sketch follows this list).
- Integrated debugger for testing and tuning job execution.
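
As a concrete companion to the dynamic-transformation bullet above, here is a minimal sketch, assuming the Kettle Java API that Pentaho Data Integration is built on, of running a transformation created in the visual designer and handing it a variable. The file name sales_load.ktr and the variable INPUT_DIR are illustrative placeholders only.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (registers step plugins, etc.).
        KettleEnvironment.init();

        // Load a transformation that was designed visually in Spoon.
        // "sales_load.ktr" is a hypothetical example file.
        TransMeta transMeta = new TransMeta("/path/to/sales_load.ktr");
        Trans trans = new Trans(transMeta);

        // Supply a value the transformation can reference as ${INPUT_DIR},
        // e.g. to drive dynamic field mappings or file locations.
        trans.setVariable("INPUT_DIR", "/data/incoming");

        trans.execute(null);        // start all step threads
        trans.waitUntilFinished();  // block until the transformation completes

        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}
```

The same transformation runs without any code at all from Spoon or the Pan command-line runner; the API route only shows how a visually designed flow can be embedded in an existing Java application.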
Big Data Integration with Zero Coding Required
Pentaho's intuitive tools accelerate the design, development and deployment of big data analytics by as much as 15x.

Big Data Integration Made Easy
- Complete visual development tools eliminate coding in SQL or writing MapReduce Java functions (a sketch of launching such a visually built job follows this list).
- Broad connectivity to any type or source of data with native support for Hadoop, NoSQL and analytic databases.
- Parallel processing engine to ensure high performance and enterprise scalability.
- Extract and blend existing and diverse data to produce consistent high quality ready-to-analyze data.
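
To illustrate the bullet about eliminating hand-written SQL and MapReduce code: a job assembled visually in Spoon (a .kjb file) can be launched unchanged from a small Java wrapper through the Kettle API. This is a sketch under the assumption of a hypothetical job file named hadoop_ingest.kjb.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class RunBigDataJob {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // Load the job from the file system; the null argument means no
        // repository is used. "hadoop_ingest.kjb" is a placeholder name.
        JobMeta jobMeta = new JobMeta("/path/to/hadoop_ingest.kjb", null);

        Job job = new Job(null, jobMeta);
        job.start();               // the job runs in its own thread
        job.waitUntilFinished();   // block until all job entries complete

        if (job.getErrors() > 0) {
            throw new RuntimeException("Job finished with errors");
        }
    }
}
```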
Native and Flexible Support for All Big Data Sources
A combination of deep native connections and an adaptive big data layer ensures accelerated access to the leading Hadoop distributions, NoSQL databases, and other big data stores.

Broadest and Deepest Big Data Support
- Support for the latest Hadoop distributions from Cloudera, Hortonworks, MapR and Intel.
- Simple plugins to NoSQL databases such as Cassandra and MongoDB, as well as connections to specialized data stores like Amazon Redshift and Splunk.
- Adaptive big data layer saves enterprises considerable development time as they leverage new versions and capabilities.
- Greater flexibility, reduced risk, and insulation from changes in the big data ecosystem.
- Reporting and analysis on growing amounts of user and machine generated data, including web content, documents, social media and log files.
- Integration of Hadoop data tasks into overall IT/ETL/BI solutions with scalable distribution across the cluster.
- Support for parallel bulk data loader utilities for loading data with maximum performance.
Powerful Administration and Management
Simplified out-of-the-box capabilities to manage the operations in a data integration project.

Easy-to-Use Schedule Management
- Manage security privileges for users and roles.
- Restart jobs from last successful checkpoint and roll back job execution on failure.
- Integrate with existing security definitions in LDAP and Active Directory.
- Set permissions to control user actions: read, execute or create.
- Schedule data integration flows for organized process management.
- Monitor and analyze the performance of data integration processes.
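
As a sketch of the monitoring bullet directly above, and again assuming the Kettle Java API, execution metrics can also be read back programmatically; the counters below (rows read, rows written, errors) are the kind of figures the monitoring views surface. The file name nightly_blend.ktr is a placeholder.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.core.logging.LogLevel;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class MonitorRun {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        Trans trans = new Trans(new TransMeta("/path/to/nightly_blend.ktr"));
        trans.setLogLevel(LogLevel.BASIC);  // choose how much detail is logged

        trans.execute(null);
        trans.waitUntilFinished();

        // The Result object carries the run metrics.
        Result result = trans.getResult();
        System.out.println("Rows read:    " + result.getNrLinesRead());
        System.out.println("Rows written: " + result.getNrLinesWritten());
        System.out.println("Errors:       " + result.getNrErrors());
    }
}
```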
Data Profiling and Data Quality
Profile data and ensure data quality with comprehensive capabilities for data managers.

Data Quality Management
- Identify data that fails to comply with business rules and standards.
- Standardize, validate, de-duplicate and cleanse inconsistent or redundant data.
- Manage data quality with partners such as Human Inference and Melissa Data.
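
As a generic illustration only (plain Java, not the Pentaho API), the checks named above boil down to rules like the following: flag records that break a business rule and drop duplicates while preserving order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Pattern;

// Generic sketch of the kind of logic a validation or de-duplication step applies.
public class QualityRules {
    private static final Pattern EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // A record violates the rule if its e-mail field is missing or malformed.
    static boolean violatesEmailRule(String email) {
        return email == null || !EMAIL.matcher(email).matches();
    }

    // Remove duplicate keys while preserving the original order.
    static List<String> deduplicate(List<String> keys) {
        return new ArrayList<>(new LinkedHashSet<>(keys));
    }
}
```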