Thursday, 15 February 2018

Code Snippet to clear Talend Hash Components


Get the shared MapHashFile instance (the cache behind tHashOutput/tHashInput):

org.talend.designer.components.hashfile.common.MapHashFile mf_tHashInput_2 = org.talend.designer.components.hashfile.common.MapHashFile.getMapHashFile();


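Clear the cache of each tHashOutput component. Each cache key is built from the job name, the process ID (pid) and the component name: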

mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_1");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_10");

mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_2");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_23");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_24");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_25");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_26");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_27");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_28");
mf_tHashInput_2.clearCache("tHashFile_"+jobName+"_" + pid + "_tHashOutput_29");
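Rather than repeating one clearCache line per component, the keys can be built in a loop. Below is a minimal sketch, assuming the key format shown above and that jobName and pid are the String fields available in Talend-generated job code. The helper class HashCacheKeys and its methods are hypothetical, not part of the Talend API.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a Talend class): builds the tHashOutput cache keys
// in a loop instead of one clearCache(...) line per component.
final class HashCacheKeys {

    private HashCacheKeys() {}

    // Key format taken from the snippet above:
    // "tHashFile_" + jobName + "_" + pid + "_" + componentName
    static String cacheKey(String jobName, String pid, String componentName) {
        return "tHashFile_" + jobName + "_" + pid + "_" + componentName;
    }

    // Hands the key of every listed component to the given clearing action,
    // e.g. mf_tHashInput_2::clearCache inside a Job (assumption, untested).
    static void clearAll(String jobName, String pid, List<String> componentNames,
                         Consumer<String> clearCache) {
        for (String name : componentNames) {
            clearCache.accept(cacheKey(jobName, pid, name));
        }
    }
}
```

In a Job this would be called with something like `HashCacheKeys.clearAll(jobName, pid, java.util.Arrays.asList("tHashOutput_1", "tHashOutput_10"), mf_tHashInput_2::clearCache);`, with the helper placed wherever your Job can see it (for example a custom routine). Keeping the component names in one list makes it harder to forget one when hash components are added.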

Saturday, 27 January 2018

Convert Long into Integer

How to convert a Long value into an Integer value in Java?

Integer i = (int) (long) theLong;
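Note that this cast throws a NullPointerException when theLong is null, and silently keeps only the low 32 bits when the value does not fit in an int. A small sketch comparing it with Long.intValue() and Math.toIntExact (Java 8+), which throws an ArithmeticException on overflow instead; the class name LongToInteger is just for illustration.

```java
// Three ways to turn a Long into an Integer (illustrative class name).
final class LongToInteger {

    private LongToInteger() {}

    // The cast from the post: unbox to long, narrow to int, box to Integer.
    // NullPointerException on null; out-of-range values wrap silently.
    static Integer byCast(Long theLong) {
        return (int) (long) theLong;
    }

    // Same narrowing behaviour as the cast, written as a method call.
    static Integer byIntValue(Long theLong) {
        return theLong.intValue();
    }

    // Java 8+: throws ArithmeticException instead of wrapping on overflow.
    static Integer checked(Long theLong) {
        return Math.toIntExact(theLong);
    }
}
```

For example, byCast(4_294_967_297L) returns 1 (2^32 + 1 wraps around), while checked(4_294_967_297L) throws, which is usually what you want when an overflow would mean bad data.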


Tuesday, 7 November 2017

Merging the Content of Files Using tUnite

Scenario: Iterate on files and merge the content

The following Job iterates over a list of files, merges their content and displays the resulting 2-column content on the console.

Dropping and linking the components

  1. Drop the following components onto the design workspace: tFileList, tFileInputDelimited, tUnite and tLogRow.

  2. Connect tFileList to tFileInputDelimited using an Iterate connection, and connect the other components using Row > Main links.

Configuring the components

  1. In the Basic settings view of tFileList, browse to the directory where the files to merge are stored.

    The files are pretty basic and contain a list of countries and their respective scores.

  2. In the Case Sensitive field, select Yes to consider the letter case.

  3. Select the tFileInputDelimited component, and display this component's Basic settings view.

  4. Fill in the File Name/Stream field: press Ctrl+Space to open the variable completion list and select tFileList.CURRENT_FILEPATH from the global variables, so that each file in the directory defined in tFileList is processed in turn.

  5. Click the Edit Schema button and manually set the 2-column schema to reflect the content of the input files.

    For this example, the 2 columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type.

  6. Click OK to validate the setting and accept propagating the schema throughout the Job.

  7. Then select the tUnite component and display the Component view. Notice that the output schema strictly reflects the input schema and is read-only.

  8. In the Basic settings view of tLogRow, select the Table option to display the output values properly.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6, or click Run on the Run console to execute the Job.

    The console shows the data from the various files, merged into one single table.
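For readers more at home in plain Java, here is a rough sketch of what the Job does: list the files (tFileList), read each one as delimited Country/Points rows (tFileInputDelimited), append all rows to a single list (tUnite) and print them (tLogRow). Assumptions: ';' as field separator (the tFileInputDelimited default), no header row, UTF-8 files, and files merged in path order; Talend's own ordering and options may differ. MergeFiles and Row are illustrative names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the tFileList -> tFileInputDelimited -> tUnite -> tLogRow Job.
final class MergeFiles {

    // One row of the 2-column schema; both columns nullable, as in the Job.
    static final class Row {
        final String country;
        final Integer points;

        Row(String country, Integer points) {
            this.country = country;
            this.points = points;
        }

        @Override
        public String toString() {
            return country + "|" + points;   // roughly what tLogRow prints
        }
    }

    static List<Row> merge(Path dir) throws IOException {
        // tFileList: collect regular files, sorted by path for a stable order.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        files.sort(null);

        List<Row> merged = new ArrayList<>();
        for (Path file : files) {   // Iterate link: one pass per file
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.isEmpty()) {
                    continue;
                }
                // tFileInputDelimited: split into Country;Points (assumed ';').
                String[] cols = line.split(";", -1);
                String country = cols[0].isEmpty() ? null : cols[0];
                Integer points = cols.length < 2 || cols[1].trim().isEmpty()
                        ? null : Integer.valueOf(cols[1].trim());
                merged.add(new Row(country, points));   // tUnite: append
            }
        }
        return merged;
    }
}
```

Printing each element of merge(dir) gives one combined list, just as tUnite funnels every file's rows into the same output flow.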

Monday, 30 October 2017

About sharing Talend development w/ version control


Per http://www.talendforge.org/forum/viewtopic.php?id=6876 (dated Jun 2009)

Q: Is it possible to configure Talend Open Studio with a repository on a shared drive?

A1: No, TOS can only be used on a local machine. To share the project, you can use Talend Integration Suite, which is the subscription extension of TOS and supports multiple users, groups, specific access for these users, specific privileges, etc. For more info, please visit our website.

A2: If your drive is mounted, you can try to change the workspace location by adding the "-data" option to the shortcut: "C:\TOS\TOS-win32-x86.exe -data Z:\mySharedWorkspace"
This remote workspace will be locked by the first person who uses it, so you won't be able to work on it at the same time as a coworker: that is only available in TIS.

Per http://stackoverflow.com/questions/4604823/talend-project-in-svn (dated Jan 2011)

Q: I need to use Talend Open Studio with a SVN. What is the folder that I should put in the repository?

A1: Well, the "official" answer is to upgrade to Talend Integration Suite (TIS), which includes SVN integration. It goes a long way to synchronizing the activities of several developers using a shared repository.

If TIS is not an option for you, you might have some success by putting the project directory of your Talend Open Studio (TOS) under SVN control. This is the directory immediately under "workspace" that has the same name as your project. You would have to perform all SVN operations, such as commit and update, manually on this directory, for example using TortoiseSVN (if you're on Windows). TOS might be able to use this project as if it were not under SVN. I personally haven't tried this, and would recommend using TIS instead.

A2: I prepared the following for my team. http://www.dariopardo.com/talend-open-studio/svnintegration/ It basically involves doing what was said above. (note – this one uses Subversion repository w/ TortoiseSVN client)

A3: I faced a similar issue and wrote a blog post about it, see Putting Talend Open Studio projects under version control. Basically, you should version your entire workspace. (note – this one uses Git)

A4: Putting a TOS workspace under SVN is strongly discouraged, as CVS/SVN are file-based by nature while a TOS workspace is directory-based. The probability of corrupting the workspace is very high.

However, I achieved good results putting it under Git instead (have a look here for an idea). I must say it's not elegant, and putting the entire workspace under version control is a tremendous waste of space (you will commit compiled files, logs, history, temp files...), but it's the only safe solution for your metadata integrity.

Be aware! The *<workspace_root>/.java* subtree contains the external libraries and the classpath files for each job in your project that you have executed at least once. These paths are absolute. This means that if you plan to use your version-controlled workspace in a shared environment, everyone on the team must place the workspace in the same absolute location in their local file system (e.g. c:/talend_git/workspace). Otherwise you'll get a class-not-found exception when executing jobs. Sad but true.

So, my conclusion: version control for shared Talend development is possible, but unless we need to collaborate on Talend jobs frequently and are very skilled in SVN/Git, it can be tricky to put all team members' Talend projects in a repository and occasionally take over someone else's work to continue development. The learning curve for combining Talend with version control software is likely steep. As mentioned in answer A4 above, the chance of file corruption is very high, and that can ruin the whole Talend project rather than just one Talend job.

What's the alternative? Simply have a shared location where everyone can export his/her own projects (or jobs) for others to grab as needed. We could use /home/developers/Talend-jobs on dev.berkeley.edu (pahma-dev), set up an account on code@berkeley.edu (which I previously found difficult to learn), or, maybe easiest, use Berkeley's "box".

Friday, 20 October 2017

Cycle Handling in PostgreSQL

Below is one way to detect cycles when walking hierarchical (parent/child) data with a recursive query in a PostgreSQL database.

 CREATE TABLE FOLDER(ID INT , NAME VARCHAR(255), PARENT INT);

INSERT INTO FOLDER VALUES(1, '/', null);
INSERT INTO FOLDER VALUES(2, 'src', 1);
INSERT INTO FOLDER VALUES(3, 'main', 2);
INSERT INTO FOLDER VALUES(4, 'org', 3);
INSERT INTO FOLDER VALUES(5, 'test', 2);


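WITH RECURSIVE CTE(Id, NAME, Parent, Pathh, path, cycle)
AS
(
  -- Anchor: start from the root folder(s), i.e. rows without a parent
  SELECT Id, NAME, Parent,
         CAST(Id AS varchar(30)) AS pathh,   -- readable path: child-->...-->root (cut at 30 chars)
         ARRAY[Folder.Id] AS path,           -- ids visited so far on this branch
         false AS cycle
  FROM Folder
  WHERE Parent IS NULL
  UNION ALL
  -- Recursive step: attach each child folder to the rows found so far
  SELECT F.Id, F.Name, F.Parent,
         CAST(F.Id || '-->' || C.Pathh AS varchar(30)) AS pathh,
         C.Path || F.Id AS path,
         F.Id = ANY(C.Path) AS cycle         -- true if this id was already visited
  FROM Folder F
  JOIN CTE C ON F.Parent = C.Id
  WHERE NOT C.cycle                          -- stop expanding a branch once it has cycled
)
SELECT C.* FROM CTE C
ORDER BY 1;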
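The same idea outside the database: a Java sketch that mirrors the recursive CTE level by level. It starts from the rows whose parent is null, joins children onto the current level, carries the visited ids in a path list, flags a row as a cycle when its id is already in the path, and stops expanding flagged rows. FolderCycles, Folder, Node and walk are illustrative names, not an existing API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative Java version of the recursive CTE (names are hypothetical).
final class FolderCycles {

    // One row of the FOLDER table; parent is null for the root.
    static final class Folder {
        final int id;
        final String name;
        final Integer parent;

        Folder(int id, String name, Integer parent) {
            this.id = id;
            this.name = name;
            this.parent = parent;
        }
    }

    // One row of the CTE result: id, name, visited ids and the cycle flag.
    static final class Node {
        final int id;
        final String name;
        final List<Integer> path;
        final boolean cycle;

        Node(int id, String name, List<Integer> path, boolean cycle) {
            this.id = id;
            this.name = name;
            this.path = path;
            this.cycle = cycle;
        }
    }

    static List<Node> walk(List<Folder> folders) {
        // Anchor member: folders without a parent, path = [id], cycle = false.
        List<Node> level = new ArrayList<>();
        for (Folder f : folders) {
            if (f.parent == null) {
                List<Integer> path = new ArrayList<>();
                path.add(f.id);
                level.add(new Node(f.id, f.name, path, false));
            }
        }

        // Recursive member: repeat until a level produces no new rows.
        List<Node> result = new ArrayList<>();
        while (!level.isEmpty()) {
            result.addAll(level);
            List<Node> next = new ArrayList<>();
            for (Node c : level) {
                if (c.cycle) {
                    continue;                                   // WHERE NOT cycle
                }
                for (Folder f : folders) {
                    if (f.parent != null && f.parent.intValue() == c.id) {
                        boolean cycle = c.path.contains(f.id);  // F.id = ANY(path)
                        List<Integer> path = new ArrayList<>(c.path);
                        path.add(f.id);                         // path || F.id
                        next.add(new Node(f.id, f.name, path, cycle));
                    }
                }
            }
            level = next;
        }

        result.sort(Comparator.comparingInt((Node n) -> n.id)); // ORDER BY 1
        return result;
    }
}
```

With the five sample rows no cycle is flagged. Since the FOLDER table has no primary key, adding a duplicate row such as (1, 'loop', 4) creates a loop; that row comes out with cycle = true and the walk stops there instead of recursing forever.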