Tuesday, 7 November 2017

Merging the Content of Files Using TUnite

Scenario: Iterate on files and merge the content

The following Job iterates on a list of files then merges their content and displays the final 2-column content on the console.

Dropping and linking the components

  1. Drop the following components onto the design workspace: tFileListtFileInputDelimitedtUnite and tLogRow.

  2. Connect the tFileList to the tFileInputDelimited using an Iterate connection and connect the other component using a row main link.

Configuring the components

  1. In the tFileList Basic settings view, browse to the directory, where the files to merge are stored.

    The files are pretty basic and contain a list of countries and their respective score.

  2. In the Case Sensitive field, select Yes to consider the letter case.

  3. Select the tFileInputDelimited component, and display this component's Basic settings view.

  4. Fill in the File Name/Stream field by using the Ctrl+Space bar combination to access the variable completion list, and selecting tFileList.CURRENT_FILEPATH from the global variable list to process all files from the directory defined in the tFileList.

  5. Click the Edit Schema button and set manually the 2-column schema to reflect the input files' content.

    For this example, the 2 columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type.

  6. Click OK to validate the setting and accept to propagate the schema throughout the Job.

  7. Then select the tUnite component and display the Component view. Notice that the output schema strictly reflects the input schema and is read-only.

  8. In the Basic settings view of tLogRow, select the Table option to display properly the output values.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6, or click Run on the Run console to execute the Job.

    The console shows the data from the various files, merged into one single table.

Monday, 30 October 2017

About sharing Talend development w/ version control


Per http://www.talendforge.org/forum/viewtopic.php?id=6876 (dated Jun 2009)

Q: Is it possible to configure Talend Open Studio with a repository on a shared drive??

A1: No, TOS can only be used on local machine. To share the project, you can use talend Integration Suite, which is the subscription extension of TOS, support multiple users, groups, specific access for these users, specifics privileges etc... For more info, please access our website.

A2: If your drive is mounted, you can try to change the workspace location by adding the "data" option in the shortcut : "C:\TOS\TOS-win32-x86.exe -data Z:\mySharedWorkspace"
This remote workspace will be locked by the first person who use it. So you won't be able to work at the same time with a coworker on this workspace : this option is only available in TIS.

Per http://stackoverflow.com/questions/4604823/talend-project-in-svn (dated Jan 2011)

Q: I need to use Talend Open Studio with a SVN. What is the folder that I should put in the repository?

A1: Well, the "official" answer is to upgrade to Talend Integration Suite (TIS), which includes SVN integration. It goes a long way to synchronizing the activities of several developers using a shared repository.

If TIS is not an option for you, you might have some success by putting the project directory of your Talend Open Studio (TOS) under SVN control. This is the directory immediately under "workspace" that has the same name as your project. You would have to perform all SVN operations, such as commit and update, manually on this directory, for example using Tortoise (if you're on Windows). TOS might be able to use this project as if it was not under SVN. I personally haven't tried this, and would recommend using TIS instead.

A2: I prepared the following for my team. http://www.dariopardo.com/talend-open-studio/svnintegration/ It basically involves doing what was said above. (note – this one uses Subversion repository w/ TortoiseSVN client)

A3: I faced a similar issue an wrote a blog about it, see Putting Talend Open Studio projects under version control. Basically you should version your entire workspace. (note – this one uses Git)

A4: Putting a TOS workspace under SVN is strongly discouraged, as CSV/SVN as file-based by nature while TOS workspace is directory-based. The probability that you incur in workspace corruption are very high.

However I achieved good results putting in under git, instead (have a look here for an idea). I must say it's not elegant and putting the entire workspace under version control is a tremendous waste of space (you will commit compiled files, logs, history, temp...), but it's the only safe solution for your metadata integrity.

Be aware! In *<workspace_root>/.java* subtree there are external libraries and the classpath files for each job in your project you executed at least one time. These paths are absolute. This mean that if you plan to use your version-controlled workspace in a shared environment anyone in team must place the workspace in the absolute location in their local file system (ie. c:/talend_git/workspace). Otherwise you'll get a class not found exception upon executing jobs. Sad but true.

So, my conclusion --- Talend version control for shared development is possible but unless we need to collaborate on Talend jobs frequently and are very skilled in SVN/Git, it can be tricky to incorporate all team members' Talend projects in a repository and once in a while take over for continuing development. The learning curve to incorporate Talend w/ version control software is likely high. As mentioned for answer A4 above, the chances of corrupted file is very high, and I know that can ruin the whole Talend project rather than just one Talend job).

What's the alternative --- just have a repository so everyone can export his/her own projects (or jobs) for others to grab as needed. We have /home/developers/Talend-jobs on dev.berkeley.edu (pahma-dev) we can use, or setup account on code@berkeley.edu (which I have difficulty learning it previously), or maybe the easiest is use Berkeley's "box".

Friday, 20 October 2017

Cycle Handling in PostGreSQL

Below is the way to handle Cycle data in Postgresql database

 CREATE TABLE FOLDER(ID INT , NAME VARCHAR(255), PARENT INT);

INSERT INTO FOLDER VALUES(1, '/', null);
INSERT INTO FOLDER VALUES(2, 'src', 1);
INSERT INTO FOLDER VALUES(3, 'main', 2);
INSERT INTO FOLDER VALUES(4, 'org', 3);
INSERT INTO FOLDER VALUES(5, 'test', 2);


With RECURSIVE CTE(Id,NAME,Parent,Pathh,path, cycle)
as
(
Select Id,NAME,parent, cast(Id  as varchar(30)) pathh,ARRAY[Folder.Id] path,false from Folder
where parent is null
union all
Select 
F.id,F.name,F.Parent,cast(F.Id||'-->'||C.Pathh as varchar(30))  pathh,
C.Path||F.id path,F.id = any(path) as cycle
from 
Folder F,CTE C
where F.Parent=C.ID  and not cycle
)
Select C.* from CTE C
order by 1 


Friday, 13 October 2017

ORACLE PLSQL Exception Handling



SQL> create table a (x number (10), y number(10);
Table created.

SQL> insert into a values(1, 5);
SQL> insert into a values(2, 10);
SQL> insert into a values(3, 15);
SQL> insert into a values(3, 0);
SQL> insert into a values(4, 20);
SQL> insert into a values(5, 25);
SQL> commit;
SQL> select x, y from a order by x;

X Y
---------- ----------
1 5
2 10
3 15
3 0
4 20
5 25
SQL>
declare
v_temp number:= 30;
v_tot number;
v_x number;
v_y number;
cursor c_temp is
select x, y from a order by x;
begin
open c_temp;
loop
fetch c_temp into v_x, v_y;
exit when c_temp%notfound;
v_tot:=v_x+v_temp/v_y;
dbms_output.put_line('Total is :'||v_tot);
end loop;
close c_temp;
exception
when others then
dbms_output.put_line('Found Oracle error: '||SQLERRM);
end;
/

Total is :7
Total is :5
Total is :5
Found Oracle error: ORA-01476: divisor is equal to zero

PL/SQL procedure successfully completed.

Only Partial Records are Printed. 

Below Code Prints : Complete Numbers 

declare
  v_temp number:= 30;
  v_tot number;
  v_x number;
  v_y number;
  cursor c_temp is
    select x, y from a order by x;
begin
  open c_temp;
  loop
    fetch c_temp into v_x, v_y;
    exit when c_temp%notfound;
    begin
      v_tot:=v_x+v_temp/v_y;
      dbms_output.put_line('Total is :'||v_tot);
    exception
      when others then
      dbms_output.put_line('Found Oracle error: '||SQLERRM);
    end;
  end loop;
  close c_temp;
end;

Thursday, 12 October 2017

PLSQL / Works Cursors with Exception Handling

Simple Stored Procedure to Load the Cursor Dynamically:

create or replace procedure asf_cursor is 
TYPE EmpCurTyp  IS REF CURSOR;
c2  EmpCurTyp;
cursor C1 is (select 1 a from dual union all select 2 a from dual);
dynamic_query varchar2(1000);
i int;
var1 varchar(100);
BEGIN
open c1;
loop
fetch c1 into i;
EXIT WHEN c1%NOTFOUND;
dynamic_query:= 'select '''||'~'||i||'''  from dual';
open c2 for dynamic_query;
  LOOP
    FETCH c2 INTO var1;
    EXIT WHEN c2%NOTFOUND;
  END LOOP;
close c2;
DBMS_OUTPUT.PUT_LINE(var1);
end loop;
close c1;
END; 

set serveroutput on 
execute asf_cursor;

 

Exception Handling for Table Not Found:

-- set serveroutput on;
DECLARE 
TYPE EmpCurTyp  IS REF CURSOR;
c2  EmpCurTyp;
var1 varchar2(30);
drop_tab varchar2(100):='Select * from duall union all Select cast(2 as varchar(5)) from duall';
clob_out clob;
BEGIN
/*
open c2 for select * from duall;
LOOP
    FETCH c2 INTO var1;
    EXIT WHEN c2%NOTFOUND;
    dbms_output.put_line(var1);
    close c2;
  END LOOP;
  */
  open c2 for drop_tab;
  loop
  fetch c2 into var1;
   EXIT WHEN c2%NOTFOUND;
   clob_out:=clob_out||var1;
   end loop;
   close c2;
   dbms_output.put_line(clob_out);
  EXCEPTION
WHEN OTHERS THEN 
IF (SQLCODE = -942) THEN
DBMS_OUTPUT.PUT_LINE('Table Not Found');
clob_out:='~'||null;
dbms_output.put_line(clob_out);
END IF;
END;

Tuesday, 10 October 2017

Setting Different Java Version for Eclipse

  • Open eclipse config file eclipse.ini in your Eclipse folder.
  • Add -vm yourPath\Java\jre6\bin\javaw.exe like:

    -startup  plugins/org.eclipse.equinox.launcher_1.3.0.v20120522-1813.jar  --launcher.library  plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20120522-1813  -product  org.eclipse.epp.package.java.product  --launcher.defaultAction  openFile  --launcher.XXMaxPermSize  256M  -vm   C:\Java\jre6\bin\javaw.exe  ...

Friday, 6 October 2017

NT Authentication In Talend

The solution is to place ntlmauth.dll in windows/system32 (or sysWow64) folder. but instead of placing it in system32 folder, we can place it in our own local folder and loaded by our talend jobs while establishing a connection to MS SQL Server.


Design time configuration:

1. Copy ntlmauth.dll to bin folder inside talend installed location. bin folder wont be present, create your own.

2. While executing the job, go to Advanced settings and select "Use Specific JVM arguments" and add the following additional argument

-Djava.library.path=./bin

3. Execute the talend job, it will use windows authentication

Inline image 1


Runtime configuration

1. While creating the package, ensure "bin" folder is also bundled along with ntlmauth.dll
2. Edit the .bat or .sh file and add the following entry

-Djava.library.path=./bin


Inline image 2

3. Execute the batch file, the job should run without any SSO error.