Following my habit of using issues as feature requests.
I came across this commit for longest common subsequence and I would love to see this pushed into the string library or other basic text processing library with some of the functionality from python's difflib specifically SequenceMatcher()
Someone who reviewed the code in question (@ben-albrecht, @benharsh?) should correct me if I'm wrong, but I think the implementation in that test is exponential in nature (i.e., will start to take forever quickly as string sizes grow) and that we'd probably want to see a dynamic programming implementation before putting it into a blessed library.
blessed? So mote it be... Nice theming!
@bradcray - That sounds right to me. It probably wouldn't be too hard to write a dynamic programming implementation.
For anyone interested in this good first issue, here's a Chapel implementation derived from a python implementation to start:
proc lcs(a: string, b: string): string {
var lengths: [1..a.size+1, 1..b.size+1] int;
for (i, x) in zip(1..a.size, a) {
for (j, y) in zip(1..b.size, b) {
if x == y {
lengths[i+1, j+1] = lengths[i, j] + 1;
} else {
lengths[i+1, j+1] = max(lengths[i+1, j], lengths[i, j+1]);
}
}
}
var result = '';
var x = a.size+1,
y = b.size+1;
while x != 1 && y != 1 {
if lengths[x, y] == lengths[x-1, y] then x -= 1;
else if lengths[x, y] == lengths[x, y-1] then y -= 1;
else {
assert(a[x-1] == b[y-1]);
result = a[x-1] + result;
x -= 1;
y -= 1;
}
}
return result;
}
// Tests
writeln(lcs('1234', '1224533324')); // 1234
writeln(lcs('thisisatest', 'testing123testing')); // tsitest
I'm not sure what standard module this would fit into - maybe it ought to live in a mason package to begin with.
I wrote something like this a long time ago. test/trivial/diten/lcs/lcs.chpl. It could probably use some cleaning up and modernization if we wanted to make it an official thing.
I tried writing code for common subsequence whose algorithm looks similar to @ben-albrecht . Kindly take a look at it .
`proc input_string(out str1: string, out str2: string)
{
var infile=open(filename, iomode.r).reader();
infile.read(str1);
infile.read(str2);
infile.close();
}
proc lcs ( str1:string,str2:string,len1,len2)
{
var L:[0..len1,0..len2] int;
for i in 0..len1
{
for j in 0..len2
{
if i==0 or j==0
{
L[i,j]=0;
}
elif str1[i-1]==str2[j-1]
{
L[i,j]=L[i-1,j-1]+1 ;
}
else
{
L[i][j] = max(L[i-1][j], L[i][j-1])
}
}
}
var index : int = L[len1][len2];
var lcs[index+1] : string;
lcs[index] = '';
var m:int = len1;
var n:int = len2;
while m > 0 and n > 0
{
if str1[m-1] == str2[n-1])
{
lcs[index-1] = str1[i-1];
m-=1; n-=1; index-=1;
}
elif (L[m-1][n] > L[m][n-1])
m-=1;
else
n-=1;
}
return lcs;
}
proc main()
{
config var filename="commonsubs";
var str1,str2 : string ;
var len1,len2 : int ;
input_string(str1,str2);
len1= str1.size;
len2= str2.size;
writeln(lcs(str1,str2,len1,len2));
}`
Hi @ankugupt - thanks for your interest. I expect this issue to be closed by the addition of the StringUtils package to the mason registry.
If you think your implementation is an improvement over that one, I encourage you to contribute to the StringUtils package: https://github.com/AnubhavUjjawal/Chapel-StringUtils
Most helpful comment
Someone who reviewed the code in question (@ben-albrecht, @benharsh?) should correct me if I'm wrong, but I think the implementation in that test is exponential in nature (i.e., will start to take forever quickly as string sizes grow) and that we'd probably want to see a dynamic programming implementation before putting it into a blessed library.