Discussion: [Scons-users] Out of memory writing back the database
Hill, Steve (FP COM)
2017-11-28 15:17:36 UTC
All,

We are using 32-bit Python 2.7.12 and SCons 2.5.1 on Windows (and Linux). We have now reached the point where the database is in excess of 500MB. During the build, the memory usage bubbles along at just over 1GB but, at the end of the run, we sometimes get a spike in memory usage that causes an "Out of memory" exception and the build fails.

Looking at the code, it appears that the database file is produced by pickling into memory before writing to the file, hence the memory usage increases by over 0.5GB as it is being written back - and, depending on the database size, this can blow the memory space of a 32-bit process.
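(To illustrate the pattern - this is not the actual SCons code, just the shape of the problem - serializing a large dict with pickle.dumps() materializes the whole pickle as a second in-memory copy, whereas pickle.dump() streams to the file with only a small buffer overhead:)

import pickle

big = {i: "x" * 100 for i in range(1000000)}  # stand-in for a large sconsign dict

# Pickling into memory first: the whole serialized database exists as a
# bytes object alongside the live dict, roughly doubling peak memory.
with open("db_in_memory.pkl", "wb") as f:
    data = pickle.dumps(big, 2)  # full serialized copy held in memory...
    f.write(data)                # ...before any of it reaches disk

# Streaming instead: the pickler writes out as it serializes, so the peak
# overhead is a small buffer rather than the whole blob.
with open("db_streamed.pkl", "wb") as f:
    pickle.dump(big, f, 2)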

Firstly, I do have a medium-term plan to move us to 64-bit Python along with the move to Python 3, but we still need to stick with 32-bit Python for the moment (for example, we have to load 32-bit DLLs), so I need to fix this for 32-bit Python in the short term.

Secondly, I've had a look at the old (?) .sconsign per directory approach, which would seem to address this problem. I assume that this is still supported and should be as robust as the single file approach?

The only issue with this is that some of the .sconsigns appear in the directories beside the source code, which breaches the contract for our build system: no file may be produced within the directories containing source code. Where we use VariantDirs (for the actual implementation files) there is no problem but, for header files, the .sconsign ends up in the include or interface directory. It looks reasonably easy to monkey-patch the DirFile class in SConsign.py to override the directory in which the .sconsign resides but, before I do this, I wanted to check that there is no easier way to achieve the builds without a memory issue.

Thanks,

Steve.
Bill Deegan
2017-12-01 01:36:38 UTC
Steve,

How many files are you processing? 500MB is very large for a sconsign.

Could you try mv .sconsign .sconsign.save, let your build run and see how
big the new file is?

-Bill
Hill, Steve (FP COM)
2017-12-01 08:48:48 UTC
Hi Bill,

Our code base is not that big – about 20,000 files – but most files get built multiple times (for different CPUs) and we build it in several different ways, each with many of its own VariantDirs. When the 4 most common builds are done, the .sconsign (having been deleted first) is just under 500MB; add a bit of history or some of the less common builds and it gets big enough to blow the memory space of Python.

I've split the .sconsign up so that largely independent builds on the same codebase are stored in different .sconsign files (initially done because of the time it took to write the .sconsign back), though I've had to write some custom Deciders to make this totally reliable. All in all, the .sconsign-per-directory approach (though with the .sconsign location outside the main source tree) would seem the best solution for me (and might perform better too, since not all files need writing back on incremental builds), unless there is another option that I'm not aware of.
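(For illustration, the splitting itself needs no patching - something like this in the SConstruct, where the paths and the 'flavour' build argument are made up:)

# One signature database per largely-independent build, kept outside the
# source tree ('flavour' and the paths here are hypothetical).
flavour = ARGUMENTS.get('flavour', 'common')
SConsignFile('tmp_files/sconsign_' + flavour)

# Passing None instead selects the per-directory .sconsign behaviour
# mentioned above, which is what scatters files beside the sources:
# SConsignFile(None)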




Thanks,

S.
Hill, Steve (FP COM)
2017-12-01 13:21:55 UTC
Hi Bill,

Just to follow up on this: I've monkey-patched the DirFile class in SConsign.py so that it writes each sconsign into a single temporary directory, with each sconsign named with an adler32 hash of its directory path.

After doing the 4 basic builds, the directory contains 10,745 sconsigns totalling 674MB. Using --debug=memory, the peak memory usage of the worst-case build was 925MB, compared to ~1.4GB for the same build with the monolithic sconsign (so the difference is basically the cost of pickling the sconsign into memory first).

Cheers,

S.
Bill Deegan
2017-12-01 16:19:44 UTC
Would you consider contributing that logic as a pull request, and/or pointing to the commit(s) in your repo?

Currently the plan is to transition the .sconsign to another format, possibly json or sqlite, with an eye on speed and/or incrementality.
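For what it's worth, a minimal sketch of what an incremental sqlite-backed store could look like (the schema and class here are illustrative, not an actual SCons design). Each directory's entry becomes its own row, so an incremental build rewrites only the rows that changed instead of re-serializing the whole database:

import pickle
import sqlite3

class SQLiteSConsign(object):
    """Illustrative sqlite-backed signature store."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS sconsign "
                          "(dir TEXT PRIMARY KEY, entries BLOB)")

    def set_entry(self, dirname, entries):
        # Upsert a single directory's entries; no other row is rewritten.
        blob = sqlite3.Binary(pickle.dumps(entries, 2))
        self.conn.execute("INSERT OR REPLACE INTO sconsign (dir, entries) "
                          "VALUES (?, ?)", (dirname, blob))

    def get_entry(self, dirname):
        row = self.conn.execute("SELECT entries FROM sconsign WHERE dir = ?",
                                (dirname,)).fetchone()
        return pickle.loads(bytes(row[0])) if row else None

    def sync(self):
        self.conn.commit()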

-Bill
Eric Fahlgren
2017-12-01 16:59:18 UTC
I got curious while reading this thread and looked at our .sconsign files. Most of them are around 300K, but the one in our production directory was 2.5M (pretty tiny compared to Steve's 500M, but interesting in that it's 10x bigger even though it's building the same source). All of the small ones are just test builds -- one or two builds in them and then they get deleted. The one in production is many years old, with one build per day.

    python -c 'from SCons import dblite; x = dblite.open(".sconsign.dblite"); print("\n".join("%7s %s" % (len(x[s]), s) for s in sorted(x.keys())))'

The list of keys is the same for all of those that I examined, but the data blob seems to grow with use. (I trimmed the dumps down to three of the most interesting entries; there were a couple hundred entries in each dump.)

660 win32\build
23546 win32\install
35249 win32\install\bin

In production:

133860 win32\build
2293371 win32\install
83235 win32\install\bin

The growth of the install directory blob itself is pretty easy to explain: we create a number of "release versioned" files there, each with a file name containing the day's build number, so we have something like "WIN32/install/installer1234.exe" with a different number each day.
Bill Deegan
2017-12-01 17:00:44 UTC
There is the sconsign utility to inspect the .sconsign files...
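For example, run from the directory containing the database:

    sconsign .sconsign.dblite

which dumps the stored entries (file names, signatures and dependency info) for each directory recorded in the database.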
Bill Deegan
2017-12-01 17:01:42 UTC
You're not building production from a clean sandbox? (and/or via CI)
-Bill
Eric Fahlgren
2017-12-01 17:19:06 UTC
No, we just update the repository and do the build. The WIN32/install
directory is basically our installer archive, so it just sits there on one
of our servers accumulating gigabytes of garbage year after year...

Thanks for the tip on the sconsign utility; I just ran it on the old database and it shows files dated as long ago as 2015-02-19, confirming that it's at least a year and a half old.

It also crashes on that old database, but I don't think it's worth looking into; do you care? (It works fine on all the newer ones.)

  File "t:/Python27/Scripts/sconsign.py", line 277, in map_bkids
    result.append(nodeinfo_string(bkids[i], bkidsigs[i], " "))
  IndexError: list index out of range
Hill, Steve (FP COM)
2017-12-01 17:31:41 UTC
The quick hack was as follows (not committed anywhere as it is very much a hack!):

import os

import SCons.SConsign
import SCons.Warnings

import hash_cache  # our in-house hashing helper (see below)

_DirFile = SCons.SConsign.DirFile

hashDir = os.path.join("tmp_files", "sconsigns")
hashFile = os.path.join(hashDir, ".hashes")
sconsignHashes = hash_cache.HashManager(hashFile)


class NewDirFile(_DirFile):
    def __init__(self, dir):
        """
        dir - the directory for the file
        """
        self.dir = dir

        # changed: name the sconsign after a hash of the directory path and
        # keep it in hashDir, rather than as .sconsign in the directory itself
        internalPath = dir.get_internal_path()
        pathHash = sconsignHashes.get_hash(internalPath)
        self.sconsign = os.path.join(hashDir, pathHash + '.sconsign')

        try:
            fp = open(self.sconsign, 'rb')
        except IOError:
            fp = None

        try:
            SCons.SConsign.Dir.__init__(self, fp, dir)
        except KeyboardInterrupt:
            raise
        except:
            SCons.Warnings.warn(SCons.Warnings.CorruptSConsignWarning,
                                "Ignoring corrupt .sconsign file: %s" % self.sconsign)

        SCons.SConsign.sig_files.append(self)


SCons.SConsign.DirFile = NewDirFile

# Switch to per-directory sconsigns (now redirected by the patch above).
SConsignFile(None)

Note that the lines marked "changed" are the only changes from the DirFile constructor in SConsign.py.



HashManager is a pre-existing class that I've reused here. It handles creating the (8-character hex string) hash values, including resolving hash collisions, and caching them - plus storing them in a file for the next run.
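In outline it looks something like this (a simplified sketch, not the real class - the collision probing and the cache file format here are illustrative):

import os
import zlib

class HashManager(object):
    """Maps directory paths to stable 8-character hex names."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        self.hashes = {}   # path -> hex name
        self.used = set()  # names already handed out
        if os.path.exists(cache_file):
            with open(cache_file) as f:
                for line in f:
                    name, path = line.rstrip("\n").split(" ", 1)
                    self.hashes[path] = name
                    self.used.add(name)

    def get_hash(self, path):
        if path in self.hashes:
            return self.hashes[path]
        h = zlib.adler32(path.encode("utf-8")) & 0xffffffff
        name = "%08x" % h
        while name in self.used:  # resolve collisions by linear probing
            h = (h + 1) & 0xffffffff
            name = "%08x" % h
        self.hashes[path] = name
        self.used.add(name)
        return name

    def save(self):
        # Persist the mapping so names stay stable across runs.
        d = os.path.dirname(self.cache_file)
        if d and not os.path.isdir(d):
            os.makedirs(d)
        with open(self.cache_file, "w") as f:
            for path, name in sorted(self.hashes.items()):
                f.write("%s %s\n" % (name, path))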



HTH,

S.
Mats Wichmann
2017-12-01 18:50:38 UTC
Transition: json won't help with that. Sqlite might.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Bill Deegan
2017-12-01 21:22:26 UTC
Mats,

JSON may help memory, but not incrementality if it's a single file.

Thoughts?

-Bill