Discussion:
[scons-users] Race condition in SCons scheduler?
Robert Smallshire
2006-10-26 17:15:24 UTC
Permalink
Hi,

I'm building a large system in parallel with SCons on a multiprocessor
(4 core) Windows Server 2003 64-bit edition server. I'm getting frequent
build failures owing to what I suspect to be a race condition in the
SCons scheduler when building in parallel. The failure is always related
to one of my own custom builders, which uses a simple Python function to
generate some C++ code.

My action function which does the code generation opens a file, writes
C++ code to the file, and explicitly closes the file, using the standard
Python try: finally: idiom to ensure that the file is closed before the
function returns.

However, SCons seems to be attempting to compile the resulting C++ file,
before my build action has closed the file. Of course, the compiler
fails to open the file, and the build fails.

Forcing a garbage collection sweep with gc.collect() immediately after
closing the file seems to help, but this is probably because it stops
all Python threads, and effectively serialises the build actions.

Here is some sample build output:

rmsInitializerAction(["D:\build\robertsm\products\rms\8_0s\windows-amd64
-vc_8_0-normal\reditor\ui\reditor_ui__initializer.cpp"],
["'reditor_ui__initializer.cpp'", "''", "''"])
cl /nologo /EHsc /Zc:forScope /GR /W0 /MD /Od /TP /DWIN64 /D__WIN32__
/D_CRT_SECURE_NO_DEPRECATE /D_HAS_ITERATOR_DEBUGGING=0 /D_SECURE_SCL=0
/DWINDOWS /DLITTLEENDIAN /DQT /DQT_DLL /DQT_THREAD_SUPPORT /DNO_DEBUG=1
/DQT_NO_CHECK=1 /DQT_NO_DEBUG=1 /I.
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_3_
5\include
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_3_
5\mkspecs\win32-msvc.net
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\oiv-5_0
_4\include /c
D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\reditor\
ui\reditor_ui__initializer.cpp
/FoD:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redit
or\ui\reditor_ui__initializer.obj
reditor_ui__initializer.cpp
c1xx : fatal error C1083: Cannot open source file:
'D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\reditor
\ui\reditor_ui__initializer.cpp': Permission denied
scons: ***
[D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\reditor
\ui\reditor_ui__initializer.obj] Error 2

Unfortunately, I've failed to distill this down to a simple example, as
is the norm with threading problems - but I strongly suspect there is a
problem with the SCons scheduler with parallel builds.

Other details: SCons v0.96.92.D002. Python 2.5 AMD64 build.

Has anybody seen anything similar, or created a fix for this?

Rob Smallshire


DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
Timothee Besset
2006-10-26 17:19:58 UTC
Permalink
I doubt this would be a bug in scons parallelism, but more likely your
dependencies are not being setup correctly in the script, and scons has
no idea that the compile task depends on another being completed.

Using the debug/verbose options in scons might show more precisely why
it decides to build

TTimo
Post by Robert Smallshire
Hi,
I'm building a large system in parallel with SCons on a multiprocessor
(4 core) Windows Server 2003 64-bit edition server. I'm getting frequent
build failures owing to what I suspect to be a race condition in the
SCons scheduler when building in parallel. The failure is always related
to one of my own custom builders, which uses a simple Python function to
generate some C++ code.
My action function which does the code generation opens a file, writes
C++ code to the file, and explicitly closes the file, using the standard
Python try: finally: idiom to ensure that the file is closed before the
function returns.
However, SCons seems to be attempting to compile the resulting C++ file,
before my build action has closed the file. Of course, the compiler
fails to open the file, and the build fails.
Forcing a garbage collection sweep with gc.collect() immediately after
closing the file seems to help, but this is probably because it stops
all Python threads, and effectively serialises the build actions.
rmsInitializerAction(["D:\build\robertsm\products\rms\8_0s\windows-amd64
-vc_8_0-normal\reditor\ui\reditor_ui__initializer.cpp"],
["'reditor_ui__initializer.cpp'", "''", "''"])
cl /nologo /EHsc /Zc:forScope /GR /W0 /MD /Od /TP /DWIN64 /D__WIN32__
/D_CRT_SECURE_NO_DEPRECATE /D_HAS_ITERATOR_DEBUGGING=0 /D_SECURE_SCL=0
/DWINDOWS /DLITTLEENDIAN /DQT /DQT_DLL /DQT_THREAD_SUPPORT /DNO_DEBUG=1
/DQT_NO_CHECK=1 /DQT_NO_DEBUG=1 /I.
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_3_
5\include
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_3_
5\mkspecs\win32-msvc.net
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\oiv-5_0
_4\include /c
D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\reditor\
ui\reditor_ui__initializer.cpp
/FoD:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redit
or\ui\reditor_ui__initializer.obj
reditor_ui__initializer.cpp
'D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\reditor
\ui\reditor_ui__initializer.cpp': Permission denied
scons: ***
[D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\reditor
\ui\reditor_ui__initializer.obj] Error 2
Unfortunately, I've failed to distill this down to a simple example, as
is the norm with threading problems - but I strongly suspect there is a
problem with the SCons scheduler with parallel builds.
Other details: SCons v0.96.92.D002. Python 2.5 AMD64 build.
Has anybody seen anything similar, or created a fix for this?
Rob Smallshire
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
---------------------------------------------------------------------
Robert Smallshire
2006-10-26 17:45:38 UTC
Permalink
Hi Timothy,
Post by Timothee Besset
I doubt this would be a bug in scons parallelism, but more
likely your dependencies are not being setup correctly in the
script, and scons has no idea that the compile task depends
on another being completed.
Using the debug/verbose options in scons might show more
precisely why it decides to build
I sure the dependency relationship is correct. Using --debug=dtree
reports the correct dependency relationship between the derived file
generated by the Python function, and the object file which should
result from its compilation (n.b. this is a different example of the
same problem),

+-D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\Silibs
\bt\arr1\silibs_bt_arr1__initializer.obj

+-D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\Silibs
\bt\arr1\silibs_bt_arr1__initializer.cpp

This only occurs on our fastest build server. I've not seen the problem
with the many other parallel builds, using the same setup, we do here.

Rob
Post by Timothee Besset
Post by Robert Smallshire
Hi,
I'm building a large system in parallel with SCons on a
multiprocessor
Post by Robert Smallshire
(4 core) Windows Server 2003 64-bit edition server. I'm getting
frequent build failures owing to what I suspect to be a
race condition
Post by Robert Smallshire
in the SCons scheduler when building in parallel. The failure is
always related to one of my own custom builders, which uses
a simple
Post by Robert Smallshire
Python function to generate some C++ code.
My action function which does the code generation opens a
file, writes
Post by Robert Smallshire
C++ code to the file, and explicitly closes the file, using the
C++ standard
Python try: finally: idiom to ensure that the file is closed before
the function returns.
However, SCons seems to be attempting to compile the resulting C++
file, before my build action has closed the file. Of course, the
compiler fails to open the file, and the build fails.
Forcing a garbage collection sweep with gc.collect()
immediately after
Post by Robert Smallshire
closing the file seems to help, but this is probably
because it stops
Post by Robert Smallshire
all Python threads, and effectively serialises the build actions.
rmsInitializerAction(["D:\build\robertsm\products\rms\8_0s\windows-amd
Post by Robert Smallshire
64 -vc_8_0-normal\reditor\ui\reditor_ui__initializer.cpp"],
["'reditor_ui__initializer.cpp'", "''", "''"]) cl /nologo /EHsc
/Zc:forScope /GR /W0 /MD /Od /TP /DWIN64 /D__WIN32__
/D_CRT_SECURE_NO_DEPRECATE /D_HAS_ITERATOR_DEBUGGING=0
/D_SECURE_SCL=0
Post by Robert Smallshire
/DWINDOWS /DLITTLEENDIAN /DQT /DQT_DLL /DQT_THREAD_SUPPORT
/DNO_DEBUG=1
/DQT_NO_CHECK=1 /DQT_NO_DEBUG=1 /I.
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_
Post by Robert Smallshire
3_
5\include
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_
Post by Robert Smallshire
3_
5\mkspecs\win32-msvc.net
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\oiv-5
Post by Robert Smallshire
_0
_4\include /c
D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redito
Post by Robert Smallshire
r\
ui\reditor_ui__initializer.cpp
/FoD:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\red
Post by Robert Smallshire
it
or\ui\reditor_ui__initializer.obj
reditor_ui__initializer.cpp
'D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redit
Post by Robert Smallshire
or
\ui\reditor_ui__initializer.cpp': Permission denied
scons: ***
[D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redit
Post by Robert Smallshire
or \ui\reditor_ui__initializer.obj] Error 2
Unfortunately, I've failed to distill this down to a simple
example,
Post by Robert Smallshire
as is the norm with threading problems - but I strongly
suspect there
Post by Robert Smallshire
is a problem with the SCons scheduler with parallel builds.
Other details: SCons v0.96.92.D002. Python 2.5 AMD64 build.
Has anybody seen anything similar, or created a fix for this?
Rob Smallshire
This message contains information that may be privileged or
confidential and is the property of the Roxar Group. It is
intended only for the person to whom it is addressed. If you
are not the intended recipient, you are not authorised to
read, print, retain, copy, disseminate, distribute, or use
this message or any part thereof. If you receive this message
in error, please notify the sender immediately and delete all
copies of this message.
---------------------------------------------------------------------
---------------------------------------------------------------------
DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
Valera Fooksman
2006-10-26 18:21:43 UTC
Permalink
Hi Robert,

Please keep us posted about your progress as we're experiencing the
very same thing, and also only on our fastest build servers (but only
Win2K, not WinXP!). I wrote a message to the group about this on
September 14th (Subj: multiple jobs failure) which remained
unanswered. Since then I verified that the offensive operation was
executed on the same thread where the file copying (or creation) had
just happened, which leaves me completely puzzled. I have to conclude
that either the following python code doesn't really close the file:

shutil.copy2(source, dest)
st = os.stat(source)
os.chmod(dest, stat.S_IMODE(st[stat.ST_MODE]) | stat.S_IWRITE)

or there are evil things like antivirus or disk cache that grab the
file right after its creation and cause the havoc.

Since then I had to disable multiple jobs as I can't spent any more
time investigating the problem.
--
Yours,
\/alera
Post by Robert Smallshire
Hi Timothy,
Post by Timothee Besset
I doubt this would be a bug in scons parallelism, but more
likely your dependencies are not being setup correctly in the
script, and scons has no idea that the compile task depends
on another being completed.
Using the debug/verbose options in scons might show more
precisely why it decides to build
I sure the dependency relationship is correct. Using --debug=dtree
reports the correct dependency relationship between the derived file
generated by the Python function, and the object file which should
result from its compilation (n.b. this is a different example of the
same problem),
+-D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\Silibs
\bt\arr1\silibs_bt_arr1__initializer.obj
+-D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\Silibs
\bt\arr1\silibs_bt_arr1__initializer.cpp
This only occurs on our fastest build server. I've not seen the problem
with the many other parallel builds, using the same setup, we do here.
Rob
Post by Timothee Besset
Post by Robert Smallshire
Hi,
I'm building a large system in parallel with SCons on a
multiprocessor
Post by Robert Smallshire
(4 core) Windows Server 2003 64-bit edition server. I'm getting
frequent build failures owing to what I suspect to be a
race condition
Post by Robert Smallshire
in the SCons scheduler when building in parallel. The failure is
always related to one of my own custom builders, which uses
a simple
Post by Robert Smallshire
Python function to generate some C++ code.
My action function which does the code generation opens a
file, writes
Post by Robert Smallshire
C++ code to the file, and explicitly closes the file, using the
C++ standard
Python try: finally: idiom to ensure that the file is closed before
the function returns.
However, SCons seems to be attempting to compile the resulting C++
file, before my build action has closed the file. Of course, the
compiler fails to open the file, and the build fails.
Forcing a garbage collection sweep with gc.collect()
immediately after
Post by Robert Smallshire
closing the file seems to help, but this is probably
because it stops
Post by Robert Smallshire
all Python threads, and effectively serialises the build actions.
rmsInitializerAction(["D:\build\robertsm\products\rms\8_0s\windows-amd
Post by Robert Smallshire
64 -vc_8_0-normal\reditor\ui\reditor_ui__initializer.cpp"],
["'reditor_ui__initializer.cpp'", "''", "''"]) cl /nologo /EHsc
/Zc:forScope /GR /W0 /MD /Od /TP /DWIN64 /D__WIN32__
/D_CRT_SECURE_NO_DEPRECATE /D_HAS_ITERATOR_DEBUGGING=0
/D_SECURE_SCL=0
Post by Robert Smallshire
/DWINDOWS /DLITTLEENDIAN /DQT /DQT_DLL /DQT_THREAD_SUPPORT
/DNO_DEBUG=1
/DQT_NO_CHECK=1 /DQT_NO_DEBUG=1 /I.
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_
Post by Robert Smallshire
3_
5\include
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\qt-3_
Post by Robert Smallshire
3_
5\mkspecs\win32-msvc.net
/ID:\p4workspaces\robertsm\packages\windows-amd64-vc_8_0-release\oiv-5
Post by Robert Smallshire
_0
_4\include /c
D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redito
Post by Robert Smallshire
r\
ui\reditor_ui__initializer.cpp
/FoD:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\red
Post by Robert Smallshire
it
or\ui\reditor_ui__initializer.obj
reditor_ui__initializer.cpp
'D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redit
Post by Robert Smallshire
or
\ui\reditor_ui__initializer.cpp': Permission denied
scons: ***
[D:\build\robertsm\products\rms\8_0s\windows-amd64-vc_8_0-normal\redit
Post by Robert Smallshire
or \ui\reditor_ui__initializer.obj] Error 2
Unfortunately, I've failed to distill this down to a simple
example,
Post by Robert Smallshire
as is the norm with threading problems - but I strongly
suspect there
Post by Robert Smallshire
is a problem with the SCons scheduler with parallel builds.
Other details: SCons v0.96.92.D002. Python 2.5 AMD64 build.
Has anybody seen anything similar, or created a fix for this?
Rob Smallshire
This message contains information that may be privileged or
confidential and is the property of the Roxar Group. It is
intended only for the person to whom it is addressed. If you
are not the intended recipient, you are not authorised to
read, print, retain, copy, disseminate, distribute, or use
this message or any part thereof. If you receive this message
in error, please notify the sender immediately and delete all
copies of this message.
---------------------------------------------------------------------
---------------------------------------------------------------------
This message contains information that may be privileged or
confidential and is the property of the Roxar Group. It is intended
only for the person to whom it is addressed. If you are not the
intended recipient, you are not authorised to read, print, retain,
copy, disseminate, distribute, or use this message or any part
thereof. If you receive this message in error, please notify the
sender immediately and delete all copies of this message.
---------------------------------------------------------------------
James Y Knight
2006-10-26 22:45:54 UTC
Permalink
Post by Valera Fooksman
or there are evil things like antivirus or disk cache that grab the
file right after its creation and cause the havoc.
Yes, this is a windows "feature", and there's not much SCons can do
about it. It should even be able to happen without using multiple
build threads, although multiple threads makes it more likely I'm
sure. Basically what happens is that as soon as you close a file, the
antivirus program (or file indexing program) will open the file,
often with an exclusive open mode. An exclusive open makes anybody
else trying to open the file (such as the compiler) fail. Yep, it
sucks, but that's windows for you.

I recommend sysinternals filemon to see what the heck is actually
going on. If it shows anything else opening files right after you've
closed them, uninstall that program.

http://www.sysinternals.com/Utilities/Filemon.html

James
Robert Smallshire
2006-10-27 12:16:10 UTC
Permalink
Hi James and the list,
Post by James Y Knight
I recommend sysinternals filemon to see what the heck is
actually going on. If it shows anything else opening files
right after you've closed them, uninstall that program.
http://www.sysinternals.com/Utilities/Filemon.html
Thanks, that's good advice and a very useful tool. I've run FileMon to
look at what happens to my generated file, the log is below, which shows
what happens to my file from

python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
NOT FOUND Attributes: Error
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
NOT FOUND Attributes: Error
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
NOT FOUND Attributes: Error
python.exe:1140 CREATE table_comp__initializer.cpp SUCCESS Options:
OverwriteIf Access: All
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
BUFFER OVERFLOW FileFsVolumeInformation
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
BUFFER OVERFLOW FileAllInformation
python.exe:1140 WRITE table_comp__initializer.cpp SUCCESS Offset:
0 Length: 320
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
SUCCESS Attributes: A
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
SUCCESS Attributes: A
python.exe:1140 OPEN table_comp__initializer.cpp SUCCESS Options:
Open Access: All
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
BUFFER OVERFLOW FileFsVolumeInformation
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
BUFFER OVERFLOW FileAllInformation
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
BUFFER OVERFLOW FileFsVolumeInformation
python.exe:1140 QUERY INFORMATION table_comp__initializer.cpp
BUFFER OVERFLOW FileAllInformation
python.exe:1140 READ table_comp__initializer.cpp SUCCESS Offset:
0 Length: 4096
python.exe:1140 READ table_comp__initializer.cpp END OF FILE
Offset: 320 Length: 4096
python.exe:1140 CLOSE table_comp__initializer.cpp SUCCESS
cl.exe:1788 QUERY INFORMATION table_comp__initializer.cpp
SUCCESS Attributes: A
cl.exe:1788 OPEN table_comp__initializer.cpp SHARING
VIOLATION Options: Open Access: All
cmd.exe:2648 CLOSE table_comp__initializer.cpp SUCCESS

I can't claim to completely understand the sequence of events here.
SCons CREATEs the file, then WRITEs to it, then OPENs it, READs from it,
then CLOSEs it; all before the compiler jumps in. The OPEN here looks
suspicious, is something somewhere else in SCons opening my file a
second time for read operations - to compute a signature maybe - since
all 320 bytes of the file seem to be read by SCons. Should I expect two
CLOSEs from SCons here - I'm not sure whether the CLOSE I see is paired
with CREATE or OPEN?

Any ideas?

Rob

DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
Gary Oberbrunner
2006-10-27 13:43:10 UTC
Permalink
Post by Robert Smallshire
python.exe:1140 CREATE table_comp__initializer.cpp
python.exe:1140 WRITE table_comp__initializer.cpp
python.exe:1140 OPEN table_comp__initializer.cpp
python.exe:1140 READ table_comp__initializer.cpp
python.exe:1140 READ table_comp__initializer.cpp (END OF FILE)
python.exe:1140 CLOSE table_comp__initializer.cpp SUCCESS
cl.exe:1788 OPEN table_comp__initializer.cpp (SHARING VIOLATION
cmd.exe:2648 CLOSE table_comp__initializer.cpp SUCCESS
SCons CREATEs the file, then WRITEs to it, then OPENs it, READs from it,
then CLOSEs it; all before the compiler jumps in. The OPEN here looks
suspicious, is something somewhere else in SCons opening my file a
second time for read operations - to compute a signature maybe - since
all 320 bytes of the file seem to be read by SCons. Should I expect two
CLOSEs from SCons here - I'm not sure whether the CLOSE I see is paired
with CREATE or OPEN?
I think there should be a CLOSE just before the OPEN (to match the CREATE).
Please post the action function you're using to create
table_comp__initializer.cpp. Also what Python version are you using? Could
be a python bug. What if you used low-level IO calls rather than stdio?

-- Gary
Robert Smallshire
2006-10-27 16:07:01 UTC
Permalink
Hi,
Post by Gary Oberbrunner
I think there should be a CLOSE just before the OPEN (to
match the CREATE).
Please post the action function you're using to create
table_comp__initializer.cpp.
Here is the version using file objects.

def rmsInitializerAction(target, source, env):
trg = target[0].abspath
export_define = source[1].value
export_include = source[2].value
name = os.path.splitext(os.path.basename(trg))[0]
try:
fh = file(trg, 'w')
try:
# Add the #include for the dllexport/dllimport macro if
necessary
if export_include:
fh.write("#include \"%s\"\n\n" % export_include)
# Add the dllexport/dllimport macro if this is in a DLL
if export_define:
fh.write("%s " % export_define)
fh.write(
"int %s() {\n"
" int status = 1;\n"
" return status;\n"
"}\n" % name)
finally:
fh.close()
except IOError, inst:
print "Failed to create initializer"
env.Exit(1)
return None
Post by Gary Oberbrunner
Also what Python version are
you using?
Python 2.5 (r25:51908, Sep 19 2006, 10:05:36) [MSC v.1400 64 bit
(AMD64)] on win32
Post by Gary Oberbrunner
Could be a python bug. What if you used
low-level IO calls rather than stdio?
I've just tried this. Unfortunately. using the low level file API and
file decsriptors does avoid the problem. The only difference is that the
first CREATE is replaced with another OPEN, so now there are two OPENs
and one CLOSE.

:-(

Rob


DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
Gary Oberbrunner
2006-10-27 16:31:57 UTC
Permalink
Post by Robert Smallshire
Post by Gary Oberbrunner
low-level IO calls rather than stdio?
I've just tried this. Unfortunately. using the low level file API and
file decsriptors does avoid the problem. The only difference is that the
first CREATE is replaced with another OPEN, so now there are two OPENs
and one CLOSE.
Did you mean *doesn't* avoid?

Assuming you did, what I would do next is analogous to using print statements
to debug: open & close, or os.stat(), some random file right around the point
where you call fh.close(); you'll see that file in the log (I hope!). Then
you can maybe see whether it's calling fh.close() but not actually closing the
file, or what.

-- Gary
Robert Smallshire
2006-10-27 16:48:12 UTC
Permalink
Post by Robert Smallshire
Post by Robert Smallshire
Post by Gary Oberbrunner
low-level IO calls rather than stdio?
I've just tried this. Unfortunately. using the low level
file API and
Post by Robert Smallshire
file decsriptors does avoid the problem. The only
difference is that
Post by Robert Smallshire
the first CREATE is replaced with another OPEN, so now
there are two
Post by Robert Smallshire
OPENs and one CLOSE.
Did you mean *doesn't* avoid?
Yes, of course! Sorry.
Post by Robert Smallshire
Assuming you did, what I would do next is analogous to using
print statements to debug: open & close, or os.stat(), some
random file right around the point where you call fh.close();
you'll see that file in the log (I hope!). Then you can
maybe see whether it's calling fh.close() but not actually
closing the file, or what.
Good idea. I'll try that next. For what its worth, using print
statements around fh.close() shows that things should be happening in
the right order at the right time, so I'm sure that fh.close() is being
called on the correct object, even if it has no effect.

Rob

DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
Gary Oberbrunner
2006-10-27 16:35:09 UTC
Permalink
Post by Robert Smallshire
I've just tried this. Unfortunately. using the low level file API and
file decsriptors does avoid the problem. The only difference is that the
first CREATE is replaced with another OPEN, so now there are two OPENs
and one CLOSE.
Another possibility to try:

os.close(fileno(fh))

?

-- Gary
Anthony Roach
2006-10-27 16:13:41 UTC
Permalink
Post by Robert Smallshire
I can't claim to completely understand the sequence of events here.
SCons CREATEs the file, then WRITEs to it, then OPENs it, READs from it,
then CLOSEs it; all before the compiler jumps in. The OPEN here looks
suspicious, is something somewhere else in SCons opening my file a
second time for read operations - to compute a signature maybe - since
all 320 bytes of the file seem to be read by SCons. Should I expect two
CLOSEs from SCons here - I'm not sure whether the CLOSE I see is paired
with CREATE or OPEN?
The OPEN and READs are probably the SCons C scanner reading the file
to look for #include lines to fine .h dependencies, which should
happen after your action function has returned and before the .c file
is compiled. I would have expected to see a CLOSE after the CREATE and
WRITEs and before the OPEN, which would be your closing of the file in
your action function. Are you sure you are calling the close() method
on the file object in your action function? Does adding a delay after
the call to close() make the CLOSE show up in the FileMon log?
Robert Smallshire
2006-10-27 16:59:17 UTC
Permalink
I really appreciate all the ideas I'm getting here.
Post by Robert Smallshire
Post by Robert Smallshire
I can't claim to completely understand the sequence of events here.
SCons CREATEs the file, then WRITEs to it, then OPENs it,
READs from
Post by Robert Smallshire
it, then CLOSEs it; all before the compiler jumps in. The OPEN here
looks suspicious, is something somewhere else in SCons
opening my file
Post by Robert Smallshire
a second time for read operations - to compute a signature maybe -
since all 320 bytes of the file seem to be read by SCons. Should I
expect two CLOSEs from SCons here - I'm not sure whether
the CLOSE I
Post by Robert Smallshire
see is paired with CREATE or OPEN?
The OPEN and READs are probably the SCons C scanner reading
the file to look for #include lines to fine .h dependencies,
which should happen after your action function has returned
and before the .c file is compiled. I would have expected to
see a CLOSE after the CREATE and WRITEs and before the OPEN,
which would be your closing of the file in your action
function.
You are correct. This error is happening randomly. I've just examined
the sequence when it is working, there is indeed a CLOSE after the
CREATE and WRITE, before the second OPEN. Occasionally this first CLOSE,
corresponding to CREATE, is missing, which causes the symptom I'm
seeing.
Post by Robert Smallshire
Are you sure you are calling the close() method on
the file object in your action function?
Pretty sure, using print statements around it, but I'll use Gary's
suggestion to try to get some other 'marker' events into the FileMon
log.
Post by Robert Smallshire
Does adding a delay
after the call to close() make the CLOSE show up in the FileMon log?
Adding a short delay of 0.01 s makes the problem occur much less
frequently. Adding a long delay of 0.5 s or calling gc.collect() seems
makes it go away entirely, or at least below what I can easily
reproduce.

Rob

DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
Anthony Roach
2006-10-26 18:29:10 UTC
Permalink
On 10/26/06, Robert Smallshire <***@roxar.com> wrote:
...
Post by Robert Smallshire
My action function which does the code generation opens a file, writes
C++ code to the file, and explicitly closes the file, using the standard
Python try: finally: idiom to ensure that the file is closed before the
function returns.
However, SCons seems to be attempting to compile the resulting C++ file,
before my build action has closed the file. Of course, the compiler
fails to open the file, and the build fails.
You should be able to add a print statement right after your call to
close() on the file. This would let you know if the compile command is
truely being executed prior to the close() call or not. What I suspect
is happening is that there is a small delay between when the file is
closed and when the filesystem lock on the file is released. Windows
file locking has never impressed me as being very reliable or
predictable.
Post by Robert Smallshire
Forcing a garbage collection sweep with gc.collect() immediately after
closing the file seems to help, but this is probably because it stops
all Python threads, and effectively serialises the build actions.
I expect the the gc.collect() call is providing enough of a delay to
allow the OS to release the file lock before the function exits and
scons goes on the build the next thing. Adding a time.sleep(0.01)
would probably have the same effect.
Morten Elo Petersen
2006-10-26 19:58:30 UTC
Permalink
Hi,

We see problems when distributing builds onto a grid engine, even if we
do not run stuff in parallel.

We are pretty sure that the problem lies with our NFS server which fails
to propagate file changes fast enough between the machines in our grid -
especially when the load on our servers are high.

We are using Linux for our builds, but maybe a similar effects can occur
on w2k or XP file systems on multiprocessor servers.

To narrow down the possible cause you should be able to write a python
script which generate a file in one thread an immidiately reads it in
another, run this a few thousand times and see if it always succeed. I
have done something similar, here for our problem, at the same issue
appeared - without scons being involved at all.

regards
Morten
Post by Robert Smallshire
I'm building a large system in parallel with SCons on a multiprocessor
(4 core) Windows Server 2003 64-bit edition server. I'm
getting frequent
build failures owing to what I suspect to be a race condition in the
SCons scheduler when building in parallel. The failure is
always related
to one of my own custom builders, which uses a simple Python
function to
generate some C++ code.
My action function which does the code generation opens a file, writes
C++ code to the file, and explicitly closes the file, using
the standard
Python try: finally: idiom to ensure that the file is closed
before the
function returns.
However, SCons seems to be attempting to compile the
resulting C++ file,
before my build action has closed the file. Of course, the compiler
fails to open the file, and the build fails.
Gary Oberbrunner
2006-10-27 17:04:02 UTC
Permalink
Adding a short delay of 0.01 s makes the problem occur much less
frequently. Adding a long delay of 0.5 s or calling gc.collect() seems
makes it go away entirely, or at least below what I can easily
reproduce.
Aha; and do you then see the proper sequence of OPEN/CLOSE in filemon?

-- Gary
Robert Smallshire
2006-10-30 10:26:44 UTC
Permalink
Adding a short delay of 0.01 s makes the problem occur much less
frequently. Adding a long delay of 0.5 s or calling
gc.collect() seems
makes it go away entirely, or at least below what I can easily
reproduce.
Aha; and do you then see the proper sequence of OPEN/CLOSE in filemon?
Yes I do. I'm not sure I see why adding a delay makes a difference
though...

Rob

DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
Anthony Roach
2006-10-30 14:03:00 UTC
Permalink
Post by Robert Smallshire
Post by Gary Oberbrunner
Aha; and do you then see the proper sequence of OPEN/CLOSE in filemon?
Yes I do. I'm not sure I see why adding a delay makes a difference
though...
To me this looks like a case of buffering gone wrong. It could be in
Python, in the C runtime library or in the OS. Somewhere the first
CLOSE is being delayed and if the OPEN gets in before the CLOSE is
executed, then the fist CLOSE doesn't happen and the file is left open
because the CLOSEs don't balance the CREATE/OPEN.
Robert Smallshire
2006-10-30 16:42:57 UTC
Permalink
Post by Gary Oberbrunner
Post by Robert Smallshire
Post by Gary Oberbrunner
Aha; and do you then see the proper sequence of
OPEN/CLOSE in filemon?
Post by Robert Smallshire
Yes I do. I'm not sure I see why adding a delay makes a difference
though...
To me this looks like a case of buffering gone wrong. It
could be in Python, in the C runtime library or in the OS.
Somewhere the first CLOSE is being delayed and if the OPEN
gets in before the CLOSE is executed, then the fist CLOSE
doesn't happen and the file is left open because the CLOSEs
don't balance the CREATE/OPEN.
I agree. I don't think there's much I can do about this at the SCons
level, other than the 'engineering solution' of incorporating a delay.

Rob

DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
James Y Knight
2006-10-30 17:20:01 UTC
Permalink
Post by Robert Smallshire
I agree. I don't think there's much I can do about this at the SCons
level, other than the 'engineering solution' of incorporating a delay.
One thing to try might be to open all files unbuffered. That is, use
0 as the third argument to open. I've absolutely no idea whether this
would help (it ought not to make any difference), but it's something
to try...

James

Loading...